A semantic reconstruction and AI-controlled generation method for the cultural genes of Qing Dynasty embroidery patterns: a case study of official rank badges

Yang, Haiqiong; Sui, Qiao; Hu, Bing; Shi, Kun; Wang, Ranran; Li, Maoning

doi:10.1038/s40494-025-02217-5


Article
Open access
Published: 08 December 2025


Haiqiong Yang¹,
Qiao Sui²,
Bing Hu¹,
Kun Shi³,
Ranran Wang⁴ &
Maoning Li¹

npj Heritage Science volume 13, Article number: 637 (2025)


Abstract

Qing Dynasty embroidered rank badges, as important carriers of traditional Chinese culture, pose challenges for digital representation, cultural gene extraction, and modern application. This paper addresses these issues by constructing, for the first time, an ‘embroidery semantic network’ for the structured representation of cultural elements and by introducing ‘dynamic shape grammar’ for precise form control. We developed the LoRA-Diffusion-SG three-stage fine-tuning architecture, which integrates a multi-source dataset comprising image, knowledge, and craftsmanship layers and significantly enhances the realism and cultural alignment of the generated patterns. Experiments demonstrate the method’s advantages in cultural compliance (CLIP similarity 0.78–0.82), symbol recognition accuracy (Top-1 82–85%), and craft feasibility prediction (F1-score 0.87–0.89). This study provides AI-based technical support for the digital preservation and innovation of embroidery patterns.


Introduction

Qing Dynasty embroidered rank badges (Guangbu), a highly representative component of the traditional Chinese costume system, feature distinctive patterns that not only serve as visual indicators of status and rank but also carry rich cultural genes and historical information¹. The research and innovative design of traditional patterns currently face multiple challenges, including the effective digitization of traditional patterns, the systematic extraction and inheritance of cultural genes, and the integration of these historically rooted patterns into modern design contexts². Against this backdrop, AI technology, with its capabilities in data processing, pattern recognition, image generation, and semantic understanding, shows great potential for cultural heritage preservation, digital reconstruction, and innovative design³.

Although existing research has explored the history, artistry, and craftsmanship of Qing Dynasty embroidery patterns¹, the systematic extraction of their cultural genes and the construction of computable methods for innovative design remain notably underdeveloped. In AI-generated design in particular, applications to traditional patterns, especially rule-bound art forms with intricate details such as embroidery, are still at an early, exploratory stage^4,5. While current studies have made some progress in automated pattern recognition, style transfer, and preliminary generative attempts, they generally lack fine-grained control over embroidery pattern generation, specifically over distinctive stitch types, textures, and cultural connotations^6,7. In particular, few studies have systematically combined an ‘embroidery semantic network’ for a deep understanding of cultural connotations with a ‘dynamic shape grammar’ for precise control over pattern form generation and variation^8,9,10. Likewise, generative models optimized for embroidery patterns, such as a three-stage fine-tuning diffusion architecture incorporating LoRA, have not yet been sufficiently researched or applied. This study aims to address these research gaps, with the following core academic contributions:

1.
For the first time, this study systematically proposed and constructed an ‘embroidery semantic network’ specifically for Qing Dynasty embroidered rank badge patterns. By drawing upon ontology construction methodologies¹¹, this network achieves a structured and semantic representation of the cultural elements inherent in the patterns¹², such as rank, animals, auspicious motifs, and colors, thereby laying a knowledge foundation for subsequent intelligent understanding and generation.
2.
Innovatively, ‘dynamic shape grammar’ has been introduced into the parametric generation framework for embroidery patterns. Drawing upon the successful application of shape grammar in architectural design and ethnic pattern generation^9,13,14, this study extends it to the field of embroidery. Through rule-driven and parameterized control, it achieves precise control and explainable evolution of the pattern’s fundamental forms, layout structures, and stylistic variations.
3.
This study developed and optimized a ‘LoRA-Diffusion-SG’ three-stage fine-tuning architecture for embroidery pattern generation. This architecture leverages the efficiency of LoRA in fine-tuning large models¹⁵ and the strong image generation capabilities of diffusion models¹⁶. Through domain adaptation, feature decoupling (semantics and texture), and multi-modal integration (shape grammar guidance), it significantly improves the visual realism, detail richness, and cultural alignment of the generated patterns.

These contributions provide a novel theoretical framework and crucial technical support for the digital preservation of Qing Dynasty embroidered rank badges, the deep excavation and inheritance of their cultural genes, and AI-based innovative design. This promotes the cross-integration of traditional craftsmanship and cutting-edge artificial intelligence (AI) technologies.

The core objective of this study is to comprehensively reconstruct the cultural genes embedded in Qing Dynasty embroidered rank badges and to explore AI-facilitated methods for the innovative expression of embroidery patterns, with the aim of overcoming potential monotony and stereotyping in the forms and styles of traditional patterns. Specifically, this study sets out to:

1.
Systematically reconstruct the complete pattern system of Qing Dynasty embroidered rank badges and their cultural connotations. This is crucial for advancing the computational analysis of embroidery art, enabling the quantitative management of cultural elements, and ultimately empowering innovative design¹⁷. The framework constructed in this study aims to provide a broadly applicable methodology¹⁸, serving diverse scenarios, such as cultural heritage, fashion design, and cultural and creative product development, areas where traditional pattern elements are garnering widespread attention and continuous reference.
2.
Establish an effective pathway for the complete reconstruction and innovative interpretation of embroidery cultural genes using computer technology¹⁹. This pathway, on one hand, enables the clear presentation of cultural gene reconstruction results in high-fidelity image formats. On the other hand, by constructing a structured raw dataset encompassing image and semantic information²⁰, it lays a solid foundation for subsequently applying diverse quantitative analysis methods. This approach not only serves as an effective supplement to traditional research methods but also provides strong technical support for the cultural inheritance of embroidery patterns and typology-based innovative design.
3.
Building on traditional craftsmanship research paradigms, propose a theoretical framework and technical methodology capable of systematically and reproducibly analyzing and reconstructing the morphological features and cultural connotations of embroidery patterns. This research aims to unlock the largely untapped potential of generative AI in this domain: although generative AI has rapidly emerged as one of the most promising directions in AI research, its application in interdisciplinary fields, such as archeology and cultural heritage preservation, remains at a preliminary stage²¹.

Therefore, this study is dedicated to constructing a theoretical and technical system capable of effectively extracting the cultural genes of Qing Dynasty embroidered rank badge patterns and facilitating innovative design based on these genes. This system will fully consider contextual information, such as the age, purpose, and decorative motifs of the patterns, and will be applicable to authentic Qing Dynasty embroidery data²². The constructed framework will undergo adaptability testing for various embroidery styles and design requirements to ensure its broad transferability and applicability beyond the training dataset.

Methods

Related work

Qing Dynasty embroidery, as a pinnacle in the history of Chinese embroidery art, is renowned for its exquisite craftsmanship, rich subject matter, and diverse styles, embodying immense artistic value and profound historical and cultural information²³. Rank badges (Guangbu), a crucial component of Qing Dynasty official attire for denoting rank and status, featured pattern designs that were not merely visual representations of the hierarchical system but also concentrated reflections of the political, cultural, and social ideologies of specific periods²⁴. Research on Qing Dynasty rank badges includes analytical studies of the badges themselves and discussions of their preservation, providing an important body of scholarship for understanding their formal characteristics, craftsmanship, and embedded cultural connotations. Furthermore, research on the socio-cultural functions of embroidery patterns from other regions, such as the interpretation of the social and political meanings of Bakuba embroidery patterns, offers valuable insights into the universality of patterns as cultural symbols²⁵. These studies have laid the foundational knowledge for this research on Qing Dynasty embroidered rank badges and underscore their significance as carriers of cultural genes.

The digitization of traditional patterns is a crucial prerequisite and core component for effective cultural heritage preservation, innovative inheritance, and the recreation of contemporary value²⁶. In the specific field of embroidery pattern digitization, scholars have already conducted fruitful explorations. For instance, in response to the complexity of embroidered fabrics, research has proposed a set of automated pattern recognition and color separation methods, providing significant technical support for the digital information extraction and analysis of embroidery patterns²⁷. To advance the development and validation of relevant machine learning algorithms, researchers have also committed to constructing standardized datasets, such as by building and publicly validating a new database for handmade embroidery pattern recognition, which offers valuable data resources and evaluation benchmarks for research in this domain²⁸. At the more critical level of cultural gene extraction and structured knowledge representation, academia is increasingly focused on the application potential of AI technologies, such as knowledge graphs and ontologies. For instance, research on digitization of traditional knowledge and cultural heritage demonstrates how complex cultural information can be systematically organized and applied²⁹. Similarly, the construction and characterization of traditional village landscape cultural genome atlases provides structured methods for visualizing multi-dimensional cultural genes³⁰. Moreover, visualizing the cultural landscape gene of traditional settlements offers semiotic perspectives for understanding and representing intangible heritage³¹. 
Taking the Qingyang sachet, a specific intangible cultural heritage of embroidery, as an example, research has deeply explored ontology-based knowledge graph construction methods, systematically demonstrating how complex and multi-dimensional cultural heritage knowledge can be clearly and logically organized and expressed¹¹. For Bulgarian ethnic embroidery, scholars have also investigated methods for creating specialized information models, aiming to promote their effective display, retrieval, and utilization in digital knowledge bases¹². These cutting-edge research achievements, particularly their explorations in cultural element deconstruction, semantic relationship mining, and formal knowledge representation, provide important theoretical inspiration, methodological references, and practical guidance for the construction of one of this study’s core concepts: the ‘embroidery semantic network.’ Concurrently, from a design perspective, research also focuses on how creativity and its systematic design process can be applied to different types of embroidery practices, emphasizing the core importance of deep understanding and flexible application of design thinking and methods for promoting innovation in embroidery art³².

AI technology, particularly generative models centered on deep learning, has demonstrated revolutionary potential in image generation, artistic creation, and innovative design in recent years. Among these, diffusion models and generative adversarial networks (GANs), as current mainstream image generation techniques, have been widely applied in the creation of diverse visual content. In the broader context of design and innovation, Verganti et al. emphasize that AI-driven approaches are reshaping design paradigms by introducing new logics of creativity and collaboration between humans and intelligent systems³³. From a practical perspective, Li and Zhou developed an automatic generation algorithm for advertising artistic design based on neural networks³⁴, demonstrating how algorithmic models can effectively support creative processes in commercial and applied arts. Extending these explorations to design systems research, Yin et al. investigated the integration of the Midjourney AI content generation tool into design workflows³⁵, highlighting its potential to direct designers towards future-oriented innovation practices. In the specific field of embroidery art, early research began exploring the use of AI for the esthetic simulation of embroidery styles and subsequently refined rendering techniques for embroidery styles based on convolutional neural networks (CNNs)⁶. Other studies have proposed methods for precise segmentation and innovative synthesis of embroidery art images using deep CNNs⁵. Addressing the need to preserve and re-create specific ethnic embroidery patterns, scholars have proposed a Miao embroidery pattern repair scheme that combines GAN and U-Net architectures with a spatial-channel attention mechanism. Furthermore, embroidery style transfer models based on texture-loop GANs have also been researched and applied⁷.
These cutting-edge explorations preliminarily demonstrate the broad application prospects and immense potential of AI technology in the fine-grained processing and stylized intelligent generation of embroidery images.

In recent years, LoRA, a lightweight and efficient fine-tuning technique, has achieved significant success in the personalized customization and domain adaptation of large pre-trained models and has been widely applied in text-to-image generation tasks^15,16. Through comparative studies, scholars have explored the application potential of different text-to-image AI models in computer-aided conceptual design, further highlighting the promise of AI as a collaborative creative assistant. Research has also focused on interactive modes of AI in computer-aided arts and crafts³. Earlier literature, moreover, discussed the opportunities and challenges of generative design in the textile sector. More recently, diffusion-based fine-tuning frameworks have been investigated for their effectiveness in domain-driven generation with limited training data, providing methodological insights for cultural heritage applications³⁶. Together, these studies provide a solid technical foundation and theoretical perspective for this study’s adoption of a LoRA-optimized diffusion model (i.e., the LoRA-Diffusion-SG architecture) and for its application to the innovative design of Qing Dynasty rank badge embroidery patterns, a specific cultural heritage domain. However, despite the rapid development of AI generative technology, existing models generally face serious challenges when applied to traditional patterns (such as embroidered rank badges) that are highly complex, strictly rule-governed, and rich in cultural connotations, including insufficient fine-grained control, low cultural semantic alignment, and a lack of interpretability in the generated results.

Shape Grammars, as a rule-based generative system, have been widely applied in architectural design, product design, and traditional pattern research due to their ability to precisely describe and generate design solutions with specific structures, combinatorial rules, and stylistic features. Pioneering research on shape and shape grammars⁸ laid the theoretical foundation for this field. In the digitization and innovative design of cultural heritage, particularly ethnic patterns, shape grammar demonstrates unique advantages. Cui & Tang⁹ successfully integrated shape grammar into a generative system for exploring Zhuang embroidery design, achieving computerized generation of specific ethnic styles. Hu et al.¹⁰ combined shape grammar with artificial neural networks for ethnic pattern design. Other studies have also delved into the reuse design and optimization of ethnic patterns and the creative design of Qiang embroidery patterns, respectively, based on improved shape grammars^13,14. These studies fully demonstrate the effectiveness of shape grammar in analyzing, reproducing, and innovating traditional patterns.

Concurrently, semantic networks and knowledge graph technologies provide powerful tools for structured representation and management of cultural heritage knowledge³⁷. By constructing semantic networks comprising concepts, attributes, and relationships, dispersed and heterogeneous cultural heritage information can be integrated into a machine-understandable knowledge base³⁸. Research has utilized CNN to evaluate semantic similarity between silk fabric images, indirectly reflecting the role of semantic information in image understanding¹¹. Ma et al.¹² proposed an ethnic pattern generation method based on the derivation of cultural design genes, also emphasizing the importance of integrating cultural semantic information into the generative process. This integration helps to enhance the interpretability, cultural relevance, and innovativeness of the generated content.

Materials of official rank badges embroidered with Qing Dynasty patterns

Data collection is the primary and crucial step for applying AI techniques to the innovative design of Qing Dynasty embroidered rank badge patterns^39,40. This stage necessitates extensive collection of Qing Dynasty rank badge data, encompassing images, pattern atlases, and historical literature descriptions, as their diversity and richness are paramount for subsequent model training and design generation⁴¹. The experimental dataset utilized in this study is rich in content, covering a wide range of Qing Dynasty rank badge patterns. Its sources include collections from major Chinese museums, such as the Palace Museum and the China National Silk Museum, which showcase various forms and design philosophies of Qing Dynasty imperial embroidery art, from dragon badges to civil and military official badges, reflecting the strictness of the Qing Dynasty official costume system and the exquisite craftsmanship of its embroidery²². Furthermore, this dataset integrates cultural and artistic legacies from historical archives, traditional atlases, and academic research, presenting rank badge patterns of different ranks, periods, and forms, highlighting the rich Qing Dynasty embroidery cultural heritage and China’s profound embroidery history and pattern design foundations. To ensure the comprehensiveness and representativeness of the dataset, we selected various types of rank badges covering imperial family, civil officials, military officials, and their evolutions. This study utilizes 325 authentic samples as the foundation for the experimental dataset, with the core objective of constructing a comprehensive and representative dataset of Qing Dynasty official rank badge embroidery, thereby laying the groundwork for subsequent deep learning model training and innovative design. 
These 325 samples are meticulously categorized into five major embroidery types, encompassing the Qing Dynasty Imperial Dragon Badges, Qing Dynasty Round Floral Badges, Qing Dynasty Civil Official Badges, Qing Dynasty Military Official Badges, and Qing Dynasty Xiezhi Badges. In this manner, the dataset is capable of broadly reflecting the rigor of the Qing Dynasty official uniform system and the complexity of the embroidery craftsmanship. The scale of the 325 authentic samples is sufficient to form a broad and manageable dataset, meeting the requirements for diversity and richness in training data for deep learning models, such as Stable Diffusion XL. The specific category distribution is as follows:

1.
Qing Dynasty imperial dragon badges: this category encompasses dragon motifs used by imperial family members, including emperors, empresses, and princes, showcasing combinations and variations of dragon forms, cloud patterns, and sea-cliff designs, along with their strict hierarchical symbolic meanings.
2.
Qing Dynasty round floral badges: this category includes round or square-round badges featuring floral, avian, and other roundel patterns, illustrating their use on specific occasions or garments, as well as the inheritance and innovation of traditional auspicious motifs.
3.
Qing Dynasty civil official badges: this covers various avian motifs, ranging from the first-rank crane to the ninth-rank quail, demonstrating the meticulous depiction, color coordination, and compositional characteristics of civil official badges across different ranks, reflecting their hierarchical system and cultural connotations.
4.
Qing Dynasty military official badges: this category includes various beast motifs, from the first-rank Qilin (mythical creature) to the ninth-rank seahorse, emphasizing the valiant and majestic presence of military official badges, their robust and powerful lines, and their representation of status and symbolism within the military.
5.
Qing Dynasty Xiezhi badges: this specifically refers to the Xiezhi (mythical beast known for justice) motif used by censors and other remonstrating officials, showcasing its unique morphological characteristics and its cultural connotations symbolizing justice and integrity.

The dataset used for model training was compiled from image data of Qing Dynasty embroidered rank badges collected from museums across China and private collections, as well as drawings and photographs of rank badges from historical documents and atlases^42,43. The decision to use two-dimensional embroidery images (including photographs of actual objects and traditional drawings) as the basis for this analysis rests on several considerations. These images are standardized, refined two-dimensional representations of three-dimensional embroidered objects. They are widely used in embroidery art, costume culture, and historical research, and are free from geographical or temporal constraints. In fact, high-quality embroidery images and traditional pattern atlases can be regarded as the standard for the graphical representation of embroidery patterns. While these images were traditionally obtained through photography or manual drawing, new digitization technologies now enable the acquisition of high-definition embroidery images and digitized patterns comparable to traditional photographs or drawings, renewing the importance of such documentation. Furthermore, their sheer volume allows a comprehensive and easily manageable dataset to be generated. These embroidery images and pattern atlases were also produced or photographed by professionals with deep knowledge of embroidery culture, history, and craftsmanship, which helps ensure a high-quality dataset. Additionally, because embroidery pattern data has a long history, datasets can be created, and data retrieved, from old publications and historical archives. Although three-dimensional models of embroidery patterns offer a better understanding of morphological features and other advantages, they remain difficult to apply to large and diverse collections, unlike two-dimensional images.
The training dataset consists exclusively of complete images of Qing Dynasty embroidered rank badge patterns, curated from publications, digitized collections, and professional photography. These images were processed with photo-editing software to remove background impurities, eliminate uneven lighting, and render them clearly with high contrast. Python scripts then preprocessed the images into standardized 256 × 256-pixel images (adjustable to the model’s input requirements) without distorting the aspect ratio or pattern structure of the rank badges, while ensuring consistent image dimensions and color correction. This image dataset is complemented by tabular data recording each badge’s rank, era, pattern theme, symbolic meaning, and literature citations.
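The aspect-preserving resize to 256 × 256 pixels described above can be pictured as a letterbox operation: scale the longer side to the target size, then pad the shorter side. The sketch below is a minimal NumPy-only illustration under assumptions of our own (nearest-neighbour sampling, centred placement, white padding, and the hypothetical function name `letterbox`); it is not the authors’ preprocessing script.

```python
import numpy as np


def letterbox(img: np.ndarray, size: int = 256, fill: int = 255) -> np.ndarray:
    """Fit an H x W (x C) image into a size x size canvas without distorting its aspect ratio.

    Assumptions (illustrative only): nearest-neighbour resampling,
    centred placement, and a uniform padding value `fill`.
    """
    h, w = img.shape[:2]
    scale = size / max(h, w)  # the longer side becomes exactly `size`
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))
    # Map each output row/column back to its nearest source row/column.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = img[rows][:, cols]
    # Pad the shorter side symmetrically so every sample ends up size x size.
    canvas = np.full((size, size) + img.shape[2:], fill, dtype=img.dtype)
    top = (size - new_h) // 2
    left = (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas


# Usage: a wide (open-shape) 200 x 400 badge image becomes a 256 x 256 sample.
wide = np.zeros((200, 400, 3), dtype=np.uint8)
wide[...] = (200, 30, 30)
sample = letterbox(wide)
```

Padding rather than stretching keeps tall closed-shape and wide open-shape patterns geometrically faithful; a production pipeline would more likely use bicubic or area resampling from OpenCV or Pillow.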

While the design of the training dataset is relatively straightforward and consistent with methods in similar works, the creation and conceptualization of a pattern sample dataset for innovative design is more complex, because its formalization and construction are closely tied to the issues that arise when working with cultural genes and dynamic forms. Machine learning algorithms require input images of a uniform size, yet Qing Dynasty rank badge patterns vary in dimensions and complexity: they include tall, narrow standing-beast patterns (often referred to as closed shapes) as well as short, wide composite patterns of flying birds and beasts (open shapes).

Cultural gene analysis

The embroidery patterns of the Qing Dynasty official rank badges serve as a crucial vehicle for traditional Chinese culture, and the extraction and structured representation of their cultural genes are key to realizing innovative design in this study. Based on the constructed Image Layer, Knowledge Layer, and Craftsmanship Layer of the data, this research categorizes the cultural genes into three core dimensions—Morphology, Semantics, and Craftsmanship—aiming to achieve the effective extraction and re-expression of the multi-faceted cultural genes embedded in the rank badge patterns⁴⁴.

The Morphology Dimension focuses on the structural proportion, posture characteristics, and visual texture of the embroidery patterns⁴⁵, reflecting the layout framework, color palette, and esthetic representation of the Qing Dynasty official badge designs. The core rationale for selecting this dimension is its ability to precisely capture the three-dimensional quality and intricate texture of the embroidered objects. By employing high-resolution and high-texture-clarity digital processing, it provides a high-fidelity morphological basis for subsequent AI generation, enabling precise control and variation.

The Semantics Dimension focuses on the complex semantic correlations among rank, official position, and animal motifs in Qing Dynasty rank badges⁴⁶. It conveys the rigor of the official uniform system and the symbolism of strict hierarchy, including the auspicious meanings represented by civil official birds (such as the first-rank crane) and military official beasts (such as the first-rank Qilin). The objective is to systematically reconstruct the entire motif system of the rank badges and their cultural connotations. This is achieved by constructing a Knowledge Graph (i.e., the “Embroidery Semantic Network”) that enables the structured representation of cultural elements, thereby ensuring semantic consistency and supporting innovative design driven by cultural connotations.
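To make the idea of a structured semantic network concrete, the sketch below stores a few rank-motif relations stated in this paper as subject-predicate-object triples, serializes them as a JSON-LD graph, and answers a simple pattern query. The namespace URL, the property names (`denotesRank`, `usedBy`, `symbolizes`), and the helper functions are hypothetical stand-ins for the authors’ ontology, which is not reproduced here; the triple lookup only imitates what a SPARQL query would do.

```python
import json
from collections import defaultdict

# Hypothetical vocabulary; the paper's actual ontology IRIs are not given.
CONTEXT = {
    "@vocab": "http://example.org/rank-badge#",
    "rb": "http://example.org/rank-badge#",
}

# A few relations stated in the text, encoded as (subject, predicate, object).
TRIPLES = [
    ("Crane", "denotesRank", "CivilRank1"),
    ("Qilin", "denotesRank", "MilitaryRank1"),
    ("Seahorse", "denotesRank", "MilitaryRank9"),
    ("Xiezhi", "usedBy", "Censor"),
    ("Xiezhi", "symbolizes", "Justice"),
]


def to_jsonld(triples):
    """Group triples by subject into a JSON-LD document with an @graph array."""
    nodes = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        nodes[s][p].append({"@id": f"rb:{o}"})  # compact IRI expanded via the "rb" prefix
    graph = [{"@id": f"rb:{s}", **props} for s, props in nodes.items()]
    return {"@context": CONTEXT, "@graph": graph}


def subjects_matching(triples, predicate, obj):
    """Stand-in for the SPARQL pattern `?s <predicate> <obj>`."""
    return [s for s, p, o in triples if p == predicate and o == obj]


doc = to_jsonld(TRIPLES)
serialized = json.dumps(doc, indent=2, ensure_ascii=False)
```

For example, `subjects_matching(TRIPLES, "denotesRank", "MilitaryRank1")` returns `["Qilin"]`. A real knowledge graph would add classes, rank hierarchies, and color and auspicious-motif relations, and would be queried with an RDF store rather than a list scan.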

The Craftsmanship Dimension concentrates on the characteristics of traditional embroidery stitches, such as those found in Su embroidery^47,48 and Yue embroidery^49,50, for instance the spiral organization of coiled gold thread embroidery and the linear direction of flat gold thread embroidery. This reflects the high level of craftsmanship and the key parameters of needling, including angle, speed, and force. This dimension is chosen to provide quantifiable evidence for stitch classification, craft quality assessment, and the digital analysis of traditional techniques. By using a logistic regression model to predict the embroiderability of an image, this research ensures that the final generated cultural genes are feasible and authentic in terms of craft logic.
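The embroiderability prediction above relies on logistic regression. As a minimal sketch, the NumPy code below fits a logistic regression by batch gradient descent on the mean binary cross-entropy. The two features (minimum stroke width in millimetres and number of thread colours divided by 10), the toy labels, and the function names are hypothetical, since the paper’s actual feature set is not specified in this section.

```python
import numpy as np


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights (bias first) by batch gradient descent."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xb = np.hstack([np.ones((len(X), 1)), X])  # prepend a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))  # P(embroiderable | features)
        w -= lr * Xb.T @ (p - y) / len(y)  # gradient of the mean cross-entropy
    return w


def predict_proba(w, X):
    """Probability that each row of X describes an embroiderable design."""
    Xb = np.hstack([np.ones((len(X), 1)), np.asarray(X, dtype=float)])
    return 1.0 / (1.0 + np.exp(-(Xb @ w)))


# Hypothetical features: [minimum stroke width (mm), number of thread colours / 10]
X_train = [[0.2, 0.9], [0.3, 0.8], [1.5, 0.2], [1.8, 0.3]]
y_train = [0, 0, 1, 1]  # 1 = judged embroiderable (toy labels)
weights = fit_logistic(X_train, y_train)
probs = predict_proba(weights, X_train)
```

Thresholding `probs` at 0.5 yields the binary feasibility decision; an F1-score such as the one reported in the Abstract would be computed from these decisions on held-out samples, not on the training data as in this toy example.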

Data construction

First, for the image layer of the Qing Dynasty embroidered rank badges, multi-light-source scanning was employed, combining high-definition images of imperial rank badges from the Palace Museum with the CRUSE scanning system to capture the three-dimensionality and subtle textures of the embroidery precisely. Images were processed with Python scripts (using the OpenCV and Pillow libraries) so that the outputs met standards of high resolution (≥600 dpi), low color error (ΔE < 3), and high texture clarity (SSIM ≥ 0.9). To further enhance image quality, ESRGAN super-resolution reconstruction was applied using a Python super-resolution toolkit, significantly improving image quality while preserving detail. Multi-light-source fusion pipelines and batch super-resolution reconstruction have been implemented, supporting the batch processing of all categories of rank badge images.
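The color-error and texture-clarity thresholds above can be checked automatically. The following NumPy sketch assumes the CIE76 definition of ΔE (Euclidean distance in CIELAB) averaged over pixels, and uses a single-window (global) SSIM as a simplification of the usual sliding-window SSIM. The paper does not state which ΔE formula, aggregation, or SSIM variant was used, and the function names are our own.

```python
import numpy as np


def delta_e76(lab_a, lab_b):
    """Per-pixel CIE76 colour difference between CIELAB arrays of shape (..., 3)."""
    diff = np.asarray(lab_a, dtype=float) - np.asarray(lab_b, dtype=float)
    return np.sqrt(np.sum(diff ** 2, axis=-1))


def global_ssim(x, y, data_range=255.0):
    """SSIM computed over the whole grayscale image as a single window (simplified)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (0.01 * data_range) ** 2  # standard stabilising constants, K1 = 0.01, K2 = 0.03
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2)
    return float(num / den)


def passes_quality_gate(lab_ref, lab_out, gray_ref, gray_out, max_de=3.0, min_ssim=0.9):
    """Apply the thresholds quoted in the text: mean ΔE < 3 and SSIM >= 0.9."""
    mean_de = float(delta_e76(lab_ref, lab_out).mean())
    ssim = global_ssim(gray_ref, gray_out)
    return (mean_de < max_de and ssim >= min_ssim), mean_de, ssim
```

For example, two CIELAB colours (50, 0, 0) and (53, 4, 0) differ by ΔE = 5 under CIE76, which would fail the ΔE < 3 criterion. For publication-grade measurement, the windowed SSIM in a maintained library and a perceptually uniform ΔE formula such as CIEDE2000 would be the safer choices.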

Second, the knowledge layer of the Qing Dynasty embroidered rank badges was constructed by analyzing their pattern regulations. Protégé was used for knowledge modeling and SPARQL for querying to build a structured knowledge graph, which was evaluated against thresholds for annotation consistency (Krippendorff’s α ≥ 0.8) and rule coverage (≥95%). These regulations were encoded in a JSON-LD-compliant data format to ensure data integrity and semantic consistency, and the graph was analyzed and visualized with NetworkX and Matplotlib, emulating Gephi-style network analysis. The knowledge layer was built from a comprehensive analysis of 184 knowledge items and 18 regulations, demonstrating strong data-driven analytical capabilities (Fig. 1).
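Krippendorff’s α quantifies agreement among annotators beyond what chance would produce. The sketch below implements the standard coincidence-matrix formulation for nominal labels, α = 1 − D_o/D_e, in plain Python. The function name and the label values in the usage line are hypothetical, and the paper does not state which metric level (nominal, ordinal, and so on) its annotations used.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of items; each item is the list of labels that the
    annotators who coded it assigned (missing annotations simply omitted).
    """
    coincidences = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue  # an item coded by fewer than two annotators is not pairable
        for c, k in permutations(values, 2):  # all ordered pairs of distinct annotators
            coincidences[(c, k)] += 1.0 / (m - 1)
    n_c = Counter()
    for (c, _), weight in coincidences.items():
        n_c[c] += weight
    n = sum(n_c.values())
    disagree_observed = sum(w for (c, k), w in coincidences.items() if c != k)
    disagree_expected = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k) / (n - 1)
    if disagree_expected == 0:
        return 1.0  # only one category occurs: treated as perfect agreement by convention
    return 1.0 - disagree_observed / disagree_expected


# Hypothetical annotations of three badge motifs by two annotators.
alpha = krippendorff_alpha_nominal([["crane", "crane"], ["qilin", "qilin"], ["crane", "qilin"]])
```

On this toy input α = 4/9 ≈ 0.44, which would fail the α ≥ 0.8 criterion even though two of the three items agree, because the agreement expected by chance is already high with only two categories.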

**Fig. 1: Knowledge Graph of Qing Dynasty Rank Badges.**

Finally, for the craftsmanship layer of the Qing Dynasty embroidered rank badges, the trajectories of traditional needlework techniques, such as those of Su and Yue embroidery, were recorded with a motion capture system. MATLAB (Motion Analysis Toolbox) was used to capture needlework trajectories precisely and extract their key parameters (angle, velocity, and force), achieving a trajectory accuracy error below 0.5 mm and a parameter dispersion (coefficient of variation, CV) below 15%. Two input modes are supported: CSV trajectory data and trajectory extraction from images, both of which automatically yield parameters such as angle, velocity, and force. As shown in Fig. 2, the craftsmanship layer’s radar chart visualizes differences in parameter distributions across needlework techniques along the three dimensions of angle, velocity, and force. This provides a quantitative basis for needlework classification, craftsmanship quality assessment, and the digital analysis of traditional craftsmanship, offering a data-driven analytical tool for understanding and inheriting embroidery craftsmanship.
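The two acceptance criteria above, a trajectory error below 0.5 mm and a parameter CV below 15%, can be computed as follows. This NumPy sketch assumes that captured and reference trajectories are already time-aligned point by point, that the error is the mean Euclidean distance between matched points, and that the CV uses the sample standard deviation. None of these choices is specified in the paper, and the readings and function names are invented for illustration.

```python
import numpy as np


def coefficient_of_variation(values):
    """CV = sample standard deviation / mean, as a fraction (0.15 means 15%)."""
    v = np.asarray(values, dtype=float)
    return float(v.std(ddof=1) / v.mean())


def mean_trajectory_error(captured, reference):
    """Mean Euclidean distance (input units, e.g. mm) between matched trajectory points."""
    c = np.asarray(captured, dtype=float)
    r = np.asarray(reference, dtype=float)
    return float(np.linalg.norm(c - r, axis=1).mean())


def meets_capture_criteria(captured, reference, params, max_err_mm=0.5, max_cv=0.15):
    """Check the thresholds quoted in the text for one recorded stitch sequence.

    `params` maps a parameter name (angle, velocity, force) to repeated measurements.
    """
    err = mean_trajectory_error(captured, reference)
    cvs = {name: coefficient_of_variation(vals) for name, vals in params.items()}
    ok = err < max_err_mm and all(cv < max_cv for cv in cvs.values())
    return ok, err, cvs


# Hypothetical readings from repeated stitches of one technique.
captured = [[0.0, 0.0], [1.0, 0.1], [2.0, 0.2]]
reference = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
params = {"angle": [44.0, 45.0, 46.0], "velocity": [9.0, 10.0, 11.0], "force": [1.9, 2.0, 2.1]}
ok, err, cvs = meets_capture_criteria(captured, reference, params)
```

Here the mean error is 0.1 mm and the largest CV is 10% (velocity), so both criteria are met. Real captures would first need resampling or dynamic time warping to align trajectories of different lengths.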

Training strategy and loss function construction

In this study, we designed and compared three distinct training strategies to evaluate the performance of LoRA in embroidery image modeling tasks. Each strategy is associated with a specific loss function design, reflecting a progressive training paradigm from basic reconstruction, to structural enhancement, and ultimately to multi-objective optimization.

First, Fig. 3a shows how the loss changes during the LoRA-based training stage, and Fig. 3b shows an enhanced conservative training strategy. Both use the standard mean squared error (MSE) loss, defined as (1):

$$\mathcal{L}_{\mathrm{LoRA}}=\mathbb{E}_{(x,y)\sim D}\left[\lVert f_{\theta}(x)-y\rVert^{2}\right]$$

(1)

This loss function quantifies the pixel-level discrepancy between the model output and the ground truth image. As the number of training epochs increases, the overall loss steadily decreases, demonstrating the model’s solid convergence and fitting capability during the initial phase.

Next, Fig. 3c presents the training curves under a multi-objective loss framework that incorporates perceptual quality at multiple feature levels. The total loss is expressed as (2):

$$\mathcal{L}_{\mathrm{total}}=\alpha\cdot\mathcal{L}_{\mathrm{MSE}}+\beta\cdot\mathcal{L}_{\mathrm{LPIPS}}+\gamma\cdot\mathcal{L}_{\mathrm{feat}}+\delta\cdot\mathcal{L}_{\mathrm{consistency}}$$

(2)

Here, $\mathcal{L}_{\mathrm{MSE}}$ captures pixel-wise reconstruction error, $\mathcal{L}_{\mathrm{LPIPS}}$ maintains perceptual similarity based on deep features, and $\mathcal{L}_{\mathrm{feat}}$ measures the consistency of intermediate feature representations. $\mathcal{L}_{\mathrm{consistency}}$ is a texture consistency loss that constrains texture structure through an MSE on the x- and y-direction image gradients. The default coefficients are {α, β, γ, δ} = {1.0, 0.3, 0.2, 0.1} (as set in the loss-weights field of the loaded configuration), so the gradient-based texture consistency term has a default weight of 0.1. In our experiments, $\mathcal{L}_{\mathrm{feat}}$ remains constant, indicating that it acts as a fixed regularization component rather than an explicitly optimized term. This composite loss improves the subjective visual quality and semantic richness of the generated embroidery images.
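A minimal NumPy sketch of the weighted objective in Eq. (2) follows. It assumes grayscale H × W arrays, implements the texture consistency term as the MSE between finite-difference gradients of output and target along x and y (summed over the two directions), and takes the LPIPS and feature terms as precomputed scalars, since those require pretrained networks. All names are hypothetical.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error over all elements."""
    return float(np.mean((pred - target) ** 2))

def texture_consistency(pred, target):
    """MSE between x- and y-direction finite-difference gradients (assumed form)."""
    gx_p, gx_t = np.diff(pred, axis=1), np.diff(target, axis=1)
    gy_p, gy_t = np.diff(pred, axis=0), np.diff(target, axis=0)
    return mse(gx_p, gx_t) + mse(gy_p, gy_t)

# Default coefficients {alpha, beta, gamma, delta} quoted in the text.
DEFAULT_WEIGHTS = {"alpha": 1.0, "beta": 0.3, "gamma": 0.2, "delta": 0.1}

def total_loss(pred, target, lpips_value, feat_value, w=DEFAULT_WEIGHTS):
    """Weighted sum of the four terms; LPIPS and feature terms are supplied precomputed."""
    return (w["alpha"] * mse(pred, target)
            + w["beta"] * lpips_value
            + w["gamma"] * feat_value
            + w["delta"] * texture_consistency(pred, target))
```

A uniform brightness shift changes the MSE term but leaves the gradient term at zero, which is why the texture term is sensitive to structure rather than tone. If $\mathcal{L}_{\mathrm{feat}}$ is truly constant with respect to the model parameters, it contributes no gradient and only offsets the reported total loss.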

Through comparative analysis of these three loss strategies, we validate the feasibility and effectiveness of a stepwise optimization approach—progressing from single-error minimization to multi-level perceptual regulation—in the task of embroidery image generation.

Model architecture and parameter configuration

The model parameters were determined progressively within a three-stage fine-tuning architecture, with the aim of ensuring experimental reproducibility and effective culturally informed pattern generation. In the base model fine-tuning stage, the Classifier-Free Guidance (CFG) scale was set to 7.5 after comparing generation results at different CFG values (e.g., 5.0, 7.5, and 10.0); 7.5 gave the best balance among semantic guidance, image naturalness, and detail diversity. In the dual-channel LoRA stage, the semantic channel rank was set to 16 and the texture channel rank to 8. This rank ratio supports the decoupled expression of motif symbols and craft texture, and the independence of these features was validated through t-SNE visualization and mutual information analysis (mutual information < 0.2). For structural control, the hard constraint used the ControlNet-Canny model with an injection weight of 1.0 and edge detection thresholds of 100/200 to enforce strict alignment of the geometric structure. The soft constraint scoring system weighted layout and stitch technique at 0.6:0.4 to balance structural compliance against the expression of fine craftsmanship, with rule compliance and evaluation stability as the main metrics. Together, the three stages (base model fine-tuning, dual-channel LoRA regulation, and shape grammar control) form a complete controllable generation framework.

Initially, at the base model level, fine-tuning was conducted on Stable Diffusion XL (SDXL) utilizing a dataset of Qing Dynasty official rank badge embroidery images. The iterative training steps were configured to 50, with the Classifier-Free Guidance (CFG) scale established at 7.5. Furthermore, the random seed was kept constant to guarantee the replicability of the experimental outcomes.
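For reference, classifier-free guidance combines an unconditional and a text-conditioned noise prediction at each denoising step, and the CFG scale controls how far the prediction is pushed toward the prompt. The sketch shows this standard combination rule; the arrays stand in for model outputs, and the function name is hypothetical. In the Hugging Face diffusers SDXL pipeline, the same scale is exposed as the `guidance_scale` argument.

```python
import numpy as np

def classifier_free_guidance(eps_uncond, eps_cond, scale=7.5):
    """Standard CFG: move from the unconditional prediction toward the conditional one."""
    return eps_uncond + scale * (eps_cond - eps_uncond)
```

With `scale=1.0` the result equals the conditional prediction; values above 1, such as the 7.5 used here, extrapolate beyond it in the prompt direction.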

Subsequently, a Low-Rank Adaptation (LoRA) fine-tuning mechanism was incorporated, with a dual-channel LoRA adapter designed to represent pattern semantics and craft textures jointly. The semantic channel (rank 16) captures the cultural semantic attributes of patterns, while the texture channel (rank 8) simulates silk luster and stitch density. The other settings were dropout = 0.1 and α = 32, with channel separation achieved through the 16/8 rank ratio. Mutual information analysis showed very low coupling between the semantic and texture channels (mutual information < 0.2), and t-SNE visualization corroborated this decoupling. The study also developed a prompt-driven dynamic weight allocation mechanism. For example, given the prompt ‘一品武官’ (First-Rank Military Official), the system automatically set the fusion proportion of the semantic and texture channels to 60%:40%. The semantic channel is activated preferentially when the CLIP semantic similarity exceeds 0.6. The rationale for this allocation was supported by expert review, and the diversity of the generated outputs was assessed by measuring latent space coverage.
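The dual-channel adapter can be pictured as two LoRA branches attached to the same frozen projection. The sketch below is hypothetical: the text gives the ranks (16 and 8), α = 32, dropout 0.1, and a 60%:40% fusion example, but not how the branches are combined, so it assumes the weighted branch outputs are added to the frozen layer’s output and uses the common LoRA scaling α/r. Dropout is omitted because it only applies during training.

```python
import numpy as np

class DualChannelLoRALinear:
    """Hypothetical frozen linear layer with a semantic and a texture LoRA branch."""

    def __init__(self, weight, r_sem=16, r_tex=8, alpha=32.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight  # frozen base weight W, shape (d_out, d_in)
        self.alpha = alpha
        self.r_sem, self.r_tex = r_sem, r_tex
        # Usual LoRA initialisation: A small random, B zero, so each branch starts as a no-op.
        self.A_sem = rng.normal(0.0, 0.01, size=(r_sem, d_in))
        self.B_sem = np.zeros((d_out, r_sem))
        self.A_tex = rng.normal(0.0, 0.01, size=(r_tex, d_in))
        self.B_tex = np.zeros((d_out, r_tex))

    def forward(self, x, w_sem=0.6, w_tex=0.4):
        """x: (batch, d_in). Each branch is scaled by alpha / r, then the two are fused."""
        base = x @ self.weight.T
        sem = (x @ self.A_sem.T) @ self.B_sem.T * (self.alpha / self.r_sem)
        tex = (x @ self.A_tex.T) @ self.B_tex.T * (self.alpha / self.r_tex)
        return base + w_sem * sem + w_tex * tex
```

With α = 32, the α/r scaling gives the rank-8 texture branch a factor of 4 and the rank-16 semantic branch a factor of 2, so in this sketch a lower rank does not by itself mean a smaller contribution; the fusion weights decide the final balance.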

For structural control, the study further incorporated a shape grammar constraint mechanism combining hard and soft constraints, so that the generated images follow the prescribed compositional structure and stitch patterns in a controllable way. The hard constraint used a ControlNet-Canny model to perform ‘Nine-Grid Layout Skeleton Injection’, with an injection weight of 1.0 and edge detection thresholds of 100/200; its key evaluation indicators were the composition compliance rate and the boundary alignment error. The soft constraint was defined jointly by a custom ‘Stitch Direction Conformity Evaluation Function’ and a ‘Layout Rule Quantification Scoring System’, with scoring weights of 0.6 for layout and 0.4 for stitch, and with rule adherence and evaluation robustness as the principal measures.
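The nine-grid skeleton and the 0.6/0.4 soft-constraint weighting can be illustrated as follows. This sketch assumes the skeleton is a binary line drawing of a 3 × 3 grid supplied to ControlNet as an edge-style conditioning image, and that layout and stitch scores are already normalized to [0, 1]; the drawing routine and function names are hypothetical. When an edge map is instead extracted from a template image, the 100/200 thresholds correspond to the two hysteresis thresholds of OpenCV’s `cv2.Canny(image, 100, 200)`.

```python
import numpy as np

def nine_grid_skeleton(height, width, thickness=2):
    """Binary 3x3 layout mask (255 on grid lines, 0 elsewhere)."""
    mask = np.zeros((height, width), dtype=np.uint8)
    for i in range(4):  # 4 horizontal and 4 vertical lines bound a 3x3 grid
        r = min(round(i * (height - 1) / 3), height - thickness)
        c = min(round(i * (width - 1) / 3), width - thickness)
        mask[r:r + thickness, :] = 255
        mask[:, c:c + thickness] = 255
    return mask

def soft_constraint_score(layout_score, stitch_score, w_layout=0.6, w_stitch=0.4):
    """Weighted soft-constraint score; inputs assumed to lie in [0, 1]."""
    return w_layout * layout_score + w_stitch * stitch_score
```

Because the weights sum to 1, the combined score stays in [0, 1]; a layout score near zero caps the total at 0.4 even with perfect stitch conformity.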

Lastly, a ‘multi-strategy comparative generation evaluation’ assessed three strategies for embroidery image synthesis: an unconstrained diffusion model, a hard-constraint model using only ControlNet, and a hybrid semantic and structural control approach. Cultural conformity was assessed with a CLIP-ViT-B/32 model, using a semantic similarity of at least 0.75 as the acceptance criterion, with Top-1 classification accuracy measuring symbol recognition. The hybrid control strategy performed best in both semantic consistency and symbol recognizability. The craft feasibility module predicted whether an image could be embroidered using a logistic regression model (implemented with Scikit-learn), with targets of an F1-score ≥ 0.85 and a false positive rate below 10%. The key features were line spacing density (from Hough line detection), curve complexity (quantified by Fourier descriptors), and color gradient smoothness (measured by Laplacian variance), and the model was further examined through feature importance analysis, ROC curves, and predicted-probability distributions. A composite score was then computed for 100 samples, weighting the cultural and craft dimensions at 40% each and efficiency at 20%. This assessment indicated that the hybrid control strategy achieved the best balance among cultural representation, craft feasibility, and generation efficiency.
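The craft-feasibility feature, targets, and weighting described above can be sketched as below. Assumptions: color gradient smoothness is taken as the variance of a 4-neighbour Laplacian on a grayscale image, F1 and the false positive rate are computed from binary confusion counts, and the composite score applies the stated 40/40/20 weighting to dimension scores already on a common scale. The Hough and Fourier-descriptor features are omitted, and all names are hypothetical.

```python
import numpy as np

def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian; low values indicate smooth colour gradients."""
    g = gray.astype(float)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

def f1_and_fpr(tp, fp, tn, fn):
    """F1-score and false positive rate from binary confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return f1, fpr

def composite_score(cultural, craft, efficiency):
    """40/40/20 weighting of the three evaluation dimensions."""
    return 0.4 * cultural + 0.4 * craft + 0.2 * efficiency
```

A model meets the stated craft targets when `f1 >= 0.85 and fpr < 0.10`; the two checks are independent, since a high F1 does not by itself bound the false positive rate.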

As shown in Fig. 4, the study developed a complete flowchart of the embroidery image generation process. The workflow starts from an SDXL base model and official rank badge training images and achieves domain adaptation through LoRA fine-tuning. Within the model architecture, a dual-channel adapter splits feature learning into two sub-paths: the left semantic adaptation channel, linked to the embroidery pattern semantic recognition module, and the right texture adaptation channel, linked to the gold thread texture and craft learning module. The outputs of the two channels merge in a multi-modal fusion layer and then feed into the ControlNet shape grammar control module, which also takes edge or depth maps as auxiliary inputs.

**Fig. 4: Model Optimization Framework.**

Finally, the model synthesizes the target image under the guidance of the stitch direction control module, and the quality evaluation and embroidery rule matching verification module closes the loop with a validation step. The framework thus forms a complete control pipeline, from base model optimization to the generation of high-quality embroidery.

Results

Cultural gene extraction and baseline performance evaluation

For cultural image generation tasks, the baseline model’s performance must be established first. Figure 5 shows the performance of the unmodified SDXL baseline before any control strategy is added. As illustrated in Fig. 5a, its FID score was 340.90; because a lower FID indicates closer agreement with the distribution of real images, this very high value points to poor semantic alignment and image quality. Figure 5b shows near-zero texture energy and high contrast but little fine hierarchical detail. Together, these results suggest that the baseline model cannot effectively capture and express the key ‘pattern structure’ and ‘craft texture’ cultural genes of Qing Dynasty embroidery images. This evaluation therefore provides a lower-bound reference for cultural gene extraction and motivates the subsequent introduction of control strategies and semantic guidance mechanisms.

**Fig. 5: FID Score and Texture Feature Heatmap of the SDXL Baseline Model.**

With the baseline shown to be inadequate, the next step is to improve the model’s cultural gene extraction. Figure 6 compares ‘crane rank badge’ patterns generated at different Classifier-Free Guidance (CFG) values. Lower CFG values (e.g., 5.0) produced semantically diffuse, unfocused images; higher values (10.0) led to structural rigidity and excessive repetition; an intermediate value (7.5) gave the best balance among semantic guidance, visual naturalness, and detail diversity. For traditional embroidery-style pattern generation, the strength of textual guidance is therefore a key factor in how completely and precisely cultural genes are extracted, and careful parameter tuning is needed for the generated content to show both cultural distinctiveness and artistic freedom.

**Fig. 6: Generation of First-Rank Civil Official Crane Embroidery Image.**

In summary, cultural gene extraction, a central objective of the generation task, requires both identifying the model’s expressive limitations from its baseline performance (e.g., structural degradation and missing texture) and strengthening its cultural expressiveness through interventions such as tuning guidance parameters. The preceding experiments indicate that extracting and re-expressing the multifaceted cultural genes of traditional embroidery imagery, namely ‘form, motif, and technique’, requires the combined use of structural control, semantic prompting, and parameter optimization.

Semantic network construction and LoRA fine-tuning

Within the LoRA fine-tuning and control module, this study developed and validated a semantic network construction approach centered on a dual-channel LoRA adapter and an automated weight allocation mechanism. The architecture is designed to decouple and separately model the ‘semantic features’ and ‘texture features’ of embroidery images, improving the precision and controllability of multi-modal feature fusion. The module is integrated into the SDXL+LoRA backbone, enabling end-to-end extraction and dynamic modulation of semantic information.

This study built a LoRA visualization module, including a t-SNE feature-decoupling visualization, to analyze how well the LoRA model separates features across the semantic and textural channels. The module uses the ‘collect features from prompts’ function to extract semantic (64-dimensional) and textural (32-dimensional) features from 25 prompts (e.g., ‘traditional Chinese crane embroidery, gold threads, imperial court rank badge’). The concatenated features (two 512-dimensional vectors per prompt) are then reduced to 2D and 3D with t-SNE (falling back to PCA when there are fewer than 4 samples) for visualization. As shown in Fig. 7, the 2D projection (Fig. 7a) forms clear, separate clusters for different semantic categories, and the 3D embedding (Fig. 7b) further confirms that the features are well separated. Combined with the mutual information result (< 0.2), this qualitative and quantitative evidence indicates low redundancy between the semantic and textural channels, i.e., effective feature decoupling, which meets the objective of the dual-channel design.
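The text reports a mutual information value below 0.2 but does not describe the estimator. One common option, shown here as an assumption, is a plug-in estimate from a 2-D histogram of two scalar features (for example, one projection of the semantic features and one of the textural features). The result is in nats, and the function name is hypothetical.

```python
import numpy as np

def mutual_information(a, b, bins=8):
    """Plug-in mutual information (nats) between two 1-D features via a 2-D histogram."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of b, shape (1, bins)
    nz = pxy > 0  # empty cells contribute nothing
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))
```

Histogram estimates are biased upward for small samples (roughly by (bins − 1)² / 2N), so with only 25 prompts a coarse binning or a permutation baseline would be needed before reading much into a value near 0.2.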

**Fig. 7: t-SNE Visualization of LoRA Dual-Channel Feature Decoupling.**

The model implements two priority strategies for fusing and controlling the feature channels. In the semantic-priority mode, the semantic channel dominates with a weight of 0.7 and the textural channel supplements it with a weight of 0.3. In the texture-priority mode, the textural channel dominates with a weight of 0.6 and the semantic channel supplements it with a weight of 0.4. All weight configurations were set empirically and hard-coded in the implementation under a normalization constraint, so the two channel weights always sum to 1.0. Figure 8 examines the mechanism and stability of this weight allocation strategy. The scatter plot in Fig. 8a shows all sample weights lying exactly on the line ‘semantic weight + texture weight = 1’, with most samples concentrated near a semantic weight of 0.70 and a texture weight of 0.30, indicating a consistent, semantic-biased fusion approach. Figure 8b shows highly uniform weight distributions across different prompt conditions, suggesting that the strategy is built into the model architecture rather than depending on the semantic category of the input. Figure 8c, d show very small standard deviations in the weights and a limited range of CLIP similarity (≈0.68–0.91), supporting the robustness of the fusion mechanism and the coherence of the generated outputs.
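The hard-coded priority modes and their normalization constraint can be expressed as a small lookup table. The mode names and helper functions below are hypothetical; only the weight values come from the text.

```python
# Fusion weights per priority mode, as reported in the text.
PRIORITY_MODES = {
    "semantic": {"semantic": 0.7, "texture": 0.3},
    "texture": {"semantic": 0.4, "texture": 0.6},
}

def channel_weights(mode):
    """Return (semantic, texture) weights, enforcing that they sum to 1."""
    w = PRIORITY_MODES[mode]
    total = w["semantic"] + w["texture"]
    if abs(total - 1.0) > 1e-9:  # tolerance for floating-point rounding
        raise ValueError(f"weights for mode {mode!r} must sum to 1, got {total}")
    return w["semantic"], w["texture"]

def fuse(sem_feat, tex_feat, mode="semantic"):
    """Element-wise convex combination of semantic and texture feature vectors."""
    ws, wt = channel_weights(mode)
    return [ws * s + wt * t for s, t in zip(sem_feat, tex_feat)]
```

Because every mode is a convex combination, fused features always lie between the two channel outputs, which is what keeps the samples on the ‘semantic weight + texture weight = 1’ line in Fig. 8a.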

As presented in Table 1, the average semantic weight measured 0.703, with the texture weight at 0.297, and a mean CLIP similarity of 0.84. These collective findings confirm the model’s robust expressive capacity operating under a semantic-driven mechanism, while simultaneously demonstrating that the generated outputs consistently exhibit a high degree of coherence in terms of cultural semantic representation.

Table 1 LoRA Dual-Channel Adapter Performance Metrics


Additionally, Fig. 9 illustrates the overall improvement in generative performance from LoRA fine-tuning. Compared with the model without fine-tuning, LoRA markedly improves the expression of microscopic embroidery textures, such as silk luster, stitch orientation, and density variation, increasing the realism and artistic quality of the images. This result supports the effectiveness of LoRA fine-tuning for specializing general-purpose diffusion models to particular artistic domains such as embroidery image generation.

**Fig. 9: Generation of First-Rank Civil Official Crane Embroidery Images.**

In conclusion, the LoRA fine-tuning and control module innovatively employed a dual-channel LoRA adapter alongside an automated weight distribution mechanism, leading to the successful decoupling and dynamic fusion of semantic and texture features in embroidery imagery. The independence of these features was substantiated by t-SNE visualization and mutual information analysis, whereas the model’s weight allocation strategy ensured the cultural semantic coherence and robustness of the generated outputs. Ultimately, LoRA fine-tuning notably improved the verisimilitude of microscopic embroidery textures, thereby affirming the efficacy of this methodology for applications within specialized artistic domains.

Shape grammar modeling and control strategy optimization

This study proposes a dynamic shape grammar modeling framework to address the complex, highly regulated forms of Qing Dynasty official rank badge designs. The method uses a nine-grid structure as the base layout, abstracting and spatially partitioning the archetypal elements of traditional badges (e.g., cranes, golden pheasants, and lions). It also establishes a shape library of 17 types of official rank badge animal silhouette templates to mitigate data scarcity. The nine-grid approach ensures that generated patterns retain the symmetry and centrality characteristic of traditional Chinese motifs and provides clear local constraint interfaces for the subsequent control strategies. To represent complex embroidery techniques, the dynamic modeling process incorporates the craft characteristics of specific stitches, such as the spiral formation of couching stitch, the horizontal linearity of flat gold stitch, and the radial configuration of raised stitch. For each stitch type, a dedicated reward function steers the refinement of generated details toward authentic craft esthetics. Quantifiable metrics include the helical coefficient for couching stitch, the linear orientation distribution ratio for flat gold stitch, and the radial convergence for raised stitch.
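The text names the stitch metrics but not their formulas. The following hypothetical operationalizations show one way they could be computed from stitch geometry, with stitches given as ((x0, y0), (x1, y1)) segments and a couching thread as a polyline: the share of near-horizontal segments for flat gold stitch, mean alignment with the radial direction from the motif centre for raised stitch, and net turning (in full turns) for the couching spiral. Tolerances and names are assumptions.

```python
import math

def _direction(p, q):
    """Direction of the vector p -> q in radians."""
    return math.atan2(q[1] - p[1], q[0] - p[0])

def horizontal_ratio(segments, tol_deg=15.0):
    """Flat gold stitch: share of segments within tol_deg of horizontal (either direction)."""
    hits = 0
    for p, q in segments:
        a = math.degrees(_direction(p, q)) % 180.0
        if min(a, 180.0 - a) <= tol_deg:
            hits += 1
    return hits / len(segments)

def radial_convergence(segments, center):
    """Raised stitch: mean |cos| between each segment and the ray from center to its midpoint."""
    total = 0.0
    for p, q in segments:
        rx = (p[0] + q[0]) / 2 - center[0]
        ry = (p[1] + q[1]) / 2 - center[1]
        dx, dy = q[0] - p[0], q[1] - p[1]
        norm = math.hypot(rx, ry) * math.hypot(dx, dy)
        total += abs(rx * dx + ry * dy) / norm if norm else 0.0
    return total / len(segments)

def spiral_turns(polyline):
    """Couching stitch: net signed turning of a polyline, in full turns."""
    turning = 0.0
    for a, b, c in zip(polyline, polyline[1:], polyline[2:]):
        d = _direction(b, c) - _direction(a, b)
        d = (d + math.pi) % (2 * math.pi) - math.pi  # wrap each turn to [-pi, pi)
        turning += d
    return turning / (2 * math.pi)
```

Each metric returns a bounded or interpretable number (a ratio, a mean cosine, a turn count), which makes it straightforward to turn into a reward term by comparing against a target value for the stitch type.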

To improve the generative system’s overall performance in structural control and semantic expression, this study designed and compared two control strategies: hard constraint control and soft constraint control. Hard constraint control uses tunable deep models such as ControlNet to inject shape templates and geometric priors into the generation pipeline, ensuring strict alignment of image structure. Although precise in tasks that demand strict geometric composition, it tends to produce abstract and incomplete patterns when generating complex semantic motifs. Soft constraint control, by contrast, relies on a rule-driven semantic scoring mechanism that establishes a fuzzy evaluation framework covering layout conformity, stitch style coherence, and semantic correspondence. During training and generation, the soft constraint strategy iteratively improves the model’s semantic consistency through continuous feedback, enhancing the interpretability and cultural integrity of the generated images. As illustrated in Fig. 10, the two strategies diverge clearly in official rank badge generation: the hard constraint strategy produces regular but relatively abstract designs, whereas the soft constraint strategy generates concrete, vivid embroidered animal motifs with better semantic restoration and stronger cultural and esthetic appeal.

**Fig. 10: Comparison of Hard and Soft Constraints in Official Rank Badge Pattern Generation.**

Figure 11 quantitatively compares the semantic compliance of the hard and soft constraint strategies. The hard constraint method’s scores are concentrated in a low range of 0.03–0.07, indicating limited semantic expression overall (Fig. 11a). The soft constraint method outperforms it in median, maximum, and score range (Fig. 11b), showing stronger semantic adaptability. A comparison of three typical samples (Fig. 11c) confirms that soft constraint semantic scores exceed those of hard constraints in every case. Overall statistics (Fig. 11d) give the soft constraint strategy a mean compliance score of 0.25, far higher than the hard constraint’s ≈0.06, highlighting its advantage in semantic consistency control.

**Fig. 11: Constraint Methods Comparison.**

Figures 12 and 13 analyze the control strategies’ performance in more detail. Figure 12 focuses on the effect of the hard constraint method on structural consistency. Figure 12a shows its scores concentrated in a low range of 0.02–0.07 with extremely low dispersion and only three distinct values, indicating that although the model achieves precise structural control, it lacks flexibility in semantic generation. In Fig. 12b, the scores for all three types of official rank badge patterns are low; the “Second-Rank Golden Pheasant pattern” scores slightly higher (0.069), but overall performance remains insufficient. The hard constraint strategy therefore appears unsuitable for tasks in which semantic restoration is the core objective.

In contrast, Fig. 13 demonstrates the advantages and distinctiveness of the soft constraint strategy across multiple compliance dimensions. Figure 13a shows that the layout compliance score is extremely low ( ≈ 0.01), indicating a clear weakness in geometric structure control. Figure 13b reveals a wide distribution of stitch scores, reaching up to 0.8, reflecting the soft constraint mechanism’s good guidance capability for craft texture features. Figure 13c indicates that the overall compliance score is in the mid-to-upper range, primarily boosted by the stitch dimension. The 2D scatter plot in Fig. 13d reveals no significant correlation between stitch and layout scores, suggesting that both can be independently controlled. Figure 13e further points out the score differences among various stitch types, with couching stitch being the highest (mean ≈ 0.33), followed by flat gold stitch and raised stitch, which suggests that the model performs better when handling structurally complex stitch types. The radar chart in Fig. 13f comprehensively demonstrates that stitch dimension scores are significantly higher than other dimensions, further verifying the significant advantage of the soft constraint strategy in generating local details.

From Table 2, it is evident that different control types exhibit distinct differences in dimensions, such as semantic expression, structural control, and embroidery compliance. The hard constraint strategy shows concentrated scores in geometric matching, demonstrating strong structural control capability, making it suitable for image generation tasks requiring high structural precision. In contrast, the soft constraint excels in CLIP semantic matching and stitch diversity, particularly suitable for generating embroidery patterns with rich cultural semantics and visual details. Therefore, it can be inferred that the choice between soft and hard control strategies in practical applications should be flexible and based on specific task requirements.

Table 2 Statistical Table of Cultural Compliance and Semantic Evaluation of Shape Grammar Generation under Different Control Strategies.


This study proposes a theory-guided, automated generation scheme for Qing Dynasty official rank badge embroidery patterns that combines shape grammar modeling with LoRA fine-tuning, using LoRA parameters rank = 24 and alpha = 12. Figure 14 presents representative Qing Dynasty rank badge patterns generated by the model, covering official animals such as the red-crowned crane, peacock, golden pheasant, qilin, and lion, which correspond to civil and military ranks from first-rank civil officials to fourth-rank military officers. The results show that the model reproduces not only the structural proportions and characteristic postures of the animals in the overall composition but also key visual elements within the badge, including patterned frames, landscape backgrounds, and decorative borders. Close-up views reveal rich detail, reproducing the spiral layering of couching gold embroidery (panjinxiu), the linear arrangement of flat gold embroidery (pingjinxiu), and the radiating texture of padded stitches (diangaozhen), and showing the technical characteristics of these traditional stitching techniques. Overall, the generated patterns achieve a high level of structural integrity, esthetic quality, and craft-level detail, supporting the model’s ability to generate culturally grounded images and revive historical visual languages based on symbolic motifs.

**Fig. 14: Generation of Qing Dynasty Official Rank Badge Style Embroidery Images.**

Experiments demonstrate that the Shape Grammar not only improves the structural correctness and fidelity of cultural symbols in the generated patterns but also significantly enhances the esthetic consistency of the results. Compared to traditional text-prompt-only generation methods, the grammar-constrained generated patterns exhibit superior adherence to traditional embroidery craft specifications in terms of structure and style. This method offers an effective pathway for the digital representation and innovative design of traditional crafts, possessing broad application prospects. Overall, the integration of dynamic shape grammar with both hard and soft control strategies significantly enhances the semantic completeness and esthetic quality of the generated patterns while ensuring structural compliance. The soft constraint excels in restoring local textures and conveying cultural semantics, making it well-suited for tasks with artistic and expressive goals. In contrast, the hard constraint is more appropriate for scenarios requiring high geometric fidelity, though its semantic flexibility is limited. Extensive experiments demonstrate that this approach holds strong applicability and cutting-edge potential in the automatic generation of cultural patterns and the digital reproduction of traditional craftsmanship.

Evaluation metrics

In the “Cultural Compliance” evaluation module, the hybrid control method demonstrated the best performance in maintaining cultural semantics, effectively enhancing the cultural compliance and symbolic distinguishability of the generated images. Figure 15 systematically evaluates the semantic performance and recognition capabilities of the model under both hard and soft control strategies. Figure 15a shows that soft constraints significantly outperform hard constraints in CLIP semantic similarity scores (approximately 0.35 vs. 0.23), with a more concentrated distribution, though neither reached the target threshold of 0.75. Figure 15b indicates a weak negative correlation between CLIP scores and simulated human evaluation (r = –0.306), suggesting that CLIP similarity might not effectively reflect human perceived quality. Figure 15c, a confusion matrix for symbol recognition, reveals that the model was almost unable to correctly classify any specific category (e.g., crane, golden pheasant, leopard), with a large number of predictions incorrectly assigned to “other,” resulting in extremely low accuracy. Figure 15d comprehensively compares the performance of both methods in terms of CLIP scores and recognition accuracy, further confirming the improvement in semantic consistency by soft constraints, but simultaneously revealing a severe deficiency in overall recognition performance. Overall, despite the advantage of soft constraints in improving image-text similarity, the current model still faces significant bottlenecks in cultural symbol recognition and matching human perceived quality, indicating that CLIP similarity optimization has not fully translated into an improvement in real semantic understanding and symbolic distinguishability.
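For reference, CLIP image-text similarity is the cosine of the image and text embedding vectors, Top-1 accuracy is the share of samples whose highest-scoring predicted class matches the label, and r is Pearson’s correlation coefficient. The sketch assumes precomputed embeddings (e.g., from CLIP-ViT-B/32) and predicted labels; the acceptance check would then be `cosine(image_emb, text_emb) >= 0.75`. Function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def top1_accuracy(y_true, y_pred):
    """Share of samples whose top predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

A classifier that assigns most samples to “other”, as in Fig. 15c, scores only on the samples whose true label is also “other”, which is why Top-1 accuracy collapses even when CLIP similarity improves.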

**Fig. 15: Cultural Compliance Analysis.**

In this module, Color Smoothness emerges as the only decisive factor for craft feasibility within the current feature space, meaning that the model’s near-ideal performance depends heavily on a single feature. Figure 16 presents the performance and decision basis of the machine learning model built for “Craft Feasibility” prediction. The model shows essentially perfect classification (AUC = 1.0), completely separating “feasible” from “infeasible” samples, and its decisions rest on one key feature: Color Smoothness. Figure 16a shows that Color Smoothness far outweighs the other variables in feature importance, so the model’s judgment depends almost entirely on this one feature. The ROC curve in Fig. 16b follows the ideal boundary, with an AUC of 1.0. Figure 16c is empty, but the accompanying textual output shows that the model assigns every sample a probability of either 0 or 1, i.e., highly decisive outputs with no intermediate uncertainty zone. Figure 16d shows that features such as line density and curve complexity do not separate the feasibility classes, further confirming that the model’s discriminative ability comes mainly from Color Smoothness, which is not shown in that panel.
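Why a single feature can yield AUC = 1.0 is easiest to see from the rank interpretation of AUC: it equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. If every feasible sample has a higher color smoothness value than every infeasible one, ranking by that feature alone gives AUC = 1.0 regardless of the other features. The sketch below computes AUC this way; the function name is hypothetical, and the scores used with it are illustrative, not data from the study.

```python
def roc_auc(scores, labels):
    """AUC as P(random positive outranks random negative); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])` is 1.0: perfect separation on one feature is enough, which is also why a perfect AUC on a small sample warrants a check for leakage or an overly easy split.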

**Fig. 16: Craft Feasibility Analysis.**

In the multi-strategy comparative generation evaluation, the complete hybrid control strategy achieved the best balance among cultural expression, craft logic, and efficiency, validating the advantages of this method in integrating semantic and formal control. Figure 17 systematically compares the performance of three image generation methods—unconstrained generation, ControlNet hard constraints, and complete hybrid control—across three major dimensions: cultural compliance, craft feasibility, and generation efficiency. Figure 17a shows that although the unconstrained method is optimal in generation efficiency, its cultural compliance score is the lowest. Introducing control strategies (either ControlNet or hybrid control) significantly improved semantic accuracy and craft expressive capabilities, albeit at the cost of a noticeable reduction in generation efficiency. Figure 17b indicates that the comprehensive scores of the three methods are similar (approximately 2.8–3.0 points out of 10), suggesting that there is still room for improvement in overall performance. The efficiency box plot (Fig. 17c) confirms this trend: ControlNet has the longest generation time, unconstrained is the fastest, and hybrid control falls in between. Figure 17d, a grouped bar chart, further reveals the detailed scores for each dimension; although the scoring system differs from the radar chart, the trends are generally consistent. All methods used the same number of samples (Fig. 17e), ensuring a fair comparison. The flowchart illustrates the implementation logic and evaluation methods for the three categories, covering CLIP, logistic regression, and time statistics metrics. 
Overall, these results indicate that cultural control strategies enhance the semantic and visual consistency of generated content but substantially reduce efficiency. None of the three methods yet achieves an optimal balance across the multiple objectives, so more efficient structural control mechanisms and feature fusion methods urgently need to be explored.
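The efficiency comparison rests on per-image timing statistics summarized as box plots. Below is a minimal sketch of how such statistics could be gathered and summarized, assuming wall-clock timing with `time.perf_counter`. The helpers `time_calls` and `box_stats` and all timing values are hypothetical. The values only mimic the reported ordering: ControlNet slowest, unconstrained fastest, hybrid in between.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_calls(fn: Callable[[], object], n: int) -> List[float]:
    """Wall-clock duration in seconds of n successive calls to fn."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        fn()  # in practice: one image-generation call (hypothetical)
        durations.append(time.perf_counter() - start)
    return durations


def box_stats(durations: List[float]) -> Dict[str, float]:
    """Five-number summary (min, Q1, median, Q3, max) behind a box plot."""
    q1, median, q3 = statistics.quantiles(durations, n=4)
    return {
        "min": min(durations),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(durations),
    }


# Hypothetical per-image timings in seconds, not the paper's measurements.
timings = {
    "unconstrained": [1.9, 2.1, 2.0, 2.2, 1.8],
    "controlnet": [4.1, 4.4, 4.0, 4.6, 4.3],
    "hybrid": [3.0, 3.2, 2.9, 3.4, 3.1],
}

for method, ts in timings.items():
    print(method, box_stats(ts))

# Timing a stand-in generator that does nothing.
print(len(time_calls(lambda: None, 3)))
```

`statistics.quantiles` uses its default "exclusive" interpolation here. A plotting library may use a slightly different quartile rule, so whisker and box edges can differ marginally between tools.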

**Fig. 17: Model Comparison Comprehensive.**

Based on the evaluation results, all key performance dimensions in this study met or exceeded the predefined targets. As shown in Table 3, the model performs strongly in cultural compliance, with average CLIP similarity scores consistently between 0.78 and 0.82. This indicates a high degree of alignment between the generated content and its cultural context. The Top-1 accuracy for symbol recognition reaches 82–85%, confirming the model’s capability to accurately capture traditional cultural symbols.
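Two assumptions make both metrics easy to compute. The first is that the reported CLIP similarity is the plain cosine similarity between an image embedding and a text embedding; the paper does not state any rescaling. The second is that Top-1 accuracy counts how often the highest-scoring symbol class is the correct one. Under these assumptions, each metric takes only a few lines. The embeddings and class scores below are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top1_accuracy(score_rows: List[List[float]], true_labels: List[int]) -> float:
    """Fraction of samples whose highest-scoring class index equals the true class."""
    hits = 0
    for scores, label in zip(score_rows, true_labels):
        predicted = max(range(len(scores)), key=scores.__getitem__)
        hits += predicted == label
    return hits / len(true_labels)


# Hypothetical 4-D embeddings; real CLIP embeddings have hundreds of dimensions.
image_emb = [0.2, 0.8, 0.1, 0.5]
text_emb = [0.3, 0.7, 0.0, 0.6]
print(round(cosine_similarity(image_emb, text_emb), 3))

# Hypothetical symbol-class scores for three badges over classes 0..2.
scores = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.5, 0.4, 0.1]]
labels = [0, 2, 1]
print(top1_accuracy(scores, labels))  # 2 of 3 correct
```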

Table 3 Performance Evaluation of Pattern Generation Across Cultural, Semantic, and Technical Dimensions.


The F1-score for craft prediction falls between 0.87 and 0.89, reflecting the model’s robustness in identifying embroidery techniques. The false positive rate remains low at 6–8%, supporting the reliability of the generated results. The overall evaluation score reaches 7.2–7.8 out of 10, highlighting the model’s strong overall performance. In terms of generation efficiency, the model achieves an average generation time of 2.8–3.5 s per image, well below the 5 s target, which meets the demands of real-world applications. Collectively, these results validate the effectiveness and efficiency of the proposed framework in generating embroidery patterns with both cultural depth and artistic value.
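For the binary craft-feasibility task, the F1-score and the false positive rate both follow from the confusion counts. F1 is the harmonic mean of precision and recall. FPR is the share of infeasible samples wrongly predicted as feasible. The sketch below treats 1 as “feasible” and assumes labels are strictly 0/1. The labels are hypothetical, chosen so that the arithmetic is easy to follow.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for 0/1 labels, where 1 means 'feasible'."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_and_fpr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """F1-score (harmonic mean of precision and recall) and false positive rate."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return f1, fpr


# Hypothetical labels: 10 feasible and 10 infeasible samples.
y_true = [1] * 10 + [0] * 10
# One feasible sample missed, one infeasible sample wrongly accepted.
y_pred = [1] * 9 + [0] + [1] + [0] * 9

f1, fpr = f1_and_fpr(y_true, y_pred)
print(f"F1 = {f1:.2f}, FPR = {fpr:.2f}")  # F1 = 0.90, FPR = 0.10
```

In this example, 9 of 10 feasible and 9 of 10 infeasible samples are classified correctly. Precision and recall are therefore both 0.9, giving an F1 of 0.9 and an FPR of 0.1.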

Discussion

This study constructed, for the first time, an “embroidery semantic network” for Qing Dynasty official rank badges. Through systematic organization and semantic modeling, it achieved a structured representation of the patterns’ hierarchical system, animal totems, decorative motifs, and color systems. This provides foundational support for the digital extraction and computable expression of cultural genes. Experimental results show that the network achieved a Top-1 symbol recognition accuracy of 82–85%, with good semantic separability and consistency among different types of official rank badges. Furthermore, a cultural compliance analysis using the CLIP semantic model showed that the average similarity of the generated patterns remained between 0.78 and 0.82. This supports the effectiveness of the semantic network in ensuring cultural and logical rationality.
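One common way to make such a network computable is to store it as subject–predicate–object triples and index them for lookup. The sketch below is an assumption about representation, not the authors’ schema. The relation names (`has_emblem`, `is_a`, `belongs_to`) and node identifiers are hypothetical. Only the first three civil ranks and their bird emblems (crane, golden pheasant, peacock) are included. A simple compliance check then asks whether a generated badge’s emblem matches the one the network assigns to its rank.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Index = Dict[Tuple[str, str], Set[str]]

# Illustrative triples; relation and node names are hypothetical.
TRIPLES: List[Triple] = [
    ("civil_rank_1", "has_emblem", "crane"),
    ("civil_rank_2", "has_emblem", "golden_pheasant"),
    ("civil_rank_3", "has_emblem", "peacock"),
    ("civil_rank_1", "belongs_to", "civil_official"),
    ("civil_rank_2", "belongs_to", "civil_official"),
    ("civil_rank_3", "belongs_to", "civil_official"),
    ("crane", "is_a", "bird"),
    ("golden_pheasant", "is_a", "bird"),
    ("peacock", "is_a", "bird"),
]


def build_index(triples: List[Triple]) -> Index:
    """Map each (subject, predicate) pair to the set of its objects."""
    index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].add(obj)
    return dict(index)


def objects(index: Index, subject: str, predicate: str) -> Set[str]:
    """Objects linked to subject by predicate (empty set if none)."""
    return index.get((subject, predicate), set())


def emblem_matches_rank(index: Index, rank: str, emblem: str) -> bool:
    """Compliance check: does the emblem match the one assigned to the rank?"""
    return emblem in objects(index, rank, "has_emblem")


index = build_index(TRIPLES)
print(emblem_matches_rank(index, "civil_rank_1", "crane"))    # True
print(emblem_matches_rank(index, "civil_rank_1", "peacock"))  # False
```

A full network would add decorative motifs, color systems, and the military ranks as further nodes and relations. The lookup and check stay the same.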

The introduction of dynamic shape grammar enhanced control over pattern structure and morphology during image generation. Under the hard constraint strategy, structural accuracy was significantly higher than in the control group, with a geometric matching score (IoU) reaching 0.07, demonstrating strong compositional consistency. In contrast, the soft constraint strategy offered greater flexibility in detail expression and pattern variation, generating diverse samples that comply with traditional embroidery esthetic principles. A comparative analysis of control strategies further revealed that hard constraints suit scenarios demanding high structural fidelity, whereas soft constraints are better suited to expressing cultural semantics and embroidery craft features. The hybrid control strategy achieved a relative balance among cultural expression, craft logic, and efficiency, providing a systematic solution for complex multi-objective pattern generation tasks. The framework provides a formal language that combines normativity with generativity, laying a methodological foundation for the subsequent modeling of multi-style and multi-category embroidery patterns.
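IoU compares two binary masks as |A ∩ B| / |A ∪ B|. It ranges from 0 (no overlap) to 1 (identical masks). The minimal sketch below uses hypothetical 4×4 masks. It assumes the structural template and the generated pattern have both been binarized onto the same pixel grid, a preprocessing step not described here.

```python
from typing import List

Mask = List[List[int]]


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two same-shaped binary masks (1 = foreground)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("masks must have the same shape")
    intersection = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            if pa and pb:
                intersection += 1
            if pa or pb:
                union += 1
    # Convention assumed here: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Hypothetical binarized structure template and generated pattern.
template = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
generated = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
print(round(mask_iou(template, generated), 3))  # 0.714 (5 / 7)
```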

With the LoRA-Diffusion-SG three-stage fine-tuning architecture, the model achieved significant improvements across multiple dimensions, including visual fidelity, semantic matching, and craft prediction. The F1-score of the craft prediction module reached 0.87–0.89, with the false positive rate held at 6–8%. This demonstrates the model’s ability to judge embroidery stitches, textures, and the logical feasibility of the craft. The overall evaluation score reached 7.2–7.8 out of 10, indicating a good balance between generation quality and cultural expression. Image generation time (2.8–3.5 s per image) also surpassed industry benchmarks, giving the method practical application value. Despite these positive results, certain bottlenecks persist. They lie particularly in the accuracy of cultural symbol recognition and in how well the metrics map to human-perceived quality. The weak negative correlation between CLIP similarity and simulated human evaluation ($r = -0.306$) indicates that higher semantic scores do not yet translate into genuinely better semantic understanding or more distinguishable cultural symbols. Classification errors for specific animal categories also show that recognition accuracy needs further optimization.
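Assuming r denotes the Pearson product-moment correlation, it can be computed as below. The CLIP scores and simulated human ratings are hypothetical placeholders, not the study’s data.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient of two paired samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical paired scores for five generated patterns.
clip_similarity = [0.78, 0.80, 0.81, 0.82, 0.79]
human_rating = [3.4, 3.1, 3.3, 2.9, 3.0]
print(round(pearson_r(clip_similarity, human_rating), 3))
```

A negative r means that patterns with higher CLIP similarity tend to receive lower human ratings. A magnitude around 0.3 indicates only a weak linear relationship.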

This framework is not only applicable to the digital reproduction and simulation generation of Qing Dynasty official rank badge embroidery patterns, but also demonstrates broad application potential in various scenarios, such as cultural and creative product development, apparel pattern design, and cultural education. Through the coupled use of structured semantic modeling and form control mechanisms, it can achieve the re-creation and re-activation of traditional patterns in a modern context, providing a feasible path for constructing intelligent design platforms with deep cultural semantic support.

Despite these achievements, this study still has certain limitations:

1. The dataset consists primarily of two-dimensional images and does not yet cover the three-dimensional craft information of physical embroidery.
2. The creativity and cross-cultural expressiveness of the generated patterns still require enhancement.
3. The scalability and adaptability of the shape grammar rules need further optimization using a wider range of traditional schemata.

Future research should focus on constructing cross-cultural embroidery knowledge graphs, modeling three-dimensional embroidery forms, and refining personalized generation control mechanisms. More efficient structural control mechanisms and multi-scale semantic feature fusion methods should also be explored to deepen the model’s construction of cultural semantics and strengthen its symbolic expressiveness. In addition, integrating subjective perception models into the evaluation system should help produce a cultural image generation system that aligns more closely with human esthetics.

Data availability

The datasets generated or used during the study are available from the corresponding author upon request for scientific research purposes.

Code availability

Some or all of the code generated or used during the study is available from the corresponding author upon request for scientific research purposes.

References

Suh, K. The analytic study and conservation of a rank badge of the Qing Dynasty (Doctoral dissertation, Fashion Institute of Technology, State University of New York). ProQuest Dissertations & Theses (State University of New York, 2004).
Wang, Y., Ramli, M. F., Song, H. & Li, X. Exploring the path of cultural sustainability for traditional costume embroidery patterns based on digital generative art. Cultura Int. J. Philos. Cult. Axiolog. 21, 271–286 (2024).
Deng, J. & Chen, X. Research on artificial intelligence interaction in computer-aided arts and crafts. Mob. Inf. Syst. 2021, 5519257 (2021).
Qian, W., Xu, D., Cao, J., Guan, Z. & Pu, Y. Aesthetic art simulation for embroidery style. Multimed. Tools Appl. 78, 995–1016 (2019).
Wei, Z. & Ko, Y. C. Segmentation and synthesis of embroidery art images based on deep learning convolutional neural networks. Int. J. Pattern Recognit. Artif. Intell. 36, 2256020 (2022).
Qian, W., Cao, J., Xu, D., Nie, R. & Guan, Z. CNN-based embroidery style rendering. Int. J. Pattern Recognit. Artif. Intell. 34, 2059045 (2020).
Liu, C., Gu, J., Yao, L. & Zhang, Y. Research on embroidery style migration model based on texture cycle GAN. Int. J. Cloth. Sci. Technol. 37, 138–153 (2025).
Stiny, G. Introduction to shape and shape grammars. Environ. Plan. B Urban Anal. City Sci. 7, 343–351 (1980).
Cui, J. & Tang, M.-X. Integrating shape grammars into a generative system for Zhuang ethnic embroidery design exploration. Comp.-Aided Des. 45, 591–604 (2013).
Hu, T. et al. Design of ethnic patterns based on shape grammar and artificial neural network. Alex. Eng. J. 60, 1601–1625 (2021).
Liang, Y., Xie, B., Tan, W. & Zhang, Q. Ontology-based construction of embroidery intangible cultural heritage knowledge graph: a case study of Qingyang sachets. PLoS ONE 20, e0317447 (2025).
Baeva, D. Creation an information model of the Bulgarian national embroidery for presentation and in knowledge bases. TEM J. 9, 1545–1550 (2020).
Ding, N., Lv, J. & Hu, L. Research on national pattern reuse design and optimization method based on improved shape grammar. Int. J. Comput. Intell. Syst. 13, 300–309 (2020).
Li, R. & Zhao, X. Study on Qiang embroidery patterns creative design based on shape grammars. Int. J. Adv. Cult. Technol. 12, 51–59 (2024).
Xiao, Y. et al. AI-Assisted Design: Intelligent Generation of Dong Paper-Cut Patterns. Electronics 14, 1804 (2025).
Ćulafić, I. et al. Output manipulation via LoRA for generative AI. In 2024 23rd International Symposium INFOTEH-JAHORINA (INFOTEH) (IEEE, 2024).
Yang, L., Zhao, W. & Cai, D. Extraction of perceptual factors of Shu embroidery patterns and innovative application in women’s shoes design. Leather Footwear J. 24, 944–957 (2024).
Clermont, D., Dorozynski, M., Wittich, D. & Rottensteiner, F. Assessing the semantic similarity of images of silk fabrics using convolutional neural networks. ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci. V-2, 641–648 (2020).
Ma, L., Wu, Y., Yuan, X. & Zhu, W. A national pattern generation method based on cultural design genetic derivation. In Computer-Aided Architectural Design: “Hello, Culture”, Commun. Comput. Inf. Sci. 1028, 413–428 (2019).
Alom, M. Z. et al. Recurrent residual convolutional neural network based on U-Net (R2U-Net) for medical image segmentation. arXiv (2018).
Zeer, M. & Odeh, M. Exploring the Intersection of Archaeology and Artificial Intelligence. Intelligence-Driven Circ. Econ. 1174, 387–395 (2025).
Zhang, Y. et al. Research on the co-occurrence feature mining of the Qing Dynasty embroidery patterns based on temporal multilayer networks. npj Herit. Sci. 13, 228 (2025).
He, L., Ning, J. & Chen, D. The image expression of Beijing embroidery art in Qing Dynasty from the perspective of art anthropology. Adv. Educ. Humanit. Soc. Sci. Res. 7, 28 (2023).
Wang, L. Clever minds and nimble hands? Making embroidery in Late Qing and Republican China. PhD thesis (Western University, 2023).
Mack, J. Bakuba embroidery patterns: a commentary on their social and political implications. Text. Hist. 11, 163–174 (1980).
Duan, L. & Ning, J. Research on cultural gene of Jing embroidery patterns of traditional clothing. Adv. Educ. Humanit. Soc. Sci. Res. 13, 228 (2022).
Kuo, C.-F. J., Hsu, C.-T. M. & Shih, C.-Y. Automatic pattern recognition and color separation of embroidery fabrics. Text. Res. J. 81, 1123–1132 (2011).
Jimoh, K. O., Ọdẹ́jọbí, ỌÀ, Fọlárànmí, S. A. & Aina, S. Handmade embroidery pattern recognition: a new validated database. Malays. J. Comput. 5, 390–402 (2020).
Oguamanam, C. Information systems and digitization of traditional knowledge: trends in cultural heritage and memory institutions and the WIPO genetic resources treaty. J. World Intell. Property. 10, 70005 (2025).
Wang, L., Sun, C., Wang, M. & Xiao, X. Construction and characterization of traditional village landscape cultural genome atlases: a case study in Xupu County, Hunan, China. Sustainability 16, 9524 (2024).
Hu, Z., Strobl, J., Min, Q., Tan, M. & Chen, F. Visualizing the cultural landscape gene of traditional settlements in China: a semiotic perspective. npj Herit. Sci. 9, 115 (2021).
Fernandes, A. M. & Lavado, I. Design applying creativity and its process, with different types of embroidery. In Perspect. Design II, 315–326 (Springer, 2021).
Verganti, R., Vendraminelli, L. & Iansiti, M. Innovation and design in the age of artificial intelligence. J. Prod. Innov. Manag. 37, 212–227 (2020).
Li, Q. & Zhou, E. Design and implementation of automatic generation algorithm for advertising artistic design based on neural networks. Comp.-Aided Des. Appl. 21, 114–127 (2024).
Yin, H., Zhang, Z. & Liu, Y. The exploration of integrating the Midjourney artificial intelligence generated content tool into design systems to direct designers towards future-oriented innovation. Systems 11, 566 (2023).
Zhu, J., Ma, H., Chen, J. & Yuan, J. DomainStudio: fine-tuning diffusion models for domain-driven image generation using limited data. Int. J. Comp. Vis. 133, 7012–7036 (2025).
Dou, J., Qin, J., Jin, Z. & Li, Z. Knowledge graph based on domain ontology and natural language processing technology for Chinese intangible cultural heritage. J. Vis. Lang. Comput. 48, 19–28 (2018).
Ranjgar, B. et al. Cultural heritage information retrieval: past, present, and future trends. IEEE Access 12, 42992–43026 (2024).
Chen, L., Su, Z., He, X., Chen, X. & Dong, L. The application of robotics and artificial intelligence in embroidery: challenges and benefits. Robotic Intell. Autom. 42, 851–868 (2022).
Wu, J. Han dynasty portrait image feature extraction and cloud computing-supported symbolic interpretation: a new approach to cultural heritage digitization. Scalable Comput. Pract. Exp. 25, 4804–4813 (2024).
Han, J. The historical and chemical investigation of dyes in high-status Chinese costume and textiles of the Ming and Qing dynasties (1368–1911). PhD thesis (University of Glasgow, 2016).
Zang, N. Qing Dynasty Official Rank Badges (in Chinese) (Huaxia Publishing House, Beijing, 2016). ISBN 978-7-5080-8749-8.
Wang, B. & Zong, F. (eds) Chinese Civil and Military Rank Badges (in Chinese) (Nanjing Publishing House, Nanjing, 2007). ISBN 978-7-80718-320-4.
Wang, Y. et al. Cross-Platform Comparison of Generative Design Based on a Multi-Dimensional Cultural Gene Model of the Phoenix Pattern. Applied Sciences 15, 8170 (2025).
Shiyang, W. & Kolosnichenko, O. V. Study of Miao embroidery: semiotics of patterns and artistic value. Art Des. 10, 98–109 (2024).
Zhu, Y. Rank badges of official costumes of Ming and Qing dynasties from the perspective of social semiotics. Lang. Semiot. Stud. 7, 121–135 (2021).
Xu, S., Cheng, L., Liu, Y. & Ge, L. Individuality in commonality: a comparative study of Su embroidery and Gu embroidery based on online retrieval of museum collections. Asian Soc. Sci. 19, 12–26 (2023).
Zhang, X., Li, Y., Lin, J. & Ye, Y. The construction of placeness in traditional handicraft heritage sites: a case study of Suzhou embroidery. Sustainability 13, 9176 (2021).
Wu, Y. The evolution of export embroidery in Cantonese embroidery: cultural significance and modern reinterpretations. Rita Revista Indexada de Textos Academicos. (Krirk University, 2025).
Gao, Y., Ling, W., Liao, X. & Lin, R. Research on AIGC empowering traditional intangible cultural heritage Yue embroidery to modern tourism souvenir design and application. In Cross-Cultural Design, 30–39 (Springer, 2025).


Acknowledgements

This study was supported by the Key Laboratory of Philosophy and Social Sciences in Guangdong Province of Maritime Silk Road of Guangzhou University (GD22TWCXGC15). It was also supported by the Guangdong Province Higher Education Institutions Characteristic Innovation Project in 2023: Lingnan Culture and Art Digital Resource Sharing Platform (2023WTSCX068). Further support came from the Guangzhou University projects Digital Revitalization and Intelligent Dissemination Research of Lingnan Cultural Arts (PT252022040) and Cantonese Embroidery Intangible Cultural Heritage Digital Resource Sharing Platform (Liwan Research Institute, Guangzhou University, LWYJ202411).

Author information

Authors and Affiliations

School of Fine Arts and Design, Guangzhou University, Guangdong, Guangzhou, P. R. China
Haiqiong Yang, Bing Hu & Maoning Li
College of Teacher Education, Quzhou University, Zhejiang, Quzhou, P. R. China
Qiao Sui
School of Fine Arts and Design, Guangdong University of Foreign Studies, Guangdong, Guangzhou, P. R. China
Kun Shi
School of Fine Arts and Design, Hechi University, Guangxi, Hechi, P. R. China
Ranran Wang


Contributions

H.Y.: Conceptualization, Formal analysis, Methodology, Investigation, Validation, Writing – original draft, Writing – review & editing. M.L.: Conceptualization, Methodology, Investigation, Validation, Writing – original draft, Writing – review & editing, Funding acquisition. Q.S.: Data curation, Software. B.H.: Data curation, Formal analysis. K.S. and R.W.: Validation, Writing – review & editing.

Corresponding author

Correspondence to Maoning Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Yang, H., Sui, Q., Hu, B. et al. A semantic reconstruction and AI-controlled generation method for the cultural genes of Qing Dynasty embroidery patterns: a case study of official rank badges. npj Herit. Sci. 13, 637 (2025). https://doi.org/10.1038/s40494-025-02217-5


Received: 03 September 2025
Accepted: 20 November 2025
Published: 08 December 2025
Version of record: 08 December 2025
DOI: https://doi.org/10.1038/s40494-025-02217-5
