Introduction

Qing Dynasty embroidered rank badges (Guangbu), as a highly representative component within the traditional Chinese costume system, feature unique patterns that serve not only as visual indicators of status and rank but also carry rich cultural genes and historical information1. Currently, the research and innovative design of traditional patterns face multiple challenges, including but not limited to the effective digitization of traditional patterns, the systematic extraction and inheritance mechanisms of cultural genes, and the successful integration of these traditional patterns with profound historical foundations into a modern design context2. Against this backdrop, AI technology, with its capabilities in data processing, pattern recognition, image generation, and semantic understanding, is demonstrating immense application potential in cultural heritage preservation, digital reconstruction, and innovative design3.

Despite existing research exploring the history, artistry, and craftsmanship of Qing Dynasty embroidery patterns1, there remains a notable insufficiency in systematically extracting their cultural genes and constructing computable methodologies for innovative design. Particularly in the field of AI-generated design, its application to traditional patterns, especially art forms with intricate details and specific rules, such as embroidery, is still in the nascent stages of exploration4,5. While current studies have made some progress in automated pattern recognition, style transfer, or preliminary generative attempts, they generally lack fine-grained control over the generation of embroidery patterns, specifically concerning unique stitch types, textures, and profound cultural connotations6,7. Specifically, few studies have systematically combined an ‘embroidery semantic network’ to achieve a deep understanding of cultural connotations and integrated ‘dynamic shape grammar’ for precise control over pattern form generation and variation8,9,10. Concurrently, advanced generative models optimized for embroidery patterns, such as a three-stage fine-tuning Diffusion architecture incorporating LoRA, have not yet been sufficiently researched and applied. This study aims to address the aforementioned research gaps, with its core academic contributions being:

  1.

    For the first time, this study systematically proposed and constructed an ‘embroidery semantic network’ specifically for Qing Dynasty embroidered rank badge patterns. By drawing upon ontology construction methodologies11, this network achieves a structured and semantic representation of the cultural elements inherent in the patterns12, such as rank, animals, auspicious motifs, and colors, thereby laying a knowledge foundation for subsequent intelligent understanding and generation.

  2.

    Innovatively, ‘dynamic shape grammar’ has been introduced into the parametric generation framework for embroidery patterns. Drawing upon the successful application of shape grammar in architectural design and ethnic pattern generation9,13,14, this study extends it to the field of embroidery. Through rule-driven and parameterized control, it achieves precise control and explainable evolution of the pattern’s fundamental forms, layout structures, and stylistic variations.

  3.

    This study develops and optimizes a ‘LoRA-Diffusion-SG’ three-stage fine-tuning architecture for embroidery pattern generation. This architecture leverages the efficiency of LoRA in large model fine-tuning15 and the powerful image generation capabilities of diffusion models16. Through domain adaptation, feature decoupling (semantics and texture), and multi-modal integration (shape grammar guidance), it significantly enhances the performance of generated patterns in terms of visual realism, detail richness, and cultural connotation alignment.

These contributions provide a novel theoretical framework and crucial technical support for the digital preservation of Qing Dynasty embroidered rank badges, the deep excavation and inheritance of their cultural genes, and AI-based innovative design. This promotes the cross-integration of traditional craftsmanship and cutting-edge artificial intelligence (AI) technologies.

The core objective of this study is to profoundly reconstruct the cultural genes embedded within Qing Dynasty embroidered rank badges and explore innovative expression methods for embroidery patterns facilitated by AI. This aims to overcome potential monotony and stereotypes in the forms or styles of traditional patterns. This study is committed to:

  1.

    Systematically reconstruct the complete pattern system of Qing Dynasty embroidered rank badges and their cultural connotations. This is crucial for advancing the computational analysis of embroidery art, enabling the quantitative management of cultural elements, and ultimately empowering innovative design17. The framework constructed in this study aims to provide a broadly applicable methodology18, serving diverse scenarios, such as cultural heritage, fashion design, and cultural and creative product development, areas where traditional pattern elements are garnering widespread attention and continuous reference.

  2.

    Establish an effective pathway for the complete reconstruction and innovative interpretation of embroidery cultural genes using computer technology19. This pathway, on one hand, enables the clear presentation of cultural gene reconstruction results in high-fidelity image formats. On the other hand, by constructing a structured raw dataset encompassing image and semantic information20, it lays a solid foundation for subsequently applying diverse quantitative analysis methods. This approach not only serves as an effective supplement to traditional research methods but also provides strong technical support for the cultural inheritance of embroidery patterns and typology-based innovative design.

  3.

    Building upon traditional craftsmanship research paradigms, this study proposes a theoretical framework and technical methodology capable of systematically and reproducibly analyzing and reconstructing the morphological features and cultural connotations of embroidery patterns. This research aims to unlock the largely untapped potential of generative AI in this domain. Although generative AI has rapidly emerged as one of the most promising directions in AI research, its application in interdisciplinary fields, such as archeology and cultural heritage preservation, remains in its preliminary stages21.

Therefore, this study is dedicated to constructing a theoretical and technical system capable of effectively extracting the cultural genes of Qing Dynasty embroidered rank badge patterns and facilitating innovative design based on these genes. This system will fully consider contextual information, such as the age, purpose, and decorative motifs of the patterns, and will be applicable to authentic Qing Dynasty embroidery data22. The constructed framework will undergo adaptability testing for various embroidery styles and design requirements to ensure its broad transferability and applicability beyond the training dataset.

Methods

Related work

Qing Dynasty embroidery, as a pinnacle in the history of Chinese embroidery art, is renowned for its exquisite craftsmanship, rich subject matter, and diverse styles, embodying immense artistic value and profound historical and cultural information23. Rank badges (Guangbu), as a crucial component of Qing Dynasty official attire for denoting rank and status, featured pattern designs that were not merely visual representations of the hierarchical system but also concentrated reflections of the political, cultural, and social ideologies of specific periods24. Research on Qing Dynasty rank badges includes analytical studies of the badges themselves and discussions of their preservation measures, providing an important body of scholarship for understanding their formal characteristics, craftsmanship, and embedded cultural connotations. Furthermore, research on the socio-cultural functions of embroidery patterns from other regions, such as the interpretation of the social and political meanings of Bakuba embroidery patterns, offers valuable insights into the universality of patterns as cultural symbols25. These studies have laid the foundational knowledge for this research concerning Qing Dynasty embroidered rank badges and underscore their significance as carriers of cultural genes.

The digitization of traditional patterns is a crucial prerequisite and core component for effective cultural heritage preservation, innovative inheritance, and the recreation of contemporary value26. In the specific field of embroidery pattern digitization, scholars have already conducted fruitful explorations. For instance, in response to the complexity of embroidered fabrics, research has proposed a set of automated pattern recognition and color separation methods, providing significant technical support for the digital information extraction and analysis of embroidery patterns27. To advance the development and validation of relevant machine learning algorithms, researchers have also committed to constructing standardized datasets, such as by building and publicly validating a new database for handmade embroidery pattern recognition, which offers valuable data resources and evaluation benchmarks for research in this domain28. At the more critical level of cultural gene extraction and structured knowledge representation, academia is increasingly focused on the application potential of AI technologies, such as knowledge graphs and ontologies. For instance, research on digitization of traditional knowledge and cultural heritage demonstrates how complex cultural information can be systematically organized and applied29. Similarly, the construction and characterization of traditional village landscape cultural genome atlases provides structured methods for visualizing multi-dimensional cultural genes30. Moreover, visualizing the cultural landscape gene of traditional settlements offers semiotic perspectives for understanding and representing intangible heritage31. 
Taking the Qingyang sachet, a specific intangible cultural heritage of embroidery, as an example, research has deeply explored ontology-based knowledge graph construction methods, systematically demonstrating how complex and multi-dimensional cultural heritage knowledge can be clearly and logically organized and expressed11. For Bulgarian ethnic embroidery, scholars have also investigated methods for creating specialized information models, aiming to promote their effective display, retrieval, and utilization in digital knowledge bases12. These cutting-edge research achievements, particularly their explorations in cultural element deconstruction, semantic relationship mining, and formal knowledge representation, provide important theoretical inspiration, methodological references, and practical guidance for the construction of one of this study’s core concepts: the ‘embroidery semantic network.’ Concurrently, from a design perspective, research also focuses on how creativity and its systematic design process can be applied to different types of embroidery practices, emphasizing the core importance of deep understanding and flexible application of design thinking and methods for promoting innovation in embroidery art32.

AI technology, particularly generative models centered on deep learning, has demonstrated revolutionary potential in image generation, artistic creation, and innovative design in recent years. Among these, diffusion models and generative adversarial networks (GANs), as current mainstream image generation techniques, have been widely applied in the creation of diverse visual content. In the broader context of design and innovation, Verganti et al. emphasize that AI-driven approaches are reshaping design paradigms by introducing new logics of creativity and collaboration between humans and intelligent systems33. From a practical perspective, Li and Zhou developed an automatic generation algorithm for advertising artistic design based on neural networks34, demonstrating how algorithmic models can effectively support creative processes in commercial and applied arts. Extending these explorations to design systems research, Yin et al. investigated the integration of the Midjourney AI content generation tool into design workflows35, highlighting its potential to direct designers towards future-oriented innovation practices. In the specific field of embroidery art, early research began exploring the use of AI for esthetic art simulation of embroidery styles, and subsequently deepened rendering techniques for embroidery styles based on convolutional neural networks (CNNs)6. Other studies have also proposed methods for precise segmentation and innovative synthesis of embroidery art images using deep-learning CNNs5. Addressing the needs for preservation and re-creation of specific ethnic embroidery patterns, scholars have proposed a Miao embroidery pattern repair scheme combining GAN and U-Net architectures, integrated with a spatial channel attention mechanism. Furthermore, embroidery style transfer models based on texture-loop GANs have also been researched and applied7. 
These cutting-edge explorations preliminarily demonstrate the broad application prospects and immense potential of AI technology in the fine-grained processing and stylized intelligent generation of embroidery images.

In recent years, as a lightweight and efficient model fine-tuning technique, LoRA has achieved significant success in the personalized customization and domain adaptation of large pre-trained models, and has been widely applied in text-to-image generation tasks15,16. Through comparative studies, scholars have thoroughly explored the application potential of different text-to-image AI models in computer-aided conceptual design, further highlighting the promising prospects of AI as a collaborative creative assistant. Research has also focused on interactive modes of AI in the field of computer-aided arts and crafts3. Earlier literature discussed the opportunities and challenges faced by generative design in the textile sector. More recently, diffusion-based fine-tuning frameworks have been investigated for their effectiveness in domain-driven generation with limited training data, providing methodological insights for cultural heritage applications36. Together, these studies provide a solid technical foundation and theoretical perspective for this study’s adoption of a LoRA-optimized diffusion model (i.e., the LoRA-Diffusion-SG architecture) and its application to the innovative design of Qing Dynasty rank badge embroidery patterns, a specific cultural heritage domain. However, despite the rapid development of AI generative technology, existing models generally face severe challenges, such as insufficient fine-grained control, low cultural semantic alignment, and a lack of interpretability in generated results, when applied to traditional patterns (like embroidered rank badges) that possess high complexity, strict rules, and profound cultural connotations.

Shape Grammars, as a rule-based generative system, have been widely applied in architectural design, product design, and traditional pattern research due to their ability to precisely describe and generate design solutions with specific structures, combinatorial rules, and stylistic features. Pioneering research on shape and shape grammars8 laid the theoretical foundation for this field. In the digitization and innovative design of cultural heritage, particularly ethnic patterns, shape grammar demonstrates unique advantages. Cui & Tang9 successfully integrated shape grammar into a generative system for exploring Zhuang embroidery design, achieving computerized generation of specific ethnic styles. Hu et al.10 combined shape grammar with artificial neural networks for ethnic pattern design. Other studies have also delved into the reuse design and optimization of ethnic patterns and the creative design of Qiang embroidery patterns, respectively, based on improved shape grammars13,14. These studies fully demonstrate the effectiveness of shape grammar in analyzing, reproducing, and innovating traditional patterns.

Concurrently, semantic networks and knowledge graph technologies provide powerful tools for structured representation and management of cultural heritage knowledge37. By constructing semantic networks comprising concepts, attributes, and relationships, dispersed and heterogeneous cultural heritage information can be integrated into a machine-understandable knowledge base38. Research has utilized CNN to evaluate semantic similarity between silk fabric images, indirectly reflecting the role of semantic information in image understanding11. Ma et al.12 proposed an ethnic pattern generation method based on the derivation of cultural design genes, also emphasizing the importance of integrating cultural semantic information into the generative process. This integration helps to enhance the interpretability, cultural relevance, and innovativeness of the generated content.

Materials of official rank badges embroidered with Qing Dynasty patterns

Data collection is the primary and crucial step for applying AI techniques to the innovative design of Qing Dynasty embroidered rank badge patterns39,40. This stage necessitates extensive collection of Qing Dynasty rank badge data, encompassing images, pattern atlases, and historical literature descriptions, as their diversity and richness are paramount for subsequent model training and design generation41. The experimental dataset utilized in this study is rich in content, covering a wide range of Qing Dynasty rank badge patterns. Its sources include collections from major Chinese museums, such as the Palace Museum and the China National Silk Museum, which showcase various forms and design philosophies of Qing Dynasty imperial embroidery art, from dragon badges to civil and military official badges, reflecting the strictness of the Qing Dynasty official costume system and the exquisite craftsmanship of its embroidery22. Furthermore, this dataset integrates cultural and artistic legacies from historical archives, traditional atlases, and academic research, presenting rank badge patterns of different ranks, periods, and forms, highlighting the rich Qing Dynasty embroidery cultural heritage and China’s profound embroidery history and pattern design foundations. To ensure the comprehensiveness and representativeness of the dataset, we selected various types of rank badges covering imperial family, civil officials, military officials, and their evolutions. This study utilizes 325 authentic samples as the foundation for the experimental dataset, with the core objective of constructing a comprehensive and representative dataset of Qing Dynasty official rank badge embroidery, thereby laying the groundwork for subsequent deep learning model training and innovative design. 
These 325 samples are meticulously categorized into five major embroidery types, encompassing the Qing Dynasty Imperial Dragon Badges, Qing Dynasty Round Floral Badges, Qing Dynasty Civil Official Badges, Qing Dynasty Military Official Badges, and Qing Dynasty Xiezhi Badges. In this manner, the dataset is capable of broadly reflecting the rigor of the Qing Dynasty official uniform system and the complexity of the embroidery craftsmanship. The scale of the 325 authentic samples is sufficient to form a broad and manageable dataset, meeting the requirements for diversity and richness in training data for deep learning models, such as Stable Diffusion XL. The specific category distribution is as follows:

  1.

    Qing Dynasty imperial dragon badges: this category encompasses dragon motifs used by imperial family members, including emperors, empresses, and princes, showcasing combinations and variations of dragon forms, cloud patterns, and sea-cliff designs, along with their strict hierarchical symbolic meanings.

  2.

    Qing Dynasty round floral badges: this category includes round or square-round badges featuring floral, avian, and other roundel patterns, illustrating their application in specific occasions or garments, as well as the inheritance and innovation of traditional auspicious motifs.

  3.

    Qing Dynasty civil official badges: this covers various avian motifs, ranging from the first-rank crane to the ninth-rank quail, demonstrating the meticulous depiction, color coordination, and compositional characteristics of civil official badges across different ranks, reflecting their hierarchical system and cultural connotations.

  4.

    Qing Dynasty military official badges: this category includes various beast motifs, from the first-rank Qilin (mythical creature) to the ninth-rank seahorse, emphasizing the valiant and majestic presence of military official badges, their robust and powerful lines, and their representation of status and symbolism within the military.

  5.

    Qing Dynasty Xiezhi badges: this specifically refers to the Xiezhi (mythical beast known for justice) motif used by censors and other remonstrating officials, showcasing its unique morphological characteristics and its cultural connotations symbolizing justice and integrity.

The dataset used for model training was compiled from image data of Qing Dynasty embroidered rank badges collected from museums across China and private collections, as well as drawings and photographs of rank badges from historical documents and atlases42,43. The decision to use two-dimensional embroidery images (including actual object photographs and traditional drawings) as the basis for this type of analysis was based on several considerations. These images are primarily two-dimensional, standardized, and refined representations of three-dimensional embroidered objects. They are widely applied in embroidery art, costume culture, and historical research, free from geographical or temporal constraints. In fact, high-quality embroidery images and traditional pattern atlases can be regarded as the standard for graphical representation of embroidery patterns. While these embroidery images were traditionally obtained through photography or manual drawing, new digitization technologies now enable the acquisition of high-definition embroidery images and digitized patterns similar to traditional photography or drawing, highlighting the renewed importance of such documentation. Furthermore, their sheer volume allows for the generation of a comprehensive and easily manageable dataset. It should also be noted that these embroidery images and pattern atlases were produced or photographed by professionals with profound knowledge of embroidery culture, history, and craftsmanship, thus ensuring a high-quality dataset. Additionally, embroidery pattern data has a long history, allowing datasets to be created and data retrieved from old publications and historical archives. Although three-dimensional embroidery pattern models offer a better understanding of morphological features and bring numerous improvements, they remain difficult to apply when dealing with large and diverse materials, unlike two-dimensional images. 
The training dataset exclusively comprises complete images of Qing Dynasty embroidered rank badge patterns, curated from publications, digitized collections, and professional photography. These archives were processed using photo editing software to remove background impurities, eliminate uneven lighting, and present the archives with clarity and high contrast. Subsequently, Python scripts preprocessed the images, creating standardized 256 × 256-pixel images (adjustable according to model input requirements) without distorting the aspect ratio and pattern structure of the rank badges, while maintaining consistent image dimensions and color correction. This image dataset is further complemented by tabular data containing information on rank badge grades, eras, pattern themes, symbolic meanings, and various literature citations.
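As an illustration of how an image can be standardized to 256 × 256 pixels without distorting its aspect ratio, the following is a minimal sketch of the letterbox geometry only. It is not the preprocessing script used in this study: the function name `letterbox_geometry` and the choice of centered padding are assumptions made for the example, and the actual resampling and pasting would be delegated to an image library.

```python
def letterbox_geometry(width, height, target=256):
    """Compute how to fit a (width x height) image into a square canvas.

    Returns (new_w, new_h, pad_left, pad_top): the resized dimensions that
    preserve the aspect ratio, and the offsets that center the resized image
    on a target x target canvas. Centered padding is an assumption here.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    longest = max(width, height)
    # Scale so that the longest side exactly matches the target size.
    new_w = max(1, round(width * target / longest))
    new_h = max(1, round(height * target / longest))
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


# A wide badge photograph (hypothetical size) is scaled to 256 x 128
# and padded by 64 pixels above and below.
print(letterbox_geometry(1000, 500))
```

Padding rather than stretching keeps tall, narrow motifs and short, wide motifs at their true proportions, at the cost of some unused canvas area.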

While the design of the training dataset is relatively straightforward and consistent with methods in similar works, the creation and conceptualization of the pattern sample dataset actually applied to innovative design is more complex. The formalization and construction of this dataset are closely tied to issues encountered when working with cultural genes and dynamic forms. Machine learning algorithms require input images of a uniform size, yet the dimensions and complexity of Qing Dynasty rank badge patterns vary: some are tall, narrow standing-beast patterns (often referred to as closed shapes), while others are short, wide composite patterns of flying birds and beasts (open shapes).

Cultural gene analysis

The embroidery patterns of the Qing Dynasty official rank badges serve as a crucial vehicle for traditional Chinese culture, and the extraction and structured representation of their cultural genes are key to realizing innovative design in this study. Based on the constructed Image Layer, Knowledge Layer, and Craftsmanship Layer of the data, this research categorizes the cultural genes into three core dimensions—Morphology, Semantics, and Craftsmanship—aiming to achieve the effective extraction and re-expression of the multi-faceted cultural genes embedded in the rank badge patterns44.

The Morphology Dimension focuses on the structural proportion, posture characteristics, and visual texture of the embroidery patterns45, reflecting the layout framework, color palette, and esthetic representation of the Qing Dynasty official badge designs. The core rationale for selecting this dimension is its ability to precisely capture the three-dimensional quality and intricate texture of the embroidered objects. By employing high-resolution and high-texture-clarity digital processing, it provides a high-fidelity morphological basis for subsequent AI generation, enabling precise control and variation.

The Semantics Dimension focuses on the complex semantic correlation between the rank, official position, and animal motifs in the Qing Dynasty rank badges46. It conveys the rigor of the official uniform system and the symbolic meaning of strict hierarchy, including the auspicious meanings represented by civil official birds (such as the first-rank crane) and military official beasts (such as the first-rank Qilin). The objective is to systematically reconstruct the entire motif system of the rank badges and their cultural connotations. This is achieved by constructing a Knowledge Graph (i.e., the “Embroidery Semantic Network”) to enable the structured representation of cultural elements, thereby ensuring semantic consistency and empowering cultural connotation-driven innovative design.

The Craftsmanship Dimension concentrates on the characteristics of traditional embroidery stitches, such as those found in Su embroidery47,48 and Yue embroidery49,50, for instance the spiral organization of coiled gold thread embroidery and the linear direction of flat gold thread embroidery. This reflects the high level of craftsmanship and the key parameters of needling, including angle, speed, and force. This dimension is chosen to provide quantifiable evidence for stitch classification, quality assessment of the craft, and digital analysis of traditional techniques. By utilizing a logistic regression model to predict the embroiderability of the image, this research ensures that the final generated cultural genes possess feasibility and authenticity in terms of craft logic.
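The embroiderability prediction mentioned above can be illustrated with a small logistic regression sketch. The text does not specify the model's input features or training data, so the two features used here (curve complexity and line density, scaled to [0, 1]), the toy samples, and the function names are all hypothetical and serve only to show the mechanics.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability that a pattern with feature vector x is embroiderable."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fit_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = predict_proba(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical toy data: (curve complexity, line density); label 1 means
# "embroiderable". These values are invented for illustration only.
TOY_FEATURES = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3),
                (0.8, 0.9), (0.9, 0.7), (0.7, 0.8)]
TOY_LABELS = [1, 1, 1, 0, 0, 0]

WEIGHTS, BIAS = fit_logistic(TOY_FEATURES, TOY_LABELS)
for x in [(0.2, 0.2), (0.85, 0.85)]:
    print(x, round(predict_proba(WEIGHTS, BIAS, x), 3))
```

In practice, a thresholded probability of this kind could be used to filter generated patterns before they are passed on to craft evaluation.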

Data construction

First, for data processing at the image layer of Qing Dynasty embroidered rank badges, multi-light source scanning technology was employed, combining high-definition images of imperial rank badges from the Palace Museum with the CRUSE scanning system, with the aim of precisely capturing the three-dimensionality and subtle textures of the embroidery. Image processing was conducted using Python scripts (integrating the OpenCV and Pillow libraries) to ensure that the output images met standards of high resolution (≥600 dpi), low color error (ΔE < 3), and high texture clarity (SSIM ≥ 0.9). To further enhance image quality, ESRGAN super-resolution reconstruction was introduced, using a Python super-resolution toolkit to improve image quality while preserving details. Currently, multi-light source fusion processing pipelines and batch super-resolution reconstruction functionalities have been implemented, fully supporting the batch processing of all categories of rank badge images.
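To make the color-error criterion concrete, the following sketch checks ΔE against the threshold of 3 stated above. The text does not say which ΔE formula was used, so the CIE76 definition (Euclidean distance in CIELAB space) is assumed here, as is the use of the mean over reference patches; the function names are hypothetical.

```python
import math


def delta_e_cie76(lab_a, lab_b):
    """CIE76 color difference: Euclidean distance between two (L*, a*, b*) colors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(lab_a, lab_b)))


def passes_color_check(reference_patches, scanned_patches, threshold=3.0):
    """Return True if the mean Delta E over paired patches is below the threshold.

    Using the mean (rather than the maximum) is an assumption of this sketch.
    """
    differences = [delta_e_cie76(ref, scan)
                   for ref, scan in zip(reference_patches, scanned_patches)]
    return sum(differences) / len(differences) < threshold


# Hypothetical CIELAB values for two color-chart patches and their scans.
reference = [(50.0, 0.0, 0.0), (70.0, 10.0, -5.0)]
scanned = [(50.0, 1.0, 1.0), (69.0, 11.0, -5.0)]
print(passes_color_check(reference, scanned))
```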

Second, the knowledge layer of Qing Dynasty embroidered rank badges was constructed by analyzing their pattern regulations. Protégé was used for knowledge modeling and SPARQL queries were used to build a structured knowledge graph, which was evaluated against criteria of high annotation consistency (Krippendorff’s α ≥ 0.8) and rule coverage (≥95%). The regulations were encoded in a data format compliant with the JSON-LD standard to ensure data integrity and semantic consistency, and the graph was analyzed and visualized with NetworkX and Matplotlib in the style of Gephi network analysis tools. The knowledge layer was built from a comprehensive analysis of 184 knowledge items and 18 regulations, supporting data-driven analysis (Fig. 1).

Fig. 1: Knowledge Graph of Qing Dynasty Rank Badges.

Presents the relationship between the nine ranks and their corresponding animal motifs for both Civil and Military Officials in the knowledge graph of Qing Dynasty official rank badges.
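The rank-to-motif regulations encoded in the knowledge layer can be sketched as a small set of triples serialized to a JSON-LD-style document. This is only an illustration: the entries are limited to pairs named in the text, the vocabulary URL and property names are placeholders, and the coverage function reflects one plausible reading of "rule coverage" rather than the metric used in this study.

```python
import json

# (branch, rank, motif) rules; only pairs stated in the text are listed.
RANK_RULES = [
    ("civil", 1, "crane"),
    ("military", 1, "qilin"),
    ("military", 9, "seahorse"),
]


def to_jsonld(rules, vocab="https://example.org/rank-badge#"):
    """Serialize rules as a JSON-LD-style dict (the vocabulary URL is a placeholder)."""
    return {
        "@context": {"@vocab": vocab},
        "@graph": [
            {
                "@id": f"{branch}-rank-{rank}",
                "@type": "RankBadge",
                "branch": branch,
                "rank": rank,
                "motif": motif,
            }
            for branch, rank, motif in rules
        ],
    }


def motif_for(rules, branch, rank):
    """Look up the motif for a branch and rank, or None if no rule is encoded."""
    for rule_branch, rule_rank, motif in rules:
        if rule_branch == branch and rule_rank == rank:
            return motif
    return None


def rule_coverage(rules, expected_keys):
    """Fraction of expected (branch, rank) pairs that have an encoded rule."""
    covered = {(branch, rank) for branch, rank, _ in rules}
    return len(covered & set(expected_keys)) / len(expected_keys)


print(json.dumps(to_jsonld(RANK_RULES), indent=2))
```

A query such as `motif_for(RANK_RULES, "civil", 1)` then plays the role that a SPARQL lookup would play against the full graph.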

Finally, for the craftsmanship layer of Qing Dynasty embroidered rank badges, the trajectories of traditional needlework techniques, such as Su embroidery and Yue embroidery, were recorded using a motion capture system. MATLAB (Motion Analysis Toolbox) was utilized to precisely capture needlework trajectories and extract their key parameters (angle, velocity, force), achieving a trajectory accuracy error of less than 0.5 mm and a parameter dispersion (CV) of less than 15%. The pipeline supports two input modes, CSV trajectory data and image-based trajectory extraction, both of which automatically extract parameters such as angle, velocity, and force. As shown in Fig. 2, the craftsmanship layer’s radar chart function clearly visualizes the parameter distribution differences of various needlework techniques across three dimensions: angle, velocity, and force. This provides a quantitative basis for needlework classification, craftsmanship quality assessment, and digital analysis of traditional craftsmanship, offering a data-driven analytical tool for understanding and inheriting embroidery craftsmanship.

Fig. 2: Stitch Type Radar.

Compares the performance differences of various embroidery stitch types (Flat Stitch, Couching Stitch, Encroaching Stitch, and Seed Stitch) across three craft parameters: curve complexity, line density, and craft feasibility, presented in a radar chart format.
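The extraction of angle and velocity from a recorded needle trajectory, and the dispersion measure (CV) reported for the craftsmanship layer, can be sketched as follows. The input layout (time in seconds, x and y in millimetres) and the function names are assumptions for illustration. Force is omitted because it cannot be derived from positions alone and would require a separate sensor channel.

```python
import math
import statistics


def segment_parameters(points):
    """Compute (angle in degrees, speed in mm/s) for each consecutive pair of samples.

    points: list of (t, x, y) tuples with strictly increasing timestamps.
    """
    params = []
    for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        dx, dy = x1 - x0, y1 - y0
        angle = math.degrees(math.atan2(dy, dx))
        speed = math.hypot(dx, dy) / dt
        params.append((angle, speed))
    return params


def coefficient_of_variation(values):
    """Population standard deviation divided by the absolute mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return statistics.pstdev(values) / abs(mean)


# Hypothetical trajectory: three samples of a straight, evenly paced stitch.
trajectory = [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0), (2.0, 6.0, 8.0)]
speeds = [speed for _, speed in segment_parameters(trajectory)]
print(speeds, coefficient_of_variation(speeds))
```

A CV below 0.15 for a given parameter would correspond to the 15% dispersion criterion; angles that wrap around ±180° would need circular statistics, which this sketch does not handle.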

Training strategy and loss function construction

In this study, we designed and compared three distinct training strategies to evaluate the performance of LoRA in embroidery image modeling tasks. Each strategy is associated with a specific loss function design, reflecting a progressive training paradigm from basic reconstruction, to structural enhancement, and ultimately to multi-objective optimization.

First, Fig. 3a illustrates the loss variation trend during the LoRA-based training stage, and Fig. 3b introduces an enhanced conservative training strategy. Both Fig. 3a and Fig. 3b adopt the standard Mean Squared Error (MSE) loss function, which is defined as (1):

$${L}_{{\rm{LoRA}}}={{\mathbb{E}}}_{\left(x,y\right)\sim D}\left[{\Vert {f}_{\theta }\left(x\right)-y\Vert }^{2}\right]$$
(1)
Fig. 3: Training Loss Curve.

a LoRA Training Loss. b Enhanced Conservative Training Loss. c Advanced Training Curves. Displays the loss value (Loss) curves over training epochs for three different model training processes (LoRA, Enhanced Conservative Embroidery LoRA, and Advanced Training) to evaluate training effectiveness and convergence.

This loss function quantifies the pixel-level discrepancy between the model output and the ground truth image. As the number of training epochs increases, the overall loss steadily decreases, demonstrating the model’s solid convergence and fitting capability during the initial phase.

Then, Fig. 3c presents the training curves under a multi-objective loss framework, which incorporates perceptual quality at multiple feature levels. The total loss function is expressed as (2):

$${L}_{{\rm{total}}}=\alpha \cdot {L}_{{\rm{MSE}}}+\beta \cdot {L}_{{\rm{LPIPS}}}+\gamma \cdot {L}_{{\rm{feat}}}+\delta \cdot {L}_{{\rm{consistency}}}$$
(2)

where \({L}_{{\rm{MSE}}}\) captures pixel-wise reconstruction errors, \({L}_{{\rm{LPIPS}}}\) maintains perceptual similarity based on deep features, and \({L}_{{\rm{feat}}}\) evaluates the consistency of intermediate feature representations. The term \({L}_{{\rm{consistency}}}\) comes from the ‘texture consistency loss’, which constrains the texture structure through the MSE of the gradients in the x and y directions. The default coefficients in the configuration are {α, β, γ, δ} = {1.0, 0.3, 0.2, 0.1} (see the ‘loss weights’ field of ‘load config’). The texture consistency term weighted by δ emphasizes gradient structure. In our experiments, \({L}_{{\rm{feat}}}\) remains constant, indicating its role as a fixed regularization component rather than an explicitly optimized term. This composite loss enhances the subjective visual quality and semantic richness of the generated embroidery images.
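To make the weighted sum in (2) concrete, the sketch below combines the four terms with the default coefficients. It is an illustration under stated assumptions rather than the study's training code: images are plain 2D lists of floats, the texture consistency term is assumed to be the sum of the MSEs of the x- and y-direction finite differences, and the LPIPS and feature terms are passed in as precomputed scalars because they require pretrained networks. All function names are hypothetical.

```python
def mse(a, b):
    """Mean squared error between two equally sized 2D grids."""
    diffs = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)

def grad_x(img):
    """Horizontal finite differences (each row loses one column)."""
    return [[row[j + 1] - row[j] for j in range(len(row) - 1)] for row in img]

def grad_y(img):
    """Vertical finite differences (the grid loses one row)."""
    return [[img[i + 1][j] - img[i][j] for j in range(len(img[0]))]
            for i in range(len(img) - 1)]

def texture_consistency(pred, target):
    """Assumed form: MSE of x-gradients plus MSE of y-gradients."""
    return mse(grad_x(pred), grad_x(target)) + mse(grad_y(pred), grad_y(target))

def total_loss(pred, target, lpips_term=0.0, feat_term=0.0,
               weights=(1.0, 0.3, 0.2, 0.1)):
    """Weighted sum of the four terms, using the default {alpha, beta, gamma, delta}."""
    alpha, beta, gamma, delta = weights
    return (alpha * mse(pred, target)
            + beta * lpips_term
            + gamma * feat_term
            + delta * texture_consistency(pred, target))

# Toy 2x2 example: a vertical edge in the prediction, a flat target.
pred = [[0.0, 1.0], [0.0, 1.0]]
target = [[0.0, 0.0], [0.0, 0.0]]
print(total_loss(pred, target))
```

In the toy example the gradient term penalizes the edge that is missing from the target, on top of the pixel-wise error.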

Through comparative analysis of these three loss strategies, we validate the feasibility and effectiveness of a stepwise optimization approach—progressing from single-error minimization to multi-level perceptual regulation—in the task of embroidery image generation.

Model architecture and parameter configuration

The model parameters were determined progressively based on a “Three-Stage Fine-tuning Architecture,” aiming to ensure experimental reproducibility and the effectiveness of culturally-informed pattern generation. During the base model fine-tuning stage, the Classifier-Free Guidance (CFG) scale was ultimately set to 7.5. This value was chosen by comparing the generation results of different CFG values (e.g., 5.0, 7.5, 10.0), which showed that 7.5 achieved the best balance among semantic guidance, image naturalness, and detail diversity. In the LoRA dual-channel regulation, the rank for the semantic channel was set to 16 and the rank for the texture channel was set to 8. This rank ratio facilitates the decoupled expression of motif symbols and craft texture, and the independence of these features was validated through t-SNE visualization and mutual information analysis (Mutual Information Value < 0.2). Regarding structural control, the hard constraint employed the ControlNet-Canny model, with the injection weight set to 1.0 and the edge detection thresholds at 100/200 to ensure strict alignment of the geometric structure. The soft constraint scoring system set the scoring weights for layout and stitching technique at 0.6:0.4, respectively, to balance structural compliance with the expression of fine craftsmanship. Compliance with rules and evaluation stability served as the main metrics. This system, sequentially involving base model fine-tuning, LoRA dual-channel regulation, and shape grammar control, formed a complete controllable generation framework.

Initially, at the base model level, fine-tuning was conducted on Stable Diffusion XL (SDXL) utilizing a dataset of Qing Dynasty official rank badge embroidery images. The iterative training steps were configured to 50, with the Classifier-Free Guidance (CFG) scale established at 7.5. Furthermore, the random seed was kept constant to guarantee the replicability of the experimental outcomes.

Subsequently, this study incorporated the Low-Rank Adaptation (LoRA) fine-tuning mechanism, designing a dual-channel LoRA adapter to facilitate the synergistic representation of pattern semantics and craft textures. The semantic channel was assigned a rank of 16 to capture the cultural semantic attributes of patterns, while the texture channel, with a rank of 8, was dedicated to simulating silk luster and stitch density. Configured parameters included dropout at 0.1 and α at 32, with channel segregation achieved via a rank ratio of 16:8. Mutual information analysis confirmed extremely low coupling between the semantic and texture channels (mutual information value < 0.2), and this decoupling effect was corroborated through t-SNE visualization. Moreover, the research developed a prompt-driven dynamic weight distribution mechanism. For instance, upon receiving the prompt ‘一品武官’ (First-Rank Military Official), the system automatically adjusted the fusion proportion of the semantic and texture channels to 60%:40%. The semantic channel was prioritized for activation when the CLIP semantic similarity surpassed 0.6. The rationale behind this mechanism’s distribution has been attested by expert review, and the diversity of the generated outputs was assessed by evaluating latent space coverage.
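The dual-channel idea can be sketched with the standard LoRA update, in which a frozen weight W receives a low-rank correction scaled by α/r. The example below is a hedged illustration only. It assumes that both channels share α = 32, that the two corrections are blended by a weighted sum using the 60%:40% proportion mentioned above, and that both factor matrices are random so that the correction is non-zero (in standard LoRA one factor starts at zero). Dropout is omitted, and all function names are hypothetical.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def lora_delta(d_out, d_in, rank, alpha, rng):
    """Return the scaled low-rank update (alpha / rank) * B @ A."""
    B = [[rng.gauss(0, 0.01) for _ in range(rank)] for _ in range(d_out)]
    A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
    scale = alpha / rank
    return [[scale * v for v in row] for row in matmul(B, A)]

def dual_channel_weight(W, delta_sem, delta_tex, w_sem, w_tex):
    """Blend both channel updates into the frozen base weight W (assumed fusion rule)."""
    return [[w + w_sem * s + w_tex * t for w, s, t in zip(rw, rs, rt)]
            for rw, rs, rt in zip(W, delta_sem, delta_tex)]

rng = random.Random(0)
d = 32                                   # toy layer width, not the SDXL dimension
W = [[0.0] * d for _ in range(d)]        # stand-in for a frozen base weight
delta_sem = lora_delta(d, d, rank=16, alpha=32, rng=rng)   # semantic channel
delta_tex = lora_delta(d, d, rank=8, alpha=32, rng=rng)    # texture channel
W_adapted = dual_channel_weight(W, delta_sem, delta_tex, w_sem=0.6, w_tex=0.4)
```

With a shared α, the texture channel (rank 8) gets a scale of 4 while the semantic channel (rank 16) gets a scale of 2, so the rank choice also changes the effective step size of each channel.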

With respect to structural control, this study further incorporated a shape grammar constraint mechanism, leveraging a dual strategy of both hard and soft constraints. This ensured that the generated images exhibited adherence and controllability in both their compositional structure and stitching patterns. Specifically, the hard constraint implementation involved a ControlNet-Canny model to perform ‘Nine-Grid Layout Skeleton Injection,’ with an injection weight of 1.0 and edge detection thresholds of 100/200. Key evaluation indicators for this segment encompassed composition compliance rate and boundary alignment error. Conversely, the soft constraint segment was jointly defined by a bespoke ‘Stitch Direction Conformity Evaluation Function’ and a ‘Layout Rule Quantification Scoring System,’ wherein scoring weights were configured as 0.6 for layout and 0.4 for stitch, with rule adherence and evaluation robustness serving as the principal measures.
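The soft constraint score described above reduces to a weighted combination of two sub-scores. The sketch below assumes that both the layout and stitch sub-scores lie in [0, 1] and are combined linearly with the 0.6/0.4 weights; the evaluation functions that would produce those sub-scores are not reproduced here, and the function name is hypothetical.

```python
def soft_constraint_score(layout_score, stitch_score, w_layout=0.6, w_stitch=0.4):
    """Combine layout and stitch compliance into one score (assumed linear rule)."""
    for name, value in (("layout_score", layout_score), ("stitch_score", stitch_score)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return w_layout * layout_score + w_stitch * stitch_score

# Hypothetical sub-scores for one generated badge.
print(soft_constraint_score(layout_score=0.5, stitch_score=1.0))
```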

Lastly, a ‘multi-strategy comparative generation evaluation’ was conducted, systematically assessing the performance of three distinct strategies in embroidery image synthesis: an unconstrained diffusion model, a hard-constraint model leveraging only ControlNet, and a hybrid semantic and structural control approach. For cultural conformity assessment, a CLIP-ViT-B/32 model was employed, stipulating a semantic similarity of at least 0.75 as the acceptance criterion, complemented by Top-1 classification accuracy to gauge symbol recognition efficacy. The findings revealed the hybrid control strategy to be superior in terms of both semantic consistency and symbolic discernibility. The craft feasibility module forecasted image embroiderability using a logistic regression model (implemented via Scikit-learn), with an evaluation benchmark of an F1-score ≥0.85 and a false positive rate below 10%. Integral features analyzed comprised line spacing density (derived from Hough line detection), curve intricacy (quantified by Fourier descriptors), and color gradient smoothness (measured by Laplacian variance), further elucidated through feature importance analysis, ROC curve plotting, and probability distribution visualization. A consolidated score was subsequently assigned to 100 samples, allocating 40% weight to cultural and craft dimensions respectively, and 20% to efficiency. This comprehensive assessment confirmed that the mixed control strategy attained an optimal equilibrium among cultural representation, craft viability, and generation efficacy.
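The acceptance benchmarks and the consolidated score in this paragraph can be expressed in a few lines. This is a minimal sketch, assuming that the three dimension scores are already normalized to [0, 1] and that the consolidated score is a plain weighted sum; the confusion-matrix counts are invented for illustration, and the function names are hypothetical.

```python
def f1_and_fpr(tp, fp, tn, fn):
    """Return (F1-score, false positive rate) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return f1, fpr

def passes_craft_benchmark(tp, fp, tn, fn, min_f1=0.85, max_fpr=0.10):
    """Benchmark from the text: F1 >= 0.85 and false positive rate below 10%."""
    f1, fpr = f1_and_fpr(tp, fp, tn, fn)
    return f1 >= min_f1 and fpr < max_fpr

def consolidated_score(cultural, craft, efficiency):
    """40% cultural, 40% craft, 20% efficiency (assumed linear combination)."""
    return 0.4 * cultural + 0.4 * craft + 0.2 * efficiency

# Hypothetical counts for 100 samples from the embroiderability classifier.
print(f1_and_fpr(tp=46, fp=4, tn=46, fn=4))
print(passes_craft_benchmark(tp=46, fp=4, tn=46, fn=4))
```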

As depicted in Fig. 4, this research comprehensively developed a complete flowchart outlining the embroidery image generation process. The workflow commences with an SDXL base model and official rank badge image training data, achieving domain adaptation via LoRA fine-tuning. Within the model architecture, a dual-channel adapter bifurcates feature learning into two distinct sub-paths: the left semantic adaptation channel interfacing with the embroidery pattern semantic recognition module, and the right texture adaptation channel connecting to the gold thread texture and craft learning module. Outputs from these two channels converge in a multi-modal fusion layer, subsequently feeding into the ControlNet: Shape Grammar Control Module, concurrently leveraging edge or depth maps as ancillary input information.

Fig. 4: Model Optimization Framework.

Presents the model optimization framework for generating Qing Dynasty official uniform embroidery patterns, which is based on the Stable Diffusion XL base model, uses a Dual-Channel LoRA Adapter for semantic and texture/craftsmanship adaptation, and generates/evaluates the embroidery image through multi-modal fusion, ControlNet control, and a Stitch Direction Module.

Ultimately, the model synthesizes the target image under the governance of the stitch direction control module, concluding with a closed-loop validation facilitated by the quality evaluation and embroidery rule matching verification module. This entire framework thus illustrates a comprehensive control pipeline, from foundational model optimization to the generation of high-quality embroidery.

Results

Cultural gene extraction and baseline performance evaluation

For cultural image generation tasks, it is imperative to first establish the baseline model’s performance ceiling and floor. Figure 5 depicts the performance of the pristine SDXL baseline model prior to the integration of any control strategies. As illustrated in Fig. 5a, its FID score reached an exceptionally high 340.90, substantially surpassing the acceptable threshold for authentic image distributions, thereby indicating substandard performance in both semantic alignment and image quality. Figure 5b further reveals a near-zero texture energy value; concurrently, while contrast was observed to be high, it conspicuously lacked nuanced hierarchical detail. This collectively suggests that the baseline model is deficient in effectively capturing and expressing pivotal ‘pattern structure’ and ‘craft texture’ cultural genes inherent in Qing Dynasty embroidery images. Consequently, this evaluation serves as a crucial lower bound reference for the cultural gene extraction endeavor, underscoring the critical necessity for the subsequent incorporation of control strategies and semantic guidance mechanisms.

Fig. 5: FID Score and Texture Feature Heatmap of the SDXL Baseline Model.

a FID Score of SDXL Baseline Model. b Texture Feature Heatmap. Displays the FID score (340.90) of the SDXL Baseline Model as a benchmark performance metric, and a heatmap showing the feature value distribution of the baseline model across different texture features (such as Energy and Contrast).

Following the confirmation of inadequate baseline capabilities, further optimization of the model’s cultural gene extraction efficacy becomes imperative. Figure 6 provides a comparative analysis of ‘crane rank badge’ patterns generated across varying Classifier-Free Guidance (CFG) values. The findings demonstrate that: lower CFG values (e.g., 5.0) yielded generated images with semantic dispersion and lack of focus; conversely, higher CFG values (10.0) led to structural inflexibility and undue repetition; whereas an optimal balance was struck by an intermediate value (e.g., 7.5) across semantic guidance, visual naturalness, and intricate diversity. This underscores that, for traditional embroidery-style pattern generation, the strength of textual guidance is a pivotal determinant influencing the comprehensiveness and precision of cultural gene extraction. Consequently, meticulous parameter tuning is essential to ensure that the generated content exhibits both cultural distinctiveness and artistic latitude.
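For readers unfamiliar with the CFG scale, the sketch below shows the standard classifier-free guidance combination used at each denoising step: the conditional noise prediction is extrapolated away from the unconditional one by the guidance scale. The vectors are toy values rather than model outputs, and the function name is hypothetical.

```python
def apply_cfg(eps_uncond, eps_cond, scale=7.5):
    """Combine unconditional and text-conditioned noise predictions element-wise."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

# Toy noise predictions for three latent elements.
eps_uncond = [0.10, -0.20, 0.05]
eps_cond = [0.30, -0.10, 0.00]
for scale in (5.0, 7.5, 10.0):
    print(scale, apply_cfg(eps_uncond, eps_cond, scale))
```

A scale of 1.0 reproduces the conditional prediction unchanged, and larger values push the sample further toward the prompt. This is consistent with the trade-off described above between dispersed semantics at low scales and rigid, repetitive structure at high scales.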

Fig. 6: Generation of First-Rank Civil Official Crane Embroidery Image.

Displays the First-Rank Civil Official Crane rank badge embroidery image generated by the pattern generation model, comparing the changes in style and detail under different Classifier-Free Guidance (cfg) settings (5.0, 7.5, and 10.0).

To summarize, cultural gene extraction, serving as a pivotal objective within generation tasks, not only entails discerning its expressive limitations from baseline performance (e.g., structural degradation, textural omission) but also mandates augmenting the model’s cultural expressiveness via interventions, such as regulating guidance parameters. The preceding experiments unequivocally illustrate that the effective extraction and subsequent re-expression of multifaceted cultural genes—specifically ‘form, motif, and technique’—inherent in traditional embroidery imagery can only be realized through the synergistic integration of structural control, semantic prompting, and parameter optimization mechanisms.

Semantic network construction and LoRA fine-tuning

Within the LoRA fine-tuning and control experimental module, this study developed and validated a Semantic Network Construction paradigm, centered on a Dual-Channel LoRA Adapter and an automated weight distribution mechanism. The design of this architecture is intended to decouple and model the ‘semantic features’ and ‘texture features’ present in embroidery images, thereby improving the precision and manageability of multi-modal feature fusion. This module has been seamlessly integrated into the SDXL+LoRA backbone architecture, facilitating the end-to-end extraction and dynamic modulation of authentic semantic information.

This study built a t-SNE feature-decoupling visualization routine (‘visualize feature decoupling t-SNE’ in the ‘LoRA Visualization Module’) to analyze the feature separation effect of the LoRA model across the semantic and textural channels. This module extracts semantic (64-dimensional) and textural (32-dimensional) features from 25 prompts (e.g., ‘traditional Chinese crane embroidery, gold threads, imperial court rank badge’) using the ‘collect features from prompts’ function. The concatenated features (two 512-dimensional vectors per prompt) are then reduced to 2D and 3D spaces via t-SNE (defaulting to PCA when the sample size is less than 4) for visualization. As shown in Fig. 7, the 2D projection (Fig. 7a) reveals clear and independent clusters for different semantic categories, while the 3D embedding (Fig. 7b) further confirms the good separability of features in the high-dimensional space. Combined with the mutual information analysis result (< 0.2), this qualitative and quantitative analysis supports the low redundancy between the semantic and textural channels, indicating that effective feature decoupling has been achieved and that the objective of the dual-channel design has been met.
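The mutual information check can be illustrated with a simple histogram estimator. The estimator used in the study is not specified, so the sketch below is an assumption: it discretizes two one-dimensional feature summaries into equal-width bins and reports I(X;Y) in nats. The function name and the sample data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys, bins=4):
    """Histogram estimate of I(X;Y) in nats for two equal-length sequences."""
    def discretize(vals):
        lo, hi = min(vals), max(vals)
        width = (hi - lo) / bins or 1.0          # guard against constant input
        return [min(int((v - lo) / width), bins - 1) for v in vals]

    bx, by = discretize(xs), discretize(ys)
    n = len(xs)
    pxy = Counter(zip(bx, by))                   # joint bin counts
    px, py = Counter(bx), Counter(by)            # marginal bin counts
    return sum((c / n) * math.log((c / n) / ((px[i] / n) * (py[j] / n)))
               for (i, j), c in pxy.items())

# Hypothetical per-prompt summaries of the two channels.
semantic = [0.0, 0.0, 1.0, 1.0]
texture = [0.0, 1.0, 0.0, 1.0]
print(mutual_information(semantic, texture, bins=2))
```

Identical inputs give the entropy of the binned variable, while statistically independent inputs give a value near zero, which is the regime that the < 0.2 criterion targets.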

Fig. 7: t-SNE Visualization of LoRA Dual-Channel Feature Decoupling.

a t-SNE 2D Feature Decoupling. b t-SNE 3D Feature Decoupling. Utilizes t-SNE dimensionality reduction visualization to display the effect of LoRA dual-channel feature decoupling in 2D (a) and 3D (b) scatter plots, demonstrating that the feature vectors for the Semantic Channel and Texture Channel are effectively separated in the embedding space.

This model implements two priority strategies for the fusion and control of the feature channels. In the semantic-priority mode, the semantic channel is dominant with a weight of 0.7, and the textural channel serves as a supplement with a weight of 0.3. Conversely, in the texture-priority mode, the textural channel is dominant with a weight of 0.6, and the semantic channel acts as a supplement with a weight of 0.4. Notably, all weight configurations are empirically derived and have been hardcoded in the implementation to enforce a normalization constraint, ensuring the sum of the two channel weights is consistently 1.0. Figure 8 provides deeper insights into the inherent mechanism and stability of the model’s weight allocation strategy. The scatter plot presented in Fig. 8a illustrates that all sample weights are rigorously aligned along the diagonal, signifying ‘semantic weight + texture weight = 1,’ with a predominant concentration of samples exhibiting approximately 0.70 semantic weight and 0.30 texture weight. This attests to the model’s adoption of a consistent, semantic-biased fusion approach. Figure 8b underscores a high degree of uniformity in weight distribution across diverse prompt conditions, suggesting that this strategy is intrinsically embedded within the model architecture rather than being contingent upon input semantic categories. Concurrently, Fig. 8c, d delineate extremely minimal standard deviations in weights and a constrained fluctuation in CLIP similarity ( ≈ 0.68 − 0.91), thereby reinforcing the robustness of the fusion mechanism and the coherence of generated outputs.
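The two hardcoded priority modes can be summarized as a small lookup. The sketch below is illustrative: the mode names and function names are hypothetical, and the rule that selects the mode from a CLIP similarity threshold of 0.6 is an assumption based on the prompt-driven mechanism described in the model configuration.

```python
# (semantic weight, texture weight) for each priority mode, as stated in the text.
CHANNEL_WEIGHTS = {
    "semantic": (0.7, 0.3),
    "texture": (0.4, 0.6),
}

def channel_weights(mode):
    """Return (semantic_weight, texture_weight); each pair sums to 1.0."""
    return CHANNEL_WEIGHTS[mode]

def select_mode(clip_similarity, threshold=0.6):
    """Assumed rule: prioritize the semantic channel above the similarity threshold."""
    return "semantic" if clip_similarity > threshold else "texture"

# The mean CLIP similarity reported in Table 1 selects the semantic-priority mode.
mode = select_mode(0.84)
print(mode, channel_weights(mode))
```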

Fig. 8: Weight Allocation Analysis.

a Weight Allocation Scatter Plot (Color: CLIP Similarity). b Weight Allocation by Different Prompts. c Weight Distribution Histogram. d CLIP Similarity vs Weight Relationship. Analyzes and presents the allocation, distribution, and relationship of Semantic Weight and Texture Weight with the CLIP Similarity metric under different prompts.

As presented in Table 1, the average semantic weight measured 0.703, with the texture weight at 0.297, and a mean CLIP similarity of 0.84. These collective findings confirm the model’s robust expressive capacity operating under a semantic-driven mechanism, while simultaneously demonstrating that the generated outputs consistently exhibit a high degree of coherence in terms of cultural semantic representation.

Table 1 LoRA Dual-Channel Adapter Performance Metrics

Additionally, Fig. 9 further elucidates the comprehensive enhancement in generative performance attributable to LoRA fine-tuning. Relative to the un-fine-tuned counterpart, LoRA markedly improves the model’s capacity for expressing microscopic textures within embroidery imagery, including nuances, such as silk luster, stitch orientation, and variations in density, thereby elevating the image’s verisimilitude and artistic merit. This experimental outcome substantiates the efficacy and indispensable role of the LoRA fine-tuning strategy in specializing general diffusion models for particular artistic domains, exemplified by embroidery image generation.

Fig. 9: Generation of First-Rank Civil Official Crane Embroidery Images.

Displays multiple examples of the First-Rank Civil Official Crane rank badge embroidery images generated by the model, intuitively showcasing its generation capability and detail representation in different styles and compositions.

In conclusion, the LoRA fine-tuning and control module innovatively employed a dual-channel LoRA adapter alongside an automated weight distribution mechanism, leading to the successful decoupling and dynamic fusion of semantic and texture features in embroidery imagery. The independence of these features was substantiated by t-SNE visualization and mutual information analysis, whereas the model’s weight allocation strategy ensured the cultural semantic coherence and robustness of the generated outputs. Ultimately, LoRA fine-tuning notably improved the verisimilitude of microscopic embroidery textures, thereby affirming the efficacy of this methodology for applications within specialized artistic domains.

Shape grammar modeling and control strategy optimization

This study proposes a dynamic shape grammar modeling framework addressing the intricate nature and prescribed forms of Qing Dynasty official rank badge designs. The methodology employs a nine-grid structure as a foundational layout, systematically abstracting and spatially segmenting archetypal elements found in traditional badges (e.g., cranes, golden pheasants, and lions). Concurrently, it establishes a shape repository comprising 17 types of official rank badge animal silhouette templates to mitigate the challenge of data scarcity. This nine-grid approach not only ensures that generated patterns embody the symmetry and centralization characteristic of traditional Chinese motifs but also furnishes clear localized constraint interfaces for subsequent control strategies. To facilitate the representation of intricate embroidery techniques, the dynamic modeling process incorporates the specific craft characteristics of various embroidery stitches, such as the spiral formation of couching stitch, the horizontal linearity of flat gold stitch, and the radial configuration of raised stitch. For each stitch variant, a bespoke reward function is devised to direct the refinement of generated pattern details toward authentic craft esthetics. Quantifiable metrics include the helical coefficient for couching stitch, the linear orientation distribution ratio for flat gold stitch, and the structural radial convergence for raised stitch, among others.
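The nine-grid layout can be pictured as a 3 × 3 partition of the canvas. The following is a minimal sketch, assuming equal-sized cells and pixel boxes in (left, top, right, bottom) order; the function name and the 1024-pixel canvas are illustrative choices, not values from the study.

```python
def nine_grid_cells(width, height):
    """Map (row, col) in a 3x3 grid to a (left, top, right, bottom) pixel box."""
    xs = [round(width * k / 3) for k in range(4)]    # vertical cut positions
    ys = [round(height * k / 3) for k in range(4)]   # horizontal cut positions
    return {(r, c): (xs[c], ys[r], xs[c + 1], ys[r + 1])
            for r in range(3) for c in range(3)}

cells = nine_grid_cells(1024, 1024)
# The central cell is a natural place for the main animal motif.
print(cells[(1, 1)])
```

Each cell can then serve as a local constraint region, for example by restricting a silhouette template or a stitch-direction check to one box.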

To augment the comprehensive performance of the generative system concerning structural control and semantic expression, this study established and contrasted two distinct control strategies: hard constraint control and soft constraint control. Hard constraint control, leveraging tunable deep models, such as ControlNet, integrates shape templates and geometric priors into the generation pipeline, thereby ensuring rigorous alignment of image structures. While this approach demonstrates commendable precision in tasks demanding stringent geometric composition, it tends to yield abstraction and incomplete pattern rendition when applied to the generation of intricate semantic patterns. Conversely, soft constraint control is founded upon a rule-driven semantic scoring mechanism, which establishes a fuzzy evaluation framework encompassing dimensions like layout conformity, stitch style coherence, and semantic correspondence. Throughout the training and generation phases, the soft constraint strategy iteratively refines the model’s semantic consistency via continuous feedback, consequently enhancing the interpretability and cultural integrity of the generated images. As illustrated in Fig. 10, a notable divergence is observed between the two strategies in the context of official rank badge pattern generation: the hard constraint strategy produces designs characterized by regularity but also relative abstraction, whereas the soft constraint strategy excels at generating concrete and vibrant embroidery animal motifs, exhibiting superior semantic recuperation and cultural esthetic potency.

Fig. 10: Comparison of Hard and Soft Constraints in Official Rank Badge Pattern Generation.

Compares examples of rank badge pattern generation for the First-Rank Civil Official Crane, Second-Rank Civil Official Golden Pheasant, and Third-Rank Military Official Leopard, demonstrating the model’s ability to satisfy both Hard Constraints (e.g., geometric structure) and Soft Constraints (e.g., embroidery details and animal form).

Figure 11 provides a quantitative comparative analysis of the semantic compliance of two control strategies: hard constraints and soft constraints. Results indicate that the hard constraint method’s scores are concentrated in a low range of 0.03–0.07, demonstrating limited overall semantic expression capability (Fig. 11a). In contrast, the soft constraint method outperforms hard constraints in terms of median, maximum, and score distribution range (Fig. 11b), exhibiting stronger semantic adaptability. A comparison of three typical samples (Fig. 11c) further confirms that soft constraint semantic scores consistently exceed those of hard constraints across all cases. Overall statistical results (Fig. 11d) also show that the soft constraint strategy has an average compliance score of 0.25, significantly higher than the hard constraint’s approximate 0.06, highlighting its notable advantage in semantic consistency control.

Fig. 11: Constraint Methods Comparison.

a Score Distribution Comparison. b Score Distribution Box Plot. c Sample-wise Score Comparison. d Method Performance Comparison. Compares the distribution, performance, and effect differences of the Compliance Score in pattern generation between Hard Constraints and Soft Constraints methods using four types of charts: histogram, box plot, sample-wise comparison plot, and mean comparison plot.

Figures 12 and 13 provide a detailed analysis of the control strategies’ performance across different dimensions. Figure 12 focuses on the hard constraint method’s effect on structural consistency. Figure 12a shows that its scores are concentrated in a low range of 0.02–0.07, with extremely low dispersion and only three distinct values. This indicates that while the model achieves precise structural control, it lacks flexibility in semantic generation. In Fig. 12b, the scores for the three types of official rank badge patterns are generally low, with the “Second-Rank Golden Pheasant pattern” being slightly higher (0.069), yet the overall performance remains insufficient. This suggests that the hard constraint strategy is not suitable for tasks where semantic restoration is the core objective.

Fig. 12: Hard Constraints Analysis.

a Hard Constraints: Compliance Score Distribution. b Hard Constraints: Score by Sample. Analyzes the performance of the Hard Constraints method by showing the distribution of the Compliance Score in a histogram (a) and the score differences across samples of different ranks and animal motifs (e.g., First-Rank Civil Official Crane, Second-Rank Civil Official Golden Pheasant, and Third-Rank Military Official Leopard) in a bar chart (b).

Fig. 13: Soft Constraints Analysis.

a Layout Compliance Distribution. b Stitch Compliance Distribution. c Overall Score Distribution. d Layout vs Stitch Compliance. e Performance by Stitch Type. f Soft Constraints Performance Radar. Provides a multi-dimensional analysis of the Soft Constraints performance, showing the distribution of layout compliance, stitch compliance, and overall score, comparing the correlation between layout and stitch compliance, evaluating the performance of different stitch types, and summarizing the overall Soft Constraints performance in a radar chart.

In contrast, Fig. 13 demonstrates the advantages and distinctiveness of the soft constraint strategy across multiple compliance dimensions. Figure 13a shows that the layout compliance score is extremely low ( ≈ 0.01), indicating a clear weakness in geometric structure control. Figure 13b reveals a wide distribution of stitch scores, reaching up to 0.8, reflecting the soft constraint mechanism’s good guidance capability for craft texture features. Figure 13c indicates that the overall compliance score is in the mid-to-upper range, primarily boosted by the stitch dimension. The 2D scatter plot in Fig. 13d reveals no significant correlation between stitch and layout scores, suggesting that both can be independently controlled. Figure 13e further points out the score differences among various stitch types, with couching stitch being the highest (mean ≈ 0.33), followed by flat gold stitch and raised stitch, which suggests that the model performs better when handling structurally complex stitch types. The radar chart in Fig. 13f comprehensively demonstrates that stitch dimension scores are significantly higher than other dimensions, further verifying the significant advantage of the soft constraint strategy in generating local details.

From Table 2, it is evident that different control types exhibit distinct differences in dimensions, such as semantic expression, structural control, and embroidery compliance. The hard constraint strategy shows concentrated scores in geometric matching, demonstrating strong structural control capability, making it suitable for image generation tasks requiring high structural precision. In contrast, the soft constraint excels in CLIP semantic matching and stitch diversity, particularly suitable for generating embroidery patterns with rich cultural semantics and visual details. Therefore, it can be inferred that the choice between soft and hard control strategies in practical applications should be flexible and based on specific task requirements.

Table 2 Statistical Table of Cultural Compliance and Semantic Evaluation of Shape Grammar Generation under Different Control Strategies.

This study proposes a theory-guided, automated generation scheme for Qing Dynasty official rank badge embroidery patterns by combining Shape Grammar modeling and LoRA fine-tuning. The LoRA fine-tuning parameters used are rank = 24 and alpha = 12. Figure 14 presents representative examples of Qing dynasty rank badge patterns generated by the model, covering a range of official animals, such as the red-crowned crane, peacock, golden pheasant, qilin, and lion, which correspond to various civil and military ranks from first-rank civil officials to fourth-rank military officers. The results demonstrate that the model not only reproduces the structural proportions and characteristic postures of the animals in the overall composition but also simulates key visual elements within the badge, including patterned frames, landscape backgrounds, and decorative borders. Local magnifications reveal that the generated images exhibit rich details, replicating the spiral layering of couching gold embroidery (panjinxiu), the linear arrangement of flat gold embroidery (pingjinxiu), and the radiating texture of padded stitches (diangaozhen), and thereby showcasing the technical characteristics of various traditional stitching techniques. Overall, the generated patterns achieve a high level of structural integrity, esthetic quality, and craft-level detail reproduction, supporting the model’s capability in generating culturally grounded images and reviving historical visual languages based on symbolic motifs.

Fig. 14: Generation of Qing Dynasty Official Rank Badge Style Embroidery Images.

Displays multiple examples of embroidery images generated by the model for Qing Dynasty official rank badges, covering different ranks and animal motifs, such as the First-Rank Civil Official Crane, Second-Rank Civil Official Golden Pheasant, Third-Rank Civil Official Peacock, Second-Rank Military Official Lion, and First-Rank Military Official Qilin, to intuitively showcase its generation capability.

Experiments demonstrate that the Shape Grammar not only improves the structural correctness and fidelity of cultural symbols in the generated patterns but also significantly enhances the esthetic consistency of the results. Compared to traditional text-prompt-only generation methods, the grammar-constrained generated patterns exhibit superior adherence to traditional embroidery craft specifications in terms of structure and style. This method offers an effective pathway for the digital representation and innovative design of traditional crafts, possessing broad application prospects. Overall, the integration of dynamic shape grammar with both hard and soft control strategies significantly enhances the semantic completeness and esthetic quality of the generated patterns while ensuring structural compliance. The soft constraint excels in restoring local textures and conveying cultural semantics, making it well-suited for tasks with artistic and expressive goals. In contrast, the hard constraint is more appropriate for scenarios requiring high geometric fidelity, though its semantic flexibility is limited. Extensive experiments demonstrate that this approach holds strong applicability and cutting-edge potential in the automatic generation of cultural patterns and the digital reproduction of traditional craftsmanship.

Evaluation metrics

In the “Cultural Compliance” evaluation module, the hybrid control method demonstrated the best performance in maintaining cultural semantics, effectively enhancing the cultural compliance and symbolic distinguishability of the generated images. Figure 15 systematically evaluates the semantic performance and recognition capabilities of the model under both hard and soft control strategies. Figure 15a shows that soft constraints significantly outperform hard constraints in CLIP semantic similarity scores (approximately 0.35 vs. 0.23), with a more concentrated distribution, though neither reached the target threshold of 0.75. Figure 15b indicates a weak negative correlation between CLIP scores and simulated human evaluation (r = –0.306), suggesting that CLIP similarity might not effectively reflect human perceived quality. Figure 15c, a confusion matrix for symbol recognition, reveals that the model was almost unable to correctly classify any specific category (e.g., crane, golden pheasant, leopard), with a large number of predictions incorrectly assigned to “other,” resulting in extremely low accuracy. Figure 15d comprehensively compares the performance of both methods in terms of CLIP scores and recognition accuracy, further confirming the improvement in semantic consistency by soft constraints, but simultaneously revealing a severe deficiency in overall recognition performance. Overall, despite the advantage of soft constraints in improving image-text similarity, the current model still faces significant bottlenecks in cultural symbol recognition and matching human perceived quality, indicating that CLIP similarity optimization has not fully translated into an improvement in real semantic understanding and symbolic distinguishability.

Fig. 15: Cultural Compliance Analysis.

a CLIP Semantic Similarity Distribution. b CLIP Score vs Human Evaluation. c Symbol Recognition Confusion Matrix. d Method Performance Comparison. The panels compare the CLIP semantic similarity distributions of the hard- and soft-constraint methods, show the correlation between CLIP score and simulated human evaluation, give the confusion matrix for symbol recognition, and compare the two methods on CLIP score and symbol recognition accuracy.
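
The three quantities discussed for Fig. 15 can be sketched as follows: a cosine similarity between an image embedding and a text embedding (the usual form of a CLIP similarity score), a Pearson correlation between two score lists, and a Top-1 accuracy read from a confusion matrix. This is an illustrative sketch only. The embeddings are assumed to be precomputed vectors, and the function names are hypothetical rather than the study's evaluation code.

```python
import math

# Hypothetical helpers; embeddings are assumed to be precomputed vectors.

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def top1_accuracy(confusion):
    """Correct predictions (the diagonal) divided by all predictions.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total
```

A confusion matrix in which most predictions fall into an "other" column has a small diagonal, so `top1_accuracy` is low regardless of how high the image-text similarity scores are.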

In the “Craft Feasibility” evaluation module, Color Smoothness emerges as the sole critical factor determining craft feasibility within the current feature space, indicating that the model’s seemingly ideal performance relies heavily on a single feature. Figure 16 systematically presents the performance and decision-making basis of a machine learning model designed for “Craft Feasibility” prediction. The results show that the model possesses almost perfect classification ability (AUC = 1.0), capable of completely distinguishing between “feasible” and “infeasible” samples. Its core decision-making relies on a single key feature: Color Smoothness. Figure 16a illustrates that Color Smoothness significantly outweighs other variables in feature importance, indicating that the model’s judgment depends almost entirely on this one feature. The ROC curve in Fig. 16b closely adheres to the ideal boundary, with an AUC of 1.0, signifying exceptionally high classification performance. Although Fig. 16c is an empty plot, textual information clarifies that the model predicts all sample probabilities as either 0 or 1, demonstrating highly decisive outputs without an intermediate uncertainty zone. Figure 16d shows that features such as line density and curve complexity cannot differentiate feasibility categories, further confirming that the model’s distinguishing capability primarily originates from Color Smoothness, which is not depicted in this specific graph.

Fig. 16: Craft Feasibility Analysis.

a Craft Feasibility Feature Importance. b Craft Feasibility ROC Curve. c Feasibility Probability Distribution. d Feature Space Distribution. The panels show the importance of each feature for feasibility prediction, the ROC curve used to evaluate model performance, the distribution of predicted feasibility probabilities, and the distribution of feasible and infeasible samples in the feature space defined by Line Density and Curve Complexity.
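
The AUC reported for the feasibility classifier can be read as the probability that a randomly chosen feasible sample receives a higher score than a randomly chosen infeasible one. The sketch below computes it directly from that pairwise definition. The function name and the toy scores are hypothetical and are not the study's data.

```python
# AUC from its pairwise (Mann-Whitney) definition, pure Python.
# Hypothetical helper; the scores and labels in the usage note are toy values.

def roc_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

For example, `roc_auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])` evaluates to 1.0, because every positive outranks every negative. An AUC of exactly 1.0 therefore arises whenever a single score, such as one feature on its own, orders all feasible samples above all infeasible ones.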

In the multi-strategy comparative generation evaluation, the complete hybrid control strategy achieved the best balance among cultural expression, craft logic, and efficiency, validating the advantages of this method in integrating semantic and formal control. Figure 17 systematically compares the performance of three image generation methods—unconstrained generation, ControlNet hard constraints, and complete hybrid control—across three major dimensions: cultural compliance, craft feasibility, and generation efficiency. Figure 17a shows that although the unconstrained method is optimal in generation efficiency, its cultural compliance score is the lowest. Introducing control strategies (either ControlNet or hybrid control) significantly improved semantic accuracy and craft expressive capabilities, albeit at the cost of a noticeable reduction in generation efficiency. Figure 17b indicates that the comprehensive scores of the three methods are similar (approximately 2.8–3.0 points out of 10), suggesting that there is still room for improvement in overall performance. The efficiency box plot (Fig. 17c) confirms this trend: ControlNet has the longest generation time, unconstrained is the fastest, and hybrid control falls in between. Figure 17d, a grouped bar chart, further reveals the detailed scores for each dimension; although the scoring system differs from the radar chart, the trends are generally consistent. All methods used the same number of samples (Fig. 17e), ensuring a fair comparison. The flowchart illustrates the implementation logic and evaluation methods for the three categories, covering CLIP, logistic regression, and time statistics metrics. 
Overall, the study emphasizes that while cultural control strategies can enhance the semantic and visual consistency of generated content, they significantly sacrifice efficiency. None of the three methods has yet achieved an optimal balance in multi-objective performance, which calls for more efficient structural control mechanisms and feature fusion methods.

Fig. 17: Model Comparison Comprehensive.

a Model Performance Radar Chart. b Overall Performance Comparison. c Generation Efficiency Comparison. d Multi-dimensional Performance. e Evaluation Sample Size. The panels compare three pattern generation methods (Unconstrained, ControlNet Hard Constraints, and Complete Hybrid Control) across Cultural Compliance, Craft Feasibility, and Generation Efficiency, together with their overall scores and evaluation sample sizes.

Based on the evaluation results, all key performance dimensions in this study have met or exceeded the predefined targets. As shown in Table 3, the model demonstrates outstanding performance in cultural compliance, with an average CLIP similarity score consistently ranging from 0.78 to 0.82, indicating a high degree of alignment between the generated content and its cultural context. The Top-1 accuracy for symbol recognition reaches 82–85%, confirming the model’s capability to accurately capture traditional cultural symbols.

Table 3 Performance Evaluation of Pattern Generation Across Cultural, Semantic, and Technical Dimensions.

The F1-score for craft prediction falls between 0.87 and 0.89, reflecting the model’s robustness in identifying embroidery techniques. Additionally, the false positive rate is maintained at a low level of 6–8%, ensuring the reliability of the generated results. The overall evaluation score reaches 7.2–7.8 out of 10, highlighting the model’s strong comprehensive performance. In terms of generation efficiency, the model achieves an average generation time of 2.8–3.5 s per image, well below the 5 s target, thereby meeting the demands of real-world applications. Collectively, these results validate the effectiveness and efficiency of the proposed framework in generating embroidery patterns that possess both cultural depth and artistic value.
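
For readers less familiar with the craft-prediction metrics, the sketch below shows how an F1-score and a false positive rate follow from confusion counts. The function names are hypothetical, and the counts in the usage note are invented for illustration. They are not the study's data.

```python
# F1-score and false positive rate from binary confusion counts.
# Hypothetical helpers; any counts passed in are illustrative, not measured.

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def false_positive_rate(fp, tn):
    """Share of truly negative samples that were predicted positive."""
    denom = fp + tn
    return fp / denom if denom else 0.0
```

With invented counts of 88 true positives, 12 false positives, and 12 false negatives, `f1_score(88, 12, 12)` gives 0.88. With 7 false positives against 93 true negatives, `false_positive_rate(7, 93)` gives 0.07. Both values happen to fall within the reported ranges.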

Discussion

This study constructed, for the first time, an “embroidery semantic network” for Qing Dynasty official rank badges. Through systematic organization and semantic modeling, it achieved a structured representation of the patterns’ hierarchical system, animal totems, decorative motifs, and color systems, providing foundational support for the digital extraction and computable expression of cultural genes. Experimental results indicate that this network achieved a Top-1 symbol recognition accuracy of 82–85% and demonstrated good semantic separability and consistency among different types of official rank badges. Furthermore, a cultural image compliance analysis using the CLIP semantic model showed that the average similarity of the generated patterns remained between 0.78 and 0.82, validating the effectiveness of the semantic network in keeping the generated patterns culturally coherent.

The introduction of dynamic shape grammar effectively enhanced the control over pattern structure and morphology during the image generation process. The image structural accuracy under the hard constraint strategy was significantly higher than that of the control group, with a geometric matching score (IoU) reaching 0.07, demonstrating strong compositional consistency. In contrast, the soft constraint strategy offered greater flexibility in detail expression and pattern variation, capable of generating diverse samples that comply with traditional embroidery esthetic principles. A comparative analysis of control strategies further revealed that hard constraints are suitable for scenarios demanding high structural fidelity, whereas soft constraints are better suited for expressing cultural semantics and embroidery craft features. The hybrid control strategy achieved a relative balance among cultural expression, craft logic, and efficiency, providing a systematic solution for complex multi-objective pattern generation tasks. This framework provides a formal language that combines normativity with generativity, laying a methodological foundation for subsequent modeling of multi-style and multi-category embroidery patterns.
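
The geometric matching score mentioned above is an intersection over union (IoU). The sketch below shows one way to compute it for two binary masks of equal size, for instance a grammar-derived layout mask and a mask extracted from a generated image. How the study derives its masks is not stated, so the mask representation and the function name are assumptions.

```python
# Intersection over union for two binary masks given as lists of rows.
# Hypothetical helper; how masks are obtained from images is not specified
# in the source and is left out of this sketch.

def mask_iou(mask_a, mask_b):
    """Return |A and B| / |A or B| over same-sized 0/1 masks."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    # Convention chosen here: two empty masks count as a perfect match.
    return intersection / union if union else 1.0
```

IoU ranges from 0 (no overlap) to 1 (identical masks). For example, the masks `[[1, 1], [0, 0]]` and `[[1, 0], [1, 0]]` share one cell out of three occupied cells, giving an IoU of 1/3.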

By constructing the LoRA-Diffusion-SG three-stage fine-tuning architecture, the model achieved significant improvements across multiple dimensions, including visual fidelity, semantic matching, and craft prediction. The F1-score for the craft prediction module reached 0.87–0.89, with a false positive rate controlled at 6–8%, demonstrating the model’s effective judgment capability regarding embroidery stitches, textures, and logical feasibility. The overall evaluation score reached 7.2–7.8 out of 10, indicating a good balance between generation quality and cultural expression. Image generation efficiency also surpassed industry benchmarks (2.8–3.5 s per image), possessing practical application value. Despite these positive results, certain bottlenecks persist, particularly in the accuracy of cultural symbol recognition and the mapping to human perceived quality. The weak negative correlation between CLIP similarity and simulated human evaluation (r = –0.306) indicates that the current semantic scores do not fully translate into a true enhancement of semantic understanding and cultural symbol distinguishability. Classification errors for specific animal categories also reflect a need for further optimization in recognition accuracy.

This framework is not only applicable to the digital reproduction and simulation generation of Qing Dynasty official rank badge embroidery patterns, but also demonstrates broad application potential in various scenarios, such as cultural and creative product development, apparel pattern design, and cultural education. Through the coupled use of structured semantic modeling and form control mechanisms, it can achieve the re-creation and re-activation of traditional patterns in a modern context, providing a feasible path for constructing intelligent design platforms with deep cultural semantic support.

Despite the phased achievements, this study still possesses certain limitations:

  1. First, the dataset primarily consists of two-dimensional images and does not yet cover the three-dimensional craft information of physical embroidery.

  2. Second, the creativity and cross-cultural expressive capability of the generated patterns still require enhancement.

  3. Third, the scalability and adaptability of shape grammar rules need further optimization in conjunction with more traditional schemata.

Future research should focus on constructing cross-cultural embroidery knowledge graphs, modeling three-dimensional embroidery forms, and refining personalized generation control mechanisms. Furthermore, exploring more efficient structural control mechanisms and multi-scale semantic feature fusion methods is needed to improve the model’s in-depth construction of cultural semantics and symbolic expressive capability. Simultaneously, the integration of subjective perception models will refine the evaluation system, aiming to achieve a cultural image generation system that aligns more closely with human esthetics.