Abstract
Ancient Chinese costumes are a key component of China’s cultural heritage. This study introduces an intelligent question answering (Q&A) system based on a domain-specific knowledge graph to enhance the accuracy of information retrieval. The proposed system integrates modules for named entity recognition (NER), question classification (QC), and recall and ranking. Experimental results indicate that the system achieves an F1 score of 88% for queries with explicit attribute values, and 80% for queries without explicit attributes, outperforming existing Q&A system architectures. To further improve NER performance in complex contexts, we propose the RoBERTa-BiLSTM-SDPA-CRF model, which achieves F1 scores of 92% on a proprietary dataset and 81% on a public dataset. Additionally, the system incorporates both text- and image-based responses, enriching user interaction. This research contributes to the advancement of domain-specific knowledge retrieval and fosters the dissemination of cultural heritage by facilitating a more comprehensive understanding of Ancient Chinese costumes.
Introduction
Ancient Chinese costumes represent the traditional clothing worn by people in China from the Zhou Dynasties to the Qing Dynasty. These garments were distinguished by unique styles, materials, and decorative elements that reflected social status, cultural rituals, and esthetic values. Over time, these costumes evolved in response to changes in social structure, political developments, and advancements in textile production, making them a profound expression of China’s historical and cultural progression. As treasures of cultural heritage, ancient Chinese costumes hold significant historical and artistic value, embodying the deep cultural identity and esthetic sensibilities of the Chinese nation. Renowned for their distinctive designs, exceptional craftsmanship, and diverse styles, these garments offer insight into the social landscape, cultural spirit, and philosophical ideals of various Chinese dynasties1,2.
The evolution of Chinese costumes—from the inclusive fashion of the Tang Dynasty to the formal attire of the Ming Dynasty—reveals the shifting social, political, and cultural landscapes of China3. In contemporary fashion, elements of ancient Chinese clothing have gained substantial recognition, appearing on international runways and being embraced by younger generations, fostering a dynamic exchange between traditional heritage and modern fashion. This integration promotes cross-cultural dialog between Eastern and Western societies, while also encouraging the preservation and continued celebration of China’s traditional clothing heritage.
To preserve and transmit this heritage, scholars have employed various methods, including esthetic, sociological, and anthropological analyses, to catalog and safeguard traditional designs4. Databases have been established to catalog and safeguard these designs5, ensuring their continued celebration and transmission through integration into modern fashion6. Digital technologies further aid preservation by simulating the physical properties of ancient garments7. For example, Jiang and Guo developed an interactive multimedia system for virtually displaying garment structures8, while Chen and Lin enhanced digital design through Kernel Principal Component Analysis, improving data processing accuracy9. Liu et al. advanced costume recognition with an enhanced YOLOv5 model, optimizing feature extraction10. These innovative physical and digital preservation methods ensure that ancient Chinese costume heritage is protected for future study and appreciation.
However, despite growing global interest, a significant gap remains in understanding the deeper cultural meanings and philosophical principles embodied in these costumes. While esthetic features—such as wide sleeves, stand-up collars, and intricate embroidery—are often admired, the underlying cultural and philosophical concepts, such as the ancient Chinese notion of “unity between man and nature,” are frequently overlooked or misunderstood. Furthermore, there is confusion regarding the dynastic origins of these costumes and their associated cultural norms and dress codes. A more comprehensive understanding of these aspects is essential for fully appreciating traditional Chinese culture and its historical evolution.
Intelligent Question Answering Systems (IQAs) offer a potential solution to address these gaps in understanding and to enhance awareness of Chinese cultural heritage. IQAs can be broadly classified into two categories: Generative and Retrieval-based systems. Generative systems generate answers via natural language processing but may produce hallucinations, whereas Retrieval-based systems ensure factual accuracy by retrieving answers from predefined databases, though they lack contextual flexibility11. Knowledge Graphs (KGs) enhance both kinds of system by organizing and structuring data, which enables complex queries, improves the accuracy of responses, and reduces hallucinations12,13. Embedding KGs into generative models using techniques like Graph Neural Networks14,15 and BERT16 enables more complex query handling and higher-quality responses17. This approach has been successfully applied in various fields, including healthcare18, finance19, tourism20, and education.
Knowledge Graph-based IQAs (KGIQAs) can significantly enhance cultural heritage research by organizing information, enabling complex queries, and facilitating semantic reasoning. These systems provide intuitive insights through intelligent recommendations and visualizations. For example, Hu et al. developed a Q&A system for preserving Chinese historical towns, enabling the retrieval of information about their historical evolution, cultural significance, and preservation methods21. Similarly, Xu et al. introduced the Nanjing Yunjin Q&A System, which utilizes KGs and Retrieval-Augmented Generation (RAG) techniques22. These examples highlight the potential of knowledge graphs and retrieval-based technologies to transform cultural heritage Q&A systems, improving the accuracy and accessibility of information.
Knowledge graphs play a crucial role in organizing cultural heritage data, such as historical documents, artifacts, and contextual information. By linking various entities, knowledge graphs enhance the precision of queries, enabling more accurate and contextually relevant responses. Cultural heritage knowledge graphs (KGs) are primarily constructed through two main approaches: Expert-Driven and Data-Driven methods. The Expert-Driven approach focuses on involving experts and communities in curating and structuring the knowledge to ensure accuracy, depth, and cultural sensitivity. For example, Carriero et al. developed the ArCo framework for cultural heritage KGs, emphasizing community involvement23,24, while Songjin Yang et al. created a knowledge graph for Yueju theater to support research and preservation25. Similarly, Bai and Hou introduced a cultural heritage knowledge graph for Beijing to facilitate visual analysis and interactive Q&A26, and Sartini emphasized integrating iconographic data to interpret symbolic meanings in artifacts27. In contrast, the Data-Driven approach leverages technologies like Natural Language Processing (NLP), multimodal integration, and machine learning to automatically extract and organize knowledge from large datasets. Dou et al. used NLP to create a knowledge graph for intangible cultural heritage (ICH) in China28, while Huang et al. developed a Q&A system for Traditional Chinese Medicine (TCM) that retrieves data from custom datasets and online resources29,30. Simin Yang et al. used a meme-based approach to mine and organize knowledge of traditional Chinese settlement culture31, and Fan et al. proposed a large-scale multimodal knowledge graph to unify dispersed ICH data32. Wan et al. developed the Wu Men Multimodal Knowledge Graph (WuMKG) to integrate textual and visual data on Chinese painting and calligraphy33, and Asprino et al. used the ArCo knowledge graph to create a bilingual cultural heritage VQA dataset34.
Building on this, Becattini et al. advanced VQA by enabling AI to reason over visual content and natural language descriptions, enhancing intelligent cultural heritage systems35. These approaches complement each other, with Expert-Driven methods ensuring cultural nuance and accuracy, while Data-Driven methods provide scalability and the ability to process diverse datasets, resulting in more comprehensive and accessible cultural heritage knowledge systems.
Despite the advancements in KGIQAs for cultural heritage, the application of knowledge graphs to ancient Chinese clothing remains underexplored. Unlike fields like Traditional Chinese Medicine36, the data concerning ancient clothing is inconsistent, complicating efforts to create a unified knowledge base37. Moreover, regional variations and historical context further complicate the task38. The intricate cultural significance embedded in ancient Chinese costumes—such as the “Shuitian Clothes” (Paddy-field Costume) popularized during the Tang, Ming, and Qing Dynasties—illustrates the complexity of this heritage39. These garments, shaped by sociopolitical factors and evolving social norms, also reflect advancements in textile technology, from dyeing to embroidery techniques40. Integrating multimodal data (text, images) into a knowledge graph is crucial for constructing a comprehensive representation of ancient Chinese clothing.
Constructing a comprehensive knowledge graph for ancient Chinese garments faces significant challenges, including fragmented and under-digitized data and the complexity of incorporating multimodal data. Terminology adds a further difficulty: NLP models often struggle with polysemous terms, which leads to misinterpretation of cultural specifics.
The challenges of natural language ambiguity and domain-specific knowledge retrieval further complicate the development of effective information systems41. Terms such as “Hanfu” and “Tang suit” carry different meanings across various dynasties, requiring precise disambiguation for accurate interpretation. General-purpose knowledge bases like Wikidata or Freebase often fail to provide contextually accurate answers for specialized cultural heritage queries42,43. For instance, ChatGPT misinterprets the term “lianma” <练麻>, as a “refined hemp fabric,” when in fact it refers to “white mourning clothes” worn during ceremonies marking the first anniversary of a deceased person’s death <练祭>.
To address these challenges, the development of an IQA system that can accurately interpret user queries and provide precise, context-aware responses is essential. This requires not only advancements in natural language processing but also the integration of multimodal data processing techniques44. By combining deep learning, expert knowledge, and structured knowledge representation, such a system could enhance the accessibility and accuracy of cultural heritage information.
This paper proposes a framework for developing an IQA system focused on ancient Chinese costumes. By constructing a knowledge graph for ancient Chinese costumes, the system aims to reduce inaccuracies and improve response quality. Additionally, the paper introduces a new theoretical model for intent recognition, integrating RoBERTa with a BiLSTM network to enhance semantic understanding. This study seeks to improve the reliability and effectiveness of Q&A systems in cultural heritage research.
The main contributions of this study are:
(1) Construction of a domain-specific Knowledge Graph: The primary contribution of this study is the creation of a specialized knowledge graph for Ancient Chinese Costumes. The integration of offline authoritative texts with image resources from online professional databases, along with knowledge verification and logical refinement through a semantic association mechanism and expert involvement, has significantly improved the accuracy of the KG of ancient Chinese costumes.
(2) Development of an Intelligent Q&A System for ancient Chinese costumes: We propose a novel Q&A framework based on this knowledge graph, integrating Named Entity Recognition, Question Classification, and Question Recall and Rank. This framework enables a deeper understanding of user intent, improving accuracy and user satisfaction, especially for queries with clear attributes. The system achieves an F1 score of 88%, demonstrating excellent performance in answering complex queries.
(3) Enhancement of semantic understanding: To improve answer accuracy, we introduce the RoBERTa-BiLSTM-SDPA-CRF model for Named Entity Recognition (NER). This model combines RoBERTa’s contextual pre-training, BiLSTM’s feature extraction, and SDPA’s attention mechanism to effectively handle polysemy and complex contexts. Experimental validation shows strong performance, with an F1 score of 91.68% on our dataset and 81.41% on a public dataset, indicating that the model provides deep semantic understanding of domain-specific texts in practical applications.
This paper is structured as follows: “Methods” presents the methodology and details each component, “Results” describes the experimental results, and “Discussion” discusses the findings of this study.
Methods
Construction of the KG for traditional Chinese costumes
This research examines ancient Chinese costumes, which have evolved over thousands of years and are a key part of China’s cultural heritage. As the styles, materials, and design elements of these costumes have changed across different historical periods, it is essential to capture these variations in the knowledge graph. Additionally, the symbolic meanings of colors, patterns, and materials reflect the socio-economic and cultural conditions of each era and must be incorporated to ensure an accurate representation. By synthesizing these historical, cultural, and design elements, the knowledge graph provides a strong foundation for a Q&A system that accurately reflects the complexity of ancient Chinese costumes. The process of constructing this knowledge graph is outlined in Fig. 1.
Note: This diagram illustrates the comprehensive process of constructing a knowledge graph. The process begins with data collection, where information is gathered from both offline sources, such as books, and online platforms, including Baidu Baike and museum databases. The knowledge construction process is divided into three key stages: knowledge extraction, knowledge fusion, and knowledge storage and graph construction. During the knowledge extraction phase, entity extraction identifies key elements such as names, categories, and types; attribute extraction focuses on capturing functional and structural characteristics; and relation extraction maps the semantic relationships between entities, such as “belongs to” or “consists of”. In the knowledge fusion phase, techniques such as entity detection, coreference resolution (e.g., unifying various expressions of the same entity), and entity disambiguation (e.g., clarifying ambiguities between similar terms) are employed to enhance the consistency and accuracy of the data. The diagram employs the example of the “Liuhe Tongyi Hat <六合统一帽>” from the Qing Dynasty to demonstrate the fusion processes. Finally, the processed knowledge is stored in a Neo4j graph database, where entities and their relationships are visualized in a structured graph format.
As illustrated in Fig. 1, the first step is data collection, which involves gathering information from various sources such as historical texts, academic papers, ethnographic studies, and online archives. To ensure data quality, this study relied on two types of sources: offline and online. For offline sources, we selected highly authoritative publications, including Seven Thousand Years of Chinese Clothing published by Tsinghua University Press, an academic publisher affiliated with Tsinghua University; A Brief History of Clothing in China published by the Social Sciences Academic Press (SSAP), the publishing arm of the Chinese Academy of Social Sciences (CASS), established in 1985 and specializing in humanities and social sciences; and the Dictionary of Ancient Chinese Clothing published by Zhonghua Book Company, a Shanghai-based publisher founded in 1912 that is renowned for producing China’s largest dictionary, Cihai <辞海>. These sources provide reliable and scholarly information. For online sources, data were collected from platforms such as Hua Fuzhi <华服志, www.huafuzhi.com> and Baidu Encyclopedia <百度百科>, with additional information on ancient Chinese clothing obtained through web crawling. Furthermore, visual materials such as murals, terracotta figurines, and paintings from museum collections featuring ancient clothing were also utilized in this research.
During the data processing phase, texts were digitized using ABBYY FineReader V16 and manually corrected for accuracy. The Language Technology Platform was used to extract baseline entity types, while Lexical Analysis for Chinese was applied to annotate the costume records with domain-specific entities. To enhance the completeness of the KG, we selected ancient Chinese costumes from the Zhou to Qing Dynasties as the data source. However, records from the pre-Tang period are generally limited and predominantly ritualistic, whereas post-Tang sources, such as literature, paintings, and murals, offer more extensive and detailed documentation, leading to an uneven distribution of data. To mitigate this imbalance, we supplemented the pre-Tang data and reduced the volume of data from the Qing Dynasty, ensuring a more consistent representation across the different historical periods. For image data processing, entity attribute extension was implemented to introduce image-specific attributes for entities requiring visual display, thereby linking images to their corresponding textual content. Due to the complexity of the original data sources, we prioritized authoritative literature to maintain the accuracy and reliability of the KG. A Large Language Model (LLM) was utilized to extract key information from the selected sources, while experts in the field were engaged to manually review and validate the extracted data. In cases of inconsistencies or ambiguities, further consultations with specialists in ancient Chinese costumes were conducted to resolve discrepancies and enhance data quality.
The next step is to construct a KG for ancient Chinese costumes, which involves three main stages: extraction, fusion, and storage. In the knowledge extraction phase, the BERT model is used to identify entities. The fusion stage then integrates this information, removing redundancies and inconsistencies to ensure accuracy and coherence. In the context of ancient Chinese costumes, the ontology may define entities such as Dress Type, Dynasty, Material, Craftsmanship, and Cultural Context. Each entity is characterized by specific attributes; for example, Dress Type could include attributes such as Name, Occasion of Use, Features, and Social Rank Association, while Dynasty might include attributes like Name, Time Period, and Cultural Characteristics. The ontology also establishes relationships between entities to illustrate their interconnections. Examples of these relationships include Belongs to (e.g., a garment associated with a specific dynasty’s ritual system), Consists of (e.g., the structural components of a dress), Symbolizes (e.g., cultural or hierarchical significance), Evolved into (e.g., the progression of clothing styles over time), and Popular in (e.g., the prominence of a garment during a particular historical era). The following table (Table 1) presents the entity relationship types and examples of Ancient Chinese Costumes, which will serve as the foundation for organizing these connections within the Knowledge Graph.
To ensure data accuracy and integrity, the ontology is built with specific rules and constraints. After the entity extraction and the definition of attributes and relationships, a manual review was conducted to identify and correct any errors in the entities and relationships.
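The kind of rule checking described above can be sketched in a few lines. This is only an illustration: the entity types and relation names come from the ontology examples in the text, while the domain/range pairs in `RELATION_RULES` and the `validate` helper are assumptions made for this sketch and are not the authors' actual constraint set.

```python
from dataclasses import dataclass

# Entity types named in the text. The domain/range pairs below are
# illustrative assumptions, not the schema used in the study.
ENTITY_TYPES = {"DressType", "Dynasty", "Material", "Craftsmanship", "CulturalContext"}

# relation name -> (allowed head type, allowed tail type)
RELATION_RULES = {
    "BELONGS_TO": ("DressType", "Dynasty"),
    "CONSISTS_OF": ("DressType", "DressType"),
    "SYMBOLIZES": ("DressType", "CulturalContext"),
    "EVOLVED_INTO": ("DressType", "DressType"),
    "POPULAR_IN": ("DressType", "Dynasty"),
}


@dataclass(frozen=True)
class Entity:
    name: str
    type: str


@dataclass(frozen=True)
class Triple:
    head: Entity
    relation: str
    tail: Entity


def validate(triple):
    """Return a list of rule violations (empty if the triple passes)."""
    errors = []
    for ent in (triple.head, triple.tail):
        if ent.type not in ENTITY_TYPES:
            errors.append(f"unknown entity type: {ent.type}")
    rule = RELATION_RULES.get(triple.relation)
    if rule is None:
        errors.append(f"unknown relation: {triple.relation}")
    elif (triple.head.type, triple.tail.type) != rule:
        errors.append(f"{triple.relation} expects {rule[0]} -> {rule[1]}")
    return errors
```

Triples that fail such automatic checks would be natural candidates for the manual review step, so that expert attention goes to the records most likely to contain extraction errors.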
For example, consider the following text data:
“Mianfu <冕服>, a ceremonial garment reserved for emperors in ancient China. Originating in the Shang and Zhou Dynasties and disappearing after the Ming Dynasty, Mianfu was a central feature of major state and religious ceremonies, reflecting the hierarchical and ritualistic values of ancient Chinese society. The structure of Mianfu included several key components: a crown, upper and lower garments, slippers, and accessories such as ribbons, belts, and kneepads. Crafted from premium materials such as silk, brocade, jade, and precious metals, Mianfu was distinguished by its dark color palette, typically black and red, symbolizing solemnity and authority. Its decorative chapter patterns featured motifs like the sun, moon, stars, mountains, and dragons, with specific designs and combinations denoting social rank and ceremonial purpose. Even the crown’s tassels varied in number based on the wearer’s hierarchical status. Mianfu was worn during significant occasions, including sacrifices to heaven and earth, ancestral temple rituals, and important imperial ceremonies. In the Ming Dynasty, for example, emperors wore Mianfu for events such as the winter solstice and worship ceremonies. Beyond its functional use, Mianfu was a potent symbol of the emperor’s authority and a representation of the social and cultural hierarchy of the time. It is categorized as belonging to the ancient Chinese ritual system, used in specific ceremonial contexts, symbolizing the emperor’s authority and sanctity, and influencing subsequent dress systems.”
The process begins with the systematic processing and categorization of information, focusing on identifying key attributes and relationships. This involves analyzing origins and historical periods (e.g., Shang and Zhou Dynasties, Ming Dynasty), components and materials (e.g., crown, garments, silk, jade), as well as symbolism and usage contexts (e.g., authority, rituals, ceremonies). Next, an ontology is developed to define the entities (e.g., Mianfu, crown, chapter patterns, ceremonies, materials), their attributes (e.g., colors, materials, number of tassels, decorative patterns), and the relationships between them (e.g., “INCLUDES,” “MADE_FROM,” “WORN_DURING,” “SYMBOLIZES”). The ontology is carefully structured to capture both hierarchical relationships (e.g., Mianfu includes accessories) and cross-domain connections (e.g., the symbolism of Mianfu in ceremonial contexts). Once the ontology is established, the data is modeled into a graph structure, with nodes representing entities (e.g., “Mianfu”) and edges representing their relationships (e.g., “Mianfu INCLUDES Crown”). Properties are assigned to nodes, such as material type (e.g., silk), color (e.g., black and red), and era (e.g., Shang Dynasty), while edges are enriched with attributes relevant to their relationship type (e.g., specific ceremonies for “WORN_DURING”). Finally, once the data modeling process is complete, all information is manually verified to ensure accuracy and consistency.
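To make the node-and-edge modeling concrete, the sketch below turns a few facts from the Mianfu example into Cypher `MERGE` statements. The node labels (`Costume`, `Component`), the property keys, and the helper functions are hypothetical choices for illustration; the paper does not specify its loading code, and a production loader would normally pass values as query parameters rather than inline literals.

```python
import json


def cypher_literal(value):
    """Render a simple Python value (str, number, list of str) as a Cypher literal.

    json.dumps yields double-quoted, backslash-escaped strings and bracketed
    lists, which Cypher accepts for the simple values used in this sketch.
    """
    return json.dumps(value, ensure_ascii=False)


def node_statement(label, name, props=None):
    """Build a MERGE keyed on `name`, with remaining properties applied via SET."""
    stmt = f"MERGE (n:{label} {{name: {cypher_literal(name)}}})"
    if props:
        assignments = ", ".join(f"n.{k} = {cypher_literal(v)}" for k, v in props.items())
        stmt += f" SET {assignments}"
    return stmt


def edge_statement(head, rel, tail, props=None):
    """Build a MERGE for a relationship between two existing nodes, matched by name."""
    rel_props = ""
    if props:
        body = ", ".join(f"{k}: {cypher_literal(v)}" for k, v in props.items())
        rel_props = f" {{{body}}}"
    return (
        f"MATCH (a {{name: {cypher_literal(head)}}}), (b {{name: {cypher_literal(tail)}}}) "
        f"MERGE (a)-[:{rel}{rel_props}]->(b)"
    )


if __name__ == "__main__":
    # Facts taken from the Mianfu passage; labels and keys are assumptions.
    print(node_statement("Costume", "Mianfu", {"color": ["black", "red"], "material": "silk"}))
    print(node_statement("Component", "Crown"))
    print(edge_statement("Mianfu", "INCLUDES", "Crown"))
```

Keying `MERGE` on the name alone and applying the other properties with `SET` keeps repeated loads from creating duplicate nodes when a property value is later corrected during manual verification.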
Selecting an appropriate platform to store and manage the KG is a critical step in this research. Among the available graph database platforms, such as Neo4j, Amazon Neptune, and Virtuoso, Neo4j was chosen for its robust capability to handle complex and interconnected data. The intricate relationships inherent in ancient Chinese costumes—spanning dynasties, social roles, materials, patterns, and occasions—are effectively modeled in Neo4j due to its schema flexibility and efficient representation of relationships. Moreover, Neo4j’s Cypher Query Language facilitates advanced queries, such as tracing the historical evolution of silk usage or identifying ceremonial attire associated with specific eras. The platform’s support for visual exploration through tools like Neo4j Bloom further enhances its utility for historians and designers by providing intuitive and interactive insights. Additionally, Neo4j enables seamless integration with related domains, including art, literature, and geography, offering a holistic perspective on cultural heritage.
Following data processing, the constructed KG of ancient Chinese costumes includes 519 entities and 11,724 links, part of which is shown in Fig. 2.
Note: This visualization presents a knowledge graph of ancient Chinese costumes and accessories, where colored circular nodes represent distinct entities such as “clothing <服装>” and “accessories <配饰>,” while connecting lines illustrate their relationships, including same-category associations (e.g., “crown headgear” and “ceremonial headgear” as similar types) and hierarchical containment (e.g., “accessories” encompassing “head ornaments” and “hair accessories”). The graph’s color-coded nodes and structured linkages systematically organize these historical elements, clarifying categorical similarities and part-whole dependencies within ancient Chinese costumes.
As illustrated in Fig. 2, the entities within the KG are organized into two primary categories: clothing and accessories, with a particular focus on clothing. Each entity is intricately linked to associated elements, reflecting the complex interconnections inherent in the dataset. For example, the classification of hats encompasses various types, including Guan <冠> and Mian <冕>, while the category of armor demonstrates a relationship between Chainmail <链甲> and Wooden Armor <木甲>. Additionally, certain accessories, such as Peiyu <佩玉> and Jade Necklace <玉项链>, share a common material, as both are crafted from jade. This detailed categorization highlights the intricate and interconnected relationships among entities within the knowledge graph.
The algorithms of the ancient Chinese costume Q&A system
The Ancient Chinese Costume Q&A system proposed in this study consists of three key modules: the Semantic Comprehension Module, the Knowledge Graph Interaction Module, and the Query Construction Module. When a user submits a question, it is processed by the Semantic Comprehension Module, which includes three sub-modules: Named Entity Recognition (NER), Question Classification, and Recall and Rank. The NER extracts relevant clothing entities, while the Question Classification determines the question type. In the Recall and Rank sub-module, the extracted entities are matched with their corresponding attributes or relationships and ranked by relevance. The Query Construction Module then uses the most relevant entities and attributes to generate query templates, which are executed on the knowledge graph to retrieve the appropriate answers. Finally, the system returns the results to the user. The system architecture is illustrated in Fig. 3.
Note: This diagram depicts the architecture of a knowledge graph-based question-answering system, outlining the main processes and module functions. The Semantic Comprehension Module processes user input questions (e.g., “What is a Mianfu?”), identifies key entities (e.g., “Mianfu <冕服>”), and classifies the type of user query. The Knowledge Graph Interaction Module constructs a knowledge graph using both online and offline data sources, specifically focusing on knowledge related to “Clothing” for information retrieval purposes. The Query Construction Module consists of several components: the Recall Layer, which retrieves a candidate set from the knowledge graph based on the classification of the user’s question; the Ranking Model, which ranks the retrieved candidates; a component that generates Cypher queries to extract relevant information from the Neo4j database; and the Answer Generator, which synthesizes the retrieved information into a coherent response. Through the coordinated efforts of these modules, the system enables a seamless question-answering process that spans from understanding the user’s query to retrieving relevant knowledge and generating an informative answer.
As shown in Fig. 3, the Semantic Comprehension Module serves as the foundation of the system, processing user input through a series of preliminary steps such as word segmentation and the removal of stop words. Once this preprocessing is complete, the module performs two key functions. First, it employs a sub-problem classification model to determine the user’s intent, categorizing queries into domains such as historical context, production techniques, or suitable occasions for wearing a specific garment. Second, it utilizes entity recognition techniques to extract key entities from the query, such as the garment’s name, historical era, and cultural significance. These extracted intents and entities are systematically organized into predefined slots for subsequent processing. Finally, the Semantic Comprehension Module applies recall and rank models for an in-depth analysis of the constructed query representation. This analysis transforms the input into precise and actionable queries, enabling accurate retrieval of information and generation of responses.
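The flow from preprocessing to slot filling can be illustrated with a deliberately simple stand-in. In the sketch below, keyword overlap replaces the trained question classifier and a small gazetteer replaces the NER model; the stop-word list, intent names, keyword sets, and entity table are all assumptions made for illustration, and English tokens are used in place of Chinese word segmentation.

```python
import re

# Toy resources standing in for the trained models; all entries are assumptions.
STOP_WORDS = {"what", "is", "a", "the", "was", "of", "for", "in", "which", "were"}
KNOWN_ENTITIES = {"mianfu": "Costume", "ming dynasty": "Dynasty", "silk": "Material"}
INTENT_KEYWORDS = {
    "historical_context": {"dynasty", "origin", "originate", "history"},
    "production_technique": {"made", "material", "technique", "crafted"},
    "wearing_occasion": {"worn", "occasion", "ceremony", "wear"},
}


def preprocess(question):
    """Lowercase, split into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", question.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def classify_intent(tokens):
    """Pick the intent with the most keyword hits; fall back to 'definition'."""
    token_set = set(tokens)
    scores = {intent: len(kw & token_set) for intent, kw in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "definition"


def recognize_entities(question):
    """Return (surface form, entity type) pairs found by substring lookup."""
    text = question.lower()
    return [(name, etype) for name, etype in KNOWN_ENTITIES.items() if name in text]


def fill_slots(question):
    """Organize the recognized intent and entities into predefined slots."""
    tokens = preprocess(question)
    return {"intent": classify_intent(tokens), "entities": recognize_entities(question)}
```

The slot dictionary is the hand-off point to the later stages: in the system described here, the same structure would be filled by the question classification and NER models rather than by keyword rules.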
The Knowledge Graph Interaction Module is a central component of the Ancient Chinese Costume Q&A system, facilitating the connection between user queries and the structured knowledge base. Its primary role is to leverage the knowledge graph to retrieve relevant information and uncover meaningful relationships between entities. The module begins by using the key entities and user intent identified by the Semantic Comprehension Module to interact with the knowledge graph. It retrieves relevant nodes, edges, and subgraphs, ensuring the query’s context is preserved. For complex queries involving multi-step reasoning, the module applies graph traversal algorithms to navigate through interconnected nodes, identifying paths that lead to the desired information. To prioritize results, the module employs ranking mechanisms to extract the most relevant data from the graph. The retrieved information is then structured and passed to the Query Construction Module, enabling the generation of accurate and contextually appropriate responses. By efficiently managing interactions with the knowledge graph, this module ensures the system delivers precise and comprehensive answers to user inquiries.
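One common way to realize this kind of multi-step traversal is a bounded breadth-first search. The sketch below runs on an in-memory edge list built from the Mianfu example; the text does not say which traversal algorithm the system uses, so the choice of BFS, the `max_hops` limit, and the helper names are assumptions.

```python
from collections import deque

# Toy edges drawn from the Mianfu passage; the real graph is stored in Neo4j.
EDGES = [
    ("Mianfu", "INCLUDES", "Crown"),
    ("Mianfu", "MADE_FROM", "Silk"),
    ("Mianfu", "WORN_DURING", "Ancestral temple rituals"),
    ("Crown", "INCLUDES", "Tassels"),
]


def build_adjacency(edges):
    """Index outgoing edges by head entity."""
    adj = {}
    for head, rel, tail in edges:
        adj.setdefault(head, []).append((rel, tail))
    return adj


def find_path(adj, start, goal, max_hops=3):
    """Breadth-first search for the shortest chain of (head, relation, tail) hops.

    Returns an empty list if start == goal, and None if the goal is not
    reachable within max_hops.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) == max_hops:
            continue  # do not expand beyond the hop limit
        for rel, nxt in adj.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None
```

Capping the hop count keeps the search focused on relationships close to the queried entity, which matters in a densely linked graph where long paths tend to connect unrelated items.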
The Query Construction Module serves as the final stage of the Ancient Chinese Costume Q&A system, transforming processed user input and retrieved knowledge into actionable responses. Its primary function is to generate precise, context-aware queries that enable the system to retrieve relevant answers effectively. This module takes the output from the Semantic Comprehension Module (user intents and extracted entities) and the Knowledge Graph Interaction Module (retrieved relationships and relevant data) and synthesizes them into well-structured query representations. These representations are optimized to match the system’s information retrieval mechanisms, ensuring the results are accurate and relevant to the user’s original question. Additionally, the module handles the integration of multi-hop reasoning results and utilizes natural language generation techniques to produce coherent, user-friendly responses, ensuring the system delivers accurate and relevant answers to user inquiries.
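A minimal sketch of template-based query construction follows, assuming one Cypher template per question type. The intent names, the templates, and the `description` and `image` property keys are hypothetical; only the relation names `WORN_DURING` and `MADE_FROM` come from the ontology example earlier in the paper.

```python
# Hypothetical intent-to-template mapping. `$name` is a Cypher query parameter,
# so the entity value is passed separately instead of being spliced into the text.
TEMPLATES = {
    "wearing_occasion": "MATCH (c {name: $name})-[:WORN_DURING]->(o) RETURN o.name AS answer",
    "production_technique": "MATCH (c {name: $name})-[:MADE_FROM]->(m) RETURN m.name AS answer",
    "definition": "MATCH (c {name: $name}) RETURN c.description AS answer, c.image AS image",
}


def build_query(slots):
    """Turn filled slots into a (query, parameters) pair.

    `slots` is expected to hold an "intent" string and an "entities" list of
    entity names, ordered by relevance. Returns None when no template matches
    or no entity was recognized.
    """
    template = TEMPLATES.get(slots.get("intent"))
    entities = slots.get("entities") or []
    if template is None or not entities:
        return None
    return template, {"name": entities[0]}
```

Returning `None` instead of a guessed query gives the caller a clear signal to ask the user for clarification or to fall back to another strategy.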
Robustly optimized BERT pretraining approach
The Robustly Optimized BERT Pretraining Approach (RoBERTa), introduced by Facebook AI in 2019, represents a major advancement in Natural Language Processing (NLP). Building upon the foundation of BERT (Bidirectional Encoder Representations from Transformers), RoBERTa retains the multi-layer bidirectional transformer encoder while introducing substantial enhancements. Available in configurations such as RoBERTa-Base (12 layers, 768 hidden dimensions) and RoBERTa-Large (24 layers, 1024 hidden dimensions), the model processes input using token, position, and segment embeddings, although segment embeddings are deemphasized due to the removal of the next sentence prediction (NSP) task. By focusing exclusively on masked language modeling and incorporating dynamic masking, which varies masked tokens across training epochs, RoBERTa delivers more robust and generalizable contextual representations. Figure 4 illustrates the structure of the RoBERTa model.
Note: This diagram presents the architecture of an NLP system based on the RoBERTa model, illustrating the workflow from text input to task-specific processing and prediction. The input stage begins with tokenization, in which the text is divided into tokens and special markers (CLS and SEP) are introduced to delineate its structure. These tokens are then mapped to numerical vectors via embedding, yielding input embeddings in a format suitable for model processing. The core processing stage consists of stacked Transformer modules, which use multi-head attention to capture interdependencies between words, while feed-forward networks refine the feature representations. Residual connections and normalization (Add & Norm) keep training stable and effective. In the output stage, a task classifier uses the features extracted by the Transformer, particularly the CLS vector representing overall semantic information, to classify the input text into a specific task category. Text prediction then generates task-specific outputs, such as text generation or sentiment analysis results, from the model’s processed representations.
RoBERTa addresses BERT’s limitations through several key innovations aimed at enhancing performance. One of the most significant is the use of a much larger and more diverse dataset (approximately 160GB compared to BERT’s 16GB), enabling the model to learn richer and more nuanced language representations. The elimination of the NSP task simplifies pretraining, improving data efficiency without sacrificing performance on downstream tasks. Additional training optimizations, such as larger batch sizes, higher learning rates, and extended training durations, further enhance the model’s capabilities. These improvements allow RoBERTa to outperform BERT consistently on widely recognized benchmarks such as GLUE and SQuAD, establishing it as a powerful and versatile tool for a broad range of NLP applications. RoBERTa is highly effective for modeling the complex semantics of ancient costumes due to its dynamic capabilities and scalability. It resolves terminological ambiguities using domain-specific disambiguation and adversarial training, while also employing hierarchical attention and knowledge constraints for precise spatio-temporal contextual parsing. Therefore, in this study, the Semantic Comprehension Module employs RoBERTa to achieve precise semantic parsing of user queries.
RoBERTa model in named entity recognition
Within the framework of a Semantic Comprehension Module, the RoBERTa model provides a robust foundation, offering deep, contextualized representations of text that are critical for tasks requiring advanced semantic understanding. Named Entity Recognition (NER) is one such task, aimed at identifying and categorizing entities with distinct semantic roles within a given text45. While the BiLSTM_CRF model has been widely adopted for NER due to its ability to extract both character-level and word-level features, its limitations in capturing contextual semantics, especially when dealing with polysemous words, restrict its effectiveness.
To overcome these shortcomings, this study introduces an enhanced architecture, RoBERTa_BiLSTM_SDPA_CRF, which combines pretrained contextual embeddings from RoBERTa with a Scaled Dot-Product Attention (SDPA) mechanism integrated into the BiLSTM framework. The SDPA layer supports terminology disambiguation by dynamically adjusting contextual semantic weights. It computes the dot-product similarity between the Query and Key matrices derived from the BiLSTM outputs, scales the result to maintain gradient stability, and uses the resulting weights to combine the rows of the Value matrix. This allows the model to focus on relevant contextual details, such as variations of “深衣” (Shenyi) across different dynasties. The attention mechanism also identifies implicit associations, such as the connection between “十二章纹” (Twelve Ornaments) and ritual hierarchy, and distinguishes polysemous terms like “袍” (pao). The architecture thereby improves entity boundary detection, compensates for BiLSTM’s limitations in modeling long-range dependencies, and enables the CRF decoding layer to generate more contextually coherent label sequences. By incorporating RoBERTa embeddings, the model achieves richer semantic representations, and the SDPA mechanism refines its ability to focus on key contextual cues, thereby improving performance in complex NER tasks. The architecture of the proposed model is illustrated in Fig. 5.
Note: This diagram illustrates the architecture of the RoBERTa_BiLSTM_SDPA_CRF model, which is organized into several layers. The Input Layer converts the raw input text (e.g., “What dynasty did the Mianfu originate from?” <冕服起源于什么朝代>) into a sequence of symbols that the model can process, adding special start and end markers to delineate the boundaries of the input. The Pretrain Layer uses the RoBERTa model, consisting of multiple encoder modules pretrained on large-scale text corpora. This pretraining enables the model to capture general linguistic knowledge and semantic representations, providing well-initialized parameters that can be fine-tuned for task-specific applications. The Feature Extraction Layer uses a Bidirectional Long Short-Term Memory (BiLSTM) network, which processes the pretrain layer’s output in both forward and backward directions to capture comprehensive contextual information, producing hidden states that encapsulate the text’s semantics. The Attention Layer applies scaled dot-product attention to compute attention weights between feature vectors, allowing the model to prioritize significant features and generate context vectors that emphasize critical information. Finally, the Decoder Layer applies a CRF to predict label sequences for the input text, using the output of the attention layer while enforcing label constraints so that the predicted sequence is valid and coherent.
The proposed RoBERTa_BiLSTM_SDPA_CRF model is structured into four key components, each contributing to the overall improvement in NER performance. These components are detailed as follows:
(1) Pretraining Layer
The Pretraining Layer integrates the RoBERTa model to generate rich, contextual embeddings that capture the linguistic nuances of the input text. The pretrained RoBERTa encoder produces feature vectors that serve as input to the downstream network. For instance, when the query “In which dynasty did Mianfu originate?” <冕服起源于哪个朝代?> is entered, the feature vector \(I\) of the input text is passed through RoBERTa to produce a contextualized feature vector \(X=[{x}_{1},{x}_{2},\cdots ,{x}_{k}]\), as expressed in Eq. (1):
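Consistent with the definitions above, and writing \(\mathrm{RoBERTa}(\cdot)\) for the pretrained encoder (notation assumed here), Eq. (1) can be written as:

\[
X=\mathrm{RoBERTa}(I)=[{x}_{1},{x}_{2},\cdots ,{x}_{k}] \tag{1}
\]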
This embedding process allows the model to extract rich, context-aware representations for downstream processing.
(2) Feature Extraction Layer
The Feature Extraction Layer utilizes a BiLSTM model to capture both forward and backward contextual information. BiLSTM processes the feature vectors produced by RoBERTa and computes the hidden states for both the forward and backward LSTM networks. These two sequences of hidden states are then concatenated at each timestep, producing the output of the hidden layer. This approach allows the model to leverage contextual information from both directions, while the gating mechanism of the LSTM cells mitigates the vanishing-gradient problem of plain recurrent networks. The forward and backward LSTM states are computed as follows:
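In the standard BiLSTM formulation, with \({h}_{t}\) denoting the hidden state at timestep \(t\) (symbols assumed here, since they are not defined in the surrounding text), the forward state, backward state, and their concatenation are:

\[
\overrightarrow{{h}_{t}}=\overrightarrow{\mathrm{LSTM}}\left({x}_{t},\overrightarrow{{h}_{t-1}}\right),\qquad
\overleftarrow{{h}_{t}}=\overleftarrow{\mathrm{LSTM}}\left({x}_{t},\overleftarrow{{h}_{t+1}}\right),\qquad
{h}_{t}=\left[\overrightarrow{{h}_{t}};\overleftarrow{{h}_{t}}\right]
\]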
(3) Scaled Dot-Product Attention Layer
The Scaled Dot-Product Attention (SDPA) layer46 enhances the model’s ability to focus on relevant contextual information by computing attention weights. The SDPA mechanism calculates the dot product between the Query matrix \(Q\) and the Key matrix \(K\), followed by scaling the result by the square root of the dimensionality of the vectors. A softmax function is then applied to scaled scores to obtain attention weights. These weights are used to compute a weighted context matrix by multiplying with the Value matrix \(V\). The attention operation is formalized as:
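Following the description above, and writing \({d}_{k}\) for the dimensionality of the key vectors (a symbol assumed here), the standard scaled dot-product attention is:

\[
\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{Q{K}^{T}}{\sqrt{{d}_{k}}}\right)V
\]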
This attention mechanism allows the model to effectively capture long-range dependencies and contextual relationships within the input sequence.
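The attention computation can be sketched in a few lines of dependency-free Python. This is a minimal illustration under stated assumptions: the toy matrix `H` stands in for BiLSTM hidden states, and it is used directly as Query, Key, and Value, whereas a trained model would normally apply learned projections first.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Return (context, weights) for row-major matrices given as lists of lists."""
    d_k = len(K[0])  # dimensionality of the key vectors
    context, weights = [], []
    for q in Q:
        # Dot product of the query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of the value vectors.
        context.append(
            [sum(w * v[c] for w, v in zip(row, V)) for c in range(len(V[0]))]
        )
    return context, weights


# Hypothetical hidden states for a three-token input (self-attention: Q = K = V).
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = scaled_dot_product_attention(H, H, H)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1, so every context vector is a convex combination of the value vectors, which is what lets a token draw on distant positions in the sequence.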
(4) Decoding Layer
The Decoding Layer employs a Conditional Random Field (CRF) to produce structured output predictions. CRF models the conditional dependencies between adjacent labels in the sequence, ensuring that the predicted labels form a coherent and contextually appropriate sequence. The model computes the score for each possible label sequence and selects the sequence with the highest score. The score function is given by:
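Using the emission and transition scores defined immediately below, and assuming a start label \({y}_{0}\) (a convention adopted here), the standard linear-chain CRF score of a label sequence \(y=({y}_{1},\cdots ,{y}_{k})\) for input \(X\) is:

\[
s(X,y)=\sum _{i=1}^{k}{A}_{{y}_{i-1},{y}_{i}}+\sum _{i=1}^{k}{P}_{i,{y}_{i}}
\]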
Here, \({P}_{i,{y}_{i}}\) represents the emission score for the \(i\)-th word labeled with \({y}_{i}\), and \({A}_{{y}_{i-1},{y}_{i}}\) captures the transition score between labels \({y}_{i-1}\) and \({y}_{i}\). The scores for each label sequence are normalized, and the label sequence with the highest probability is selected as the final output:
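In the usual formulation, with \({Y}_{X}\) denoting the set of all candidate label sequences for \(X\) and \(s(X,y)\) the sequence score (both symbols assumed here), the normalization and selection steps are:

\[
p(y|X)=\frac{\exp \left(s(X,y)\right)}{\sum _{\tilde{y}\in {Y}_{X}}\exp \left(s(X,\tilde{y})\right)},\qquad
{y}^{* }=\mathop{\arg \max }\limits_{\tilde{y}\in {Y}_{X}}\,s(X,\tilde{y})
\]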
This enhanced model architecture, combining RoBERTa’s pretrained embeddings, BiLSTM, SDPA, and CRF, addresses the semantic limitations of traditional NER approaches. By pairing dynamic contextual embeddings with an attention mechanism, it offers a more robust and context-sensitive solution for entity recognition.
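Selecting the highest-scoring label sequence in a linear-chain CRF is commonly done with the Viterbi algorithm. The text does not name the decoding algorithm, so the sketch below is an assumption, and the label set, emission scores, and transition scores are hypothetical values chosen for illustration.

```python
def viterbi_decode(emissions, transitions, start):
    """Find the best label path.

    emissions[i][j]   : score of label j at position i
    transitions[a][b] : score of moving from label a to label b
    start[j]          : score of beginning the sequence in label j
    """
    n_labels = len(start)
    scores = [start[j] + emissions[0][j] for j in range(n_labels)]
    backpointers = []
    for emission in emissions[1:]:
        new_scores, pointers = [], []
        for j in range(n_labels):
            # Best previous label for arriving at label j.
            best_prev = max(
                range(n_labels), key=lambda a: scores[a] + transitions[a][j]
            )
            new_scores.append(scores[best_prev] + transitions[best_prev][j] + emission[j])
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    best_last = max(range(n_labels), key=lambda j: scores[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, scores[best_last]


LABELS = ["O", "B", "I"]
# Hypothetical scores for a three-character input; O->I and starting in I are penalized.
emissions = [[0.1, 2.0, 0.5], [0.2, 0.3, 1.5], [2.0, 0.1, 0.4]]
transitions = [[0.5, 0.2, -10.0], [0.0, -1.0, 1.0], [0.3, 0.0, 0.5]]
start = [0.0, 0.0, -10.0]

path, score = viterbi_decode(emissions, transitions, start)
print([LABELS[j] for j in path])
```

The large negative transition from O to I shows how a CRF encodes BIO constraints: an inside tag cannot follow an outside tag without a very high emission score to compensate.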
RoBERTa model in question classification task
In the question classification task, the RoBERTa model is employed to categorize questions based on their semantic and structural features. By generating deep contextual embeddings, RoBERTa identifies subtle patterns that help distinguish between different question types.
To improve the efficiency of the Q&A system, questions are first categorized based on named entities before being matched to predefined templates. This step reduces the search space, lowers computational complexity, and speeds up response times. In this paper, questions are classified into two types based on the position of the question mark in the SPARQL triple, labeled as Type 1 and Type 2, as shown in Table 2:
In this study, the RoBERTa model first performs deep semantic parsing of the user’s question, classifying it into one of two types: Type 1 (single-entity attribute queries) or Type 2 (two-entity relational queries). Once the question is classified, normalized entities are extracted. Extracted normalized entities are then used to select the appropriate SPARQL templates based on the question type. Each type corresponds to a specific template, and the system generates executable queries by replacing entity placeholders within the selected templates.
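The template-selection and placeholder-replacement step can be sketched as follows. The templates, the default-prefix notation, and the `originDynasty` predicate are hypothetical; the sketch assumes Type 1 leaves the object of the triple unknown and Type 2 leaves the predicate unknown.

```python
# Hypothetical SPARQL templates, one per question type (not the authors' templates).
# Doubled braces are literal braces in str.format.
TEMPLATES = {
    1: "SELECT ?x WHERE {{ :{entity} :{attribute} ?x . }}",
    2: "SELECT ?p WHERE {{ :{entity1} ?p :{entity2} . }}",
}


def build_query(question_type, slots):
    """Fill the template for the given question type with normalized entities."""
    template = TEMPLATES[question_type]
    # str.format raises KeyError if a required slot is missing.
    return template.format(**slots)


# Type 1: single-entity attribute query.
print(build_query(1, {"entity": "冕服", "attribute": "originDynasty"}))
# Type 2: two-entity relational query.
print(build_query(2, {"entity1": "深衣", "entity2": "曲裾袍"}))
```

Because the slots are filled by plain string substitution, the entity normalization step matters: only names that already exist in the knowledge graph should reach `build_query`.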
RoBERTa model in question recall and rank task
The RoBERTa model proves highly effective in question recall and rank tasks, where the objective is to rank candidate questions according to their relevance to a given query. In this study, RoBERTa is applied to distinguish between questions that share subtle semantic differences, particularly in domain-specific knowledge retrieval. By leveraging RoBERTa’s ability to generate deep, contextualized embeddings, the model captures the nuanced relationships between the query and candidate questions. For example, in the context of Ancient Chinese costumes knowledge retrieval, RoBERTa can differentiate between questions such as “What is the significance of the dragon robe?” and “Who wore the dragon robe?” based on their relevance to a user’s specific query intent. The model processes the query-candidate question pair and uses the output from the [CLS] token to compute a relevance score for each candidate.
The recall and rank framework in this study involves normalizing these relevance scores using a softmax function, which allows for the effective ranking of multiple candidate questions. RoBERTa’s pretrained embeddings—trained on large and diverse corpora—enable it to capture complex semantic and contextual relationships, improving the precision of relevance scoring. In addition, the bidirectional nature of the model ensures that both the query and candidate questions are evaluated with rich contextual awareness. The proposed framework fine-tunes RoBERTa specifically for the question ranking task, optimizing the model’s ability to prioritize the most relevant responses. This approach enhances the performance of information retrieval systems and question-answering applications, providing a more accurate and context-sensitive ranking of candidate questions.
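The softmax normalization and ranking of candidates can be sketched as below. The raw scores are hypothetical placeholders for the relevance scores a fine-tuned RoBERTa model would produce from the [CLS] output; no model is loaded here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_candidates(candidates, raw_scores):
    """Normalize raw relevance scores and sort candidates from most to least relevant."""
    probabilities = softmax(raw_scores)
    return sorted(zip(candidates, probabilities), key=lambda pair: pair[1], reverse=True)


candidates = [
    "What is the significance of the dragon robe?",
    "Who wore the dragon robe?",
]
raw_scores = [0.4, 2.1]  # hypothetical relevance scores for a user query about the wearer

ranked = rank_candidates(candidates, raw_scores)
for question, probability in ranked:
    print(f"{probability:.3f}  {question}")
```

Softmax preserves the ordering of the raw scores, so its role here is to make scores comparable across candidate sets and to expose a confidence margin between the top candidates.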
In addition to question ranking, the system employs a structured query construction approach that integrates NER, question classification, and recall and rank modules to extract key entities and relationships from the user’s query. The template-matching process involves selecting an appropriate query template, completing it with the extracted information, executing the query on a Neo4j graph database, and generating a structured response for the user. This methodology ensures efficient, context-aware information retrieval, enabling the system to provide precise and accurate answers to user queries.
Results
Experimental setup
This study utilizes a dataset derived from authoritative offline sources, such as Seven Thousand Years of Chinese Clothing, A Brief History of Clothing in China and Dictionary of Ancient Chinese Clothing, along with online platforms like Hua Fuzhi and Baidu Baike, and supplemented by visual materials from murals, terracotta figurines, and museum paintings, as detailed in “Experimental setup”. The dataset was built upon the ancient Chinese costumes KG, which comprises 519 entities and 14 predefined relationships, providing the foundational structure for addressing data cold-start issues. The corpus generation process involved applying rules based on Chinese natural language expressions and sentence structures to create templates for each predefined relationship. The templates were structured to mirror the KG’s hierarchical and relational framework, which encodes domain-specific entities (e.g., historical garments, rituals) and their semantic relationships (e.g., used-in, symbolizes). Each template incorporates slots (e.g., [Entity], [Time Period]) that directly correspond to KG node types and edges, ensuring syntactic and semantic fidelity—for instance, the template “[Entity] was worn during [Time Period] for [Purpose]” maps to KG triples like (深衣, time-period, Han Dynasty). To illustrate, KG triples such as (十二章纹, symbolizes, imperial authority) were instantiated into natural language examples (e.g., “The embroidery patterns十二章纹 symbolize imperial authority, appearing on garments reserved for emperors”) through context-aware slot filling, which prioritized high-confidence KG edges verified by domain experts. Templates were further categorized (e.g., temporal evolution, cultural significance) based on KG subgraphs, leveraging relationships like (深衣, evolved-from, 曲裾袍) to describe historical garment transitions. 
Challenges such as entity polysemy (e.g., disambiguating 袍 as ceremonial or casual via KG attributes like material) and temporal consistency (e.g., cross-referencing dynastic changes in the KG) were mitigated through constraints and manual validation. To enhance the diversity and accuracy of the corpus, techniques such as semantic substitution and sentence reconstruction were employed, thereby improving the model’s generalization ability and recognition performance. These question templates were then populated with entities from the knowledge graph, resulting in a dataset suitable for tasks such as NER, Question Classification, and Question Recall and Rank. Ultimately, a total of 13,569 questions were generated, with the dataset split into training and testing sets in a 3:1 ratio.
The experiments were conducted in an environment equipped with Windows 11, a Tesla V100 32GB GPU, and Python 3.9. The hyperparameters for the training process were configured as follows: the number of training epochs was set to 10, the batch size was 16, and the dimensions of the hidden layers and output slots were set to 1 and 2, respectively, with the dimensionality of the hidden and output vectors set to 150. The learning rate was initialized at 1e-5, and the dropout rate was set to 0.2.
Named entity recognition experiment
In this study, a question dataset was generated from the Ancient Chinese Costume Knowledge Graph, with both the training and testing datasets annotated using the BIO annotation scheme. To evaluate the performance of the proposed RoBERTa_BiLSTM_SDPA_CRF model, a comparative analysis was conducted against several baseline models, including BERT_CRF, BERT_BiLSTM_CRF, RoBERTa_CRF, and RoBERTa_BiLSTM_CRF. The effectiveness of these models was assessed using key evaluation metrics: Precision, Recall, and F1 score. Precision is the proportion of samples predicted as positive that are actually positive:
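Written in terms of the confusion counts, the standard definition is:

\[
\mathrm{Precision}=\frac{TP}{TP+FP}
\]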
where TP represents true positives and FP represents false positives.
Recall measures the proportion of actual positive samples that are correctly predicted:
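Written in terms of the confusion counts, the standard definition is:

\[
\mathrm{Recall}=\frac{TP}{TP+FN}
\]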
where FN denotes false negatives.
Accuracy is the proportion of correctly classified samples in the entire sample space:
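Written in terms of the confusion counts, the standard definition is:

\[
\mathrm{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}
\]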
where TN denotes true negatives.
Finally, the F1 score, which balances precision and recall, was used to provide an overall measure of the model’s performance. These metrics were employed to rigorously evaluate the models’ effectiveness in recognizing entities and answering questions within the context of the knowledge graph, focusing on accuracy, completeness, and overall performance. Figure 6 shows the results of NER comparative experiment.
Note: This bar chart presents a comparison of the performance of various models in NER tasks, evaluated across three key metrics: Precision, Recall, and F1-Score. The models compared include BERT_CRF, RoBERTa_CRF, BERT_BiLSTM_CRF, RoBERTa_BiLSTM_CRF, and RoBERTa_BiLSTM_SDPA_CRF. Among these, RoBERTa_BiLSTM_SDPA_CRF achieves the highest precision at 0.9369, surpassing the other models, indicating its superior ability to correctly identify true positives. In terms of recall, RoBERTa_BiLSTM_SDPA_CRF also leads with 0.8975. Regarding the F1-Score, RoBERTa_BiLSTM_SDPA_CRF again outperforms the others with a score of 0.9168, compared to the other models, demonstrating a better balance between precision and recall. Overall, RoBERTa_BiLSTM_SDPA_CRF consistently outperforms the other models across all three metrics, highlighting its superior performance in named entity recognition tasks.
As shown in Fig. 6, the F1 scores for the NER module on the cold-start dataset are as follows: 90.63% for BERT_CRF, 90.07% for RoBERTa_CRF, 90.28% for BERT_BiLSTM_CRF, 90.38% for RoBERTa_BiLSTM_CRF, and 91.68% for the proposed RoBERTa_BiLSTM_SDPA_CRF model. The RoBERTa_BiLSTM_SDPA_CRF model outperforms the other models in terms of Precision, Recall, and F1 score, demonstrating its superior ability to effectively recognize costume-related entities. This highlights the improved performance and enhanced entity recognition capabilities of the proposed model.
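As a quick consistency check, the F1 score is the harmonic mean of precision and recall, \(F1=2PR/(P+R)\). The short sketch below applies this standard definition to the precision and recall reported for the proposed model in Fig. 6; the helper name `f1_score` is introduced only for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for RoBERTa_BiLSTM_SDPA_CRF in Fig. 6.
f1 = f1_score(0.9369, 0.8975)
print(f"{f1:.4f}")  # 0.9168
```

The result matches the reported 91.68%, confirming that the three figures quoted for the proposed model are mutually consistent.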
To further evaluate the model’s effectiveness, we conducted a validation experiment using the KgCLUE dataset. KgCLUE is a large-scale, open-source Chinese KGQA dataset and serves as a benchmark for assessing Chinese KBQA systems. The knowledge base is derived from encyclopedic data, containing factual triples sourced from search pages. It includes 3,121,457 entities, 245,838 relationships, and 20,559,652 triples. The dataset is divided into four parts: a training set (Train); a validation set (Dev); a public test set (Test Public), used for testing; and a private test set (Test Private), used for submission and not disclosed. The results of the experiment are summarized in Table 3.
Table 3 reports the F1 scores for the NER module across different models: 81.89% for BERT_CRF, 80.26% for RoBERTa_CRF, 80.89% for BERT_BiLSTM_CRF, 77.65% for RoBERTa_BiLSTM_CRF, and 81.41% for the proposed RoBERTa_BiLSTM_SDPA_CRF model. Among these, BERT_CRF achieves the highest F1 score of 0.8189, demonstrating superior precision (0.8432), although its lower recall (0.7960) suggests that some entities may be missed. RoBERTa_BiLSTM_SDPA_CRF follows closely with an F1 score of 0.8141, with precision (0.8177) and recall (0.8105) nearly equal, indicating a balanced performance. Its recall exceeds that of BERT_CRF, suggesting broader entity coverage at a small cost in precision.
However, for more complex applications, such as KGIQAs for Chinese ancient costumes, the RoBERTa_BiLSTM_SDPA_CRF model is better suited to handle intricate semantics and long texts. Its capacity to process complex information allows for a more accurate interpretation of Chinese ancient costumes KG. Moreover, its ability to capture word-level semantic nuances is essential for understanding specialized terminology and complex descriptions in the domain of ancient Chinese costumes.
Question and answer structure experiment
The Q&A Structure Experiment aims to evaluate and improve the performance of models in extracting and processing relevant information to answer domain-specific queries, particularly in the context of Ancient Chinese costumes. The experiment focuses on assessing how effectively the RoBERTa_BiLSTM_SDPA_CRF model can recognize entities and relationships, construct precise queries, and generate accurate answers from a knowledge graph. Additionally, it examines the impact of pretrained models and contextual embeddings on knowledge retrieval, as well as the model’s capacity to rank and select the most relevant responses. The primary goal is to enhance the system’s accuracy and contextual relevance when addressing complex domain-specific questions.
For a comprehensive evaluation, the KGIQAs dataset was divided into two categories: questions with distinct attribute values (dataset 1) and questions without clear attribute values (dataset 2), each containing 3000 samples. This classification facilitates a deeper understanding of different question types and enables more targeted analysis in subsequent experiments. Table 4 presents examples of questions from both categories.
To evaluate the effectiveness of our proposed Q&A framework, we conducted comparative experiments with five different architectures. Structure 1 combines NER and Relation Extraction (RE)47 to identify key entities and their relationships, enhancing the understanding of the text. Structure 2 integrates NER with Question Classification (QC)48 to extract entities and classify the question type, helping to infer the user’s intent and generate appropriate responses. Structure 3 enhances intent understanding by integrating NER with Intent Recognition (IR)49. After identifying entities, it analyzes the user’s intent to provide more accurate answers. Structure 4 utilizes the pre-trained capabilities of ChatGPT 3.5 for natural language generation and reasoning. Structure 5, our proposed framework, combines NER, QC, and a Recall and Rank (R&R) mechanism. It extracts entities, classifies question types, and refines answer accuracy by ranking candidate responses for greater relevance. The experimental results, measured by F1 scores, are illustrated in Fig. 7.
Note: This bar chart presents two sets of comparisons illustrating the performance of various structural models on two distinct types of datasets. In the dataset with explicit attributes, Structure 1 (NER + RE) achieves a score of 0.85, Structure 2 (NER + QC) scores 0.82, Structure 3 (NER + IR) scores 0.80, Structure 4 (ChatGPT-3.5) scores 0.70, and Structure 5 (Ours) attains the highest performance with a score of 0.88. In contrast, on the dataset without explicit attributes, Structure 1 scores 0.78, Structure 2 scores 0.75, Structure 3 scores 0.73, Structure 4 (ChatGPT-3.5) scores 0.65, and Structure 5 (Ours) again leads with a score of 0.80. Across both datasets, Structure 5 (Ours) consistently outperforms all other models, including ChatGPT-3.5. Furthermore, it is evident that all models generally perform better on the dataset with explicit attributes compared to the dataset lacking explicit attributes.
As shown in Fig. 7, Structure 1 (NER + RE) performed well on the dataset with clearly defined attribute values, achieving an F1 score of 85%. However, its performance declined when handling questions without explicit attribute values. Structures 2 (NER + QC) and 3 (NER + IR) were less effective in such cases, with F1 scores of 75% and 73%, respectively. Structure 4 (ChatGPT 3.5), despite being a large language model, struggled with domain-specific questions, especially in the absence of clear attribute values. In contrast, our proposed Q&A system (Structure 5 NER + QC + R&R) excelled across both datasets, particularly on the one with clear attribute values, where it achieved an F1 score of 88%. This indicates that the inclusion of recall and rank mechanisms significantly improved the accuracy of information retrieval and question answering, particularly for complex, domain-specific queries.
Question answering system testing
The Q&A system architecture connects user input through a front-end interface and integrates modules for NLP, QC, knowledge retrieval, and answer generation. With the support of KG and a model repository, the system provides accurate and efficient Q&A services. The structure of the system is shown in Fig. 8.
Note: This diagram depicts the architecture of a Q&A system focused on ancient Chinese costumes, structured into four distinct layers. The User Interface Layer, developed using CSS, HTML, and the Flask framework, provides a visual interface for user interaction and facilitates the question-answering functionality. The Application Layer manages the sequential processing of user queries, encompassing reception, comprehension, coordination by the question-answering system, knowledge graph retrieval, and the generation and delivery of responses. The Model Layer utilizes the RoBERTa-BiLSTM-SDPA-CRF model for NER and employs RoBERTa for question classification, as well as information retrieval and ranking to construct relevant queries. The Data Layer organizes entities, relationships, and attributes within a knowledge graph of ancient Chinese clothing, with data stored and queried through Cypher queries in the Neo4j database.
As depicted in Fig. 8, the system architecture consists of a four-layer framework designed for an Ancient Chinese Costumes Q&A and KG visualization system. The User Interface Layer, built with CSS, HTML, and Flask, features interactive modules such as the “Ancient Chinese Costumes Q&A” interface and the dynamic “Visualization” of query results. The Application Layer handles key processes, including parsing user queries, generating Cypher queries, retrieving data from the knowledge graph, and delivering responses. The Model Layer incorporates specialized NLP components (RoBERTa-BiLSTM-SDPA-CRF for entity recognition, and RoBERTa for question classification and semantic ranking), ensuring accurate interpretation and result prioritization. At the base, the Data Layer utilizes Neo4j to store and query a structured KG that includes entities, relationships, and attributes.
The system’s interactive interface (Fig. 9) allows users to enter questions and receive textual answers, along with relevant images matched through the KG. By linking KG entities to predefined image templates, the system ensures the synchronized display of visual and textual results, such as highlighting historical clothing elements in the images. This integration improves usability by providing intuitive multimodal feedback.
Note: This intelligent question answering system is designed for exploring ancient Chinese clothing, featuring a Chinese interface. On the left side, the interface allows users to pose questions such as, “Can you introduce Daqiumian for me?<请为我介绍一下大裘冕>“. In response, the system provides detailed information about the item, including its definition, common colors, and its ceremonial usage, such as in the emperor’s Heaven worship ritual. The system draws from a wide range of sources, including ancient texts and visual materials, presenting both textual descriptions and images. Key image sources include Dunhuang Cave 220 and San Li Tu <三礼图>. Powered by a knowledge graph, this system enhances users’ understanding of ancient Chinese costumes. The right side of the interface offers an English translation, though the system is not yet fully accessible to non-Chinese-speaking users.
To assess the usability of the Q&A system, we evaluated three key metrics: response efficiency, answer accuracy, and user satisfaction. Twelve participants were invited to take part in the evaluation, including four scholars, five students, and three enthusiasts of traditional Chinese culture. Participants ranged in age from 21 to 60 years, with a mean age of 33.67 years. Each participant submitted 10 questions (resulting in 120 queries in total) covering three types of tasks: simple queries (e.g., <什么是大裘冕?>“What is Daqiumian?”), inferential reasoning (e.g., <秦朝至汉朝时期深衣如何演变?>“How did Shenyi evolve from the Qin to Han Dynasties?”), and ambiguous requests (e.g., <古代男性戴什么头饰?>“What headwear did ancient men wear?”). The evaluation metrics and corresponding results are summarized in Table 5.
As presented in Table 5, the test results indicate that the system demonstrates strong usability. It effectively responds to user inquiries regarding ancient Chinese costumes and provides relevant reference images to complement its answers. The system achieves an average response time of 75 ms, ensuring prompt and efficient interaction. Moreover, the answer accuracy rate exceeds 93%, reflecting its strong ability to accurately interpret the user intent and deliver contextually appropriate, precise answers. The user satisfaction rating reaches 4.64 out of 5. During open-ended interviews, participants emphasized that the system’s ability to provide supplementary materials, such as museum reference images and pertinent bibliographic information alongside textual responses, significantly enhances the overall user experience and satisfaction.
Discussion
In this study, we developed an intelligent Q&A system focused on ancient Chinese costumes to enhance public understanding and contribute to the preservation of cultural heritage. To enable the functionality of the system, we utilized both online and offline data sources to construct a specialized knowledge graph for ancient Chinese costumes, applying knowledge extraction and integration techniques. The system consists of three core modules: the Semantic Comprehension Module, the Knowledge Graph Interaction Module, and the Query Construction Module. In the Semantic Comprehension Module, we enhanced the RoBERTa model by integrating the BiLSTM model for feature extraction, along with the SDPA mechanism to capture long-range dependencies and contextual relationships within the input sequence. This approach improves the system’s ability to handle polysemous words and resolve ambiguities. Additionally, we incorporated Question Classification techniques to interpret user queries more accurately. The integration of these modules enables the system to process complex queries, including those lacking explicit attribute keywords, resulting in improved understanding and more precise responses. In the Knowledge Graph Interaction Module, the system interacts with the ancient Chinese costume knowledge graph to retrieve relevant information, while the Query Construction Module formulates queries that ultimately generate accurate answers.
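To make the SDPA step more concrete, the following is a minimal pure-Python sketch of generic scaled dot-product attention, softmax(QKᵀ/√d_k)V, applied as self-attention over a short sequence of hidden states. It is not the authors' implementation: the function names and the toy hidden-state values are assumptions for illustration, and a real model would use learned projections and a tensor library.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Return (output, weights) for softmax(Q K^T / sqrt(d_k)) V.

    Q: n_q x d_k, K: n_k x d_k, V: n_k x d_v, all as lists of lists.
    """
    d_k = len(K[0])
    scale = math.sqrt(d_k)
    weights = []
    for q in Q:
        # Scaled dot product of this query with every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K]
        weights.append(softmax(scores))
    d_v = len(V[0])
    # Each output row is a weighted average of the value vectors.
    output = [
        [sum(w * v[j] for w, v in zip(w_row, V)) for j in range(d_v)]
        for w_row in weights
    ]
    return output, weights

# Toy hidden states for a 3-token sequence (hypothetical values).
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = scaled_dot_product_attention(H, H, H)
```

Because every token attends to every other token in a single step, distant positions can influence each other directly, which is the property the paragraph above relies on for long-range dependencies.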
The experimental results show that the proposed RoBERTa-BiLSTM-SDPA-CRF model outperforms traditional models within the intelligent Q&A system for ancient Chinese costumes. The model is particularly strong at handling polysemous words, which improves its ability to interpret complex and ambiguous questions accurately. In addition, we introduce a Q&A framework that incorporates a recall and rank module, which enhances the system’s capability to address queries that lack explicit keywords. The system also performs well in user interactions, consistently providing responses that meet user expectations. Furthermore, through its integration with the ancient Chinese costumes knowledge graph, the system can offer both textual and image-based responses, providing a more comprehensive and informative user experience.
In conclusion, the success of this KGIQA system for ancient Chinese costumes can be attributed to several key factors. The RoBERTa-BiLSTM-SDPA-CRF model enhances NER by combining RoBERTa’s contextual pre-training, BiLSTM’s feature extraction, and SDPA’s attention mechanism, which improves polysemy handling and contextual understanding. The knowledge graph-based architecture, with its multi-layered design, effectively parses user intent and retrieves accurate answers by integrating KG construction, question analysis, and query construction layers. This approach is particularly effective for questions with clear attribute values. Additionally, the system’s seamless frontend and backend integration supports efficient entity and intent recognition, Cypher query generation, and answer feedback, ensuring quick responses while providing richer cultural context and relevant visual information through knowledge graph reasoning.
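The step from a recognized entity and intent to a Cypher query can be sketched as follows. This is only an illustrative template-based approach under stated assumptions: the node label `Costume`, the property names, and the intent names are hypothetical and are not taken from the authors' schema.

```python
# Hypothetical mapping from a classified intent to a node property name.
INTENT_TO_PROPERTY = {
    "definition": "definition",
    "color": "color",
    "occasion": "occasion",
}

def build_cypher(entity, intent):
    """Return (query, params) for a single-attribute lookup.

    The entity name is passed as a query parameter rather than
    concatenated into the string; the property name comes only from
    the whitelist above, so user text never reaches the query body.
    """
    if intent not in INTENT_TO_PROPERTY:
        raise ValueError(f"unsupported intent: {intent}")
    prop = INTENT_TO_PROPERTY[intent]
    query = (
        "MATCH (c:Costume {name: $name}) "
        f"RETURN c.{prop} AS answer"
    )
    return query, {"name": entity}

# Example: NER yields the entity, question classification yields the intent.
query, params = build_cypher("大裘冕", "definition")
```

The resulting `query` and `params` pair would then be handed to a graph database driver for execution, which is not shown here.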
This study also identifies areas for further enhancement in KGIQA systems for ancient Chinese costumes. Specifically, the semantic comprehension module needs to process complex contexts and handle polysemous words more reliably. Additionally, the current system lacks support for multi-turn dialog, which restricts its interactive capabilities. Future work will focus on (1) adapting the current model with LLMs to enhance disambiguation and historical reasoning in the semantic comprehension module, (2) incorporating a multi-turn dialog memory function to improve user experience, and (3) exploring the association between images and knowledge graph entities to better integrate multimodal data. These improvements aim to advance both cultural heritage research and public engagement.
Data availability
No datasets were generated or analyzed during the current study.
References
French, E. & Reddy-Best, K. Women’s Czech folk costume: negotiating ambivalence and white ethnicity in the midwest. Cloth. Text. Res. J. 41, 191–207, https://doi.org/10.1177/0887302X211027500 (2023).
Kaya, Ö. & Cuciuc Romanescu, L. S. The analysis of colour and pattern in Romanian folk dress: protecting past legacies in an uncertain future. Folk Life 62, 97–112, https://doi.org/10.1080/04308778.2024.2384237 (2024).
Liu, K., Zhou, S. & Zhu, C. Historical changes of Chinese costumes from the perspective of archaeology. Herit. Sci. 10, 205, https://doi.org/10.1186/s40494-022-00841-z (2022).
Zhang, X., Li, Y., Lin, J. & Ye, Y. The construction of placeness in traditional handicraft heritage sites: a case study of Suzhou embroidery. Sustainability 13, 9176, https://doi.org/10.3390/su13169176 (2021).
Jimoh, K. O., Ọdẹ́jọbí, ỌT. À., A Fọlárànmí, S. & Aina, S. Handmade embroidery pattern recognition: a new validated database. Malays. J. Comput. 5, 390–402 (2020).
Xu, Y. Innovative applications of intangible cultural heritage: the inheritance of Manchu embroidery in modern clothing design. Art. Soc. 3, 14–18, https://doi.org/10.56397/AS.2024.04.03 (2024).
Ding, Q.-K. & Liang, H.-E. Digital restoration and reconstruction of heritage clothing: a review. Herit. Sci. 12, 225, https://doi.org/10.1186/s40494-024-01349-4 (2024).
Jiang, Y., Guo, R., Ma, F. & Shi, J. Cloth simulation for Chinese traditional costumes. Multimed. Tools Appl. 78, 5025–5050, https://doi.org/10.1007/s11042-018-5983-8 (2019).
Chen, R. & Lin, X. Research on digital construction and design of minority clothing based on multivariate statistical analysis. Appl. Math. Nonlinear Sci. 9, 1–21, https://doi.org/10.2478/amns.2023.2.01544 (2023).
Liu, K., Lin, K. & Zhu, C. Research on Chinese traditional opera costume recognition based on improved YOLOv5. Herit. Sci. 11, 40, https://doi.org/10.1186/s40494-023-00883-x (2023).
Lavrinovics, E., Biswas, R., Bjerva, J. & Hose, K. Knowledge graphs, large language models, and hallucinations: an NLP perspective. J. Web Semant. 85, 100844, https://doi.org/10.1016/j.websem.2024.100844 (2025).
Berners-Lee, T., Fischetti, M. & Dertouzos, M. L. Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by Its Inventor (Harper San Francisco, 1999).
Gruber, T. R. A translation approach to portable ontology specifications. Knowl. Acquis. 5, 199–220 (1993).
Bordes, A., Usunier, N., Garcia-Durán, A., Weston, J. & Yakhnenko, O. Translating embeddings for modeling multi-relational data. In: Proc. 27th International Conference on Neural Information Processing Systems. 2787–2795 (Curran Associates Inc., 2013).
Berant, J., Chou, A., Frostig, R. & Liang, P. Semantic parsing on freebase from question-answer pairs. In: Proc. Conference on Empirical Methods in Natural Language Processing. 1533–1544 (Association for Computational Linguistics, 2013).
Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In: Proc. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1, 4171–4186 (Association for Computational Linguistics, 2019).
Ji, S., Pan, S., Cambria, E., Marttinen, P. & Yu, P. S. A survey on knowledge graphs: representation, acquisition, and applications. IEEE Trans. Neural Netw. Learn. Syst. 33, 494–514, https://doi.org/10.1109/TNNLS.2021.3070843 (2021).
Huang, X., Zhang, J., Xu, Z., Ou, L. & Tong, J. A knowledge graph based question answering method for medical domain. PeerJ Comput. Sci. 7, e667, https://doi.org/10.7717/peerj-cs.667 (2021).
Bulla, M., Hillebrand, L., Lübbering, M. & Sifa, R. Knowledge graph based question answering system for financial securities. In: Edelkamp, S., Möller, R., Rueckert, E. (eds) KI 2021: Advances in Artificial Intelligence. KI 2021. Lecture Notes in Computer Science, vol 12873. 44–50 (Springer, 2021).
Li, J., Luo, Z., Huang, H. & Ding, Z. Towards knowledge-based tourism Chinese question answering system. Mathematics 10, 664, https://doi.org/10.3390/math10040664 (2022).
Hu, T., Shi, D., Wang, F., Wu, B. & Wang, J. Intelligent question-answering system for famous towns and villages based on knowledge graph. In: Proc. 16th International Conference on Advanced Computer Theory and Engineering (ICACTE). 1–6 (IEEE, 2023).
Xu, L., Lu, L., Liu, M., Song, C. & Wu, L. Nanjing Yunjin intelligent question-answering system based on knowledge graphs and retrieval augmented generation technology. Herit. Sci. 12, 118, https://doi.org/10.1186/s40494-024-01231-3 (2024).
Carriero, V. A. et al. ArCo: the Italian cultural heritage knowledge graph. In: Ghidini, C. et al. Proc. 18th International Semantic Web Conference Proceedings, The Semantic Web–ISWC 2019. 36–52 (Springer, 2019).
Carriero, V. A. et al. Pattern-based design applied to cultural heritage knowledge graphs. Semantic Web 12, 313–357, https://doi.org/10.3233/SW-200422 (2021).
Yang, S. et al. YueGraph: a prototype for Yue Opera lineage review based on knowledge graph. In: Proc. Third CAAI International Conference, CICAI 2023 Revised Selected Papers, Part II. 435–441 (Springer, 2023).
Bai, B. & Hou, W. The application of knowledge graphs in the Chinese cultural field: the ancient capital culture of Beijing. Herit. Sci. 11, 77, https://doi.org/10.1186/s40494-023-00922-7 (2023).
Sartini, B. IICONGRAPH: improved Iconographic and Iconological Statements in Knowledge Graphs. In: Proc. 21st International Conference, ESWC 2024, Proceedings, Part II. 57–74 (Springer, 2024).
Dou, J., Qin, J., Jin, Z. & Li, Z. Knowledge graph based on domain ontology and natural language processing technology for Chinese intangible cultural heritage. J. Vis. Lang. Comput. 48, 19–28, https://doi.org/10.1016/j.jvlc.2018.06.005 (2018).
Huang, X., Zhang, Y., Wei, B. & Yao, L. A question-answering system over Traditional Chinese Medicine. In: Proc. IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 1737–1739 (IEEE, 2015).
Huang, X., Zhang, Y., Wei, B. & Yao, L. A joint model for question-answering over Traditional Chinese Medicine. In: Proc. IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 1904–1906 (IEEE, 2016).
Yang, S., Yi, L., Guan, H. & Li, Y. A meme-based approach for knowledge mining, organization, and presentation of traditional Chinese settlement culture. Herit. Sci. 11, 206, https://doi.org/10.1186/s40494-023-01052-w (2023).
Fan, T., Wang, H. & Hodel, T. CICHMKG: a large-scale and comprehensive Chinese intangible cultural heritage multimodal knowledge graph. Herit. Sci. 11, 115, https://doi.org/10.1186/s40494-023-00927-2 (2023).
Wan, J. et al. WuMKG: a Chinese painting and calligraphy multimodal knowledge graph. Herit. Sci. 12, 159, https://doi.org/10.1186/s40494-024-01268-4 (2024).
Asprino, L., Bulla, L., Marinucci, L., Mongiovì, M. & Presutti, V. A large visual question answering dataset for cultural heritage. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. 193–197 (Springer, 2022).
Becattini, F. et al. VISCOUNTH: a large-scale multilingual visual question answering dataset for cultural heritage. ACM Trans. Multimed. Comput., Commun. Appl. 19, 1–20, https://doi.org/10.1145/359077 (2023).
Yang, Y., Yin, Y. & Li, Z. Research on the model of automatic recognition and natural language question-answer system for traditional Chinese medicine tongue images based on LLMs. Appl. Comput. Eng. 36, 271–277, https://doi.org/10.54254/2755-2721/36/20230461 (2024).
Cui, Y., Yao, S., Wu, J. & Lv, M. Linking past insights with contemporary understanding: an ontological and knowledge graph approach to the transmission of ancient Chinese classics. Herit. Sci. 12, 382, https://doi.org/10.1186/s40494-024-01504-x (2024).
Kuper, H. Costume and Identity. Comp. Stud. Soc. Hist. 15, 348–367, https://doi.org/10.1017/S0010417500007143 (1973).
Huang, C.-Y. The Integration of Chinese Historical Costumes and Contemporary Women’s Fashion: With Special Reference to the Shuitianyi. Doctoral thesis (Birmingham City University, 2011).
Song, C., Zhao, H., Men, A. & Liang, X. Design expression of “Chinese-style” costumes in the context of globalization. Fibres Text. East. Eur. 31, 82–91, https://doi.org/10.2478/ftee-2023-0019 (2023).
Ma, R., Liu, Y. & Ma, Z. f-KGQA: a fuzzy question answering system for knowledge graphs. Fuzzy Sets Syst. 498, 109117, https://doi.org/10.1016/j.fss.2024.109117 (2025).
Wu, S. & Ou, Y. A quantitative study of the polysemy of Mandarin Chinese perception verb kàn ‘look/see’. Aust. J. Linguist. 43, 191–218, https://doi.org/10.1080/07268602.2023.2289194 (2023).
Xu, J., Zhang, H., Zhang, H., Lu, J. & Xiao, G. ChatTf: a knowledge graph-enhanced intelligent Q&A system for mitigating factuality hallucinations in traditional folklore. IEEE Access 12, 162638–162650, https://doi.org/10.1109/ACCESS.2024.3485877 (2024).
Liang, Y. Multimodal knowledge graph embedding with missing data integration. IEEE Trans. Comput. Soc. Syst. 1–13, https://doi.org/10.1109/TCSS.2024.3385672 (2024).
Liu, Z., Lin, W., Shi, Y. & Zhao, J. A robustly optimized BERT pre-training approach with post-training. In: Proc. 20th China National Conference on Computational Linguistics. 471–484 (Springer, 2021).
Du, Y., Pei, B., Zhao, X. & Ji, J. Deep scaled dot-product attention based domain adaptation model for biomedical question answering. Methods 173, 69–74, https://doi.org/10.1016/j.ymeth.2019.06.024 (2020).
Nasar, Z., Jaffry, S. W. & Malik, M. K. Named entity recognition and relation extraction: state-of-the-art. ACM Comput. Surv. 54, 1–39, https://doi.org/10.1145/3445965 (2021).
Haisa, G. & Altenbek, G. Multi-task learning model for Kazakh query understanding. Sensors 22, 9810, https://doi.org/10.3390/s22249810 (2022).
Xin, P. & Qiujun, L. Semantic dependency graph parsing of financial domain questions based on Deep Learning. J. Phys. Conf. Ser. 1453, 012058, https://doi.org/10.1088/1742-6596/1453/1/012058 (2020).
Acknowledgements
This research was supported by the Wuhan Textile University Funding (Nos. 2024465 and 2024340) and the Research Project of the Department of Education of Hubei Province (23Y151).
Author information
Contributions
Conceptualization: Hua Yuan and Junjie Zhang; Funding acquisition: Hua Yuan; Methodology: Yuhan Li and Baohui Wang; Resources: Yuhan Li, Baohui Wang, and Junjie Zhang; Original draft writing: Hua Yuan; Review and editing: Junjie Zhang and Kaixuan Liu.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Yuan, H., Li, Y., Wang, B. et al. Knowledge graph-based intelligent question answering system for ancient Chinese costume heritage. npj Herit. Sci. 13, 198 (2025). https://doi.org/10.1038/s40494-025-01776-x
DOI: https://doi.org/10.1038/s40494-025-01776-x