Abstract
Chinese paintings, as treasures of traditional culture, embody the historical depth and artistic heritage of Chinese civilization. To support digital management of A Comprehensive Collection of Ancient Chinese Paintings (CCACP), this study constructs an ontological conceptual model integrating metadata and ontology. CCACP metadata were modeled to refine its structure and align with standards such as CIDOC CRM, CDWA, and China’s Metadata Standard for Painting Cultural Relics. Using Protégé and the Seven-Step Method, an ontology of 60 classes and properties was developed. A knowledge graph based on Wang Ximeng’s A Thousand Li of Rivers and Mountains tested its consistency and revealed embedded knowledge associations. The model demonstrates value for collection management, knowledge sharing, and academic research, providing a practical foundation for CCACP’s digital management and advancing digital humanities and cultural heritage research.
Introduction
With the rapid advancement of information technology, the boundaries between various academic disciplines are increasingly blurring, leading to more frequent interdisciplinary collaboration. Within this context, the humanities have gradually fostered new research paradigms, converging with computer science to form a novel interdisciplinary field—Digital Humanities1,2,3. Under this paradigm, the introduction of metadata and ontology has established a new methodological foundation for the informatization and resource management of cultural relics and artworks. Their application has become increasingly widespread, particularly in the fields of cultural heritage and museology, offering new academic opportunities for the systematic organization, precise description, and broad dissemination of cultural heritage.
A Comprehensive Collection of Ancient Chinese Paintings (中国历代绘画大系, CCACP) is a monumental national cultural project, spanning history and transcending borders. It represents China’s first comprehensive, systematic investigation, compilation, and research into ancient Chinese paintings preserved worldwide, jointly edited and published by Zhejiang University and the Zhejiang Provincial Cultural Relics Bureau. To date, the CCACP has collected 12,405 Chinese painting holdings from 263 cultural institutions worldwide, on materials such as paper, silk (including bo and ling), and hemp4. It encompasses the vast majority of extant “national treasure”-level painting masterpieces, including the Complete Tang and Pre-Tang Paintings, Complete Song Paintings, Complete Yuan Paintings, Complete Ming Paintings, and Complete Qing Paintings, totaling 60 volumes and 226 books. It is the most comprehensive collection and the largest publishing project of Chinese painting image literature of its kind to date.
However, despite the CCACP’s landmark significance in organizing ancient paintings and resource construction, its research and utilization still face numerous challenges. Firstly, discrepancies in cataloging systems, descriptive standards, and classification methods among different institutions lead to inconsistent information presentation for the same artwork across databases, complicating cross-institutional retrieval and comparison5,6. Secondly, a vast amount of information related to painting works lacks a unified semantic description and knowledge modeling system, limiting cross-platform, cross-context resource sharing and knowledge discovery7,8. Furthermore, the informational dimensions contained in paintings are highly complex, encompassing not only artistic attributes such as subject matter, technique, and style but also documentary information such as provenance history, inscriptions, and restoration records9. The logical relationships between these diverse elements are difficult to systematize using traditional bibliographical or image archival methods. These issues, to some extent, constrain the advancement of painting research and the reuse of research outcomes, highlighting the necessity for digital humanities methodologies10.
Within the Digital Humanities research context, the integration of information technology and the humanities provides new methodological pathways for cultural heritage research11. In the context of knowledge organization and semantic technology, data typically refers to uninterpreted symbols or records lacking semantic connotation; information is data that has been organized and contextualized, capable of answering “what is”; while knowledge is information understood within a specific context and usable for reasoning and decision-making, emphasizing semantic structure and logical connections12. Within this framework, metadata, i.e., “data about data”, provides structured and standardized descriptions of data resources, characterized by modularity, extensibility, interoperability, and multilingualism13. It forms the basis for the organization, retrieval, and management of cultural heritage data14,15. Ontology, building upon metadata, involves further abstraction, focusing on the abstract essence of objective reality, i.e., a conceptual model abstracted from the objective world16. In information science and knowledge engineering, ontology is further developed as a formal representation of concepts and their interrelationships within a specific domain, thereby enabling knowledge sharing and reuse. The relationship between the two can be viewed as akin to syntax and semantics, or micro and macro perspectives: metadata emphasizes the structured description of information objects, addressing the “how to represent” question, whereas ontology focuses on the semantic logical relationships between collections of resources, addressing the “how to relate” question17,18. Building on this, knowledge graphs can transform ontological models into visualizable, queryable structured networks, supporting complex association analysis and semantic search19.
In CCACP research, metadata is used for the systematic encoding of paintings and their management information; the ontological conceptual model further integrates the semantic relationships between paintings, academic research, and management information; and the knowledge graph based on this ontology enables the visual presentation and intelligent management of painting data. This study therefore applies the theoretical tools of digital humanities to the systematic management and academic research of ancient Chinese paintings through a methodological chain running from metadata to ontology to knowledge graph, achieving the transformation from data to knowledge and providing a reusable paradigm and practical path for digital cultural heritage research.
In digital cultural heritage research, the value of metadata and ontology has gradually gained academic attention, and both have been widely applied in project practice across multiple fields. A core task in constructing metadata and ontology is forming a unified conceptual reference model within a specific domain to facilitate resource sharing across institutions, regions, and even national borders. Consequently, a series of general or specialized metadata and ontology standards have been introduced internationally, the most influential of which is the Conceptual Reference Model of the International Committee for Documentation (CIDOC CRM). It aims to promote a common understanding and interoperability of cultural heritage information by providing a universal and extensible semantic framework, thereby enabling the exchange and integration of heterogeneous cultural heritage information sources (see https://www.cidoc-crm.org/).
In international academic practice, metadata and ontological modeling based on CIDOC CRM have been widely applied in numerous fields, including ceramics18, military history20, ancient maps21, grottoes22, cultural heritage23,24, archeology25,26,27, maritime heritage28, architecture29,30,31, oral history32,33, and ancient sites34,35. For instance, Zhao et al.36 constructed an ontology and knowledge graph for the “Tea Road,” a significant linear cultural heritage in China; Zhang et al.18 reused and extended CIDOC CRM for ceramics, an important category in world history museums, creating an “ancient Chinese ceramics ontology” framework; Fan et al.36 built the OpenOnto ontology for traditional Chinese opera using Chinese ethnic traditional opera as an example; He et al.21 conducted an ontological construction for ancient map knowledge and demonstrated it with a case study on the Yangshi Lei archives, Qing Dynasty architectural engineering drawings; and Lu et al.37 semantically organized and described historical literature resources related to ancient warfare, using the Song and Yuan periods as examples, and built a warfare ontology semantic model based on the Event Ontology.
Besides using CIDOC CRM for metadata and ontological modeling, some studies have also incorporated other metadata standards. For example, Cheng et al.38 used textual information on the “Major Woodwork System” craft described in volumes 4 and 5 of the classic ancient architectural text Yingzao Fashi as the raw corpus and built, reused, and extended an ontology for “Song-style Major Woodwork Construction Techniques” based on the Metadata Standard for Ancient Architecture Cultural Relics. Liang et al.39 used Categories for the Description of Works of Art (CDWA) metadata as the primary reference for constructing metadata for the traditional costumes of Guangxi’s indigenous ethnic groups. Furthermore, CIDOC CRM is also used in galleries40, libraries, archives, museums41,42, and other cultural institutions to increase the accessibility of museum-related information and knowledge.
Overall, these studies can be broadly categorized into those dealing with “tangible cultural heritage” and “intangible cultural heritage”. The former primarily focuses on the semantic modeling and knowledge organization of tangible heritage such as ceramics, ancient maps, architecture, grottoes, and clothing; the latter involves more ontological construction and instantiation applications for intangible heritage such as opera, construction techniques, and oral history. This demonstrates not only the diverse application paths of metadata and ontology across different types of cultural heritage but also highlights the complexity and necessity of future cross-type, interdisciplinary integration.
Although existing research has accumulated considerable experience in ontological modeling within the cultural heritage domain, systematic ontological modeling remains lacking for the highly complex and unique field of ancient Chinese painting. Based on this, the research object of this paper is the major national cultural project, CCACP. It adopts a digital humanities perspective, i.e., focusing on semantic modeling, knowledge organization, and data-driven cultural heritage research methods and emphasizing the application of computational methods to humanities questions. Using metadata and ontology as the theoretical foundation, it reuses and extends international universal models such as CIDOC CRM to construct a conceptual domain ontology model for the CCACP. Within this framework, the digital humanities perspective refers not only to the research background but also to a methodological orientation, i.e., promoting the shift in humanities research from traditional text and image studies to a data-driven, structured knowledge system through computer-assisted knowledge representation. The research objectives include:
- In the vertical dimension, proposing a reusable framework for painting knowledge organization to support academic research on ancient Chinese paintings across periods and regions, promoting the systematization and standardization of painting research.
- In the horizontal dimension, forming a generalizable conceptual modeling paradigm to serve as a reference for other digital cultural heritage projects (e.g., steles, sculptures, artefacts), facilitating the integration and sharing of cultural heritage information.
- At the application level, increasing the standardization, normalization, and interoperability of cultural relic information by constructing a semanticized painting knowledge network, providing new pathways for intelligent retrieval, in-depth research, and knowledge discovery of digital resources.
The significance of this research lies in, on the one hand, providing a new method for the digital organization of painting resources that transcends traditional bibliography and iconography, responding to the new demands of the information era for cultural heritage preservation and transmission and, on the other hand, exploring a localized path for ontological construction by combining international universal models with specific Chinese cultural contexts, offering valuable experience for global digital cultural heritage research. Its innovations are mainly reflected in (1) the uniqueness of the research object: it is the first systematic ontological modeling effort focused on the CCACP; (2) the integration of methodologies: extending CIDOC CRM based on the unique classifications and semantics of Chinese painting; and (3) the forward-looking nature of the research paradigm: achieving the transformation from data to knowledge through the combination of metadata-ontology, promoting the shift in digital cultural heritage research from static preservation to dynamic generation.
Methods
Metadata construction for the CCACP
The term metadata formally appeared in English in 1968, coined by Philip Bagley in his book Extension of Programming Language Concepts43. Its semantics can be traced back to the word metaphysics, derived from Aristotle’s Metaphysics, which implies inquiry into the essence behind phenomena or objects; the term carries the same sense in metadata as used in the information resource field discussed here44. The application of metadata can be traced back to library collection management around 245 BC, when what is now called metadata was simply the information in the library catalog. Historians regard the Pinakes (tablets) that Callimachus created for the Library of Alexandria around 245 BC as the world’s first library catalog and the earliest known use of metadata in this sense.
The latest classification of metadata, as of 2017, is presented in Understanding Metadata: What is Metadata, and What is it For?: A Primer45, published by the National Information Standards Organization (NISO), which categorizes metadata into descriptive metadata, structural metadata, administrative metadata, and markup languages. Administrative metadata is further subdivided into technical metadata, preservation metadata, and rights metadata. In the cultural heritage field, descriptive metadata is the most commonly used. This paper employs administrative metadata and technical language alongside descriptive metadata in constructing the conceptual model for the CCACP.
Metadata standards are sets of rules for describing specific objects in a resource. The earliest international metadata standard is Dublin Core (DC) (see https://www.dublincore.org/specifications/dublin-core/dcmi-terms/), which is intended to describe resources concisely across fields and material types and has only 15 elements. However, with rapid technological advancement in recent years, digitization has gradually become the preferred method for data storage across industries, giving rise to more complex storage requirements. Metadata standards have accordingly emerged for different domains. These include broadly applicable standards such as Dublin Core and the Metadata Object Description Schema (see https://www.loc.gov/standards/mods/), standards specifically for describing artworks such as CDWA and the Canadian Heritage Information Network’s humanities data dictionaries, and standards for describing cultural heritage such as CIDOC CRM. The second stage below describes the metadata standards reused and extended in constructing the CCACP metadata.
In the first stage, metadata were classified and organized according to the characteristics of ancient Chinese painting data. As a continuation of traditional Chinese culture, Chinese paintings through the ages carry the historical lineage of the Chinese nation, possess extremely high historical, cultural, artistic, and scientific value, and contain rich information resources. Constructing the metadata model for the CCACP must consider two factors: the paintings themselves and the relevant content of the CCACP outcomes. The former involves their management, preservation, restoration, and research by museums; the latter involves exhibitions based on the CCACP outcomes, related research projects, data compilation, digital project extensions, etc.
To make the information organization structure of the CCACP more precise, its information resource categories are macroscopically divided into five major types: basic information, management information, resource information, research information, and extension information.
i. Basic Information describes the painting itself.
ii. Management Information pertains to the resource information involved when museums manage the physical painting.
iii. Resource Information refers to various existing physical, audio-visual, etc., materials based on the painting.
iv. Research Information involves research projects, publications, digital content, etc., related to the outcomes of the CCACP.
v. Extension Information constitutes additional information elements beyond the above.
This paper creates a CCACP metadata framework that includes the following core components: Basic Information Metadata, Management Information Metadata, Resource Information Metadata, Research Information Metadata, and Extension Information Metadata. Detailed descriptions of these components are provided in the fourth stage, which lays the foundation for subsequent ontology engineering.
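The five-way categorization above can be sketched as a small lookup structure. This is a minimal illustration only, not the project's implementation: the `InfoCategory` enum, the `ELEMENT_CATEGORY` table, and the `categorize` helper are hypothetical names, and the element-to-category assignments are a small sample drawn from elements named in the text.

```python
from enum import Enum


class InfoCategory(Enum):
    """The five CCACP information resource categories named in the text."""
    BASIC = "Basic Information"
    MANAGEMENT = "Management Information"
    RESOURCE = "Resource Information"
    RESEARCH = "Research Information"
    EXTENSION = "Extension Information"


# Hypothetical sample of element-to-category assignments (illustrative only).
ELEMENT_CATEGORY = {
    "Title": InfoCategory.BASIC,
    "Period": InfoCategory.BASIC,
    "Identifier": InfoCategory.MANAGEMENT,
    "Exhibition": InfoCategory.MANAGEMENT,
    "Publications": InfoCategory.RESEARCH,
}


def categorize(element: str) -> InfoCategory:
    """Return an element's category; anything unlisted falls back to EXTENSION."""
    return ELEMENT_CATEGORY.get(element, InfoCategory.EXTENSION)
```

The fallback mirrors the role the text assigns to Extension Information: a holding category for elements that fit none of the other four.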
In the second stage, existing metadata standards were reused and mapped. To ensure the CCACP metadata framework complies with current relevant domestic standards and facilitates broad international dissemination, its construction primarily involved the reuse and mapping of international general standards such as CIDOC CRM (version 7.2.3), CDWA, and Dublin Core. Domestic standards referenced include: Metadata Standard for Digital Preservation of Cultural Relics of the PRC46, Design Specification for Specialized Metadata of Cultural Relics Protection Industry Standard of the PRC, Application Specification for Descriptive Metadata of Cultural Relics of the PRC47, Cataloging Rules for Painting Cultural Relics Metadata of the PRC48, Metadata Standard for Painting Cultural Relics of the PRC49, and the Palace Museum Painting Collection Information Indicators50. The framework was expanded to accommodate the unique management and research characteristics of the CCACP. The reuse, mapping, and extension of the aforementioned metadata can be categorized into macro, meso, and micro perspectives based on their scope and characteristics.
From a macro perspective, the reuse and mapping focus on the metadata frameworks in the international cultural heritage domain, forming the main content of the CCACP metadata construction. Their reuse ensures the universality of the CCACP metadata model for broad dissemination within the international cultural heritage field, including CDWA, CIDOC CRM, and DC (Dublin Core). As mentioned in the background section, the DC element set, as the earliest international metadata standard, has 15 basic descriptive items. It is a simple, effective, and widely disseminated core element set. In practical applications, these 15 items can be repeated or selectively used, and subtypes and subschemas can be established, thereby offering strong interoperability and operability in resource exchange and sharing. Therefore, this paper mapped the following items from DC: Title, Description, Source, Relation, Creator, Date, Type, and Identifier. CIDOC-CRM is the core object of reuse in constructing the CCACP metadata model. It is a conceptual reference model for information integration first developed in 1996 by the International Committee for Documentation (CIDOC) under the International Council of Museums (ICOM), specifically for the cultural heritage domain. It aims to promote a common international understanding of cultural heritage information by providing a universal and extensible semantic framework2,51. After over 20 years of maintenance and development, the latest version, which is the version reused and mapped in this paper, is 7.2.3, released in August 2023, containing 99 classes and 199 property descriptions. CDWA is a metadata standard designed for art historians, art managers, and information technology experts and is currently widely used in the museum management field. Like CIDOC CRM, CDWA also provides mapping tables with other metadata standards, laying the foundation for data exchange and sharing52. 
The CDWA metadata standard contains 31 top-level elements and sub-elements, 13 of which are core elements, totaling ~540 elements. However, the international DC, CIDOC CRM, and CDWA element sets are macro-level standards for cultural heritage and museums in general, whereas the CCACP metadata framework is specific to painting and museum management and must support complex related research. They therefore cannot be reused in full; they must be complemented by meso-level domestic cultural relic standards and collection information index specifications, and by micro-level painting-specific metadata standards, for reuse and extension.
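As a rough illustration of the Dublin Core mapping described above, the sketch below projects a CCACP-style record onto DC element URIs. The eight mapped DC elements come from the text; the pairing of CCACP field names to DC elements in `CCACP_TO_DC` is a hypothetical assumption for illustration (the paper's actual mapping is given in its tables), as are the function and variable names.

```python
# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The eight DC elements the text reports as mapped.
MAPPED_DC_ELEMENTS = (
    "title", "description", "source", "relation",
    "creator", "date", "type", "identifier",
)

# Hypothetical pairing of CCACP field names to DC elements (assumption).
CCACP_TO_DC = {
    "Title": "title",
    "Author": "creator",
    "Period": "date",
    "Identifier": "identifier",
    "Remarks": "description",
}


def to_dublin_core(record: dict) -> dict:
    """Project a CCACP-style record onto DC element URIs, dropping unmapped fields."""
    out = {}
    for field_name, value in record.items():
        dc_element = CCACP_TO_DC.get(field_name)
        if dc_element is not None:
            out[DC_NS + dc_element] = value
    return out


record = {
    "Title": "A Thousand Li of Rivers and Mountains",
    "Author": "Wang Ximeng",
    "Period": "Song",
    "Collection Grade": "placeholder",  # no DC counterpart in this sketch
}
dc_record = to_dublin_core(record)
```

Fields without a DC counterpart are dropped here, which is one way to picture why the macro-level standards alone cannot carry the full CCACP schema.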
From a meso perspective, reference is made to the Cultural Relics Protection Industry Standards of the People’s Republic of China and the Collection Information Index specifications. Both are industry standards for cultural relic protection issued by the National Cultural Heritage Administration, possessing authority, professionalism, and specificity in the information resource management of cultural relics. Reusing and mapping these metadata standards ensures the professionalism of the CCACP metadata model and its universality within the cultural relic domain, although they still have certain limitations in describing painting cultural relics. The Museum Information Indicator System Specification (Trial) comprises 3 index sets, 33 index groups, and 139 index items. It aims to meet the needs of information construction in Chinese cultural relic museums and to standardize the processing and exchange of museum collection information, making it suitable for constructing metadata for paintings that are cultural relics in China. Therefore, in addition to reusing the museum information metadata standards, the utilization metadata required for using collection information resources is integrated and supplemented.
From a micro perspective, reuse and extension involves the relevant metadata specifications for paintings in the People’s Republic of China, primarily concerning the metadata standard and cataloging rules for paintings that are cultural relics. Compared to the macro- and meso-metadata specifications mentioned above, these are more targeted. Here, reuse means directly quoting content from the standard, mapping means extending based on the standard, and reference means drawing on the normative content. The details are shown in Table 1:
In the third stage, core metadata elements were extracted to form the foundation for subsequent ontology construction. The core metadata of the CCACP, as the basic attributes of digital resources, form the core foundation for building the domain ontology. In this study, CCACP refers to a series of collection books that systematically include important painting works from various dynasties and their related information. The ontology constructed in this paper uses the content of this series of books as the data source, abstracting the information into metadata elements and categories, thereby ensuring the integrity and academic reference value of the ontology.
As mentioned earlier, macro, meso, and micro metadata standards have their respective advantages in generality and professionalism and complement each other to a certain extent. Therefore, this study, based on integrating and refining them, extracts a set of core metadata elements as the core metadata for the CCACP. Table 2 displays the categories, names, definitions, and mapping relationships with reused metadata. The metadata architecture listed in Table 2 is divided into five categories: Basic Information Metadata, Management Information Metadata, Resource Information Metadata, Research Information Metadata, and Extension Information Metadata. The basis for this categorization includes two primary aspects: first, a systematic review of existing metadata standards (e.g., CIDOC CRM, CDWA, and Dublin Core) and related literature; second, consideration of the practical application needs of CCACP information. This classification ensures that the constructed core metadata is both scientific and complete, while also possessing adequate applicability and extensibility. Abbreviations and term explanations in the table are as follows:
In the fourth stage, the aforementioned 24 core metadata elements were extended along multiple information dimensions to create a CCACP metadata schema containing 24 main elements and 60 sub-elements (Table 3). The content of the metadata schema is as follows:
The first part is Basic Information Metadata, involving the title, period, author, dimensions, etc., of the painting itself. When constructing the metadata model, considering that a painting’s name might have an academic Chinese name, an academic English name, and some popular names, the Title element was subdivided into three types. Regarding the painting’s period, to ensure the accuracy of metadata information, the Period element is divided into Macro Period and Micro Period; the former refers to the dynasty (e.g., Tang, Song, Yuan, Ming, or Qing), and the latter refers to the specific year (if available). Besides the author, period, size, etc., a painting also involves its material, technique, subject matter, inscriptions, and seals. Considering element consistency, these were uniformly categorized under the “Physical Information” element when constructing the CCACP Basic Information Metadata. Additionally, an element often overlooked but important in Basic Information Metadata is Remarks, which records uncertainties or discrepancies in the descriptions above. For example, if the painting’s author was not clearly identified initially and was recorded as Anonymous in the CCACP compilation, but later research clarified the author, or if there are rumors about the author, such records can be noted in Remarks to clarify uncertain information.
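The Basic Information elements described above could be modeled roughly as follows. This is a sketch under stated assumptions: the class and field names are hypothetical, chosen to mirror the three title types, the Macro/Micro Period split, the grouped Physical Information, and the Remarks element; it is not the schema published in Table 3.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Title:
    academic_chinese: str
    academic_english: Optional[str] = None
    popular_names: List[str] = field(default_factory=list)


@dataclass
class Period:
    macro: str                   # dynasty, e.g. "Song"
    micro: Optional[int] = None  # specific year, if available


@dataclass
class PhysicalInformation:
    material: Optional[str] = None
    technique: Optional[str] = None
    subject_matter: Optional[str] = None
    inscriptions: List[str] = field(default_factory=list)
    seals: List[str] = field(default_factory=list)


@dataclass
class BasicInformation:
    title: Title
    period: Period
    author: str = "Anonymous"    # default when attribution is unknown
    physical: PhysicalInformation = field(default_factory=PhysicalInformation)
    remarks: List[str] = field(default_factory=list)  # uncertainties, later findings


painting = BasicInformation(
    title=Title(
        academic_chinese="千里江山图",
        academic_english="A Thousand Li of Rivers and Mountains",
    ),
    period=Period(macro="Song"),
    author="Wang Ximeng",
)

# A hypothetical anonymous work whose attribution is later questioned.
unattributed = BasicInformation(title=Title("佚名画作"), period=Period("Yuan"))
unattributed.remarks.append("Recorded as Anonymous; later research proposes an author.")
```

Keeping Remarks as a free-text list, rather than overwriting the Author field, preserves both the original catalog record and the later scholarship.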
The second part is Management Information Metadata, primarily involving the museum’s management of the painting, here understood as the museum’s collection. It includes information data related to the identifier, the collection grade, preservation, exhibition, and collection sources. Among these, the Identifier is a coding system used to identify each collection item uniquely, ensuring accurate identification and tracking within the museum and in cross-institutional exchanges. A collection item may have a registration number recorded in the general ledger of cultural institution collections, as well as a number assigned by the museum according to its specific requirements. Therefore, the Identifier element is subdivided into Formal Identifier and Other Identifier categories; multiple identifiers can be extended under Other Identifier. The Preservation element mainly contains information data about the painting’s storage location, circulation records, and preservation conditions. The Exhibition element contains records of the painting being loaned to other exhibitors for exhibitions, involving the exhibitor, the exhibition venue, the exhibition title, and the exhibition start and end times. Furthermore, another important metadata element concerning collection management is the record of Condition, involving the recording of the collection item’s completeness, damage status, and location of damage, as well as records of the restorer, restoration time, restoration institution, restored area, restoration technique, and results, which can serve as references for subsequent painting image restoration.
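A comparable sketch for part of the Management Information Metadata is given below, covering the Formal/Other Identifier split and the exhibition loan record. All class names, field names, and sample values are hypothetical placeholders; the date check is an added assumption about a sensible validation rule, not something the source specifies.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Identifier:
    formal: str                                            # general-ledger registration number
    others: Dict[str, str] = field(default_factory=dict)   # scheme name -> number


@dataclass
class ExhibitionLoan:
    exhibitor: str
    venue: str
    title: str
    start: date
    end: date

    def __post_init__(self) -> None:
        # Assumed rule: an exhibition cannot end before it starts.
        if self.end < self.start:
            raise ValueError("exhibition end date precedes start date")


@dataclass
class ManagementInformation:
    identifier: Identifier
    collection_grade: Optional[str] = None
    exhibitions: List[ExhibitionLoan] = field(default_factory=list)


# Placeholder values for illustration only.
item = ManagementInformation(
    identifier=Identifier(formal="REG-0001", others={"museum_internal": "M-42"}),
)
item.exhibitions.append(
    ExhibitionLoan(
        exhibitor="Example Museum",
        venue="Example Hall",
        title="Example Exhibition",
        start=date(2024, 4, 1),
        end=date(2024, 6, 30),
    )
)
```

Storing the other identifiers as a mapping lets additional numbering schemes be added under Other Identifier without changing the record structure.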
The third part is Resource Information Metadata. This part primarily records material content related to the painting, enriching the CCACP database through records of material category, format, size, and title information.
The fourth part is Research Information Metadata. This part is based on the outcomes related to the CCACP and research analysis conducted on the painting itself. Examples include compiling data visualization analysis based on the CCACP; conducting color analysis on the painting works included in the existing CCACP to construct a CCACP painting color database platform; building a human-machine collaborative intelligent ancient painting color restoration system based on artificial intelligence generated content (AIGC) technology and large language models; and conducting material culture analysis based on CCACP paintings. Macroscopically, the outcomes related to the CCACP are divided into three major categories: Research Projects, Publications, and Digital Projects. Under each category, sub-elements such as Researcher, Institution, Time, Name, and Outcome are added based on specific information recording needs.
The fifth part is Extension Information Metadata. Although currently empty, this category is included in the initial construction of the CCACP metadata model to accommodate future element sets that may require documentation, as well as any data that does not fit into the other four categories.
Ontological conceptual model construction for the CCACP
Since the mid-1970s, researchers in the field of artificial intelligence have recognized that knowledge acquisition is the key to building powerful AI systems. Consequently, ontology, as a tool for information abstraction and knowledge description, began to be adopted in the computer field. The ontology discussed in this paper refers specifically to ontology in the information science field. Synthesizing definitions from multiple scholars15, the author comprehensively defines it as an explicit formal specification of a shared conceptual model based on the basic terms and relations that constitute the vocabulary of a relevant domain. Some scholars believe that the term ontology, borrowed from philosophy and extended into information science, is essentially a conceptual model; hence, ontology can also be referred to as an ontological model. The process of building an ontology is ontology engineering. The process of constructing the conceptual model for the CCACP in this paper is thus ontology engineering.
Ontology engineering has developed a variety of construction methods in response to different domain needs. Currently, mainstream international construction methods include the seven-step method (for domain ontology construction), Methontology (for chemical ontology modeling), the KACTUS project method (for knowledge modeling of complex technical systems), the Toronto virtual enterprise (TOVE) method, and the skeleton method (for commercial ontology construction), among others. The latter four were designed for specific domains and are not entirely universal47. The seven-step method, developed by Stanford University, is universal and applicable to ontology construction across various fields. Figure 1 displays its basic flowchart. In this study, the seven-step method is selected as the approach for constructing the ontological conceptual model of the CCACP.
In the first stage, selection of ontology modeling software. Ontology modeling software is used to design and manage ontologies. Some common international software includes Protégé, TopBraid Composer, OntoStudio, Semantic Turkey, and VIVO, chosen based on user needs and the scale and complexity of the ontology. This paper ultimately selected Protégé as the ontology modeling software platform because it is open-source, free, and has sustainable maintenance advantages; it also features an intuitive graphical interface capable of supporting the modeling needs of large-scale cultural heritage projects; and, relying on an active international community and a rich plugin ecosystem, it offers good extensibility and shareability, better aligning with the goals and application scenarios of this study.
In the second stage, constructing the ontology model using the stanford seven-step method. As mentioned in the Related Background section, this paper selects the seven-step method developed by Stanford University as the construction method for ontology engineering. The specific steps are as follows.
First, the domain and scope of the ontology were defined to delimit the conceptual boundaries of the CCACP dataset. Clarifying the professional domain and scope at the outset ensures that the ontological model aligns with the content and structure of the discipline. The CCACP ontology domain covers paintings, collection management, exhibitions, and research outcomes, including but not limited to the basic, research, and management information of paintings. Its goal is to build instance models of specific paintings, enable semantic search and visual graph associations for painting works, and support subsequent knowledge graph construction.
Second, existing ontologies such as CIDOC CRM and Dublin Core were examined to identify reusable classes and properties. As mentioned in the Metadata Construction section, CIDOC CRM provides a universal metadata standard for the international cultural heritage domain; it contains standardized definitions of classes (entities) and also specifies their properties and relationships. Taking temporal information as an example, E1 CRM Entity is the top-level abstraction of all concepts; E2 Temporal Entity describes phenomena occurring at specific times and places; E41 Time Interval represents a specific time span; E49 Time Coordinate corresponds to a more precise time point; and properties such as P1 (is identified by), P4 (has time span), and P9 (occurs in) connect the classes. Figure 2 illustrates the semantic relationships between these classes and properties, with arrow directions indicating the direction of semantic association. Starting from E1 CRM Entity at the bottom of Fig. 2, core classes such as E2 Temporal Entity, E3 Event Activity, and E4 Acquisition Entity are derived in sequence, and semantic associations between classes are established through the properties listed above. For instance, E2 Temporal Entity can be associated with E41 Time Interval via P4, which can in turn be associated with E49 Time Coordinate via P86 (falls within), forming a complete temporal description chain. The directed arrows express the hierarchical structure and property relationships between entities, reflecting the rigor and expressiveness of CIDOC CRM in semantic modeling. As a universal model in the international cultural heritage domain, CIDOC CRM has been widely applied in archeology, museums, and the semantic description of cultural heritage.
Therefore, this paper primarily reuses and extends the classes and properties of CIDOC CRM ver. 7.2.3 when constructing the CCACP conceptual model to ensure international compatibility and semantic interoperability.
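The temporal description chain described above can be sketched in a few lines of Python as subject-property-object triples. This is only an illustration, not the model built in Protégé: the class labels are copied from the text, the property identifiers P4 and P86 are used as plain keys, and the instance names (creation_event, interval_1, coordinate_1) and the helpers follow and type_of are hypothetical.

```python
# Triples (subject, property, object) for the temporal chain in the text.
# Instance names are hypothetical; class labels are copied from the text.
TRIPLES = [
    ("creation_event", "type", "E2 Temporal Entity"),
    ("creation_event", "P4", "interval_1"),        # P4: has time span
    ("interval_1", "type", "E41 Time Interval"),
    ("interval_1", "P86", "coordinate_1"),         # P86: falls within
    ("coordinate_1", "type", "E49 Time Coordinate"),
]


def follow(start, properties, triples=TRIPLES):
    """Walk a sequence of properties from `start`; return the node reached, or None."""
    node = start
    for prop in properties:
        targets = [o for s, p, o in triples if s == node and p == prop]
        if not targets:
            return None  # the chain is broken at this property
        node = targets[0]
    return node


def type_of(node, triples=TRIPLES):
    """Return the class asserted for `node`, or None if no type is recorded."""
    for s, p, o in triples:
        if s == node and p == "type":
            return o
    return None


end = follow("creation_event", ["P4", "P86"])
print(end, "has class", type_of(end))
```

Walking P4 and then P86 from the temporal entity reaches the time coordinate, which is the chain the paragraph describes; skipping P4 breaks the chain and follow returns None.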
Third, a comprehensive list of key terms related to Chinese painting, such as artist, dynasty, material, and motif, was enumerated to establish the conceptual vocabulary. Before the CCACP ontology model is constructed, relevant domain knowledge must be collected so that important information and terms can be extracted. The important terms of the ontology correspond to the elements of the CCACP metadata model. This step primarily references CIDOC CRM, the Metadata Standard for Painting Cultural Relics of the PRC, and the Application Specification for Descriptive Metadata for Cultural Relics of the PRC.
Fourth, classes and their hierarchies were defined to reflect both general cultural heritage structures and the specific semantics of painting metadata. The terms listed in the previous step are elaborated into classes and subclasses according to the specific content of the application objects. In Protégé, the classes and class hierarchy are defined according to the CCACP metadata table from section “Metadata construction for the CCACP”, as shown in Fig. 3.
Fifth, the properties of classes were specified to describe relationships. In ontology modeling, properties are divided into object properties and data properties. An object property links one instance to another, e.g., Painting–Creator; a data property links an instance to a literal value such as a date, number, or string, e.g., Painting–Creation Date. This paper establishes the object properties and data properties of the CCACP conceptual model on the Protégé platform (see Table 4). Figure 3 shows the relationship between these two types of properties in the model and their semantic expression.
Sixth, property constraints were introduced to ensure logical consistency and semantic precision across the model. Constraining a property in the CCACP ontology model means defining its domain (the class of instances that may carry the property) and its range (the class or datatype of its values), as detailed in Table 5.
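The difference between the two kinds of property can be made concrete with a minimal Python sketch. It assumes nothing about the actual entries of Table 4: the property names hasCreator and creationDate and the classes Individual and Assertion are hypothetical, while the painting, its creator, and its period come from the case discussed in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Individual:
    """A named instance of an ontology class."""
    name: str
    cls: str


@dataclass(frozen=True)
class Assertion:
    """One property assertion about an individual."""
    subject: Individual
    prop: str
    value: object  # an Individual (object property) or a literal (data property)

    @property
    def kind(self) -> str:
        # An object property points at another individual;
        # a data property points at a literal value.
        if isinstance(self.value, Individual):
            return "object property"
        return "data property"


painting = Individual("A Thousand Li of Rivers and Mountains", "Painting")
creator = Individual("Wang Ximeng", "Creator")

assertions = [
    Assertion(painting, "hasCreator", creator),            # Painting-Creator
    Assertion(painting, "creationDate", "Northern Song"),  # Painting-Creation Date
]

for a in assertions:
    print(f"{a.prop}: {a.kind}")
```

The only thing that separates the two cases in this sketch is the type of the value, which mirrors how Protégé keeps object properties and data properties in separate hierarchies.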
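A domain and range declaration can be read as a simple check on each assertion, as in the Python sketch below. The property names, class names, and subclass link are hypothetical stand-ins, since the actual constraints are those listed in the paper's table. The sketch also takes a closed-world validation view for clarity; OWL reasoners instead treat domain and range as axioms from which the types of subjects and objects are inferred.

```python
# Hypothetical domain/range declarations: property -> (domain, range).
CONSTRAINTS = {
    "hasCreator": ("Painting", "Creator"),
    "collectedBy": ("Painting", "Institution"),
}

# Hypothetical subclass links: child class -> parent class.
SUBCLASS_OF = {"LandscapePainting": "Painting"}


def is_a(cls, target):
    """True if `cls` equals `target` or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == target:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def violations(triples, types):
    """List every triple whose subject or object breaks its property's domain or range."""
    problems = []
    for s, p, o in triples:
        if p not in CONSTRAINTS:
            problems.append((s, p, o, "undeclared property"))
            continue
        domain, rng = CONSTRAINTS[p]
        if not is_a(types.get(s), domain):
            problems.append((s, p, o, f"subject is not a {domain}"))
        if not is_a(types.get(o), rng):
            problems.append((s, p, o, f"object is not a {rng}"))
    return problems


types = {"thousand_li": "LandscapePainting", "wang_ximeng": "Creator"}
good = [("thousand_li", "hasCreator", "wang_ximeng")]
bad = [("wang_ximeng", "hasCreator", "thousand_li")]  # subject and object swapped

print(violations(good, types))
print(violations(bad, types))
```

The well-formed triple passes because LandscapePainting is a subclass of the declared domain; the swapped triple is reported twice, once for the domain and once for the range.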
Finally, instances were created based on real examples from the CCACP to test, validate, and refine the ontology through iterative feedback. The selection of instances follows the principles of the model’s universality and breadth, covering multiple dynasties and types of paintings. Representative works are selected from the Complete Tang and Pre-Tang Paintings, Complete Song Paintings, Complete Yuan Paintings, Complete Ming Paintings, and Complete Qing Paintings included in the CCACP, encompassing landscape paintings, figure paintings, gongbi paintings, bird-and-flower paintings, etc., and added to the conceptual model of this study, thus covering almost all important historical periods and categories. Table 5 shows the basic information about the instances.
The third stage was graphical presentation. After the CCACP ontology was constructed and instances were added, Protégé generated a knowledge graph from the class and property relationships, providing a relatively intuitive view of the hierarchy of classes and subclasses and of their property associations. Figure 4 shows the knowledge graph generated by Protégé from the ontological model, where solid lines with arrows indicate the hierarchical relationship between a class and its subclass, and dashed lines indicate object properties. The hierarchical relationship between classes and subclasses can also be seen directly in the asserted hierarchy (Fig. 5).
Results
Case selection
To verify the internal logical soundness and feasibility of the CCACP ontological conceptual model and to demonstrate the rich knowledge associations it captures, 20 paintings of various types, ranging from the pre-Qin, Han, and Tang periods to the Qing Dynasty, were selected, as mentioned above. Based on the model, their basic information, such as author name, current collection location, period, and material type, was used for preliminary knowledge graph construction in Neo4j. Wang Ximeng’s A Thousand Li of Rivers and Mountains (section) from the Song Dynasty serves as the specific example for presentation. It is one of the representative works of Northern Song blue-green landscape painting and one of China’s most renowned paintings handed down through the ages. It was created by the Northern Song painter Wang Ximeng and is his only extant work. In a long scroll format, the painting depicts the magnificent scenery of the country’s rivers and mountains: rolling hills and vast rivers and lakes, interspersed with pavilions, towers, villages, and houses, expressing the beauty and grandeur of the natural landscape.
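As a rough illustration of how such basic information could be loaded into Neo4j, the Python sketch below turns one painting record into a parameterized Cypher statement. It is not the import procedure used in this study: the node labels, relationship types, field names, and the function painting_to_cypher are hypothetical, and the material and location values are filled in for illustration rather than taken from the paper's dataset.

```python
# Hypothetical mapping: record field -> (node label, relationship type).
FIELDS = {
    "author": ("Author", "CREATED_BY"),
    "period": ("Period", "DATED_TO"),
    "material": ("Material", "MADE_OF"),
    "location": ("Institution", "HELD_BY"),
}


def painting_to_cypher(record):
    """Build a Cypher statement and its parameters for one painting record.

    MERGE is used instead of CREATE so that authors, periods, materials,
    and institutions shared by several paintings become a single node each.
    """
    lines = ["MERGE (p:Painting {title: $title})"]
    for field, (label, rel) in FIELDS.items():
        if record.get(field):  # skip fields that are missing or empty
            lines.append(f"MERGE ({field}:{label} {{name: ${field}}})")
            lines.append(f"MERGE (p)-[:{rel}]->({field})")
    params = {k: v for k, v in record.items() if k == "title" or k in FIELDS}
    return "\n".join(lines), params


RECORD = {
    "title": "A Thousand Li of Rivers and Mountains",
    "author": "Wang Ximeng",
    "period": "Northern Song",
    "material": "silk",              # illustrative value
    "location": "The Palace Museum",  # illustrative value
}

query, params = painting_to_cypher(RECORD)
print(query)
```

With the official Neo4j Python driver, the returned pair could be passed to a session as session.run(query, params); passing values as parameters rather than formatting them into the query text avoids quoting problems in titles and names.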
Graph presentation and application
Figure 6 presents the knowledge graph structure of the basic information of the 20 paintings. The ontological conceptual model of the CCACP lays the foundation for subsequent knowledge graph construction. Figure 7 shows the structure of the knowledge graph for A Thousand Li of Rivers and Mountains, demonstrating the application of the CCACP ontological model in a specific case. Because this paper focuses on the ontological construction of the CCACP and space is limited, only representative metadata and their intrinsic relationships are annotated in this example. In the related resources section, the graph presents the digital animation video of A Thousand Li of Rivers and Mountains from the China Media Group’s Yangbo Digital Culture and Art Museum, with the online link placed in the Other Supplementary Information section. In the Research Project section, the 2023 National Social Science Fund Arts Major Project “Research on Value Interpretation and Protection Inheritance of Jin, Tang, Song, and Yuan Paintings and Calligraphy” is selected. For the digital project, the “Exploring Danqing” digital project published by the Forbidden City Publishing House in 2023 is selected. Overall, these knowledge graphs give a clearer view of the rich knowledge system and internal logic of the CCACP ontological model.
Figure 8 presents the diverse applications of the CCACP ontological conceptual model along three dimensions: cultural relic collection and management, cultural relic dissemination, and cultural relic research.
- With respect to cultural relic collection and management, integrating museum collection and digital cultural relic management concepts with knowledge graph technology enables systematic management of the paintings in a collection, improving the efficiency and rigor of the collection process. It also supports efficient storage, precise querying, and in-depth analysis of cultural relic information, providing solid support for cultural relic protection.
- At the level of cultural relic dissemination, the focus is on exhibition planning and display and on education and popularization. Knowledge graphs provide data support and a logical structure for exhibition planning, helping to create richer exhibitions with broader perspectives and to improve the audience’s viewing experience. Educational activities built on knowledge graphs can present painting knowledge in an accessible form, raising public awareness of cultural heritage and its protection and promoting its broad dissemination.
- In the field of cultural relic research, the emphasis is on supporting academic research and interdisciplinary collaboration. On the one hand, the model provides scholars with comprehensive and systematic data resources for exploring the value of cultural relics in depth. On the other hand, it crosses disciplinary boundaries, promotes collaborative innovation across fields, and brings new vitality to cultural heritage research and its theoretical development.
Discussion
This paper takes the CCACP as its research object and, addressing its needs for informatized management and academic research development, constructs the “CCACP Metadata” and the “CCACP Ontological Conceptual Model”. Paintings from different eras and of different types, such as landscape paintings, figure paintings, gongbi paintings, and bird-and-flower paintings, are selected from the Complete Tang and Pre-Tang Paintings, Complete Song Paintings, Complete Yuan Paintings, Complete Ming Paintings, and Complete Qing Paintings included in the CCACP and added to the conceptual model of this study. Finally, using the Song Dynasty’s A Thousand Li of Rivers and Mountains as a specific example, the model is visually presented, demonstrating the ontological structure of the CCACP and the relationships between its various types of information.
Ancient paintings contain the genetic code of the continuous inheritance of Chinese civilization and are an extremely important component of outstanding traditional Chinese culture. CCACP, as a long-term, foundational national cultural project, holds significant importance for promoting the creative transformation and innovative development of outstanding traditional Chinese culture through its informatization construction and academic research.
The main contribution of this paper lies in introducing a method based on metadata and ontology into the informatized management and research practices of the CCACP, providing a new solution and a practical case. Specifically, the ontological conceptual model constructed in this study lays the data foundation for subsequent knowledge graph construction and improves the organization, management, and dissemination of painting resources, which in turn supports research efficiency and data sharing. However, the metadata and ontological structure proposed in this paper still require continuous improvement, because the CCACP encompasses a vast number of works scattered across major museums worldwide and the related academic research keeps growing. This study marks the start of a long-term academic project; the constructed metadata and ontological model will provide structural support for future knowledge graph research on important ancient Chinese painters, paintings, material elements, and spiritual elements.
The academic value of this research is reflected not only in the optimization of the CCACP’s own informatized management and research practices but also in its promoting effect on broader academic fields. At the vertical research level, this ontological model provides a reusable conceptual framework for global ancient painting research, helping to promote the systematization and standardization of ancient painting research worldwide. At the horizontal research level, this model provides a reference paradigm for other projects in the digital cultural heritage field, promoting the integration and sharing of cultural heritage information resources across institutions and disciplines, and facilitating collaborative innovation in cultural heritage protection. In the future, this research framework can be further extended to the management and utilization of relevant cultural relic information resources domestically and internationally, thereby promoting the standardized development of global cultural heritage information management and providing new methodological support for the digital protection and transmission of cultural heritage.
In terms of application evaluation, the multi-instance construction and visual demonstration indicate that the ontological conceptual model is feasible for information organization, semantic association, and knowledge presentation, and they show its potential value for academic research and digital management. However, given the vast number of works covered by the CCACP and their dispersion across museums worldwide, future research will focus on the following directions:
- Inviting art historians, curators, and experts in related fields to review the ontological model and evaluate its academic soundness and suitability for application;
- Designing competency questions and system tests to verify the model’s operability in practical data retrieval, semantic query, and knowledge graph construction;
- Further extending the model to other domestic and international digital cultural heritage projects, exploring methods for ontology reuse and cross-domain knowledge integration, and providing new theoretical and practical support for the standardization and sustainable development of global cultural heritage information management.
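The second direction above, competency questions, can be pictured with a small Python sketch: a question is phrased as a query over the instance data, and the model passes if the query returns the expected answer. The property names hasCreator, hasPeriod, and hasCategory and the helper functions are hypothetical; only the facts about the example painting are taken from this paper.

```python
# Instance data as (subject, property, object) triples; property names are hypothetical.
TRIPLES = [
    ("A Thousand Li of Rivers and Mountains", "hasCreator", "Wang Ximeng"),
    ("A Thousand Li of Rivers and Mountains", "hasPeriod", "Northern Song"),
    ("A Thousand Li of Rivers and Mountains", "hasCategory", "landscape painting"),
]


def match(pattern, triples):
    """Return the triples that fit an (s, p, o) pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if all(want is None or want == got for want, got in zip(pattern, t))
    ]


def creators_of_period(period, triples=TRIPLES):
    """Competency question: who created the paintings dated to `period`?"""
    paintings = {s for s, _, _ in match((None, "hasPeriod", period), triples)}
    creators = {
        o for s, _, o in match((None, "hasCreator", None), triples)
        if s in paintings
    }
    return sorted(creators)


print(creators_of_period("Northern Song"))
```

A question that the data cannot answer, for example because no property connects a painting to its period, returns an empty list, which signals a gap in either the model or the instance data.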
In summary, this study not only provides methodological support for the informatized management and academic research of CCACP but also offers a practical paradigm worthy of reference for cultural heritage digitization, interdisciplinary information integration, and knowledge graph construction, providing a new academic path and methodological basis for promoting the digital protection and transmission of outstanding traditional Chinese culture.
Data availability
The data used and analyzed during the study are available from the corresponding author upon reasonable request.
References
Asundi, A. Y., Reddy, B. S. & Krishnamurthy, M. Digital humanities: concepts, tools and applications. DESIDOC J. Libr. Inf. Technol. 43, 276–281 (2023).
Melo, D., Rodrigues, I. P. & Varagnolo, D. A strategy for archives metadata representation on cidoc-crm and knowledge discovery. Semant. Web 14, 553–584 (2023).
Li, Z., He, L. & Gao, D. Ontology construction and evaluation for chinese traditional culture: Towards digital humanity. Knowl. Organ. 49, 22–39 (2022).
Compilation of classics in the flourishing age: The exhibition of achievements in compiling a comprehensive collection of ancient chinese paintings. National Museum of China website. https://en.chnmuseum.cn/exhibition/exhibition_series/temporary_exhibitions/classical_art_exhibitions/202210/t20221009_257470.html (2022).
O’Neill, B. & Stapleton, L. Digital cultural heritage standards: from silo to semantic web. AI Soc. 37, 891–903 (2022).
Moraitou, E., Christodoulou, Y. & Caridakis, G. Semantic models and services for conservation and restoration of cultural heritage: A comprehensive survey. Semant. Web 14, 261–291 (2022).
Wang, X., Song, N., Liu, X. & Xu, L. Data modeling and evaluation of deep semantic annotation for cultural heritage images. J. Doc. 77, 906–925 (2020).
Barzaghi, S., Moretti, A., Heibi, I. & Peroni, S. CHAD-KG: a knowledge graph for representing cultural heritage objects and digitisation paradata. Preprint at https://doi.org/10.48550/arXiv.2505.13276 (2025).
Padfield, J., Kontiza, K., Bikakis, A. & Vlachidis, A. Semantic representation and location provenance of cultural heritage information: the National Gallery Collection in London. Heritage 2, 648–665 (2019).
Langmead, A., Otis, J. M., Warren, C. N., Weingart, S. B. & Zilinksi, L. D. Towards interoperable network ontologies for the digital humanities. IJHAC 10, 22–35 (2016).
Berry, D. Understanding Digital Humanities (Springer, 2012).
Rowley, J. The wisdom hierarchy: representations of the DIKW hierarchy. J. Inf. Sci. 33, 163–180 (2007).
Shreeves, S. L. & Cragin, M. H. Introduction: Institutional repositories: current state and future. Libr. Trends 57, 89–97 (2008).
Dai, T. Research and application of digital image metadata for museum cultural relics. Museum Manag. 11–17 (2020).
Ranjgar, B., Sadeghi-Niaraki, A., Shakeri, M., Rahimi, F. & Choi, S.-M. Cultural heritage information retrieval: Past, present, and future trends. IEEE Access 12, 42992–43026 (2024).
Deng, Z. H. & Tang, S. W. A review of ontology research. J. Peking Univ. (Nat. Sci. Ed.) 38, 730 (2002).
Hua, K. M., Chen, J. X. & Yang, H. S. Semantic retrieval based on ontology and metadata. Comput. Eng. 33, 220–221 (2007).
Zhang, J. & Ren, T. A conceptual model for ancient chinese ceramics based on metadata and ontology: a case study of collections in the nankai university museum. J. Cult. Herit. 66, 20–36 (2024).
Hogan, A. et al. Knowledge graphs. ACM Comput. Surv. 54, 71:1–71:37 (2021).
Koho, M. et al. Warsampo knowledge graph: Finland in the second world war as linked open data. Semant. Web 12, 265–278 (2021).
Beijie, H. E., Yu, Z., Jie, H. E. & Zhaoyi, M. A. Ontology modelling of ancient map information through a cognition-practical model: a case study of the Yangshi Lei Archives. Geomat. Inf. Sci. Wuhan Univ. 49, 546–561 (2024).
Yang, S. & Hou, M. Knowledge graph representation method for semantic 3d modeling of chinese grottoes. Herit. Sci. 11, 1–26 (2023).
Ranjgar, B., Sadeghi-Niaraki, A., Shakeri, M. & Choi, S.-M. An ontological data model for points of interest (POI) in a cultural heritage site. Herit. Sci. 10, 1–22 (2022).
Castelli, L., Felicetti, A. & Proietti, F. Heritage science and cultural heritage: standards and tools for establishing cross-domain data interoperability. Int. J. Digit. Libr. 22, 279–287 (2021).
Hiebel, G., Goldenberg, G., Grutsch, C., Hanke, K. & Staudt, M. Fair data for prehistoric mining archaeology. Int. J. Digit. Libr. 22, 267–277 (2021).
Gergatsoulis, M., Papaioannou, G., Kalogeros, E. & Carter, R. Representing archaeological excavations using CIDOC CRM–based conceptual models. In Metadata and Semantic Research (MTSR 2020) (eds Ovalle-Perandones, M. A. & Garoufallou, E.) 355–366 (Springer, 2021).
Eichert, S. Digital mapping of medieval cemeteries: case studies from Austria and Czechia. ACM J. Comput. Cult. Herit. 14, 1–15 (2021).
Fafalios, P., Kritsotaki, A. & Doerr, M. The sealit ontology—an extension of CIDOC-CRM for the modeling and integration of maritime history information. ACM J. Comput. Cult. Herit. 16, 60:1–60:21 (2023).
Cheng, Y.-M., Kuo, C.-L. & Mou, C.-C. Ontology-based HBIM for historic buildings with traditional woodwork in Taiwan. J. Civ. Eng. Manag. 27, 27–44 (2021).
Ronzino, P., Toth, A. & Falcidieno, B. Documenting the structure and adaptive reuse of Roman amphitheatres through the CIDOC CRMBa model. ACM J. Comput. Cult. Herit. 15, 1–23 (2022).
Dammag, B. Q. D. et al. Modeling ontology-based decay analysis and HBIM for the conservation of architectural heritage: the big gate and adjacent curtain walls in Ibb, Yemen. Buildings 15, 2795 (2025).
Vrachliotou, M. & Papatheodorou, C. Interoperability of oral history metadata: an ontological model. Oral. Hist. Rev. 51, 446–474 (2024).
Vrachliotou, M. & Papatheodorou, C. Ontology-based metadata integration for oral history interviews. In Linking Theory and Practice of Digital Libraries (TPDL 2022) Vol. 13541 (eds Silvello, G. et al.) 410–416 (Springer International Publishing AG, 2022).
Hiebel, G., Aspöck, E. & Kopetzky, K. Ontological modeling for excavation documentation and virtual reconstruction of an ancient Egyptian site. J. Comput. Cult. Herit. 14, 32:1–32:14 (2021).
Semantic data model for knowledge representation and dissemination of cultural heritage site, poompuhar. Web of Science. https://www.webofscience.com/wos/alldb/full-record/WOS:001179453800015.
Fan, T., Wang, H. & Hodel, T. Multimodal knowledge graph construction of chinese traditional operas and sentiment and genre recognition. J. Cult. Herit. 62, 32–44 (2023).
Lu, T. T., Ou, S. Y., Li, X. W. & Shen, X. Y. Construction and application of a knowledge graph of ancient warfare: a case study of the Song and Yuan periods. Library Forum 1–12 (2024).
Cheng, X. F., Tian, Y., Shu, K. H. & Wang, Z. Knowledge organization of traditional architectural texts for craft inheritance: a case study of the timberwork system in yingzao fashi. Library Forum 1–12 (2024).
Liang, J., Wang, Q. & Zhao, Y. Metadata construction for the digitization of traditional clothing of indigenous ethnic groups in Guangxi. J. Guangxi Univ. Natl. (Philos. Soc. Sci. Ed.) 43, 112–116 (2021).
Rodríguez-Ortega, N. Contours of knowledge: epistemological implications of semantic models in the representation of the art exhibition domain through the lens of the ontoexhibit ontology. Život Umjet. 114, 122–147 (2024).
Khalid, H. & Zim, E. Repairing raw metadata for metadata management. Inf. Syst. 122, 102344 (2024).
Koch, I., Teixeira Lopes, C. & Ribeiro, C. Moving from ISAD(g) to a CIDOC CRM-based linked data model in the Portuguese archives. J. Comput. Cult. Herit. 16, 71:1–71:21 (2023).
Bagley, P. R. Extension of Programming Language Concepts. Technical Report (University City Science Center, 1968).
Pomerantz, J. Metadata. The MIT Press Essential Knowledge Series (MIT Press, 2015).
NISO. Understanding metadata: What is metadata, and what is it for? a primer. NISO website. https://www.niso.org/publications/understanding-metadata-2017 (2017).
National Cultural Heritage Administration of the People’s Republic of China. Metadata Standard for the Digitization and Preservation of Cultural Relics. Technical Report. Accessed 18 September 2025. https://www.lib.pku.edu.cn/portal/sites/default/files/news/cms/resupload/0000001494/52.pdf (National Cultural Heritage Administration, 2017).
National Cultural Heritage Administration of the People’s Republic of China. Application Specification for Descriptive Metadata of Cultural Relics. Technical Report. Accessed 18 September 2025. https://www.lib.pku.edu.cn/portal/sites/default/files/news/cms/resupload/0000001494/5.pdf (National Cultural Heritage Administration, 2017).
The Palace Museum. Cataloguing Rules and Metadata Specification for Painting-Type Cultural Relics. Technical Report. Accessed 18 September 2025. https://www.lib.pku.edu.cn/portal/sites/default/files/news/cms/resupload/0000001494/18.pdf (National Cultural Heritage Administration/The Palace Museum, 2017).
National Cultural Heritage Administration of the People’s Republic of China. Metadata Specification for Painting-Type Cultural Relics. Technical Report. Accessed 18 September 2025. https://www.lib.pku.edu.cn/portal/sites/default/files/news/cms/resupload/0000001494/17.pdf (National Cultural Heritage Administration, 2017).
The Palace Museum. Information Indicators for Painting Collections of the Palace Museum. Technical Report. Accessed 18 September 2025. https://www.dpm.org.cn/Uploads/File/2020/04/08/u5e8d77947896e.pdf (The Palace Museum, 2017).
Faraj, G. & Micsik, A. Representing and validating cultural heritage knowledge graphs in CIDOC-CRM ontology. Future Internet 13, 277 (2021).
Shi, X. M. A comparative study of CDWA and DC metadata standards and the Palace Museum painting collection information index system. Palace Mus. J. (2016).
Acknowledgements
The author sincerely acknowledges the support of the Fundamental Research Funds for the Central Universities. The author also thanks the reviewers and editors for their valuable efforts in improving this article.
Ethics declarations
Competing interests
The author declares no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhang, Z. Conceptual model construction for the A Comprehensive Collection of Ancient Chinese Paintings based on metadata and ontology. npj Herit. Sci. 14, 63 (2026). https://doi.org/10.1038/s40494-025-02258-w
DOI: https://doi.org/10.1038/s40494-025-02258-w









