Introduction

With the rapid advancement of information technology, the boundaries between various academic disciplines are increasingly blurring, leading to more frequent interdisciplinary collaboration. Within this context, the humanities have gradually fostered new research paradigms, converging with computer science to form a novel interdisciplinary field—Digital Humanities1,2,3. Under this paradigm, the introduction of metadata and ontology has established a new methodological foundation for the informatization and resource management of cultural relics and artworks. Their application has become increasingly widespread, particularly in the fields of cultural heritage and museology, offering new academic opportunities for the systematic organization, precise description, and broad dissemination of cultural heritage.

A Comprehensive Collection of Ancient Chinese Paintings (中国历代绘画大系, CCACP) is a monumental national cultural project, spanning history and transcending borders. It represents China’s first comprehensive, systematic investigation, compilation, and research into ancient Chinese paintings preserved worldwide, jointly edited and published by Zhejiang University and the Zhejiang Provincial Cultural Relics Bureau. To date, the CCACP has collected 12,405 Chinese painting holdings from 263 cultural institutions worldwide, on materials such as paper, silk (including bo and ling), and hemp4. It encompasses the vast majority of extant “national treasure”-level painting masterpieces, including the Complete Tang and Pre-Tang Paintings, Complete Song Paintings, Complete Yuan Paintings, Complete Ming Paintings, and Complete Qing Paintings, totaling 60 volumes and 226 books. It is the most comprehensive collection and the largest publishing project of Chinese painting image literature of its kind to date.

However, despite the CCACP’s landmark significance in organizing ancient paintings and resource construction, its research and utilization still face numerous challenges. Firstly, discrepancies in cataloging systems, descriptive standards, and classification methods among different institutions lead to inconsistent information presentation for the same artwork across databases, complicating cross-institutional retrieval and comparison5,6. Secondly, a vast amount of information related to painting works lacks a unified semantic description and knowledge modeling system, limiting cross-platform, cross-context resource sharing and knowledge discovery7,8. Furthermore, the informational dimensions contained in paintings are highly complex, encompassing not only artistic attributes such as subject matter, technique, and style but also documentary information such as provenance history, inscriptions, and restoration records9. The logical relationships between these diverse elements are difficult to systematize using traditional bibliographical or image archival methods. These issues, to some extent, constrain the advancement of painting research and the reuse of research outcomes, highlighting the necessity for digital humanities methodologies10.

Within the Digital Humanities research context, the integration of information technology and the humanities provides new methodological pathways for cultural heritage research11. In the context of knowledge organization and semantic technology, data typically refers to uninterpreted symbols or records lacking semantic connotation; information is data that has been organized and contextualized, capable of answering “what is”; while knowledge is information understood within a specific context and usable for reasoning and decision-making, emphasizing semantic structure and logical connections12. Within this framework, metadata, i.e., “data about data”, provides structured and standardized descriptions of data resources, characterized by modularity, extensibility, interoperability, and multilingualism13. It forms the basis for the organization, retrieval, and management of cultural heritage data14,15. Ontology, building upon metadata, involves further abstraction, focusing on the abstract essence of objective reality, i.e., a conceptual model abstracted from the objective world16. In information science and knowledge engineering, ontology is further developed as a formal representation of concepts and their interrelationships within a specific domain, thereby enabling knowledge sharing and reuse. The relationship between the two can be viewed as akin to syntax and semantics, or micro and macro perspectives: metadata emphasizes the structured description of information objects, addressing the “how to represent” question, whereas ontology focuses on the semantic logical relationships between collections of resources, addressing the “how to relate” question17,18. Building on this, knowledge graphs can transform ontological models into visualizable, queryable structured networks, supporting complex association analysis and semantic search19.

In CCACP research, metadata is used for the systematic encoding of paintings and their management information; the ontological conceptual model further integrates the semantic relationships between paintings, academic research, and management information; and the knowledge graph based on this ontology enables the visual presentation and intelligent management of painting data. Accordingly, this study applies the theoretical tools of digital humanities to the systematic management and academic research of ancient Chinese paintings through a methodological chain running from metadata to ontology to knowledge graph, achieving the transformation from data to knowledge and providing a reusable paradigm and practical path for digital cultural heritage research.

In digital cultural heritage research, the value of metadata and ontology has gradually gained academic attention and has been widely applied in project practices across multiple fields. A core task in constructing metadata and ontology is forming a unified conceptual reference model within a specific domain to facilitate resource sharing across institutions, regions, and even national borders. Consequently, a series of general or specialized metadata and ontology standards have been introduced internationally, among which the most influential is the International Committee for Documentation Conceptual Reference Model (CIDOC CRM). It aims to promote a common understanding and interoperability of cultural heritage information by providing a universal and extensible semantic framework, thereby enabling the exchange and integration of heterogeneous cultural heritage information sources (see https://www.cidoc-crm.org/).

In international academic practice, metadata and ontological modeling based on CIDOC CRM have been widely applied in numerous fields, including ceramics18, military history20, ancient maps21, grottoes22, cultural heritage23,24, archeology25,26,27, maritime heritage28, architecture29,30,31, oral history32,33, and ancient sites34,35. For instance, Zhao et al.36 constructed an ontology and knowledge graph for the “Tea Road,” a significant linear cultural heritage in China; Zhang et al.18 reused and extended CIDOC CRM for ceramics, an important category in world history museums, creating an “ancient Chinese ceramics ontology” framework; Fan et al.36 built the OpenOnto ontology for traditional Chinese opera using Chinese ethnic traditional opera as an example; He et al.21 conducted an ontological construction for ancient map knowledge and demonstrated it with a case study on the Yangshi Lei archives, Qing Dynasty architectural engineering drawings; and Lu et al.37 semantically organized and described historical literature resources related to ancient warfare, using the Song and Yuan periods as examples, and built a warfare ontology semantic model based on the Event Ontology.

Besides using CIDOC CRM for metadata and ontological modeling, some studies have also attempted to incorporate other metadata standards. For example, Cheng et al.38 used textual information on the “Major Woodwork System” craft described in volumes 4 and 5 of the classic ancient architectural text Yingzao Fashi as the raw corpus and built an ontology for “Song-style Major Woodwork Construction Techniques” by reusing and extending the Metadata Standard for Ancient Architecture Cultural Relics, while Liang et al.39 used Categories for the Description of Works of Art (CDWA) metadata as the primary reference for constructing metadata for the traditional costumes of Guangxi’s indigenous ethnic groups. Furthermore, CIDOC CRM is also used in galleries40, libraries, archives, museums41,42, and other cultural institutions to increase the accessibility of museum-related information and knowledge.

Overall, these studies can be broadly categorized into those dealing with “tangible cultural heritage” and “intangible cultural heritage”. The former primarily focuses on the semantic modeling and knowledge organization of tangible heritage such as ceramics, ancient maps, architecture, grottoes, and clothing; the latter involves more ontological construction and instantiation applications for intangible heritage such as opera, construction techniques, and oral history. This demonstrates not only the diverse application paths of metadata and ontology across different types of cultural heritage but also highlights the complexity and necessity of future cross-type, interdisciplinary integration.

Although existing research has accumulated considerable experience in ontological modeling within the cultural heritage domain, systematic ontological modeling remains lacking for the highly complex and unique field of ancient Chinese painting. Based on this, the research object of this paper is the major national cultural project, CCACP. It adopts a digital humanities perspective, i.e., focusing on semantic modeling, knowledge organization, and data-driven cultural heritage research methods and emphasizing the application of computational methods to humanities questions. Using metadata and ontology as the theoretical foundation, it reuses and extends international universal models such as CIDOC CRM to construct a conceptual domain ontology model for the CCACP. Within this framework, the digital humanities perspective refers not only to the research background but also to a methodological orientation, i.e., promoting the shift in humanities research from traditional text and image studies to a data-driven, structured knowledge system through computer-assisted knowledge representation. The research objectives include:

  • In the vertical dimension, proposing a reusable framework for painting knowledge organization to support academic research on ancient Chinese paintings across periods and regions, promoting the systematization and standardization of painting research.

  • In the horizontal dimension, forming a generalizable conceptual modeling paradigm to serve as a reference for other digital cultural heritage projects (e.g., steles, sculptures, artefacts), facilitating the integration and sharing of cultural heritage information.

  • At the application level, increasing the standardization, normalization, and interoperability of cultural relic information by constructing a semanticized painting knowledge network, providing new pathways for intelligent retrieval, in-depth research, and knowledge discovery of digital resources.

The significance of this research lies in, on the one hand, providing a new method for the digital organization of painting resources that transcends traditional bibliography and iconography, responding to the new demands of the information era for cultural heritage preservation and transmission and, on the other hand, exploring a localized path for ontological construction by combining international universal models with specific Chinese cultural contexts, offering valuable experience for global digital cultural heritage research. Its innovations are mainly reflected in (1) the uniqueness of the research object: it is the first systematic ontological modeling effort focused on the CCACP; (2) the integration of methodologies: extending CIDOC CRM based on the unique classifications and semantics of Chinese painting; and (3) the forward-looking nature of the research paradigm: achieving the transformation from data to knowledge through the combination of metadata-ontology, promoting the shift in digital cultural heritage research from static preservation to dynamic generation.

Methods

Metadata construction for the CCACP

The term metadata formally appeared in English in 1968, coined by Philip Bagley in his book Extension of Programming Language Concepts43. Its semantics, however, can be traced back to the word metaphysics, derived from Aristotle’s Metaphysics, which implies inquiry into the essence behind phenomena or objects; metadata in the information resource field discussed here carries the same sense44. The application of metadata can be traced back to early library collection management, where what is now called metadata was simply the information recorded in the library catalog. Historians believe that Callimachus’s creation of the Pinakes (tablets) for the Library of Alexandria around 245 BC was the world’s first library catalog and the earliest known use of what is now understood as metadata.

The latest classification of metadata, as of 2017, is presented in the document Understanding Metadata: What is Metadata, and What is it For?: A Primer45 published by the National Information Standards Organization (NISO), which categorizes metadata into descriptive metadata, structural metadata, administrative metadata, and markup languages. Administrative metadata is further subdivided into technical metadata, preservation metadata, and rights metadata. In the cultural heritage field, descriptive metadata is more commonly used. This paper comprehensively employs administrative metadata and technical language alongside descriptive metadata in constructing the conceptual model for the CCACP.

Metadata standards are sets of rules for describing specific objects in a resource. The earliest international metadata standard is Dublin Core (DC) (see https://www.dublincore.org/specifications/dublin-core/dcmi-terms/), which is designed to describe resources concisely across fields and material types and has only 15 elements. However, with rapid technological advancement in recent years, digitization has gradually become the preferred method for data storage across various industries, giving rise to more complex requirements for data storage. Metadata standards have accordingly emerged for different domains. These include broadly applicable standards such as Dublin Core and the Metadata Object Description Schema (see https://www.loc.gov/standards/mods/), as well as standards specifically for describing artworks, such as CDWA and the Canadian Heritage Information Network’s humanities data dictionaries, and for describing cultural heritage, such as CIDOC CRM. The second stage below describes the metadata standards reused and extended in constructing the CCACP metadata.

In the first stage, metadata were classified and organized according to the characteristics of ancient Chinese painting data. As a continuation of traditional Chinese culture, Chinese paintings through the ages carry the historical lineage of the Chinese nation, possess extremely high historical, cultural, artistic, and scientific value, and contain rich information resources. Constructing the metadata model for the CCACP must consider two factors: the paintings themselves and the relevant content of the CCACP outcomes. The former involves their management, preservation, restoration, and research by museums; the latter involves exhibitions based on the CCACP outcomes, related research projects, data compilation, digital project extensions, etc.

To make the information organization structure of the CCACP more precise, its information resource categories are macroscopically divided into five major types: basic information, management information, resource information, research information, and extension information.

  i. Basic Information describes the painting itself.

  ii. Management Information pertains to the resource information involved when museums manage the physical painting.

  iii. Resource Information refers to the various existing materials (physical, audio-visual, etc.) based on the painting.

  iv. Research Information involves research projects, publications, digital content, etc., related to the outcomes of the CCACP.

  v. Extension Information constitutes additional information elements beyond the above.

This paper creates a CCACP metadata framework that includes the following core components: Basic Information Metadata, Management Information Metadata, Resource Information Metadata, Research Information Metadata, and Extension Information Metadata. Detailed descriptions of these components are provided in the fourth stage, which lays the foundation for subsequent ontology engineering.
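The five-part classification can be made concrete with a small sketch. The Python below is illustrative only: the names `MetadataCategory`, `CATEGORY_SCOPE`, and `describe` are assumptions introduced here, and the scope strings paraphrase the category definitions given above rather than reproduce any published CCACP schema.

```python
from enum import Enum


class MetadataCategory(Enum):
    """The five CCACP information categories named in the text."""
    BASIC = "Basic Information"
    MANAGEMENT = "Management Information"
    RESOURCE = "Resource Information"
    RESEARCH = "Research Information"
    EXTENSION = "Extension Information"


# Short scope notes paraphrased from the category definitions (hypothetical wording).
CATEGORY_SCOPE = {
    MetadataCategory.BASIC: "the painting itself",
    MetadataCategory.MANAGEMENT: "museum management of the physical painting",
    MetadataCategory.RESOURCE: "physical and audio-visual materials based on the painting",
    MetadataCategory.RESEARCH: "research projects, publications, and digital content on CCACP outcomes",
    MetadataCategory.EXTENSION: "additional elements beyond the other four categories",
}


def describe(category: MetadataCategory) -> str:
    """Return a one-line label for a metadata component and its scope."""
    return f"{category.value} Metadata: {CATEGORY_SCOPE[category]}"


if __name__ == "__main__":
    for cat in MetadataCategory:
        print(describe(cat))
```

Using an enumeration keeps the category set closed and explicit, which matches the fixed five-part structure while leaving the Extension category as the designated place for later additions.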

In the second stage, existing metadata standards were reused and mapped. To ensure the CCACP metadata framework complies with current relevant domestic standards and facilitates broad international dissemination, its construction primarily involved the reuse and mapping of international general standards such as CIDOC CRM (version 7.2.3), CDWA, and Dublin Core. Domestic standards referenced include: Metadata Standard for Digital Preservation of Cultural Relics of the PRC46, Design Specification for Specialized Metadata of Cultural Relics Protection Industry Standard of the PRC, Application Specification for Descriptive Metadata of Cultural Relics of the PRC47, Cataloging Rules for Painting Cultural Relics Metadata of the PRC48, Metadata Standard for Painting Cultural Relics of the PRC49, and the Palace Museum Painting Collection Information Indicators50. The framework was expanded to accommodate the unique management and research characteristics of the CCACP. The reuse, mapping, and extension of the aforementioned metadata can be categorized into macro, meso, and micro perspectives based on their scope and characteristics.

From a macro perspective, the reuse and mapping focus on the metadata frameworks of the international cultural heritage domain, namely CDWA, CIDOC CRM, and Dublin Core (DC), which form the main content of the CCACP metadata construction. Their reuse ensures the universality of the CCACP metadata model for broad dissemination within the international cultural heritage field.

As mentioned above, the DC element set, as the earliest international metadata standard, has 15 basic descriptive items. It is a simple, effective, and widely disseminated core element set. In practical applications, these 15 items can be repeated or selectively used, and subtypes and subschemas can be established, thereby offering strong interoperability and operability in resource exchange and sharing. Therefore, this paper mapped the following items from DC: Title, Description, Source, Relation, Creator, Date, Type, and Identifier.

CIDOC CRM is the core object of reuse in constructing the CCACP metadata model. It is a conceptual reference model for information integration, first developed in 1996 by the International Committee for Documentation (CIDOC) under the International Council of Museums (ICOM), specifically for the cultural heritage domain. It aims to promote a common international understanding of cultural heritage information by providing a universal and extensible semantic framework2,51. After over 20 years of maintenance and development, the latest version, which is the version reused and mapped in this paper, is 7.2.3, released in August 2023, containing 99 classes and 199 property descriptions.

CDWA is a metadata standard designed for art historians, art managers, and information technology experts and is currently widely used in the museum management field. Like CIDOC CRM, CDWA also provides mapping tables with other metadata standards, laying the foundation for data exchange and sharing52. The CDWA metadata standard contains 31 top-level elements and sub-elements, 13 of which are core elements, totaling ~540 elements.

However, the international DC, CIDOC CRM, and CDWA element sets are macro-level standards for cultural heritage and museums in general, whereas the CCACP metadata framework is specific to painting and museum management and must support complex related research. Therefore, they cannot be fully reused but must be referenced alongside meso-level domestic cultural relic standards and collection information index system specifications, as well as micro-level painting-specific metadata standards for reuse and extension.
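The Dublin Core mapping described above can be sketched as a simple lookup. This is a minimal illustration, not the paper's actual mapping table: the eight DC items are those named in the text, while the CCACP-side field names and their pairing with DC items in `CCACP_TO_DC` are hypothetical. The namespace `http://purl.org/dc/terms/` is the DCMI Metadata Terms namespace.

```python
from typing import Dict, List

DCTERMS = "http://purl.org/dc/terms/"

# The eight Dublin Core items the text says were mapped.
MAPPED_DC_ITEMS = (
    "title", "description", "source", "relation",
    "creator", "date", "type", "identifier",
)

# Hypothetical pairing of CCACP-style field names with DC items (an assumption).
CCACP_TO_DC: Dict[str, str] = {
    "academic_chinese_title": "title",
    "academic_english_title": "title",
    "author": "creator",
    "macro_period": "date",
    "formal_identifier": "identifier",
    "remarks": "description",
}


def to_dublin_core(record: Dict[str, str]) -> Dict[str, List[str]]:
    """Project a flat CCACP-style record onto Dublin Core term URIs.

    DC items may repeat, so each term collects a list of values.
    Fields without a mapping are skipped rather than guessed.
    """
    result: Dict[str, List[str]] = {}
    for field_name, value in record.items():
        dc_item = CCACP_TO_DC.get(field_name)
        if dc_item is None:
            continue
        result.setdefault(DCTERMS + dc_item, []).append(value)
    return result


if __name__ == "__main__":
    sample = {  # placeholder values, not real catalogue data
        "academic_chinese_title": "示例山水图",
        "academic_english_title": "Example Landscape",
        "author": "Anonymous",
        "collection_grade": "unmapped in this sketch",
    }
    print(to_dublin_core(sample))
```

Collecting values in lists reflects the point that DC items can be repeated, here for a painting with both a Chinese and an English title.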

From a meso perspective, reference is made to the Cultural Relics Protection Industry Standards of the People’s Republic of China and the Collection Information Index specifications. Both are industry standards for cultural relic protection issued by the National Cultural Heritage Administration, possessing authority, professionalism, and specificity in the information resource management of cultural relics. Reusing and mapping these metadata standards ensures the professionalism of the CCACP metadata model and its universality within the cultural relic domain, although they still have certain limitations in describing painting cultural relics. The Museum Information Indicator System Specification (Trial) mainly includes 3 index sets, 33 index groups, and 139 index items. It is aimed at meeting the needs of information construction in Chinese cultural relic museums and standardizing the information processing and exchange of museum collections, making it suitable for constructing metadata for paintings that are cultural relics in China. Therefore, in addition to reusing museum information metadata standards, the utilization metadata required for the use of collection information resources is integrated and supplemented.

From a micro perspective, reuse and extension involves the relevant metadata specifications for paintings in the People’s Republic of China, primarily concerning the metadata standard and cataloging rules for paintings that are cultural relics. Compared to the macro- and meso-metadata specifications mentioned above, these are more targeted. Here, reuse means directly quoting content from the standard, mapping means extending based on the standard, and reference means drawing on the normative content. The details are shown in Table 1:

Table 1 Reuse and Mapping of Existing Metadata Frameworks

In the third stage, core metadata elements were extracted to form the foundation for subsequent ontology construction. The core metadata of the CCACP, as the basic attributes of digital resources, form the core foundation for building the domain ontology. In this study, CCACP refers to a series of collection books that systematically include important painting works from various dynasties and their related information. The ontology constructed in this paper uses the content of this series of books as the data source, abstracting the information into metadata elements and categories, thereby ensuring the integrity and academic reference value of the ontology.

As mentioned earlier, macro, meso, and micro metadata standards have their respective advantages in generality and professionalism and complement each other to a certain extent. Therefore, this study, based on integrating and refining them, extracts a set of core metadata elements as the core metadata for the CCACP. Table 2 displays the categories, names, definitions, and mapping relationships with reused metadata. The metadata architecture listed in Table 2 is divided into five categories: Basic Information Metadata, Management Information Metadata, Resource Information Metadata, Research Information Metadata, and Extension Information Metadata. The basis for this categorization includes two primary aspects: first, a systematic review of existing metadata standards (e.g., CIDOC CRM, CDWA, and Dublin Core) and related literature; second, consideration of the practical application needs of CCACP information. This classification ensures that the constructed core metadata is both scientific and complete, while also possessing adequate applicability and extensibility. Abbreviations and term explanations in the table are as follows:

Table 2 Core Metadata of A Comprehensive Collection of Ancient Chinese Paintings

In the fourth stage, based on the aforementioned 24 core metadata elements, a CCACP metadata schema containing 24 main elements and 60 sub-elements is created through multi-dimensional information extension of the core metadata elements (Table 3). The content of the metadata schema is as follows:

Table 3 Detailed metadata elements of the CCACP

The first part is Basic Information Metadata, involving the title, period, author, dimensions, etc., of the painting itself. When constructing the metadata model, considering that a painting’s name might have an academic Chinese name, an academic English name, and some popular names, the Title element was subdivided into three types. Regarding the painting’s period, to ensure the accuracy of metadata information, the Period element is divided into Macro Period and Micro Period; the former refers to the dynasty (e.g., Tang, Song, Yuan, Ming, or Qing), and the latter refers to the specific year (if available). Besides the author, period, size, etc., a painting also involves its material, technique, subject matter, inscriptions, and seals. Considering element consistency, these were uniformly categorized under the “Physical Information” element when constructing the CCACP Basic Information Metadata. Additionally, an element often overlooked but important in Basic Information Metadata is Remarks, which records uncertainties or discrepancies in the descriptions above. For example, if the painting’s author was not clearly identified initially and was recorded as Anonymous in the CCACP compilation, but later research clarified the author, or if there are rumors about the author, such records can be noted in Remarks to clarify uncertain information.
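As a rough illustration of how the Basic Information elements fit together, the sketch below models the three title types, the Macro/Micro Period split, the grouped Physical Information, and the Remarks element. All class and field names are assumptions made for this example, and the sample values are placeholders rather than real catalogue data.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Title:
    """Title subdivided into the three types described in the text."""
    academic_chinese: str
    academic_english: Optional[str] = None
    popular_names: List[str] = field(default_factory=list)


@dataclass
class Period:
    macro: str                   # dynasty, e.g. "Song"
    micro: Optional[int] = None  # specific year, if available


@dataclass
class PhysicalInformation:
    """Material, technique, subject matter, inscriptions, and seals."""
    material: Optional[str] = None
    technique: Optional[str] = None
    subject_matter: Optional[str] = None
    inscriptions: List[str] = field(default_factory=list)
    seals: List[str] = field(default_factory=list)


@dataclass
class BasicInformation:
    title: Title
    period: Period
    author: str = "Anonymous"
    dimensions_cm: Optional[Tuple[float, float]] = None  # (height, width), assumed unit
    physical: PhysicalInformation = field(default_factory=PhysicalInformation)
    remarks: List[str] = field(default_factory=list)

    def add_remark(self, note: str) -> None:
        """Record an uncertainty or discrepancy without overwriting other fields."""
        self.remarks.append(note)


if __name__ == "__main__":
    painting = BasicInformation(
        title=Title("示例山水图", "Example Landscape"),
        period=Period(macro="Song"),
    )
    painting.add_remark("Author recorded as Anonymous; attribution under discussion.")
    print(painting)
```

Keeping Remarks as a list separate from the author field means a disputed attribution can be noted while the value recorded at compilation time stays intact.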

The second part is Management Information Metadata, primarily involving the museum’s management of the painting, here understood as an item in the museum’s collection. It includes information on the identifier, the collection grade, preservation, exhibition, and collection sources. Among these, the Identifier is a coding system used to identify each collection item uniquely, ensuring accurate identification and tracking within the museum and in cross-institutional exchanges. A collection item may have a registration number recorded in the general ledger of cultural institution collections, as well as a number assigned by the museum according to its specific requirements. Therefore, the Identifier element is subdivided into Formal Identifier and Other Identifier categories; multiple identifiers can be extended under Other Identifier. The Preservation element mainly contains data about the painting’s storage location, circulation records, and preservation conditions. The Exhibition element contains records of the painting being loaned to other exhibitors for exhibitions, involving the exhibitor, the exhibition venue, the exhibition title, and the exhibition start and end times. Furthermore, another important metadata element concerning collection management is Condition, which records the collection item’s completeness, damage status, and location of damage, as well as the restorer, restoration time, restoration institution, restored area, restoration technique, and results; these records can serve as references for subsequent painting image restoration.
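The Identifier and Exhibition elements lend themselves to a short sketch. The Python below is an assumption-laden illustration: the class names, the `matches` and `on_loan` helpers, and the sample codes are hypothetical, and only the element structure (Formal Identifier, Other Identifier, and the exhibitor, venue, title, and start and end times of an exhibition) comes from the description above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Identifier:
    formal: str                                     # Formal Identifier
    other: List[str] = field(default_factory=list)  # Other Identifier (extensible)

    def matches(self, code: str) -> bool:
        """True if the code is the formal identifier or any other identifier."""
        return code == self.formal or code in self.other


@dataclass
class Exhibition:
    exhibitor: str
    venue: str
    title: str
    start: date
    end: date

    def __post_init__(self) -> None:
        if self.end < self.start:
            raise ValueError("exhibition end date precedes start date")


@dataclass
class ManagementInformation:
    identifier: Identifier
    collection_grade: str = ""
    exhibitions: List[Exhibition] = field(default_factory=list)

    def on_loan(self, day: date) -> bool:
        """True if any recorded exhibition covers the given day."""
        return any(e.start <= day <= e.end for e in self.exhibitions)


if __name__ == "__main__":
    record = ManagementInformation(
        identifier=Identifier(formal="REG-0001", other=["MUS-A-17"]),  # placeholder codes
        exhibitions=[
            Exhibition("Example Museum", "Hall 2", "Example Show",
                       date(2024, 3, 1), date(2024, 6, 30)),
        ],
    )
    print(record.identifier.matches("MUS-A-17"), record.on_loan(date(2024, 4, 15)))
```

Allowing lookup by either kind of identifier supports the cross-institutional tracking the Identifier element is meant to provide.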

The third part is Resource Information Metadata. This part primarily records material content related to the painting, enriching the CCACP database through records of material category, format, size, and title information.

The fourth part is Research Information Metadata. This part is based on the outcomes related to the CCACP and research analysis conducted on the painting itself. Examples include compiling data visualization analysis based on the CCACP; conducting color analysis on the painting works included in the existing CCACP to construct a CCACP painting color database platform; building a human-machine collaborative intelligent ancient painting color restoration system based on artificial intelligence generated content (AIGC) technology and large language models; and conducting material culture analysis based on CCACP paintings. Macroscopically, the outcomes related to the CCACP are divided into three major categories: Research Projects, Publications, and Digital Projects. Under each category, sub-elements such as Researcher, Institution, Time, Name, and Outcome are added based on specific information recording needs.

The fifth part is Extension Information Metadata. Although currently empty, the Extension Information Metadata category is added during the initial construction of the CCACP metadata model to accommodate potential future element sets that may require documentation and data that does not fit into the four major categories. Data information that cannot be categorized in the future can be classified under this category.
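The fallback role of the Extension category can be sketched as a lookup with a default. The element names below are those mentioned in the preceding descriptions; the `CATEGORY_ELEMENTS` table is deliberately partial (Resource Information elements are omitted because the text does not name them individually), and the function name `categorize` is an assumption.

```python
from typing import Dict, Set

# Partial element lists drawn from the descriptions above (not the full schema).
CATEGORY_ELEMENTS: Dict[str, Set[str]] = {
    "Basic Information": {"Title", "Period", "Physical Information", "Remarks"},
    "Management Information": {"Identifier", "Preservation", "Exhibition", "Condition"},
    "Research Information": {"Research Projects", "Publications", "Digital Projects"},
}

EXTENSION = "Extension Information"


def categorize(element: str) -> str:
    """Return the category holding an element, or Extension Information if none does."""
    for category, elements in CATEGORY_ELEMENTS.items():
        if element in elements:
            return category
    return EXTENSION


if __name__ == "__main__":
    print(categorize("Exhibition"))
    print(categorize("Some Future Element"))
```

With this arrangement, new kinds of data never fail classification; they land in the Extension category until the schema is revised.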

Ontological conceptual model construction for the CCACP

Since the mid-1970s, researchers in the field of artificial intelligence have recognized that knowledge acquisition is the key to building powerful AI systems. Consequently, ontology, as a tool for information abstraction and knowledge description, began to be adopted in the computer field. The ontology discussed in this paper refers specifically to ontology in the information science field. Synthesizing definitions from multiple scholars15, the author comprehensively defines it as an explicit formal specification of a shared conceptual model based on the basic terms and relations that constitute the vocabulary of a relevant domain. Some scholars believe that the term ontology, borrowed from philosophy and extended into information science, is essentially a conceptual model; hence, ontology can also be referred to as an ontological model. The process of building an ontology is ontology engineering. The process of constructing the conceptual model for the CCACP in this paper is thus ontology engineering.

Ontology engineering has gradually developed a variety of construction methods in response to different domain needs. Currently, mainstream international methods include the seven-step method (for domain ontology construction), METHONTOLOGY (for chemical ontology modeling), the KACTUS project method (for knowledge modeling of complex technical systems), the Toronto Virtual Enterprise (TOVE) method, and the skeleton method (for commercial ontology construction), among others. The latter four were designed for specific domains and are not entirely universal47. The seven-step method, developed by Stanford University, is universal and applicable to ontology construction across various fields. Figure 1 displays its basic flowchart. In this study, the seven-step method is selected as the approach for constructing the ontological conceptual model of the CCACP.

Fig. 1 Stanford Seven-Step Ontology Development Method.

In the first stage, the ontology modeling software was selected. Ontology modeling software is used to design and manage ontologies. Common international options include Protégé, TopBraid Composer, OntoStudio, Semantic Turkey, and VIVO; the choice depends on user needs and the scale and complexity of the ontology. This paper ultimately selected Protégé as the ontology modeling platform for three reasons: it is open-source, free, and sustainably maintained; it features an intuitive graphical interface capable of supporting the modeling needs of large-scale cultural heritage projects; and, relying on an active international community and a rich plugin ecosystem, it offers good extensibility and shareability, better aligning with the goals and application scenarios of this study.

The second stage is the construction of the ontology model using the Stanford seven-step method. As noted in the Related Background section, this paper adopts the seven-step method developed at Stanford University for ontology engineering. The specific steps are as follows.

First, the domain and scope of the ontology were clearly defined to delimit the conceptual boundaries of the CCACP dataset. At the outset of constructing the CCACP ontological conceptual model, its professional domain and scope must be clarified to ensure the constructed ontological model strictly aligns with the discipline’s connotations and structure. The CCACP ontology domain involves paintings, collection management, exhibitions, and outcome research, including but not limited to basic information, research information, and management information for paintings. Its goal is to build instance models based on specific paintings, enable semantic search and visual graph associations for painting works, and support subsequent knowledge graph construction.

Second, existing ontologies such as CIDOC CRM and Dublin Core were examined to identify reusable classes and properties. As mentioned in the Metadata Construction section, CIDOC CRM provides a universal metadata standard for the international cultural heritage domain; it contains standardized definitions of classes (entities) and specifies their properties and relationships. Taking temporal information as an example, E1 CRM Entity is the top-level abstraction of all concepts; E2 Temporal Entity describes phenomena occurring at specific times and places; E41 Time Interval represents a specific time span; E49 Time Coordinate corresponds to a more precise time point; and properties such as P1 (is identified by), P4 (has time span), and P9 (occurs in) connect the classes. Figure 2 illustrates the semantic relationships between these classes and properties, with arrows indicating the direction of semantic association. Starting from E1 CRM Entity at the bottom of Fig. 2, core classes such as E2 Temporal Entity, E3 Event Activity, and E4 Acquisition Entity are derived in sequence. For instance, E2 Temporal Entity can be associated with E41 Time Interval via P4, which can in turn be associated with E49 Time Coordinate via P86 (falls within), forming a complete temporal description chain. The directed arrows express the hierarchical structure and property relationships between entities, reflecting the rigor and expressiveness of CIDOC CRM in semantic modeling. As a universal model in the international cultural heritage domain, CIDOC CRM has been widely applied in archeology, museums, and the semantic description of cultural heritage. Therefore, this paper primarily reuses and extends the classes and properties of CIDOC CRM ver. 7.2.3 when constructing the CCACP conceptual model, to ensure international compatibility and semantic interoperability.
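The temporal description chain described above can be pictured as a small set of class-to-class links that are followed one property at a time. The following Python sketch is only an illustration under stated assumptions: the class and property labels are copied from the wording of this section rather than checked against the CIDOC CRM release, and `TEMPORAL_LINKS` and `follow_chain` are hypothetical names introduced here.

```python
# Class-to-class links exactly as worded in the text above; the labels are
# copied from the paper and are not verified against the CIDOC CRM release.
TEMPORAL_LINKS = [
    ("E2 Temporal Entity", "P4 has time span", "E41 Time Interval"),
    ("E41 Time Interval", "P86 falls within", "E49 Time Coordinate"),
]


def follow_chain(start, links):
    """Follow outgoing property links from `start` until none remain."""
    outgoing = {source: (prop, target) for source, prop, target in links}
    path, current, seen = [start], start, {start}
    while current in outgoing:
        prop, target = outgoing[current]
        if target in seen:  # guard against accidental cycles
            break
        path += [prop, target]
        seen.add(target)
        current = target
    return path


chain = follow_chain("E2 Temporal Entity", TEMPORAL_LINKS)
print(" -> ".join(chain))
```

Storing the links as plain tuples keeps the sketch independent of any ontology library; in practice the same chain would be expressed through the reused CIDOC CRM classes and properties in Protégé.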

Fig. 2 Example of Entities and Relations, Using Acquisition Information as an Example.

Third, a comprehensive list of key terms related to Chinese painting, such as artist, dynasty, material, and motif, was enumerated to establish the conceptual vocabulary. Before the CCACP ontology model is constructed, relevant domain knowledge must be collected so that important information and terms can be extracted. These terms can also be understood as a modeling of the CCACP metadata. This step primarily draws on CIDOC CRM, the Metadata Standard for Painting Cultural Relics of the PRC, and the Application Specification for Descriptive Metadata for Cultural Relics of the PRC.

Fourth, classes and their hierarchies were defined to reflect both general cultural heritage structures and the specific semantics of painting metadata. In this step, classes and subclasses are defined: once the important terms have been listed, they are elaborated according to the specific content of the application objects. In Protégé, the classes and class hierarchy are defined according to the CCACP metadata table from the section “Metadata construction for the CCACP”, as shown in Fig. 3.

Fig. 3 Definition of Entities, Properties, and Relations in the Compendium.
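To make the idea of a class hierarchy concrete, the sketch below stores child-to-parent links in a dictionary and derives the ancestors of a class from them. It is a minimal illustration, not the paper's actual class list: every class name is hypothetical and only loosely modeled on the basic, research, and management information mentioned in the scope definition.

```python
# child class -> parent class. All names are hypothetical placeholders and do
# not reproduce the class list defined in Protégé for the CCACP.
SUBCLASS_OF = {
    "BasicInformation": "PaintingInformation",
    "ResearchInformation": "PaintingInformation",
    "ManagementInformation": "PaintingInformation",
    "ExhibitionRecord": "ManagementInformation",
}


def ancestors(cls):
    """Return the superclasses of `cls`, nearest first."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_subclass_of(child, parent):
    """True if `child` equals `parent` or lies below it in the hierarchy."""
    return child == parent or parent in ancestors(child)


print(ancestors("ExhibitionRecord"))
```

A single-parent dictionary is the simplest possible representation; an OWL ontology also allows multiple superclasses, which this sketch deliberately leaves out.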

Fifth, the properties of classes were specified to describe relationships. In ontology modeling, properties define specific relationships and are divided into object properties and data properties. The former describe semantic links between instances of classes, e.g., Painting–Creator; the latter link instances to literal data values such as dates, numbers, or strings, e.g., Painting–Creation Date. This paper establishes the object properties and data properties for the CCACP conceptual model on the Protégé platform (see Table 4). Figure 3 shows the relationship between these two types of properties in the model and their semantic expression.

Table 4 Basic Information of 20 Selected Paintings from the CCACP

Sixth, property constraints were introduced to ensure logical consistency and semantic precision across the model. These constraints apply to the class properties in the CCACP ontology model, i.e., they define the domain and range of each property, as detailed in Table 5.
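The role of domain and range can be illustrated with a small checker that tests whether a statement respects the declared constraints. This is a sketch under assumptions, not the constraint table of the paper: the property names `hasCreator` and `creationDate`, the instance identifiers, and the helper `violations` are hypothetical, modeled on the Painting–Creator and Painting–Creation Date examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertySpec:
    kind: str    # "object" links two instances; "data" links an instance to a literal
    domain: str  # class the subject must belong to
    range_: str  # class name (object property) or Python type name (data property)


# Hypothetical property names modeled on the two examples given in the text.
PROPERTIES = {
    "hasCreator": PropertySpec("object", "Painting", "Creator"),
    "creationDate": PropertySpec("data", "Painting", "str"),
}

# Hypothetical instance identifiers and their asserted classes.
INSTANCE_CLASS = {"painting_001": "Painting", "creator_001": "Creator"}


def violations(subject, prop, value):
    """Return a list of domain/range problems; an empty list means valid."""
    spec = PROPERTIES.get(prop)
    if spec is None:
        return [f"unknown property: {prop}"]
    problems = []
    if INSTANCE_CLASS.get(subject) != spec.domain:
        problems.append(f"{subject} is not in domain {spec.domain}")
    if spec.kind == "object":
        in_range = INSTANCE_CLASS.get(value) == spec.range_
    else:
        in_range = type(value).__name__ == spec.range_
    if not in_range:
        problems.append(f"{value!r} is not in range {spec.range_}")
    return problems


print(violations("painting_001", "hasCreator", "creator_001"))
```

The sketch treats domain and range as validation rules for simplicity. OWL reasoners instead treat them as axioms from which class membership is inferred, so a checker of this kind complements rather than replaces reasoning in Protégé.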

Finally, instances were created from real examples in the CCACP to test, validate, and refine the ontology through iterative feedback. Instances were selected for universality and breadth, covering multiple dynasties and types of paintings. Representative works, including landscape paintings, figure paintings, gongbi paintings, and bird-and-flower paintings, were selected from the Complete Tang and Pre-Tang Paintings, Complete Song Paintings, Complete Yuan Paintings, Complete Ming Paintings, and Complete Qing Paintings included in the CCACP and added to the conceptual model of this study, thus covering almost all important historical periods and categories. Table 5 shows the basic information about the instances.

Table 5 Basic Information of the Instances

The third stage is graphical presentation. After the CCACP ontology is constructed and instances are added, Protégé generates a knowledge graph from the class and property relationships, providing a relatively intuitive view of the hierarchical relationships between classes and subclasses and of their property associations. Figure 4 shows the knowledge graph generated by Protégé from the ontological model, where solid lines with arrows indicate the hierarchical relationship between a class and its subclass, and dashed lines indicate object properties. The hierarchical relationship between classes and subclasses can also be seen directly in the asserted hierarchy (Fig. 5).

Fig. 4 Knowledge Graph of the CCACP Ontology Model in Protégé.

Fig. 5 Asserted Class Hierarchy of the CCACP Ontology in Protégé.
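The line conventions described for the graph view (solid lines for class and subclass relationships, dashed lines for object properties) can be mimicked outside Protégé by emitting Graphviz DOT text. The sketch below is a hedged illustration and not the way Protégé renders its graph: the class and property names are hypothetical, and drawing subclass arrows from child to parent is an assumption made here.

```python
# Hypothetical edges; they do not reproduce the graph shown in the figures.
SUBCLASS_EDGES = [("Painting", "CulturalRelic")]
OBJECT_PROPERTY_EDGES = [("Painting", "hasCreator", "Creator")]


def to_dot(subclass_edges, property_edges):
    """Build Graphviz DOT text: solid edges for subclass links, dashed for object properties."""
    lines = ["digraph ontology {"]
    for child, parent in subclass_edges:
        # Assumption: the arrow points from the subclass to its superclass.
        lines.append(f'  "{child}" -> "{parent}" [style=solid];')
    for source, label, target in property_edges:
        lines.append(f'  "{source}" -> "{target}" [style=dashed, label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot(SUBCLASS_EDGES, OBJECT_PROPERTY_EDGES)
print(dot_text)
```

The resulting text can be passed to any Graphviz renderer; producing plain DOT keeps the example free of external dependencies.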

Results

Case selection

After the ontological conceptual model of the CCACP was constructed, 20 paintings of various types, ranging from the pre-Qin, Han, and Tang periods to the Qing Dynasty, were selected, as mentioned above, to verify the model’s internal logical consistency and feasibility and to demonstrate its rich intrinsic knowledge associations. Based on the model, their basic information, such as author name, current collection location, period, and material type, was used for preliminary knowledge graph construction in Neo4j. Wang Ximeng’s A Thousand Li of Rivers and Mountains (section) from the Song Dynasty serves as the specific example. It is one of the representative works of blue-green landscape painting from the Northern Song period and one of China’s most renowned paintings handed down through the ages. It was created by the Northern Song painter Wang Ximeng and is his only extant work. In a long scroll format, the painting depicts magnificent river and mountain scenery: rolling hills and vast rivers and lakes, interspersed with pavilions, towers, villages, and houses, expressing the beauty and grandeur of natural landscapes.
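As an illustration of how such basic information might be loaded into Neo4j, the sketch below turns one record into a parameterized Cypher query built from `MERGE` clauses. It is not the authors' import procedure: the node labels, relationship types, and the helper `to_cypher` are hypothetical, and the collection and material values are supplied here for illustration because the text above does not state them. Running the query would additionally require a Neo4j driver session, which is not shown.

```python
# One painting record. Title, creator, and period follow the text above; the
# collection and material values are illustrative and not stated in the text.
RECORD = {
    "title": "A Thousand Li of Rivers and Mountains",
    "creator": "Wang Ximeng",
    "period": "Northern Song",
    "collection": "Palace Museum, Beijing",
    "material": "silk",
}

# field -> (node label, relationship type); a hypothetical graph schema.
LINKS = {
    "creator": ("Creator", "CREATED_BY"),
    "period": ("Period", "DATED_TO"),
    "collection": ("Institution", "HELD_BY"),
    "material": ("Material", "MADE_ON"),
}


def to_cypher(record):
    """Return (query, params): MERGE clauses for the painting and its linked nodes."""
    clauses = ["MERGE (p:Painting {title: $title})"]
    params = {"title": record["title"]}
    for index, (field, (label, rel_type)) in enumerate(LINKS.items()):
        if field not in record:
            continue  # skip fields that are missing for this painting
        clauses.append(f"MERGE (n{index}:{label} {{name: ${field}}})")
        clauses.append(f"MERGE (p)-[:{rel_type}]->(n{index})")
        params[field] = record[field]
    return "\n".join(clauses), params


query, params = to_cypher(RECORD)
print(query)
```

Using `MERGE` rather than `CREATE` means that a creator or institution shared by several of the 20 paintings would be represented by a single node, which is what lets the graph expose associations between works.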

Graph presentation and application

Figure 6 presents the knowledge graph structure of the basic information of the 20 paintings. The ontological conceptual model of the CCACP lays the foundation for subsequent knowledge graph construction. Figure 7 shows the structure of the knowledge graph for A Thousand Li of Rivers and Mountains, demonstrating the application of the CCACP ontological model in a specific case. Because this paper focuses on the ontological construction of the CCACP and space is limited, only representative metadata and their intrinsic relationships are annotated in this example. In the related resources section, the graph presents the digital animation video of A Thousand Li of Rivers and Mountains from the China Media Group’s Yangbo Digital Culture and Art Museum, with the online link placed in the Other Supplementary Information section. In the Research Project section, the 2023 National Social Science Fund Arts Major Project “Research on Value Interpretation and Protection Inheritance of Jin, Tang, Song, and Yuan Paintings and Calligraphy” is selected. For the digital project, the “Exploring Danqing” digital project published by the Forbidden City Publishing House in 2023 is selected. Overall, these knowledge graphs give a clearer view of the rich knowledge system and internal logic of the CCACP ontological model.

Fig. 6 Knowledge Graph of Basic Information for 20 Selected Paintings.

Fig. 7 Knowledge Graph Example for “A Thousand Li of Rivers and Mountains”.

Figure 8 presents the diverse applications of the CCACP ontological conceptual model along three dimensions: cultural relic collection and management, cultural relic dissemination, and cultural relic research.

  • With respect to cultural relic collection and management, integrating museum collection practice with digital management of cultural relics and applying knowledge graph technology enables systematic management of the paintings in a collection, improving the efficiency and rigor of the collection process. It also supports efficient storage, precise querying, and in-depth analysis of cultural relic information, providing solid support for cultural relic protection.

  • At the level of cultural relic dissemination, the focus is on exhibition planning and display and on education and popularization. Knowledge graphs provide data support and logical structure for exhibition planning, helping to create richer exhibitions with broader perspectives and improving the audience’s viewing experience. Furthermore, using knowledge graphs in educational activities presents painting knowledge in an accessible form, raising public awareness of cultural heritage and its protection and promoting its broad dissemination.

  • In the field of cultural relic research, the emphasis is on supporting academic research and interdisciplinary collaboration. On the one hand, the model provides scholars with comprehensive and systematic data resources for exploring the value of cultural relics in depth. On the other hand, it crosses disciplinary boundaries, promotes collaborative innovation across fields, and brings new vitality to cultural heritage research, driving its continued development and theoretical understanding.

Fig. 8 Application Dimensions of the Ontological Conceptual Model.

Discussion

This paper takes the CCACP as its research object and, addressing its needs for informatized management and academic research development, constructs the “CCACP Metadata” and the “CCACP Ontological Conceptual Model”. Paintings from different eras and of different types, such as landscape paintings, figure paintings, gongbi paintings, and bird-and-flower paintings, are selected from the Complete Tang and Pre-Tang Paintings, Complete Song Paintings, Complete Yuan Paintings, Complete Ming Paintings, and Complete Qing Paintings included in the CCACP and added to the conceptual model of this study. Finally, using the Song Dynasty’s A Thousand Li of Rivers and Mountains as a specific example, the model is presented visually, demonstrating the ontological structure of the CCACP and the relationships between its various types of information.

Ancient paintings contain the genetic code of the continuous inheritance of Chinese civilization and are an extremely important component of outstanding traditional Chinese culture. As a long-term, foundational national cultural project, the CCACP, through its informatization and academic research, is of great importance for promoting the creative transformation and innovative development of outstanding traditional Chinese culture.

The main contribution of this paper lies in introducing a method based on metadata and ontology technology into the informatized management and research practices of the CCACP, providing a new solution and a practical case. Specifically, the ontological conceptual model constructed in this study not only lays the data foundation for subsequent knowledge graph construction but also improves how painting resources are organized, managed, and disseminated, increasing research efficiency and data sharing capabilities. However, the metadata and ontological structure proposed in this paper still require continuous improvement, as the CCACP encompasses a vast number of works scattered across major museums worldwide and related academic research continues to grow. This study marks the start of a long-term academic project; the constructed metadata and ontological model will provide structural support for future knowledge graph research based on important ancient Chinese painters, paintings, material elements, and spiritual elements.

The academic value of this research is reflected not only in the optimization of the CCACP’s own informatized management and research practices but also in its promoting effect on broader academic fields. At the vertical research level, this ontological model provides a reusable conceptual framework for global ancient painting research, helping to promote the systematization and standardization of ancient painting research worldwide. At the horizontal research level, this model provides a reference paradigm for other projects in the digital cultural heritage field, promoting the integration and sharing of cultural heritage information resources across institutions and disciplines, and facilitating collaborative innovation in cultural heritage protection. In the future, this research framework can be further extended to the management and utilization of relevant cultural relic information resources domestically and internationally, thereby promoting the standardized development of global cultural heritage information management and providing new methodological support for the digital protection and transmission of cultural heritage.

In terms of application evaluation, multi-instance construction and visual demonstration indicate that the ontological conceptual model is feasible for information organization, semantic association, and knowledge presentation, and they showcase its potential value in academic research and digital management. However, because the CCACP covers a vast number of works dispersed across museums worldwide, future research will focus on the following directions:

  • Inviting art historians, curators, and experts in related fields for expert review to evaluate the academic rationality and application adaptability of the ontological model;

  • Designing competency questions and system tests to verify the model’s operability in practical data retrieval, semantic query, and knowledge graph construction;

  • Further extending the model to other domestic and international digital cultural heritage projects, exploring methods for ontology reuse and cross-domain knowledge integration, and providing new theoretical and practical support for the standardization and sustainable development of global cultural heritage information management.
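As a hedged illustration of the second direction, a competency question can be written as a small query with an expected answer and run against the instance data. The sketch below uses a few subject–predicate–object statements about the worked example; the predicate names and the helpers `match` and `paintings_by` are hypothetical, and the question is an example devised here rather than one proposed in this study.

```python
# Statements about the worked example; the values follow the case description,
# while the predicate names are hypothetical.
TRIPLES = {
    ("A Thousand Li of Rivers and Mountains", "hasCreator", "Wang Ximeng"),
    ("A Thousand Li of Rivers and Mountains", "hasPeriod", "Northern Song"),
    ("A Thousand Li of Rivers and Mountains", "hasGenre", "blue-green landscape"),
}


def match(subject=None, predicate=None, obj=None):
    """Return all triples matching the given pattern; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


def paintings_by(creator):
    """Competency question: which recorded paintings were created by `creator`?"""
    return [s for s, _, _ in match(predicate="hasCreator", obj=creator)]


print(paintings_by("Wang Ximeng"))
```

Pairing each question with its expected answer turns the list of competency questions into a regression test that can be rerun whenever classes or properties of the model are revised.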

In summary, this study provides methodological support for the informatized management and academic research of the CCACP and offers a practical reference paradigm for cultural heritage digitization, interdisciplinary information integration, and knowledge graph construction. It thereby contributes a new academic path and methodological basis for the digital protection and transmission of outstanding traditional Chinese culture.