Introduction

Nowadays, new technologies such as cloud computing, big data, and artificial intelligence have reshaped the traditional culture industry. Digitization is driving a new online cultural economy, and growing data traffic together with expanding application fields has accelerated the fission and cross-border reorganization of the cultural industry. For instance, Steven Spielberg told the foreign media outlet Showbiz411 that the film Ready Player One was expected to experiment further with creativity in the metaverse, i.e., to create an AAA-level MMO open-metaverse experience that matches the iconic IP. The Louvre’s website already categorizes its collection by fields and themes, allowing visitors to “immerse themselves” in the classic collection without leaving home through an interactive mapping experience. Roblox has also been experimenting with “immersive advertising,” a combination of the digital and the real that offers users a richer and more diverse experience and opens up new opportunities for digital marketing. From this perspective, the cultural industry increasingly exhibits new forms of networked, intelligent, and interactive digital-real integration. Therefore, scientifically understanding and evaluating the progress of digital-real integration in today’s cultural industry is important for optimizing its digital transformation and development.

Digital-real integration refers to integrating digital technology into the real economy across the full industrial chain and the complete life cycle to form a high-quality industrial ecosystem with digital technology at its core (Meng et al. 2023). In the cultural industry, digital-real integration is supported by digital technology, driven by data elements, and carried by the network. Through the digital transformation of all processes, all directions, and all chains, such as cultural creative design and R&D, cultural product manufacturing and production, and cultural product distribution and circulation, it improves the total factor productivity of the cultural industry (Passiante and Secundo, 2002). It can effectively connect the value chain, industrial chain, supply chain, innovation chain, and ecological chain of the cultural industry both online and offline (Xu et al. 2021; Tsolakis et al. 2023), making the flow of information, technology, talent, capital, and supplies in the cultural industry more efficient and smooth. At the same time, digital-real integration can facilitate flexible production and digital cluster operation in the cultural industry through a networked, flattened, and platform-based structure (Tracey and Clark, 2003).

Nevertheless, how should the progress of industrial digital-real integration be assessed? Urbinati et al. (2020) identified the stages and characteristics of industrial digital-real integration from the perspectives of technology and integration pathways using a qualitative method (Elia et al. 2020). Wang Z et al. (2024) broadly reflected the state of industrial digital-real integration by calculating composite values with a coupling coordination degree model. These studies suggest that industrial digital-real integration is a gradual process that cannot be achieved overnight. Although its ultimate goal is industrial ecological operation centered on digital technologies, a more precise identification of the detailed progress and state of digital-real integration is still urgently needed. Therefore, unlike previous studies that relied on qualitative analyses or generalized quantitative approaches, this study proposes a new methodology for assessing the progress of digital-real integration in the cultural industry based on Chinese patent information, combining social network analysis, text classification, and text similarity methods. The remainder of the paper is organized as follows. The next section reviews the literature on the digital-real integration of industries. The following section presents a theoretical analysis that defines the characteristics of the different stages of digital-real integration. The next two sections cover the data and methodology and the research findings, respectively, identifying the progress of digital-real integration in the cultural industry using China as an example. The last section offers a discussion and conclusion.

Related work

The integration of digital-real industries is a complex and gradual process that cannot be accomplished instantaneously. It is anticipated to evolve through distinct phases, including digitalization, intelligence and ecologization (Wang Y, et al., 2025). Given the unique characteristics and challenges associated with each stage, it is imperative to employ tailored methodologies to accurately evaluate the progress and depth of digital-real integration. Against this backdrop, this section provides a comprehensive review and synthesis of existing research on the evolution and measurement of digital-real integration, aiming to establish a robust foundation for further scholarly inquiry.

The evolution of the development of digital-real integration

Li (2021) pointed out that digital-real integration is a process of reconfiguring the business logic of the real economy, changing its organizational form, and innovating its value model through the integration and innovation of digital technology, application scenarios, and business models. Fu et al. (2022) found that the deep integration of the digital economy and the real economy would follow the “new real enterprise-digital ecology-new real economy” route. Digital-real integration covers data elements, digital technology, the platform economy, and the deep integration of digital sharing with the real economy (Zhengzhen, 2021). At the technical level, it involves a highly connected linkage between industrial technology and digital technology, and digital transformation enables enterprises to operate digitally in the cloud (Borangiu et al. 2019). At the meso level, digital-real integration involves the deep coupling of the industrial chain and the innovation chain; that is, a distinctive main value chain forms within the enterprise cooperation network both online and offline (Tiwari, 2021). At the macro level, digital-real integration implies that the interactions and exchanges among the main entities of an industrial chain group display significant ecological characteristics (de Koning et al. 2023). Xiao (2020) studied the main approaches to integrating the digital economy and the real economy at the enterprise, industry, and macroeconomic levels and concluded that the integration should be accelerated through the innovation and introduction of digital technology, the improvement and interconnection of digital infrastructure, the cultivation of digital industrial ecosystems, and the support and regulation of digital platform enterprises.
Balaji and Roy (2017) found that the industrial Internet, as the primary field of Internet of Things (IoT) application, shares a similar value creation model with the network architecture for developing digital-real integration. They suggested that digital platforms should be used to facilitate the digital transformation of enterprises and digital ecosystems to optimize industrial ecosystems, and that this is the only way to continuously consolidate the depth and breadth of digital-real integration.

The effect measurement for the development of digital-real integration

Song, Yuxin (2020) used the coupling coordination degree model and grey relational analysis to measure the integration level of the digital economy and the real economy, finding that the coupling coordination degree of digital-real integration first agglomerated and then dispersed, with significant multi-level regional differentiation. Lee et al. (2015) measured the integration level of the digital economy and the real economy with a patent co-classification analysis based on patent application data and patent-industry matching information. They found that the digital-real integration network had an obvious core-periphery structure and that digital-real integration still had substantial room for industrial expansion. Xu et al. (2021) explored the spatiotemporal evolution characteristics and driving factors of the integration of the digital economy and the real economy using the entropy method, a coordination degree model, and a spatial econometric model. They found that the current level of digital-real integration was dominated by weak and primary integration and was characterized by low agglomeration. Zhang and Wang (2022) examined the stylized facts, regional differences, and dynamic evolution of digital-real integration using traditional and spatial Markov chains, Moran’s I, and other methods. They argued that the comprehensive development of the real economy lagged behind that of the digital economy, but that, with the popularization and development of digital technology, the regional polarization of digital-real integration would be alleviated.

Previous research has primarily relied on methodologies such as the coupling coordination degree model to evaluate the overall level of digital-real integration from a holistic perspective. However, such approaches often neglect a nuanced examination and deconstruction of the characteristics specific to each stage of industrial digital-real integration, which limits their ability to assess its true progress. In response, the current study takes the cultural industry in China as a case study. Using patent data together with social network analysis, text classification, and text similarity analysis, it systematically analyzes the developmental trajectory of digital-real integration in the cultural industry, with particular emphasis on technological advancement, industrial chain dynamics, and the ecological chain division of labor, and thereby offers fresh insights into the evolving landscape of digital-real integration.
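For comparison with the stage-based approach developed here, the following Python sketch shows what a two-subsystem coupling coordination degree model of the kind used in the studies above typically computes. It assumes that the digital-economy and real-economy subsystems have already been scored in (0, 1] (for example, from entropy-weighted indicators) and uses equal weights α = β = 0.5; both are assumptions, and the cited studies may specify the model differently.

```python
import math


def coupling_coordination(u1: float, u2: float, alpha: float = 0.5, beta: float = 0.5):
    """Two-subsystem coupling coordination degree (illustrative sketch).

    u1, u2: development scores of the digital and real subsystems, assumed in (0, 1].
    alpha, beta: assumed subsystem weights (alpha + beta = 1).
    Returns (C, T, D): coupling degree, comprehensive index, coordination degree.
    """
    # Coupling degree: 1 when both subsystems are equally developed, smaller otherwise
    c = 2 * math.sqrt(u1 * u2) / (u1 + u2)
    # Comprehensive development index: weighted sum of the subsystem scores
    t = alpha * u1 + beta * u2
    # Coupling coordination degree combines balance (C) and level (T)
    d = math.sqrt(c * t)
    return c, t, d


# Balanced but moderate development vs. unbalanced development with the same mean
print(coupling_coordination(0.5, 0.5))  # C = 1.0, T = 0.5, D ≈ 0.707
print(coupling_coordination(0.9, 0.1))  # C ≈ 0.6, T = 0.5, D ≈ 0.548
```

Whatever the exact specification, the model condenses each region or year into a single composite score, which is the holistic view that the stage-based analysis in this study aims to decompose.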

Theoretical framework

Theoretical foundation

Only by clarifying the characteristics of digital-real integration in the cultural industry at different stages can the progress of a specific country’s cultural industry be accurately assessed. To address the limitation of previous studies that used the coupling coordination degree method for overly generalized observations of digital-real integration, this study starts from the concept of digital-real integration and divides its development into primary, intermediate, and advanced stages. Using China as a case study, the analysis then examines the progress and characteristics of digital-real integration in the cultural industry.

  1. Analysis of characteristics at the primary stage of digital-real integration: Previous studies have focused predominantly on technological integration at the primary stage of digital-real integration in the cultural industry (Rosenberg N, 1963; Bygstad, Lanestedt, 2009). Specifically, researchers have highlighted the deep integration and application of digital technologies such as the Internet, big data, artificial intelligence, cloud computing, and the Internet of Things in the cultural industry. For instance, digital asset management and blockchain technologies have been widely applied to extract high-precision mural elements, reconstruct caves in 3D, and enable virtual tours of the Dunhuang caves. The creation of “Digital Dunhuang” ensured the proper preservation and long-term retention of invaluable cultural heritage in digital form. Consequently, the integration and transformation of traditional cultural technologies through digital technologies have been identified as the most prominent feature of the primary stage of digital-real integration (Breidbach and Maglio, 2015), and the emergence and application of cultural technology patents characterized by both digitalization and platformization have been considered critical drivers of digital-real integration in the cultural industry (Alshammari et al., 2024).

  2. Analysis of characteristics at the intermediate stage of digital-real integration: At the intermediate stage, the characteristics of digital-real integration center on restructuring cultural industry organizations by integrating innovation chains and industrial chains (Wang and Huang, 2024). At this stage, digital-real integration moves beyond simple technological integration, and digital technologies drive the digital transformation of the entire cultural industry chain. That is, digital technologies not only accelerate the horizontal and vertical integration of production, distribution, and consumption in the cultural industry through highly integrated network platforms but also bring numerous participants into virtual spaces, expanding the boundaries of cultural industry clusters (Djamfa, 2024). The National Public Culture Cloud is a representative case. In the data storage and processing stage, cloud computing and big data technologies enable the platform to classify and store vast cultural resources and to tag them intelligently for management. In the data exchange and management stage, blockchain and automated content review technologies ensure copyright protection and compliance for the platform’s content. In the content production stage, platform technologies steer the vertical circulation of cultural production models. In the promotion and preservation stages, digital technologies expand the boundaries of cultural dissemination and heritage through tools such as precise recommendation algorithms and virtual reality. Therefore, the integration of cultural industry chains and innovation chains has been regarded as an inevitable trend (Firican, 2024).

  3. Analysis of characteristics at the advanced stage of digital-real integration: At the advanced stage, the characteristics of digital-real integration center on reshaping organizational relationships in the cultural industry by integrating innovation chains and ecological chains (Lerro et al. 2022). At this stage, digital-real integration transitions to an ecosystem integration model characterized by industrial interconnectivity, and cultural industry organizations exhibit distinct ecological contexts (Bustamante, 2004; Calisi and Botta, 2022). The organizational relationships in digital-real integration include service groups (e.g., scenario creation, ecosystem interconnectivity, financial and tax services, and platform governance), key groups (e.g., cultural creativity, content production, and product marketing), innovation groups (e.g., 5G, AI, big data, and the metaverse), and resource groups (e.g., data collection, processing, trading, distribution, and presentation). For instance, the Palace Museum in Beijing launched “The Digital Hall of Da Gao-Xuan Dian” (大高玄殿数字馆, lit. The Digital Hall of the Great Shrine of Mystery Highness), version 3.0 of its digital exhibitions, which has been regarded as a model of the cultural metaverse (Pais, 2019). This platform integrates online curation and offline exhibitions, enabling ultra-high-definition displays of digital artifacts, immersive interactive experiences, and comprehensive digital ecosystem planning for the Palace Museum’s cultural exhibitions. Thus, the advanced stage of digital-real integration is marked by ecosystem-driven organizational relationships underpinned by digital technologies. At this stage, all aspects of the cultural industry ecosystem are fully transformed by digital technologies, and the integration of the cultural industry ecosystem with innovation chains fosters the most specialized and fine-grained modular division of labor within the industry (He and Wen, 2025). This integration is regarded as the ultimate goal of digital-real integration in the cultural industry.

Analytical framework

In summary, to assess the progress of digital-real integration in the cultural industry rigorously, this study starts from the concept of digital-real integration and conducts empirical tests based on the theoretical analysis of its characteristics at the primary, intermediate, and advanced stages, as shown in Fig. 1. Specifically, at the primary stage, the study searches for relevant patents using keywords on digital technologies combined with keywords on the cultural industry; the retrieved patents represent the technological integration of the cultural industry. At the intermediate stage, the study extracts the patent-holding entities from these patent data and matches their business scope with the segments of the cultural industry chain based on similarity, showing the integration of innovation chains and industrial chains. At the advanced stage, the study similarly matches the business scope of these entities with the modules of the cultural industry ecological chain based on similarity, showing the integration of innovation chains and ecological chains.

Fig. 1

Theoretical framework.

Data and methods

Data source

Patent information

Because the development of digital-real integration in the cultural industry is based on digital technology, this paper uses patent information, which fits this purpose well; the data come from the patent database of the China National Intellectual Property Administration. First, following Guan and Liu (2016), Benson and Magee (2013), and Montecchi et al. (2013), who used specific keywords to retrieve relevant patents, this paper selected 18 keywords related to “digital-real integration” from documents and reports such as “Guiding Opinions on Promoting the Deep Integration of Culture with Science and Technology (S&T)” and “Top 10 Frontier Application Trends of Culture and S&T in 2022,” and constructed a search formula for “digital-real integration.” Second, keywords related to the cultural industry were selected from the Classification of Cultural and Other Related Industries in China (2018). Finally, the “digital-real integration” and “cultural industry” keyword groups were linked with the “AND” operator to construct the “cultural industry digital-real integration” search formula.

For example, TS = (cultural industry keywords) AND (digital-real integration keywords) (see Table 1 for details). These keywords were used to retrieve technology patents related to the integration of the digital and physical aspects of the cultural industry, and relevant entities were then identified through patent citation relationships. This paper took 2021, the year China’s “14th Five-Year Plan” for cultural industry development was released, as the base year and conducted cross-searches with the above search formula up to June 2023. In total, 12,672 patents related to “digital-real integration of the cultural industry,” together with their citation relationships, were obtained. Aggregating the patents to the entity level yielded 2237 entities (including enterprises, research institutions, universities, and others) and 3146 pairs of inter-entity citation relationships. The nodes in the network of this paper therefore represent entities in the cultural industry rather than individual patents; these entities, which play different roles in the patent network, are the innovators in the digital-physical integration of the cultural industry. Edges are determined by patent citation relationships: if a patent of Entity A cites a patent of Entity B, a directed edge is established from A to B, tracing the technology-transfer relationship between the two entities, and the number of such citations determines the strength of the connection between A and B.
In this network, the connections between nodes not only reflect technological relationships but also reveal the interactions and collaborations of different entities in the technological innovation process (Karvonen and Kässi, 2013; Choi and Yoon, 2022).
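The aggregation from patent-level citations to a weighted, directed entity-level network can be sketched in Python as follows. The sketch assumes that each patent maps to a single owning entity, that every owning entity becomes a node, and that citations involving patents outside the retrieved set, as well as an entity citing its own patents, are dropped; these filtering rules, the function name build_entity_network, and the sample patent IDs are hypothetical rather than the authors’ documented procedure.

```python
from collections import Counter


def build_entity_network(patent_owner, patent_citations):
    """Aggregate patent citations into weighted, directed entity-level edges.

    patent_owner: dict mapping patent ID -> owning entity (assumed one per patent)
    patent_citations: iterable of (citing_patent, cited_patent) pairs
    Returns (nodes, edges): the set of all patent-owning entities, and a Counter
    mapping (citing_entity, cited_entity) -> number of citations (edge weight).
    """
    edges = Counter()
    for citing, cited in patent_citations:
        a = patent_owner.get(citing)
        b = patent_owner.get(cited)
        # Drop citations involving patents outside the sample, and self-citations
        if a is None or b is None or a == b:
            continue
        # Directed edge A -> B; the weight counts how often A's patents cite B's patents
        edges[(a, b)] += 1
    # Every patent-owning entity is a node, including those without retained citations
    nodes = set(patent_owner.values())
    return nodes, edges


owners = {"P1": "A", "P2": "A", "P3": "B", "P4": "C"}
citations = [("P1", "P3"), ("P2", "P3"), ("P1", "P4"), ("P3", "P1"), ("P2", "P1")]
nodes, edges = build_entity_network(owners, citations)
print(sorted(edges.items()))  # [(('A', 'B'), 2), (('A', 'C'), 1), (('B', 'A'), 1)]
```

Patents with several co-applicants would need an extra rule (for example, one edge per applicant pair), which this sketch does not attempt.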

Table 1 Related patent search equation.
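The logic of the search formula (OR within each keyword group, AND between the cultural industry group and the digital-real integration group) can be illustrated with a short Python sketch that produces a query string in the TS = ... form and applies the same rule as a local filter to a patent title or abstract. The keywords below are hypothetical English placeholders rather than the Table 1 lists, and the exact query syntax accepted by the CNIPA database is not reproduced here.

```python
def build_query(culture_terms, digital_terms, field="TS"):
    """Build a boolean search string: OR within each group, AND between groups."""
    def group(terms):
        return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"
    return f"{field} = {group(culture_terms)} AND {group(digital_terms)}"


def matches(text, culture_terms, digital_terms):
    """Apply the same AND-of-ORs rule to a single patent text (case-insensitive)."""
    text = text.lower()
    has_culture = any(term.lower() in text for term in culture_terms)
    has_digital = any(term.lower() in text for term in digital_terms)
    return has_culture and has_digital


# Hypothetical placeholder keywords, not the paper's actual keyword lists
culture = ["museum", "cultural heritage"]
digital = ["virtual reality", "blockchain"]
print(build_query(culture, digital))
# TS = ("museum" OR "cultural heritage") AND ("virtual reality" OR "blockchain")
print(matches("A virtual reality tour system for a museum", culture, digital))  # True
print(matches("A blockchain payment system", culture, digital))  # False
```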

Text information on business scope

The “business scope” refers to the range of business activities and fields disclosed in an enterprise or institution’s business registration information. This information typically includes the main products or services the enterprise engages in, reflecting its market positioning and technological capabilities. Additionally, as part of the business registration information, the business scope has a high degree of standardization and credibility, providing a foundation for large-scale analysis. In the context of the cultural industry, the business scope may include various aspects such as cultural creativity, cultural content development, and cultural consumption. Therefore, the business scope not only provides detailed information about an enterprise’s involvement in the industrial and ecological chains but also helps clarify the role and function of each enterprise in the integration process at different levels. In the research process, we use the business scope data to deepen our understanding of each enterprise and institution in the industrial and ecological chains. The construction of the industrial and ecological chain corpora is based on the classification and stratification of these business scope texts. Specifically, by analyzing the business scope of each enterprise, we extract keywords and themes related to the integration of digital technologies and the cultural industry, mapping them to different links in the industrial and ecological chains.

Through the enterprise information platform QiChaCha.com, this paper batch-searched the names of the entities involved in the above patents, obtained business scope information for 2231 of them, and used these texts for analysis.
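Registered business scope texts usually contain legal boilerplate that must be removed before segmentation, as described in the data pre-processing step of the text classification method. A minimal Python sketch of this cleaning is shown below. It assumes the common mainland Chinese registration format, in which activity items are separated by punctuation such as “；” and “、” and legal notes appear in parentheses or as standalone clauses; the marker phrases, the synthetic sample, and the function name clean_business_scope are illustrative assumptions. The resulting items would then be segmented with Jieba (for example, jieba.lcut), a step omitted here so that the sketch has no third-party dependencies.

```python
import re

# Illustrative boilerplate markers (assumed, not an official or exhaustive list):
# 依法须经批准 = "requires approval by law", 批准后方可开展 = "may operate only after approval",
# 营业执照 = "business license", 国家限制 = "state-restricted"
BOILERPLATE_MARKERS = ("依法须经批准", "批准后方可开展", "营业执照", "国家限制")


def clean_business_scope(text):
    """Split a business scope text into activity items, dropping legal boilerplate."""
    # Remove parenthesized notes, which usually carry licensing caveats
    text = re.sub(r"[（(][^）)]*[）)]", "", text)
    # Remove section labels: 一般项目 = "general items", 许可项目 = "licensed items"
    text = re.sub(r"(一般项目|许可项目)[:：]", "", text)
    # Split on common Chinese and ASCII separators and whitespace
    items = re.split(r"[；;、，,。\s]+", text)
    return [item for item in items
            if item and not any(marker in item for marker in BOILERPLATE_MARKERS)]


# Synthetic sample: cultural-creative design services; digital content production;
# software development; online cultural operations (plus two licensing notes)
sample = ("一般项目：文化创意设计服务；数字内容制作服务；软件开发。"
          "（除依法须经批准的项目外，凭营业执照依法自主开展经营活动）"
          "许可项目：网络文化经营。（依法须经批准的项目，经相关部门批准后方可开展经营活动）")
print(clean_business_scope(sample))
# ['文化创意设计服务', '数字内容制作服务', '软件开发', '网络文化经营']
```

Removing every parenthesized segment is a blunt rule; if genuine activity descriptions also appear in parentheses, a narrower pattern keyed on the marker phrases would be safer.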

  1. Industrial chain corpus: The cultural industry chain includes three major links: cultural creativity supply, cultural content R&D, and cultural product marketing (Hao and Huang, 2021). To construct the industrial chain corpus, we set keywords for each industrial chain link (such as “cultural creativity,” “cultural development,” and “cultural consumption”) and collected the business scope information of enterprises involved in these links. Following Miric et al. (2023), who used text classification methods to categorize AI patents, this paper classifies entities according to their business scope and assigns each entity to a specific industrial chain link, using a Naive Bayes classifier based on the content of the business scope texts. For the cultural creativity supply link, “cultural creativity” was used as the search keyword on the QiChaCha platform; for the cultural content R&D link, “cultural R&D” and “cultural science and technology” were used; and for the cultural product marketing link, “cultural marketing,” “cultural consumption,” “cultural experience,” and “cultural derivatives” were used. In total, the names and business scope information of 600 entities across the three industrial chain links were obtained, forming the basis for analyzing the industrial chain division of labor in the digital-real integration of the cultural industry.

  2. Ecological chain corpus: In constructing the ecological chain corpus, the business scope texts help distinguish the position of enterprises in the broader ecosystem of digital-physical integration in the cultural industry. We set keywords for each ecological chain module (such as “scene construction,” “ecological interconnection,” and “VR/AR”), collected the business scope information of enterprises involved in these modules, and used this information to build the ecological chain corpus. Hoberg and Phillips (2016) applied text analysis to product descriptions in company annual reports, proposing a concept and measure of product similarity. This paper adopts their approach, using the cosine similarity algorithm to analyze the business scope of the entities in the study and to categorize them by the module they occupy within the ecological chain. The keywords for each ecological group were collected from official documents and authoritative reports, and the names and business scope information of entities related to the ecological chain were obtained through the QiChaCha.com platform, forming the basis for the ecological chain classification, as shown in Table 2.

Table 2 Related search keywords for links in the ecological chain.
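The cosine-similarity assignment of an entity to an ecological chain module can be sketched in Python as below. Assumptions: texts are already segmented into tokens (English placeholders stand in for Jieba output), each module’s reference business scopes are pooled into one term-frequency profile, and the entity is assigned to the module with the highest raw-count cosine similarity. Pooling, raw counts rather than weighted terms, and the module names in the example are illustrative choices, not necessarily those of the study.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_module(entity_tokens, module_corpora):
    """Assign an entity to the ecological chain module with the most similar profile.

    entity_tokens: segmented business scope of one entity
    module_corpora: dict module name -> list of segmented reference business scopes
    Returns (best_module, scores).
    """
    entity_vec = Counter(entity_tokens)
    scores = {}
    for module, docs in module_corpora.items():
        profile = Counter()
        for doc in docs:
            profile.update(doc)  # pool reference texts into one term-frequency profile
        scores[module] = cosine(entity_vec, profile)
    best = max(scores, key=scores.get)
    return best, scores


modules = {
    "innovation group": [["5g", "ai", "big-data"], ["metaverse", "ai"]],
    "key group": [["cultural-creativity", "content-production"],
                  ["product-marketing", "content-production"]],
}
best, scores = assign_module(["ai", "metaverse", "content-production"], modules)
print(best)    # innovation group
print(scores)  # innovation group ≈ 0.655, key group ≈ 0.471
```

Comparing the entity with every reference text and averaging the scores, instead of pooling, is an equally plausible variant.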

Research methods

Social network analysis

To present the progress of digital-real integration in the cultural industry in more detail, social network analysis was used to quantitatively examine the relationships between network entities and to characterize the network structure (Wasserman, 1994; Scott, 2012). Following Hu et al. (2024), this article conducted a social network analysis to characterize the network features of digital-real integration in the cultural industry. The selected network topology indicators are as follows.

  1. Degree is defined as the number of ties a node has with other nodes and characterizes how actively the node connects with others. The average degree of the network is the mean degree of all nodes, calculated as:

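    $$\bar{k}=\frac{1}{N}\sum_{i=1}^{N}{k}_{i}$$
    (1)

    where \({k}_{i}\) is the degree of node i and N is the number of nodes in the network.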
  2. Network density measures the overall closeness of the network and is defined as the ratio of the number of links actually present in the network to the theoretical maximum number of links:

    $$D=\frac{\sum_{i=1}^{N}{k}_{i}}{N(N-1)}$$
    (2)

    where \({{k}}_{{i}}\) was the degree centrality of the node, and N was the number of nodes.

  3. The average path length is the mean of the shortest path lengths between all pairs of nodes in the network. The smaller the value, the more easily the nodes can reach one another. It is calculated as:

    $$L=\frac{1}{N(N-1)}\sum_{i\neq j}{d}_{ij}$$
    (3)

    where \({d}_{ij}\) is the shortest path length between node i and node j, and N is the number of nodes in the network.

  4. The clustering coefficient reflects how strongly the nodes in the network cluster together. For a single node, it is the ratio of the number of edges actually present among the node’s neighbors to the maximum possible number of such edges; the clustering coefficient of the network is the average over all nodes:

    $$C=\frac{1}{N}\sum_{i=1}^{N}\frac{2{E}_{i}}{{k}_{i}({k}_{i}-1)}$$
    (4)

    where \({E}_{i}\) is the number of edges actually present among the neighbors of node i, N is the number of nodes in the network, and \({k}_{i}\) is the degree of node i.

  5. Modularity measures how strongly the network is divided into clusters (modules). A network with high modularity has dense connections within modules and sparse connections between them. It is defined as the fraction of edges that fall within modules minus the expected value of that fraction if edges were distributed at random:

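    $$Q=\sum_{i}{e}_{ii}-{E}_{r},\qquad {E}_{r}=\sum_{i}{a}_{i}^{2}$$
    (5)

    where \({e}_{ii}\) is the fraction of all edges that fall within module i and \({a}_{i}\) is the fraction of edge ends attached to nodes in module i, so that \({E}_{r}\) is the expected within-module fraction under random distribution of edges.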
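The five indicators can be computed directly from their definitions. The Python sketch below assumes that the citation network has been reduced to an undirected, unweighted graph (edge direction and citation counts are ignored, self-loops dropped) and that a community partition is already available, for example from a community-detection algorithm that is not implemented here. Modularity is computed in its standard Newman-Girvan form, and for disconnected networks path lengths are averaged over reachable pairs only; both conventions are assumptions of this sketch rather than details stated in the text.

```python
from collections import deque
from itertools import combinations


def to_undirected(edges, nodes=()):
    """Undirected, unweighted adjacency sets from (u, v) pairs; self-loops dropped."""
    adj = {n: set() for n in nodes}  # optional isolated nodes
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def average_degree(adj):
    return sum(len(nb) for nb in adj.values()) / len(adj)


def density(adj):
    n = len(adj)
    return sum(len(nb) for nb in adj.values()) / (n * (n - 1))


def average_path_length(adj):
    """BFS from every node; mean shortest path over reachable ordered pairs."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def average_clustering(adj):
    """Mean of 2*E_i / (k_i*(k_i - 1)); nodes with degree < 2 contribute 0."""
    coeffs = []
    for node, nb in adj.items():
        k = len(nb)
        if k < 2:
            coeffs.append(0.0)
            continue
        e_i = sum(1 for a, b in combinations(nb, 2) if b in adj[a])  # edges among neighbors
        coeffs.append(2 * e_i / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)


def modularity(adj, community):
    """Q = sum over modules of (e_ii - a_i^2) for a given node -> module mapping."""
    two_m = sum(len(nb) for nb in adj.values())  # twice the number of edges
    internal, ends = {}, {}
    for u, nb in adj.items():
        c = community[u]
        ends[c] = ends.get(c, 0) + len(nb)
        internal[c] = internal.get(c, 0) + sum(1 for v in nb if community[v] == c)
    # internal[c] counts each within-module edge twice, matching the 2m denominator
    return sum(internal[c] / two_m - (ends[c] / two_m) ** 2 for c in ends)


# Two triangles joined by a single bridge edge C-D
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E"), ("E", "F"), ("D", "F"), ("C", "D")]
adj = to_undirected(edges)
partition = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
print(average_degree(adj))         # ≈ 2.333
print(density(adj))                # ≈ 0.467
print(average_path_length(adj))    # 1.8
print(average_clustering(adj))     # ≈ 0.778 (7/9)
print(modularity(adj, partition))  # ≈ 0.357 (5/14)
```

For a network of several thousand nodes, a graph library such as networkx offers equivalent functions (density, average_shortest_path_length, average_clustering, and community.modularity), which would be the more practical choice.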

Text classification method—Naive Bayes classifier

Text classification assigns a document to a category of a classification system on the basis of the document’s content, using a classifier that has already been trained (McCallum and Nigam, 1998; Kim et al. 2006). The method comprises five steps: text pre-processing, text representation, classification model training, performance evaluation, and classification determination (Kang et al. 2020).
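The five steps can be illustrated end to end with a dependency-free Python sketch of a multinomial Naive Bayes classifier over bag-of-words counts, followed by the evaluation metrics. It stands in for the Jieba, CountVectorizer, and scikit-learn setup described in the text: tokens are assumed to be already segmented, the stopword set is a tiny hypothetical stand-in for the HIT list, Laplace smoothing with alpha = 1 is an assumption, and words unseen in training are ignored at prediction time. Because the evidence term p(d_i | θ) in Bayes’ rule is the same for every category, the sketch compares only the prior times the likelihood, in log space to avoid numerical underflow. The training texts and labels are synthetic placeholders for the three industrial chain links.

```python
import math
from collections import Counter

STOPWORDS = {"and", "of", "services"}  # hypothetical stand-in for the HIT stopword list


def preprocess(tokens):
    """Drop stopwords from an already segmented token list."""
    return [t for t in tokens if t not in STOPWORDS]


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes on bag-of-words counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        docs = [preprocess(d) for d in docs]
        self.classes = sorted(set(labels))
        self.vocab = {t for d in docs for t in d}
        n = len(labels)
        # log p(c_j): relative class frequency in the training data
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        # Bag-of-words counts pooled per class
        self.counts = {c: Counter() for c in self.classes}
        for d, c in zip(docs, labels):
            self.counts[c].update(d)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_joint(self, doc, c):
        """log p(c) + sum of log p(token | c); the evidence term is omitted."""
        v = len(self.vocab)
        score = self.log_prior[c]
        for t in preprocess(doc):
            if t in self.vocab:  # ignore words never seen in training
                score += math.log((self.counts[c][t] + self.alpha)
                                  / (self.totals[c] + self.alpha * v))
        return score

    def predict(self, doc):
        # Category with the highest (unnormalized) posterior
        return max(self.classes, key=lambda c: self.log_joint(doc, c))


def evaluate(y_true, y_pred):
    """Overall accuracy plus per-class (precision, recall, F1), one-vs-rest."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    per_class = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = (precision, recall, f1)
    return accuracy, per_class


train_docs = [["creative", "design", "ip"], ["creative", "animation", "design"],
              ["software", "research", "vr"], ["research", "development", "software"],
              ["sales", "ticketing", "experience"], ["sales", "derivatives", "retail"]]
train_labels = ["creativity", "creativity", "r&d", "r&d", "marketing", "marketing"]
model = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(model.predict(["design", "services", "creative"]))  # creativity
```

For a single binary category, the overall accuracy computed here reduces to (TP + TN) / (TP + TN + FP + FN).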

  1. Data pre-processing: This paper used the Jieba Chinese text segmentation library in Python to segment the business scope texts of the above industrial chain and ecological chain enterprises into separate words. The stopword list of the Harbin Institute of Technology (HIT) was then used to remove uninformative content such as punctuation marks and adverbs. Because the business scope texts also contained uninformative boilerplate, such as statements on state-restricted operations, projects requiring approval by law, the “general scope,” and business license notes, which would reduce classification accuracy, this content was also deleted during pre-processing.

  2. Text representation: Text representation converts the text into a machine-readable form, i.e., word vectors that are easy to process later. This paper used a bag-of-words model, implemented with the CountVectorizer tool in Scikit-learn, to represent the textual information. The bag-of-words model treats a text as a collection of words and counts how often each word of the full vocabulary appears in the text; these counts constitute the text’s word vector (Salton and Buckley, 1988). Although the bag-of-words method ignores phrase and sentence structure and loses much of the information contained in word order, the business scope texts analyzed here are scattered, fragmented lists of activities in which word order carries little meaning, so the method remains reasonably applicable.

  3. Classification model training: This step selects and trains a suitable classifier to obtain the classification model. Following Kaushik et al. (2022), this paper selected a Naive Bayes classifier to accomplish the final classification. Given the prior probabilities, the classifier computes the conditional probability of the text to be classified under each category, combines the prior and conditional probabilities into a posterior probability, and assigns the text to the category with the highest posterior probability, calculated as:

    $${p}({{c}}_{{j}}|{{d}}_{{i}};{\theta })=\frac{{p}({{c}}_{{j}}|{\theta }){p}({{d}}_{{i}}|{{c}}_{{j}};{\theta })}{{p}({{d}}_{{i}}|{\theta })}$$
    (6)

    where θ denoted the unknown parameter set of the probabilistic model; \({p}({{c}}_{{j}}|{{d}}_{{i}};{\theta })\) represented the posterior probability that text \({{d}}_{{i}}\) belonged to category \({{c}}_{{j}}\), given the text \({{d}}_{{i}}\) and the parameter set θ; \({p}({{c}}_{{j}}|{\theta })\) was the prior probability of category \({{c}}_{{j}}\); \({p}({{d}}_{{i}}|{{c}}_{{j}};{\theta })\) was the conditional probability of text \({{d}}_{{i}}\) given category \({{c}}_{{j}}\); and \({p}({{d}}_{{i}}|{\theta })\) was the probability of text \({{d}}_{{i}}\) under the parameter set θ. Because this denominator is the same for every category, it does not change which category attains the highest posterior probability.

  4. (4)

    Performance evaluation of classifiers: This step evaluated the classification model on an existing pre-labeled test dataset. Classification performance is commonly measured by accuracy, precision, recall, and the F1 value (Meque et al. 2023). Accuracy evaluates the overall recognition ability of the model, i.e., the number of correctly classified texts as a percentage of the total number of texts, and is calculated by the formula:

    $${\rm{Accuracy}}=\frac{{\rm{TP}}+{\rm{TN}}}{{\rm{TP}}+{\rm{TN}}+{\rm{FP}}+{\rm{FN}}}$$
    (7)

    The precision rate is the number of texts correctly assigned to a category divided by the total number of texts the classifier assigned to that category, with the formula:

    $${\rm{Precision}}=\frac{{\rm{TP}}}{{\rm{TP}}+{\rm{FP}}}$$
    (8)

    The recall rate is the number of texts correctly assigned to a category divided by the number of texts that actually belong to that category, with the formula:

    $${\rm{Recall}}=\frac{{\rm{TP}}}{{\rm{TP}}+{\rm{FN}}}$$
    (9)

    Precision and recall reflect the performance of the classification algorithm from different perspectives; the F1 value, their harmonic mean, combines the two into a single balanced measure, given by the formula:

    $${\rm{F}}1=\frac{2\times {\rm{Precision}}\times {\rm{Recall}}}{{\rm{Precision}}+{\rm{Recall}}}$$
    (10)

    Here, TP is the number of texts correctly assigned to a category; FP is the number of texts the classifier assigned to a category to which they do not actually belong; FN is the number of texts that actually belong to a category but were assigned to other categories; and TN is the number of texts correctly determined by the classifier as not belonging to the category.

  5. (5)

    Classification evaluation: The trained classifier was used to classify the unlabeled texts, accomplishing the final text classification goal.
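As a worked check on formulas (7) to (10), the sketch below derives TP, FP, FN, and TN for each category from a multi-class confusion matrix by treating each category one-versus-rest. The confusion matrix, the category keys, and the helper name `per_class_metrics` are hypothetical and do not reproduce Table 4.

```python
# Hypothetical worked example of formulas (7)-(10): per-category metrics
# computed one-vs-rest from a multi-class confusion matrix.


def per_class_metrics(confusion, labels):
    """confusion[i][j] counts texts whose true category is labels[i]
    and whose predicted category is labels[j]."""
    total = sum(sum(row) for row in confusion)
    results = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]                        # correctly assigned to this category
        fp = sum(row[k] for row in confusion) - tp  # assigned here, but belong elsewhere
        fn = sum(confusion[k]) - tp                 # belong here, but assigned elsewhere
        tn = total - tp - fp - fn                   # correctly kept out of this category
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[label] = {
            "accuracy": (tp + tn) / total,  # formula (7), one-vs-rest
            "precision": precision,         # formula (8)
            "recall": recall,               # formula (9)
            "f1": f1,                       # formula (10)
        }
    # Overall accuracy: correctly classified texts as a share of all texts.
    overall = sum(confusion[k][k] for k in range(len(labels))) / total
    return overall, results


LABELS = ["creativity_supply", "content_rd", "product_marketing"]
CONFUSION = [  # hypothetical counts: rows = true category, columns = predicted
    [8, 1, 1],
    [0, 9, 1],
    [1, 0, 9],
]
overall, metrics = per_class_metrics(CONFUSION, LABELS)
print(f"overall accuracy = {overall:.4f}")  # 0.8667
for label, m in metrics.items():
    print(label, {name: round(value, 4) for name, value in m.items()})
```

For `creativity_supply` this gives TP = 8, FP = 1, FN = 2, and TN = 19, hence precision 8/9 ≈ 0.889, recall 0.8, and F1 = 16/19 ≈ 0.842. Note that the one-vs-rest accuracy from formula (7) (27/30 = 0.9 here) is not the same quantity as the overall multi-class accuracy (26/30).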

Text matching method—corpus-based cosine similarity algorithm

Text matching identifies and locates specific text in the original data, mainly by measuring the similarity between words, sentences, paragraphs, or documents (Islam and Inkpen, 2008; Alqahtani et al. 2021). It consists of data pre-processing, text representation, and similarity calculation; the first two steps were carried out as described above.

For the similarity calculation, this paper adopted the widely used cosine similarity algorithm (Huang, 2008; Bewoor and Patil, 2018) to calculate the similarity between the business-scope texts of cultural enterprises and the ecological chain corpus, using the following formula:

$${S}\left({D}1,{D}2\right)={\rm{cos}}{\theta }=\frac{\mathop{\sum }\nolimits_{{i}=1}^{{n}}{{D}1}_{{i}}{{D}2}_{{i}}}{\sqrt{\mathop{\sum }\nolimits_{{i}=1}^{{n}}{{D}1}_{{i}}^{2}}\times \sqrt{\mathop{\sum }\nolimits_{{i}=1}^{{n}}{{D}2}_{{i}}^{2}}}$$
(11)

where \({{D}1}_{{i}}\) and \({{D}2}_{{i}}\) were the i-th components of the word vectors of the two texts D1 and D2, and n was the size of the shared vocabulary. Because word-frequency vectors have no negative components, the cosine similarity lies between 0 and 1; the closer the value is to 1, the more similar the texts. In this paper, the cosine similarity threshold was set to 0.7 (Okazaki and Tsujii, 2010) and used to determine whether ecological operation existed in the development of digital-real integration in the cultural industry.
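Formula (11) can be checked on a toy pair of texts with the standard library alone. The sketch below builds term-frequency vectors over the shared vocabulary of the two texts and applies the 0.7 threshold; both token lists are hypothetical, and the helper name `cosine_from_tokens` is illustrative.

```python
import math
from collections import Counter


def cosine_from_tokens(tokens1, tokens2):
    """Cosine similarity (formula (11)) between two bag-of-words count vectors."""
    c1, c2 = Counter(tokens1), Counter(tokens2)
    vocab = set(c1) | set(c2)  # shared vocabulary; n = len(vocab)
    dot = sum(c1[w] * c2[w] for w in vocab)  # missing words count as 0
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty text is treated as dissimilar to everything
    return dot / (norm1 * norm2)


THRESHOLD = 0.7
enterprise = ["blockchain", "data", "platform"]                  # hypothetical business scope
group_corpus = ["blockchain", "blockchain", "data", "security"]  # hypothetical group corpus
score = cosine_from_tokens(enterprise, group_corpus)
print(round(score, 4), score >= THRESHOLD)  # 0.7071 True
```

Here the vectors are (1, 1, 1, 0) and (2, 1, 0, 1) over the vocabulary {blockchain, data, platform, security}, so the score is 3 / (√3 × √6) = 1/√2, just above the threshold. Because all counts are non-negative, the score can never fall below 0.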

Results

The characteristics of organizational form

In this paper, 3,146 pairs of patent citation records between individuals were imported into Gephi, with the number of citations between their patents used as the edge weight. Using the Fruchterman-Reingold layout with the area, gravity, and speed parameters set to 22,370, 10, and 1, respectively, the organizational form of digital-real integration in the cultural industry was visualized; the results are shown in Fig. 2. In the figure, nodes represent individuals in the digital-real integration network, edges represent citation relationships between individuals, and edge thickness represents citation frequency. Gephi was also used to characterize the network structure in detail.
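Gephi's spreadsheet importer accepts an edge table with Source, Target, and Weight columns, so repeated citation pairs can be aggregated into weighted edges before import. The sketch below is a hedged illustration, not the authors' pipeline: the inventor identifiers are hypothetical, edges are kept directed from the citing to the cited individual (an assumption), and the Fruchterman-Reingold layout itself would still be run inside Gephi.

```python
import csv
import io
from collections import Counter


def weighted_edges(citation_pairs):
    """Count repeated (citing, cited) pairs; the count becomes the edge weight."""
    return Counter(citation_pairs)


def write_gephi_edge_csv(edge_weights, handle):
    """Write an edge table with Source, Target and Weight columns for Gephi."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edge_weights.items()):
        writer.writerow([source, target, weight])


# Hypothetical citation records between individuals.
pairs = [
    ("inventor_A", "inventor_B"),
    ("inventor_A", "inventor_B"),
    ("inventor_B", "inventor_C"),
]
edges = weighted_edges(pairs)
buffer = io.StringIO()
write_gephi_edge_csv(edges, buffer)
print(buffer.getvalue())
```

To write a real file, open it with `open("edges.csv", "w", newline="", encoding="utf-8")` (a hypothetical filename) and pass that handle instead of the in-memory buffer.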

Fig. 2

Organizational form for the development of digital-real integration in cultural industries.

As shown in Table 3, the average degree and network density of the organizational network for digital-real integration in the cultural industry were 1.343 and 0.001, respectively. By the criteria of Hu et al. (2024), the overall activity of the network was low and its nodes were loosely connected. Meanwhile, the average path length and clustering coefficient of the digital-real integration chain cluster network were 6.41 and 0.01, respectively. The average path length means that reaching one node from another took more than six steps on average; that is, internal accessibility was low and information did not spread efficiently through the network. The low clustering coefficient also indicates that the network had not formed an obvious agglomeration pattern; together with the long average path length, this means it was not a small-world network (Eslami et al. 2013). Notably, however, the network's modularity reached 0.715, indicating a typical modular structure, i.e., tightly knit communities within the network (Lacasa and Shubbak, 2018).
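The indicators in Table 3 have standard graph-theoretic definitions, and the standard-library sketch below computes them on a toy network of two triangles joined by a single edge. It treats the network as undirected and unweighted and takes the community partition as given (Gephi detects communities itself with the Louvain method), so its conventions may differ from Gephi's implementation. All node names are hypothetical.

```python
# Stdlib sketch of the Table 3 indicators on a toy undirected, unweighted graph.
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Adjacency sets from (u, v) pairs; self-loops are ignored."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def average_degree(adj):
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def density(adj):
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges
    return 2 * m / (n * (n - 1))


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length (BFS hops) over all mutually reachable pairs."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def modularity(adj, communities):
    """Newman's modularity Q for a given partition (list of node sets)."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    q = 0.0
    for community in communities:
        internal = sum(1 for u in community for v in adj[u] if v in community) / 2
        degree_sum = sum(len(adj[u]) for u in community)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q


# Hypothetical toy network: two triangles joined by the edge c-d.
g = build_graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
                 ("d", "e"), ("d", "f"), ("e", "f")])
print(round(average_degree(g), 3))       # 2.333
print(round(density(g), 3))              # 0.467
print(round(average_clustering(g), 3))   # 0.778
print(round(average_path_length(g), 3))  # 1.8
print(round(modularity(g, [{"a", "b", "c"}, {"d", "e", "f"}]), 3))  # 0.357
```

In a simple graph the density equals the average degree divided by N - 1, which is why a low average degree over many nodes yields a density close to zero.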

Table 3 Network topology.

Division of labor in the industrial chain

First, the train_test_split function of the scikit-learn library in Python was used to split the industrial chain corpus into a training set and a test set. Following Kaushik et al. (2022), 80% of the corpus was used as the training dataset and 20% as the test dataset, and the classifier was trained with the MultinomialNB classifier. The overall accuracy of the model was 95.59%, and the accuracy, precision, recall, and F1 values for each category, shown in Table 4, were all above 80%, indicating that the model had good classification performance.
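The split, train, and evaluate procedure can be sketched with scikit-learn as follows. This is a minimal sketch, not the authors' code: the segmented texts, the stopword and boilerplate sets (stand-ins for the HIT list and for phrases such as "projects requiring approval according to law"), the category keys, and `random_state` are all hypothetical, and segmentation (e.g., with `jieba.lcut`) is assumed to have been done already.

```python
# Minimal sketch: pre-processing, bag-of-words, an 80/20 split,
# Multinomial Naive Bayes, and evaluation. All data is hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical stand-ins for the HIT stopword list and for boilerplate phrases.
STOPWORDS = {"and", "of", ",", "etc"}
BOILERPLATE = {"general_items", "approval_required_by_law", "business_license"}


def preprocess(tokens):
    """Drop stopwords and boilerplate from an already-segmented token list."""
    return " ".join(t for t in tokens if t not in STOPWORDS and t not in BOILERPLATE)


# Hypothetical segmented business-scope texts with industry-chain labels.
RAW = [
    (["design", "creative", "planning", "general_items"], "creativity_supply"),
    (["design", "creative", "and", "illustration"], "creativity_supply"),
    (["design", "animation", "creative"], "creativity_supply"),
    (["design", "creative", "branding", "business_license"], "creativity_supply"),
    (["design", "planning", ",", "creative"], "creativity_supply"),
    (["software", "development", "digital", "content"], "content_rd"),
    (["software", "development", "game", "approval_required_by_law"], "content_rd"),
    (["software", "development", "research", "of", "content"], "content_rd"),
    (["software", "development", "algorithm"], "content_rd"),
    (["software", "research", "development", "vr"], "content_rd"),
    (["sales", "advertising", "marketing"], "product_marketing"),
    (["sales", "online", "marketing", "etc"], "product_marketing"),
    (["sales", "advertising", "distribution"], "product_marketing"),
    (["sales", "marketing", "ecommerce", "general_items"], "product_marketing"),
    (["sales", "marketing", "retail"], "product_marketing"),
]
texts = [preprocess(tokens) for tokens, _ in RAW]
labels = [label for _, label in RAW]

# 80% training / 20% test; stratify keeps one test text per category here.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=0, stratify=labels
)

# Bag-of-words counts over whitespace-separated tokens (no lowercasing).
vectorizer = CountVectorizer(tokenizer=str.split, token_pattern=None, lowercase=False)
clf = MultinomialNB()  # Laplace smoothing (alpha=1.0) by default
clf.fit(vectorizer.fit_transform(X_train), y_train)

y_pred = clf.predict(vectorizer.transform(X_test))
accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, labels=clf.classes_, zero_division=0
)
print(f"overall accuracy = {accuracy:.2%}")
for name, p, r, f in zip(clf.classes_, precision, recall, f1):
    print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")

# Step (5): label a new, unlabeled text with the highest-posterior category.
new_vec = vectorizer.transform([preprocess(["creative", "design", "and", "branding"])])
predicted = clf.predict(new_vec)[0]
posteriors = {c: round(float(p), 3) for c, p in zip(clf.classes_, clf.predict_proba(new_vec)[0])}
print(predicted, posteriors)
```

The vocabulary is fitted on the training split only, so words that appear only in test texts are ignored at prediction time; with real data a larger, naturally overlapping vocabulary would make the scores far less clean than in this toy corpus.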

Table 4 Classification results.

The trained classifier was then used to examine the division of labor in the industrial chain for digital-real integration in the cultural industry, as shown in Fig. 3. Node colors represent individuals in different segments of the cultural industry chain: purple nodes represent the cultural creativity supply category, which accounted for 44.32% of the overall industry chain; green nodes represent the cultural content R&D category, which accounted for 32.36%; and orange nodes represent the cultural product marketing category, which accounted for 23.32%. Thus, although network agglomeration in the digital-real integration of the cultural industry was limited, a distinct division of labor in the industry chain had formed.

Fig. 3

Diagram of the division of labor in the industry chain for the development of digital-real integration in the cultural industry.

Division of labor in the ecological chain

The ecological chain similarity was calculated with the cosine_similarity function in Python, and the results are shown in Table 5. At the industry level, the finance and tax service group and the platform management group had 52 and 46 enterprises, respectively, with a similarity above 0.7; their representative enterprises, Tianjin Jincheng Bank Co., Ltd. and Datang Convergence IOT Technology Wuxi Co., Ltd., both had similarities above 0.9. The ecological interconnection group had 12 enterprises with a similarity of 0.7 or above. In contrast, similarities in the scene construction group were generally below 0.7, indicating that scene construction was the weak group in the ecological chain of the digital-real integration network for the cultural industry. At the basic level, the cultural data presenting group had 45 enterprises with a similarity of 0.7 or above, among which Weichuang Group Company Limited reached 0.97. The cultural data processing and cultural data trading groups each had 14 enterprises with a similarity of 0.7 or above.

Table 5 Results of the division of labor in the ecological chain and representative enterprises.

In contrast, only 3 and 0 enterprises in the cultural data collection and cultural data distribution groups, respectively, had a similarity of 0.7 or more, indicating an urgent need to build and supplement these links. At the technology level, the blockchain group had as many as 101 enterprises with a similarity of 0.7 or more, and its representative enterprise, Shandong Guodun Information Science and Technology Company Limited, had a similarity of 0.836. The 5G, big data, and cloud computing groups had 33, 20, and 49 such enterprises, respectively. However, AI and VR/AR technologies were applied less often, and the number of enterprises with a similarity above 0.7 did not exceed 10. Moreover, no enterprise in the meta-universe technology group reached a similarity of 0.7. This indicates that digital-real integration in China’s cultural industry had not yet reached the advanced stage of ecological operation and that the technology gaps still needed to be further guided and reinforced.
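The group-level counts reported in Table 5 amount to thresholding an enterprise-by-group similarity matrix column by column. The sketch below does this with scikit-learn's `cosine_similarity`; the group corpora, enterprise texts, and identifiers are hypothetical, and fitting one shared vocabulary on all texts is an assumption (fitting on the group corpora alone would drop enterprise-specific words and can inflate some scores).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

THRESHOLD = 0.7
groups = {  # hypothetical ecological-chain corpora, one per technology group
    "blockchain": "blockchain ledger security",
    "cloud_computing": "cloud computing storage",
    "metaverse": "metaverse virtual avatar",
}
enterprises = {  # hypothetical pre-processed business-scope texts
    "E1": "blockchain ledger security",
    "E2": "blockchain ledger consulting",
    "E3": "cloud computing storage storage",
    "E4": "cloud storage security",
}

vectorizer = CountVectorizer(tokenizer=str.split, token_pattern=None, lowercase=False)
# One shared vocabulary over all texts, so words missing from a group corpus still count.
vectorizer.fit(list(groups.values()) + list(enterprises.values()))
S = cosine_similarity(
    vectorizer.transform(list(enterprises.values())),  # rows: enterprises
    vectorizer.transform(list(groups.values())),       # columns: groups
)

# Number of enterprises at or above the threshold in each group.
counts = {g: int((S[:, j] >= THRESHOLD).sum()) for j, g in enumerate(groups)}
print(counts)  # {'blockchain': 1, 'cloud_computing': 1, 'metaverse': 0}
weak = [g for g, n in counts.items() if n == 0]
print(weak)    # ['metaverse']
```

E2 shares two of three words with the blockchain corpus and scores 2/3, just below the threshold, while E3 scores about 0.943 against the cloud computing corpus; a group with no enterprise above the threshold shows up as a candidate weak link.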

Discussion and conclusion

Previous research on the digital-real integration of industries has relied predominantly on qualitative analysis and has assessed the level of integration with overly generalized methods. This study therefore takes the cultural industry in China as a case. Using social network analysis, text classification, and text similarity analysis, it systematically compares the characteristics of the different stages of digital-real integration in the cultural industry over time. Furthermore, it proposes a quantitative methodological framework for evaluating the development of digital-real integration, advancing research in the field.

Conclusion

Based on China’s patent information, this paper found that the cultural industry has gradually moved toward digital-real integration and that its cooperation network has developed a modular topological structure and a complete industrial chain. However, the agglomeration of this cooperation network still needs to improve. The network has not yet achieved full ecological operation, particularly in ecological chain links such as cultural data collection, cultural data distribution, meta-universe technology, and VR/AR technology. Further investment is still needed to build, strengthen, and supplement these links. In sum, this paper argues that digital-real integration in the cultural industry still has a long way to go.

Theoretical implications

The study provides at least two theoretical implications. First, it shows that the development trend of digital-real integration in industries can be examined with quantitative scientific methods. Previous studies on the digital-real integration of industries have relied mainly on qualitative logical deduction, few have quantitatively portrayed its development trend, and detailed insight into its organizational characteristics and division of labor has been lacking. This paper therefore measures the development of digital-real integration in the cultural industry through network characteristics and the division of labor in the industrial chain and ecological chain, expanding the research field and enriching the research tools.

Second, it draws attention to the far-reaching impact of digital-real integration on industrial organization. Digital-real integration not only reshapes online and offline cooperation in the cultural industry but also changes the division of labor in the cultural industry chain and fosters new business models and industrial ecosystems. By studying patent information, this paper shows that digital-real integration is pushing industrial organization toward a flatter, more networked, and more collaborative form, which provides theoretical support for the transformation and upgrading of cultural enterprises in the context of digital-real integration.

Practical implications

The study provides at least two practical implications. First, policy support can be customized for weak links. Unlike previous general suggestions for optimizing digital-real integration in the cultural industry, this paper uses empirical measurement to identify the weak links in the industrial chain and ecological chain, so that the cultural industry itself and the relevant authorities can provide support and supplementation in a more targeted way. In other words, only by clarifying what the cultural industry needs to improve when building, strengthening, and supplementing the chain is it possible to find the root causes of its problems and help it develop digital-real integration more effectively.

Second, multi-party cooperation is needed to promote innovative applications. Because the agglomeration of the cooperation network for digital-real integration in the cultural industry remains low, cultural enterprises, scientific research institutions, and government departments must jointly set the direction of improvement based on the weak links and foster close collaboration among all parties, thereby optimizing and upgrading the cultural industry chain and ecological chain. Only by actively encouraging more cultural enterprises to explore and apply new digital-real integration technologies can the efficiency and value of each link in the cultural industry chain be truly enhanced.

Research limitations and future research directions

Regarding research limitations, this paper used the patent cooperation network as a proxy for the morphological characteristics of digital-real integration in the cultural industry and, on this basis, measured the division of labor in its industrial chain and ecological chain. This approach may introduce errors owing to limitations in the scope of the patent search, the years covered, and the degree of text matching. However, since the Chinese government released the “14th Five-Year Plan for the Development of Cultural Industry” in 2021, which vigorously promotes the digitalization of the cultural industry, this data bias is unlikely to substantially affect the research conclusions. In the future, as digital-real integration in the cultural industry reaches its advanced stage, the ecological operation of the cooperation network and the value co-creation mechanism among the entities in the chain groups are likely to become the main directions of further development. How to capture the ecological interactions within the digital-real integration network of the cultural industry more comprehensively, and how to express scientifically the value co-creation process among cooperating entities under ecological operation, will be urgent questions for future research.