Background & Summary

In the context of rapid globalization, technological breakthroughs require cross-regional and cross-field collaboration1,2,3,4, making knowledge and technology flows more frequent and intricate5,6. Knowledge economies emphasize knowledge creation, dissemination and application, and knowledge fuels the economic development of cities and regions7,8,9. As centers of economic activity and innovation, cities have become critical nodes for knowledge flows10,11. Researchers are showing growing interest in uncovering cross-regional knowledge network activity in China and in analyzing the status and distributional characteristics of inter-city knowledge flows12,13,14,15. Previous studies have explored the characteristics of city-level knowledge networks by constructing gravity models and by other means16,17. More recently, with the development of big data and cloud computing technologies, high-volume patent data, which record the number of patents, the number of patents granted, and their respective technological fields, have created more opportunities to investigate urban knowledge networks18,19. Compared with the regional R&D expenditures and researcher numbers used in earlier models, the detailed information derived from patent data offers a direct perspective on inter-city cooperation and knowledge activities. These data reflect regional knowledge vitality more accurately and enable more precise analysis13,20.

Constructing knowledge networks from patent data or models is becoming a crucial research topic14,21,22. Several studies have reported knowledge network analyses based on patent data23,24,25. Nevertheless, for reasons of privacy, data size and processing complexity, the datasets made publicly available alongside such papers are extremely sparse and limited, making it difficult for readers to comprehensively understand how knowledge networks form and to explore their potential further26. To this end, this paper collects and processes large-scale patent data to construct an urban value co-creation knowledge network (UKN-C) weighted by the volume of value co-creation patents and an urban technology transfer knowledge network (UKN-T) weighted by the volume of technology transfer patents.

We first extract fine-grained patent data from the China National Intellectual Property Administration (https://pss-system.cponline.cnipa.gov.cn/conventionalSearch)27. Patent data are defined as a collection of information relevant to patent application, authorization and management, primarily involving basic information such as the applicant’s name, application date and application number. The collected data also encompass the following categories of information. First, the title, type, abstract, claims, and specifications of the patent provide a comprehensive description of its technical and innovation content. Second, legal status information is recorded, such as the grant status and expiration date, together with assignment information covering the assignor, the transferee, and the date of the patent transfer. Third, retrieval information such as the International Patent Classification (IPC) and citations enables users to conduct technology searches and analyses. Finally, examination opinions and technical-field classification information are included, which support the subsequent management and market application of patents.

Because the number of patents in China has expanded rapidly since 2011, this paper establishes a UKN-C network and a UKN-T network for each year from 2011 to 2021. Data cleaning and pre-processing yielded a total of 1.23 million value co-creation patents and 1.42 million technology transfer patents (see the “Methods” section for details). A matching algorithm then calls the Baidu map API to align the applicant information in each patent with map data and retrieve the cities of the multiple applicants of each value co-creation patent. The volume of value co-creation between each pair of cities is then calculated. The same approach is applied to the assignors and assignees in the patent transfer data. Thereafter, we delineate two urban knowledge networks (UKN) with cities as nodes and inter-city knowledge connections as edges. More specifically, the first is a UKN-C network in which the edge weight is the number of value co-creation patents, and the second is a UKN-T network in which the edge weight is the number of directed technology transfer patents between cities.

To facilitate access to open data, the UKN-C and UKN-T network information is saved in .csv format. Each network comprises 11 files, one per year, each recording the nodes and edge weights of the network. Raw data are also provided in the dataset.

Methods

Original data sources

This study uses a comprehensive patent dataset spanning 2011 to 2021, derived from a public data source: the official digital repository of the China National Intellectual Property Administration (CNIPA; accessible via https://pss-system.cponline.cnipa.gov.cn/conventionalSearch)27. Joint patent application data, which describe value co-creation, are systematically identified in the CNIPA database using “applicant count” as the key term. Patent transfer records, which represent technology transfer, are extracted from the CNIPA database through the “Patent Publication Notice” in the “Patent Examination Information Query” list. Value co-creation and technology transfer information is then obtained through matching algorithms. The original patent data comprise the patent application ID, patent name, patent type (invention, utility model or design), applicant information (name or organization), inventor information, application date, transferring parties (technology transferor and transferee), transfer date and other information. To ease management and the construction of urban knowledge networks, the large-scale data are sorted by year. After data cleaning, filtering and matching, we obtain more than 1.23 million value co-creation patents and more than 1.42 million technology transfer patents. We build one UKN-C network and one UKN-T network for each year, with cities as nodes and inter-city knowledge connections as edges. Edges in the UKN-C network are weighted by the number of value co-creation patents, while edges in the UKN-T network are weighted by the number of directed technology transfer patents between cities. Each knowledge network contains accurate knowledge flow data and temporal information. Furthermore, we provide ArcGIS layers delineating prefecture-level administrative districts in China.
Finally, we release the two UKN datasets on the figshare website as Open Data (https://doi.org/10.6084/m9.figshare.29401934.v1)28 under the Attribution 4.0 International (CC BY 4.0) license.

Data preprocessing

The raw data are affected by personal privacy, administrative changes, and company survival issues, which call for rigorous data preprocessing. The specific steps are as follows.

  • Personal privacy: Because personal information must be safeguarded and individuals are mobile, the city of an individual named in a patent application cannot be determined. Any patent whose applicants include an individual is therefore deleted.

  • Administrative changes: The administrative affiliation of Chinese cities has changed significantly over the past decade. In this study, we use the latest administrative plan as the benchmark, check each applicant’s city obtained by the matching algorithm against it, and delete patents attributed to administrative units that no longer exist.

  • Company survival: When an applicant organization has been deregistered or is affected by related issues, its precise location cannot be determined. Consequently, patents with a null value for the applicant’s location are deleted.
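The three rules above amount to a single filter over the parties named in each patent. The following is a minimal sketch of that filter, not the authors' implementation: the `Party` record, its `is_individual` flag, and the sample city names are hypothetical and only illustrate the logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Party:
    name: str
    is_individual: bool      # hypothetical flag: True for a natural person
    city: Optional[str]      # city returned by map matching, or None if not found


def keep_patent(parties, current_cities):
    """Return True only if the patent passes all three cleaning rules."""
    for p in parties:
        if p.is_individual:                 # personal privacy rule
            return False
        if p.city is None:                  # company survival rule: no location
            return False
        if p.city not in current_cities:    # administrative changes rule
            return False
    return True


# Illustrative data only; "Old City" stands for a vanished administrative unit.
current_cities = {"Beijing", "Shanghai", "Shenzhen"}
patents = {
    "P1": [Party("Firm A", False, "Beijing"), Party("Firm B", False, "Shanghai")],
    "P2": [Party("Firm A", False, "Beijing"), Party("Person X", True, None)],
    "P3": [Party("Firm C", False, None), Party("Firm B", False, "Shanghai")],
    "P4": [Party("Firm D", False, "Old City"), Party("Firm B", False, "Shanghai")],
}
kept = sorted(pid for pid, ps in patents.items() if keep_patent(ps, current_cities))
print(kept)  # ['P1']
```

Only P1 survives: P2 names an individual, P3 has an applicant without a location, and P4 points to a city absent from the benchmark list.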

Map-matching

Map-matching algorithms match the applicant, assignor and assignee of each retained patent to cities on a digital map. By calling the Baidu map API, the selected patents are paired with their respective prefecture-level cities. Specifically, each entity named in a patent is converted into latitude and longitude coordinates using the geocoding interface of the Baidu Maps API. The coordinates are then mapped to a specific geographical area to determine the entity’s prefecture-level city.
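The second step, mapping a coordinate pair to a prefecture-level city, can be viewed as a point-in-polygon lookup. The sketch below illustrates that idea with a standard ray-casting test. It is an assumption about how the lookup could be done, not a description of the authors' code; the geocoding request is omitted because its exact form is not given here, and the rectangular boundaries and the names `point_in_polygon` and `locate_city` are made up for illustration.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside the polygon [(lon, lat), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Consider only edges that straddle the horizontal line through the point.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:       # the edge is crossed to the right of the point
                inside = not inside
    return inside


def locate_city(lon, lat, boundaries):
    """Return the first city whose boundary contains the point, else None."""
    for city, polygon in boundaries.items():
        if point_in_polygon(lon, lat, polygon):
            return city
    return None


# Toy boundaries (two adjacent squares), not real administrative districts.
boundaries = {
    "City A": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "City B": [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)],
}
print(locate_city(1.0, 1.0, boundaries))  # City A
print(locate_city(3.0, 1.5, boundaries))  # City B
print(locate_city(5.0, 1.0, boundaries))  # None
```

A `None` result corresponds to an entity whose location cannot be resolved, which the preprocessing rules treat as a reason to drop the patent.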

Defining two UKN

Inter-city value co-creation patents and technology transfer patents can reveal the spatial-temporal distribution of knowledge flows between cities. Accordingly, two urban knowledge networks are constructed, in which the numbers of patents involved in value co-creation and in technology transfer serve as edge weights. Several definitions are presented below.

Definition 1: (The volume of value co-creation): The number of joint patent applications filed by two cities in one year.

Definition 2: (The volume of technology transfer): The number of patents transferred from one city to another in one year.

Definition 3: (Urban Value co-creation knowledge network, UKN-C): The knowledge network is delineated as an undirected graph (N, E, G), wherein N denotes the city nodes, E denotes the edges between cities due to value co-creation, and G denotes the weight of the edges, defined as the amount of value co-creation patents.

Definition 4: (Urban Technology transfer knowledge network, UKN-T): The knowledge network is illustrated as a directed graph (N, E, G), wherein N signifies the city nodes, E represents the directed edges between cities due to technology transfer, and G denotes the weights of the edges, which are equivalent to the amounts of technology transfer patents.

We first employ a map-matching algorithm to align patent-based knowledge flows with geographical locations, which also helps identify potential hotspots of patent activity. Valid patent data are then aggregated to calculate the volumes of value co-creation and technology transfer between cities, which characterize their knowledge connections over one year. Finally, two urban knowledge networks are created with cities as nodes and inter-city knowledge connections as edges: an urban value co-creation knowledge network with the volume of value co-creation as the edge weight, and an urban technology transfer knowledge network with the volume of directed technology transfer between cities as the edge weight.
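Definitions 1 to 4 can be illustrated with a short aggregation sketch. It rests on two assumptions that are not stated in the text: a joint patent involving three or more cities contributes one count to every unordered pair of distinct cities, and transfers within a single city are ignored because only inter-city connections form edges. The function names and sample records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_creation_weights(patents):
    """patents: iterable of (year, [city of each applicant]).

    Returns {year: Counter{(city_a, city_b): volume}} for the undirected UKN-C.
    """
    weights = {}
    for year, cities in patents:
        # Assumption: each unordered pair of distinct cities counts once per patent.
        for a, b in combinations(sorted(set(cities)), 2):
            weights.setdefault(year, Counter())[(a, b)] += 1
    return weights


def transfer_weights(transfers):
    """transfers: iterable of (year, assignor_city, assignee_city).

    Returns {year: Counter{(from_city, to_city): volume}} for the directed UKN-T.
    """
    weights = {}
    for year, src, dst in transfers:
        if src != dst:  # assumption: keep inter-city transfers only
            weights.setdefault(year, Counter())[(src, dst)] += 1
    return weights


joint = [
    (2011, ["Beijing", "Shanghai"]),
    (2011, ["Shanghai", "Beijing", "Beijing"]),
    (2011, ["Beijing", "Beijing"]),            # same city: no inter-city edge
    (2012, ["Beijing", "Shenzhen", "Shanghai"]),
]
moves = [
    (2011, "Beijing", "Shanghai"),
    (2011, "Beijing", "Shanghai"),
    (2011, "Shanghai", "Beijing"),
    (2011, "Beijing", "Beijing"),              # same city: ignored
]
ukn_c = co_creation_weights(joint)
ukn_t = transfer_weights(moves)
print(ukn_c[2011][("Beijing", "Shanghai")])   # 2
print(ukn_t[2011][("Beijing", "Shanghai")])   # 2
print(ukn_t[2011][("Shanghai", "Beijing")])   # 1
```

Sorting the city pair in `co_creation_weights` makes the UKN-C edge independent of applicant order, whereas `transfer_weights` keeps the assignor-to-assignee direction.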

Data Records

We use 1,239,771 value co-creation patents to construct the UKN-C network and 1,425,367 technology transfer patents to construct the UKN-T network of prefecture-level cities in China. The dataset is publicly available in the figshare repository (https://doi.org/10.6084/m9.figshare.29401934.v1)28. The structure of the dataset28 is shown in Fig. 1. The top level contains three folders, named “UKN-C”, “UKN-T” and “Raw Data”.

  1) “UKN-C” folder. The “UKN-C” folder consists of 11 files named “2011-UKN-C”, “2012-UKN-C”, “2013-UKN-C”, etc. Each file describes the UKN-C network for a given year, recording the network’s nodes and the volume of value co-creation patents between the nodes. Each file has nine columns, with the headings “City_ID_1”, “City_ID_2”, “Value Co-Creation Volume”, “Name of the City_ID_1”, “Longitude of the City_ID_1”, “Latitude of the City_ID_1”, “Name of the City_ID_2”, “Longitude of the City_ID_2”, and “Latitude of the City_ID_2”. These column headings are described as follows.

  • “City_ID_1”/“City_ID_2”: City ID in UKN-C networks.

  • “Value Co-Creation Volume”: This weight represents the volume of value co-creation patents between “City_ID_1” and “City_ID_2”.

  • “Name of the City_ID_1”, “Longitude of the City_ID_1”, “Latitude of the City_ID_1”: These columns describe the geographical information of “City_ID_1”, including the city’s name, longitude and latitude.

  • “Name of the City_ID_2”, “Longitude of the City_ID_2”, “Latitude of the City_ID_2”: These columns describe the geographical information of “City_ID_2”, including the city’s name, longitude and latitude.

  2) “UKN-T” folder. The “UKN-T” folder contains 11 files named “2011-UKN-T”, “2012-UKN-T”, “2013-UKN-T”, etc. Each file represents the UKN-T network for a particular year, recording the network’s nodes and the volume of technology transfer patents between the nodes. Each file has nine columns, with the headings “City_ID_1”, “City_ID_2”, “Technology Transfer Volume”, “Name of the City_ID_1”, “Longitude of the City_ID_1”, “Latitude of the City_ID_1”, “Name of the City_ID_2”, “Longitude of the City_ID_2”, and “Latitude of the City_ID_2”. These column headings are described as follows.

  • “City_ID_1”/“City_ID_2”: City ID in UKN-T networks.

  • “Technology Transfer Volume”: This weight represents the volume of technology transfer patents from “City_ID_1” to “City_ID_2”.

  • “Name of the City_ID_1”, “Longitude of the City_ID_1”, “Latitude of the City_ID_1”: These columns describe the geographical information of “City_ID_1”, including the city’s name, longitude and latitude.

  • “Name of the City_ID_2”, “Longitude of the City_ID_2”, “Latitude of the City_ID_2”: These columns describe the geographical information of “City_ID_2”, including the city’s name, longitude and latitude.

  3) “Raw Data” folder. The “Raw Data” folder comprises two files, named “Raw Data of UKN-C.csv” and “Raw Data of UKN-T.csv”. The “Raw Data of UKN-C.csv” file contains the raw value co-creation patent data used to construct the UKN-C network. It has ten columns, with the headings “Application ID of the patent”, “Year”, “City_ID_1”, “City_ID_2”, “Name of the City_ID_1”, “Longitude of the City_ID_1”, “Latitude of the City_ID_1”, “Name of the City_ID_2”, “Longitude of the City_ID_2”, and “Latitude of the City_ID_2”. These column headings are described as follows.

  • “Application ID of the patent”: The specific application ID of value co-creation patents.

  • “Year”: The specific application time of value co-creation patents.

  • “City_ID_1”/“City_ID_2”: City ID for specific value co-creation patents.

  • “Name of the City_ID_1”, “Longitude of the City_ID_1”, “Latitude of the City_ID_1”: These columns describe the geographical information of “City_ID_1”, including the city’s name, longitude and latitude.

  • “Name of the City_ID_2”, “Longitude of the City_ID_2”, “Latitude of the City_ID_2”: These columns describe the geographical information of “City_ID_2”, including the city’s name, longitude and latitude.

    The “Raw Data of UKN-T.csv” file contains the raw technology transfer patent data used to construct the UKN-T network. It has ten columns, with the headings “Application ID of the patent”, “Year”, “City_ID_1”, “City_ID_2”, “Name of the City_ID_1”, “Longitude of the City_ID_1”, “Latitude of the City_ID_1”, “Name of the City_ID_2”, “Longitude of the City_ID_2”, and “Latitude of the City_ID_2”. These column headings are described as follows.

  • “Application ID of the patent”: The specific application ID of technology transfer patents.

  • “Year”: The specific application time of technology transfer patents.

  • “City_ID_1”/“City_ID_2”: City ID for specific technology transfer patents.

  • “Name of the City_ID_1”, “Longitude of the City_ID_1”, “Latitude of the City_ID_1”: These columns describe the geographical information of “City_ID_1”, including the city’s name, longitude and latitude.

  • “Name of the City_ID_2”, “Longitude of the City_ID_2”, “Latitude of the City_ID_2”: These columns describe the geographical information of “City_ID_2”, including the city’s name, longitude and latitude.
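As an illustration of how the raw files relate to the yearly network files, the sketch below aggregates raw rows into yearly edge volumes using the column headings listed above. It is a hedged example rather than the authors' pipeline: the in-memory sample rows and application IDs are invented, city IDs are assumed to be integers, and the exact header spelling in the released files (including any stray spaces) should be checked before use.

```python
import csv
import io
from collections import Counter

# Invented sample with only the columns needed for aggregation.
sample = """Application ID of the patent,Year,City_ID_1,City_ID_2
A001,2011,1,2
A002,2011,2,1
A003,2011,1,3
A004,2012,1,2
"""


def yearly_volumes(handle, directed):
    """Count raw rows per (year, city_1, city_2) edge."""
    volumes = Counter()
    for row in csv.DictReader(handle):
        a, b = int(row["City_ID_1"]), int(row["City_ID_2"])
        if not directed and a > b:
            a, b = b, a   # undirected: (1, 2) and (2, 1) are the same edge
        volumes[(int(row["Year"]), a, b)] += 1
    return volumes


ukn_c = yearly_volumes(io.StringIO(sample), directed=False)  # co-creation style
ukn_t = yearly_volumes(io.StringIO(sample), directed=True)   # transfer style
print(ukn_c[(2011, 1, 2)])  # 2
print(ukn_t[(2011, 1, 2)])  # 1
print(ukn_t[(2011, 2, 1)])  # 1
```

To run the same function on a downloaded file, the `io.StringIO(sample)` argument can be replaced by a file object opened with `open(path, newline="")`.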

Fig. 1

The structure of the datasets on figshare.

Technical Validation

The quality of our data on value co-creation and technology transfer depends heavily on the reliability of the source data provided by the associated database. A series of validation tests were conducted to check the consistency and accuracy of our data.

Logical consistency was checked by analyzing overall network characteristics, namely the number of edges, the network density, the average path length, and the global efficiency, across different years, as shown in Fig. 2. The consistent trends of these indicators demonstrate the logical consistency of our data. Statistical consistency was checked through an exhaustive analysis of the node degree distributions of the UKN-C and UKN-T networks, as presented in Fig. 5. The node degree distributions conform to a power-law function under a chi-square goodness-of-fit test, which supports the statistical consistency of our data.
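For readers who wish to recompute the four indicators from an edge list, a minimal sketch for an undirected network follows. It makes assumptions the text does not specify: paths are unweighted (hop counts), the average path length is taken over reachable pairs only, and global efficiency is the mean inverse shortest-path length over all ordered node pairs. The function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop-count shortest paths from source via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def network_summary(edges):
    """Edges, density, average path length and global efficiency (needs >= 1 edge)."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue                      # ignore self-loops
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2
    ordered_pairs = n * (n - 1)
    total_dist, reachable, inv_sum = 0, 0, 0.0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total_dist += d
                reachable += 1
                inv_sum += 1.0 / d
    return {
        "edges": m,
        "density": 2 * m / ordered_pairs,
        "avg_path_length": total_dist / reachable,   # reachable pairs only
        "global_efficiency": inv_sum / ordered_pairs,
    }


# Toy path graph A-B-C-D, not taken from the dataset.
summary = network_summary([("A", "B"), ("B", "C"), ("C", "D")])
print(summary["edges"], summary["density"])          # 3 0.5
print(round(summary["avg_path_length"], 4))          # 1.6667
print(round(summary["global_efficiency"], 4))        # 0.7222
```

Applying the function to each yearly edge list gives one value per indicator per year, which is the form of the series plotted in Fig. 2.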

Fig. 2

Overall network characteristics of urban value co-creation knowledge network and urban technology transfer knowledge network from 2011 to 2021. (a) represents the trends in the number of edges. (b) shows network density. (c) depicts average path length. (d) describes global efficiency.

Furthermore, we check the accuracy of our data from temporal and spatial perspectives. We plot the trends in the number of value co-creation patents and the number of technology transfer patents from 2011 to 2021. The outbreak of COVID-19 in 2020 affected inter-city innovation collaboration. As shown in Fig. 3, the number of value co-creation patents decreases from 2020 to 2021. In 2020, the number of edges, the network density, and the global efficiency in Fig. 2 also declined. These patterns are consistent with the impact of the COVID-19 outbreak and support the accuracy of our data. Figure 4 depicts the network structure of UKN-C and UKN-T in different years. Over time, the spatial distribution extends to encompass inland cities in central and western China. Meanwhile, Fig. 6 indicates that the nodes and edges with high criticality in both the UKN-C and UKN-T networks are predominantly concentrated in developed cities in eastern China. This spatial expansion and transition are consistent with the actual development of these regions, further supporting the accuracy of the data.

Fig. 3

Number of patents for value co-creation and technology transfer from 2011 to 2021. (Note: Due to the directed nature of technology transfer, technology transfer patents are counted separately in terms of both transfer out and transfer in. The analysis does not consider special values because the network degree values conform to a power law distribution).

Fig. 4

Spatial association architecture of the two UKNs in 2011, 2016 and 2021. (a–c) illustrate the spatial structure of the UKN-C network. (d–f) show the spatial structure of the UKN-T network. In the background map, pink represents eastern cities, purple represents central cities, blue represents western cities, and gray indicates missing data. It should be noted that the maps used are sourced from the Ministry of Natural Resources of the People’s Republic of China (http://bzdt.ch.mnr.gov.cn/). These maps are produced by the Standard Map Service System (license No. GS (2023)2767) with a cartographic scale of 1:780,000,000, which is one of the standard maps of the 2023 edition.

Fig. 5

Distribution of network edge weight and network degree for the urban value co-creation knowledge network and the urban technology transfer knowledge network in 2011, 2016 and 2021. (a,b) show the distributions of edge weight percentage for the urban value co-creation knowledge network and the urban technology transfer knowledge network, respectively. (c–e) illustrate the distribution of node degrees in the value co-creation knowledge networks and technology transfer knowledge networks. We employ a power law to fit the degree distribution of the urban knowledge network and test the results with chi-square goodness-of-fit.
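The power-law check described in the caption above can be sketched as follows. The fitting procedure is not specified in the text, so this example substitutes a simple least-squares fit of log p(k) = log c − α log k and then computes a Pearson chi-square statistic between observed and fitted degree counts. The function name and the synthetic degree sequence are hypothetical, and the statistic would still need to be compared with a chi-square critical value for the appropriate degrees of freedom.

```python
import math
from collections import Counter


def fit_power_law(degrees):
    """Log-log least-squares power-law fit; returns (alpha, c, chi2).

    Requires at least two distinct positive degree values.
    """
    counts = Counter(k for k in degrees if k > 0)
    total = sum(counts.values())
    ks = sorted(counts)
    xs = [math.log(k) for k in ks]
    ys = [math.log(counts[k] / total) for k in ks]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    alpha = -slope
    c = math.exp(y_mean - slope * x_mean)
    # Pearson chi-square between observed counts and counts implied by the fit.
    chi2 = 0.0
    for k in ks:
        expected = total * c * k ** (-alpha)
        chi2 += (counts[k] - expected) ** 2 / expected
    return alpha, c, chi2


# Synthetic degrees whose counts follow 3600 / k**2 exactly (not real data).
degrees = []
for k in (1, 2, 3, 4, 6):
    degrees.extend([k] * (3600 // (k * k)))
alpha, c, chi2 = fit_power_law(degrees)
print(round(alpha, 6))  # 2.0
```

Because the synthetic counts follow an exact inverse-square law, the fitted exponent is 2 and the chi-square statistic is numerically zero; real degree sequences will deviate from the fit, and the size of the statistic indicates by how much.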

Fig. 6

Spatial distribution of the network’s degree. (a–c) show the degree distribution of the urban value co-creation knowledge network. (d–f) show the out-degree distributions of the directed urban technology transfer knowledge network. (g–i) show the in-degree distributions of the directed urban technology transfer knowledge network.

Limitations

During data preprocessing, we systematically clean the data to address personal privacy, administrative changes, and company survival issues. While these cleaning rules enhance compliance and relevance, they may lead to underestimation of, or systematic bias in, the knowledge flow activities of specific entities (such as individual inventors), regions or periods. Future research could employ more elaborate methods to correct for this non-random attrition.