Introduction

A firm’s innovation capability is central to its survival, as proficiency in experimentation, risk-taking, and discovery enables it to explore new knowledge domains and achieve continual renewal (Markus et al. 2025). As innovation grows more complex, firms find it increasingly difficult to meet their innovation needs with internal knowledge elements alone. Therefore, firms commonly establish extensive technology R&D collaborations with external partners to acquire new and heterogeneous knowledge elements that enrich their innovation resources. When deciding whether and how to innovate, firms carefully weigh the associated costs against the potential returns and distinguish between two types of innovation activity, exploratory and exploitative, which typically involve distinct costs, risks, and profits (Lian et al. 2025). To remain competitive in dynamic markets, firms must actively engage in exploratory innovation. Unlike exploitative innovation, which consists of low-cost, low-risk innovations built on existing knowledge and technology, exploratory innovation requires firms to go beyond their existing knowledge systems and break free of technological inertia to address the needs of new markets and customers; its outcomes materialize in the form of patents (Zinilli et al. 2023; Balsa-Barreiro et al. 2023; Balsa-Barreiro et al. 2019). At present, the artificial intelligence (AI) industry epitomizes exploratory innovation, and China has excelled in it. Under its mandate to become a top-tier innovative nation by 2030, China is emphasizing a strategic transition from technology adoption to indigenous innovation, particularly in frontier domains such as AI.
Against this policy backdrop, understanding how firms can effectively conduct exploratory innovation is not only a matter of academic inquiry but also a pressing practical imperative. In this study, we pose a general research question: How do firms conduct exploratory innovation, and what role do knowledge elements and collaboration networks play in this process?

Previous research has demonstrated the importance of knowledge elements and collaboration networks in promoting exploratory innovation performance. A central challenge in exploratory innovation is acquiring sufficient new knowledge elements (Zinilli et al. 2023; Balsa-Barreiro et al. 2020; Balsa-Barreiro et al. 2023). The knowledge-based view (KBV) defines the organizational knowledge base as an aggregation of knowledge elements and treats knowledge as a key resource for innovation (Lu and Atour 2025). From this view, firms can obtain a lasting competitive advantage by integrating their existing knowledge elements internally and thereby strengthening their knowledge management capability. Knowledge substitutability, complementarity, and diversity are important characteristics of an organizational knowledge base. Previous research has confirmed that the substitutability or complementarity of a firm’s knowledge base has positive effects on its innovation outcomes (Kim et al. 2021). However, the impact of knowledge characteristics remains debated. Although knowledge diversity can foster new ideas (Filippetti and Guy 2020), it can also incur information-processing costs that hinder innovation (Kim et al. 2021), which suggests that the innovation process is shaped by multiple interacting factors. Given the growing complexity of innovation activities, firms cannot fulfill their innovation requirements solely with internal knowledge elements (Lian et al. 2025; Balland and Boschma 2021; Zinilli et al. 2024). Prior studies have shown that relationships form among knowledge elements during the innovation process, creating internal knowledge networks. These networks guide firms’ search for external knowledge and, through external collaboration networks, shape their exploratory innovation performance (Zhang et al. 2024).

Social network theory (SNT) posits that social structure consists of complex networks of social relationships, which are closely linked to an organization’s acquisition of knowledge and information resources. Some studies show that the structural attributes of collaboration networks and a firm’s position within them affect exploratory innovation, because firms in advantageous network positions can accelerate knowledge transfer, enhance absorptive capacity, and identify valuable external knowledge to improve research and development (Zhao et al. 2024). Prior research typically divides network characteristics into structural and relational ones (Liu et al. 2024). Structural characteristics, such as centrality and structural holes, concern a firm’s position in a collaboration network. In general, firms located centrally in a collaboration network acquire new knowledge and innovate more easily because of frequent knowledge sharing. Relational characteristics indicate how much information a firm shares with other network members, as reflected in how close and how extensive its relationships with other innovation actors are. Collaboration breadth and depth are the leading indicators through which the relational dimension affects exploratory innovation performance.

Although numerous scholars have investigated the impact of internal knowledge elements or external collaboration networks on firms’ exploratory innovation performance, the combined effects of internal and external factors have been overlooked. Furthermore, most studies have focused on the linear or simple nonlinear effects of knowledge elements and collaboration networks on exploratory innovation performance using traditional empirical methods, neglecting the complex nonlinear interdependencies and multivariate combinatorial effects among the variables. Additionally, prior studies often pay little attention to firm heterogeneity and thus fail to draw targeted conclusions for different contexts.

Therefore, to identify the crucial factors that affect exploratory innovation performance at Chinese AI firms and to reveal the complex mechanisms behind it, we examine the interplay between knowledge elements and collaboration networks from an integrated perspective. Using patent data and machine learning methods, we analyze the complex nonlinear effects of knowledge element and collaboration network characteristics on exploratory innovation performance and propose differentiated strategies for improving firms’ exploratory innovation performance. This study offers insights into innovation development at Chinese AI firms and outlines implications for both managers and policy makers.

This study addresses the following core research questions: (1) How can Chinese AI firms be categorized based on patent-derived knowledge element and collaboration network characteristics, and what are the distinguishing features of each category? (2) Which characteristics of knowledge elements and collaboration networks are the crucial factors affecting the exploratory innovation performance of Chinese AI firms, and which combinations of characteristics are most effective in driving high innovation performance? (3) What complex nonlinear relationships exist between these factors and exploratory innovation performance, and through what distinct pathways can different types of firms enhance it?

The rest of this paper is organized as follows. Section “Literature review” summarizes research progress on firms’ knowledge bases, collaboration networks, and exploratory innovation performance and identifies current shortcomings. Section “Research design” introduces the research framework and research methods. Section “Data processing and variable measurement” describes the data processing and the measurement of the main variables. Section “Analysis of characteristics and distinction among firms” presents the correlation analysis and group division results. Section “Decision rule analysis” explains in detail the pathways for improving firms’ exploratory innovation performance. Section “Conclusions and discussions” discusses the conclusions, theoretical contributions, managerial implications, and limitations of this study and offers suggestions for future research.

Literature review

To examine the interactive effects of knowledge elements and collaboration networks on firms’ exploratory innovation performance, this section systematically reviews extant literature.

Knowledge base and exploratory innovation performance

The knowledge base is the collection of knowledge elements, spanning various technical fields, that are involved in a firm’s production and operation activities; the continuous acquisition of knowledge and the iterative updating of technology have become key to firm innovation. The KBV is commonly seen as an extension of the resource-based view (Feng et al. 2024). It posits that knowledge characterized by complexity, persistence, and transferability is a firm’s most crucial resource. Firms with superior knowledge bases are better able to identify latent business value in their environments promptly and to forecast market developments more accurately.

From an organizational boundary perspective, a firm’s knowledge architecture divides into internal and external knowledge bases. The internal knowledge base, typically manifested as prior knowledge stocks, governs the firm’s potential and realized absorptive capacity. Conversely, the external knowledge base embodies the relational capital built through strategic linkages with clients, suppliers, research institutions, and industry peers within innovation ecosystems (AlNuaimi et al. 2021). On the one hand, many scholars have discussed how knowledge integration and creation within firms affect their innovation performance. In a study based on patents from the US electronic medical device industry, Wu et al. (2009) find that a firm’s exploratory innovation performance is strongly linked to the breadth of its knowledge elements; they further indicate that firms with fewer knowledge elements are more likely to attain high exploratory innovation performance. Xu and Zeng (2021) analyze patents from Chinese automotive firms spanning 2001 to 2014, using four-digit International Patent Classification (IPC) codes to construct firms’ knowledge bases. Their findings show that high complementarity among knowledge elements yields strong knowledge restructuring capabilities, and this relationship holds consistently across heterogeneous innovation contexts. However, Dibiaggio et al. (2014), who use patent statistics to trace firms’ technological competencies and analyze their knowledge base characteristics, argue that excessive complementarity among knowledge elements can make it difficult for firms to apply their innovation experience to new fields when exploring new territories, which hampers exploratory innovation activities. On the other hand, many studies examine how firms use various search mechanisms to acquire new knowledge and recombine it to enhance their innovation performance (Sharma et al. 2024).
Stojcic and Chidlow (2024) focus on how knowledge search channels affect firm innovation and find that, compared with non-digital channels, digital search channels reduce the risk of innovation failure only for knowledge-creating firms. Cen et al. (2023) clarify the relationship between knowledge search strategies and general purpose technology innovation in emerging industries and find that industry alliance heterogeneity positively moderates the relationships of distant search and balanced search with general purpose technology innovation. Based on collaborative patent data, Tian et al. (2024) find a significant inverted U-shaped relationship between search breadth and depth and innovation performance above aspiration.

Collaboration networks and exploratory innovation performance

Innovation is a social activity heavily influenced by an organization’s social relations and the network in which it is embedded (Dalenogare et al. 2023). Many scholars believe that interaction among organizations helps firms overcome shortages of information, knowledge, and resources. SNT examines the position and status of each network member and its relationships with other members, and many studies have used it to discuss how network characteristics affect firms’ exploratory innovation performance. For example, Su et al. (2022) argue that collaboration networks with higher centrality and more structural holes feature more frequent communication between firms and their partners, which improves the quantity and quality of the technology and knowledge that firms acquire and further drives their exploratory innovation capability. According to Ma et al. (2020), although improving network status boosts exploratory innovation performance for most firms, firms in core network positions should avoid indiscriminately pursuing more structural holes, because excessive heterogeneous knowledge from structural holes can distract them from existing research work and hinder exploratory innovation performance. Gölgeci et al. (2019) find that deep collaboration can help firms acquire heterogeneous knowledge, whereas extensive collaboration can be distracting and harm innovation performance. In a patent study of biotechnology companies, Karamanos (2011) finds that network density and clustering coefficients are crucial in regulating firms’ exploratory innovation performance, since information and knowledge are shared more easily in dense networks than in sparse ones. Based on patent data from university alliance partnerships, Slavova and Jong (2021) show that collaboration with universities substantially increases collaboration breadth and drives firms’ exploratory innovation performance.
However, existing research mainly relies on qualitative or linear regression methods (Akman et al. 2023), which makes it difficult to study nonlinear relationships between variables. Some studies have introduced machine learning into management research, visualizing complex relationships between variables through model construction and prediction (Liu et al. 2024; Zinilli et al. 2025) and thereby reaching more accurate and objective conclusions.

Complex systems theory investigates systems composed of numerous interacting, non-decomposable elements whose systemic behavior exhibits emergent properties that cannot be reduced to a linear aggregation of individual components (Estrada 2024). This theoretical framework emphasizes nonlinearity, emergence, and adaptation (Siegenfeld and Bar-Yam 2020; Morales et al. 2019). Within innovation ecosystems, the theory posits that firms’ network positions and relational dynamics jointly determine innovation outcomes through network interdependencies. Balsa-Barreiro et al. (2019) conceptualize innovation as an emergent phenomenon arising from interactions among heterogeneous agents in dynamic networks and show that firms occupying privileged network positions can accelerate knowledge recombination through diversified knowledge inflows. Complex systems theory thus complements the KBV and SNT and provides a new perspective for this study. Building on the KBV, SNT, and complex systems theory, this study posits that firms’ exploratory innovation performance is shaped by the interaction between internal knowledge element characteristics and external collaboration network characteristics.

Research indicates that network topology decisively determines the stability of a system and the interactive behavior of network members (Balsa-Barreiro et al. 2020; Morales et al. 2019). As collaboration networks gain members and grow more complex, a growing number of scholars have focused on how network topology affects firm innovation. Yu et al. (2024) examine how the structure of the complex international trade network affects green innovation performance and conclude that betweenness centrality and closeness centrality have nonlinear U-shaped relationships with green innovation performance; Wu et al. (2024) explore how semiconductor firms’ embeddedness in a global innovation network affects innovation performance and conclude that there is a positive U-shaped relationship between structural holes and innovation performance; Fang et al. (2019) compare the network capabilities required by exploration-oriented and development-oriented networks and conclude that, in exploration-oriented networks, network structure capabilities have a greater positive impact on innovation performance than network relationship capabilities, whereas the opposite holds in development-oriented networks.

In summary, numerous scholars have investigated the impact of internal knowledge elements or external collaboration networks on firms’ exploratory innovation performance. However, existing research predominantly examines the influence of internal knowledge elements or external collaboration networks in isolation and lacks a holistic investigation of their synergistic effects on exploratory innovation performance. Furthermore, most research relies on traditional empirical methods, such as regression analysis, to explore linear or simple nonlinear relationships, paying limited attention to the complex nonlinear causal mechanisms linking knowledge elements, collaboration networks, and exploratory innovation. Additionally, current research often ignores firm heterogeneity and adopts a generic perspective that fails to generate targeted conclusions. Therefore, this study proposes a data-driven approach that employs machine learning methods to reveal the complex nonlinear effects of feature combinations derived from knowledge elements and collaboration networks on exploratory innovation performance across heterogeneous firm types, thereby providing novel theoretical perspectives and methodological tools for innovation management research.

Research design

This section lays out the research framework and introduces the normal cloud model, the hierarchical clustering algorithm, and the classification and regression tree (CART) algorithm used in this study.

Research framework

To investigate how knowledge elements and collaboration networks affect the exploratory innovation performance of different types of firms, we employ the hierarchical clustering algorithm to group firms based on their knowledge element and collaboration network characteristics. We then use the CART algorithm to analyze in depth the complex relationships between these variables and firms’ exploratory innovation performance. The study proceeds in three parts, as illustrated in Fig. 1.

Fig. 1 Research framework.

First, for feature selection, we obtain patent data on the Chinese AI industry from the IncoPat Global Patent Database. Collaboration relationships between firms are identified from the co-patentees of each patent, and the IPC codes of patents are extracted as the firms’ knowledge elements. Based on the KBV and SNT, we select six characteristic variables: knowledge substitutability (KS), knowledge complementarity (KC), and knowledge diversity (KD) as knowledge element characteristics, and collaboration breadth (CB), collaboration depth (CD), and local clustering coefficient (LCC) as collaboration network characteristics. Next, for group division, after the data are calibrated with the normal cloud model, the hierarchical clustering algorithm divides firms with similar characteristics into distinct groups, thereby accounting for differences between firms. Finally, for decision rule analysis, the CART algorithm is used to mine potential decision rules for exploratory innovation performance among different types of firms. This reveals the complex relationships between the characteristic variables and firms’ exploratory innovation performance, as well as the heterogeneous influence of different feature combinations on that performance. Our results yield management implications for firms and relevant government departments.
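The network-construction step can be illustrated with a minimal Python sketch on hypothetical toy records; it is not the study’s pipeline. The record fields (`applicants`, `ipc`) and all function names are assumptions. Every pair of co-applicants on a patent forms a tie, IPC subclasses are pooled per firm as its knowledge elements, and LCC uses the standard definition (the share of a firm’s partner pairs that are themselves linked). The CB and CD formulas shown are illustrative assumptions only and may differ from the study’s operationalization; KS, KC, and KD are not sketched.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: each patent lists its co-applicants (firms)
# and its IPC subclasses. Real records would come from the patent database.
patents = [
    {"applicants": ["A", "B"], "ipc": ["G06N", "G06F"]},
    {"applicants": ["A", "B", "C"], "ipc": ["G06N"]},
    {"applicants": ["A", "C"], "ipc": ["H04L"]},
    {"applicants": ["C", "D"], "ipc": ["G06K"]},
    {"applicants": ["A"], "ipc": ["G06T"]},
]


def build_network(records):
    """Undirected co-application network.

    weights[(u, v)] counts the patents firms u and v applied for jointly;
    neighbors[u] is the set of u's distinct partners.
    """
    weights = defaultdict(int)
    for rec in records:
        for u, v in combinations(sorted(set(rec["applicants"])), 2):
            weights[(u, v)] += 1
    neighbors = defaultdict(set)
    for u, v in weights:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return weights, neighbors


def knowledge_elements(records):
    """Map each firm to the set of IPC subclasses on its patents."""
    elements = defaultdict(set)
    for rec in records:
        for firm in set(rec["applicants"]):
            elements[firm].update(rec["ipc"])
    return elements


def local_clustering(firm, neighbors):
    """Standard local clustering coefficient: linked partner pairs / all partner pairs."""
    partners = neighbors[firm]
    k = len(partners)
    if k < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(partners, 2) if v in neighbors[u])
    return 2 * linked / (k * (k - 1))


def collaboration_breadth(firm, neighbors):
    # ASSUMPTION (illustrative only): breadth = number of distinct partners.
    return len(neighbors[firm])


def collaboration_depth(firm, weights, neighbors):
    # ASSUMPTION (illustrative only): depth = mean joint patents per partner.
    if not neighbors[firm]:
        return 0.0
    joint = sum(w for pair, w in weights.items() if firm in pair)
    return joint / len(neighbors[firm])


weights, neighbors = build_network(patents)
for firm in sorted(neighbors):
    print(firm, collaboration_breadth(firm, neighbors),
          collaboration_depth(firm, weights, neighbors),
          round(local_clustering(firm, neighbors), 3))
```

In this toy network, firm A’s two partners (B and C) are linked to each other, so its LCC is 1.0, while C’s three partners contain only one linked pair, giving an LCC of 1/3.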

Research methods

In management research, traditional empirical methods such as regression analysis have been widely used to address theoretical and practical issues by exploring linear or simple nonlinear relationships between variables. In reality, however, management effects usually arise from complex interdependence among multiple factors rather than from a single independent factor. These relationships often take the form of multivariate combinations and complex nonlinear interactions, which traditional statistical methods, focused as they are on linear or simple nonlinear relationships, cannot adequately capture. Likewise, firms’ exploratory innovation performance seldom arises from isolated factors; rather, it emerges from the complex interplay among multiple factors, such as internal knowledge elements and external collaboration networks. Because traditional empirical methods rely on prior theoretical models, they have inherent limitations in revealing the multivariate combinatorial effects and complex nonlinear relationships among such variables.

To address these complex multivariable configurations and nonlinear dynamics in management science more effectively, many pioneering scholars have introduced machine learning (ML) methods into management research. Grant and Yeo (2018), Zhou and Li (2024), Li et al. (2025), Wan et al. (2025), and others have used methods such as the normal cloud model, clustering algorithms, and decision tree algorithms to explore complex nonlinear influence mechanisms in management research. Unlike traditional empirical methods, this emerging ML-based research paradigm not only yields effective solutions to complex management problems but also represents a significant methodological breakthrough. These interdisciplinary approaches have substantial potential for advancing theoretical understanding while retaining practical relevance for addressing contemporary management challenges.

In light of these methodological considerations, the main goal of this study is to examine the complex mechanisms through which internal knowledge elements and external collaboration networks influence exploratory innovation performance across different types of firms. To do so, we design a coherent analytical pathway combining correlation analysis, the variance inflation factor (VIF), a normal cloud model, hierarchical clustering, and the CART algorithm. We make these methodological choices for two primary reasons. First, they are driven by the intrinsic demands of the research questions. Second, they build on relevant methodological explorations in previous management research, allowing us to reveal the complex relationships among variables and the underlying knowledge rules from objective data with methods suited to the task.

Based on the foregoing analysis, the study follows a clear, sequential analytical framework with multiple ML methods. First, to confirm the validity of the measurements, we perform a correlation analysis to quantify the relationships among the variables and screen for potential multicollinearity. Second, we calculate the VIF to rule out severe collinearity. Third, we use a normal cloud model to calibrate the data and make them conform to a normal distribution, reducing potential interference from data quality issues and establishing a rigorous data foundation for subsequent analysis. Fourth, because internal knowledge elements and external collaboration strategies differ across types of firms, we use the hierarchical clustering algorithm to group firms whose knowledge elements and collaboration networks have similar characteristics, which permits a more targeted analysis. Finally, we use the CART algorithm to capture the complex nonlinear effects of knowledge elements and collaboration networks on firms’ exploratory innovation performance. This algorithm can efficiently reveal complex nonlinear relationships and multivariate combinatorial effects among variables and then extract clear decision rules for improving exploratory innovation performance for different types of firms. The technical details of the three core methods are outlined below.
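The correlation and VIF screens in the first two steps can be sketched as follows, assuming the six features are stored in a numeric array with one row per firm. The data here are synthetic random numbers, not the study’s sample, and the function name `vif` is hypothetical. For feature j, VIF is computed as 1/(1 − R_j²), where R_j² comes from an ordinary least squares regression of feature j on the remaining features plus an intercept. A common rule of thumb treats values above 10 as severe collinearity; the threshold used in this study is not stated here.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (firms x features).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is from an OLS regression of
    column j on all other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


# Synthetic stand-in for six features (e.g. KS, KC, KD, CB, CD, LCC) of 260 firms.
rng = np.random.default_rng(0)
X = rng.normal(size=(260, 6))

corr = np.corrcoef(X, rowvar=False)  # 6 x 6 Pearson correlation matrix
print(np.round(corr, 2))
print(np.round(vif(X), 2))  # independent columns give values close to 1
```

Independent synthetic columns give VIF values near 1, whereas a column that is nearly a linear function of another drives both of their VIFs far above any usual threshold.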

The normal cloud model is a mathematical model for describing uncertainty and fuzziness. It maps the original data to a specific range through a membership function so that they follow a normal distribution. By combining certainty, randomness, and fuzziness, the normal cloud model better reflects the complexity and uncertainty of real-world data, thereby improving the accuracy and reliability of subsequent analysis. Data calibration is a critical prerequisite for enhancing data quality and ensuring the reliability of subsequent analysis. Traditional calibration methods cannot capture the inherent fuzziness and randomness in data, which is a significant limitation because real-world innovation management data often have ambiguous boundaries and rarely conform strictly to predefined theoretical distributions. In contrast, the normal cloud model does not require strict distributional assumptions. Because its numerical characteristics (expectation Ex, entropy En, and hyper-entropy He) combine qualitative cognition with quantitative data, the essential statistical properties and distributional shape of the data are preserved. This capability enables the normal cloud model to yield more robust and reliable results for complex and uncertain research problems.
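The three numerical characteristics can be illustrated with a minimal NumPy sketch of the standard forward and backward normal cloud generators. The function names are hypothetical. How the study converts certainty degrees into calibrated variable values is not specified in this section, so the sketch stops at estimating (Ex, En, He) from data and computing each value’s certainty degree.

```python
import numpy as np


def backward_cloud(x):
    """Backward normal cloud generator (without certainty degrees).

    Ex = sample mean
    En = sqrt(pi / 2) * mean(|x - Ex|)
    He = sqrt(S^2 - En^2), where S^2 is the sample variance
    """
    x = np.asarray(x, dtype=float)
    ex = x.mean()
    en = np.sqrt(np.pi / 2.0) * np.abs(x - ex).mean()
    s2 = x.var(ddof=1)
    he = np.sqrt(max(s2 - en ** 2, 0.0))  # clipped: sampling noise can make S^2 < En^2
    return ex, en, he


def certainty(x, ex, en, he, rng):
    """Certainty (membership) degree of each value under cloud (Ex, En, He):
    mu = exp(-(x - Ex)^2 / (2 * En'^2)), with En' ~ N(En, He^2) drawn per value."""
    x = np.asarray(x, dtype=float)
    en_prime = np.abs(rng.normal(en, he, size=x.shape)) + 1e-12  # avoid division by zero
    return np.exp(-((x - ex) ** 2) / (2.0 * en_prime ** 2))


def forward_cloud(ex, en, he, n, rng):
    """Forward normal cloud generator: n cloud drops and their certainty degrees."""
    en_prime = np.abs(rng.normal(en, he, size=n)) + 1e-12
    drops = rng.normal(ex, en_prime)
    mu = np.exp(-((drops - ex) ** 2) / (2.0 * en_prime ** 2))
    return drops, mu


rng = np.random.default_rng(42)
drops, _ = forward_cloud(5.0, 1.0, 0.1, 20000, rng)
ex, en, he = backward_cloud(drops)  # recovers roughly (5.0, 1.0, small He)
mu = certainty([ex, ex + 3 * en], ex, en, he, rng)
print(round(ex, 2), round(en, 2), np.round(mu, 3))
```

Generating drops from a known cloud and then recovering its characteristics is a simple way to check the estimator: a value at Ex receives certainty 1, and certainty decays quickly for values several En away.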

Hierarchical clustering is a clustering algorithm based on the distance between data objects (Kwon et al. 2021). In its common agglomerative form, each object starts as its own cluster, and the two closest clusters are merged repeatedly, producing a tree of nested clusters (a dendrogram). As one of the main techniques in data mining, clustering uses information from unlabeled datasets to divide objects into clusters, grouping similar objects together and separating dissimilar ones, which enables targeted analysis of distinct clusters (Higuchi and Maehara 2021). A good clustering yields clusters with high within-group homogeneity and high between-group heterogeneity.

The pronounced heterogeneity in feature variables among firms requires dividing firms into types and investigating their factor combinations separately. Traditional research often relies on a priori distinctions, such as firm ownership (Dong et al. 2025), regional heterogeneity (Yang et al. 2025), and industry type (OuYang et al. 2024). This approach inherently presupposes that firms grouped under such labels share similar innovation contexts. Consequently, it may overlook critical differences in firm characteristics and fail to capture innovation patterns that transcend traditional regional or industrial boundaries. Hierarchical clustering groups samples by similarity and does not rely on any prior labels (Cabezas et al. 2023). By grouping samples with similar attributes, such algorithms effectively control intersample variability, which helps minimize interference from potentially irrelevant variables when variable relationships are examined and thus serves a function analogous to that of traditional statistical control variables (Grant and Yeo 2018). Numerous scholars have applied clustering methods in management research to define and explore the heterogeneity of research subjects (Higuchi and Maehara 2021; Grant and Yeo 2018; Zhou and Li 2024). Common clustering methods include hierarchical clustering, K-means, DBSCAN, and affinity propagation. Hierarchical clustering offers distinct advantages over the others: it requires no predefined number of clusters, provides intuitive visual interpretation, and adapts to complex data structures. Accordingly, this study adopts the hierarchical clustering algorithm to divide the focal firms into types, fully accounting for the heterogeneity of different clusters in firms’ exploratory innovation performance and yielding more targeted conclusions.
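To make the grouping step concrete, the following is a naive agglomerative hierarchical clustering in NumPy. It assumes Euclidean distance and average linkage; the distance metric and linkage criterion used in the study are not stated in this section, and for real data a library routine such as SciPy’s `scipy.cluster.hierarchy.linkage` would normally be preferred. The function name `agglomerative` and the synthetic data are hypothetical.

```python
import numpy as np


def agglomerative(X, n_clusters):
    """Naive agglomerative clustering, average linkage, Euclidean distance.

    Starts with every firm as its own cluster and repeatedly merges the two
    clusters with the smallest mean pairwise distance. Returns one label per
    row and the merge history (cluster ids and linkage distance per merge).
    """
    X = np.asarray(X, dtype=float)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    clusters = {i: [i] for i in range(len(X))}
    merges = []
    while len(clusters) > n_clusters:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # average linkage: mean distance between members of a and b
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)
        merges.append((a, b, float(d)))
    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(sorted(clusters.values(), key=min)):
        labels[members] = label
    return labels, merges


# Synthetic stand-in for calibrated features: two well-separated groups of five firms.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, size=(5, 6)), rng.normal(3.0, 0.1, size=(5, 6))])
labels, merges = agglomerative(X, n_clusters=2)
print(labels)  # [0 0 0 0 0 1 1 1 1 1]
```

Running the merges down to a single cluster yields the full dendrogram. The number of groups is then usually chosen by cutting the tree where merge distances jump sharply rather than being fixed in advance, which is the sense in which hierarchical clustering needs no predefined cluster number.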

CART is a machine learning algorithm for classification and regression problems (Breiman et al. 1984). It is computationally efficient and handles continuous data well. For classification, CART recursively divides the dataset into subsets with lower impurity based on the Gini index (for regression, it minimizes squared error instead), generating a tree structure that can accurately classify or predict the data (Yao et al. 2022). The leaf nodes of the decision tree represent the classification results, and the internal nodes represent test conditions on attributes. Each path from the root node to a leaf node is a decision rule, that is, a combination of such attribute conditions. The CART algorithm visualizes decision-making processes through a binary tree structure. Compared with other decision tree algorithms, it generates fewer branches per node and produces simple, interpretable decision rules.
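The split and rule-extraction logic described above can be illustrated with a small pure-Python CART-style classification tree. This is a teaching sketch, not the study’s implementation. It assumes that exploratory innovation performance has been discretized into hypothetical “high”/“low” labels, uses only two of the six features (CB and LCC) with made-up values, and omits pruning. At each node, it tries every feature and every midpoint threshold, keeps the binary split with the lowest weighted Gini impurity, and reads each root-to-leaf path as one decision rule.

```python
def gini(labels):
    """Gini impurity: 1 - sum over classes of p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows, labels, features):
    """Binary split (feature <= t vs > t) with the lowest weighted child Gini.
    Candidate thresholds are midpoints between consecutive distinct values."""
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def grow(rows, labels, features, depth=0, max_depth=3, min_size=2):
    """Recursively grow the tree; each leaf stores its majority class."""
    majority = max(sorted(set(labels)), key=labels.count)
    if depth >= max_depth or len(rows) < min_size or gini(labels) == 0.0:
        return {"leaf": majority}
    split = best_split(rows, labels, features)
    if split is None or split[0] >= gini(labels):  # no impurity reduction
        return {"leaf": majority}
    _, f, t = split

    def subtree(idx):
        return grow([rows[i] for i in idx], [labels[i] for i in idx],
                    features, depth + 1, max_depth, min_size)

    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return {"feature": f, "threshold": t, "left": subtree(left), "right": subtree(right)}


def rules(node, conditions=()):
    """Each root-to-leaf path is one decision rule: a conjunction of conditions."""
    if "leaf" in node:
        return [(list(conditions), node["leaf"])]
    f, t = node["feature"], node["threshold"]
    return (rules(node["left"], conditions + (f"{f} <= {t:g}",))
            + rules(node["right"], conditions + (f"{f} > {t:g}",)))


# Hypothetical firms: collaboration breadth (CB), local clustering (LCC),
# and an assumed high/low exploratory innovation label.
rows = [{"CB": 1, "LCC": 0.1}, {"CB": 2, "LCC": 0.7}, {"CB": 3, "LCC": 0.2},
        {"CB": 4, "LCC": 0.9}, {"CB": 6, "LCC": 0.2}, {"CB": 7, "LCC": 0.8},
        {"CB": 8, "LCC": 0.1}, {"CB": 9, "LCC": 0.3}]
labels = ["low", "low", "low", "low", "high", "low", "high", "high"]

tree = grow(rows, labels, ["CB", "LCC"])
for conds, outcome in rules(tree):
    print(" AND ".join(conds), "->", outcome)
# CB <= 5 -> low
# CB > 5 AND LCC <= 0.55 -> high
# CB > 5 AND LCC > 0.55 -> low
```

In this toy example, the “high” outcome is reached only through a combination of conditions (broad collaboration together with a low clustering coefficient). Neither feature yields that rule on its own, which illustrates how a tree surfaces combinatorial effects.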

The choice of CART is justified by advantages that align closely with our research objectives. This study analyzes the complex nonlinear effects of knowledge element and collaboration network characteristics on exploratory innovation performance. The decision rules generated by CART’s tree structure intuitively reveal the pathways for improving firms’ exploratory innovation performance, and its high interpretability clearly exposes complex nonlinear relationships among multiple influencing factors (Wang et al. 2024). Moreover, its ability to handle imperfect datasets and its freedom from the distributional assumptions required by traditional parametric models make it well suited to analyzing complex firm data (Li et al. 2025). Numerous scholars have employed the CART algorithm to investigate complex nonlinear relationships in firms’ innovation performance. For instance, Li et al. (2025) use CART to compare the combinatorial differences in innovation performance factors between intelligent firms and flexible firms; Zhou and Li (2024) analyze nonlinear relationships between multiple factors and exploratory innovation performance with CART; and Wan et al. (2025) apply CART from a dual-network perspective to explore multidimensional pathways for enhancing technological catch-up performance. Building on these studies, this study adopts the CART algorithm to systematically examine potential decision pathways for Chinese AI firms’ exploratory innovation performance from the perspectives of patent-based technological knowledge elements and collaboration networks, and to analyze the complex nonlinear effects among variables.

Data processing and variable measurement

This section introduces the data processing process, the main variables, and their corresponding measures.

Data source and processing

The AI industry plays a crucial role in the current era of scientific and technological advancement, which makes its development an important subject of research. It is also marked by frequent technological innovation and widespread collaboration among firms, which makes it a suitable setting for this study.

Prior studies have defined firms’ knowledge elements and collaborative relationships using explicit sources such as firm annual reports (Tang 2025), R&D investment (Lian et al. 2025), and patent data (Hou et al. 2024). IPC codes are often used as the core metric of firms’ knowledge elements because of their unique advantages in analyzing knowledge element portfolios; this approach was first proposed by Scherer (1965) and has since been used in many studies. Patent collaboration is a crucial channel through which firms engage in formal external collaboration, so it effectively reflects technological innovation partnerships between firms and external entities. The co-application relationships in collaborative patents therefore serve as a good indicator of firms’ external technological collaboration networks. Using IPC codes and patent co-application relationships to measure knowledge elements and collaboration network characteristics has limitations in capturing firms’ tacit knowledge and informal collaborative relationships. Nevertheless, the approach remains widely adopted and is particularly applicable in technology-intensive industries. Therefore, this study uses patent data from Chinese AI firms as proxy variables for both knowledge element and collaboration network characteristics. The study focuses on authorized patents in the IncoPat Global Patent Database granted between 2017 and 2022 in the Chinese AI industry, with a sample of 145,184 patents applied for by firms. Extant research predominantly employs 3- or 5-year windows to measure exploratory innovation performance. Gilsing et al. (2008) contend that within three years of patenting new knowledge, firms remain in the exploration phase consistent with definitions of exploratory innovation, whereas after five years patent value depreciates substantially owing to knowledge obsolescence.
Concurrently, China’s patent granting cycle typically takes 1–3 years from application to authorization, and given AI technology’s accelerated development, the sector exhibits significantly shorter iteration cycles than traditional industries (Gao et al. 2025). Considering the lagged effect of a firm’s knowledge elements and external collaboration networks on its exploratory innovation performance, this study adopts Zhang and Luo’s (2020) three-year window methodology. The independent variables are measured using 2017–2019 patent data to identify firms’ knowledge elements and collaboration networks. Taking all technological categories contained in a firm’s patents granted during 2017–2019 as the baseline, we measure the firm’s exploratory innovation performance as the number of its patents in technological categories (IPC subclasses) that appeared during 2020–2022 but were absent in 2017–2019. The patents are therefore divided into two groups: patents granted in the first three years (2017–2019) and patents granted in the last three years (2020–2022). We use the 37,194 patents granted between 2017 and 2019 to construct collaboration networks among firms and to calculate the knowledge element and collaboration network characteristic variables. The remaining 107,990 patents, granted from 2020 to 2022, are used to calculate firms’ exploratory innovation performance. To ensure that the collaboration networks are not too sparse, we include only firms with at least three collaborative patents between 2017 and 2019. After data cleaning and screening, the final sample consists of 260 focal firms with 17,891 patents granted between 2017 and 2022.
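The window split and firm screening described above can be sketched as follows. This is a minimal sketch under stated assumptions:

- the `Patent` record layout, its field names, and the helper functions are hypothetical; the study does not describe its IncoPat export format;
- a patent is treated as collaborative when it lists at least one co-applicant besides the focal firm.

```python
from collections import Counter
from typing import FrozenSet, List, NamedTuple, Set, Tuple


class Patent(NamedTuple):
    # Hypothetical record layout; the study's IncoPat export format is not specified.
    firm: str                       # focal applicant firm
    grant_year: int
    ipc_subclasses: FrozenSet[str]  # e.g. frozenset({"G06N", "G06F"})
    co_applicants: int              # number of other applicants on the patent


def split_windows(patents: List[Patent]) -> Tuple[List[Patent], List[Patent]]:
    """Split patents into the 2017-2019 baseline window and the 2020-2022 outcome window."""
    base = [p for p in patents if 2017 <= p.grant_year <= 2019]
    later = [p for p in patents if 2020 <= p.grant_year <= 2022]
    return base, later


def eligible_firms(base: List[Patent], min_collab: int = 3) -> Set[str]:
    """Firms with at least `min_collab` co-applied (collaborative) patents in the baseline window."""
    counts = Counter(p.firm for p in base if p.co_applicants > 0)
    return {firm for firm, c in counts.items() if c >= min_collab}
```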

Variable measurement

To analyze the influence of internal knowledge element and external collaboration network characteristics on firms’ exploratory innovation performance, we draw on previous research and select two sets of characteristics:

- three knowledge element characteristics: knowledge substitutability, knowledge complementarity, and knowledge diversity;
- three network characteristics: collaboration breadth, collaboration depth, and local clustering coefficient.

Knowledge element characteristics are measured through patent IPC codes, and collaboration network characteristics are measured through patent co-application data.

Knowledge element characteristics

New knowledge is often the result of reorganizing existing knowledge elements. The structure of a firm’s knowledge base reflects the development strategy adopted by the firm to a large extent. Knowledge complementarity means that the firm’s knowledge base is not acquired randomly. Instead, it emphasizes coherence among knowledge elements to ensure high professionalism (Dibiaggio et al. 2014). Knowledge substitutability refers to the redundancy of existing knowledge elements within a firm, which can help firms gain a competitive advantage, especially in the face of immature technology (Dibiaggio et al. 2014). Knowledge diversity refers to firms having heterogeneous knowledge elements, which is critical to knowledge reorganization (Walrave et al. 2024). Therefore, knowledge substitutability (KS) and knowledge complementarity (KC) describe the combination of knowledge elements within firms, while knowledge diversity (KD) reflects the level of the stock of knowledge held by firms.

Knowledge substitutability (KS) is the extent to which different knowledge or skills are substitutable. In other words, if one knowledge element or skill is unavailable or infeasible, another knowledge element or skill can be used to achieve the same objective or complete the same task (Kim et al. 2021).

First, for firm i, let \({P}_{{lk}}=1\) if patent k is assigned IPC code l, that is, knowledge domain l (l = 1, 2, …, n), and \({P}_{{lk}}=0\) otherwise. The number of patents that belong to both knowledge domains l and j is then:

$${V}_{{lj}}=\mathop{\sum }\limits_{k}{P}_{{lk}}{P}_{{jk}}$$
(1)

Calculating \({V}_{{lj}}\) for all pairs of knowledge domains yields the knowledge co-occurrence matrix of firm i, a symmetric matrix composed of the \({V}_{{lj}}\). Knowledge similarity \({S}_{{lj}}\) is then calculated with Eq. (2). It measures how similar knowledge domains l and j are in terms of their co-occurrence with each knowledge domain m:

$${S}_{{lj}}=\frac{{\sum }_{m=1}^{n}{V}_{{lm}}{V}_{{jm}}}{\sqrt{{\sum }_{m=1}^{n}{V}_{{lm}}^{2}}\sqrt{{\sum }_{m=1}^{n}{V}_{{jm}}^{2}}}$$
(2)
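The following is a minimal sketch of Eqs. (1) and (2). It assumes that the firm’s patent-to-domain assignments are given as a binary matrix `P`, with one row per knowledge domain and one column per patent. The function names are hypothetical. As in Eq. (2), the sum over m runs over all n domains.

```python
import math
from typing import List, Sequence


def cooccurrence(P: Sequence[Sequence[int]]) -> List[List[int]]:
    """Eq. (1): V[l][j] = sum_k P[l][k] * P[j][k], where P[l][k] = 1 if patent k is in domain l."""
    n = len(P)
    return [[sum(a * b for a, b in zip(P[l], P[j])) for j in range(n)] for l in range(n)]


def similarity(V: Sequence[Sequence[float]]) -> List[List[float]]:
    """Eq. (2): cosine similarity between the co-occurrence profiles (rows) of V."""
    n = len(V)
    norms = [math.sqrt(sum(v * v for v in row)) for row in V]
    S = [[0.0] * n for _ in range(n)]
    for l in range(n):
        for j in range(n):
            if norms[l] > 0 and norms[j] > 0:  # leave 0 for domains with no patents
                S[l][j] = sum(V[l][m] * V[j][m] for m in range(n)) / (norms[l] * norms[j])
    return S
```

Consider three domains whose patents overlap pairwise once, `P = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]`. For this input, `cooccurrence` returns `[[2, 1, 1], [1, 2, 1], [1, 1, 2]]`, and the similarity of domains 0 and 1 is 5/6.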

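Finally, for each knowledge domain l, the similarities \({S}_{{lj}}\) to all other domains j are averaged (Eq. 3). The weights are the numbers of patents \({P}_{{jt}}\) that firm i holds in each domain j during period t. These domain-level averages are then weighted by each domain’s share of the firm’s patents to obtain the knowledge substitutability of firm i (Eq. 4):

$${{WAD}}_{l}=\frac{{\sum }_{j\ne l}{S}_{{lj}}{P}_{{jt}}}{{\sum }_{j\ne l}{P}_{{jt}}}$$
(3)
$${{KS}}_{i}=\mathop{\sum }\limits_{l}\left({{WAD}}_{l}\frac{{P}_{{lt}}}{{{\sum }}_{{l}}{P}_{{lt}}}\right)$$
(4)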
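The weighted averaging in Eqs. (3) and (4) can be sketched as below, reading WAD as a per-domain quantity. For each domain l, the similarities to all other domains are averaged, weighted by the firm’s patent counts in those domains. The domain averages are then weighted by each domain’s patent share. The function name and the handling of the degenerate case where no other domain has patents (that domain contributes 0) are illustrative assumptions.

```python
from typing import Sequence


def knowledge_substitutability(S: Sequence[Sequence[float]], counts: Sequence[int]) -> float:
    """S[l][j]: similarity from Eq. (2); counts[l]: the firm's patents in domain l."""
    n = len(counts)
    total = sum(counts)
    ks = 0.0
    for l in range(n):
        others = sum(counts[j] for j in range(n) if j != l)
        if others == 0:
            continue  # no other domain to compare with; contributes 0 (assumption)
        wad_l = sum(S[l][j] * counts[j] for j in range(n) if j != l) / others
        ks += wad_l * counts[l] / total
    return ks
```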

Knowledge complementarity (KC) is the extent to which individuals or teams within an organization have knowledge in diverse fields or areas of expertise. They can complement and enhance their knowledge by collaborating and communicating to effectively solve more complex and comprehensive problems or tasks (Kim et al. 2012).

$${\lambda }_{{jk}}=\frac{{V}_{{jk}}-{\mu }_{{jk}}}{{\sigma }_{{jk}}}$$
(5)
$${{WAD}}_{{kt}}=\frac{{\sum }_{j\ne k}{\lambda }_{{jk}}{p}_{{jt}}}{{\sum }_{j\ne k}{p}_{{jt}}}$$
(6)
$${{KC}}_{i}=\mathop{\sum }\limits_{k}\left({{WAD}}_{{kt}}\frac{{p}_{{kt}}}{{{\sum }}_{{k}}{p}_{{kt}}}\right)$$
(7)

where \({V}_{{jk}}\) is the number of co-occurrences of knowledge domains j and k. \({\mu }_{{jk}}\) and \({\sigma }_{{jk}}\) are, respectively, the expected value and standard deviation of that count when the co-occurrence of knowledge elements is assumed to be random and to follow a hypergeometric distribution. Equation (6) then calculates, for each knowledge domain k, the weighted average relatedness \({\lambda }_{{jk}}\) to all other domains j. Equation (7) weights these averages by each domain’s patent share to obtain the knowledge complementarity (KC) of firm i.
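The following sketch implements Eqs. (5)–(7) under these assumptions:

- The expected value and standard deviation of the co-occurrence count come from the standard hypergeometric distribution. The n_k patents of domain k are treated as a random draw from K patents, n_j of which belong to domain j. This gives mean n_j n_k / K and variance equal to that mean times (1 − n_j / K)(K − n_k) / (K − 1).
- The counts used for these moments (`domain_totals`, `total`) are kept separate from the firm’s own patent counts used as weights (`firm_counts`), because the text does not say which patent population defines them.
- All names are hypothetical.

```python
import math
from typing import Sequence


def relatedness(v_jk: float, n_j: int, n_k: int, total: int) -> float:
    """Standardized co-occurrence lambda_jk of Eq. (5) under a hypergeometric null.

    The n_k patents of domain k are treated as a random draw from `total` patents,
    n_j of which belong to domain j.
    """
    mu = n_j * n_k / total
    var = mu * (1 - n_j / total) * ((total - n_k) / (total - 1))
    if var <= 0:
        return 0.0  # degenerate case (e.g. a domain covering every patent); assumption
    return (v_jk - mu) / math.sqrt(var)


def knowledge_complementarity(V: Sequence[Sequence[float]], domain_totals: Sequence[int],
                              total: int, firm_counts: Sequence[int]) -> float:
    """Eqs. (6)-(7): per-domain weighted average of lambda_jk, then weighted by patent share."""
    n = len(firm_counts)
    share_denom = sum(firm_counts)
    kc = 0.0
    for k in range(n):
        others = sum(firm_counts[j] for j in range(n) if j != k)
        if others == 0 or firm_counts[k] == 0:
            continue  # WAD_k undefined (no other domains) or zero weight; skip (assumption)
        wad_k = sum(relatedness(V[j][k], domain_totals[j], domain_totals[k], total) * firm_counts[j]
                    for j in range(n) if j != k) / others
        kc += wad_k * firm_counts[k] / share_denom
    return kc
```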

Knowledge diversity (KD) reflects the extent to which a firm’s internal knowledge is composed of diverse knowledge elements (Wen et al. 2021). It is measured with the Herfindahl-Hirschman Index (HHI) (Stephan et al. 2019), as follows:

$${{KD}}_{i}=\mathop{\sum }\limits_{j=1}^{N}{({X}_{j}/X)}^{2}$$
(8)

where j indexes the four-digit IPC codes (IPC subclasses), N is the total number of four-digit IPC codes owned by firm i, \({X}_{j}\) is the number of patents owned by firm i with the jth IPC code, and X is the sum of all the patents owned by firm i.
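Eq. (8) is a sum of squared patent shares. The following minimal sketch assumes that the per-subclass counts X_j are already tallied and that X is their sum, so the shares add up to 1. The function name is hypothetical. This sum equals 1 for a firm active in a single subclass, and it decreases as the firm’s patents spread more evenly across subclasses.

```python
from typing import Sequence


def hhi(counts: Sequence[int]) -> float:
    """Sum of squared shares (X_j / X)^2 over IPC subclasses, with X = sum of X_j (Eq. 8)."""
    total = sum(counts)
    return sum((x / total) ** 2 for x in counts)
```

For example, `hhi([5, 5])` is 0.5 and `hhi([1, 1, 1, 1])` is 0.25.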

Collaboration network characteristics

Many studies have discussed the relationship between collaboration networks and firms’ innovation capability. Collaboration breadth indicates the ability to acquire diverse knowledge and complementary resources, whereas collaboration depth reflects the opportunity to gain new knowledge from external collaboration (Kobarg et al. 2019). The clustering coefficient reflects the small-world attribute of the network. In a collaboration network with small-world attributes, a higher local clustering coefficient can strengthen trust between partners (Zhou and Li 2024). Therefore, this study selects collaboration breadth (CB), collaboration depth (CD), and local clustering coefficient (LCC) to reflect firms’ external collaboration.

Collaboration breadth (CB) refers to the number of collaborative relationships established between network members, which reflects the extent of collaboration of the focal firms in the network.

$${{CB}}_{i}=\mathop{\sum }\limits_{j=1}^{n}{N}_{{ij}}$$
(9)

where \(n\) is the number of nodes in the network, and \({N}_{{ij}}=1\) if node i is connected to node j and \({N}_{{ij}}=0\) otherwise.

Collaboration depth (CD) reflects how often focal firms interact with their partners; higher interaction frequency results in more stable collaboration relationships. Some studies suggest that greater collaboration depth leads to better resource sharing and positively affects exploratory innovation performance (Yan and Guan 2018). The specific equation is as follows:

$${{CD}}_{i}=\frac{{w}_{i}}{{k}_{i}}$$
(10)

where \({w}_{i}\) is the number of collaboration ties (edges, counted with multiplicity) directly connected to node i, and \({k}_{i}\) is the number of distinct partners of node i in the network.
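Eqs. (9) and (10) can be computed from an undirected co-application network. In this sketch:

- the edge-list format and the function name are hypothetical;
- each unordered pair of firms appears once;
- each edge weight is assumed to be the number of patents the two firms co-applied for, so collaboration depth is the average number of collaborations per partner.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def breadth_and_depth(edges: Dict[Tuple[str, str], int]) -> Tuple[Dict[str, int], Dict[str, float]]:
    """Return CB (Eq. 9, distinct partners) and CD (Eq. 10, collaborations per partner)."""
    partners: Dict[str, Set[str]] = defaultdict(set)
    strength: Dict[str, int] = defaultdict(int)
    for (a, b), weight in edges.items():
        partners[a].add(b)
        partners[b].add(a)
        strength[a] += weight
        strength[b] += weight
    cb = {firm: len(p) for firm, p in partners.items()}
    cd = {firm: strength[firm] / cb[firm] for firm in partners}
    return cb, cd
```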

Local clustering coefficient (LCC) reflects the degree of information transmission and cohesion among a firm’s local partners in a given period (Yan and Guan 2018). A high local clustering coefficient is believed to promote information flow and improve mutual trust between partners. Following Watts and Strogatz (1998), we use the following equation:

$${{LCC}}_{i}=\frac{2{e}_{i}}{{k}_{i}({k}_{i}-1)}$$
(11)

where \({k}_{i}\) is the number of nodes adjacent to node i and \({e}_{i}\) is the number of connections between all neighbors of node i.
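The following is a minimal sketch of Eq. (11) for a single node. It assumes the network is stored as a dictionary mapping each firm to the set of its partners, which is a hypothetical representation. Nodes with fewer than two partners have an undefined coefficient. Returning 0 for them is a common convention, not something the text specifies.

```python
from itertools import combinations
from typing import Dict, Set


def local_clustering(adj: Dict[str, Set[str]], node: str) -> float:
    """Eq. (11): 2 e_i / (k_i (k_i - 1)) for one node of an undirected network."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbors; 0 by convention
    # e_i: number of edges among the node's neighbors
    e = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * e / (k * (k - 1))
```

In a triangle A–B–C with an extra partner D attached to A, node A has one edge among its three neighbors, so its coefficient is 1/3. Node B has a coefficient of 1.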

Exploratory innovation performance

Numerous scholars measure exploratory innovation performance using time windows. Zhang and Luo (2020) show that inter-firm collaborations typically persist for 3–5 years and establish three-year windows as optimal for capturing exploratory innovation dynamics. Guan and Liu (2016) used five-year windows to quantify exploratory innovation in nano-energy firms. Wen et al. (2021) validated this approach, confirming that the measurement is robust across 4–6-year windows. Given the rapid iteration cycles of AI technologies, this study adopts Zhang and Luo’s (2020) methodology and employs a three-year window. If a knowledge element appears in a firm’s patents in period t + 1 but not in the preceding period t, it is considered an exploratory knowledge element. The technology category represented by the first four characters of the patent IPC code (the IPC subclass) is used as the knowledge element. Specifically, the number of a firm’s patents in new knowledge elements that emerged during 2020–2022 but not during 2017–2019 is taken as the firm’s exploratory innovation performance.
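The exploratory innovation measure can be sketched as a set difference over IPC subclasses. The sketch assumes that each patent is represented by the set of its IPC subclasses. A 2020–2022 patent is counted once if at least one of its subclasses is absent from the firm’s 2017–2019 baseline. The function name is hypothetical.

```python
from typing import Iterable, Set


def exploratory_performance(base_patents: Iterable[Set[str]], later_patents: Iterable[Set[str]]) -> int:
    """Count later-window patents carrying at least one IPC subclass absent from the baseline."""
    baseline: Set[str] = set()
    for codes in base_patents:
        baseline |= set(codes)
    return sum(1 for codes in later_patents if set(codes) - baseline)
```

For example, with a baseline of {G06N, G06F}, the later patents {G06N}, {G06T}, and {G06F, H04L} yield a count of 2.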

Furthermore, to validate the robustness of the three-year window for measuring exploratory innovation performance, this study conducts a systematic analysis using Levene’s test for homogeneity of variance. Drawing on Wen et al. (2021), we construct three biennial sliding-window measures (2019–2020, 2020–2021, and 2021–2022) based on IPC code identification. When these are tested alongside the triennial window measure (2020–2022), the null hypothesis of variance homogeneity cannot be rejected at α = 0.05. This indicates that the three-year window captures dynamic fluctuations in exploratory innovation performance. Its variance structure is not statistically distinguishable from that of the sliding-window measures. Consequently, the three-year window approach for measuring exploratory innovation performance in this study is methodologically justified, and the robustness check supports the stability of our findings.
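For reference, the mean-centred Levene statistic W can be computed as below. Under the null hypothesis of equal variances, W is compared with an F distribution with (k − 1, N − k) degrees of freedom. This is a sketch with a hypothetical function name. The text does not say which centring variant was used. In SciPy, `scipy.stats.levene` defaults to median centring (the Brown–Forsythe variant) unless `center='mean'` is passed.

```python
from typing import List, Sequence


def levene_w(groups: Sequence[Sequence[float]]) -> float:
    """Mean-centred Levene statistic W for k groups (compare with F(k - 1, N - k)).

    Raises ZeroDivisionError if fewer than two groups are given or if all
    absolute deviations are equal within every group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations of each observation from its own group mean.
    z: List[List[float]] = []
    for g in groups:
        m = sum(g) / len(g)
        z.append([abs(v - m) for v in g])
    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within
```

Two groups that differ only by a shift, such as [1, 2, 3, 4] and [11, 12, 13, 14], give W = 0.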

Analysis of characteristics and distinction among firms

This section conducts a correlation analysis of the variables to avoid potentially severe collinearity. Meanwhile, focal firms with similar network characteristics are clustered using the hierarchical clustering algorithm.

Correlation analysis

To prevent correlations among variables from undermining the reliability of our results, and to explore the relationships among the variables more deeply, we analyze the pairwise correlation coefficients. We then calculate variance inflation factors (VIFs) to assess the potential for multicollinearity. Figure 2 illustrates the correlation coefficients among the knowledge element characteristics, the collaboration network characteristics, and firms’ exploratory innovation performance. The degree of correlation differs considerably across variable pairs.

Notably, knowledge complementarity (KC) and knowledge diversity (KD) show a moderate positive correlation (r = 0.596). The two are therefore not completely independent: firms with higher KC may also have higher KD. One possible reason is that firms tend to develop both characteristics simultaneously as they seek to improve exploratory innovation performance. While pursuing knowledge diversity to broaden their innovation vision, they also attend to the complementarity among knowledge in different fields so that knowledge can be integrated and used effectively.

The correlation coefficient between local clustering coefficient (LCC) and collaboration depth (CD) is -0.525, a moderate negative correlation. This may imply a trade-off between these two collaboration network characteristics: when a firm’s collaboration network is overly dense locally, its ability to collaborate in depth with external partners may be limited to some extent.

Fig. 2

Correlation analysis of characteristic variables and exploratory innovation performance.

The results also show that, apart from the moderate correlations of the two pairs above, the remaining variable pairs have only low or weak correlations. These variables may therefore affect firms’ exploratory innovation performance along relatively independent dimensions, and the relationships among them are likely to be complex and nonlinear rather than simple and linear. The correlation matrix indicates that the following variables are not generally highly correlated (Cai et al. 2021): knowledge substitutability (KS), knowledge complementarity (KC), knowledge diversity (KD), collaboration breadth (CB), collaboration depth (CD), local clustering coefficient (LCC), and firms’ exploratory innovation performance (EIP). Even so, the moderate correlations between specific variable pairs must still be acknowledged. We therefore also conducted a VIF analysis. The maximum VIF is 1.901, far below the conventional threshold of 10, indicating no severe multicollinearity among the variables (Li 2021).
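The VIF check can be sketched without external libraries. It relies on the identity that the VIFs are the diagonal entries of the inverse of the predictors’ correlation matrix, which equals 1 / (1 − R_j²) for each predictor j. The function names are hypothetical. The inversion is a plain Gauss–Jordan elimination with partial pivoting.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(m: Sequence[Sequence[float]]) -> List[List[float]]:
    """Gauss-Jordan inversion with partial pivoting (assumes m is non-singular)."""
    n = len(m)
    a = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[n:] for row in a]


def vif(columns: Sequence[Sequence[float]]) -> List[float]:
    """VIF of each predictor = diagonal of the inverse correlation matrix."""
    corr = [[pearson(u, v) for v in columns] for u in columns]
    inv = invert(corr)
    return [inv[j][j] for j in range(len(columns))]
```

For two predictors with correlation r, both VIFs equal 1 / (1 − r²). With r = 0.596, the KC–KD correlation reported above, this gives about 1.55.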

The above analysis shows that the moderate correlations among certain variables are not sufficient to seriously affect the validity of our results. At the same time, further study of the relationships among these variables, and of the mechanisms driving them, is important for firms seeking to optimize their innovation strategies. The results also imply that any single characteristic has a limited impact on firms’ exploratory innovation performance and that multiple characteristics may jointly influence it. Further research is therefore needed to reveal the complex relationships among these variables. To examine these complex nonlinear relationships more deeply and comprehensively, we employ the hierarchical clustering algorithm and the CART algorithm. Together, they identify the optimal combinations of characteristics that different types of firms need to enhance exploratory innovation performance.

Different types of firms

Different types of firms have knowledge elements and collaboration networks with varying characteristics, which often leads to diverse development strategies. However, most current empirical research ignores these differences and instead places all firms in a single model to explore the impact of their characteristics on exploratory innovation performance. As a result, the findings may not apply to particular types of firms. To understand comprehensively a firm’s existing knowledge element combination and external collaboration strategy, as well as its strengths and weaknesses, it is necessary to distinguish firms based on their characteristics. Doing so can help firms make appropriate decisions based on their unique conditions, and it lays the groundwork for a targeted exploration of how to improve exploratory innovation performance.

To address the heterogeneity among focal firms, this study employs a bottom-up agglomerative hierarchical clustering algorithm. Euclidean distance serves as the dissimilarity metric, quantifying differences between focal firms in their knowledge element and collaboration network characteristics. The Ward.D2 linkage method iteratively merges clusters by minimizing the increase in within-cluster variance at each step, thereby grouping firms with similar characteristics. To determine the optimal number of clusters, the NbClust package is used to evaluate clustering validity indices. As shown in Fig. 3, this assessment identifies three clusters (k = 3) as optimal; this solution receives the highest consensus score among all candidates.
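The Ward linkage step can be illustrated with a naive agglomerative procedure. At each step, it merges the two clusters whose union increases the within-cluster sum of squares the least. For clusters a and b, this increase is n_a n_b / (n_a + n_b) times the squared Euclidean distance between their centroids. This is the minimum-variance criterion that R’s `hclust(method = "ward.D2")` applies to Euclidean distances. Notes on the sketch:

- the function name is hypothetical;
- it runs in O(n³) time and has no special tie-breaking rules;
- it assumes the six characteristics are already on comparable scales, which the text does not state.

```python
from typing import List, Sequence


def ward_clusters(points: Sequence[Sequence[float]], k: int) -> List[int]:
    """Agglomerative clustering with Ward's minimum-variance criterion; returns cluster labels."""
    members = [[i] for i in range(len(points))]
    centroids = [[float(v) for v in p] for p in points]
    sizes = [1] * len(points)
    while len(members) > k:
        best = None
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                d2 = sum((x - y) ** 2 for x, y in zip(centroids[a], centroids[b]))
                # Increase in within-cluster sum of squares if a and b are merged.
                cost = sizes[a] * sizes[b] / (sizes[a] + sizes[b]) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        na, nb = sizes[a], sizes[b]
        centroids[a] = [(na * x + nb * y) / (na + nb) for x, y in zip(centroids[a], centroids[b])]
        sizes[a] = na + nb
        members[a].extend(members[b])
        del members[b], centroids[b], sizes[b]  # b > a, so index a stays valid
    labels = [0] * len(points)
    for label, group in enumerate(members):
        for i in group:
            labels[i] = label
    return labels
```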

Fig. 3

The optimal number of clusters.

Using the hierarchical clustering algorithm, we divide the focal firms into three clusters that satisfy the principle of “similarities within groups and differences between groups.” Based on the characteristics of each cluster, we name them the Collaboration-oriented Cluster (cluster 1), the Knowledge-oriented Cluster (cluster 2), and the Balanced Cluster (cluster 3). Table 1 and Figure 4 show the specific characteristics of each cluster. The level of exploratory innovation performance is determined relative to its median: performance above the median is considered high, and performance below the median is considered low.

Table 1 Specific characteristics of each cluster.
Fig. 4

Radar map of CD, CB, LCC, KD, KS, and KC.

Of the focal firms, 93 (35.8 percent) are in the collaboration-oriented cluster. These firms have high collaboration breadth (CB) and local clustering coefficients (LCC) but no significant advantage in knowledge elements, which suggests that they focus more on external collaboration. They tend to collaborate extensively with multiple partners, and their collaboration networks show pronounced small-world characteristics. This indicates that these firms are inclined to acquire diverse knowledge through external collaboration to compensate for a lack of internal knowledge. As a result, 59.1 percent of these firms achieve high exploratory innovation performance.

Knowledge-oriented firms account for 53 (20.4 percent) of the focal firms. These firms focus more on reorganizing internal knowledge elements than on constructing external collaboration networks. They have strong knowledge complementarity (KC) and knowledge diversity (KD) but relatively weak knowledge substitutability (KS), and they combine knowledge elements from multiple fields to enhance exploratory innovation performance. However, the overall exploratory innovation performance of this cluster is not ideal. When knowledge complementarity (KC) is too high, firms concentrate on knowledge in related or similar technology fields. Their internal knowledge system becomes more coherent, but the scope for combining knowledge elements narrows, which hinders improvement in exploratory innovation performance.

The balanced cluster comprises 114 (43.8 percent) of the focal firms. Unlike the first two clusters, firms in this cluster have adopted a relatively balanced development strategy, with no significant advantages or shortcomings in any characteristic. In this cluster, 56.1 percent of firms achieve high exploratory innovation performance.

Decision Rule Analysis

The CART algorithm is adopted because it can reveal complex nonlinear relationships and interaction effects among variables that traditional regression analysis cannot fully capture. Many previous studies have used regression analysis to build models based on linear or simple nonlinear relationships between variables and firms’ exploratory innovation performance. However, many problems in current management practice are complex and holistic, which makes it difficult to build empirical models from existing theories.

In innovation management research, firms’ exploratory innovation performance is often influenced by the combined effects of multiple factors. Some of these factors may exhibit threshold or interaction effects that traditional regression analysis is unlikely to capture adequately. Regression analysis remains valuable for testing hypotheses about linear or simple nonlinear relationships. To better fit our research objective, however, this study prioritizes the CART algorithm, using it to explore from a data-driven perspective the combinations of characteristics through which firms obtain exploratory innovation performance. This machine learning approach examines the complex nonlinear relationships between the variables and exploratory innovation performance. It offers a more objective view of how combinations of firm characteristics affect that performance, which is essential for helping firms adjust their development strategies and make individualized decisions.

Based on this grouping, six variables are treated as condition attributes: knowledge substitutability (KS), knowledge complementarity (KC), knowledge diversity (KD), collaboration breadth (CB), collaboration depth (CD), and local clustering coefficient (LCC). The level of firms’ exploratory innovation performance is treated as the decision attribute. The CART algorithm is then used to analyze how different combinations of characteristic variables influence exploratory innovation performance. Table 2 shows the detailed decision rules for each cluster. The support degree (SupD) is the ratio of the number of firms covered by a decision rule to the number of firms in the cluster. The confidence degree (ConD) is the ratio of the number of covered firms whose performance level matches the rule’s classification to the number of firms covered by the rule.
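Support and confidence can be computed from a boolean mask that marks which firms in a cluster satisfy a rule’s conditions. The function name and inputs are hypothetical.

```python
from typing import Sequence, Tuple


def support_confidence(covered: Sequence[bool], labels: Sequence[int],
                       predicted: int) -> Tuple[float, float]:
    """SupD = covered firms / cluster size; ConD = covered firms with the predicted label / covered firms."""
    hits = [y for c, y in zip(covered, labels) if c]
    supd = len(hits) / len(labels)
    cond = sum(1 for y in hits if y == predicted) / len(hits) if hits else 0.0
    return supd, cond
```

Consider a cluster of four firms in which a rule covers three and two of those three have high performance. There, `support_confidence([True, True, True, False], [1, 1, 0, 1], 1)` gives SupD = 0.75 and ConD = 2/3.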

Table 2 Decision rules of each cluster.

Decision rules for collaboration-oriented firms

As shown in Fig. 5, the exploratory innovation performance of collaboration-oriented firms is mainly affected by knowledge diversity (KD), collaboration breadth (CB), and local clustering coefficient (LCC), and these relationships are complex and nonlinear. KD has the strongest influence. When KD is low (≤0.114), firms are more likely to achieve high exploratory innovation performance; when KD is high (>0.114), the potential for high exploratory innovation performance is lower.

Previous studies have found that diverse knowledge can help firms reorganize their knowledge and thereby improve exploratory innovation performance (Guan et al. 2016). However, excessive KD can impair a firm’s ability to integrate and apply knowledge. When a firm holds knowledge in too many fields, some of it may become redundant because of the firm’s limited absorptive and carrying capacity. This redundant knowledge can distract firms and make it difficult for them to identify useful information. It can also lead to high maintenance costs and wasted resources, ultimately limiting exploratory innovation performance.

High CB (>1.0), however, can mitigate the negative effects of overly diverse knowledge elements and lead to better exploratory innovation performance. This is likely because partners can provide valuable market information and innovation resources that help firms identify practical knowledge and integrate it into innovation activities (Xu et al. 2017). When CB is insufficient (≤1.0), LCC can partially substitute for it: if LCC is high (>1.0), firms may still achieve high exploratory innovation performance.

Fig. 5

Classification result of collaboration-oriented cluster.

Conversely, if LCC is low, firms find it challenging to obtain high exploratory innovation performance. Previous research found that a high LCC could restrict external knowledge flow and fail to improve exploratory innovation performance (Lyu et al. 2019). However, a high LCC also indicates close relationships among collaboration network members, which makes it easy to establish trust between firms and promotes knowledge sharing within the network (Li et al. 2021). Thus, excessive knowledge elements that are difficult for a single firm to absorb can be digested jointly within the network, driving growth in exploratory innovation performance.

These firms should keep KD in check to avoid wasting resources on too many fields beyond their scope. They should also expand collaboration channels to gather market information and identify useful knowledge elements in specific fields, rather than dispersing energy and resources. When it is difficult to add new partners, firms should instead concentrate on maintaining existing collaboration channels. Deepening existing relationships can improve knowledge sharing within the network and mitigate the negative effects of limited knowledge absorption capacity.

Decision rules for knowledge-oriented firms

As shown in Fig. 6, knowledge diversity (KD) and collaboration breadth (CB) have complex interaction effects on the exploratory innovation performance of knowledge-oriented firms. KD directly affects the level of exploratory innovation performance: when KD is high (>0.839), most firms’ exploratory innovation performance is low. Higher KD thus reduces the probability that firms will attain high exploratory innovation performance, in line with the perspective of Elia et al. (2019). Knowledge-oriented firms have much greater KD than collaboration-oriented firms. An excessive variety of knowledge elements can result in a lack of depth in any specific field, leading to information overload and reducing the accuracy of information acquisition.

When KD is low (≤0.839), CB has an inverted U-shaped effect on exploratory innovation performance:

- when CB is high (>0.661), firms almost never attain high exploratory innovation performance;
- when CB is moderate (0.527 < CB ≤ 0.661), firms are likely to attain high exploratory innovation performance;
- when CB is low (≤0.527), firms have a low likelihood of attaining high exploratory innovation performance.

This demonstrates that more partners are not always better, confirming the conclusion of Tsinopoulos et al. (2019). Because such firms do not prioritize building external collaboration networks, they lack experience and communication channels with partners. Dealing with many partners therefore incurs high coordination costs, which ultimately hinders improvement in exploratory innovation performance. Conversely, firms that lack partners face limited knowledge exchange (Christensen et al. 2019), which makes it challenging for them to obtain new innovative ideas and results in innovation bottlenecks. Overcoming these challenges therefore requires maintaining a moderate number of partners.

Fig. 6

Classification results of knowledge-oriented cluster.

First, these firms should sort out and screen practical knowledge elements to avoid the negative impact caused by excessive KD. Second, they should emphasize the importance of knowledge elements and allocate resources for constructing and maintaining external collaboration networks, which can effectively mitigate the negative impact from a poor combination of internal knowledge elements. If KD is too high, firms should select appropriate partners based on their own conditions, guaranteeing the acquisition of external innovative knowledge without incurring high coordination costs.

Decision rules for balanced firms

As shown in Fig. 7, the exploratory innovation performance of balanced firms is mainly affected by knowledge complementarity (KC) and knowledge substitutability (KS), both of which have complex nonlinear relationships with it. When KC is high (>0.651), only a few firms achieve high exploratory innovation performance. When KC is low (≤0.294), firms are more likely to attain high exploratory innovation performance. This supports the finding of Dibiaggio et al. (2014): when a firm has many complementary knowledge elements, an overly specialized knowledge combination reduces the potential for recombining knowledge elements and thus negatively affects exploratory innovation performance.

When KC is moderate (0.294 < KC ≤ 0.651), higher knowledge substitutability (KS) has a positive effect on exploratory innovation performance. Specifically, when KS is high (>0.705), most firms achieve a high level of exploratory innovation performance, but when KS is low (≤0.705), firms are more likely to attain low exploratory innovation performance. This result differs from prior research suggesting that having too many knowledge elements with similar attributes wastes resources and hinders firms from achieving high exploratory innovation performance. The difference may stem from the immaturity of the AI industry, which still faces risks such as technological uncertainty, shifting market acceptance, and incomplete laws and policies. Firms with abundant substitutive knowledge understand their knowledge elements better and can respond more flexibly to changes in the external environment. This gives them exploratory directions that help them solve complex problems and achieve growth in exploratory innovation performance.

Fig. 7 Classification result of the balanced cluster.

Balanced firms engaged in innovation activities must consider both complementary and substitutive knowledge elements. When exploring new fields, firms should avoid the rigid thinking and path dependence caused by excessive knowledge complementarity. Instead, they should consider increasing the proportion of substitutive knowledge, because knowledge elements with similar attributes drive experimentation and innovation that align with market trends.

Conclusions and discussion

Conclusions

Taking the perspective of the interaction between knowledge elements and collaboration networks, we selected Chinese AI firms as research objects. Based on patent data, we used machine learning methods, namely a hierarchical clustering algorithm and the CART algorithm, to classify Chinese AI firms into three types with similar characteristics and to explore the factor combinations through which each type achieves exploratory innovation performance. The analysis reveals complex, nonlinear relationships between these factors and exploratory innovation performance. Based on our analysis, we draw the following conclusions.
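As a rough illustration of the clustering step, the following sketch implements bottom-up (agglomerative) hierarchical clustering in plain Python and cuts the hierarchy at three clusters. The linkage criterion and distance measure used in the study are not stated in this section, so average linkage on Euclidean distance is an assumption, as are the function names and the toy two-feature points (hypothetical standardized scores, not study data). In practice a library routine such as `scipy.cluster.hierarchy.linkage` would normally be used.

```python
import math
from itertools import combinations


def average_linkage(a, b, points):
    # Mean pairwise Euclidean distance between members of clusters a and b
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerative(points, k):
    """Start with one cluster per firm and repeatedly merge the closest pair
    of clusters (average linkage) until only k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        # combinations() yields index pairs (x, y) with x < y
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], points),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because y > x, so index x is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical firms described by two standardized features
# (e.g., a collaboration-network score and a knowledge-element score)
firms = [(0.0, 0.0), (0.2, 0.1), (3.0, 3.1), (3.2, 3.0), (0.1, 3.0), (0.0, 3.2)]
groups = agglomerative(firms, 3)
# sorted(groups) == [[0, 1], [2, 3], [4, 5]]
```

Cutting the same hierarchy at a different `k` yields coarser or finer groupings; choosing three clusters mirrors the three firm types reported in the study rather than any rule built into the algorithm.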

First, the analysis of patent data suggests that knowledge element characteristics and collaboration network characteristics significantly affect the exploratory innovation performance of Chinese AI firms. This finding applies to explicit knowledge-driven innovation in technology-intensive industries (e.g., AI) and should be generalized cautiously to domains that rely on tacit knowledge or non-technical collaboration. The results suggest that knowledge elements often play a leading role and that the heterogeneous innovation resources made available by external collaboration can effectively compensate for the negative effects of an irrational combination of knowledge elements. Knowledge diversity is critical in collaboration-oriented and knowledge-oriented firms. However, greater knowledge diversity is not always better: including too many fields can lead to information overload and resource dispersion, negatively affecting firms’ exploratory innovation performance. Appropriate collaboration breadth can help firms alleviate this information overload. At balanced firms, knowledge complementarity and knowledge substitutability are the significant factors affecting exploratory innovation performance. Excessive knowledge complementarity can limit firms’ ability to respond to market demand and hinder exploratory innovation, whereas a wealth of substitutive knowledge can help firms generate new ideas through trial and error.

Second, different combinations of knowledge element and collaboration network characteristics lead different types of firms to different levels of exploratory innovation performance. This study uses a hierarchical clustering algorithm to distinguish three types of firms: collaboration-oriented, knowledge-oriented, and balanced. Collaboration-oriented firms prioritize external collaboration and have high collaboration breadth (CB) and a high local clustering coefficient (LCC). Although their knowledge element characteristics are at a medium level, they find it easiest to achieve high exploratory innovation performance. Knowledge-oriented firms emphasize knowledge complementarity (KC) and knowledge diversity (KD) and have strong expertise in specific fields, but their exploratory innovation performance has room for improvement. Balanced firms develop all dimensions evenly and have no significant weaknesses, making them more likely to achieve high exploratory innovation performance.

Third, knowledge element and collaboration network characteristics have complex nonlinear effects on firms’ exploratory innovation performance. Collaboration-oriented firms can improve their exploratory innovation performance by appropriately combining collaboration breadth and local clustering coefficient to offset the negative effects of excessive knowledge diversity. Such firms should maintain their current collaboration strategies to build trust and encourage information sharing with their partners. Knowledge-oriented firms can achieve high exploratory innovation performance by combining lower knowledge diversity with appropriate collaboration breadth. Such firms should prioritize the quality of their knowledge elements and avoid acquiring overly diverse knowledge elements, which wastes resources. Balanced firms can attain high exploratory innovation performance by adjusting the combination of knowledge complementarity and knowledge substitutability, taking into account the positive effect of knowledge substitutability when reducing knowledge complementarity. Such firms should increase their substitutive knowledge elements as much as possible to deal with unknown risks in the innovation process.

Theoretical contributions

By analyzing the impact of knowledge elements and collaboration networks on firms’ exploratory innovation performance, this study makes several theoretical contributions to the innovation management literature, as follows.

First, this study expands the theoretical boundaries of the knowledge-based view and social network theory by revealing the synergistic mechanism through which internal knowledge elements and external collaboration networks influence firms’ exploratory innovation performance. Although previous studies have fruitfully explored these influencing factors (Khan et al. 2024; Zhao et al. 2024), largely from a single perspective, our findings challenge that perspective by revealing a fundamentally synergistic relationship between them. External collaboration effectively compensates for suboptimal internal knowledge configurations, whereas a robust internal knowledge base enables firms to better leverage external network resources. This conclusion demonstrates the interaction between the internal knowledge system and the acquisition of external knowledge, thereby extending the existing theoretical framework. Furthermore, consistent with prior research (Tian et al. 2024; Zhao et al. 2024), our findings indicate that more complex knowledge elements do not invariably lead to better outcomes: overly complex and diverse knowledge elements can weaken firms’ absorptive capacity, ultimately hindering innovation performance. At the same time, our findings confirm prior research (Balland et al. 2019; Gao et al. 2025) showing that firms’ innovation performance is enhanced when diverse knowledge elements are highly similar to existing core knowledge elements.

Second, by systematically applying data-driven ML methods, this study represents a significant methodological and theoretical advancement in management research. Numerous scholars have recognized and applied ML methods in management research, though primarily to address technical challenges such as processing large-scale, unstructured, and textual data (Ylinen and Ranta 2025). The advancement of ML technologies has sparked frontier methodological debates across the social sciences (Pan et al. 2025; Valizade et al. 2024). Traditional empirical studies in innovation management, which primarily employ methods such as regression analysis, are effective at analyzing linear or simple nonlinear relationships between variables. However, they are less able to identify and examine complex nonlinear relationships and the combined effects of multiple variables accurately and comprehensively. In contrast, ML methods can reveal complicated relationships among variables and valuable knowledge rules that are difficult to detect with traditional statistical methods, and they often offer superior predictive accuracy. Consequently, some studies have begun to explore how to apply ML methods more comprehensively and systematically in innovation management research (Choudhury et al. 2021; Doornenbal et al. 2022; Li 2025). By using data-driven ML methods and objective data to capture the complex nonlinear relationships and multivariate combinatorial effects among knowledge elements, collaboration networks, and exploratory innovation performance, this study complements research based on traditional statistical methods. It reveals the complex interaction mechanisms through which multiple factors jointly influence firms’ exploratory innovation performance, providing a new perspective and a valuable reference for advancing theoretical research on innovation management.
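To make the contrast with regression concrete, the sketch below shows the core of the CART classification criterion: an exhaustive search for the threshold on one feature that minimizes the weighted Gini impurity of the two child nodes. It is a minimal illustration, not the authors’ implementation; the function names and toy data are hypothetical, and it assumes the Gini criterion (the usual CART default for classification). Threshold rules like those reported for KC and KS are what this search produces when it is applied recursively across all features and nodes, which library implementations such as scikit-learn’s `DecisionTreeClassifier` (default `criterion="gini"`) do at scale.

```python
def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(xs, ys):
    """Try every midpoint between adjacent sorted feature values and return
    (threshold, weighted_gini) for the split with the lowest impurity."""
    pairs = sorted(zip(xs, ys))
    best_score, best_threshold = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_score, best_threshold = score, threshold
    return best_threshold, best_score


# Hypothetical feature values with a step change in the outcome
xs = [1, 2, 3, 6, 7, 8]
ys = ["low", "low", "low", "high", "high", "high"]
threshold, impurity = best_split(xs, ys)
# threshold == 4.5, impurity == 0.0
```

Instead of estimating a single slope, the split locates the cut point where the outcome changes, and stacking such splits is what allows a tree to express conditional, nonlinear effects.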

Third, this study extends the contingency perspective in innovation strategy by incorporating firms’ differing strategic contexts into the analysis of exploratory innovation. By contrast, previous studies have typically adopted a generalized perspective on the factors influencing firms’ exploratory innovation performance, ignoring the heterogeneous characteristics of different firms and yielding less targeted conclusions and managerial insights. Recent literature has established that heterogeneity in knowledge types (Mendoza et al. 2025; Zhao et al. 2024), collaboration network structures (Li 2025), and innovation activities (Whang et al. 2023) influences innovation outcomes. However, the combinatorial effects of these factors have not yet been discussed comprehensively or specifically. By taking a differentiated research perspective and revealing the different combinations of factors that affect exploratory innovation performance in different management contexts, our study challenges the universal applicability of a “one size fits all” theoretical model. This offers a direction for future research by emphasizing that innovation can be pursued via many different paths, which depend critically on firms’ specific strategic contexts. Hence, more targeted conclusions can be reached and differentiated management strategies can be designed, enhancing the applicability and reference value of our findings and managerial insights.

Managerial implications, limitations, and future research

The results of our study have important managerial implications. First, optimizing firms’ knowledge structure is crucial for improving exploratory innovation performance in the Chinese AI industry. The combination of knowledge elements plays a leading role in achieving this goal; therefore, rational management of knowledge elements can significantly enhance firms’ exploratory innovation performance. When firms engage in technological innovation, indiscriminately pursuing diversification and complexity will not necessarily lead to better performance and can even harm innovation output. Second, Chinese AI firms should plan their partnerships carefully. Collaborating with other organizations is a critical way for firms to acquire external innovative resources. To compensate for shortcomings in their internal knowledge structure, firms must interact effectively with partners to obtain external knowledge and resources that match their innovation requirements, thereby ensuring better innovation output. Third, the government can optimize the industrial knowledge structure and encourage external collaboration among firms to promote exploratory innovation. The accumulation and combination of knowledge are crucial for developing new technologies. The Chinese government can facilitate cross-border collaboration and technical exchange among AI firms by establishing industry associations. Furthermore, it can promote technological innovation and the development of related industries by providing a favorable external environment that encourages firms to optimize their internal knowledge structure.

This study has several limitations that present opportunities for future research. First, this study measures its variables, including knowledge element characteristics, collaboration network characteristics, and firms’ exploratory innovation performance, using patent data. However, patents primarily reflect explicit technical knowledge and fail to adequately capture tacit knowledge. Furthermore, patent-based collaboration networks capture only technological partnerships, neglecting non-technological collaborations such as market-oriented alliances, supply chain partnerships, and informal collaborations with external partners. Future research could improve the measurement of organizational knowledge attributes and collaboration networks by integrating multi-source data, including financial reports, policy documents, and executive interview transcripts. Second, although the hierarchical clustering algorithm employed in this study groups firms with similar characteristics and thereby mitigates sample heterogeneity, effectively serving as a form of control, the absence of firm-specific and context-specific control variables may constrain the generalizability of the findings. Subsequent investigations should incorporate additional control variables that capture organizational characteristics and contextual factors to reduce potential bias in the conclusions. Third, although the temporal threshold approach adopted for variable measurement is methodologically valid, it may partially obscure short-term fluctuations in the relationship between the independent and dependent variables. Future research could combine multiple measurement strategies with diverse machine learning approaches in a comparative analysis, thereby enhancing methodological robustness.