Introduction

There is an intrinsic relationship between the generation of target attack sequences and the identification of key nodes in operational system networks. Nodes within these networks typically represent entities, while edges signify the relationships or interactions between nodes. Identifying key nodes within a network aids in uncovering critical targets within the network structure. For instance, this can involve pinpointing super spreaders in infectious disease transmission networks or determining which nodes are pivotal in infrastructure networks1,2,3,4. In the biopharmaceutical industry, it can facilitate the generation of candidate drug targets and the discovery of essential proteins based on existing molecular connectivity features5. It also plays a role in controlling the trend of public opinion and rumor analysis in social media platforms such as Weibo and TikTok6,7,8. Furthermore, it assists in apprehending suspects and judging the communication relationships among criminal suspects to combat terrorist organizations9,10,11. It is also crucial for maintaining the stability of cyberspace12,13,14, detecting vulnerable nodes in networks15, and constructing smart cities16. In complex real-world scenarios, attack resources are often constrained, and the identification of key nodes within the network holds significant practical importance for the strategy of “destroying a part to disrupt the whole system”.

Currently, there are two distinct research trends in the assessment of node importance: one focuses on studying from the network structure characteristics, and the other is based on the existing network structure from the perspective of information propagation and flow between nodes. This chapter will delve into the study from the perspective of complex network propagation characteristics. The propagation process17 is ubiquitous in nature and social networks, resulting from the interaction between infected individuals and susceptible ones. Specifically, nodes with broad influence are often the source and amplification points of propagation, and their effectiveness in spreading information is significantly superior to that of ordinary nodes. Therefore, identifying the optimal relevant nodes to maximize the propagation effect is a common strategy for controlling propagation. However, accurately assessing the propagation capacity of nodes and discovering super spreaders in large-scale complex networks still faces numerous challenges. Ruan Yirun’s team18proposed an improved algorithm based on the structural hole gravity model, considering the node H-index, node core number, and the position of structural holes. Yin Mengmeng’s team19proposed a method that integrates multiple centrality indicators from multiple perspectives and uses the entropy weight method to determine the weight of each indicator, avoiding human bias. Li Shikai20proposed an identification algorithm that combines node weight with structural entropy based on local network range information. Bing21established an SIR model in homogeneous and heterogeneous networks, adding mechanisms such as time delay. Sartori22 executed node removal strategies, comparing the optimal vaccination strategies of non-adaptive and semi-adaptive methods. In summary, new methods for assessing node importance continue to emerge, providing a variety of tools for complex network analysis. Existing literature describes such nodes in the following ways:

  1. (1)

    Nodes identified as important based on the propagation range, by simulating the propagation process and assessing the information propagation range when different nodes act as the source, with nodes having a broader range being more important.

  2. (2)

    Key nodes that play a crucial role in controlling or optimizing network propagation.

  3. (3)

    Super spreaders, nodes with a significant influence on the propagation of information or viruses in the network.

  4. (4)

    Priority propagation nodes, which should be considered as information source points first when resources are limited, to enhance the propagation effect.

Existing methods have problems:

  1. (1)

    There are various topological analysis methods for networks, but methods for identifying key nodes often focus more on local or global aspects, making it difficult to address both simultaneously.

  2. (2)

    Existing analysis methods based on the SIR model have high computational complexity and cannot balance computational efficiency and result accuracy.

  3. (3)

    Existing methods are insufficient in terms of precision in identifying nodes within networks, especially when using Kendall’s coefficient as the final measure in experiments based on the propagation SIR method, which requires significant computational power to support the average results of multiple SIR simulations.

In response to the above three issues, this paper has undertaken the following work:

  1. (1)

    This paper proposes a new method that integrates multiple existing indicators to derive a comprehensive score, thereby more fully and accurately reflecting the importance of nodes within the network.

  2. (2)

    Compares the computational complexity of existing methods with the method proposed in this paper.

Methods

Introduction of comparison methods

The performance of the proposed method is studied using several well-known metrics, including:

Degree centrality (DC)

Degree centrality is a basic ranking algorithm to identify the importance of nodes, the degree of a node i is defined as follows:

$${k_i}=\sum\limits_{{j=1}}^{N} {{a_{ij}}}.$$
(1)

Collective influence (CI)

The collective influence of nodes i is defined as follows:

$$C{I_i}=\left( {{k_i} - 1} \right)\sum\limits_{{j \in set(i,l)}} {\left( {{k_j} - 1} \right)}.$$
(2)

Betweenness centrality (BC)

Intermediary centrality (betweenness centrality) refers to the number of times a node appears in all the shortest paths in a graph, that is, the importance of the node connecting other nodes in the network. It is defined as follows:

$$B{C_i}=\sum\limits_{{i \in d{{\left( {i,j} \right)}_{\hbox{min} }},i \ne j}} 1.$$
(3)

K-shell

The k-shell decomposition method proposed by Kitsak et al. is an algorithm based on graph theory, used to decompose a directed graph into multiple k-shell subgraphs. A k-shell subgraph refers to a subgraph formed by nodes with a degree of at least k, meaning that the degree of any node in this subgraph is greater than or equal to k. The core idea of this algorithm is to repeatedly prune nodes with the smallest degree and update the degrees of their adjacent nodes until all nodes are decomposed into k-shell subgraphs.

Harmonic centrality (HC)

The definition is as follows:

$$H{C_i}=\sum\limits_{{j \ne i}} {\frac{1}{{d\left( {i,j} \right)}}}.$$
(4)

Closeness centrality (CC)

It is a node centrality measurement method, used to measure the distance between a node and other nodes, that is, its average shortest path length to other nodes. It is defined as follows:

$$C{C_i}=\frac{1}{n}\sum\limits_{{j \ne i}} {d\left( {i,j} \right)}.$$
(5)

ISM

ISM method is a comprehensive algorithm based on the gravity formula. This algorithm integrates k-shell method, H index and structural hole index based on the gravity formula to measure the balance relationship between global ks value and local structural holes as well as the relationship between nodes with H index, and comprehensively determines the important nodes in the network. The specific definition is as follows:

$$IS{M_i}=\sum\limits_{{{d_{ij}} \leqslant {\psi _i}}} {{e^{ - {C_i}}}\frac{{(k{s_i}+\gamma {h_i})(k{s_j}+\gamma {h_j})}}{{d_{{ij}}^{2}}}}.$$
(6)

Among them, \({\psi _i}\) is the set of nodes from node i to other nodes whose step length is less than r after a given step length. \({C_i}\) is the network constraint coefficient proposed by Burt, which is used to quantify the control of structural hole nodes over these relationships.

$${C_i}={\sum\limits_{j} {\left( {{\mu _{ij}}+\sum\limits_{{q \in \Gamma (i) \cap \Gamma (j)(q \ne i,j)}} {{\mu _{iq}}{\mu _{qj}}} } \right)} ^2},\;{\mu _{ij}}=\frac{{{z_{ij}}}}{{\sum\limits_{{j \in \Gamma (i)}} {{z_{ij}}} }}.$$
(7)

Among them, \({\mu _{ij}}\) represents the proportion of maintaining the relationship between node i and node j to the total sum of the relationship between node i. \(\Gamma (i)\) represents the neighborhood of node i, and q is the neighbor node shared by node i and node j.

CR

It is a propagation-based method proposed based on the local and global performance of nodes. This method integrates the improved network constraint coefficient INCC and toughness TC to achieve a balanced relationship between nodes. The formula is as follows:

$$C{R_i}=\frac{{INC{C_i}}}{{\sqrt {\sum\nolimits_{{j=1}}^{N} {INC{C_j}} } }}+\frac{{T{C_i}}}{{\sqrt {\sum\nolimits_{{j=1}}^{N} {T{C_j}} } }}.$$
(8)

The improved network constraint coefficient is:

$$INC{C_i}={\sum\limits_{j} {\left( {{{\mu ^{\prime}}_{ij}}+\sum\limits_{{q \in \Gamma (i) \cap \Gamma (j)(q \ne i,j)}} {{{\mu ^{\prime}}_{iq}}{{\mu ^{\prime}}_{qj}}} } \right)} ^2},{\mu ^{\prime}_{ij}}=\frac{{{Q_j}}}{{L{R_i}}}.$$
(9)

\({Q_j}\) is the sum of the first-order neighbors of node j, and \(L{R_i}\) is the sum of the second-order neighbors of node i.

To further validate the effectiveness of the proposed method, a more extensive comparative analysis was conducted with the traditional SIR model in Table 1.

Table 1 Distinguishes the propagation mechanisms.

Introduces the method

This manuscript introduces an Improved K-shell23 Position method based on Second-order degree\(IKSP - 2D,\) which integrates the second-order degree decomposition of the K-shell method with an enhanced network constraint coefficient to refine the existing second-order degree K-shell method, culminating in a novel ranking metric for important nodes. The specific procedural steps are delineated as follows:

Firstly, based on the peeling order of the K-shell method, this study proposes appropriate modifications to the dependent parameters. Node degree is readily ascertainable, thereby facilitating the subsequent calculation of the node’s second-order degree, expressed as:

$${k^{(2)}}\left( i \right)=\alpha k(i)+(1 - \alpha )\sum\limits_{{j \in \phi (u)}} {k(j)}.$$
(10)

Where \(k(i)\) denotes the degree of node i, \(j \in \Gamma (u)\) represents the set of neighboring nodes of node i, and α is a weighting coefficient that can be adjusted according to practical preferences within a specified range\(\alpha \in [0,1]\). If \(\alpha =1\), the second-order degree \({k^{(2)}}\left( i \right)\) reduces to the first-order degree (or simply degree). The process is as follows:

Step 1: Remove nodes with a second-order degree of \({k^{(2)}}\).

Step 2: Repeatedly remove new nodes with a degree less than or equal to \({k^{(2)}}\) until all nodes have a degree greater than \({k^{(2)}}\).

Step 3: All nodes removed in Steps 1 and 2 constitute a \(K{S_{{k^{(2)}}}}\) layer.

Step 4: Starting with \(K{S^{(2)}}_{1}={k^{(2)}}=1\), repeat Steps 1 to 3; if the network is devoid of isolated nodes, the nodes with degree 1 are the least important.Consequently, one can obtain up to k KS layers, denoted as \(KS_{1}^{{(2)}},KS_{2}^{{(2)}}, \ldots ,KS_{K}^{{(2)}}\).

For the convenience of subsequent explanations, this manuscript abbreviates the first step of the improved second-order degree-based K-shell decomposition method as \({K^2} - shell\).

Secondly, this manuscript introduces the INCC (Improved Network Constraint Coefficient) factor, which more effectively captures the local structural characteristics of nodes and quantitatively assesses each node’s local structural advantage. The INCC method captures the indirect impact of network changes on each node’s nearest and next-nearest neighbors after node deletion, formulated as:

$$INC{C_i}=\sum\limits_{{j \in \Gamma (i)}} {{{\left[ {{{p^{\prime}}_{ij}}+\sum\limits_{{k=1,k \ne i,j}}^{n} {{{p^{\prime}}_{ik}}{{p^{\prime}}_{kj}}} } \right]}^2}}.$$
(11)

Where,\({p^{\prime}_{ij}}=\frac{{{Q_j}}}{{L{R_i}}},\) \({Q_j}=\sum\limits_{{w \in \Gamma (j)}} {{N_w}},\) \(L{R_i}=\sum\limits_{{j \in \Gamma (i)}} {{Q_j}},\) the sum of the degrees of the neighboring nodes of node j is denoted as \({N_w},\) and \(\Gamma (j)\) represents the set of neighboring nodes of node j, while \(\Gamma (i)\) denotes the set of neighboring nodes of node i. A higher INCC value indicates a greater number of bridging nodes between a node and its neighbors, which can lead to an enrichment effect (or rich-club effect) during propagation, hindering the spread to a wider range. To align the INCC value with node importance, this manuscript incorporates the INCC into a negative exponential function as a parameter, as shown below: \({W_i}={e^{ - INCC}}.\)

The third step involves incorporating the INCC, the enhanced network constraint coefficient obtained from the second step, into the\({K^2} - shell\) step. Using the improved \({K^2} - shell\) method, record which KS layer each node belongs to. Secondly, when nodes within the same KS layer are peeled off sequentially, record the KS layer number at which the node is removed. At this time, calculate the difference between the current node’s KS layer and the maximum KS layer among its neighboring nodes, denoted as \(KS_{{i|next}}^{{(2)}} - KS_{i}^{{(2)}}\). If node i is in the maximum layer of the current network, set \(KS_{{i|next}}^{{(2)}} - KS_{i}^{{(2)}}=1\),\(\hbox{max} (1,KS_{{i|next}}^{{(2)}} - KS_{i}^{{(2)}})\). Also, record the order in which the node is peeled off and the KS layer number and order number \({l_i}\). Within each node set of the same KS layer and the same peeling order, assign a weighted proportion to the difference between the node and its neighboring maximum layer based on the INCC coefficient weight, as shown in the following formula (3):

$$IKSP - 2D=KS_{i}^{{(2)}}+\frac{{{k_i}}}{{{k_{N(i)\hbox{max} }}+1}}*\frac{{{W_i}}}{{{W_{\hbox{max} }}+1}}*\hbox{max} (1,KS_{{i|next}}^{{(2)}} - KS_{i}^{{(2)}}).$$
(12)

Where, the value \(KS_{i}^{{(2)}}\) represents the \(k - shell\)of node i. \({k_i}\) denotes the degree of node i, \({k_{N(i)\hbox{max} }}\) is the maximum degree value among the neighboring nodes of node i, \({W_i}\) is the INCC weight of node i during the iterative peeling operation within the same \(KS_{i}^{{(2)}}\) layer (as detailed in the second step), \({W_{\hbox{max} }}\) is the maximum \(KS_{i}^{{(2)}}\)-layer number among the neighboring nodes of node i, and \(KS_{{i|next}}^{{(2)}}\) refers to the maximum \(KS_{i}^{{(2)}}\)-value among the neighbors of node i. To prevent the maximum \(KS_{i}^{{(2)}}\)-layer from being excluded from calculations, when a node is in the maximum \(KS_{i}^{{(2)}}\)-layer, the difference\(KS_{{i|next}}^{{(2)}} - KS_{i}^{{(2)}}\) is automatically set to 1. These constitute the specific steps of the \(IKSP - 2D\) method.

To facilitate a more intuitive understanding of the method, this manuscript employs an illustrative example to elucidate the process and presents the actual demonstration of its effectiveness.

Fig. 1
figure 1

The IKSP-2D method demonstrates the network effect.

Table 2 The score values of each nodes in the network under the different methods.
Table 3 Order values of nodes in the demonstration network according to the different methods.

To demonstrate the algorithmic efficacy and computational process, we present a concise illustration in Fig. 1 using a randomly generated small-world network comprising 20 nodes. The scores of these nodes computed by different algorithms are systematically listed in Table 2, while their ranked sequences in descending order are documented in Table 3. Comparative analysis reveals distinct node identification outcomes between the proposed IKSP-2D algorithm and classical approaches. Specifically, the top five nodes identified by six classical algorithms (nodes 11, 18, 10, 12, and 2) differ markedly from those prioritized by IKSP-2D (nodes 17, 6, 7, 3, and 4). As visualized in Fig. 1, nodes 17, 6, 7, 3, and 4 exhibit a balanced distribution of central and localized structural features, demonstrating enhanced capacity to represent the network’s global topology. In contrast, nodes 11, 18, 10, 12, and 2 form a highly clustered subgroup with dense interconnections, potentially inducing rich-club effects that may paradoxically constrain their propagation efficiency despite their local centrality. This observation suggests that conventional methods prioritizing local clustering coefficients might overlook critical structural balances essential for optimal information dissemination.

Data from Tables 2 and 3 reveal that the IKSP-2D method, an improvement upon the K-shell and second-order degree methods, significantly enhances the discernibility of nodes. The DC, BC, CC, and CI methods all ranked node 11 as the best. The K-shell method did not perform as well in this case, with a multitude of nodes falling within the same layer, hindering the identification of high-order node importance. The INCC method identified node 2 as the most influential, while the IKSP-2D method introduced in this chapter identified node 17 as the most influential.

Fig. 2
figure 2

The effect diagram of BC, CI and IKSP-2D through SIR transmission experiment.

As shown in Fig. 2, to more vividly demonstrate the effectiveness of the algorithm in identifying propagation nodes in scale-free networks, we randomly generated a 100-node Watts-Strogatz small-world network as an example. The figure displays the initial state of propagation and the propagation results after 20 steps for the BC, CI, and IKSP-2D methods. In the initial state, only the top 10 ranked infected nodes were labeled, and after the propagation ended, the recovered nodes were marked in red. Ultimately, it was observed that the diffusion effect of the method proposed in this paper, as evidenced by the SIR propagation experiment, was significantly greater than that of the CI and BC methods.

Experimental setting

Experimental environment and the datasets

The experimental environment is: 2.5 GHz Intel (R) Core(TM) i5-12500 H CPU, 16.00GB memory. The software environment is the Python2022. In order to verify the effect of the node propagation influence.

The experiments are selected as follows as five real data sets: Infect-hyper, Infect-dublin, DD244, Infect-euroroad, Mammalia-dophin-florida-social, and the detailed information is shown in Table 4. At the same time, the artificial data set selected in this paper is: watts-strogatz.

Table 4 Specific parameters of the real network of scale 102–103.

Results

Analysis of simulation experiment

This experiment presents the results from five real-world network datasets and one artificial small-world network dataset to validate the performance differences between the algorithm proposed in this chapter and eight other classic algorithms. The actual influence of the target nodes was compared with the target node importance rankings calculated by nine different methods, with a particular focus on comparing the ranking accuracy of each algorithm under varying recovery rates. Furthermore, the manuscript demonstrates the practical effects of node sets selected based on different indicators, thereby validating the rationality of the improved K-shell second-order degree decomposition method and the INCC metric. Finally, the paper also provides actual results from the artificial small-world network under different transmission rates. (the horizontal axis of each graph represents the order value of each method, and the R value of SIR propagation of nodes corresponding to the vertical axis)

Fig. 3
figure 3

Kendall’s coefficient plot of the infect-hyper network.

Fig. 4
figure 4

Kendall’s coefficient plot for the infect-dublin network.

Fig. 5
figure 5

Kendall’s coefficient plot for DD244 network.

Fig. 6
figure 6

Kendall’s coefficient plot for infect-euroroad network.

Fig. 7
figure 7

Kendall’s coefficient plot for mammalia-dophin-florida-social network.

It can be seen from Figs. 3, 4, 5, 6 and 7 that the kendall coefficient results are not as good as expected in this paper, so this paper arranges the top 10 nodes under the different methods corresponding to each data, and the results are shown in Tables 5, 6, 7, 8 and 9.

Table 5 Order of the top 10 nodes of the Inf-euroroad network.
Table 6 Order of the top 10 nodes of the Mammalia-dophin-florida-social network.
Table 7 Order of the top 10 nodes of the Infect-dublin network.
Table 8 Order of the top 10 nodes of the Infect-hyper network.
Table 9 Order of the top 10 nodes of the DD244 network.

Data from Tables 5, 6, 7, 8 and 9 reveal that there is a significant variation in the node ranking values across most methods, which contributes to the poor performance of Kendall’s coefficient in evaluating these nine methods. This is reflected in the scatter plot as the data points struggle to align with the main diagonal of the XoY coordinate system. To facilitate rapid reading of the tabular data, this manuscript has bolded the data that appears three or more times. This highlighting is intended to demonstrate any similarities in the top-ranked node rankings among different methods.

From Table 7, it can be observed that the ranking results of the IKSP-2D algorithm proposed in this manuscript differ from those of other algorithms. In the Infect-hyper network, the algorithm identifies node 112 as the most influential, followed by node 64, and then node 82. No other ranking algorithms have identified the same data.At the same time, for the DC, CI, BC, HC, and CC methods, nodes 30 and 42 are considered the most influential, followed by nodes 38 and 102. Ultimately, it is evident that across the five networks studied, the IKSP-2D method and the CR method yield results that are most divergent from those of other methods.

Based on these observations, this manuscript will employ the SIR model to simulate the propagation process in real networks, setting the top 10 nodes from each method as infected I nodes. Furthermore, to eliminate the interference of unstable factors in a single SIR propagation process, this study will repeat the SIR simulation process for each dataset under fixed parameters for a minimum of 100 times for each method. The average value obtained from these 100 simulations will be taken as the final result for each method.(Where the horizontal axis represents the time step when each method reaches the maximum I + R value, and the vertical axis is the sum value of the infected node and the recovery node, namely the I + R value size).

Fig. 8
figure 8

(ae) Results of each data with a response rate of 0.2.

Fig. 9
figure 9

(ae). Results of each data with a response rate of 0.3.

As shown in Figs. 8 and 9, these illustrate the results for five networks at recovery rates of 0.2 and 0.3, respectively. It is evident from the figures that the method proposed in this paper outperforms other algorithms across these five datasets, particularly in the Infect-hyper dataset where the propagation rate is optimal at \(0.5*{\beta _{th}}\).when the recovery rate is 0.2,the mammalia network also shows a relatively pronounced result at the low recovery rate of 0.2. Overall, the experimental results indicate that the ranking of important nodes in complex networks based on propagation characteristics, as proposed in this paper, is more accurate and stable.

Additionally, this paper compares the results of the small-world network as shown below in Figs. 10, 11, 12, 13, 14 and 15, The horizontal axis of each graph represents the order value of each method, the vertical axis corresponds to the R value of the node, and the random probability of the current watts network is 10–90%,with the current propagation rate being \({\beta _{th}}\): 0.1384.

Fig. 10
figure 10

The kendall coefficient diagram of the small-world network(10%).

Fig. 11
figure 11

The kendall coefficient diagram of the small-world network(30%).

Fig. 12
figure 12

The kendall coefficient diagram of the small-world network(50%).

Fig. 13
figure 13

The kendall coefficient diagram of the small-world network(70%).

Fig. 14
figure 14

The kendall coefficient diagram of the small-world network (90%).

The corresponding SIR propagation results are shown as follows:

Fig. 15
figure 15

(ae) SIR propagation R value diagram of Watts network.

where the horizontal axis represents the time stage of propagation, and the vertical axis represents the current data of the I + R and the proportion of summary points in different time stages.

In terms of the results of small world networks, the model proposed in this chapter has its own unique advantages in the identification of propagation nodes, which is different from most other existing algorithms. At the same time, it can be seen that this method is not suitable for the calculation of Kendall correlation coefficient, but prefers networks with large randomness.

Conclusion

This study proposes an improved K-shell method based on second-order degree decomposition (IKSP-2D) to address propagation characteristics in complex networks with fixed structures. By integrating an enhanced network constraint coefficient (INCC) into the second-order K-shell framework, the method achieves a refined partitioning of nodes, effectively combining local degree features and global structural hole connectivity. This approach resolves the local bias inherent in traditional methods when distinguishing node importance within the same K-shell layer. Experimental validation using SIR propagation models demonstrates its robust performance in identifying critical nodes, even under scenarios where the Kendall coefficient exhibits suboptimal reliability. The innovation lies in establishing a novel paradigm for node importance evaluation that simultaneously incorporates topological structure and propagation dynamics. The proposed method holds significant practical value, offering a reliable theoretical tool for proactive identification and system optimization of key nodes in applications such as epidemic control, public opinion management, and cyber-security defense.

Looking ahead, there is an anticipation for further refinement and optimization of methods for identifying important nodes in multi-layer networks based on structural and propagation characteristics. This includes the comprehensive consideration of various network performance indicators to continuously and effectively identify key nodes during dynamic attacks, thereby providing support for the proactive defense and optimization of complex networks.