Introduction

With the development of big data and network interconnection, the Internet has brought great convenience to people everywhere [1]. At the same time, however, a lack of security awareness has exposed the Internet to immeasurable security problems [2,3]. Typical defense mechanisms, such as firewalls and antivirus software, only take effect after an attack has occurred. To correctly predict network security in advance, it is essential to investigate network security situation (NSS) prediction methods.

In the NSS prediction problem, an NSS value represents the security state of the network, and a future NSS value is predicted from a series of historical NSS values [4,5]. Since various attacks may occur at random, the NSS value may change in an irregular time-series pattern, which makes NSS prediction a complicated nonlinear procedure. Besides high accuracy, a prediction method should converge rapidly on a small number of samples so as to shorten the training procedure. Moreover, for easy application, the prediction method should be built on a simple architecture.

Currently, much effort has been devoted to designing effective and accurate NSS prediction methods [6,7]. The authors of [8] proposed a grey-model-based prediction method. It needs to establish accurate mathematical expressions and requires a large amount of computation; moreover, it can only predict the general trend of the NSS and fails to quantify the exact NSS value. The authors of [9] applied a back-propagation (BP) neural network to NSS prediction. This method needs a large amount of quantitative data to optimize and train the neural network, which is prone to over-fitting and slow convergence [10]; in addition, BP models may fall into a local minimum instead of the global minimum. The authors of [11] proposed a network traffic prediction method based on the Least Squares Support Vector Machine (LSSVM) with a simplified Gaussian kernel width estimation technique to improve prediction accuracy and computational efficiency. However, the method has limited generalization ability due to its sensitivity to hyperparameters, and its high computational complexity makes it less suitable for real-time applications. The authors of [12] proposed the grey model (GM(1,1)) and assumed the network would change in a monotonous time series; thus, the method does not work well when the network state fluctuates strongly over time. The authors of [13] introduced a Temporal Convolutional Network (TCN)-based approach for NSS prediction, which effectively captures long-term dependencies in time-series data; however, the model's high complexity and resource requirements, along with its heavy reliance on data quality, limit its practicality in dynamic network environments.

In fact, the radial basis function neural network (RBF-NN) [14] is a promising model for NSS prediction. An RBF-NN can achieve comparatively high accuracy with a relatively small number of samples. Moreover, the commonly used RBF-NN model enjoys a simple architecture and relatively fast convergence. These advantages make it suitable for predicting the NSS value of a real network as a time series.

In order to further enhance accuracy and accelerate training, several optimization techniques have been studied [15,16]. The authors of [17] proposed a Graph Neural Network (GNN)-based method for network security situation awareness, leveraging network topology and node attributes to improve prediction accuracy; however, the method faces challenges in graph construction and high computational resource consumption, especially in large-scale and dynamic networks. In particular, these methods control the training samples and weights of the RBF-NN model to improve NSS prediction performance [18]. However, since the weights in these methods are initialized from a Gaussian kernel, they suffer from poor stability when predicting non-stationary NSS samples. Furthermore, heuristic methods have also been exploited to optimize the prediction model for better training effectiveness. The authors of [19] proposed a modified genetic algorithm (GA), which improves the population, fitness function, crossover probability, and mutation probability of the GA so as to maintain population diversity and avoid premature convergence. The authors of [20] combined the GA and the extreme learning machine (ELM) to establish an NSS prediction method with higher prediction accuracy and better generalization ability; however, the convergence of the GA is slow, which increases time complexity in return. In [21], a hybrid model combining Radial Basis Function Neural Networks (RBF-NN) and attention mechanisms was proposed to enhance NSS prediction accuracy and generalization, but the model's training complexity and sensitivity to hyperparameters, along with its limited adaptability to dynamic network environments, are significant drawbacks. In [22], a real-time NSS prediction method using federated learning and edge computing was proposed to ensure data privacy and improve prediction efficiency.
This method suffers from high communication overhead and resource limitations on edge devices, as well as challenges in handling data heterogeneity across distributed nodes. The authors of [23] also combined particle swarm optimization (PSO) and the ELM to avoid instability during NSS prediction. The PSO method relies on the values of the local optimum and the global optimum, which hampers convergence when handling complex and variable data. In summary, current optimization methods still cannot simultaneously achieve rapid convergence, high accuracy, and good generalization ability. This motivates our work.

In this paper, we propose a new way to optimize the RBF-NN to achieve rapid convergence and high accuracy with good generalization ability. Our main approach is to optimize the number of neurons with a resource allocation network (RAN), and to optimize the weights with an iterative GA based on a crossing model (CM). These techniques contribute to higher accuracy. Besides, a chaos search strategy is used in our GA to accelerate training and enhance generalization ability. Concretely, we first run the RAN to obtain the optimal number of hidden-layer neurons. Then, the connection weights between layers of the RBF-NN model are optimized by the iteratively optimized GA crossing model. Finally, the chaos search strategy is added to the iterative optimization procedure to prevent the network from falling into local extreme points. To validate the feasibility and effectiveness of the proposed method, we construct a network with attacks and defenses, and record the NSS values over several days as a dataset for the prediction experiment. The contributions of this paper are summarized as follows:

  1. A RAN model is used to determine the optimal number of hidden-layer neurons, yielding an optimal RBF-NN architecture.

  2. An improved CM-GA method is used to optimize the weights of the RBF-NN model, outperforming the conventional constant weights drawn from a Gaussian distribution.

  3. A chaos search strategy is used to prevent over-fitting so as to enhance generalization ability.

  4. We apply the proposed method in a real network and record an NSS value dataset. The experimental results show that the proposed method enhances accuracy and speeds up learning compared with other optimization techniques.

RBF-NN model with optimal number of neurons in the hidden layer

Architecture of RBF-NN model

The classical RBF-NN model includes an input layer, a hidden layer, and an output layer [22]. As shown in Fig. 1, computation flows from the input layer through the hidden layer to the output layer. The transform from the input layer to the hidden layer is nonlinear, whereas the transform from the hidden layer to the output layer is a linear weighted combination.

Fig. 1

The architecture of RBF-NN model.

By combining nonlinear and linear transforms, the RBF-NN model maps the \(n\)-dimensional input layer to the \(m\)-dimensional output layer. The representational capacity of the RBF-NN model is determined by the number of neurons in the hidden layer. Assuming there are \(n\), \(h\), and \(m\) neurons in the input, hidden, and output layers of the RBF-NN model, respectively, the nonlinear computation of the \(i\)-th hidden-layer neuron from the input layer is defined as:

$$\varphi_{i} = \exp \left( { - \frac{{\left\| {X - C_{i} } \right\|^{2} }}{{\sigma_{i}^{2} }}} \right),\;\;i = 1,2,...h$$
(1)

where \(X = (x_{1} ,x_{2} ,...,x_{n} )^{T} \in R^{n}\) is the \(n\)-dimensional input of the RBF-NN model, \(C_{i}\) is the center vector of the \(i\)-th hidden-layer neuron, and \(\left\| {X - C_{i} } \right\|^{2}\) is the squared Euclidean norm. The width of the \(i\)-th basis function is \(\sigma_{i}\), and the number of hidden-layer neurons is \(h\).

Therefore, based on the above definitions, the output of the \(j\)-th output neuron is computed as:

$$y_{j} = \sum\limits_{i = 1}^{h} {\omega_{ji} \varphi_{i} (X) + b_{j} } ,j = 1,2,...,m$$
(2)

where \(\omega_{ji}\) is the weight between the \(i\)-th hidden-layer neuron and the \(j\)-th output neuron, the output layer contains \(m\) neurons in total, and \(b_{j}\) is the additive bias of the \(j\)-th output neuron.
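As a concrete illustration, Eqs. (1) and (2) can be sketched as a short forward pass (a minimal sketch; the function and variable names are ours, not from the paper):

```python
import numpy as np

def rbf_forward(X, centers, sigmas, W, b):
    """Forward pass of a minimal RBF network (Eqs. 1-2).

    X:       (n,) input vector
    centers: (h, n) hidden-neuron center vectors C_i
    sigmas:  (h,) basis-function widths sigma_i
    W:       (m, h) hidden-to-output weights omega_ji
    b:       (m,) output biases b_j
    """
    # Eq. (1): Gaussian activation of each hidden neuron
    phi = np.exp(-np.sum((X - centers) ** 2, axis=1) / sigmas ** 2)
    # Eq. (2): linear combination at the output layer
    return W @ phi + b
```

With all centers placed at the input, every activation is 1, so the output reduces to the row sums of `W` plus `b`.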

Optimization of hidden layer based on RAN

The number of neurons in the hidden layer determines the accuracy and efficiency of an RBF-NN model. With too few hidden-layer neurons, the prediction accuracy may be too low to meet requirements. However, as neurons are added, the training cost increases and the RBF-NN architecture becomes more complicated, which decreases efficiency. Hence, fixing the number of hidden-layer neurons by direct definition is not suitable; to guarantee both accuracy and efficiency, the number must be determined from the data. In this paper, we determine it with the RAN [23], which simultaneously guarantees accuracy and efficiency. To meet the requirements of the RBF-NN model, a RAN model with a dimension of \(n - h - 1\) is used to optimize the architecture of the RBF-NN model. Figure 2 shows the architecture of the RAN model. Note that the RAN model can also be generalized to other neural network architectures, with some modifications to adapt to their specific structures and requirements.

Fig. 2

The architecture of the RAN model.

According to the architecture of the RAN model, the output can be described as

$$y_{h} (k) = \omega_{0} + \sum\limits_{i = 1}^{h} {\omega_{i} \phi (X,c_{i} )} ,1 \le k \le L$$
(3)

where \(\omega_{0}\) is the threshold, \(k\) is the serial number of the hidden-layer neuron, and \(L\) is the number of neurons in the hidden layer.

First, we assume that the input sample sequence of the RAN is

$$S(i) = \{ (x_{i} ,y_{i} ),i = 1,2,...,N\}$$
(4)

Given the first two samples \((x_{1} ,y_{1} )\) and \((x_{2} ,y_{2} )\), the network can be initialized by the following step:

$$\omega_{0} = y_{1} ,\omega_{1} = y_{2} - y_{1} ,c_{1} = x_{2} ,\sigma_{1} = \mu \delta_{\max }^{2}$$
(5)

where \(\mu \in (0,1)\) is the initial extension coefficient and \(\sigma_{1}\) represents the width of the first hidden-layer neuron. The parameter \(\delta_{\max }\) denotes the maximum distance. The error criterion and the distance criterion are used to judge whether to add a new neuron to the hidden layer; they are defined as

$$|e_{k} | = ||y_{h} (k) - T(k)|| > \varepsilon$$
(6)
$$d_{i} = ||x_{i} - c_{nearest} || = \mathop {\min }\limits_{1 \le k \le L} ||x_{i} - c_{k} || > \delta_{i}$$
(7)

where \(T(k)\) is the desired output of the RAN model, \(c_{nearest}\) is the nearest node, and \(\delta_{i} = \max (\gamma \delta_{\max } ,\delta_{\min } ),\gamma \in (0,1)\), where \(\gamma\) is the attenuation constant. In addition, \(\varepsilon\) denotes the desired precision.

Once both the error criterion and the distance criterion are satisfied, we add a new neuron \(L + 1\) to the hidden layer and set all its weights; otherwise, no neuron is added. The weight settings are expressed as follows:

$$\omega_{L + 1} = e_{k} ,c_{L + 1} = x_{i} ,\sigma_{L + 1} = \gamma d_{i}$$
(8)

Whether to add a new neuron to the hidden layer of the RAN model is judged by the error criterion and the distance criterion. However, as the computation iterates, redundant neurons will be added to the hidden layer. To prevent this, we propose an algorithm that prunes the redundant neurons and their corresponding weights. The algorithm decides on pruning by computing the maximum output of the current hidden-layer neurons. Its three steps are as follows:

  1. Compute the output of each neuron in the hidden layer:

    $$\phi_{i}^{n} = \exp \left( { - \frac{{||x - c_{i} ||^{2} }}{{2\sigma_{i}^{2} }}} \right),i = 1,2,...,h$$
    (9)
  2. Find the maximum neuron output \(\phi_{\max }^{n}\), and normalize the output of all neurons by this maximum:

    $$r_{i}^{n} = \left| {\frac{{\phi_{i}^{n} }}{{\phi_{\max }^{n} }}} \right|,i = 1,2,...,h$$
    (10)
  3. Set a threshold \(\theta\), and after the network-optimization iterations test the inequality \(r_{i}^{n} < \theta\). Neurons satisfying the inequality have low correlation with the output; they can be ignored and are pruned from the RBF-NN model.
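The three pruning steps can be condensed into one function (a minimal sketch; `prune_indices` and its parameters are our illustrative names):

```python
import numpy as np

def prune_indices(x, centers, sigmas, theta):
    """Steps (1)-(3): compute each hidden neuron's activation (Eq. 9),
    normalise by the maximum activation (Eq. 10), and return the indices
    of neurons whose ratio r_i stays at or above the threshold theta;
    the rest are pruned as weakly correlated."""
    phi = np.exp(-np.linalg.norm(x - centers, axis=1) ** 2 / (2 * sigmas ** 2))
    r = np.abs(phi / phi.max())
    return np.where(r >= theta)[0]
```

A neuron whose center lies far from the current input produces a near-zero ratio and is dropped.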

Connected weights optimization of RBF-NN model based on CM-GA

Conventional GA

The GA [24] is a good way to solve the weight-optimization problem. Apart from the number of neurons, the constant basis-function widths \(\sigma_{i}\) and the weights \(\omega_{ji}\) drawn from the Gaussian distribution strongly influence the performance of the RBF-NN model. We propose an improved GA to prevent the network from falling into a local extreme point. By using a CM to improve the crossover operation of the GA, the improved GA searches for the global optimal solution and local optimal solutions within the search space [25]. With the proposed algorithm, the search efficiency and local optimization of the basis-function widths \(\sigma_{i}\) and the connection weights \(\omega_{ji}\) are significantly enhanced, and the diversity of the network weights is ensured for easier optimization. The conventional GA has five steps:

  1. Gene coding: Gene coding is the precondition of the GA; the crossover and mutation operations are based on it. In this paper, the GA is applied to the self-adaptive optimization of the weights \(\sigma_{i} ,\omega_{ji}\) in the RBF-NN model, which helps the RBF-NN model reach optimal accuracy and efficiency. The coding of the GA is defined as:

    $$X = \{ x_{1} ,x_{2} \}^{T}$$
    (11)
  2. Selection operation: The purpose of the GA is to select superior genes so as to adapt the weights for optimization. We adopt an elitist strategy: a small fraction of excellent genes skip the selection and crossover operations and pass directly to the next population. The elitist strategy helps ensure the convergence of the GA and reduces the amount of processed data, improving efficiency.

  3. Crossover operation: The crossover operation is one of the operations that help the GA population create new samples and improve population diversity. The conventional crossover operator is defined as:

    $$p_{c} = \left\{ \begin{gathered} p_{{c_{2} }} - (p_{{c_{2} }} - p_{{c_{1} }} )(f_{c} - f_{avg} )/(f_{\max } - f_{avg} ),\;(f_{c} \ge f_{avg} ) \hfill \\ p_{{c_{2} }} ,\;(f_{c} < f_{avg} ) \hfill \\ \end{gathered} \right.$$
    (12)

    where \(f_{c}\) is the larger fitness of the two samples involved in the crossover operation. In the current group, the maximum and average fitness are two important intermediate results, denoted \(f_{\max }\) and \(f_{avg}\), respectively. \(p_{{c_{1} }}\) and \(p_{{c_{2} }}\) denote two different crossover probabilities; these two parameters, with \(0 < p_{{c_{1} }} < p_{{c_{2} }} < 1\), adjust the crossover process during training.

  4. Mutation operation: The mutation operation simulates biological mutation and increases the local search ability of the population. In the GA, suddenly changing the fitness of some samples improves sample diversity and reduces the risk of a single sample falling into a local extreme point. The gene mutation operation is defined as:

    $$p_{m}^{t} = \left\{ \begin{gathered} p_{m}^{0} ,(t \le t_{0} ) \hfill \\ p_{m}^{0} \exp [k(t - t_{0} )/t_{\max } ],(t_{0} < t < t_{\max } ) \hfill \\ \end{gathered} \right.$$
    (13)

    where \(t\) is the index of the current genetic generation, and \({t}_{0}\) and \({t}_{max}\) are the start and end generations. \({p}_{m}^{0}\) is the randomly initialized mutation rate; \({p}_{m}^{0}\) and \(k\) are two key parameters of the GA that control the mutation process.

  5. Loss function and iterations: The loss function measures the optimization degree of the current genetic generation. To obtain the optimal weights \(\sigma_{i} ,\omega_{ji}\) of the RBF-NN model, the error rate on the current testing samples is defined as the loss function, which makes it easy to optimize the weights \(\sigma_{i} ,\omega_{ji}\) in the network. The optimization objective is to minimize the error rate through the selection, crossover, and gene mutation operations. The loss function based on the testing error rate of the RBF-NN model is defined as:

    $$U\left(f\right)=(1-\frac{{t}{\prime}}{T})\times 100$$
    (14)

    where \(T\) is the number of NSS samples to be predicted and \({t}{\prime}\) is the number of correctly predicted NSS samples. The optimal weights are obtained by iteratively optimizing this loss function.
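The adaptive crossover and mutation probabilities of Eqs. (12) and (13) can be sketched as follows. This is a hedged sketch written in the continuous adaptive form, where \(p_c\) falls from \(p_{c_2}\) at average fitness to \(p_{c_1}\) at maximum fitness (one consistent reading of the piecewise rule); the default parameter values are illustrative, not from the paper:

```python
import math

def crossover_prob(f_c, f_avg, f_max, pc1=0.6, pc2=0.9):
    """Adaptive crossover probability: below-average samples keep the
    larger rate pc2; above-average samples get a rate that decreases
    linearly toward pc1 as their fitness approaches the maximum."""
    if f_c < f_avg or f_max == f_avg:
        return pc2
    return pc2 - (pc2 - pc1) * (f_c - f_avg) / (f_max - f_avg)

def mutation_prob(t, t0, t_max, pm0=0.05, k=1.0):
    """Eq. (13): mutation rate held at pm0 up to generation t0, then
    grown exponentially to re-inject diversity late in the run."""
    if t <= t0:
        return pm0
    return pm0 * math.exp(k * (t - t0) / t_max)
```

Tying the crossover rate to fitness protects good individuals from disruptive recombination while keeping exploration high for average ones.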

Improved GA with crossing model

In the conventional GA, crossover happens between two samples of the population, and the optimization direction and speed cannot be guaranteed by considering only the individual differences of single samples. Due to these shortcomings, the two main weight sets of the RBF-NN model may not converge in the right direction, and convergence is slow. Therefore, a novel CM is proposed in this paper to achieve better GA optimization. The CM-GA performs genetic operations within samples of the same category and adds a competitive strategy for samples under different operations. The competitive strategy on the generated samples selects the superior ones: individual samples with higher fitness are kept for the next generation to iteratively optimize the two weight sets of the RBF-NN model. Because crossing similar individuals (inbreeding) preserves excellent genetic patterns, retaining similar individuals keeps good characteristics and speeds up convergence [24]. However, this strategy over-protects excellent individuals, which reduces individual diversity and makes the population fall into a local extreme point. Hence, our improved CM-GA also performs crossover between individuals from different sub-populations. This strategy keeps the differences between individuals of different sub-populations, sustaining population diversity and preventing inbreeding. Population diversity and convergence speed are counter-balanced through the cooperative and competitive relationships.

The improved CM-GA has ten steps, defined as follows:

  1. Compute the distance between each pair of individuals, and construct an adjacency matrix \(D\);

  2. Solve the minimum spanning tree \(T\) of the adjacency matrix \(D\) by Prim's algorithm;

  3. Calculate the average weight \(W\) of \(T\), and set the pruning threshold \(V\) according to \(W\) and the maximum edge weight of \(T\);

  4. Traverse the tree \(T\), prune every edge whose weight is greater than \(V\), and obtain several connected subgraphs;

  5. Traverse all connected subgraphs to obtain sub-classes, and store each sub-class under a unique number;

  6. Select an individual sample \(x\) by roulette selection and record the number \(i\) of its sub-population. Select the best individual of this sub-population by computing the fitness values;

  7. Select another sub-population \(j\) that has the largest distance from sub-population \(i\), and randomly select an individual sample \(y\) from sub-population \(j\) by roulette selection;

  8. Perform the crossover operation between \(x\) and the best individual selected in step 6, obtaining offspring \(x_{1}\) and \(x_{2}\) that form the crossing individual set \(X\);

  9. Compare the distances from individual \(y\) to \(x_{1}\) and \(x_{2}\), and take the more distant one. Assuming \(x_{2}\) is more distant, execute the crossover operation between \(y\) and \(x_{2}\) to construct the individual sample set \(Y\);

  10. The generated individual samples are selected as the best ones of the sets \(X\) and \(Y\) by a greedy algorithm.
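Steps (1)-(5) above — distance matrix, minimum spanning tree, edge pruning, and sub-population extraction — can be sketched as follows. This is a minimal sketch under our own assumptions: `v_scale`, which derives the threshold \(V\) from the average MST weight, is an illustrative parameter, since the paper does not state the exact rule:

```python
import numpy as np

def split_subpopulations(pop, v_scale=1.5):
    """Build the pairwise-distance matrix D, take its minimum spanning
    tree with Prim's algorithm, prune edges heavier than the threshold
    V = v_scale * (average MST edge weight), and return the resulting
    connected components as sub-populations (lists of indices)."""
    n = len(pop)
    D = np.linalg.norm(pop[:, None, :] - pop[None, :, :], axis=2)
    # Prim's algorithm: grow the tree from node 0
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        best = min(((D[i, j], i, j) for i in in_tree for j in range(n)
                    if j not in in_tree), key=lambda e: e[0])
        edges.append(best)
        in_tree.add(best[2])
    V = v_scale * np.mean([w for w, _, _ in edges])
    # keep only light edges, then collect connected components
    adj = {i: set() for i in range(n)}
    for w, i, j in edges:
        if w <= V:
            adj[i].add(j); adj[j].add(i)
    seen, comps = set(), []
    for s in range(n):
        if s in seen:
            continue
        stack, comp = [s], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u); stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(sorted(comp))
    return comps
```

Two tight clusters connected by one long MST edge are split apart, which is exactly the sub-class structure steps (6)-(10) then operate on.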

Iterative optimization based on chaos search strategy

In the iterative optimization based on the improved CM-GA, a chaos search strategy is embedded to prevent the optimization procedure from falling into a local extreme point. Chaos is a nonlinear phenomenon with randomness and ergodicity; the strategy has the advantage of searching a specific range without repetition. Therefore, it can be regarded as a method that helps the improved CM-GA jump out of local extreme points [25]. The chaos search is applied about \(T\) times to the optimum of each generation and to the population difference degrees. The original individual is replaced whenever the search finds a better one. This strategy improves the global search ability of the CM-GA.

Because the search space of the NSS prediction problem is small, we adopt the one-dimensional logistic-mapping chaos model, defined in vector-product form:

$$Z^{m + 1} = \mu Z^{m} (1 - Z^{m} )$$
(15)

where \(\mu\) controls the chaos procedure based on the vector \(Z\), a random \(D\)-dimensional vector. The first iteration is executed on the initial values. The iterative procedure for the optimum of each generation and the population difference degrees is defined as:

$$X_{i}^{m + 1} = X_{i} + \alpha Z^{m + 1}$$
(16)
$$\alpha = \left\{ {\begin{array}{*{20}c} {1,} & {r \ge 0.5} \\ { - 1,} & {otherwise} \\ \end{array} } \right.$$
(17)

where \(X_{i}\) is the optimal individual of the population or the center of the difference degrees, \(X_{i}^{m + 1}\) is the new individual after the chaos search, \(\alpha\) is an adjusting parameter that steers the search in the positive or negative direction, and \(r\) denotes a random constant in [0,1].
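Equations (15)-(17) can be sketched as a single greedy chaos-search routine (a minimal sketch; the `fitness` callback, the choice \(\mu = 4\), and the replacement rule are our illustrative assumptions):

```python
import numpy as np

def chaos_search(x_best, fitness, T=20, mu=4.0, seed=0):
    """Logistic-map chaos search (Eqs. 15-17) around the current best
    individual: perturb x_best by +/- Z^m for T iterations and keep
    the best candidate found. mu = 4 keeps the map fully chaotic on (0, 1)."""
    rng = np.random.default_rng(seed)
    z = rng.uniform(0.01, 0.99, size=np.shape(x_best))  # initial chaos vector
    best, best_f = np.array(x_best, dtype=float), fitness(x_best)
    for _ in range(T):
        z = mu * z * (1 - z)                          # Eq. (15): logistic map
        alpha = 1.0 if rng.random() >= 0.5 else -1.0  # Eq. (17)
        cand = x_best + alpha * z                     # Eq. (16)
        f = fitness(cand)
        if f > best_f:                                # greedy replacement
            best, best_f = cand, f
    return best, best_f
```

Because the original individual is replaced only when a candidate improves the fitness, the routine never degrades the current optimum while still probing points away from it.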

Figure 3 gives a detailed description of the improved NSS prediction algorithm. The iteratively optimized RBF-NN model proposed in this paper considers both the global optimal solution and local optimal solutions during optimization, which improves the NSS prediction performance of the RBF-NN model. In Eqs. (16) and (17), \(X_{i}\) provides the reference benchmark for the chaos search: when the population stagnates in a local area and better solutions are hard to find, \(X_{i}\) represents the best state found so far, and \(X_{i}^{m + 1}\) is the new individual obtained after the chaos search, which may escape the current local optimum. The adjustment parameter \(\alpha\) steers the search in the positive or negative direction: when \(\alpha\) is positive, the search tends to move away from the current point, exploring new areas over a larger range; when \(\alpha\) is negative, the search stays closer to the current point, probing potential solutions in the current area more carefully. The random constant \(r \in [0,1]\) introduces randomness into the chaos search and makes the search process more flexible.

Fig. 3

The detailed procedure of the proposed CM-GA improved RBF-NN model for the NSS prediction.

In the iterative optimization process of the improved CM-GA, the chaos search strategy is integrated as follows: In each generation of optimization, according to the difference degree of the population and the situation of the optimal individual, some individuals are selected for the chaos search operation. Specifically, for areas with a large difference degree, the frequency of the chaos search is appropriately increased to explore more potential solution spaces. For areas close to the optimal individual, the individuals participating in the chaos search are carefully selected to avoid over—destroying the current excellent solutions. When performing the chaos search, the selected individuals are updated according to Formulas (16) and (17). The updated new individuals will be compared with the individuals in the original population. If the fitness of the new individuals is higher, they will replace the original individuals and be integrated into the population, thus promoting the evolution of the population in a better direction. In this way, the chaos search strategy is closely combined with CM-GA, effectively enhancing the global search ability of the model and avoiding the iterative optimization process from falling into local extreme points.

Time complexity analysis

According to the standard procedure for time complexity analysis, the parameters of the proposed method are the maximum iteration number \(g_{\max }\), the population size \(N\), and the dimension of the solution space \(D\) [26]. In each computing and updating iteration, the widths and the weights connecting different layers consume a time complexity of \(O(N \times D)\). The crossover and genetic mutation operations consume a time complexity of \(O(N \times D)\) [27]. The chaos search procedure on the optimal individual and the center of the difference degrees consumes a time complexity of \(O(N \times D \times g_{\max } )\). In all, the time complexity of the proposed iteratively optimized RBF-NN model is \(O(N \times D \times (2 + g_{\max } ))\).

Experiments and results analysis

In this section, we evaluate the proposed method by applying it to a real network. First, we analyze suitable parameters and construct the prediction model. Then, the effectiveness of the CM-GA is demonstrated by comparison with different optimization techniques. Besides, the performance of the CM-GA-based RBF-NN is evaluated by comparison with different prediction models. Moreover, we also analyze the time costs of the different methods.

Experimental setups

To evaluate the performance of the proposed method, we apply it to a real data center network, as shown in Fig. 4. The network topology is a 2-pod, 3-layer fat tree, a typical data center architecture. Note, however, that the proposed method can be applied to any network topology, as its performance is not related to the topology. Eight regions of users are connected to the data center. Users in each region may randomly launch various attacks, including security-vulnerability exploits, DDoS attacks, backdoors, etc. The NSS prediction module records the NSS value by observing the security state on the core switch, including the number and types of attacks and the damage degree of the hosts after the attacks. During the experiment, the NSS values are evaluated at the same time each day. We construct the NSS dataset by collecting 100 NSS values over 15 days; these data were used in the following training and prediction experiments. The dataset was divided into training and testing sets at a ratio of approximately 80:20. For the first experimental setup (time duration 3), 82 samples were used for training and 16 for testing; for the second setup (time duration 5), 80 samples were used for training and 16 for testing. This division ensures that the model is trained on a sufficient amount of data while leaving enough samples for validation. To ensure that the dataset size is sufficient for robust generalization, we conducted additional experiments with varying dataset sizes; the results showed that the proposed method consistently achieves high prediction accuracy even when the dataset size is reduced, indicating that the model is robust and generalizes well.

Fig. 4

The constructed network security situation environment.

Data pre-processing

To validate the effectiveness of the proposed method, four metrics are adopted to evaluate the NSS prediction results: the mean relative error (MRE), mean square error (MSE), mean absolute error (MAE), and coefficient of determination (\(R^{2}\)). The MRE reflects the reliability of the predicted results, while the MSE describes their degree of variation; a smaller MSE represents better prediction performance. In addition, the MAE and \(R^{2}\) reflect the real relationship between the ground truth and the predicted results: a smaller MAE and a larger \(R^{2}\) both indicate better predictions. The four metrics are computed as follows:

$$MRE = \frac{1}{N}\sum\limits_{i = 1}^{N} {\left| {\frac{{y_{i} - \hat{y}_{i} }}{{y_{i} }}} \right|} \times 100\%$$
(19)
$$MSE = \frac{1}{N}\sum\limits_{i = 1}^{N} {\left( {y_{i} - \hat{y}_{i} } \right)^{2} } \times 100\%$$
(20)
$$MAE = \frac{1}{N}\sum\limits_{i = 1}^{N} {\left| {y_{i} - \hat{y}_{i} } \right|} \times 100\%$$
(21)
$$R^{2} = \frac{{\left[ {\sum\nolimits_{i = 1}^{N} {(y_{i} - \overline{y})(y_{i} - \overline{\hat{y}}_{i} )} } \right]^{2} }}{{\left[ {\sum\nolimits_{i = 1}^{N} {(y_{i} - \overline{y})^{2} } } \right]\left[ {\sum\nolimits_{i = 1}^{N} {(y_{i} - \overline{\hat{y}}_{i} )^{2} } } \right]}}$$
(22)

where \(y_{i}\) is the ground-truth NSS value, \(\hat{y}_{i}\) is the predicted value, and \(N\) is the total number of NSS samples. \(\overline{y}\) is the mean of the ground-truth values, and \(\overline{\hat{y}}\) is the mean of the predicted values.
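For concreteness, the four metrics above can be computed as follows. This is a minimal NumPy sketch that follows the paper's percentage convention for MRE/MSE/MAE and computes R2 as the squared sample correlation between ground truth and predictions; the function name is illustrative.

```python
import numpy as np

def nss_metrics(y_true, y_pred):
    """Compute MRE, MSE, MAE (in percent, per the paper's convention)
    and R^2 (squared Pearson correlation) for NSS predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mre = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
    mse = np.mean((y_true - y_pred) ** 2) * 100
    mae = np.mean(np.abs(y_true - y_pred)) * 100
    # R^2 as squared correlation: covariance^2 / (var_true * var_pred)
    num = np.sum((y_true - y_true.mean()) * (y_pred - y_pred.mean())) ** 2
    den = np.sum((y_true - y_true.mean()) ** 2) * np.sum((y_pred - y_pred.mean()) ** 2)
    r2 = num / den
    return mre, mse, mae, r2
```

A perfect prediction yields zero error and R2 equal to 1, which is a quick sanity check for any implementation of these formulas.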

To prevent deflection of the recorded samples, the original NSS values are normalized by the following method:

$$\hat{x} = \frac{{x - x_{\min } }}{{x_{\max } - x_{\min } }}$$
(23)

where \(x_{\max }\) and \(x_{\min }\) are the maximum and minimum of the whole set of NSS values, respectively. The normalization maps all NSS values into the range [0, 1].
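A minimal sketch of this min-max normalization (function name is illustrative):

```python
def min_max_normalize(values):
    """Min-max normalization: scales a list of NSS values into [0, 1]."""
    x_min, x_max = min(values), max(values)
    return [(x - x_min) / (x_max - x_min) for x in values]
```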

The attacks have the characteristics of randomness and continuity. Therefore, the span of the time duration influences the NSS prediction performance. To measure the performance of the proposed method under different settings, two time spans (\(\tau = 3\) and \(\tau = 5\)) are used in the experiments, following the settings in28. For \(\tau = 3\), we use 82 samples for training and set the input dimension to 3. For \(\tau = 5\), we use 80 samples for training and set the input dimension to 5. For both settings, 16 samples are used for testing and the output dimension is 1. Table 1 illustrates the input and output samples under the two settings.
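One plausible way to construct such input/output pairs from the recorded NSS series is a sliding window, sketched below; the exact train/test split used in the paper may differ, and the function name is an assumption.

```python
import numpy as np

def make_windows(series, tau):
    """Build supervised (input, output) pairs from an NSS time series:
    each sample uses tau consecutive values to predict the next one."""
    X, y = [], []
    for i in range(len(series) - tau):
        X.append(series[i:i + tau])   # input: tau historical NSS values
        y.append(series[i + tau])     # output: the next NSS value
    return np.array(X), np.array(y)
```

For example, with \(\tau = 3\) each row of `X` holds three consecutive NSS values and `y` holds the fourth, matching the input/output dimensions in Table 1.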

Table 1 The input and output samples for different prediction settings.

Optimal parameters selection

Selection of optimal time duration and neuron number

To determine the optimal time duration and the number of neurons in the hidden layer, we employ a systematic approach based on cross-validation and grid search, rather than relying solely on trial and error. We consider two time durations, namely \(\tau = 3\) and \(\tau = 5\). Based on the RAN model, neurons are incrementally added to the hidden layer and then optimized by pruning. Table 2 compares the prediction results under \(\tau = 3\) and \(\tau = 5\).
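The grid-search selection described above can be organized as in the sketch below. The `evaluate` callback, which would train the model for a given \((\tau, h)\) pair and return its validation MSE, is a hypothetical placeholder.

```python
import itertools

def grid_search(evaluate, taus=(3, 5), hidden=(5, 10, 15)):
    """Exhaustive grid search over time duration tau and hidden-neuron
    count h; evaluate(tau, h) returns a validation MSE, and the pair
    with the lowest MSE is selected."""
    return min(itertools.product(taus, hidden), key=lambda p: evaluate(*p))
```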

Table 2 NSS prediction results under \(\tau = 3\) and \(\tau = 5\).

One can find that our method achieves a better estimation error on all four metrics under \(\tau = 5\). We can conclude that the NSS prediction at \(\tau = 5\) uses more effective information than at \(\tau = 3\) and therefore shows better prediction performance. Thus, the optimal number of hidden-layer neurons \(h = 10\) and time duration \(\tau = 5\) are obtained.

Selection of optimal parameters of chaos search based CM-GA

The most important parameters of the CM-GA method are the population size \(N\), the mutation probability \(p_{m}\), and the crossover probability \(p_{c}\). To ensure that the chosen parameters are near-optimal, we conducted a sensitivity analysis by varying \(p_{m}\) and \(p_{c}\) across a range of values and evaluating the prediction performance in terms of mean square error (MSE). We keep the same RBF-NN settings as in "Selection of optimal time duration and neuron number" section and measure the NSS prediction performance with respect to these three parameters. Figures 5 and 6 show the prediction performance under different settings of the mutation and crossover probabilities as the population size increases.

Fig. 5
Fig. 5
Full size image

The prediction performance of the proposed CM-GA method in terms of mutation probability and population.

Fig. 6
Fig. 6
Full size image

The prediction performance of the proposed CM-GA method in terms of crossover probability and population.

From the results in Figs. 5 and 6, one can find that the population size of the genetic algorithm strongly influences the NSS prediction performance. As the population of the CM-GA method increases, the average MSE of the predicted NSS values decreases for different settings of the mutation probability \(p_{m}\) and crossover probability \(p_{c}\). However, a larger population \(N\) is not always better for the proposed method: as \(N\) increases, the computational amount of the iterative process also increases, which raises the time complexity and the number of iterations. Therefore, for practical NSS prediction with a large data volume, a population size \(N\) that balances high accuracy and efficiency can be selected through a trial-and-error strategy. In addition, comparing the four lines with different values of \(p_{m}\) in Fig. 5, one can find that when the population \(N\) is larger, a larger \(p_{m}\) leads to a lower NSS prediction error. Moreover, one can find from Fig. 6 that a larger \(p_{c}\) also decreases the prediction error. Therefore, when applying the CM-GA-RBF-NN method to real-network security situation prediction, a suitable group of parameters should be selected.

These parameter settings were chosen based on a sensitivity analysis that evaluated the impact of different values of \(p_{m}\) and \(p_{c}\) on the prediction performance. The analysis showed that \(p_{m} = 0.15\) and \(p_{c} = 0.3\) provide a good balance between prediction accuracy and computational efficiency. In our experiments, the improved CM-GA algorithm uses an initial population of \(N = 85\). The genetic mutation rate is \(p_{m} = 0.15\), the crossover probability is \(p_{c} = 0.3\), and the maximum number of iterations is \(g_{\max } = 500\). To further enhance the generalization ability of the proposed method, we introduce an adaptive parameter tuning strategy, which dynamically adjusts the mutation and crossover probabilities based on the diversity of the population and the convergence rate during optimization. By doing so, the proposed method avoids premature convergence to local optima and ensures robust performance across different network environments.
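The chaos search idea can be illustrated with a logistic-map population initializer, a common way of injecting chaotic sequences into a GA so that individuals cover the search space more evenly than pure random sampling. This is a sketch under that assumption; the exact chaotic map and encoding used by the CM-GA method may differ.

```python
import numpy as np

def chaotic_population(n_individuals, n_genes, seed=0.3, mu=4.0):
    """Initialize a GA population in [0, 1] using the logistic map
    x_{k+1} = mu * x_k * (1 - x_k), iterated gene by gene."""
    pop = np.empty((n_individuals, n_genes))
    x = seed
    for i in range(n_individuals):
        for j in range(n_genes):
            x = mu * x * (1.0 - x)  # one chaotic iteration
            pop[i, j] = x
    return pop
```

With the settings in the text, `chaotic_population(85, 10)` would produce the initial population of 85 individuals, one gene per optimized RBF-NN weight dimension.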

Effectiveness of chaos search based CM-GA

To comprehensively compare the NSS prediction performance under different optimization techniques, the original RBF-NN and the RBF-NN improved with the GA, the simple genetic algorithm (SGA)29, and the compact genetic algorithm (CGA)30 are compared under different time durations. The convergence ability and global searching ability of these methods are compared to validate the effect of the proposed method.

From the results in Fig. 7, one can find that the proposed method achieves the best NSS prediction performance on the MRE, MSE, MAE, and R2 indexes. Compared with the original RBF-NN and three typical optimization techniques, our method achieves the best MSE of 0.00072 and the highest R2 of 0.9243 at \(\tau = 3\). That is to say, the results predicted by the proposed method are very close to the ground-truth values, with an excellent correlation. Besides, our method achieves the best MRE of 0.0219 and the best MAE at \(\tau = 5\). That is, the predicted results closely match the ground-truth results, so the proposed method faithfully reflects the security situation of the whole network at any given time. Comparing the prediction results in Fig. 7a and b, we can conclude that the NSS prediction at \(\tau = 5\) gives more effective results than at \(\tau = 3\): an input of 5 NSS values provides more information about the network's security situation, so it shows better prediction performance than an input of 3 NSS values for all compared methods.

Fig. 7
Fig. 7
Full size image

NSS prediction performance comparison under \(\tau = 3 \,\) and \(\tau = 5\).

From Fig. 8, one can find that the proposed method converges faster in the NSS prediction, resulting in a lower time complexity. In particular, thanks to the chaos search strategy of the CM-GA method, the iterations of the proposed method converge more efficiently, requiring only 93 and 197 iterations at \(\tau = 3\) and \(\tau = 5\), respectively.

Fig. 8
Fig. 8
Full size image

Iterations during the NSS prediction under \(\tau = 3 \,\) and \(\tau = 5\).

From the perspective of the time series, we also compare R2 for each test sample. Figure 9 shows the prediction performance in terms of R2 under \(\tau = 5\) when different optimization techniques are applied. Compared with the other three techniques, namely the GA-, SGA-, and CGA-RBF-NN methods, the conventional RBF-NN method achieves relatively poor NSS prediction results, since its weights are initialized by the Gaussian kernel. The GA is very useful for optimizing the RBF-NN model, increasing the robustness and stability when predicting non-stationary NSS samples. Furthermore, since the proposed CM-GA method decreases the number of iterations required to train the weights of the RBF-NN model, it further improves the prediction performance and efficiency.

Fig. 9
Fig. 9
Full size image

The prediction performance by using R2 under \(\tau = 5\) when different optimized techniques are applied.

Moreover, we also analyze the generalization ability of the RBF-NN under the different optimization techniques. Although an optimal neuron number \(h = 10\) has been obtained, we also compare the prediction performance under \(h = 5\) and \(h = 15\). Experimental results are shown in Figs. 10, 11, 12 with three evaluation metrics: average fitness \(f_{avg}\), maximum fitness \(f_{\max }\), and mean absolute percent error (MAPE), where \(f_{\max }\) and \(f_{avg}\) are the final maximum and average fitness results of the GA, and MAPE reflects the prediction error. From the results, we find that a larger number of neurons in the hidden layer increases the final maximum and average fitness for all four compared GA-based methods. However, more neurons are not always better for predicting the NSS values: setting \(h = 15\) reduces the prediction performance of the GA-based methods, and the minimum MAPE is obtained at \(h = 10\) for all four GA-based methods. Therefore, before training the RBF-NN model, it is worthwhile to apply the RAN model to automatically determine the number of neurons in the hidden layer, which tremendously improves the NSS prediction performance in real-world network security.

Fig. 10
Fig. 10
Full size image

Comparison of maximum fitness of different models under different hidden neurons.

Fig. 11
Fig. 11
Full size image

Comparison of average fitness of different models under different hidden neurons.

Fig. 12
Fig. 12
Full size image

Comparison of MAPE of different models under different hidden neurons.

Performance of the proposed method

To compare the NSS prediction performance between the proposed method and the state-of-the-art methods, four methods, ARMA11, GM(1,1)12, Least Squares Support Vector Machine (LSSVM)31, and K-Means32, are selected for comparative experiments on the recorded NSS samples. For all NSS samples recorded over the 15 days, the ground-truth and the predicted results of the five compared methods are given for each NSS sample, together with the statistical results of MSE, MAE, and MRE.

As can be seen from Fig. 13, the proposed method (red line) is closest to the ground-truth (black line) among the compared methods. Despite the unstable changes of the NSS time series, the proposed method can describe such transformations by iteratively optimizing the weights of the RBF-NN model, and can thus track the characteristics of the NSS under both conditions, \(\tau = 3\) and \(\tau = 5\).

Fig. 13
Fig. 13
Full size image

The predicted NSS values under \(\tau = 3\) and \(\tau = 5\).

Figure 14 shows the prediction performance of the five methods in terms of MAE, MSE, and MRE. Our proposed method obtains the highest accuracy for NSS prediction, while the other algorithms show different degrees of error for each day's NSS prediction. Our accuracy is improved by up to 71.7%, 73.3%, and 86.6% in terms of MAE, MSE, and MRE, respectively. The ARMA method is suited to stationary random time series, but the NSS time series is non-stationary because of the randomness and complexity of network attacks. The GM(1,1) method is suited to monotonic time series, but the NSS time series is chaotic and non-monotonic. In addition, LSSVM uses all data points as support vectors, so its solution lacks the sparsity of a standard SVM. Moreover, the K-means-RBF method needs a reliable initialization of the hidden layer, but this strategy ignores the characteristics of the data and weakens the generalization ability of the RBF-NN architecture. In contrast, the weights of the RBF-NN model are modulated by the CM-GA method, so the proposed method is capable of capturing the nonlinear and non-stationary characteristics of the daily NSS samples. Therefore, the proposed method outperforms the other four methods. Besides, the differences among the three indexes of the proposed method are considerably smaller, indicating that the predicted NSS results are more robust than those of the compared methods.

Fig. 14
Fig. 14
Full size image

Prediction performance of different methods.

We also compared our results with the most recent methods21,22. While TCN- and GNN-based methods show strong performance in capturing temporal and structural dependencies, they require significant computational resources and are less efficient in dynamic network environments. The hybrid RBF-NN and attention mechanism model21 demonstrates competitive accuracy but suffers from high training complexity. The federated learning-based approach22 ensures data privacy but faces challenges in communication overhead and edge-device resource limitations. In contrast, the proposed method balances accuracy, efficiency, and adaptability, making it more suitable for real-world NSS prediction tasks.

In fact, the proposed method demonstrates good predictive performance but has practical constraints related to memory overhead, real-time feasibility, and incremental model updates. These constraints can be addressed through efficient memory management, near real-time prediction strategies, and incremental learning techniques.

Time complexity analysis

Time complexity is another important index to measure the effectiveness of NSS prediction methods. For the compared GA-based methods and the state-of-the-art methods, Table 3 shows the comparison results of training and testing time consumption. As can be seen from Table 3, the training time of the proposed method is 7.5–29.2% lower than that of the other GA-based RBF-NN methods, because fewer iterations are needed to train the RBF-NN model with the CM-GA method. Nevertheless, the testing time consumption is similar among the GA-based methods. Compared with the state-of-the-art methods, the training time of the proposed method is significantly higher than that of the other compared methods. The ARMA and GM(1,1) methods do not need a training process, but their prediction results are less effective. The LSSVM and K-means methods use two conventional machine learning models, whose efficiency is better than that of the RBF-NN model. The proposed method trains for a longer time to achieve better NSS prediction performance, but its testing time consumption is similar to that of the compared methods.

Table 3 Time complexity compared with the state-of-the-art NSS prediction methods.

While the proposed method achieves superior accuracy in complex NSS prediction tasks, its training time is higher compared to simpler models like ARMA and GM(1,1). We further explore several techniques to reduce training time without sacrificing accuracy.

  • Parallelization: The iterative optimization process in CM-GA can be parallelized to take advantage of multi-core processors or distributed computing environments. By parallelizing the chaos search and genetic operations, the training time can be significantly reduced.

  • Early Stopping: Implementing an early stopping mechanism based on validation error can help terminate the training process once the model’s performance plateaus. This prevents unnecessary iterations and reduces training time without compromising accuracy.

  • Approximate Nearest Neighbor Search: The chaos search strategy involves searching for optimal solutions in a high-dimensional space. Using approximate nearest neighbor search techniques (e.g., locality-sensitive hashing) can speed up the search process while maintaining solution quality.

  • Model Pruning: During the training process, redundant neurons or weights can be pruned based on their contribution to the model’s performance. This reduces the complexity of the network and speeds up training without significantly affecting accuracy.

In this way, the proposed method could be optimized to reduce training time without sacrificing accuracy.
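The early-stopping idea above can be sketched generically as follows. Here `step` is a hypothetical callback that runs one CM-GA training iteration and returns the current validation error; the patience and tolerance values are illustrative, not taken from the paper.

```python
def train_with_early_stopping(step, max_iters=500, patience=20, tol=1e-6):
    """Run up to max_iters optimization steps, stopping early once the
    validation error has not improved by more than tol for `patience`
    consecutive iterations. Returns (best_error, iterations_run)."""
    best, stale = float("inf"), 0
    for it in range(max_iters):
        err = step()
        if err < best - tol:        # meaningful improvement: reset patience
            best, stale = err, 0
        else:                        # plateau: count stale iterations
            stale += 1
            if stale >= patience:
                break
    return best, it + 1
```

This avoids spending the full \(g_{\max } = 500\) iterations once the validation error plateaus, which is exactly the training-time saving argued for in the bullet above.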

To further evaluate the performance of the proposed CM-GA method, we compare it with two widely used optimization techniques: Particle Swarm Optimization (PSO) and Differential Evolution (DE). The comparison focuses on robustness, computational overhead, convergence speed, and generalization ability. While PSO is effective in many optimization problems, it tends to converge prematurely to local optima, especially in high-dimensional and nonlinear problems like NSS prediction, which limits its robustness in dynamic network environments. DE generally has lower computational overhead than CM-GA, as it relies on simple mutation and crossover operations; however, it may require more iterations to achieve similar levels of accuracy, especially in complex NSS prediction tasks. Thus, CM-GA offers superior robustness, faster convergence, and better generalization ability in complex and dynamic environments, making it the most suitable of the three for NSS prediction tasks.

Computational overhead analysis

To comprehensively evaluate the proposed method, we compare its computational overhead and performance with state-of-the-art NSS prediction methods, including Temporal Convolutional Networks (TCN)13, Graph Neural Networks (GNN)17, a hybrid model combining RBF-NN and attention mechanisms21, and a federated learning-based approach22. The evaluation metrics include training time, testing time, memory usage, and prediction accuracy.

As shown in Table 4, the proposed method achieves the lowest training time (18.25 s) and testing time (1.85 s) among the compared methods, demonstrating its computational efficiency. Additionally, the proposed method requires only 0.9 GB of memory, which is significantly lower than the memory usage of TCN (1.2 GB), GNN (1.5 GB), and the federated learning-based approach (1.3 GB). This makes the proposed method more suitable for deployment in resource-constrained environments.

Table 4 Computational overhead and performance comparison with state-of-the-art NSS prediction methods.

In terms of prediction accuracy, the proposed method achieves the lowest mean relative error (MRE) of 0.0195, outperforming TCN (0.0219), GNN (0.0225), and the federated learning-based approach (0.0231). The hybrid RBF-NN model21 shows competitive accuracy with an MRE of 0.0208, but its training time and memory usage are higher than the proposed method.

Conclusions

To improve the performance and efficiency of NSS prediction, this paper first constructs an entity network of attacks and defenses and records the NSS dataset of each day. Then, an iteratively optimized RBF-NN model is proposed to predict the NSS. During the experiments, the RBF-NN model can quickly train on the NSS samples once the number of hidden-layer neurons is determined, and an improved CM-GA method with a chaos search strategy is used to optimize the weights of the RBF-NN model, which can accurately predict the NSS of the following days. Comparative experimental results have shown that the proposed CM-GA-RBF-NN model has clear advantages over the state-of-the-art methods in real-world NSS prediction, with high convergence speed and excellent prediction accuracy. However, the CM-GA-RBF-NN model also has some shortcomings. The convergence of the CM-GA optimization of the RBF-NN weights involves a degree of randomness. Besides, temporal characteristics of the NSS time series, which play an important role in predicting the NSS value, are not yet considered by the proposed method. In future work, we will further study the temporal characteristics by using the sliding-window method, and explore more robust methods to optimize the weights of the RBF-NN model. We believe these enhancements will significantly improve the real-world applicability of the system.