Introduction

As the key to the new energy transformation, the security of the power grid directly affects people's livelihood and social stability. Because cyber and physical power systems are now deeply coupled, the security of new power systems depends on both the physical and the cyber layers1,2. However, traditional security controls designed for the cyber or the physical power system alone struggle to cope with advanced persistent threats from cyber-physical space3. Research has shown that the malicious attacks faced by power systems are increasing, especially false data attacks4,5. At present, malicious attacks on power systems can be classified into two categories: denial-of-service (DoS) attacks and false data attacks (FDAs)6. As shown in Fig. 1, the openness of the power system allows hackers to inject malicious attacks such as DoS attacks and FDAs. DoS attacks aim to disrupt the communication layer of the power system, causing it to go down7. It is worth noting that DoS attacks can be detected by using Kalman filter (KF) state estimation. In contrast, FDAs can bypass this detection mechanism by injecting a bank of false data8,9. For example, with incomplete network information, Jin et al. constructed a false data injection (FDI) attack targeting AC state estimation10. In practice, the power system in Taiwan suffered a hacking attack in 2023 that caused very large losses. Without a timely response to such cyber-physical attacks, an attacker can wreak havoc on the power system. In this regard, the rapid and accurate detection of false data attacks is a critical issue for cyber-physical power systems.

Fig. 1

Description of malicious attacks in power grid.

To detect injected FDAs, a large body of research has been carried out. The existing detection methods can be classified into two categories: model-based techniques and learning-based techniques. Model-based detection techniques are based on static power data. By constructing accurate mathematical models, model-based techniques can detect and identify the injected attacks11,12,13,14,15,16. In11, a secure observer-based attack detection approach was developed. The proposed method considered the influence of FDAs on the system and designed a security technique for detecting and preventing FDAs by using observer residuals. In12, an attack detection and reconstruction approach was proposed. The state residual generated by a robust sliding mode observer was used to detect and reconstruct dynamic load altering attacks (DLAAs). In addition, to reduce the influence of the precomputed threshold, an adaptive observer-based detection approach using an adaptive threshold was developed13. Motivated by the super-twisting sliding mode algorithm, an effective attack detection and reconstruction framework was constructed14. In contrast to the above centralized detection methods, Chowdhury et al. developed a decentralized estimator to detect and identify distributed FDAs in power systems15. In16, a novel detection method using an unknown interval observer was proposed. The proposed detection approach uses interval residuals to replace the traditional precomputed threshold. In summary, model-based detection techniques can enhance detection performance against FDAs to a certain extent. Meanwhile, the accuracy of the power model and the design of the detection thresholds are key factors affecting model-based detection techniques.

With the rapid development of AI technology, learning-based detection techniques are gradually being applied to detect anomalous data in power grids. Different from model-based detection techniques, learning-based detection techniques are not limited by the model accuracy and the design of detection thresholds. In17, an improved random forest-based detection algorithm against FDAs and random attacks in power grids was proposed. Based on the established linear power model, a data-driven low-rank detection approach was developed8. Taking the unknown system parameters into account, a detection approach using the observable Markov decision process was proposed18. In19, a support vector machine-based detection model was developed to identify the injected FDAs. However, the detection accuracy of the above learning-based methods can be affected by the selection of the detection threshold. In20, a machine learning-based detection technique using a semi-supervised generative adversarial network was proposed to detect and locate the injected FDAs. In addition, the deep convolutional neural network (CNN) has gradually become an effective method for feature extraction. In21, a group-fusion one-dimensional CNN was proposed to improve the model recognition accuracy. A multi-scale group-fusion one-dimensional CNN for high-resolution range profile (HRRP) target recognition was presented in22. To reduce the computational complexity of vanilla CNNs, a lightweight depth-wise separable fusion CNN for ballistic target HRRP recognition was developed23. A graph convolutional network framework was constructed to detect FDAs, which considered the impact of the attack on the spatial properties of the grid system structure24. However, the above learning-based detection techniques do not consider attack detection from a spatio-temporal perspective. By ignoring the topological relationship among the various instruments within a grid deployment, these techniques frequently lead to an incomplete extraction of spatial information. Furthermore, the recent study in25 indicates that load variations in smart grids have distinct spatial-temporal characteristics.

Motivated by the above problems, this paper explores a spatial-temporal detection framework against FDAs in power grids. As shown in Fig. 2, the proposed detection framework consists of a Kepler Optimization Algorithm (KOA)-optimized convolutional neural network (CNN), a bidirectional gated recurrent unit (BiGRU) and an attention mechanism. Specifically, the CNN is mainly used to extract local and spatial features from the data. Through convolutional and pooling layers, the CNN can extract local features of power network data (such as voltage, current and power) from time series data. The BiGRU is an enhanced recurrent neural network (RNN) that can simultaneously capture the forward and backward dependencies of time series data. Through its gate control mechanisms (reset and update gates), the BiGRU can capture long-term dependencies in network data, such as the correlation between the data before and after an abnormal event. The attention mechanism is used to dynamically assign weights, highlighting important features and suppressing irrelevant ones. Power system data can contain a large amount of noise or redundant information, and the attention mechanism helps the model focus on the key features of anomalous data, improving detection accuracy. In addition, KOA is an optimisation algorithm used to tune the hyperparameters of the model, such as the learning rate, the convolution kernel size and the number of GRU units. Grid data typically has high dimensionality and complexity, and KOA can help find the model parameters best suited to the data features, thereby improving detection accuracy. The main contributions can be summarized as follows.

  1. A spatial–temporal detection framework against FDAs is proposed, which consists of a spatial feature extraction model using KOA-CNN and a temporal feature extraction model using BiGRU. KOA is introduced to enhance the feature extraction performance of the CNN, and an attention mechanism is further introduced to improve the detection performance of the constructed detection model.

  2. Simulation tests on the IEEE 14-bus and 118-bus grid systems are provided. Compared with the existing detection methods, the proposed model improves the accuracy, precision, recall, and \(F_{1}\)-score by at least 1.49%, 1.15%, 11.24%, and 4.15%, respectively, and correspondingly lowers the missed alarm rate. In addition, simulation results verify that the proposed detection model is robust against changes in attack intensity.

Fig. 2

Spatial–temporal detection framework against FDAs.

The remainder of this paper is organized as follows. Section "Background" presents the covert feature of FDAs. The proposed spatial-temporal detection framework against FDAs is provided in Section "The spatial-temporal detection framework against FDAs". Simulation cases are tested in Section "Case studies". Section "Conclusions and discussion" presents the conclusions and future work.

Background

In general, phasor measurement units (PMUs) acquire the operating status of the power system in real time, such as voltage magnitudes and phase angles. Based on the collected data, the supervisory control and data acquisition (SCADA) system performs data anomaly analysis and raises anomaly alerts by using a state estimator. Therefore, accurate state estimation is critical to power system security. According to the work in26, a carefully designed FDA can fool a detection mechanism based on the chi-square detector. Based on this, the power grid and FDA models are presented in this section.

Power system model

In this section, an AC power model is presented. Considering voltage magnitudes, phase angles, etc., the AC power model can be described as follows27.

$$ z = H\left( x \right) + v, $$
(1)

where \(z = \left[ {z_{1} , \ldots ,z_{m} } \right]\) is the measurement data, \(x = \left[ {x_{1} , \ldots ,x_{m} } \right]\) is the system state, \(v\) is the measurement noise, and \(H = \left[ {\begin{array}{*{20}c} {H_{1} ,} & { \cdots ,} & {H_{m} } \\ \end{array} } \right]\) denotes the Jacobian matrix for power system state estimation, which describes the relationship between the measurement data and the system state as follows.

$$ P_{n} = V_{n} \sum\limits_{k = 1}^{m} {V_{k} \left( {G_{nk} \cos \theta_{nk} + B_{nk} \sin \theta_{nk} } \right)} $$
(2)
$$ Q_{n} = V_{n} \sum\limits_{k = 1}^{m} {V_{k} \left( {G_{nk} \sin \theta_{nk} - B_{nk} \cos \theta_{nk} } \right)} $$
(3)
$$ P_{nk} = - V_{n}^{2} G_{nk} + V_{n} V_{k} \left( {G_{nk} \cos \theta_{nk} + B_{nk} \sin \theta_{nk} } \right) $$
(4)
$$ Q_{nk} = - V_{n}^{2} G_{nk} - V_{n} V_{k} \left( {G_{nk} \sin \theta_{nk} - B_{nk} \cos \theta_{nk} } \right) $$
(5)

where \(\theta_{nk} = \theta_{n} - \theta_{k}\) denotes the phase difference between buses n and k, \(V_{k}\) denotes the voltage amplitude at bus k, \(G_{nk}\) and \(B_{nk}\) are the conductance and susceptance, respectively, \(P_{n}\) and \(Q_{n}\) denote the active and reactive power injections at bus n, respectively, and \(P_{nk}\) and \(Q_{nk}\) denote the active and reactive power flows between buses n and k, respectively.

Based on the measurement data collected from the PMUs, SCADA can determine the operational status of the power system. Commonly, the bad data detection mechanism (BDDM) is used as the criterion for judging anomalous data, which can be described as follows26.

$$ \gamma = z - \tilde{z} $$
(6)
$$ \tilde{z} = H\left( {\tilde{x}} \right) $$
(7)
$$ \tilde{x} = \arg \mathop {\min }\limits_{x} [z - H(x)]^{T} R^{ - 1} [z - H(x)] $$
(8)

where \(\gamma\), \(\tilde{z}\) and \(\tilde{x}\) denote the measurement residual, the estimated measurement value and the estimated state, respectively, and \(R\) is the covariance matrix of the measurement noise.

Based on precomputed threshold and measurement residuals in Eq. (6), the corresponding detection criterion can be expressed as

$$ \left\{ \begin{gathered} ||\gamma || \le \upsilon ,{\text{ Normal}} \hfill \\ ||\gamma || > \upsilon ,{\text{ Abnormal}} \hfill \\ \end{gathered} \right. $$
(9)

where \(\upsilon\) is the precomputed threshold. It is noted that the selection of precomputed threshold is given in26.

False data attack model

To fool the BDDM, hackers can design a bank of false data that satisfies the following constraint9

$$ \xi^{f} = H\left( \delta \right) $$
(10)

where \(\xi^{f}\) is the false attack sequence and \(\delta\) is the attack-induced state change.

Substituting the false attack sequence \(\xi^{f}\) into Eq. (6), one can obtain

$$ \begin{gathered} \left\| {\gamma^{f} } \right\| = \left\| {z^{f} - H\left( {\tilde{x}^{f} } \right)} \right\| \\ = \left\| {(z + \xi^{f} ) - H\left( {\tilde{x} + \delta } \right)} \right\| \\ = \left\| {(z - H\left( {\tilde{x}} \right)) + (\xi^{f} - H\left( \delta \right))} \right\| \\ = \left\| {z - H\left( {\tilde{x}} \right)} \right\| \\ = \left\| \gamma \right\| \\ \end{gathered} $$
(11)

Here the third line treats \(H\) as linear, so that \(H\left( {\tilde{x} + \delta } \right) = H\left( {\tilde{x}} \right) + H\left( \delta \right)\), and the fourth line uses \(\xi^{f} - H\left( \delta \right) = 0\) from Eq. (10). Equation (11) indicates that \(\left\| {\gamma^{f} } \right\| = \left\| \gamma \right\|\) if the hacker can inject false data satisfying \(\xi^{f} = H\left( \delta \right)\). Since the residual norm under FDAs equals that of the attack-free case, it stays below the precomputed threshold \(\upsilon\) whenever the attack-free residual does, and the BDDM is deceived without triggering an alarm. In the following, Example 1 is given to show the covert feature of FDAs.
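The argument of Eqs. (6)-(11) can be checked numerically on a small linear model. The following is a minimal sketch, assuming a hypothetical 4-measurement, 2-state system with a linear measurement matrix, a diagonal noise covariance and an illustrative threshold; none of these values come from the paper or from the IEEE test systems.

```python
import math

def matvec(a, v):
    """Multiply matrix a (list of rows) by vector v."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def wls_estimate(H, z, r_diag):
    """Weighted least squares estimate for a 2-state linear model z = Hx + v.

    Solves the normal equations (H^T R^-1 H) x = H^T R^-1 z with diagonal R.
    """
    w = [1.0 / r for r in r_diag]
    a11 = sum(wi * h[0] * h[0] for wi, h in zip(w, H))
    a12 = sum(wi * h[0] * h[1] for wi, h in zip(w, H))
    a22 = sum(wi * h[1] * h[1] for wi, h in zip(w, H))
    b1 = sum(wi * h[0] * zi for wi, h, zi in zip(w, H, z))
    b2 = sum(wi * h[1] * zi for wi, h, zi in zip(w, H, z))
    det = a11 * a22 - a12 * a12
    return [(a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]

def residual_norm(H, z, r_diag):
    """Norm of the residual gamma = z - H x_hat, as in Eqs. (6)-(7)."""
    z_hat = matvec(H, wls_estimate(H, z, r_diag))
    return math.sqrt(sum((zi - zh) ** 2 for zi, zh in zip(z, z_hat)))

# Hypothetical system and data (assumptions for illustration only).
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
R_DIAG = [0.01] * 4
x_true = [1.0, 0.5]
noise = [0.01, -0.02, 0.015, -0.005]
z = [hx + n for hx, n in zip(matvec(H, x_true), noise)]

delta = [0.2, -0.1]              # attack-induced state change
xi_stealthy = matvec(H, delta)   # Eq. (10): xi = H(delta)
xi_naive = [0.3, 0.0, 0.0, 0.0]  # arbitrary tampering of a single meter

THRESHOLD = 0.1                  # hypothetical precomputed threshold
r_clean = residual_norm(H, z, R_DIAG)
r_stealthy = residual_norm(H, [a + b for a, b in zip(z, xi_stealthy)], R_DIAG)
r_naive = residual_norm(H, [a + b for a, b in zip(z, xi_naive)], R_DIAG)

for name, r in [("clean", r_clean), ("stealthy", r_stealthy), ("naive", r_naive)]:
    print(name, round(r, 4), "Abnormal" if r > THRESHOLD else "Normal")
```

In this sketch the stealthy attack leaves the residual norm unchanged and passes the test of Eq. (9), while the naive single-meter tampering is flagged.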

Example 1

As shown in Fig. 3, it is assumed that hackers inject false data into generator 1 at \(t = 80\,{\text{s}}\) on the IEEE 6-bus grid system. Note that the related simulation parameters are given in26. Figures 4 and 5 show the state and residual changes under FDAs.

Fig. 3

IEEE 6-bus grid system.

Fig. 4

Change in power voltage under FDAs.

Fig. 5

Change in residual under FDAs.

Simulation results in Example 1 demonstrate the covert feature of FDAs. By exploiting the vulnerability of the BDDM, hackers can tamper with grid operation without triggering alarms. For this reason, the detection of FDAs is a pressing issue for current grid security mechanisms. Based on this, a spatial-temporal detection framework using KOA-CNN-BiGRU-Attention is developed.

The spatial–temporal detection framework against FDAs

In this section, a spatial-temporal detection framework against FDAs is proposed. The proposed detection framework includes CNN-based spatial feature extraction and BiGRU-based temporal feature extraction. In addition, KOA is introduced to enhance the feature extraction performance of the CNN, and an attention mechanism is further introduced to improve the detection performance of the constructed detection model. The detailed framework is given as follows.

CNN-based spatial feature extraction

To extract spatial features from the grid dataset, a CNN model is constructed. As shown in Fig. 6, the constructed CNN model consists of an input layer, a convolutional layer, a pooling layer and a fully connected layer. The input layer receives the input data (both normal and abnormal) of the power system; the convolutional and pooling layers extract the spatial features of the input data; finally, the fully connected layer outputs the features. KOA is introduced to enhance the feature extraction performance of the CNN.

Fig. 6

KOA-CNN-BiGRU-Attention-based spatial–temporal feature extraction framework.

In the convolutional layer, the convolutional kernel is multiplied element-wise with the corresponding positions of the input data matrix \(\gamma = \left[ {x_{1} , \cdots \cdots ,x_{m} } \right]\), and the products are summed. The kernel is then moved across the data as a sliding window to extract the spatial features. The corresponding mathematical formula is given as follows:

$$ F_{i} = \rho \left( {W_{i} * \gamma + b_{i} } \right) $$
(12)

where \(F_{i}\) is the ith convolutional layer output; \(\rho\) is the nonlinear activation function ReLU; \(W_{i}\) is the ith convolutional kernel; \(b_{i}\) denotes the bias term in the convolutional layer.

The pooling layer is mainly used to reduce the amount of data to be processed while preserving the characteristic information of the input signals. The corresponding mathematical formula is given as follows:

$$ S_{i/\max } = \max pool\left[ {F_{i} } \right] $$
(13)

where \(\max pool[]\) indicates a maximum pooling operation, \(S_{i/\max }\) is the ith pooling layer output.

The fully-connected layer combines the high-level features obtained from the convolutional and pooling layers. The corresponding mathematical formula is given as follows:

$$ \zeta_{i} = {\text{ReLU}}\left( {\xi_{i} *S_{i/\max } + \eta_{i} } \right) $$
(14)

where \(\eta_{i}\) denotes the bias term in fully-connected layer, \(\xi_{i}\) is the weight value, \(\zeta_{i}\) is the ith fully-connected layer output.
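To make Eqs. (12)-(14) concrete, the following is a minimal one-dimensional sketch of the convolution, max pooling and fully-connected steps. The signal, kernel, pooling size and weights are hypothetical values chosen for illustration; the paper's actual layer sizes are listed in its simulation setup.

```python
def relu(v):
    """ReLU activation, rho in Eq. (12)."""
    return max(0.0, v)

def conv1d(signal, kernel, bias):
    """Eq. (12): slide the kernel over the signal, add the bias, apply ReLU."""
    k = len(kernel)
    return [
        relu(sum(kernel[j] * signal[i + j] for j in range(k)) + bias)
        for i in range(len(signal) - k + 1)
    ]

def max_pool1d(features, size):
    """Eq. (13): non-overlapping max pooling (incomplete tail windows dropped)."""
    return [
        max(features[i:i + size])
        for i in range(0, len(features) - size + 1, size)
    ]

def dense(features, weights, bias):
    """Eq. (14): fully-connected unit with ReLU activation."""
    return relu(sum(w * f for w, f in zip(weights, features)) + bias)

# Hypothetical measurement window with one spike at index 3.
signal = [1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0]
kernel = [-1.0, 2.0, -1.0]   # responds to local peaks (assumed kernel)

conv_out = conv1d(signal, kernel, bias=0.0)
pooled = max_pool1d(conv_out, size=2)
feature = dense(pooled, weights=[0.5, 0.5], bias=0.1)

print(conv_out)  # the spike produces the only nonzero activation
print(pooled)
print(feature)
```

The kernel here is hand-picked so that the spike stands out; in the trained model the kernels and weights are learned from the data.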

BiGRU-based temporal feature extraction

As shown in Fig. 6, a BiGRU-based temporal feature extraction framework is constructed. The BiGRU model, which includes an update gate, a reset gate and a new candidate state, is a recurrent neural network consisting of two independent GRU units. Note that one GRU processes the data in the forward direction of the time series and the other GRU processes the data in the reverse direction. With this bi-directional structure, the constructed BiGRU model is able to capture both the forward and backward information of sequence data to extract the temporal features of the sequence. The calculation procedure of temporal feature extraction for the BiGRU model is given as follows.

$$ h_{{\text{t/1}}} = \chi \left( {\Psi_{o} \cdot \left[ {\begin{array}{*{20}c} {h_{t - 1/1} } & {x_{t} } \\ \end{array} } \right] + \Phi_{o} } \right) $$
(15)
$$ h_{t/2} = \chi \left( {\Psi_{\mu } \cdot \left[ {\begin{array}{*{20}c} {h_{t - 1/2} } & {x_{t} } \\ \end{array} } \right] + \Phi_{\mu } } \right) $$
(16)
$$ \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{h}_{t} = \left[ {\begin{array}{*{20}c} {h_{t/1} } & {h_{t/2} } \\ \end{array} } \right] $$
(17)

where \(h_{{\text{t/1}}}\) is the hidden state in the forward direction at \(t\), \(\chi\) is the activation function, \(\Psi_{o}\) is the weight matrix in the forward direction, \(\Psi_{\mu }\) is the weight matrix in the backward direction, \(x_{t}\) is the data input, \(\Phi_{o}\) is the bias parameter in the forward direction, \(h_{t - 1/1}\) is the hidden state in the forward direction at \(t - 1\), \(h_{{\text{t/2}}}\) is the hidden state in the backward direction at \(t\), \(\Phi_{\mu }\) is the bias parameter in the backward direction, \(h_{t - 1/2}\) is the hidden state in the backward direction at \(t - 1\), and \(\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{h}_{t}\) is the output of the BiGRU model.
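The bidirectional pass of Eqs. (15)-(17) can be sketched with a scalar GRU cell. Equations (15)-(16) are written in compact form, so the explicit update gate, reset gate and candidate state below follow one common GRU convention and are an assumption, as are all parameter values.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x, h_prev, p):
    """One scalar GRU step with update gate z, reset gate r and candidate state."""
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])
    cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    return (1.0 - z) * h_prev + z * cand

def run_gru(seq, p):
    """Run a GRU over seq from left to right, starting from h = 0."""
    h, states = 0.0, []
    for x in seq:
        h = gru_step(x, h, p)
        states.append(h)
    return states

def bigru(seq, p_fwd, p_bwd):
    """Eq. (17): pair the forward state with the backward state at each step."""
    fwd = run_gru(seq, p_fwd)
    bwd = run_gru(seq[::-1], p_bwd)[::-1]  # process reversed, then re-align
    return list(zip(fwd, bwd))

# Hypothetical parameters and input sequence (illustration only).
p_fwd = dict(wz=0.5, uz=0.3, bz=0.0, wr=0.4, ur=0.2, br=0.0,
             wh=0.8, uh=0.6, bh=0.0)
p_bwd = dict(wz=0.6, uz=0.1, bz=0.0, wr=0.3, ur=0.5, br=0.0,
             wh=0.7, uh=0.4, bh=0.0)
seq = [0.1, 0.2, 1.5, 0.2, 0.1]

states = bigru(seq, p_fwd, p_bwd)
for t, (h_f, h_b) in enumerate(states):
    print(t, round(h_f, 4), round(h_b, 4))
```

At each time step the forward state summarizes the samples up to that step and the backward state summarizes the samples from that step onward, which is what lets the model relate data before and after an abnormal event.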

In order to enhance the feature extraction performance of CNN-BiGRU, KOA is applied to optimize three important parameters of the CNN-BiGRU model: the learning rate, the number of iterations and the number of neurons. The settings of these parameters directly affect the performance and accuracy of the model. The detailed process of KOA is given as follows.

The algorithm is initialized as28:

$$ \left\{ \begin{gathered} X_{i} = X_{low} + rand(X_{up} - X_{low} ) \hfill \\ \kappa_{i} = rand_{{\left[ {0,1} \right]}} \hfill \\ \lambda_{i} = \left| j \right|,i = 1, \cdots ,m \hfill \\ \end{gathered} \right. $$
(18)

where \(X_{i}\) denotes the ith planet, \(X_{low}\) and \(X_{up}\) denote the lower and upper bounds, \(rand_{{\left[ {0,1} \right]}}\) is a random number between 0 and 1, \(\kappa_{i}\) is the eccentricity, \(\lambda_{i}\) is the orbital period, and \(j\) is a normally distributed random number.

The velocity \(V_{i}\) in KOA is calculated as:

$$ V_{{\text{i}}} = \left\{ \begin{gathered} \begin{array}{*{20}c} {\smallint_{1} \times \smallint_{4} + \smallint_{2} \times \smallint_{5} + \smallint_{3} \times \smallint_{9} \times \smallint_{10} \times {\vec{\text{j}}}_{5} \times \smallint_{7} } & {\begin{array}{*{20}c} {if} & {1 - \varphi_{in} } \\ \end{array} } \\ \end{array} \hfill \\ \begin{array}{*{20}c} {j_{4} \times \left[ {\eta \times \left( {\Re_{1} + \Re_{2} } \right)\left| {\frac{2}{{\Re_{3} + \varsigma }} - \frac{1}{{\Re_{4} + \varsigma }}} \right|} \right]^{\frac{1}{2}} \times \smallint_{6} + \smallint_{3} \times \smallint_{9} \times \smallint_{11} \times {\vec{\text{j}}}_{5} \times \smallint_{8} } & {\begin{array}{*{20}c} {else} & {} \\ \end{array} } \\ \end{array} \hfill \\ \end{gathered} \right. $$
(19)

with

$$ \smallint_{1} = \vec{L} \times \left( {{\text{j}}_{3} \times \left( {1 - {\text{j}}_{4} } \right) + {\text{j}}_{4} } \right) \times \left[ {\eta \times \left( {\Re_{1} + \Re_{2} } \right)\left| {\frac{2}{{\Re_{3} + \varsigma }} - \frac{1}{{\Re_{4} + \varsigma }}} \right|} \right]^{\frac{1}{2}} $$
$$ \smallint_{2} = \left( {1 - \vec{L}} \right) \times \left( {j_{3} \times \left( {1 - \vec{j}_{5} } \right) + \vec{j}_{5} } \right) \times \left[ {\eta \times \left( {\Re_{1} + \Re_{2} } \right)\left| {\frac{2}{{\Re_{3} + \varsigma }} - \frac{1}{{\Re_{4} + \varsigma }}} \right|} \right] $$
$$ \smallint_{3} = \left( {1 - \varphi_{{{\text{in}}}} } \right),\smallint_{4} = 2j_{4} \vec{X}_{i} - \vec{X}_{a} ,\smallint_{5} = \vec{X}_{a} - \vec{X}_{b} $$
$$ \smallint_{6} = \vec{X}_{{\text{a}}} - \vec{X}_{{\text{i}}} ,\smallint_{7} = \vec{X}_{{{\text{up}}}} - \vec{X}_{{{\text{low}}}} ,\smallint_{8} = j_{3} \vec{X}_{{{\text{up}}}} - \vec{X}_{{{\text{low}}}} $$
$$ \vec{L} = \left\{ \begin{gathered} \begin{array}{*{20}c} 0 & {\vec{j}_{5} \le \vec{j}_{6} } \\ \end{array} \hfill \\ \begin{array}{*{20}c} 1 & {Else} \\ \end{array} \hfill \\ \end{gathered} \right.,\smallint_{9} = \left\{ \begin{gathered} \begin{array}{*{20}c} 1 & {j_{4} \le 0.5} \\ \end{array} \hfill \\ - \begin{array}{*{20}c} 1 & {Else} \\ \end{array} \hfill \\ \end{gathered} \right.,\smallint_{10} = \left\{ \begin{gathered} \begin{array}{*{20}c} 0 & {\vec{j}_{5} \le j_{4} } \\ \end{array} \hfill \\ \begin{array}{*{20}c} 1 & {Else} \\ \end{array} \hfill \\ \end{gathered} \right.,\smallint_{11} = \left\{ \begin{gathered} \begin{array}{*{20}c} 0 & {j_{3} \le j_{4} } \\ \end{array} \hfill \\ \begin{array}{*{20}c} 1 & {Else} \\ \end{array} \hfill \\ \end{gathered} \right. $$

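where \(j_{3}\) and \(j_{4}\) are random numbers between 0 and 1, \(\vec{j}_{5}\) and \(\vec{j}_{6}\) are random vectors between 0 and 1, \(\vec{X}_{a}\) and \(\vec{X}_{b}\) are solutions selected at random from the population, \(\eta\) is the universal gravitational constant, \(\Re_{1}\) and \(\Re_{2}\) denote the masses, \(\Re_{3}\) denotes the distance, \(\Re_{4}\) is the semimajor axis, \(\varsigma\) denotes a small value, and \(\varphi_{{{\text{in}}}}\) is the normalized Euclidean distance.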

Updating positions and distances is presented as

$$ \vec{X}_{{{\text{i}}/t + 1}} = \vec{X}_{{{\text{i}}/t}} + \smallint_{9} \times \vec{V}_{{{\text{i}}/t}} + \left( {\lambda + \left| j \right|} \right) \times \vec{L} \times \left( {\vec{X}_{s} - \vec{X}_{{{\text{i}}/t}} } \right) $$
(20)
(21)

where \(\vec{X}_{{{\text{i/}}t + 1}}\) is the position at the next moment following \(\vec{X}_{{{\text{i/}}t}}\), \(\vec{X}_{s}\) is the position of the sun, and \(\lambda\) is given by the universal law of gravitation.

The elitist strategy of KOA is given as

$$ \vec{X}_{{{\text{i,new}}/t + 1}} = \left\{ \begin{gathered} \begin{array}{*{20}c} {\vec{X}_{{{\text{i}}/t + 1}} } & {{\text{if }}f\left( {\vec{X}_{{{\text{i}}/t + 1}} } \right) \le f\left( {\vec{X}_{{{\text{i}}/t}} } \right)} \\ \end{array} \hfill \\ \begin{array}{*{20}c} {\vec{X}_{{{\text{i}}/t}} } & {{\text{else}}} \\ \end{array} \hfill \\ \end{gathered} \right. $$
(22)

By using the above KOA, the related parameters of the CNN-BiGRU model, such as the learning rate, the number of iterations and the number of neurons, can be optimized. Note that the detailed algorithmic calculations and benefits are given in28.
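The outer loop of such a hyperparameter search can be sketched as follows. This is a deliberately simplified illustration: it keeps the initialization of Eq. (18) and the elitist selection of Eq. (22), but replaces the velocity and position updates of Eqs. (19)-(20) with a plain random pull towards the current best solution. The function names, the two-parameter search space and the toy fitness function (standing in for the validation loss of a trained CNN-BiGRU model) are all assumptions.

```python
import math
import random

def init_population(n, low, up, rng):
    """Eq. (18): random positions, eccentricities and orbital periods."""
    planets = [[lo + rng.random() * (hi - lo) for lo, hi in zip(low, up)]
               for _ in range(n)]
    eccentricity = [rng.random() for _ in range(n)]
    period = [abs(rng.gauss(0.0, 1.0)) for _ in range(n)]
    return planets, eccentricity, period

def clip(x, low, up):
    """Keep a candidate inside the search bounds."""
    return [min(max(v, lo), hi) for v, lo, hi in zip(x, low, up)]

def optimize(fitness, low, up, n=20, iters=100, seed=0):
    rng = random.Random(seed)
    # Eccentricity and period are unused by the simplified move step below.
    planets, _, _ = init_population(n, low, up, rng)
    scores = [fitness(p) for p in planets]
    history = []
    for _ in range(iters):
        sun = list(planets[scores.index(min(scores))])  # current best solution
        for i in range(n):
            pull = rng.random()
            # Simplified move (NOT Eqs. (19)-(20)): pull towards the sun plus noise.
            cand = [x + pull * (s - x) + rng.gauss(0.0, 0.05) * (hi - lo)
                    for x, s, lo, hi in zip(planets[i], sun, low, up)]
            cand = clip(cand, low, up)
            c = fitness(cand)
            if c <= scores[i]:  # Eq. (22): keep the new position only if not worse
                planets[i], scores[i] = cand, c
        history.append(min(scores))
    best = scores.index(min(scores))
    return planets[best], scores[best], history

def toy_fitness(p):
    """Hypothetical stand-in for validation loss over (learning rate, neurons)."""
    lr, units = p
    return (math.log10(lr) + 3.0) ** 2 + ((units - 96.0) / 64.0) ** 2

LOW, UP = [1e-4, 16.0], [1e-1, 256.0]
best_params, best_score, history = optimize(toy_fitness, LOW, UP)
print(best_params, best_score)
```

Because of the elitist rule, the best fitness recorded per iteration never increases. In practice each fitness evaluation would require training and validating the detection model, which is why the population size and iteration budget matter.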

Attention-based detection module

Based on the output results in BiGRU model, the attention score is calculated as

$$ \Delta_{{e_{i} }} = a\left( {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{h}_{t} } \right) $$
(23)

where \(a\) denotes the Query.

The attention weight is calculated as

$$ \Delta_{{\alpha_{i} }} = \frac{{\exp \left( {\Delta_{{e_{i} }} } \right)}}{{\sum\limits_{j = 1}^{n} {\exp \left( {\Delta_{{e_{j} }} } \right)} }} $$
(24)

Then, we can get the weighted sum as

$$ \Delta_{c} = \sum\limits_{i = 1}^{n} {\Delta_{{\alpha_{i} }} } \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{h}_{t} $$
(25)

In summary, the model prediction output can be obtained as

$$ y_{pre} = \sum\limits_{i = 1}^{T} {SoftMax\left( {\Delta_{c} } \right)} $$
(26)

Since the FDA detection task is a binary classification task, the cross-entropy loss function is given as:

$$ Loss = - \frac{1}{T}\sum\limits_{i = 1}^{T} {\left[ {y_{i} \log \left( {\hat{y}_{pre} } \right) + \left( {1 - y_{i} } \right)\log \left( {1 - \hat{y}_{pre} } \right)} \right]} $$
(27)
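The attention pooling of Eqs. (23)-(25) and the loss of Eq. (27) can be sketched as follows. The score function \(a\) is taken here to be a dot product with a query vector, and the weights are normalized with the standard softmax; both choices, along with the hidden states and labels, are assumptions for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax, used for the attention weights of Eq. (24)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden, query):
    """Eqs. (23)-(25): score each hidden state, normalize, take the weighted sum."""
    scores = [sum(q * h for q, h in zip(query, h_i)) for h_i in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    context = [sum(w * h_i[d] for w, h_i in zip(weights, hidden))
               for d in range(dim)]
    return weights, context

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Eq. (27): mean binary cross-entropy over T samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

# Hypothetical BiGRU outputs (forward, backward) for four time steps.
hidden = [[0.1, 0.0], [0.2, 0.1], [0.9, 0.8], [0.2, 0.1]]
query = [2.0, 2.0]   # assumed learned query vector

weights, context = attention_pool(hidden, query)
print([round(w, 3) for w in weights])
print([round(c, 3) for c in context])
print(round(binary_cross_entropy([1, 0, 1], [0.9, 0.2, 0.7]), 4))
```

The time step with the largest score (index 2 here) receives the largest weight, so the context vector is dominated by the hidden state around the anomaly.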

Remark 1

Attention mechanisms are used to dynamically assign weights, highlight important features and suppress irrelevant features. There can be a large amount of noise or redundant information in power system data, and attention mechanisms can help models focus on key features of anomalous data, improving detection accuracy.

Algorithm 1

Detection process of FDAs using KOA-CNN-BiGRU-Attention.

KOA-CNN-BiGRU-attention-based detection framework

By constructing the KOA-CNN and BiGRU-Attention models, a detection framework using spatial–temporal features is developed, as shown in Fig. 7. The detailed process is as follows.

Fig. 7

FDA detection model using KOA-CNN-BiGRU-Attention network.

Step 1: Establish the CNN-based spatial feature extraction model.

Step 2: Optimize the related parameters of the CNN model by using KOA.

Step 3: Construct the BiGRU-based temporal feature extraction model.

Step 4: Input the power measurements to train the above KOA-CNN-BiGRU-Attention model offline.

Step 5: Apply the KOA-CNN-BiGRU-Attention-based detection model online to identify normal and abnormal data.

Based on the above detection steps, the detection algorithm against FDAs is summarized in Algorithm 1.

Case studies

In this section, simulation tests are provided to demonstrate the performance of the constructed spatial-temporal detection framework against FDAs. Compared with existing detection models, such as GCN and GGNN-GAT, simulation studies on the IEEE 14-bus and IEEE 118-bus power systems verify the superiority of the proposed detection model.

Simulation and data setup

The simulation environment in this paper is implemented in Matlab 2023a on a desktop computer with the following configuration: i9-13900HX at 2.20 GHz, 16 GB RAM, and an NVIDIA TITAN RTX 4060 graphics card. KOA: population size: 20; maximum number of iterations: 100; the optimized parameters are 0.00408136, 4, 86. CNN: convolution layers: 2; convolutional kernel size: 3×1; activation function: ReLU. BiGRU: number of hidden units: 128; number of layers: 2; dropout rate: 0.2. Attention dimension: 64. Optimizer: Adam; learning rate: 0.001; gamma = 2.

Consistent with work in29, we employed the publicly accessible data released by NYISO to create the dataset. Leveraging the 2022 user load fluctuation pattern of New York, we utilized linear interpolation to expand the dataset with a time interval of five minutes. Following this, we standardized the obtained dataset in relation to the capacities of the grid systems under consideration. Subsequently, we linked the expanded dataset to each of the aforementioned two grid systems and utilized the Matpower 7.0 toolbox to conduct power flow analyses. The specific division of the training and test datasets is shown in Table 1.
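The interpolation and scaling steps described above can be sketched as follows. The source sampling interval (hourly), the peak-based scaling rule and all numeric values are assumptions made for illustration; the paper does not specify them, and the power flow step with Matpower is not reproduced here.

```python
def upsample_linear(values, factor):
    """Insert factor - 1 linearly interpolated points between consecutive samples."""
    out = []
    for a, b in zip(values, values[1:]):
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    out.append(values[-1])
    return out

def scale_to_capacity(load, capacity):
    """Rescale a load profile so that its peak equals the given system capacity."""
    peak = max(load)
    return [capacity * v / peak for v in load]

# Hypothetical hourly load samples in MW (not actual NYISO data).
hourly_load = [5200.0, 5000.0, 4900.0, 5300.0]

# 60 min / 5 min = 12 points per hour.
five_min_load = upsample_linear(hourly_load, factor=12)

# Assumed total capacity of the target test system, in MW.
scaled = scale_to_capacity(five_min_load, capacity=260.0)

print(len(five_min_load))
print([round(v, 2) for v in scaled[:4]])
```

Each scaled sample would then be distributed over the buses of the test system before running a power flow to obtain the measurements.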

Table 1 Training and Test Datasets Division.

Evaluation indicators

To evaluate the performance of the detection model, five evaluation indicators (Accuracy, Missed Alarm, Precision, \(F_{1}\)-Score, and Recall) are selected30. The corresponding mathematical expressions are given as follows.

$$ Acc = \frac{{\pi_{TN} + \pi_{TP} }}{{\pi_{TN} + \pi_{TP} + \pi_{FN} + \pi_{FP} }} $$
(28)
$$ Ma = \frac{{\pi_{FN} }}{{\pi_{TP} + \pi_{FN} }} $$
(29)
$$ \Pr = \frac{{\pi_{TP} }}{{\pi_{TP} + \pi_{FP} }} $$
(30)
$$ Ra = \frac{{\pi_{TP} }}{{\pi_{TP} + \pi_{FN} }} $$
(31)
$$ F_{1} /S = \frac{2\Pr \times Ra}{{\Pr + Ra}} $$
(32)

where \(\pi_{FN}\) denotes the number of normal samples judged to be abnormal, \(\pi_{FP}\) denotes the number of abnormal samples judged to be normal, \(\pi_{TP}\) denotes the number of normal samples judged to be normal, \(\pi_{TN}\) denotes the number of abnormal samples judged to be abnormal, and \(Acc\), \(Ma\), \(\Pr\), \(Ra\) and \(F_{1} /S\) denote the accuracy, missed alarm, precision, recall and \(F_{1}\)-Score, respectively.
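Equations (28)-(32) translate directly into code. The sketch below uses hypothetical confusion-matrix counts rather than results from the paper.

```python
def detection_metrics(tp, tn, fp, fn):
    """Compute the indicators of Eqs. (28)-(32) from confusion-matrix counts."""
    acc = (tn + tp) / (tn + tp + fn + fp)   # Eq. (28)
    ma = fn / (tp + fn)                     # Eq. (29)
    pr = tp / (tp + fp)                     # Eq. (30)
    ra = tp / (tp + fn)                     # Eq. (31)
    f1 = 2 * pr * ra / (pr + ra)            # Eq. (32)
    return {"Acc": acc, "Ma": ma, "Pr": pr, "Ra": ra, "F1": f1}

# Hypothetical counts for 200 test samples.
metrics = detection_metrics(tp=90, tn=95, fp=5, fn=10)
for name, value in metrics.items():
    print(name, round(value, 4))
```

Note that Eqs. (29) and (31) share the same denominator, so the missed alarm rate and the recall always sum to one.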

Ablation analysis

To evaluate the contributions of the KOA, CNN, BiGRU, and Attention modules to the performance of the KOA-CNN-BiGRU-Attention model, ablation experiments are designed in this section. The full KOA-CNN-BiGRU-Attention model is taken as the benchmark, and the following ablation models are considered. Model 1: removing KOA and using default hyperparameters. Model 2: removing CNN and using only BiGRU-Attention. Model 3: removing BiGRU and using only CNN-Attention. Model 4: removing Attention and using only CNN-BiGRU. Model 5: removing both CNN and BiGRU and using only Attention. The comparative results are shown in Fig. 8.

Fig. 8

Results of the ablation experiments for each module of the KOA-CNN-BiGRU-Attention model.

As shown in Fig. 8, the contributions of each module to the overall performance can be summarized as follows. After removing KOA, the model performance dropped slightly (F1 score decreased from 96.45% to 94.28%), suggesting that KOA contributes to hyperparameter optimization, albeit with a relatively minor impact. After removing CNN, the model performance declined significantly (F1 score dropped from 96.45% to 90.65%), revealing the vital role of CNN in extracting local features. Removing BiGRU also decreased the model performance (F1 score fell from 96.45% to 89.53%), indicating that BiGRU is important for capturing time series dependencies. With Attention removed, the model performance dropped moderately (F1 score decreased from 96.45% to 92.19%), indicating that the Attention mechanism aids feature weighting and noise suppression to a certain extent. When both CNN and BiGRU were removed, the model performance declined drastically (F1 score fell from 96.45% to 83.15%), underlining that the combination of CNN and BiGRU is crucial to model performance.

Detection performance under different detection models

In this section, the accuracy of the proposed detection model under KOA is analyzed, as shown in Fig. 9. In addition, the proposed model is compared with other detection models, namely particle swarm optimization (PSO)-CNN-BiGRU-Attention, group-fusion CNN (GFCNN) and CNN, using the accuracy and loss during training as evaluation indicators under the same conditions. The comparison of the accuracy and loss of each model is shown in Table 2.

Fig. 9

Accuracy analysis of proposed detection model under KOA.

Table 2 Comparison of the accuracy and loss of each model.

Obviously, the selection of optimal hyperparameters in detection model can affect the detection performance against malicious attacks. As the error of the fitness function gets smaller, the KOA can obtain the optimal hyperparameters of the detection model, as shown in Fig. 9. Based on this, the corresponding accuracy of proposed detection model can be enhanced. In other words, the introduction of KOA for the proposed detection model can improve the detection performance against attacks.

Simulation results in Table 2 demonstrate that the proposed detection model has the highest accuracy and the lowest value of the loss function, indicating that it has the best overall performance. Compared with PSO-CNN-BiGRU-Attention, KOA can further improve the generalization ability of the model by optimizing the hyperparameters. In addition, compared with works such as GFCNN and CNN, the proposed detection model significantly improves the ability to capture time series features. Compared with the standard CNN, GFCNN in turn further improves the detection performance on anomalous data.

Detection performance with evaluation indicators under different detection models

To evaluate the detection performance against FDAs under different detection models on the IEEE 14-bus and 118-bus grid systems, the evaluation indicators (accuracy, missed alarm, precision, \(F_{1}\)-score, and recall) are computed, as shown in Tables 3 and 4. In addition, the corresponding confusion matrices are presented in Figs. 10 and 11.

Table 3 Comparison results of evaluation indicators on IEEE 14-bus grid system.
Table 4 Comparison results of evaluation indicators on IEEE 118-bus grid system.
Fig. 10

The confusion matrices under different detection models on IEEE 14-bus grid system.

Fig. 11

The confusion matrices under different detection models on IEEE 118-bus grid system.

The simulation results in Table 3 show that the accuracy rates of the proposed detection model, GCN, and GGNN-GAT are 98.73%, 95.99%, and 96.35%, respectively, so the proposed detection model achieves the highest accuracy. The corresponding precision rates are 98.85%, 95.23%, and 97.03%, respectively. Furthermore, the recall rate and F1 score of the proposed detection model exhibit improvements of 11.34% and 4.15%, respectively. Consequently, compared with the works in30 and29, the proposed detection model achieves the best detection performance on the IEEE 14-bus power system. The comparison results on the IEEE 118-bus grid system in Table 4 further confirm the enhanced detection performance of the proposed model. Specifically, the accuracy rate, precision rate, recall rate, and F1 score improve by at least 1.49%, 1.15%, 11.24%, and 8.4%, respectively. In summary, the KOA-CNN-BiGRU-Attention framework for spatio-temporal detection improves detection performance compared with the GCN and GGNN-GAT detection models.

Figures 10 and 11 compare the confusion matrices of the different detection models. As seen in Figs. 10 and 11, the proposed detection model identifies abnormal data better than the other detection models on the IEEE 14-bus and 118-bus grid systems. Compared with the GCN and GGNN-GAT detection models, the proposed detection model identifies positive and negative samples more accurately and produces fewer false positives and false negatives, which indicates the effectiveness of the proposed KOA-CNN-BiGRU-Attention model in improving the separability of the two classes in the binary classification task.

From the above experiments, we can conclude that the proposed detection model using KOA-CNN-BiGRU-Attention has strong detection performance and outperforms the compared methods in test accuracy for detecting FDAs.

Detection performance with variable attack intensity and ROC under different detection models

To evaluate the robustness of the proposed detection model, we further select the variable attack intensity and ROC as indicators on the IEEE 14-bus system. As shown in Fig. 12, the detection rates of the proposed detection model, GCN and GGNN-GAT are tested under different attack strengths. The ROC results for all detection models are shown in Table 5.
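As an illustration of how a detection rate per attack strength could be tabulated, the sketch below groups attacked samples by their strength and reports the fraction flagged by a detector. The function name `detection_rate_by_strength` and the toy data are hypothetical, and the attack strength is assumed to be available as a numeric label for each attacked sample.

```python
from collections import defaultdict
from typing import Dict, Iterable


def detection_rate_by_strength(
    strengths: Iterable[float], flagged: Iterable[bool]
) -> Dict[float, float]:
    """Return {attack strength: fraction of attacked samples detected}.

    `strengths[i]` is the (assumed) attack strength of attacked sample i,
    and `flagged[i]` is True when the detector raised an alarm for it.
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for strength, is_flagged in zip(strengths, flagged):
        counts[strength] += 1
        hits[strength] += int(bool(is_flagged))
    # Sort by strength so the result reads like the x-axis of a plot.
    return {s: hits[s] / counts[s] for s in sorted(counts)}


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the experiments.
    strengths = [0.1, 0.1, 0.5, 0.5, 0.5, 0.5]
    flagged = [True, False, True, True, True, False]
    print(detection_rate_by_strength(strengths, flagged))
```

Repeating this tabulation for each detection model on the same attacked samples yields one curve per model.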

Fig. 12

Detection rate under different attack strengths for different detection models.

Table 5 Comparison results of ROC under different detection models.

Figure 12 shows the detection rate of the different detection models under different attack strengths. The detection rate of every model increases as the intensity of the attack increases, and at the same attack strength the proposed detection model achieves a higher detection rate than the other detection models. The simulation results in Table 5 further indicate that the proposed detection model achieves the best ROC performance for FDA detection among the compared models. We conclude that the proposed detection framework achieves a high detection rate and robustness against FDAs in power grids.
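ROC performance is commonly summarized by the area under the ROC curve (AUC). The sketch below assumes that this is the quantity being compared, which this section does not state explicitly. It builds the ROC points from detector scores and integrates them with the trapezoidal rule. The function names `roc_points` and `auc`, and the toy labels and scores, are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points.

    `labels` are 1 for attacked samples and 0 for normal samples;
    a higher score means the detector is more confident of an attack.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")

    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


if __name__ == "__main__":
    # Toy scores for illustration only; not taken from the experiments.
    labels = [1, 1, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    print(auc(roc_points(labels, scores)))
```

An AUC of 1.0 corresponds to a detector that ranks every attacked sample above every normal one, while 0.5 corresponds to random ranking.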

Conclusions and discussion

In this paper, we construct a spatio-temporal detection framework for FDAs in power networks. Firstly, a CNN-BiGRU-based network model is established, which can effectively extract the spatio-temporal features of power data. Secondly, the KOA and attention mechanisms incorporated in the constructed model effectively improve the detection performance against FDAs. Finally, compared with the GCN and GGNN-GAT detection models, the proposed detection model improves the accuracy, precision, F1 score and recall on the IEEE 14-bus and 118-bus grid systems. Moreover, simulation experiments have verified that the proposed detection model is robust to changes in attack strength.

However, some limitations remain and motivate future research.

  1. The KOA-CNN-BiGRU-Attention model combines several algorithms and techniques, resulting in a highly complex model. Building a complex model from combinations of standard models can affect the detection performance against FDAs. For example, Table 2 indicates that introducing GFCNN23 can further optimize the detection model.

  2. Despite the excellent performance of the KOA-CNN-BiGRU-Attention model in several applications, its generalisation ability remains a concern. In some cases, the model may over-rely on specific patterns or features in the training data, resulting in poor performance on unseen data. To improve the generalisation ability of the model, it may be necessary to employ more regularisation techniques, data augmentation strategies or ensemble learning methods.