Abstract
Adversarial training is an effective defense for deep models against adversarial attacks. However, current adversarial training methods require retraining the entire neural network, which consumes substantial computational resources, degrades the timeliness of deep models, and hinders the rapid learning of new knowledge. To address these problems, this article proposes an incremental adversarial training method (IncAT) and applies it to the field of brain-computer interfaces (BCI). Within this method, we first propose and train a deep model called the Neural Hybrid Assembly Network (NHANet). Then, based on the original samples and the trained model, the Fisher information matrix is calculated to evaluate the importance of the network parameters for the original samples. Finally, when computing the loss between the predictions on adversarial samples and the true labels, an Elastic Weight Consolidation (EWC) loss is added to limit changes to the important weight and bias parameters of NHANet. The proposed incremental adversarial training method was applied to the publicly available epilepsy brain-computer interface dataset from the University of Bonn. The experimental results show that, against three different attack algorithms, namely the fast gradient sign method (FGSM), projected gradient descent (PGD), and the basic iterative method (BIM), the proposed method achieves robust accuracies of 95.33%, 94.67%, and 93.60%, respectively, without affecting accuracy on clean samples; these results are 5.06%, 4.67%, and 2.67% higher than those of traditional training methods, fully verifying the generalization ability and effectiveness of the method.
Introduction
In recent years, deep neural networks have achieved significant success in fields such as brain-computer interfaces1,2, object detection3,4,5, texture recognition6, and image classification7,8,9,10,11. However, Szegedy et al.12 revealed the existence of adversarial samples, which make deep learning models exceptionally vulnerable to adversarial attacks. Attackers need only add small perturbations generated by specific algorithms to clean samples, and deep neural networks will output erroneous classification results with high confidence13,14. For example, during neural rehabilitation, if attackers add small perturbations to electroencephalogram (EEG) signals, a deep neural network may misinterpret the patient’s intentions, leading to treatment failure or adverse reactions. Therefore, the robustness and security of deep learning models have received widespread attention and research from both academia and industry.
To address the vulnerability of deep learning models to adversarial samples, researchers have developed various defense methods to enhance model robustness. Among them, adversarial training (AT) is considered one of the most effective. Its core idea is to introduce carefully designed adversarial samples into the training set so that the model gains stronger resistance to interference and perturbation. Madry et al.15 proposed an adversarial training method based on projected gradient descent, which effectively improves the model’s ability to resist adversarial attacks. However, its multi-step perturbation process requires substantial computational resources and time, which limits the practicality of the method. To address this problem, researchers proposed an alternative, fast adversarial training16, which uses only the one-step fast gradient sign method to generate training data. However, this fast adversarial training method has a significant drawback: it can easily lead to overtraining and overfitting on the training data, resulting in poor performance on new, unseen data. To alleviate this problem, Rice et al.17 proposed an early-stopping version of projected gradient descent adversarial training. Unlike traditional adversarial training methods, it introduces a stopping criterion to avoid the degradation of model performance caused by overtraining on adversarial samples. In addition, Zhang et al.18 proposed an adversarial training method called TRADES, which aims to achieve a better balance between clean-sample accuracy and robust accuracy by optimizing the loss function.
Although existing adversarial training methods improve model robustness and security to some extent, they suffer from notable limitations: (1) retraining on both the original and adversarial samples not only increases the complexity of model training but also reduces the timeliness of the model, especially in application scenarios that require fast iteration and response; (2) on large-scale datasets, these adversarial training methods significantly increase the demand for computing resources, which may become a bottleneck in resource-constrained environments. To address these problems, this paper proposes an incremental adversarial training method (IncAT). The method first uses clean samples and a pre-trained Neural Hybrid Assembly Network (NHANet) to calculate the Fisher information matrix; weights with higher Fisher information values are considered more critical for clean samples, so significant updates to them during the learning of adversarial samples incur a larger penalty. Adversarial samples are then generated during the training phase. Finally, a quadratic penalty term in the loss function suppresses large changes to important weights while the deep learning model learns the features of adversarial samples. This strategy enables the model to learn adversarial features while maintaining its memory of clean samples, thereby improving robustness and generalization. The main contributions of this paper on incremental adversarial training are as follows:
(1) To address the issues of insufficient feature extraction and poor generalization ability of existing deep learning models for brain-computer interfaces in complex scenarios, this paper proposes a hybrid neural network, NHANet. This model integrates the advantages of multiple deep learning modules, aiming to more effectively process time-series data and capture long-term dependencies as well as complex spatial features. This innovation not only significantly improves the model’s performance in complex environments but also provides new ideas and valuable practical experience for the application of deep learning in the field of brain-computer interfaces.
(2) In response to the security risks of adversarial attacks faced by deep learning models in BCI application scenarios, this study conducts adversarial attacks on the trained NHANet model. The aim is to conduct a multi-dimensional performance evaluation to deeply analyze the impact of adversarial perturbations on the feature representation ability and classification decision stability of deep learning models in the BCI field, thereby revealing the importance and urgency of enhancing the robustness of deep learning models in BCI applications.
(3) This paper introduces the incremental adversarial training method for the first time. This approach utilizes adversarial examples to continuously adjust the parameters of the baseline model, thereby enhancing the robustness and security of deep learning models and avoiding the problem that traditional adversarial training methods require retraining the entire network. In addition, to further verify the effectiveness of the proposed method, the robust accuracy is introduced as an evaluation index to reflect the ability of the deep learning model to resist adversarial attacks after adversarial training.
(4) The proposed method was extensively tested on the publicly available epilepsy BCI dataset from the University of Bonn. The experiments demonstrate that it outperforms traditional adversarial training methods in terms of both accuracy on clean samples and robust accuracy.
Methods
The incremental adversarial training method proposed in this article is designed as shown in Fig. 1. Firstly, we train the NHANet model to help it better understand the underlying patterns in the data. Then, we carry out adversarial attacks on all the original data to generate adversarial samples. Next, the EWC loss term is introduced when calculating the adversarial sample loss function, and the total loss function is constructed based on this. Finally, utilizing the backpropagation mechanism, the model parameters are adjusted based on the total loss function to enhance the deep model’s ability to resist adversarial attacks.
Neural hybrid assembly network architecture design
This article proposes a hybrid neural network model called NHANet, which integrates several deep learning techniques, including convolutional neural networks, bidirectional long short-term memory networks, multi-head attention mechanisms, residual connections, and fully connected layers. The goal is to fully exploit the strengths of different neural networks in processing specific data, thereby enhancing the performance of deep models in complex EEG signal recognition tasks. The specific network framework is shown in Fig. 2.
First, a channel dimension is added to the preprocessed data so that it meets the input requirements of the one-dimensional deep convolution module. In this module, 64 convolution kernels of size 3 are convolved along the time axis of the original signal, and a ReLU activation function adds nonlinearity. Finally, a max pooling layer of size 2 downsamples the features to reduce redundancy. Although the convolution module captures local features of the EEG signal, it cannot capture long-term dependencies in the time series, so a bidirectional LSTM layer is introduced to compensate for this deficiency.
In the NHANet model, the bidirectional LSTM layer uses its bidirectional recurrent structure to extract features from the time series in both directions, effectively capturing the long-term dependencies in the signal sequence and further improving the model’s representation of the dynamic features of EEG signals.
Specifically, the bidirectional LSTM layer has 32 hidden units in each direction, which work together to capture complex features in the input sequence. Although the BiLSTM layer can effectively process time-series data, it is still limited in capturing the global dependencies of the entire sequence. To further improve performance, we add a multi-head attention module after the BiLSTM layer.
By introducing this mechanism, the model can attend to different parts of the input in parallel, significantly improving its ability to capture key information in the sequence. Inside the multi-head attention module, 16 attention heads work in parallel over a 64-dimensional embedding, each independently focusing on different aspects of the information. Let X = [X1, X2, ..., Xn] be the matrix output by the BiLSTM module; it is mapped into three vector spaces Q (query), K (key), and V (value) through linear transformations:

$$Q = XW^{Q}, \quad K = XW^{K}, \quad V = XW^{V}$$
where \(W^{Q}\), \(W^{K}\), and \(W^{V}\) are trainable weight matrices. In the multi-head attention mechanism, each head independently calculates attention weights, and multiple attention outputs are generated in parallel:

$$\text{head}_{i} = \text{softmax}\!\left(\frac{Q_{i}K_{i}^{T}}{\sqrt{d_{k}}}\right)V_{i}$$

where \(d_{k}\) is the dimension of the key vectors,
and finally the outputs of all attention heads are concatenated using the concat function to form the output of the multi-head attention module.
However, as the number of neural network layers increases, training may encounter problems such as vanishing or exploding gradients. To address this challenge, we incorporated a residual connection mechanism into the deep learning model, directly connecting the input and output of the multi-head attention mechanism. This cross-layer connection design helps to enhance the flow of gradients within the network, thereby improving the training stability and performance of the model.
Finally, a fully connected layer maps the feature vector to a 128-dimensional hidden space, with nonlinearity added through the ReLU activation function; a second fully connected layer then maps this hidden representation to the number of output categories, completing the classification task.
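To make the described architecture concrete, the following is a minimal PyTorch sketch that follows the stated layer sizes (64 kernels of size 3, max pooling of 2, a BiLSTM with 32 units per direction, 16 attention heads over a 64-dimensional embedding, a residual connection, and a 128-unit hidden layer). Details such as padding, the temporal pooling before the classifier, and the dropout placement are not specified in the text and are assumptions.

```python
import torch
import torch.nn as nn

class NHANetSketch(nn.Module):
    """Simplified sketch of NHANet: Conv1d -> BiLSTM -> multi-head attention
    with a residual connection -> fully connected classifier."""
    def __init__(self, num_classes: int = 5):
        super().__init__()
        # 1-D convolution along the time axis: 64 kernels of size 3, ReLU, max pooling of 2
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels=1, out_channels=64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2),
        )
        # Bidirectional LSTM with 32 hidden units per direction (64-dim output)
        self.bilstm = nn.LSTM(input_size=64, hidden_size=32,
                              batch_first=True, bidirectional=True)
        # Multi-head attention over the 64-dim BiLSTM output with 16 heads
        self.attention = nn.MultiheadAttention(embed_dim=64, num_heads=16,
                                               batch_first=True)
        self.dropout = nn.Dropout(0.5)   # dropout placement assumed
        # Classifier: 128-unit hidden layer, then output layer
        self.fc = nn.Sequential(
            nn.Linear(64, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):                        # x: (batch, length)
        x = x.unsqueeze(1)                       # add channel dim -> (batch, 1, length)
        x = self.conv(x)                         # (batch, 64, length // 2)
        x = x.permute(0, 2, 1)                   # (batch, time, features)
        x, _ = self.bilstm(x)                    # (batch, time, 64)
        attn_out, _ = self.attention(x, x, x)    # global dependencies
        x = x + attn_out                         # residual connection
        x = self.dropout(x.mean(dim=1))          # temporal mean pooling (assumed)
        return self.fc(x)
```

For a preprocessed segment of length L, the input is a tensor of shape (batch, L) and the output contains one logit per class.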
Adversarial attack based on neural hybrid assembly network
This section focuses on the impact of adversarial attacks on the performance and robustness of NHANet models. By implementing adversarial attacks on deep neural networks, we can gain a deeper understanding of the vulnerability of deep learning models and better design security defense mechanisms to resist the negative impact of adversarial attacks on deep models.
The study of adversarial attacks is divided into three parts. First, using the trained NHANet model with the same weight parameters, three algorithms, the fast gradient sign method (FGSM)19, the basic iterative method (BIM)20, and projected gradient descent (PGD)15, are used to generate adversarial samples by attacking all of the original samples. Then, the trained network is used to predict the generated adversarial samples, and the impact on classification performance is observed while varying the perturbation budget epsilon. In addition, by visualizing the raw data and the perturbed data, we can observe whether the generated adversarial samples remain imperceptible. Finally, to further reveal the impact of adversarial attacks on deep models, we also attack several common deep learning models, showing that adversarial attacks degrade not only the NHANet model but also other deep learning models.
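As an illustration of how such adversarial samples can be generated, the sketch below implements single-step FGSM and iterative PGD against a trained model in plain PyTorch. The epsilon, step size, and number of steps shown are placeholders rather than the paper's exact settings, and no [0, 1] clipping is applied because the EEG inputs are assumed to be standardized.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=2/255):
    """One-step FGSM: move x along the sign of the input gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).detach()

def pgd_attack(model, x, y, eps=2/255, alpha=0.5/255, steps=10):
    """Iterative PGD: random start, FGSM-like steps, projection into the eps-ball."""
    x = x.clone().detach()
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)   # random start (BIM omits this)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # project back into the L-infinity ball of radius eps around the clean input
        x_adv = x + torch.clamp(x_adv - x, min=-eps, max=eps)
    return x_adv.detach()
```

BIM follows the same iterative scheme as PGD but starts from the unperturbed input without the random initialization.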
Neural hybrid assembly network incremental adversarial training
To address the poor timeliness and high computational cost of traditional adversarial training methods, this paper employs an incremental learning algorithm to continuously learn the generated adversarial samples. Existing research shows that, under limited storage space and computing resources, incremental learning can effectively cope with new tasks or data while maintaining performance on old tasks.
The framework of the method is shown in Fig. 3, where \(\theta_{i}\ (i \in 1,2,3,\ldots,N)\) denotes the neural network parameters, N is the number of parameters, F is the Fisher information matrix, \(\lambda\) is a hyperparameter weighting the importance of the original samples relative to the adversarial samples, \(L_{B}(\theta)\) is the loss function on the adversarial sample dataset, and \(\theta_{A,i}^{*}\) are the parameters of the original model.
First, the NHANet deep learning model is trained to enhance its predictive performance on EEG signals, and the model weights are saved after training. Then, based on the original dataset samples and the parameters of the trained NHANet model, the first derivatives of the model output with respect to the network parameters are calculated, and the Fisher information matrix is constructed. The Fisher information matrix reflects the importance of each parameter for the original samples: a larger value indicates that the parameter is more important for the original dataset. Finally, during adversarial training, all original data is attacked to generate adversarial samples. When calculating the loss between the predictions on adversarial samples and the true labels, an additional EWC loss term is added to limit the changes in important weight and bias parameters of the NHANet hybrid model. The specific total loss is as follows:

$$L(\theta) = L_{B}(\theta) + \sum_{i}\frac{\lambda}{2}F_{i}\left(\theta_{i}-\theta_{A,i}^{*}\right)^{2}$$
Parameters that are more important for the original dataset receive a larger penalty during the update process, so they are less prone to large changes. Therefore, when adversarial training based on incremental learning is used to improve robustness, the deep learning model not only retains existing knowledge but also flexibly responds to new challenges, maintaining robustness and adaptability in a dynamic environment.
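A minimal sketch of the diagonal Fisher estimate and the EWC-regularized total loss described above is given below, assuming PyTorch; the use of the model's own predicted labels and the diagonal approximation are common EWC conventions rather than details stated in the text.

```python
import torch
import torch.nn.functional as F

def diagonal_fisher(model, clean_loader, device="cpu"):
    """Diagonal Fisher information estimated on clean samples.
    For clarity, the loader is assumed to yield one sample per batch."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    count = 0
    for x, _ in clean_loader:
        x = x.to(device)
        model.zero_grad()
        log_probs = F.log_softmax(model(x), dim=1)
        label = log_probs.argmax(dim=1)          # common EWC convention: model's own prediction
        F.nll_loss(log_probs, label).backward()
        for name, p in model.named_parameters():
            if p.grad is not None:
                fisher[name] += p.grad.detach() ** 2
        count += 1
    return {name: f / count for name, f in fisher.items()}

def ewc_penalty(model, fisher, star_params):
    """Quadratic penalty discouraging changes to parameters important for clean samples."""
    penalty = 0.0
    for name, p in model.named_parameters():
        penalty = penalty + (fisher[name] * (p - star_params[name]) ** 2).sum()
    return penalty

def incremental_adv_loss(model, x_adv, y, fisher, star_params, lam=1e-5):
    """Total loss: adversarial-sample loss plus the lambda-weighted EWC term."""
    adv_loss = F.cross_entropy(model(x_adv), y)
    return adv_loss + (lam / 2) * ewc_penalty(model, fisher, star_params)
```

Here star_params would be a snapshot of the clean-trained parameters, for example {n: p.clone().detach() for n, p in model.named_parameters()}, taken before any adversarial samples are introduced.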
Experiments
To verify the effectiveness of the proposed method and the classification performance of the deep learning model, we conducted a systematic experiment on the epilepsy dataset. Firstly, we constructed the neural hybrid assembly network NHANet, which achieved efficient feature extraction and high-precision classification in complex scenarios through a multi-module collaborative mechanism. Secondly, three typical adversarial attack algorithms, FGSM, BIM, and PGD, were used to conduct adversarial attacks on the trained NHANet, aiming to illustrate the impact of adversarial attacks on deep learning models. Finally, we introduced the incremental adversarial training method to enhance the model’s defense performance and compared it with existing adversarial training methods to verify the effectiveness and generalization of this method.
Experimental design
Dataset
This article uses the publicly available epilepsy dataset from the University of Bonn21 to verify the effectiveness of the incremental adversarial training method. It consists of five categories, each containing 100 single-channel sequences with a duration of 23.6 s and 4097 sampling points. To further improve model performance and accelerate convergence, we performed a series of preprocessing operations on the dataset. Because high data dimensionality might increase the computational burden, we applied dimensionality reduction to improve efficiency while retaining key information. Because the limited size of the original dataset could easily lead to overfitting, we expanded the dataset by synthesizing new samples. Additionally, we standardized and normalized the data and converted non-numerical labels into numerical encodings to meet the requirements of deep learning models.
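The concrete preprocessing algorithms are not specified, so the sketch below illustrates only the unambiguous steps (standardization and label encoding) with scikit-learn; the dimensionality-reduction and sample-synthesis steps are left as placeholders because their methods are not stated.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, LabelEncoder

def preprocess(segments: np.ndarray, labels: np.ndarray):
    """segments: (n_samples, n_points) EEG windows; labels: non-numerical class names."""
    # Standardize each feature dimension to zero mean and unit variance
    X = StandardScaler().fit_transform(segments)
    # Convert non-numerical labels (e.g. the five set names) into integer codes
    y = LabelEncoder().fit_transform(labels)
    # Dimensionality reduction and sample synthesis are applied in the paper,
    # but their concrete methods are not specified here, so they are omitted.
    return X, y
```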
Experiment details
The experiment is implemented with the PyTorch deep learning framework, and the dataset is split into a training set and a test set at a 7:3 ratio. The optimizer is Adam (adaptive moment estimation), the dropout rate is set to 0.5, the batch size to 32, the learning rate to 0.0003, and the number of heads in the multi-head attention mechanism to 16. For incremental adversarial training, \(\lambda\) is set to 1e−5.
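Under these settings, a baseline training loop might look as follows; this is a minimal sketch, and the data handling (tensor datasets and a random 7:3 split) is an assumption rather than the paper's exact pipeline.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

def train_nhanet(model, X, y, epochs=100, device="cpu"):
    """Baseline training loop with the stated hyperparameters (Adam, lr=3e-4, batch 32)."""
    dataset = TensorDataset(torch.tensor(X, dtype=torch.float32),
                            torch.tensor(y, dtype=torch.long))
    n_train = int(0.7 * len(dataset))                       # 7:3 train/test split
    train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])
    train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

    optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
    criterion = torch.nn.CrossEntropyLoss()
    model.to(device).train()
    for _ in range(epochs):
        for xb, yb in train_loader:
            xb, yb = xb.to(device), yb.to(device)
            optimizer.zero_grad()
            criterion(model(xb), yb).backward()
            optimizer.step()
    return model, test_set
```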
Evaluation metrics
Model evaluation metrics
In order to illustrate the effectiveness and stability of the deep learning model, accuracy, precision, recall, and F1-score are introduced as evaluation metrics.
Attack evaluation metrics
This study uses four indicators, namely adversarial accuracy, attack success rate, average L1 distance, and average L2 distance, to evaluate the impact of adversarial attacks on deep learning models.
Adversarial accuracy refers to the accuracy of a classification model on adversarial samples. It is measured by calculating the proportion of adversarial samples where the predicted labels match the true labels. The higher the adversarial accuracy, the stronger the ability of the deep learning model to resist adversarial attacks. Conversely, the lower the adversarial accuracy, the weaker the ability of the model to resist adversarial attacks.
The attack success rate (ASR) measures the effect of adversarial samples on the target model and reflects the effectiveness of an adversarial attack; the closer the ASR is to 1, the stronger the attack22. The specific formula is as follows:

$$ASR=\frac{1}{N}\sum_{i=1}^{N}\mathbb{I}\left(F\left(x_{i}^{adv}\right)\neq real_{i}\right)$$
where N is the number of adversarial samples, \(F(\cdot)\) is the label predicted by the deep model, \(x_{i}^{adv}\) is the ith adversarial sample, \({real}_{i}\) is the true label of the ith sample, and \(\mathbb{I}(\cdot)\) is the indicator function.
The average L2 distance is used to measure the degree of difference between adversarial samples and raw samples. The smaller the average L2 distance, the smaller the perturbation amplitude added to the original sample and the closer it is to the original sample.
The average L1 distance is obtained by summing, for each adversarial sample, the absolute differences between its elements and those of the corresponding original sample, adding up the L1 distances of all samples, and dividing by the number of samples. The larger the average L1 distance, the greater the difference between the generated adversarial samples and the original samples.
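These attack metrics can be computed together as in the sketch below; the batch-wise evaluation and the indicator-style ASR defined above are the only conventions assumed.

```python
import torch

@torch.no_grad()
def attack_metrics(model, x_clean, x_adv, y_true):
    """Adversarial accuracy, attack success rate, and average L1/L2 perturbation size."""
    preds = model(x_adv).argmax(dim=1)
    adv_acc = (preds == y_true).float().mean().item()      # adversarial accuracy
    asr = (preds != y_true).float().mean().item()          # attack success rate
    diff = (x_adv - x_clean).flatten(start_dim=1)
    avg_l1 = diff.abs().sum(dim=1).mean().item()           # average L1 distance
    avg_l2 = diff.norm(p=2, dim=1).mean().item()           # average L2 distance
    return adv_acc, asr, avg_l1, avg_l2
```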
Defense evaluation metrics
To evaluate the performance of the adversarial training method proposed in this paper, accuracy, precision, recall, and F1-score are used as evaluation metrics on the original dataset. On the generated adversarial sample dataset, the robust accuracy is used as the evaluation metric. Robust accuracy refers to the accuracy of the deep learning model on adversarial samples, which reflects the model's ability to resist adversarial attacks after adversarial training. The specific formula is as follows:

$$Robust\ Accuracy=\frac{N_{corr}}{N_{total}}$$
where \({\text{N}}_{\text{corr}}\) is the number of correctly classified adversarial samples, and \({\text{N}}_{\text{total}}\) is the total number of adversarial samples.
Network model analysis
The division of the dataset
In the development of deep learning models, the proportion of dataset division is a crucial step, and its rationality directly affects the training effect of the model, parameter optimization, and generalization ability. Given the small size of the dataset, this experiment only divided it into the training set and the test set. In this experiment, to explore the impact of different division ratios on the model performance, we set two typical schemes with training set-test set ratios of 8:2 and 7:3. The specific results are presented in Tables 1 and 2.
Through the data analysis of Tables 1 and 2, it can be seen that when the dataset partition ratio is 8:2, regardless of the batch size, the performance of the model is superior to that of the model with a dataset partition ratio of 7:3. However, due to the limited total number of samples in the dataset, using a 7:3 split ratio allows for the creation of a relatively large test set. This enables us to more accurately evaluate the generalization ability of the model, especially when dealing with limited sample data. A larger test set can provide a more stable performance assessment and reduce the evaluation errors caused by insufficient sample quantities. Therefore, considering the evaluation accuracy and actual application requirements, this experiment finally selects 7:3 as the dataset partition ratio.
Model performance analysis
The main purpose of this experiment is to study the effect of the learning rate and batch size on the performance and execution time of the deep learning model. By testing the model under different batch sizes and learning rates, the aim is to find an optimal setting with which the model achieves high accuracy in a relatively short time. In the experiment, the batch sizes were set to 8, 16, 32, 64, and 128, and the best accuracy, best epoch, and running time of the NHANet model under each batch size were recorded. The specific experimental results are shown in Table 3.
According to the experimental data in Table 3, when the batch size is set to 8, the model achieves the highest accuracy of 0.9947 at the 88th epoch. However, this process is relatively time-consuming, and the utilization of computing resources is low. In contrast, when the batch size is increased to 128, the running time is significantly shortened, but the accuracy is relatively low. When the batch size is set to 32, the model reaches a high accuracy of 0.9853 at the 99th epoch while saving computing resources, achieving a good balance between performance and efficiency. Therefore, the batch size has an important impact on the running time and performance of the model; in practice, we should strive to achieve high accuracy in a relatively short time to maximize the efficiency and practicality of the model.
Several groups of experiments were conducted to explore the impact of different learning rates on model performance and running time. With the number of epochs set to 100, the performance of the NHANet model was analyzed at learning rates of 0.0003 and 0.0001. The specific results are shown in Tables 4 and 5.
According to the experimental data in Tables 4 and 5, when the batch-size is set to the same, different learning rates have a relatively small impact on the execution time of the deep learning model NHANet. However, it is worth noting that when the learning rate is adjusted from 0.0001 to 0.0003, the model performance is significantly improved. This may be because a lower learning rate slows down the update speed of model parameters, resulting in the need for more iterations for the model to achieve a better classification effect. Based on this, this study sets the learning rate to 0.0003, which enables the deep learning model NHANet to learn the patterns in the data more efficiently and thoroughly on the premise that the execution time does not increase significantly.
The "number of heads" in the multi-head attention mechanism serves as a core hyperparameter, and its value directly affects the feature representation ability and generalization performance of the deep learning model. To systematically explore the influence of the number of heads on the model’s performance, in this experiment, the number of heads in the multi-head attention mechanism was set to 16, 8, 4, and 2. Comparative experiments were conducted based on the same training dataset and evaluation metrics. The specific experimental results are shown in Table 6.
As shown in the table above, the number of heads in the multi-head attention mechanism has a significant impact on model performance. When the number of heads is set to 16, the model achieves the best results in terms of accuracy, precision, recall, and F1-score. This indicates that a larger number of heads enables the model to extract semantic information from different subspaces in parallel, capturing feature correlations more comprehensively. Moreover, as the number of heads decreases, each indicator shows a stepwise decline, indicating that with too few heads the model's perspective is limited and it is difficult to fully model complex feature relationships. It is worth noting that when the number of heads drops to 2, the indicators recover slightly compared with 4 heads, but the overall performance is still far inferior to that of the 16-head model, reflecting that excessively reducing the number of heads severely restricts the model's ability to capture rich feature patterns.
Comparative experiments
To verify the efficiency and accuracy of the NHANet deep learning model proposed in this paper, a comparative experiment was conducted by comparing it with several typical deep learning models. The experiment used the same dataset to evaluate the performance and generalization ability of models such as DNN, BiLSTM, BiLSTM-MultiheadAttention, CNN_LSTM, CNN_LSTM_ATT, and NHANet. The performance indicators of different deep learning models are shown in Table 7. Additionally, to more intuitively display the relationship between the model’s prediction results and the true labels, we also constructed a confusion matrix. Figure 4 shows the confusion matrices of different deep learning models. Through these experimental results, we can clearly compare the performance of each model in the classification task and thereby verify the superiority of the NHANet model.
Confusion matrix results for different models. (a) Confusion matrix for the NHANet model. (b) Confusion matrix for the DNN model. (c) Confusion matrix for the BiLSTM model. (d) Confusion matrix for the CNN-LSTM model. (e) Confusion matrix for the CNN-LSTM-ATT model. (f) Confusion matrix for the BiLSTM-MultiheadAttention model.
The experimental results in Table 7 show that the NHANet deep learning model proposed in this paper is significantly superior to the comparison models in all evaluation indicators. It is worth noting that the traditional BiLSTM model also demonstrates a relatively high performance level, which is mainly attributable to its synchronous processing of sequence data through forward and backward LSTM units, effectively capturing past and future context in the signal. In contrast, the CNN_LSTM_ATT model (CNN-LSTM with a self-attention mechanism) performs poorly on this dataset. Although the self-attention mechanism can enhance the model's ability to focus on key features, given the characteristics of the current task and data distribution, this model fails to exploit its advantages and instead degrades in feature integration and classification decision-making. In conclusion, the NHANet model proposed in this study has the most outstanding overall performance, providing new solution strategies and innovative ideas for research in the field of brain-computer interfaces.
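For reference, confusion matrices like those in Fig. 4 can be produced from the test-set predictions as in the sketch below; the plotting utilities are an assumption, not the paper's actual tooling.

```python
import torch
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

@torch.no_grad()
def plot_confusion(model, test_loader, device="cpu"):
    """Collect test-set predictions and display the confusion matrix."""
    y_true, y_pred = [], []
    model.eval()
    for xb, yb in test_loader:
        preds = model(xb.to(device)).argmax(dim=1).cpu()
        y_true.extend(yb.tolist())
        y_pred.extend(preds.tolist())
    cm = confusion_matrix(y_true, y_pred)
    ConfusionMatrixDisplay(cm).plot(cmap="Blues")
    plt.show()
```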
Exploring the impact of adversarial attacks on deep model performance
Experimental results and analysis
For the public epilepsy data set, FGSM, BIM and PGD attack algorithms are used to attack the trained NHANet model.
Furthermore, to explore the influence of different perturbation budgets on the deep learning model, the epsilon values were set to 1/255, 2/255, 3/255, 4/255, and 5/255, and adversarial accuracy, attack success rate, average L1 distance, and average L2 distance were used as evaluation indicators. The results show that, as the perturbation intensity increases, the adversarial accuracy gradually decreases while the attack success rate continues to increase. Figure 5 shows line charts of the model's adversarial accuracy and attack success rate as the perturbation intensity increases. The specific performance of the NHANet model under the different attack algorithms and epsilon values is shown in Tables 8, 9 and 10 below.
Line charts showing the variation of adversarial accuracy and attack success rate of the NHANet model as the perturbation strength increases. (a) Line chart for the adversarial accuracy of the NHANet model as it changes with perturbation. (b) Line chart for the attack success rate of the NHANet model as it changes with perturbation.
Specifically, without added perturbation, the adversarial accuracy of the NHANet model reaches 0.9840, indicating good performance on the original test set. However, as the perturbation value increases, the adversarial accuracy gradually decreases. When epsilon is set to 5/255, the adversarial accuracy is 0.1307 under the FGSM attack, 0.1413 under the PGD attack, and 0.0840 under the BIM attack, showing that classification accuracy declines sharply as the perturbation grows. At the same time, the attack success rate increases with the perturbation value, which means that the NHANet model's predictions are unstable in the face of small changes to the input data. In addition, the average L1 distance and average L2 distance also grow with the perturbation value, indicating that the difference between the adversarial samples and the original samples increases accordingly.
It can also be seen that when epsilon is set to 1/255, 2/255, 3/255, or 4/255, the attack success rates of the PGD and BIM algorithms are higher than that of the FGSM algorithm, showing that PGD and BIM have stronger attack capability at lower perturbation levels. In addition, the average L1 distance and average L2 distance are important indicators of the difference between adversarial and original samples; the adversarial samples generated by the FGSM attack have larger average L1 and L2 distances than those generated by PGD and BIM, indicating that the PGD- and BIM-generated adversarial samples deviate less from the original samples.
To verify whether the generated adversarial samples are imperceptible, some adversarial samples are randomly selected and compared with the original data, and the first 80 feature values of each sample are plotted. The red line represents the original data, and the blue line represents the generated adversarial sample. As shown in Fig. 6, the first row compares the adversarial samples generated by FGSM, BIM, and PGD with the original data when epsilon is set to 2/255, and the second row shows the same comparison when epsilon is set to 5/255.
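Such an overlay can be drawn with a few lines of matplotlib, as in the sketch below; the colors follow the description above, while the figure size and titles are arbitrary choices.

```python
import matplotlib.pyplot as plt

def plot_adversarial_overlay(x_clean, x_adv, n_points=80, title="FGSM, eps=2/255"):
    """Overlay the first n_points values of a clean sample and its adversarial version."""
    plt.figure(figsize=(8, 3))
    plt.plot(x_clean[:n_points], color="red", label="original data")
    plt.plot(x_adv[:n_points], color="blue", label="adversarial sample")
    plt.title(title)
    plt.legend()
    plt.tight_layout()
    plt.show()
```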
Based on the above analysis, the smaller the perturbation, the more difficult it is for the human eye to detect the adversarial samples, while the larger the perturbation, the more obvious the deviation of the generated adversarial samples from the original data. Therefore, in practical application scenarios, in order to ensure the robustness and security of the deep learning model, it is very important to select appropriate perturbation values. This decision needs to comprehensively consider the robustness requirements and security factors of the model to seek the best balance between the two.
Further investigation into the impact of adversarial attacks on deep learning models
In order to further explore the impact of adversarial attacks on deep learning models, the five aforementioned deep learning models were subjected to adversarial attacks using the PGD attack algorithm. Furthermore, metrics such as adversarial accuracy, attack success rate, average L1 distance and average L2 distance were used to evaluate the performance of the generated adversarial samples. Figure 7 shows the line charts of the different models’ adversarial accuracy and attack success rate with increasing perturbation value, respectively. Meanwhile, the specific performance of the five deep learning models under different epsilons is detailed in Table 11 below.
Line charts illustrating the variation of adversarial accuracy and attack success rate for different models as the perturbation intensity increases. (a) Line chart showing the adversarial accuracy of various models in response to changes in perturbation. (b) Line chart depicting the attack success rate of various models in response to changes in perturbation.
The experimental results indicate that, when the values of alpha and steps are kept the same, the performance of the different neural network prediction models steadily declines as the perturbation value increases. Specifically, the adversarial accuracy decreases with increasing perturbation, and the generated adversarial samples can effectively deceive the models into making wrong classifications. Meanwhile, the attack success rate increases with the perturbation value, indicating weaker robustness in the face of adversarial attacks. In addition, the average L1 distance and average L2 distance increase with the perturbation value, meaning that the generated adversarial samples deviate more and more from the original data and their imperceptibility decreases. The above analysis shows that adversarial attacks cause classification errors not only in the NHANet model but also in other neural network models, which fully demonstrates the importance and urgency of improving the robustness of deep learning models in practical applications.
Neural hybrid assembly network incremental adversarial training
The effectiveness of incremental adversarial training
To verify the effectiveness of the proposed incremental adversarial training algorithm under different attack algorithms, three attack algorithms (FGSM, BIM, and PGD) were used to generate adversarial samples during the adversarial training process. Accuracy, precision, recall, and F1-score are used as evaluation metrics to comprehensively evaluate the model after adversarial training. Tables 12, 13 and 14 below are the classification reports for the different algorithms.
According to the classification report, after adversarial training, the model performs well in classification performance on each category. For the FGSM attack algorithm, the highest precision for each category reaches 1.0000 and the lowest is 0.98571, and the recall and F1-score for each category also achieve good prediction results, which indicates that the model after adversarial training is able to recognize the samples in each category effectively. Similarly, for the BIM attack algorithm and the PGD attack algorithm, the overall accuracy of the model reaches 0.99200 and 0.99067, respectively, thus further verifying the effectiveness of the adversarial training method proposed in this paper under different attack algorithms.
Variable parameter analysis
When the IncAT algorithm is used to continuously learn the generated adversarial samples, \(\lambda\) acts as a hyperparameter that adjusts how strongly the deep learning model is constrained toward the original samples while learning adversarial samples. In this experiment, adversarial samples were generated with the FGSM attack algorithm, and we investigated the impact of the value of \(\lambda\) on the robustness and performance of the model. By systematically testing model performance under different \(\lambda\) values, an optimal configuration is sought that improves the model's resistance to adversarial attacks without damaging its performance on the original data. In the experiment, we tried \(\lambda\) values of 1e−5, 0.01, 0.1, 0.2, and 0.5. The line chart in Fig. 8 shows the variation of accuracy and robust accuracy as \(\lambda\) increases, and the specific performance of the NHANet model under different \(\lambda\) values is shown in Table 15.
The experimental results show that when \(\lambda\) is 1e−5, the model reaches the highest accuracy of 0.9933 as well as the highest robust accuracy. However, as the value of \(\lambda\) increases, the robust accuracy of the model decreases, which suggests that too large a value of \(\lambda\) may hamper the model's ability to learn adversarial samples. Therefore, choosing an appropriate \(\lambda\) is crucial for balancing the robust accuracy of the deep learning model and its accuracy on the original dataset. In practical applications, the value of \(\lambda\) should be adjusted flexibly according to the needs of the specific task to balance performance on both adversarial and original samples.
Comparison of methods
To comprehensively evaluate the performance and advantages of the proposed incremental adversarial training algorithm, we conducted a detailed comparative analysis with existing adversarial training methods. In the comparative experiments, the FGSM, BIM, and PGD attack algorithms were used to generate adversarial samples during the adversarial training process. For evaluation, robust accuracy measures the ability of the deep learning models to resist adversarial attacks after adversarial training, while accuracy, precision, recall, and F1-score evaluate the classification ability of the models on the original data from different perspectives. The compared methods are as follows. Method one mixes the original training data with the adversarial samples generated for the current model at a 1:1 ratio to form an augmented training set and then retrains the deep learning model on this set; it aims to improve resistance to adversarial attacks by learning the features of raw data and perturbed data simultaneously. Method two first calculates the loss values of the original data and the adversarial samples separately, then takes a weighted sum of the two losses to obtain the total loss, and finally performs backpropagation on this total loss to enhance the robustness of the deep learning model. The loss function of Method two is shown in Eq. (8).
where alpha is the weight value, loss-original is the loss on the original data, and loss-adversarial is the loss on the adversarial samples. The scatter plots of accuracy and robust accuracy of the different methods under the different attack algorithms are shown in Fig. 9.
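As a point of comparison with IncAT, the weighted-loss baseline (Method two) can be sketched as below; the complementary alpha/(1 - alpha) weighting is an assumption about Eq. (8), which defines only the single weight alpha.

```python
import torch.nn.functional as F

def method_two_loss(model, x_clean, x_adv, y, alpha=0.5):
    """Weighted sum of the clean-sample loss and the adversarial-sample loss (Method two)."""
    loss_original = F.cross_entropy(model(x_clean), y)
    loss_adversarial = F.cross_entropy(model(x_adv), y)
    # complementary weighting is assumed; the text defines only the single weight alpha
    return alpha * loss_original + (1 - alpha) * loss_adversarial
```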
Scatter plots depicting the accuracy and robust accuracy of various methods under different attack algorithms. (a) Scatter plot for the accuracy and robust accuracy of different adversarial training methods under the FGSM attack algorithm. (b) Scatter plot for the accuracy and robust accuracy of different adversarial training methods under the PGD attack algorithm. (c) Scatter plot for the accuracy and robust accuracy of different adversarial training methods under the BIM attack algorithm.
From Table 16, it can be seen that when facing the three attack algorithms FGSM, PGD, and BIM, the method proposed in this paper demonstrates significant performance advantages. Specifically, its robust accuracies reach 95.33%, 94.67%, and 93.60% respectively, which are 5.06%, 4.67%, and 2.67% higher than Method one and 10.40%, 9.60%, and 6.27% higher than Method two, proving the outstanding effect of this method in improving model robustness. Moreover, the proposed method not only achieves excellent robust accuracy on adversarial samples but also maintains a high classification accuracy on clean samples, indicating that during incremental adversarial training the model can effectively learn the features of adversarial samples without damaging its understanding and classification of the original dataset. In conclusion, the proposed adversarial training method significantly improves the robustness and security of the model without retraining the entire model.
Furthermore, this study not only focuses on the robustness of deep learning models but also incorporates time efficiency into the evaluation to comprehensively measure the practical deployment value of the method. The methods listed in the table are defined as above. The running times of the different adversarial training methods are shown in Table 17.
According to the data in the table, Method two has the shortest running time in all three attack scenarios, significantly lower than Method one and the method proposed in this paper. However, the proposed method uses the Fisher matrix to accurately evaluate the importance of each parameter and impose constraints, so the deep learning model can continuously learn adversarial features with only a controllable increase in computational overhead, thereby improving model robustness. Based on the above analysis, IncAT incremental adversarial training can not only effectively improve the robustness of deep learning models but also strengthen their resistance to adversarial attacks in application scenarios that require rapid iteration and response. In addition, its core idea provides reference and inspiration for building fast, real-time adversarial training systems and is expected to promote further breakthroughs in this field.
The universality of incremental adversarial training
To verify the universality of the proposed method across different deep learning models, the six deep learning models described above were tested. To demonstrate the effectiveness of the method in improving accuracy, the accuracy of each model without adversarial attacks, its accuracy after attacks, and its robust accuracy after adversarial training were compared with its accuracy on the original dataset. The accuracies of the various deep learning models before and after attacks by different attack algorithms, as well as after adversarial training, are compared in Figs. 10, 11 and 12, and the specific metrics of the deep learning models after adversarial training under the different attack algorithms are presented in Tables 18, 19 and 20.
In summary, after incremental adversarial training, not only is the performance of the NHANet model significantly improved, but the performance of the other deep learning models is also notably enhanced. Meanwhile, the method also performs well on the other deep models when facing different attack algorithms. It can therefore be concluded that the adversarial training method proposed in this paper is not only effective for the NHANet model but also broadly general, and can significantly enhance the ability of other deep learning models to resist adversarial attacks.
Related works
Adversarial training, as a core defensive technique for enhancing model robustness, strengthens a model's ability to resist interference by injecting adversarial samples during the training process. Current research on adversarial training falls mainly into the following categories23: (1) accelerated adversarial training, which aims to improve the efficiency of adversarial training; (2) parameter-adaptive adversarial training, which automatically adjusts parameters according to the actual training situation; and (3) semi-supervised or unsupervised adversarial training, which expands the dataset with unlabeled samples and applies them to adversarial training to enhance the model's generalization ability.
Accelerated adversarial training
To reduce the time of adversarial training, researchers have proposed several efficient adversarial training methods. Shafahi et al.24 proposed Free AT, which generates adversarial samples by reusing the gradient information computed when updating the model parameters, thereby eliminating the extra cost of generating adversarial samples; this method significantly improves the efficiency and scalability of adversarial training while maintaining robustness similar to standard adversarial training algorithms. Zheng et al.25 generated similar or stronger adversarial samples with fewer iterations by accumulating adversarial perturbations across epochs. In addition, researchers accelerate training with single-step adversarial training, optimizing the FGSM attack algorithm to generate adversarial samples. However, in single-step adversarial training, if the FGSM attack step size is too large, the deep learning model develops distorted decision boundaries, leading to catastrophic overfitting (CO). Wong et al.16 therefore proposed a randomly initialized FGSM inner-maximization attack to alleviate the CO phenomenon.
Parameter adaptive adversarial training
Parameter settings have a crucial impact on the robustness gained when generating adversarial samples from gradient information, so carefully selecting and adjusting these parameters is a key step in improving the stability and reliability of the model under potential adversarial attacks. Cheng et al.26 proposed an adversarial training method with adaptive perturbation constraints, which adaptively seeks the minimum perturbation that causes the deep learning model to misclassify: the initial perturbation constraint is 0, and after each iteration a specific constant is added until the model misclassifies. Wu and Liu27 proposed a fast AT method based on random noise and adaptive step size, which uses random noise for data augmentation and accumulates the gradient of adversarial samples during training to adjust the adversarial step size, thereby generating adversarial samples that are more conducive to model training.
Semi-supervised or unsupervised adversarial training
Existing research has found that adversarial training requires much larger datasets than standard training, which incurs significant cost. Researchers therefore adopt semi-supervised or unsupervised methods for adversarial training, aiming to improve the adversarial robustness of deep learning models by adding only unlabeled data. Carmon et al.28 proposed the robust self-training method RST, which uses the target model to predict pseudo-labels for unlabeled samples and then retrains the deep learning model on both the labeled and pseudo-labeled samples; this work also demonstrates the importance of unlabeled samples in improving the robustness of deep learning models. Uesato et al.29 likewise verified experimentally that unlabeled samples can improve model robustness: the more unlabeled samples used, the stronger the generalization ability of the deep learning model.
In conclusion, with the continuous development of research on adversarial training, a variety of innovative adversarial training methods have emerged, enabling them to demonstrate greater robustness when dealing with various adversarial attacks. However, the current research integrating incremental learning theory into adversarial training is still relatively limited. Based on this, this paper proposes to introduce the incremental learning algorithm into the adversarial training framework. By continuously learning from dynamically generated adversarial samples, it endows the deep learning model with more flexible dynamic adaptability to cope with complex and variable adversarial scenarios.
Conclusions and future work
To address the low efficiency and high computational resource consumption of existing adversarial training methods, this paper proposes an incremental adversarial training method. It enables deep learning models to enhance their security and robustness without retraining the entire network, addressing the efficiency and performance bottlenecks of traditional adversarial training algorithms in the field of brain-computer interfaces. Taking a medical diagnosis system as an example, the advantages of IncAT are particularly significant in complex medical data scenarios: when encountering sudden equipment noise or malicious attacks, IncAT can learn and adapt to new adversarial sample features in real time and promptly update its recognition strategy, ensuring the reliability of the diagnostic model, reducing the misdiagnosis rate caused by adversarial samples, and thus providing doctors with a more accurate diagnostic basis. To verify the effectiveness of this adversarial training method, experiments were conducted on the publicly available epilepsy brain-computer interface dataset from the University of Bonn, and the FGSM, BIM, and PGD attack algorithms were used in comparative experiments between this method and other adversarial training methods. The experimental results show that the proposed method effectively solves the problem that deep neural networks cannot continuously learn adversarial samples and demonstrates excellent adversarial robustness against various adversarial attacks and across different deep learning models. However, although this method improves the robustness and security of the model, it shows limitations when the data categories change. Future research will therefore focus on class-incremental adversarial training methods to cope more efficiently with dynamic changes in data categories, and will extend this method to cross-domain scenarios such as autonomous driving and financial risk control to verify its universality and further evaluate its generalization ability.
Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
References
Amin, S. U., Alsulaiman, M., Muhammad, G., Mekhtiche, M. A. & Shamim Hossain, M. Deep learning for EEG motor imagery classification based on multi-layer CNNs feature fusion. Future Gener. Comput. Syst. 101, 542–554. https://doi.org/10.1016/j.future.2019.06.027 (2019).
Li, D. L., Xu, J. C., Wang, J. H., Fang, X. K. & Ji, Y. A Multi-Scale fusion convolutional neural network based on attention mechanism for the visualization analysis of EEG signals decoding. IEEE Trans. Neural Syst. Rehabil. Eng. 28(12), 2615–2626. https://doi.org/10.1109/TNSRE.2020.3037326 (2020).
Zou, Z. X., Chen, K. Y., Shi, Z. W., Guo, Y. H. & Ye, J. P. Object detection in 20 years: A survey. Proc. IEEE 111(3), 257–276. https://doi.org/10.1109/JPROC.2023.3238524 (2023).
Cai, Z. W. & Vasconcelos, N. Cascade R-CNN: High quality object detection and instance segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 43(5), 1483–1498. https://doi.org/10.1109/TPAMI.2019.2956516 (2021).
Girshick, R., Donahue, J., Darrell, T. & Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In 2014 IEEE Conference on Computer Vision and Pattern Recognition 580–587 (IEEE, USA, 2014). https://doi.org/10.1109/CVPR.2014.81.
Zhang, H., Xue, J. & Dana, K. Deep ten: Texture encoding network. In 2017 IEEE Conference on Computer Vision and Pattern Recognition 2896–2905 (IEEE, USA, 2017). https://doi.org/10.1109/CVPR.2017.309.
Luo, Y. Z., Huang, Q. H. & Li, X. L. Segmentation information with attention integration for classification of breast tumor in ultrasound image. Pattern Recognit. 124, 108427. https://doi.org/10.1016/j.patcog.2021.108427 (2022).
Roy, S. K., Krishna, G., Dubey, S. R. & Chaudhuri, B. B. Hybridsn: Exploring 3-D–2-D CNN feature hierarchy for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 17(2), 277–281. https://doi.org/10.1109/LGRS.2019.2918719 (2020).
Li, Y., Zhang, H. & Shen, Q. Spectral–spatial classification of hyperspectral imagery with 3D convolutional neural network. Remote Sens. 9(1), 67. https://doi.org/10.3390/rs9010067 (2017).
Zhang, Q. et al. Recyclable waste image recognition based on deep learning. Resour. Conserv. Recy. 171, 105636. https://doi.org/10.1016/j.resconrec.2021.105636 (2021).
Li, Y. F. & Liu, W. Deep learning-based garbage image recognition algorithm. Appl. Nanosci. 13(2), 1415–1424. https://doi.org/10.1007/s13204-021-02068-z (2023).
Szegedy, C. et al. Intriguing properties of neural networks. In 2014 International Conference on Learning Representations 1312.6199 (ICLR, Canada, 2013).
Zhang, Q. L. et al. Beyond imagenet attack: Towards crafting adversarial examples for black-box domains. In The Tenth International Conference on Learning Representations 1–18 (ICLR, Virtual, 2022).
Chen, J. Q. et al. Diffusion models for imperceptible and transferable adversarial attack. IEEE Trans. Pattern Anal. Mach. Intell. https://doi.org/10.48550/arXiv.2305.08192 (2024).
Mądry, A., Makelov, A., Schmidt, L., Tsipras, D. & Vladu, A. Towards deep learning models resistant to adversarial attacks. In The Sixth International Conference on Learning Representations 1706.06083 (ICLR, Canada, 2017).
Wong, E., Rice, L. & Kolter, J. Z. Fast is better than free: Revisiting adversarial training. In The Eighth International Conference on Learning Representations 1–12 (ICLR, Virtual, 2020).
Rice, L., Wong, E. & Kolter, Z. Overfitting in adversarially robust deep learning. In The Thirty-Seventh International Conference on Machine Learning 8093–8104 (PMLR, Vienna, 2020).
Zhang, H. Y. et al. Theoretically principled trade-off between robustness and accuracy. In The Thirty-Sixth International Conference on Machine Learning 7472–7482 (PMLR, USA, 2019).
Goodfellow, I. J., Shlens, J. & Szegedy, C. Explaining and harnessing adversarial examples. In 3rd International Conference on Learning Representations 1412.6572 (ICLR, Singapore, 2014).
Kurakin, A., Goodfellow, I. & Bengio, S. Adversarial machine learning at scale. In 5th International Conference on Learning Representations 1–17 (ICLR, France, 2016).
Andrzejak, R. G. et al. Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state. Phys. Rev. E 64(6), 061907. https://doi.org/10.1103/PhysRevE.64.061907 (2001).
Li, X. Y., Yang, K., Tu, G. Q. & Liu, S. B. Adversarial sample generation method for time-series data based on local augmentation. J. Comput. Appl. 1–11 (2024).
Sui, C. H. et al. A survey on adversarial training for robust learning. Int. J. Image Graph. 28(12), 3629–3650. https://doi.org/10.11834/jig.220953 (2023).
Shafahi, A. et al. Adversarial training for free! In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019 3358–3369 (NeurIPS, Canada, 2019).
Zheng, H. Z., Zhang, Z. Q., Gu, J. C., Lee, H. & Prakash, A. Efficient adversarial training with transferable adversarial examples. In 2020 IEEE/CVF Conference on Computer Vision and Pattern 1181–1190 (IEEE, USA, 2020). https://doi.org/10.1109/CVPR42600.2020.00126.
Cheng, M. H., Lei, Q., Chen, P. Y., Dhillon, I. & Hsieh, C. J. Cat: Customized adversarial training for improved robustness. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence 2002.06789 (IJCAI, Austria, 2020).
Wu, J. F. & Liu, Y. Fast adversarial training method based on random noise and adaptive step size. J. Comput. Appl. 44(6), 1807–1815 (2024).
Carmon, Y., Raghunathan, A., Schmidt, L., Liang, P. & Duchi, J. C. Unlabeled data improves adversarial robustness. In 2019 Conference on Neural Information Processing Systems 1905.13736 (NeurIPS, Canada, 2019).
Uesato, J. et al. Are labels required for improving adversarial robustness? In 2019 Conference on Neural Information Processing Systems 6609 (NeurIPS, Canada, 2019).
Funding
This work was supported in part by the Science and Technology Development Project of the Department of Education of Jilin Province (JJKH20250945KJ), New Generation Information Technology Innovation Project of China University Industry, University and Research Innovation Fund (2022IT096), Jilin Province Innovation and Entrepreneurship Talent Project (2023QN31), Natural Science Foundation of Jilin Province (No.YDZJ202301ZYTS157, 20240601034RC, 20240304097SF), and Innovation Project of Jilin Provincial Development and Reform Commission (2021C038-7).
Author information
Contributions
Yuxin Ge executed the program, designed the work, and wrote the manuscript. Yanhua Dong and Hongyu Sun controlled the direction and progress of the experiment. Yuetong Liu and Chengli Wang organized the experimental results and typeset the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ge, Y., Dong, Y., Sun, H. et al. An incremental adversarial training method enables timeliness and rapid new knowledge acquisition. Sci Rep 15, 35826 (2025). https://doi.org/10.1038/s41598-025-19840-8