Introduction

The impact of deep neural networks (DNNs) spans a wide array of applications, from image classification1 and speech recognition2 to strategic gameplay3,4, protein folding5, semiconductor design6, particle discovery7, and quantum system analysis8,9. Despite their remarkable success, these models have recently come under scrutiny due to their susceptibility to adversarial attacks: subtle perturbations, imperceptible to humans, that can dramatically degrade performance10,11.

Empirical12,13,14,15,16,17,18,19,20 and theoretical21,22,23,24 studies have shown that these networks can be easily fooled by minor, non-random perturbations, leading to erroneous predictions. A notable example is the Fast Gradient Sign Method (FGSM) attack22 (Fig. 1), which highlights the vulnerabilities of DNNs. If such slight disruptions can undermine their performance, the reliability of technologies relying on state-of-the-art deep learning could be at risk. To make neural networks more robust, the perturbed inputs are often reused as additional training data. These strategies are known as adversarial training23,25,26,27,28,29,30,31, in which robustness is strengthened at the expense of test accuracy, suggesting a fundamental trade-off between accuracy and robustness in neural networks. Understanding this trade-off is therefore crucial, especially given the increasing role of neural networks in critical decision-making systems, from autonomous vehicles to medical diagnostics and secure authentication systems.

Fig. 1
figure 1

Adding an imperceptible non-random noise to the images causes the network to fail to predict the correct label. The trained GoogLeNet32 on CIFAR-1033 achieves 90.78% accuracy on the test set but only 3.67% accuracy on the slightly attacked images. The attack, called FGSM22, is achieved via the transformation \(x = x_{0}+\epsilon \cdot \text {sign}(\nabla _{x}l(f(x,\theta ),Y)| _{x=x_0})\), where \(x_{0}\) denotes the input images of the training set and Y is the true label for image \(x_{0}\). x denotes the attacked image, \(\epsilon =8/255\), and \(\text {sign}(\nabla _{x}l(f(x,\theta ),Y)| _{x=x_0})\) gives the non-random noise, with \(l(f(x,\theta ),Y)\) denoting the loss function and \(\theta\) the weights of the trained network.
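To make the FGSM transformation concrete, the following toy sketch applies the same step to a 1-D logistic classifier rather than GoogLeNet on CIFAR-10; the weight w and step size eps are illustrative assumptions, not values from the experiments above.

```python
import numpy as np

# Toy sketch of the FGSM step x' = x0 + eps * sign(grad_x l), applied to a
# 1-D logistic classifier f(x) = sigmoid(w*x) instead of a deep network.
# The weight w and step size eps are illustrative assumptions.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grad_x(x, y, w=8.0):
    """Analytic input-gradient of the cross-entropy loss l(f(x), y)."""
    return w * (sigmoid(w * x) - y)

def fgsm(x0, y, eps=0.1, w=8.0):
    """Perturb the input in the direction of the sign of the gradient."""
    return x0 + eps * np.sign(loss_grad_x(x0, y, w))

x0, y0 = 0.05, 1           # a correctly classified point (f(x0) > 0.5)
x_adv = fgsm(x0, y0)       # the attacked input, now on the wrong side of 0
```

Even though the perturbation is small (eps = 0.1), it is enough to push a boundary-adjacent point across the decision boundary.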

In our previous work, we uncovered an intrinsic characteristic of neural networks: their vulnerability shares a mathematical equivalence with the uncertainty principle in quantum physics34,35, which demonstrates the inherent trade-off between the network accuracy and its robustness. This is observed when gradient-based attacks22,36,37,38,39,40 on the inputs are identified as their conjugates. Despite classical approximation theorems asserting that neural networks can approximate continuous functions with arbitrary precision41,42,43,44, we have found that this uncertainty principle is an inevitable, natural result of minimizing the loss function. Since modern network structures always include a loss function with respect to some inputs, we can “design” the conjugates by taking the gradient of the loss function with respect to the input variables. These conjugates, when added to the inputs, can drastically decrease prediction accuracy.

However, the theoretical nature and complexity of our previous findings have made them difficult for readers to fully grasp. To address this, we present a new study in the present work, which aims to elucidate the concept of the “uncertainty principle of neural networks” through a simple example. Here, we focus on a binary classification task where the input is a random number in the interval [−1, 1]. If the number is greater than or equal to 0, it is classified as category 1; otherwise, it is classified as category 0. By training a neural network, we investigate the network’s behavior in terms of the loss values and the corresponding frequency domain across different epoch numbers.

Through this straightforward example, we reveal that the complementary principle45, a cornerstone of quantum physics, applies to neural networks, imposing a limit on their simultaneous accuracy and robustness. Furthermore, by integrating the mathematics developed in physics into the study of neural networks, we illuminate new pathways to investigate and explain the various properties of neural networks. This intersection of disciplines provides the potential to enhance our understanding of the theoretical underpinnings of these black-box networks.

Results

Explanation of uncertainty relation in neural networks

In this section, we draw an analogy to the uncertainty principle in signal processing via Fourier transforms46 to provide an intuitive understanding. However, for neural networks, a more rigorous mathematical formalism of the uncertainty relation can be conveniently expressed using Dirac notations from quantum mechanics, which is detailed in the Methods section for interested readers.

Consider a loss function \(l(f(x,\theta ),Y)\) of a trained neural network with a given label \(Y\). We introduce a new function \(\psi _{Y}(x)\), named as neural packet:

$$\begin{aligned} \psi _{Y}(x) = \frac{l(f(x,\theta ),Y)}{\sqrt{\beta }}, \end{aligned}$$
(1)

where the normalization coefficient \(\beta\) ensures that the integral of the squared neural packet over its input space equals one. The Fourier transform converts the function \(\psi _{Y}(x)\) into its conjugate space as:

$$\begin{aligned} {\mathcal {F}}\left\{ \psi _{Y}(x)\right\} = \hat{\psi }_{Y}(p) = \int _{-\infty }^{\infty } \psi _{Y}(x) e^{-i p x} \, dx, \end{aligned}$$
(2)

where \(p\) is the conjugate variable of \(x\), and vice versa.


The uncertainty principle in Fourier transform primarily refers to the resolution limit between the time domain (here in neural networks, it is the domain of the input x) and the frequency domain (i.e., \(p\), which is the Fourier transform of input) of a signal: for a signal \(\psi _{Y}(x)\), the standard deviations in the time domain and frequency domain are constrained by the relation (see derivations in Methods):

$$\begin{aligned} dxdp \ge \frac{1}{2}, \end{aligned}$$
(3)

where the explicit formulas for calculating \(dx\) and \(dp\) can be found in Methods. This uncertainty principle has important applications in both signal processing and quantum mechanics, reflecting the inherent limitations when converting between different domains.
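The bound in Eq. (3) can be checked numerically. In the sketch below, a Gaussian packet (which saturates the bound) stands in for a neural packet \(\psi _{Y}(x)\); the grid range and the width sigma are arbitrary choices.

```python
import numpy as np

# Numerical check of the relation dx * dp >= 1/2, using a Gaussian packet
# (which saturates the bound) as a stand-in for a neural packet psi_Y(x).
# The grid range and the width sigma are arbitrary assumed values.
sigma = 1.0
x = np.linspace(-20.0, 20.0, 200001)
h = x[1] - x[0]
psi = (2.0 * np.pi * sigma**2) ** -0.25 * np.exp(-x**2 / (4.0 * sigma**2))

# dx: standard deviation of x under the density psi(x)^2
mean_x = np.sum(x * psi**2) * h
dx = np.sqrt(np.sum((x - mean_x) ** 2 * psi**2) * h)

# dp: for a real, normalized packet decaying at the grid edges, the spread
# of the conjugate variable reduces to the L2 norm of d psi / dx
dp = np.sqrt(np.sum(np.gradient(psi, h) ** 2) * h)

print(round(dx * dp, 3))  # prints 0.5, the lower bound of Eq. (3)
```

For a Gaussian the product equals the bound exactly; any other packet shape yields a strictly larger product.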

In the context of a neural network, Eq. (3) implies that for a trained network with loss function \(l(f(x,\theta ),Y)\), it is impossible for the network to extract features of both \(x\) and \(p\) with arbitrary high precision simultaneously. Here, the term “feature” is an abstract concept that depends on the specific scenarios. For instance, in classification tasks, inputs belonging to the same category share common features. These inputs typically form a distribution representing the features, with some inputs at the center of the distribution and others at the boundaries. During training, we aim for \(dx\) associated with label \(Y\) to be as small as possible to achieve high test accuracy. This analogy holds for other tasks as well, where categories are replaced by abstract concepts.

In a standard classification task, the neural network primarily extracts features of the input \(x\) without considering its conjugate \(p\). However, the scenario changes when gradient-based attacks are involved. To illustrate this, we perform the Fourier transform of the gradient of \(\psi _{Y}(x)\) with respect to \(x\):

$$\begin{aligned} {\mathcal {F}}\left\{ \frac{\partial \psi _{Y}(x)}{\partial x}\right\} = \int _{-\infty }^{\infty } \frac{\partial \psi _{Y}(x)}{\partial x} e^{-i p x} \, dx = i p \hat{\psi }_{Y}(p), \end{aligned}$$
(4)

where we observe that the gradient involves the distribution \(\hat{\psi }_{Y}(p)\) and the conjugate variable \(p\), indicating that the neural network needs to extract the features of p in order to accurately extract the features of the gradient, i.e., the gradient-based “attack”. Thus, when the gradient is added to the input variables, the network is compelled to identify both \(x\) and \(p\) with high accuracy, which is prohibited by the uncertainty relation in Eq. (3). Hence, if the network is trained to have higher accuracy on the input x, it will be less accurate in extracting features of p.
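The identity in Eq. (4) can be spot-checked on a grid. In the sketch below, a unit Gaussian stands in for the neural packet, and the probed values of p are arbitrary.

```python
import numpy as np

# Numerical spot-check of Eq. (4): the Fourier transform of d psi/dx equals
# i*p times the transform of psi itself. A unit Gaussian stands in for the
# neural packet, and the probed values of p are arbitrary choices.
x = np.linspace(-15.0, 15.0, 60001)
h = x[1] - x[0]
psi = np.pi ** -0.25 * np.exp(-x**2 / 2.0)
dpsi = np.gradient(psi, h)

for p in (0.5, 1.0, 2.0):
    ft_of_grad = np.sum(dpsi * np.exp(-1j * p * x)) * h   # F{d psi / dx}
    ft_of_psi = np.sum(psi * np.exp(-1j * p * x)) * h     # F{psi}
    assert np.allclose(ft_of_grad, 1j * p * ft_of_psi, atol=1e-6)
```

The boundary term of the integration by parts vanishes because the packet decays at the edges of the grid.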

In practice, the gradient term can appear either in the input or in the label, with the former contributing to adversarial attacks and the latter to generation tasks. In this paper, we focus solely on the former, leaving the discussion of generation tasks for future work.

Illustration of the neural packet in the binary case

Since the mathematical proof of the uncertainty principle is achieved via the Dirac notations in quantum theory, analogous to the wave packet, we use the term “neural packet” to denote the normalized loss function (see the comparison table in Methods). To elucidate the concept of the “neural packet” and its relationship with the uncertainty principle, we use a simple binary classification task, in which the input domain ranges from −1 to 1. If the input satisfies \(-1 \le x < 0\), we label it as class “0”; if \(0 \le x \le 1\), we label it as class “1”. The loss function of a neural network measures the difference between the predicted and true outputs and is integrable over the input space. Figure 2(A) exhibits the loss function \(l(f(x,\theta ),Y)\) of a 1-D binary classification network under different training epochs, where \(f(x, \theta )\) denotes the output of the network with input x and weight parameters \(\theta\), and Y refers to the label corresponding to input x. The value of the loss function reaches a minimum in the correctly classified input regions and increases as inputs approach the classification interface, resembling the wave packet in quantum mechanics.

Here, since the input space of class “0” is the range \([-1, 0]\), \(\psi _{Y=``0''}\) is normalized over \([-1, 0]\). Similarly, \(\psi _{Y=``1''}\) is normalized over [0, 1]. Once we have obtained the two neural packets, we can calculate the corresponding average deviations of the input x measured by the probability density \(\psi _{Y}(x)^2\), i.e., \(dx_1\) and \(dx_2\) (the explicit formulas are provided in Methods).

As shown in Fig. 2(B), we can observe that with the increase of the test accuracy, \(dx_1\) and \(dx_2\) both decrease, corresponding to a growing slope of the neural packet at the interface. Hence, the prediction ability of the network increases with the decrease of dx.
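This qualitative trend can be reproduced with a toy surrogate. In the sketch below, the loss shape exp(k·x) on the class-“0” interval [−1, 0] is a made-up stand-in for the trained network’s loss, with the steepness k mimicking increasingly accurate training.

```python
import numpy as np

# Toy illustration of Fig. 2(B): as the packet grows steeper at the class
# boundary x = 0 (mimicking a more accurately trained classifier on the
# class-"0" interval [-1, 0]), the spread dx shrinks. The surrogate loss
# shape exp(k*x) is an assumption, not the trained network's actual loss.
def dx_of_packet(steepness, n=100001):
    x = np.linspace(-1.0, 0.0, n)
    h = x[1] - x[0]
    loss = np.exp(steepness * x)           # peaked at the boundary x = 0
    beta = np.sum(loss**2) * h             # normalization constant
    density = loss**2 / beta               # squared neural packet psi^2
    mean_x = np.sum(x * density) * h
    return np.sqrt(np.sum((x - mean_x) ** 2 * density) * h)

spreads = [dx_of_packet(k) for k in (2.0, 5.0, 10.0, 20.0)]
# dx decreases monotonically as the boundary slope increases
assert all(a > b for a, b in zip(spreads, spreads[1:]))
```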

Fig. 2
figure 2

Illustration of Neural packet for binary classification. (A) Loss function of a 1-D binary classification network. (B) Relationship between test accuracy and dx.

Accurate networks are vulnerable

When training a neural network, the test accuracy of the classifier typically increases gradually before reaching a plateau. Concurrently, we can evaluate the variation in robust accuracy throughout the training process. Analogous to the FGSM attack, we introduce an attack without the sign function:

$$\begin{aligned} x^\prime= & x \pm \epsilon \cdot \nabla _{x}l(f(x,\theta ),Y^{*}), \end{aligned}$$
(5)

where \(x^\prime\) represents the perturbed input, and \(\epsilon\) is a constant controlling the magnitude of the perturbation. \(Y^{*}\) denotes the specific type. For the purposes of this discussion, we refer to the “+” sign as an “attack” and the “-” sign as an “anti-attack.”

As illustrated in Eq. (5), the added term induces the two attacks. Focusing on the neural packet of category “0” in Fig. 2, defined in the range [−1, 0], we observe that near the boundary at \(x=0\) the gradient of the loss function with respect to the input x is always positive. Consequently, adding the gradient shifts the points towards class “1”. Similarly, for category “1”, the gradient is negative near the boundary, and adding this negative gradient shifts the points towards class “0”.

Therefore, in the context of simple binary classification, we can generally conclude that the more accurately the model is trained, the steeper the boundary becomes. If we apply the “+” sign in Eq. (5), the network will be attacked, resulting in decreased accuracy. Conversely, if we apply the “-” sign, the network will be “anti-attacked,” leading to increased accuracy.
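The two signs of Eq. (5) can be simulated directly. In the sketch below, a toy classifier f(x) = sigmoid(w·x) with decision boundary at 0 replaces the trained network; the weight w and step eps are illustrative assumptions.

```python
import numpy as np

# Sketch of Eq. (5) on a toy 1-D classifier f(x) = sigmoid(w*x): the "+"
# sign (attack) pushes boundary-adjacent points across x = 0, while the
# "-" sign (anti-attack) pushes them away from the boundary. The weight w
# and step eps are illustrative assumptions.
rng = np.random.default_rng(0)
w, eps = 10.0, 0.2
x = rng.uniform(-1.0, 1.0, 10000)
y = (x >= 0).astype(float)                      # true labels

def accuracy(inputs):
    """Fraction of inputs the classifier (boundary at 0) still gets right."""
    return np.mean((inputs >= 0).astype(float) == y)

p = 1.0 / (1.0 + np.exp(-w * x))
grad = w * (p - y)                              # input-gradient of the loss

acc_clean = accuracy(x)                         # 1.0 by construction
acc_attacked = accuracy(x + eps * grad)         # "+" sign: attack
acc_anti = accuracy(x - eps * grad)             # "-" sign: anti-attack
assert acc_anti >= acc_clean > acc_attacked
```

The attack flips points that sit near the boundary, while the anti-attack moves every point deeper into its correct region.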

Fig. 3
figure 3

(A) Evolution of test, attacked, and anti-attacked accuracies during the training process. (B) Relation between dx and dp measured by the neural packets.

Figure 3(A) shows the evolution of the test, attacked, and anti-attacked accuracies throughout the training process. In the initial epochs, we observe an increase in all three accuracies, indicating that the network is fitting the data distribution. As training progresses, a discrepancy emerges between the test and attacked accuracies, while the anti-attacked accuracy continues to increase until it reaches a plateau. This observation aligns with the descriptions in the literature.

The phenomena presented in Fig. 3(A) can be attributed to a deeper and more fundamental mechanism, which we term the “uncertainty principle of neural networks”. To elucidate how the uncertainty relation manifests, we also compute the average deviation of the conjugate variable p, denoted as \(dp_1\) and \(dp_2\), measured by the probability density \(\hat{\psi }_{Y}(p)\), where \(Y=0 \text { or } 1\) corresponding to the two categories. As shown in Fig. 3(B), a decrease in \(dx_1\) and \(dx_2\) corresponds to an increase in \(dp_1\) and \(dp_2\). This inverse relationship highlights the network’s inherent limitation in simultaneously extracting features of both x and p with arbitrary precision, similar to the uncertainty principles observed in signal processing and quantum mechanics.

To understand the meaning of dp, we apply the Fourier transform to the “neural packet” (i.e., the normalized loss function). In the Fourier transform, the input x is transformed into a new space of p. In Figs. 2 and 3(B) we see that dx measures the accuracy of x (where a smaller dx indicates higher classification accuracy), and similarly dp measures the accuracy of p (where a smaller dp indicates higher accuracy in measuring p with \(\hat{\psi }_{Y}(p)\)). Since both x and p cannot be measured accurately simultaneously, a trade-off between x and p is evident. In the simple binary case, dp aligns with robust accuracy, where a larger dp indicates lower robust accuracy. Therefore, dp can be viewed as a manifestation of the network’s vulnerability: a larger value of dp indicates a more vulnerable network.

Building on this understanding, we can interpret the attacks through the lens of the uncertainty principle. During an attack, all data points are perturbed by “adding” the gradient terms via Eq. (5). These gradient terms, as conjugates of the inputs in terms of the normalized loss function \(\psi _{Y}(x)\), force the network to accurately discern both the input and the added gradients (attacks). This process, however, is strictly limited by the uncertainty relation, leading to a decrease in test accuracy. In the anti-attack scenario, by contrast, all data points are adjusted by “subtracting” the conjugates, shifting the points towards their correct categories and eliminating the need for the network to identify the gradient terms, which in turn improves accuracy. Therefore, a larger dp value indicates a higher susceptibility of the network to gradient-based attacks. This relationship underscores the trade-off between the network’s precision in feature extraction and its vulnerability to perturbations.

Why use the uncertainty principle to understand neural network vulnerabilities?

In our simple binary classification task, the trade-off between accuracy and robustness can be understood through the slope at the decision boundaries, i.e., the direction of the attack. If this vulnerability phenomenon can be explained in such a straightforward manner, one might question the necessity of adopting the concept of the uncertainty principle.

The response to this question can be divided into three key points. Firstly, in a simple binary scenario, the alignment of the slope at the interface with the uncertainty phenomenon is readily apparent. In contrast, in more complex real-world cases, such as classification on the ImageNet dataset, network accuracy often declines significantly under minor, imperceptible perturbations. This suggests that a substantial portion of images lie near classification boundaries, indicating that there exist many singularities like that in Fig. 2(A), which make the network unstable. This complex phenomenon cannot be easily understood quantitatively through the lens of the slopes alone. However, by applying the more fundamental concept of the uncertainty principle, these phenomena become more intuitive: introducing conjugates into the inputs inherently degrades network performance, in line with the uncertainty principle, which can be quantified by dx and dp.

Secondly, the uncertainty principle provides a complementary perspective in terms of the input \(x\) and its conjugate \(p\), and asserts that a neural network cannot extract features of both \(x\) and \(p \sim \frac{\partial l(f(x,\theta ),Y)}{\partial x}\) with arbitrary precision. This is a general and profound conclusion in physics that applies to all relevant tasks, not limited to adversarial attacks. For example, the uncertainty principle is also valid in generative networks. In a generative task, if the labels of the dataset include gradient information of the trained neural network, the network’s accuracy will be inherently limited, a point we will demonstrate in our future work.

Thirdly, in practical scenarios, inputs are typically defined in high-dimensional spaces. For instance, in the MNIST dataset, each input image resides in a 784-dimensional (28 \(\times\) 28 pixel) space, which is too large to visualize effectively. Additionally, different classes often intersect, blurring both the boundaries and the gradients. These complexities make it difficult to understand why highly accurate networks are so vulnerable to tiny perturbations. The uncertainty principle, however, does not rely on the dimensions of the input space or the network, making it a universal property of all neural networks.

By adopting the concept of the uncertainty relation, we know that the accuracy-robustness trade-off is an inherent property of all neural networks. Readers are encouraged to refer to our previous work for empirical verification of the uncertainty relation on the MNIST and CIFAR-10 datasets34.

Conclusion and future directions

In our previous works34,35, we mathematically proved the uncertainty principle in neural networks. However, the theoretical nature and complexity of those studies made the concept difficult to grasp intuitively. In this new work, we aim to provide a clearer and more accessible explanation through a simple binary classification task. By training a neural network to classify random numbers between -1 and 1, we illustrate how the trade-off between accuracy and robustness manifests in a tangible example.

Our findings reveal that as the neural network becomes more accurate, its susceptibility to adversarial attacks increases. This phenomenon is quantified using the concept of a “neural packet,” which provides a normalized representation of the error distribution in the input space. The binary classification task serves as a straightforward yet powerful demonstration of this trade-off, making the underlying principles more accessible and easier to understand.

The implications of this uncertainty principle are profound, particularly for the development and deployment of large AI models47,48. These models learn deep features and their interrelations, which are crucial for tasks such as natural language processing, image recognition, and autonomous decision-making (e.g., self-driving cars). Since these features can form neural packets, they are limited by the uncertainty principle: accurate models are vulnerable. Therefore, it is essential for the community to recognize the universality of this principle and explore ways to balance the trade-off it imposes.

In conclusion, our study provides an analysis of the uncertainty principle in neural networks through the lens of binary classification. By highlighting the intrinsic trade-off between accuracy and robustness, we pave the way for understanding the resilience of the neural networks.

Understanding the underlying mechanisms that contribute to the vulnerability of neural networks opens avenues for developing more robust network structures in future research. One promising approach is inspired by the work of Frank Tipler49, who demonstrated how eliminating singularities in classical physics can formally reproduce Bohm’s global quantum potential. In the realm of neural networks, this concept translates to the formation of distinct boundaries between different classes or concepts. In high-dimensional spaces, these boundaries become more pronounced, contributing to the networks’ vulnerability. Our research suggests that this phenomenon is inherent and unavoidable due to the underlying uncertainty principle. Although it may not be possible to completely eliminate this vulnerability, we can mitigate it using the methods proposed by Tipler. By applying the relevant mathematical frameworks, we may be able to develop more robust neural networks without relying heavily on extensive adversarial training.

Methods

Uncertainty relation of neural networks

Formulas and notations for neural networks

Without loss of generality, we can assume that the loss function \(l(f(X,\theta ),Y)\) is square integrable,

$$\begin{aligned} \int l(f(X,\theta ),Y)^{2}dX= & \beta . \end{aligned}$$
(6)

Eq. (6) allows us to further normalize the loss function as

$$\begin{aligned} \psi _{Y}(X)= & \frac{l(f(X,\theta ),Y)}{\beta ^{1/2}}, \end{aligned}$$
(7)

so that

$$\begin{aligned} \int \psi _{Y}(X)^{2}dX=1. \end{aligned}$$
(8)

For convenience, we refer to \(\psi _{Y}(X)\) as a neural packet in the later discussions. Note that under different labels Y, a neural network is associated with a set of neural packets.

An input \(X=(x_{1},...,x_{i},...,x_{M})\) with M components can be seen as a point in the multi-dimensional space, where the numerical values of \((x_{1},...,x_{i},...,x_{M})\) correspond to the input values. The input and attack operators of the neural packet \(\psi _{Y}(X)\) can then be defined as:

$$\begin{aligned} \hat{x}_{i}\psi _{Y}(X)= & x_{i}\psi _{Y}(X),\nonumber \\ \hat{p}_{i}\psi _{Y}(X)= & \frac{\partial }{\partial x_{i}}\psi _{Y}(X). \end{aligned}$$
(9)

The average input value at \(x_{i}\) associated with neural packet \(\psi _{Y}(X)\) can be evaluated as

$$\begin{aligned} \langle \hat{x}_{i}\rangle= & \int \psi _{Y}^*(X)x_{i}\psi _{Y}(X)dX. \end{aligned}$$
(10)

Since \(\psi _{Y}(X)\) corresponds to a purely real number without imaginary part, the above equation is equivalent to:

$$\begin{aligned} \langle \hat{x}_{i}\rangle= & \int \psi _{Y}(X)x_{i}\psi _{Y}(X)dX. \end{aligned}$$
(11)

Besides, the attack operator \(\hat{p}_{i}=\frac{\partial }{\partial x_{i}}\) corresponds to the conjugate variable of \(x_{i}\), and we can obtain the average value for \(\hat{p}_{i}\) as

$$\begin{aligned} \langle \hat{p}_{i}\rangle= & \int \psi _{Y}(X)\frac{\partial }{\partial x_{i}}\psi _{Y}(X)dX. \end{aligned}$$
(12)

Derivation of the uncertainty relation

Table 1 Comparison of the uncertainty principle between quantum physics and neural networks.

The uncertainty principle of a trained neural network can then be deduced by the following theorem:

Theorem 1

The standard deviations \(\sigma _{{p}_{i}}\) and \(\sigma _{{x}_{i}}\) corresponding to the attack and input operators \(\hat{p_{i}}\) and \(\hat{x_{i}}\), respectively, are restricted by the relation:

$$\begin{aligned} \sigma _{{p}_{i}}\sigma _{{x}_{i}}\ge & \frac{1}{2}. \end{aligned}$$
(13)

Proof

We first introduce the standard deviations \(\sigma _{a}\) and \(\sigma _{b}\) corresponding to two general operators \(\hat{A}\) and \(\hat{B}\). Then it follows that:

$$\begin{aligned} \sigma _{a}\sigma _{b} = \langle (\hat{A}-\langle \hat{A}\rangle )^{2} \rangle ^{\frac{1}{2}}\langle (\hat{B}-\langle \hat{B}\rangle )^{2} \rangle ^{\frac{1}{2}} \equiv \langle \hat{a}^{2}\rangle ^{\frac{1}{2}} \langle \hat{b}^{2}\rangle ^{\frac{1}{2}}. \end{aligned}$$
(14)

In general, for any two unbounded real operators \(\hat{a}\) and \(\hat{b}\), the following relation holds

$$\begin{aligned} 0\le \langle (\hat{a}+i\hat{b})(\hat{a}-i\hat{b})\rangle = \langle \hat{a}^{2} \rangle -i\langle \hat{a}\hat{b}-\hat{b}\hat{a}\rangle +\langle \hat{b}^{2}\rangle . \end{aligned}$$
(15)

If we further replace \(\hat{a}\) and \(\hat{b}\) in Eq. (15) by operators \(\hat{a}\langle \hat{a}^{2}\rangle ^{-1/2}\) and \(\hat{b}\langle \hat{b}^{2}\rangle ^{-1/2}\), we can then obtain the property \(2\langle \hat{a}^{2}\rangle ^{1/2}\langle \hat{b}^{2}\rangle ^{1/2} \ge i\langle \hat{a}\hat{b}-\hat{b}\hat{a}\rangle\), which gives the basic bound for the commutator \([\hat{a},\hat{b}]\equiv \hat{a}\hat{b}-\hat{b}\hat{a}\),

$$\begin{aligned} \langle \hat{a}^{2}\rangle ^{\frac{1}{2}}\langle \hat{b}^{2}\rangle ^{\frac{1}{2}}\ge & |i\frac{1}{2}\langle [\hat{a},\hat{b}]\rangle |. \end{aligned}$$
(16)

Seeing the fact that \([\hat{a},\hat{b}]=[\hat{A},\hat{B}]\), we finally obtain the uncertainty relation

$$\begin{aligned} \sigma _{a}\sigma _{b}\ge & |i\frac{1}{2}\langle [\hat{A},\hat{B}]\rangle |. \end{aligned}$$
(17)

In terms of the neural networks, we can simply replace operators \(\hat{A}\) and \(\hat{B}\) by \(\hat{p}_{i}\) and \(\hat{x}_{i}\) introduced in Eq. (9), and this leads to

$$\begin{aligned} \sigma _{{p}_{i}}\sigma _{{x}_{i}}\ge & |i\frac{1}{2}\langle [\hat{p}_{i},\hat{x}_{i}]\rangle | = \frac{1}{2}, \end{aligned}$$
(18)

where we have used the relation

$$\begin{aligned} [\hat{p}_{i},\hat{x}_{i}]\psi _{Y}(X)= & [\hat{p}_{i}\hat{x}_{i} -\hat{x}_{i}\hat{p}_{i}]\psi _{Y}(X)\nonumber \\= & \frac{\partial }{\partial x_{i}}[x_{i}\psi _{Y}(X)] \nonumber \\ & -x_{i}\frac{\partial }{\partial x_{i}}\psi _{Y}(X)\nonumber \\= & \psi _{Y}(X). \end{aligned}$$
(19)

\(\square\)
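The key commutation relation used in Eq. (19) can be verified numerically: applying the two operators in either order to a concrete packet and subtracting recovers the packet itself. The Gaussian test function below is an arbitrary smooth choice.

```python
import numpy as np

# Numerical check of Eq. (19): the commutator [p_hat, x_hat] acts as the
# identity, i.e. d/dx (x * psi) - x * d psi/dx = psi. The test packet is
# an arbitrary smooth function.
x = np.linspace(-5.0, 5.0, 100001)
h = x[1] - x[0]
psi = np.exp(-x**2 / 2.0)

commutator = np.gradient(x * psi, h) - x * np.gradient(psi, h)
interior = slice(2, -2)           # skip the one-sided stencils at the edges
assert np.allclose(commutator[interior], psi[interior], atol=1e-6)
```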

Note that for a trained neural network, \(\psi _{Y}(X)\) depends on the dataset and the structure of the network.

Equation (18) is a general result for neural networks (see the extension to generative networks in the supplementary material). For convenience, we compare the formulas used in quantum physics with those used in neural networks in Table 1 to facilitate understanding for readers.

The binary classifier

Data generation

To generate the dataset for the binary classification, we created a function that produces data points randomly distributed between −1 and 1. The labels are binary, with values less than 0 labeled as 0 and values greater than or equal to 0 labeled as 1. This simple binary classification task allows us to evaluate the performance and robustness of our neural network model.
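A minimal sketch of this data generation follows; the sample size and seed are assumed values.

```python
import numpy as np

# Minimal sketch of the dataset described above: points uniformly
# distributed in [-1, 1], labeled 0 if x < 0 and 1 if x >= 0.
# The sample size and seed are assumed values.
def make_dataset(n, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    y = (x >= 0).astype(np.int64)   # x < 0 -> class 0, x >= 0 -> class 1
    return x, y

x_train, y_train = make_dataset(1000)
```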

Neural network architecture

The neural network architecture consists of four fully connected layers with batch normalization and dropout layers to prevent overfitting. The network uses Leaky ReLU activation functions. The architecture is designed to balance complexity and performance, ensuring that the model can learn effectively from the data while avoiding overfitting.
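A forward-pass sketch of such a network is given below. The layer widths are assumptions (the paper does not state them), and batch normalization and dropout are omitted here to keep the sketch short.

```python
import numpy as np

# Forward-pass sketch of a small fully connected classifier with Leaky
# ReLU activations. The layer widths are assumptions, and batch
# normalization / dropout are omitted for brevity.
def leaky_relu(z, slope=0.01):
    return np.where(z > 0, z, slope * z)

def init_params(sizes, seed=0):
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.5, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(x, params):
    """Map a (batch, 1) input to (batch, 2) class logits."""
    h = x
    for i, (weight, bias) in enumerate(params):
        h = h @ weight + bias
        if i < len(params) - 1:       # no activation after the output layer
            h = leaky_relu(h)
    return h

params = init_params([1, 16, 16, 16, 2])   # four fully connected layers
logits = forward(np.array([[0.3], [-0.7]]), params)
```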

Training procedure

The network was trained using stochastic gradient descent (SGD) with a learning rate of 0.01. The training process involved 1000 epochs, and the model was evaluated every 10 epochs. During each epoch, the model parameters were updated to minimize the cross-entropy loss between the predicted and true labels. The training process was monitored by evaluating the model’s performance on a validation set at regular intervals.
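The training loop can be sketched as follows, with a single-weight logistic model standing in for the full network so the cross-entropy gradient stays one line; the data size and seed are assumptions, while the learning rate (0.01), epoch count (1000), and evaluation interval (10) follow the description above.

```python
import numpy as np

# Minimal sketch of the training loop: plain SGD with learning rate 0.01
# minimizing cross-entropy over 1000 epochs, evaluated every 10 epochs.
# A single-weight logistic model stands in for the full network.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 512)
y = (x >= 0).astype(float)

w = 0.1                                 # single trainable weight
lr, epochs = 0.01, 1000
for epoch in range(epochs):
    p = 1.0 / (1.0 + np.exp(-w * x))    # model prediction
    grad_w = np.mean((p - y) * x)       # d(cross-entropy)/dw
    w -= lr * grad_w
    if epoch % 10 == 0:                 # periodic evaluation point
        loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```

As training proceeds, w grows, sharpening the decision boundary and lowering the cross-entropy loss.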

Evaluation metrics

We evaluated the model using several metrics, including loss, accuracy, attacked accuracy (robustness to gradient-based attacks), and anti-attacked accuracy. The loss and accuracy were computed by comparing the model’s predictions with the true labels.

To evaluate the model’s resilience to attacks, we implemented a modified version of the Fast Gradient Sign Method (FGSM) attack (Eq. (5)). By applying both the plus and minus versions of the gradient perturbation without the “sign” function, we test the model’s ability to maintain performance under different types of adversarial inputs.

Quantum inspired metrics

At selected training epochs, we compute quantum-inspired metrics such as dx and dp to gain insights into the model’s behavior from a quantum perspective. These metrics are derived from the loss function and its gradients, providing a unique perspective on the model’s performance. The dx metric measures the spread of the input data, while the dp metric quantifies the model’s vulnerability under attacks.

The calculation of dx and dp in the binary example involves several steps:

  1.

    Normalization Constant: We first compute the normalization constant, \(\beta\), for the loss function over a specified range. This ensures that the loss function is properly scaled.

    $$\begin{aligned} \beta = \int _{a}^{b} l(f(x, \theta ), Y)^2 \, dx \end{aligned}$$
  2.

    Neural Packet: Using the normalization constant, we define the neural packet, \(\psi (x)\), which represents the normalized loss function.

    $$\begin{aligned} \psi (x) = \frac{l(f(x,\theta ),Y)}{\sqrt{\beta }}, \end{aligned}$$

    where the label \(Y\in \{0,1\}\).

  3.

    Calculation of dx: The dx metric is calculated as the standard deviation of the input (i.e., the input operator34 \(\hat{x}\)) weighted by the neural packet.

    $$\begin{aligned} dx = \sqrt{\int _{a}^{b} \psi (x)(\hat{x} - \langle \hat{x} \rangle )^2 \psi (x) \, dx} \end{aligned}$$

    where

    $$\begin{aligned} \hat{x} = x, \text { and } \langle \hat{x} \rangle = \int _{a}^{b} x \psi (x)^2 \, dx. \end{aligned}$$
  4.

    Calculation of dp: The dp metric is calculated as the standard deviation of the gradient (i.e., the attack operator34 \(\hat{p}\)) of the neural packet.

    $$\begin{aligned} dp = \sqrt{\int _{a}^{b} \psi (x) (\hat{p} - \langle \hat{p} \rangle )^2\psi (x) \, dx} = \sqrt{\int _{a}^{b} \psi (x) (\hat{p}^2 - 2\hat{p}\langle \hat{p} \rangle + \langle \hat{p} \rangle ^2)\psi (x) \, dx}, \end{aligned}$$

    where

    $$\begin{aligned} \langle \hat{p} \rangle = \int _{a}^{b} \frac{d\psi (x)}{dx} \psi (x) \, dx, \quad \hat{p}\psi (x)=\frac{\partial \psi (x)}{\partial x}, \text { and } \hat{p}^2\psi (x)=\frac{\partial ^2\psi (x)}{\partial x^2} \end{aligned}$$

These calculations provide a quantitative measure of the model’s accuracy and robustness, allowing us to assess the trade-off between these two aspects.
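The four steps above can be sketched numerically as follows. The bump-shaped surrogate loss is a made-up stand-in for a trained network's loss, centered inside [a, b] so that boundary terms vanish, and dp is evaluated in the quantum-mechanical reading \(\hat{p}=-i\,\partial /\partial x\), under which, for a real packet vanishing at the endpoints, it reduces to the L2 norm of \(d\psi /dx\).

```python
import numpy as np

# Numerical sketch of steps 1-4 for a surrogate loss on [a, b] = [-1, 0].
# The bump-shaped loss is an assumed stand-in for a trained network's loss;
# dp follows the reading p_hat = -i d/dx, under which, for a real packet
# vanishing at the endpoints, it reduces to the L2 norm of d psi / dx.
a, b, n = -1.0, 0.0, 200001
x = np.linspace(a, b, n)
h = x[1] - x[0]
loss = np.exp(-((x + 0.5) / 0.1) ** 2)  # surrogate loss, peaked mid-interval

# Step 1: normalization constant beta = integral of l^2 over [a, b]
beta = np.sum(loss**2) * h

# Step 2: neural packet psi = l / sqrt(beta), so the integral of psi^2 is 1
psi = loss / np.sqrt(beta)

# Step 3: dx, the standard deviation of x under the density psi^2
mean_x = np.sum(x * psi**2) * h
dx = np.sqrt(np.sum((x - mean_x) ** 2 * psi**2) * h)

# Step 4: dp, from the gradient of the packet
dp = np.sqrt(np.sum(np.gradient(psi, h) ** 2) * h)

assert np.isclose(np.sum(psi**2) * h, 1.0)
assert dx * dp >= 0.5 - 1e-3            # consistent with Eq. (3)/(13)
```

Because the surrogate here is a Gaussian bump, the product dx·dp sits at the lower bound; the packets of an actual trained network yield strictly larger products.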

Model evaluation

We evaluated the model every 10 epochs over a total of 1000 epochs. For each evaluation, we generated 100 uniformly distributed points in the range [-1, 1] to compute the loss curves. We also performed gradient-based and anti-gradient-based attacks with an epsilon value of 0.05 to assess the model’s resilience under attacks. The results were averaged over 50 independent runs to ensure robustness and reliability.