Introduction

The intersection between machine learning and quantum computing gives rise to the field of quantum machine learning, which has attracted considerable attention. With its exponentially large Hilbert space, a quantum computer promises to offer representational power for recognizing complex data patterns that are challenging to recognize classically1,2,3,4,5,6,7. Recently, the power of quantum learning models has been extensively studied in terms of the quantum neural network (QNN) and the quantum convolutional neural network (QCNN)8,9,10,11,12,13,14 and, with the rapid advances of quantum technologies across various physical platforms15,16,17,18,19, has been supported by a growing number of experiments20,21,22,23,24. A crucial step towards practical applications is to test the quantum learning models on large-scale datasets. However, handling extensive datasets presents significant challenges, necessitating the development of additional methods to enhance the performance of learning models. Ensemble learning is one such method, which leverages the power of joint decision and has achieved dramatic success in improving the accuracy of classical machine learning models over the last three decades25,26,27,28,29. It is thus natural to contemplate the implementation of ensemble strategies in a quantum setting.

To date, research on quantum ensemble methods can be broadly divided into two categories depending on whether the base learners are implemented in a quantum superposition or not. A coherent superposition of base learners promises to speed up the ensemble learning procedure30,31,32,33,34,35,36,37, which, however, is generally not feasible for near-term quantum devices, as it relies on subroutines that are only viable in fault-tolerant quantum computing regimes. On the other hand, classical ensemble methods have been explored to improve the performance of quantum classifiers from the perspective of error mitigation and resource savings38,39,40, which can be well suited for current noisy quantum devices. However, directly applying classical ensemble techniques to quantum systems overlooks the unique properties of quantum classifiers. This naturally raises the following question: Can we harness the strengths of quantum classifiers to enhance ensemble methods while maintaining compatibility with the limitations of near-term quantum hardware?

In this work, we take a step forward by presenting a quantum version of the adaptive boosting (AdaBoost) ensemble algorithm, dubbed AdaBoost.Q. Our method uses the statistical information generated by quantum classifiers to improve the efficiency of the AdaBoost algorithm by refining the attention mechanism during the adaptive training procedure. Using a superconducting quantum processor, we demonstrate the effectiveness of AdaBoost.Q in enhancing the performance of quantum models through two supervised learning experiments. In the first experiment, we perform a ten-class classification task on the MNIST handwritten digits dataset41 with a ten-qubit QNN classifier. By employing AdaBoost.Q, we improve the testing accuracy from 80% to above 86% over the full-size MNIST test dataset. Additionally, numerical simulations comparing AdaBoost.Q with the conventional AdaBoost.M1 algorithm42 (used in refs. 38,39) reveal the superior performance of our approach. The second experiment aims to classify three quantum phases of a spin chain model with a 15-qubit QCNN classifier. We show that the performance of the QCNN classifier can be significantly improved by AdaBoost.Q, with the testing accuracy enhanced from 77% to 100% over 1564 test samples. Our results provide a widely applicable method for pushing quantum machine learning towards practical applications.

Framework and experimental setup

It is a common human practice to aggregate and weigh different opinions to make a complex decision. The ensemble methodology extends this idea to the world of machine learning, aiming to construct a highly accurate classifier (referred to as a “strong” classifier) by combining multiple “weak” classifiers, each of which may only slightly outperform a random guess. The AdaBoost algorithm is among the most prominent ensemble methods to generate a strong classifier25,26,28. It works by training the weak classifiers sequentially on the same training set. At the core of the AdaBoost algorithm is an attention mechanism, where more attention is paid to data points that were previously misclassified when training the subsequent classifier. The level of attention paid is determined by a sample weight that is assigned to each training point. After the training procedure, each weak classifier is also assigned a weight for scoring its importance in forming the strong classifier. In the AdaBoost.Q algorithm, we extract these weights based on the probabilistic nature of the quantum classifiers.

We consider quantum classifiers that are built with parameterized quantum circuits. For a supervised K-class classification task, the training set consists of data-label pairs, \({\{{x}_{i},{y}_{i}\}}_{i = 1}^{N}\), where \({x}_{i}\) represents the data sample, \({y}_{i}\) indexes the corresponding class label, and N is the training size. To classify the data samples, we select m qubits, with \(m\ge \lceil {\log }_{2}K\rceil\), from the quantum classifier and measure them in the computational basis, which can be described by a set of basis projectors \({\{{\Pi }_{j}\}}_{j = 0}^{{2}^{m}-1}\) known as a projection-valued measure (PVM). We divide the PVM equally into K groups, with the kth group containing projectors indexed from \(k\lfloor {2}^{m}/K\rfloor\) to \((k+1)\lfloor {2}^{m}/K\rfloor -1\), while discarding the last \({2}^{m}-K\lfloor {2}^{m}/K\rfloor\) projectors. The data sample is classified as the kth class if the measured state is located in the kth group.

According to Born’s rule, for an input sample xi, the probability of measuring the basis state in the kth group is \({P}_{k}({x}_{i},{\boldsymbol{\theta }})=\mathop{\sum }\nolimits_{j = k\lfloor {2}^{m}/K\rfloor }^{(k+1)\lfloor {2}^{m}/K\rfloor -1}{\rm{Tr}}(\rho ({x}_{i},{\boldsymbol{\theta }}){\Pi }_{j})\), where ρ(xi, θ) denotes the reduced density matrix of the m measured qubits of the quantum classifier parameterized by θ. The corresponding predicted label of the quantum classifier is obtained as \({\tilde{y}}_{i}=\mathop{{\rm{arg}}\,{\rm{max}}}\limits_{k}{P}_{k}({x}_{i},{\boldsymbol{\theta }})\). Note that the probability \(P({x}_{i},{\boldsymbol{\theta }})=\mathop{\max }\limits_{k}{P}_{k}({x}_{i},{\boldsymbol{\theta }})\) naturally characterizes the confidence of the prediction.
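The grouping and prediction rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, classes are assumed to be indexed from 0 to K − 1, and the list of basis-state probabilities is assumed to have already been estimated from measurements of the m readout qubits.

```python
def class_probabilities(basis_probs, num_classes):
    """Sum computational-basis probabilities into K equal groups.

    basis_probs: list of length 2**m, one probability per basis state.
    Group k covers indices k*g .. (k+1)*g - 1 with g = floor(2**m / K);
    leftover basis states at the end of the list are discarded.
    """
    group_size = len(basis_probs) // num_classes
    return [
        sum(basis_probs[k * group_size:(k + 1) * group_size])
        for k in range(num_classes)
    ]


def predict(basis_probs, num_classes):
    """Return (predicted label, confidence), i.e. argmax_k P_k and max_k P_k."""
    p = class_probabilities(basis_probs, num_classes)
    label = max(range(num_classes), key=lambda k: p[k])
    return label, p[label]


# Toy example (made-up numbers): m = 2 qubits, K = 3 classes.
# floor(4 / 3) = 1, so each class owns one basis state and the last is dropped.
probs = [0.1, 0.6, 0.2, 0.1]
label, confidence = predict(probs, 3)
```

Because the discarded basis states carry some probability, the class probabilities need not sum to one; the sketch leaves them unnormalized, as the formula in the text does.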

Using the probability information, we establish a refined criterion for calculating the weights of the data samples and classifiers following the spirit of the real AdaBoost algorithm26,43. Specifically, we initialize all the sample weights to be 1/N when training the first classifier. During the iterative training of the subsequent classifiers, the sample weights are updated according to

$${w}_{l+1,i}=\frac{{w}_{l,i}}{{Z}_{l+1}}\exp [P({x}_{i},{{\boldsymbol{\theta }}}_{l}^{* })\cdot (1-2{\delta }_{{\tilde{y}}_{l,i}{y}_{i}})],$$
(1)

where i indexes the samples, l indexes the weak classifiers, \({{\boldsymbol{\theta }}}_{l}^{* }\) is the optimal parameter for the lth classifier, δ denotes the Kronecker delta, and \({Z}_{l+1}=\mathop{\sum }\nolimits_{i = 1}^{N}{w}_{l,i}\,\exp [P({x}_{i},{{\boldsymbol{\theta }}}_{l}^{* })\cdot (1-2{\delta }_{{\tilde{y}}_{l,i}{y}_{i}})]\) is the normalizing factor. The loss function of the lth classifier is given by \({{\mathcal{L}}}_{l}=\mathop{\sum }\limits_{b=1}^{B}{{\mathcal{L}}}_{l,b}\), with B denoting the batch size and

$${{\mathcal{L}}}_{l,b}=-{w}_{l,b}\,\,\text{ln}\,\,[{P}_{{y}_{b}}({x}_{b},{\boldsymbol{\theta }})].$$
(2)
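As a hedged illustration of Eqs. (1) and (2), the sketch below updates the sample weights from the confidences of the previous classifier and evaluates the weighted loss on a batch. The function names and the toy numbers are assumptions made for this example only.

```python
import math


def update_sample_weights(weights, confidences, correct):
    """Eq. (1): w_{l+1,i} = w_{l,i} * exp(P_i * (1 - 2*delta_i)) / Z_{l+1}.

    weights:     current sample weights w_{l,i}
    confidences: P(x_i, theta_l*) = max_k P_k for each sample
    correct:     True where classifier l predicted y_i correctly (delta_i = 1)
    """
    raw = [
        w * math.exp(p * (-1.0 if ok else 1.0))
        for w, p, ok in zip(weights, confidences, correct)
    ]
    z = sum(raw)  # normalizing factor Z_{l+1}
    return [r / z for r in raw]


def weighted_loss(batch_weights, true_class_probs):
    """Eq. (2) summed over a batch: -sum_b w_b * ln P_{y_b}(x_b, theta)."""
    return -sum(w * math.log(p) for w, p in zip(batch_weights, true_class_probs))


# Three samples with uniform initial weights 1/N (toy values).
w0 = [1 / 3] * 3
conf = [0.9, 0.9, 0.5]
ok = [True, False, False]
w1 = update_sample_weights(w0, conf, ok)
```

In this toy case the confidently misclassified sample receives the largest new weight, the hesitantly misclassified one an intermediate weight, and the confidently correct one the smallest, which is the graded attention that distinguishes this rule from a uniform reweighting of all errors.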

After training the lth classifier, we calculate its weight as

$${\alpha }_{l}=\ln \frac{{c}_{l}{{\mathcal{P}}}_{l}^{\,\text{true}\,}}{{{\mathcal{P}}}_{l}^{\,\text{false}\,}},$$
(3)

where \({{\mathcal{P}}}_{l}^{\,\text{true}\,}={\sum }_{i}{w}_{l,i}P({x}_{i},{{\boldsymbol{\theta }}}_{l}^{* }){\delta }_{{\tilde{y}}_{l,i}{y}_{i}}\) and \({{\mathcal{P}}}_{l}^{\,\text{false}\,}={\sum }_{i}{w}_{l,i}P({x}_{i},{{\boldsymbol{\theta }}}_{l}^{* })(1-{\delta }_{{\tilde{y}}_{l,i}{y}_{i}})\). The additional parameter cl, which takes the value of unity by default, can be slightly tuned to optimize the training accuracy in practice. The ensemble classifier is constructed by combining all the trained weak classifiers (also referred to as base classifiers), which classifies xi according to

$${\tilde{y}}_{i}=\mathop{{\rm{arg}}\,{\rm{max}}}\limits_{k}\sum _{l}{\alpha }_{l}{P}_{k}({x}_{i},{{\boldsymbol{\theta }}}_{l}^{* }).$$
(4)
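Eqs. (3) and (4) can be sketched in the same spirit. The helper names below are hypothetical, and the example values are invented; note that the sketch does not guard against a classifier with no misclassified samples, for which the denominator in Eq. (3) vanishes.

```python
import math


def classifier_weight(weights, confidences, correct, c=1.0):
    """Eq. (3): alpha_l = ln(c_l * P_true / P_false)."""
    p_true = sum(
        w * p for w, p, ok in zip(weights, confidences, correct) if ok
    )
    p_false = sum(
        w * p for w, p, ok in zip(weights, confidences, correct) if not ok
    )
    return math.log(c * p_true / p_false)


def ensemble_predict(alphas, class_probs_per_classifier):
    """Eq. (4): argmax_k sum_l alpha_l * P_k(x, theta_l*)."""
    num_classes = len(class_probs_per_classifier[0])
    scores = [
        sum(a * probs[k] for a, probs in zip(alphas, class_probs_per_classifier))
        for k in range(num_classes)
    ]
    return max(range(num_classes), key=lambda k: scores[k])


# Toy classifier: equal weights and confidences, three of four samples correct.
alpha = classifier_weight([0.25] * 4, [0.8] * 4, [True, True, True, False])

# Two base classifiers disagree on a two-class sample; the heavier one wins.
vote = ensemble_predict([1.0, 2.0], [[0.6, 0.4], [0.3, 0.7]])
```

With equal weights and confidences, Eq. (3) reduces to the logarithm of the ratio of correct to incorrect samples, here ln 3.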

The workflow of AdaBoost.Q is illustrated in Fig. 1 and the pseudocode is summarized in Methods. In the conventional AdaBoost.M1 algorithm (see Supplementary Sec. I), sample weights are determined exclusively by the correctness of the classification results—specifically, all misclassified samples have their weights increased uniformly. By incorporating classification probabilities as a measure of confidence into the weight calculation, our algorithm enables more precise reweighting of data samples. This refinement allows for a more nuanced adjustment of sample weights, which can enhance the overall performance of the ensemble model.

Fig. 1: Schematic diagram of AdaBoost.Q.

The algorithm is designed to generate a strong quantum classifier (right) by combining multiple weak quantum classifiers (middle). Each weak quantum classifier takes the same training data as an input, and it outputs the classification result of each data sample along with a probability P, which characterizes the confidence of the prediction. The weak quantum classifiers are trained iteratively, using reweighted versions of the training set, with the weights depending on the correctness of the predictions and finely tuned by the output probabilities of the previous classifier. This allows the subsequent quantum classifiers to focus on samples that were not well classified previously. The sample weights for the first classifier are assigned evenly among the training set.

We experimentally demonstrate the effectiveness of our approach on a fully programmable superconducting quantum processor17. The qubits on the processor are of the frequency-tunable transmon type, which are arranged in an 11 × 11 square lattice, with the neighboring qubits connected by tunable couplers. For the ten-class classification of the MNIST dataset, we select 20 qubits to construct two copies of a ten-qubit QNN classifier, which run in parallel to accelerate training. For the quantum data classification task, we build a QCNN classifier with a carefully designed one-dimensional (1D) chain consisting of 15 qubits. All the classifiers are essentially variational quantum circuits compiled into the native gate sets, i.e., the parameterized single-qubit gates and two-qubit CZ gates between neighboring qubits. The median Pauli error rates of the parallel single- and two-qubit gates, characterized with the simultaneous cross-entropy benchmarking technique, are around \(5\times {10}^{-4}\) and \(6\times {10}^{-3}\), respectively. See Supplementary Sec. IIA for details on device and gate performances.

Ensemble learning with QNN

As a first demonstration, we apply our method to improve the performance of QNN-based classifiers, which have been intensively studied in recent years10,44,45,46. We focus on a ten-class classification task with the MNIST handwritten digit dataset, which is widely used in benchmarking machine learning models. This task had been challenging for quantum hardware, and it was not until recently that an experimental testing accuracy of around 62% over 500 test samples was reported23. We construct the QNN with a shallow circuit consisting of ten qubits, which contains three layers of single-qubit gates to encode 30 trainable parameters and two layers of CNOT gates to entangle all qubits (see Supplementary Section IIC for details about the circuit design). The training dataset contains 3600 28 × 28-pixel images (360 for each digit) selected from the MNIST dataset. To encode the classical data, we adapt the encoding scheme from the end-to-end learning framework47. Specifically, we first vectorize the image data and then transform it to an array of rotational angles x with a transform matrix W, following which we encode them into the single-qubit rotations of the QNN circuit, alternating with the trainable parameters θ (Fig. 2a). Both W and θ are trained simultaneously to minimize the loss function during the learning procedure. To reduce the runtime, we further parallelize the quantum computation by constructing and running two copies of the QNN simultaneously on the processor.

Fig. 2: Ten-class classification of the MNIST dataset with QNN.

a Experimental setup. The training set is composed of 3600 MNIST handwritten digit images, each with a size of 28 × 28 pixels. An image is transformed into a 30-dimensional vector x using a trainable matrix W for further quantum encoding. At each training step, we select a batch of 30 images to train the classifier, which is split evenly into two sub-batches, denoted here by x and \({\boldsymbol{x}}^{\prime}\), and fed to two copies of the QNN in parallel. Each copy is constructed with a ten-qubit chain selected from the quantum processor. b The two QNN circuits, each of which is composed of three layers of single-qubit gates followed by two layers of CNOT gates. The rotation angles of single-qubit gates are used to encode the data vector and trainable parameters θ. Here Rx and Rz denote the single-qubit rotation gates around the x- and z-axis, respectively. c Loss function (top) and accuracy for the test and training sets (bottom) at each training step. The training is carried out for 7 epochs, with each epoch consisting of 120 training steps as separated by the dashed lines. After each epoch, the QNN is monitored with 1000 images randomly selected from the MNIST test set (orange circle dots). At the end of the training, we measure the test accuracy over the whole MNIST test set containing 10,000 images (red square dot).

The training procedure of a single QNN classifier is exemplified in Fig. 2c, where the loss function converges to about 0.007 after 840 training steps. At the end of the training procedure, we input all 10,000 samples of the MNIST test dataset to the trained classifier, obtaining an overall testing accuracy around 80.5%, which is consistent with the training accuracy, verifying the generalizability of the trained QNN model. See Methods for details about the training procedure.

With the performance of the base QNN classifier established, we proceed to AdaBoost.Q. In Fig. 3a, we plot the testing accuracies of the ensemble QNN classifiers during the implementation of AdaBoost.Q. A notable increase of the accuracy is observed, from 80.5% to 86.7% over the first two iterations, after which the accuracy saturates. The observation is consistent with the weight calculated for each base classifier, which dramatically drops to near zero for the fourth base classifier (Fig. 3b). In Fig. 3c, we compare the classification results with and without using AdaBoost.Q, observing an improvement of testing accuracy for the ensemble classifier across almost all digits. To assess the intrinsic performance of AdaBoost.Q, we conduct numerical simulation to compare our method with the conventional AdaBoost.M1 algorithm42. The ensemble classifiers are constructed using four base classifiers, each employing the same QNN circuit structure as utilized in the experiment. We record the accuracy improvements of the ensemble classifier over a single classifier with randomly initialized parameters across 500 trials. As depicted in Fig. 3d, our method achieves an average accuracy improvement of 5%, aligning with the experimental results and surpassing AdaBoost.M1, which attains only a 1.6% average accuracy improvement. The results highlight the efficiency gains achieved through the refined attention mechanism of AdaBoost.Q. See Supplementary Section I for details of the numerical simulation and the AdaBoost.M1 algorithm.

Fig. 3: Experimental implementation of AdaBoost.Q.

a Testing accuracy of the ensemble classifiers composed of L base classifiers, with L = 1 to 4. The error bars are obtained as the standard deviations from bootstrapping (resampled 1000 times from the original data sets). The accuracy is estimated based on the whole MNIST test set. b The experimental weight \({\alpha }_{l}\) and adjustment parameter \({c}_{l}\) of each base classifier l. c Comparison of the classification performance on the whole MNIST test set with single and ensemble classifiers, where the accuracy of classifying each digit is displayed. d Comparison of the performance between the conventional AdaBoost.M1 and AdaBoost.Q with numerical simulation considering the same task, where both ensemble classifiers are composed of four base classifiers. The simulation is performed 500 times each. The histogram of the accuracy improvement, \({A}_{e}-{A}_{s}\), with \({A}_{e}\) and \({A}_{s}\) being the testing accuracies of the ensemble and single classifiers, respectively, is plotted, with the average values denoted by the dashed lines.
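The bootstrapped error bars mentioned for panel a can be illustrated with a short sketch. It assumes the simplest variant, in which the per-sample correctness flags of the test set are resampled with replacement and the standard deviation of the resulting accuracies is reported; the exact resampling procedure used in the experiment is not spelled out here, and the data below are synthetic.

```python
import random
import statistics


def bootstrap_accuracy_std(correct_flags, num_resamples=1000, seed=0):
    """Standard deviation of the accuracy over bootstrap resamples.

    correct_flags: list of 0/1 values, 1 where a test sample was
    classified correctly. Each resample draws len(correct_flags)
    flags with replacement.
    """
    rng = random.Random(seed)
    n = len(correct_flags)
    accuracies = []
    for _ in range(num_resamples):
        resample = rng.choices(correct_flags, k=n)
        accuracies.append(sum(resample) / n)
    return statistics.stdev(accuracies)


# Synthetic flags: 800 correct out of 1000 test samples.
flags = [1] * 800 + [0] * 200
std = bootstrap_accuracy_std(flags)
```

For a binomial outcome the result should be close to sqrt(p(1 − p)/n), about 0.013 for these synthetic numbers, so the error bar shrinks as the test set grows.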

Ensemble learning with QCNN

To demonstrate the versatility of AdaBoost.Q, we further apply it to QCNN-based classifiers. The task is to classify the ground states of a cluster-Ising Hamiltonian48, which can either belong to a symmetry-protected topological (SPT) phase, a paramagnetic (PM) phase, or an Ising phase (See Methods). Previous studies12,21 have revealed the advantages of QCNN over the direct measurement of the string order parameter in identifying the SPT phase. Here, we consider a more complex task, i.e., the classification of all three phases.

The QCNN circuit is implemented on a 1D chain of 15 qubits on our device. QCNN can achieve an exponential reduction of trainable parameters compared with the generic QNN circuit12, at the cost of involving long-range two-qubit gates along the chain. We circumvent this challenge by carefully designing the topology of the selected 1D chain, such that all necessary two-qubit gates can be directly implemented, as shown in Fig. 4a. A QCNN classifier is typically composed of convolutional, pooling, and fully connected layers. In our experiment, we construct the convolutional layer with two layers of convolutional kernels applied in a translationally invariant way. Each convolutional kernel is a parameterized two-qubit U(4) unitary49. A pooling layer applies parallel controlled-X (CNOT) gates and then discards the control qubits. After performing the convolutional and pooling layers for three rounds, we implement a CZ gate on the remaining two qubits as the fully connected layer. The two qubits are then read out to obtain the classification results.

Fig. 4: Ensemble learning of quantum data with QCNN.

a Structure of the QCNN circuit, which consists of three alternating convolutional (Ci) and pooling (Pi) layers, followed by a fully-connected (FC) layer. Each convolutional layer contains two layers of convolutional kernels, each of which consists of seven single-qubit gates with 15 variational parameters and three two-qubit CNOT gates. We apply the convolutional kernels in a translationally invariant way, resulting in a total of 45 variational parameters. The pooling layer applies a layer of two-qubit CNOT gates and then passes the target qubits to the subsequent layer, thus reducing the qubit number by half. After three rounds of pooling, we are left with two qubits, on which we apply a CZ gate as a fully connected layer. The two qubits are then measured in the computational basis for the classification task. b The experimentally measured string order parameters 〈S〉 for the quantum states in the training and test sets, respectively. Each quantum state is the ground state of a cluster-Ising Hamiltonian with 15 spins, which has two parameters h1 and h2 (see Methods). The states can be approximately prepared with variational quantum circuits and directly input into the QCNN classifier. The test set is generated in a parameter regime larger than that for the training set, and we highlight their difference with the red box. c Histogram of the sample weights for training the second QCNN base classifiers. The weights for the quantum states in different phases are plotted separately. The dashed line represents the initial weight that is assigned to all data samples when training the first base classifier. d Testing accuracy of the first (blue) and the second (yellow) base classifier at each epoch. e Classification results of the first (top), second (middle), and ensemble (bottom) classifiers on the test set. The classifier weights are 1.78 and 3.26, respectively.

We use variational quantum circuits to prepare approximate ground states for constructing the training and test datasets. The quantum data can be directly input into the QCNN circuit as initial states. Within the reach of variational quantum circuits, our training (test) dataset consists of 2204 (1564) points spanning the phase diagram of the cluster-Ising model, with the experimentally measured string order parameters shown in Fig. 4b. To examine the generalization of the QCNN classifier, we extend the parameter regime when generating the test samples (red box in Fig. 4b). See Methods and Supplementary Section IIB for more details about the generation of training and test sets.

In Fig. 4c–e, we show the performance of AdaBoost.Q. For a single QCNN classifier, we find that the testing accuracy for the three-class classification task is limited to around 80%. A detailed analysis of the classification results reveals that the states in the SPT phase are most likely to be misclassified (Fig. 4e, top panel). By tuning the weights of the training states (Fig. 4c), the second classifier achieves a remarkable increase in the testing accuracy, reaching a value of 93%, with most of the misclassified samples located in the Ising phase (Fig. 4e, middle panel). Finally, by combining only two base classifiers, the ensemble classifier achieves a testing accuracy of 100% on the whole test set (Fig. 4e, bottom panel), demonstrating the high efficiency of our algorithm.

Conclusion and discussion

We utilize the statistical property of quantum classifiers to enhance the performance of the AdaBoost algorithm by refining its core weight-update rule. We validate the efficacy of our approach through a prototypical ten-class classification task on the MNIST handwritten digit dataset, observing a significant improvement in testing accuracy. To the best of our knowledge, this represents the first experimental benchmark of a quantum classifier on the full-size MNIST test set, although the achieved testing accuracy of 86.7% remains modest compared to the classical benchmark of approximately 99%. To attain competitive accuracy, further advancements in quantum learning models—particularly in terms of system size and circuit structure—are necessary. In this context, AdaBoost.Q can alleviate the accuracy requirements for individual quantum classifiers. The resource efficiency of AdaBoost.Q compared with improving the performance of a single classifier is discussed in Supplementary Section IID.

On the other hand, quantum learning models operating on quantum data present a more promising route towards achieving a quantum advantage8,9,10,11,12,13. Our method also demonstrates its utility in this direction, as evidenced by the results of the second task. While our results showcase the promise of this approach, we note that a comprehensive exploration of quantum advantage in quantum machine learning constitutes a distinct and significant research direction. A definitive demonstration of such advantage will likely require further advances in both algorithmic design and hardware capabilities, building upon the foundational insights provided by studies such as those in refs. 6,7,8,9,10,20.

A caveat regarding the future use of AdaBoost.Q is its susceptibility to overfitting, particularly in the presence of noisy data, which is commonly observed within the AdaBoost algorithm family due to their iterative nature. To mitigate such risk, strategies such as early stopping50, learning rate adjustment51, and data cleaning52 can be applied. For example, one can monitor the performance on a validation set and stop training when the validation error starts to increase, even if the training error continues to decrease. This prevents the model from over-adapting to noise. Our approach is broadly compatible with various quantum learning models and can be implemented across different physical platforms, both in the current noisy intermediate-scale quantum era and the coming fault-tolerant quantum era, which we anticipate would benefit the future exploration of quantum learning advantages.

Methods

Data generation

The training and test sets used for the ten-class classification task originate from the MNIST handwritten digits dataset. To accelerate the training procedure, we use the k-means algorithm53 to select 3600 representative images from the MNIST training dataset to form the training set. Specifically, we use the KMeans function of the scikit-learn package54 to divide the images of each digit into 360 clusters. We select one image from each cluster, giving 360 images for each of the ten digits and 3600 images in total. During the clustering process of KMeans, we run 180 random centroid initializations and select the clustering with the best inertia. The test set is constructed with all 10,000 images from the MNIST test dataset. During the training procedure, we also randomly select 1000 images, with 100 for each digit, from the test set to monitor the testing accuracy.

For the quantum phase recognition task, the quantum dataset consists of ground states of the cluster-Ising Hamiltonian:

$$H=-\mathop{\sum }\limits_{i=1}^{N-2}{Z}_{i}{X}_{i+1}{Z}_{i+2}-{h}_{1}\mathop{\sum }\limits_{i=1}^{N}{X}_{i}-{h}_{2}\mathop{\sum }\limits_{i=1}^{N-1}{X}_{i}{X}_{i+1},$$
(5)

where N = 15 in our system. \(\left\{{X}_{i},{Z}_{i}\right\}\) are the Pauli operators acting on the ith spin. Depending on the choice of the two model parameters {h1, h2}, the ground states can belong to either the SPT phase, the PM phase, or the Ising phase. The SPT phase can be distinguished by measuring the string order parameter:

$$S={Z}_{1}{X}_{2}{X}_{4}...{X}_{12}{X}_{14}{Z}_{15},$$
(6)

as shown in Fig. 4b. To construct the training (test) set, we select 2204 (1564) ground states in the parameter regimes of \({h}_{1}\in [0,1)\) (\({h}_{1}\in [0,1.2]\)) and \({h}_{2}\in (-2.3,1.6]\), respectively. For a given \(\{{h}_{1},{h}_{2}\}\), we use a variational quantum circuit (VQC) to prepare the corresponding ground state. The circuit is trained on a classical computer before being deployed on the quantum processor (see Supplementary Section IIB for details on the training procedure).
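To make the index ranges in Eqs. (5) and (6) concrete, the sketch below enumerates the Hamiltonian as a list of (coefficient, Pauli string) terms and writes out the string order operator. The representation and the function names are choices made for this illustration only, the values of h1 and h2 are arbitrary, and the generalization of Eq. (6) to other odd chain lengths is an assumption.

```python
def pauli_string(n, ops):
    """Return an n-character Pauli string; `ops` maps 1-based sites to 'X' or 'Z'."""
    chars = ["I"] * n
    for site, op in ops.items():
        chars[site - 1] = op
    return "".join(chars)


def cluster_ising_terms(n, h1, h2):
    """List the (coefficient, Pauli string) terms of Eq. (5)."""
    terms = []
    for i in range(1, n - 1):  # i = 1 .. N-2: cluster terms Z X Z
        terms.append((-1.0, pauli_string(n, {i: "Z", i + 1: "X", i + 2: "Z"})))
    for i in range(1, n + 1):  # i = 1 .. N: transverse field
        terms.append((-h1, pauli_string(n, {i: "X"})))
    for i in range(1, n):  # i = 1 .. N-1: nearest-neighbor X X coupling
        terms.append((-h2, pauli_string(n, {i: "X", i + 1: "X"})))
    return terms


def string_order_operator(n):
    """Eq. (6) for odd n: Z on both ends, X on every even site in between."""
    ops = {1: "Z", n: "Z"}
    for site in range(2, n, 2):
        ops[site] = "X"
    return pauli_string(n, ops)


terms = cluster_ising_terms(15, h1=0.5, h2=0.2)
s_op = string_order_operator(15)
```

For N = 15 this gives 13 cluster terms, 15 field terms, and 14 coupling terms, and the string order operator acts nontrivially on nine of the fifteen sites.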

Algorithms for quantum ensemble learning

The AdaBoost.Q algorithm proposed in this work is shown in Algorithm 1.

Algorithm 1

AdaBoost.Q

Training the QNN classifier

We train the QNN classifier in epochs. At each epoch, we first shuffle the training set and then divide it evenly into 120 batches, with each batch containing 30 images. We iterate through the 120 batches in each epoch and train the QNN classifier for 7 epochs. To encode an image sample \({x}_{i}\), we first flatten it into a 784-dimensional vector \({v}_{i}\), which is then divided by \({255}^{2}\) and transformed into a 30-dimensional vector \({x}_{i}\) by a trainable matrix W. The data vector \({x}_{i}\) is encoded together with the trainable parameter θ into the QNN circuit, as shown in Fig. 2b. We initialize θ and W by randomly sampling each of their elements from the Gaussian distributions \({\mathcal{N}}\left(\pi ,{\left(\frac{\pi }{3}\right)}^{2}\right)\) and \({\mathcal{N}}\left(\frac{\pi }{60},{\left(\frac{\pi }{180}\right)}^{2}\right)\), respectively. W and θ are trained by using the gradient-based Adam optimizer55, with the gradient experimentally measured by using the parameter-shift rule56. The updating rules for W and θ are detailed in Algorithm 2.
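The parameter-shift rule mentioned above can be illustrated with a minimal sketch. It assumes the common two-term form with shifts of ±π/2, which is exact for expectation values that depend on a parameter through a single-qubit rotation gate such as Rx or Rz. The function names are hypothetical, and a closed-form toy function stands in for a measured expectation value; obtaining the gradient of the loss in Eq. (2) would additionally require the chain rule through the logarithm.

```python
import math


def parameter_shift_gradient(expectation, theta, index):
    """Two-term parameter-shift estimate of d<f>/d theta[index].

    expectation: callable mapping a parameter list to an expectation value
                 (on hardware this would be estimated from measurements).
    """
    plus = list(theta)
    minus = list(theta)
    plus[index] += math.pi / 2
    minus[index] -= math.pi / 2
    return 0.5 * (expectation(plus) - expectation(minus))


def toy_expectation(theta):
    """Stand-in for a circuit: <Z> after Rx(theta) on |0> equals cos(theta)."""
    return math.cos(theta[0]) * math.cos(theta[1])


theta = [0.3, 1.1]
grad0 = parameter_shift_gradient(toy_expectation, theta, 0)
```

Unlike a finite-difference estimate, the shifts here are large, so the two evaluations differ by an amount that is easier to resolve above shot noise.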

Algorithm 2

Updating rules for W and θ.

Training the QCNN classifier

The training procedure of the QCNN classifier is similar to that of the QNN classifier. Each QCNN classifier is trained for 5 epochs. At each epoch, the training set is shuffled and divided into 76 batches, with each batch containing 29 quantum states. The quantum states can be directly input into the QCNN classifier. The trainable parameter θ is initialized by randomly generating each of its elements in the range [−π, π). The gradient of the loss function with respect to θ is evaluated by using the finite difference method, with the updating rules detailed in Algorithm 3. The robustness of the finite difference method used here is illustrated in Supplementary Section IIE.
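A finite-difference gradient can be sketched as follows. The central-difference form and the step size are assumptions made for this illustration; the text does not specify which variant or step was used in the experiment, and the quadratic toy loss merely stands in for the measured loss.

```python
def finite_difference_gradient(loss, theta, step=1e-3):
    """Central-difference estimate of d loss / d theta[i] for every i.

    loss: callable mapping a parameter list to a scalar loss value.
    step: shift applied to one parameter at a time (assumed value).
    """
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += step
        minus[i] -= step
        grad.append((loss(plus) - loss(minus)) / (2 * step))
    return grad


def toy_loss(theta):
    """Stand-in loss: sum of squares, whose exact gradient is 2 * theta."""
    return sum(t * t for t in theta)


grad = finite_difference_gradient(toy_loss, [0.5, -1.0])
```

Each gradient component costs two loss evaluations, so a full gradient needs twice as many evaluations as there are parameters; the batch arithmetic quoted above is also consistent, since 76 batches of 29 states cover all 2204 training states.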

Algorithm 3

Updating rules for θ