Introduction

Quantum computing has gained significant attention because, by exploiting the laws of quantum mechanics, it has the potential to solve complex problems that classical computers cannot handle1. Over the past thirty years, researchers have made substantial progress in both the theory and the experimental realization of quantum computing. Quantum algorithms have been run on both quantum simulators and real quantum computers, marking a shift from purely theoretical research toward practical applications.

The combination of quantum computing and machine learning is one of the most promising applications of quantum technology2. Several classical machine learning algorithms already have quantum-accelerated counterparts, such as quantum support vector machines3,4,5, quantum k-means clustering6,7, and quantum neural networks8,9.

The complexity of most machine learning algorithms grows with the number of data points, so large data sets limit their efficiency. Human cognition, by contrast, works large-scope first: people tend to glance at the broad picture before focusing on finer details10. Building on this cognitive characteristic, Zadeh proposed granular computing11. Later, Xia et al. proposed the “Granular-ball Computing” model, which replaces individual samples in a data set with “granular-balls” that represent the overall characteristics of the data12. They also proposed various granular-ball generation methods to enhance the efficiency, accuracy, and robustness of classification algorithms13,14,15. However, in these generation algorithms, each sample point must repeatedly compute its distance to all other sample points to determine which granular-ball center it belongs to. The time complexity is therefore proportional to the number of sample points, and large data sets consume substantial resources. Unlike previous work on granular-ball generation, this paper fully exploits quantum parallelism and designs two quantum methods for generating granular-balls.

Granular-balls have also been applied to support vector machines (SVMs), rough sets, and reinforcement learning, significantly improving the efficiency of existing approaches16,17,18,19,20,21,22. We therefore expect that quantum granular-balls could likewise accelerate these machine learning algorithms.

In this paper, our contributions are as follows:

  1. We make the first attempt to use quantum computing to accelerate the generation of granular-balls.

  2. We apply the generated granular-balls to the quantum k-nearest neighbors algorithm (QkNN), demonstrating the effectiveness of the quantum granular-ball generation method.

  3. We analyze the time complexity, showing that our granular-ball generation methods are more efficient than existing ones and that our quantum k-nearest neighbors algorithm based on granular-balls (QGBkNN) is more efficient than the kNN and QkNN algorithms.

Preliminaries

Foundations for quantum computing

QRAM

QRAM (Quantum Random Access Memory) is a quantum memory that enables random access by exploiting quantum superposition and entanglement23. QRAM entangles each address with the corresponding data value, as shown in Eq. 1. The addresses are held in the address register in the superposition \(\sum \limits _{j} \psi _j{\vert j \rangle }_a\) on the left-hand side of Eq. 1, where \({\vert {j}\rangle }_a\) denotes the jth memory cell and \(\psi _j\) is its probability amplitude. After the QRAM call, the data values are loaded into the data register: \({\vert {D_j}\rangle }_d\) holds the jth data point, whose value is \(D_j\).

$$\sum\limits_{j} \psi_{j} |j\rangle_{a} \xrightarrow{\text{QRAM}} \sum\limits_{j} \psi_{j} |j\rangle_{a} |D_{j}\rangle_{d}$$
(1)

Two architectures have been proposed to realize this encoding: the fanout architecture and the bucket-brigade architecture. The fanout architecture is resource-efficient in classical memories but relatively expensive to implement in quantum hardware24. The bucket-brigade architecture is better suited to large memories: for N memory cells, it reduces the number of routing elements that must be activated per memory access from O(N) to O(log N). For this reason, the bucket-brigade architecture is the most commonly used QRAM architecture23.

A quantum circuit for the bucket-brigade architecture is given in25; Fig. 1 shows the circuit for storing four data points. From top to bottom, \(\vert {a_1a_0}\rangle\) are the address qubits, followed by four routing qubits initialized to \(\vert {0010}\rangle\), four memory qubits \(\vert {m_3m_2m_1m_0}\rangle\) that store the four data points, and finally a bus qubit initialized to \(\vert {0}\rangle\). The circuit operates in three stages. First, CNOT and Toffoli gates map the address qubits one-to-one onto the routing qubits. Then, Toffoli gates transfer the value of the memory qubits to the bus qubit, entangling the address qubits with the bus qubit. Finally, the routing qubits are uncomputed, decoupling them from the address qubits.
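The three stages can be mimicked classically on each basis term of the address superposition. The sketch below is a minimal illustration, not the circuit of Fig. 1: it assumes one-bit memory cells and models the routing register as a one-hot decoding of the address that starts from all zeros (a simplification of the \(\vert 0010\rangle\) initialization above). The function name `qram_bucket_brigade` is hypothetical.

```python
def qram_bucket_brigade(address_amps, memory):
    """Classical emulation of a bucket-brigade QRAM call, term by term.

    address_amps: dict {address j: amplitude psi_j} of the address register
    memory: list of one-bit memory cells m_0 ... m_{N-1}
    Returns {(j, D_j): psi_j}, i.e. the entangled state of Eq. 1.
    Assumption (hypothetical model): routing is a one-hot decode from all zeros.
    """
    n_cells = len(memory)
    state = {}
    for j, amp in address_amps.items():
        # Stage 1: map the address onto the routing register (one-hot)
        routing = [1 if k == j else 0 for k in range(n_cells)]
        # Stage 2: Toffoli(routing_k, m_k -> bus); only the active route can flip the bus
        bus = 0
        for k in range(n_cells):
            bus ^= routing[k] & memory[k]
        # Stage 3: uncompute the routing register so it decouples from the address
        routing = [r ^ (1 if k == j else 0) for k, r in enumerate(routing)]
        assert not any(routing), "routing register must return to all zeros"
        state[(j, bus)] = state.get((j, bus), 0) + amp
    return state
```

For a uniform superposition over four addresses and memory bits [1, 0, 1, 1], every (address, data) pair ends up with amplitude 0.5, which is the right-hand side of Eq. 1 for this memory.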

Fig. 1

Quantum circuit for bucket-brigade architecture25.

Angle encoding

Angle encoding is an effective data encoding method for quantum algorithms that do not require arithmetic operations. It uses the rotation angles of the rotation-X (RX), rotation-Y (RY), and rotation-Z (RZ) gates to encode classical information: each data value becomes a rotation angle of a qubit, that is, a point on the Bloch sphere. Fig. 2 illustrates this with the controlled-RY (CRY) gate, where \(\vert {a}\rangle = \alpha \vert 0\rangle + \beta \vert 1\rangle\) is the control qubit and the target qubit is initialized to \(\vert {0}\rangle\). The circuit performs

$$\begin{aligned} CR_y(\theta )(\alpha {\vert 0 \rangle }+\beta {\vert 1 \rangle }){\vert 0 \rangle }=\alpha {\vert 00 \rangle }+\beta {\vert 1 \rangle }[\cos (\theta /2){\vert 0 \rangle }+\sin (\theta /2){\vert 1 \rangle }] \end{aligned}$$
(2)
Fig. 2

Quantum circuit for CRY gate. The upper qubit is the control and the lower qubit is the target, initialized to \(\vert {0}\rangle\); the target is rotated only when the control is \(\vert 1\rangle\). \(RY(\theta )\) denotes a rotation by angle \(\theta\) about the y-axis of the Bloch sphere.
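A small state-vector check of the CRY action in Fig. 2, assuming the standard convention \(RY(\theta )=\exp (-i\theta Y/2)\), under which a rotation by \(\theta\) on the Bloch sphere produces \(\cos (\theta /2)\) and \(\sin (\theta /2)\) amplitudes. The helper names `ry` and `cry_on_zero_target` are illustrative only.

```python
import math

def ry(theta):
    """RY(theta) = exp(-i*theta*Y/2): a rotation by theta about the Bloch y-axis."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

def cry_on_zero_target(alpha, beta, theta):
    """Apply CRY(theta) to (alpha|0> + beta|1>) (x) |0>.

    Returns real amplitudes of |00>, |01>, |10>, |11> (control qubit first).
    """
    m = ry(theta)
    # Control in |0>: the target is left in |0>
    a00, a01 = alpha, 0.0
    # Control in |1>: RY(theta) acts on the target |0>, i.e. the first column of m
    a10, a11 = beta * m[0][0], beta * m[1][0]
    return [a00, a01, a10, a11]
```

With \(\alpha =\beta =1/\sqrt{2}\) and \(\theta =\pi /2\), the \(\vert 1\rangle\) branch of the control splits evenly, giving amplitudes \(1/\sqrt{2}, 0, 1/2, 1/2\).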

Swap test

The swap test26 is commonly used to estimate the fidelity of two quantum states, defined as the squared modulus of their inner product. Fig. 3 shows the swap-test circuit, in which \(\vert {\phi }\rangle\) and \(\vert {\varphi }\rangle\) are the two states being compared. An additional qubit, initialized to \(\vert {0}\rangle\), serves as the control of the controlled-SWAP (CSWAP) gate; a Hadamard gate is applied to it before and after the CSWAP.

Fig. 3

Quantum circuit for swap test26.

The input and output of Fig. 3 can be expressed as follows:

$$\begin{aligned} {\vert 0 \rangle }{\vert {\phi } \rangle }{\vert {\varphi } \rangle }\rightarrow \frac{1}{2}{\vert 0 \rangle }({\vert {\phi } \rangle }{\vert {\varphi } \rangle }+{\vert {\varphi } \rangle }{\vert {\phi } \rangle })+\frac{1}{2}{\vert 1 \rangle }({\vert {\phi } \rangle }{\vert {\varphi } \rangle }-{\vert {\varphi } \rangle }{\vert {\phi } \rangle }) \end{aligned}$$
(3)

By measuring the output of this quantum circuit, the following probability can be obtained:

$$\begin{aligned} p(0)= & \frac{1}{2}+\frac{1}{2}|{\langle {\varphi } \vert {\phi } \rangle }|^2 \end{aligned}$$
(4)
$$\begin{aligned} p(1)= & \frac{1}{2}-\frac{1}{2}|{\langle {\varphi } \vert {\phi } \rangle }|^2 \end{aligned}$$
(5)

where \({\left| \langle {\varphi }\vert {\phi }\rangle \right| }^2\) is the fidelity between \(\vert {\phi }\rangle\) and \(\vert {\varphi }\rangle\), and p(0) and p(1) are the probabilities of measuring the control qubit in \(\vert {0}\rangle\) and \(\vert {1}\rangle\), respectively. As the fidelity increases, p(0) increases and p(1) decreases.
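Eqs. 3 and 4 can be checked numerically by building the two ancilla branches of Eq. 3 directly. This sketch assumes single-qubit input states given as lists of complex amplitudes; `swap_test_probabilities` and `fidelity` are hypothetical helper names.

```python
import math

def kron(u, v):
    """Tensor product of two state vectors given as lists of amplitudes."""
    return [a * b for a in u for b in v]

def swap_test_probabilities(phi, varphi):
    """Return (p0, p1) of the control qubit, computed from the output state of Eq. 3."""
    pv = kron(phi, varphi)   # |phi>|varphi>
    vp = kron(varphi, phi)   # |varphi>|phi>, the swapped branch
    plus = [(a + b) / 2 for a, b in zip(pv, vp)]   # amplitudes tagged with |0>
    minus = [(a - b) / 2 for a, b in zip(pv, vp)]  # amplitudes tagged with |1>
    p0 = sum(abs(x) ** 2 for x in plus)
    p1 = sum(abs(x) ** 2 for x in minus)
    return p0, p1

def fidelity(phi, varphi):
    """|<varphi|phi>|^2 for normalized state vectors."""
    inner = sum(b.conjugate() * a for a, b in zip(phi, varphi))
    return abs(inner) ** 2
```

For \(\vert \phi \rangle =\vert 0\rangle\) and \(\vert \varphi \rangle =\cos (\pi /8)\vert 0\rangle +\sin (\pi /8)\vert 1\rangle\), the computed p(0) equals \(\frac{1}{2}+\frac{1}{2}\cos ^2(\pi /8)\), as Eq. 4 predicts.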

Quantum analog to digital conversion algorithm

Data encoding methods fall into two types: analog encoding, where data are stored in the amplitudes of quantum states, and digital encoding, where data are stored as strings of qubits. The quantum analog-to-digital conversion (QADC) algorithm27 was designed for quantum simulation and converts analog encodings into digital encodings. Because the amplitudes of quantum states are generally complex, QADC comes in three versions: absolute-QADC (abs-QADC), real-QADC, and imaginary-QADC, which convert the absolute values, the real parts, and the imaginary parts of the amplitudes, respectively, into digital encodings. This paper uses abs-QADC; the gate G it relies on is defined in Fig. 4.

Fig. 4

Definition for gate G in abs-QADC27.

Fig. 5

Quantum circuit for the quantum cosine function28.

The abs-QADC algorithm converts values from analog to digital encoding in three steps. First, the analog-encoded values are prepared. Second, a specific unitary operator G is constructed. Third, quantum phase estimation (QPE) converts the analog-encoded values into a digital encoding. The analog-encoded values are prepared in the registers \(\vert {add}\rangle \vert {data}\rangle \vert {A}\rangle \vert {B}\rangle\) using H gates, CNOT gates, and the unitary V, and abs-QADC then defines the unitary operator G as

$$G = V({\text{CNOT}})S({\text{CNOT}})V^{\dag } Z$$
(6)

where \(V^{\dagger }\) is the inverse of V, \(S=I-2\vert {0}\rangle \langle {0}\vert\), and Z is the Pauli-Z gate, which rotates a qubit by \(180^{\circ }\) about the z-axis.

A t-qubit register initialized to \(\vert {0}\rangle ^{\otimes t}\) is then added, and the QPE circuit converts the analog-encoded values into a digital encoding stored in this t-qubit string.
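The digitization step relies on textbook phase estimation. Assuming, as the \(\vert \theta \rangle\) and \(\vert 1-\theta \rangle\) outputs in Eq. 15 suggest, that G has eigenvalues \(e^{\pm 2\pi i\theta }\), the sketch below computes the outcome distribution of a t-qubit QPE register. It illustrates generic QPE, not the exact abs-QADC circuit of27; `qpe_distribution` is a hypothetical name.

```python
import cmath
import math

def qpe_distribution(theta, t):
    """Outcome probabilities of ideal t-qubit phase estimation for eigenvalue exp(2*pi*i*theta).

    Outcome y is read as the t-bit estimate y / 2**t of theta (mod 1).
    """
    size = 2 ** t
    probs = []
    for y in range(size):
        # Amplitude of |y> after the inverse QFT: a geometric sum of phases
        amp = sum(cmath.exp(2j * math.pi * k * (theta - y / size)) for k in range(size)) / size
        probs.append(abs(amp) ** 2)
    return probs
```

For \(\theta =3/8\) and t = 3, outcome 3 (that is, 3/8) occurs with certainty, while the conjugate eigenvalue \(e^{-2\pi i\theta }\) yields outcome \(5=2^t(1-\theta )\), matching the two branches of Eq. 15. When \(\theta\) is not an exact t-bit fraction, the probability concentrates on the nearest estimates instead.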

Quantum cosine function

The quantum cosine function algorithm takes an angle \(\theta\) and computes \(|cos(\pi \theta ) |\). The angle is written in binary as \(\theta = 0.v_{n-1} \ldots v_{1}v_{0}\), where each of \(v_{n-1}, \ldots , v_1, v_0\) is 0 or 1, and \(\theta\) lies in \([0, \frac{1}{2}]\). The approximation \(a_{n}\) of \(|cos(\pi \theta ) |\) is obtained from the iteration in Eq. 7.

$$\begin{aligned} a_0=1; a_{i+1}=\left\{ \begin{aligned} \sqrt{{1+a_i}/2},v_i=0\\ -\sqrt{{1-a_i}/2},v_i=1 \end{aligned} \right. i=0,1,2,\ldots ,n-1 \end{aligned}$$
(7)
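Eq. 7 is the half-angle identity applied bit by bit: with \(\theta _{i}=0.v_{i-1}\ldots v_0\) (so \(\theta _0=0\) and \(\theta _{i+1}=(v_i+\theta _i)/2\)), each step computes \(\cos (\pi \theta _{i+1})\) from \(\cos (\pi \theta _{i})\). The classical sketch below (the function name is hypothetical) evaluates the recurrence and reproduces \(\cos (\pi \theta )\), which equals \(|\cos (\pi \theta )|\) on \([0, \frac{1}{2}]\).

```python
import math

def cos_pi_theta(bits):
    """Evaluate the recurrence of Eq. 7.

    bits = [v_{n-1}, ..., v_1, v_0], so theta = 0.v_{n-1}...v_0 in binary.
    Eq. 7 consumes v_0 first (i = 0) and v_{n-1} last.
    """
    a = 1.0  # a_0 = 1 = cos(0)
    for v in reversed(bits):  # v_0, v_1, ..., v_{n-1}
        if v == 0:
            a = math.sqrt((1 + a) / 2)   # cos(x/2) with x = pi * theta_i
        else:
            a = -math.sqrt((1 - a) / 2)  # cos(pi/2 + x/2) = -sin(x/2)
    return a
```

For example, bits [0, 1, 1] encode \(\theta =3/8\), and the recurrence passes through \(\cos (\pi /2)\) and \(\cos (3\pi /4)\) before returning \(\cos (3\pi /8)\approx 0.3827\).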

Reference28 describes a quantum circuit that computes \(|cos(\pi \theta ) |\) iteratively. As shown in Fig. 5, n qubits are first prepared as an angle register holding the bits of \(\theta = 0.v_{n-1} \ldots v_{1}v_{0}\). Another n-qubit register, the cosine register, stores the result and is initialized to \(\vert a_{0} \rangle = \vert 01 \rangle \vert 0 \rangle ^{\otimes {n-1}}\). After the circuit in Fig. 5 is executed, the cosine register is in the state \(\vert a_{n} \rangle\), whose value \(a_{n}\) approximates \(|cos(\pi \theta ) |\). We denote this circuit by the unitary operator COS.

Quantum k-nearest neighbors algorithm

The kNN algorithm is one of the fundamental algorithms in machine learning; it determines the category of a test sample from its distances to labeled samples. Researchers have proposed several QkNN algorithms based on different distance measures29,30. Although existing QkNN algorithms have surpassed classical kNN in efficiency and accuracy, their potential has not yet been fully exploited. References12,13 introduced granular-balls into kNN, proposing the granular-ball k-nearest neighbors (GBkNN) algorithm, which improved classification efficiency and accuracy.

Classical methods for generating granular-ball

Before introducing the granular-ball generation algorithm, we first introduce the concept of a granular-ball and its key parameters. For a granular-ball \(\mathcal {G} = \{ \textbf{x}_i, i = 1,\cdots ,n\}\), where \(\textbf{x}_i\) is a sample point in the ball and n is the number of sample points it contains, the center \(\textbf{C}\) and radius R are defined as

$$\begin{aligned} \textbf{C}= & \frac{1}{n}\sum \limits _{i = 1}^n {{\textbf{x}_i}}, \end{aligned}$$
(8)
$$\begin{aligned} R= & \frac{1}{n}\sum \limits _{i = 1}^n \Vert \textbf{x}_i -\textbf{C} \Vert . \end{aligned}$$
(9)

The purity of a granular-ball is defined as follows. Taking binary classification as an example, suppose a granular-ball contains n samples, \(n_1\) of which belong to its majority category. The purity P of the granular-ball is then \(\frac{n_1}{n}\), and it is compared against a purity threshold T.
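Eqs. 8 and 9 and the purity definition translate directly into a few lines of code. This is a minimal sketch assuming Euclidean distance and points given as coordinate tuples; `ball_stats` is a hypothetical helper name.

```python
import math
from collections import Counter

def ball_stats(points, labels):
    """Center (Eq. 8), radius (Eq. 9) and purity n_1/n of one granular-ball.

    points: list of equal-length coordinate tuples; labels: their categories.
    """
    n = len(points)
    dim = len(points[0])
    # Eq. 8: the center is the mean of the member points
    center = tuple(sum(p[d] for p in points) / n for d in range(dim))
    # Eq. 9: the radius is the mean Euclidean distance to the center
    radius = sum(math.dist(p, center) for p in points) / n
    # Purity: share of the majority category
    majority, n1 = Counter(labels).most_common(1)[0]
    return center, radius, n1 / n, majority
```

For the four corners of a 2 × 2 square with labels [0, 0, 0, 1], the center is (1, 1), the radius is \(\sqrt{2}\), and the purity is 0.75.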

The classic granular-ball algorithm13 covers the sample space with granular-balls of different sizes, and learning tasks such as clustering and classification built on granular-balls gain efficiency, accuracy, and robustness. The centers of granular-balls can be determined by different rules; the two most effective are 2-means clustering and splitting13. Splitting proceeds as follows. The entire sample space is first treated as one granular-ball, and a sample point of one category is randomly selected as its center. Sample points of the other categories within the ball are then chosen as additional centers, and the ball is split by assigning each sample point to its nearest center. Every resulting granular-ball whose purity is below the threshold T is split again. The process ends once all granular-balls reach the purity threshold.
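A minimal classical sketch of the splitting procedure, assuming Euclidean distance, one randomly chosen center per category present in an impure ball, and a guard that keeps a ball unsplit when it cannot be divided (for example, duplicate points with different labels). It illustrates the description above and is not the reference implementation of13; `split_granular_balls` is a hypothetical name.

```python
import math
import random

def split_granular_balls(points, labels, threshold, seed=0):
    """Split-based granular-ball generation; returns a list of balls (lists of indices)."""
    rng = random.Random(seed)
    pending = [list(range(len(points)))]  # start: the whole space is one ball
    done = []
    while pending:
        ball = pending.pop()
        counts = {}
        for i in ball:
            counts[labels[i]] = counts.get(labels[i], 0) + 1
        if max(counts.values()) / len(ball) >= threshold:
            done.append(ball)  # purity reaches T: keep the ball
            continue
        # Assumption: one randomly chosen point per category present becomes a center
        centers = [rng.choice([i for i in ball if labels[i] == c]) for c in counts]
        children = {c: [] for c in centers}
        for i in ball:
            nearest = min(centers, key=lambda c: math.dist(points[i], points[c]))
            children[nearest].append(i)
        parts = [part for part in children.values() if part]
        if len(parts) == 1:
            done.append(ball)  # guard: the ball cannot be split further
        else:
            pending.extend(parts)
    return done
```

Each center always lands in its own child ball, so every split strictly shrinks the balls and the loop terminates.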

Methodology

In this section, we introduce two quantum granular-ball generation methods: one based on iterative splitting and the other on a fixed number of splits. To demonstrate the practicality of granular-balls, we also present a simple application, QGBkNN.

Quantum granular-ball generation based on iterative splitting (Algorithm 1)

Quantum granular-ball generation based on iterative splitting represents the sample space with multiple granular-balls produced by repeated splitting. We illustrate the method with an example. Consider a data set D in which each data point has a label and two features; we use two dimensions for illustration, but the algorithm also handles higher-dimensional data. Feature values lie in the range \([0, 2^{h}-1]\), where h is a positive integer. One sample point is randomly selected from each category in D as a granular-ball center; these points form the center set \(C_0\), and the remaining points form the data set \(D_0\). The data points and centers that take part in the i-th round of splitting are denoted by \(D_i\) and \(C_i\), respectively; granular-balls that meet the purity requirement at the end of a round no longer take part in splitting. Each round of iterative splitting proceeds as follows.

In each round, the data set \(D_i\) and the center set \(C_i\) are encoded into rotation angles, and new granular-balls are generated in four steps. First, the fidelity between each data point in \(D_i\) and each center in \(C_i\) is computed in analog-encoded form. Second, this analog-encoded fidelity is converted into a digital encoding in the computational basis. Third, the nearest granular-ball center is determined for each data point in \(D_i\). Finally, new granular-balls are formed from the results of the previous steps.

Suppose the i-th round produces d granular-balls. Their purities \(P^{i}_1, P^{i}_2, \ldots , P^{i}_d\) are computed and compared with a predefined purity threshold T. If every purity reaches T, splitting stops. Otherwise, the granular-balls that fall short are passed to the next round: within each of them, points whose category differs from that of the current center are randomly selected as new centers. These new centers, together with the existing centers of those granular-balls, form the new center set \(C_{i+1}\), and all remaining data points of those granular-balls form the new data set \(D_{i+1}\). The next round then begins, and the process repeats until all granular-balls meet the purity requirement. Algorithm 1 gives the pseudocode of the quantum granular-ball generation method based on iterative splitting.
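The round structure of Algorithm 1 can be emulated classically, with the quantum part (finding each point's nearest center in \(C_i\)) replaced by a direct search. The sketch assumes Euclidean distance and exactly one new heterogeneous center per impure ball; `iterative_split` is a hypothetical name. Note that every point of \(D_i\) is matched against all centers in \(C_i\) at once, as in the quantum round.

```python
import math
import random

def iterative_split(points, labels, threshold, seed=0):
    """Hybrid loop of Algorithm 1 with the quantum round emulated classically.

    Returns a list of (center_index, member_indices) for the final granular-balls.
    """
    rng = random.Random(seed)
    # C_0: one random point per category; D_0: all remaining points
    C = [rng.choice([i for i, y in enumerate(labels) if y == c])
         for c in sorted(set(labels))]
    D = [i for i in range(len(points)) if i not in C]
    final = []
    while C:
        # Quantum step (emulated): each point of D_i joins its nearest center in C_i
        balls = {c: [c] for c in C}
        for i in D:
            nearest = min(C, key=lambda c: math.dist(points[i], points[c]))
            balls[nearest].append(i)
        # Classical step: compare each purity with the threshold T
        next_C, next_D = [], []
        for c, ball in balls.items():
            counts = {}
            for i in ball:
                counts[labels[i]] = counts.get(labels[i], 0) + 1
            purity = max(counts.values()) / len(ball)
            hetero = [i for i in ball if labels[i] != labels[c]]
            if purity >= threshold or not hetero:
                final.append((c, ball))
                continue
            # Assumption: one random heterogeneous point becomes an extra center
            new_center = rng.choice(hetero)
            next_C += [c, new_center]
            next_D += [i for i in ball if i not in (c, new_center)]
        C, D = next_C, next_D
    return final
```

When no ball is accepted in a round, the number of centers doubles, and otherwise the set of unresolved points shrinks, so the loop always terminates.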

Algorithm 1

Quantum granular-ball generation based on iterative splitting

Data preparation

The first phase of our quantum granular-ball algorithm is data preparation, which maps classical data points onto quantum states by encoding the points of the data set and of the granular-ball center set as qubit rotation angles. Let \(N_i\) denote the number of data points in \(D_i\) and \(M_i\) the number of granular-ball centers in \(C_i\). Data encoding involves three steps: data preprocessing, angle mapping, and quantum circuit preparation.

Step 1: Data preprocessing. The number of data points in real-world data sets is rarely a power of 2, whereas the quantum encoding methods used here assume power-of-2 data set sizes. To resolve this, each data set is padded with additional sample points up to the next power of 2; for simplicity, the padding points duplicate the last data point. After padding, \(D_i\) contains N data points and \(C_i\) contains M granular-ball centers.
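The padding rule is simple enough to state as code. A minimal sketch; `pad_to_power_of_two` is a hypothetical name.

```python
def pad_to_power_of_two(points):
    """Step 1: repeat the last point until the length is a power of 2."""
    if not points:
        raise ValueError("cannot pad an empty data set")
    target = 1
    while target < len(points):
        target *= 2
    return points + [points[-1]] * (target - len(points))
```

For a padded list of length N, `(N - 1).bit_length()` gives the number of address qubits, \(\log _2 N\); five points, for instance, are padded to eight and need three address qubits.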

Step 2: Angle mapping. Data values are then mapped to rotation angles on the Bloch sphere. Given an h-bit data value \(x=x_{h-1}\ldots x_1 x_0\) with most significant bit \(x_{h-1}\), the corresponding rotation angle \(\theta _{r}\) is

$$\begin{aligned} \theta _{r}=\pi \left( \frac{1}{2^1} \times x_{h-1}+\ldots +\frac{1}{2^{h-1}} \times x_{1}+\frac{1}{2^{h}} \times x_{0}\right) \end{aligned}$$
(10)

Equation 10 is simply \(\theta _{r}=\pi x/2^{h}\), so \(\theta _{r}\) is directly proportional to x, and since \(x\le 2^{h}-1\), it always stays below \(\pi\).
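Eq. 10 and the bit-controlled rotations of Eq. 11 can be sanity-checked classically. The sketch assumes the standard RY convention and that bus qubit k drives one CRY with angle \(\pi /2^{h-k}\); because RY rotations about the same axis add, the target qubit ends in \(RY(\theta _r)\vert 0\rangle\). The function names are hypothetical.

```python
import math

def angle_from_bits(x, h):
    """Eq. 10: theta_r = pi * sum_k x_k / 2**(h - k), which equals pi * x / 2**h."""
    bits = [(x >> k) & 1 for k in range(h)]  # x_0 ... x_{h-1}
    return math.pi * sum(b / 2 ** (h - k) for k, b in enumerate(bits))

def encoded_qubit(x, h):
    """Amplitudes (cos, sin) of the angle qubit after the bit-controlled RY rotations.

    Assumption: bit x_k controls an RY(pi / 2**(h - k)); rotations about the
    same axis compose additively, so the total angle is theta_r.
    """
    angle = 0.0
    for k in range(h):
        if (x >> k) & 1:
            angle += math.pi / 2 ** (h - k)
    return math.cos(angle / 2), math.sin(angle / 2)
```

Because every \(\theta _r\) lies in \([0, \pi )\), the fidelity \(\cos ^2(\theta _r/2)\) between the encoding of x and that of 0 decreases strictly as x grows, which is what makes fidelity usable as a distance proxy.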

Step 3: Data preparation quantum circuit. Both the data set and the granular-ball center set consist of 2-dimensional points. We first describe the preparation of the first dimension of each set and then combine the two.

The process of preparing the first dimension of the dataset into rotation angles is detailed in three stages, as depicted in Fig. 6a.

In the first stage, QRAM25, represented by the unitary operator Q, loads the data points into the computational basis states of the bus register. The data of \(D_i\) are stored in superposition in \(\vert {bus}\rangle\), which entangles the address register \(\vert add_D\rangle\) with the bus register \(\vert {bus}\rangle\).

The second stage uses CRY gates controlled by the bus qubits to encode each data point into an additional angle register \(\vert ang_{D0} \rangle\), with rotation angles between 0 and \(\pi\). The resulting rotation angle is

$$\theta _{r} = \pi \left( \frac{1}{2^{1}}\,\text{bus}_{h-1} + \ldots + \frac{1}{2^{h-1}}\,\text{bus}_{1} + \frac{1}{2^{h}}\,\text{bus}_{0} \right)$$
(11)

The third stage restores \(\vert {route}\rangle\) and \(\vert bus \rangle\) to their initial states with the inverse QRAM operation, so that they can be reused by subsequent QRAM calls.

At the end of this circuit, the data-set addresses remain in superposition in \(\vert {add_D}\rangle\), and the data values are encoded as y-axis rotation angles of \(\vert ang_{D0} \rangle\). We refer to the whole circuit as the unitary transformation \(W_D\).

Preparing the granular-ball center dataset \(C_i\) follows a similar process, as shown in Fig. 6b. This entire quantum circuit is referred to as a unitary transformation \(V_C\).

Finally, by utilizing the unitary operators \(W_D\) and \(V_C\), the preparation of all data points for both the dataset and the granular-ball center dataset is completed, as shown in Fig. 6c, and is represented as follows:

$$\left| {\psi _{a} } \right\rangle = \left| i \right\rangle _{{{\text{add}}_{D} }} \left| j \right\rangle _{{{\text{add}}_{C} }} \left| \phi \right\rangle _{{{\text{ang}}_{D} }} \left| \varphi \right\rangle _{{{\text{ang}}_{C} }}$$
(12)

where \(\left| i \right\rangle _{{{\text{add}}_{D} }}\) and \(\vert {j}\rangle _{add_C}\) are the stored addresses, and \(\vert \phi \rangle\) and \(\vert \varphi \rangle\) represent the encoded data in the angle registers \(\vert {ang_D}\rangle\) and \(\vert {ang_C}\rangle\) respectively.

Fig. 6

Quantum circuit for data preparation. a Preparing data points from the first dimension of the dataset, which we define as the unitary operator \(W_D\). b Preparing data points from the first dimension of the granular-ball center dataset, which we define as the unitary operator \(V_C\). c Preparing the dataset and granular-ball center dataset.

Generating new granular-balls

The generation of new granular-balls combines fidelity calculation, transfer of the fidelity to the computational basis, and a quantum minimum search algorithm. The aim is to assign each data point to its nearest granular-ball center, forming new granular-balls.

Step 1: Fidelity calculation. The swap test26 measures the distance between a data point and a granular-ball center through the fidelity between \(\vert ang_D \rangle\) and \(\vert ang_C \rangle\): the higher the fidelity, the shorter the distance.

In the quantum circuit of Fig. 7, an H gate puts the control qubit \(\vert B \rangle\) into superposition.

Fig. 7

Quantum circuit for computing fidelity.

The CSWAP gates, controlled by \(\vert B \rangle\), swap \(\vert ang_{D0} \rangle\) with \(\vert ang_{C0} \rangle\) and \(\vert ang_{D1} \rangle\) with \(\vert ang_{C1} \rangle\); we define this circuit as the unitary transformation U. Writing \(\vert \phi \rangle = \vert \phi \rangle _{ang_D}\) and \(\vert \varphi \rangle = \vert \varphi \rangle _{ang_C}\), the state after the circuit in Fig. 7 is

$$\begin{aligned} \begin{aligned}&\vert \psi \rangle = \frac{1}{2} ( \left( \vert \phi \rangle \vert \varphi \rangle + \vert \varphi \rangle \vert \phi \rangle \right) \vert 0 \rangle _B + \left( \vert \phi \rangle \vert \varphi \rangle - \vert \varphi \rangle \vert \phi \rangle \right) \vert 1 \rangle _B ) \\ \end{aligned} \end{aligned}$$
(13)

We define \(\theta\) through the fidelity by \(\cos (\pi \theta ) = \sqrt{\frac{1}{2}(1 - |\langle \varphi \vert \phi \rangle |^2)}\); since the fidelity lies in [0, 1], \(\theta\) ranges from \(\frac{1}{4}\) to \(\frac{1}{2}\). The state \(\vert \psi \rangle\) can then be decomposed as

$$\begin{aligned} \begin{aligned}&\vert {\psi _{\pm }}\rangle =\frac{1}{2}\left( \frac{(\vert {\phi }\rangle \vert {\varphi }\rangle +\vert {\varphi }\rangle \vert {\phi }\rangle )\vert {0}\rangle _B}{\sqrt{1+{|\langle {\varphi } \vert {\phi } \rangle |}^2}}\pm i\,\frac{(\vert {\phi }\rangle \vert {\varphi }\rangle -\vert {\varphi }\rangle \vert {\phi }\rangle )\vert {1}\rangle _B}{\sqrt{1-{|\langle {\varphi } \vert {\phi } \rangle |}^2}}\right) \\&\vert \psi \rangle = \frac{-i}{\sqrt{2}} \left( e^{i\pi \theta } \vert \psi _+ \rangle - e^{-i\pi \theta } \vert \psi _- \rangle \right) \end{aligned} \end{aligned}$$
(14)

A larger \(\theta\) gives a smaller \(\cos (\pi \theta )\), which corresponds to higher fidelity and therefore a closer granular-ball center. The fidelity could be estimated by repeatedly measuring the registers to find the nearest center for each data point, but this is resource-intensive. Instead, the fidelity information is converted into a digital encoding in the computational basis, from which it can be retrieved with few measurements.
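The definition of \(\theta\) and the decomposition of \(\vert \psi \rangle\) can be checked numerically. Let \(\vert u_0\rangle\) and \(\vert u_1\rangle\) be the normalized \(\vert 0\rangle _B\) and \(\vert 1\rangle _B\) branches of Eq. 13, so that \(\vert \psi \rangle =\sin (\pi \theta )\vert u_0\rangle +\cos (\pi \theta )\vert u_1\rangle\). Taking \(\vert \psi _\pm \rangle =(\vert u_0\rangle \pm i\vert u_1\rangle )/\sqrt{2}\) as an assumption of this sketch, the code works in the two-dimensional \(\{u_0, u_1\}\) basis; the function names are hypothetical.

```python
import cmath
import math

def theta_from_fidelity(f):
    """Invert cos(pi*theta) = sqrt((1 - f) / 2); f in [0, 1] gives theta in [1/4, 1/2]."""
    return math.acos(math.sqrt((1 - f) / 2)) / math.pi

def check_decomposition(f, tol=1e-12):
    """Check |psi> = -i/sqrt(2) (e^{i pi theta}|psi_+> - e^{-i pi theta}|psi_->) in the {u0, u1} basis."""
    theta = theta_from_fidelity(f)
    # Normalized branch weights of Eq. 13: sqrt((1+f)/2) on u0, sqrt((1-f)/2) on u1
    psi = (math.sqrt((1 + f) / 2), math.sqrt((1 - f) / 2))
    # Assumed eigenvectors |psi_pm> = (|u0> +- i|u1>) / sqrt(2)
    plus = (1 / math.sqrt(2), 1j / math.sqrt(2))
    minus = (1 / math.sqrt(2), -1j / math.sqrt(2))
    e = cmath.exp(1j * math.pi * theta)
    rebuilt = [-1j / math.sqrt(2) * (e * p - m / e) for p, m in zip(plus, minus)]
    return all(abs(r - s) < tol for r, s in zip(rebuilt, psi))
```

As the fidelity grows from 0 to 1, \(\theta\) grows from 1/4 to 1/2, so the smallest \(\cos (\pi \theta )\) marks the nearest center.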

Step 2: Fidelity transfer to the computational basis. To convert \(\theta\) from analog to digital encoding with the absolute quantum analog-to-digital conversion (abs-QADC) algorithm27, we first construct the unitary operator G from the data-preparation and fidelity-calculation circuits shown in Fig. 8a. By analogy with Eq. 6, the resulting operator is \(G=UVWSW^{\dagger }V^{\dagger }U^{\dagger }Z\), as shown in Fig. 8b.

Fig. 8

Quantum circuit for constructing the unitary operator G. a W prepares the data set, V prepares the granular-ball center data, and U computes the fidelity. b The circuit includes the inverse operations \(U^\dagger\), \(V^\dagger\), and \(W^\dagger\) of U, V, and W. Z rotates qubit \(\vert {B}\rangle\) by \(180^{\circ }\) about the z-axis.

As shown in the red box in Fig. 9, a t-qubit register \(\vert dig \rangle\), initialized to \(\vert 0\rangle ^{\otimes t}\), is added to store the digitally encoded \(\theta\). The abs-QADC circuit is built from the state \(\vert \psi \rangle\), the register \(\vert dig \rangle\), and the unitary operator G, and its output state is

$$\begin{aligned} \vert { \psi }\rangle \vert {0}\rangle _t \xrightarrow {QADC} \vert {i}\rangle \vert {j}\rangle \frac{-i}{\sqrt{2}}(e^{i\pi \theta }\vert {\theta }\rangle \vert {\psi _+}\rangle -e^{-i\pi \theta }\vert {1-\theta }\rangle \vert {\psi _-}\rangle ) \end{aligned}$$
(15)

Next, we use the quantum cosine function algorithm28 to compute \(\cos (\pi \theta )\) from the \(\theta\) in Eq. 15. As shown in Fig. 9, an additional r-qubit register \(\vert fidelity \rangle\) stores the result. Applying the circuit COS yields \(\vert {i}\rangle _{add_D} \vert {j}\rangle _{add_C} \vert \cos (\pi \theta )\rangle _{fidelity}\), where \(\vert i\rangle _{add_D}\) and \(\vert j\rangle _{add_C}\) are the addresses in the data set and the granular-ball center set, respectively, and \(\vert \cos (\pi \theta )\rangle _{fidelity}\) is the digital encoding of \(\cos (\pi \theta )\).

Fig. 9

Quantum circuit for obtaining the digitally encoded \(cos(\pi \theta )\) and finding the nearest granular-ball center for each data point.

Step 3: Generate new granular-balls. For each data point i, the nearest granular-ball center is the one with the minimum \(\cos (\pi \theta )\) entangled with i; finding it for every i completes the generation of granular-balls. We design the corresponding circuit, shown in Fig. 9, using the quantum minimum search algorithm of reference31. Two registers are added: \(\vert y \rangle\), which stores the threshold of the minimum search and is initialized to 0, and \(\vert closest \rangle\), an m-qubit register that stores the address of the nearest granular-ball center. MIN in Fig. 9 denotes the quantum minimum search itself: whenever the value in \(\vert y \rangle\) is updated, the center address in \(\vert add_C \rangle\) entangled with \(\vert y \rangle\) is copied into \(\vert closest \rangle\). After the circuit completes, the final state is

$$\begin{aligned} \vert out \rangle =\vert i \rangle _{add_D} \vert closest_j \rangle _{closest} \end{aligned}$$
(16)

where \(\vert {i}\rangle _{add_D}\) is the address i of a data point, held in superposition in the address register \(\vert add_D \rangle\), and \(\vert {closest_j}\rangle _{closest}\) is the address \(closest_j\) of the granular-ball center nearest to data point i. In this way, every data point in \(D_i\) finds its nearest granular-ball center.
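The threshold-update logic of a minimum search can be emulated classically for a single data point. The sketch follows the Dürr–Høyer style of minimum finding (random initial index, then repeatedly jump to any index below the current threshold); whether reference31 uses exactly this loop, and its initialization, is an assumption here. The Grover search that finds a marked index is replaced by a random choice, so this shows the logic only, not the quantum speedup. `nearest_center` is a hypothetical name.

```python
import random

def nearest_center(cos_values, seed=0):
    """Emulate the |y> / |closest> update loop of quantum minimum search for one data point.

    cos_values[j]: digitized cos(pi*theta) between the data point and center j;
    smaller means higher fidelity, i.e. a nearer center.
    Returns (closest_j, y): the copied center address and the final threshold.
    """
    rng = random.Random(seed)
    closest = rng.randrange(len(cos_values))  # assumption: random initial index
    y = cos_values[closest]                   # threshold register |y>
    while True:
        # A Grover search would return one of these marked indices with high probability
        marked = [j for j, v in enumerate(cos_values) if v < y]
        if not marked:
            return closest, y
        closest = rng.choice(marked)          # copy the center address into |closest>
        y = cos_values[closest]               # update |y>
```

For the values [0.6, 0.3, 0.45, 0.31], the loop always ends at center 1, whatever the random starting index.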

Once every data point has found its nearest center, each center and its assigned data points form a new granular-ball, completing one round of generation. The purity of each granular-ball is then computed classically and compared with the preset purity threshold. Data points that do not meet the purity requirement enter the next round, and the loop continues until all granular-balls satisfy the requirement, completing the generation of all granular-balls.

Quantum granular-ball generation based on fixed splitting number (Algorithm 2)

The iterative splitting method generates granular-balls that fit the sample distribution and has a lower time complexity than classical methods. However, the hybrid quantum-classical workflow and the multiple iterations of the splitting process consume significant quantum resources. To overcome this, we propose a quantum granular-ball generation method based on a fixed number of splits, which generates a fixed number of granular-balls for the entire sample space in a single pass and is fully quantum after data preprocessing. For N samples, the number of clusters is bounded by \(c_{max}\), and a widely used rule of thumb is \(c_{max} \le \sqrt{N}\)32,33; reference34 verified this rule. Following it, we set the number of splits to \(\sqrt{N}\), which yields \(\sqrt{N}\) granular-balls. The fixed-split method follows the same procedure as a single splitting round of the iterative method.
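A classical emulation of the single fixed-split round, for intuition only. How Algorithm 2 chooses its \(\sqrt{N}\) centers is not specified here, so drawing them uniformly at random and rounding \(\sqrt{N}\) down are assumptions of this sketch, as is Euclidean distance; `fixed_split` is a hypothetical name.

```python
import math
import random

def fixed_split(points, labels, seed=0):
    """Single-round split into about sqrt(N) granular-balls.

    Returns a list of (center_index, majority_label, member_indices).
    """
    n = len(points)
    if n == 0:
        return []
    k = max(1, math.isqrt(n))          # assumption: number of splits is floor(sqrt(N))
    rng = random.Random(seed)
    centers = rng.sample(range(n), k)  # hypothetical: centers drawn uniformly at random
    members = {c: [] for c in centers}
    for i in range(n):
        # Every point is matched against all centers in one pass (no iteration)
        nearest = min(centers, key=lambda c: math.dist(points[i], points[c]))
        members[nearest].append(i)
    balls = []
    for c, idx in members.items():
        if not idx:
            continue  # only possible if two centers share coordinates
        counts = {}
        for i in idx:
            counts[labels[i]] = counts.get(labels[i], 0) + 1
        balls.append((c, max(counts, key=counts.get), idx))
    return balls
```

On a 4 × 4 grid (N = 16), the sketch returns four balls covering every point; unlike the iterative method, it never checks purity, which is the trade-off discussed next.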

The two quantum granular-ball generation methods have complementary strengths and weaknesses. Algorithm 1 generates granular-balls that adapt well to the data distribution, which is a significant advantage, but it requires multiple iterations and conditional checks, which makes the process more complex.

Algorithm 2

Quantum granular-ball generation by fixed splitting

In contrast, because its number of splits is fixed, Algorithm 2 splits all granular-balls rapidly and simultaneously. It is fully quantum and does not need to switch repeatedly between classical and quantum computation. The downside is that it may produce a number of granular-balls that is inappropriate for the data. The pseudocode for the quantum granular-ball generation method based on a fixed splitting number is shown in Algorithm 2.

Quantum k-nearest neighbors algorithm based on granular-balls

Existing QkNN algorithms improve efficiency by leveraging quantum entanglement and quantum parallelism, but the efficiency of both kNN and QkNN is still constrained by large-scale data sets. To address this issue, this paper represents large data sets with granular-balls when implementing kNN. The core idea of QGBkNN is to apply quantum granular-ball generation to the training set, so that granular-balls represent the entire sample space, and to assign each test sample of unknown label the category of its nearest granular-ball center. Since each granular-ball represents multiple training samples with the same label, it is sufficient to set k to 1. The algorithm consists of two stages:

1. Training: generate the quantum granular-balls and replace the original training set with the generated granular-balls.
2. Testing: assign the test sample's category based on its nearest granular-ball center.
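The two stages can be sketched classically. In this minimal sketch, the helper names `train` and `predict` are hypothetical, the `assignment` of samples to granular-balls is assumed to come from either generation method, each ball is summarized by the mean of its members and their majority label, and Euclidean distance stands in for the fidelity-based similarity used by the quantum version.

```python
import math
from collections import Counter, defaultdict

def train(points, labels, assignment):
    """Training stage: replace each group of samples by (center, majority label)."""
    groups = defaultdict(list)
    for p, y, g in zip(points, labels, assignment):
        groups[g].append((p, y))
    balls = []
    for members in groups.values():
        dim = len(members[0][0])
        center = tuple(sum(p[d] for p, _ in members) / len(members) for d in range(dim))
        label = Counter(y for _, y in members).most_common(1)[0][0]
        balls.append((center, label))
    return balls

def predict(balls, x):
    """Testing stage with k = 1: the nearest granular-ball center decides the label."""
    _, label = min(balls, key=lambda b: math.dist(b[0], x))
    return label
```

Only the ball centers are consulted at test time, so the cost of a query grows with the number of granular-balls rather than with the size of the original training set.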

Consider a training set of N samples and a single test sample. Both the training samples and the test sample are two-dimensional, with feature values in the range \([0, 2^h-1]\). Using this training set and test sample as an example, the two steps of QGBkNN are analyzed as follows.

The first step converts the sample points in the training set into granular-balls and uses the granular-ball centers as the new training set E. The number of granular-balls generated by the quantum granular-ball generation method based on iterative splitting is not known in advance, whereas the method based on a fixed splitting number generates \(\sqrt{N}\) granular-balls. The second step determines the category of the test sample from its nearest granular-ball center. This process mirrors the search for the nearest granular-ball center and comprises the following steps: first, encode the training set E and the test sample point into angles; then, calculate the fidelity between the test sample point c and each point in E; transfer the analog-encoded fidelity into a computational basis state (digital encoding); and finally, identify the point in E most similar to the test sample point. QGBkNN based on Algorithm 2 requires one additional step to improve its classification accuracy: after measurement, the majority label of the selected granular-ball is taken as the label of the test sample point. The pseudocode for the quantum k-nearest neighbors algorithm based on granular-balls is shown in Algorithm 3.
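A classical sketch of the similarity computation clarifies how fidelity ranks the granular-ball centers. The angle map below is an assumption, since the exact encoding is defined elsewhere: each feature \(x \in [0, 2^h-1]\) is mapped to \(\theta = \frac{\pi}{2}\cdot\frac{x}{2^h-1}\) on one qubit prepared as \(\cos\theta\vert 0\rangle + \sin\theta\vert 1\rangle\). The fidelity of two such product states is then the product of \(\cos^2(\theta_a-\theta_b)\) over the features. The helper names are hypothetical, and the quantum minimum search is replaced here by a plain `max`.

```python
import math

def angle(x, h):
    """Hypothetical angle encoding: feature x in [0, 2^h - 1] -> theta in [0, pi/2]."""
    return (math.pi / 2) * x / (2 ** h - 1)

def fidelity(a, b, h):
    """|<psi(a)|psi(b)>|^2 for product states cos(t)|0> + sin(t)|1> (one qubit per feature)."""
    f = 1.0
    for xa, xb in zip(a, b):
        # Single-qubit overlap: cos(ta)cos(tb) + sin(ta)sin(tb) = cos(ta - tb).
        f *= math.cos(angle(xa, h) - angle(xb, h)) ** 2
    return f

def most_similar(centers, c, h):
    """Index of the center with the highest fidelity to test point c (classical argmax)."""
    return max(range(len(centers)), key=lambda j: fidelity(centers[j], c, h))
```

With all angles in \([0, \pi/2]\), each factor decreases as the feature gap grows, so a higher fidelity corresponds to a closer center. For the Algorithm 2 variant, the label returned would then be the majority label of the selected granular-ball rather than the label of the center alone.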

Algorithm 3

Quantum granular-ball nearest neighbour classification

Performance analysis

This subsection compares the time complexity of the two proposed quantum granular-ball generation methods with that of their classical counterparts, and then compares the performance of the proposed QGBkNN algorithm with other kNN algorithms.

For clarity, note that the time complexity we evaluate is the depth of the algorithm's quantum circuit. Both the circuit depth and the time complexity are determined by the steps the algorithm executes, so we treat the two as equivalent throughout. This keeps the description uniform and simplifies the performance comparison with the algorithms that follow.

Time complexity analysis of quantum granular-ball generation methods

Time complexity analysis of Algorithm 1: In one splitting round, the data preparation time complexity is \(O(\log {MN})\), where M is the number of granular-balls and N is the number of samples. The time complexity of the abs-QADC depends on the number t of bits to be estimated and is \(O(2^{t}\log ^{2}{MN})\). Because t is small and fixed, we simplify this to \(O(\log ^{2}{MN})\). The depth of the quantum circuit for the cosine function does not depend on the data set or its dimensionality, so its time complexity is O(1). The time complexity of finding the nearest granular-ball center is \(O(\sqrt{M}\log ^{2}{MN})\). The overall time complexity of one splitting round is therefore \(O(\log {MN}+\log ^{2}{MN}+\sqrt{M}\log ^{2}{MN})=O(\sqrt{M}\log ^{2}{MN})\). Assuming d iterations in total, the total time complexity is \(O(d\sqrt{M}\log ^{2}{MN})\).

Time complexity analysis of Algorithm 2: The number of data points is N and the number of granular-ball centers is \(\sqrt{N}\), so there are \(N\cdot\sqrt{N}=N^{\frac{3}{2}}\) point-center pairs. The data preparation time complexity is \(O(\log {N^{\frac{3}{2}}})=O(\log N)\). The time complexity of transferring the fidelity into a basis state is \(O(\log ^{2}{N})\). Because the search runs over \(\sqrt{N}\) centers, finding the nearest granular-ball center takes \(O(\sqrt{\sqrt{N}}\log ^{2}{N})=O((\log ^{2}{N})N^\frac{1}{4})\). The overall time complexity is \(O(\log {N}+\log ^{2}{N}+(\log ^{2}{N})N^\frac{1}{4})=O((\log ^{2}{N})N^\frac{1}{4})\).
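Since Algorithm 2 is implemented as a single splitting round with \(M=\sqrt{N}\) centers, substituting this value into the one-round cost of Algorithm 1 should reproduce the same bound. A short check:

$$\begin{aligned} \sqrt{M}\log ^{2}{MN}\,\Big|_{M=\sqrt{N}} = N^{\frac{1}{4}}\log ^{2}{N^{\frac{3}{2}}} = \tfrac{9}{4}N^{\frac{1}{4}}\log ^{2}{N} = O((\log ^{2}{N})N^\frac{1}{4}) \end{aligned}$$

The factor \(\tfrac{9}{4}\) comes from \(\log ^{2}{N^{\frac{3}{2}}}=(\tfrac{3}{2}\log N)^{2}\) and is absorbed by the O-notation.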

Table 1 compares the time complexity of the two quantum granular-ball generation algorithms with that of the classical granular-ball generation algorithm. The time complexity of the current classical granular-ball generation algorithm is O(mdN), where m is the number of granular-balls. Algorithm 1 reduces the time complexity to \(O(d\sqrt{M}\log ^{2}{MN})\), where M, the number of data points in the granular-ball center data set, is significantly less than N. Algorithm 2 further eliminates the iterative process, reducing the time complexity to \(O((\log ^{2}{N})N^\frac{1}{4})\).
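To get a feel for how these bounds separate, the asymptotic expressions can be evaluated for a few sizes. This is only an illustration of growth rates: constants are dropped, logarithms are taken base 2, m = M = \(\sqrt{N}\) and d = 5 are arbitrary illustrative values, and d is read as the number of splitting rounds in both formulas, which is an assumption about the classical bound. None of these numbers are measured run times.

```python
import math

def classical_cost(n, m, d):
    """O(m d N): classical granular-ball generation."""
    return m * d * n

def alg1_cost(n, m, d):
    """O(d sqrt(M) log^2(MN)): Algorithm 1 over d splitting rounds."""
    return d * math.sqrt(m) * math.log2(m * n) ** 2

def alg2_cost(n):
    """O(N^(1/4) log^2 N): Algorithm 2, a single round with sqrt(N) centers."""
    return n ** 0.25 * math.log2(n) ** 2

for exp in (10, 20, 30):
    n = 2 ** exp
    m = math.isqrt(n)  # illustrative: sqrt(N) granular-balls
    d = 5              # illustrative number of splitting rounds
    print(f"N=2^{exp}: classical={classical_cost(n, m, d):.3g}, "
          f"alg1={alg1_cost(n, m, d):.3g}, alg2={alg2_cost(n):.3g}")
```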

Table 1 Comparisons of the time complexity of quantum granular-ball generation methods.

Comparison of performance between QGBkNN and other kNN algorithms

Regarding time complexity, as shown in Tab. 2, the QGBkNN proposed in this paper has a lower time complexity than both GBkNN13 and QkNN.

Table 2 Performance analysis of quantum kNN variants.

Experiment

Generation of quantum granular-balls

To observe the granular-ball generation process more intuitively, the quantum circuits for data-point preparation and fidelity calculation were executed on a quantum simulator, while phase estimation and the subsequent quantum circuits were simulated on a classical computer. We take the data set [[1, 2], [1, 3], [2, 4], [3, 6], [4, 7], [5, 5], [6, 6], [7, 1]] and the granular-ball center data set [[1, 1], [6, 7]] as an example. The probability histogram obtained from the execution on the quantum simulator is shown in Fig. 10.
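A classical reference for this example is useful when reading the histogram. The sketch below computes, with plain squared Euclidean distance, which of the two centers each data point is closest to, and the address-register widths the example implies (8 data points need 3 address qubits, 2 centers need 1). Euclidean distance is a stand-in here; the quantum circuit ranks centers by fidelity, which need not order every point identically.

```python
import math

data = [[1, 2], [1, 3], [2, 4], [3, 6], [4, 7], [5, 5], [6, 6], [7, 1]]
centers = [[1, 1], [6, 7]]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Classical reference: index of the nearest granular-ball center for each data point.
nearest = [min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
           for p in data]
print(nearest)  # [0, 0, 0, 1, 1, 1, 1, 0]

# Address-register widths implied by the example (8 data points, 2 centers).
data_qubits = math.ceil(math.log2(len(data)))       # 3 qubits for |add_D>
center_qubits = math.ceil(math.log2(len(centers)))  # 1 qubit for |add_C>
```

The last point, [7, 1], is almost equidistant from the two centers (squared distances 36 and 37), so it is the case most sensitive to the choice of similarity measure.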

After quantum measurement, the probability histogram reflects the similarity between each data point and each granular-ball center. Consider the first two measured states, “0 000 0” and “0 000 1”. Reading from left to right, the first position, “0”, is the fidelity storage qubit. Positions 2 to 4, “000”, are the address of the first data point in the data set. The last position is the address in the granular-ball center data set: “0” in the first state refers to the first center, and “1” in the second state refers to the second center. The probability of the first state is greater than that of the second, so the first data point in the data set is more similar to the first granular-ball center than to the second.
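The reading of these labels can be sketched as a small decoder. This is a hypothetical helper, assuming the field order described above (fidelity qubit, 3-bit data address, 1-bit center address) and assuming, as the text's reading of the histogram implies, that more shots on a fidelity-bit-0 state indicate higher similarity. The bit order of a simulator's count keys depends on how registers were declared, so the field layout should be checked against the actual circuit.

```python
def decode(label, data_bits=3, center_bits=1):
    """Split a measured label such as '0 000 1' into (fidelity bit, data address, center address).

    Field order follows the reading given in the text: fidelity qubit |B>,
    then data address |add_D>, then granular-ball center address |add_C>.
    """
    bits = label.replace(" ", "")
    if len(bits) != 1 + data_bits + center_bits:
        raise ValueError(f"unexpected label length: {label!r}")
    return int(bits[0], 2), int(bits[1:1 + data_bits], 2), int(bits[1 + data_bits:], 2)

def most_similar_center(counts, data_addr, data_bits=3, center_bits=1):
    """Center whose fidelity-bit-0 outcome is most frequent for one data point."""
    tally = {}
    for label, shots in counts.items():
        fid, d, c = decode(label, data_bits, center_bits)
        if fid == 0 and d == data_addr:
            tally[c] = tally.get(c, 0) + shots
    return max(tally, key=tally.get)
```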

Fig. 10

The probability distribution histogram of the fidelity between the data set and the granular-ball center data set. The horizontal axis represents the address of the granular-ball center data set \(\vert add_C \rangle\), the data set address \(\vert add_D \rangle\), and the fidelity register \(\vert B \rangle\). The vertical axis represents the probability obtained from the measurement of each quantum state.

Figure 11 shows the process of generating granular-balls for a data set with the iterative splitting method. The purity threshold T was set to 0.8 throughout the experiments. In Fig. 11, a plot is drawn after each splitting operation. Because the purity of the granular-balls after the first split does not meet the requirement, splitting continues. After the second split, all granular-balls meet the required purity, so the plot after the second split is the final result of granular-ball splitting.

Fig. 11

Visualizing the quantum granular-ball generation algorithm results with iterative splitting.

We conducted two experiments on the dataset in Fig. 12a based on the fixed splitting method. The experimental results are shown in Fig. 12b and c, respectively.

Fig. 13 shows how the test sample point selects a granular-ball center in the GBkNN algorithm. The test sample point is shown in green, and it is ultimately assigned the category represented by the black point. Note that the circles in these figures are only graphical indications of the granular-balls: each circle is centered at a granular-ball center, and its radius is the average distance from the sample points in the granular-ball to that center.

Fig. 12

Visualizing the quantum granular-ball generation algorithm results based on fixed splitting.

Fig. 13

Visualizing the quantum granular-ball generation algorithm results based on fixed splitting.

kNN based on quantum granular-balls

Before presenting the accuracy of the kNN algorithms, we describe the experimental data set and simulation environment. We used the Iris data set, selecting petal length and petal width as the features, with k set to 3. The simulations were run on the IBM qasm_simulator, which was still available in 2024, with noise present in real quantum computers, such as bit flips, added on top of it.
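The bit-flip noise mentioned above has a simple mathematical form: with probability p, the qubit state is conjugated by Pauli X. The sketch below applies that channel to a single-qubit density matrix in plain Python; the value p = 0.1 is purely illustrative, since the noise strengths used in the experiments are not given here.

```python
def bit_flip(rho, p):
    """Bit-flip channel on a single qubit: rho -> (1 - p) * rho + p * X rho X."""
    # Conjugating by Pauli X swaps the |0> and |1> indices: (X rho X)[i][j] = rho[1-i][1-j].
    x_rho_x = [[rho[1 - i][1 - j] for j in range(2)] for i in range(2)]
    return [[(1 - p) * rho[i][j] + p * x_rho_x[i][j] for j in range(2)] for i in range(2)]

rho0 = [[1.0, 0.0], [0.0, 0.0]]  # |0><0|
print(bit_flip(rho0, 0.1))       # population moves: 0.9 on |0>, 0.1 on |1>
```

In Qiskit Aer, the same channel is typically built with `pauli_error([('X', p), ('I', 1 - p)])` from `qiskit_aer.noise` and attached to gates through a `NoiseModel`.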

Table 3 Accuracy analysis of kNN.

This subsection presents the kNN experiments based on quantum granular-balls. In Tab. 3, the accuracy is defined as

$$\begin{aligned} \mathrm{accuracy}=\frac{R}{R+W} \end{aligned}$$
(17)

where R is the number of correctly classified instances in a cross-validation fold and W is the number of misclassified instances. On the Iris data set, QGBkNN based on iterative splitting achieves an accuracy of 0.942 with a standard deviation of 0.0466, and QGBkNN based on a fixed number of splits achieves an accuracy of 0.912 with a standard deviation of 0.0924. These results indicate that QGBkNN maintains good accuracy while reducing time complexity, whichever generation method is used.
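Eq. (17) and the reported mean and standard deviation can be computed per fold as sketched below. The number of folds and whether the sample or population standard deviation was used are not stated, so the use of `statistics.stdev` (sample standard deviation, which needs at least two folds) is an assumption; swapping in `statistics.pstdev` gives the population version.

```python
from statistics import mean, stdev

def fold_accuracy(y_true, y_pred):
    """Eq. (17) for one fold: R / (R + W)."""
    r = sum(t == p for t, p in zip(y_true, y_pred))  # correctly classified
    w = len(y_true) - r                              # wrongly classified
    return r / (r + w)

def cross_validation_summary(folds):
    """Mean and sample standard deviation of per-fold accuracy (needs >= 2 folds)."""
    accs = [fold_accuracy(y_true, y_pred) for y_true, y_pred in folds]
    return mean(accs), stdev(accs)
```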

Conclusion and discussion

This paper proposes two quantum granular-ball generation methods that significantly improve the efficiency of granular-ball generation. To the best of our knowledge, this is the first work to combine quantum computing with granular-ball generation, and it removes the considerable time cost of computing distances between points. Both quantum granular-ball generation algorithms achieve a significant reduction in time complexity, which in turn reduces the complexity of the proposed QGBkNN algorithm. Granular-balls generated through quantum computation can also benefit other practical algorithms. For example, the time complexity of Locality Sensitive Hashing (LSH) consists mainly of two parts: index construction and single queries. By applying quantum granular-ball optimization to data preprocessing (i.e., data encoding) during LSH index construction, the query time complexity can be reduced to O(log m), where m is the number of granular-balls and is much smaller than the original data set size.

Unlike conventional data preparation, our quantum design does not rely solely on QRAM or angle encoding; instead, it combines the advantages of both. We have also addressed a previously overlooked issue by proposing a data padding scheme for the case in which the number of data points in the data set is not a power of 2.
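Padding is needed because an n-qubit address register indexes exactly \(2^n\) entries. The padding scheme itself is not detailed in this section, so the sketch below is a hypothetical illustration only: it appends copies of a chosen `filler` point up to the next power of 2 and returns a mask marking which entries are real, so padded entries can be ignored after measurement. The function name and the filler strategy are assumptions.

```python
def pad_to_power_of_two(points, filler):
    """Hypothetical padding: append copies of `filler` until the length is a power of 2.

    Returns the padded list, a mask marking the original (real) entries, and the
    number of address qubits needed to index the padded list.
    """
    n = len(points)
    if n == 0:
        raise ValueError("cannot pad an empty data set")
    qubits = max(1, (n - 1).bit_length())  # smallest q with 2**q >= n (at least 1)
    size = 2 ** qubits
    padded = list(points) + [filler] * (size - n)  # the same filler object is repeated
    mask = [True] * n + [False] * (size - n)
    return padded, mask, qubits
```

For example, 5 points need 3 address qubits, so 3 filler entries are appended; 8 points already fill a 3-qubit register and are left unchanged.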

In summary, quantum computing's characteristics allow us to address the excessive time complexity of machine learning algorithms. In the future, we plan to further optimize quantum granular-ball generation and apply it to more machine learning algorithms to achieve higher efficiency and performance.