Abstract
Granular-balls reduce the data volume and enhance the efficiency of fundamental algorithms such as clustering and classification. However, generating granular-balls is a time-consuming process, posing a significant bottleneck for the practical application of granular-balls. In this paper, we propose two innovative quantum granular-ball generation methods that capitalize on the inherent properties of quantum computing. The first method employs an iterative splitting technique, while the second utilizes a predetermined number of splits. The iterative splitting method significantly reduces time complexity compared to existing classical granular-ball generation methods. Notably, the method employing a fixed number of splits delivers a substantial quadratic acceleration over the iterative technique. Moreover, we also propose a quantum k-nearest neighbors algorithm based on granular-balls (QGBkNN) and empirically show the effectiveness of our approach.
Introduction
Quantum computing has gained significant attention because, by exploiting the laws of quantum mechanics, it has the potential to solve complex problems that classical computers cannot handle1. Over the past thirty years, researchers have made substantial progress in both the theory and the experimental realization of quantum computing. Quantum algorithms have been run on quantum simulators and on real quantum computers, marking a shift from purely theoretical research toward practical applications.
The combination of quantum computing and machine learning is one of the most promising applications of quantum technology2. Quantum versions of several classical machine learning algorithms have already been proposed, such as quantum support vector machines3,4,5, quantum k-means clustering6,7, and quantum neural networks8,9.
The complexity of most machine learning algorithms grows with the number of data points, so large data sets limit their efficiency. Human cognition, however, prioritizes large areas: humans tend to glance first at the broader scope and then focus on narrower details10. Building on this cognitive characteristic, Zadeh introduced the concept of granular computing11. Later, Xia et al. proposed the “Granular-ball Computing” model, which replaces the individual samples of a data set with “granular-balls” that represent its overall characteristics12. They also proposed various granular-ball generation methods to improve the efficiency, accuracy, and robustness of classification algorithms13,14,15. In these generation algorithms, however, each sample point must compute its distance to other sample points multiple times to determine which granular-ball center it belongs to, so the running time grows in proportion to the number of sample points. For large data sets, this consumes substantial computational resources. Unlike previous work on granular-ball generation, this paper fully exploits quantum parallelism to design two granular-ball generation methods.
Granular-balls have also been applied to SVMs, rough sets, and reinforcement learning, significantly improving the efficiency of existing approaches16,17,18,19,20,21,22. This suggests that quantum granular-balls could likewise be used to accelerate these machine learning algorithms.
In this paper, our contributions are as follows:
(1) We make the first attempt to use quantum computing to accelerate granular-ball generation.
(2) We apply the generated granular-balls to the quantum k-nearest neighbors algorithm (QkNN), demonstrating the effectiveness of the quantum granular-ball generation methods.
(3) A time complexity analysis shows that our granular-ball generation methods are more efficient than existing ones. In addition, our quantum k-nearest neighbors algorithm based on granular-balls (QGBkNN) is shown to be more efficient than the kNN and QkNN algorithms.
Preliminaries
Foundations for quantum computing
QRAM
QRAM (Quantum Random Access Memory) is a quantum memory that enables random access by exploiting quantum superposition and entanglement23. QRAM stores addresses and data entangled with each other, as shown in Eq. 1. The addresses are held in superposition in the address register, \(\sum \limits _{j} \psi _j{\vert j \rangle }_a\), where \({\vert {j}\rangle }_a\) denotes the jth memory cell and \(\psi _j\) is its probability amplitude. A QRAM query loads the content of each cell into the data register, where \({\vert {D_j}\rangle }_d\) denotes the jth data value \(D_j\):
\[\sum \limits _{j} \psi _j{\vert j \rangle }_a{\vert 0 \rangle }_d \xrightarrow {\text {QRAM}} \sum \limits _{j} \psi _j{\vert j \rangle }_a{\vert {D_j}\rangle }_d. \tag{1}\]
Two architectures have been proposed to implement this encoding: the fanout architecture and the bucket-brigade architecture. The fanout architecture is cheap in classical computing but relatively expensive to implement in quantum computing24. The bucket-brigade architecture is better suited to large numbers of data points, because it reduces the number of routing elements activated per query from linear to logarithmic in the number of memory cells. It is the most commonly used QRAM architecture23.
The quantum circuit of the bucket-brigade architecture was designed in25; Fig. 1 shows the circuit for storing four data points. From top to bottom, \(\vert {a_1a_0}\rangle\) are the address qubits. Next come four routing qubits, initialized to \(\vert {0010}\rangle\), followed by four memory qubits \(\vert {m_3m_2m_1m_0}\rangle\) that store the four data points. The last qubit is a bus qubit initialized to \(\vert {0}\rangle\). The circuit operates in three stages. First, CNOT and Toffoli gates map the address qubits one-to-one onto the routing qubits. Then, Toffoli gates transfer the values from the memory qubits to the bus qubit, entangling the address qubits with the bus qubit. Finally, the address qubits are disentangled from the routing qubits.
Quantum circuit for bucket-brigade architecture25.
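As a concrete illustration of the QRAM mapping, the sketch below emulates an ideal QRAM query classically with NumPy. It writes the output state \(\sum_j \psi_j\vert j\rangle\vert D_j\rangle\) directly as a state vector instead of simulating the bucket-brigade gates of Fig. 1. The function name `qram_load` and the register ordering (address register as the high-order part of the basis index) are illustrative assumptions.

```python
import numpy as np

def qram_load(amplitudes, data, data_bits):
    """Ideal QRAM query: return sum_j psi_j |j>_address |D_j>_data as a dense vector.

    amplitudes: psi_j for each address j (length 2**n, unit norm)
    data:       classical integers D_j stored in memory cell j (each < 2**data_bits)
    data_bits:  number of qubits in the data register
    """
    n_cells = len(amplitudes)
    state = np.zeros(n_cells * 2**data_bits, dtype=complex)
    for j, (psi, d) in enumerate(zip(amplitudes, data)):
        # Basis index of |j>|D_j>: the address occupies the high-order bits.
        state[j * 2**data_bits + d] = psi
    return state

# Four memory cells queried in uniform superposition, 2-bit data register.
state = qram_load(np.full(4, 0.5), [3, 0, 2, 1], data_bits=2)
print(np.flatnonzero(state))  # basis states |j>|D_j> that carry amplitude
```

Only the four basis states pairing each address with its stored value carry amplitude, which is the entanglement between address and data registers that the circuit in Fig. 1 produces.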
Angle encoding
Angle encoding is an effective data encoding method for quantum algorithms that do not require arithmetic operations. It stores classical information in the rotation angles of the rotation X (RX), rotation Y (RY), and rotation Z (RZ) gates, which map each data value to a rotation of a qubit, that is, to a point on the Bloch sphere. As shown in Fig. 2, taking the controlled RY (CRY) gate as an example, \(\vert {a}\rangle\) is the control qubit and the target qubit is initialized to \(\vert {0}\rangle\). The circuit performs the following transformation:
\[\vert {a}\rangle \vert {0}\rangle \rightarrow \vert {a}\rangle \left( \cos \frac{a\theta }{2}\vert {0}\rangle + \sin \frac{a\theta }{2}\vert {1}\rangle \right), \quad a\in \{0,1\}.\]
Quantum circuit for the CRY gate. One qubit controls the rotation of another: the upper qubit is the control and the lower qubit is the target, initialized to \(\vert {0}\rangle\). \(RY(\theta )\) denotes a rotation by the angle \(\theta\) about the y-axis of the Bloch sphere.
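The action of the CRY gate can be checked numerically. The sketch below builds RY and CRY as explicit matrices with NumPy, using the standard convention \(RY(\theta)\vert 0\rangle = \cos(\theta/2)\vert 0\rangle + \sin(\theta/2)\vert 1\rangle\) and the qubit ordering |control>|target>; both conventions are our assumptions for illustration, and the helper names are hypothetical.

```python
import numpy as np

def ry(theta):
    """Single-qubit rotation about the y-axis, RY(theta)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def cry(theta):
    """Controlled-RY on two qubits, ordered |control>|target>."""
    p0 = np.diag([1.0, 0.0])  # projector |0><0| on the control
    p1 = np.diag([0.0, 1.0])  # projector |1><1| on the control
    return np.kron(p0, np.eye(2)) + np.kron(p1, ry(theta))

theta = np.pi / 3
target = np.array([1.0, 0.0])  # target starts in |0>
for a in (0, 1):
    control = np.eye(2)[a]     # |a>
    out = cry(theta) @ np.kron(control, target)
    print(a, out)
```

With the control in \(\vert 0\rangle\) the target is untouched; with the control in \(\vert 1\rangle\) the target becomes \(\cos(\theta/2)\vert 0\rangle + \sin(\theta/2)\vert 1\rangle\).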
Swap test
The swap test26 is commonly used to estimate the fidelity of two quantum states, defined as the squared modulus of their inner product. Fig. 3 shows the swap test circuit, in which \(\vert {\phi }\rangle\) and \(\vert {\varphi }\rangle\) are the two states whose fidelity is to be measured. An additional qubit, initialized to \(\vert {0}\rangle\), serves as the control for the controlled swap (CSWAP) gate.
Quantum circuit for swap test26.
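The input and output of the circuit in Fig. 3 are
\[\vert {0}\rangle \vert {\phi }\rangle \vert {\varphi }\rangle \rightarrow \frac{1}{2}\vert {0}\rangle \left( \vert {\phi }\rangle \vert {\varphi }\rangle + \vert {\varphi }\rangle \vert {\phi }\rangle \right) + \frac{1}{2}\vert {1}\rangle \left( \vert {\phi }\rangle \vert {\varphi }\rangle - \vert {\varphi }\rangle \vert {\phi }\rangle \right).\]
Measuring the control qubit yields the probabilities
\[p(0)=\frac{1}{2}+\frac{1}{2}{\left| \langle {\varphi }\vert {\phi }\rangle \right| }^2,\qquad p(1)=\frac{1}{2}-\frac{1}{2}{\left| \langle {\varphi }\vert {\phi }\rangle \right| }^2,\]
where \({\left| \langle {\varphi }\vert {\phi }\rangle \right| }^2\) is the fidelity between \(\vert {\phi }\rangle\) and \(\vert {\varphi }\rangle\), and p(0) and p(1) are the probabilities of obtaining \(\vert {0}\rangle\) and \(\vert {1}\rangle\) when the control qubit is measured. As the fidelity increases, the probability of measuring \(\vert {0}\rangle\) increases and the probability of measuring \(\vert {1}\rangle\) decreases.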
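The swap-test probabilities can be reproduced with a small state-vector simulation. The sketch below applies H, CSWAP, and H to an ancilla and two single-qubit states with NumPy and returns p(0); the ordering (ancilla as the most significant qubit) and the helper name `swap_test_p0` are assumptions for illustration, not the circuit of Fig. 3 verbatim.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def swap_test_p0(phi, varphi):
    """Simulate the swap test on two single-qubit states and return p(0) of the ancilla."""
    # Qubit order: ancilla (most significant), phi register, varphi register.
    state = np.kron(np.array([1.0, 0.0]), np.kron(phi, varphi)).astype(complex)
    I4 = np.eye(4)
    state = np.kron(H, I4) @ state
    # SWAP the two data qubits only when the ancilla is |1> (block-diagonal CSWAP).
    swap = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]])
    cswap = np.block([[I4, np.zeros((4, 4))], [np.zeros((4, 4)), swap]])
    state = cswap @ state
    state = np.kron(H, I4) @ state
    # The first four amplitudes are those with the ancilla in |0>.
    return float(np.sum(np.abs(state[:4]) ** 2))

phi = np.array([1.0, 0.0])
varphi = np.array([np.cos(0.3), np.sin(0.3)])
fidelity = abs(np.vdot(varphi, phi)) ** 2
print(swap_test_p0(phi, varphi), 0.5 + 0.5 * fidelity)  # the two values agree
```

Identical states give p(0) = 1 and orthogonal states give p(0) = 1/2, the two extremes of the formula above.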
Quantum analog to digital conversion algorithm
Data encoding methods fall into two types: analog encoding, where data are stored in the amplitudes of quantum states, and digital encoding, where data are stored in strings of qubits. The QADC (quantum analog to digital conversion) algorithm27 was designed for quantum simulation and converts analog encodings into digital encodings. Because the amplitudes of quantum states are in general complex numbers, three versions of QADC were proposed: absolute-QADC (abs-QADC), real-QADC, and imaginary-QADC, which convert the absolute values, the real parts, and the imaginary parts of the amplitudes, respectively, into digital encoding. This paper uses the abs-QADC algorithm, whose quantum circuit is shown in Fig. 4.
Definition for gate G in abs-QADC27.
Quantum circuit for the quantum cosine function28.
The abs-QADC algorithm converts values from analog to digital encoding in three steps. First, the analog-encoded values are prepared: the registers \(\vert {add}\rangle \vert {data}\rangle \vert {A}\rangle \vert {B}\rangle\) are prepared using H gates, CNOT gates, and the operator V. Second, a specific unitary operator G is constructed. Third, the quantum phase estimation (QPE) circuit converts the analog-encoded values into digital encoding. The abs-QADC algorithm defines the unitary operator G as follows:
where \(V^{\dagger }\) is the inverse of V, \(S=I-2\vert {0}\rangle \langle {0}\vert\), and Z is the Pauli-Z gate, which rotates a qubit by \(180^{\circ }\) about the z-axis.
Next, a t-qubit register initialized to \(\vert {0}\rangle ^{\otimes t}\) is added, and the QPE circuit converts the analog-encoded values into digital encoding stored in this t-qubit string.
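The precision of the digital readout is set by the t-qubit register of QPE. The sketch below computes, classically, the ideal outcome distribution of t-qubit phase estimation for an eigenphase \(e^{2\pi i\,\varphi}\), showing that the readout concentrates on the t-bit value closest to \(2^t\varphi\). It models only the QPE readout, not the abs-QADC operator G itself; in amplitude-estimation-style circuits the eigenphases typically come in a \(\pm\) pair, so the readouts k and \(2^t-k\) encode the same value. The function name is hypothetical.

```python
import numpy as np

def qpe_distribution(phase, t):
    """Ideal t-qubit QPE outcome probabilities for an eigenvalue exp(2*pi*i*phase)."""
    k = np.arange(2**t)             # possible t-bit readouts
    m = np.arange(2**t)[:, None]    # index of the controlled powers of the unitary
    amps = np.exp(1j * 2 * np.pi * m * (phase - k / 2**t)).sum(axis=0) / 2**t
    return np.abs(amps) ** 2

probs = qpe_distribution(0.3, t=5)
print(int(np.argmax(probs)), round(0.3 * 2**5))  # most likely readout is the nearest 5-bit value
```

A larger t sharpens the peak (precision \(2^{-t}\)) at the cost of a deeper circuit, which is why the number of estimated bits enters the complexity analysis.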
Quantum cosine function
The quantum cosine function algorithm computes the absolute value \(|\cos (\pi \theta ) |\) from \(\theta\). Here \(\theta\) is written in binary as \(0.v_{n-1} \ldots v_{1}v_{0}\), where each of \(v_{n-1}, \ldots , v_1, v_0\) is 0 or 1, and \(\theta\) lies in \([0, \frac{1}{2}]\). The approximation of \(|\cos (\pi \theta ) |\) is denoted \(a_{n}\) and is obtained through the iterative formula in Eq. 7.
Reference28 describes a quantum circuit that evaluates this iteration. As shown in Fig. 5, n qubits are first prepared as an angle register holding the bits of \(\theta = 0.v_{n-1} \ldots v_{1}v_{0}\). A cosine register \(\vert a_{0} \rangle\) is then prepared and initialized to \(\vert 01 \rangle \vert 0 \rangle ^{\otimes (n-1)}\) to store the result. After the circuit in Fig. 5 is executed, the cosine register is in the state \(\vert a_{n} \rangle\), whose value \(a_{n}\) approximates \(|\cos (\pi \theta ) |\). We denote this circuit by the unitary operator COS.
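One way to see why a bit-by-bit iteration can produce \(\cos(\pi\theta)\) is the half-angle identity. If \(\theta_k = (v + \theta_{k-1})/2\) absorbs one more bit v, then \(\cos(\pi\theta_k) = (-1)^{v}\sqrt{(1 + (-1)^{v}\cos(\pi\theta_{k-1}))/2}\). The floating-point sketch below applies this recursion starting from \(\cos 0 = 1\). It illustrates the idea under our own derivation and is not claimed to be the exact fixed-point iteration of Eq. 7 or of ref.28; the function name is hypothetical.

```python
import math

def abs_cos_pi_theta(bits):
    """|cos(pi*theta)| for theta = 0.v_{n-1}...v_1v_0 via repeated half-angle steps.

    bits: [v_{n-1}, ..., v_1, v_0], most significant bit first.
    """
    c = 1.0  # cos(pi * 0): value before any bit is absorbed
    for v in reversed(bits):  # absorb v_0 first, then v_1, ...
        # theta_k = (v + theta_{k-1}) / 2; the sign of c must be tracked for correctness.
        sign = -1.0 if v else 1.0
        c = sign * math.sqrt(max(0.0, (1.0 + sign * c) / 2.0))
    return abs(c)

print(abs_cos_pi_theta([0, 1]), math.cos(math.pi / 4))  # theta = 0.01 (binary) = 1/4
```

The sign of the intermediate cosine is kept because intermediate angles can exceed 1/2 even when the final \(\theta\) lies in \([0, \frac{1}{2}]\).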
Quantum k-nearest neighbors algorithm
The kNN algorithm is one of the fundamental algorithms of machine learning: it determines the category of a test sample from its distances to the training samples. Researchers have proposed several QkNN algorithms based on various distance measures29,30. Although existing QkNN algorithms have surpassed classical kNN algorithms in efficiency and accuracy, there is still room for improvement, and the potential of QkNN has not yet been fully exploited. In refs.12,13, granular-balls were introduced into the kNN algorithm, yielding the granular-ball k-nearest neighbors (GBkNN) algorithm, which improved classification efficiency and accuracy.
Classical methods for generating granular-balls
Before introducing the granular-ball generation algorithm, we introduce the concept of a granular-ball and its key parameters. For a granular-ball \(\mathcal {G} = \{ \textbf{x}_i, i = 1,\cdots ,n\}\), where \(\textbf{x}_i\) is a sample point in the ball and n is the number of sample points, the center C and radius R of the granular-ball are defined as:
The purity of a granular-ball is defined as follows. Taking binary classification as an example, suppose a granular-ball contains n samples, \(n_1\) of which belong to the majority category. The purity P of the granular-ball is then \(\frac{n_1}{n}\).
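The ball parameters can be computed in a few lines. Because the center and radius equations are not reproduced here, the sketch below assumes the common choices of the center as the mean of the member points and the radius as their mean distance to the center (some works use the maximum distance instead); the purity is the majority-class fraction \(n_1/n\) defined above. The function name is hypothetical.

```python
import numpy as np
from collections import Counter

def ball_stats(points, labels):
    """Center, radius, majority label and purity of one granular-ball.

    Assumption: center = mean of the points, radius = mean distance to the center.
    """
    points = np.asarray(points, dtype=float)
    center = points.mean(axis=0)
    radius = np.linalg.norm(points - center, axis=1).mean()
    majority_label, n1 = Counter(labels).most_common(1)[0]
    purity = n1 / len(labels)  # fraction of points in the majority class
    return center, radius, majority_label, purity

print(ball_stats([[0, 0], [2, 0], [0, 2], [2, 2]], [0, 0, 0, 1]))
```

For the four corners of a square with one minority point, the center is (1, 1), the radius is \(\sqrt{2}\), and the purity is 3/4.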
The classical granular-ball algorithm13 represents the sample space with granular-balls of different sizes. Learning tasks such as clustering and classification based on granular-balls gain efficiency, accuracy, and robustness. The centers of granular-balls can be determined by different rules, the two most effective being 2-means clustering and splitting13. Granular-balls are generated by splitting as follows. The whole sample space is first treated as one granular-ball, and a sample point of one category is randomly selected as its center. Sample points of the other categories within the ball are then chosen as additional granular-ball centers. Each sample point is assigned to its nearest center, which splits the ball into new granular-balls. Each new granular-ball is checked against the purity threshold T, and those that fall short continue to be split. The splitting process ends once the purity of every granular-ball reaches the threshold.
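The splitting procedure above can be sketched classically as follows. This is a minimal illustration, assuming Euclidean distance, one randomly chosen center per class present in an impure ball, and a guard that stops splitting when duplicate points make progress impossible; the function name and parameters are hypothetical and not the implementation of ref.13.

```python
import numpy as np

def split_granular_balls(X, y, threshold=0.9, seed=0):
    """Split the sample space into granular-balls whose purity reaches `threshold`.

    Returns a list of index arrays, one per finished granular-ball.
    """
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    pending = [np.arange(len(X))]  # the whole sample space starts as one ball
    finished = []
    while pending:
        idx = pending.pop()
        labels = y[idx]
        classes, counts = np.unique(labels, return_counts=True)
        if counts.max() / len(idx) >= threshold:
            finished.append(idx)   # pure enough: stop splitting this ball
            continue
        # One randomly chosen center per class present in the ball.
        centers = np.array([rng.choice(idx[labels == c]) for c in classes])
        # Assign every point of the ball to its nearest center.
        dists = np.linalg.norm(X[idx][:, None, :] - X[centers][None, :, :], axis=2)
        owner = dists.argmin(axis=1)
        children = [idx[owner == k] for k in range(len(centers))]
        children = [ch for ch in children if len(ch) > 0]
        if len(children) == 1:     # duplicate points: splitting cannot make progress
            finished.append(idx)
            continue
        pending.extend(children)
    return finished
```

Each split shrinks every child ball, because each center is strictly closest to itself unless points coincide, so the loop terminates with every ball at or above the purity threshold. The repeated all-pairs distance computation in this loop is the cost the quantum methods target.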
Methodology
In this section, we will introduce two quantum granular-ball generation methods: one based on iterative splitting and the other on fixed splitting numbers. To show the practicality of granular-balls, a simple application of QGBkNN is introduced.
Quantum granular-ball generation based on iterative splitting (Algorithm 1)
Quantum granular-ball generation based on iterative splitting represents the sample space with multiple granular-balls produced by repeated splitting. We illustrate the method with an example. Consider a data set D in which each data point has a label and two features. Although we assume two dimensions here, the algorithm can handle higher-dimensional data. The feature values are confined to \([0, 2^{h}-1]\), where h is a positive integer. One sample point is randomly drawn from each class in D as a granular-ball center; these points form the granular-ball center data set \(C_0\), and the remaining data points form the data set \(D_0\). When the i-th round of splitting ends, granular-balls that meet the requirements no longer participate in splitting. The data points that still need to be split form a new data set and a new granular-ball center data set, denoted by \(D_i\) and \(C_i\), respectively. Each round of iterative splitting proceeds as follows.
The data set \(D_i\) and the granular-ball center data set \(C_i\) are encoded into angles, and granular-balls are generated in a series of steps. First, the fidelity between each data point in \(D_i\) and each granular-ball center in \(C_i\) is calculated and stored in analog encoding. Second, the analog-encoded fidelity is transferred to the quantum ground state (digital encoding). Third, the nearest granular-ball center for each data point in \(D_i\) is determined. Finally, new granular-balls are generated from the results of the previous steps.
After the i-th round of splitting, d granular-balls have been generated, and their purities \(P^{i}_1\), \(P^{i}_2\), \(\ldots\), \(P^{i}_d\) are computed and compared with a predefined purity threshold T. If every granular-ball reaches the threshold, splitting stops. Otherwise, the granular-balls that fall short are passed to the next round. Within each of these balls, points of a different class from the existing center are randomly selected as new granular-ball centers. These new centers, together with the previous centers of these balls, form the new granular-ball center data set \(C_{i+1}\), and all data points from the balls that fail the purity requirement form the new data set \(D_{i+1}\). Once these data are prepared, the next round of splitting begins. The splitting steps are repeated until all granular-balls meet the purity requirement. The pseudocode for quantum granular-ball generation based on iterative splitting is shown in Algorithm 1.
Data preparation
The first phase of our quantum granular-ball algorithm is data preparation, which maps classical data points onto quantum states by encoding the points of the data set and of the granular-ball center data set as qubit rotation angles. Let \(N_i\) denote the number of data points in \(D_i\) and \(M_i\) the number of granular-ball centers in \(C_i\). Data encoding involves three main steps: data preprocessing, angle mapping, and preparation of the quantum circuit.
Step 1: Data preprocessing. The number of data points in real-world data sets is rarely a power of 2, whereas the quantum encoding methods assume power-of-2 data set sizes. To resolve this, we pad each data set with additional sample points up to the next power of 2. For simplicity, the padding points take the same values as the last data point in the data set. After padding, \(D_i\) contains N data points and \(C_i\) contains M granular-ball centers.
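The padding rule is simple to state in code. The sketch below repeats the last row of an (n, d) array until n reaches the smallest power of two that is at least n, matching the rule described above; the function name is hypothetical.

```python
import numpy as np

def pad_to_power_of_two(points):
    """Pad an (n, d) array by repeating its last row until n is a power of two (n >= 1)."""
    points = np.asarray(points)
    n = len(points)
    target = 1 << (n - 1).bit_length()  # smallest power of two >= n
    padding = np.repeat(points[-1:], target - n, axis=0)
    return np.vstack([points, padding])

padded = pad_to_power_of_two(np.arange(10).reshape(5, 2))
print(padded.shape)  # five points are padded to eight
```

Repeating an existing point keeps the padded entries inside the valid value range; note that the duplicates still enter the superposition, so later steps must ignore or discount their addresses.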
Step 2: Angle mapping. The data are then mapped to rotation angles on the Bloch sphere. Given a data value \(x=x_{h-1}\ldots x_1 x_0\), with \(x_{h-1}\) the most significant bit, the corresponding rotation angle \(\theta _{r}\) is calculated by:
Note that \(\theta _{r}\) is directly proportional to x and will not exceed \(\pi\).
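Since the mapping equation is not reproduced here, the sketch below assumes the linear map \(\theta_r = \pi x / 2^h\), which is proportional to x and stays below \(\pi\) for \(x \le 2^h - 1\). It also shows why bit-controlled CRY gates can realize such a map: rotations about the same axis add, \(RY(a)RY(b)=RY(a+b)\), so giving bit \(x_k\) the hypothetical rotation \(\pi 2^{k-h}\) reproduces the full angle. Both the formula and the per-bit angles are our assumptions.

```python
import math

def angle_from_value(x, h):
    """Assumed mapping theta_r = pi * x / 2**h for an h-bit value x (so theta_r < pi)."""
    return math.pi * x / 2**h

def angle_from_bits(x, h):
    """Same angle accumulated bit by bit, as a chain of CRY gates controlled by x_k would."""
    total = 0.0
    for k in range(h):
        if (x >> k) & 1:                    # bit x_k is 1, so its CRY gate fires
            total += math.pi * 2**k / 2**h  # hypothetical per-bit rotation pi * 2^(k-h)
    return total

print(angle_from_value(13, 4), angle_from_bits(13, 4))  # identical angles
```

Under this assumption each bus qubit needs exactly one CRY gate, so angle encoding costs h gates per feature.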
Step 3: Data preparation quantum circuit. Both the data set and the granular-ball center set consist of 2-dimensional points. We first describe the preparation of the first dimension of each data set and then combine both dimensions.
The process of preparing the first dimension of the dataset into rotation angles is detailed in three stages, as depicted in Fig. 6a.
In the first stage, QRAM25, represented by the unitary operator Q, loads the data points into the bus register. The data from \(D_i\) are stored in superposition in \(\vert {bus}\rangle\), which also entangles the address register \(\vert add_D\rangle\) with the bus register \(\vert {bus}\rangle\).
The second stage uses CRY gates to encode each data point from the bus register as a rotation angle between 0 and \(\pi\) on an additional angle register \(\vert ang_{D0} \rangle\). With the bus qubits as controls, the CRY gates rotate the angle qubit \(\vert {a}\rangle\), resulting in:
The third stage returns \(\vert {route}\rangle\) and \(\vert bus \rangle\) to their initial states with the inverse QRAM operation, so that the QRAM circuit can be reused in subsequent steps.
After the complete process, the data set addresses are held in superposition in \(\vert {add_D}\rangle\), and the data values are encoded in the y-axis rotation angles of \(\vert ang_{D0} \rangle\). We refer to this entire quantum circuit as the unitary transformation \(W_D\).
Preparing the granular-ball center dataset \(C_i\) follows a similar process, as shown in Fig. 6b. This entire quantum circuit is referred to as a unitary transformation \(V_C\).
Finally, by utilizing the unitary operators \(W_D\) and \(V_C\), the preparation of all data points for both the dataset and the granular-ball center dataset is completed, as shown in Fig. 6c, and is represented as follows:
where \(\vert {i}\rangle _{add_D}\) and \(\vert {j}\rangle _{add_C}\) are the stored addresses, and \(\vert \phi \rangle\) and \(\vert \varphi \rangle\) are the encoded data in the angle registers \(\vert {ang_D}\rangle\) and \(\vert {ang_C}\rangle\), respectively.
Quantum circuit for data preparation. a Preparing data points from the first dimension of the dataset, which we define as the unitary operator \(W_D\). b Preparing data points from the first dimension of the granular-ball center dataset, which we define as the unitary operator \(V_C\). c Preparing the dataset and granular-ball center dataset.
Generating new granular-balls
The generation of new granular-balls integrates concepts of fidelity calculation, fidelity transfer to the ground state, and a quantum minimum value search algorithm. The aim is to assign each data point to the nearest granular-ball center, forming new granular-balls.
Step 1: Fidelity calculation. The swap test26 measures the distance between a data point and a granular-ball center through the fidelity between \(\vert ang_D \rangle\) and \(\vert ang_C \rangle\): the higher the fidelity, the closer the point is to the center.
In the quantum circuit depicted in Fig. 7, the control bit \(\vert B \rangle\) becomes a superposition state after an H gate operation.
The CSWAP gates, controlled by \(\vert B \rangle\), swap \(\vert ang_{D0} \rangle\) with \(\vert ang_{C0} \rangle\) and \(\vert ang_{D1} \rangle\) with \(\vert ang_{C1} \rangle\). We define this circuit as the unitary transformation U. Writing \(\vert \phi \rangle = \vert \phi \rangle _{ang_D}\) and \(\vert \varphi \rangle = \vert \varphi \rangle _{ang_C}\), the quantum state after the circuit in Fig. 7 is as follows:
The relationship between \(\theta\) and fidelity is defined as \(\cos (\pi \theta ) = \sqrt{\frac{1}{2}(1 - |\langle \varphi \vert \phi \rangle |^2)}\), with \(\theta\) ranging from \(\frac{1}{4}\) to \(\frac{1}{2}\). The simplified quantum state \(\vert \psi \rangle\) is as follows:
A larger \(\theta\) gives a smaller \(\cos (\pi \theta )\), corresponding to higher fidelity and hence closer proximity to the granular-ball center. The fidelity could be estimated by repeatedly measuring the registers to identify the nearest center for each data point, but this is resource-intensive. To mitigate this, the fidelity information is transferred to the ground state, from which it can be retrieved with few quantum measurements.
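The relation between distance, fidelity, and \(\theta\) can be checked classically. The sketch below assumes each feature is angle-encoded as \(RY(\theta)\vert 0\rangle\) with \(\theta = \pi x/2^h\). For the resulting product states the fidelity factorizes into one \(\cos^2(\Delta\theta/2)\) term per dimension. The sketch then inverts \(\cos(\pi\theta) = \sqrt{(1-F)/2}\) to obtain \(\theta \in [\frac{1}{4}, \frac{1}{2}]\). The encoding formula and the function names are assumptions for illustration.

```python
import math

def encoded_fidelity(p, q, h):
    """Fidelity between two points angle-encoded per feature as RY(pi * x / 2**h)|0>.

    For product states the fidelity is the product of cos^2(dtheta / 2) over features.
    """
    f = 1.0
    for xp, xq in zip(p, q):
        dtheta = math.pi * (xp - xq) / 2**h
        f *= math.cos(dtheta / 2) ** 2
    return f

def theta_from_fidelity(f):
    """Invert cos(pi * theta) = sqrt((1 - f) / 2); theta lies in [1/4, 1/2]."""
    return math.acos(math.sqrt((1.0 - f) / 2.0)) / math.pi

center = (8, 8)
for point in [(9, 8), (14, 3)]:
    f = encoded_fidelity(point, center, h=4)
    print(point, round(f, 4), round(theta_from_fidelity(f), 4))
```

The nearer point has the higher fidelity and therefore the larger \(\theta\) (smaller \(\cos(\pi\theta)\)), which is why the nearest center is found by searching for the minimum \(\cos(\pi\theta)\).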
Step 2: Fidelity transfer to the ground state. To convert the angle \(\theta\) from analog to digital encoding with the absolute-quantum analog to digital conversion (abs-QADC) algorithm27, we first construct the unitary operator G from the quantum circuits for data preparation and fidelity calculation, as depicted in Fig. 8a. The resulting operator is \(G=UVWSW^{\dagger }V^{\dagger }U^{\dagger }Z\), as shown in Fig. 8b.
Quantum circuit for constructing the unitary operator G. a W preps the data set, V preps granular-ball center data, U calculates fidelity. b The circuit includes the inverse operations \(U^\dagger\) for U, \(V^\dagger\) for V, and \(W^\dagger\) for W. Additionally, the function of Z is to rotate \(180^{\circ }\) around the z-axis of qubit \(\vert {B}\rangle\).
As shown in the red box in Fig. 9, a t-qubit register \(\vert dig \rangle\), initialized to \(\vert 0 \rangle ^{\otimes t}\), is added to store the digitally encoded \(\theta\). The abs-QADC circuit is built from the state \(\vert \psi \rangle\), the register \(\vert dig \rangle\), and the unitary operator G. Its output state is as follows:
Next, we use the quantum cosine function algorithm28 to compute \(\cos (\pi \theta )\) from the \(\theta\) in Eq. 15. As shown in Fig. 9, an additional register \(\vert fidelity \rangle\) of r qubits is introduced to store the result \(\cos (\pi \theta )\). Applying the circuit COS of the quantum cosine function algorithm yields \(\vert {i}\rangle _{add_D} \vert {j}\rangle _{add_C} \vert \cos (\pi \theta )\rangle _{fidelity}\), where \(\vert i\rangle _{add_D}\) and \(\vert j\rangle _{add_C}\) are the addresses in the data set and the granular-ball center data set, respectively, and \(\vert \cos (\pi \theta )\rangle _{fidelity}\) is the digitally encoded \(\cos (\pi \theta )\).
Step 3: Generate new granular-balls. Finding, for each data point i, the minimum \(\cos (\pi \theta )\) entangled with i identifies its nearest granular-ball center and thereby generates the granular-balls. We design the corresponding quantum circuit, shown in Fig. 9, with the quantum minimum value search algorithm of ref.31. Two registers are added: \(\vert y \rangle\), which stores the threshold of the minimum search and is initialized to 0, and \(\vert closest \rangle\), which stores the address of the nearest granular-ball center in m qubits. MIN in Fig. 9 denotes the quantum minimum value search itself; whenever the value in \(\vert y \rangle\) is updated, the granular-ball center address entangled with \(\vert y \rangle\) is copied from \(\vert add_C \rangle\) to \(\vert closest \rangle\). After the circuit completes, the final state is as follows:
where \(\vert {i}\rangle _{add_D}\) represents the address i of the data set stored in a superposition state within the address register \(\vert add_D \rangle\); \(\vert {closest_j}\rangle _{closest}\) represents the address \(closest_j\) of the nearest granular-ball centers for each data point i. In this way, each data point in the data set \(D_i\) successfully finds its nearest granular-ball center.
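The threshold-update idea behind quantum minimum search can be emulated classically. The sketch below follows the structure of Dürr and Høyer's minimum-finding algorithm: keep a threshold, search for any entry below it, and move the threshold to the entry found. The Grover search is replaced by uniform sampling among the entries below the threshold, the distribution an ideal Grover search returns on success. We have not confirmed that ref.31 uses exactly this scheme, and unlike the text (threshold register initialized to 0) this sketch starts from a random index. The function name is hypothetical.

```python
import random

def threshold_minimum_search(values, rng=None):
    """Classical emulation of threshold-based minimum finding (Durr-Hoyer style).

    Returns the index of a minimum of `values`, e.g. a row of cos(pi*theta)
    values over all granular-ball centers j for one data point i.
    """
    rng = rng or random.Random(0)
    best = rng.randrange(len(values))  # random initial threshold index
    while True:
        # "Marked" entries: strictly below the current threshold value.
        better = [j for j, v in enumerate(values) if v < values[best]]
        if not better:                 # nothing marked: the threshold is the minimum
            return best
        best = rng.choice(better)      # emulated Grover search + measurement

cos_row = [0.7, 0.2, 0.9, 0.25, 0.5]  # hypothetical cos(pi*theta) for one data point
print(threshold_minimum_search(cos_row))  # index of the nearest center
```

On a quantum computer each search step uses Grover's algorithm, which is where the \(\sqrt{M}\) factor in the complexity analysis comes from.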
Once all data points have found their nearest granular-ball centers, these centers and their corresponding data points can be grouped to form new granular-balls, completing one round of granular-ball generation. Subsequently, in classical computation, the purity of each granular-ball is calculated and compared with a preset purity threshold. Data points that do not meet the purity requirements will enter the next round of the loop until all granular-balls satisfy the purity requirements, thus completing the generation of all granular-balls.
Quantum granular-ball generation based on fixed splitting number (Algorithm 2)
The iterative splitting method generates granular-balls adapted to the sample distribution and has a lower time complexity than classical methods. However, the hybrid quantum-classical workflow and the multiple iterations required during splitting consume substantial quantum resources. To overcome this, we propose a quantum granular-ball generation method based on a fixed number of splits, which generates a fixed number of granular-balls for the entire sample space in a single run and is fully quantum after data preprocessing. For N samples, the number of clusters produced by clustering should not exceed an upper bound \(c_{max}\); a rule of thumb used by many investigators is \(c_{max} \le \sqrt{N}\)32,33, and ref.34 has verified this rule. Following this rule, we set the number of splits to \(\sqrt{N}\), which yields \(\sqrt{N}\) granular-balls. The method follows the same process as a single splitting round of the iterative splitting method.
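A classical analogue of the fixed-split method helps show what a single round produces. The sketch below assumes the \(\sqrt{N}\) centers are drawn at random from the samples (the selection rule is our assumption), assigns every sample to its nearest center in one pass, and labels each ball by its majority class. The function name and return format are hypothetical.

```python
import math
import numpy as np

def fixed_split_balls(X, y, seed=0):
    """One-round granular-ball generation with round(sqrt(N)) random centers (sketch)."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    m = max(1, round(math.sqrt(len(X))))
    centers = rng.choice(len(X), size=m, replace=False)
    dists = np.linalg.norm(X[:, None, :] - X[centers][None, :, :], axis=2)
    owner = dists.argmin(axis=1)  # nearest center for every sample, in a single pass
    balls = []
    for k, c in enumerate(centers):
        members = np.flatnonzero(owner == k)
        if len(members) == 0:
            continue  # only possible when two centers coincide in feature space
        labels, counts = np.unique(y[members], return_counts=True)
        balls.append((X[c], labels[counts.argmax()], members))  # (center, majority label, members)
    return balls

rng = np.random.default_rng(3)
X = rng.uniform(0, 15, size=(100, 2))
y = (X[:, 0] > 7.5).astype(int)
print(len(fixed_split_balls(X, y)))  # sqrt(100) balls
```

No purity check or second round is performed, which mirrors the trade-off discussed below: one round is cheap, but the number of balls is not adapted to the data.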
Each of these quantum granular-ball generation methods has its own set of strengths and weaknesses. Algorithm 1 is notable for its ability to generate granular-balls that are well-adapted to the data distribution, which is a significant advantage, but it also requires multiple iterations and conditional judgments, potentially leading to a more complex process.
In contrast, Algorithm 2 uses a fixed number of splits and can therefore generate all granular-balls rapidly and simultaneously. It is fully quantum and requires no repeated switching between classical and quantum computation. Its drawback is that the resulting number of granular-balls may not suit the data distribution. The pseudocode for quantum granular-ball generation based on a fixed splitting number is shown in Algorithm 2.
Quantum k-nearest neighbors algorithm based on granular-balls
Existing QkNN algorithms improve efficiency by exploiting quantum entanglement and quantum parallelism. However, the efficiency of both kNN and QkNN is still limited on large-scale data sets. To address this, we use granular-balls to represent large data sets in kNN. The core idea of QGBkNN is to apply quantum granular-ball generation to the training set so that granular-balls represent the entire sample space; a test sample with an unknown label then takes the category of its nearest granular-ball center. Because each granular-ball represents multiple training samples with the same label, it suffices to set k to 1. The algorithm consists of two stages:
(1) Training: generate the quantum granular-balls and replace the original training set with them.
(2) Testing: assign the test sample the category of its nearest granular-ball center.
Suppose the training set has N samples and there is one test sample. Both the training samples and the test sample are 2-dimensional, with feature values in \([0, 2^h-1]\). The two steps of QGBkNN are as follows.
The first step converts the sample points of the training set into granular-balls and uses the granular-ball centers as the new training set E. The number of granular-balls produced by quantum granular-ball generation based on iterative splitting is not known in advance, whereas generation based on a fixed splitting number produces \(\sqrt{N}\) granular-balls. The second step determines the category of the test sample point c from its nearest granular-ball center. This mirrors the search for the nearest granular-ball center and proceeds as follows: encode the training set E and the test sample point into angles; calculate the fidelity between c and each point in E; transfer the analog-encoded fidelity to the ground state; and identify the point in E most similar to c. The QGBkNN based on Algorithm 2 adds one more step to improve classification accuracy: after the measurement, the majority label of the selected granular-ball is taken as the label of the test sample point. The pseudocode for QGBkNN is shown in Algorithm 3.
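The testing stage reduces to a 1-nearest-neighbor lookup over granular-ball centers. The classical sketch below scores each center with the same angle-encoding fidelity used for ball generation, assuming \(\theta = \pi x/2^h\) per feature, and returns the label attached to the highest-fidelity center (for Algorithm 2, the ball's majority label). It stands in for the quantum fidelity estimation and minimum search; the function name and inputs are hypothetical.

```python
import numpy as np

def qgbknn_predict(centers, center_labels, x, h):
    """1-NN over granular-ball centers using angle-encoding fidelity as similarity."""
    centers = np.asarray(centers, dtype=float)
    dtheta = np.pi * (centers - np.asarray(x, dtype=float)) / 2**h
    fidelity = np.prod(np.cos(dtheta / 2) ** 2, axis=1)  # product over feature dimensions
    return center_labels[int(np.argmax(fidelity))]       # highest fidelity = nearest center

centers = [[2, 2], [12, 12]]  # hypothetical granular-ball centers (training set E)
labels = ["a", "b"]           # their (majority) labels
print(qgbknn_predict(centers, labels, [3, 1], h=4))
```

Because the lookup runs over the ball centers rather than all N training samples, the test cost scales with the number of granular-balls, which is the source of QGBkNN's speed-up over kNN and QkNN.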
Performance analysis
This subsection compares the time complexity of the two proposed quantum granular-ball generation methods with that of their classical counterparts, and then compares the performance of the proposed QGBkNN algorithm with other kNN algorithms.
For clarity, note that the time complexity we evaluate is the depth of the quantum circuit implementing each algorithm. Since both the circuit depth and the time complexity are determined by the steps the algorithm executes, we treat the two as equivalent throughout. This keeps the description uniform and makes the subsequent comparison with other algorithms straightforward.
Time complexity analysis of quantum granular-ball generation methods
Time complexity analysis of Algorithm 1: During one splitting round, the data preparation time complexity is \(O(\log {MN})\), where M is the number of granular-balls and N is the number of samples. The time complexity of abs-QADC depends on the number t of bits to be estimated and is \(O(2^{t}\log ^{2}{MN})\); since only a small, fixed number of bits is estimated, we simplify it to \(O(\log ^{2}{MN})\). The circuit for the quantum cosine function does not depend on the data set size or dimensionality, so its time complexity is O(1). Finding the nearest granular-ball center takes \(O(\sqrt{M}\log ^{2}{MN})\). The overall time complexity of one splitting round is therefore \(O(\log {MN}+\log ^{2}{MN}+\sqrt{M}\log ^{2}{MN})=O(\sqrt{M}\log ^{2}{MN})\). Assuming d iterations in total, the overall time complexity is \(O(d\sqrt{M}\log ^{2}{MN})\).
Time complexity analysis of Algorithm 2: The number of data points is N, and the number of granular-ball centers is \(\sqrt{N}\). Since the address registers index all \(N\cdot\sqrt{N}\) pairs, the data preparation time complexity is \(O(\log {N^{\frac{3}{2}}})=O(\log N)\). The time complexity for transferring the fidelity to the ground state is \(O(\log ^{2}{N})\). Because the nearest-center search runs over \(\sqrt{N}\) centers, it takes \(O(\sqrt{\sqrt{N}}\log ^{2}{N})=O(N^\frac{1}{4}\log ^{2}{N})\). The overall time complexity is \(O(\log {N}+\log ^{2}{N}+N^\frac{1}{4}\log ^{2}{N})=O(N^\frac{1}{4}\log ^{2}{N})\).
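To make the two bounds concrete, the sketch below evaluates only their leading terms, with all constants dropped and logarithms taken base 2. Both choices are assumptions, since big-O notation fixes neither, so the numbers indicate growth rates rather than actual circuit depths. The helper names are hypothetical, and M is treated as a fixed upper bound across rounds, as in the bound above.

```python
import math


def alg1_round_cost(M: int, N: int) -> float:
    """Leading term of one splitting round of Algorithm 1: sqrt(M) * log^2(MN)."""
    return math.sqrt(M) * math.log2(M * N) ** 2


def alg1_total_cost(M: int, N: int, d: int) -> float:
    """d splitting rounds of Algorithm 1, with M held fixed as an upper bound."""
    return d * alg1_round_cost(M, N)


def alg2_cost(N: int) -> float:
    """Leading term of Algorithm 2: N^(1/4) * log^2 N (search over sqrt(N) centers)."""
    return N ** 0.25 * math.log2(N) ** 2


for exp in (10, 20, 30):
    N = 2 ** exp
    print(f"N = 2^{exp}: Algorithm 2 ~ {alg2_cost(N):.0f} vs N = {N}")
```

For \(N = 2^{20}\) the Algorithm 2 term is \(2^{5}\cdot 20^{2} = 12800\), far below N itself; the gap widens as N grows because \(N^{1/4}\) grows much more slowly than N.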
Table 1 compares the time complexity of two quantum granular-ball generation algorithms with that of the classical granular-ball generation algorithm. Currently, the time complexity of the classical granular-ball generation algorithm is O(mdN), where m is the number of granular-balls. Algorithm 1 reduces the time complexity to \(O(\sqrt{M}\log ^{2}{MN})\), with M being the number of data points in the granular-ball center dataset, which is significantly less than N. Algorithm 2 further eliminates the iterative process, thereby reducing the time complexity to \(O((\log ^{2}{N})N^\frac{1}{4})\).
Comparison of performance between QGBkNN and other kNN algorithms
As shown in Table 2, the proposed QGBkNN achieves lower time complexity than both GBkNN13 and QkNN.
Experiment
Generation of quantum granular-ball
To observe the granular-ball generation process more directly, the quantum circuits for data point preparation and fidelity calculation were executed on a quantum simulator, while phase estimation and the subsequent quantum circuits were simulated on a classical computer. As an example, we use the data set [[1, 2], [1, 3], [2, 4], [3, 6], [4, 7], [5, 5], [6, 6], [7, 1]] and the granular-ball center data set [[1, 1], [6, 7]]. The probability histogram obtained from the execution on the quantum simulator is shown in Fig. 10.
After measurement, the probability histogram reflects the similarity between each data point and each granular-ball center. Take the first two basis states, “0 000 0” and “0 000 1”, as examples. Reading from left to right, the first bit, “0”, is the fidelity register; bits 2 to 4, “000”, give the address of the first point in the data set; and the last bit gives the address in the granular-ball center data set, “0” for the first center and “1” for the second. The first state is measured with a higher probability than the second, so the first data point is more similar to the first granular-ball center than to the second.
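The qualitative shape of such a histogram can be reproduced with a small classical calculation under several assumptions, none of which is taken from the paper's circuit: a hypothetical angle encoding \(\theta = \pi x/(2^h-1)\) with h = 3, uniform superpositions over the 8 data addresses and 2 center addresses, and a swap-test-style readout in which the fidelity bit is 0 with probability \((1+F)/2\). Bit strings follow the order described in the text (fidelity bit, three data-address bits, one center-address bit).

```python
import math

DATA = [[1, 2], [1, 3], [2, 4], [3, 6], [4, 7], [5, 5], [6, 6], [7, 1]]
CENTERS = [[1, 1], [6, 7]]
H = 3  # coordinates lie in [0, 2**H - 1]


def fidelity(p, q, h=H):
    """Product-state fidelity under the assumed R_Y-style angle encoding."""
    scale = math.pi / (2 ** h - 1)
    return math.prod(math.cos(scale * (a - b) / 2) ** 2 for a, b in zip(p, q))


def histogram():
    """Joint probabilities P(fidelity bit, data address, center address), swap-test model."""
    probs = {}
    weight = 1 / (len(DATA) * len(CENTERS))  # uniform over all address pairs
    for i, x in enumerate(DATA):
        for j, c in enumerate(CENTERS):
            f = fidelity(x, c)
            for b, pb in ((0, (1 + f) / 2), (1, (1 - f) / 2)):
                probs[f"{b} {i:03b} {j:01b}"] = weight * pb
    return probs


hist = histogram()
print(round(hist["0 000 0"], 4), round(hist["0 000 1"], 4))
```

Under this model, data point [1, 2] is almost identical to center [1, 1] and far from [6, 7], so “0 000 0” carries roughly twice the probability of “0 000 1”, matching the ordering described above.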
Fig. 10: Probability distribution histogram of the fidelity between the data set and the granular-ball center data set. The horizontal axis gives the granular-ball center address \(\vert add_C \rangle\), the data set address \(\vert add_D \rangle\), and the fidelity register \(\vert B \rangle\); the vertical axis gives the probability obtained by measuring each quantum state.
Fig. 11 shows the process of generating granular-balls for a data set with the iterative splitting method. The purity threshold T is set to 0.8 throughout the experiments. In Fig. 11, a plot is generated after each splitting operation. Because the purity of the granular-balls after the first split does not meet the requirement, splitting continues. After the second split, all granular-balls meet the required purity, so the plot after the second split is the final result of granular-ball splitting.
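A classical sketch of the iterative splitting loop is given below, assuming that purity is the fraction of a ball's members carrying its majority label, and that an impure ball is split into one child per class present, each child seeded at that class's mean and receiving the points closest to its seed. These splitting details are assumptions in the spirit of classical granular-ball generation, not the paper's Algorithm 1. Splitting stops once every ball reaches the threshold T = 0.8, or when a ball can no longer be separated.

```python
from collections import Counter


def purity(labels):
    """Fraction of members that carry the majority label."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def mean(points):
    return tuple(sum(c) / len(points) for c in zip(*points))


def split(points, labels):
    """Split one ball: seed a child at each class mean, assign points to the nearest seed."""
    seeds = [mean([p for p, l in zip(points, labels) if l == cls]) for cls in sorted(set(labels))]
    children = [([], []) for _ in seeds]
    for p, l in zip(points, labels):
        k = min(range(len(seeds)), key=lambda s: sum((a - b) ** 2 for a, b in zip(p, seeds[s])))
        children[k][0].append(p)
        children[k][1].append(l)
    return [c for c in children if c[0]]  # drop empty children


def iterative_splitting(points, labels, T=0.8):
    """Split balls until every ball has purity >= T or a split makes no progress."""
    pending, done = [(list(points), list(labels))], []
    while pending:
        pts, lbs = pending.pop()
        if purity(lbs) >= T:
            done.append((pts, lbs))
            continue
        children = split(pts, lbs)
        if len(children) == 1:  # cannot be separated further; keep as is
            done.append((pts, lbs))
        else:
            pending.extend(children)  # each child is strictly smaller, so the loop ends
    return done


pts = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6), (2, 2)]
lbs = ["A", "A", "A", "B", "B", "B", "A"]
balls = iterative_splitting(pts, lbs)
print([(mean(p), dict(Counter(l))) for p, l in balls])
```

Here the initial ball has purity 4/7 < 0.8, and one split already yields two pure balls, so the loop stops after a single round.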
We conducted two experiments on the data set in Fig. 12a using the fixed splitting method; the results are shown in Fig. 12b and c, respectively.
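For the fixed-number variant, the text states that \(\sqrt{N}\) granular-balls are generated and that, in QGBkNN, the majority label of the selected ball is used. The classical sketch below assumes that \(\lfloor\sqrt{N}\rfloor\) seeds are taken as evenly spaced samples of the data and that each point joins its nearest seed in a single pass; the paper's seeding is not reproduced here, and the function name is hypothetical. The usage reuses the eight points of the Fig. 10 example with made-up labels.

```python
import math
from collections import Counter


def fixed_split(points, labels):
    """One-pass assignment of N points to floor(sqrt(N)) balls; returns (center, label, size)."""
    n = len(points)
    m = max(1, math.isqrt(n))
    seeds = [points[i * n // m] for i in range(m)]  # evenly spaced seeds (assumption)
    members = [[] for _ in range(m)]
    for p, l in zip(points, labels):
        k = min(range(m), key=lambda s: sum((a - b) ** 2 for a, b in zip(p, seeds[s])))
        members[k].append((p, l))
    balls = []
    for seed, group in zip(seeds, members):
        if group:  # skip empty groups (only possible with duplicate seeds)
            majority = Counter(l for _, l in group).most_common(1)[0][0]
            balls.append((seed, majority, len(group)))
    return balls


points = [[1, 2], [1, 3], [2, 4], [3, 6], [4, 7], [5, 5], [6, 6], [7, 1]]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]  # hypothetical labels for illustration
for center, label, size in fixed_split(points, labels):
    print(center, label, size)
```

No iteration is needed: the number of balls is fixed up front, which is why this variant avoids the repeated rounds of the iterative method.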
Fig. 13 shows how a test sample point selects a granular-ball center in the GBkNN algorithm. The test sample point, shown in green, is ultimately assigned the category represented by the black point. Note that the circles in these figures are only graphical indications of the granular-balls: each circle is centered at a granular-ball center, and its radius is the average distance from all sample points in the granular-ball to that center.
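The radius convention used for drawing these circles can be stated in a few lines; the helper name `ball_radius` is hypothetical.

```python
import math


def ball_radius(center, members):
    """Average Euclidean distance from a granular-ball's members to its center."""
    return sum(math.dist(center, p) for p in members) / len(members)


print(ball_radius((0, 0), [(3, 4), (0, 5), (6, 8)]))  # (5 + 5 + 10) / 3
```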
KNN based on quantum granular-ball
Before presenting the accuracy of the KNN algorithm, we introduce the data set and simulation environment used in the KNN experiment. We used the Iris data set with petal length and petal width as features, and k set to 3. Simulations were run on IBM's qasm_simulator (still available in 2024), with added noise that occurs on real quantum computers, such as bit flips.
In this section, we present the KNN experiments based on quantum granular-balls. In Table 3, the accuracy is defined as
\(\mathrm{Accuracy}=\frac{R}{R+W},\)
where R is the number of correctly classified instances in the fold and W is the number of misclassified instances. On the Iris data set, QGBkNN based on iterative splitting achieves an accuracy of 0.942 with a standard deviation of 0.0466, and QGBkNN based on a fixed number of splits achieves an accuracy of 0.912 with a standard deviation of 0.0924. These results indicate that QGBkNN maintains good accuracy while reducing time complexity, regardless of which generation method is used.
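The per-fold accuracy and its summary statistics can be computed as sketched below. Whether the reported standard deviation is the population or the sample version is not stated, so the sketch assumes the population version (`statistics.pstdev`). The fold counts in the usage are invented for illustration and are not the paper's data.

```python
import statistics


def fold_accuracy(r: int, w: int) -> float:
    """Accuracy of one fold: R / (R + W)."""
    return r / (r + w)


def summarize(folds):
    """Mean and population standard deviation of per-fold accuracies.

    folds: list of (R, W) pairs, one per cross-validation fold.
    """
    accs = [fold_accuracy(r, w) for r, w in folds]
    return statistics.mean(accs), statistics.pstdev(accs)


# Hypothetical fold counts, for illustration only.
mean_acc, std_acc = summarize([(9, 1), (10, 0)])
print(mean_acc, std_acc)
```

Switching to `statistics.stdev` gives the sample standard deviation, which is larger for a small number of folds.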
Conclusion and discussion
This paper proposes two quantum granular-ball generation methods that significantly improve the efficiency of granular-ball generation. To the best of our knowledge, this is the first work to combine quantum mechanics with granular-ball generation, overcoming the considerable time spent on distance calculations between points. Both quantum granular-ball generation algorithms achieve a significant reduction in time complexity, which in turn reduces the complexity of the proposed QGBkNN algorithm. Granular-balls generated through quantum computation also offer significant advantages when integrated into other practical algorithms. For example, the time complexity of Locality Sensitive Hashing (LSH) consists mainly of two parts: index construction and single queries. By applying quantum granular-ball generation as a data preprocessing (i.e., data encoding) step in the index construction of LSH, the query time complexity of the algorithm can be reduced to \(O(\log m)\), where m is the number of granular-balls and is much smaller than the original data set size.
Unlike conventional data preparation, our quantum computing design does not rely solely on QRAM or on angle encoding; instead, it combines the advantages of both approaches. We also address a previously overlooked issue by proposing a data padding scheme for data sets whose number of data points is not a power of 2.
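The need for padding follows from register sizes: with N data points, an address register of \(n = \lceil\log_2 N\rceil\) qubits indexes \(2^n\) slots, so \(2^n - N\) slots must be filled. The sketch below pads with copies of the last point and returns a mask marking the padded slots so they can be ignored afterwards. This choice of filler is an assumption; the paper's padding scheme is not reproduced here, and the function name is hypothetical.

```python
def pad_to_power_of_two(points):
    """Pad a data set so that its size is a power of two.

    Returns (padded points, mask, number of address qubits); mask[i] is False for padded slots.
    """
    n = len(points)
    if n == 0:
        raise ValueError("empty data set")
    qubits = max(1, (n - 1).bit_length())  # ceil(log2 n), at least one address qubit
    size = 2 ** qubits
    filler = [points[-1]] * (size - n)  # hypothetical filler: repeat the last point
    return list(points) + filler, [True] * n + [False] * (size - n), qubits


padded, mask, q = pad_to_power_of_two([(1, 2), (1, 3), (2, 4), (3, 6), (4, 7)])
print(q, len(padded), mask.count(False))  # 3 8 3
```

Using `(n - 1).bit_length()` computes \(\lceil\log_2 N\rceil\) exactly in integer arithmetic, avoiding floating-point rounding in `math.log2` for large N.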
In summary, the characteristics of quantum computing allow us to address the excessive time complexity of machine learning algorithms. In the future, we plan to further optimize quantum granular-ball generation and apply it to more machine learning algorithms to achieve higher efficiency and performance.
Data availability
All data generated or analysed during this study are included in this published article and its supplementary information files.
References
Gyongyosi, L. & Imre, S. A survey on quantum computing technology. Computer Sci. Rev. 31, 51–71 (2019).
Biamonte, J. et al. Quantum machine learning. Nature 549, 195–202 (2017).
Rebentrost, P., Mohseni, M. & Lloyd, S. Quantum support vector machine for big data classification. Phys. Rev. Lett. 113, 130503 (2014).
Rojas, I., Joya, G. & Catala, A. Advances in Computational Intelligence: 13th International work-conference on artificial neural networks, IWANN 2015, Palma de Mallorca, Spain, June 10-12, 2015. Proceedings, Part I, vol. 9094 (Springer, 2015).
Özpolat, Z. & Karabatak, M. Exploring the potential of quantum-based machine learning: A comparative study of QSVM and classical machine learning algorithms. In 2023 11th international symposium on digital forensics and security (ISDFS), 1–5 (IEEE, 2023).
Kerenidis, I., Landman, J., Luongo, A. & Prakash, A. q-means: A quantum algorithm for unsupervised machine learning. Advances in Neural Information Processing Systems 32 (2019).
Khan, S. U., Awan, A. J. & Vall-Llosera, G. K-means clustering on noisy intermediate scale quantum computers. arXiv preprint (2019). arXiv:1909.12183.
Dallaire-Demers, P.-L. & Killoran, N. Quantum generative adversarial networks. Phys. Rev. A 98, 012324 (2018).
Bausch, J. Recurrent quantum neural networks. Adv. Neural Inf. Process. Syst. 33, 1368–1379 (2020).
Chen, L. Topological structure in visual perception. Science 218, 699–700 (1982).
Zadeh, L. A. Fuzzy sets and information granularity. In Fuzzy sets, fuzzy logic, and fuzzy systems: selected papers, 433–448 (1979).
Xia, S. et al. Granular ball computing classifiers for efficient, scalable and robust learning. Inf. Sci. 483, 136–152 (2019).
Xia, S., Dai, X., Wang, G., Gao, X. & Giem, E. An efficient and adaptive granular-ball generation method in classification problem. IEEE Trans. Neural Netw. Learn. Syst. 35(4), 5319–5331 (2022).
Xie, J. et al. MGNR: A multi-granularity neighbor relationship and its application in kNN classification and clustering methods. IEEE Trans. Pattern Anal. Mach. Intell. 46(12), 7956–7972 (2024).
Yang, J. et al. 3WC-GBNRS++: A novel three-way classifier with granular-ball neighborhood rough sets based on uncertainty. IEEE Trans. Fuzzy Syst. 32(8), 4376–4387 (2024).
Xia, S. et al. Granular-ball fuzzy set and its implement in SVM. IEEE Trans. Knowl. Data Eng. 36(11), 6293–6304 (2024).
Xia, S., Wang, G., Gao, X. & Lian, X. Granular-ball computing: an efficient, robust, and interpretable adaptive multi-granularity representation and computation method. arXiv preprint (2023). arXiv:2304.11171.
Xia, S., Xia, C., Dai, D. & Ni, J. Image segmentation algorithm based on multi-granularity clustering. In Seventh International Conference on Mechatronics and Intelligent Robotics (ICMIR 2023), vol. 12779, 597–604 (2023).
Cheng, D., Liu, S., Xia, S. & Wang, G. Granular-ball computing-based manifold clustering algorithms for ultra-scalable data. Expert Syst. Appl. 247, 123313 (2024).
Xie, J. et al. W-GBC: An adaptive weighted clustering method based on granular-ball structure. In 2024 IEEE 40th International conference on data engineering (ICDE), 914–925 (IEEE, 2024).
Yang, J. et al. Adaptive three-way KNN classifier using density-based granular balls. Inf. Sci. 678, 120858 (2024).
Sun, Y. & Zhu, P. Online group streaming feature selection based on fuzzy neighborhood granular ball rough sets. Expert Syst. Appl. 249, 123778 (2024).
Giovannetti, V., Lloyd, S. & Maccone, L. Quantum random access memory. Phys. Rev. Lett. 100, 160501 (2008).
Nielsen, M. A. & Chuang, I. L. Quantum computation and quantum information. Phys. Today 54, 60 (2001).
Arunachalam, S., Gheorghiu, V., Jochym-O’Connor, T., Mosca, M. & Srinivasan, P. V. On the robustness of bucket brigade quantum ram. New J. Phys. 17, 123010 (2015).
Buhrman, H., Cleve, R., Watrous, J. & De Wolf, R. Quantum fingerprinting. Phys. Rev. Lett. 87, 167902 (2001).
Mitarai, K., Kitagawa, M. & Fujii, K. Quantum analog-digital conversion. Phys. Rev. A 99, 012301 (2019).
Wang, S. et al. Quantum circuits design for evaluating transcendental functions based on a function-value binary expansion method. Quantum Inf. Process. 19, 1–31 (2020).
Wiebe, N., Kapoor, A. & Svore, K. M. Quantum nearest-neighbor algorithms for machine learning. Quantum Inf. Computation 15, 318–358 (2015).
Ruan, Y., Xue, X., Liu, H., Tan, J. & Li, X. Quantum algorithm for k-nearest neighbors classification based on the metric of hamming distance. Int. J. Theor. Phys. 56, 3496–3507 (2017).
Kang, Y. & Heo, J. Quantum minimum searching algorithm and circuit implementation. In 2020 International Conference on Information and Communication Technology Convergence (ICTC), 214–219 (IEEE, 2020).
Pal, N. R. & Bezdek, J. C. On cluster validity for the fuzzy c-means model. IEEE Trans. Fuzzy Syst. 3, 370–379 (1995).
Bezdek, J. C. & Pal, N. R. Some new indexes of cluster validity. IEEE Trans. Syst., Man, Cybern. Part B (Cybernetics) 28, 301–315 (1998).
Yu, J. & Cheng, Q. The upper bound of the optimal number of clusters in fuzzy clustering. Sci. China Series F: Inf. Sci. 44, 119–125 (2001).
Basheer, A., Afham, A. & Goyal, S. K. Quantum \(k\)-nearest neighbors algorithm. arXiv preprint (2020). arXiv:2003.09187.
Quezada, L. F., Sun, G.-H. & Dong, S.-H. Quantum version of the k-NN classifier based on a quantum sorting algorithm. Annalen der Physik 534, 2100449 (2022).
Feng, C., Zhao, B., Zhou, X., Ding, X. & Shan, Z. An enhanced quantum k-nearest neighbor classification algorithm based on polar distance. Entropy 25, 127 (2023).
Gao, L.-Z., Lu, C.-Y., Guo, G.-D., Zhang, X. & Lin, S. Quantum k-nearest neighbors classification algorithm based on Mahalanobis distance. Front. Phys. 10, 1047466 (2022).
Zardini, E., Blanzieri, E. & Pastorello, D. A quantum k-nearest neighbors algorithm based on the Euclidean distance estimation. Quantum Mach. Intell. 6, 1–22 (2024).
Author information
Contributions
Xiaojiang Tian and Wenping Lin wrote the main manuscript text. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Yuan, S., Tian, X., Lin, W. et al. Quantum granular-ball generation methods and their application in KNN classification. Sci Rep 15, 29779 (2025). https://doi.org/10.1038/s41598-025-14724-3