Introduction

Quantum computing achieved a significant milestone in 2019 when Google’s quantum computer, Sycamore, outperformed a classical supercomputer in a specialized task known as random quantum circuit sampling1. While a classical supercomputer was estimated to need about 10,000 years, the 53-qubit Sycamore took around 200 seconds to finish the task, thanks to its ability to represent a computational state space of dimension \(2^{53} \approx 10^{16}\). The next stage after this milestone, according to that paper, is to show that a quantum computer can solve more valuable computing applications. Although an ideal fault-tolerant system with a sufficient number of qubits for implementing practical quantum algorithms has not yet been achieved, which places us in what is known as the NISQ (Noisy Intermediate-Scale Quantum) era, various efforts in this direction have already been initiated. One potential approach for using NISQ devices to solve real-world computing problems is through hybrid classical-quantum algorithms, such as the Quantum Approximate Optimization Algorithm (QAOA) proposed by Farhi et al.2.

To date, researchers have extensively explored the theoretical foundations, enhancements, and potential applications of the QAOA. Ruan et al.3 present a QAOA-based algorithm designed to solve combinatorial optimization problems with equality and/or inequality constraints. They formalized methods for encoding various types of constraints and demonstrated the effectiveness of these approaches through examples and results on well-known NP optimization problems. Boulebnane et al.4 investigated the performance of QAOA in sampling low-energy states for protein folding problems. Their findings suggest that while QAOA performs well on simpler instances, more complex problems demand deeper quantum circuits, yielding results comparable to those of random sampling. He et al.5 studied the selection of initial states in QAOA for constrained portfolio optimization problems. By employing the connection between QAOA and the adiabatic algorithm, they discovered that the optimal initial state corresponds to the ground state of the mixing Hamiltonian. A broader overview of the current status of QAOA, including performance analyses across different scenarios, applicability to various problem instances, and hardware-specific challenges such as error susceptibility and noise resilience, is given by Blekos et al.6. This review also compares selected extensions and variants of QAOA, highlighting future directions for its development.

Another significant result in the use of NISQ devices is the recent demonstration of quantum utility before fault tolerance by IBM researchers7. This result raises hopes for the implementation and demonstration of quantum advantage in real-world applications. In this spirit, we propose the hard problem of discovering a particular discrete structure, namely a Hadamard matrix of a specific order, as a potential instance of such practical applications, and we use QAOA to implement it on gate-based quantum computers.

A Hadamard matrix (H-matrix) is an orthogonal binary matrix with various scientific and engineering applications8,9,10,11. An M-order H-matrix can exist only when M equals 1, 2, or a multiple of 4. The converse, that an H-matrix of order 4k exists for every positive integer k, is also believed to be true11,12; this is the well-known Hadamard matrix conjecture. When \(M=2^n\), for a non-negative integer n, the H-matrix can be constructed easily by the Sylvester method12. Constructions of H-matrices for other values of \(M=4k\) have also been developed, among them the methods by Paley13, Williamson14, Baumert-Hall15, and Turyn16. More recently, cocyclic techniques have been developed by de Launey-Horadam17,18,19 and Alvarez et al.20. Nevertheless, not all H-matrices are easily constructed or discovered. The most recent one is an H-matrix of order 428, which was found by Kharaghani and Tayfeh-Rezaie21. To this day, for orders \(M<1000\), H-matrices of order 668, 716, and 892 have neither been discovered nor proven to exist. Our previous study indicates that, with currently known methods, present-day (classical) computing resources are insufficient to find those matrices in practical time.

In principle, an M-order H-matrix can be found, or proven non-existent, by exhaustively testing all possible \(+1/-1\) combinations of its \(M \times M\) elements. However, when M becomes sufficiently large, this approach becomes computationally impractical, because the number of orthogonality tests grows exponentially, as \(O(2^{M \times M})\), even though each test can be performed in polynomial time. To address this issue, we have developed several methods based on Simulated Annealing (SA) and Simulated Quantum Annealing (SQA)22, as well as Quantum Annealing (QA)23,24. Our latest method was implemented on a quantum annealer, specifically the D-Wave quantum computer, where we successfully discovered several H-matrices of orders up to more than one hundred24. Although current quantum annealers have over 5,000 qubits, the need for ancillary qubits to represent interactions beyond the 2-body limit hinders their use for finding higher-order H-matrices. We have estimated that implementing the method to find a 668-order H-matrix needs at least 15,400 physical qubits24.

A tentative way to pursue this task is to develop a qubit-efficient method. This paper explores this idea by proposing the use of a circuit/gate-based quantum computer instead of a quantum annealer to implement the method. An almost straightforward extension of the previous method is to formulate the problem as an instance of the QAOA (Quantum Approximate Optimization Algorithm)2. The Hamiltonian in the Hadamard-search problem involves not only 1-body and 2-body interactions but also 3-body and 4-body terms. However, quantum annealers can natively implement only up to 2-body interactions. To simulate higher-order terms such as 3-body and 4-body interactions, additional qubits, known as ancillary qubits, must be introduced to mediate these interactions. These ancillary qubits serve as intermediaries, substantially increasing the total number of physical qubits required. In the direct method23, the use of ancillary qubits increases the required number of qubits from \(O(M^2)\) to \(O(M^3)\). Subsequent developments using the Williamson/Baumert-Hall and Turyn methods24 reduced the number of required variables to O(M). However, implementing these methods on quantum annealers still requires ancillary qubits for the 3-body and 4-body terms, thereby raising the qubit requirement back to \(O(M^2)\). In contrast, gate-based quantum computers can directly implement higher-order interactions without relying on ancillary qubits, enabling more efficient use of qubit resources. This approach brings the total number of qubits needed down to O(M).

QAOA for Hadamard search and error metric

QAOA formulation of H-matrix searching problem

The QAOA is a hybrid classical-quantum algorithm proposed by Farhi et al.2. It is proposed as a solution for near-term quantum computing, which can be implemented on NISQ devices–quantum computers with limited qubits, connectivity, gate errors, and short coherence times. A typical N-bit and M-clause combinatorial optimization problem addressed by the QAOA can be formulated as follows. Consider an N-length bit string or a boolean vector \(\vec {b}=b_0b_1 \cdots b_{N-1} \triangleq \left[ b_0, b_1, \cdots ,b_{N-1} \right] ^T\) and let \(C(\vec {b})\) be a cost or an objective function given by the following expression

$$\begin{aligned} C(\vec {b}) = \sum _{m=0}^{M-1} C_m(\vec {b}) \end{aligned}$$
(1)

The value of \(C_m(\vec {b})\) is equal to 1 if \(\vec {b}\) satisfies the clause \(C_m\); otherwise it is 0. If C denotes the maximum value of Eq. (1), the approximation means that we seek a bit string \(\vec {b}\) for which \(C(\vec {b})\) is close to C.

To apply the QAOA to the H-matrix searching problem, we change the Boolean vector \(\vec {b}\) in Eq. (1) into its spin vector representation \(\vec {s}=\left[ s_0, s_1, \cdots , s_{N-1}\right] ^T\), while the maximization is recast as a minimization. We can restate the previous approximation problem as finding a vector \(\vec {s}\) that minimizes a non-negative cost function given by

$$\begin{aligned} C(\vec {s}) = \sum _{m=0}^{M-1} C_m(\vec {s}) \end{aligned}$$
(2)

Then, the approximation means that we seek a bit string \(\vec {b}\), corresponding to the vector \(\vec {s}\), that makes \(C(\vec {s})\) close to zero. The bit string \(\vec {b}\) is obtained by measuring the qubits after executing the QAOA on a circuit-based quantum computer, which is done by evolving the system under a Hamiltonian \(\hat{H}\) that represents the given problem.

In the QAOA method, we have a Hamiltonian \(\hat{H}\) that consists of a problem Hamiltonian \(\hat{H}_C\) and a mixer Hamiltonian \(\hat{H}_B\),

$$\begin{aligned} \hat{H} = \hat{H}_C + \hat{H}_B \end{aligned}$$
(3)

The problem Hamiltonian \(\hat{H}_C\) is equivalent to the potential Hamiltonian \(\hat{H}_{pot}\) in the quantum annealing formulation24. It is derived from the cost function of the Hadamard searching problem, either in the Williamson/Baumert-Hall (WBH) method \(C_W(\vec {s})\) or the Turyn method \(C_T(\vec {s})\). The details of these cost functions are provided in the Methods section.

Based on Eq. (3), we construct a quantum circuit to perform the following unitary transform

$$\begin{aligned} \hat{U}(\gamma , \beta ) = e^{-i\beta _P \hat{H}_B}e^{-i\gamma _P \hat{H}_C} e^{-i\beta _{P-1} \hat{H}_B}e^{-i\gamma _{P-1} \hat{H}_C} \cdots e^{-i\beta _p \hat{H}_B}e^{-i\gamma _p \hat{H}_C} \cdots e^{-i\beta _1 \hat{H}_B}e^{-i\gamma _1 \hat{H}_C} \end{aligned}$$
(4)

where

$$\begin{aligned} \hat{H}_B=\sum _j b_j \hat{\sigma }^x_j \end{aligned}$$
(5)

and

$$\begin{aligned} \hat{H}_C=\sum _{j,k, \cdots ,m,n} c_{jk\cdots mn} \hat{\sigma }^z_j\hat{\sigma }^z_k\cdots \hat{\sigma }^z_m\hat{\sigma }^z_n \end{aligned}$$
(6)

In these equations, P is the number of layers (Trotter slices), \(\gamma _p\) and \(\beta _p\) are the angle parameters at layer p, \(b_j\) and \(c_{jk\cdots mn}\) are constants, whereas \(\hat{\sigma }^x_j\) and \(\hat{\sigma }^z_j\) are the spin/Pauli matrices of the \(j^{th}\) qubit in the x- and z-directions, respectively.
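The layered unitary of Eq. (4) can be illustrated with a small state-vector sketch in plain Python. It is not the implementation used in the experiments. It assumes \(b_j=1\) in Eq. (5), the uniform superposition as the initial state, and a bit convention in which bit 0 corresponds to spin +1 with qubit 0 as the leftmost bit. The function names and the `terms` dictionary layout are hypothetical.

```python
import cmath
import math

def spins_of(index, n):
    """Spins of a basis state: bit 0 -> +1, bit 1 -> -1 (assumed convention)."""
    return [1 - 2 * ((index >> (n - 1 - q)) & 1) for q in range(n)]

def cost_diagonal(terms, n):
    """Diagonal of H_C; `terms` maps a tuple of qubit indices to a coefficient.
    The empty tuple () holds a constant offset."""
    diag = []
    for index in range(2 ** n):
        s = spins_of(index, n)
        diag.append(sum(c * math.prod(s[q] for q in qubits)
                        for qubits, c in terms.items()))
    return diag

def apply_cost(state, diag, gamma):
    """exp(-i gamma H_C) is diagonal: multiply each amplitude by a phase."""
    return [a * cmath.exp(-1j * gamma * e) for a, e in zip(state, diag)]

def apply_mixer(state, n, beta):
    """exp(-i beta sum_j X_j) factorizes into exp(-i beta X) on every qubit."""
    c, s = math.cos(beta), -1j * math.sin(beta)
    state = list(state)
    for q in range(n):
        mask = 1 << (n - 1 - q)
        for index in range(2 ** n):
            if not index & mask:
                a0, a1 = state[index], state[index | mask]
                state[index] = c * a0 + s * a1
                state[index | mask] = s * a0 + c * a1
    return state

def qaoa_probabilities(terms, n, gammas, betas):
    """Measurement probabilities after len(gammas) layers of Eq. (4)."""
    diag = cost_diagonal(terms, n)
    state = [1 / math.sqrt(2 ** n)] * (2 ** n)  # assumed initial state |+>^n
    for gamma, beta in zip(gammas, betas):
        state = apply_cost(state, diag, gamma)   # rightmost factor acts first
        state = apply_mixer(state, n, beta)
    return [abs(a) ** 2 for a in state]
```

In a full QAOA run, a classical optimizer would update the lists `gammas` and `betas` between calls; that outer loop is omitted here.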

Fig. 1

Elementary quantum circuits used in the QAOA-based H-matrix search, categorized into four types: (a) 1-body interaction term, (b) 2-body interaction term, (c) 3-body interaction term, and (d) 4-body interaction term. In this figure, \(q_j, q_k, q_m,\) and \(q_n\) are the \(j^{th}, k^{th}, m^{th},\) and \(n^{th}\) qubits, respectively, \(\gamma\) denotes rotation angle, and \(c_j, c_{jk}, c_{jkm},\) and \(c_{jkmn}\) are constants.

The term expressed by the product of n Pauli matrices \(\hat{\sigma }_0^z\hat{\sigma }_1^z \cdots \hat{\sigma }_{n-1}^z\) in Eq. (6) is called an n-body interaction term. In the Hadamard Searching Problem (H-SEARCH), the Hamiltonian contains interactions only up to 4-body terms, so that generally \(\hat{H}_C\) can be expressed by

$$\begin{aligned} \hat{H}_C=\sum _{j} c_j \hat{\sigma }^z_j + \sum _{j,k}c_{jk}\hat{\sigma }^z_j\hat{\sigma }^z_k + \sum _{j,k,m} c_{jkm} \hat{\sigma }^z_j\hat{\sigma }^z_k\hat{\sigma }^z_m + \sum _{j,k,m,n} c_{jkmn} \hat{\sigma }^z_j\hat{\sigma }^z_k\hat{\sigma }^z_m\hat{\sigma }^z_n \end{aligned}$$
(7)

Here, the \(1^{st}\), \(2^{nd}\), \(3^{rd}\), and \(4^{th}\) terms correspond to the 1-body, 2-body, 3-body, and 4-body interaction terms, respectively. The quantum circuits representing each of these elementary terms are displayed in Fig. 1.

Performance metrics

The performance of quantum computing algorithms can be evaluated in various ways, including the time-to-solution (TTS), quantum resource usage, scaling behavior, solution quality, and success probability. To measure the TTS, it is necessary to know all hardware specifications, as well as the exact start and end times of the execution. Unfortunately, this level of detail is not always accessible to ordinary users. Quantum resource usage can be assessed by comparing the number of qubits required by different algorithms. The method proposed here requires a number of physical qubits that scales linearly, as O(M), which is an improvement over previous methods whose qubit requirements grew quadratically24 or cubically23 when implemented on a D-Wave system. Scaling analysis is crucial to determine whether the proposed quantum algorithm offers an advantage over its classical counterparts as the problem size increases. The solution quality measures how close the obtained solution is to the ideal one. In the context of signal processing, this is analogous to the Signal-to-Noise Ratio (SNR). However, we do not use this metric in our evaluation, because our objective is to find matrices that are (perfectly) orthogonal, in line with the goal of finding H-matrices. Instead, we adopt success probability as our main metric, that is, how much more effectively the quantum algorithm identifies correct solutions compared to a purely random search.

The outputs of the algorithms discussed in this paper are L-length bit strings, resulting in \(2^L\) possible combinations. An output string is considered correct or valid if the value of the associated non-negative error or energy function–such as the Williamson or Turyn cost function–is equal to zero. Otherwise, it is labeled as incorrect or wrong. To evaluate the algorithm’s performance in producing correct solutions, we compare it to an algorithm that randomly generates all possible L-length bit strings. Accordingly, we introduce the xRAR (x-algorithm to Random-Algorithm Ratio) as a performance metric for a given x-algorithm.

Fig. 2

Performance metric and workflow diagram of the QAOA: (a) Performance measure in terms of xRAR. The probability \(P_x\) of correct or valid answers from the x-algorithm, which generates L-length bit strings, is compared to \(P_R\), the probability of a correct answer from a random algorithm that generates uniformly distributed random L-length bit strings. (b) The QAOA processing steps in the experiments. The parameters \(\{\beta , \gamma \}\) are angle parameters, which are updated during the execution of the hybrid quantum-classical algorithm. The threshold value \(\epsilon\) is a stopping parameter: when the difference \(|\Delta |\) between the current energy \(E_{i+1}\) and the previous one \(E_{i}\) is considered insignificant, the execution of the algorithm is terminated.

The random algorithm generates \(2^L\) possible bit strings, and we can evaluate whether each bit string is a correct/valid solution or not. If there are \(S_R\) correct solutions among the \(2^L\) random bit strings and we assume a uniform probability distribution among them, the probability of finding a valid solution, \(P_R\), is given by \(P_R = \frac{S_R}{2^L}\). Similarly, the solutions generated by a quantum circuit that represents the x-algorithm are also probabilistic, and we can calculate the probability \(P_x\) of a correct solution from the x-algorithm. Therefore, the value of xRAR, which is illustrated conceptually in Fig. 2(a), is given by

$$\begin{aligned} \textit{xRAR} = \frac{P_x}{P_R} \end{aligned}$$
(8)

The value of xRAR in Eq. (8) is a non-negative real number. A value \(0 \le \textit{xRAR} < 1\) indicates that the x-algorithm performs worse than the random algorithm R, \(\textit{xRAR} = 1\) signifies performance comparable to the random algorithm, and \(\textit{xRAR} > 1\) suggests that the x-algorithm outperforms the random algorithm R.
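As an illustration of Eq. (8), the following sketch estimates xRAR from a table of measured bit strings. It assumes that the measurement results are available as a dictionary of counts and that validity can be tested by a user-supplied predicate; the names `xrar`, `counts`, and `is_valid` are hypothetical.

```python
def xrar(counts, is_valid, n_bits):
    """Estimate xRAR = P_x / P_R from measurement counts.

    counts   : dict mapping bit strings (e.g. "0101") to numbers of shots
    is_valid : function returning True for a correct/valid bit string
    n_bits   : length L of the bit strings
    """
    shots = sum(counts.values())
    # P_x: fraction of shots that produced a valid string
    p_x = sum(c for bits, c in counts.items() if is_valid(bits)) / shots
    # P_R: fraction of all 2^L strings that are valid (exhaustive count)
    n_valid = sum(is_valid(format(i, f"0{n_bits}b")) for i in range(2 ** n_bits))
    p_r = n_valid / 2 ** n_bits
    return p_x / p_r
```

The exhaustive count of \(P_R\) is feasible only for small L; for larger problems \(S_R\) would have to be known or estimated by other means.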

The execution steps of the QAOA used in the experiments are shown in Fig. 2 (b). When running the quantum algorithm, either on a simulator or a real quantum computer, we repeat the process N times, referred to as the number of shots. This produces N solutions or bit strings, each with a corresponding value of energy or error. A specific part of the algorithm computes the average or expectation value, allowing us to determine whether any of the bit strings achieve the minimum energy, as indicated by the Extraction of Measured Qubits blocks in the figure. We can count the number of valid solutions at each iteration step and also after reaching the lowest average energy for a given experimental setup. The number of correct solutions is then used to evaluate the algorithm’s performance.

In addition to xRAR, we also evaluate the performance of the algorithm using an error metric. This error metric is defined as the accumulated or total objective values of all generated solutions, with the objectives measured by either the Williamson or Turyn cost functions (Eq. (11) and Eq. (14) in the Methods section), respectively. The error is normalized by the maximum value of each cost function and then compared to the value obtained by a random algorithm through exhaustive search. For a particular order of H-SEARCH that generates \(N_Q\)-length bit string solutions with a maximum objective error of \(E_{max}\) and a total error for all possible \(2^{N_Q}\) bit strings equal to \(E_{tot}\), the normalization factor is \(2^{N_Q}E_{max}\). The average error is given by \(E_{tot}/2^{N_Q}\), and the normalized average error is \(E_{tot}/(2^{N_Q} E_{max})\). This normalized average error can be used to compare the performance of algorithms with varying numbers of shots (samples).
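The normalization described above can be sketched as follows for a finite set of shots. This is only an illustration; the function name, the `counts` dictionary, and the `energy` callback are assumptions rather than part of the original workflow.

```python
def normalized_average_error(counts, energy, e_max):
    """Average objective error per shot, divided by the maximum error E_max.

    counts : dict mapping measured bit strings to numbers of shots
    energy : function returning the (non-negative) objective error of a string
    e_max  : maximum value of the cost function for this problem
    """
    shots = sum(counts.values())
    total = sum(energy(bits) * c for bits, c in counts.items())
    return total / (shots * e_max)
```

For the exhaustive random baseline, the same expression reduces to \(E_{tot}/(2^{N_Q}E_{max})\); with the values reported later for the order-12 Williamson case (\(E_{tot}=1024\), \(N_Q=8\), \(E_{max}=18\)) this gives \(1024/(256 \times 18) = 2/9 \approx 0.22\).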

Methods

Finding H-matrices as a binary optimization problem

A direct method to find an M-order H-matrix, i.e. a binary orthogonal matrix of size \(M\times M\), is to check the orthogonality condition of all possible binary matrices \(B=[b_{m,n}]\), where \(b_{m,n} \in \{-1, +1\}\). The orthogonality test can be formulated as a cost function \(C_D(B)\), which is the sum of the squared off-diagonal elements of an indicator matrix \(D=[d_{m,n}]=B^TB\) (the diagonal terms vanish because \(d_{m,m}=M\) for any \(\pm 1\) matrix) and can be expressed as

$$\begin{aligned} C_D(B) = C(b_{m,n}) = \sum _{m=0}^{M-1} \sum _{n=0}^{M-1} \left( d_{m,n}- MI_{m,n} \right) ^2 \end{aligned}$$
(9)

where I is an \(M\times M\) identity matrix. When \(C_D(B)=0\), the matrix B is orthogonal and is therefore an H-matrix; otherwise it is not. This method is not efficient because the number of binary matrices to check is \(2^{M\times M}\).
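A minimal sketch of the cost function in Eq. (9) is given below, using plain Python lists. The helper `sylvester`, which builds a matrix of order \(2^n\) by the Sylvester doubling construction, is included only to provide a known H-matrix for checking; both function names are hypothetical.

```python
def direct_cost(B):
    """C_D(B) of Eq. (9): squared deviation of B^T B from M * I."""
    M = len(B)
    cost = 0
    for m in range(M):
        for n in range(M):
            d = sum(B[k][m] * B[k][n] for k in range(M))  # (B^T B)_{m,n}
            target = M if m == n else 0
            cost += (d - target) ** 2
    return cost

def sylvester(n):
    """H-matrix of order 2^n by repeated doubling [[H, H], [H, -H]]."""
    H = [[1]]
    for _ in range(n):
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H
```

Evaluating `direct_cost` once is cheap (polynomial in M); the difficulty of the direct method lies in the \(2^{M\times M}\) candidates that would have to be tested.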

A more efficient way of finding the H-matrix is to employ the Williamson/Baumert-Hall12 or the Turyn methods16,21,25. We have also developed optimization-based methods that employ quantum computers to find the H-matrix: the QA (Quantum Annealing) direct method, which represents each entry as a binary variable23, the QA Williamson/Baumert-Hall method, and the QA Turyn method24. Whereas the number of variables in the QA direct method grows with the order M as \(O(M\times M)\), that of the QA Williamson/Baumert-Hall and the QA Turyn methods grows only as O(M), which is more efficient in terms of the number of variables. However, when these methods are implemented on a present-day quantum annealer, such as the D-Wave, not only must each variable be represented by a qubit, but additional ancillary qubits are also required to represent the 3-body and 4-body terms. Accordingly, the required number of qubits grows with the order of the matrix as \(O(M\times M)\). Since qubits are among the most valuable resources in quantum computing, a more efficient method that can reduce the number of qubits is highly desirable.

In the Williamson-based method24, we seek a binary \(\{-1, +1\}\) vector

$$\begin{aligned} \vec {s}=\left[ s_0, s_1, \cdots , s_n, \, \cdots , s_{N-1} \right] ^T \end{aligned}$$
(10)

where \(s_n \in \{-1,+1\}\), that minimizes a Williamson cost function \(C_W(\vec {s})\) given by

$$\begin{aligned} C_W(\vec {s}) = \sum _{i=0}^{K-1} \sum _{j=0}^{K-1} \left( v_{i,j}(\vec {s})- 4K\delta _{i,j}\right) ^2 \end{aligned}$$
(11)

In this equation, \(v_{i,j}(\vec {s})\) are the elements of a matrix V that is constructed from four sub-matrices A, B, C, and D of dimension \(K \times K\); that is,

$$\begin{aligned} V = A^TA + B^TB + C^TC +D^TD \end{aligned}$$
(12)

where \(V=V(\vec {s}), A=A(\vec {s}), B=B(\vec {s}), C=C(\vec {s}), D=D(\vec {s})\) are sub-matrices whose elements include some particular elements of the vector \(\vec {s}\). When \(C_W(\vec {s})=0\), then the matrix H of size \(4K \times 4K\) given by the following block matrix

$$\begin{aligned} H = \begin{pmatrix} A & B & C & D \\ -B & A & -D & C \\ -C & D & A & -B \\ -D & -C & B & A \end{pmatrix} \end{aligned}$$
(13)

is Hadamard12. A larger Baumert-Hall matrix can also be constructed from the same \(\{A, B, C, D\}\) sub-matrices12. We will call the binary representation of a vector \(\vec {s}\) given in Eq. (10) that minimizes Eq. (11) a Williamson/Baumert-Hall string, or WBH-string.
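The construction of Eqs. (11) to (13) can be checked on a small case with the sketch below. The text does not spell out how the sub-matrices depend on \(\vec {s}\); the example assumes symmetric circulant sub-matrices defined by their first rows, which is the usual Williamson setting, and takes the diagonal target in Eq. (11) to be 4K, the order of H. All function names are hypothetical.

```python
def circulant(first_row):
    """Circulant matrix whose i-th row is the first row shifted right by i."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]

def neg(X):
    return [[-x for x in row] for row in X]

def williamson_cost(A, B, C, D):
    """C_W of Eq. (11) with V = A^T A + B^T B + C^T C + D^T D (Eq. (12))."""
    K = len(A)
    cost = 0
    for i in range(K):
        for j in range(K):
            v = sum(X[k][i] * X[k][j] for X in (A, B, C, D) for k in range(K))
            cost += (v - (4 * K if i == j else 0)) ** 2
    return cost

def williamson_array(A, B, C, D):
    """Block matrix of Eq. (13), assembled row by row."""
    blocks = [[A, B, C, D],
              [neg(B), A, neg(D), C],
              [neg(C), D, A, neg(B)],
              [neg(D), neg(C), B, A]]
    K = len(A)
    return [sum((blk[i] for blk in block_row), [])
            for block_row in blocks for i in range(K)]

def is_hadamard(H):
    """Check H H^T = M * I for a +1/-1 matrix H of order M."""
    M = len(H)
    return all(sum(H[i][k] * H[j][k] for k in range(M)) == (M if i == j else 0)
               for i in range(M) for j in range(M))

# Example with K = 3: A is all ones, B = C = D have first row (1, -1, -1).
A = circulant([1, 1, 1])
B = C = D = circulant([1, -1, -1])
H12 = williamson_array(A, B, C, D)
```

With this choice `williamson_cost(A, B, C, D)` is zero and `H12` is an H-matrix of order 12, whereas four all-ones blocks give a non-zero cost.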

In the Turyn-based method, we also seek a vector \(\vec {s}=\left[ s_0, s_1, \cdots , s_{N-1}\right] ^T\), as in Eq. (10), that minimizes a Turyn cost function \(C_T(\vec {s})\) given by

$$\begin{aligned} C_T(\vec {s}) = \sum _{r \ge 1} \left( N_{X(\vec {s})}(r) + N_{Y(\vec {s})}(r) + 2N_{Z(\vec {s})}(r) + 2N_{W(\vec {s})}(r) \right) ^2 \end{aligned}$$
(14)

where \(N_{X(\vec {s})}(r), N_{Y(\vec {s})}(r), N_{Z(\vec {s})}(r), N_{W(\vec {s})}(r)\) are the non-periodic auto-correlation functions of the sequences \(X(\vec {s}), Y(\vec {s}), Z(\vec {s}), W(\vec {s})\), respectively, calculated at lag r. Note that for a sequence \(X=\left[ x_0, x_1, \cdots , x_{N-1}\right] ^T\), the non-periodic auto-correlation function is given by21,25,

$$\begin{aligned} N_X(r) = {\left\{ \begin{array}{ll} \sum _{n=0}^{N-1-r} x_n x_{n+r} & , 0 \le r \le N-1\\ 0 & , r \ge N\\ \end{array}\right. } \end{aligned}$$
(15)

As in the previous case, we will call the binary representation of a vector \(\vec {s}\) that makes \(C_T(\vec {s})=0\) a Turyn string or T-string. In this paper, since the number of variables can be very large, the cost functions \(C_W(\vec {s})\) and \(C_T(\vec {s})\) and their corresponding Hamiltonian expressions are computed symbolically. These functions are then transformed into the problem Hamiltonian \(\hat{H}_C\) in Eq. (3).
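The non-periodic auto-correlation function of Eq. (15) and the Turyn cost of Eq. (14) can be sketched numerically as below. The sketch takes the four sequences directly as lists instead of deriving them from \(\vec {s}\), and it assumes that the sum in Eq. (14) runs over all lags \(r \ge 1\), which is the usual Turyn-type condition. The function names are hypothetical.

```python
def npaf(x, r):
    """Non-periodic auto-correlation N_X(r) of Eq. (15)."""
    n = len(x)
    if r >= n:
        return 0
    return sum(x[i] * x[i + r] for i in range(n - r))

def turyn_cost(X, Y, Z, W):
    """Sum over lags r >= 1 of (N_X + N_Y + 2 N_Z + 2 N_W)^2 (assumed range)."""
    max_lag = max(len(X), len(Y), len(Z), len(W))
    # Lags at or beyond the longest sequence contribute zero and are skipped.
    return sum((npaf(X, r) + npaf(Y, r) + 2 * npaf(Z, r) + 2 * npaf(W, r)) ** 2
               for r in range(1, max_lag))
```

As a small check, the sequences X = [1, 1], Y = [1, 1], Z = [1, -1], W = [1] give a cost of zero, since the only non-trivial lag contributes \(1 + 1 - 2 + 0 = 0\).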

Construction of elementary circuits

Consider a general problem Hamiltonian given by Eq. (7). Because all of its terms are products of \(\hat{\sigma }^z\) operators and therefore commute, the problem-Hamiltonian factor of Eq. (4) for a single layer can be expressed as

$$\begin{aligned} \hat{U}(\gamma )=\prod _{j}e^{-i\gamma c_j\hat{\sigma }^z_j} \prod _{j,k}e^{-i\gamma c_{jk}\hat{\sigma }^z_j\hat{\sigma }^z_k} \prod _{j,k,m}e^{-i\gamma c_{jkm}\hat{\sigma }^z_j\hat{\sigma }^z_k\hat{\sigma }^z_m} \prod _{j,k,m,n}e^{-i\gamma c_{jkmn}\hat{\sigma }^z_j\hat{\sigma }^z_k\hat{\sigma }^z_m\hat{\sigma }^z_n} \end{aligned}$$
(16)

We can represent the exponentiation of \(\hat{\sigma }^z\) as a rotation in z-direction, \(\hat{R}_Z(\cdots )\), as follows

$$\begin{aligned} \hat{U}(\gamma ) = e^{-i\gamma \hat{\sigma }^z} = e^{-i\gamma \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} } = \begin{pmatrix} e^{-i\gamma } & 0 \\ 0 & e^{i\gamma } \end{pmatrix} = \hat{R}_Z(2\gamma ) \end{aligned}$$

Substituting \(\gamma \leftarrow c_j \gamma\) yields

$$\begin{aligned} \hat{U}(c_j\gamma )= e^{-ic_j\gamma \hat{\sigma }^z} = \hat{R}_Z(2c_j\gamma ) \end{aligned}$$

The first term of the product in Eq. (16), considering there are N qubits to be rotated, can be expanded into the following

$$\begin{aligned} \prod _{j} e^{-ic_j\gamma \hat{\sigma }_j^z} = \begin{pmatrix} e^{-ic_0\gamma } & 0 \\ 0 & e^{ic_0\gamma } \end{pmatrix} \otimes \begin{pmatrix} e^{-ic_1\gamma } & 0 \\ 0 & e^{ic_1\gamma } \end{pmatrix} \otimes \cdots \otimes \begin{pmatrix} e^{-ic_{N-1}\gamma } & 0 \\ 0 & e^{ic_{N-1}\gamma } \end{pmatrix} \end{aligned}$$

where \(\otimes\) denotes the tensor product. By denoting the Z-rotation on qubit n as \(\hat{R}_{Z_n}(\cdots )\), we can write

$$\begin{aligned} \prod _{j} e^{-ic_j\gamma \hat{\sigma }_j^z} = \hat{R}_{Z_0}(2c_0\gamma ) \otimes \hat{R}_{Z_1}(2c_1\gamma ) \otimes \cdots \otimes \hat{R}_{Z_{N-1}}(2c_{N-1}\gamma ) \end{aligned}$$

The products in Eq. (16) can be interpreted as a series of cascaded operations. Then, a 1-body interaction term in the Hamiltonian, expressed by \(\hat{H}_{C_1}= \sum _j c_j \hat{\sigma }^z_j\), can be implemented as the quantum circuit in Fig. 1(a), which is a Z-rotation by the angle \(2c_j\gamma\).

Higher-order terms, namely the 2-, 3-, and 4-body interaction terms, can be treated similarly, but with different elementary circuits in the cascaded block.

The quantum circuit implementations of the k-body interactions displayed in Fig. 1(b), (c), and (d) are adopted from Nielsen-Chuang26, Seeley27, and Setia28. A 2-body interaction term in the Hamiltonian, \(\hat{H}_{C_2}= \sum _{j<k} c_{jk}\hat{\sigma }^z_j\hat{\sigma }^z_k\), has a corresponding circuit given by Fig. 1(b), which is a combination of CNOT and Z-rotation gates. The 3-body interaction term in the Hamiltonian, expressed by \(\hat{H}_{C_3}= \sum _{j<k<m} c_{jkm} \hat{\sigma }^z_j\hat{\sigma }^z_k\hat{\sigma }^z_m\), has a corresponding circuit given by Fig. 1(c), which is a combination of CNOT and Z-rotation gates acting on 3 qubits. Finally, a 4-body interaction term in the Hamiltonian, \(\hat{H}_{C_4}= \sum _{j<k<m<n} c_{jkmn} \hat{\sigma }^z_j\hat{\sigma }^z_k\hat{\sigma }^z_m\hat{\sigma }^z_n\), has a corresponding circuit given by Fig. 1(d), which is a combination of CNOT and Z-rotation gates acting on 4 qubits.
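The equivalence between a k-body term and a CNOT/Z-rotation circuit can be checked with the following sketch. It assumes the standard CNOT-ladder layout (CNOTs accumulate the parity on the last qubit, \(\hat{R}_Z(2c\gamma )\) acts there, and the CNOTs are then undone), which may differ in detail from the circuits drawn in Fig. 1. Since every gate involved is either a permutation of basis states or diagonal, tracking each basis state and its phase is sufficient. The function names are hypothetical.

```python
import cmath
from itertools import product

def ladder_phase(bits, qubits, theta):
    """Push one basis state through CNOT ladder, RZ(theta), inverse ladder.

    Returns the final bit tuple and the accumulated phase factor.
    """
    b = list(bits)
    pairs = list(zip(qubits[:-1], qubits[1:]))
    for ctrl, tgt in pairs:                 # accumulate parity on last qubit
        b[tgt] ^= b[ctrl]
    # RZ(theta) = diag(exp(-i theta/2), exp(+i theta/2))
    sign = -1 if b[qubits[-1]] == 0 else 1
    phase = cmath.exp(sign * 0.5j * theta)
    for ctrl, tgt in reversed(pairs):       # uncompute the parity
        b[tgt] ^= b[ctrl]
    return tuple(b), phase

def exact_phase(bits, qubits, c, gamma):
    """Eigenvalue of exp(-i gamma c Z...Z) on a basis state (bit 0 -> spin +1)."""
    s = 1
    for q in qubits:
        s *= 1 - 2 * bits[q]
    return cmath.exp(-1j * gamma * c * s)

def ladder_matches(qubits, c, gamma, n):
    """True if the ladder circuit reproduces the k-body phase on all states."""
    for bits in product((0, 1), repeat=n):
        out, phase = ladder_phase(bits, qubits, 2 * c * gamma)
        if out != bits or abs(phase - exact_phase(bits, qubits, c, gamma)) > 1e-12:
            return False
    return True
```

For a single qubit the ladder is empty and the check reduces to the identity \(e^{-ic\gamma \hat{\sigma }^z} = \hat{R}_Z(2c\gamma )\) derived above.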

Results and discussions

Evaluation of elementary quantum circuits

The prototypical problem that we use to evaluate the individual elementary circuits is finding the unknown element(s) of a 2-order H-matrix. For the cases involving 1, 2, 3, and 4 unknown elements, the representative problems of finding the unknown elements \(s_1, s_2, s_3, s_4\) are given by the following \(2\times 2\) matrices

$$\begin{aligned} \begin{pmatrix} 1 & 1 \\ 1 & s_1 \end{pmatrix}, \begin{pmatrix} 1 & s_1 \\ 1 & s_2 \end{pmatrix}, \begin{pmatrix} 1 & s_1 \\ s_2 & s_3 \end{pmatrix}, \begin{pmatrix} s_1 & s_3 \\ s_2 & s_4 \end{pmatrix} \end{aligned}$$

Then, for each case, we formulate the corresponding energy function E(s) and Hamiltonian \(\hat{H}(\hat{\sigma })\), whose formulation has been described in our previous papers23,24. As an example, for the last matrix, with 4 unknown elements, the Hamiltonian is

$$\begin{aligned} \hat{H}\left( \hat{\sigma }\right) =2\left( 1+\hat{\sigma }^z_{1}\hat{\sigma }^z_{2} \hat{\sigma }^z_{3}\hat{\sigma }^z_{4} \right) \end{aligned}$$

This Hamiltonian includes a 4-body interaction term, with its corresponding quantum circuit shown in Fig. 1(d). After incorporating the mixing Hamiltonian, the quantum circuit for this QAOA problem is shown in Fig. 3(a). The solution distribution obtained from execution on a quantum simulator is displayed in Fig. 3(b).

Fig. 3

Evaluation of the elementary circuit for the QAOA-based H-matrix search: (a) A single-layer QAOA quantum circuit implemented for a problem involving a 4-body interaction term, (b) The distribution of solutions after running the circuit on a quantum computer simulator. The resulting histogram demonstrates that all solutions are found with an almost uniform distribution.

This case represents a simple, non-cascaded QAOA problem that can be effectively solved using a single-layer quantum circuit. The solutions with dominant probabilities, as shown in the histogram–0001, 0010, 0100, 0111, 1000, 1011, 1101, 1110–are all correct. The total probability of these solutions results in a performance value of \(xRAR = 2.0\), which is the maximum possible value of xRAR for this problem.

However, when elementary circuits are combined into a more complex circuit, the problem becomes increasingly difficult to solve. For a general problem involving mixed k-body cascaded circuits, we consider two cases for the Hamiltonians: uniform coefficients and non-uniform coefficients.

First, we consider a Hamiltonian with uniform coefficients for all terms, expressed as follows:

$$\begin{aligned} \hat{H}\left( \hat{\sigma }\right) = \hat{\sigma }^z_{0} + \hat{\sigma }^z_{1} + \hat{\sigma }^z_{2} + \hat{\sigma }^z_{0}\hat{\sigma }^z_{1} + \hat{\sigma }^z_{1}\hat{\sigma }^z_{2} + \hat{\sigma }^z_{1}\hat{\sigma }^z_{2} \hat{\sigma }^z_{3} + \hat{\sigma }^z_{0}\hat{\sigma }^z_{1}\hat{\sigma }^z_{2} \hat{\sigma }^z_{3}-1 \end{aligned}$$
(17)

There are 4 correct solutions for this problem, which are \(\{0101, 0110, 1000, 1001\}\), so that \(P_R=\frac{4}{16}=\frac{1}{4}\). We run the code for this problem with various random initial values for the angle parameters and a sufficient number of iterations. Simulation results are shown in Fig. 4. The results show that a single-layer QAOA circuit is insufficient to produce good outcomes, as indicated by the histogram in (a), where incorrect solutions still have high probabilities. Accordingly, we increase the number of layers successively to \(p=4, 8, 16,\) and 32. The histogram in (b) shows the solution distribution for \(p=32\), where some of the correct solutions exhibit dominant probability values, while the incorrect ones are suppressed. The curve in (c) shows that the error stabilizes and converges after approximately 500 iterations of updating the angle parameters. The table in (d) shows that increasing the number of QAOA layers p consistently enhances performance, as reflected by a reduction in objective errors and an increase in xRAR values.

Fig. 4

Experimental results for a mixed k-body Hamiltonian with uniform coefficients: (a) distribution of solutions for \(p=1\), showing correct and incorrect solutions with comparable probability values, (b) distribution of solutions for \(p=32\), indicating that increasing the number of layers improves the solution; incorrect solutions are suppressed, (c) curve showing the error decreasing with iteration and converging after more than 500 iterations, (d) performance table showing that increasing the number of layers improves the performance, i.e., it reduces the objective error and increases the xRAR.

For the second case, we consider a more general Hamiltonian in which the mixed k-body interaction terms are weighted with non-uniform coefficients, as follows

$$\begin{aligned} \hat{H}\left( \hat{\sigma }\right) = \hat{\sigma }^z_{0} +2\hat{\sigma }^z_{1} + 3\hat{\sigma }^z_{2} + 5\hat{\sigma }^z_{0}\hat{\sigma }^z_{1} + 7\hat{\sigma }^z_{1}\hat{\sigma }^z_{2} + 11\hat{\sigma }^z_{1}\hat{\sigma }^z_{2}\hat{\sigma }^z_{3} + 13\hat{\sigma }^z_{0}\hat{\sigma }^z_{1}\hat{\sigma }^z_{2}\hat{\sigma }^z_{3} \end{aligned}$$
(18)

This problem has only one correct solution, which is 1100, so that the probability of the random algorithm is \(P_R= 1/16\). Some of the simulation results are displayed in Fig. 5. In parts (a) and (b), we observe a re-concentration of the distribution toward the solution as the number of layers p increases. In (c), we observe that the error decreases and approaches convergence after approximately 800 iterations. The performance table in (d) indicates that increasing p generally enhances performance by increasing xRAR and decreasing objective errors; however, at certain values of p, the performance degrades.
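The sets of correct solutions quoted for Eq. (17) and Eq. (18) can be reproduced by exhaustive enumeration, as in the sketch below. It treats a bit string as correct when its energy is exactly zero, following the validity criterion stated in the Performance metrics section, and it assumes the convention that bit 0 corresponds to spin +1 with qubit 0 as the leftmost bit, chosen here because it reproduces the strings listed in the text. The dictionary layout and function names are hypothetical.

```python
from itertools import product
from math import prod

# Each key is a tuple of qubit indices, each value its coefficient;
# the empty tuple () holds the constant term.
EQ17 = {(0,): 1, (1,): 1, (2,): 1, (0, 1): 1, (1, 2): 1,
        (1, 2, 3): 1, (0, 1, 2, 3): 1, (): -1}
EQ18 = {(0,): 1, (1,): 2, (2,): 3, (0, 1): 5, (1, 2): 7,
        (1, 2, 3): 11, (0, 1, 2, 3): 13}

def energy(bits, terms):
    """Energy of a bit tuple under the assumed mapping bit 0 -> +1, bit 1 -> -1."""
    spins = [1 - 2 * b for b in bits]
    return sum(c * prod(spins[q] for q in qubits) for qubits, c in terms.items())

def zero_energy_strings(terms, n):
    """All n-bit strings with energy exactly zero, in lexicographic order."""
    return ["".join(map(str, bits))
            for bits in product((0, 1), repeat=n)
            if energy(bits, terms) == 0]

def random_success_probability(terms, n):
    """P_R = S_R / 2^n for the uniform random algorithm."""
    return len(zero_energy_strings(terms, n)) / 2 ** n
```

Under these assumptions, `zero_energy_strings(EQ17, 4)` returns the four strings 0101, 0110, 1000, and 1001, and `zero_energy_strings(EQ18, 4)` returns only 1100, matching \(P_R=1/4\) and \(P_R=1/16\).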

Fig. 5

Experimental results for a problem Hamiltonian with mixed k-body interactions and non-uniform coefficients: (a) The distribution of solutions for a single layer shows that while the correct solution has a high probability, other incorrect solutions also exhibit significantly high probabilities. (b) After increasing the number of layers to 32, the distribution of solutions improves, with the correct solution now holding the highest probability. (c) The error curve indicates a decreasing error value as the iteration number increases. (d) The performance table by number of layers demonstrates that increasing the layer count generally enhances xRAR performance and reduces errors; however, there are instances where performance may decrease.

In general, the performance of each individual circuit met our expectations, with the solution distributions confirming the validity of the circuits. Explanations for other cases, ranging from the simplest to the mixed cascade, are provided in the Supplementary Information. The circuit constructions and simulations for the elementary 1-body case (Fig. S.1), the 2-body case (Fig. S.2), and the 3-body case (Fig. S.3) demonstrate that such elementary problem instances yield very good simulation results, with all solutions successfully found using only a single layer (\(p=1\)). On the other hand, the mixed 2-body problem with uniform coefficients, whose circuit construction and simulation are presented in Fig. S.4, and the 2-body problem with non-uniform coefficients in Fig. S.5 are relatively more difficult to solve. The number of layers must be increased to find the correct solutions with more dominant probabilities, as shown in Fig. S.6.

Finding Hadamard matrices

The lowest-order case for the Williamson method is order 12, which corresponds to a 36-order Baumert-Hall H-matrix and requires \(N_Q=8\) qubits to implement. The energy function and the Hamiltonian of this problem were obtained in the same way as in our previous paper24.

Fig. 6

Quantum Circuits of QAOA-Based Hadamard Search: (a) A 1-layer quantum circuit for 12-order QAOA-Williamson/Baumert-Hall method, and (b) A 1-layer quantum circuit of 44-order QAOA-Turyn method.

An exhaustive search over all \(2^{8}\) possible bit strings for those that yield the minimum energy, and are therefore correct, found 64 WBH-sequences. Consequently, the probability of finding a solution among uniformly distributed 8-bit strings is \(P_R=\frac{1}{4}\), and the maximum xRAR performance is \(\frac{1}{P_R} = 4\). The search also showed that the maximum value of the cost function is 18 and that the total error is 1,024. The Hamiltonian of this problem is given by

$$\begin{aligned} \begin{aligned} \hat{H}_C\left( \hat{\sigma }\right) =2 \hat{\sigma }^z_0\hat{\sigma }^z_1 + 2\hat{\sigma }^z_2\hat{\sigma }^z_3 + 2\hat{\sigma }^z_4\hat{\sigma }^z_5 + 2\hat{\sigma }^z_6\hat{\sigma }^z_7 + \hat{\sigma }^z_0\hat{\sigma }^z_1\hat{\sigma }^z_2\hat{\sigma }^z_3 + \hat{\sigma }^z_0\hat{\sigma }^z_1\hat{\sigma }^z_4\hat{\sigma }^z_5 + \\ \hat{\sigma }^z_0\hat{\sigma }^z_1\hat{\sigma }^z_6\hat{\sigma }^z_7 + \hat{\sigma }^z_2\hat{\sigma }^z_3\hat{\sigma }^z_4\hat{\sigma }^z_5 + \hat{\sigma }^z_2\hat{\sigma }^z_3\hat{\sigma }^z_6\hat{\sigma }^z_7 + \hat{\sigma }^z_4\hat{\sigma }^z_5\hat{\sigma }^z_6\hat{\sigma }^z_7 + 4 \end{aligned} \end{aligned}$$
(19)

Note that the constant term can be dropped in the minimization without affecting the result. We perform experiments for this case on both the simulator and a real quantum computer.
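The counts quoted for Eq. (19) can be reproduced with a short brute-force enumeration. The following Python sketch is illustrative only and is not the code used in the experiments. It assumes that the second term of Eq. (19) is \(2\hat{\sigma }^z_2\hat{\sigma }^z_3\) (the problem has only 8 qubits, indexed 0 to 7) and that bit 0 maps to spin +1 and bit 1 to spin -1; the names `TERMS` and `energy` are hypothetical.

```python
from itertools import product
from math import prod

# Eq. (19) as {qubit indices: coefficient}; the empty tuple holds the constant.
# Assumption: the second two-body term acts on qubits 2 and 3.
TERMS = {
    (0, 1): 2, (2, 3): 2, (4, 5): 2, (6, 7): 2,
    (0, 1, 2, 3): 1, (0, 1, 4, 5): 1, (0, 1, 6, 7): 1,
    (2, 3, 4, 5): 1, (2, 3, 6, 7): 1, (4, 5, 6, 7): 1,
    (): 4,
}

def energy(bits):
    """Classical energy of one bit string (assumed map: bit b -> spin 1 - 2b)."""
    spins = [1 - 2 * b for b in bits]
    return sum(c * prod(spins[q] for q in qs) for qs, c in TERMS.items())

# Enumerate all 2^8 bit strings and collect the statistics quoted in the text.
energies = [energy(bits) for bits in product((0, 1), repeat=8)]
e_min = min(energies)
n_solutions = energies.count(e_min)
p_random = n_solutions / len(energies)

print("minimum energy:", e_min)
print("number of minimizers:", n_solutions)
print("P_R:", p_random, "-> maximum xRAR:", 1 / p_random)
print("maximum energy:", max(energies), "total:", sum(energies))
```

Under these assumptions, the enumeration gives a minimum energy of 0 attained by 64 of the 256 bit strings, a maximum of 18, and a total of 1,024, in agreement with the values quoted above.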

Fig. 7

Lowest-order PEL (Potential Energy Landscape) of (a) the 12-order QAOA-based Williamson method and (b) the 44-order QAOA-based Turyn method. The PEL of the Williamson method is more regular than that of the Turyn method, indicating that finding the minimum is more difficult in the Turyn-based method than in the Williamson-based one.

First, we run the QAOA-HSEARCH algorithm on a quantum computer simulator (IBM-Qiskit) with various numbers of layers and random initialization of the \(\{\beta _0, \gamma _0\}\) parameters, and apply the COBYLA (Constrained Optimization BY Linear Approximation) optimization method, which is available in a Python library29. Figure 6(a) depicts a one-layer quantum circuit associated with the QAOA algorithm, with the problem Hamiltonian given in Eq. (19).

Fig. 8

Performance of the 12-order Williamson/36-order Baumert-Hall QAOA-based method. (a) Simulation results on a quantum simulator: the solid blue line with circles is the xRAR curve; the plain solid blue line is the upper bound of xRAR, which equals 4; the dotted red line with circles is the normalized objective error; and the plain dotted red line is the lower bound of the error, which equals zero. (b) Hardware performance: the dotted blue line with \(\times\) symbols in the lower part is the objective error; the dotted blue line in the upper part is the error threshold of the random algorithm; the solid red line with circles is the mean xRAR; the plain solid red line is the xRAR of the random algorithm; and the red dash-dotted line with \(\times\) symbols is the maximum xRAR at each layer number.

The energy distribution as a function of the \(\gamma\) and \(\beta\) parameters, displayed as a PEL (Potential Energy Landscape) in Fig. 7(a), reveals a periodic landscape with minima located in the first (upper right) and third (lower left) quadrants. The PEL for the 44th-order Turyn Hadamard search is shown in Fig. 7(b) and will be discussed later. Fig. 8(a) shows the performance of the algorithm as the number of layers p is increased stepwise, i.e., \(p= 1, 4, 8, 16, 32\). Based on the location of the minima indicated in the PEL, the initial parameters were selected within the interval \((-0.5, 0.5)\). We repeated the experiment 30 times and plotted the mean values of xRAR and of the (optimization objective) error in the figure. The xRAR increased consistently with p, approaching its theoretical maximum of \(xRAR=4\) at \(p=32\). At the same time, increasing the number of layers reduced the error. The resulting 12-order Williamson matrix and its corresponding 36-order Baumert-Hall matrix are displayed in Fig. 9(a) and Fig. 9(b), respectively.

We also implemented the algorithm for finding the 12-order Williamson matrix on quantum hardware. We employed the IBM-Brisbane machine, which is powered by a 127-qubit Eagle r.3 quantum processor (version 1.1.6). The mean coherence times of the processor’s qubits are \(T_1 \approx 227 \mu s\) and \(T_2 \approx 130 \mu s\), with a median ECR error \(\approx 7 \times 10^{-3}\) and a median SX error \(\approx 2 \times 10^{-4}\). We repeated the run 10 times, with the number of shots on the hardware set to 1,024. The results are displayed in Fig. 8(b), which shows the quantum hardware (QPU) performance of the QAOA 12-order Williamson method with 1, 2, 3, and 4 layers. Although the mean error of QAOA on hardware is consistently lower than the mean error of the random algorithm, the mean xRAR is only slightly better than the random-algorithm baseline for 1 and 3 layers, and worse for 2 and 4 layers. The slight increase at layer 3 indicates improved performance due to the increased number of layers; at higher layer counts, however, noise becomes more dominant and degrades performance. (This phenomenon is also observed in simulations with noisy qubits, as shown in Fig. S.7 in Section 4 of the S.I.) Since the initialization of the angles can influence the final xRAR, we also display the maximum xRAR over the 10 repeated runs at each layer number. The maximum xRAR curve consistently lies above the average of the random algorithm. In this regard, if the goal is to find a single solution, QAOA is more promising than the random algorithm.

Fig. 9

The 12-order Williamson and 36-order Baumert-Hall Hadamard matrices found by the proposed methods. Executions of the algorithm on the quantum simulator and on quantum hardware found identical H-matrices. In the figure, white boxes represent “+1” elements and black boxes represent “-1” elements of the matrices.

In the Turyn-based method, for a particular order of H-matrix that we want to construct, we have to find a corresponding TT (Turyn Type)-sequence16,21,24,25. In terms of the energy (cost) function (see Eq. (14) in the Method section), we are looking for a T-string \(\vec {s}\). For even positive integers \(N=4, 6, 8,...\), the order of the related Turyn H-matrix is \(M=4(3N-1)\) and the number of variables, i.e., required qubits, is \(Q=4N-11\). We performed experiments for \(N=4, 6, 8\), which correspond to Hadamard matrices of order \(M=44, 68, 92\) and require \(Q=5, 13, 21\) qubits, respectively.
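As a quick check of these relations, the sketch below tabulates the matrix order and qubit count for a given N, together with the size of the resulting search space. The function name `turyn_sizes` is hypothetical; the two formulas are the ones stated above.

```python
def turyn_sizes(n):
    """Return (Hadamard order M, qubit count Q) for an even integer N >= 4."""
    if n < 4 or n % 2 != 0:
        raise ValueError("N must be an even integer >= 4")
    order = 4 * (3 * n - 1)   # M = 4(3N - 1)
    qubits = 4 * n - 11       # Q = 4N - 11
    return order, qubits

for n in (4, 6, 8):
    order, qubits = turyn_sizes(n)
    # The search space is the set of all bit strings on Q qubits.
    print(f"N={n}: M={order}, Q={qubits}, search space = {2 ** qubits} strings")
```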

In the first experiment, our goal is to find a Turyn H-matrix of order 44, which needs 5 qubits to implement. The problem Hamiltonian is given by the following expression,

$$\begin{aligned} \begin{aligned} \hat{H}_C\left( \hat{\sigma }\right) =\hat{\sigma }^z_0\hat{\sigma }^z_1\hat{\sigma }^z_2 + \hat{\sigma }^z_0\hat{\sigma }^z_3\hat{\sigma }^z_4 + \hat{\sigma }^z_0\hat{\sigma }^z_3 - \hat{\sigma }^z_0\hat{\sigma }^z_4 + \hat{\sigma }^z_1\hat{\sigma }^z_2\hat{\sigma }^z_3\hat{\sigma }^z_4 \\ + \hat{\sigma }^z_1\hat{\sigma }^z_2\hat{\sigma }^z_3 + 2\hat{\sigma }^z_1\hat{\sigma }^z_2 + \hat{\sigma }^z_1\hat{\sigma }^z_3\hat{\sigma }^z_4 + \hat{\sigma }^z_1\hat{\sigma }^z_3 + \hat{\sigma }^z_1\hat{\sigma }^z_4 + \hat{\sigma }^z_1 + \hat{\sigma }^z_2\hat{\sigma }^z_3\hat{\sigma }^z_4 \\ + \hat{\sigma }^z_2\hat{\sigma }^z_3 + \hat{\sigma }^z_2\hat{\sigma }^z_4 + \hat{\sigma }^z_2 + \hat{\sigma }^z_4 + 5 \end{aligned} \end{aligned}$$
(20)

The corresponding quantum circuit can be automatically generated using a construction algorithm, with a single-layer version shown in Fig. 6 (b). The PEL shown in Fig. 7(b) exhibits a more irregular surface compared to the 12th-order Williamson PEL in Fig. 7(a). Therefore, we anticipate that finding a solution will be more challenging in the Turyn case than in the Williamson case.
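To give a feel for how such a landscape can be evaluated, the following sketch computes the single-layer expectation \(\langle \hat{H}_C \rangle\) of Eq. (20) on a coarse \((\gamma , \beta )\) grid with a plain-Python state-vector calculation. It is a minimal illustration, not the Qiskit workflow used in the experiments: it assumes the common convention of a cost layer \(e^{-i\gamma \hat{H}_C}\) followed by a mixer \(e^{-i\beta \hat{\sigma }^x}\) on every qubit, a bit-to-spin map of 0 to +1 and 1 to -1, and a grid scan in place of the COBYLA optimizer. All names (`TERMS`, `energy`, `expectation`, `landscape`) are hypothetical.

```python
import cmath
import math
from itertools import product

N = 5  # number of qubits in Eq. (20)

# Eq. (20) as {qubit indices: coefficient}; the empty tuple is the constant.
TERMS = {
    (0, 1, 2): 1, (0, 3, 4): 1, (0, 3): 1, (0, 4): -1, (1, 2, 3, 4): 1,
    (1, 2, 3): 1, (1, 2): 2, (1, 3, 4): 1, (1, 3): 1, (1, 4): 1, (1,): 1,
    (2, 3, 4): 1, (2, 3): 1, (2, 4): 1, (2,): 1, (4,): 1, (): 5,
}

def energy(bits):
    """Classical energy of one bit string (assumed map: bit b -> spin 1 - 2b)."""
    spins = [1 - 2 * b for b in bits]
    return sum(c * math.prod(spins[q] for q in qs) for qs, c in TERMS.items())

STATES = list(product((0, 1), repeat=N))      # bits[0] belongs to qubit 0
ENERGIES = [energy(bits) for bits in STATES]  # diagonal of H_C

def expectation(gamma, beta):
    """<H_C> after one QAOA layer applied to the uniform superposition."""
    dim = len(STATES)
    # Cost layer: diagonal phase exp(-i * gamma * H_C).
    amp = [cmath.exp(-1j * gamma * e) / math.sqrt(dim) for e in ENERGIES]
    # Mixer layer: exp(-i * beta * X) on each qubit. The gate is the same for
    # every qubit, so the loop simply visits every bit position of the index.
    c, s = math.cos(beta), -1j * math.sin(beta)
    for pos in range(N):
        mask = 1 << pos
        for i in range(dim):
            if not i & mask:
                a0, a1 = amp[i], amp[i | mask]
                amp[i], amp[i | mask] = c * a0 + s * a1, s * a0 + c * a1
    return sum(abs(a) ** 2 * e for a, e in zip(amp, ENERGIES))

# Coarse scan of the landscape over (gamma, beta) in [-0.5, 0.5].
grid = [k / 20 for k in range(-10, 11)]
landscape = {(g, b): expectation(g, b) for g in grid for b in grid}
best = min(landscape, key=landscape.get)
print("lowest <H_C> on the grid:", landscape[best], "at (gamma, beta) =", best)
```

Along the lines \(\gamma =0\) or \(\beta =0\) the measurement distribution stays uniform, so the expectation equals the mean energy over all bit strings, which is the constant term 5 of Eq. (20). Values below 5 elsewhere on the grid therefore indicate parameter settings that do better than uniform sampling on average.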

We then executed the QAOA on both a quantum simulator and quantum hardware, successfully identifying the 44th-order Turyn H-matrix. As illustrated in the QAOA flowchart in Fig. 2, the quantum computing process is typically embedded within the optimization loop, requiring dedicated access to the quantum device. However, since we used public access to a 5-qubit IBM quantum computer, such dedicated access was difficult to obtain. Therefore, we only ran the optimized quantum circuits obtained in simulation on the 5-qubit IBM Quito device.

Fig. 10

Results for 44-order Turyn method: (a) Histogram of the output when running the algorithm in IBM Quito. The inset shows qubit configuration on the quantum device. (b) A Turyn H-matrix of order-44 found by the proposed method running on IBM Quito.

The resulting 44-order H-matrix is identical for both the simulator and the hardware. The results are shown in Fig. 10: (a) the output histogram of the algorithm run on IBM-Quito, and (b) the resulting 44-order Turyn H-matrix. The histogram in Fig. 10(a) indicates that the valid solution, i.e., the bit string 11100, accounts for about \(4 \%\) of the measured outputs. Since a random algorithm would yield only \(3.1\%\), the experiments on the quantum processor suggest a better performance.
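To make the comparison explicit, assume that xRAR denotes the ratio of the observed success probability to that of uniform random sampling (consistent with the bound \(1/P_R\) quoted for the Williamson case) and that 11100 is the only valid string among the \(2^5\) possibilities. The quoted percentages then correspond roughly to

$$\begin{aligned} P_R = \frac{1}{2^{5}} \approx 3.1\%, \qquad \text {xRAR} \approx \frac{0.04}{1/32} \approx 1.3 \end{aligned}$$

This is only an estimate based on the rounded histogram value of about \(4\%\).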

Finding higher order H-matrices

The PEL of the QAOA for higher-order H-matrices is more irregular than that of lower-order matrices. This suggests that selecting the initial parameters is both challenging and crucial. In the final experiment, we present the results of finding higher-order Williamson/Baumert-Hall and Turyn matrices using a quantum computer simulator.

Fig. 11

Experiment Results of QAOA-Based H-matrix Search on a Simulator: (a) Williamson Matrix of Order 44 and (b) Corresponding Baumert-Hall Matrix of Order 132.

In principle, simulating an n-qubit quantum system on a classical computer, where each complex amplitude is represented using 16 bytes, requires approximately \(2^n \times 16\) bytes of memory. Consequently, a typical desktop PC can simulate at most around 30 qubits. Table 1 and Table 2 show the required number of qubits for finding Hadamard matrices of the corresponding order using the QAOA-WBH and QAOA-Turyn methods, respectively.
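The memory estimate can be made concrete with a few lines of Python. The sketch below only evaluates the \(2^n \times 16\) bytes formula for a dense state vector; the function name `statevector_bytes` is hypothetical, and real simulators need additional working memory on top of this figure.

```python
BYTES_PER_AMPLITUDE = 16  # one complex amplitude stored as two 8-byte floats

def statevector_bytes(n_qubits):
    """Memory for a dense n-qubit state vector: 2^n amplitudes of 16 bytes."""
    return (2 ** n_qubits) * BYTES_PER_AMPLITUDE

for n in (20, 24, 28, 30, 32):
    gib = statevector_bytes(n) / 2 ** 30
    print(f"{n} qubits: {gib:g} GiB")
```

At 30 qubits the state vector alone occupies 16 GiB, which is why roughly 30 qubits is a practical ceiling for a typical desktop machine.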

Table 1 Search space and performance of the WBH-based QAOA method.
Table 2 Search space and performance of the Turyn-based QAOA method.

In theory, this memory budget is sufficient to simulate the WBH-based QAOA algorithm for matrices up to order 52 (Williamson)/156 (Baumert-Hall), which requires 28 qubits. Likewise, the Turyn-based QAOA algorithm can be simulated up to order 116, which requires 29 qubits. However, practical quantum simulations involve optimization routines, which are computationally intensive. Due to this additional complexity, we are constrained to approximately 20 to 24 qubits in practice. This corresponds to order 44/132 for the WBH method and up to order 92 for the Turyn-QAOA method.

We used a single-layer QAOA and experimented with various initial parameters, selecting the best-performing configuration. Tables 1 and 2 also show the performance at several orders for both methods, where each quantum simulation is repeated 30 times. The maximum values indicate better xRAR performance than the random algorithm.

The WBH-based QAOA method of order 44/132 requires 24 qubits. With the number of samples set to 10,000 shots, the mean xRAR over 10 different parameter initializations is 1.30, with a maximum value of 3.58. One of the matrices obtained is displayed in Fig. 11, where (a) shows the 44-order Williamson matrix and (b) its corresponding 132-order Baumert-Hall matrix. A similar simulation was performed for the Turyn-based QAOA method. For order 92, we obtained a mean xRAR of 1.29 and a maximum of 4.31, indicating better performance than the random algorithm. The Turyn H-matrix of order 92 that was found is shown in Fig. 12.

Fig. 12

92-order Hadamard Matrix found by QAOA-Turyn Method: (a) Hadamard Matrix, (b) Orthogonality Indicator Matrix.

Discussion

We have demonstrated the feasibility of implementing H-matrix search algorithms on a circuit-based quantum computer, both in a simulator and on actual quantum hardware. Within the framework of quantum optimization, we utilized the QAOA to construct the Hamiltonian described in our previous work24. This Hamiltonian was then implemented in quantum circuits and executed on circuit-based quantum computers. Due to hardware limitations and the current state of noisy qubits, the implementation on quantum hardware was limited to the lowest-order cases with a single-layer QAOA scheme. However, the quantum simulator successfully handled higher-order cases.

Experimental results suggest that the difficulty in finding Hadamard matrices using QAOA algorithms arises from the non-smoothness of the energy landscape (PEL), which becomes increasingly pronounced in higher-order cases. While the Turyn-based method is more qubit-efficient than the Williamson and Baumert-Hall (WBH) method, its more irregular energy landscape makes finding Turyn’s solution more challenging.

Experiments with the lowest-order WBH case (as shown in Fig. 8) on the quantum simulator demonstrate that increasing the number of layers consistently enhances performance. This improvement is indicated by the xRAR metric, which asymptotically approaches the performance limit as the number of layers increases. However, implementing the algorithm on a real quantum device did not replicate this performance. With a single layer, the algorithm performed slightly better than a random algorithm on average, but this advantage diminished as the number of layers increased, except when only the best result from multiple runs was selected. Even in that case, the advantage of using more than one layer disappears. These results suggest that increasing the number of layers on a NISQ device provides only limited benefits. The simulation with noisy qubits shown in Fig. S.7 in the Supplementary Information supports this view. This is also consistent with the findings of previous QAOA researchers, including Boulebnane et al.4.

The experiments with the quantum simulator described earlier used COBYLA to find the optimal parameters. Other algorithms can certainly be employed for the same purpose. However, preliminary results using the Nelder-Mead algorithm, as shown in Fig. S.8 in the Supplementary Information, indicate that COBYLA performs better in terms of improving xRAR performance and reducing objective error as the number of layers increases.

In late 2023, IBM successfully built a 1,121-qubit processor, known as the Condor quantum processor, although the issue of noise remains unresolved. More recently, however, quantum error correction experiments have reached the threshold for the surface code30. If these trends continue, it is likely that some of the currently unknown Hadamard matrices will eventually be discovered by quantum computing. The QAOA method would require 336 qubits to find the lowest-order unknown H-matrix, of order 668, using the Williamson method, or 157 qubits using the Turyn method. In terms of qubit numbers, this is within the reach of current technology. However, noise will remain a significant obstacle to implementation. Nonetheless, it would be exciting to further explore this domain, especially with dedicated access to a quantum device capable of running QAOA at full capacity and with the application of error mitigation techniques.