Introduction

Fault-tolerant quantum computing holds the potential to push the boundaries of quantum chemistry calculations1,2,3,4,5,6,7,8,9,10,11 due to efficient quantum algorithms, such as quantum phase estimation (QPE)12 that, given a sufficient initial state preparation13, allows for a nearly exact estimation of the energy of a molecule or a material in polynomial time. Specifically, QPE with qubitization is the leading approach14,15,16 which requires the lowest quantum resources for quantum chemistry problems2,8,17,18,19. In order to use this algorithm, one must map the Hamiltonian and wavefunction into the qubit representation and also embed the Hamiltonian into the block of a larger unitary matrix using, for example, a linear-combination-of-unitaries (LCU) decomposition. The computational cost of the quantum algorithm is determined by the efficiency of these procedures, which in turn depends on the representation of the quantum chemistry Hamiltonian and whether it is in first or second quantization.

The most studied approach in the context of quantum computation is second quantization. In this formalism, the anti-symmetry of the electronic wavefunction is encoded into the creation and annihilation operators. These operators can be mapped into the qubit representation using, for example, the Jordan-Wigner transformation20,21. Conveniently, the occupation number wavefunction maps directly onto the qubit basis, requiring 2D qubits for a system with 2D spin orbitals. The resulting computational cost does not explicitly depend on the number of electrons. Many LCU decompositions of the second quantized Hamiltonian exist, including the sparse representation19, single factorization19, double factorization18,22, and tensor hyper-contraction17. All of these methods, based on the factorization of the two-body term, are valid for any basis functions and can take advantage of state-of-the-art quantum chemistry basis sets.

There is, however, a tempting alternative. The first quantization formalism requires \(N{\log }_{2}2D\) qubits to represent the wavefunction, where N is the number of electrons. As a result, for a fixed N, first quantization offers an exponential improvement in the scaling of the number of system qubits with respect to the number of orbitals23. Therefore, we can increase the number of orbitals to obtain a better approximation of the continuum limit, with little increase in the number of system qubits16. Furthermore, unlike methods developed in second quantization, the first quantization approach is equally applicable to bosonic problems with fixed number of particles as it is to fermionic ones; the only difference is the initial state preparation of the wavefunction which accounts for the symmetry.

Previous work on qubitization in first quantization is limited to plane-wave (PW) basis sets24,25 (for other approaches see refs. 26,27,28,29,30,31). The electronic integrals have a simple analytical form in the PW basis set, which makes it possible to avoid the loading of classical data and instead use quantum arithmetic circuits to estimate the parameters of the Hamiltonian on the quantum computer. These methods have been developed further32,33 to implement the Goedecker-Teter-Hutter pseudopotentials34,35. However, one of the main disadvantages of these approaches from an applications point of view is their inability to treat active regions of molecules or materials - one of the main tasks in computational quantum chemistry. While pseudopotentials do separate electrons into active (valence) and frozen (core) electrons, this is not the same as a more general scheme for active space construction used in quantum chemistry. The other disadvantage is that it might not be possible to incorporate modern pseudopotentials36,37 or PAW method38, which do not have a simple analytical representation, without the classical-data loading. Therefore, developing approaches in first quantization for advanced quantum chemistry basis sets and exploring its advantages over second quantization in the context of QPE is of great interest.

In this work, we present a qubitization-based QPE implementation of the quantum chemistry Hamiltonian in first quantization, with an arbitrary basis set. We provide the first explicit LCU decomposition in first quantization that makes use of a sparse representation of the Hamiltonian. While we focus on electronic Hamiltonians, our result is general for any Hamiltonian in first quantization. We then explore two basis sets: molecular orbitals spanned on Gaussian-type orbitals and dual plane waves (DPW)39,40,41. For molecular orbitals, we compare quantum resource estimates of QPE with qubitization for active space calculations between sparse implementations in first and second quantization19. One of the main quantum primitives in sparse qubitization is advanced quantum read only memory (QROAM19,42,43) which allows for a trade-off between the number of qubits and Toffoli gates. Irrespective of the QROAM implementation, we surprisingly observe a polynomial speedup with respect to the number of basis functions, D, in the Toffoli count for a given number of electrons. For the implementation of QROAM that minimizes the number of logical qubits, the first quantization approach naturally provides exponential improvements in the qubit count, while for the implementation of QROAM that minimizes the number of Toffoli gates, we observe similar logical qubit count in both first and second quantization. The speedup in Toffoli-gate count is due to a lower subnormalization factor of the sparse LCU in first quantization. For DPWs, we also analyze the uniform electron gas and materials in the physically relevant regimes, and for DPWs, we observe a speed up of many orders of magnitude in both the logical qubit count as well as Toffoli counts over the second quantization counterpart. For some cases, we also observe resource reduction as compared to a first quantization plane wave (PW) implementation25 which avoids the data loading. To our best knowledge, this is the first work that implements first quantization in any basis sets for qubitized QPE.

Results

LCU decomposition in first quantization

In first quantization, the Hamiltonian of interacting identical particles can be written as follows

$$\hat{H}=\mathop{\sum }\limits_{i=0}^{N-1}\mathop{\sum }\limits_{p,q=0}^{D-1}\sum _{\sigma =0,1}{h}_{pq}{\left(\vert p\sigma \rangle \langle q\sigma \vert \right)}_{i}+\frac{1}{2}\mathop{\sum }\limits_{i\ne j}^{N-1}\mathop{\sum }\limits_{p,q,r,s=0}^{D-1}\sum _{\sigma ,\tau =0,1}{h}_{pqrs}{\left(\vert p\sigma \rangle \langle q\sigma \vert \right)}_{i}{\left(\vert r\tau \rangle \langle s\tau \vert \right)}_{j}.$$
(II.1)

Here, N is the number of particles, D is the number of basis functions (orbitals) and σ and τ are spin indices. The one-body and two-body matrix elements, hpq and hpqrs, are independent of particle number and spin. The subscript on the operator \({(\vert p\sigma \rangle \langle q\sigma \vert )}_{i}\) indicates that the operator acts on the ith particle:

$${(\vert p\sigma \rangle \langle q\sigma \vert )}_{i}={I}_{2D}^{\otimes i}\otimes \vert p\sigma \rangle \langle q\sigma \vert \otimes {I}_{2D}^{\otimes (N-i-1)},$$
(II.2)

where I2D is the identity operator acting on vector space \({{\mathbb{C}}}^{2D}\).

To block encode the Hamiltonian for QPE with qubitization, we can express this Hamiltonian as a linear combination of unitary matrices (LCU),

$${\hat{H}}_{{\rm{LCU}}}=\sum _{\alpha }{\omega }_{\alpha }{U}_{\alpha },$$
(II.3)

where Uα are unitary matrices with coefficients ωα. The one-norm of the LCU,

$$\lambda =\sum _{\alpha }| {\omega }_{\alpha }| ,$$
(II.4)

is the subnormalisation of the LCU block encoding.

As a choice for unitary matrices we consider Pauli strings, which are tensor products of the Pauli matrices:

$$I=\left(\begin{array}{rc}1&0\\ 0&1\end{array}\right),\quad X=\left(\begin{array}{rc}0&1\\ 1&0\end{array}\right),\quad Y=\left(\begin{array}{rc}0&-{\rm{i}}\\ {\rm{i}}&0\end{array}\right),\quad Z=\left(\begin{array}{rc}1&0\\ 0&-1\end{array}\right).$$
(II.5)

To express the Hamiltonian as an LCU with Pauli strings, we use the generic Pauli LCU decomposition presented in ref. 44, which is valid for square matrices of dimension D = 2M where M is a natural number. The Pauli LCU decomposition of the one-body term then reads as

$${\hat{H}}_{\text{LCU},1}=\mathop{\sum }\limits_{p,q=0}^{D-1}{\omega }_{pq}{(-{\rm{i}})}^{\mu (p,q)}\mathop{\sum }\limits_{j=0}^{N-1}\mathop{\prod }\limits_{k=0}^{M-1}{X}_{jM+k}^{{p}_{k}}{Z}_{jM+k}^{{q}_{k}}{{\rm{i}}}^{{p}_{k}\wedge {q}_{k}},$$
(II.6)

where \(\mu (p,q)=\mathop{\sum }\nolimits_{k=0}^{M-1}{p}_{k}\wedge {q}_{k}\), is the AND operation, Xa and Za are Pauli operators acting on qubit a:

$$\begin{array}{l}{X}_{a}={I}^{\otimes a}\otimes X\otimes {I}^{\otimes (MN-a-1)},\\ {Z}_{a}={I}^{\otimes a}\otimes Z\otimes {I}^{\otimes (MN-a-1)},\end{array}\quad a=0\ldots MN-1$$
(II.7)

and pk is the value of the kth bit of the standard binary representation of p, likewise for qk. The action on the spin index is the identity, so there are \(N\log D\) system qubits. The coefficients ωpq of the Pauli strings are given by an XOR transform of the indices followed by a Hadamard transform,

$${\omega }_{pq}=\frac{1}{D}\mathop{\sum }\limits_{a=0}^{D-1}{h}_{p\oplus a,a}{\left({H}^{\otimes M}\right)}_{a,q},$$
(II.8)

where is the bitwise XOR operation and H is a Hadamard matrix, defined as:

$$H=\left(\begin{array}{ll}1&1\\ 1&-1\end{array}\right).$$
(II.9)

In quantum computation, the Hadamard matrix is defined with a refactor of \(\frac{1}{\sqrt{2}}\) unlike our definition. While the Eq. (II.6) is written using the conventional definition of Pauli strings, we find it more convenient to cancel − i with i and work with a slightly different form:

$${\hat{H}}_{\text{LCU},1}=\mathop{\sum }\limits_{p,q=0}^{D-1}{\omega }_{pq}\mathop{\sum }\limits_{i=0}^{N-1}\mathop{\prod }\limits_{k=0}^{M-1}{X}_{Mi+k}^{{p}_{k}}{Z}_{Mi+k}^{{q}_{k}}.$$
(II.10)

For a real Hamiltonian, anti-Hermitian terms in Eq. (II.10) have coefficients ωpq equal to zero, and all remaining unitary matrices are also Hermitian. From now on, we will refer to this form of the decomposition as Pauli LCU. The two-body LCU decomposition is

$${\hat{H}}_{\text{LCU},2}=\mathop{\sum }\limits_{p,q,r,s=0}^{D-1}{\omega }_{pqrs}\frac{1}{2}\mathop{\sum }\limits_{i\ne j}^{N-1}\mathop{\prod }\limits_{k=0}^{M-1}{X}_{Mi+k}^{{p}_{k}}{Z}_{Mi+k}^{{q}_{k}}{X}_{Mj+k}^{{r}_{k}}{Z}_{Mj+k}^{{s}_{k}},$$
(II.11)

where the two-body Pauli LCU coefficients ωpqrs, are

$${\omega }_{pqrs}=\frac{1}{{D}^{2}}\mathop{\sum }\limits_{a=0}^{D-1}\mathop{\sum }\limits_{b=0}^{D-1}{h}_{p\oplus a,a,r\oplus b,b}{\left({H}^{\otimes M}\right)}_{a,q}{\left({H}^{\otimes M}\right)}_{b,s}.$$
(II.12)

Eq. (II.11) is a generalization of ref. 44 to a 4-dimensional tensor and details of this generalization are given in “Two-body LCU derivation”. Eq. (II.8) and (II.12) are manipulated into a matrix multiplication so that the fast Walsh-Hadamard transform can be used to efficiently find the LCU coefficients44.

Coefficients that correspond to the same Pauli string can be combined to reduce the one-norm of the LCU and data loading cost. When p = q = 0 or when r = s = 0, the two-body LCU repeats Pauli strings from the one-body LCU. Moreover, all terms that give the identity may be removed; these terms shift the eigenvalues of the Hamiltonian by a constant and do not need to be input on the quantum computer. These changes give the canonicalized one-body LCU,

$${{\hat{H}}}_{{\rm{LCU}}\,,1}^{{\prime}}=\mathop{\sum}\limits_{p,q=0}^{D-1}{\omega }_{pq}^{{\prime} }\mathop{\sum }\limits_{i=0}^{N-1}\mathop{\prod }\limits_{k=0}^{M-1}{X}_{Mi+k}^{{p}_{k}}{Z}_{Mi+k}^{{q}_{k}},$$
(II.13)

where

$${\omega }_{pq}^{\,{\prime} }=\left\{\begin{array}{ll}0,\quad &\,{\rm{if}}\,\,p=q=0\\ {\omega }_{pq}+\frac{1}{2}\left(N-1\right)\left({\omega }_{pq00}+{\omega }_{00pq}\right),\quad &\,{{\rm{otherwise}}.}\,\end{array}\right.$$
(II.14)

The canonicalized two-body LCU is

$${{\hat{H}}}_{{\rm{LCU}}\,,2}^{{\prime} }=\mathop{\sum }\limits_{p,q,r,s=0}^{D-1}{\omega }_{pqrs}^{{\prime} }\frac{1}{2}\mathop{\sum }\limits_{i\ne j}^{N-1}\mathop{\prod }\limits_{k=0}^{M-1}{X}_{Mi+k}^{{p}_{k}}{Z}_{Mi+k}^{{q}_{k}}{X}_{Mj+k}^{{r}_{k}}{Z}_{Mj+k}^{{s}_{k}},$$
(II.15)

where

$${\omega }_{pqrs}^{{\prime} }=\left\{\begin{array}{ll}0,\qquad\,{\rm{if}}\,\,p=q=0\,{\rm{or}}\,r=s=0\\ {\omega }_{pqrs},\quad\,{\rm{otherwise}.}\,\end{array}\right.$$
(II.16)

The removed terms proportional to the identity are

$${{\hat{H}}}_{\,{\text{LCU}},\,0}^{{\prime} }=N{\omega }_{00}+\frac{1}{2}N(N-1){\omega }_{0000}.$$
(II.17)

The cost of an LCU block encoding depends on the number of unique, non-zero coefficients, L, and the one-norm, λ. The upper bound of the number of non-zero terms from Eq. (II.10) and (II.11) is L = D4 + D2. The one-norm of the canonicalized LCU is

$$\lambda =\mathop{\sum }\limits_{p,q=0}^{D-1}| {\omega }_{pq}^{{\prime} }| \cdot N+\mathop{\sum }\limits_{p,q,r,s=0}^{D-1}| {\omega }_{pqrs}^{{\prime} }| \cdot \frac{1}{2}N(N-1).$$
(II.18)

In the case of dense matrix elements hpqrs, such that \(\sum _{pqrs}| {h}_{pqrs}| =O({D}^{4})\), we expect λ = O(N2D3). This is because in Eq. (II.12), \({H}^{\otimes M}/\sqrt{D}\) is the unitary transformation, and there is an extra prefactor 1/D. We confirm this speedup numerically in “Molecular orbitals. Scaling properties for basic models” for dense matrix elements. However, for chemical applications, hpqrs is sparse and λ depends on the particular basis set and symmetries presented in the system. Our numerical simulations (“Molecular orbitals. Scaling properties for basic models” and “Scaling properties using dual plane waves”) show that for molecular orbitals we observe less than a 1/D speedup, while for dual plane waves we achieve more than 1/D compared to second quantization for a fixed N. If D = O(N), the one-norm in second quantization scales better than in first quantization in the examples we studied. However, for a fixed N, λ can be smaller than the one-norm in second quantization not only in the asymptotic regime, but also for a practical size of the basis set. We present detailed numerical analyzes for different basis sets in “Molecular orbitals. Scaling properties for basic models” and “Scaling properties using dual plane waves” and show that for dual plane waves this is indeed the case.

So far, our results are completely general for any Hamiltonian. We used no symmetries of the tensors hpq and hpqrs, or of the coefficients ωpq and ωpqrs, to derive Eq. (II.10) through Eq. (II.18). In “Matrix elements and symmetries in different basis sets”, we discuss the consequences of different symmetries and the choice of the basis functions on the LCU.

Matrix elements and symmetries in different basis sets

Arbitrary basis sets

In quantum chemistry, the two-body matrix elements of the Hamiltonian can be computed as

$${h}_{pqrs}={\iint_{V\times V}}d{\bf{r}}d{{\bf{r}}}^{{\prime} }\frac{{\phi }_{p}^{* }({\bf{r}}){\phi }_{q}({\bf{r}}){\phi }_{r}^{* }({{\bf{r}}}^{{\prime} }){\phi }_{s}({{\bf{r}}}^{{\prime} })}{| {\bf{r}}-{{\bf{r}}}^{{\prime} }| },$$
(II.19)

where ϕq(r) are molecular orbitals and V is the supercell in the case of a periodic solid, or is \({{\mathbb{R}}}^{3}\) for an isolated molecule. Often, these orbitals are chosen to be real and in this case the two-body matrix elements possess the 8-fold symmetry

$${h}_{pqrs}={h}_{qprs}={h}_{pqsr}={h}_{qpsr}={h}_{rspq}={h}_{srpq}={h}_{rsqp}={h}_{srqp}.$$
(II.20)

One-body matrix elements are symmetric for real orbitals,

$${h}_{pq}={h}_{qp}.$$
(II.21)

Data loading is often a dominant cost of qubitized QPE. To reduce this cost, symmetries can be exploited to load only the unique, non-zero LCU coefficients. Next, we discuss how the symmetries of real matrix elements, given in Eq. (II.20) and (II.21), affect the one- and two-body LCU coefficients in turn.

The one-body LCU coefficients do not retain the two-fold symmetry of the one-body matrix elements. However, the Hadamard and XOR transforms of the matrix elements convert the symmetry into zero coefficients. The number of non-zero matrix elements for the one-body term is less than or equal to \(\frac{1}{2}D(D+1)\)44. By examining Eq. (II.11), it can be seen that the two-body LCU coefficients retain a two-fold symmetry,

$${\omega }_{pqrs}={\omega }_{rspq}.$$
(II.22)

This symmetry simplifies the canonicalized one-body coefficients so that

$${\omega }_{pq}^{{\prime} }=\left\{\begin{array}{ll}0,\quad &\,{\rm{if}}\,\,p=q=0\\ {\omega }_{pq}+\left(N-1\right){\omega }_{pq00},\quad &\,{{\rm{otherwise}}.}\,\end{array}\right.$$
(II.23)

Accounting for the XOR and Hadamard transform of the LCU, the remaining two-fold symmetry and canonicalisation, the number of unique, non-zero, two-body LCU coefficients is less than or equal to D(D + 1)(D − 1)(D + 2)/8. Indeed, assuming all original tensor elements are non-zeros, for each pair (p, q) that corresponds to non-zero value of ωpqrs we have K D(D + 1)/2 pairs of (r, s). This corresponds to K2 non-zero coefficients. Taking into account the symmetry of ωpqrs under (p, q) ↔ (r, s) and canonicalization (II.16), we obtained the total number of unique, non-zero coefficients to be (K2K)/2 = D(D + 1)(D − 1)(D + 2)/8.

Basis sets diagonalizing the Coulomb potential

We next consider the Hamiltonian in basis sets which diagonalize the Coulomb potential, such as dual plane waves. Then,

$${\hat{H}}=\mathop{\sum }\limits_{p,q=0}^{D-1}{T}_{pq}\mathop{\sum }\limits_{i=0}^{N-1}{\left(\vert p\rangle \langle q\vert \right)}_{i}+\frac{1}{2}\mathop{\sum }\limits_{p,r=0}^{D-1}{V}_{pr}\mathop{\sum }\limits_{i\ne j}^{N-1}{(\vert p\rangle \langle p\vert )}_{i}{(\vert r\rangle \langle r\vert )}_{j},$$
(II.24)

where Tpq and Vpr are symmetric matrices. This Hamiltonian transforms to a Pauli LCU as:

$$\hat{H}=\mathop{\sum }\limits_{p,q=0}^{D-1}{\omega }_{pq}\mathop{\sum }\limits_{i=0}^{N-1}\mathop{\prod }\limits_{k=0}^{M-1}{X}_{Mi+k}^{{p}_{k}}{Z}_{Mi+k}^{{q}_{k}}+\mathop{\sum }\limits_{p,r=0}^{D-1}{\gamma }_{pr}\cdot \frac{1}{2}\mathop{\sum }\limits_{i\ne j}^{N-1}\mathop{\prod }\limits_{k=0}^{M-1}{Z}_{Mi+k}^{{p}_{k}}{Z}_{Mj+k}^{{r}_{k}},$$
(II.25)

with the LCU coefficients

$${\omega }_{pq}=\frac{1}{D}\mathop{\sum }\limits_{u=0}^{D-1}{T}_{p\oplus u,u}{\left({H}^{\otimes M}\right)}_{uq}$$
(II.26)

and

$${\gamma }_{pr}=\frac{1}{{D}^{2}}\mathop{\sum }\limits_{u,v=0}^{D-1}{\left({H}^{\otimes M}\right)}_{pu}{V}_{uv}{\left({H}^{\otimes M}\right)}_{vr}.$$
(II.27)

As before, symmetries of the original one-body matrix elements manifest in the sparsity of ωpq, while two-body coefficients γpr remain symmetric. Similarly to the previous case, there is double-counting of Pauli strings in one- and two-body terms. By removing contributions proportional to the identity, the canonicalized Hamiltonian in the Pauli basis can be written as follows

$${\hat{H}}_{1}+{\hat{H}}_{2}=\mathop{\sum }\limits_{p,q=0}^{D-1}{\omega }_{pq}^{{\prime} }\mathop{\sum }\limits_{i=0}^{N-1}\mathop{\prod }\limits_{k=0}^{M-1}{X}_{Mi+k}^{{p}_{k}}{Z}_{Mi+k}^{{q}_{k}}+\mathop{\sum }\limits_{p,r=1}^{D-1}{\gamma }_{pr}^{{\prime} }\cdot \frac{1}{2}\mathop{\sum }\limits_{i\ne j}^{N-1}\mathop{\prod }\limits_{k=0}^{M-1}{Z}_{Mi+k}^{{p}_{k}}{Z}_{Mj+k}^{{r}_{k}},$$
(II.28)

where the one-body and two-body matrix elements are now

$${\omega }_{pq}^{{\prime} }=\left\{\begin{array}{ll}{\omega }_{pq},\qquad\qquad\qquad\quad\,{\rm{if}}\,\,p \,> \,0\\ {\omega }_{0q}+(N-1){\gamma }_{0q},\quad\,{\rm{if}}\,\,p=0\,\,{\rm{and}}\,\,q \,> \,0\\ 0,\qquad\qquad\qquad\qquad\,{\rm{if}}\,\,q=0\,\,{\rm{and}}\,\,p=0.\end{array}\right.$$
(II.29)
$${\gamma }_{pr}^{{\prime} }=\left\{\begin{array}{ll}0,\quad &\,{\rm{if}}\,\,p\,{\rm{or}}\,r=0\\ {\gamma }_{pr}\quad &\,{{\rm{otherwise}}.}\,\end{array}\right.$$
(II.30)

Here we again disregard terms with identity Pauli-strings, since they only contribute a constant shift of the eigenvalue. An example of such basis sets are DPWs39,40,41, which exhibit additional translation symmetry. This symmetry manifests in the sparsity of Pauli LCU coefficients as will be numerically shown in “Scaling properties using dual plane waves”.

Block encoding

In qubitized QPE, the Hamiltonian is embedded into a block of the larger unitary matrix U with subnormalization factor λ:

$$\hat{U}=\left(\begin{array}{rc}\hat{H}/\lambda &* \\ * &* \end{array}\right).$$
(II.31)

This unitary is then used to construct the quantum-walk operator,

$$\hat{W}=\hat{U}\left(\begin{array}{rc}I&\\ &-I\end{array}\right),$$
(II.32)

which has eigenvalues related to the spectrum of the Hamiltonian, \({\rm{Eigenvalue}}{[\hat{W}]}_{i}={e}^{\pm \arccos \left({E}_{i}/\lambda \right)}\)15,16.

In a circuit implementation of the block encoding of \(\hat{U}\), the value of the ‘\(*\)’ blocks is irrelevant. We will use a linear combination of unitaries (LCU) approach to constructing the block encoding. A block encoding can be constructed for an LCU \(\hat{H}=\mathop{\sum }\nolimits_{l=0}^{L-1}| {a}_{l}| {\hat{U}}_{l}\) by combining the operators PREP (state preparation) and SELECT:

$$\,{\rm{PREP}}\,{\vert 0\rangle }^{\otimes {\log }_{2}L}=\mathop{\sum }\limits_{l=0}^{L-1}\sqrt{| {a}_{l}| }\vert l\rangle ,\,\,{\rm{SELECT}}\,=\mathop{\sum }\limits_{l=0}^{L-1}\vert l\rangle \langle l\vert \otimes {\hat{U}}_{l},\,\lambda =\mathop{\sum }\limits_{l=0}^{L-1}| {a}_{l}| ,\,\hat{U}={\rm{PREP}}^{\dagger }\cdot {\rm{SELECT}}\cdot {\rm{PREP}}.$$
(II.33)

Typically, the dominant cost is \(O(\sqrt{L}\lambda /{\epsilon }_{{\rm{QPE}}})\), which assumes a QROAM-based implementation of PREP19.

First, we will show the circuit constructions for the sparse qubitization in first quantization for arbitrary basis sets, based on the Hamiltonian (II.13) and (II.15), and then, we discuss how the circuits cost reduces for the Hamiltonian with a diagonal Coulomb interaction (II.28).

Arbitrary basis sets

Let us write both the one-body (II.13) and the two-body (II.15) Hamiltonian as a single sum of ij. For the one-body term, we use s = r = 0:

$${\hat{H}}^{{\prime} }={\hat{H}}_{\,\text{LCU,1}\,}^{{\prime} }+{\hat{H}}_{\,\text{LCU,2}\,}^{{\prime} }=\mathop{\sum }\limits_{pqrs=0}^{D-1}{a}_{pqrs}\mathop{\sum }\limits_{i\ne j}^{N-1}\mathop{\prod }\limits_{k=0}^{M-1}{X}_{iM+k}^{{p}_{k}}{Z}_{iM+k}^{{q}_{k}}\mathop{\prod }\limits_{{k}^{{\prime} }=0}^{M-1}{X}_{jM+{k}^{{\prime} }}^{{r}_{{k}^{{\prime} }}}{Z}_{jM+{k}^{{\prime} }}^{{s}_{{k}^{{\prime} }}}$$
(II.34)

with coefficients

$${a}_{pqrs}=\left\{\begin{array}{ll}{\omega }_{pq}^{{\prime} }/(N-1)\quad &\,{\rm{for}}\,\,r=s=0\\ {\omega }_{pqrs}^{{\prime} }/2\quad &\,{\rm{for}}\,\,(p,q)=(r,s)\\ {\omega }_{pqrs}^{{\prime} }\quad &\,{\rm{for}}\,\,(p,q)\, < \,(r,s)\\ 0\quad &\,{\rm{otherwise}}\,.\end{array}\right.$$
(II.35)

The inequality (p, q) < (r, s) should be understood as inequality of composite indices and selects one branch of the symmetry (p, q) ↔ (r, s). The factor of 1/(N − 1) in the one-body coefficients cancels the extra sum over j.

A priori, there are L = D4 coefficients apqrs to load. However, the coefficients are very sparse with many zeros or values close to zero that can be truncated, resulting in an effective L < D4, which can be harnessed by using ideas from the sparse qubitization scheme19. Therefore, we will load L coefficients al, along with index value p(l), q(l), r(l), s(l) for use in SELECT.

The SELECT operator. We begin by showing a circuit construction for a SELECT operator selecting \(\mathop{\prod }\nolimits_{k=0}^{M-1}{Z}_{iM+k}^{{q}_{k}}\). We denote the circuit as \({{\rm{SELECT}}}_{i,q}[\mathop{\prod }\nolimits_{k=0}^{M-1}{Z}_{iM+k}^{{q}_{k}}]\) where the indices are the control registers for the SELECT operation. Our approach is to iterate over i with a unary iteration gadget43. For each value of i, we apply \(M={\log }_{2}D\) CCZ gates, each of which targets the iM + k’th system qubit (k = 0, …, M − 1) and is controlled on the unary iterator qubit specifying the current value of i as well as the k’th qubit of the q register. This is illustrated in the following, where we only show the unary iterator qubit and hide all other ancillary qubits and inner workings of the unary iteration. Between the first two cnots, the unary iterator qubit is \(\left\vert 1\right\rangle\) if i = 0. Between the second two cnots, the unary iterator qubit is \(\left\vert 1\right\rangle\) if i = 1, etc.

(II.36)

The above circuit can be adjusted to implement \({\rm{SELECT}}_{i,p,q}[\mathop{\prod }\nolimits_{k=0}^{M-1}{X}_{iM+k}^{{p}_{k}}{Z}_{iM+k}^{{q}_{k}}]\). Rather than implementing it as a product of two circuits (II.36) for q and p, we combine them to save the cost of a second unary iteration: A CCX gate is added to the left of each CCZ gate in (II.36), and is controlled on the relevant qubit of the \(\vert p\rangle\) register instead of the \(\vert q\rangle\) register. This has cost

$${\rm{SELECT}}_{i,p,q}\left[\mathop{\prod }\limits_{k=0}^{M-1}{X}_{iM+k}^{{p}_{k}}{Z}_{iM+k}^{{q}_{k}}\right]:\,(N-1)+2NM\,\,{\rm{Toffolis}}\,,\,\lceil {\log }_{2}N\rceil \,\,{\rm{ancilla}}\, {\rm{qubits}},$$
(II.37)

when using measurement based uncomputation for the unary iteration. The (N − 1) Toffolis and ancilla qubits are internal to the unary iteration.

For the LCU (II.34), we use the circuit twice to implement the product

$${\text{SELECT}}_{i,p,q}\left[\mathop{\prod }\limits_{k=0}^{M-1}{X}_{iM+k}^{{p}_{k}}{Z}_{iM+k}^{{q}_{k}}\right]\cdot {\text{SELECT}}_{j,r,s}\left[\mathop{\prod }\limits_{k=0}^{M-1}{X}_{jM+k}^{{r}_{k}}{Z}_{jM+k}^{{s}_{k}}\right].$$
(II.38)

The same circuit can be used for both one-body and two-body LCU terms, by setting the control registers r = s = 0 in the one-body terms, as is taken care of in (II.35).

The work of25 uses a different approach to SELECT: Rather than implementing gates for each value of i in turn (II.36), the part of the system register belonging to electron i is swapped towards the top, then the gates are implemented on that register once, and it is swapped back. In our case this is not advantageous as we are not carrying out arithmetic on the system register and the CCX and CCZ gates are cheaper than the swaps and extra unary iteration for the unswaps would be.

The PREPARE operator. As mentioned, we use a sparse scheme to reduce the data loading cost in the face of many zero coefficients or small coefficients that are truncated to zero. For this, the non-zero coefficients ap,q,r,s from the LCU (II.34) are indexed by l {0, …L − 1} in arbitrary order. We use the sparse scheme to construct the PREP operator with QROAM and coherent alias sampling, see refs. 17,19,45. The construction of the PREP circuit is largely unchanged as compared to sparse qubitization in second quantization and the main modification is an additional step for preparing the equal superposition over electron numbers. For completeness, we provide a detailed description below:

  1. 1.

    Prepare an equal superposition state \(\frac{1}{\sqrt{L}}\mathop{\sum }\nolimits_{l=0}^{L-1}\left\vert l\right\rangle\) of dimension L via amplitude amplification (see “Amplitude amplification for equal superpositions”).

  2. 2.

    Data lookup: Use a QROAM indexed on \(\vert l\rangle\) to load indices and sign \(\vert p,q,r,s,\theta \rangle\), alternative indices and sign, and keep probabilities for the coefficients apqrs. This is the dominant cost of PREP in both time and space.

  3. 3.

    Coherent alias sampling: Perform swaps between indices and alternative indices based on an inequality test between keep probability and an equal superposition state. The sign qubit does not have to be swapped but the phase can be applied directly (omitted during uncomputation).

  4. 4.

    Prepare an equal superposition state \(\frac{1}{\sqrt{N(N-1)}}\mathop{\sum }\nolimits_{i\ne j}^{N-1}\vert i\rangle \vert j\rangle\). This can be done by preparing an equal superposition state on \(2\lceil {\log }_{2}N\rceil\) qubits and doing amplitude amplification, see “Amplitude amplification for equal superpositions”.

The SELECT operator from (II.38) should be controlled on the successful outcome of the uniform superposition preparations in step 1 and 4, requiring an extra Toffoli gate. Uncomputation of PREP can make extensive use of measurement based uncomputation, such that the only Toffoli cost is that of uncomputing the QROAM and the equal superposition states. The walk operator required for QPE is the product of the block encoding and a reflection. The reflection must be on the flag qubits, which are \(\lceil {\log }_{2}L\rceil +2\) for the equal superposition state \(\frac{1}{\sqrt{L}}\mathop{\sum }\nolimits_{l=0}^{L-1}\vert l\rangle\) and its success qubit and rotation qubit for amplitude amplification, similarly \(2\lceil {\log }_{2}N\rceil +2\) for \(\frac{1}{\sqrt{N(N-1)}}\mathop{\sum }\nolimits_{i\ne j}^{N-1}\left\vert i\right\rangle \left\vert j\right\rangle\). The costs are summarized in Table 1.

Table 1 Cost for one walk operator and the total QPE circuit

Basis sets diagonalizing the coulomb potential

The Hamiltonian (II.28) for block-encoding can be rewritten as

$${{\hat{H}}}^{\,{\prime} }=\mathop{\sum }\limits_{pqr=0}^{D-1}{a}_{pqr}\mathop{\sum }\limits_{i\ne j}^{N-1}\mathop{\prod }\limits_{k=0}^{M-1}\left({X}_{iM+k}^{{p}_{k}}{Z}_{iM+k}^{{q}_{k}}\right)\mathop{\prod }\limits_{{k}^{{\prime} }=0}^{M-1}{Z}_{jM+{k}^{{\prime} }}^{{r}_{{k}^{{\prime} }}},$$
(II.39)

where the combined matrix elements are

$${a}_{pqr}=\left\{\begin{array}{ll}{\omega }_{pq}^{{\prime} }/(N-1),\quad &\,{\rm{if}}\,\,r=0\\ {\gamma }_{qr},\quad &\,{\rm{if}}\,\,p=0\,\,{\rm{and}}\,\,0 \,<\, q\, <\, r\\ {\gamma }_{qr}/2,\quad &\,{\rm{if}}\,\,p=0\,\,{\rm{and}}\,\,q=r\,\ne\, 0\\ 0,\quad &\,{{\rm{otherwise}}.}\,\end{array}\right.$$
(II.40)

Compared to the costs for a general Hamiltonian shown in Table 1 resource requirements differ in several ways: The number of unique coefficients L is much smaller, less than or equal to D2 + D(D + 1)/2, and we never need to implement an XZXZ-type Pauli string, so the cost of the second SELECT operator does not have the controlled X-gates, which lowers the Toffoli cost to 2N + 3NM, as opposed to 2N + 4NM Toffolis. In addition we only need to specify 3 indices in the output of the QROAM register in the PREP step, allowing for m = + 2(3M + 1) instead of m = + 2(4M + 1). These Toffoli counts are also indicated in Table 1.

Scaling properties of Pauli LCU and quantum resource estimates

In this section, we first discuss the properties of the LCU decomposition and establish asymptotic scalings which we compare to existing methods. The properties that affect the overall scaling of QPE are the one-norm and the number of unique, non-zero coefficients. A good LCU minimizes both of these factors. We will calculate the one-norm and number of unique, non-zero coefficients for several systems, truncating either the matrix elements of the Hamiltonian or its LCU coefficients.

With these results, we then provide the quantum resource estimates of QPE with qubitization for a range of systems. We focus on single-shot QPE resource estimation and do not consider subdominant costs such as the antisymmetrization of the initial state16. To obtain the number of physical qubits, we will assume that the algorithm is run on superconducting qubits placed on a square grid. We consider the code cycle duration of 10−6 s and the physical error rate of 0.1% – conventional figures for superconducting qubits46,47. We employ the surface code following refs. 48,49,50,51,52,53,54; for details we refer the reader to Appendix D of ref. 55.

Implementation of qubitization-based QPE uses QROAM which allows for a trade-off between the number of logical qubits and the number of Toffoli gates. Therefore, we will consider two versions of the algorithm: one minimizes the number of logical qubits by not using additional copies of the input registers for data loading, which corresponds to setting κ1 = 1 in Table 1 (min-Qu). The other version of the algorithm chooses κ1 such that it minimizes the number of Toffoli gates needed instead (min-T).

The systems that we will consider are a dense, real Hamiltonian, the square-planar H4 molecule, a hydrogen chain, the \({[{{\rm{Fe}}}_{2}{{\rm{S}}}_{2}]}^{2-}\) complex, the uniform electron gas and the crystalline solids nickel oxide and diamond. We apply the DPW basis set to the periodic systems and for the others we use molecular orbitals. These systems range from test models to possible applications.

We compare our work primarily against the sparse, second quantization LCU of ref. 19 in “Molecular orbitals. Scaling properties for basic models” and “Quantum resource estimates for iron-sulfur complex”. We aim to establish what advantage first quantization brings over second quantization when matrix elements of the Hamiltonian are treated similarly without applying any additional factorization. We believe that this method is comparable to our method as a baseline from which other methods in the same quantization can be developed. Additionally, we will compare our results with the second quantization DPW method of ref. 41 and the PW approach of refs. 24,25 in “Resource estimates for materials”.

Molecular orbitals. Scaling properties for basic models

In order to benchmark this work against Berry’s approach19, our first point of comparison is generating random matrix elements to form a dense, real Hamiltonian. The one-body matrix elements have two-fold symmetry and the two-body matrix elements have eight-fold symmetry. This is a worst case scenario of scaling for a generic Hamiltonian. For this reason, we do not truncate these matrix elements nor the LCU coefficients, except for a 10−10 Ha cut-off for ‘zero’ coefficients. To compare the asymptotic scaling of these methods, only the dominant two-body term is considered. For a dense real Hamiltonian, we keep the number of electrons constant at N = 4, so that it does not affect the scaling in D. The one-norm scales as \({\mathcal{O}}({D}^{2.93})\) in first quantization and as \({\mathcal{O}}({D}^{4})\) in second quantization as shown in Fig. 1a. Figure 1d shows that both methods have \({\mathcal{O}}({D}^{4})\) unique, non-zero LCU coefficients as expected.

Fig. 1: Scaling of the one-norm and number of unique, non-zero coefficients (NNZ) for test systems.
figure 1

a For a dense, real Hamiltonian with a fixed number of electrons, the one-norm of first quantization scales as \({\mathcal{O}}({D}^{3})\) compared to \({\mathcal{O}}({D}^{4})\) for the equivalent second quantization method, where D is the number of orbitals. b Similarly, for H4 the one-norm scales better in first than in second quantization. c Hydrogen chains with varying number of atoms, NA, with the number of orbitals per atom, NA/D, being constant. The number of electrons, N, is proportional to the number of atoms and orbitals, N = O(NA) = O(D). Therefore, the \({\mathcal{O}}({N}^{2})\) dependence of the one-norm in first quantization leads to worse scaling. d For a dense, real Hamiltonian, the number of unique, non-zero terms is the same for first and second quantization. e For the H4 molecule, the number of unique non-zero terms is greater for first quantization. f The number of unique, non-zero terms has better scaling in second quantization than first for the Hydrogen chain.

Square planar H4 is a common molecule used to compare different algorithms. In this work, we aim to use the same bond lengths and truncation procedures as Lee et al.17, so that we can compare our scaling results. The H4 molecule has N = 4 electrons and a bond length of 2 Bohr radii. For the second quantization representation, we truncate the matrix elements so that there is an error of less than 5.0 × 10−5 Ha per hydrogen atom, using CCSD(T). We use the aug-cc-pVQZ56,57 basis with D = 4 to 128 orbitals, using Boys localization. For the first quantization representation, we truncate the LCU coefficients to achieve this error instead, because the Pauli LCU decomposition is reversible and it is the number of non-zero LCU coefficients that determine the computational cost of the algorithm. As before, we count all LCU coefficients below 10−10 as zero and we only consider the two-body term since we are interested in the asymptotic scaling of the methods. For the line of best fit, we use the final three data points. For H4, the one-norm has lower asymptotic scaling in first quantization. As shown in Fig. 1b, our method scales as \({\mathcal{O}}({D}^{2.26})\), while the Berry representation scales as \({\mathcal{O}}({D}^{3.09})\). The Berry representation has fewer unique, non-zero LCU coefficients, but a slightly worse asymptotic scaling of \({\mathcal{O}}({D}^{4.25})\) compared to \({\mathcal{O}}({D}^{4.06})\) is obtained, see Fig. 1e. However, we expect that the asymptotic scaling is actually \({\mathcal{O}}({D}^{4})\), which is obscured because the truncation improved some data points more than others.

The hydrogen chain is another common system used to compare methods. It was also considered in Lee et al.17 and we will similarly use a bond length of 1.4 Bohr radii and a truncation procedure that aims for a CCSD(T) error of 5.0 × 10−5 Ha per hydrogen atom employing STO-3G basis set. In such basis set, we have one orbital per atom, so N = D = NA, where NA is the number of atoms in the chain. As for H4, we truncate the matrix elements of the second quantization representation and the LCU coefficients of the first quantization representation. Unlike the previous two examples, the number of electrons increases with the number of orbitals for the hydrogen chain model. For the previous systems, the number of system qubits is smaller in first quantization. For the hydrogen chain, the dependence of the number of orbitals on the number of electrons turns the system qubit number advantage into a disadvantage. In this case, first quantization requires \(D\log D\) system qubits compared to 2D in second quantization. Because the number of electrons grows with the system size, the \({\mathcal{O}}({N}^{2})\) scaling of the first quantization one-norm, shown in Eq. (II.18), becomes an \({\mathcal{O}}({D}^{2})\) scaling. Therefore, the scaling of the one-norm in first quantization for this problem is about D2 worse than second quantization methods, as shown in Fig. 1c. There is no such dependence on the number of electrons in the number of non-zero LCU coefficients. However, the scaling of the Berry representation is superior at \({\mathcal{O}}({D}^{3.28})\), compared to \({\mathcal{O}}({D}^{3.97})\) for the first quantization representation, see Fig. 1f.

Quantum resource estimates for iron-sulfur complex

Iron-sulphur clusters are known for their structural versatility and are ubiquitous in biochemical systems58,59. They easily participate in redox reactions in which electrons are exchanged between reactants and as a result form a crucial part of electron transfer mechanisms that take place in several important enzymes. These clusters consist of a FenSm (n, m are some integers) core cluster to which a number of ligands are attached and in an enzyme several such clusters may form an electron transfer pathway to ensure the passage of electrons within the protein60,61. There are several other functions iron-sulfur clusters serve in biological systems, and they also represent a challenge to electronic structure theory calculations58,59. The characterization of magnetic interactions in these clusters is especially challenging. In a typical scenario, each iron in the cluster can be thought of as a site where several unpaired electrons with parallel spins are localized. The electromagnetic properties of the entire cluster then depend on the interactions between several of these sites. Antiferromagnetic coupling is often favored between two sites, in which case all electrons in one site are aligned in an antiparallel fashion compared to the electrons on the other site. In chemistry, this situation perhaps is one of the best candidates for finding strong correlation effects62,63. Consequently, such systems require demanding classical computational techniques for even a qualitatively accurate calculation, and even the simplest of these clusters, such as Fe2S2, often feature as test systems in classical computational studies64,65. Such clusters are also used as a benchmark molecule in the context of quantum computation13,66.

In this work, we compare the quantum resource estimates of our new approach with the second quantization sparse qubitization algorithm of Berry et al.19, considering the \({[{{\rm{Fe}}}_{2}{{\rm{S}}}_{2}]}^{2-}\) complex as a test system. While the active space we consider consists of only 14 electrons, this is a more realistic example than those we used in “Molecular orbitals. Scaling properties for basic models”. In order to define the active space Hamiltonian, we have followed methodology similar to refs. 13,64. Namely, we used the cc-pVTZ-DK basis set56,67,68,69 and carried out high-spin restricted open-shell Kohn-Sham calculations with the exact two-component (x2c) scalar-relativistic correction70,71,72 and the BP86 functional73,74. We achieve convergence with the second-order co-iterative augmented Hessian (CIAH) method75. Next, we localized the virtual and occupied spaces separately using Pipek-Mezey (PM) localization76,77. We choose 12 occupied orbitals for the active space, consisting of five 3d orbitals per iron atom and one 3pz orbital per bridging sulfur atom. The active space is then completed by selecting D − 12 orbitals from the virtual space based on their orbital energy such that D = 16, 32, 64, 128. The first 122 virtual space orbitals are all centered on iron or a bridging sulfur.

In order to carry out resource estimations, one has to distribute the error budget among all approximations used in the quantum algorithm17:

$${\epsilon }_{{\rm{tot}}}={\epsilon }_{{\rm{QPE}}}+{\epsilon }_{{\rm{trunc}}}+{\epsilon }_{{\rm{PREP}}},$$
(II.41)

where ϵQPE is the error due to finite precision in QPE, ϵtrunc is the error due to truncation of the Hamiltonian and ϵPREP is the error in the state preparation (PREP circuit). We choose these parameters as follows17:

$${\epsilon }_{{\rm{QPE}}}=\frac{10}{16}{\epsilon }_{{\rm{tot}}},\quad {\epsilon }_{{\rm{trunc}}}=\frac{3}{16}{\epsilon }_{{\rm{tot}}},\quad {\epsilon }_{{\rm{PREP}}}=\frac{3}{16}{\epsilon }_{{\rm{tot}}}.$$
(II.42)

Our target is to estimate the ground state energy with precision of ϵtot = 0.1 mHa, since the energy difference between different spin states in \({[{{\rm{Fe}}}_{2}{{\rm{S}}}_{2}]}^{2-}\) is on the order of only few mHa (the singlet-triplet gap is ≈2.0 mHa)64. This is higher precision than that typically used in resource estimation for quantum chemistry (1.6 mHa). We truncate the first quantization LCU coefficients and second quantization Hamiltonian matrix elements to meet the target error, ϵtrunc, as estimated with an unrestricted MP2 (UMP2) calculation. We choose UMP2 method instead of CCSD(T) because UMP2 requires fewer computational resources and does not have problems with convergence, unlike CCSD(T) for large active spaces (128 orbitals).

Figure 2 shows the resource estimates for the two approaches, which both use QROAM that minimizes the Toffoli count (min-T). The number of Toffoli gates in first quantization scales asymptotically better by a factor of D0.4. This asymptotic speedup is due to better scaling of the Pauli LCU one-norm in first quantization. The number of logical qubits is similar in both methods because of the small truncation value, ϵtrunc. For D ≤ 1024, sparse qubitization in second quantization shows smaller Toffoli count. The reason is that the one-norm of the first-quantization LCU explicitly depends on the number of electrons squared and this gives a large prefactor. Only at more than 103 orbitals would we expect first quantization to show an advantage.

Fig. 2: Resource estimates of sparse qubitization in first and second quantization for \({[{{\rm{Fe}}}_{2}{{\rm{S}}}_{2}]}^{2-}\) with 14 electrons, varying the number of orbitals.
figure 2

Resource estimates were obtained for the target accuracy ϵtot = 0.1 mHa for ground state energy estimation. a Second quantization requires fewer Toffoli gates for all data points, but first quantization scales better in D. b Only the last three points were used for the fit of logical qubits. First and second quantization require a similar number of logical qubits.

Table 2 shows the resource estimates when one chooses to minimize the number of logical qubits. As one can see, the number of qubits becomes smaller in first quantization compared to second quantization for more than 16 orbitals. For example, for 128 orbitals the number of qubits is 293 while in second quantization this number is 492. With larger system size there will be even larger improvement. However, the Toffoli count is again only asymptotically better in first quantization, scaling with order \({\mathcal{O}}({D}^{6.04})\) against \({\mathcal{O}}({D}^{6.23})\).

Table 2 Resource estimates with QROAM that minimizes the number of qubits instead of the number of Toffolis

Scaling properties using dual plane waves

In order to take advantage of the more efficient algorithm for systems with diagonal Coloumb interactions outlined in “Basis sets diagonalizing the coulomb potential” we consider the DPW. This basis set is related to PW via a Fourier transform, so the basis functions are localized in real space around evenly spaced grid points (see refs. 39,40 and Appendix C of ref. 41 for details). This transformation diagonalizes the Coloumb interaction, bringing the Hamiltonian into the form of Eq. (II.24) and greatly reducing the number of coefficients that need to be loaded. The trade-off is that one typically needs far more DPWs to achieve the same basis set accuracy compared to Gaussian orbitals, but since the size of our system register only scales logarithmically with the basis size we expect DPW to be favorable for our algorithm.

The Pauli-LCU one-norm of the Hamiltonian (II.24) is bound by

$${\mathcal{O}}\left(N{D}^{1/2}\right)\le \lambda \le {\mathcal{O}}\left(\frac{N(N+1){D}^{2}}{2}\right)$$
(II.43)

We note that this bound might be too loose for DPW and in order to establish a more precise dependence on the number of basis functions, we have carried out numerical calculations. For this, we have considered diamond with a cubic cell made of 8 atoms. The one-norm of the Pauli LCU consists of one- and two-body contributions and it is bounded by the sum of the one-norms of the kinetic energy, nuclear-electron and electron-electron interactions

$$\lambda ={\lambda }_{1}+{\lambda }_{2}\le {\lambda }_{T}+{\lambda }_{U}+{\lambda }_{V}.$$
(II.44)

The total norm, as well as each individual contribution, is shown in Fig. 3a, b. The one-body term provides smaller contribution to the total norm than the two-body term. The nuclear-electron contribution is around an order of magnitude smaller than electron-electron contribution and demonstrates subdominant scaling in basis functions. Based on these numerical results, we can conclude that the total norm scales as

$$\lambda ={\mathcal{O}}\left(N{\left(\frac{D}{V}\right)}^{2/3}{D}^{0.1}\right)+{\mathcal{O}}\left({N}^{2}{\left(\frac{D}{V}\right)}^{1/3}{D}^{0.28}\right),$$
(II.45)

where the first contribution is due to the kinetic energy and the second term is due to the electron-electron interaction. We note that the calculations we have done might be in the pre-asymptotic regime and as a result this scaling might change for the larger basis sets. However, this is the regime which is of interest when such methodology will be used with pseudo-potentials. In comparison, the one-norm in second quantization with DPW41 scales as \({\mathcal{O}}({D}^{2})\) with respect to the number of DPWs and thus, we achieved a considerable speedup by using the same basis but employing first quantization. For a very small number of basis functions, the norm in second quantization is smaller than our norm but for the physically interesting regime we achieve orders of magnitude improvement as shown in Fig. 3c.

Fig. 3: Scaling of the one-norm and number of unique, non-zero coefficients (NNZ) for diamond.
figure 3

Diamond is represented by an 8-atom unit cell a First quantization using DPWs. The two-body term contributes more to the one-norm. b First quantization using DPWs. Electron-electron interactions contribute more to the one-norm than the electron-nuclear or kinetic terms. c In first quantization, the scaling of the one-norm with respect to the number of basis functions, D, is more favorable than second quantization. The norm of PW approach of ref. 25 has the most favorable scaling but in the pre-asymptotic regime the first quantization DPWs norm of this work is lower. d The number of unique, non-zero coefficients scales as \({\mathcal{O}}({D}^{1.84})\) in the number of basis functions, D, for first quantization using DPWs and as \({\mathcal{O}}(D)\) for second quantization using DPWs.

Apart from improvement in the one-norm, we also observe polynomial speedup in the space-time cost of the walk operator. Such cost consists of the SELECT and PREP circuits. In second quantization, the dominant cost will be SELECT in which the Toffoli-gate count scales linearly with the size of the basis set, D,43 while in our approach such cost scales logarithmically in D. The most dominant cost is then due to data loading in PREP, where both Toffoli and qubit numbers scale as \({\mathcal{O}}(\sqrt{\Gamma })\), where Γ is the amount of information needed to specify the Hamiltonian19. From Fig. 3d, we can see that \(\Gamma ={\mathcal{O}}({D}^{1.85})\) and as a result the Toffoli complexity scales sublinearly. We also note that we did not use any translation symmetries here and we expect that if one encorporates those, one can achieve \({\mathcal{O}}({D}^{1/2})\) scaling of PREP. When one uses QROM instead of QROAM to minimize the number of qubits, the space cost scales logarithmically in D, while the time cost of the block-encoding scales as \({\mathcal{O}}({D}^{1.85})\). Even in this case, we achieve smaller total resources, as will be shown in the next section.

Resource estimates for materials

In this section, we report quantum resource estimates for the uniform electron gas (UEG), diamond and nickel oxide (NiO). We compare our approach with the second quantization DPW algorithm43 and the first quantization PW algorithm described in ref. 25. The latter bypasses the data loading and uses quantum arithmetic to approximate the Hamiltonian matrix elements on the quantum computer. The DPW can be obtained by applying the unitary transformation to PW and as a result, the same size DPW and PW would provide the same accuracy and can therefore be compared directly. However, the algorithm from ref. 25 can only work with basis sets of size \({({2}^{n}-1)}^{3}\) for integer n, whereas our algorithm requires 23n basis states, and as a result, we cannot compare the identical basis sizes and instead use the closest matches. Since we do not introduce the error due to truncation of the Hamiltonian matrix elements, we distribute the total error, ϵtot, as ϵQPE = 15.8ϵtot/16 and ϵPREP = 0.2ϵtot/16. The rationale for this is to reduce the Toffoli count as much as possible which is mainly affected by ϵQPE.

We provide resource estimates for the UEG with N = 14, 54, 114 electrons in the classically challenging strongly correlated regime. In order to converge the total energy of 14 electrons with 1 mHa accuracy, one would need on the order of 2000 plane waves, and the strongly correlated regime appears when the Winger-Seitz radius, rs, is larger than 3.5 Bohr78. As shown in ref. 78, even for around 100 basis functions there is a significant deviation (a few tens of mHa) of approximate methods from i-FCIQMC79. The energy estimates in the regime with more than 1000 PWs and 14 electrons are not currently available with FCIQMC method due to high computational cost, although we do not state that this is impossible to reach with modern large-scale HPC. A larger system with N = 54, and rs ≥ 5 is also challenging regime for classical algorithms, where different flavors of quantum Monte-Carlo algorithms give energies differing by more than 1 mHa per electron. For N = 114, only few data points are available in the regime rs ≤ 2, and D ≤ 210978,80. Therefore, we consider rs = 5 and the number of basis functions, D, to be 4096. For the 14-electron system, we also present results for D = 512. Resource estimates for these systems are shown in Table 3. Comparison between the four algorithms show that our method substantially outperforms the previous DPW algorithm by Babbush et al.43, requiring only a fraction of logical qubits and reducing the Toffoli count by four orders of magnitude. The min-Qu algorithm also achieves the lowest logical qubit counts across all tested systems, by requiring a much smaller system register compared to the second quantization algorithm, whose system register scales linearly with the number of basis states. The number of logical qubits in the PW algorithm also scales logarithmically in D similarly to the min-Q algorithm, but the qubit overhead of the PW algorithm is larger, and it requires more logical qubits overall. This results into higher physical qubit requirements, despite having overall lower quantum volume for large basis sets.

Table 3 Resource estimates for uniform electron gas systems with 14, 54, and 114 electrons and Wigner-Seitz radius of 5

Next, we consider two examples of realistic materials: diamond and nickel oxide (NiO) in cubic cells with 8 and 64 atoms. While diamond is a relatively simple material to describe using classical electronic structure methods, NiO is the quintessential system for studying the strong correlations due to presence of localized d-electrons and it is often used as a prototype system to test the accuracy of different electronic structure methodologies81,82,83,84,85. All-electron calculations of such a material would require enormous PW basis sets. For example, the much simpler Si would require on the order of 109 PW per Å3 to converge the mean-field eigenvalues with the accuracy of 0.01 eV using the regularized Coulomb potential86. Instead of analyzing such a large basis set, we will focus on the practical size of basis sets which are suitable87,88 for use with pseudopotentials36,37 or the projector augmented-wave method38. For example, a simple conversion from grid-spacing, h, to plane-wave cutoff \({E}_{{\rm{cut}}}=\frac{1}{2}{(\pi /h)}^{2}\)89, for 64-atom diamond and NiO cells would provide Ecut ≈ 750 eV and Ecut ≈ 550 eV kinetic energy cutoff, respectively. Resource estimates with the parameters described are shown in the Table 4. As one can see, the min-Qu algorithm provides the lowest number of logical qubits for small cell calculations and as a result the lowest number of physical qubits, apart from the case of diamond and nickel oxide with 64 atoms cell, where the PW algorithm has lower physical qubit count. However, the number of physical qubits in both approaches differ by less than 10%. The min-T algorithm provides the lowest Toffoli count in all cases apart from diamond represented with a 64-atom cell, where the first quantization PW approach is better. The interesting result is that for a large system with 1152 electrons and ≈ 32000 basis functions, the min-T algorithm which loads the data from QROAM requires only a factor of 1.8 more logical qubits as compared to the first quantization PW approach which completely avoids the data loading. Similarly to the results obtained for the UEG model, the first quantization approach with DPWs (both min-Qu and min-T versions) is significantly better than the second quantization approach regardless of quadratic dependence of the first quantized Hamiltonian on the number of electrons as indicated in Table 4, and allows calculation of realistic systems with lower physical qubit counts than any prior method.

Table 4 Resource estimates for diamond and nickel oxide with different number of atoms in the supercell (8 and 64 atoms)

In order to understand why our approach provides a lower Toffoli count for some systems compared to the PW algorithm of ref. 25, we report the Toffoli cost for construction of the quantum walk operator and the one-norm. The one-norm of our approach is lower up to a few tens of thousands of basis functions (Fig. 3d). In Table 5, we show the cost of block-encoding and the one-norm for each system. For diamond and nickel-oxide the norm of our approach is lower than that of ref. 25, and for NiO with 8 atoms in the unit cell, the cost of block encoding is also slightly lower. For NiO with 64 atoms, the block encoding cost is slightly larger by a factor of 1.2, but this cost is compensated by a lower norm which leads to overall smaller Toffoli cost. For diamond with a 64 atom cell, the norm is smaller by a factor of 1.3 but the block encoding cost is larger by a factor of 2.0, which makes the total Toffoli cost larger. For larger basis sets (more then 30,000 basis functions) and for systems studied here, we expect the PW approach of ref. 25 to be better due to lower asymptotic scaling.

Table 5 One-norm and the number of Toffoli gates per block encoding

Discussion

In this work, we have presented a Pauli LCU decomposition of the Hamiltonian in first quantization, for arbitrary basis sets. This simple form of the LCU allow us to construct an efficient block encoding which can be used with the sparse qubitization technique. In addition to the unary iteration over the electron number, our SELECT circuit involves only CCX and CCZ gates for selecting Pauli strings and the number of such gates scales only logarithmically in the size of the basis set. We observe that our approach achieves asymptotic polynomial speedup in the number of Toffoli gates compared to second quantization with the same basis set. When QROAM is implemented in such a way as to minimize the number of qubits then we see that already for an active space of 14 electrons and 32 orbitals of the \({[{{\rm{Fe}}}_{2}{{\rm{S}}}_{2}]}^{2-}\) molecule, the first quantization algorithm requires fewer qubits compared to its second quantization counterpart. We observe these asymptotic improvements because the one-norm of the Pauli LCU decomposition scales better, but the large prefactor due to explicit dependence on the number of electrons squared can only be overcome for large basis sets.

In recent years, several efficient LCU decompositions of the Hamiltonian in second quantization and corresponding block-encodings have been developed17,18,19,45,90. Exactly the same methods, such as factorization of the Hamiltonian’s matrix elements, can of course be applied in first quantization. It is of great importance to understand if the first quantization approach can bring advantage beyond asymptotic improvements over second quantization when such factorizations are applied.

We have also applied our method to the DPW basis set, which is more suitable for periodic systems. In that case, the SELECT and PREP circuits can be slightly optimized compared to the generic quantum chemistry case. The resource estimates for realistic materials show that in some cases we outperform in some metrics (either qubit or Toffoli counts) the previous first quantization approach of refs. 24,25, which used regular PW. Our approach is more efficient in Toffoli and qubit count than the second quantization approach with DPW43 on all considered examples. However, we recognize that there are regimes when second quantization proves advantageous, for example, when one considers systems with the same number of orbitals as the number of electrons (half-filling).

Our approach with DPW employs classical data loading, yet it still provides quantum resources comparable to the approach of refs. 24,25 in the physically interesting regimes. The reasons for this are the sub-linear scaling of the one-norm of the Pauli LCU in the basis set size and the diagonal form of the Coulomb interaction which significantly reduces the amount of information needed to specify the Hamiltonian. For example, for NiO we achieve a lower T-gate count and the number of logical qubits is only a factor of 1.8 larger than that obtained with pure PW approach, for a PW cutoff that is suitable for use with pseudopotentials. However, since we used data loading, our approach can be used with norm-conserving pseudopotentials, and projector augmented-wave method (Conventional PAW approach38 introduces the non-orthogonal transformations which do not allow for a simple Hamiltonian expression using plane wave or dual plane waves in first quantization Eq. (II.1) and as a result PAW introduces Hamiltonian with non-orthogonal eigenstates. However, the unitary version of PAW (UPAW)55 overcomes these issues.), because the only change would be in the calculation of matrix elements on a classical computer. As is shown in several density functional theory calculations (https://molmod.ugent.be/deltacodesdft)87,88,91, PAW approach demonstrates smaller approximation error than norm-conserving pseudopotentials and often requires fewer number of PW in order to reach the convergence. While there has been generalization of refs. 24,25 on norm-conserving Goedecker-Teter-Hutter pseudopotentials34,35, it is not clear if such an approach can be easily generalized for PAW method or even other pseudopotentials without the data loading.

We also speculate that the algorithms presented here can be further improved. For example, in the DPW basis set, there are only 3D unique Hamiltonian matrix elements and all other matrix elements could be restored by using the translational symmetry. We have not considered this here. One possible way to take this into account is to load unique original matrix elements from QROAM and then calculate the Pauli LCU coefficients (Eq. (II.26) and (II.27)) on the quantum computer (quantum Pauli decomposition). This would reduce the data loading which in turn would make Toffoli cost of the PREPARE to scale as \({\mathcal{O}}(D^{0.5})\) .

The Pauli LCU decomposition of the Hamiltonian (II.13), (II.15) is generic and it can be applied equally to either bosonic or fermionic or even mixed systems. This in turn opens up for possibilities to apply the approach developed here to other application areas such as studying vibrational properties of molecules and materials92,93,94,95. We also hope that the explicit form of the Pauli LCU decomposition can be used to reduce resources in other quantum algorithms such as Trotterization96,97 or, more generally, Hamiltonian simulation23.

Methods

Two-body LCU derivation

In this Section, we derive Eq. (II.11), starting from the second term on the right-hand side of Eq. (II.1). First, we express the tensor product of operators in terms of X and Z Pauli operators in the same way as ref. 44:

$${\left(\vert p\rangle \langle q\vert \right)}_{i}{\left(\vert r\rangle \langle s\vert \right)}_{j}=\left(\mathop{\sum }\limits_{u=0}^{D-1}\frac{1}{D}\mathop{\prod }\limits_{k=0}^{M-1}{(-1)}^{{q}_{k}\wedge {u}_{k}}{X}_{iM+k}^{{p}_{k}\oplus {q}_{k}}{Z}_{iM+k}^{{u}_{k}}\right)\left(\mathop{\sum }\limits_{v=0}^{D-1}\frac{1}{D}\mathop{\prod }\limits_{l=0}^{M-1}{(-1)}^{{s}_{l}\wedge {v}_{l}}{X}_{jM+l}^{{r}_{l}\oplus {s}_{l}}{Z}_{jM+l}^{{v}_{l}}\right).$$
(VIII.1)

We have neglected the spin indices, because these cancel to the identity the same as in the one-body term. Next, we take out the factors of − 1, which are the matrix elements of Hadamard matrices (see definition Eq. (II.9)),

$${\left(\vert p\rangle \langle q\vert \right)}_{i}{\left(\vert r\rangle \langle s\vert \right)}_{j}=\mathop{\sum }\limits_{u=0}^{D-1}\mathop{\sum }\limits_{v=0}^{D-1}\frac{1}{{D}^{2}}{\left({H}^{\otimes M}\right)}_{qu}{\left({H}^{\otimes M}\right)}_{sv}\left(\mathop{\prod }\limits_{k=0}^{M-1}{X}_{iM+k}^{{p}_{k}\oplus {q}_{k}}{Z}_{iM+k}^{{u}_{k}}\right)\left(\mathop{\prod }\limits_{l=0}^{M-1}{X}_{jM+l}^{{r}_{l}\oplus {s}_{l}}{Z}_{jM+l}^{{v}_{l}}\right).$$
(VIII.2)

Substituting Eq. (VIII.2) into the two-body term of Eq. (II.1), we find

$$\begin{array}{lll}{\hat{H}}_{{\rm{two}}\text{-}{\rm{body}}}&=&\frac{1}{2}\mathop{\sum }\limits_{p,q,r,s=0}^{D-1}\mathop{\sum }\limits_{u,v=0}^{D-1}\frac{1}{{D}^{2}}{h}_{pqrs}{\left({H}^{\otimes M}\right)}_{qu}{\left({H}^{\otimes M}\right)}_{sv}\\&& \mathop{\sum }\limits_{i\ne j}^{N-1}\left(\mathop{\prod }\limits_{k=0}^{M-1}{X}_{iM+k}^{{p}_{k}\oplus {q}_{k}}{Z}_{iM+k}^{{u}_{k}}\right)\left(\mathop{\prod }\limits_{l=0}^{M-1}{X}_{jM+l}^{{r}_{l}\oplus {s}_{l}}{Z}_{jM+l}^{{v}_{l}}\right).\end{array}$$
(VIII.3)

Next, we change sum index from p to g = p q and from r to f = r s, giving

$${\hat{H}}_{{\rm{two}}\text{-}{\rm{body}}}=\mathop{\sum }\limits_{u,v,g,f=0}^{D-1}{\omega }_{gufv}\frac{1}{2}\mathop{\sum }\limits_{i\ne j}^{N-1}\left(\mathop{\prod }\limits_{k=0}^{M-1}{X}_{iM+k}^{{g}_{k}}{Z}_{iM+k}^{{u}_{k}}\right)\left(\mathop{\prod }\limits_{l=0}^{M-1}{X}_{jM+l}^{{f}_{l}}{Z}_{jM+l}^{{v}_{l}}\right),$$
(VIII.4)

where

$$\displaystyle{\omega }_{gufv}=\frac{1}{{D}^{2}}\mathop{\sum }\limits_{q,s=0}^{D-1}{h}_{g\oplus q,q,f\oplus s,s}{\left({H}^{\otimes M}\right)}_{qu}{\left({H}^{\otimes M}\right)}_{sv}.$$
(VIII.5)

Since ij, the two Pauli strings always commute, and as a result

$${\hat{H}}_{{\rm{two}}\text{-}{\rm{body}}}=\mathop{\sum }\limits_{u,v,g,f=0}^{D-1}{\omega }_{gufv}\frac{1}{2}\mathop{\sum }\limits_{i\ne j}^{N-1}\mathop{\prod }\limits_{k=0}^{M-1}{X}_{Mi+k}^{{g}_{k}}{Z}_{Mi+k}^{{u}_{k}}{X}_{Mj+k}^{{f}_{k}}{Z}_{Mj+k}^{{v}_{k}}.$$
(VIII.6)

This is the result of Eq. (II.11) with different labels for the indices: g, u, f, v are p, q, r, s, respectively. To turn the two-body LCU coefficients calculation into a matrix multiplication, we form a D2 × D2 Hadamard matrix:

$$\left(\mathop{\prod }\limits_{k=0}^{M-1}{(-1)}^{{q}_{k}\wedge {u}_{k}}\right)\left(\mathop{\prod }\limits_{l=0}^{M-1}{(-1)}^{{s}_{l}\wedge {v}_{l}}\right)={\left({H}^{\otimes M}\right)}_{qu}{\left({H}^{\otimes M}\right)}_{sv}={\left({H}^{\otimes 2M}\right)}_{Dq+s,Du+v}.$$
(VIII.7)

If we rearrange the matrix elements with

$$\displaystyle{\tilde{h}}_{Dg+f,Dq+s}={h}_{g\oplus q,q,f\oplus s,s},$$
(VIII.8)

the LCU coefficients can be found with matrix multiplication,

$$\displaystyle{\omega }_{gufv}=\frac{1}{{D}^{2}}\mathop{\sum }\limits_{q,s=0}^{D-1}{\tilde{h}}_{Dg+f,Dq+s}{\left({H}^{\otimes 2M}\right)}_{Dq+s,Du+v}$$
(VIII.9)

Using the fast Hadamard transform, calculating these matrix elements scales as \({\mathcal{O}}({D}^{4}\log D)\), rather than \({\mathcal{O}}({D}^{6})\) for basic matrix multiplication.

Amplitude amplification for equal superpositions

The equal superposition state

$$\frac{1}{\sqrt{L}}\mathop{\sum }\limits_{l=0}^{L-1}\left\vert l\right\rangle$$
(VIII.10)

needed in PREP can be prepared trivially with Hadamards if L is a power of 2. Otherwise, it is prepared using amplitude amplification; the procedure for this state is described in ref. 17.

In this appendix, we will adapt that procedure and cost the preparation of the equal superposition state

$$\frac{1}{\sqrt{N(N-1)}}\mathop{\sum }\limits_{i\ne j}^{N-1}\vert i\rangle \vert j\rangle$$
(VIII.11)

also required in PREP. We use a single iteration of Grover search98 along the lines of Fig. 3 from ref. 17.

In order to achieve a success probability close to one, we augment (VIII.11) by an ancilla rotated by a classical tunable angle β, such that the desired state is

$$\vert \phi \rangle =\frac{1}{\sqrt{N(N-1)}}\mathop{\sum }\limits_{i\ne j}^{N-1}\vert i\rangle \vert j\rangle \otimes {R}_{Y}^{\dagger }(\beta )\vert 0\rangle$$
(VIII.12)

The starting point for Grover search of this state (including the rotated ancilla) is an equal superposition over all computational basis states

$$\vert \psi \rangle =\frac{1}{\sqrt{{2}^{\lceil {\log }_{2}N\rceil }{2}^{\lceil {\log }_{2}N\rceil }}}\mathop{\sum }\limits_{i,j=0}^{{2}^{\lceil {\log }_{2}N\rceil }-1}\vert i\rangle \vert j\rangle \otimes \vert +\rangle.$$
(VIII.13)

The circuit we use to perform the Grover iteration is the following:

(VIII.14)

It closely follows the textbook circuit98. Initially, the Hadamards prepare the state \(\vert \psi \rangle\). This is followed by a Grover iteration, consisting of reflections about the target state, \((I-2\vert \phi \rangle \langle \phi \vert )\), and about the initial state, \((I-2\vert \psi \rangle \langle \psi \vert )\). Finally, the desired state is flagged in the success qubit.

The overlap of desired state and initial state is

$$\cos \alpha := \langle \phi | \psi \rangle =\sqrt{\frac{N(N-1)}{{2}^{\lceil {\log }_{2}N\rceil }{2}^{\lceil {\log }_{2}N\rceil }}}\sin (\beta /2+\pi /2).$$
(VIII.15)

The output of a single iteration Grover circuit is a state98

$$\cos (3\alpha )\vert \phi \rangle +\sin (3\alpha )\vert {\phi }^{\perp }\rangle ,$$
(VIII.16)

where \(\vert {\phi }^{\perp }\rangle\) is some state orthogonal to \(\vert \phi \rangle\); the success probability has been amplified from \(\cos \alpha\) to \(\cos (3\alpha )\). Thus, one can ensure a high success probability by choosing β such that \(\cos (3\alpha )=1\Rightarrow \cos \alpha =-1/2\). This is always possible because

$$\sqrt{\frac{N(N-1)}{{2}^{\lceil {\log }_{2}N\rceil }{2}^{\lceil {\log }_{2}N\rceil }}} > \frac{1}{2}\,,$$
(VIII.17)

so it can always be scaled down by \(\sin (\beta /2+\pi /2)\) to reach \(\cos \alpha =-1/2\).

The optimal value of β must be approximated to accuracy ϵ when compiling the rotations RY(β). In principle, they could be compiled directly into a Clifford+T gate set99 using \(4{\log }_{2}(1/\epsilon )+O(\log \log (1/\epsilon ))\,T\)-gates. However,17 choose the approach of ref. 100 Appendix A, adding a binary representation of β into a phase-gradient ancilla register, which has a similar T count. Yet, the latter reference mentions that for a classically given β, like here, compilation into Clifford+T is slightly faster. For consistency with literature, we follow the approach of ref. 17 using the phase gradient register.

The cost of this circuit is as follows: The Hadamard gates are Clifford gates and therefore require no Toffolis. The two inequality tests (i < N) and (j < N) are implemented as described in appendix H of19, where we can save one Toffoli because N is constant, for a Toffoli cost of \(2(\lceil {\log }_{2}N\rceil -1)\). The final equality test requires \(\lceil {\log }_{2}N\rceil\) Toffolis. If ηN is the largest power of 2 that divides N, the cost of each i < N inequality test is reduced by ηN Toffolis (by ignoring the lowest \({\log }_{2}{\eta }_{N}\) bits/qubits of N and \(\left\vert i\right\rangle\)). Each test also requires \(\lceil {\log }_{2}N\rceil -1\) ancilla not depicted in (VIII.14), and can be uncomputed without additional Toffoli cost using the out-of-place adders from ref. 101. (Essentially, the first (i < N) is only the computation step of the addition. The ancillas are kept alive during the CCCZ gate, and the second (i < N) is the uncomputation step of the addition). The total cost of all the (in)equality tests in the circuit (VIII.14) is therefore \(6\lceil {\log }_{2}N\rceil -4{\eta }_{N}-4\) Toffolis and \(3\lceil {\log }_{2}N\rceil -3\) ancilla.

The rotation by β using bN bits of precision (typically bN = 8) costs bN − 3 Toffolis, since the angle is known classically, plus a one-time cost for preparing a catalytic state that can be reused for rotations throughout the circuit and is thus negligible. It also requires bN ancilla. The reflection around \(\vert \phi \rangle\) requires 2 Toffolis, the reflection around \(\vert \psi \rangle\) needs \(2\lceil {\log }_{2}N\rceil -1\) Toffolis, and the flagging of the success qubit requires another 2 Toffolis. The total Toffoli cost of the equal superposition circuit is therefore

$$8\lceil {\log }_{2}N\rceil -4{\eta }_{N}+2{b}_{N}-7\,$$
(VIII.18)

Toffolis, and requires the use of \(3\lceil {\log }_{2}N\rceil +{b}_{N}-3\) ancillas in parallel. We can reuse the ancilla from the QROAM procedure for most of these ancilla, with two exceptions: We need to keep the phase-gradient register of size bN, so we do not have to prepare it every time, and the success qubit and rotation qubit flag the block encoding and therefore cannot be reused either. The number of additional ancilla we need for this procedure is therefore

$${b}_{N}+2.$$
(VIII.19)