Introduction

The quantum Fourier transform (QFT) is considered one of the most versatile components in quantum algorithms. It plays a key role in fundamental quantum algorithms such as quantum phase estimation1, algorithms for hidden subgroup problems including Shor’s factoring algorithm2, the Harrow-Hassidim-Lloyd (HHL) algorithm for solving systems of linear equations3, and quantum amplitude estimation4, to name a few. The applications of QFT span various fields, including basic arithmetic operations5,6,7,8, cryptography9,10,11,12,13,14, signal processing15,16, quantum simulation17,18,19,20, quantum machine learning21,22,23,24,25, quantum finance26,27,28, and computational fluid dynamics29,30,31. Improving the efficiency of QFT implementations is therefore indispensable for enhancing the performance of various quantum algorithms and expanding their applicability.

Most quantum algorithms are designed based on the quantum circuit model, which describes a computation as a sequence of discrete quantum gates applied to qubits. Therefore, the efficient execution of these algorithms directly depends on the efficiency of their underlying circuits. This necessity has driven significant research into optimizing fundamental building blocks, particularly for resource-intensive tasks like quantum arithmetic32,33,34,35,36,37,38.

Large-scale quantum algorithms that utilize the QFT, such as Shor’s factoring algorithm2 and the HHL algorithm3, should be implemented in a fault-tolerant manner due to the fragility of quantum information. In fault-tolerant quantum computations, circuits are synthesized using universal and fault-tolerant gates. The Clifford + T gate library is generally chosen for this purpose in various promising error correction codes39,40. Within these fault-tolerance approaches, Clifford gates are easier to implement, often transversally. In contrast, T gates require more resource-intensive methods, such as magic state distillation41,42, and thus dominate the implementation cost43,44. Consequently, the T-gate cost, typically quantified by the number of T gates (T-count) and the depth of T gates (T-depth), has become a primary bottleneck for the efficient execution of large-scale quantum algorithms.

In fault-tolerant quantum computing, the QFT is usually implemented approximately, within an acceptable error; the resulting circuit is known as the approximate quantum Fourier transform (AQFT). Typically, AQFT is implemented by removing all controlled rotation gates with angles below a certain threshold value. It has been demonstrated that, for various quantum algorithms utilizing QFT, applying AQFT instead of a full QFT often yields satisfactory results without significant performance penalties45,46,47,48. For instance, using AQFT with a threshold value of \(\pi /2^{8}\) is sufficient for factoring RSA-2048 with a \(95\%\) success rate48.

The basic implementation method for an \(n\)-qubit AQFT involves approximating the \(n\)-qubit QFT by omitting small-angle controlled rotations and decomposing the remaining controlled rotation gates into the Clifford + T gate library. This approach reduces the T-count for QFT implementation from \(O({n}^{2}\text{log}n)\) to \(O(n{\text{log}}^{2}n)\), where we omit the dependence on the approximation error for brevity. For the semiclassical version of the AQFT49, which is followed by measurement and therefore cannot be applied in the midst of a computation, it has been demonstrated that AQFT can be implemented with a T-count of \(O(n\text{log}n)\)50. In the case of fully coherent AQFT, Nam et al.51 achieved a T-count of \(O(n\text{log}n)\) using a method originally reported in Ref.52 that involves implementing the phase gradient transformation (PGT) with a quantum adder. They utilized Toffoli gates (specifically, relative phase Toffoli gates, measurements, and classically controlled gates) to construct PGT circuits and employed quantum adders from Ref.37. The T-count and T-depth of their \(n\)-qubit AQFT circuit with an approximation error of \(O(\varepsilon )\) are reported as \(8n{\text{log}}_{2}(n/\varepsilon )-O({\text{log}}^{2}(n/\varepsilon ))\) and \(n{\text{log}}_{2}(n/\varepsilon )+5n-O({\text{log}}^{2}(n/\varepsilon )),\) respectively. While this work achieved an asymptotic scaling of \(O(n{\text{log}}_{2}(n/\varepsilon ))\), the large constant factors on the leading terms of both the T-count and T-depth indicate a clear potential for further practical optimizations.

In this paper, we present two new fully coherent n-qubit AQFT circuits, AQFT Circuit 1 and AQFT Circuit 2, which achieve further optimization of T-count and T-depth while maintaining an approximation error of \(O(\varepsilon )\). AQFT Circuit 1 is designed to reduce the T-count. In this design, we construct the rotation gate layers that perform inverse PGTs without using Toffoli gates, which are non-Clifford gates and accounted for approximately half of the T-count in the AQFT circuit design in Ref.51. Next, we replace the rotation gate layers with quantum adders from Ref.37. On the other hand, AQFT Circuit 2 is designed to reduce the T-depth. In this design, we construct the rotation gate layers without Toffoli gates, pair and parallelize two rotation gate layers into one, and replace them with quantum adders from Ref.37.

As a result, AQFT Circuit 1 achieves a T-count of \(4n{\text{log}}_{2}(n/\varepsilon )-O({\text{log}}^{2}(n/\varepsilon ))\) with a T-depth of \(n{\text{log}}_{2}(n/\varepsilon )+n-O({\text{log}}^{2}(n/\varepsilon ))\), and AQFT Circuit 2 achieves a T-depth of \(\frac{1}{2}n{\text{log}}_{2}(n/\varepsilon )+\frac{3}{2}n-O({\text{log}}^{2}(n/\varepsilon ))\) with a T-count of \(4n{\text{log}}_{2}(n/\varepsilon )+n-O({\text{log}}^{2}(n/\varepsilon )).\) One might argue that using logarithmic-depth quantum adders could reduce the T-depth in AQFT implementation at the expense of increasing T-count. However, we demonstrate that the linear-depth quantum adders described in Ref.37 offer advantages over the state-of-the-art logarithmic-depth quantum adders in Ref.38, even in terms of T-depth optimization. This conclusion is supported by a comparison of both approaches when applied to AQFT circuits for systems satisfying \(3<n/\varepsilon <{10}^{13}\).

Overview of the AQFT circuits and their advantages

In this paper, we introduce the construction of two novel \(n\)-qubit AQFT circuits based on the Clifford + T gate library. AQFT Circuit 1 is optimized to minimize the T-count, whereas AQFT Circuit 2 focuses on reducing the T-depth. These AQFT circuits approximate the QFT with an error of \(O(\varepsilon )\). Figure 1a,b present the schematic diagrams for AQFT Circuit 1 and AQFT Circuit 2, respectively.

Fig. 1

Schematic diagram of AQFT circuits presented in this paper. In these circuits, we construct inverse PGTs, and implement them using quantum adders. The boxes labeled RUS represent “repeat until success” circuits from Ref.53. Each RUS circuit is used to prepare a special quantum state \(|{\psi }_{b+1}\rangle \equiv \frac{1}{\sqrt{{2}^{b+1}}} {\sum }_{k=0}^{{2}^{b+1}-1}{e}^{2\pi ik/{2}^{b+1}}|k\rangle ,\) where \(b=\lceil{\text{log}}_{2}(n/\varepsilon )\rceil\). The special states are required to perform inverse PGTs using quantum adders. (a) AQFT Circuit 1: designed to reduce T-count. All circuits labeled \({C}_{i}\) consist of Clifford gates. (b) AQFT Circuit 2: designed to reduce T-depth. Each circuit labeled \({C}_{i}{\prime}\) consists of Clifford gates with a maximum of two T gates. Note that the quantum adders are paired and parallelized.

Similar to the AQFT circuit described in Ref.51, the construction of our AQFT circuits involves creating \({Z}^{\theta }\) gate (See Eq. (1)) layers, each of which performs an inverse PGT, and implementing them using quantum adders.

$${Z}^{\theta }=\left(\begin{array}{cc}1& 0\\ 0& {e}^{i\pi \theta }\end{array}\right).$$
(1)
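As a quick sanity check of Eq. (1), the following minimal sketch evaluates the diagonal of \({Z}^{\theta }\) for a few exponents and confirms that exponents add under composition. The helper names (`z_power`, `compose`, `close`) are illustrative and do not come from the paper; only the Python standard library is assumed.

```python
import cmath
import math

def z_power(theta):
    """Return the diagonal (1, e^{i*pi*theta}) of Z^theta as defined in Eq. (1)."""
    return (1 + 0j, cmath.exp(1j * math.pi * theta))

def compose(a, b):
    """Multiply two diagonal single-qubit gates entry by entry."""
    return tuple(x * y for x, y in zip(a, b))

def close(a, b, tol=1e-12):
    """Entrywise comparison up to floating-point tolerance."""
    return all(abs(x - y) < tol for x, y in zip(a, b))

S_GATE = (1, 1j)                              # Z^{1/2}, a Clifford gate
T_GATE = (1, cmath.exp(1j * math.pi / 4))     # Z^{1/4}, the T gate
Z_GATE = (1, -1)                              # Z^{1}

print(close(z_power(1 / 2), S_GATE))
print(close(z_power(1 / 4), T_GATE))
print(close(compose(z_power(1 / 4), z_power(1 / 4)), S_GATE))
```

Because every \({Z}^{\theta }\) is diagonal, any two such gates commute, which is the property the circuit constructions rely on when gate layers are reordered.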

In order to facilitate this process, a special quantum state \(|{\psi }_{b+1}\rangle \equiv \frac{1}{\sqrt{{2}^{b+1}}} {\sum }_{k=0}^{{2}^{b+1}-1}{e}^{2\pi ik/{2}^{b+1}}|k\rangle\) needs to be prepared, where \(b\) is chosen as \(\lceil{\text{log}}_{2}(n/\varepsilon )\rceil\). We employ “repeat until success” circuits described in Ref.53 to prepare the special quantum state \(|{\psi }_{b+1}\rangle\). A detailed explanation of the inverse PGT and its execution through a quantum adder is provided in Supplementary Material.
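The role of the special state can be illustrated with a small classical simulation. The sketch below assumes an in-place modular adder acting as \(|k\rangle \to |k+x \bmod N\rangle\) on the register holding \(|{\psi }_{b+1}\rangle\), with \(N={2}^{b+1}\); this register convention and the function names are assumptions made for illustration, and the exact construction used by the authors is the one in the Supplementary Material.

```python
import cmath
import math

def phase_gradient_state(num_qubits):
    """Amplitudes of (1/sqrt(N)) * sum_k e^{2*pi*i*k/N} |k>, with N = 2**num_qubits."""
    size = 2 ** num_qubits
    return [cmath.exp(2j * math.pi * k / size) / math.sqrt(size) for k in range(size)]

def add_constant(state, x):
    """Assumed adder action on the state register: |k> -> |k + x mod N>."""
    size = len(state)
    shifted = [0j] * size
    for k, amp in enumerate(state):
        shifted[(k + x) % size] = amp
    return shifted

def is_phase_kickback(num_qubits, x, tol=1e-12):
    """Check that adding x only multiplies the state by e^{-2*pi*i*x/N}."""
    psi = phase_gradient_state(num_qubits)
    shifted = add_constant(psi, x)
    phase = cmath.exp(-2j * math.pi * x / len(psi))
    return all(abs(s - phase * p) < tol for s, p in zip(shifted, psi))

print(all(is_phase_kickback(4, x) for x in range(16)))
```

Under these assumptions the register returns to the same state and the addend \(x\) picks up a phase proportional to \(x\), so a single adder applies a whole layer of phase rotations while the special state is reused.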

Before presenting detailed circuit designs and results, we provide a rough overview of the advantages of our AQFT circuits. AQFT Circuit 1 requires \(\sim n\) \((b+1)\)-qubit quantum adders to perform \(\sim n\) inverse PGTs. Constructing these quantum adders involves a T-count of \(\sim 4nb\). A key advantage of AQFT Circuit 1 is that the construction of inverse PGTs does not require additional non-Clifford gates, such as Toffoli gates (note that each \({C}_{i}\) box in Fig. 1a consists solely of Clifford gates). This omission of non-Clifford gates results in a reduction of the T-count from \(\sim 8nb\) in Ref.51 to \(\sim 4nb\). AQFT Circuit 2 reduces the T-depth from \(\sim nb\) to \(\sim \frac{1}{2}nb\) compared to AQFT Circuit 1 by pairing and parallelizing the inverse PGTs, which requires only \(O(n)\) additional T gates. These additional T gates are included within the \({C}_{i}{\prime}\) boxes illustrated in Fig. 1b.

In the subsequent sections, for each AQFT circuit, we present the circuit design process, resource estimation focused on T-count and T-depth, and approximation error analysis.

AQFT circuit 1: optimized for T-count

Circuit design

Throughout this paper, we employ a compact notation for sequential CNOT gates that share a single control qubit but act on different target qubits. This “fan-out” notation is illustrated in Supplementary Material. This convention is used to improve the readability of complex circuits.

AQFT Circuit 1 is constructed through the following four steps:

(Step 1: QFT subcircuit decomposition) We begin by decomposing the subcircuits of the standard QFT circuit described in Ref.54. The initial structure of these subcircuits is illustrated on the left-hand side of Fig. 2. The right-hand side of Fig. 2 shows the decomposed structure of the QFT subcircuits. A detailed explanation of this decomposition process is provided in Supplementary Material. In the decomposed subcircuit, multiple \({Z}^{\theta }\) gate layers are present. However, in the subsequent step, we will consolidate these layers. Specifically, the \({Z}^{\theta }\) gates within the red boxes from all the decomposed QFT subcircuits in Fig. 2 will be combined into a single \({Z}^{\theta }\) gate layer and placed at the very front of the QFT circuit. Similarly, the \({Z}^{\theta }\) gates in the purple boxes from all the decomposed QFT subcircuits in Fig. 2 will be merged into a \({Z}^{\theta }\) gate layer at the very end of the QFT circuit.

Fig. 2

Subcircuit decomposition for AQFT Circuit 1. The left-hand side illustrates a subcircuit of the standard QFT circuit described in Ref.54. This subcircuit is decomposed into the structure shown on the right-hand side. The \({Z}^{\theta }\) gates in the red boxes from all the decomposed QFT subcircuits will be gathered into a single \({Z}^{\theta }\) gate layer at the very front of the QFT circuit. This reordering is possible because the circuit in the green box has a diagonal matrix representation. The \({Z}^{\theta }\) gate in the purple box will be gathered into a single \({Z}^{\theta }\) gate layer at the very end of the QFT circuit. Note that the \({Z}^{\theta }\) gate layer in the green box is constructed without using Toffoli gates. This results in T-count reduction.

(Step 2: Constructing QFT circuit) We combine all the decomposed QFT subcircuits to construct a full QFT circuit. Next, all the \({Z}^{\theta }\) gates in the red boxes from the decomposed subcircuits in Fig. 2 are gathered at the very front of the QFT circuit. This reordering is possible because both the circuits in the green and red boxes of Fig. 2 have diagonal matrix representations and therefore commute with each other. Similarly, we gather the \({Z}^{\theta }\) gates in the purple boxes from the decomposed subcircuits in Fig. 2 at the very end of the QFT circuit. After this reorganization, the \({Z}^{\theta }\) gates at the very front and end of the QFT circuit have the form \({Z}^{({2}^{k-1}-1)/{2}^{k}}\), where \(k\in \{1, 2, \dots ,n\}\). We divide each \({Z}^{({2}^{k-1}-1)/{2}^{k}}\) gate into \({Z}^{1/2}\) and \({Z}^{-1/{2}^{k}}\) gates. Note that the \({Z}^{1/2}\) gate is a Clifford gate.
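The split used in Step 2 can be written out explicitly. Reading the exponent as the fraction \(({2}^{k-1}-1)/{2}^{k}\), which is the reading consistent with the stated division into \({Z}^{1/2}\) and \({Z}^{-1/{2}^{k}}\), and using the fact that exponents of the diagonal gate in Eq. (1) add under multiplication, we have

$$\frac{{2}^{k-1}-1}{{2}^{k}}=\frac{1}{2}-\frac{1}{{2}^{k}}\quad \Rightarrow \quad {Z}^{({2}^{k-1}-1)/{2}^{k}}={Z}^{1/2}\,{Z}^{-1/{2}^{k}}.$$

For example, \(k=3\) gives \({Z}^{3/8}={Z}^{1/2}{Z}^{-1/8}\), and \(k=1\) gives the identity \({Z}^{0}={Z}^{1/2}{Z}^{-1/2}\). Only the \({Z}^{-1/{2}^{k}}\) factor is non-Clifford for \(k\ge 3\), which is why it is the factor targeted by the approximation.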

(Step 3: Approximation) We remove all the \({Z}^{-1/{2}^{k}}\) gates with \(k>b\) from the QFT circuit (See Fig. 3).

Fig. 3

AQFT Circuit 1: \(n\)-qubit AQFT with an error of \(O(\varepsilon )\) optimized for T-count, where \(b=\lceil{\text{log}}_{2}(n/\varepsilon )\rceil\). This circuit is decomposed into single-qubit \(H\) and \({Z}^{\theta }\) gates, and standard two-qubit CNOT gates. To improve readability, the \({Z}^{\theta }\) gates are aligned vertically to represent a gate layer; this does not imply a multi-qubit control structure. Each \({Z}^{\theta }\) gate layer in the blue boxes performs an inverse PGT with the help of a \({Z}^{-1}\) gate. The effect of inserting \({Z}^{-1}\) gates can be canceled by inserting \(Z\) gates. Note that this circuit contains \((n+1)\) inverse PGTs and these are constructed without using additional non-Clifford gates. We implement each inverse PGT using a quantum adder from Ref.37. Since the circuit is constructed entirely from unitary gates, it is inherently reversible.

(Step 4: Inverse PGT implementation using quantum adder) In Fig. 3, each \({Z}^{\theta }\) gate layer in blue boxes performs an inverse PGT with the help of a \({Z}^{-1}\) gate. The effect of inserting the \({Z}^{-1}\) gates can be canceled by inserting \(Z\) gates. We implement each inverse PGT using a quantum adder from Ref.37.

T-count and T-depth estimation

In this section, we estimate the T-count and T-depth of AQFT Circuit 1. The T-count refers to the total number of T gates. The depth of a circuit refers to the number of sequential layers of gates that cannot be executed in parallel. Based on this, the T-depth is the number of sequential layers of T gates that cannot be parallelized, effectively ignoring the depth contributed by other gate types such as Clifford gates. The T-count and T-depth estimates presented in this paper are derived from the analytical formulas developed in the following sections, not from numerical simulations on a quantum simulator.
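To make the two metrics concrete, the following sketch counts T gates and estimates the T-depth of a circuit given as a plain gate list. It is a simple greedy, as-soon-as-possible schedule with no circuit rewriting, and the gate-list format and function name are assumptions made for this illustration rather than anything used in the paper.

```python
def t_count_and_depth(circuit, num_qubits):
    """Return (T-count, T-depth) for a list of (gate_name, qubits) tuples.

    Non-T gates add no T-depth, but a multi-qubit gate synchronises the
    T-depth reached so far on all qubits it touches.
    """
    depth = [0] * num_qubits      # T-depth accumulated on each qubit
    count = 0
    for name, qubits in circuit:
        layer = max(depth[q] for q in qubits)
        if name in ("T", "Tdg"):
            count += 1
            layer += 1            # a T gate opens a new T layer on its qubit
        for q in qubits:
            depth[q] = layer
    return count, max(depth, default=0)

example = [
    ("T", (0,)),        # these two T gates act on different qubits,
    ("T", (1,)),        # so they share one T layer
    ("CNOT", (0, 1)),
    ("T", (1,)),        # must follow the CNOT, so it starts a second layer
    ("H", (0,)),
]
print(t_count_and_depth(example, 2))   # (3, 2)
```

The example has three T gates but a T-depth of only two, which is the kind of gap that parallelizing gate layers exploits.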

Among the inverse PGTs in Fig. 3, the 3-qubit inverse PGT does not need to be replaced by a 3-qubit quantum adder, because it only requires a single T gate. Considering this, the T gates required to construct AQFT Circuit 1 are found in four parts:

(1) \((n-b+3)\) \((b+1)\)-qubit quantum adders,

(2) One \(k\)-qubit quantum adder for each \(k\in \{4, 5, \dots , b\}\),

(3) One T gate in the 3-qubit inverse PGT,

(4) T gates for preparing the special state \(|{\psi }_{b+1}\rangle\).

The \(t\)-qubit quantum adder in Ref.37 requires a T-count of \((4t-4)\) and a T-depth of \(t\). With the quantum adder, the T-count and T-depth in part (1) are \(4nb-4{b}^{2}+12b\) and \(nb+n-{b}^{2}+2b+3\), respectively. The T-count and T-depth in part (2) are \({\sum }_{k=4}^{b}\left(4k-4\right)=2{b}^{2}-2b-12\) and \({\sum }_{k=4}^{b}k={b}^{2}/2+b/2-6\), respectively. For part (4), \((b-2)\) \({Z}^{\theta }\) gates have to be synthesized with an error of \(\varepsilon /b\) for each gate (See Supplementary Material). The value \(\varepsilon /b\) is chosen so that AQFT Circuit 1 has an approximation error of \(O(\varepsilon )\). In the gate synthesis, we use \(RUS\) circuits from Ref.53, and the expected T-count to synthesize a \({Z}^{\theta }\) gate with a Fourier angle with an error of \(\varepsilon {\prime}\) is \(1.08{\text{log}}_{2}(1/\varepsilon {\prime})+17.5.\) Therefore, the total T-count for AQFT Circuit 1 is

$$4nb-2{b}^{2}+b\left\{1.08 {\text{log}}_{2}\left(\frac{b}{\varepsilon }\right)+27.5\right\}-2.16{\text{log}}_{2}\left(\frac{b}{\varepsilon }\right)-45=4nb-O({b}^{2})$$
(2)

and the total T-depth for AQFT Circuit 1 is

$$nb+n-\frac{1}{2}{b}^{2}+\frac{5}{2}b+1.08{\text{log}}_{2}\left(\frac{b}{\varepsilon }\right)+15.5=nb+n-O({b}^{2})$$
(3)
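The adder contributions in parts (1) and (2) can be cross-checked by brute force. The sketch below enumerates the adders listed above, applies the per-adder costs quoted from Ref.37 (T-count \(4t-4\), T-depth \(t\)), and compares the totals with the closed forms obtained by adding the part (1) and part (2) expressions. The function names are illustrative, and the state-preparation cost of part (4) and the single T gate of part (3) are deliberately left out.

```python
def adder_totals_bruteforce(n, b):
    """Sum T-count and T-depth over the adders in parts (1) and (2)."""
    sizes = [b + 1] * (n - b + 3)          # part (1): (n-b+3) adders on (b+1) qubits
    sizes += list(range(4, b + 1))         # part (2): one k-qubit adder, k = 4..b
    t_count = sum(4 * t - 4 for t in sizes)
    t_depth = sum(sizes)                   # adders are executed one after another
    return t_count, t_depth

def adder_totals_closed_form(n, b):
    """Parts (1) + (2): (4nb - 4b^2 + 12b) + (2b^2 - 2b - 12), and likewise for depth."""
    t_count = 4 * n * b - 2 * b * b + 10 * b - 12
    # nb + n - b^2/2 + 5b/2 - 3; b*(b-5) is always even, so integer division is exact
    t_depth = n * b + n - (b * b - 5 * b) // 2 - 3
    return t_count, t_depth

print(adder_totals_bruteforce(100, 14))
print(adder_totals_closed_form(100, 14))
```

Adding the remaining T gate of part (3) to the depth closed form gives the \(-2\) that, together with the \(17.5\) from one gate synthesis, yields the constant \(15.5\) in Eq. (3).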

Approximation error analysis

The definition of the approximation error, which is based on the spectral norm54, can be found in the Supplementary Material. We demonstrate that AQFT Circuit 1 has an error of \(O(\varepsilon )\) when implemented in place of a QFT circuit. The approximation error in AQFT Circuit 1 arises from two sources:

(1) Removing all the \({Z}^{-1/{2}^{k}}\) gates with \(k>b\),

(2) Approximating \({Z}^{\theta }\) gates to prepare the special state \(|{\psi }_{b+1}\rangle\) using \(RUS\) circuits.

We call the error from part (1) \({\varepsilon }_{1}^{(1)}\) and the error from part (2) \({\varepsilon }_{1}^{(2)}\).

The error for removing a \({Z}^{\theta }\) gate is

$$\underset{|\psi \rangle }{\text{max}}\Vert \left(I-{Z}^{\theta }\right)|\psi \rangle \Vert$$
(4)

and this can be rewritten as

$$\sqrt{2\left(1-\text{cos}\theta \pi \right)}=2\left|\text{sin}(\pi \theta /2)\right|<\left|\pi \theta \right|$$
(5)

since

$$\left( {I - Z^{\theta } } \right)^{\dag } \left( {I - Z^{\theta } } \right) = \left( {\begin{array}{*{20}c} 0 & 0 \\ 0 & {2\left( {1 - \cos \theta \pi } \right)} \\ \end{array} } \right)$$
(6)

Therefore, the error \({\varepsilon }_{1}^{(1)}\) is bounded as follows:

$${\varepsilon }_{1}^{(1)}\le \left(n-b\right)\bullet {\sum }_{j=b+1}^{n}\frac{\pi }{{2}^{j}}<n{2}^{-b}\pi \le \varepsilon \pi$$
(7)

Note that \(b\) is chosen as \(\lceil{\text{log}}_{2}(n/\varepsilon )\rceil\), thus we have \(n{2}^{-b}\le \varepsilon\).

When synthesizing \({Z}^{\theta }\) gates to prepare the special state \(|{\psi }_{b+1}\rangle\), each \({Z}^{\theta }\) gate is approximated with an error of \(\varepsilon /b\), resulting in \({\varepsilon }_{1}^{(2)}\le \left(b-2\right)\bullet \frac{\varepsilon }{b}<\varepsilon\). Consequently, the total error of implementing AQFT Circuit 1 instead of the QFT circuit is \(O(\varepsilon )\) because

$${\varepsilon }_{1}^{(1)}+{\varepsilon }_{1}^{(2)}<(1+\pi )\varepsilon$$
(8)
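The bounds above are easy to check numerically. The sketch below evaluates the error of removing a single \({Z}^{\theta }\) gate directly from the diagonal of \(I-{Z}^{\theta }\), compares it with the closed form in Eq. (5), and evaluates the first bound in Eq. (7) for a sample \((n,\varepsilon )\). Function names and the sample values are assumptions for illustration only.

```python
import cmath
import math

def removal_error(theta):
    """Spectral norm of I - Z^theta: the matrix is diagonal with entries
    0 and 1 - e^{i*pi*theta}, so the norm is the larger modulus."""
    return abs(1 - cmath.exp(1j * math.pi * theta))

def closed_form(theta):
    """Right-hand form in Eq. (5): 2*|sin(pi*theta/2)|."""
    return 2 * abs(math.sin(math.pi * theta / 2))

def truncation_bound(n, eps):
    """Return b = ceil(log2(n/eps)) and (n-b) * sum_{j=b+1}^{n} pi/2^j from Eq. (7)."""
    b = math.ceil(math.log2(n / eps))
    tail = sum(math.pi / 2 ** j for j in range(b + 1, n + 1))
    return b, max(n - b, 0) * tail

for k in (3, 8, 13):
    theta = -1 / 2 ** k
    print(k, removal_error(theta), closed_form(theta), math.pi * abs(theta))

b, bound = truncation_bound(64, 0.01)
print(b, bound, math.pi * 0.01)
```

For the sample values the bound stays below \(\varepsilon \pi\), in line with Eq. (7); the check says nothing about the synthesis error \({\varepsilon }_{1}^{(2)}\), which depends on the RUS circuits.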

AQFT circuit 2: optimized for T-depth

Circuit design

The construction of AQFT Circuit 2 involves the following four steps:

(Step 1: QFT subcircuit decomposition) We begin by pairing the subcircuits on the left-hand side of Fig. 2, as shown on the left-hand side of Fig. 4. These paired subcircuits are then decomposed into the structure illustrated on the right-hand side of Fig. 4. For the decomposition of a \(k\)-qubit subcircuit, \((k-1)\) ancilla qubits, initially in the state \({|0\rangle }^{\otimes (k-1)}\), are required. A detailed demonstration of this decomposition is provided in Supplementary Material.

Fig. 4

Subcircuit decomposition for AQFT Circuit 2. The \({Z}^{\theta }\) gates in the red boxes from all the QFT subcircuits will be gathered into a single \({Z}^{\theta }\) gate layer at the very front of the QFT circuit. This reordering is possible because the circuits in the red and green boxes have diagonal matrix representations. The \({Z}^{\theta }\) gates in the purple box will be gathered into a single \({Z}^{\theta }\) gate layer at the very end of the QFT circuit. Note that two \({Z}^{\theta }\) gate layers are paired and parallelized in the green box. This results in T-depth reduction.

These ancilla qubits serve as a temporary workspace to enable the parallel execution of gate layers that would otherwise be sequential. The state of the primary register is copied to the ancilla register, allowing two distinct operations to be applied simultaneously before the ancilla is returned to its initial state, thereby reducing the circuit’s overall T-depth.
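The copy, parallel rotation, and uncopy pattern described here can be seen in a two-qubit toy model. The sketch below uses one data qubit and one ancilla, copies the data bit with a CNOT, applies two different \({Z}^{\theta }\) gates in the same layer, and uncomputes the ancilla; the result matches applying both rotations sequentially on the data qubit. This is only an assumed minimal illustration of the idea, not the decomposition in Fig. 4, and all names are hypothetical.

```python
import cmath
import math

def phase(theta, bit):
    """Phase that Z^theta applies to the basis state |bit>."""
    return cmath.exp(1j * math.pi * theta * bit)

def cnot_data_to_ancilla(state):
    """CNOT with the data qubit as control; basis index = 2*data + ancilla."""
    swapped = list(state)
    swapped[2], swapped[3] = swapped[3], swapped[2]
    return swapped

def parallel_layer(state, theta_data, theta_anc):
    """Z^theta_data on the data qubit and Z^theta_anc on the ancilla, in one layer."""
    return [amp * phase(theta_data, idx >> 1) * phase(theta_anc, idx & 1)
            for idx, amp in enumerate(state)]

def sequential(state, theta_1, theta_2):
    """Reference: Z^theta_1 followed by Z^theta_2, both on the data qubit."""
    return [amp * phase(theta_1 + theta_2, idx >> 1) for idx, amp in enumerate(state)]

start = [0.6, 0, 0.8j, 0]                      # (0.6|0> + 0.8i|1>) data, ancilla |0>
copied = cnot_data_to_ancilla(start)           # copy the data bit onto the ancilla
rotated = parallel_layer(copied, 1 / 4, 1 / 8) # two rotations in a single layer
restored = cnot_data_to_ancilla(rotated)       # return the ancilla to |0>
reference = sequential(start, 1 / 4, 1 / 8)

print(all(abs(x - y) < 1e-12 for x, y in zip(restored, reference)))
```

The two rotations occupy one layer instead of two, at the price of one ancilla and two CNOT gates, which are Clifford gates and do not contribute to the T-depth.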

In the decomposed structure, note that two \({Z}^{\theta }\) gate layers are paired and parallelized within the green box in Fig. 4. Similar to AQFT Circuit 1, in the next step, we will combine the \({Z}^{\theta }\) gates in the red boxes from all the decomposed QFT subcircuits in Fig. 4 into a single \({Z}^{\theta }\) gate layer at the very front of the QFT circuit. Additionally, the \({Z}^{\theta }\) gates in the purple boxes from all the decomposed QFT subcircuits in Fig. 4 will be gathered into a single \({Z}^{\theta }\) gate layer at the very end of the QFT circuit.

(Step 2: Constructing QFT circuit) We combine all the decomposed QFT subcircuits to construct a full QFT circuit. We modify the circuit using a method similar to that of AQFT Circuit 1. All the \({Z}^{\theta }\) gates in the red boxes from the decomposed subcircuits in Fig. 4 are gathered at the very front of the QFT circuit, and the \({Z}^{\theta }\) gates in the purple boxes from the decomposed subcircuits in Fig. 4 are gathered at the very end of the QFT circuit. Then, the \({Z}^{\theta }\) gates at the very front and end of the QFT circuit take the form \({Z}^{({2}^{k-1}-1)/{2}^{k}}\), where \(k\in \{1, 2, \dots ,n\}\). We divide each \({Z}^{({2}^{k-1}-1)/{2}^{k}}\) gate into \({Z}^{1/2}\) and \({Z}^{-1/{2}^{k}}\) gates.

(Step 3: Approximation) We remove all the \({Z}^{-1/{2}^{k}}\) gates with \(k>b\) from the QFT circuit (See Fig. 5). After the approximation, we no longer need \((n-1)\) ancilla qubits; instead, we only need \(b\) ancilla qubits.

Fig. 5

AQFT Circuit 2: \(n\)-qubit AQFT with an error of \(O(\varepsilon )\) optimized for T-depth, where \(b=\lceil{\text{log}}_{2}(n/\varepsilon )\rceil\). This circuit employs \(b\) ancilla qubits each of which is labeled as \({a}_{i}\) and initially in the state \(|0\rangle\). Each \({Z}^{\theta }\) gate layer in the blue boxes performs an inverse PGT with the help of a \({Z}^{-1}\) gate. The effect of inserting \({Z}^{-1}\) gates can be canceled by inserting \(Z\) gates. Note that this circuit contains \((n+1)\) inverse PGTs, and each yellow box contains only two T gates. We implement each inverse PGT using a quantum adder from Ref.37.

(Step 4: Inverse PGT implementation using quantum adder) In Fig. 5, each \({Z}^{\theta }\) gate layer in blue boxes performs an inverse PGT with the help of a \({Z}^{-1}\) gate. The effect of inserting the \({Z}^{-1}\) gates can be canceled by inserting \(Z\) gates. We implement each inverse PGT using a quantum adder from Ref.37.

In the construction of AQFT Circuit 2, we utilize the linear-depth quantum adder described in Ref.37. While one might argue that employing a logarithmic-depth quantum adder could further reduce the T-depth in the AQFT circuit, our findings indicate that the quantum adder from Ref.37 is more efficient in terms of T-depth compared to the currently known logarithmic-depth quantum adder when utilized for AQFT implementation within the range \(3<n/\varepsilon <{10}^{13}\). In Sect. “Disadvantages of using logarithmic-depth quantum adders”, we provide a comparative analysis of using the quantum adder from Ref.37 and the state-of-the-art logarithmic-depth quantum adder from Ref.38, which has the lowest T-depth among the logarithmic-depth adders at the time of writing.

T-count and T-depth estimation

Among the inverse PGTs in Fig. 5, the 3-qubit inverse PGT does not need to be replaced by a 3-qubit quantum adder, because it only requires a single T gate. Considering this, the T gates required to construct AQFT Circuit 2 are found in five parts:

(1) \((n-b+3)\) \((b+1)\)-qubit quantum adders,

(2) One \(k\)-qubit quantum adder for each \(k\in \{4, 5, \dots ,b\}\),

(3) One T gate in the 3-qubit inverse PGT,

(4) \((n-1)\) T gates in yellow boxes in Fig. 5,

(5) T gates for preparing two of the special states \(|{\psi }_{b+1}\rangle\).

In the following resource estimation, for the sake of convenience in calculation, we assume that \(n\) is even and \(b\) is odd. However, even without this assumption, the difference in resource calculation remains within \(O(1)\). The \(t\)-qubit quantum adder in Ref.37 requires a T-count of \((4t-4)\) and a T-depth of \(t\). Using this quantum adder, the T-counts in parts (1) through (4) are \(4nb-4{b}^{2}+12b,\) \({\sum }_{k=4}^{b}\left(4k-4\right)=2{b}^{2}-2b-12,\) \(1,\) and \(n-1,\) respectively. The T-depths required in parts (1) through (4) are \(\left\{2+(n-b+1)/2\right\}\bullet \left(b+1\right),\) \({\sum }_{k=2}^{(b-1)/2}\left(2k+1\right)={b}^{2}/4+b/2-15/4,\) \(1,\) and \(n-1,\) respectively. For T gates in part (5), \(2(b-2)\) \({Z}^{\theta }\) gates have to be synthesized with an error of \(\varepsilon /(2b)\) for each gate (See Supplementary Material). The value \(\varepsilon /(2b)\) is chosen to ensure that AQFT Circuit 2 has an approximation error of \(O(\varepsilon )\). As with AQFT Circuit 1, we use the \(RUS\) circuits from Ref.53. Consequently, the total T-count for AQFT Circuit 2 is

$$4nb+n-2{b}^{2}+b\left\{2.16 {\text{log}}_{2}\left(\frac{2b}{\varepsilon }\right)+45\right\}-4.32{\text{log}}_{2}\left(\frac{2b}{\varepsilon }\right)-80=4nb+n-O({b}^{2})$$
(9)

and the total T-depth for AQFT Circuit 2 is

$$\frac{1}{2}nb+\frac{3}{2}n-\frac{1}{4}{b}^{2}+\frac{5}{2}b+1.08{\text{log}}_{2}\left(\frac{2b}{\varepsilon }\right)+14.25=\frac{1}{2}nb+\frac{3}{2}n-O\left({b}^{2}\right)$$
(10)
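For a feel of the numbers, the sketch below transcribes Eqs. (2), (3), (9), and (10) exactly as printed and evaluates them for a sample problem size. The function names and the sample values of \(n\) and \(\varepsilon\) are assumptions for illustration; the code only evaluates the stated formulas and does not re-derive their constants.

```python
import math

def precision_bits(n, eps):
    """b = ceil(log2(n / eps))."""
    return math.ceil(math.log2(n / eps))

def circuit1_costs(n, eps):
    """(T-count, T-depth) of AQFT Circuit 1, transcribed from Eqs. (2) and (3)."""
    b = precision_bits(n, eps)
    log_term = math.log2(b / eps)
    t_count = 4 * n * b - 2 * b**2 + b * (1.08 * log_term + 27.5) - 2.16 * log_term - 45
    t_depth = n * b + n - b**2 / 2 + 5 * b / 2 + 1.08 * log_term + 15.5
    return t_count, t_depth

def circuit2_costs(n, eps):
    """(T-count, T-depth) of AQFT Circuit 2, transcribed from Eqs. (9) and (10)."""
    b = precision_bits(n, eps)
    log_term = math.log2(2 * b / eps)
    t_count = 4 * n * b + n - 2 * b**2 + b * (2.16 * log_term + 45) - 4.32 * log_term - 80
    t_depth = n * b / 2 + 3 * n / 2 - b**2 / 4 + 5 * b / 2 + 1.08 * log_term + 14.25
    return t_count, t_depth

n, eps = 10_000, 0.01
tc1, td1 = circuit1_costs(n, eps)
tc2, td2 = circuit2_costs(n, eps)
print("b =", precision_bits(n, eps))
print("T-count ratio (Circuit 2 / Circuit 1):", tc2 / tc1)
print("T-depth ratio (Circuit 2 / Circuit 1):", td2 / td1)
```

For these sample values the T-counts differ by roughly one percent while the T-depth of Circuit 2 is a little over half that of Circuit 1; the ratio approaches one half only as \(b\) grows, because of the \(\frac{3}{2}n\) term.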

Approximation error analysis

We prove that AQFT Circuit 2 has an error of \(O(\varepsilon )\) when implemented in place of a QFT circuit. The sources of the approximation error in AQFT Circuit 2 are identified in two parts:

(1) Removing all the \({Z}^{-1/{2}^{k}}\) gates with \(k>b\),

(2) Approximating \({Z}^{\theta }\) gates to prepare two of the special states \(|{\psi }_{b+1}\rangle\) using \(RUS\) circuits.

We call the error from part (1) \({\varepsilon }_{2}^{(1)}\) and the error from part (2) \({\varepsilon }_{2}^{(2)}\).

The error \({\varepsilon }_{2}^{(1)}\) is bounded in the same way as \({\varepsilon }_{1}^{(1)}\). When synthesizing \({Z}^{\theta }\) gates to prepare two of the special states \(|{\psi }_{b+1}\rangle\) for AQFT Circuit 2, each \({Z}^{\theta }\) gate is approximated with an error of \(\varepsilon /(2b)\), resulting in \({\varepsilon }_{2}^{(2)}\le 2\left(b-2\right)\bullet \frac{\varepsilon }{2b}<\varepsilon\). Consequently, the total error of implementing AQFT Circuit 2 instead of the QFT circuit is \(O(\varepsilon )\).

Disadvantages of using logarithmic-depth quantum adders

If logarithmic-depth quantum adders were applied in the configuration of the circuit shown in Fig. 5, the T-depth of the AQFT circuit would be reduced from \(O(n\text{log}(n/\varepsilon ))\) to \(O\left(n\text{log}\left(\text{log}\left(n/\varepsilon \right)\right)\right)\) at the expense of increasing the T-count. In this section, we compare the results of using the state-of-the-art logarithmic-depth quantum adder (In-FT-QCLA1 from Ref.38) with those of AQFT Circuit 2. Our analysis demonstrates that AQFT Circuit 2 is advantageous not only in terms of T-count but also T-depth. We refer to the AQFT circuit that uses the logarithmic-depth quantum adder in the configuration in Fig. 5 as AQFT Circuit 2’.

First, we present the T-count and T-depth of AQFT Circuit 2’. As for AQFT Circuit 2, the T gates required to construct AQFT Circuit 2’ are categorized in five parts:

(1) \((n-b+3)\) \((b+1)\)-qubit quantum adders,

(2) One \(k\)-qubit quantum adder for each \(k\in \{4, 5, \dots ,b\}\),

(3) One T gate in the 3-qubit inverse PGT,

(4) \((n-1)\) T gates in yellow boxes in Fig. 5,

(5) T gates for preparing two of the special states \(|{\psi }_{b+1}\rangle\).

The \(t\)-qubit quantum adder (In-FT-QCLA1 from Ref.38) requires a T-count of \(20t - 8w\left( t \right) - 8w\left( {t - 1} \right) - 4\left\lfloor {\log_{2} \left( t \right)} \right\rfloor - 4\left\lfloor {\log_{2} \left( {t - 1} \right)} \right\rfloor - 8\), where \(w\left( t \right) = t - \sum_{i = 1}^{\infty } \left\lfloor {t/2^{i} } \right\rfloor\) represents the number of ones in the binary expansion of \(t\), and a T-depth of \(2\left\lfloor {\log_{2} \left( t \right)} \right\rfloor + 2\left\lfloor {\log_{2} \left( {t - 1} \right)} \right\rfloor + 2\left\lfloor {\log_{2} \left( {t/3} \right)} \right\rfloor + 2\left\lfloor {\log_{2} \left( {\left( {t - 1} \right)/3} \right)} \right\rfloor + 28\). For readability, we use the following notations:

$$\begin{gathered} TC_{adder} \left( t \right) = 20t - 8w\left( t \right) - 8w\left( {t - 1} \right) - 4\left\lfloor {\log_{2} \left( t \right)} \right\rfloor - 4\left\lfloor {\log_{2} \left( {t - 1} \right)} \right\rfloor - 8, \hfill \\ TD_{adder} \left( t \right) = 2\left\lfloor {\log_{2} \left( t \right)} \right\rfloor + 2\left\lfloor {\log_{2} \left( {t - 1} \right)} \right\rfloor + 2\left\lfloor {\log_{2} \left( {t/3} \right)} \right\rfloor \hfill \\ + 2\left\lfloor {\log_{2} \left( {\left( {t - 1} \right)/3} \right)} \right\rfloor + 28 \hfill \\ \end{gathered}$$
(11)

In the following resource estimation, for the sake of convenience in calculation, we assume that \(n\) is even and \(b\) is odd. Even without this assumption, however, the difference in resource calculation is only \(O(1)\). Then, the T-count and T-depth in part (1) are \((n-b+3)\bullet T{C}_{adder}(b)\) and \(\{2+\frac{n-b+1}{2}\}\bullet T{D}_{adder}\left(b\right)\), respectively. The T-count and T-depth in part (2) are \({\sum }_{i=4}^{b-1}T{C}_{adder}(i)\) and \({\sum }_{i=2}^{(b-1)/2}T{D}_{adder}(2i)\), respectively. For T gates in part (5), \(2(b-2)\) \({Z}^{\theta }\) gates have to be synthesized with an error of \(\varepsilon /(2b)\) for each gate. The value \(\varepsilon /(2b)\) is chosen to ensure that AQFT Circuit 2’ has an approximation error of \(O(\varepsilon )\). Therefore, the total T-count for AQFT Circuit 2’ is

$$\left(n-b+3\right)\bullet T{C}_{adder}\left(b\right)+{\sum }_{i=4}^{b-1}T{C}_{adder}\left(i\right)+n+b\left(2.16{\text{log}}_{2}\left(\frac{2b}{\varepsilon }\right)+35\right)-4.32{\text{log}}_{2}\left(\frac{2b}{\varepsilon }\right)-70=20nb+O(n)$$
(12)

and the total T-depth for AQFT Circuit 2’ is

$$\left\{\frac{n-b+5}{2}\right\}\bullet T{D}_{adder}\left(b\right)+{\sum }_{i=2}^{\frac{b-1}{2}}T{D}_{adder}\left(2i\right)+n+1.08{\text{log}}_{2}\left(\frac{2b}{\varepsilon }\right)+17.5=4n{\text{log}}_{2}(b)+O(n)$$
(13)

Comparing AQFT Circuit 2’ with AQFT Circuit 2, the T-depth appears to be reduced at the expense of an increased T-count. However, within the range of \(3<n/\varepsilon <{10}^{13}\), the inequality \(4n{\text{log}}_{2}b>\frac{1}{2}nb\) holds, indicating that the T-depth is not reduced. This can be observed in Fig. 6. In summary, using the linear-depth adder from Ref.37 for AQFT implementation offers advantages over the state-of-the-art logarithmic-depth quantum adder from Ref.38 in terms of both T-count and T-depth.
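The leading-order inequality can be checked directly. Dividing both sides by \(n\), it reduces to \(4{\text{log}}_{2}b>b/2\), which depends on \(b\) alone. The sketch below lists the integer values of \(b\) for which it holds; it is a rough check that ignores all lower-order terms of Eqs. (10) and (13), and the variable names are illustrative.

```python
import math

def log_depth_leading(b):
    """Leading T-depth per qubit with the logarithmic-depth adder: 4*log2(b)."""
    return 4 * math.log2(b)

def linear_depth_leading(b):
    """Leading T-depth per qubit with the linear-depth adder: b/2."""
    return b / 2

# Values of b for which the logarithmic-depth variant is deeper at leading order
favourable = [b for b in range(1, 65) if log_depth_leading(b) > linear_depth_leading(b)]
print(favourable[0], favourable[-1])   # 2 43
print(2 ** favourable[-1])             # largest n/eps with b <= 43
```

At leading order the inequality holds for \(2\le b\le 43\), and \(b\le 43\) corresponds to \(n/\varepsilon \le {2}^{43}\approx 8.8\times {10}^{12}\), which is of the same order as the upper limit quoted above.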

Fig. 6

Comparison of T-depth for \(n\)-qubit AQFT implementations. In the figure, the blue solid lines represent the results from AQFT Circuit 2’ using the logarithmic-depth quantum adder from Ref.38, and the red dash-dot lines represent the results from AQFT Circuit 2 using the linear-depth quantum adder from Ref.37. The results here are derived from the analytical formulas, not from a numerical simulation on a quantum simulator. (a) \(\varepsilon =1/10.\) (b) \(\varepsilon =1/100.\) (c) \(\varepsilon =1/1000.\) (d) \(\varepsilon =1/10000\).

Results and discussion

In this paper, we introduced two novel \(n\)-qubit AQFT circuits with an approximation error of \(O(\varepsilon )\): AQFT Circuit 1 for T-count optimization and AQFT Circuit 2 for T-depth optimization. AQFT Circuit 1 is designed to minimize T-count, achieving a T-count of \(4n{\text{log}}_{2}\left(n/\varepsilon \right)-O({\text{log}}^{2}\left(n/\varepsilon \right))\), which is roughly half of the best-known result in Ref.51. This reduction is accomplished by constructing inverse PGT circuits without using Toffoli gates and by replacing them with quantum adders from Ref.37. On the other hand, AQFT Circuit 2 is designed to minimize T-depth, reducing it from \(n{\text{log}}_{2}\left(n/\varepsilon \right)+O(n)\) to \(\frac{1}{2}n{\text{log}}_{2}\left(n/\varepsilon \right)+O\left(n\right).\) This is achieved by pairing and parallelizing inverse PGTs using \(O(n)\) additional T gates. These results are summarized in Table 1.
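
To get a feel for the magnitudes involved, the leading-order expressions quoted above can be evaluated directly. The sketch below is illustrative only: it drops the \(O({\text{log}}^{2}(n/\varepsilon ))\) and \(O(n)\) corrections, uses \({\text{log}}_{2}(n/\varepsilon )\) without rounding, and the function and key names are hypothetical.

```python
import math


def aqft_leading_terms(n: int, eps: float) -> dict[str, float]:
    """Evaluate the leading-order T-cost terms quoted in the text.

    Assumption: lower-order corrections are ignored, so these numbers
    indicate scaling rather than exact gate counts.
    """
    log_term = math.log2(n / eps)
    return {
        # AQFT Circuit 1: T-count of about 4 n log2(n / eps)
        "t_count_circuit1": 4 * n * log_term,
        # T-depth before pairing the inverse PGTs: n log2(n / eps)
        "t_depth_before": n * log_term,
        # AQFT Circuit 2: T-depth of about (1/2) n log2(n / eps)
        "t_depth_circuit2": 0.5 * n * log_term,
    }
```

For example, with \(n=64\) and \(\varepsilon =1/64\), so that \({\text{log}}_{2}(n/\varepsilon )=12\), the three values are 3072, 768, and 384.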

As shown in Fig. 7, AQFT Circuit 1 is optimized for T-count, while AQFT Circuit 2 is optimized for T-depth. In terms of T-count, the difference between AQFT Circuit 1 and AQFT Circuit 2 is only \(O(n)\), which does not affect the leading-order term \(4n{\text{log}}_{2}(n/\varepsilon )\). Consequently, the difference between them in Fig. 7a is subtle. However, AQFT Circuit 2 has the disadvantage of requiring \(O(\text{log}(n/\varepsilon ))\) more qubits than AQFT Circuit 1 (see Fig. 1). This additional resource requirement might be a limitation in scenarios with stringent hardware constraints.

Fig. 7

Comparison of T-count and T-depth for \(n\)-qubit AQFT implementations with a fixed \(\varepsilon =1/100\). In the figure, the blue solid lines represent the results from Ref.51, the green dash-dot lines represent the results of AQFT Circuit 1, and the red dashed lines represent the results of AQFT Circuit 2. The results here are derived from the analytical formulas, not from a numerical simulation on a quantum simulator. (a) T-count comparison. The difference in T-count between AQFT Circuit 1 and AQFT Circuit 2 is only \(O(n)\), which does not affect the leading-order term \(4n{\text{log}}_{2}(n/\varepsilon )\). Consequently, the difference between them is subtle. (b) T-depth comparison.

Table 1 Comparison of T-count and T-depth for \(n\)-qubit AQFT implementations with an approximation error of \(O(\varepsilon )\), where \(b=\lceil{\text{log}}_{2}(n/\varepsilon )\rceil\). For the sake of convenience in calculation, we assume that \(n\) is even and \(b\) is odd. Even without this assumption, the difference in resource calculation is only \(O(1)\).

Conclusion

In conclusion, this work successfully reduces the T gate cost of AQFTs by introducing two novel circuit designs. Our first circuit halves the T-count of the previous state-of-the-art by eliminating the need for additional non-Clifford gates in the PGT construction. Our second circuit similarly halves the T-depth through parallelization at the cost of additional ancilla qubits.

Despite these advancements, certain limitations remain. Both AQFT circuits still exhibit a leading-order complexity of \(O(n{\text{log}}_{2}(n/\varepsilon ))\) for T-count and T-depth. Although employing logarithmic-depth quantum adders could theoretically further improve T-depth complexity, our analysis reveals that existing logarithmic-depth quantum adders do not yet offer practical advantages due to their inherently higher T-count and T-depth. Thus, future research should prioritize the development of novel logarithmic-depth quantum adders capable of reducing both T-count and T-depth simultaneously.

Nevertheless, our results achieve meaningful reductions in T-count and T-depth for AQFT implementations. Considering that T gates constitute the primary bottleneck in the implementation of fault-tolerant quantum algorithms, and that QFT is ubiquitous in quantum computing, our proposed circuits represent a valuable contribution toward improving the practical feasibility of large-scale quantum algorithms, including Shor’s factoring and the HHL algorithm.