Hardware-tailored diagonalization circuits

Miller, Daniel; Fischer, Laurin E.; Levi, Kyano; Kuehnke, Eric J.; Sokolov, Igor O.; Barkoutsos, Panagiotis Kl.; Eisert, Jens; Tavernelli, Ivano

doi:10.1038/s41534-024-00901-1

Download PDF

Article
Open access
Published: 21 November 2024

Hardware-tailored diagonalization circuits

npj Quantum Information volume 10, Article number: 122 (2024) Cite this article

3092 Accesses
8 Citations
14 Altmetric
Metrics details

Subjects

Abstract

A central building block of many quantum algorithms is the diagonalization of Pauli operators. Although it is always possible to construct a quantum circuit that simultaneously diagonalizes a given set of commuting Pauli operators, only resource-efficient circuits can be executed reliably on near-term quantum computers. Generic diagonalization circuits, in contrast, often lead to an unaffordable SWAP gate overhead on quantum devices with limited hardware connectivity. A common alternative is to exclude two-qubit gates altogether. However, this comes at the severe cost of restricting the class of diagonalizable sets of Pauli operators to tensor product bases (TPBs). In this article, we introduce a theoretical framework for constructing hardware-tailored (HT) diagonalization circuits. Our framework establishes a systematic and highly flexible procedure for tailoring diagonalization circuits with ultra-low gate counts. We highlight promising use cases of our framework and – as a proof-of-principle application – we devise an efficient algorithm for grouping the Pauli operators of a given Hamiltonian into jointly-HT-diagonalizable sets. For several classes of Hamiltonians, we observe that our approach requires fewer measurements than conventional TPB approaches. Finally, we experimentally demonstrate that HT circuits can improve the efficiency of estimating expectation values with cloud-based quantum computers.

Efficient simulation of potential energy operators on quantum hardware: a study on sodium iodide (NaI)

Article Open access 11 May 2024

Scalable and robust quantum computing on qubit arrays with fixed coupling

Article Open access 05 January 2023

Realization of quantum signal processing on a noisy quantum computer

Article Open access 23 September 2023

Introduction

Since first-generation quantum computers were made publicly available eight years ago by IBM, the technological frontier is expanding at an ever increasing pace^{1,2,3,4,5,6,7,8,9,10,11}. Nevertheless, decoherence and hardware errors still limit the applicability of these early-stage quantum devices, and practical quantum advantage yet remains to be demonstrated. To go beyond what is possible now, it is crucial to operate both classical and quantum computers in an orchestrated manner that exploits their respective strengths.

For example, in the variational quantum eigensolver (VQE) algorithm^12,13,14, a classical computer optimizes the parameters of a trial quantum state vector $\left\vert \psi \right\rangle$ to accurately approximate the minimal eigenvalue of an observable O, e.g., the Hamiltonian of a molecule. The tasks performed by the quantum processor are preparing $\left\vert \psi \right\rangle$ and gathering measurement data from which the expectation value $\langle O\rangle =\left\langle \psi \right\vert O\left\vert \psi \right\rangle$ can be estimated. In practice, the observable O cannot be measured directly as this would require a quantum circuit that diagonalizes it, i.e., a circuit that rotates the unknown eigenbasis of O to the computational basis. A common approach to circumvent this problem is to express O as a linear combination of n-qubit Pauli operators P_i ∈ {I, X, Y, Z}^⊗n with real coefficients ${c}_{i}\in {\mathbb{R}}$, as in

$$O=\mathop{\sum }\limits_{i=1}^{M}{c}_{i}{P}_{i}.$$

(1)

Since any Pauli operator P_i can be diagonalized using single-qubit Clifford gates, it is straightforward to measure its expectation value 〈P_i〉. Once all 〈P_i〉 have been obtained, 〈O〉 is calculated from Eq. (1). Although the number M of Pauli operators for molecular Hamiltonians has a nominal scaling of up to ${\mathcal{O}}({n}^{4})$, measuring all Pauli expectation values individually would require a large number of quantum circuit executions (“shots”)¹⁴. In addition to the mere evaluation of 〈O〉, usually also its gradient is estimated from the measured data^15,16. To keep resource requirements at an affordable level, one can make use of simultaneous measurements of commuting Pauli operators, a technique that is commonly applied. For example, with a diagonalization circuit that only contains single-qubit gates, one can measure a tensor product basis (TPB), i.e., a set of Pauli operators that are qubitwise commuting (QWC)¹. For typical problems it is possible to group an average number of three Pauli operators into a common TPB¹⁷. To further reduce the number of required diagonalization circuits from ${\mathcal{O}}({n}^{4})$ to ${\mathcal{O}}({n}^{3})$, grouping the Pauli operators into general commuting (GC) sets has been suggested^18,19,20. Under ideal circumstances, such GC groupings would substantially decrease the number of shots required to estimate 〈O〉 to a desired accuracy²¹. Unfortunately, the corresponding diagonalization circuits consist of up to $\frac{n(n-1)}{2}$ two-qubit gates and, if the connectivity of the device is limited, a large number of additional SWAP gates. As a first step to interpolate between these extremes, diagonalization circuits featuring a single layer of two-qubit gates have been introduced^22,23; however, it is an open challenge to fully exploit the trade-off between QWC and GC. As of today, the best method to experimentally estimate an expectation value 〈O〉 is unclear. Besides Pauli grouping^{1,17,18,19,20,21,22,23,24,25}, there is active research in addressing this problem with classical shadows^{26,27,28,29,30,31}, unitary partitioning^32,33, low-rank factorization³⁴, adaptive estimators^35,36,37, and decision diagrams³⁸.

In this article, we introduce a theoretical framework for constructing diagonalization circuits whose two-qubit gates are tailored to meet the connectivity restrictions imposed by most current quantum computing architectures, e.g., super- and semiconducting qubits^{1,2,3,4,5,6,7,8,9}. Our flexible approach can be applied to any hardware connectivity. We demonstrate the viability of our techniques for a large class of paradigmatic Hamiltonians in the context of the Pauli grouping problem.

Results

Framework for HT diagonalization circuits

The purpose of our theoretical framework is the construction of hardware-tailored (HT) Clifford circuits that diagonalize a given set of commuting n-qubit Pauli operators P₁, …, P_m. After possibly replacing some of the P_j by −P_j, we can assume that the group they generate, which is denoted by 〈P₁, …, P_m〉, does not contain − I^⊗n. From now on, we will always assume that this is the case since it allows us to extend 〈P₁, …, P_m〉 to the stabilizer group ${\mathcal{S}}$ of some stabilizer state vector $\left\vert {\psi }_{{\mathcal{S}}}\right\rangle$, where $\left\vert {\psi }_{{\mathcal{S}}}\right\rangle$ is defined as the common +1-eigenvector of all operators $S\in {\mathcal{S}}$, see the Supplementary Material (SM) Sec. III. Then, uncomputing the state vector $\left\vert {\psi }_{{\mathcal{S}}}\right\rangle$, i.e., applying some Clifford circuit ${U}_{{\mathcal{S}}}^{\dagger }$ for which $\left\vert {\psi }_{{\mathcal{S}}}\right\rangle ={U}_{{\mathcal{S}}}{\left\vert 0\right\rangle }^{\otimes n}$, will simultaneously diagonalize P₁, …, P_m³⁹.

An important class of stabilizer states is that of graph states⁴⁰. A graph with n vertices is defined in terms of its adjacency matrix $\Gamma =({\gamma }_{i,j})\in {{\mathbb{F}}}_{2}^{n\times n}$, where ${{\mathbb{F}}}_{2}$ is the binary field; a pair (i, j) of vertices is connected via an edge if and only if γ_i,j = 1. In this article, we do not distinguish between a graph and its adjacency matrix, and we follow the convention γ_i,j = γ_j,i and γ_i,i = 0 for all i, j ∈ {1, …, n}. Every graph Γ defines a graph state vector $\left\vert \Gamma \right\rangle ={U}_{\Gamma }{\left\vert 0\right\rangle }^{\otimes n}$ whose preparation circuit

$${U}_{\Gamma }=\left(\prod _{i < j}{{\rm{CZ}}}_{i,j}^{{\gamma }_{i,j}}\right){H}^{\otimes n}$$

(2)

consists of a layer of Hadamard gates $H=\frac{1}{\sqrt{2}}(X+Z)$, followed by a two-qubit gate CZ = diag(1, 1, 1, −1) for every pair of connected vertices. The stabilizer group of $\left\vert \Gamma \right\rangle$ is ${{\mathcal{S}}}_{\Gamma }=\{{X}^{{\bf{k}}}{Z}^{\Gamma {\bf{k}}}{(-1)}^{{\sum }_{i < j}{k}_{i}{\gamma }_{i,j}{k}_{j}}\,| \,{\bf{k}}\in {{\mathbb{F}}}_{2}^{n}\}$, where we define ${X}^{{\bf{k}}}={X}^{{k}_{1}}\otimes \ldots \otimes {X}^{{k}_{n}}$ and similarly Z^Γk for the matrix-vector product Γk. Hence, ${U}_{\Gamma }^{\dagger }$ would diagonalize our operators P₁, …, P_m if only they were of the form ± X^kZ^Γk.

Every stabilizer state is local-Clifford (LC) equivalent to a graph state⁴¹. Thus, there exist single-qubit Clifford gates U₁, …, U_n and a graph Γ such that $({U}_{1}\otimes \ldots \otimes {U}_{n})\left\vert {\psi }_{{\mathcal{S}}}\right\rangle =\left\vert \Gamma \right\rangle$. We conclude that every set of commuting Pauli operators can be simultaneously diagonalized by some layer of single-qubit Clifford gates followed by some circuit of the form ${U}_{\Gamma }^{\dagger }$. We refer to this procedure as a graph-based diagonalization circuit, see Fig. 1.

**Fig. 1: Graph-based diagonalization circuit.**

The connectivity graph of a quantum computer is the graph Γ_con whose vertices and edges, respectively, are given by qubits and pairs of qubits for which a CZ gate can be physically implemented. For quantum devices with a limited connectivity, general graph-based diagonalization circuits require up to ${\mathcal{O}}({n}^{2})$ SWAP gates⁴². This overhead renders unconstrained graph-based diagonalization circuits infeasible for certain near-term applications, see SM Sec. II. However, if Γ ⊂ Γ_con is a subgraph of the connectivity graph, SWAP gates are avoided completely. We refer to graph-based diagonalization circuits that are designed to meet this condition as hardware-tailored (HT). By construction, HT circuits have a very low CZ depth that is upper bounded by the maximum degree of Γ_con, e.g., by 3 and 4 for heavy-hex and square-lattice connectivity, respectively⁴³.

We now derive a technical condition for the existence of HT diagonalization circuits. Every n-qubit Clifford gate U defines a symplectic matrix

$$A=\left[\begin{array}{cc}{A}^{xx}&{A}^{xz}\\ {A}^{zx}&{A}^{zz}\end{array}\right]\in \,{\text{GL}}\,\left({{\mathbb{F}}}_{2}^{2n}\right)$$

(3)

with the property that

$$U{X}^{{\bf{r}}}{Z}^{{\bf{s}}}{U}^{\dagger }={\text{i}}^{\alpha ({\bf{r}},{\bf{s}})}\,{X}^{{A}^{xx}{\bf{r}}+{A}^{xz}{\bf{s}}}\,{Z}^{{A}^{zx}{\bf{r}}+{A}^{zz}{\bf{s}}}$$

(4)

holds for all vectors ${\bf{r}},{\bf{s}}\in {{\mathbb{F}}}_{2}^{n}$, see Table 1 for the single-qubit case³⁹. Hereby, a matrix A is called symplectic if ${A}^{{\rm{T}}}\left[\begin{array}{cc}0&{\mathbb{1}}\\ {\mathbb{1}}&0\end{array}\right]A=\left[\begin{array}{cc}0&{\mathbb{1}}\\ {\mathbb{1}}&0\end{array}\right]$, with $\,{\text{GL}}\,\left({{\mathbb{F}}}_{2}^{2n}\right)$ denoting the general linear group of ${{\mathbb{F}}}_{2}^{2n}$. For the time being, we can neglect the phases that are given by $\alpha ({\bf{r}},{\bf{s}})\in {\mathbb{Z}}/4{\mathbb{Z}}=\{0,1,2,3\}$ (see Eq. (46) in SM Sec. XIII for how to recover them) and focus on the irreducible representation

$${{\mathcal{C}}}_{n}\longrightarrow \,{\text{GL}}\,\left({{\mathbb{F}}}_{2}^{2{\text{n}}}\right), \ \ U\longmapsto A$$

(5)

of the n-qubit Clifford group ${{\mathcal{C}}}_{n}$. If U = U₁ ⊗ … ⊗ U_n is a single-qubit Clifford layer, the blocks in Eq. (3) are diagonal, i.e., ${A}^{xx}=\,\text{diag}\,({a}_{1}^{xx},\ldots ,{a}_{n}^{xx})$, and similarly for A^xz, A^zx, and A^zz. Hereby, ${a}_{1}^{xx}$ is given by the xx-entry of the binary representation of U₁, etc. To construct a single-qubit Clifford layer that rotates P₁, …, P_m into the set { ± X^kZ^Γk} for a graph Γ, we write ${P}_{j}={\text{i}}^{{q}_{j}}{X}^{{{\bf{r}}}_{j}}{Z}^{{{\bf{s}}}_{j}}$. By Eq. (4), the application of U, which is represented by A, transforms the operator P_j into ${P}_{j}^{{\prime} }={X}^{{{\bf{k}}}_{j}}{Z}^{{A}^{zx}{{\bf{r}}}_{j}+{A}^{zz}{{\bf{s}}}_{j}}$ (up to a global phase), where we have introduced the notation k_j = A^xxr_j + A^xzs_j. Thus, ${P}_{j}^{{\prime} }\in \{\pm {X}^{{\bf{k}}}{Z}^{\Gamma {\bf{k}}}\}$ is equivalent to Γk_j = A^zxr_j + A^zzs_j. These equivalent conditions can be phrased for all j ∈ {1, …, m} simultaneously as a binary matrix equation

$$\Gamma {A}^{xx}R+\Gamma {A}^{xz}S={A}^{zx}R+{A}^{zz}S,$$

(6)

where $R=\left({{\bf{r}}}_{1}\,\cdots \,{{\bf{r}}}_{m}\right)$ and $S=\left({{\bf{s}}}_{1}\,\cdots \,{{\bf{s}}}_{m}\right)$ are the two matrices that store the exponent vectors of P₁, …, P_m as their columns.

Table 1 Binary representation of the single-qubit Clifford group ${{\mathcal{C}}}_{1}$

Full size table

Equation (6) is our first main result. First, by devising algorithms to solve it, we can tackle the challenge of constructing HT diagonalization circuits. This finally allows us to explore the trade-off between single-qubit Clifford layers and unrestricted Clifford circuits. On a related note, merely checking whether a guess (A, Γ) solves Eq. (6) is of course much more efficient than solving Eq. (6) from scratch. Therefore, our formulation of Eq. (6) supports the design of HT diagonalization circuits via educated guesses or dedicated case-based research for, in principle, arbitrarily large system sizes. We illustrate this idea in Table 2 for molecular Hamiltonians with up to n = 120 qubits.

Table 2 Examples of HT diagonalization circuits and their performance in the context of quantum chemistry

Full size table

For a quantum chip whose connectivity graph Γ_con has n vertices and e edges, there are 6ⁿ and 2^e potential choices for A and Γ, respectively. Hence, a brute-force solver for Eq. (6) would loop through all 6ⁿ2^e choices, which quickly becomes infeasible due to the exponential size of the search space. Restricting to a polynomially-large random subset gives rise to an efficient, probabilistic, restricted brute-force solver for Eq. (6), however, we empirically find that this approach has a vanishingly low success probability. After closely investigating the mathematics behind Eq. (6), we can overcome this problem and formulate an efficient probabilistic solver that performs well in practice. Having available a huge search space will then turn into a powerful feature: the expressivity of Eq. (6) enables us to construct ultra-short diagonalization circuits.

Mathematical results

We have reduced the task of constructing HT diagonalization circuits to the problem of solving Eq. (6) for a subgraph Γ ⊂ Γ_con and a symplectic, invertible matrix A whose blocks in Eq. (3) are diagonal. In our case, it is sufficient that A is invertible because every invertible matrix ${A}_{i}\in {{\mathbb{F}}}_{2}^{2\times 2}$, which represents U_i in U = U₁ ⊗ … ⊗ U_n, is necessarily also symplectic, see Table 1. Because of $\det (A)=\mathop{\prod }\nolimits_{i=1}^{n}\det ({A}_{i})$ and ${{\mathbb{F}}}_{2}=\{0,1\}$, the invertibility of A is equivalent to $\det ({A}_{1})=\ldots =\det ({A}_{n})=1$. As we will show soon in Eq. (13), for a fixed graph Γ, it is possible to rewrite these determinants as quadratic forms

$$\det ({A}_{i})={{\boldsymbol{\lambda }}}^{{\rm{T}}}{Q}_{i}{\boldsymbol{\lambda }}.$$

(7)

The equation λ^TQ_iλ = 1 defines what is known in algebraic geometry as a quadric hypersurface ${{\mathcal{L}}}_{i}\subset {{\mathbb{F}}}_{2}^{d}$, i.e., a generalization of a conic section, see Fig. 2. As a consequence, $\det ({A}_{1})=\ldots =\det ({A}_{n})=1$ defines the intersection ${\mathcal{L}}=\mathop{\bigcap }\nolimits_{i = 1}^{n}{{\mathcal{L}}}_{i}$. Note that the points in ${\mathcal{L}}$ are in one-to-one correspondence with single-qubit Clifford layers that transform P₁, …, P_m into stabilizers of $\left\vert \Gamma \right\rangle$. In particular, ${\mathcal{L}}$ is non-empty if and only if it is possible to diagonalize P₁, …, P_m with a circuit based on Γ as in Fig. 1.

**Fig. 2: Illustration of the solution space ${\mathcal{L}}$ (yellow dots) of Eq. (6) for a fixed graph Γ in a hypothetical scenario with n = 3 qubits.**

In our setting, every quadric hypersurface ${{\mathcal{L}}}_{i}$ can be written as the union of at most four affine spaces. Therefore, their intersection ${\mathcal{L}}=\mathop{\bigcap }\nolimits_{i = 1}^{n}{{\mathcal{L}}}_{i}$ is the union of at most 4ⁿ intersections of affine subspaces, each of which is itself an affine subspace. Using Gaussian elimination over ${{\mathbb{F}}}_{2}$, we can efficiently probe whether such an intersection is non-empty and, once a point $\lambda \in {\mathcal{L}}$ is found, we have successfully constructed a HT circuit.

What is gained? At first glance, not much: we still have 2^e choices for Γ ⊂ Γ_con and up to 4ⁿ potential intersections in which a solution might lie. Hence, also in this reformulation, a brute force approach will be inefficient in the worst case. It turns out, however, that the efficient but probabilistic version, where only a polynomial number of subgraphs Γ ⊂ Γ_con and intersections is probed, has a sufficiently high empirical success probability to facilitate the construction of HT circuits for various practical problems as we will demonstrate later in Fig. 5.

For pedagogical reasons, we have already revealed the geometry behind Eq. (6). Let us now delve into the corresponding algebra and show that the picture in Fig. 2 is correct. To solve Eq. (6) in practice, we have to bring $\det (A)=1$ into a form a classical computer can deal with. First, we exploit that the blocks of A in Eq. (3) are diagonal. Thus, we can replace A by the vector

$${\bf{a}}=\left[\begin{array}{c}{{\bf{a}}}^{xx}\\ {{\bf{a}}}^{xz}\\ {{\bf{a}}}^{zx}\\ {{\bf{a}}}^{zz}\end{array}\right]\in {{\mathbb{F}}}_{2}^{4n}$$

(8)

where the vector ${{\bf{a}}}^{xx}=({a}_{1}^{xx},\ldots ,{a}_{n}^{xx})\in {{\mathbb{F}}}_{2}^{n}$ defines the xx-block via A^xx = diag(a^xx), and likewise for a^xz, a^zx, and a^zz. In this notation, Eq. (6) reads Ma = 0 for the (mn × 4n)-matrix

$$M=\left[\begin{array}{cccc}\Gamma \,\text{diag}\,({{\bf{r}}}_{1})&\Gamma \,\text{diag}\,({{\bf{s}}}_{1})&\,\text{diag}\,({{\bf{r}}}_{1})&\,\text{diag}\,({{\bf{s}}}_{1})\\ \vdots &\vdots &\vdots &\vdots \\ \Gamma \,\text{diag}\,({{\bf{r}}}_{m})&\Gamma \,\text{diag}\,({{\bf{s}}}_{m})&\,\text{diag}\,({{\bf{r}}}_{m})&\,\text{diag}\,({{\bf{s}}}_{m})\end{array}\right].$$

(9)

Via Gaussian elimination, we can efficiently compute a basis ${{\bf{v}}}_{1},\ldots ,{{\bf{v}}}_{d}\in {{\mathbb{F}}}_{2}^{4n}$ of the null space of M. Thus, every vector ${\bf{a}}\in {{\mathbb{F}}}_{2}^{4n}$ with Ma = 0 is of the form

$${\bf{a}}=\mathop{\sum }\limits_{j=1}^{d}{\lambda }_{j}{{\bf{v}}}_{j}$$

(10)

for some vector ${\boldsymbol{\lambda }}\in {{\mathbb{F}}}_{2}^{d}$. In order to correspond to a physical solution, the vector a also needs to fulfill

$$\det ({A}_{i})={a}_{i}^{xx}{a}_{i}^{zz}+{a}_{i}^{zx}{a}_{i}^{xz}=1$$

(11)

for every i ∈ {1, …, n} as this will allow us to invert the irreducible representation ${{\mathcal{C}}}_{n}\to \,{\text{GL}}\,({{\mathbb{F}}}_{2}^{2n}),U\mapsto A$. Based on Ansatz (10), we find

$${a}_{i}^{xx}{a}_{i}^{zz}=\mathop{\sum }\limits_{j,{j}^{{\prime} }=1}^{d}{\lambda }_{j}{v}_{i,j}^{xx}{v}_{i,{j}^{{\prime} }}^{zz}{\lambda }_{{j}^{{\prime} }}={{\boldsymbol{\lambda }}}^{{\rm{T}}}\left({{\bf{x}}}_{i}{{\bf{z}}}_{i}^{{\rm{T}}}\right){\boldsymbol{\lambda }}$$

(12)

and similarly ${a}_{i}^{xz}{a}_{i}^{zx}={{\boldsymbol{\lambda }}}^{{\rm{T}}}({{\bf{w}}}_{i}{{\bf{y}}}_{i}^{{\rm{T}}}){\boldsymbol{\lambda }}$, where we have introduced ${{\bf{x}}}_{i}=({v}_{i,1}^{xx},\ldots ,{v}_{i,d}^{xx})$, ${{\bf{z}}}_{i}=({v}_{i,1}^{zz},\ldots ,{v}_{i,d}^{zz})$, ${{\bf{w}}}_{i}=({v}_{i,1}^{xz},\ldots ,{v}_{i,d}^{xz})$, and ${{\bf{y}}}_{i}=({v}_{i,1}^{zx},\ldots ,{v}_{i,d}^{zx})$. Further inserting these expressions into Eq. (11), we obtain $\det ({A}_{i})={{\boldsymbol{\lambda }}}^{{\rm{T}}}({{\bf{x}}}_{i}{{\bf{z}}}_{i}^{{\rm{T}}}+{{\bf{w}}}_{i}{{\bf{y}}}_{i}^{{\rm{T}}}){\boldsymbol{\lambda }}$ and, by defining the matrix

$${Q}_{i}={{\bf{x}}}_{i}{{\bf{z}}}_{i}^{{\rm{T}}}+{{\bf{w}}}_{i}{{\bf{y}}}_{i}^{{\rm{T}}}\in {{\mathbb{F}}}_{2}^{d\times d},$$

(13)

we finally arrive at Eq. (7). To proceed, we point out that $\det ({A}_{i})={{\boldsymbol{\lambda }}}^{{\rm{T}}}{Q}_{i}{\boldsymbol{\lambda }}=1$ is equivalent to

$$\left[\begin{array}{c}{{\boldsymbol{\lambda }}}^{{\rm{T}}}{{\bf{x}}}_{i}\\ {{\boldsymbol{\lambda }}}^{{\rm{T}}}{{\bf{z}}}_{i}\\ {{\boldsymbol{\lambda }}}^{{\rm{T}}}{{\bf{w}}}_{i}\\ {{\boldsymbol{\lambda }}}^{{\rm{T}}}{{\bf{y}}}_{i}\end{array}\right]\in \left\{\left[\begin{array}{c}0\\ 0\\ 1\\ 1\end{array}\right],\left[\begin{array}{c}0\\ 1\\ 1\\ 1\end{array}\right],\left[\begin{array}{c}1\\ 1\\ 0\\ 0\end{array}\right],\left[\begin{array}{c}1\\ 1\\ 0\\ 1\end{array}\right],\left[\begin{array}{c}1\\ 0\\ 1\\ 1\end{array}\right],\left[\begin{array}{c}1\\ 1\\ 1\\ 0\end{array}\right]\right\}.$$

(14)

For example, if λ^Tx_i = 0, the value of λ^Tz_i is irrelevant and we need λ^Tw_i = λ^Ty_i = 1 for $\det ({A}_{i})=1$. These are the first two cases in Eq. (14). Defining affine hyperplanes ${{\mathcal{X}}}_{i}^{(c)}=\{{\boldsymbol{\lambda }}\in {{\mathbb{F}}}_{2}^{d}\,| \,{{\boldsymbol{\lambda }}}^{{\rm{T}}}{{\bf{x}}}_{i}=c\}$ for $c\in {{\mathbb{F}}}_{2}$, and similarly ${{\mathcal{Z}}}_{i}^{(c)}$, ${{\mathcal{W}}}_{i}^{(c)}$, and ${{\mathcal{Y}}}_{i}^{(c)}$, we can rephrase these two cases as λ being contained in the affine subspace

$${{\mathcal{A}}}_{i}={{\mathcal{X}}}_{i}^{(0)}\cap {{\mathcal{W}}}_{i}^{(1)}\cap {{\mathcal{Y}}}_{i}^{(1)}.$$

(15)

For the middle two cases in Eq. (14), the roles of (x, z) and (w, y) are interchanged: these cases are equivalent to λ being contained in the affine subspace

$${{\mathcal{B}}}_{i}={{\mathcal{W}}}_{i}^{(0)}\cap {{\mathcal{X}}}_{i}^{(1)}\cap {{\mathcal{Z}}}_{i}^{(1)}.$$

(16)

Similarly, the remaining two cases in Eq. (14) lead to

$${{\mathcal{C}}}_{i}={{\mathcal{X}}}_{i}^{(1)}\cap {{\mathcal{Z}}}_{i}^{(0)}\cap {{\mathcal{W}}}_{i}^{(1)}\cap {{\mathcal{Y}}}_{i}^{(1)} \,\text{and}\,$$

(17)

$${{\mathcal{D}}}_{i}={{\mathcal{X}}}_{i}^{(1)}\cap {{\mathcal{Z}}}_{i}^{(1)}\cap {{\mathcal{W}}}_{i}^{(1)}\cap {{\mathcal{Y}}}_{i}^{(0)},$$

(18)

respectively. Finally, by introducing

$${{\mathcal{L}}}_{i}={{\mathcal{A}}}_{i}\cup {{\mathcal{B}}}_{i}\cup {{\mathcal{C}}}_{i}\cup {{\mathcal{D}}}_{i},$$

(19)

we obtain that $\det ({A}_{i})=1$ is equivalent to ${\boldsymbol{\lambda }}\in {{\mathcal{L}}}_{i}$. We defer further technical details about how to best implement the efficient probabilistic solver based on Gaussian elimination as well as additional mathematical insights for simplifying the problem to the methods section.

Reformulation as an optimization problem

It is possible to restate Eq. (6) as the feasibility problem of an integer quadratically constrained program (IQP), a special case of a mixed integer quadratically constrained program (MIQCP) which can be tackled with powerful numerical solvers such as Gurobi⁴⁴. The problem instance is encoded into the binary matrices $R,S\in {{\mathbb{F}}}_{2}^{n\times m}$, which host the exponent vectors of the Pauli operators P₁, …, P_m to be diagonalized, and into the connectivity graph Γ_con with e edges. A straightforward albeit computationally expensive procedure is the following, which we refer to as naive numerical solver:

Introduce 4n free binary (Boolean) variables ${\bf{a}}=({a}_{1}^{xx},\ldots ,{a}_{n}^{zz})$ as vectorizations of matrices A^xx, …, A^zz as stated in Eq. (8).
Introduce e binary (Boolean) variables γ_i,j, one for each edge in the connectivity graph Γ_con. The final values of γ_i,j determine whether or not the corresponding hardware-supported CZ gates are used.
Introduce a matrix $N=({\nu }_{j,k})\in {{\mathbb{Z}}}^{n\times m}$ of n × m integer slack variables and define nm quadratic constraints, one for each entry in the matrix equation ΓA^xxR + ΓA^xzS = A^zxR + A^zzS + 2N. Note that the term 2N exploits Eq. (6) being defined over ${{\mathbb{F}}}_{2}^{n\times m}$, i.e., equality only needs to hold modulo 2.
Define quadratic constraints, ${a}_{i}^{xx}{a}_{i}^{zz}+{a}_{i}^{zx}{a}_{i}^{xz}=1$ for each i ∈ {1, …, n}, to ensure invertibility as a determinant constraint. Here, there is no need to introduce additional integer slack variables that account for the fact that $\det ({A}_{i})=1$ is supposed to hold modulo 2 because 1 is the only odd value that the expression ${a}_{i}^{xx}{a}_{i}^{zz}+{a}_{i}^{zx}{a}_{i}^{xz}$ can possibly take.
Specify a target function to be minimized, e.g., the number of two-qubit gates in the final circuit, cost(a, Γ) = ∑_i<jγ_i,j.

In general, IQP is computationally hard in worst case complexity: it is in NP, while the decision version is NP-complete⁴⁵. We can reduce the complexity of the naive IQP above by leveraging our mathematical insights from the previous subsection. Treating only one fixed subgraph Γ ⊂ Γ_con at a time, we can compute Q₁, …, Q_n via Eq. (13). Here, however, we regard the entries of Q_i as real numbers rather than equivalence classes of integers modulo 2. As a direct alternative to intersecting affine subspaces, we can attempt to find a solution ${\boldsymbol{\lambda }}\in {\mathcal{L}}$ by executing the following IQP which we call informed numerical solver:

Introduce d binary variables λ = (λ₁, …, λ_d) as well as n integer slack variables μ₁, …, μ_n.
Define n quadratic constraints, λ^TQ_iλ = 1 + 2μ_i.
If necessary, specify a trivial target function.

Note that, by considering only one subgraph Γ at a time, the quadratic constraints in Eq. (6) become linear constraints over Boolean variables. This allows us to efficiently identify the ambient space in Fig. 2, however, the constraints ensuring invertibility of A still define quadratic constraints, which can be cast into the form of an IQP. The structured problems encountered here feature only n quadratic constraints and can be solved in practice for larger system sizes. Empirically, we will demonstrate later in Fig. 5 that highly-optimized, commercially available MIQCP solvers constitute a viable alternative to our provably-efficient algebraic solver.

Solver overview

For an overview of all solvers, see Table 3. In principle, each solver (except for naive num.) can be executed in an exhaustive mode, where all s(n) = 2^e subgraphs of Γ_con instead of only s(n) = poly(n) are probed. In the exhaustive mode, every solver has an exponential runtime which limits their range of applicability. For small system sizes (e.g., n ≲ 12 in Fig. 5), we recommend using the exhaustive, informed numerical solver as it performs best in practice. For larger system sizes, we recommend either the restricted, informed numerical solver, which can be very fast, or the restricted algebraic solver, whose runtime can be easily controlled, see methods section “Runtime Analysis”. Hereby, the hyperparameter s(n) and, if applicable, also c(n), should be selected appropriately, see Fig. 5, methods, and SM Sec. VII for further details and examples.

Table 3 Overview of our solvers for Equation (6)

Full size table

Experimental demonstration

Next, we demonstrate that our theoretical framework can enhance the efficiency of today’s state-of-the-art quantum computers in practice. A central task is to estimate the expectation value 〈O〉 = Tr[ρO] for an observable O of interest and a state ρ prepared on the quantum computer. For a fixed shot budget (total number of available circuit executions), the goal is to estimate 〈O〉 as accurately as possible. As a concrete example, we consider an eight-qubit molecular Hamiltonian $O=\mathop{\sum }\nolimits_{i = 1}^{M}{c}_{i}{P}_{i}$ with M = 184 Pauli operators that represents a four-atomic linear hydrogen chain, see SM Sec. XIII for details. To enable a fair comparison of different estimation procedures, we restrict ourselves to the paradigm of non-overlapping Pauli groupings with reliably-executable readout circuits²⁴. To the best of our knowledge, the previous state-of-the-art method within this paradigm is the collection of single-qubit readout circuits that arise from applying the Sorted Insertion algorithm²¹ that groups {P₁, …, P₁₈₄} into, in this case, ${N}_{{\rm{TPB}}}^{{\rm{circs}}}=35$ QWC subsets. As an alternative, we propose to use ${N}_{{\rm{HT}}}^{{\rm{circs}}}=10$ HT readout circuits which allow us to measure the same M = 184 Pauli operators and which contain no more than four linearly-connected CZ gates. This reduction in the number of readout circuits is possible because being jointly-HT diagonalizable is a less stringent requirement than QWC. What matters in the end, however, is the error $\epsilon =| {E}_{\exp }-{E}_{{\rm{ideal}}}|$ by which the experimentally measured energy ${E}_{\exp }$ differs from the ideal one. Hereby, one should minimize ϵ by optimally distributing the available shots among the ${N}_{{\rm{TPB}}}^{{\rm{circs}}}=35$ or ${N}_{{\rm{HT}}}^{{\rm{circs}}}=10$ readout circuits²¹, see SM Sec. IV. To enable a pronounced comparison between the two readout methods, we select a product target state vector $\left\vert \Psi \right\rangle =\left\vert {\psi }_{1}\right\rangle \otimes \ldots \otimes \left\vert {\psi }_{8}\right\rangle$ that can be prepared with a high fidelity. This state has a large energy ${E}_{{\rm{ideal}}}=\left\langle \Psi \right\vert O\left\vert \Psi \right\rangle =-0.49264\,\,\text{Ha}\,$ compared to the ground state energy, $\mathop{\min }\limits_{\Phi }\left\langle \Phi \right\vert O\left\vert \Phi \right\rangle =-2.26752\,\,\text{Ha}\,$, where all reported energies include Coulomb repulsion between the nuclei. We perform experiments for both readout methods and present the results in Fig. 3. The error ϵ is plotted in Fig. 3a as a function of the shot budget N^shots for both TPB (blue circles) and HT (green diamonds). We compare the experimental results (dark) to classical, noise-free simulations (bright). In Fig. 3b we display the shot reduction ratio ${N}_{{\rm{TPB}}}^{{\rm{shots}}}/{N}_{{\rm{HT}}}^{{\rm{shots}}}$ for achieving a target error ϵ that our HT circuits offer over conventional TPBs. For very low budgets of a few hundred shots in total, we see that the experimental data perfectly agree with the simulation. This is because sampling errors dominate here. While the simulated errors generally decrease as $1/\sqrt{{N}^{{\rm{shots}}}}$, the experimental errors eventually saturate at a finite bias b stemming mainly from noise in the diagonalization circuits and the readout-error-mitigated⁴⁶Z-measurements. Unsurprisingly, b_HT = 0.0299(2) Ha is larger than b_TPB = 0.0284(2)Ha. For shot budgets below 10⁴, however, we observe that ϵ is smaller for HT than for TPB. Hence, our HT readout circuits outperform TPBs in the low-shot regime, which is of practical importance, see SM Sec. I.

**Fig. 3: Experimental shot reduction of HT readout circuits over TPBs.**

Discussion

In the remainder of this article, we discuss the reusability potential of HT Pauli groupings, investigate the performance of our solvers, and highlight further potential use cases for our theoretical framework.

Every grouping of a fixed set {P₁, …, P_M} into jointly-diagonalizable subsets has vast reusability potential: for a fixed observable $O=\mathop{\sum }\nolimits_{i = 1}^{M}{c}_{i}{P}_{i}$, one can estimate Tr[ρO] for countless experimental states ρ, e.g., different eigenstates of O as well as the myriad of VQE trial states that are encountered until such eigenstates are found. Similarly, in simulations of chemical reactions, Tr[ρ(t)O] is mapped out for a time series of quantum states ρ(t).

Moreover, for any linear combination ${O}^{{\prime} }=\mathop{\sum }\nolimits_{i = 1}^{M}{c}_{i}^{{\prime} }{P}_{i}$ of the same Pauli operators, it is possible to estimate $\langle {O}^{{\prime} }\rangle$ with the same readout circuits as for 〈O〉. Therefore, a single Pauli grouping can suffice to map out low-energy Born-Oppenheimer surfaces via, e.g., VQE. At the same time, we are guaranteed that the quality of the reused grouping does not deteriorate abruptly as the estimated shot reduction $\hat{R}$ depends continuously on the nuclear coordinates²¹.

We illustrate this fact in Fig. 4 for the example of an eight-qubit Hamiltonian $O(d)=\mathop{\sum }\nolimits_{i = 1}^{184}{c}_{i}(d){P}_{i}$ that represents a four-atomic linear hydrogen chain for which the interatomic spacing d is varied. For different Pauli groupings of O(d), we plot $\hat{R}$, which should be regarded as a state-independent estimate of the shot reduction ratio ${N}_{{\rm{GPM}}}^{{\rm{shots}}}/{N}_{{\rm{IPM}}}^{{\rm{shots}}}$ (cf. Fig. 3b), where ${N}_{{\rm{GPM}}}^{{\rm{shots}}}$ and ${N}_{{\rm{IPM}}}^{{\rm{shots}}}$, respectively, denote the number of shots required to measure 〈O(d)〉 to a fixed precision ϵ if grouped Pauli measurements (GPM) and individual Pauli measurements (IPM) are performed. For example, O(d = 1.0 Å) coincides with the Hamiltonian from Fig. 3, and the estimated shot reduction ratio of HT over TPB, ${\hat{R}}_{{\rm{HT}}}/{\hat{R}}_{{\rm{TPB}}}\approx 1.76$, is close to the state-dependent value, R_HT/R_TPB ≈ 1.87. As explained above, we can see in Fig. 4 that $\hat{R}$ depends continuously on d if a fixed Pauli grouping is reused (dark dashed lines). Also shown is the piecewise continuous dependence of ${\hat{R}}_{{\rm{GC}}}$ and ${\hat{R}}_{{\rm{TPB}}}$ if a new Pauli grouping is recomputed for every value of d (bright solid lines), where the jumps unveil the values of d at which the output of the Sorted Insertion algorithm changes. Since Sorted Insertion is a greedy algorithm, recomputing the Pauli grouping can even diminish the value of $\hat{R}$. Finally, note that Pauli groupings can be reused for numerous other applications in the context of parameter-shift rules^15,16,47,48.

It is straightforward to adapt Sorted Insertion to HT Pauli groupings, see SM Sec. V. Instead of checking if a set of Pauli operators (qubitwise) commutes, one has to construct a HT readout circuit using one of the solvers from Table 3. In Fig. 5, we investigate the performance of our various solvers by applying a modified Sorted Insertion algorithm to three classes of paradigmatic Hamiltonians. In all cases, we observe in Fig. 5a–c that the estimated shot reduction ratio, ${\hat{R}}_{{\rm{HT}}}/{\hat{R}}_{{\rm{TPB}}}$, takes values in between 1.3 and 3.5. Therefore, our HT readout circuits consistently outperform conventional TPBs.

**Fig. 5: Performance of a greedy Pauli grouping algorithm with various solvers from Table 3.**

While Sorted Insertion has a runtime of ${\mathcal{O}}({M}^{2}n)$, the runtime of our adaptation is given by ${\mathcal{O}}({M}^{2}f(n))$, where M still denotes the number of Pauli operators in the observable O, and f(n) is the complexity of the selected solver, e.g., $f(n)={\mathcal{O}}({2}^{e}{6}^{n}{n}^{3})$ for the brute-force algebraic solver and $f(n)={\mathcal{O}}(\,{\text{poly}}\,(n))$ for the restricted algebraic solver, see methods section “Runtime Analysis”. For molecular hydrogen chain Hamiltonians for example, two polynomial fits (green dashed lines) in Fig. 5d reveal total runtimes t⁽¹⁾ ∝ n^9.8 and t⁽²⁾ ∝ n^9.4. Since the number of Pauli operators is here given by M ≈ 0.17 × n^3.68, we can conclude that the restricted algebraic solver has an empirical runtime f(n) lying somewhere in between ${\mathcal{O}}({n}^{2.0})$ and ${\mathcal{O}}({n}^{2.4})$, which is in agreement with the expected runtime of $f(n)={\mathcal{O}}({n}^{3})$, see methods.

For the example of random Hamiltonians, we compare in Fig. 5b the algebraic solver with the informed numerical solver from Table 3, both in combination with an exhaustive (exh.) search over all 2^e subgraphs Γ ⊂ Γ_con. We find that both solvers produce HT Pauli groupings of the same quality, which we believe to be near optimal for the assumed linear connectivity constraint. By construction, both exhaustive solvers are inefficient, however, we observe in Fig. 5e that the numerical solver is much faster in practice. This is unsurprising as this option leverages a highly-optimized MIQCP solver.

Since the exhaustive numerical solver is our best option for constructing near-optimal HT Pauli groupings, we can now turn to the question about how much the quality (as measured by ${\hat{R}}_{{\rm{HT}}}$) of the HT Pauli grouping decreases when we replace the brute-force solver by our provably-efficient restricted algebraic solver. For the example of the Hubbard model we observe in Fig. 5c that both the restricted numerical solver and the restricted algebraic solver produce HT Pauli groupings with the same qualitative behavior of $\hat{R}$ as the (near-optimal) exhaustive numerical solver. For all solvers, $\hat{R}$ first increases with the number of qubits n, before it drops again. We attribute this saturation effect to the linear hardware connectivity constraint because ${\hat{R}}_{{\rm{GC}}}/{\hat{R}}_{{\rm{TPB}}}$ continues to grow as shown in Fig. 5i (also see Fig. 1 in SM Sec. II). Whether a similar growth is retained with 2-dimensional HT readout circuits deserves further investigation. For n = 14, the exhaustive numerical solver constructs a grouping with ${\hat{R}}_{{\rm{HT}}}/{\hat{R}}_{{\rm{TPB}}}\approx 3.34$ after 5 h. By carefully balancing the hyperparameters of the restricted algebraic solver (see “Methods”), we find an even better grouping with ${\hat{R}}_{{\rm{HT}}}/{\hat{R}}_{{\rm{TPB}}}\approx 3.42$ in only 11 min (red star).

The discussion above shows that the classical bottleneck of our HT Pauli grouper can be overcome. To corroborate this claim, we propose and test a rudimentary HT Pauli grouper (Algorithm 2 in SM Sec. VI) for which the enabled runtime savings outweigh the classical preprocessing cost for a 52-qubit example. This demonstrates that the advantage of HT readout circuits can be scaled up to meaningful system sizes. Finally, let us point out that our approach also works for more sophisticated molecular basis sets, see SM Sec. XIV.

In this article, we have introduced a theoretical framework for the construction of diagonalization circuits that can be tailored to any given hardware connectivity. Our starting point was the observation of the fact that every set of commuting Pauli operators can be cast into the stabilizer group of a stabilizer state which is local-Clifford equivalent to a graph state. The Pauli operators can thus be diagonalized with a quantum circuit that completely avoids SWAP gates whenever this graph state matches the connectivity of the quantum computer. We derived an algebraic criterion for the existence of such hardware-tailored (HT) diagonalization circuits and introduced solvers for their construction. An important empirical observation is that in many cases it is not necessary to apply unconstrained diagonalization circuits because also HT circuits can be constructed. For super- and semiconducting chips, for which the connectivity graph has a bounded degree, HT circuits have a constant depth. In comparison to state preparation circuits such as UCCSD, which have a gate count of ${\mathcal{O}}({n}^{4})$ and a gate depth of ${\mathcal{O}}({n}^{3})$, the circuit complexity of HT readout circuits is therefore negligible¹⁴. The construction of HT circuits can be computationally demanding but is worthwhile given their reusability potential. Finally, we have demonstrated the advantage of our approach over previous ones both in theory and in experiment.

There are multiple ways to further improve the efficiency of our solvers for the construction of HT diagonalization circuits, see methods section “Ideas for Improving our Solvers”. Furthermore, it would be worthwhile to combine the paradigm of HT readout circuits with complementary state-of-the-art methods for lowering shot requirements such as iterative measurement allocation and iterative coefficient splitting²⁵.

Lowering shot requirements, e.g., in variational algorithms as discussed above, is just one of many potential applications for HT diagonalization circuits. Other, equally important use cases arise in the contexts of classical shadows²⁸, Hamiltonian time simulation^49,50,51, and quantum error correction⁵². For instance, an encoding circuit for a stabilizer quantum error-correcting code is the same as a time-reversed diagonalization circuit for its stabilizer group. For an outline how Hamiltonian exponentiation could potentially benefit from our HT diagonalization circuits, see SM Sec. IX.

Methods

Choices of Hamiltonians

All molecular hydrogen chain Hamiltonians were computed in the STO-3G minimal basis using Qiskit nature⁵³, either in combination with pyquante⁵⁴ or PySCF⁵⁵. More sophisticated basis sets are discussed in SM Sec. XIV. For Table 2, Figs. 3, and 4, we use PyQuante and the Bravyi-Kitaev fermion-to-qubit mapper to obtain n-qubit Hamiltonians representing a hydrogen chain with $\frac{n}{2}$ nuclei at an equidistant spacing of d, where we fix d = 1.0 Å in Table 2 and Fig. 3. The exact numbers of Pauli operators M(n) of the n-qubit Hamiltonians in Table 2 are given by M(20) = 7150, M(40) = 116594, M(60) = 594954, M(80) = 1886542, M(100) = 4611050, and M(120) = 9559318. For the hydrogen chain Hamiltonians in the left column of Fig. 5, on the other hand, we use PySCF and the parity-encoding fermion-to-qubit mapper, resulting in Hamiltonians with only n = 2n_at − 2 qubits, where n_at is the number of hydrogen atoms; also here the interatomic spacing is chosen as d = 1.0 Å.

In Fig. 5b, we average over twenty random Hamiltonians of the form $O=\mathop{\sum }\nolimits_{i = 1}^{M}{c}_{i}{P}_{i}$ for each choice of qubit number n and number of Pauli operators M. For every O, the Pauli operators P_i ∈ {I, X, Y, Z}^⊗n\{I^⊗n} and coefficients c_i ∈ [−1, 1] are drawn uniformly at random. Error bars show one standard deviation. For an in-depth investigation of the results, see SM Sec. VIII.

Figure 5 c features the Hubbard model of L fermionic modes ${\hat{c}}_{{k}_{j},\sigma }^{\dagger }$ with momentum k_j = 2πj/L and spin σ on a 1-dimensional lattice with L sites and periodic boundary conditions. Its Hamiltonian is given by

$$\begin{array}{rcl}O&=&\frac{U}{L}\mathop{\sum }\limits_{i,j,l=1}^{L}{\hat{c}}_{{k}_{i}-{k}_{l},\uparrow }^{\dagger }{\hat{c}}_{{k}_{j}+{k}_{l},\downarrow }^{\dagger }{\hat{c}}_{{k}_{j},\downarrow }{\hat{c}}_{{k}_{i},\uparrow }+\mathop{\sum }\limits_{j=1}^{L}\sum _{\sigma \in \{\uparrow ,\downarrow \}}{\epsilon }_{{k}_{j}}{\hat{c}}_{{k}_{j},\sigma }^{\dagger }{\hat{c}}_{{k}_{j},\sigma }\end{array}$$

(20)

where U ≥ 0 is the Coulomb energy, ${\epsilon }_{{k}_{j}}=-2t\cos ({k}_{j})$ is the dispersion relation for non-interacting (U = 0) fermions, and t ≥ 0 is the hopping strength⁵⁶. We use the block-spin Jordan-Wigner encoding to convert this Hamiltonian into a linear combination of Pauli operators. In Fig. 5, we assume U = 1 and t = 1 to obtain concrete values for $\hat{R}$.

Choices of hyperparameters

Here we report the choices of hyperparameters of our solvers that were used to create Fig. 5. In all cases, we tailor the readout circuits to a linear connectivity by applying Algorithm 1 from SM Sec. V. The total runtime and the quality of the resulting Pauli grouping depends on the choice of the solver from Table 3 and its hyperparameters. Unless specified otherwise, all computations were carried out on an 18 core Intel Xeon CPU E5-2697 v4 @2.30GHz device (Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.).

The exhaustive informed numerical solver (dark blue triangles in Fig. 5d,f and dark markers with dotted lines in Fig. 5e) searches over all 2ⁿ⁻¹ circuit templates Γ ⊂ Γ_con and constructs readout circuits (one for each subgraph Γ) by solving the informed IQP problem (if possible). Then, the best circuit is chosen to assign a collection of Pauli operators to a joint measurement, see SM Sec. V for further details. The exhaustive algebraic solver (bright markers with solid lines in Fig. 5e) similarly searches over all 2ⁿ⁻¹ circuit templates but applies the algebraic brute-force solver instead of the informed numerical solver. Both exhaustive solvers have an exponential runtime because they take all subgraphs Γ ⊂ Γ_con into account.

The restricted numerical solver has one hyperparameter: the number s(n) of subgraphs Γ ⊂ Γ_con that are probed. In our current implementation, the specific choice of subgraphs is taken uniformly at random. For the Hubbard model (Fig. 5f), we work with $s(n)=\min \{1.5{n}^{2}-0.5n,{2}^{n-1}\}$ random subgraphs.

The restricted algebraic solver has two hyperparameters: the number s(n) of random subgraphs and the cutoff c(n) defined before Eq. (27). For the Hubbard model (Fig. 5f), we work with $s(n)=\min \{{n}^{2},{2}^{n-1}\}$ and $c(n)=\lfloor {\log }_{2}(n)\rfloor$, except for the fine-tuned parameter choice (red star) where the number of subgraphs is increased from s(n) = n² = 196 to s = 1000. For hydrogen chains (Fig. 5d), on the other hand, we use two different choices of constant hyperparameters:

(1)
s(n) = 2000 and c(n) = 5.
(2)
s(n) = 1000 and c(n) = 3.

Besides HT Pauli groupings, we also compute GC Pauli groupings in Figs. 4 and 5 and groupings of O into TPBs in Figs. 3–5. These groupings were obtained by applying the Sorted Insertion algorithm²¹ with the insertion conditions “commute” and “qubitwise commute”, respectively.

For additional numerical investigations about how the choice of hyperparameters influences the performance of the HT Pauli grouping algorithm, see SM Secs. VII and XIV.

Technical details on the algebraic solver

Here we continue our technical discussion about how to algebraically construct HT diagonalization circuits. For every qubit i ∈ {1, …, n}, there are three cases how the hypersurface ${{\mathcal{L}}}_{i}=\{{\boldsymbol{\lambda }}\in {{\mathbb{F}}}_{2}^{d}\,| \,{{\boldsymbol{\lambda }}}^{{\rm{T}}}{Q}_{i}{\boldsymbol{\lambda }}=1\}$ could look like: since the image of the ${{\mathbb{F}}}_{2}$-linear map defined by the matrix Q_i is contained in the span of x_i and w_i, the rank of Q_i can only take one of the three values: 0, 1, and 2. If ${\text{rank}}_{{{\mathbb{F}}}_{2}}({Q}_{i})=0$, i.e., Q_i = 0, then ${{\mathcal{L}}}_{i}=\varnothing$ is empty, which implies ${\mathcal{L}}=\mathop{\bigcap }\nolimits_{i = 1}^{n}{{\mathcal{L}}}_{i}=\varnothing$ and proves the non-existence of a HT diagonalization circuit for the fixed choice of P₁, …, P_m and Γ. On the other hand, if ${\text{rank}}_{{{\mathbb{F}}}_{2}}({Q}_{i})=1$, e.g., ${{\bf{x}}}_{i}{{\bf{z}}}_{i}^{{\rm{T}}}=0$ but ${{\bf{w}}}_{i}{{\bf{y}}}_{i}^{{\rm{T}}}\ne 0$, then we need ${{\boldsymbol{\lambda }}}^{{\rm{T}}}{{\bf{w}}}_{i}{{\bf{y}}}_{i}^{{\rm{T}}}{\boldsymbol{\lambda }}=1$, which is equivalent to λ^Tw_i = λ^Ty_i = 1. Thus, this case is degenerate and the hypersurface collapses to an affine subspace

$${{\mathcal{L}}}_{i}={{\mathcal{W}}}_{i}^{(1)}\cap {{\mathcal{Y}}}_{i}^{(1)}.$$

(21)

For a detailed explanation about how to compute intersections of affine subspaces numerically, see SM Sec. X. Finally, in the most general case of ${\text{rank}}_{{{\mathbb{F}}}_{2}}({Q}_{i})=2$, the hypersurface ${{\mathcal{L}}}_{i}$ defies a description simpler than the union of four affine subspaces as in Eq. (19). As we show next, the occurrence of rank-1 hypersurfaces can greatly simplify the problem.

Denote the number of rank-2 hypersurfaces by k. In our search for a point ${\boldsymbol{\lambda }}\in {\mathcal{L}}$, we first compute the intersection of all rank-1 hypersurfaces ${{\mathcal{L}}}_{i}$. After potentially relabeling some of the qubits, we can assume ${Q}_{i}={{\bf{x}}}_{i}{{\bf{z}}}_{i}^{{\rm{T}}}$ and ${Q}_{j}={{\bf{w}}}_{j}{{\bf{y}}}_{j}^{{\rm{T}}}$ for all i ∈ {1, …, l} and j ∈ {l + 1, …, n − k}. Then, the intersection of all rank-1 quadric hypersurfaces is given by

$$\begin{array}{ll}\mathop{\bigcap }\limits_{i=1}^{n-k}{{\mathcal{L}}}_{i}\,=\,\left\{{\boldsymbol{\lambda }}\in {{\mathbb{F}}}_{2}^{d}\,\left| \begin{array}{l}{{\boldsymbol{\lambda }}}^{{\rm{T}}}{{\bf{x}}}_{i}={{\boldsymbol{\lambda }}}^{{\rm{T}}}{{\bf{z}}}_{i}={{\boldsymbol{\lambda }}}^{{\rm{T}}}{{\bf{y}}}_{j}={{\boldsymbol{\lambda }}}^{{\rm{T}}}{{\bf{w}}}_{j}=1\\ \,{\text{for}}\, {\text{all}}\,1\le i\le l\, < \,j\le n-k\end{array}\right.\right\}\\\qquad\quad\,=\,\left\{{\boldsymbol{\lambda }}\in {{\mathbb{F}}}_{2}^{d}\left| \,C{\boldsymbol{\lambda }}={\bf{1}}\right.\right\},\end{array}$$

(22)

where 1 = [1, …, 1]^T and for the (2(n − k) × d)-matrix $C={[{{\bf{x}}}_{1},{{\bf{z}}}_{1},\ldots ,{{\bf{x}}}_{l},{{\bf{z}}}_{l},{{\bf{y}}}_{l+1},{{\bf{w}}}_{l+1},\ldots ,{{\bf{y}}}_{n-k},{{\bf{w}}}_{n-k}]}^{{\rm{T}}}$ whose rows are given by the row vectors ${{\bf{x}}}_{1}^{{\rm{T}}}$ etc. Using Gaussian elimination over ${{\mathbb{F}}}_{2}$, we can compute the reduced row-echelon form (RREF) of the extended matrix $[C,{\bf{1}}]\in {{\mathbb{F}}}_{2}^{2(n-k)\times (l+1)}$. If the last column of the RREF is a pivot column, we are in the solutionless case ${\mathcal{L}}=\varnothing$. Otherwise, this column is an offset vector for ${{\mathcal{L}}}_{1}\cap \ldots \cap {{\mathcal{L}}}_{n-k}$, while every non-pivot column of the RREF of C yields a basis vector in the standard way of linear algebra, see SM Sec. X for technical details.

Finally, we turn to the computationally most demanding part that deals with the rank-2 hypersurfaces ${{\mathcal{L}}}_{n-k+1},\ldots ,{{\mathcal{L}}}_{n}$. Similar to the matrix C in Eq. (22), we introduce a (4k × d)-matrix ${C}^{{\prime} }={[{{\bf{x}}}_{n-k+1},{{\bf{z}}}_{n-k+1},{{\bf{w}}}_{n-k+1},{{\bf{y}}}_{n-k+1},\ldots ,{{\bf{x}}}_{n},{{\bf{z}}}_{n},{{\bf{w}}}_{n},{{\bf{y}}}_{n}]}^{{\rm{T}}}$, i.e., for every j ∈ {1, …, k}, the four rows of ${C}^{{\prime} }$ from row 4j − 3 to row 4j are given by ${{\bf{x}}}_{n-k+j}^{{\rm{T}}}$, ${{\bf{z}}}_{n-k+j}^{{\rm{T}}}$, ${{\bf{w}}}_{n-k+j}^{{\rm{T}}}$, and ${{\bf{y}}}_{n-k+j}^{{\rm{T}}}$. Then, a vector ${\boldsymbol{\lambda }}\in {{\mathbb{F}}}_{2}^{d}$ is contained in ${{\mathcal{L}}}_{n-k+1}\cap \ldots \cap {{\mathcal{L}}}_{n}$ if and only if ${C}^{{\prime} }{\boldsymbol{\lambda }}={{\bf{b}}}_{{i}_{1}}\oplus \ldots \oplus {{\bf{b}}}_{{i}_{k}}$ for some i₁, …, i_k ∈ {1, …, 6}, where the vectors b₁, …, b₆ in the direct sum

$${{\bf{b}}}_{{i}_{1}}\oplus \ldots \oplus {{\bf{b}}}_{{i}_{k}}=\left[\begin{array}{c}{{\bf{b}}}_{{i}_{1}}\\ \vdots \\ {{\bf{b}}}_{{i}_{k}}\\ \end{array}\right]$$

(23)

are the six vectors from Eq. (14), i.e., b₁ = [0, 0, 1, 1]^T, etc. If it is our goal to unambiguously ascertain whether or not ${\mathcal{L}}$ is empty, we have to check an exponential number of cases. This is only feasible for small qubit numbers n or for graphs with small components, as we explain in SM Sec. XII. To save computing time, we treat all combinations of the vectors b_i at once by introducing (4 × 6^j)-matrices

$$B_j = [\mathop{\underbrace{{\mathbf{b}}_1,\ldots, {\mathbf{b}}_1}}\limits_{6^{j-1} {\text{times}}}, \ldots, \mathop{\underbrace{{\mathbf{b}}_6,\ldots, {\mathbf{b}}_6}}\limits_{6^{j-1} {\text{times}}}]$$

(24)

for j ∈ {1, …, k} and using them as blocks for enlarging ${C}^{{\prime} }$ to a matrix $B\in {{\mathbb{F}}}_{2}^{4k\times (d+{6}^{k})}$. Hereby, the four rows from row (4j − 3) to row 4j of B are given by

$$\left[{\begin{array}{l} {\mathbf{x}}_{n-k+j}^{\mathrm{T}}, \\ {\mathbf{z}}_{n-k+j}^{\mathrm{T}}, \\ {\mathbf{w}}_{n-k+j}^{\mathrm{T}}, \\ {\mathbf{y}}_{n-k+j}^{\mathrm{T}}, \end{array}} \,\,\displaystyle\underbrace{\displaystyle{B_{j}},\ldots, {\displaystyle{B}_{j}}}_{6^{k-j}\,{\rm{times}}}\,\,\right]\in {\mathbb{F}}_{2}^{4\times (d+6^{k})}.$$

(25)

In other words, B is the matrix that arises from ${C}^{{\prime} }$ by appending all vectors of the form ${{\bf{b}}}_{{i}_{1}}\oplus \ldots \oplus {{\bf{b}}}_{{i}_{k}}$ as additional columns. Next, we use Gaussian elimination to bring B to a row-echelon form; here, time can be saved as it is not necessary to compute the RREF of B. This reveals the non-pivot columns of B. Every non-pivot column of the form ${{\bf{b}}}_{{i}_{1}}\oplus \ldots \oplus {{\bf{b}}}_{{i}_{k}}$ indicates the existence of at least one vector ${\boldsymbol{\lambda }}\in {{\mathcal{L}}}_{n-k-1}\cap \ldots \cap {{\mathcal{L}}}_{n}$. However, we are looking for a λ that also lies in the affine space ${{\mathcal{L}}}_{1}\cap \ldots \cap {{\mathcal{L}}}_{n-k}$. To accomplish this, we start by computing a basis and an offset vector of the entire affine space

$$\left\{{\boldsymbol{\lambda }}\in {{\mathbb{F}}}_{2}^{d}\,| \,{C}^{{\prime} }{\boldsymbol{\lambda }}={{\bf{b}}}_{{i}_{1}}\oplus \ldots \oplus {{\bf{b}}}_{{i}_{k}}\right\}.$$

(26)

Then, we use the procedure explained in SM Sec. X to compute the intersection of the two affine subspaces in Eqs. (22) and (26). Since this results in a subset of ${\mathcal{L}}$, we can finish if we find a non-pivot column ${{\bf{b}}}_{{i}_{1}}\oplus \ldots \oplus {{\bf{b}}}_{{i}_{k}}$ in the right part of B for which this intersection is not empty. Otherwise, if this approach fails for all non-pivot columns, we can finally infer ${\mathcal{L}}=\varnothing$. In any case, we obtain a conclusive answer whether or not a layer of single-qubit Clifford gates exists such that the corresponding graph-based circuit (see main text, Fig. 1) diagonalizes the given set of commuting Pauli operators.

Restricting the algebraic solver

In the exhaustive approach of the previous subsection to algebraically solve Eq. (6) from the main text, we iterate through a number of affine subspaces that grow exponentially in the number k ≤ n of qubits i for which ${{\mathcal{L}}}_{i}$ is a rank-2 quadric hypersurface. For large problem sizes, this is infeasible and we can instead restrict the search to a smaller number c(n) ≤ k of rank-2 quadric hypersurfaces. For example, we can work with a constant cutoff

$$c(n)=\,\text{const.}\,$$

(27)

or with a logarithmically-growing cutoff

$$c(n)=\,\text{const.}\,\times \lfloor \log (n)\rfloor .$$

(28)

This restriction turns our algebraic solver (for attempting to find a solution ${\boldsymbol{\lambda }}\in {\mathcal{L}}$) into an efficient but probabilistic algorithm because only a polynomial number of intersections of affine subspaces will be probed, see methods section “Runtime Analysis” below. If no solution is found this way, we treat this case as if ${\mathcal{L}}$ was empty, i.e., we skip the current subgraph in our Pauli grouping algorithm from SM Sec. V.

We can incorporate the cutoff c = c(n) by replacing the matrix $B\in {{\mathbb{F}}}_{2}^{4k\times (d+{6}^{k})}$ in Eq. (25) by a smaller matrix ${B}^{{\prime} }\in {{\mathbb{F}}}_{2}^{(4c+3(k-c))\times (d+{6}^{c})}$. The first 4c rows of ${B}^{{\prime} }$ are again given by Eq. (25), but with 6^c−j instead of 6^k−j blocks of the form B_j. For the remaining part, we set the three rows from row (4c + 3j − 2) to row (4c + 3j) to

$$\left[\begin{array}{cccc}{{\bf{x}}}_{n-k+c+j}^{{\rm{T}}}&0&\cdots \,&0\\ {{\bf{w}}}_{n-k+c+j}^{{\rm{T}}}&1&\cdots \,&1\\ {{\bf{y}}}_{n-k+c+j}^{{\rm{T}}}&1&\cdots \,&1\\ \end{array}\right]\in {{\mathbb{F}}}_{2}^{3\times (d+{6}^{c})}$$

(29)

for all j ∈ {1, …, k − c}, i.e., we dispose of the z-row. In this way, we have effectively combined the two cases that correspond to b₁ and b₂ from Eq. (14). The remainder of our approach stays unchanged. By restricting from B to ${B}^{{\prime} }$, we will only be able to find solutions ${\boldsymbol{\lambda }}\in {{\mathcal{L}}}_{n-k+1}\cap \ldots \cap {{\mathcal{L}}}_{n}$ which are contained in the subspace

$$\left(\bigcap\limits_{i=n-k+1}^{n-k+c} \mathop{\underbrace{{\mathcal{A}}_i \cup{\mathcal{B}}_i \cup{\mathcal{C}}_i \cup{\mathcal{D}}_i}}\limits_{={{\mathcal{L}}}_i}\right) \cap \bigcap\limits_{i=n-k+c+1}^n \mathop{\underbrace{{\mathcal{A}}_i}}\limits_{\subset {{\mathcal{L}}}_i}.$$

(30)

Let us illustrate the working principle of the cutoff c. In the hypothetical example of Fig. 2 in the main text, we have highlighted the affine subspaces

$${{\mathcal{A}}}_{i}\subset {{\mathcal{L}}}_{i}={{\mathcal{A}}}_{i}\cup {{\mathcal{B}}}_{i}\cup {{\mathcal{C}}}_{i}\cup {{\mathcal{D}}}_{i}$$

(31)

with an increased line width to distinguish them from ${{\mathcal{B}}}_{i}$, ${{\mathcal{C}}}_{i}$, and ${{\mathcal{D}}}_{i}$. In this example, the number of rank-2 hypersurfaces is k = 3. Hence, there are four possible choices for the cutoff c ∈ {0, …, k}. For c = k = 3, we recover the original approach and are able to probe all six depicted intersection spaces (yellow dots). For c = 2, the restricted search space ${{\mathcal{L}}}_{1}\cap {{\mathcal{L}}}_{2}\cap {{\mathcal{A}}}_{3}$ only contains two non-empty intersection spaces, namely ${{\mathcal{B}}}_{1}\cap {{\mathcal{A}}}_{2}\cap {{\mathcal{A}}}_{3}$ and ${{\mathcal{D}}}_{1}\cap {{\mathcal{C}}}_{2}\cap {{\mathcal{A}}}_{3}$. For c = 1, only $\varnothing \,\ne\, {{\mathcal{B}}}_{1}\cap {{\mathcal{A}}}_{2}\cap {{\mathcal{A}}}_{3}={{\mathcal{L}}}_{1}\cap {{\mathcal{A}}}_{2}\cap {{\mathcal{A}}}_{3}$ remains. Note that neither ${{\mathcal{A}}}_{1}\cap {{\mathcal{A}}}_{2}$ nor ${{\mathcal{A}}}_{1}\cap {{\mathcal{A}}}_{3}$ nor ${{\mathcal{A}}}_{2}\cap {{\mathcal{A}}}_{3}$ is empty, but ${{\mathcal{A}}}_{1}\cap {{\mathcal{A}}}_{2}\cap {{\mathcal{A}}}_{3}$ is. Therefore, we would not be able to find any solution for c = 0 in the example of Fig. 2 in the main text. For a non-hypothetical example, see SM Sec. XI.

Runtime analysis

The restricted algebraic solver consists of an outer loop over s(n) ≤ 2^e subgraphs and an inner loop over 4^c(n) subspace intersections, where s(n) and c(n) are two hyperparameters that may explicitly depend on n. For each subgraph and each intersection, the restricted algebraic solver applies Gaussian elimination to a binary matrix ${B}^{{\prime} }$ of size (4c(n) + 3(k − c(n))) × (d + 6^c(n)), where k ≤ n and d ≤ 4n. If we choose a constant cutoff c(n) = const. as in Eq. (27), then the size of ${B}^{{\prime} }$ is proportional to n × n. Hence, Gaussian elimination has a runtime of ${\mathcal{O}}({n}^{3})$, i.e., the restricted algebraic solver has a runtime of ${\mathcal{O}}(s(n)\times {n}^{3})$. This is efficient if we choose $s(n)={\mathcal{O}}(\,{\text{poly}}\,(n))$. In particular, for the hydrogen chain Hamiltonians in Fig. 5 where we work with constant values of s(n), the solver should have a runtime of $f(n)={\mathcal{O}}({n}^{3})$. This is consistent with our empirical estimate of a runtime that lies somewhere in between ${\mathcal{O}}({n}^{2.0})$ and ${\mathcal{O}}({n}^{2.4})$.

On the other hand, if we choose a logarithmically-growing cutoff $c(n)={c}_{0}\times {\log }_{6}(n)$, the inner loop of the restricted algebraic solver iterates over ${6}^{c(n)}={n}^{{c}_{0}}$ many intersections. Furthermore, we apply Gaussian elimination of a matrix ${B}^{{\prime} }$ of size proportional to $n\times {n}^{{c}_{0}}$, which has a runtime of ${\mathcal{O}}({n}^{2+{c}_{0}})$ assuming c₀≥ 1. Hence, the overall runtime of the restricted algebraic solver follows as $f(n)={\mathcal{O}}(s(n)\times {n}^{2{c}_{0}+2})$, which is efficient for $s(n)={\mathcal{O}}(\,{\text{poly}}\,(n))$.

Ideas for improving our solvers

There is much room to further improve the runtime of both the solver for Eq. (6) and the Pauli grouping algorithm into which it is embedded:

1.
Since our solvers are highly parallelizable, a more distributed software implementation would allow us to trade runtime for computational resources.
2.
Currently, our Pauli grouper calls the solver without making use of previously computed solutions. Warm starting methods could improve upon this. For example, the RREF of M from Eq. (9) could perhaps be reused.
3.
Methods for automatically tuning the hyperparameters of our restricted algebraic solver deserve further investigation.
4.
Instead of selecting the polynomially-large subset of subgraphs at random, one could implement a more sophisticated approach such as simulated annealing.
5.
The geometry of the solution space ${\mathcal{L}}$ could be further explored. If we had access to a lexicographical Gröbner basis for the ideal generated by λ^TQ₁λ + 1, …, ${{\boldsymbol{\lambda }}}^{{\rm{T}}}{Q}_{n}{\boldsymbol{\lambda }}+1\in {{\mathbb{F}}}_{2}[{\lambda }_{1},\ldots ,{\lambda }_{d}]$ (which defines the Zariski-closed set ${\mathcal{L}}\subset {{\mathbb{F}}}_{2}^{d}$), we could construct a solution ${\boldsymbol{\lambda }}\in {\mathcal{L}}$ via elimination theory⁵⁷.

Data availability

All relevant data are available from the authors upon reasonable request.

Code availability

An open-source C++ implementation of our Pauli grouping algorithm based on our informed numerical solver is available under https://github.com/mc-zen/htgrouper.

References

Kandala, A. et al. Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets. Nature 549, 242–246 (2017).
Article ADS Google Scholar
Arute, F. et al. Quantum supremacy using a programmable superconducting processor. Nature 574, 505 (2019).
Article ADS Google Scholar
Krantz, P. et al. A quantum engineer’s guide to superconducting qubits. Appl. Phys. Rev. 6, 021318 (2019).
Article ADS Google Scholar
Blais, A., Grimsmo, A. L., Girvin, S. M. & Wallraff, A. Circuit quantum electrodynamics. Rev. Mod. Phys. 93, 025005 (2021).
Article ADS MathSciNet Google Scholar
Bravyi, S., Dial, O., Gambetta, J. M., Gil, D. & Nazario, Z. The future of quantum computing with superconducting qubits. J. Appl. Phys. 132, 160902 (2022).
Article ADS Google Scholar
Burkard, G., Ladd, T. D., Pan, A., Nichol, J. M. & Petta, J. R. Semiconductor spin qubits. Rev. Mod. Phys. 95, 025003 (2023).
Article ADS Google Scholar
Krinner, S. et al. Realizing repeated quantum error correction in a distance-three surface code. Nature 605, 669 (2022).
Article ADS Google Scholar
Kim, Y. et al. Scalable error mitigation for noisy quantum circuits produces competitive expectation values. Nat. Phys. 19, 752 (2023).
Article Google Scholar
Kim, Y. et al. Evidence for the utility of quantum computing before fault tolerance. Nature 618, 500–505 (2023).
Article ADS Google Scholar
Hangleiter, D. & Eisert, J. Computational advantage of quantum random sampling. Rev. Mod. Phys. 95, 035001 (2023).
Article ADS MathSciNet Google Scholar
Bluvstein, D. et al. Logical quantum processor based on reconfigurable atom arrays. Nature 626, 58 (2024).
Article ADS Google Scholar
Peruzzo, A. et al. A variational eigenvalue solver on a photonic quantum processor. Nature Comm. 5, 4213 (2014).
Article ADS Google Scholar
McClean, J. R., Romero, J., Babbush, R. & Aspuru-Guzik, A. The theory of variational hybrid quantum-classical algorithms. New J. Phys. 18, 023023 (2016).
Article ADS Google Scholar
McArdle, S., Endo, S., Aspuru-Guzik, A., Benjamin, S. C. & Yuan, X. Quantum computational chemistry. Rev. Mod. Phys. 92, 015003 (2020).
Article ADS MathSciNet Google Scholar
Schuld, M., Bergholm, V., Gogolin, C., Izaac, J. & Killoran, N. Evaluating analytic gradients on quantum hardware. Phys. Rev. A 99, 032331 (2019).
Article ADS Google Scholar
Sweke, R. et al. Stochastic gradient descent for hybrid quantum-classical optimization. Quantum 4, 314 (2020).
Article Google Scholar
Verteletskyi, V., Yen, T.-C. & Izmaylov, A. F. Measurement optimization in the variational quantum eigensolver using a minimum clique cover. J. Chem. Phys. 152, 124114 (2020).
Article ADS Google Scholar
Gokhale, P. et al. O(N³) measurement cost for variational quantum eigensolver on molecular Hamiltonians. IEEE Trans. Quantum Eng. 1, 1–24 (2020).
Article MathSciNet Google Scholar
Yen, T.-C., Verteletskyi, V. & Izmaylov, A. F. Measuring all compatible operators in one series of single-qubit measurements using unitary transformations. J. Chem. Theory Comput. 16, 2400 (2020).
Article Google Scholar
Jena, A., Genin, S. N. & Mosca, M. Optimization of variational-quantum-eigensolver measurement by partitioning Pauli operators using multiqubit Clifford gates on noisy intermediate-scale quantum hardware. Phys. Rev. A 106, 042443 (2022).
Article ADS MathSciNet Google Scholar
Crawford, O. et al. Efficient quantum measurement of Pauli operators in the presence of finite sampling error. Quantum 5, 385 (2021).
Article Google Scholar
Hamamura, I. & Imamichi, T. Efficient evaluation of quantum observables using entangled measurements. NPJ Quant. Inf. 6, 56 (2019).
Article ADS Google Scholar
Escudero, F., Fernández-Fernández, D., Jaumà, G., Peñas, G. F. & Pereira, L. Hardware-efficient entangled measurements for variational quantum algorithms. Phys. Rev. Appl. 20, 034044 (2023).
Article ADS Google Scholar
Wu, B., Sun, J., Huang, Q. & Yuan, X. Overlapped grouping measurement: a unified framework for measuring quantum states. Quantum 7, 896 (2023).
Article Google Scholar
Yen, T.-C., Ganeshram, A. & Izmaylov, A. F. Deterministic improvements of quantum measurements with grouping of compatible operators, non-local transformations, and covariance estimates. NPJ Quant. Inf. 9, 14 (2023).
Article ADS Google Scholar
Ohliger, M., Nesme, V. & Eisert, J. Efficient and feasible state tomography of quantum many-body systems. New J. Phys. 15, 015024 (2013).
Article ADS Google Scholar
Aaronson, S. Shadow tomography of quantum states. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, 325-338 (Association for Computing Machinery, New York, NY, USA, 2018). https://doi.org/10.1145/3188745.3188802.
Huang, H.-Y., Kueng, R. & Preskill, J. Predicting many properties of a quantum system from very few measurements. Nature Phys. 16, 1050 (2020).
Article ADS Google Scholar
Hadfield, C., Bravyi, S., Raymond, R. & Mezzacapo, A. Measurements of quantum Hamiltonians with locally-biased classical shadows. Comm. Math. Phys. 391, 951 (2022).
Article ADS MathSciNet Google Scholar
Huang, H.-Y., Kueng, R. & Preskill, J. Efficient estimation of Pauli observables by derandomization. Phys. Rev. Lett. 127, 030503 (2021).
Article ADS MathSciNet Google Scholar
Hadfield, C. Adaptive Pauli shadows for energy estimation. Preprint at: https://arxiv.org/abs/2105.12207 (2021).
Izmaylov, A. F., Yen, T.-C., Lang, R. A. & Verteletskyi, V. Unitary partitioning approach to the measurement problem in the variational quantum eigensolver method. J. Chem. Theory Comput. 16, 190 (2020).
Article Google Scholar
Zhao, A. et al. Measurement reduction in variational quantum algorithms. Phys. Rev. A 101, 062322 (2020).
Article ADS MathSciNet Google Scholar
Huggins, W. J. et al. Efficient and noise resilient measurements for quantum chemistry on near-term quantum computers. NPJ Quant. Inf. 7, 23 (2021).
Article ADS Google Scholar
García-Pérez, G. et al. Learning to measure: adaptive informationally complete generalized measurements for quantum algorithms. PRX Quantum 2, 040342 (2021).
Article ADS Google Scholar
Fischer, L. E. et al. Ancilla-free implementation of generalized measurements for qubits embedded in a qudit space. Phys. Rev. Research 4, 033027 (2022).
Article ADS Google Scholar
Shlosberg, A. et al. Adaptive estimation of quantum observables. Quantum 7, 906 (2023).
Article Google Scholar
Hillmich, S., Hadfield, C., Raymond, R., Mezzacapo, A. & Wille, R. Decision diagrams for quantum measurements with shallow circuits. IEEE Trans. Quantum Eng. 2, 24–34 (2021).
Google Scholar
Dehaene, J. & De Moor, B. Clifford group, stabilizer states, and linear and quadratic operations over GF(2). Phys. Rev. A 68, 042318 (2003).
Article ADS MathSciNet Google Scholar
Hein, M., Eisert, J. & Briegel, H. J. Multiparty entanglement in graph states. Phys. Rev. A 69, 062311 (2004).
Article ADS MathSciNet Google Scholar
Van den Nest, M., Dehaene, J. & De Moor, B. Graphical description of the action of local Clifford transformations on graph states. Phys. Rev. A 69, 022316 (2004).
Article ADS Google Scholar
Maslov, D. Linear depth stabilizer and quantum Fourier transformation circuits with no auxiliary qubits in finite-neighbor quantum architectures. Phys. Rev. A 76, 052310 (2007).
Article ADS Google Scholar
Chamberland, C., Zhu, G., Yoder, T. J., Hertzberg, J. B. & Cross, A. W. Topological and subsystem codes on low-degree graphs with flag qubits. Phys. Rev. X 10, 011022 (2020).
Google Scholar
Gurobi Optimization, LLC. Gurobi optimizer reference manual. https://www.gurobi.com (2023).
Pia, A. D., Dey, S. S. & Molinaro, M. Mixed-integer quadratic programming is in NP. Math. Program. 162, 225 (2017).
Article MathSciNet Google Scholar
Bravyi, S., Sheldon, S., Kandala, A., Mckay, D. C. & Gambetta, J. M. Mitigating measurement errors in multiqubit experiments. Phys. Rev. A 103, 042605 (2021).
Article ADS Google Scholar
Meyer, J. J., Borregaard, J. & Eisert, J. A variational toolbox for quantum multi-parameter estimation. NPJ Quant. Inf. 89, 89 (2021).
Article ADS Google Scholar
Hubregtsen, T., Wilde, F., Qasim, S. & Eisert, J. Single-component gradient rules for variational quantum algorithms. Quant. Sc. Tech. 7, 035008 (2022).
Article Google Scholar
van den Berg, E. & Temme, K. Circuit optimization of Hamiltonian simulation by simultaneous diagonalization of Pauli clusters. Quantum 4, 322 (2020).
Article Google Scholar
Faehrmann, P. K., Steudtner, M., Kueng, R., Kieferova, M. & Eisert, J. Randomizing multi-product formulas for Hamiltonian simulation. Quantum 6, 806 (2022).
Article Google Scholar
Anand, A. & Brown, K. R. Leveraging commuting groups for an efficient variational Hamiltonian ansatz. Preprint at: https://arxiv.org/abs/2312.08502 (2023).
Lidar, D. A. & Brun, T. A. Quantum error correction (Cambridge University Press, 2013).
Qiskit: An open-source framework for quantum computing (2021).
Muller, R. Pyquante: Python quantum chemistry. http://pyquante.sourceforge.net (2022).
Sun, Q. et al. PySCF: the Python-based simulations of chemistry framework. Wiley Interd. Rev.: Comput. Mol. Sci. 8, e1340 (2018).
Google Scholar
Essler, F. H. L., Frahm, H., Göhmann, F., Klümper, A. & Korepin, V. E.The one-dimensional Hubbard model (Cambridge University Press, 2005).
Cox, D., Little, J. & O’Shea, D. Ideals, varieties, and algorithms: An introduction to computational algebraic geometry and commutative algebra (Springer Science & Business Media, 2013). https://doi.org/10.1007/978-3-319-16721-3.

Download references

Acknowledgements

This research is part of two projects that have received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreements No. 847471 and No. 955479. This research was supported by the NCCR SPIN, funded by the Swiss National Science Foundation (SNF). I.S. acknowledges the financial support from the SNF through the grant No. 200021-179312. We acknowledge the use of IBM Quantum services for this work. The views expressed are those of the authors and do not reflect the official policy or position of IBM or the IBM Quantum team. (IBM, the IBM logo, and ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. The current list of IBM trademarks is available at https://www.ibm.com/legal/copytrade.) In the final phase of this project, D.M. received financial support from the Munich Quantum Valley (K-8), the Bundesministerium für Bildung und Forschung (BMBF) via the projects HYBRID, REALISTIQ, and MUNIQC-Atoms as well as from the EU Quantum Technology Flagship via the projects MILLENION and PASQuanS2. Research was sponsored by IARPA and the Army Research Office, under the Entangled Logical Qubits program, and was accomplished under Cooperative Agreement Number W911NF-23-2-0212. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of IARPA, the Army Research Office, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein. The authors are thankful to Lennart Vincent Bittel, Libor Caha, Felix Huber, Seyed Sajjad Nezhadi, Max Rossmannek, Maria Spethmann, David Sutter, Stefan Woerner, James Robin Wootton, and Nikolai Wyderka for stimulating discussions. SDG.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Igor O. Sokolov & Panagiotis Kl. Barkoutsos
Present address: PASQAL, Massy, France

Authors and Affiliations

Dahlem Center for Complex Quantum Systems, Freie Universität Berlin, Berlin, Germany
Daniel Miller, Kyano Levi, Eric J. Kuehnke & Jens Eisert
Institute for Theoretical Nanoelectronics (PGI-2), Forschungszentrum Jülich, Jülich, Germany
Daniel Miller
IBM Quantum, IBM Research Europe—Zurich, Rüschlikon, Switzerland
Daniel Miller, Laurin E. Fischer, Igor O. Sokolov, Panagiotis Kl. Barkoutsos & Ivano Tavernelli

Authors

Daniel Miller
View author publications
Search author on:PubMed Google Scholar
Laurin E. Fischer
View author publications
Search author on:PubMed Google Scholar
Kyano Levi
View author publications
Search author on:PubMed Google Scholar
Eric J. Kuehnke
View author publications
Search author on:PubMed Google Scholar
Igor O. Sokolov
View author publications
Search author on:PubMed Google Scholar
Panagiotis Kl. Barkoutsos
View author publications
Search author on:PubMed Google Scholar
Jens Eisert
View author publications
Search author on:PubMed Google Scholar
Ivano Tavernelli
View author publications
Search author on:PubMed Google Scholar

Contributions

D.M. developed the theory, implemented and tested the grouping algorithm, and wrote the manuscript. D.M. and P.B. performed the experiment. D.M. and L.F. analyzed the experimental data. D.M., L.F., I.S., P.B., and I.T. were involved in the design of the experiment and discussed the results. K.L. implemented the open-source version of the grouping algorithm. K.L., E.K., J.E., and I.S. supported D.M. in carrying out the study in SM Sec. XIV. All authors contributed to the manuscript.

Corresponding author

Correspondence to Daniel Miller.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Miller, D., Fischer, L.E., Levi, K. et al. Hardware-tailored diagonalization circuits. npj Quantum Inf 10, 122 (2024). https://doi.org/10.1038/s41534-024-00901-1

Download citation

Received: 01 May 2023
Accepted: 10 October 2024
Published: 21 November 2024
DOI: https://doi.org/10.1038/s41534-024-00901-1

This article is cited by

Practical techniques for high-precision measurements on near-term quantum hardware and applications in molecular energy estimation
- Keijo Korhonen
- Hetta Vappula
- Daniel Cavalcanti
npj Quantum Information (2025)
Guaranteed efficient energy estimation of quantum many-body Hamiltonians using ShadowGrouping
- Alexander Gresch
- Martin Kliesch
Nature Communications (2025)
Measuring correlation and entanglement between molecular orbitals on a trapped-ion quantum computer
- Gabriel Greene-Diniz
- Chris N. Self
- Matthias Rosenkranz
Scientific Reports (2025)

Subjects

Abstract

Similar content being viewed by others

Efficient simulation of potential energy operators on quantum hardware: a study on sodium iodide (NaI)

Scalable and robust quantum computing on qubit arrays with fixed coupling

Realization of quantum signal processing on a noisy quantum computer

Introduction

Results

Framework for HT diagonalization circuits

Mathematical results

Reformulation as an optimization problem

Solver overview

Experimental demonstration

Discussion

Methods

Choices of Hamiltonians

Choices of hyperparameters

Technical details on the algebraic solver

Restricting the algebraic solver

Runtime analysis

Ideas for improving our solvers

Data availability

Code availability

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Supplementary information

Supplementary Information

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Practical techniques for high-precision measurements on near-term quantum hardware and applications in molecular energy estimation

Guaranteed efficient energy estimation of quantum many-body Hamiltonians using ShadowGrouping

Measuring correlation and entanglement between molecular orbitals on a trapped-ion quantum computer

Search

Quick links