Improved quantum computation using operator backpropagation

Fuller, Bryce; Tran, Minh C.; Lykov, Danylo; Johnson, Caleb; Rossmannek, Max; Wei, Ken Xuan; He, Andre; Kim, Youngseok; Vu, DinhDuy; Sharma, Kunal; Alexeev, Yuri; Kandala, Abhinav; Mezzacapo, Antonio

doi:10.1038/s41534-026-01196-0

Article
Open access
Published: 19 February 2026

Bryce Fuller¹,
Minh C. Tran¹,
Danylo Lykov^2,3,
Caleb Johnson¹,
Max Rossmannek⁴,
Ken Xuan Wei¹,
Andre He¹,
Youngseok Kim¹,
DinhDuy Vu⁵,
Kunal Sharma¹,
Yuri Alexeev^2,3,
Abhinav Kandala¹ &
Antonio Mezzacapo¹

npj Quantum Information volume 12, Article number: 51 (2026)

Abstract

Decoherence of quantum hardware is currently limiting its practical applications. At the same time, classical algorithms for simulating quantum circuits have progressed substantially. Here, we demonstrate a hybrid framework that integrates classical simulations with quantum hardware to improve the computation of an observable’s expectation value by reducing the quantum circuit depth. In this framework, a quantum circuit is partitioned into two subcircuits: one that describes the backpropagated Heisenberg evolution of an observable, executed on a classical computer, while the other is a Schrödinger evolution run on quantum processors. The overall effect is to reduce the depths of the circuits executed on quantum devices and enable the recovery of expectation values at intermediate times throughout the classically backpropagated circuit, trading this with classical overhead and an increased number of circuit executions. We demonstrate the effectiveness of this method on a Hamiltonian simulation problem, achieving more accurate expectation value estimates compared to using quantum hardware alone.

Introduction

Quantum algorithms promise significant advantages over classical methods in many applications such as Hamiltonian simulation¹ and solving systems of linear equations². These advantages can often only be realized for sufficiently large problem instances and typically require coherent implementations of deep quantum circuits. However, decoherence of current quantum hardware limits its application to short-depth quantum circuits. Error mitigation^3,4,5 has been used to enable accurate calculations on current quantum hardware at a scale beyond brute-force classical computation⁶. These methods typically incur a sampling overhead that is exponential in the depth of the circuit.

In response to the cost of error mitigation, recent experimental demonstrations aim to optimize the performance and resource overhead of quantum experiments by leveraging a variety of classical techniques to reduce the depth of executed quantum circuits. For example, advancements in transpilation have led to more efficient swap routing^7,8, which can result in shallower circuits to execute. Tensor networks have been used in conjunction with classical optimizers to improve the accuracy of expectation values for time evolution problems⁹. Hybrid multi-product formulas^10,11,12 enable performing Hamiltonian time evolution using an ensemble of shallower quantum circuits. Recent algorithms for approximate quantum compiling utilize tensor networks and a classical optimizer to compress deep Trotterized time-evolution circuits into shallower approximations^{12,13,14,15,16,17}. Additionally, recent works have studied the application of feed-forward measurements to circuit compilation in the context of circuit cutting¹⁸ and measurement-based state preparation^19,20.

These rapid developments highlight an ongoing need to discover new algorithms to reduce the depth of quantum experiments.

In this manuscript, we introduce a framework to reduce the depth of quantum circuits using classical simulation algorithms based on Clifford perturbation theory (CPT)²¹. We then apply this framework using the Qiskit Addon for operator backpropagation²² and execute experiments on quantum hardware to observe the reduction in error which can be achieved for a utility scale Hamiltonian time dynamics experiment. CPT-based algorithms classically compute the expectation value of an observable by backpropagating it, i.e. evolving the observable in the Heisenberg picture through the gates of the circuit in reverse order, starting from the last gate of the circuit. Provided that the circuit consists of a small number of non-Clifford operations, the Pauli decompositions of the backpropagated observable can be computed exactly by a classical computer in reasonable runtime. For circuits with a large proportion of non-Clifford gates the cost of exact backpropagation grows intractably with circuit depth; however, by allowing for some approximation error a wider range of circuits can be backpropagated in practice. Such a strategy has been successful in approximating the outcomes of recent utility-scale experiments²³. Theoretically, algorithms based on CPT have been shown to be asymptotically efficient for several interesting classes of quantum circuits, including those that are noisy with a random input²⁴ or those that consist of random single-qubit operations²⁵.

In contrast to these asymptotic analyses, this work uses CPT to approximate the Heisenberg evolution for explicit circuit instances while tracking the accuracy of the calculation using a combination of typical-case error bounds²⁴ and the triangle inequality.

In our framework, a quantum circuit is split into two subcircuits U_Q and U_C (Fig. 1). The observable of interest is backpropagated under U_C and decomposed as a linear combination of Pauli operators. These Pauli operators are then measured on the quantum state prepared by U_Q. In general, the number of Pauli operators grows with the depth of the backpropagated subcircuit. Thus, operator backpropagation (OBP) allows one to reduce circuit depth in exchange for a classical overhead and an increase in the total number of circuits executed on quantum hardware. Because OBP calculations can become classically expensive and lend themselves to distributed implementation, our method is well suited to quantum-centric supercomputing environments²⁶. In Section II, we introduce the OBP framework and demonstrate an application of OBP for improving quantum simulation. In particular, we show that OBP helps reduce the error in computing the expectation values of observables for circuits with up to 127 qubits and 4896 two-qubit gates. Hence, given a fixed error tolerance, OBP enables the computation of expectation values for deeper quantum circuits than a purely quantum approach. Lastly, in Section III we describe the OBP framework, including details of the CPT algorithm, the truncation strategies employed, and strategies for parallelizing the CPT calculation across a multi-node architecture.

Results

Operator backpropagation

Many quantum algorithms rely on measuring operator expectation values with respect to states prepared on quantum devices. Specifically, we consider problems of estimating

$${\langle O\rangle }_{U| \psi \rangle }\equiv \langle \psi | {U}^{\dagger }OU| \psi \rangle \,,$$

(1)

given a quantum state $| \psi \rangle$, a quantum circuit U, and an observable O. Without loss of generality, we assume that O is a traceless multi-qubit Pauli operator. In near-term experiments, the depths of quantum circuits for which the expectation values can be faithfully recovered are constrained by the error of the quantum devices, limiting the size of experiments that can be performed.

Parallel to the advancements in quantum hardware, various classical algorithms have been developed for numerically computing the expectation value in Eq. (1). For problems considered in recent experiments, these classical algorithms often perform as well as or better than algorithms executed on state-of-the-art quantum devices^21,27.

Despite this, the classical complexity of estimating arbitrary expectation values grows exponentially with problem size and, thus, will become out of reach for general classical algorithms even on the largest supercomputers.

To distribute the expectation value problem in Eq. (1) between a quantum device and a classical simulator, we consider a decomposition of U = U_CU_Q into two subcircuits U_C and U_Q. The classical simulator first computes ${O}^{{\prime} }\equiv {U}_{{\rm{C}}}^{\dagger }O{U}_{{\rm{C}}}$—the version of O evolved backward through the circuit ${U}_{C}^{\dagger }$. One then prepares the initial state $| \psi \rangle$ on the quantum hardware, applies the circuit U_Q, and measures the expectation value of ${O}^{{\prime} }$.

It is straightforward to verify that the result $\left\langle \psi | {U}_{Q}^{\dagger }{O}^{{\prime} }{U}_{Q}| \psi \right\rangle =\left\langle \psi | {U}^{\dagger }OU| \psi \right\rangle$ is the desired expectation value in Eq. (1).

Standard quantum hardware can measure the expectation values of observables that are diagonal in a local measurement basis. To measure the expectation value of ${O}^{{\prime} }$, we require that the classical simulator decomposes it in the Pauli basis, i.e.

$${O}^{{\prime} }=\mathop{\sum }\limits_{P}{c}_{P}P\,,$$

(2)

where P ∈ {I, X, Y, Z}^⊗n are multi-qubit Pauli operators on n qubits and the real numbers ${c}_{P}=\,{\rm{Tr}}\,({O}^{{\prime} }P)/{2}^{n}$ are the coefficients of the decomposition. We call the process of approximately evolving O through ${U}_{C}^{\dagger }$ operator backpropagation (OBP). Given the backpropagated observable ${O}^{{\prime} }$, we measure the expectation value of each Pauli in the decomposition and reconstruct the expectation value of ${O}^{{\prime} }$ by

$$\left\langle \psi | {U}_{Q}^{\dagger }{O}^{{\prime} }{U}_{Q}| \psi \right\rangle =\mathop{\sum }\limits_{P}{c}_{P}\left\langle \psi | {U}_{Q}^{\dagger }P{U}_{Q}| \psi \right\rangle \,.$$

(3)

We summarize these steps in Fig. 1.
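The steps of Eqs. (1)-(3) can be checked numerically on a toy instance. The following is a minimal sketch, assuming dense NumPy matrices, two qubits, and random unitaries standing in for U_Q and U_C. All function names are hypothetical, and the brute-force decomposition over all 4^n Pauli strings is for illustration only; it is not how a CPT-based implementation works.

```python
import itertools
import numpy as np

PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_matrix(label):
    """Dense matrix of a Pauli string such as 'XZ' (leftmost letter = first qubit)."""
    mat = np.array([[1.0 + 0j]])
    for ch in label:
        mat = np.kron(mat, PAULIS[ch])
    return mat

def pauli_decompose(op, n):
    """Coefficients c_P = Tr(op P) / 2^n of Eq. (2), dropping numerically zero terms."""
    coeffs = {}
    for letters in itertools.product("IXYZ", repeat=n):
        label = "".join(letters)
        c = np.trace(op @ pauli_matrix(label)) / 2**n
        if abs(c) > 1e-12:
            coeffs[label] = c.real  # real because op is Hermitian
    return coeffs

def random_unitary(dim, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    gauss = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(gauss)
    return q

rng = np.random.default_rng(0)
n = 2
U_Q = random_unitary(2**n, rng)  # stand-in for the part run on hardware
U_C = random_unitary(2**n, rng)  # stand-in for the part handled classically
U = U_C @ U_Q                    # full circuit, U = U_C U_Q

psi = np.zeros(2**n, dtype=complex)
psi[0] = 1.0                     # |00>
O = pauli_matrix("ZI")

# Classical step: backpropagate O through U_C and decompose in the Pauli basis.
O_prime = U_C.conj().T @ O @ U_C
coeffs = pauli_decompose(O_prime, n)

# "Quantum" step: expectation value of each Pauli on the state U_Q |psi>.
psi_Q = U_Q @ psi
reconstructed = sum(
    c * np.vdot(psi_Q, pauli_matrix(P) @ psi_Q).real for P, c in coeffs.items()
)

# Reference: Eq. (1) evaluated directly with the full circuit.
direct = np.vdot(U @ psi, O @ (U @ psi)).real
print(abs(direct - reconstructed) < 1e-10)
```

For a generic U_C the decomposition contains close to 4^n terms, which is the growth that truncation is meant to control.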

Our framework offers a trade-off between the required circuit depth and the number of circuit executions which are needed to compute ${\langle O\rangle }_{U| \psi \rangle }$. Because the circuit executed on quantum hardware U_Q will be shallower than the original circuit, the resources needed to error mitigate each individual circuit are lowered. In exchange, the number of distinct circuits which must be executed increases with the number of Pauli operators that comprise ${O}^{{\prime} }$ in Eq. (2). In general, the number of Pauli measurements and the error-mitigation overhead both grow exponentially with the depth of U_C, thus this framework allows us to trade between these exponentials to optimize the resource requirements and accuracy of quantum hardware experiments. The applicability of our framework thus depends on several factors, including the noise profile of the quantum device and details of the problems at hand, e.g. how close the circuits are to Clifford circuits.

Experimental demonstration

In this section, we demonstrate the operator backpropagation technique on a Hamiltonian simulation experiment. In particular, we consider the simulation of the XY model with nearest-neighbor couplings

$$H=\mathop{\sum }\limits_{i,j\in {\mathcal{E}}(\Lambda )}J\left({X}_{i}{X}_{j}+{Y}_{i}{Y}_{j}\right)+h\mathop{\sum }\limits_{i\in \Lambda }{Z}_{i}\,,$$

(4)

where X_i, Y_i, Z_i are the Pauli operators supported on site i and ${\mathcal{E}}(\Lambda )$ is the set of edges of a D-dimensional regular lattice Λ, e.g. a one-dimensional chain or a two-dimensional heavy-hex lattice. For all experiments performed, we consider this model where J = 1, h = 0, and we are interested in estimating the polarization $M\equiv \frac{1}{n}{\sum }_{i}{Z}_{i}$.

To simulate $e^{-itH}$, we Trotterize the time evolution and approximate it with $U(\tau)^{t/\tau}$, where τ is the Trotter step size, t is taken to be an integer multiple of τ, and

$$U(\tau )={e}^{-ih\tau {\sum }_{i}{Z}_{i}}\mathop{\prod }\limits_{i,j\in {\mathcal{E}}(\Lambda )}{e}^{-iJ\tau ({Y}_{i}{Y}_{j}+{X}_{i}{X}_{j})}\,.$$

(5)

Since this Trotterization preserves the U(1) symmetry of the XY model, the expectation value of M is exactly computable and the error in the measured expectation can be used as a proxy for the error of the simulation. Although M is a conserved quantity of our Trotter circuits, recovering the dynamics of individual Z_i operators is not classically efficient in general when Λ is two-dimensional. Using OBP we recover these individual expectation values as well and compare against values obtained by a matrix product state (MPS) calculation.
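The conservation of M under the Trotter step of Eq. (5) can be illustrated on a small chain. This is a sketch under stated assumptions: a four-site open chain, h = 0 and τ = 0.05 as in the experiments, dense matrices, and hypothetical helper names (`embed`, `expm_hermitian`). It is far smaller than the 75- and 127-spin systems discussed in the text.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def embed(ops, n):
    """Tensor product over n sites; `ops` maps site index -> 2x2 matrix, identity elsewhere."""
    mat = np.array([[1.0 + 0j]])
    for site in range(n):
        mat = np.kron(mat, ops.get(site, I2))
    return mat

def expm_hermitian(H, angle):
    """exp(-i * angle * H) for Hermitian H, computed from its eigendecomposition."""
    vals, vecs = np.linalg.eigh(H)
    return (vecs * np.exp(-1j * angle * vals)) @ vecs.conj().T

n, J, tau = 4, 1.0, 0.05

# One Trotter step: product over nearest-neighbor bonds of exp(-i J tau (XX + YY)).
step = np.eye(2**n, dtype=complex)
for i in range(n - 1):
    bond = embed({i: X, i + 1: X}, n) + embed({i: Y, i + 1: Y}, n)
    step = expm_hermitian(bond, J * tau) @ step

# Polarization M = (1/n) sum_i Z_i.
M = sum(embed({i: Z}, n) for i in range(n)) / n

# Initial state |0100>: a single excitation on the second site.
psi = np.zeros(2**n, dtype=complex)
psi[int("0100", 2)] = 1.0

values = []
for k in range(6):
    values.append(np.vdot(psi, M @ psi).real)
    psi = step @ psi

print(np.allclose(step @ M, M @ step))  # the Trotter step commutes with M
print(np.allclose(values, 0.5))         # <M> stays at (4 - 2*1)/4
```

The individual ⟨Z_i⟩ values do change from step to step in this toy model; only their average is fixed by the symmetry.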

All experiments were error mitigated using a combination of zero-noise extrapolation (ZNE)^3,4,28,29 via probabilistic error amplification (PEA)^{4,30,31,32,33} and twirled readout error extinction (TREX)³⁴. The noise learning and noise amplification procedures used to apply PEA were performed as in Ref.⁶. In the Supplemental Information, we discuss the details of the error mitigation further.

To benchmark the framework, we first apply it to the simulation of the exactly solvable one-dimensional XY model of 75 spins. We observe that as circuit depth increases, experiments leveraging OBP obtain lower error for estimates of polarization when the total number of circuit executions is held constant. In addition, we run a larger experiment of a two-dimensional XY model on a heavy-hex lattice of 127 spins. This system is not exactly solvable and other two-dimensional geometries are known to exhibit interesting phenomena such as topological phase transitions³⁵. We observe that for all Trotter depths considered, experiments which estimate the polarization using OBP obtain a reduction in error relative to experiments which do not use OBP.

For both the one and two-dimensional spin models, the spins are mapped to a subset of nearest-neighboring qubits on the coupling graph of ibm_kyiv, one of IBM Quantum’s superconducting quantum processors. We fix the length of each Trotter step to be τ = 0.05 and consider the expectation value of the polarization, i.e.

$${\langle M\rangle }_{k}=\left\langle \psi | {U}^{\dagger }{(\tau )}^{k}MU{(\tau )}^{k}| \psi \right\rangle ,$$

(6)

at different numbers of Trotter steps k, with and without using OBP to reduce the circuit depth by 5 Trotter steps. We chose this value by fixing an L₂ error budget for the classical approximation of M, fixing a maximum of 10 qubit-wise commuting Pauli groups, and then selecting the largest whole number of Trotter steps that could be backpropagated within these constraints. For the 75 (127) qubit experiments, ${O}^{{\prime} }$ contained 370 (881) Pauli terms, respectively. See the Supplemental Information for a detailed description of the classical OBP calculations.

For each OBP computation, the polarization $M\equiv \frac{1}{n}{\sum }_{i}{Z}_{i}$ is processed for five Trotter steps by backpropagating each Z operator independently using an L₂ error budget of 0.01 and 0.025 for the one and two dimensional models, respectively. In each case, the truncation error budgets were distributed unevenly across the circuit slices, withholding a majority of the error budget for the final slice. For both models, the set of Paulis supporting all backpropagated observables are collected into 8 qubit-wise commuting groups, each of which is independently evaluated on a QPU.

Additionally, the Pauli operators needed to reconstruct one through four Trotter steps are a subset of those needed to reconstruct five steps. Thus, we can re-use our measurement data to recover the dynamics of both XY models for all k ∈ [0, 25], despite the fact that we only execute circuits on hardware where k is an integer multiple of 5.

The results shown for all experiments are obtained by bootstrapping the measurements into 100 batches and then taking the mean across all postprocessed batches. The error bars are plotted with a 2σ confidence, and data that leverage OBP are plotted with a shaded region that includes the statistical error of the experiment as well as the L₂ error budget (see Section III) that was allocated during backpropagation. See the Supplemental Information for more information on the error mitigation used for this experiment.

In Fig. 2a, we plot the expectation value of the polarization, i.e. Eq. (6), for a one-dimensional XY model of 75 spins. The deepest circuit executed for this experiment is 25 Trotter steps, which requires 1924 two-qubit gates and a two-qubit circuit depth of 52. The initial state $| {\psi }_{0}\rangle$ is initialized to $| 00\ldots 0\rangle$, except for seven evenly spaced qubits which are initialized to $| 1\rangle$. Under the dynamics of the XY model, these “excitations” (qubits initialized to $| 1\rangle$) will spread to other sites on the lattice. We reconstruct the expectation value of all local Z operators with and without OBP and compare their mean to the known reference value of $\frac{75-2\cdot 7}{75}\approx 0.813$. The experiments performed with and without OBP each use a total of 262144 circuit executions for every k. We observe that after 15 Trotter steps, the experiments leveraging OBP achieve a statistically significant reduction in error when estimating 〈M〉_k. Because these experiments use the same number of circuit executions, the estimates obtained via OBP use fewer total circuit layer operations: 19% (45%) fewer for the 25 (10) Trotter step data points, respectively. Although the OBP framework will, in general, introduce an overhead in the number of quantum circuit executions (shots), these results highlight that even for a fixed budget of circuit executions, one can reduce the error of a quantum simulation by reducing circuit depth via OBP. Furthermore, because the circuits shortened via OBP have fewer gates per circuit execution, these experiments require fewer total quantum operations.
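The numbers quoted for the one-dimensional experiment can be reproduced with a few lines of arithmetic. The reference polarization follows directly from the initial state. The layer savings, however, rely on an assumed depth model of 2k + 2 two-qubit layers for k Trotter steps; the text does not state this model, and it is chosen here only because it gives the quoted depth of 52 at k = 25.

```python
def reference_polarization(n_qubits, n_excitations):
    """<M> for a computational-basis state: +1 per |0>, -1 per |1>, averaged over sites."""
    return (n_qubits - 2 * n_excitations) / n_qubits

def assumed_two_qubit_depth(k):
    """Assumed depth model (not from the text): 2k + 2 layers, giving 52 at k = 25."""
    return 2 * k + 2

def layer_saving(k, backpropagated_steps=5):
    """Fraction of two-qubit layers removed when the last steps are handled classically."""
    full = assumed_two_qubit_depth(k)
    reduced = assumed_two_qubit_depth(k - backpropagated_steps)
    return (full - reduced) / full

print(round(reference_polarization(75, 7), 3))  # 0.813
print(round(100 * layer_saving(25)))            # 19
print(round(100 * layer_saving(10)))            # 45
```

Under this assumed model the savings match the quoted 19% and 45%, which suggests, but does not confirm, that the percentages refer to two-qubit layers.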

**Fig. 2: OBP experiments with 75 and 127-qubit spin models.**

In Fig. 2b, we highlight another important feature of the OBP framework: the ability to recover the dynamics at intermediate times from coarse measurement data. Specifically, from the measurement after only 5, 10, 15, 20, and 25 Trotter steps, we can use OBP to reconstruct the expectation values of observables at all Trotter steps from 0 to 25. We plot the dynamics of several such observables Z_i in Fig. 2b and compare against reference values obtained from MPS simulations. The Z_i displayed are chosen with i located near the initial excitations (red circles in the inset of Fig. 2a) in order to observe a strong dynamical response to the excitations. We note that our estimates of the individual Z_i dynamics are more strongly affected by noise than our estimates of 〈M〉_k, which is explained by a concentration about the mean arising from the averaging over all Z_i. Nevertheless, we observe qualitative agreement between the fine-grained dynamics reconstructed via OBP and those obtained via an MPS simulation. Additionally, we can compare the error-mitigated expectation values for the same total number of Trotter steps, obtained with and without using OBP to reduce the circuit depth by 5 steps. These data points provide a direct way to compare the impact of OBP for individual observable expectation values.

In addition to the exactly solvable XY spin chain, we also consider a two-dimensional XY model of 127 spins defined on a heavy-hex graph given by the qubit connectivity of the ibm_kyiv QPU. The largest circuit executed in this experiment is 25 Trotter steps, which requires 4896 two-qubit gates and a two-qubit depth of 102. Because this model is two-dimensional, the resulting Trotter circuits are approximately twice as deep and contain 254% as many two-qubit gates as the Trotter circuits for the one-dimensional model. These circuits are more strongly impacted by noise, and the light-cones of operator expectation values spread more rapidly across the system. In the Supplemental Information, we detail how the Trotter circuits in this experiment were synthesized.

In Fig. 2c and d, we plot 〈M〉_k for the two-dimensional XY model as well as the expectation values for a handful of local Z observables, which we compare with reference values obtained through MPS simulations. The MPS calculations are discussed in more detail in the Supplemental Information. Similar to the one-dimensional experiment, we initialize $| {\psi }_{0}\rangle$ to $| 00\ldots 0\rangle$ apart from seven excitations which are placed near the center of the lattice. We reconstruct the expectation values of all local Z operators with and without OBP and compare the mean with the known reference value of $\frac{127-2\cdot 7}{127}\approx 0.89$. For these experiments, each group of qubit-wise commuting Paulis estimated on hardware is allocated 32768 shots, resulting in a total of 262144 (32768) shots for estimates of 〈M〉_k with (without) OBP. We observe that for all numbers of Trotter steps, estimates of 〈M〉_k which use OBP to reduce the circuit depth by 5 steps obtain a statistically significant decrease in error relative to experiments which do not make use of OBP.

Discussion

We have introduced the OBP framework for improving the quantum computation of expectation values for Pauli observables on pre-fault tolerant hardware and demonstrated on quantum hardware how OBP can reduce the error of a Hamiltonian time dynamics experiment. This framework reduces the depth of quantum circuits run on quantum hardware, limiting the impact of errors and the cost of error mitigation in exchange for an increase in the number of experiments and an additional classical overhead. In the context of Hamiltonian simulation, OBP reduces the error in approximating the dynamics of quantum systems, allowing longer-time simulation compared to purely quantum approaches.

An important feature of the OBP framework is the ability to reconstruct dynamics at intermediate times from coarse measurement data. For example, in our experiment, we only measure the state after 5, 10, 15, 20, and 25 Trotter steps, but the expectation value of the observable can be constructed for all Trotter steps between 0 and 25. Additionally, the same expectation value can be computed in multiple ways: for example, the expectation value after 15 Trotter steps can be obtained from the experiment which executes 15 Trotter steps on a QPU and does not backpropagate the target observable, or from the experiment which runs 10 Trotter steps on a QPU and backpropagates the target observable through 5 Trotter steps. In principle, these multiple ways to estimate the same quantity, each suffering from a different set of errors, may be used to obtain a more accurate estimate of the expectation value. Exploring such an error-mitigation scheme is an interesting future direction.

The OBP framework is convenient whenever measuring the backpropagated observable requires a manageable number of quantum circuit executions. Generally, this requirement puts a constraint on the depth of the circuit U_C through which an observable can be backpropagated. If one fixes an error budget and a maximum overhead in the number of observable measurements, then the reduction in circuit depth which can be achieved will depend strongly on the circuit which one attempts to backpropagate. The OBP framework will show the most promise for problems where the size of one’s backpropagated observable grows slowly with the depth of U_C. This can be achieved in situations where, for example, U_C consists of few non-Clifford operations or the dynamics of a quantum system exhibit periodicity at stroboscopic times, at which point the backpropagated observable may grow slowly enough that one can obtain substantial depth reduction before the incurred overhead grows prohibitively large. Identifying such use cases where backpropagation yields substantial depth reduction while the full experiment remains classically hard to simulate is an interesting direction for future work.

Methods

Truncation error

Recall that the backpropagated observable ${O}^{{\prime} }$ is written as a linear combination of multi-qubit Pauli operators (cf. Eq. (2)). Some coefficients c_P in the decomposition may be small enough that they can be truncated from ${O}^{{\prime} }$ without incurring significant error. We discuss the estimation of this truncation error in this section.

Suppose that ${\mathcal{S}}$ is the set of Paulis upon which ${O}^{{\prime} }$ is supported, ${\mathcal{T}}$ is the subset of Paulis which will be truncated, and ${\mathcal{K}}$ is the subset of Paulis which remain after truncation. Let ${O}_{{\mathcal{K}}}^{{\prime} }={\sum }_{P\in {\mathcal{K}}}{c}_{P}P$ be the truncated version of ${O}^{{\prime} }$. The difference between them is

$$\Delta \equiv {O}^{{\prime} }-{O}_{{\mathcal{K}}}^{{\prime} }=\mathop{\sum }\limits_{P\in {\mathcal{T}}}{c}_{P}P\,.$$

(7)

Using the triangle inequality, the truncation error can be bounded by the L₁ norm of the truncated coefficients,

$$| \left\langle {\psi }_{Q}| \Delta | {\psi }_{Q}\right\rangle | \le \mathop{\sum }\limits_{P\in {\mathcal{T}}}| {c}_{P}| \,,$$

(8)

where $| {\psi }_{Q}\rangle ={U}_{Q}| \psi \rangle$. This error bound is rigorous, but it can only be saturated if the truncated Pauli operators mutually commute and $| {\psi }_{Q}\rangle$ happens to be a common eigenstate of the operators. Except for these worst cases, the L₁ norm of the truncated coefficients will largely overestimate the truncation error.

Instead, we may use a different estimate that is expected to better capture the truncation error in typical cases. To motivate the estimate, we first assume that $| {\psi }_{Q}\rangle$ were drawn from a 1-design ensemble. The error in the expectation value follows a distribution with a vanishing mean and a variance given by²⁴

$${{\mathbb{E}}}_{| {\psi }_{Q}\rangle }{| \left\langle {\psi }_{Q}| \Delta | {\psi }_{Q}\right\rangle | }^{2}\le \frac{1}{{2}^{n}}\,\mathrm{Tr}\,({\Delta }^{2})=\mathop{\sum }\limits_{P\in {\mathcal{T}}}{| {c}_{P}| }^{2}\,.$$

(9)
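One way to see why the 1-design assumption is enough for the inequality in Eq. (9) is the following short argument, sketched under the assumptions that Δ is Hermitian (real c_P) and $| {\psi }_{Q}\rangle$ is normalized. It is a standard argument and not necessarily the derivation used in ref. 24. The Cauchy–Schwarz inequality bounds the squared expectation value by the expectation value of Δ², and the 1-design property ${\mathbb{E}}\,| {\psi }_{Q}\rangle \langle {\psi }_{Q}| =I/{2}^{n}$ turns the average into a trace:

$$| \langle {\psi }_{Q}| \Delta | {\psi }_{Q}\rangle {| }^{2}\le \langle {\psi }_{Q}| {\Delta }^{2}| {\psi }_{Q}\rangle \,,\qquad {{\mathbb{E}}}_{| {\psi }_{Q}\rangle }\langle {\psi }_{Q}| {\Delta }^{2}| {\psi }_{Q}\rangle =\frac{1}{{2}^{n}}\,\mathrm{Tr}\,({\Delta }^{2})\,.$$

The last equality in Eq. (9) then follows from the orthogonality of Pauli operators, $\mathrm{Tr}\,(P{P}^{{\prime} })/{2}^{n}={\delta }_{P{P}^{{\prime} }}$:

$$\frac{1}{{2}^{n}}\,\mathrm{Tr}\,({\Delta }^{2})=\mathop{\sum }\limits_{P,{P}^{{\prime} }\in {\mathcal{T}}}{c}_{P}{c}_{{P}^{{\prime} }}\frac{\mathrm{Tr}\,(P{P}^{{\prime} })}{{2}^{n}}=\mathop{\sum }\limits_{P\in {\mathcal{T}}}{c}_{P}^{2}\,.$$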

Thus, the majority of the drawn $| {\psi }_{Q}\rangle$ would result in a truncation error smaller than the L₂ norm of the truncated coefficients. In our problem, $| {\psi }_{Q}\rangle$ is deterministic instead of being randomly drawn from an ensemble. But if U_Q is a generic and sufficiently deep circuit, we expect Eq. (9) better captures the typical values of $\langle {\psi }_{Q}| \Delta | {\psi }_{Q}\rangle$ than the worst-case bound [Eq. (8)], leading to an estimate

$$| \left\langle {\psi }_{Q}| \Delta | {\psi }_{Q}\right\rangle | \lesssim {\left(\mathop{\sum }\limits_{P\in {\mathcal{T}}}{| {c}_{P}| }^{2}\right)}^{1/2}\,.$$

(10)

We benchmark both error bounds in Fig. 3 and show that this is indeed the case, even for a relatively shallow circuit.
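The relation between the rigorous L₁ bound of Eq. (8) and the typical-case L₂ estimate of Eq. (10) can be explored on a toy example. The sketch below makes several assumptions that are not from the text: three qubits, 20 randomly chosen truncated Pauli terms with small Gaussian coefficients, and random states in place of $| {\psi }_{Q}\rangle$. It is not the benchmark shown in Fig. 3.

```python
import itertools
import numpy as np

PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_matrix(label):
    """Dense matrix of a Pauli string (leftmost letter = first qubit)."""
    mat = np.array([[1.0 + 0j]])
    for ch in label:
        mat = np.kron(mat, PAULIS[ch])
    return mat

def random_state(dim, rng):
    """Normalized complex Gaussian vector (a Haar-random pure state)."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

rng = np.random.default_rng(1)
n = 3

# Hypothetical truncated set T: 20 non-identity Paulis with small coefficients.
labels = ["".join(p) for p in itertools.product("IXYZ", repeat=n)][1:]
chosen = rng.choice(len(labels), size=20, replace=False)
truncated = {labels[i]: rng.normal(scale=0.01) for i in chosen}
delta = sum(c * pauli_matrix(P) for P, c in truncated.items())

l1 = sum(abs(c) for c in truncated.values())            # right-hand side of Eq. (8)
l2 = np.sqrt(sum(c**2 for c in truncated.values()))     # right-hand side of Eq. (10)

# Actual truncation error for a sample of random states.
errors = np.array(
    [abs(np.vdot(psi, delta @ psi)) for psi in (random_state(2**n, rng) for _ in range(200))]
)

# The computational basis is a 1-design, so Eq. (9) must hold for its average.
basis_mean_square = np.mean(np.diag(delta).real ** 2)

print(errors.max() <= l1)                 # Eq. (8) holds for every state
print(basis_mean_square <= l2**2)         # Eq. (9) for a 1-design
print(np.mean(errors <= l2))              # fraction of sampled states within the L2 estimate
```

Only the first two checks are guaranteed; the last line reports how often the non-rigorous L₂ estimate holds for this particular sample.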

**Fig. 3: Comparison between L₁ and L₂ error bounds vs. exact error of OBP truncation.**

Operator backpropagation via Clifford perturbation theory

Our framework requires the decomposition of the backpropagated observable ${O}^{{\prime} }$ in the Pauli basis [cf. Eq. (2)]. In this section, we discuss an implementation of such an OBP algorithm based on the Clifford perturbation theory (CPT)²¹. The key idea is that Clifford circuits can be efficiently simulated by classical computers and we can realize an algorithm using CPT whose complexity scales exponentially with the non-Cliffordness of U_C.

The OBP algorithm based on CPT works as follows. First, we split the subcircuit to be backpropagated, U_C, into slices,

$${U}_{C}=\mathop{\prod }\limits_{s=1}^{S}{{\mathcal{U}}}_{s}\equiv {{\mathcal{U}}}_{S}\ldots {{\mathcal{U}}}_{2}{{\mathcal{U}}}_{1}\,,$$

(11)

where S denotes the total number of slices and ${{\mathcal{U}}}_{s}$ represents a single slice of U_C. No constraints are placed on the depth of the slices and each slice may be supported on all qubits. The purpose of splitting U_C into slices is to provide timely stopping points within the OBP algorithm to control the growth of the intermediate observable ${O}^{{\prime} (s)}$ by truncating terms with low-magnitude coefficients. The algorithm proceeds by iterating over the circuit slices in reverse, i.e. starting from slice S and ending on slice 1. For each slice, all quantum gates are analytically applied to the current operator,

$${O}^{{\prime} (s)}={{\mathcal{U}}}_{S-s+1}^{\dagger }{O}^{{\prime} (s-1)}{{\mathcal{U}}}_{S-s+1}\,,$$

(12)

where s denotes the iteration index of the OBP algorithm and ${O}^{{\prime} (0)}=O$ is the original operator whose expectation value we are interested in.

For example, if O = X₁ is a Pauli X supported on qubit 1 and ${{\mathcal{U}}}_{S}$ is a CNOT gate between qubits 1 and 2, ${O}^{{\prime} (1)}={{\mathcal{U}}}_{S}^{\dagger }{X}_{1}{{\mathcal{U}}}_{S}$ again contains only one Pauli string X₁X₂. However, if ${{\mathcal{U}}}_{S}$ were a T gate on qubit 1, ${O}^{{\prime} (1)}$ would instead contain two Pauli operators: X₁ and Y₁. In general, the number of Paulis in the backpropagated observable remains the same after applying a Clifford gate and may double after each non-Clifford one. Therefore, during the course of the OBP algorithm, the number of Pauli terms comprising ${O}^{{\prime} (s)}$ can grow exponentially.
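Both examples can be verified with small dense matrices. The sketch below assumes that qubit 1 is the control of the CNOT (with qubit 1 as the target, X₁ would be left unchanged) and uses the usual definition T = diag(1, e^{iπ/4}).

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)

# CNOT with qubit 1 (left tensor factor) as control and qubit 2 as target.
CNOT = np.array(
    [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]],
    dtype=complex,
)
T = np.diag([1, np.exp(1j * np.pi / 4)])

# Clifford gate: X_1 maps to the single Pauli string X_1 X_2.
X1 = np.kron(X, I2)
after_cnot = CNOT.conj().T @ X1 @ CNOT
print(np.allclose(after_cnot, np.kron(X, X)))

# Non-Clifford gate: X maps to a combination of two Paulis, (X - Y) / sqrt(2).
after_t = T.conj().T @ X @ T
print(np.allclose(after_t, (X - Y) / np.sqrt(2)))
```

The T-gate case shows the doubling mechanism: one Pauli term becomes two, each with a coefficient of magnitude 1/√2.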

The OBP algorithm based upon CPT is particularly convenient when U_C contains gates that are unitary rotations by small angles. Such rotations often arise in a Trotterized Hamiltonian simulation. In such cases, many of the coefficients in the decomposition of ${O}^{{\prime} (s)}$ are small, providing an opportunity to truncate them from the operator. In our implementation, truncation occurs after the backpropagation of each slice and our total error budget is divided up among the fixed number of slices.

At this point we emphasize once more that the slices are not restricted to be of depth 1. For example, when an X₁X₂ and a Y₁Y₂ rotation are applied consecutively on the same pair of qubits 1 and 2, truncating terms between their individual backpropagation may not be desirable, because backpropagating both gates at once may itself result in beneficial cancellation of terms. This choice might be made a priori if one knows that the operator being backpropagated, e.g. Z₁ + Z₂, is an approximate symmetry of an X₁X₂ + Y₁Y₂ rotation. This is precisely the situation encountered in the experiments presented in Section II and discussed further in the Supplemental Information, where slices are chosen to contain a single layer of XX + YY rotations.
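The cancellation described above can be reproduced with a small sparse Pauli representation. This is an illustrative sketch, not the implementation of the Qiskit addon: operators are plain dictionaries from Pauli strings to real coefficients, gates are Pauli rotations exp(-iθG/2), and every function name is hypothetical. For a Pauli P that anticommutes with the generator G, the Heisenberg update used is P → cos θ · P + i sin θ · GP.

```python
import math
from collections import defaultdict

# Single-qubit products of distinct non-identity Paulis: a * b = phase * result.
_PRODUCT = {
    ("X", "Y"): (1j, "Z"), ("Y", "Z"): (1j, "X"), ("Z", "X"): (1j, "Y"),
    ("Y", "X"): (-1j, "Z"), ("Z", "Y"): (-1j, "X"), ("X", "Z"): (-1j, "Y"),
}

def pauli_mul(a, b):
    """Return (phase, label) such that a * b = phase * label for Pauli strings a, b."""
    phase, out = 1, []
    for p, q in zip(a, b):
        if p == "I":
            out.append(q)
        elif q == "I":
            out.append(p)
        elif p == q:
            out.append("I")
        else:
            ph, r = _PRODUCT[(p, q)]
            phase *= ph
            out.append(r)
    return phase, "".join(out)

def anticommute(a, b):
    """Two Pauli strings anticommute iff they differ on an odd number of shared sites."""
    clashes = sum(p != "I" and q != "I" and p != q for p, q in zip(a, b))
    return clashes % 2 == 1

def backprop_rotation(op, generator, theta):
    """Heisenberg-evolve `op` through the gate exp(-i * theta / 2 * generator)."""
    new = defaultdict(float)
    for label, coeff in op.items():
        if not anticommute(label, generator):
            new[label] += coeff            # commuting terms are unchanged
            continue
        phase, product = pauli_mul(generator, label)   # phase is +i or -i here
        new[label] += coeff * math.cos(theta)
        new[product] += coeff * math.sin(theta) * (1j * phase).real
    return dict(new)

def drop_small(op, tol=1e-12):
    """Discard terms whose coefficients vanish up to rounding."""
    return {p: c for p, c in op.items() if abs(c) > tol}

theta = 0.1
start = {"ZI": 1.0, "IZ": 1.0}            # Z_1 + Z_2
after_xx = drop_small(backprop_rotation(start, "XX", theta))
after_both = drop_small(backprop_rotation(after_xx, "YY", theta))

print(sorted(after_xx))     # four terms after the XX rotation alone
print(sorted(after_both))   # back to two terms once the YY rotation is included
```

Truncating after the XX rotation alone would act on a four-term operator and discard weight that the YY rotation would otherwise cancel, which is the motivation for grouping both rotations into one slice.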

Truncation of the operator reduces the memory required to store its Pauli decomposition, but it also leads to a challenge in tracking the total truncation error. Because truncation occurs at different points in the Heisenberg evolution, the orthogonality of truncated components cannot be assumed. Thus one may not use Eq. (8) or Eq. (10) directly. Instead, one can use the triangle inequality to further upper bound the total truncation error

$$\epsilon \le {\epsilon }_{1}+{\epsilon }_{2}+\ldots +{\epsilon }_{S}\,,$$

(13)

where ϵ_s is the truncation error, evaluated using either Eq. (8) or Eq. (10), at the sth iteration.

For the implementation of OBP discussed here, a total error budget is specified up front which is divided among each slice of U_C. At each truncation step, Pauli terms are removed in order of increasing coefficient magnitude until no further terms can be truncated without exceeding the per-slice error budget. At the end of truncation, any residual error budget is added to the per-slice error budget of the next slice. The complexity of this procedure scales as ${\mathcal{O}}(| {{\mathcal{S}}}_{s}| \log | {{\mathcal{S}}}_{s}| )$ where ${{\mathcal{S}}}_{s}$ is the set of Pauli terms comprising ${O}^{{\prime} (s)}$ prior to truncation.
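The truncation rule described above can be sketched as follows. The function name, the dictionary representation of the operator, and the example coefficients are hypothetical; the sketch only illustrates sorting by magnitude, removing terms while the per-slice budget is respected, and carrying the residual budget to the next slice.

```python
import math

def truncate(op, budget, norm="l2"):
    """Remove the smallest-magnitude terms while the removed norm stays within `budget`.

    `op` maps Pauli strings to coefficients. Returns (kept_terms, spent_error),
    where spent_error is the L1 or L2 norm of the removed coefficients.
    """
    ordered = sorted(op.items(), key=lambda item: abs(item[1]))  # O(|S| log |S|)
    acc = 0.0          # running sum of |c| (L1) or |c|^2 (L2) over removed terms
    n_removed = 0
    for _, coeff in ordered:
        increment = abs(coeff) if norm == "l1" else abs(coeff) ** 2
        total = acc + increment
        spent = total if norm == "l1" else math.sqrt(total)
        if spent > budget:
            break      # removing this term would exceed the budget
        acc = total
        n_removed += 1
    spent = acc if norm == "l1" else math.sqrt(acc)
    return dict(ordered[n_removed:]), spent

# Hypothetical operator after backpropagating one slice.
op = {"ZI": 0.9, "XY": 0.3, "YX": 0.04, "XX": 0.03, "IZ": 0.002}
per_slice_budget = 0.05

kept, spent = truncate(op, per_slice_budget, norm="l2")
residual = per_slice_budget - spent
next_slice_budget = per_slice_budget + residual   # unused budget carries forward

print(sorted(kept))   # ['XY', 'YX', 'ZI']
```

Because each slice spends at most its own allowance plus the carry-over, the per-slice errors sum to at most the total budget, which is what Eq. (13) requires.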

The implementation of OBP described above will terminate if any of the following three conditions is met: the observable O has been backpropagated through all circuit slices, ${{\mathcal{U}}}_{s}$; after expending the truncation budget, the size of ${\mathcal{K}}$ is over the user-specified limit; or the algorithm has exceeded a user-specified runtime limit.

The latter two early-termination criteria are motivated by practical constraints. The total number of Paulis in the decomposition of ${O}^{{\prime} }$, i.e. $| {\mathcal{K}}|$, is a proxy for both the classical memory requirements and the total number of quantum circuits which need to be executed on hardware. It is also possible to terminate based on the minimal number of qubit-wise-commuting Pauli groups, which defines the true number of required circuit executions. However, computing this quantity is NP-hard³⁶ and approximating it for large $| {\mathcal{K}}|$ can be prohibitively expensive.
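To make the notion of qubit-wise-commuting groups concrete, the sketch below uses a simple first-fit greedy heuristic. It is not the grouping method of this work, and it yields only an upper bound on the minimal number of groups; the function names are hypothetical. Two Pauli strings are taken to commute qubit-wise when, on every qubit, their single-qubit factors are equal or at least one of them is the identity.

```python
def qubit_wise_commute(p, q):
    """True if Pauli strings p and q commute on every qubit separately."""
    return all(a == b or a == "I" or b == "I" for a, b in zip(p, q))


def greedy_qwc_groups(labels):
    """First-fit grouping: place each Pauli in the first group whose
    members all qubit-wise commute with it, else open a new group."""
    groups = []
    for p in labels:
        for group in groups:
            if all(qubit_wise_commute(p, q) for q in group):
                group.append(p)
                break
        else:
            groups.append([p])
    return groups


if __name__ == "__main__":
    for group in greedy_qwc_groups(["ZZ", "ZI", "IZ", "XX", "XI", "YY"]):
        print(group)
```

Each group can be measured with a single circuit, so the number of groups returned bounds the number of circuit executions from above, while $| {\mathcal{K}}|$ remains the cheaper proxy.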

Parallelization of operator backpropagation

Backpropagation of observables is a form of simulation and thus has exponential overhead in general. Employing truncation schemes can extend the reach of this method; however, it is natural to question how well these classical algorithms scale to large problem instances. In this section we discuss the challenges of scaling our method to a highly parallelized architecture such as an HPC cluster. We observe that one immediate obstacle is the potentially prohibitive amount of all-to-all communication between computational nodes that arises from a naive parallelization of the norm-based truncation strategy we describe in Section III. Finally, we describe a scheme for distributing the OBP calculation across nodes where the number of messages passed scales only linearly in the total number of nodes.

Multiple approaches can be taken to parallelize the implementation of the OBP algorithm. The most naïve approach would be to simply parallelize the backpropagation of multiple operators of interest but this assumes that one indeed has multiple operators to begin with. Furthermore, this does not resolve the fundamental limitation imposed by memory being the most likely resource constraint for how many circuit slices can be backpropagated. Therefore, a multi-node parallelization for the backpropagation of any single operator is desirable.

The OBP implementation based on CPT poses two major difficulties for such a parallelization. First, when a slice of circuit operations gets backpropagated, each operation has to be applied to each Pauli term. If the circuit operation commutes with the Pauli term, this is trivial. Otherwise, if the operation is non-Clifford, the backpropagation results in up to twice the number of Pauli terms, possibly leading to new terms not present previously. When this is done in parallel on disjoint sets of Pauli terms, the resulting sets may no longer be disjoint. If one now desires to perform low-magnitude coefficient truncation while adhering to an error budget, the sets of Pauli terms must be de-duplicated such that coefficients of duplicate Pauli terms are properly summed. Naïvely, this would involve either an all-to-all or one-to-all communication pattern, posing a severe bottleneck on the parallelization efficiency, and in the latter case, bottlenecking the entire computation by any single node’s memory capacity. Second, agreeing on a truncation threshold over a distributed set of Pauli terms also poses a potential difficulty. In the following, we are going to propose a solution to overcome both of these limitations.
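The term doubling and the need to merge duplicates can be illustrated with a small sketch. It assumes the gate is a Pauli rotation $U=\exp (-i\theta Q/2)$ and stores the operator as a dictionary from Pauli strings to coefficients; the function names are hypothetical. A term P that commutes with Q is left unchanged, while an anticommuting term becomes $\cos \theta \,P+i\sin \theta \,QP$, i.e. two terms.

```python
import math
from collections import defaultdict

_CYCLE = {("X", "Y"): "Z", ("Y", "Z"): "X", ("Z", "X"): "Y"}


def _mul1(a, b):
    """Single-qubit Pauli product a*b as (phase, label)."""
    if a == "I":
        return 1, b
    if b == "I":
        return 1, a
    if a == b:
        return 1, "I"
    if (a, b) in _CYCLE:
        return 1j, _CYCLE[(a, b)]
    return -1j, _CYCLE[(b, a)]


def pauli_mul(p, q):
    """Product of two Pauli strings as (phase, label)."""
    phase, out = 1, []
    for a, b in zip(p, q):
        f, c = _mul1(a, b)
        phase *= f
        out.append(c)
    return phase, "".join(out)


def commutes(p, q):
    """Pauli strings commute iff they differ on an even number of
    qubits where both factors are non-identity."""
    anti = sum(a != "I" and b != "I" and a != b for a, b in zip(p, q))
    return anti % 2 == 0


def backpropagate_rotation(op, generator, theta):
    """Return U^dagger O U for U = exp(-i*theta/2 * generator)."""
    out = defaultdict(complex)
    for label, coeff in op.items():
        if commutes(label, generator):
            out[label] += coeff  # trivial case: term is unchanged
        else:
            out[label] += coeff * math.cos(theta)
            phase, new_label = pauli_mul(generator, label)
            # Summing into the dictionary merges duplicate Paulis.
            out[new_label] += coeff * 1j * phase * math.sin(theta)
    return dict(out)


if __name__ == "__main__":
    print(backpropagate_rotation({"Z": 1.0, "Y": 1.0}, "X", 0.3))
```

Here Z and Y each branch into both Z and Y, so four contributions collapse to two terms once duplicates are summed. On a single machine the dictionary does this automatically; when the input terms live on different nodes, the contributions to the same Pauli must first be brought together.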

The first problem can be tackled by realizing that all 4^N Pauli terms that span the space of N qubits have a natural ordering. Every Pauli term can be represented via two bitstrings of length N, where one bitstring encodes the presence (1) or lack (0) of an X Pauli at a given index and the other does the same for Z^37,38,39. This is possible because Y is proportional to the product of X and Z, so that, up to a phase, every single-qubit Pauli is a product of powers of X and Z. Therefore, in this representation, the four possible combinations of two bits encode the following: (0, 0) → I, (0, 1) → Z, (1, 0) → X, (1, 1) → Y.

Further, we can concatenate the bitstrings encoding the Z and X Paulis to form a single bitstring of length 2N. This bitstring serves as a unique identifier for every possible Pauli term and provides a natural order to the entire set. Thus, given any Pauli term, its bitstring can be used as an address to quickly determine its position in the larger set.
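A minimal sketch of this encoding is given below. The bit convention follows the mapping stated above, with each pair read as (X bit, Z bit). Placing the Z bitstring in the upper N bits and the X bitstring in the lower N bits is an assumption made here for illustration, as is the function name `pauli_to_address`; any fixed concatenation order yields a unique address.

```python
# (x_bit, z_bit) for each single-qubit Pauli, as in the mapping above.
_BITS = {"I": (0, 0), "Z": (0, 1), "X": (1, 0), "Y": (1, 1)}


def pauli_to_address(label):
    """Map an N-qubit Pauli string to an integer in [0, 4**N)."""
    n = len(label)
    x = z = 0
    for ch in label:
        xb, zb = _BITS[ch]
        x = (x << 1) | xb
        z = (z << 1) | zb
    # Assumed layout: Z bitstring in the high half, X in the low half.
    return (z << n) | x


if __name__ == "__main__":
    for label in ["IIII", "XIII", "ZIII", "YYYY"]:
        print(label, format(pauli_to_address(label), "08b"))
```

Because the address is a plain integer, comparing or sorting Pauli terms reduces to integer comparison.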

When distributing Pauli terms across R compute nodes, the entire address space can be partitioned into R intervals, each associated with a different node. This distribution scheme ensures that for any Pauli address, the corresponding node whose interval contains this Pauli can be determined in constant time. This procedure is visualized in Fig. 4 for a specific case involving 4 qubits and a selection of Pauli terms which are to be assigned to one of the four compute nodes. From left to right, the figure shows how a Pauli gets encoded in the ZX calculus which then gets interpreted as an address that can be mapped into the address range associated with a given node.
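The node lookup and the resulting deduplication can be sketched as follows, assuming equal-width intervals so that the owner is found with a single integer division. The function names are hypothetical, and the "nodes" are plain Python lists standing in for separate processes.

```python
from collections import defaultdict


def owner_node(address, n_qubits, n_nodes):
    """Node whose interval contains `address` (equal-width intervals)."""
    space = 1 << (2 * n_qubits)  # 4**N possible addresses
    return address * n_nodes // space


def route_and_merge(local_terms, n_qubits, n_nodes):
    """Send every (address, coeff) pair to its owner node and sum the
    coefficients of duplicate addresses there.

    local_terms[r] is the list of pairs currently held by node r.
    """
    merged = [defaultdict(float) for _ in range(n_nodes)]
    for terms in local_terms:
        for address, coeff in terms:
            owner = owner_node(address, n_qubits, n_nodes)
            merged[owner][address] += coeff
    return [dict(m) for m in merged]


if __name__ == "__main__":
    held = [[(8, 0.5), (128, 1.0)], [(8, 0.25)], [], [(255, 2.0)]]
    print(route_and_merge(held, n_qubits=4, n_nodes=4))
```

After routing, each address lives on exactly one node, so duplicates created on different nodes are summed locally and the per-node sets are disjoint again.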

**Fig. 4: Example of distributing Pauli terms based on their binary encoding index.**

The scheme presented above addresses the challenge of distributed deduplication and requires each node to exchange at most R messages. In the Supplemental Information, we further show how the partitioning of Pauli addresses can be efficiently updated in order to balance the number of Paulis within each node’s interval, requiring ${\mathcal{O}}(R)$ total messages to be passed.

Now that a distributed storage system for the Pauli terms has been devised, backpropagation of any circuit slice can be performed in parallel on disjoint sets of Pauli terms. If we wish to truncate Pauli terms with low-magnitude coefficients, this can also be done in parallel but care must be taken not to exceed any specified error budget. To this end, every node can determine in parallel the smallest and largest coefficient magnitude and communicate these values to a master node. Upon receiving all lower and upper bounds, the master node can propose a truncation threshold and broadcast this to all nodes. Each node can then compute its own truncation error for the proposed threshold and send this information back to the master node. By iterating on this procedure, a binary search for an agreeable truncation threshold can be performed on a distributed set of Pauli terms and coefficients. Therefore, ${\mathcal{O}}(R\,\log (| {\mathcal{S}}| ))$ broadcast messages are required to determine the truncation threshold, where ${\mathcal{S}}$ is the set of Paulis upon which the backpropagated observable is supported, prior to truncation.
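The threshold search can be simulated on a single machine as in the sketch below. It is an illustration under stated assumptions rather than the implementation used in this work: the nodes are plain lists of coefficients, each assumed non-empty, the per-node error is taken to be the sum of the magnitudes of the terms it would drop (chosen because it adds up across nodes), and the bisection runs for a fixed number of rounds. The names `node_error` and `find_threshold` are hypothetical.

```python
def node_error(coeffs, threshold):
    """Error a node reports for a proposed threshold: the summed
    magnitude of its terms with |coeff| strictly below the threshold."""
    return sum(abs(c) for c in coeffs if abs(c) < threshold)


def find_threshold(nodes, budget, rounds=50):
    """Bisect for the largest threshold whose total truncation error
    stays within `budget`. Each round stands for one broadcast from
    the master node and one reply from each of the R nodes."""
    # Step 1: nodes report their smallest and largest magnitudes.
    lo = min(min(abs(c) for c in coeffs) for coeffs in nodes)
    hi = max(max(abs(c) for c in coeffs) for coeffs in nodes)
    # `lo` truncates nothing (strict inequality), so it is always valid.
    for _ in range(rounds):
        proposal = (lo + hi) / 2
        total = sum(node_error(coeffs, proposal) for coeffs in nodes)
        if total <= budget:
            lo = proposal  # feasible: try a more aggressive threshold
        else:
            hi = proposal  # infeasible: back off
    return lo


if __name__ == "__main__":
    nodes = [[0.5, 0.01, 0.02], [0.3, 0.03], [0.9, 0.04]]
    t = find_threshold(nodes, budget=0.07)
    print([[c for c in coeffs if abs(c) >= t] for coeffs in nodes])
```

Only scalar thresholds and scalar error values cross the network in each round, so no node ever needs to hold the full set of coefficients.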

Note added: During the preparation of this manuscript, we learned of complementary works by Lerch et al.⁴⁰, Rudolph et al.⁴¹, Miller et al.⁴², and Faehrmann et al.⁴³. Refs. 40, 41, and 42 each present techniques similar to OBP where operators are propagated backward through entire circuits as a form of simulation. These manuscripts discuss valuable ideas outside the scope of this work including different approaches to operator truncation, extensions to fermionic systems, and variations where the coefficients of backpropagated operators are represented symbolically as a function of gate rotation angles. Meanwhile, Faehrmann et al. proposed to backpropagate an observable through a short-time evolution of a Hamiltonian using the truncated Taylor series. In general, one can instead construct a Trotterized approximation to the dynamics and apply the OBP framework detailed in this manuscript. Comparing the efficiency of the two approaches and identifying applications where one can be more beneficial than the other is of great practical interest, but is outside the scope of the current manuscript.

Data availability

Data generated and analyzed for this manuscript are available from the corresponding author upon reasonable request.

Code availability

Code used to generate data in this manuscript is available from the corresponding author upon reasonable request.

References

1. Lloyd, S. Universal quantum simulators. Science 273, 1073–1078 (1996).
2. Harrow, A. W., Hassidim, A. & Lloyd, S. Quantum algorithm for linear systems of equations. Phys. Rev. Lett. 103, 150502 (2009).
3. Temme, K., Bravyi, S. & Gambetta, J. M. Error mitigation for short-depth quantum circuits. Phys. Rev. Lett. 119, 180509 (2017).
4. Li, Y. & Benjamin, S. C. Efficient variational quantum simulator incorporating active error minimization. Phys. Rev. X 7, 021050 (2017).
5. Cai, Z. et al. Quantum error mitigation. Rev. Mod. Phys. 95, 045005 (2023).
6. Kim, Y. et al. Evidence for the utility of quantum computing before fault tolerance. Nature 618, 500–505 (2023).
7. Nation, P. D. & Treinish, M. Suppressing quantum circuit errors due to system variability. PRX Quantum 4, 010327 (2023).
8. Kremer, D. et al. Practical and efficient quantum circuit synthesis and transpiling with reinforcement learning. arXiv:2405.13196 https://doi.org/10.48550/arXiv.2405.13196 (2024).
9. Filippov, S., Leahy, M., Rossi, M. A. C. & García-Pérez, G. Scalable tensor-network error mitigation for near-term quantum computing. arXiv:2307.11740 (2023).
10. Vazquez, A. C., Egger, D. J., Ochsner, D. & Woerner, S. Well-conditioned multi-product formulas for hardware-friendly Hamiltonian simulation. Quantum 7, 1067 (2023).
11. Robertson, N. F. et al. Tensor network enhanced dynamic multiproduct formulas. arXiv:2407.17405 https://arxiv.org/abs/2407.17405 (2024).
12. Robertson, N. F., Akhriev, A., Vala, J. & Zhuk, S. Approximate quantum compiling for quantum simulation: A tensor network based approach. arXiv:2301.08609 https://arxiv.org/abs/2301.08609 (2023).
13. Madden, L. & Simonetto, A. Best approximate quantum compiling problems. ACM Trans. Quantum Comput. 3 https://doi.org/10.1145/3505181 (2022).
14. Rudolph, M. S., Chen, J., Miller, J., Acharya, A. & Perdomo-Ortiz, A. Decomposition of matrix product states into shallow quantum circuits. Quantum Sci. Technol. 9, 015012 (2023).
15. Ben-Dov, M., Shnaiderov, D., Makmal, A. & Dalla Torre, E. G. Approximate encoding of quantum states using shallow circuits. npj Quantum Inf. 10 https://doi.org/10.1038/s41534-024-00858-1 (2024).
16. Lin, S.-H., Dilip, R., Green, A. G., Smith, A. & Pollmann, F. Real- and imaginary-time evolution with compressed quantum circuits. PRX Quantum 2, 010342 (2021).
17. Ran, S.-J. Encoding of matrix product states into quantum circuits of one- and two-qubit gates. Phys. Rev. A 101, 032310 (2020).
18. Carrera Vazquez, A. et al. Combining quantum processors with real-time classical communication. Nature 636, 75–79 (2024).
19. Krishnan Vijayan, M. et al. Compilation of algorithm-specific graph states for quantum circuits. Quantum Sci. Technol. 9, 025005 (2024).
20. Kaldenbach, T. N. & Heller, M. Mapping quantum circuits to shallow-depth measurement patterns based on graph states. Quantum Sci. Technol. 10, 015010 (2024).
21. Begušić, T., Hejazi, K. & Chan, G. K.-L. Simulating quantum circuit expectation values by Clifford perturbation theory. arXiv:2306.04797v2 http://arxiv.org/abs/2306.04797v2 (2023).
22. Fuller, B. et al. Qiskit addon: Operator Backpropagation. https://github.com/Qiskit/qiskit-addon-obp (2024).
23. Begušić, T., Gray, J. & Chan, G. K.-L. Fast and converged classical simulations of evidence for the utility of quantum computing before fault tolerance. Sci. Adv. 10 https://doi.org/10.1126/sciadv.adk4321 (2024).
24. Schuster, T., Yin, C., Gao, X. & Yao, N. Y. A polynomial-time classical algorithm for noisy quantum circuits. arXiv:2407.12768 https://arxiv.org/abs/2407.12768 (2024).
25. Angrisani, A. et al. Classically estimating observables of noiseless quantum circuits. arXiv:2409.01706 http://arxiv.org/abs/2409.01706 (2024).
26. Alexeev, Y. et al. Quantum-centric supercomputing for materials science: A perspective on challenges and future directions. arXiv:2312.09733 (2023).
27. Tindall, J., Fishman, M., Stoudenmire, E. M. & Sels, D. Efficient tensor network simulation of IBM’s Eagle kicked Ising experiment. PRX Quantum 5 https://doi.org/10.1103/PRXQuantum.5.010308 (2024).
28. Kim, Y. et al. Scalable error mitigation for noisy quantum circuits produces competitive expectation values. Nat. Phys. 19, 752–759 (2023).
29. Kandala, A. et al. Error mitigation extends the computational reach of a noisy quantum processor. Nature 567, 491–495 (2019).
30. McDonough, B. et al. Automated quantum error mitigation based on probabilistic error reduction. In 2022 IEEE/ACM Third International Workshop on Quantum Computing Software (QCS), 83–93 https://ieeexplore.ieee.org/document/10025519 (2022).
31. Mari, A., Shammah, N. & Zeng, W. J. Extending quantum probabilistic error cancellation by noise scaling. Phys. Rev. A 104, 052607 (2021).
32. Ferracin, S. et al. Efficiently improving the performance of noisy quantum computers. Quantum 8, 1410 (2024).
33. Endo, S., Benjamin, S. C. & Li, Y. Practical quantum error mitigation for near-future applications. Phys. Rev. X 8 https://doi.org/10.1103/PhysRevX.8.031027 (2018).
34. van den Berg, E., Minev, Z. K. & Temme, K. Model-free readout-error mitigation for quantum expectation values. Phys. Rev. A 105, 032620 (2022).
35. Ding, H.-Q. Phase transition and thermodynamics of quantum XY model in two dimensions. Phys. Rev. B 45, 230–242 (1992).
36. Yen, T.-C., Verteletskyi, V. & Izmaylov, A. F. Measuring all compatible operators in one series of single-qubit measurements using unitary transformations. J. Chem. Theory Comput. 16, 2400–2409 (2020).
37. Coecke, B. & Duncan, R. Interacting quantum observables. In Aceto, L. et al. (eds.) Automata, Languages and Programming, 298–310 (Springer Berlin Heidelberg, Berlin, Heidelberg, 2008). https://doi.org/10.1007/978-3-540-70583-3.
38. Coecke, B. & Duncan, R. Interacting quantum observables: categorical algebra and diagrammatics. New J. Phys. 13, 43016 (2011).
39. Aaronson, S. & Gottesman, D. Improved simulation of stabilizer circuits. Phys. Rev. A 70, 052328 (2004).
40. Lerch, S. et al. Efficient quantum-enhanced classical simulation for patches of quantum landscapes. arXiv:2411.19896 https://arxiv.org/abs/2411.19896 (2024).
41. Rudolph, M. S., Jones, T., Teng, Y., Angrisani, A. & Holmes, Z. Pauli propagation: A computational framework for simulating quantum systems. arXiv:2505.21606 https://arxiv.org/abs/2505.21606 (2025).
42. Miller, A. et al. Simulation of fermionic circuits using Majorana propagation. arXiv:2503.18939 https://arxiv.org/abs/2503.18939 (2025).
43. Faehrmann, P. K., Eisert, J., Kieferova, M. & Kueng, R. Short-time simulation of quantum dynamics by Pauli measurements. arXiv:2412.08719 http://arxiv.org/abs/2412.08719 (2024).


Acknowledgements

This material is based upon work supported by the U.S. Department of Energy, Office of Science, and National Quantum Information Science Research Centers. Y.A. and D.L. acknowledge support from the U.S. Department of Energy, Office of Science, under contract DE-AC02-06CH11357 at Argonne National Laboratory. We would also like to acknowledge informative discussions with Petar Jurcevic, Oles Shtanko, Bibek Pokharel, Andrew Eddins, Zlatko Minev, Conrad Haupt, and Daniel Egger.

Author information

Authors and Affiliations

IBM Quantum, IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
Bryce Fuller, Minh C. Tran, Caleb Johnson, Ken Xuan Wei, Andre He, Youngseok Kim, Kunal Sharma, Abhinav Kandala & Antonio Mezzacapo
Argonne National Laboratory, Lemont, IL, USA
Danylo Lykov & Yuri Alexeev
NVIDIA Corp, Santa Clara, CA, USA
Danylo Lykov & Yuri Alexeev
IBM Quantum, IBM Research – Zurich, Rüschlikon, Switzerland
Max Rossmannek
Department of Physics, Harvard University, Cambridge, MA, USA
DinhDuy Vu

Contributions

M.T., C.J., M.R., and B.F. implemented the classical CPT-based simulation tools, wrote the main manuscript text, and made all figures in the main text. B.F., Y.K., A.H., K.W., and A.K. executed the quantum hardware experiments. M.T., K.S., A.M., and B.F. conceived of the original project idea. All authors contributed to editing the final manuscript.

Corresponding author

Correspondence to Bryce Fuller.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Fuller, B., Tran, M.C., Lykov, D. et al. Improved quantum computation using operator backpropagation. npj Quantum Inf 12, 51 (2026). https://doi.org/10.1038/s41534-026-01196-0

Received: 26 March 2025
Accepted: 02 February 2026
Published: 19 February 2026
Version of record: 27 March 2026
DOI: https://doi.org/10.1038/s41534-026-01196-0