Abstract
We introduce the generalized Probabilistic Approximate Optimization Algorithm (PAOA), a classical variational Monte Carlo framework that extends and formalizes the recently introduced PAOA, enabling parameterized and fast sampling on present-day Ising machines and probabilistic computers. PAOA operates by iteratively modifying the couplings of a network of binary stochastic units, guided by cost evaluations from independent samples. We establish a direct correspondence between derivative-free updates and the gradient of the full Markov flow over the exponentially large state space, showing that PAOA admits a principled variational formulation. Simulated annealing emerges as a limiting case under constrained parameterizations, and we implement this regime on an FPGA-based probabilistic computer with on-chip annealing to solve large 3D spin-glass problems. Benchmarking PAOA against QAOA on the canonical 26-spin Sherrington–Kirkpatrick model with matched parameters reveals superior performance for PAOA. We show that PAOA naturally extends simulated annealing by optimizing multiple temperature profiles, leading to improved performance over SA on heavy-tailed problems such as SK–Lévy.
Introduction
Monte Carlo algorithms remain a central tool for exploring complex energy landscapes, especially in the context of combinatorial optimization and statistical physics. Classical methods such as simulated annealing (SA) have been widely applied across these domains, but their reliance on slowly equilibrating processes limits their performance on rugged energy landscapes1,2,3,4,5,6. New approaches are needed to construct non-equilibrium strategies that retain algorithmic simplicity while improving solution quality.
Inspired by the quantum approximate optimization algorithm (QAOA)7,8,9,10,11,12,13,14,15,16,17, Weitz et al.18 proposed a classical variational protocol based on the direct parameterization of low-dimensional Markov transition matrices. Their work introduced the term Probabilistic Approximate Optimization Algorithm (PAOA), raising the possibility of classical, variational analogs to QAOA within probabilistic architectures.
Building on this foundational work, we formalize and generalize PAOA. We move beyond the original proposal’s edge-local matrices to derive a global \(2^N \times 2^N\) Markov-flow formulation applicable to any k-local Ising Hamiltonian of size N. This framework unifies a wide spectrum of variational ansätze, from global and local schedules to fully-parameterized couplings, and allows them to be stacked to an arbitrary depth p. Crucially, we connect this variational theory to practice by building on the p-computing framework19, where networks of binary stochastic units (p-bits) sample from Boltzmann-like distributions through asynchronous dynamics. This model has a demonstrated record of success in optimization and inference tasks20,21,22,23 and provides a natural substrate for our work. By implementing these dynamics on an FPGA, we demonstrate for the first time a scalable, hardware-based path for executing the PAOA.
We show that PAOA admits a broad class of parameterizations, including global, local, and edge-specific annealing schedules. Within this framework, SA emerges as a limiting case under constrained schedules. We implement this regime on an FPGA-based p-computer to demonstrate large-scale, high-throughput sampling. In contrast to standard SA, PAOA’s flexible parameterization enables the discovery of non-equilibrium heuristics that exploit structural features of the problem. One such example is demonstrated in heavy-tailed Sherrington–Kirkpatrick (SK) models, where PAOA assigns higher effective temperatures to strongly coupled nodes and lower temperatures to weakly connected ones, enabling annealing profiles with better performance than vanilla SA. This adaptive scheduling capability provides a powerful framework for the automated discovery of novel annealing heuristics.
We also benchmark PAOA against QAOA on the SK model using matched parameter counts and observe superior performance. This may not be surprising, as the practical performance advantages of QAOA over classical and probabilistic alternatives remain uncertain. A clear quantum advantage is expected to emerge in problems where interference plays an explicit role, such as in the synthetic examples constructed by Montanaro et al.12. Our work builds on the promising results of Weitz et al.18 on the max-cut problem by expanding PAOA into a scalable, hardware-compatible, and general-purpose optimization algorithm.
The remainder of this paper is organized as follows: we first present the theoretical formulation of PAOA, including its parameterization, Markov structure, and sampling-based approximation with p-bits. In order to illustrate the correspondence between analytical gradients and derivative-free optimization, we study a simple majority gate problem. We then demonstrate the recovery of SA as a special case and implement this regime on hardware, followed by a description of the FPGA architecture used to accelerate sampling. Using matched parameter counts, we benchmark PAOA against QAOA on the SK model. Finally, we study a heavy-tailed SK variant to show how PAOA can discover multiple annealing schedules that outperform standard single-schedule SA.
Results and discussion
Probabilistic Approximate Optimization Algorithm overview
The power of the PAOA comes from a conceptual departure from traditional optimization methods, such as SA. While SA explores a single, fixed energy landscape defined by a model Hamiltonian, PAOA treats the landscape itself as a variational object to be optimized. The goal is to dynamically reshape this landscape to make the ground state of the original problem easier to find.
This creates a two-level optimization loop, as illustrated in Fig. 1. In an outer loop, a classical optimizer adjusts a set of variational parameters, θ. In an inner loop, a probabilistic computer samples states from the variational landscape currently parameterized by θ. In the most general case, θ specifies the full set of couplings and fields {J(k), h(k)} at each layer k, so that the landscape itself is directly reconfigured by the optimizer. Simpler ansätze, such as the global or node-wise schedules introduced in the Spectrum of variational ansätze subsection, can be viewed as constrained subsets of this general J, h parameterization (for example, a global inverse temperature amounts to scaling all entries of J by a single parameter). The sampler acts on M spins in the ansatz, while the original problem has N visible spins. In general M ≥ N because hidden spins may be introduced to enrich the representation, in direct analogy to neural quantum states24. Unless otherwise stated, we set M = N in the remainder of the paper for simplicity and leave the possibility of introducing hidden spins for richer representations to future work.
Fig. 1: A hybrid classical-probabilistic architecture iteratively updates a weight matrix J and bias vector h using feedback from a probabilistic computer. The p-computer samples from a distribution defined by (J, h), approximating the exact Markov flow. The resulting samples \({\widehat{{{{\boldsymbol{\rho }}}}}}_{p}\) are used to evaluate a cost function, which the classical optimizer minimizes by adjusting the variational parameters. Here, ρ denotes the exact probability distribution over spin configurations, \(\widehat{{{{\boldsymbol{\rho }}}}}\) the sampled approximation, p the layer number, M the total number of spins represented in the ansatz (including possible hidden variables), and {m} a specific spin configuration (state). The cost function is typically the energy of a spin glass mapped to an optimization problem (e.g., Eq. (3)) but can also be a likelihood function if PAOA is learning from data (e.g., Eq. (5)). Importantly, the ansatz size M can exceed the original problem size N since hidden variables may be introduced to increase the representational power. In this work, we use M = N in all experiments unless noted. At convergence (k = p), the distribution concentrates around low-energy solutions.
During training, samples generated under θ are evaluated with a task-dependent cost function \({{{\mathcal{L}}}}({{{\boldsymbol{\theta }}}})\). In problems with a known target set of states (e.g., majority logic gate, discussed in the Representative example: majority gate subsection), \({{{\mathcal{L}}}}\) is typically chosen as a negative log-likelihood. In optimization problems such as 3D spin glasses, \({{{\mathcal{L}}}}\) is instead taken as the average energy over sampled configurations. In either case, feedback from the samples updates θ, decoupling the problem energy from the sampling dynamics and enabling PAOA to discover nonequilibrium pathways to solutions.
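To make the two cost functions concrete, the following minimal Python sketch evaluates both from a batch of samples. The array layout, function names, and the log-smoothing constant are our illustrative choices, not the authors' implementation (which, per the Code availability section, is in MATLAB and C++).

```python
import numpy as np

def mean_energy(samples, J, h):
    """Average Ising energy <E> over sampled configurations (Eq. (3)).
    samples: (num_runs, N) array in {-1, +1}; J symmetric with zero diagonal."""
    pair = -0.5 * np.einsum('ri,ij,rj->r', samples, J, samples)  # -sum_{i<j} J_ij m_i m_j
    field = -samples @ h                                          # -sum_i h_i m_i
    return float(np.mean(pair + field))

def negative_log_likelihood(samples, target_states, eps=1e-12):
    """NLL of a target set X under the empirical distribution rho_hat (Eq. (5))."""
    counts = {}
    for s in samples:
        key = tuple(int(v) for v in s)
        counts[key] = counts.get(key, 0) + 1
    n = len(samples)
    # eps guards against log(0) when a target state was never sampled
    return -sum(np.log(counts.get(tuple(x), 0) / n + eps) for x in target_states)
```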
Spectrum of variational ansätze
The power and flexibility of PAOA lie in the choice of the variational parameters θ that define the search landscape. We refer to this parameterization as the variational ansatz. Let N denote the total number of spins in the problem. To explore the tradeoffs between expressiveness and generalization, we study four representative ansätze in this paper, defined by the number of free parameters, Γ (a minimal code sketch of these parameterizations follows the list below):
- When Γ = 1, the landscape is scaled by a single, learned inverse temperature β(k) at each layer k. This is the global annealing schedule, which is the closest analog to SA, with the crucial difference that the schedule is learned rather than predefined.
- When Γ = 2, two independent schedules are assigned to distinct parts of the graph, typically divided based on node properties like weighted degree.
- When Γ = N, each node is assigned a local schedule \({\beta }_{i}^{(k)}\), allowing the algorithm to anneal different parts of the problem at different rates.
- When Γ = N(N − 1)/2, the entire symmetric coupling matrix \({J}_{ij}^{(k)}\) is used as the variational ansatz, giving the algorithm maximum freedom to reshape the landscape at each layer.
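As an illustration of how these four ansätze constrain the sampler, the sketch below maps each choice of Γ to the per-node inverse temperatures and couplings used at one layer. The function name, the `heavy_mask` argument, and the convention of folding β into the p-bit input are our assumptions.

```python
import numpy as np

def layer_parameters(J_base, theta_k, ansatz, heavy_mask=None):
    """Map layer-k variational parameters theta_k to the (beta_i, J) pair used by
    the sampler, where node i is updated with input beta_i*(sum_j J_ij m_j + h_i)."""
    N = J_base.shape[0]
    if ansatz == 'global':        # Gamma = 1: one shared inverse temperature
        return np.full(N, float(theta_k)), J_base
    if ansatz == 'two_schedule':  # Gamma = 2: one beta per node group
        return np.where(heavy_mask, theta_k[1], theta_k[0]), J_base
    if ansatz == 'local':         # Gamma = N: one beta per node
        return np.asarray(theta_k, dtype=float), J_base
    if ansatz == 'full':          # Gamma = N(N-1)/2: the couplings themselves
        return np.ones(N), np.asarray(theta_k)
    raise ValueError(f'unknown ansatz: {ansatz}')
```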
Recent work in QAOA has shown that increasing the number of variational parameters per layer can improve convergence and reduce the required circuit depth8,9. A similar principle applies to PAOA: more expressive schedules allow the system to reach useful non-equilibrium distributions in fewer layers (e.g., a single layer for a fully parameterized AND gate, as we show in Supplementary Figs. S2 and S3), but may also increase the risk of overfitting. The balance between expressiveness and generalization remains a key consideration when selecting an ansatz.
Implementation via probabilistic computers
The practical implementation of the sampling inner loop is achieved using a probabilistic computing (p-computing) framework19. This framework uses networks of binary stochastic units, or p-bits, that sample from Boltzmann-like distributions via asynchronous updates governed by Glauber dynamics. A single p-bit updates its state mi according to

$$m_i = \mathrm{sgn}\left[\tanh (\beta I_i) - \mathrm{randu}(-1, 1)\right],\qquad (1)$$

where randu(−1, 1) is a random variable uniformly distributed in the range [−1, 1], β is the inverse temperature from the variational ansatz, and the local input Ii is

$$I_i = \sum_{j} J_{ij} m_j + h_i.\qquad (2)$$

As long as connected p-bits are updated sequentially, in any random order, this update rule generates samples over time from a Boltzmann distribution \(P(\{m\})\propto \exp [-\beta E(\{m\})]\), associated with the energy

$$E(\{m\}) = -\sum_{i < j} J_{ij} m_i m_j - \sum_i h_i m_i.\qquad (3)$$
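A minimal software emulation of these dynamics, assuming sequential (rather than asynchronous hardware) updates and a zero-diagonal J, is sketched below; function names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def pbit_sweep(m, J, h, beta):
    """One Monte Carlo sweep of sequential p-bit updates, Eqs. (1)-(2).
    Spins m are in {-1, +1} and are visited in a fresh random order each sweep."""
    for i in rng.permutation(len(m)):
        I_i = J[i] @ m + h[i]                    # local input, Eq. (2); J_ii = 0
        # m_i = sgn[tanh(beta * I_i) - randu(-1, 1)], Eq. (1)
        m[i] = 1 if np.tanh(beta * I_i) > rng.uniform(-1.0, 1.0) else -1
    return m

def run_circuit(J, h, schedule, sweeps_per_layer, m0):
    """Apply a depth-p PAOA circuit: each beta^(k) acts for a fixed number of sweeps."""
    m = m0.copy()
    for beta in schedule:                        # schedule = [beta^(1), ..., beta^(p)]
        for _ in range(sweeps_per_layer):
            pbit_sweep(m, J, h, beta)
    return m
```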
Formalism and connection to Markov chains
While in practice PAOA relies on direct MCMC sampling, the process has a rigorous foundation in the theory of Markov chains. The evolution of the probability distribution ρ over the state space can be described by a sequence of transition matrices. To formalize this, we consider a p-layer process where the distribution after k layers, ρk, is given by

$${{{{\boldsymbol{\rho }}}}}_{k} = {W}^{(k)}{W}^{(k-1)}\cdots {W}^{(1)}{{{{\boldsymbol{\rho }}}}}_{0},\qquad (4)$$

where \({{{{\boldsymbol{\rho }}}}}_{0}\in {{\mathbb{R}}}^{{2}^{N}}\) is the initial distribution and each W(k) is a \(2^N \times 2^N\) transition matrix parameterized by the variational parameters θ(k) for that layer. For a sequential update scheme, each W(k) can be factorized as a product of single-site update matrices, \({W}^{(k)}={w}_{N}^{(k)}{w}_{N-1}^{(k)}\cdots {w}_{1}^{(k)}\). The exact construction of these matrices from the p-bit update rule is detailed in Supplementary Section 3.
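For small N, the transition matrices can be constructed explicitly. The sketch below builds each single-site update matrix \({w}_{i}^{(k)}\) from the Glauber rule and composes one layer; it is a didactic reconstruction, not the authors' code, and is tractable only for N up to roughly 10-12.

```python
import numpy as np
from itertools import product

def single_site_matrix(i, J, h, beta):
    """Column-stochastic 2^N x 2^N matrix w_i for a Glauber update of site i:
    rho' = w_i @ rho, with W[s_new, s_old] = P(s_old -> s_new)."""
    N = len(h)
    states = [np.array(s) for s in product([-1, 1], repeat=N)]
    index = {tuple(s): k for k, s in enumerate(states)}
    W = np.zeros((2**N, 2**N))
    for col, s in enumerate(states):
        I_i = J[i] @ s + h[i]                      # assumes J has zero diagonal
        p_up = 0.5 * (1.0 + np.tanh(beta * I_i))   # P(m_i -> +1), from Eq. (1)
        for val, prob in ((1, p_up), (-1, 1.0 - p_up)):
            t = s.copy(); t[i] = val
            W[index[tuple(t)], col] += prob
    return W

def layer_matrix(J, h, beta):
    """W^(k) = w_N ... w_1: one full sequential sweep at inverse temperature beta."""
    N = len(h)
    W = np.eye(2**N)
    for i in range(N):
        W = single_site_matrix(i, J, h, beta) @ W  # site 1 acts on rho first
    return W
```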
This layered application of stochastic matrices to a probability vector ρ is the direct classical analog of applying unitary operators U(k) to a wavefunction ψ in QAOA. The crucial distinction, however, is that each W(k) is a norm-one-preserving stochastic matrix that mixes non-negative probabilities, whereas each U(k) is a norm-two-preserving unitary matrix that rotates complex vectors.
The absence of complex phases in the classical evolution means that PAOA proceeds without the possibility of quantum interference. Consequently, for optimization problems where a constructive interference mechanism for QAOA is unclear, its advantage over classical alternatives is not guaranteed. Indeed, as we will demonstrate in the PAOA versus QAOA: Sherrington-Kirkpatrick model subsection, our benchmarking shows that PAOA consistently reaches higher approximation ratios on the Sherrington-Kirkpatrick model when compared to QAOA using an identical number of variational parameters.
The ultimate objective of PAOA is to find the states {m} that minimize the problem’s energy function (Eq. (3)). This is achieved through the two-level optimization loop described earlier. The outer loop minimizes a cost function (different from the problem’s energy function) over the variational parameters θ, thereby reshaping the sampling landscape to make the optimal states {m} more probable and easier to find through MCMC sampling in the inner loop.
For a target set of states \({{{\mathcal{X}}}}\), this cost function can be the negative log-likelihood:

$${{{\mathcal{L}}}}({{{\boldsymbol{\theta }}}}) = -\sum_{\{m\}\in {{{\mathcal{X}}}}}\ln {\rho }_{p}(\{m\}).\qquad (5)$$
In practice, as the exact computation of ρp is intractable for large N, it is replaced by an empirical distribution \({\widehat{{{{\boldsymbol{\rho }}}}}}_{p}\) estimated from NE independent MCMC samples. Because the samples are generated from a shallow Markov process (small p), the system typically remains out of equilibrium. This non-equilibrium character distinguishes PAOA from classical annealing methods and may unlock new features such as initial-condition dependence and faster convergence to the solution. However, this may also become a liability for shallow circuits, as strong dependence on the initial distribution ρ0 or overfitting may prevent the learned heuristics from generalizing. Therefore, the same ρ0 used during training should also be used during inference. Deep PAOA circuits do not exhibit any initial-condition dependence, especially with a small initial β that randomizes spins (see the Discovering simulated annealing with PAOA subsection).
Finally, the PAOA framework is general in two key respects. First, the cost function being minimized is not restricted to the two-local form, such as the Ising energy of Eq. (3); it can be any computable function over the states {m}, including higher-order k-local Hamiltonians or likelihood functions for machine learning tasks. Second, the variational landscape used for sampling, while implemented here with an Ising-like p-computer, is also not fundamentally limited. The PAOA approach is compatible with any model from which one can efficiently draw samples via MCMC, such as Potts models.
Representative example: majority gate
To illustrate the behavior of PAOA under both exact and approximate dynamics, we solve a small optimization problem involving a four-node majority gate. This problem serves as a tractable testbed for understanding the role of the classical optimizer.
The majority gate is defined over four binary variables [A, B, C, Y], where the output Y is given by

$$Y = \mathrm{MAJ}(A, B, C),$$

i.e., Y takes the value shared by at least two of the three inputs.
The eight correct input-output combinations, shown in the truth table in Fig. 2a, define the target set of states \({{{\mathcal{X}}}}\). The goal of the optimization is to find variational parameters that cause the final distribution, ρp, to be concentrated on this set.
Fig. 2: a Fully connected four-node network used to implement the majority gate Y = MAJ(A, B, C), where Y = A ∨ B if C = 1 and Y = A ∧ B if C = 0. The table lists the eight valid input-output combinations, labeled by their decimal encoding. b Training loss during optimization, comparing exact gradients (blue) to gradient-free optimization using COBYLA (orange) with \(10^7\) MCMC samples. Both methods converge to the same minimum. c Time evolution of the exact distribution over two layers using analytical Markov matrices. The initial uniform distribution (p = 0) is transformed into a peaked distribution (p = 2) concentrated on the correct truth table entries. d Corresponding evolution using MCMC samples and COBYLA. The approximated distributions closely match the exact dynamics.
As a starting graph, we consider a fully connected Ising graph with Jij = +1 and no biases. In this example, the variational parameters are node-specific schedules \({\beta }_{i}^{(k)}\), corresponding to Γ = N. To validate the role of the numerical optimizer, we compare two optimization strategies: a full Markov chain with gradient-based optimization and an MCMC-based approximation using the COBYLA algorithm25.
For the gradient-based approach, we use the formulation laid out in the Formalism and connection to Markov chains subsection and optimize over two layers (p = 2) with a uniform initial distribution ρ0 = 1/16. The cost is the negative log-likelihood over the eight correct states, and the variational parameters \({\beta }_{i}^{(k)}\) are updated using gradient descent with a fixed learning rate η = 0.004. Optimization stops when either the gradient norm falls below \(10^{-7}\) or a fixed iteration budget is reached.
In the MCMC-based approach, the same initial condition is used, but sampling is performed using \(10^7\) independent runs. COBYLA is used to update the parameters based on the empirical distribution \(\widehat{{{{\boldsymbol{\rho }}}}}\). The optimization stops based on parameter convergence or a maximum number of function evaluations. Results are shown in Fig. 2b, where the two curves converge to nearly identical minima.
Figure 2c and d show the final distributions under both methods. The agreement between the exact Markov flow and the sampled histogram confirms that derivative-free optimization closely tracks the true gradient.
Algorithm 1
PAOA: global annealing schedule
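A compact sketch of this procedure, with SciPy's COBYLA standing in for the authors' optimizer and a software sampler standing in for the FPGA, is given below; all function names and option values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def paoa_global_schedule(J, h, p, n_runs, sweeps_per_layer, sample_fn, energy_fn):
    """Sketch of Algorithm 1: learn a global schedule (beta^(1), ..., beta^(p))
    by minimizing the average sampled energy <E> with derivative-free COBYLA.
    sample_fn(J, h, schedule, sweeps, n_runs) should return n_runs final states."""
    def cost(schedule):
        full = np.concatenate(([0.0], schedule))   # beta = 0 prepended: uniform start
        samples = sample_fn(J, h, full, sweeps_per_layer, n_runs)
        return energy_fn(samples, J, h)            # <E> over independent runs

    beta0 = np.full(p, 2.0)                        # flat initial schedule, as in Fig. 3c
    result = minimize(cost, beta0, method='COBYLA',
                      options={'maxiter': 500, 'rhobeg': 0.5})
    return result.x                                # the learned annealing schedule
```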
We also test the same procedure on a 5-node full-adder circuit using the fully-parameterized \({J}_{ij}^{(k)}\) ansatz. The resulting distributions (see Supplementary Fig. S1) once again demonstrate tight correspondence between exact and approximate optimization at each layer.
Although analytical optimization is feasible for small N, the \(2^N \times 2^N\) transition matrix becomes intractable for larger systems. The success of COBYLA in this setting confirms that PAOA can be implemented efficiently on general p-computers using standard MCMC.
Interestingly, PAOA can explore parameter regimes not accessible to traditional Ising formulations. For example, optimal β values may be negative, corresponding to negative temperatures without a clear interpretation in equilibrium statistical physics. Since equilibrium Boltzmann sampling is not necessarily the starting point, alternative and more hardware-friendly activation functions could also be explored with PAOA, such as replacing the hyperbolic tangent in Eq. (1) with the error function \({{{\rm{erf}}}}\), or with saturating linear functions.
The shallow nature of the Markov chain used in PAOA introduces a dependency on the initial distribution ρ0. In contrast to standard Boltzmann training methods such as Contrastive Divergence26, which seek to approximate the equilibrium distribution through extensive sampling, PAOA optimizes the evolution of the distribution under a fixed initialization. This shortcut avoids the computational burden of long mixing times but ties the learned dynamics to a specific initialization. While this dependence can hinder generalization at shallow depth, as we demonstrate in the Discovering simulated annealing with PAOA subsection, initialization dependence is not an important concern for deep PAOA circuits.
Discovering simulated annealing with PAOA
An intriguing question is whether PAOA, when restricted to a global annealing schedule with one parameter per time step, can recover the well-known structure of SA. In this section, we show that the answer is yes. Using a minimal parameterization with a single global inverse temperature, β(k), for each of the p layers, PAOA is able to discover SA-like profiles that optimize the average energy on a large three-dimensional (3D) spin-glass instance.
We apply PAOA to a 3D cubic lattice of size \(L^3 = 6^3\), which yields a sparse bipartite interaction graph. To leverage this structure for massive parallelism on our FPGA-based p-computer, we use chromatic updates21,22. We employ the simplest Γ = 1 global annealing ansatz, where all nodes share the same scalar inverse temperature β(k) at each layer k, forming a schedule of total length p ∈ {5, 10, 15}.
Sampling is performed on hardware using 720 Monte Carlo sweeps (MCS) per layer. We choose a uniform initial distribution, achieved by prepending β = 0 to the schedule. After sampling, configurations are transferred to the CPU, where the average energy 〈E〉 is evaluated using Eq. (3), and COBYLA updates the schedule β(k). In this training loop, the cost function is the average sampled energy, 〈E〉, estimated from NE independent MCMC runs. After training, however, each candidate schedule is evaluated on a fresh batch of \(10^5\) runs and assigned a success probability, defined as the fraction of runs that reach the putative ground-state energy at the final layer. When referring to the best schedule or best β in Fig. 3, we mean the schedule that achieves the highest success probability in this independent batch. Insets in Fig. 3c show these success probabilities for all 100 optimizer runs, illustrating both the variability across runs and the improvement with circuit depth.
Fig. 3: a The hybrid architecture combines an FPGA-based p-computer for MCMC sampling, where {m} represents the sampled state, with a classical CPU that optimizes the global annealing schedule (β(k)) using the average energy (〈E〉) computed across independent experiments/runs. b Energy histograms on a single \(L^3 = 6^3\) instance before (β = 2, red) and after (best β schedule, blue) optimization. The optimized schedule shifts the distribution (shaded area) toward the putative ground state (energy = −360), increasing its discovery frequency out of \(N_E = 10^5\) independent runs. c Optimized schedules for p-layer architectures where p ∈ {5, 10, 15}. Starting from a flat initial schedule (β = 2, green), PAOA consistently discovers cooling schedules (best shown in red) that resemble SA. Faint curves show all 100 optimization runs. The insets display the sorted success probabilities, demonstrating that deeper architectures improve the average success probability (〈ps〉) and reduce run-to-run variability.
The loop continues until either convergence or a maximum number of energy evaluations is reached. The algorithm is outlined in Algorithm 1, and parameters are listed in Table 1.
The results of this optimization are summarized in Fig. 3. The optimized schedules shift the energy distribution toward lower values (Fig. 3b), increasing the probability of reaching the putative ground state at E = −360, which was independently found using a linear SA schedule (from β = 0.1 to 5) with \(10^6\) MCS. The schedules responsible for this improvement are shown in Fig. 3c. Notably, without any constraints on monotonicity, PAOA consistently learns schedules that begin at high temperature (low β) and gradually cool (increase β), resembling classical SA despite starting from a flat β = 2 schedule.
As depth p increases, our schedules both cool further and spend more total sweeps at lower temperatures, so the energy histograms naturally put more weight near the ground state. In majorization language, the deeper runs appear to majorize the shallower ones; since we only show energy marginals, however, this observation is suggestive rather than a proof. Similar cooling-with-depth trends were observed for QAOA27.
The elevated β at the first step (p = 1) for the 15-layer case may be a result of the substantial degeneracy among near-optimal schedules. Many distinct β profiles achieve nearly identical success probabilities (shaded blue lines in Fig. 3c). The inset histograms show that while individual schedules vary across 100 optimizer runs, performance remains stable (even for non-monotonic schedules) with the smallest variance in success probabilities for deeper schedules (p = 15).
Constraints could be added to restrict the class of allowed schedules, e.g., enforcing linear, exponential, or monotonic cooling profiles using inequality constraints in COBYLA. We did not use such constraints to avoid biasing the solution toward SA. These results demonstrate that SA emerges as a special case of variational Monte Carlo, recoverable from the global schedule ansatz. Moreover, PAOA proves to be more general, as it points to the possibility of alternative non-monotonic schedules with comparable performance, indicating the flexibility of the algorithm.
PAOA on FPGA with on-chip annealing
To support long annealing schedules in 3D spin-glass problems, we designed a p-computer architecture that performs annealing entirely on-chip, unlike earlier implementations that require off-chip resources for annealing (e.g., ref. 21). The annealing schedule β(k) is preloaded to the FPGA and indexed via a layer counter that advances after a fixed number of MCS. At each step, the current β value is used to scale the synaptic input Ii via a digital signal processing multiplier (see Supplementary Fig. S4).
The full p-layer annealing process runs uninterrupted on the FPGA, and the final spin configurations are read out only after the last layer is complete.
The FPGA operates in fixed-point arithmetic with s{4}{5} precision (where s denotes the sign bit, followed by 4 integer and 5 fractional bits) for both β(k) and Jij. Given the size of the FPGA we use, we instantiate 10 independent replicas of the \(L^3 = 6^3\) spin-glass graph, enabling parallel updates of 2160 p-bits per cycle. Our FPGA architecture with on-chip annealing capability achieves approximately an 800-fold reduction in wall-clock time compared to an optimized CPU implementation with graph-colored Gibbs sampling (Table 2).
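For concreteness, the following sketch shows how a real-valued β or coupling would be quantized to the s{4}{5} format; the rounding and saturation behavior are our assumptions, as the paper does not specify overflow handling.

```python
def to_s4_5(x):
    """Quantize a real number to s{4}{5} fixed point: 1 sign bit, 4 integer bits,
    5 fractional bits, i.e. step 2^-5 = 0.03125 over roughly [-16, 16)."""
    step = 2.0 ** -5
    q = round(x / step) * step
    lo, hi = -(2 ** 4), 2 ** 4 - step   # saturate rather than wrap (our assumption)
    return max(lo, min(hi, q))

# Example: to_s4_5(2.37) == 2.375, while to_s4_5(20.0) saturates to 15.96875.
```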
PAOA is parallelizable at multiple levels. First, the cost estimator 〈E〉 is computed from NE independent MCMC runs, which can be executed concurrently across replicas. Second, on sparse graphs, independent sets admit parallelism via chromatic sampling. Consequently, high-throughput implementations are possible using CPUs, GPUs, FPGAs, custom accelerators, or nanodevices. Our FPGA design is one point in this space and was chosen to accelerate the experiments in this work. The wall-clock numbers in Table 2 are an implementation case study rather than a universal hardware comparison.
All simulations use a uniform initialization, implemented by prepending a β = 0 layer to the schedule. This yields random initial conditions and ensures randomized spin states for each replica. Further hardware design details are provided in Supplementary Section 4.
PAOA versus QAOA: Sherrington-Kirkpatrick model
The Sherrington-Kirkpatrick (SK) model defines a mean-field spin glass with all-to-all random couplings, and serves as a canonical benchmark for optimization algorithms. The classical energy function is defined as

$$E(\{m\}) = -\frac{1}{\sqrt{N}}\sum_{i < j} J_{ij} m_i m_j,$$

where mi ∈ {−1, 1} and \({J}_{ij} \sim {{{\mathcal{N}}}}(0,1)\). The exact ground-state energy per spin in the large-N limit is known28.
QAOA has been studied extensively on the SK model7,15,29,30, with recent numerical work suggesting it can approximate the ground state in the infinite-size limit31. To enable a direct comparison, we evaluate PAOA using a two-schedule ansatz with exactly 2p parameters, matching the parameter count of depth-p QAOA. Although PAOA scales readily to much larger N, we fix the size at N = 26, a typical size for SK-model studies, to enable a direct comparison with QAOA.
Each schedule is assigned to one half of the graph, split uniformly at random. For N = 26, we randomly generate 30 training instances and optimize PAOA separately for each instance and layer depth p ∈ {1, …, 17}. The resulting schedules are then averaged and applied without further adjustment to a disjoint set of 30 test instances. QAOA results plotted in Fig. 4a use optimal parameters from prior work7,29. For each instance, exact ground states are computed by exhaustive search.
Fig. 4: a PAOA results (left) using the two-schedule ansatz (β1 and β2) with 2p parameters compared against QAOA (right) with 2p parameters (γ and β). For each depth p, the PAOA schedules are optimized on a separate training set; the average schedule is then applied to 30 random test instances of size N = 26 without retraining. QAOA results use optimal parameters from prior work7,29. Red crosses denote averages across the 30 instances, blue dots show individual instance energies, and the solid green line indicates the average ground-state energy per spin. b Approximation ratios of PAOA (red squares) and QAOA (blue circles), averaged across the 30 instances. Error bars indicate the 95% confidence intervals computed from \(10^4\) bootstrap samples with replacement.
Our results show that PAOA delivers highly competitive performance under these iso-parametric conditions. As shown in Fig. 4a (left, PAOA), the average energy per spin consistently decreases with layer depth, and schedules trained on N = 26 generalize well to much larger instances up to N = 500 (see Supplementary Fig. S5). To formalize this comparison, we report the approximation ratio in Fig. 4b, defined as

$$r = \frac{\langle E\rangle }{E(\{{m}_{{{\rm{sol}}}}\})},$$

where E({msol}) is the cost of the exact ground state. PAOA achieves a consistently higher approximation ratio than standard QAOA across all depths. Notably, the performance of vanilla PAOA is comparable to that of recently proposed hybrid quantum-classical methods that use classical techniques to improve QAOA’s performance32, suggesting that a definitive quantum advantage on this problem has yet to be firmly established.
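A minimal sketch of the benchmark quantities (drawing an SK instance, forming the approximation ratio, and bootstrapping a 95% confidence interval as in Fig. 4b) follows; the handling of the 1/√N normalization and all function names are ours.

```python
import numpy as np

rng = np.random.default_rng(1)

def sk_instance(N):
    """Symmetric SK couplings J_ij ~ N(0, 1) with zero diagonal
    (the 1/sqrt(N) normalization can be absorbed into the energy)."""
    J = rng.normal(size=(N, N))
    J = np.triu(J, 1)
    return J + J.T

def approximation_ratio(mean_energy, ground_energy):
    """r = <E> / E(m_sol); both energies are negative, so r lies in (0, 1]."""
    return mean_energy / ground_energy

def bootstrap_ci(values, n_boot=10_000, level=0.95):
    """Percentile bootstrap CI for the mean, resampling with replacement."""
    values = np.asarray(values)
    means = [rng.choice(values, size=len(values), replace=True).mean()
             for _ in range(n_boot)]
    alpha = (1.0 - level) / 2.0
    return tuple(np.percentile(means, [100 * alpha, 100 * (1 - alpha)]))
```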
We emphasize that a clear quantum advantage over classical algorithms is expected to emerge in problems where interference plays an explicit role, such as in the synthetic examples constructed by Montanaro et al.12. In these cases, the superposition of many computational paths, enabled by either the problem structure or the quantum algorithm, creates interference patterns that amplify the correct solution while suppressing others, an effect that classical probabilistic samplers struggle to reproduce due to the well-known “sign problem”33. As shown in ref. 34, a probabilistic formulation of generic unitary evolutions yields complex (often purely imaginary) weights, so sampling effectively becomes randomized; while post-phase corrections recover unbiased estimates, the number of samples required grows exponentially.
These interference-dominated regimes are precisely where we do not expect PAOA, or any classical sampler, to compete. On the other hand, unless such interference-driven advantages are clearly identified and demonstrated more broadly across practical problem classes, we do not expect any significant quantum advantage from QAOA. Meanwhile, PAOA offers a rigorous, scalable and high-performing benchmark for variational sampling at problem sizes beyond the reach of current quantum devices.
Multi-Schedule annealing in the Lévy SK model
To a heuristic solver, the SK model presents no intrinsic structure beyond a dense random coupling graph. To explore whether PAOA can adaptively exploit structure when it might exist, we consider a variant where the coupling weights Jij follow a heavy-tailed Lévy distribution35, whose tail decays as

$$P({J}_{ij}) \sim | {J}_{ij}{| }^{-(1+\alpha )},\qquad | {J}_{ij}| \to \infty .$$

We use α = 0.9, which produces a distribution with diverging variance, dominated by rare, large couplings.
In this regime, a natural question arises: can PAOA discover principles that treat strongly and weakly coupled nodes differently? Prior work on Boltzmann machines and annealing schedules36,37 suggests that high-degree or strongly coupled nodes benefit from slower annealing (more time at high temperature) to avoid freezing too early. In these prior works, the heuristics are hand-designed by the ingenuity of human algorithm designers. Here, we test whether the principle of annealing subgraphs at different temperatures can be discovered by PAOA through black-box optimization.
We begin by introducing a coupling strength per node:

$${\lambda }_{i} = \sum _{j}| {J}_{ij}| .$$
We then sort nodes by λi and split them into a heavy group (top 20%) and a light group (bottom 80%), as shown in Fig. 5a. The split here is arbitrary, and we confirmed that a 50-50% partition following the same procedure yields similar results (see Supplementary Fig. S6).
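The grouping step can be written in a few lines; below is a minimal sketch under our naming conventions (the heavy_frac argument and boolean-mask representation are illustrative).

```python
import numpy as np

def heavy_light_split(J, heavy_frac=0.20):
    """Split nodes by per-node coupling strength lambda_i = sum_j |J_ij|:
    the top heavy_frac of nodes are 'heavy', the rest 'light'."""
    lam = np.abs(J).sum(axis=1)
    order = np.argsort(lam)[::-1]            # nodes sorted by descending strength
    n_heavy = int(round(heavy_frac * len(lam)))
    heavy_mask = np.zeros(len(lam), dtype=bool)
    heavy_mask[order[:n_heavy]] = True       # heavy nodes get the slower beta2 schedule
    return heavy_mask, lam
```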
Fig. 5: a Heavy-tailed SK: per-node coupling strengths for 50 instances of N = 50, sorted in descending order; the heavy-tailed distribution separates heavy (assigned β2 schedule) and light (assigned β1 schedule) nodes. b PAOA training with two schedules (2p parameters) assigned to heavy and light nodes based on their coupling strengths, initialized from an optimized single annealing schedule (cyan). Curves are averaged across 50 instances; red/blue denote heavy/light nodes. c The extrapolated double schedule suggested by b: the heavy-node schedule is extended to more layers, and the light-node schedule is scaled up following PAOA's guidance. d Average success probability over 500 instances comparing single-schedule SA (blue) and double-schedule PAOA (red) for N = 50. Error bars are 95% confidence intervals from \(10^5\) bootstrap samples with replacement. e Success probability showing the per-instance comparison between PAOA (2p) and SA (p) at \(p = 5\times 10^5\).
To test if PAOA can leverage this structure, we first identify a near-optimal single schedule using SA at shallow depth (p = 17). We then let COBYLA freely optimize two schedules, corresponding to heavy and light nodes. After convergence, we average the schedules across all instances. As shown in Fig. 5b, the optimized schedules exhibit a qualitative separation: heavy nodes are assigned a higher-temperature (lower β) profile, and light nodes anneal faster. We verified that the distinct schedules for heavy and light nodes give the double schedule a higher success probability than the single schedule. Due to the shallow depth (p = 17), however, the differences are not appreciably large.
However, when extrapolating the double schedules discovered by PAOA to much deeper circuits (Fig. 5c), we observe a clear advantage over single-schedule SA. Our extrapolation is relatively straightforward; we report the details in the Methods section. The results, shown in Fig. 5d for N = 50 on 500 unseen instances, show that the two-schedule ansatz outperforms the single schedule, with a clear gap in success probabilities between the two approaches, confirming a robust benefit from structured, heterogeneous annealing. Details on determining the success probability for single and double schedules are in the Methods section.
This behavior suggests that PAOA can discover problem-aware heuristics computationally. While such adaptive schedules have long been used manually in structured settings38, we are not aware of prior variational algorithms for non-equilibrium sampling that systematically learn them from data. In this sense, PAOA serves as a tool for discovering new algorithmic strategies in disordered systems.
Outlook
We have introduced the PAOA, a variational Monte Carlo framework that generalizes SA and supports a wide range of parameterizations compatible with existing Ising hardware. By treating the energy landscape itself as a variational object and updating it through classical feedback from Monte Carlo samples, PAOA enables efficient sampling from non-equilibrium distributions using shallow Markov chains.
We demonstrated that PAOA captures SA as a limiting case and, without constraints, can rediscover SA-like behavior when optimized for energy minimization. Using a minimal global schedule ansatz, PAOA efficiently learned monotonic cooling profiles that converge to low-energy states on 3D spin-glass problems. The approach was implemented on an FPGA-based p-computer, achieving hardware-accelerated annealing roughly three orders of magnitude faster than a CPU-based implementation.
Beyond global schedules, we explored richer ansätze with multiple temperature profiles. On the Sherrington-Kirkpatrick (SK) model, PAOA exceeded the performance of QAOA under iso-parametric conditions, while scaling to system sizes beyond those accessible to quantum devices. In a heavy-tailed variant of the SK model, PAOA automatically learned to assign slower annealing schedules to strongly coupled nodes, reproducing hand-crafted heuristics from prior work. As such, PAOA can be a tool for discovering new algorithmic strategies for sampling in disordered systems.
Methods
Numerical simulations use double-precision (64-bit) arithmetic in C++ to generate the PAOA results in Figs. 2d and 4a, and Supplementary Fig. S1d. QAOA results were reproduced from ref. 39.
Data transfer between FPGA and CPU
A PCIe interface was used to communicate between FPGA and CPU through a MATLAB interface for the “read/write” operations. A global “disable/enable” signal broadcast from MATLAB to the FPGA was used to freeze and resume all p-bits. Before a “read” instruction, the p-bit states were saved to the local block memory (BRAM) with a snapshot signal. Then the data were read once from the BRAM using the PCIe interface and sent to MATLAB for post-processing, that is, updating the annealing schedule (β(k)). For the “write” instruction, the “disable” signal was sent from MATLAB to freeze the p-bits before sending the updated schedule. After the “write” instruction was given, p-bits were enabled again with the “enable” signal sent from MATLAB.
Measurement of wall-clock time and flips per nanosecond
To measure the annealing wall-clock time per replica on the FPGA, including the reading and writing overhead, we use MATLAB’s built-in tic and toc functions and average the total annealing time over 100 independent experiments. Summing the flip attempts of all 2160 p-bits across all replicas and dividing by the total time gives the throughput in flips per nanosecond (flips/ns). For CPU measurements, MATLAB’s built-in tic and toc functions were used to measure the total annealing time taken to perform the MCS in Table 2 with 1000 runs each. The average time per MCS is reported for a single replica for a network size of \(L^3 = 6^3\).
Schedule generation for the Lévy SK model benchmark
The performance benchmark in the Multi-Schedule annealing in the Lévy SK model subsection required comparing two annealing strategies. The schedules for these strategies were generated via a grid search based on a geometric functional form,

$${\beta }^{(k)} = {\beta }_{{{\rm{init}}}}\,{r}^{k},\qquad r = {\left({\beta }_{{{\rm{final}}}}/{\beta }_{{{\rm{init}}}}\right)}^{1/p},$$

so that the schedule interpolates geometrically between βinit and βfinal. The parameter range searched was β ∈ [0.005, 0.12] for the N = 50 instances.
The two strategies were constructed as follows (a code sketch follows this list):
- Single-schedule (SA): we performed a grid search over βfinal to find the single geometric schedule that produced the lowest average energy across all instances. This served as the baseline.
- Two-schedule PAOA: the optimized baseline schedule was assigned to the heavy nodes, as suggested by PAOA. For the light nodes, we scale up the baseline schedule by a factor that reflects the separation proposed by PAOA. Both geometric schedules are then extrapolated to a large number of layers to study the effect of deep PAOA.
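Under the geometric form above, both strategies reduce to a few lines. The sketch below assumes the \({\beta }^{(k)} = {\beta }_{{{\rm{init}}}}\,{r}^{k}\) parameterization and a single light_scale factor, both our reconstruction of the procedure described in the text.

```python
import numpy as np

def geometric_schedule(beta_init, beta_final, p):
    """beta^(k) = beta_init * r^k with r = (beta_final/beta_init)^(1/p),
    so the schedule runs geometrically from beta_init to beta_final."""
    return beta_init * (beta_final / beta_init) ** (np.arange(p + 1) / p)

def two_schedule(beta_init, beta_final, p, light_scale):
    """Baseline geometric schedule for the heavy nodes; light nodes follow the
    same profile scaled up by light_scale (they anneal faster), per PAOA."""
    heavy = geometric_schedule(beta_init, beta_final, p)
    light = light_scale * heavy              # light_scale > 1 is our assumption
    return heavy, light
```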
In order to compare the two approaches, we define success probability as the ratio of the number of runs that hit the putative ground state to the total number of runs (i.e., 100 independent runs). The putative ground state was found by running a long SA with one million sweeps and ten independent runs, then choosing the lowest energy among all ten million states.
Optimizer choice
For the outer optimization loop, we used the COBYLA algorithm as it is a robust and widely used method for derivative-free optimization problems with a moderate number of variables40,41,42. Our COBYLA implementation is based on the non-linear optimization package43. While the performance of PAOA may depend on the choice of classical optimizer, a detailed comparison of different optimization techniques is beyond the scope of this work.
Data availability
All generated and processed data used in this study are openly accessible and can be found in the GitHub repository44. Other findings of this study are available from the corresponding authors upon request.
Code availability
The MATLAB and C++ implementations of PAOA used to generate the results and plots in this study can be found in the GitHub repository mentioned in the Data availability section.
References
Wang, W., Machta, J. & Katzgraber, H. G. Comparing Monte Carlo methods for finding ground states of Ising spin glasses: population annealing, simulated annealing, and parallel tempering. Phys. Rev. E 92, 013303 (2015).
Shen, Z.-S. et al. Free-energy machine for combinatorial optimization. Nat. Comput. Sci. 5, 322–332 (2025).
Muñoz-Arias, M. et al. Low-depth Clifford circuits approximately solve maxcut. Phys. Rev. Res. 6, 023294 (2024).
Ma, Q., Ma, Z., Xu, J., Zhang, H. & Gao, M. Message passing variational autoregressive network for solving intractable Ising models. Commun. Phys. 7, 236 (2024).
Barzegar, A., Hamze, F., Amey, C. & Machta, J. Optimal schedules for annealing algorithms. Phys. Rev. E 109, 065301 (2024).
Fan, C. et al. Searching for spin glass ground states through deep reinforcement learning. Nat. Commun. 14, 725 (2023).
Basso, J., Farhi, E., Marwaha, K., Villalonga, B. & Zhou, L. The quantum approximate optimization algorithm at high depth for MaxCut on large-girth regular graphs and the Sherrington-Kirkpatrick model. In Proc. 17th Conference on the Theory of Quantum Computation, Communication and Cryptography (eds Le Gall, F. & Morimae, T.) Vol. 232 of Leibniz International Proceedings in Informatics (LIPIcs), 7:1–7:21 (Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022).
Herrman, R., Lotshaw, P. C., Ostrowski, J., Humble, T. S. & Siopsis, G. Multi-angle quantum approximate optimization algorithm. Sci. Rep. 12, 6781 (2022).
Vijendran, V., Das, A., Koh, D. E., Assad, S. M. & Lam, P. K. An expressive ansatz for low-depth quantum approximate optimisation. Quantum Sci. Technol. 9, 025010 (2024).
Zhou, L., Wang, S.-T., Choi, S., Pichler, H. & Lukin, M. D. Quantum approximate optimization algorithm: performance, mechanism, and implementation on near-term devices. Phys. Rev. X 10, 021067 (2020a).
Müller, T., Singh, A., Wilhelm, F. K. & Bode, T. Limitations of quantum approximate optimization in solving generic higher-order constraint-satisfaction problems. Phys. Rev. Res. 7, 023165 (2025).
Montanaro, A. & Zhou, L. Quantum speedups in solving near-symmetric optimization problems by low-depth QAOA. Preprint at https://doi.org/10.48550/arXiv.2411.04979 (2024).
Zhou, L., Wang, S.-T., Choi, S., Pichler, H. & Lukin, M. D. Quantum approximate optimization algorithm: performance, mechanism, and implementation on near-term devices. Phys. Rev. X 10, 021067 (2020b).
Boulebnane, S., Sud, J., Shaydulin, R. & Pistoia, M. Quantum approximate optimization algorithm in finite size and large depth and equivalence to quantum annealing. Preprint at https://doi.org/10.48550/arXiv.2503.09563 (2025a).
Gülbahar, B. Maximum-likelihood detection with QAOA for massive MIMO and Sherrington-Kirkpatrick model with local field at infinite size. IEEE Trans. Wirel. Commun. 23, 11567–11579 (2024).
Finžgar, J. R., Schuetz, M. J. A., Brubaker, J. K., Nishimori, H. & Katzgraber, H. G. Designing quantum annealing schedules using Bayesian optimization. Phys. Rev. Res. 6, 023063 (2024).
Cheng, L., Chen, Y.-Q., Zhang, S.-X. & Zhang, S. Quantum approximate optimization via learning-based adaptive optimization. Commun. Phys. 7, 83 (2024).
Weitz, G., Pira, L., Ferrie, C. & Combes, J. Subuniversal variational circuits for combinatorial optimization problems. Phys. Rev. A 112, 032418 (2025).
Çamsarı, K. Y., Sutton, B. M. & Datta, S. p-bits for probabilistic spin logic. Appl. Phys. Rev. 6, 011305 (2019).
Borders, W. A. et al. Integer factorization using stochastic magnetic tunnel junctions. Nature 573, 390–393 (2019).
Aadit, N. A. et al. Massively parallel probabilistic computing with sparse Ising machines. Nat. Electron. 5, 460–468 (2022).
Nikhar, S., Kannan, S., Aadit, N. A., Chowdhury, S. & Camsari, K. Y. All-to-all reconfigurability with sparse and higher-order Ising machines. Nat. Commun. 15, 8977 (2024).
Chowdhury, S. & Camsari, K. Y. Machine learning quantum systems with magnetic p-bits. In 2023 IEEE International Magnetic Conference - Short Papers (INTERMAG Short Papers), 1–2 (IEEE, Sendai, Japan, 2023). https://doi.org/10.1109/INTERMAGShortPapers58606.2023.10228205.
Carleo, G. & Troyer, M. Solving the quantum many-body problem with artificial neural networks. Science 355, 602–606 (2017).
Powell, M. J. D. A direct search optimization method that models the objective and constraint functions by linear interpolation. In Advances in Optimization and Numerical Analysis (eds Gomez, S. & Hennart, J.-P.) 51–67 (Springer, 1994).
Hinton, G. E. Training products of experts by minimizing contrastive divergence. Neural Comput. 14, 1771–1800 (2002).
Lotshaw, P. C. et al. Approximate Boltzmann distributions in quantum approximate optimization. Phys. Rev. A 108, 042411 (2023).
Parisi, G. Infinite number of order parameters for spin-glasses. Phys. Rev. Lett. 43, 1754 (1979).
Farhi, E., Goldstone, J., Gutmann, S. & Zhou, L. The quantum approximate optimization algorithm and the Sherrington-Kirkpatrick model at infinite size. Quantum 6, 735 (2022).
Crisanti, A. & Rizzo, T. Analysis of the ∞-replica symmetry breaking solution of the Sherrington-Kirkpatrick model. Phys. Rev. E 65, 046137 (2002).
Boulebnane, S. et al. Evidence that the quantum approximate optimization algorithm optimizes the Sherrington-Kirkpatrick model efficiently in the average case. Preprint at https://doi.org/10.48550/arXiv.2505.07929 (2025).
Dupont, M. & Sundar, B. Extending relax-and-round combinatorial optimization solvers with quantum correlations. Phys. Rev. A 109, 012429 (2024).
Troyer, M. & Wiese, U.-J. Computational complexity and fundamental limitations to fermionic quantum Monte Carlo simulations. Phys. Rev. Lett. 94, 170201 (2005).
Chowdhury, S., Çamsari, K. Y. & Datta, S. Emulating quantum circuits with generalized Ising machines. IEEE Access 11, 116944–116955 (2023).
Boettcher, S. Ground states of the Sherrington–Kirkpatrick spin glass with Levy bonds. Philos. Mag. 92, 34–49 (2012).
Aarts, E. & Korst, J. Boltzmann machines. In Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing 150–151 (John Wiley & Sons, 1989).
Adame, J. I. & McMahon, P. L. Inhomogeneous driving in quantum annealers can result in orders-of-magnitude improvements in performance. Quantum Sci. Technol. 5, 035011 (2020).
Mohseni, M. et al. Nonequilibrium Monte Carlo for unfreezing variables in hard combinatorial optimization. Preprint at https://doi.org/10.48550/arXiv.2111.13628 (2021).
Lykov, D., Shaydulin, R., Sun, Y., Alexeev, Y. & Pistoia, M. Fast simulation of high-depth QAOA circuits. In Proc. SC' 23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, 1443–1451 (Association for Computing Machinery, 2023).
Pellow-Jarman, A., Sinayskiy, I., Pillay, A. & Petruccione, F. A comparison of various classical optimizers for a variational quantum linear solver. Quantum Inf. Process. 20, 202 (2021).
Arrasmith, A., Cerezo, M., Czarnik, P., Cincio, L. & Coles, P. J. Effect of barren plateaus on gradient-free optimization. Quantum 5, 558 (2021).
Schiffer, B. F., Tura, J. & Cirac, J. I. Adiabatic spectroscopy and a variational quantum adiabatic algorithm. PRX Quantum 3, 020347 (2022).
Johnson, S. G. The NLopt nonlinear-optimization package (2007).
Abdelrahman, A. S., Chowdhury, S., Morone, F. & Camsari, K. Y. Generalized probabilistic approximate optimization algorithm. Zenodo https://doi.org/10.5281/zenodo.17545952 (2025).
Acknowledgements
A.S.A., S.C., and K.Y.C. acknowledge support from the National Science Foundation (NSF) under award number 2311295, and the Office of Naval Research (ONR), Multidisciplinary University Research Initiative (MURI) under Grant No. N000142312708. F.M. was partially supported by the Office of Naval Research (ONR) under Award No. N00014-23-1-2771. We are grateful to Navid Anjum Aadit for discussions related to the hardware implementation of online annealing, and Ruslan Shaydulin and Zichang He for input on QAOA benchmarking. Use was made of Computational facilities purchased with funds from the National Science Foundation (CNS-1725797) and administered by the Center for Scientific Computing (CSC). The CSC is supported by the California NanoSystems Institute and the Materials Research Science and Engineering Center (MRSEC; NSF DMR 2308708) at UC Santa Barbara.
Author information
Authors and Affiliations
Contributions
A.S.A. and K.Y.C. conceived the study. A.S.A. led the full implementation of PAOA, including all simulations and experiments, building on the initial PAOA implementation by S.C. that used black-box optimizers. A.S.A. designed and built the FPGA implementation of PAOA with on-chip annealing capability. S.C. performed the QAOA benchmarking. F.M. contributed key theoretical insights and assisted in interpreting the spin-glass benchmarks. All authors contributed to discussing the results and writing the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Abdelrahman, A.S., Chowdhury, S., Morone, F. et al. Generalized Probabilistic Approximate Optimization Algorithm. Nat Commun 17, 498 (2026). https://doi.org/10.1038/s41467-025-67187-5