Efficient optimization accelerator framework for multi-state spin Ising problems

Garg, Chirag; Salahuddin, Sayeef

doi:10.1038/s41467-025-64625-2


Article
Open access
Published: 30 October 2025


Nature Communications volume 16, Article number: 9601 (2025)


Abstract

Ising machines are emerging hardware architectures that efficiently solve NP-hard combinatorial optimization problems. Generally, combinatorial problems are transformed into quadratic unconstrained binary optimization (QUBO) form, but this transformation often complicates the solution landscape and degrades performance, especially for multi-state problems. To address this challenge, we model spin interactions as a generalized Boolean logic function, which significantly reduces the exploration space. We demonstrate the effectiveness of our approach on the graph coloring problem using probabilistic Ising solvers, achieving accuracy similar to state-of-the-art heuristics and machine learning algorithms. The approach also shows significant improvement over state-of-the-art QUBO-based Ising solvers, including probabilistic Ising and simulated bifurcation machines. We also design a 1024-neuron all-to-all connected probabilistic Ising accelerator on FPGA with the proposed approach; it shows $\sim$10,000$\times$ performance acceleration compared to GPU-based Tabucol heuristics and reduces the number of physical neurons by 1.5$-$4$\times$ relative to baseline Ising frameworks. Thus, this work establishes superior efficiency, scalability, and solution quality for multi-state optimization problems.

Introduction

New computing paradigms are receiving significant attention due to exponentially growing computing needs¹. A wide variety of problems fall into the class of non-deterministic polynomial-time hard (NP-hard) problems and are difficult to solve optimally using conventional computing solutions^2,3. The solution space grows exponentially with problem size, making brute-force search impractical for large problem instances. Heuristics⁴ and annealing-based² approaches have therefore conventionally been used to tackle these computationally hard problems in domains such as logistics^5,6, biology⁷, and integrated circuit design⁸. In this context, Ising machine-based accelerators^{9,10,11,12,13,14,15,16} are currently being leveraged to efficiently find solutions to hard optimization problems. Various technologies and hardware architectures have been explored to build these Ising accelerators, including quantum annealing with superconducting qubits¹⁷, classical annealing in memristor or RRAM^18,19, coherent Ising machines employing optical oscillators^13,20, coupled oscillators^14,16,21, neuromorphic hardware²², and stochastic circuits (probabilistic bits)^9,10,12,23. Quantum annealers show promising results²⁴ but require low temperatures, making them expensive in cost and power. Therefore, classical alternatives have gained attention due to their room-temperature operation and their realization using the current semiconductor process flow. This work focuses on stochastic/probabilistic Ising machines for efficiently solving combinatorial optimization problems.

Probabilistic Ising architectures follow Boltzmann machine binary neural network principles^25,26. These architectures are physically constructed using binary stochastic neurons interacting with each other and aim to minimize the Ising Hamiltonian (Fig. 1a and Supplementary Note 1). The neuron update follows sigmoidal activation (Fig. 1b) such that neurons stochastically move towards the minimum energy states (Fig. 1c). To harness the efficient solution exploration of these hardware architectures, the cost/energy functions of many complex optimization problems are converted into Ising Hamiltonian form. Based on this conversion, these optimization problems can be broadly divided into three categories (Fig. 1d)²⁷. The problem class, comprised of binary solution state space with no imposed constraints, has been widely mapped and efficiently solved on Ising machines^16,20,26. On the other hand, problems with integer/multi-state valued solutions do not naturally convert to binary solution state space, thus leading to additional constraints in the converted Ising Hamiltonian^27,28,29. This direct conversion method frequently causes the Ising machines to explore infeasible solution space, resulting in inefficiencies^18,30,31. One such example is solving the graph coloring problem with Ising machines.

**Fig. 1: Probabilistic Ising machine architecture and optimization problem classification.**

Graph coloring is an NP-hard optimization problem that seeks to assign different colors to the connected nodes of a graph. It is an example of an integer optimization problem in which each integer value represents the color of a node. Previously, annealing and heuristic methods (Tabu search³²) were used to tackle this problem. These approaches achieve good solution accuracy but suffer from long runtimes for large and densely connected problem instances. Recently, learning-based approaches^33,34, especially graph neural networks, have been applied to solve the problem accurately and efficiently at scale. Existing Ising hardware and modified versions of it have also been proposed to solve this problem. However, they lag significantly behind heuristics and learning-based approaches in solution quality. To an extent, earlier works^18,30,31 overcome this bottleneck by adopting post-processing techniques that use additional hardware and software, which adversely affects area and computation time.

In this work, we present an end-to-end probabilistic Ising implementation that combines advances in multi-state problem mapping, spin interaction design, and an efficient hardware architecture to demonstrate significant improvement in area, solution accuracy, and time-to-solution. First, we propose a vectorized mapping that represents the node colors as binary vectors rather than in the customary one-hot form. This circumvents the additional mapping constraint in the Ising Hamiltonian that arises from one-hot encoding²⁷. It thus completely avoids exploring the infeasible/invalid solution space and improves the solution quality. Second, the interactions among the binary vector states are modeled using truth tables of the kind employed in digital logic. Third, we implement an efficient FPGA accelerator that uses higher-order multiplexers to represent these truth tables. Altogether, this multi-state Ising implementation shows approximately 100,000$\times$ speed improvement and 5$\times$ power improvement compared to its GPU-based implementation when solving graph coloring problems with up to 256 nodes and 16 colors. This methodology is then integrated with the parallel tempering approach to improve the solution quality and provide coloring results that are competitive with, or even better than, Tabucol heuristics and machine learning algorithms.

Results

Ising mapping framework

A wide range of combinatorial optimization problems have multistate/integer-valued solution spaces, including graph coloring and knapsack problems³⁵. The graph coloring problem, investigated in this work, has applications such as layout decomposition³⁶, register allocation³⁷, logic minimization³⁸, and scheduling³³. A problem instance that aims to color $N$ nodes with $q$ colors requires ${Nq}$ physical neurons/nodes in the Ising framework. Because this framework relies on one-hot encoding to represent node colors, an additional constraint term (${H}_{B}$ in Eq. 2) is added to the problem Hamiltonian (${H}_{A}$ in Eq. 1) to enforce a legal combination of one-hot bits²⁷. Therefore, the Ising Hamiltonian ($H$) for the graph coloring problem becomes ${H}_{A}\,+\,{H}_{B}$.

$${H}_{A}=A{\sum }_{i,j\in E}{\sum }_{k=1}^{q}{s}_{{ik}}{s}_{{jk}}$$

(1)

$${H}_{B}=B{\sum }_{i}{\left(1-{\sum }_{k=1}^{q}{s}_{{ik}}\right)}^{2}$$

(2)

In the equations, ${s}_{{ik}}$ represents the ${k}^{{th}}$ one-hot bit for node $i$, $E$ denotes the set of all edges in the problem graph, $A$ is the connectivity weighting factor, and $B$ is the one-hot constraint factor. The soft-constraint Hamiltonian ${H}_{B}$ is expected to be zero for the ground-state solution. However, it often takes a non-zero value, making Ising samplers inefficient in solving such problems (see Fig. 2a). One-hot encoding requires $q$ bits to represent the $q$ possible colors of a node. As a result, Ising machines search over ${2}^{q}$ states per node, of which only $q$ are valid. This makes the optimization hard and deteriorates the solution quality. Hence, Ising frameworks that follow one-hot encoding often fail to solve this class of problems. Despite these challenges, this approach is adopted for most multistate optimization problems because it leads to quadratic interactions, which are supported in most Ising machines^10,16,20,26. In probabilistic computing¹², binary multiplexers offer a simple and efficient realization of state-weight multiplication.
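As a minimal sketch of Eqs. 1 and 2, the following Python snippet evaluates ${H}_{A}$ and ${H}_{B}$ for a small one-hot spin configuration. The function name `one_hot_energy`, the triangle graph, and the default factors $A=B=1$ are illustrative assumptions and are not taken from the authors' code.

```python
def one_hot_energy(s, edges, q, A=1.0, B=1.0):
    """Return (H_A, H_B) for one-hot spins s[i][k] in {0, 1}."""
    # H_A (Eq. 1): penalize every edge whose endpoints share an active color bit.
    h_a = A * sum(s[i][k] * s[j][k] for i, j in edges for k in range(q))
    # H_B (Eq. 2): penalize nodes whose one-hot bits do not sum to exactly one.
    h_b = B * sum((1 - sum(s[i])) ** 2 for i in range(len(s)))
    return h_a, h_b


# Hypothetical example: a triangle graph with q = 3 colors.
edges = [(0, 1), (1, 2), (0, 2)]
valid = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]    # proper coloring
invalid = [[0, 0, 0], [0, 1, 0], [0, 0, 1]]  # node 0 has no color at all

print(one_hot_energy(valid, edges, q=3))    # (0.0, 0.0)
print(one_hot_energy(invalid, edges, q=3))  # (0.0, 1.0)
```

The second configuration has ${H}_{A}=0$ although it is not a coloring; only the constraint term ${H}_{B}$ flags it, which illustrates why the sampler can spend time in infeasible states.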

**Fig. 2: Comparison between Ising and Vectorized mapping approaches.**

Vectorized mapping framework

In this work, we propose a vectorized mapping approach (Fig. 2b) that tailors the graph coloring problem using a binary encoding technique. It represents the $q$ color states using an $n$-bit vector with $n=\lceil {\log }_{2}q\rceil$, so the color of node ${S}_{i}$ is represented as $\{{s}_{i0},{s}_{i1},\ldots,{s}_{i(n-1)}\}$. This mapping therefore requires $N\lceil {\log }_{2}q\rceil$ physical neurons/nodes for a graph coloring problem with $N$ nodes and $q$ colors. In this way, it eliminates all invalid state space and removes the extra constraints in the Ising Hamiltonian (Eq. 2). The proposed approach therefore leads to a solvable energy landscape without additional complexities. We also adopt parallel tempering³⁹ to enhance the solution exploration and improve the solution quality.

$$H={\sum }_{\left({S}_{i}{S}_{j}\right)\in E}{W}_{{S}_{i}{S}_{j}}F\left({s}_{i0},{s}_{i1},\ldots,{s}_{i\left(n-1\right)},{s}_{j0},{s}_{j1},\ldots,{s}_{j\left(n-1\right)}\right)$$

(3)

The proposed vectorized mapping is a generic approach that can be applied to any multistate optimization problem solved on Ising accelerators (see Fig. 2b). It represents the states in binary and only requires modeling the function operator $F$ in the Hamiltonian $H$ (Eq. 3) for a specific problem. In Eq. 3, ${W}_{{S}_{i}{S}_{j}}$ represents the weight of the edge (in $E$) connecting nodes ${S}_{i}$ and ${S}_{j}$, and the function operator $F$ is modeled in a truth-table format. Algorithm 1 describes the formulation of the truth table for the graph coloring problem, and Fig. 2 shows the truth table for a 4-node, 3-color graph coloring problem. $F$ becomes one when nodes ${S}_{i}$ and ${S}_{j}$ take the same color or when either takes a color value greater than $q-1$ (an invalid color); $F$ is zero in all other cases. The resulting Hamiltonian is a higher-order polynomial because of the $F$ operator. The graph coloring criteria are met when the Hamiltonian energy reaches its global minimum of zero, which corresponds to $F$ being zero on every edge. Further, the truth table-based formulation can be represented with multiplexers, which is particularly convenient for Ising hardware that models spin interactions digitally. In this work, we leverage the probabilistic Ising machine framework to develop an accelerator that follows the proposed vectorized mapping. The hardware architecture for this scheme is derived from the standard node update rule for probabilistic Ising machines. For the ${k}^{{th}}$ bit of node ${S}_{i}$, the update rule is as follows:

$$P\left({s}_{{ik}}=1\right)=\sigma \left(-\frac{1}{T}\frac{{dH}}{d{s}_{{ik}}}\right)$$

(4)

where ${s}_{{ik}}$ takes the binary values $0$ and $1$, $T$ is the temperature coefficient, and $\sigma$ is the sigmoid function, $\sigma (x)=1/(1+{e}^{-x})$ for input value $x$. The binary nature of ${s}_{{ik}}$ converts the derivative into the difference $\Delta H={H}_{{s}_{{ik}}=1}-{H}_{{s}_{{ik}}=0}$. This $\Delta H$ is implemented in hardware using higher-order multiplexers followed by a subtraction unit (see Fig. 2), together called the vecmul unit. The rest of the hardware implementation remains the same as shown in Fig. 1.
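The $\Delta H$ computation for one bit can be sketched in software as follows. This is only an illustration under stated assumptions: each node's color is stored as an integer whose bit $k$ is ${s}_{{ik}}$ (least significant bit first), and the names `f_operator`, `delta_h`, `p_bit_is_one`, and the adjacency-list layout are hypothetical rather than the authors' implementation.

```python
import math


def f_operator(ci, cj, q):
    """F = 1 if the two colors clash or either is out of range, else 0."""
    return 1 if (ci == cj or ci >= q or cj >= q) else 0


def delta_h(colors, i, k, neighbors, q):
    """H(s_ik = 1) - H(s_ik = 0), with every other bit held fixed."""
    c1 = colors[i] | (1 << k)    # color of node i with bit k set
    c0 = colors[i] & ~(1 << k)   # color of node i with bit k cleared
    return sum(
        w * (f_operator(c1, colors[j], q) - f_operator(c0, colors[j], q))
        for j, w in neighbors[i]
    )


def p_bit_is_one(dh, temperature):
    """Eq. 4: sigmoid(-dH / T), written to avoid overflow in exp."""
    x = -dh / temperature
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Hypothetical example: two connected nodes, q = 3 colors (n = 2 bits per node).
neighbors = {0: [(1, 1.0)], 1: [(0, 1.0)]}
colors = [0b00, 0b01]
dh = delta_h(colors, i=0, k=0, neighbors=neighbors, q=3)  # setting bit 0 copies node 1's color
print(dh, p_bit_is_one(dh, temperature=1.0))
```

Only the edges incident to node $i$ contribute to $\Delta H$, which is why a local multiplexer-plus-subtractor unit suffices in hardware.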

Accuracy analysis and benchmarks

Next, we investigate the proposed vectorized mapping approach for the graph coloring problem. For benchmarking, we use the publicly available COLOR dataset⁴⁰. First, we implement probabilistic Ising frameworks following the one-hot encoding (Ising) and the proposed binary encoding (vectorized) approaches on GPU for comparison. Figure 3 shows the results for the queen13_13 problem, which is deemed a hard problem³³. For both schemes, the solution exploration proceeds in a similar way (Fig. 3a). Figure 3b illustrates that the vectorized mapping gives superior coloring results compared to the Ising framework. Even with an optimal set of connectivity weight factor and one-hot constraint weight factor parameters (see Supplementary Fig. S1), the effectiveness of the Ising framework is limited by the one-hot encoding constraint in Eq. 2. This comparison for other problem instances in the dataset is shown in Supplementary Figs. S3 and S4. Further, we employ the parallel tempering method to enhance the exploration while sampling the energy landscape. This method involves the simultaneous simulation of M replica systems with vectorized mapping running at different temperature schedules. At high temperatures, the system tends to explore a broad region of the solution space, while low temperatures allow precise sampling around a particular region (Fig. 2b). High-temperature replicas ensure that the system does not get stuck around a local minimum. The states of higher-temperature replicas are exchanged with those of lower-temperature replicas via a swap rule (see Methods Section 1). This state exchange enables the exploration of parts of the solution space that may not be reachable with a single temperature schedule. Figure 3c depicts the energy exploration for the queen13_13 problem using the parallel tempering approach. It shows that the system repeatedly escapes from local energy minima and therefore achieves the best solution accuracy.

**Fig. 3: Solution exploration in Ising and Vectorized mapping framework.**

We evaluate the accuracy of vectorized mapping solutions against learning-based methods (graph neural network³⁴ and its GraphSage architecture PI-SAGE³³), heuristic approaches (Tabucol^32,41), and state-of-the-art Ising solvers (probabilistic Ising machines⁴², simulated bifurcation^43,44) for graph coloring across standard benchmark datasets. GNN-based solvers frame the coloring problem as a multi-class node classification problem and employ unsupervised training by formulating a loss function. In contrast, the Tabucol technique searches for the ground-state solution by taking small steps and maintains a tabu list to avoid cycling around local minima. We run Tabucol heuristics, probabilistic Ising machines, and the vectorized mapping framework with single-flip Gibbs sampling on an NVIDIA A100 Tensor Core GPU and solve the same problems 200 times with 1000 iteration (update) steps each. Additionally, the simulated bifurcation machine algorithm is run on the same GPU, solving the same problems 200 times with 10,000 iteration steps each. We report the best results achieved by the described methods and compare them in Tables 1 and 2. The comparison includes easy, medium, and hard problem instances as labeled in earlier work⁴⁵. The Ising approach performs worse, solving only small and easy problem instances accurately. Among GNN-based solvers, the GraphSage architecture (PI-SAGE) offers better solution accuracy, but it suffers from longer training times³³. By contrast, the proposed vectorized mapping gives coloring results competitive with Tabucol heuristics and the PI-SAGE GNN, with slightly lower accuracy on hard-to-solve problem instances. Employing the described parallel tempering with vectorized mapping reduces the error by up to 50% on these hard problems and therefore performs better than the other methods.

Table 1 Comparison of solution accuracy, in terms of wrongly colored edges (lower is better), for the heuristic (Tabucol), learning-based approaches (GNN and PI-SAGE), Ising methods, and the vectorized framework


Table 2 Comparison of solution accuracy for citation graphs, in terms of wrongly colored edges (lower is better), for the heuristic (Tabucol), learning-based approaches (GNN and PI-SAGE), Ising methods, and the vectorized framework


Evaluation of success probability and time-to-solution

We employ the success probability metric, used extensively in other works^16,46, which captures the solution quality of statistical algorithms. We define the error as the number of incorrectly colored edges divided by the total number of edges in the problem graph. To calculate the success probability (${p}_{s}$), we run each coloring problem 200 times (or as 200 parallel runs) with each algorithm and calculate the fraction of runs that reach an error of less than 2%. The results in Fig. 4b confirm that the vectorized mapping achieves a competitive success probability against the Tabucol heuristic approach and produces better-quality solutions than state-of-the-art Ising solvers, including the probabilistic Ising machine and Simulated Bifurcation machines. Additionally, time-to-solution (TTS) in Eq. 5 is defined as the time needed to obtain, with a probability of 99%, a solution within a specified accuracy across multiple runs. ${T}_{{comp}}$ represents the average time to complete a single run. For algorithms that achieve a success probability greater than 99%, TTS is defined as the average time to reach the solution across parallel runs. Using this methodology, Fig. 4c reports the TTS for the different statistical algorithms used to solve the graph coloring problem instances. Overall, the success probability and TTS metrics capture a critical aspect of solver performance. A higher success probability reflects a solver’s ability to tackle problem instances more efficiently, requiring fewer attempts. The TTS formulation incorporates a factor that accounts for the success probability, providing a robust evaluation of solver effectiveness in terms of both accuracy and computational efficiency.

**Fig. 4: Area, solution quality, performance and energy efficiency benchmarks.**

$${TTS}={T}_{{comp}}\times \frac{\log \left(1-0.99\right)}{\log \left(1-{p}_{s}\right)}$$

(5)
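Eq. 5 and the success-probability definition can be sketched in a few lines of Python. The function names and the sample error values below are hypothetical. For ${p}_{s}$ at or above the 99% target, the sketch returns ${T}_{{comp}}$ as a stand-in for the "average time to reach the solution", which is an assumption rather than the authors' exact procedure.

```python
import math


def success_probability(errors, threshold=0.02):
    """Fraction of runs whose error (wrong edges / total edges) is below the threshold."""
    return sum(e < threshold for e in errors) / len(errors)


def time_to_solution(t_comp, p_s, target=0.99):
    """Eq. 5: expected time to hit the target confidence given per-run success p_s."""
    if p_s <= 0.0:
        return math.inf          # the solver never succeeds
    if p_s >= target:
        return t_comp            # assumed stand-in: a single run already suffices
    return t_comp * math.log(1.0 - target) / math.log(1.0 - p_s)


# Hypothetical per-run errors for four runs of one solver.
errors = [0.0, 0.01, 0.05, 0.03]
p_s = success_probability(errors)            # 0.5
print(p_s, time_to_solution(t_comp=2.0, p_s=p_s))
```

With ${p}_{s}=0.5$, the ratio of logarithms is about 6.64, so roughly seven independent runs are needed to reach 99% confidence.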

FPGA accelerator implementation and results

We implement the proposed vectorized architecture (Fig. 2b) on a VCU118 FPGA to establish the performance acceleration and energy efficiency benchmarks. The FPGA accelerator exposes a memory-mapped IO interface that software applications use to program the problem weights and read out the solution (see “Methods” Section 2 for more details). The accelerator (Fig. 1) fetches the weight data from memory and performs the neuron state-weight multiplication. This process includes computing the $F$ operator defined in Algorithm 1, which is implemented in hardware using truth tables or higher-order multiplexers, as illustrated in Fig. 2b. The structure of this operator is largely determined by the graph’s coloring constraints. The accumulated product, represented with 8-bit precision, is passed through a sigmoid activation function that is implemented using a look-up table (LUT) containing 2⁸ (256) entries. The LUT generates a 16-bit output, which is then compared with a 16-bit random number generated by a Linear Feedback Shift Register (LFSR) to obtain the updated node value. The architecture runs at a 90 MHz clock frequency, with only one node updated in each cycle. Further, it supports graph coloring problems with up to 256 all-to-all connected nodes and a maximum chromatic number of 16, equivalent to 1024 vectorized nodes in total. The framework achieves the same accuracy on the dataset problems as the GPU-based vectorized mapping (see Table 1).
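The LUT-plus-LFSR update path described above can be modeled in software roughly as follows. The text fixes only the bit widths (256 LUT entries, 16-bit outputs, 16-bit random numbers). The input range of $[-8, 8)$, the two's-complement interpretation of the 8-bit code, the LFSR taps (16, 14, 13, 11), and all function names are assumptions made for this sketch.

```python
import math

LUT_BITS = 8      # 2**8 = 256 LUT entries, as stated in the text
OUT_BITS = 16     # 16-bit LUT output, as stated in the text
X_RANGE = 8.0     # assumed: the 8-bit code spans sigmoid inputs in [-8, 8)


def build_sigmoid_lut():
    """Tabulate sigmoid(x) as 16-bit integers, indexed by an 8-bit two's-complement code."""
    lut = []
    half = 1 << (LUT_BITS - 1)
    for code in range(1 << LUT_BITS):
        signed = code - (1 << LUT_BITS) if code >= half else code
        x = signed * X_RANGE / half
        y = 1.0 / (1.0 + math.exp(-x))
        lut.append(min((1 << OUT_BITS) - 1, int(y * (1 << OUT_BITS))))
    return lut


def lfsr16(state):
    """One step of a 16-bit Fibonacci LFSR (assumed taps 16, 14, 13, 11)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def update_node(acc_code, lut, rng_state):
    """Compare the LUT output against a fresh 16-bit random number."""
    rng_state = lfsr16(rng_state)
    new_bit = 1 if lut[acc_code & 0xFF] > rng_state else 0
    return new_bit, rng_state


lut = build_sigmoid_lut()
bit, state = update_node(acc_code=0, lut=lut, rng_state=0xACE1)
print(bit, hex(state))
```

Here `acc_code` stands for the quantized sigmoid argument (that is, $-\Delta H/T$ after scaling); how the accelerator scales and saturates that value is not specified in the text.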

The proposed architecture not only gives better accuracy than the baseline Ising-based architecture following one-hot encoding but also reduces the implementation area. The implementation area is quantified in terms of the number of physical nodes required to map and solve a problem instance. An increase in physical nodes increases the number of interactions and hence the accumulator size, which penalizes the implementation area. Figure 4a shows that the vectorized mapping requires 1.5–4 times fewer nodes for graph coloring problem instances with chromatic numbers up to 16.
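The node-count reduction follows directly from the two encodings: one-hot needs $Nq$ nodes, while the vectorized mapping needs $N\lceil {\log }_{2}q\rceil$. A short sketch (function names are illustrative) reproduces the 1.5 to 4 times range for 3 to 16 colors:

```python
def one_hot_nodes(n_nodes, q):
    """Physical nodes for one-hot encoding: q bits per graph node."""
    return n_nodes * q


def vectorized_nodes(n_nodes, q):
    """Physical nodes for binary encoding: ceil(log2 q) bits per graph node."""
    bits = max(1, (q - 1).bit_length())  # equals ceil(log2 q) for q >= 2
    return n_nodes * bits


for q in (3, 4, 8, 16):
    ratio = one_hot_nodes(256, q) / vectorized_nodes(256, q)
    print(q, one_hot_nodes(256, q), vectorized_nodes(256, q), ratio)
```

For the largest supported instance (256 nodes, 16 colors), this gives 4096 one-hot nodes versus 1024 vectorized nodes, matching the accelerator size quoted in the text.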

The FPGA architecture also uses single-flip Gibbs sampling to update the nodes; therefore, $N\lceil {\log }_{2}Q\rceil$ nodes need to be updated in the vectorized mapping system for one complete iteration or time step. Thus, the time complexity of the proposed framework scales as $O(N\lceil {\log }_{2}Q\rceil )$. The time for a single node update sets the prefactor of this time complexity. Specifically, in the probabilistic Ising/vectorized framework, this prefactor is the combined contribution of the multiplexer (state-weight multiplication), accumulator, sigmoid activation, and comparator delays. Owing to the binary nature of the nodes, the accumulator dominates the node update timing⁹. As a result, introducing higher-order multiplexers has minimal impact on the overall time complexity scaling.

The FPGA implementation of the proposed vectorized mapping approach shows $\sim$10,000$\times$ speedup compared to Tabucol heuristics and Ising solvers (Probabilistic Ising and Simulated Bifurcation) implemented on an NVIDIA A100 Tensor Core GPU (see Fig. 4c). Vectorized mapping on GPU cannot take advantage of efficient multiplication of binary states with weights and provides only a time-to-solution comparable to heuristics. The same trend is expected to hold for any accelerator built to support these binary calculations. Similarly, Ising mapping-based solvers, including Probabilistic Ising and Simulated Bifurcation, can benefit from FPGA acceleration^42,47, potentially narrowing the performance gap with our FPGA implementation of the vectorized mapping. Nevertheless, for problems with more than 100 nodes, these solvers may continue to yield sub-optimal solutions. The FPGA accelerator also offers $\sim$5$\times$ power improvement over the GPU-based vectorized mapping implementation. Therefore, accelerating the time-to-solution by more than 4 orders of magnitude on FPGA at a low power budget significantly improves energy efficiency (Fig. 4d).

One question that may arise is whether the acceleration of the proposed method over heuristics is solely due to the vectorized method being implemented on FPGA. In this regard, we note that heuristics rely on sequential strategies that do not scale well on parallel architectures. This has been studied extensively in the literature. For example, Supplementary Fig. S10a compares CPU and GPU implementations of Tabucol: the GPU implementation provides virtually no acceleration. By contrast, the GPU implementation of the proposed vectorized method shows large acceleration, as shown in Supplementary Fig. S10b. This underscores a strength of the proposed method that makes it amenable to scaling on specialized hardware such as FPGAs.

Discussion

Current methods employ the one-hot state encoding framework to map multistate spin Ising problems onto Ising machines and solve them. This work presents an alternative, vectorized mapping, which maps the states using binary encoding. It not only reduces the state exploration from ${2}^{{qN}}$ to ${2}^{\lceil {\log }_{2}q\rceil N}$ for a $q$-color problem instance but also removes the constraints resulting from the constraint-heavy Ising mapping. As a result, the proposed approach converges close to the optimal solution achieved by heuristics and learning-based approaches. Existing works^12,16,20 on Ising machines and their derivative frameworks generally aim to achieve accuracy within a certain error range compared to heuristics. However, we show that our method combined with parallel tempering achieves solution accuracy even better than heuristics for some of the problems, while retaining the ability to achieve significant speedup on a custom accelerator. We also present a generalized truth table-based method that can be leveraged to map other multi-state problems. In probabilistic Ising machine architectures, the truth tables that model the neuron interactions can be directly mapped onto higher-order multiplexers. We have implemented this architecture on FPGA to benchmark time-to-solution, energy efficiency, and area efficiency. Overall, the hardware implementation consumes 5 W of power and achieves $\sim$10,000$\times$ speedup compared to Tabucol heuristics and $\sim$100,000$\times$ compared to vectorized mapping on GPU. The presented hardware and software framework provides a new way to substantially expand the capabilities of Ising machines to accurately handle a wide range of multistate optimization problems in a performance- and energy-efficient manner.

Methods

Parallel tempering

The parallel tempering algorithm uses multiple Markov chains (replica chains) running at different temperatures, which represent different probability distributions. It allows broader exploration of the energy landscape and facilitates better mixing to avoid local energy minima. In this work, we implement 100 parallel chains, each running at a constant temperature, with temperatures geometrically spaced between temp0 (0.01) and tempn (40). After every 15 sampling steps, swaps are attempted between adjacent pairs of chains, alternating between odd-leading and even-leading chain indexes. The swap rule is given as:

$$r=\exp \left[\left(\frac{1}{{T}_{1}}-\frac{1}{{T}_{2}}\right)\left(H\left({s}_{T1}\right)-H\left({s}_{T2}\right)\right)\right]$$

(6)

$${P}_{{swap}}\left({s}_{T1}\leftrightarrow {s}_{T2}\right)=\min \left(1,r\right)$$

(7)
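A compact sketch of the temperature ladder and the swap rule in Eqs. 6 and 7 is given below. The ladder endpoints (0.01 and 40) and the chain count (100) come from the text; the function names, the list-based data layout, and the `parity` argument used to alternate between even-leading and odd-leading pairs are assumptions for illustration.

```python
import math
import random


def geometric_ladder(t_min=0.01, t_max=40.0, m=100):
    """m temperatures spaced geometrically between t_min and t_max."""
    ratio = (t_max / t_min) ** (1.0 / (m - 1))
    return [t_min * ratio ** i for i in range(m)]


def swap_probability(t1, t2, h1, h2):
    """Eqs. 6-7: min(1, exp[(1/T1 - 1/T2) * (H1 - H2)])."""
    exponent = (1.0 / t1 - 1.0 / t2) * (h1 - h2)
    return 1.0 if exponent >= 0 else math.exp(exponent)


def swap_pass(temps, energies, states, parity, rng=random):
    """Attempt swaps on adjacent pairs (a, a+1) for a = parity, parity+2, ..."""
    for a in range(parity, len(temps) - 1, 2):
        b = a + 1
        p = swap_probability(temps[a], temps[b], energies[a], energies[b])
        if rng.random() < p:
            states[a], states[b] = states[b], states[a]
            energies[a], energies[b] = energies[b], energies[a]


temps = geometric_ladder()
print(len(temps), temps[0], temps[-1])
# A colder chain holding the higher-energy state always swaps with its hotter neighbor.
print(swap_probability(t1=0.5, t2=1.0, h1=3.0, h2=1.0))
```

In a full sampler, `swap_pass` would be called after every 15 sampling steps with `parity` toggling between 0 and 1.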

FPGA setup PCIe

In this work, we implement the vectorized mapping on the Xilinx Virtex UltraScale+ VCU118 FPGA evaluation kit. The FPGA is interfaced with the CPU through a Peripheral Component Interconnect Express (PCIe) interface. In particular, we use an open-source Xillybus IP core for the interface, with data transfer capabilities of 50 MB/s. The data is transferred via a memory-mapped interface implemented using block memory and a custom memory controller.

The digital implementation of the 256-node, 16-color accelerator supports a network of 1024 probabilistic Ising nodes. These nodes consist of an LFSR-based pseudorandom number generator with a fixed seed, a lookup table-based sigmoidal activation, higher-order multiplexer-based matrix multiplication, and a threshold that generates the output state of the neuron. These states are stored in the local memory of the FPGA and then transferred to the CPU via the PCIe interface.

Algorithm 1

Vectorized mapping for graph coloring problem (F -operator):

1: $q$ $\leftarrow$ number of colors

2: $H$ $\leftarrow$ Hamiltonian Energy

3: ${W}_{{ij}}$ $\leftarrow$ connectivity weight between node i and j

4: $n$ $\leftarrow$ $\lceil {\log }_{2}q\rceil$

5: ${S}_{i}$ $\leftarrow$ color vector for node $i\,${${s}_{i0},\,{s}_{i1},\,.\,.\,.,\,{s}_{i(n-1)}$}

6: Generate$\,F$ operator in truth table format for any two graph nodes $({S}_{i}\,{S}_{j})$:

7: select variables $\leftarrow$ {${s}_{i0},\,{s}_{i1},\,.\,.\,.,{s}_{i(n-1)},{s}_{j0},\,{s}_{j1},\,.\,.\,.,\,{s}_{j(n-1)}$}

8: if ${S}_{i}\,$= ${S}_{j}$ then

9: $F$ $\leftarrow$ 1 [two nodes colored with same color]

10: $H$ $\leftarrow$ $H\,+{\,W}_{{ij}}$

11: else if ${S}_{i}$ or ${S}_{j}$ $\notin \,[0,\,q\,-\,1]$ then

12: $F$ $\leftarrow$ 1 [assigned color out of range]

13: $H$ $\leftarrow$ $H\,+{\,W}_{{ij}}$

14: else

15: $F$ $\leftarrow$ 0 [two nodes colored with different color]

16: $H$ $\leftarrow$ $H\,+0$

17: end if
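Algorithm 1 can be sketched in Python by enumerating all $2n$ select bits and tabulating $F$. The function names are illustrative, and the bit ordering (${s}_{i0}$ as the least significant bit of the color index) is an assumption that the text does not specify.

```python
from itertools import product


def build_f_truth_table(q):
    """Map every (s_i0..s_i(n-1), s_j0..s_j(n-1)) bit tuple to F in {0, 1}."""
    n = max(1, (q - 1).bit_length())  # n = ceil(log2 q) for q >= 2
    table = {}
    for bits in product((0, 1), repeat=2 * n):
        ci = sum(b << k for k, b in enumerate(bits[:n]))  # color of node i
        cj = sum(b << k for k, b in enumerate(bits[n:]))  # color of node j
        same_color = ci == cj
        out_of_range = ci >= q or cj >= q
        table[bits] = 1 if (same_color or out_of_range) else 0
    return table


def coloring_energy(node_bits, weighted_edges, table):
    """Eq. 3: H = sum over edges of W_ij * F(bits of i, bits of j)."""
    return sum(w * table[node_bits[i] + node_bits[j]] for i, j, w in weighted_edges)


table = build_f_truth_table(q=3)           # n = 2, so 16 rows
print(len(table), sum(table.values()))     # 16 rows, 10 of them with F = 1

# Hypothetical triangle colored 0, 1, 2 (bit tuples are LSB first).
node_bits = {0: (0, 0), 1: (1, 0), 2: (0, 1)}
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
print(coloring_energy(node_bits, edges, table))
```

For $q=3$, only the six ordered pairs of distinct valid colors give $F=0$; the remaining ten rows cover equal colors and the unused code 3.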

Algorithm 2

Probabilistic Ising Machine Spin Update

1: $H$ $\leftarrow$Hamiltonian

2: $T$ $\leftarrow$ temperature

3: $\beta$ $\leftarrow$ $1/T$

4: ${N}_{S}$ $\leftarrow$ Number of iteration (update) steps

5: $k$ $\leftarrow$ Total Spins

6: ${sigmoid}(x)$ = $1/(1\,+\,{e}^{-x})$

7: Initialize the spins $s=\left\{{s}_{0},{s}_{1},\ldots,{s}_{k-1}\right\}$, each in $\{0,\,1\}$, randomly

8: for $t$ = $1$ to ${N}_{S}$ do

9: for $i$ = $0$ to $k-1$ do

10: Calculate $\Delta {H}_{i}\,=\,{H}_{{s}_{i}=1}\,-\,{H}_{{s}_{i}=0}$

11: $r$ $\leftarrow$ random number

12: if ${sigmoid}(-\Delta {H}_{i}/T)\, > \,r$ then

13: ${s}_{i}\,\leftarrow$ 1

14: else

15: ${s}_{i}\,\leftarrow$ 0

16: end if

17: end for

18: end for
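A direct software rendering of Algorithm 2 might look like the sketch below. It recomputes the full energy for each candidate bit value, which is simple but slower than the local $\Delta H$ evaluation used in hardware. The function names and the two-spin toy energy are assumptions for illustration.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gibbs_sample(energy, k, temperature, n_steps, rng):
    """Single-flip Gibbs sampling over k binary spins (Algorithm 2)."""
    spins = [rng.randint(0, 1) for _ in range(k)]
    for _ in range(n_steps):
        for i in range(k):
            spins[i] = 1
            h_one = energy(spins)
            spins[i] = 0
            h_zero = energy(spins)
            delta_h = h_one - h_zero
            # Set the spin to 1 with probability sigmoid(-dH / T).
            spins[i] = 1 if sigmoid(-delta_h / temperature) > rng.random() else 0
    return spins


def toy_energy(spins):
    """Two connected nodes, two colors (one bit each): penalty 1 if colors match."""
    return 1.0 if spins[0] == spins[1] else 0.0


result = gibbs_sample(toy_energy, k=2, temperature=0.05, n_steps=50, rng=random.Random(0))
print(result, toy_energy(result))
```

At this low temperature the sampler settles into one of the two zero-energy states, in which the spins differ.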

Data availability

All processed data generated in this study are provided in the main text. The data supporting the plots in this paper can be found in the GitHub repository⁴⁸. Other findings of this study are available from the corresponding authors upon request.

Code availability

The code used to generate the data in this manuscript can be found in the GitHub repository⁴⁹.

References

Sevilla, J. et al. Compute trends across three eras of machine learning. In Proc. 2022 International Joint Conference on Neural Networks (IJCNN), 1–8 (EPOCH AI, 2022).
Kirkpatrick, S., Gelatt, C. D. & Vecchi, M. P. Optimization by simulated annealing. Science 220, 671–680 (1983).
Hochba, D. S. Approximation algorithms for np-hard problems. SIGACT N. 28, 40–52 (1997).
Colorni, A. et al. Heuristics from nature for hard combinatorial optimization problems. Int. Trans. Operat. Res. 3, 1–21 (1996).
Weinberg, S. J., Sanches, F., Ide, T., Kamiya, K. & Correll, R. Supply chain logistics with quantum and classical annealing algorithms. Sci. Rep. 13, 4770 (2023).
Tao, Q. & Han, J. Solving traveling salesman problems via a parallel fully connected ising machine. In Proc. 59th ACM/IEEE Design Automation Conference, DAC ’22, 1123–1128 (Association for Computing Machinery, 2022).
Li, R. Y., Di Felice, R., Rohs, R. & Lidar, D. A. Quantum annealing versus classical machine learning applied to a simplified computational biology problem. npj Quantum Inf. 4, 14 (2018).
Gerlach, T. et al. Fpga-placement via quantum annealing. In Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, FPGA ’24, 43 (Association for Computing Machinery, 2024).
Patel, S., Canoza, P. & Salahuddin, S. Logically synthesized and hardware-accelerated restricted Boltzmann machines for combinatorial optimization and integer factorization. Nat. Electron. 5, 92–101 (2022).
Borders, W. A. et al. Integer factorization using stochastic magnetic tunnel junctions. Nature 573, 390–393 (2019).
Aadit, N. A. et al. Massively parallel probabilistic computing with sparse ising machines. Nat. Electron. 5, 460–468 (2022).
Patel, S. et al. PASS: an asynchronous probabilistic processor for next generation intelligence. ArXiv https://doi.org/10.48550/arXiv.2409.10325 (2024).
McMahon, P. L. et al. A fully programmable 100-spin coherent ising machine with all-to-all connections. Science 354, 614–617 (2016).
Moy, W. et al. A 1,968-node coupled ring oscillator circuit for combinatorial optimization problem solving. Nat. Electron. 5, 310–317 (2022).
Chowdhury, S., Camsari, K. Y. & Datta, S. Accelerated quantum Monte Carlo with probabilistic computers. Commun. Phys. 6, 1–7 (2023).
Lo, H., Moy, W., Yu, H., Sapatnekar, S. & Kim, C. H. An Ising solver chip based on coupled ring oscillators with a 48-node all-to-all connected array architecture. Nat. Electron. 6, 771–778 (2023).
McGeoch, C., Farre, P. & Boothby, K. The D-Wave Advantage2 Prototype. Tech. Rep., D-Wave Systems Inc. (2023).
Yue, W. et al. A scalable universal Ising machine based on interaction-centric storage and compute-in-memory. Nat. Electron. 7, 904–913 (2024).
Chiang, H.-W., Nien, C.-F., Cheng, H.-Y. & Huang, K.-P. ReAIM: A ReRAM-based adaptive Ising machine for solving combinatorial optimization problems. In 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 58–72 (IEEE, 2024).
Inagaki, T. et al. A coherent Ising machine for 2000-node optimization problems. Science 354, 603–606 (2016).
Wang, T. & Roychowdhury, J. OIM: Oscillator-based Ising machines for solving combinatorial optimisation problems. In McQuillan, I. & Seki, S. (eds.) Unconventional Computation and Natural Computation, 232–256 (Springer International Publishing, 2019).
Chen, Z. et al. ON-OFF neuromorphic ISING machines using Fowler-Nordheim annealers. Nat. Commun. 16, 3086 https://doi.org/10.1038/s41467-025-58231-5 (2025).
Niazi, S. et al. Training deep Boltzmann networks with sparse Ising machines. Nat. Electron. 7, 610–619 (2024).
McGeoch, C. C., Chern, K., Farré, P. & King, A. K. A comment on comparing optimization on D-Wave and IBM quantum processors. ArXiv https://doi.org/10.48550/arXiv.2406.19351 (2024).
Camsari, K. Y., Faria, R., Sutton, B. M. & Datta, S. Stochastic p-bits for invertible logic. Phys. Rev. X 7, 031014 (2017).
Patel, S., Chen, L., Canoza, P. & Salahuddin, S. Ising model optimization problems on a FPGA accelerated restricted Boltzmann machine. ArXiv https://doi.org/10.48550/arXiv.2008.04436 (2020).
Lucas, A. Ising formulations of many NP problems. Front. Phys. 2, 74887 (2014).
Silva, C. et al. Mapping graph coloring to quantum annealing. Quantum Mach. Intell. 2, 16 (2020).
Kawakami, S. et al. A constrained graph coloring solver based on Ising machines. In Proc. 2023 IEEE International Conference on Consumer Electronics (ICCE), 1–6 (IEEE, 2023).
Inaba, K. et al. Potts model solver based on hybrid physical and digital architecture. Commun. Phys. 5, 137 (2022).
Whitehead, W. et al. CMOS-compatible Ising and Potts annealing using single-photon avalanche diodes. Nat. Electron. 6, 1009–1019 (2023).
Hertz, A. & de Werra, D. Using tabu search techniques for graph coloring. Computing 39, 345–351 (1987).
Schuetz, M. J. A., Brubaker, J. K., Zhu, Z. & Katzgraber, H. G. Graph coloring with physics-inspired graph neural networks. Phys. Rev. Res. 4, 043131 (2022).
Li, W. et al. Rethinking graph neural networks for the graph coloring problem. ArXiv https://doi.org/10.48550/arXiv.2208.06975 (2022).
Koopmans, T. C. & Beckmann, M. Assignment problems and the location of economic activities. Econometrica 25, 53–76 (1957).
Kahng, A. B., Park, C.-H., Xu, X. & Yao, H. Layout decomposition approaches for double patterning lithography. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 29, 939–952 (2010).
Smith, M. D., Ramsey, N. & Holloway, G. A generalized algorithm for graph-coloring register allocation. In Proc. ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation, PLDI ’04, 277–288 (Association for Computing Machinery, 2004).
Ciesielski, M., Yang, S. & Perkowski, M. Multiple-valued boolean minimization based on graph coloring. In Proc. 1989 IEEE International Conference on Computer Design: VLSI in Computers and Processors, 262–265 (IEEE, 1989).
Earl, D. J. & Deem, M. W. Parallel tempering: theory, applications, and new perspectives. Phys. Chem. Chem. Phys. 7, 3910–3916 (2005).
Trick, M. COLOR Dataset https://mat.tepper.cmu.edu/COLOR02/ (2002).
Lewis, R. M. R. Guide to Graph Colouring: Algorithms and Applications 2nd edn (Springer, 2021).
Nikhar, S., Kannan, S., Aadit, N. A., Chowdhury, S. & Camsari, K. Y. All-to-all reconfigurability with sparse and higher-order Ising machines. Nat. Commun. 15, 8977 (2024).
Goto, H., Tatsumura, K. & Dixon, A. R. Combinatorial optimization by simulating adiabatic bifurcations in nonlinear Hamiltonian systems. Sci. Adv. 5, eaav2372 https://www.science.org/doi/pdf/10.1126/sciadv.aav2372 (2019).
Ageron, R., Bouquet, T. & Pugliese, L. Simulated bifurcation (SB) algorithm for Python. https://github.com/bqth29/simulated-bifurcation-algorithm (2023).
Gualandi, S. & Malucelli, F. Exact solution of graph coloring problems via constraint programming and column generation. Informs J. Comput. 24, 81–100 (2012).
Hamerly, R. et al. Experimental investigation of performance differences between coherent Ising machines and a quantum annealer. Sci. Adv. 5, eaau0823 (2019).
Tatsumura, K., Dixon, A. R. & Goto, H. FPGA-based simulated bifurcation machine. In Proc. 2019 29th International Conference on Field Programmable Logic and Applications (FPL), 59–66 (IEEE, 2019).
Garg, C. & Salahuddin, S. Optimization_integer_ising_data https://doi.org/10.5281/zenodo.16945085 (2025).
Garg, C. & Salahuddin, S. Optimization_integer_ising_code https://doi.org/10.5281/zenodo.17051903 (2025).

Acknowledgements

This work is supported by the Office of Naval Research (ONR), Multidisciplinary University Research Initiative (MURI) grant N000142312708.

Author information

Authors and Affiliations

Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
Chirag Garg & Sayeef Salahuddin

Authors

Chirag Garg
Sayeef Salahuddin

Contributions

C.G. formulated the vectorized scheme, performed the CPU/GPU benchmarks, and designed the vectorized accelerator on FPGA. C.G. and S.S. co-wrote the manuscript; S.S. supervised the research. All authors contributed to discussions and commented on the manuscript.

Corresponding authors

Correspondence to Chirag Garg or Sayeef Salahuddin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Artem Litvinenko, Victor González and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Transparent Peer Review file

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Garg, C., Salahuddin, S. Efficient optimization accelerator framework for multi-state spin Ising problems. Nat Commun 16, 9601 (2025). https://doi.org/10.1038/s41467-025-64625-2

Received: 25 December 2024
Accepted: 22 September 2025
Published: 30 October 2025
Version of record: 30 October 2025
DOI: https://doi.org/10.1038/s41467-025-64625-2