Abstract
Molecular docking is a critical computational strategy in drug discovery, but the diversity of biomolecular structures and flexible binding conformations create an enormous search space that challenges conventional computing. Quantum computing holds promise but remains constrained by scalability, hardware limitations, and precision issues. Here, we report a probabilistic computer (p-computer) prototype that solves complex molecular docking. The system is built upon artificial probabilistic bits (p-bits), fabricated in 180 nm CMOS with BEOL HfO₂ RRAM and compatible with compute-in-memory (CIM) schemes. A key innovation is the integration of Gaussian Random Number Generator-based p-bits with CIM, where the sigmoidal response arises from the Gaussian cumulative distribution function with coupling and bias coefficients directly encoded in the RRAM crossbar. This co-design alleviates the memory-to-compute bottleneck of prior CMOS-only and CMOS + X (emerging nanodevices) p-computers. Using this architecture, we experimentally solved a 42-node docking problem of lipoprotein with the LolA–LolCDE complex—a key target in developing antibiotics against Gram-negative bacteria, with results consistent with the Protein-Ligand Interaction Profiler tool. This work represents an early hardware application of p-computing in computational biology and demonstrates its potential to overcome the success rate and efficiency limitations of current technologies for complex bioinformatics problems.
Introduction
In recent years, probabilistic computing (p-computing), as an emerging unconventional computational paradigm, has garnered significant attention due to its unique advantages in solving combinatorial optimization problems (COPs), such as integer factorization1,2,3,4,5, the traveling salesman problem6,7, and Boolean satisfiability8,9,10. At the core of p-computing is the probabilistic bit (p-bit)4,11,12,13,14,15,16,17, a hardware primitive that exhibits a tunable sigmoidal stochastic input–output characteristic and serves as the building block for probabilistic computers (p-computers) built upon energy-based models18,19,20,21 such as the Boltzmann machine and the Ising model. A key advantage of p-bits lies in their intrinsic stochasticity: unlike traditional heuristics22,23 or deterministic solvers24,25 that are prone to getting trapped in local minima, p-bits naturally support probabilistic transitions, helping the system to escape suboptimal states. More strikingly, the stochasticity of p-bits is tunable, allowing p-computing to naturally integrate with a variety of energy minimization algorithms such as simulated annealing (SA)3,26, parallel tempering27,28, and simulated quantum annealing29,30, thereby facilitating efficient exploration of complex solution spaces. Through such hardware-algorithm co-optimization, p-computers can probabilistically favor low-energy configurations while maintaining sufficient stochasticity to escape local minima, thus accelerating convergence toward the global optimal solution and enhancing the success rate of solving COPs.
In terms of hardware implementation, existing CMOS-only p-computers predominantly rely on pseudo-random number generators (PRNGs) to provide stochasticity, as shown in Fig. 1a. These digital PRNG-based p-bits typically require look-up tables (LUTs) to map uniformly distributed random numbers into sigmoidal probabilities31,32, which introduces considerable area and energy overheads and limits large-scale integration. To achieve true randomness and high energy efficiency, hybrid CMOS + X (emerging nanodevice) architectures incorporating spintronic devices such as magnetic tunnel junctions (MTJs) have been proposed, as illustrated in Fig. 1b. Two representative implementations are depicted. In the first (left), MTJ devices such as voltage-controlled magnetic anisotropy MTJs (VCMA-MTJs)33 or thermally driven MTJs20,34 generate true random bitstreams with an equal probability of 0 and 1, while deterministic CMOS circuits handle the subsequent sampling, sigmoid mapping, state computation, and update operations. In the second (right), devices such as spin-transfer-torque MTJs35 or superparamagnetic MTJs6 directly produce stochastic bitstreams with tunable 0/1 probability distributions, yielding sigmoidal stochastic behavior without the need for LUTs. Despite these advances, both implementations still rely on separate memory modules to store the coupling matrix J and bias vector h, alongside separate peripheral CMOS circuitry for computation and control. Consequently, as in pure CMOS implementations, the stochasticity generation modules, computing modules, and memory modules remain functionally separated. This disjointed architecture reintroduces the von Neumann memory bottleneck, resulting in frequent data shuttling and overall system inefficiencies, which ultimately constrain the scalability of p-computers.
While recent studies have provided valuable insights into device variability36 and annealing algorithm comparisons30 in hybrid p-computer architectures, several questions remain insufficiently explored for compute-in-memory (CIM)-based p-computing platforms: the impact of device non-idealities and variability, the coupling of stochastic dynamics with in-memory computing modules, and the applicability of such platforms to domain-specific tasks such as molecular docking.
a Pure CMOS p-computers31,32, which typically require separate memory modules for storing the interconnection matrix J and bias vector h, a digital multiply-accumulate (MAC) module, and a probabilistic unit, e.g., a pseudo-random number generator (PRNG) plus a lookup table (LUT)-based sigmoidal transfer function. b Hybrid CMOS + X (emerging nanodevice) p-computers employing spintronic elements, e.g., perpendicular magnetic anisotropy (PMA) magnetic tunnel junctions (MTJs) or in-plane magnetic anisotropy (IMA) MTJs. Two representative implementations are illustrated. Left: MTJs such as voltage-controlled magnetic anisotropy (VCMA)-MTJs33 or thermally driven MTJs20,34 are used to generate unbiased true random bitstreams with equal probabilities of 0 and 1. Right: MTJs such as spin-transfer-torque (STT)-MTJs35 and superparamagnetic MTJs6 directly produce tunable bitstreams with controllable 0/1 probability distributions, enabling sigmoidal stochastic behavior without the need for LUTs. For visual simplicity, only the key modules and signal paths are depicted. c RRAM-based CIM p-computer architecture: the CIM platform integrates Gaussian random number generator (GRNG)-based probabilistic bits (p-bits) within an RRAM crossbar array. The input signal \({s}_{{{\rm{in}}}}\) is accumulated from the states of the interconnected p-bits \(\{{x}_{i}^{t}\}\) and compared with a discrete Gaussian threshold u to generate a binary output \({x}_{i}^{{{\rm{new}}}}\). d Device-level abstraction: the proposed tunable artificial p-bit comprises a digitally implemented GRNG and a comparator. Its output follows a sigmoid-like probability curve determined by the update rule. The probabilistic behavior is tunable by adjusting the GRNG’s standard deviation σ, producing different S-shaped transfer characteristics. e On-chip dynamic slope annealing (DSA) algorithm: σ of the GRNG is dynamically decreased during iterations, modulating the slope of the sigmoid-like curves and enabling annealing toward low-energy configurations. This integrated annealing mechanism avoids off-chip stochasticity scheduling. f Schematic representation of solving the protein-ligand docking problem through an MWCP framework. A more detailed and complete schematic diagram of solving molecular docking under the p-computing framework is provided in Supplementary Fig. 1. The docking of SARS-CoV-2 Mpro (COVID-19) with the covalent pyrazoline-based inhibitor PM-2-020B (Protein Data Bank ID: 8SKH) is used for demonstration.
The revolutionary breakthrough of AlphaFold37 in protein structure prediction highlights the critical role of computational methods in addressing complex biological problems. Similarly, p-computing, with its unique advantages, holds promise for tackling challenging problems in computational biology. Among the core problems in computational biology, molecular docking has broad applications in drug design38,39, enzyme catalytic reaction mechanism research40, and antibody-antigen recognition41. Its goal is to predict the preferred binding conformation of a ligand molecule with a target biomolecule to form a stable complex. However, the structural complexity of biomolecules and the high flexibility of binding modes present substantial challenges to this task. Traditional approaches, such as those based on scoring functions42,43, often rely on approximate models and empirical data, which limit their ability to accurately predict binding conformations. In recent studies, quantum computing methods such as Gaussian Boson Sampling (GBS)44,45,46,47 and the Quantum Approximate Optimization Algorithm (QAOA)48 have provided alternative approaches to solving molecular docking problems by reformulating them as clique problems49,50,51,52. While these methods have shown promise in solving certain docking instances, quantum computing approaches still encounter major bottlenecks in practical applications. For instance, sparse adjacency matrices, common in molecular docking problems, require carefully fine-tuned experimental configurations to approximate zero-valued parameters, a process that remains highly challenging in experimental implementations. Furthermore, the output of GBS does not guarantee the formation of cliques, and additional post-processing algorithms53,54,55 are required to optimize solution quality49,50. On the other hand, QAOA tends to get trapped in local energy minima during its evolution52, making it difficult to converge to the ground state. These limitations substantially restrict the success probability and scalability of quantum computing methods in solving clique-based molecular docking problems.
To address these challenges faced in both molecular docking and existing p-computer implementations, we propose a CIM-based p-computer architecture that integrates innovations at the device, architecture, and algorithm levels (Fig. 1c). First, unlike prior implementations based on PRNGs or stochastic nanodevices, our design employs a tunable artificial p-bit driven by a Gaussian random number generator (GRNG), where the sigmoidal characteristic naturally arises from the Gaussian cumulative distribution function (CDF) (Fig. 1d). This Gaussian CDF–based approach eliminates the dependence on devices’ intrinsic stochastic characteristics, effectively lowering the technical barriers for constructing p-bits and providing greater flexibility and adaptability. Second, at the architectural level, our CIM design directly encodes interaction weights and biases into a resistive random-access memory (RRAM) crossbar array (Fig. 1c), enabling in-situ multiply-accumulate (MAC) operations and alleviating the memory-to-compute bottleneck of pure CMOS and CMOS + X designs. Finally, at the algorithm level, we introduce a hardware-native dynamic slope annealing (DSA) mechanism (Fig. 1e), which adjusts the standard deviation (σ) of the Gaussian distribution to globally modulate the steepness of the p-bit’s sigmoidal function. This enables annealing to be realized directly on-chip, eliminating the need for weight reloading and related multiplication operations commonly required in conventional SA schemes. In practical applications, we developed a computational framework tailored for p-computing to translate molecular docking into a maximum weighted clique problem (MWCP) and further reformulate it as a quadratic unconstrained binary optimization (QUBO) problem (Fig. 1f). 
Based on this, we fabricated a CIM-based p-computer prototype by integrating back-end-of-line (BEOL) HfO₂ RRAM chips with 180 nm CMOS technology and experimentally validated its performance in solving a real 42-node docking scenario between a lipoprotein and the LolCDE-LolA complex, without the need for post-processing. To the best of our knowledge, this study represents a hardware implementation of a p-computer applied to computational biology and demonstrates the largest-scale molecular docking solution based on graph theory to date.
Results
P-computing system compatible with CIM and its core artificial tunable p-bit
Our p-computing platform is built upon the architecture of a discrete Hopfield Neural Network (DHNN)7,26, as shown in Fig. 2a. DHNN is a single-layer, recurrent neural network characterized by self-feedback and interconnections. This architecture performs optimization by minimizing the following energy function:
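A form consistent with the definitions that follow, assuming the standard Hopfield convention (the pair-sum convention and the overall sign are assumptions here, not taken from the source), is:

\[
E = -\sum_{i<j} J_{ij}\, x_i x_j - \sum_{i} h_i x_i
\]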
where Jij represents the interconnection matrix between p-bits, hi denotes the local bias of the i-th p-bit, and xi is the instantaneous state of the p-bit, which takes binary values 0 or 1.
a Schematic of the compute-in-memory (CIM) hardware mapping under the discrete Hopfield Neural Network (DHNN) architecture, in which interconnection values are encoded using pairs of RRAM cells. A p-bit with tunable stochasticity is activated by different numbers of cells in the bias region. b 1T1R structure of the DHNN hardware, which needs to be used in conjunction with subsequent Gaussian random number generator (GRNG) and comparator circuits. c Photograph of the p-computing chip integrating the 1T1R RRAM array with the 180-nm CMOS technology. d Characteristics of switching curves of the standalone RRAM cell. e Experimentally measured cumulated bit line (BL) current, namely the multiply-accumulate (MAC) output by activating different numbers of word lines (WLs) at different WL voltages. f Bias current distributions controlled by the sigma value of the GRNG at standard deviation σ = 0.5, 1, and 2. g A series of sigmoidal probability curves of the artificial tunable p-bit (probability of output being 1) under different sigma values ranging from 0 to 2. h Time snapshots of the output states for three different inputs (MAC = +4, 0, and −4, respectively) observed experimentally (solid blue line) and by simulation (solid red line) at σ = 2.
However, a standard DHNN encounters challenges when solving complex COPs, as it is prone to getting trapped in local energy minima, thereby limiting the exploration of global optimal solutions. To address this issue, we introduced an in-RRAM bias with GRNG into the system, which adjusts the stochasticity of p-bits during the state update process (implementation details are provided in Supplementary Note 1). This enhanced randomness increases the likelihood of escaping local minima, thereby improving the system’s capability to explore the solution space globally. During the iterative process for solving the molecular docking problem, the p-bits’ states are updated according to the following rules:
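Writing \(s_{\mathrm{in}}\) for the net input and \(u\) for the GRNG threshold, a threshold rule consistent with the description of Fig. 1c can be sketched as follows. The zero-mean Gaussian for \(u\) and the choice of \(\ge\) rather than \(>\) are assumptions:

\[
s_{\mathrm{in}} = \sum_{j} J_{ij}\, x_j^{t} + h_i, \qquad
x_i^{\mathrm{new}} =
\begin{cases}
1, & s_{\mathrm{in}} \ge u \\
0, & \text{otherwise}
\end{cases},
\qquad u \sim \mathcal{N}(0, \sigma^{2}) \text{ (rounded to an integer)}
\]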
where \({s}_{{{\rm{in}}}}\) represents the net input to the i-th p-bit, and u is the dynamic threshold value provided by the GRNG, which is taken as an integer. A detailed theoretical derivation of the GRNG-based p-bit mechanism is provided in the “Methods” section. A conventional DHNN, by contrast, employs a sequential state update scheme to ensure stable convergence. To fully exploit the inherent parallelism of the RRAM-based architecture, a parallel evaluation scheme can be adopted instead. In this approach, all output states are evaluated simultaneously to identify valid update candidates, from which a single bit is randomly selected for state updating.
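The parallel evaluation scheme described above can be sketched in software as follows. This is a minimal illustration, not the chip's implementation: the function names, the zero-mean rounded Gaussian threshold, and the `>=` comparison are assumptions made for the example.

```python
import random


def net_input(J, h, x, i):
    """Net input to p-bit i: coupling sum over the other p-bits plus local bias."""
    return sum(J[i][j] * x[j] for j in range(len(x)) if j != i) + h[i]


def gaussian_threshold(sigma, rng):
    """Integer threshold from a rounded zero-mean Gaussian (assumed form)."""
    return round(rng.gauss(0.0, sigma)) if sigma > 0 else 0


def parallel_step(J, h, x, sigma, rng):
    """Evaluate every p-bit against its own threshold, then update one candidate."""
    proposed = [
        1 if net_input(J, h, x, i) >= gaussian_threshold(sigma, rng) else 0
        for i in range(len(x))
    ]
    # Valid update candidates are the p-bits whose proposed state differs.
    candidates = [i for i in range(len(x)) if proposed[i] != x[i]]
    if not candidates:
        return list(x)
    i = rng.choice(candidates)  # a single bit is randomly selected
    return x[:i] + [proposed[i]] + x[i + 1:]
```

Updating only one of the simultaneously evaluated candidates keeps each step a single-bit move, as in the sequential scheme, while the evaluation itself maps onto one parallel array read.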
To implement this p-computing framework in hardware, we adopted HfO₂-based RRAM as the core device for this DHNN architecture. RRAM offers high density, low power consumption, and non-volatility, making it an ideal choice for hardware-level p-bit networks. In this system, the RRAM chip not only serves as a storage element but is also closely integrated with peripheral sensing circuits to generate tunable stochastic output signals, enabling the core functionality of p-bits. Figure 2b illustrates the 1T1R array structure of the p-computing chip, which integrates the RRAM array with 180-nm CMOS technology (photograph in Fig. 2c); “R” represents the HfO₂-based RRAM cell. This device consists of a top metal electrode, an HfO₂ resistive switching layer, and a bottom metal electrode. Its conductance state is modulated by controlling the bit line (BL) voltage VBL and word-line (WL) voltage VWL. When an external voltage is applied between the electrodes, oxygen vacancies in the HfO₂ layer are activated, leading to the formation or rupture of conductive filaments and enabling reversible switching between the high-resistance state (HRS) and low-resistance state (LRS). This bipolar switching behavior is clearly demonstrated in Fig. 2d, which shows the current-voltage characteristics of the RRAM cell. Figure 2e presents the relationship between the cumulative current on the BL and the number of activated WLs under different VWL. At lower VWL values, e.g., 0.6 V and 0.7 V, the BL current shows a good linear relationship with the number of WLs. However, at higher VWL, the cumulative current increases more rapidly, deviating from the linear trend due to the IR drop issue56. This observation suggests that appropriately reducing the WL voltage helps preserve the linearity of the BL current, thereby improving the consistency and reliability of the array’s performance (details in Supplementary Note 2). Furthermore, the 1T1R array architecture inherently eliminates the problem of sneak path currents.
Since each RRAM cell is controlled by its dedicated access transistor, current flow is strictly confined to the selected cell during read and write operations. This structural isolation prevents unintended current leakage through unselected cells, which is a well-known issue in selector-less crossbar (1R) arrays. As a result, our design guarantees reliable readout and switching behavior without the interference of sneak currents, ensuring consistent device performance across the array. More detailed electrical characteristics of the core cell are provided in Supplementary Note 3.
Figure 2f highlights the key mechanism for this tunable p-bit functionality, which is achieved through an in-array random bias. This method leverages a GRNG to control the number of activated WLs in the bias region, dynamically modulating the distribution of the bias current. Device-to-device and cycle-to-cycle variations are minimized through RRAM duplication and read-averaging techniques (see Supplementary Note 4). The standard deviation σ of the bias current distribution can be flexibly adjusted for implementing the DSA algorithm. When σ is small, the bias current distribution is narrow and sharply peaked. As σ increases, the distribution broadens, and randomness is significantly enhanced. This highly tunable bias current control mechanism forms the foundation for the tunable p-bit functionality. It stands in sharp contrast to conventional RRAM-based true random number generators, which generate uniformly distributed random bits for general-purpose cryptographic or stochastic computing57,58. Instead, our system employs a controllable Gaussian-distributed threshold for p-bit activation. Furthermore, as shown in Fig. 2g, after the comparison process, the probability of a p-bit outputting a value of 1 under different σ values exhibits a typical sigmoidal (S-shaped) stochastic characteristic as a function of the MAC input. The range and slope of the probability response vary with σ, enabling deliberate adjustment of the steepness of the sigmoid curve to optimize the state update process. Figure 2h demonstrates the real-time fluctuation behavior of a p-bit under σ = 2. When the MAC value is positive, the p-bit output state is predominantly 1, with occasional state transitions, indicating high determinism toward the high state. When the MAC value is zero, the p-bit output oscillates randomly, with nearly equal proportions of 0 and 1, resembling a random number generator with a uniform binary distribution.
For negative MAC values, the p-bit output stays at 0 for most cycles, with infrequent transitions to 1.
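The σ-dependent sigmoidal response can be illustrated numerically. The sketch below assumes a continuous zero-mean Gaussian threshold (the hardware threshold is a discrete integer, so measured curves will differ in detail), and the function names are hypothetical.

```python
import math
import random


def p_one(mac, sigma):
    """Analytic P(output = 1) = Gaussian CDF evaluated at mac / sigma."""
    if sigma == 0:
        return 1.0 if mac >= 0 else 0.0  # deterministic step in the sigma -> 0 limit
    return 0.5 * (1.0 + math.erf(mac / (sigma * math.sqrt(2.0))))


def sampled_p_one(mac, sigma, n, rng):
    """Empirical estimate: fraction of draws in which mac clears the threshold."""
    hits = sum(1 for _ in range(n) if mac >= rng.gauss(0.0, sigma))
    return hits / n


if __name__ == "__main__":
    rng = random.Random(0)
    for mac in (4, 0, -4):  # the three inputs shown in Fig. 2h
        print(mac, round(p_one(mac, 2.0), 3), sampled_p_one(mac, 2.0, 20000, rng))
```

Under this assumption, MAC = +4 at σ = 2 sits two standard deviations above the threshold mean, so the output is 1 in roughly 98% of cycles, which matches the "predominantly 1, with occasional transitions" behavior described above. A smaller σ steepens the curve toward a hard step.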
Translation of molecular docking to maximum weighted clique problem
Molecular docking is a complex computational problem that is central to structure-based drug design. It involves identifying the binding interactions between pharmacophore points (PPs) on ligands and receptor proteins, which are critical for determining biological or pharmacological interactions. PPs, characterized by properties such as electrostatic characteristics, hydrophobicity, and hydrogen bond donor/acceptor capabilities, play a crucial role in molecular recognition and binding affinity. In this study, we focus on the LolCDE protein complex, a key component of Gram-negative bacteria, which works in conjunction with the periplasmic chaperone LolA to transport lipoproteins from the inner membrane to the outer membrane59. This process is essential for bacterial physiology and pathogenicity, as lipoproteins play crucial roles in cell wall synthesis, bacteria-host interactions, and stress signal transmission60. Disrupting this transport process provides a promising strategy for developing antibacterial drugs targeting lipoprotein transport and potentially addresses the growing threat of drug-resistant Gram-negative bacteria. A recent structural biology study utilized high-resolution three-dimensional structures to resolve the ternary complex formed by the LolCDE complex with lipoproteins and the LolA protein, which provided a structural basis for an in-depth understanding of the molecular mechanism of lipoprotein transport61.
As illustrated in Fig. 3a, we identified 6 PPs on the lipoprotein and 7 PPs on the LolA-LolCDE protein complex. For the lipoprotein, these points were classified based on their chemical properties into hydrophobic (hp1–hp5) and hydrogen bond acceptor (ha1) categories. Correspondingly, the PPs on the protein complex were categorized as hydrophobic (HP1–HP5) and hydrogen bond donor (HD1 and HD2). These PPs theoretically give rise to 6 × 7 = 42 potential ligand-receptor pharmacophore binding pairs, where each pair involves one PP from the ligand and one from the receptor protein, as depicted in Fig. 3b. Subsequently, an undirected, fully connected graph with 42 vertices was constructed, where each potential pharmacophore binding pair corresponds to a vertex. However, not all pharmacophore binding pairs can coexist in the spatial structure. Therefore, we performed a spatial compatibility check by quantifying the spatial relationships between these PPs on their respective 3D structures based on Euclidean distances (Supplementary Fig. 2). Briefly, two binding pairs are deemed compatible only if the distance difference between the PPs satisfies specific criteria (see details in “Methods”). The adjacency matrix shown in Fig. 3c captures the coexistence of these 42 vertices after the compatibility check, where black squares indicate compatibility (spatial coexistence of two vertices) and white squares indicate incompatibility. Finally, a binding interaction graph (BIG), denoted as GB = (V, E), was generated, as shown in Fig. 3d. In this graph, each edge Eij∈E indicates that the binding pairs represented by vertices Vi and Vj are spatially compatible. Additionally, each vertex Vi is assigned a weight wi, which represents the knowledge-based binding potential of the corresponding binding pair derived from the PDBbind dataset62,63,64, as shown in Table 1 (the weight of each vertex is detailed in Supplementary Table 1).
These weights reflect the relative strength of interactions and the contribution of each binding pair to the overall molecular binding affinity. The formation of cliques within the structure of GB implies a group of pharmacophore binding pairs that are mutually compatible and collectively define a feasible docking conformation. Among these cliques, the one with the maximum weight is referred to as the maximum weighted clique (MWC), which corresponds to the most energetically favorable and stable docking conformation. This clique maximizes the total binding potential energy while satisfying the spatial compatibility constraints.
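The graph construction can be sketched as follows. The node numbering follows the sequential naming given in the Fig. 3 caption (hp1-HP1 is node 1, ha1-HD2 is node 42). The compatibility test is only an assumed stand-in for the criteria in “Methods”: it requires that two pairs share no PP and that the ligand-side and receptor-side distances differ by at most a hypothetical tolerance `tol`.

```python
import math

LIGAND = ["hp1", "hp2", "hp3", "hp4", "hp5", "ha1"]
RECEPTOR = ["HP1", "HP2", "HP3", "HP4", "HP5", "HD1", "HD2"]


def node_to_pair(node):
    """Map a 1-based node index to its (ligand PP, receptor PP) binding pair."""
    lig, rec = divmod(node - 1, len(RECEPTOR))
    return LIGAND[lig], RECEPTOR[rec]


def compatible(pair_a, pair_b, lig_xyz, rec_xyz, tol):
    """Assumed criterion: distinct PPs and matching inter-PP distances."""
    (la, ra), (lb, rb) = pair_a, pair_b
    if la == lb or ra == rb:  # assumption: a PP joins at most one pair
        return False
    d_lig = math.dist(lig_xyz[la], lig_xyz[lb])
    d_rec = math.dist(rec_xyz[ra], rec_xyz[rb])
    return abs(d_lig - d_rec) <= tol


def build_adjacency(pairs, lig_xyz, rec_xyz, tol):
    """Symmetric 0/1 adjacency matrix of the binding interaction graph."""
    n = len(pairs)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if compatible(pairs[i], pairs[j], lig_xyz, rec_xyz, tol):
                adj[i][j] = adj[j][i] = 1
    return adj
```

For example, `node_to_pair(41)` returns `("ha1", "HD1")`, the hydrogen bond acceptor paired with a hydrogen bond donor.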
a Identified 6 pharmacophore points (PPs) on the lipoprotein (left) with the Protein Data Bank (PDB) ID of 7ARM: C10959 (hp1), C10952 (hp2), C10932 (hp3), C10966 (hp4), C10922 (hp5), and O10938 (ha1), located on the PLM and Z41 branches. Identified 7 PPs on the LolCDE-LolA complex (right): HP1 (C3214, residue LEU36E), HP2 (C2057, LEU270C), HP3 (C5034, ILE271E), HP4 (C3313, PHE51E), HP5 (C2758, ALA369C), HD1 (O9421, SER2V), and HD2 (O2805, ALA376C). b Pharmacophore binding pairs between the ligand and the protein. Each line represents a potential pharmacophore binding pair and is sequentially named, starting from hp1-HP1 (node 1),…, hp1-HD2 (node 7) to ha1-HD2 (node 42). c Adjacency matrix after the spatial compatibility check. d Binding interaction graph GB of this docking problem. e Complement graph of GB, where vertices with different weights are shown in different sizes.
At this point, the molecular docking problem has been transformed into an MWCP, shifting the focus from the complex, high-dimensional conformational space of ligand-receptor interactions to a well-defined mathematical problem in graph theory. To solve it efficiently, we further reformulate the MWCP as a QUBO problem65, expressed as:
where the first term represents the total weight of the vertices included in the clique, while the second term accounts for the connectivity among the vertices. The matrix Jij(GB) represents the interconnectivity of vertices Vi and Vj in the BIG: Jij(GB) = 1 if the two vertices are connected by an edge, and Jij(GB) = 0 otherwise.
The maximization of this quadratic function returns a subset of vertices that form a clique while maximizing the total weight. To align with the principle and the architecture of our p-computer, we refined the problem formulation. Equation (3) was reformulated to represent the maximization of the sum of vertex weights in an energy minimization form and penalty terms were introduced to enforce connectivity constraints within the GB:
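A penalty form consistent with the term-by-term description that follows can be written as below. The pair-sum convention (\(i<j\)) is an assumption. With A = 10, this form reproduces the reported ground-state energy E = −8.702 for a clique of cumulative weight 0.8702, since a valid clique incurs no penalty:

\[
E = -A \sum_{i} w_i x_i + P \sum_{i<j} J_{ij}(\overline{G_{\mathrm{B}}})\, x_i x_j
\]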
where the first term lowers the energy as the total weight of the selected vertices increases, while the second term applies a penalty factor P to every selected pair of vertices that are adjacent in the complement graph \(\overline{{G}_{{{\rm{B}}}}}\), as shown in Fig. 3e. The interconnectivity matrices of the two graphs are related by \({J}_{{ij}(\overline{{G}_{{{\rm{B}}}}})}=1-{\delta }_{{ij}}-{J}_{{ij}\left({G}_{{{\rm{B}}}}\right)}\), where \({\delta }_{{ij}}\) is the Kronecker delta (in matrix form, the all-ones matrix minus the identity matrix \({I}_{n\times n}\) minus the adjacency matrix of GB) and n is the number of vertices in GB.
Here, the parameter A serves as a balance control factor to regulate the influence of the weight term on the total energy. A larger value of A causes the weight term to dominate the energy minimization process; however, this dominance can weaken spatial constraint conditions, potentially resulting in solutions that fail to conform to clique structures. Conversely, increasing P strengthens the spatial constraints among variables in BIG, ensuring that the candidate solutions satisfy the compatibility requirements. However, an excessively large P significantly increases the hardware overhead when mapping Jij matrix to the RRAM chip. To address this issue, in the hardware implementation, we set A and P to 10 and 18, respectively, to strike a balance between these two terms (see Supplementary Note 5). This configuration effectively guides our hardware-based p-computing system toward efficient and accurate energy optimization within the solution space.
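The role of A and P can be checked on a toy graph by brute force. The sketch below assumes the penalty form with a sum over unordered pairs. The four-vertex graph and its weights are made up for illustration, and only the values A = 10 and P = 18 come from the text.

```python
from itertools import product


def complement(adj):
    """Adjacency of the complement graph: 1 - delta_ij - adj_ij."""
    n = len(adj)
    return [[0 if i == j else 1 - adj[i][j] for j in range(n)] for i in range(n)]


def energy(x, w, adj_bar, A=10.0, P=18.0):
    """Weight reward plus a penalty for each selected incompatible pair."""
    n = len(x)
    reward = -A * sum(w[i] * x[i] for i in range(n))
    penalty = P * sum(
        adj_bar[i][j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n)
    )
    return reward + penalty


def brute_force_ground_state(w, adj, A=10.0, P=18.0):
    """Exhaustive search; feasible only for small illustrative graphs."""
    adj_bar = complement(adj)
    return min(
        product((0, 1), repeat=len(w)),
        key=lambda x: energy(x, w, adj_bar, A, P),
    )


# Hypothetical toy graph: a triangle 0-1-2 plus an edge 2-3.
TOY_ADJ = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
TOY_W = [0.1, 0.2, 0.15, 0.5]
```

On this toy graph the triangle {0, 1, 2} has total weight 0.45 while the edge {2, 3} has 0.65, so the ground state selects vertices 2 and 3. Adding vertex 1 would raise the weight to 0.85 but costs one penalty of 18, which outweighs the gain of A × 0.2 = 2.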
Experimental demonstration of a 42-node molecular docking
To systematically evaluate the performance of our RRAM-based p-computer in real molecular docking scenarios, framed as an MWCP, we modulated the stochasticity of the system by varying the standard deviation σ of the GRNG. Figure 4 illustrates the energy fluctuation profiles during iterations, the solution distributions, and the final predicted MWC across 100 independent experimental trials for each σ setting. Under constant stochasticity, the probabilistic state update of each p-bit follows the Gaussian-threshold rule introduced in “Methods”. The output probability is given by the Gaussian CDF, which closely approximates the sigmoid function commonly used in Boltzmann or Ising machines. Therefore, the steady-state probability distribution of the p-computer can be approximately expected from the Boltzmann law:
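In its standard textbook form (the normalization over all configurations is written out explicitly here, and the notation is an assumption), the Boltzmann law reads:

\[
P(\{x\}) = \frac{\exp\!\left(-E(\{x\})/T\right)}{\sum_{\{x'\}} \exp\!\left(-E(\{x'\})/T\right)}
\]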
where T, referred to as the pseudo-temperature parameter in the context of p-computing, characterizes the system’s stochasticity. Consequently, the MWC corresponding to the ground state is expected to exhibit the highest probability. Following this principle, our p-computer identifies the most frequently occupied p-bit state configuration in each trial, represents it as a subgraph, and designates it as the trial’s output.
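The per-trial readout described above amounts to taking the mode of the visited configurations. A minimal sketch, assuming the p-bit states of one trial are logged as a list of 0/1 vectors (the function name and logging format are hypothetical):

```python
from collections import Counter


def trial_output(state_history):
    """Return the most frequently occupied configuration as 1-based node indices."""
    counts = Counter(tuple(state) for state in state_history)
    state, _ = counts.most_common(1)[0]
    return tuple(i + 1 for i, bit in enumerate(state) if bit)
```

For instance, a history in which the state `[1, 0, 1]` is visited twice and `[0, 1, 0]` once yields the subgraph `(1, 3)`.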
Energy dynamics during iterations (left), solution distributions (middle), and predicted maximum weighted cliques (MWCs) (right) for varying levels of stochasticity: a σ = 2.0, b 1.5, c 1.0, d 0.5, and e 0.2, based on 100 independent trials per condition. Detailed experimental output subgraphs corresponding to these five stochasticity levels are provided in Supplementary Table 2 (σ = 2.0), Table 3 (σ = 1.5), Table 4 (σ = 1.0), Table 5 (σ = 0.5), and Table 6 (σ = 0.2).
Figure 4a shows that at the highest level of stochasticity (σ = 2.0), the system exhibited pronounced energy fluctuations, indicative of extensive and thorough exploration of the solution space. However, despite transiently reaching the theoretical ground state E = −8.702 at various points during the iterative process, which corresponds to an unidentified clique with a cumulative weight of 0.8702, the system ultimately failed to consistently occupy this optimal solution due to insufficient stability. Instead, the system predominantly produced lower-weight configurations, represented by (11, 17, 23, 41) and (5, 17, 23, 41), with cumulative weights of 0.8198 and occurrence frequencies of 32% and 29%, respectively.
Reducing σ to 1.5 (Fig. 4b) and 1.0 (Fig. 4c) mitigated the energy fluctuations, signaling a transition from broad exploration to localized oscillations around lower-energy configurations near −4 and −8. At these two stochasticity levels, the output subgraphs became more concentrated, with the clique (5, 17, 23, 41) consistently emerging as the dominant solution in 36% and 37% of trials, respectively. However, despite the improved focus on low-energy configurations (the clique with E = −8.702 ranked as the second most frequently observed output at σ = 1.0), the system still failed to reliably identify this optimal solution. As σ was further reduced to 0.5 (Fig. 4d), the system achieved a critical balance between exploration and identification. The considerably reduced stochasticity led to more confined energy fluctuations, visible as a distinct energy stratification at E = −5 and −8. Under this condition, the system occupied the ground state for the longest duration during the iterations, thus successfully identifying the theoretically optimal solution for the first time. The MWC (1, 9, 17, 25, 41) with a cumulative weight of 0.8702 emerged as the most frequent output clique, accounting for 35% of the trials. Other candidate configurations, such as cliques (1, 26, 42) and (1, 21, 40), persisted, appearing in 15% and 12% of the outcomes, respectively. These lower-weight cliques represent local energy minima into which the system occasionally settles during its evolution. Finally, at the lowest stochasticity level (σ = 0.2, Fig. 4e), the system exhibited minimal energy fluctuations. At this point, in addition to the GRNG, the contribution of device-to-device variations within the RRAM array becomes more prominent, although it remains small.
While this setting led to a more concentrated solution distribution, the system predominantly identified a non-optimal clique, (1, 26, 42), in 37% of trials, with the ground state occurring in only 26%. This outcome reveals a critical limitation of excessively low stochasticity, where restricted exploration leads the system to local optima, compromising its ability to achieve the ground state.
Figure 5a summarizes the simulation-based success probabilities of our RRAM–CIM p-computer in identifying the MWC across five stochasticity levels, comparing three scenarios: without annealing, with conventional SA, and with the DSA tailored for our CIM-based p-computer. The results show that while SA achieves moderate improvements at specific stochasticity levels (e.g., σ = 1.5), DSA exhibits enhancement across a broader range (σ = 0.5–2.0) and reaches higher success rates in certain cases. We attribute this advantage to DSA’s ability to mitigate premature convergence by maintaining higher levels of stochasticity during early iterations. This smooth annealing schedule facilitates fast convergence to the global optimal solution. In contrast, the performance of SA is sensitive to its annealing schedule, with different decay strategies often leading to widely varying outcomes.
Fig. 5: a Simulation-based success probability comparison of three scenarios: without any annealing scheme (labeled as w/o DSA), with simulated annealing (SA; labeled as w/ SA), and with dynamic slope annealing (DSA; labeled as w/ DSA). b Experiment-based success probability comparison with and without the application of DSA. For SA in simulations, a linear stochasticity decay schedule was adopted, while in the simulations and experiments using DSA, the Gaussian variance σ was linearly reduced from its initial value to zero. Each data point represents the average of 100 independent trials. c Dynamic evolution of the system’s energy during a complete 1000-iteration DSA process. Inset (left): detailed energy evolution at iterations 80–440. Inset (right): applied DSA schedule, with σ decreased from 0.5 to 0. d Distribution of output solutions from 100 effective experimental trials. Detailed experimental output subgraphs are provided in Supplementary Table 7. e Graphical representation of the identified optimal maximum weighted clique (MWC), where the vertices and their corresponding exact binding pharmacophore point (PP) pairs are highlighted. f Cartoon representation of the predicted docking configuration between the ligand and the receptor protein.
Figure 5b compares the experimental success probabilities of identifying the MWC in 100 trials with DSA against the previous constant-stochasticity settings. Without DSA, the success probability peaks at 35% for σ = 0.5. At lower stochasticity (σ = 0.2), restricted exploration leaves the system fluctuating among local energy minima, while at higher stochasticity (σ = 1.5, 2.0), excessive noise inhibits the system’s ability to stabilize near optimal solutions. In contrast, the DSA strategy significantly improves the success probability, reaching a peak of 72% when σ is dynamically reduced from 0.5 to 0. This comparison underscores the importance of stochastic modulation in the p-computer, which enables the system to adaptively transition from global exploration to localized convergence. However, DSA does not guarantee improved MWC identification at every stochasticity level; σ = 1.0 is one such case. Factors such as device-to-device variations and a DSA schedule poorly matched to a specific σ can limit the performance of this strategy. It is therefore critical to tune σ carefully, so that the stochasticity amplitude is of the same order as the deterministic signal range, and to design proper decay schedules that balance exploration and exploitation, as inappropriate values can either trap the system in local minima or introduce excessive randomness that hinders convergence. It should be emphasized that the primary objective of this study is not to claim algorithmic superiority of DSA over SA or other annealing schemes. Instead, our focus is to highlight the distinct hardware advantages of DSA, namely its compatibility with the CIM-based p-computer, its ability to enable fully on-chip annealing natively at the p-bit hardware level, and its tighter integration with the GRNG-based p-computing framework.
The complete energy evolution of the system under DSA over a 1000-iteration process is shown in Fig. 5c. During the initial phase, the system exhibits significant energy fluctuations. This high level of stochasticity enables the system to effectively escape local minima and conduct a comprehensive exploration of the solution space. As DSA progresses and the stochasticity level gradually diminishes, the system transitions toward lower energy states, exhibiting increasingly stable oscillatory behavior. In the final stage of annealing, as σ approaches 0, the system’s energy stabilizes near the ground state (E = −8.702). However, due to the inherent device-to-device variations in the RRAM hardware, the system retains a small degree of randomness even when the GRNG’s σ is near 0. As shown in the left inset, this hardware-induced perturbation results in energy oscillations between the global ground state and a sub-optimal state (E = −8.198), with the system exhibiting a stronger preference for the ground state. Figure 5d presents the complete output subgraph distribution of the 100 experimental trials conducted with DSA. Among the identified solutions, the clique (1, 9, 17, 25, 41), which corresponds to the ground state with a weight of 0.8702, emerged as the dominant outcome, achieving a success probability of 72%. In comparison, other candidate solutions, such as the cliques (1, 21, 40) with a weight of 0.4274 and (26, 29, 42) with a weight of 0.7694, occurred far less frequently, in 8% and 4% of trials, respectively. Beyond the overall success probability, the effectiveness of DSA was further quantified by analyzing the first-hit time of the global optimum, defined as the iteration index at which the global optimum first appears within a trial.
For the identical problem instance, the baseline system with constant stochasticity (σ = 0.5) reached the ground state after an average of 266.9 iterations, whereas DSA (σ decreasing from 0.5 to 0) achieved the same in only 63.6 iterations, corresponding to a ~4.2× acceleration. This quantitative evidence confirms that DSA both increases the success rate and substantially accelerates convergence toward the global optimum. The MWC determined through this process is visualized in Fig. 5e. The resulting isolated subgraph highlights the vertices comprising the optimal solution, which directly correspond to the matched PP binding pairs in this 42-node molecular docking problem. Specifically, vertices 1, 9, 17, 25, and 41 represent the binding PP pairs hp1-HP1, hp2-HP2, hp3-HP3, hp4-HP4, and ha1-HD1, respectively. Figure 5f provides the three-dimensional visualization of the predicted docking configuration between the lipoprotein and the LolCDE-LolA complex reconstructed from the identified MWC. This configuration includes four pairs of hydrophobic interactions and one hydrogen bond, collectively representing the most stable docking conformation with the highest binding affinity. Notably, the predicted binding mode closely aligns with the protein-ligand interactions analyzed using the Protein-Ligand Interaction Profiler (Supplementary Fig. 3)66,67, further validating the accuracy and reliability of our p-computer.
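The first-hit metric defined above can be illustrated with a minimal Python sketch. The function names, the 1-based iteration indexing, and the toy energy traces are assumptions made for illustration and are not measured data; only the averages 266.9 and 63.6 are taken from the text.

```python
from statistics import mean


def first_hit_time(energies, ground_energy, tol=1e-9):
    """Return the 1-based iteration at which ground_energy first appears, else None."""
    for iteration, energy in enumerate(energies, start=1):
        if abs(energy - ground_energy) <= tol:
            return iteration
    return None


def mean_first_hit(trials, ground_energy):
    """Average first-hit time over the trials that reach the ground state."""
    hits = [first_hit_time(trace, ground_energy) for trace in trials]
    hits = [h for h in hits if h is not None]
    return mean(hits) if hits else None


# Hypothetical energy traces, for illustration only.
toy_trials = [
    [-4.0, -8.198, -8.702, -8.702],  # first hit at iteration 3
    [-5.0, -8.702, -8.198],          # first hit at iteration 2
    [-4.0, -5.0, -8.198],            # never reaches the ground state
]
toy_mean = mean_first_hit(toy_trials, ground_energy=-8.702)

# Acceleration implied by the averages quoted in the text.
speedup = 266.9 / 63.6
print(f"toy mean first hit: {toy_mean}, quoted speed-up: {speedup:.1f}x")
```

Trials that never reach the ground state are excluded from the average in this sketch; how such trials were handled in the experiments is not stated here.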
Overall, the 42-node docking experiment reveals how stochasticity fundamentally shapes the performance of our GRNG-based p-computer. High σ promotes broad search but prevents settling into the ground state, while very low σ amplifies RRAM device variations and leads to premature convergence. DSA alleviates this tension by exploiting stochasticity early and suppressing it later, thereby achieving both higher success probability and faster identification of the optimum. Together, these results demonstrate that the interplay between stochasticity, hardware non-idealities, and the evolving energy landscape necessitates a tightly integrated hardware–algorithm co-design to scale p-computing architectures toward large molecular docking problems and other COPs.
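The early-exploration, late-suppression behavior of DSA can be sketched in software. The following is a minimal Python illustration under stated assumptions, not the hardware procedure: the penalty-based energy E = −Σ wᵢxᵢ + P Σ xᵢxⱼ over non-adjacent pairs, the choice P = 2·max(w), the sequential update order, and the six-node toy graph are all hypothetical. Only the Gaussian-threshold update and the linear decay of σ to zero follow the description in the text.

```python
import random


def is_clique(nodes, adj):
    """True if every pair of nodes is connected."""
    return all(adj[a][b] for i, a in enumerate(nodes) for b in nodes[i + 1:])


def dsa_clique_search(weights, edges, sigma0=0.5, sweeps=200, seed=0):
    """Gaussian-threshold p-bit sweeps with sigma ramped linearly from sigma0 to 0."""
    rng = random.Random(seed)
    n = len(weights)
    adj = [[False] * n for _ in range(n)]
    for a, b in edges:
        adj[a][b] = adj[b][a] = True
    penalty = 2.0 * max(weights)  # assumed: larger than any single vertex weight
    x = [rng.randint(0, 1) for _ in range(n)]
    for t in range(sweeps):
        sigma = sigma0 * (1.0 - t / (sweeps - 1))  # exactly 0 on the last sweep
        for i in range(n):
            conflicts = sum(x[j] for j in range(n) if j != i and not adj[i][j])
            s_in = weights[i] - penalty * conflicts  # local input, -dE/dx_i
            u = rng.gauss(0.0, sigma)                # stochastic threshold
            x[i] = 1 if s_in > u else 0
    return [i for i in range(n) if x[i]], adj


# Hypothetical toy instance: triangle {0, 1, 2} plus a lighter branch.
toy_weights = [0.30, 0.25, 0.20, 0.35, 0.15, 0.10]
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5)]
selected, toy_adj = dsa_clique_search(toy_weights, toy_edges)
print(selected, sum(toy_weights[i] for i in selected))
```

Because the final sweep runs at σ = 0 with a penalty larger than any weight, the returned set is always a clique in this sketch; whether it is the maximum weighted clique depends on the schedule and the random seed.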
Discussion
This work presents a device–architecture–algorithm co-design approach to addressing molecular docking challenges by leveraging the unique capabilities of a p-computer based on tunable artificial p-bits. Through a hardware prototype integrating 180 nm CMOS and HfO₂-based RRAM technology, we experimentally demonstrated the feasibility of the p-computer in addressing the challenging MWCP reformulation of molecular docking. Notably, our system achieved a success probability of ~72% in resolving a 42-node MWCP derived from the LolA-LolCDE-lipoprotein complex, representing the largest graph-theory–formulated docking instance experimentally solved in hardware to date and surpassing the scale of state-of-the-art quantum approaches. Beyond demonstrating feasibility, our prototype also delivers strong hardware-level efficiency. At the operation level, each MAC consumes a normalized energy of 8.7 pJ with an execution time of 13 ns (details in Supplementary Note 6), which corresponds to ~1620 GOPS/W and an energy–delay product of ~1.13 × 10⁻¹⁹ J·s. At the application level, for the 42-node docking problem, the system achieves a time-to-solution of 553 μs, an energy-to-solution of 0.365 μJ, and an energy efficiency of 2.8 × 10⁶ solutions per second per watt. The comparative analysis provided in Table 2 highlights the distinct advantages of our RRAM-based p-computer over alternative graph problem-based molecular docking solvers, including GBS systems49,50, DC-QAOA52, and a programmable photonic processor51. Unlike these quantum methods, which are typically limited by hardware scalability and the necessity for post-processing algorithms, our p-computer delivers a seamless one-step solution ensuring that candidate subgraphs inherently satisfy clique constraints, eliminating the need for post-processing (see Supplementary Tables 8 and 9).
Moreover, unlike quantum photonic platforms such as GBS that require bulky laboratory-scale setups, our compact PCB implementation with 6.5 × 4 mm² RRAM chip underscores the practicality of our approach.
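The relations among the efficiency figures quoted above can be checked with a few lines of arithmetic. This Python sketch uses only the numbers stated in the text; the variable names are illustrative, and reading "solutions per second per watt" as the reciprocal of the energy-to-solution is an assumption.

```python
# Figures quoted in the text.
mac_energy_j = 8.7e-12          # normalized energy per MAC
mac_time_s = 13e-9              # execution time per MAC
time_to_solution_s = 553e-6
energy_to_solution_j = 0.365e-6

# Energy-delay product of a single MAC.
edp_js = mac_energy_j * mac_time_s

# Average power implied by the application-level figures.
avg_power_w = energy_to_solution_j / time_to_solution_s

# (solutions per second) per watt = 1 / (energy per solution).
solutions_per_joule = 1.0 / energy_to_solution_j

print(f"EDP             = {edp_js:.3e} J*s")
print(f"average power   = {avg_power_w * 1e3:.2f} mW")
print(f"solutions/(s*W) = {solutions_per_joule:.2e}")
```

The computed energy–delay product (about 1.13 × 10⁻¹⁹ J·s) and average power (about 0.66 mW) agree with the quoted values. The reciprocal of 0.365 μJ gives about 2.7 × 10⁶, slightly below the quoted 2.8 × 10⁶; using 0.65 mW × 553 μs as the energy instead gives about 2.8 × 10⁶, so the gap is consistent with rounding of the inputs.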
At the architectural level, our p-computer adopts a CIM scheme in which the interconnection and bias matrices are encoded directly in an RRAM crossbar to perform in-situ MAC operations. The p-bit is implemented as a GRNG-driven sigmoid function following the Gaussian CDF, while hardware-native DSA through σ tuning replaces the weight reloading and related multiplication operations of the conventional annealing process. This device–architecture co-design alleviates the memory-to-compute bottleneck in prior CMOS-only and CMOS + X p-computer implementations. Importantly, our prototype operates at only 0.65 mW, achieving orders-of-magnitude reduction in power dissipation compared with state-of-the-art GPU-based30,68,69, TPU-based68, Pure CMOS (FPGA-based)31,32,70, and CMOS + X (sMTJ)-based20,34,70,71 p-computing platforms. Detailed comparisons and discussion are provided in Supplementary Note 7. While our present prototype operates at the 42-node level, the underlying 1.1 Mb RRAM–CMOS architecture is in principle capable of scaling to 400–500 nodes through appropriate scheduling and encoding strategies. Scalability can be further enhanced by incorporating chemical-constraint preprocessing, machine-learning–assisted graph construction, and problem decomposition techniques, thereby extending the framework to increasingly complex molecular docking problems. The implications of this work also extend well beyond molecular docking, positioning p-computing as a transformative and versatile technology for addressing complex combinatorial optimization challenges across diverse scientific domains. By demonstrating architectural efficiency, algorithmic robustness, and practical scalability, this study establishes a p-computing hardware implementation for computational biology applications. It provides a solid foundation and a template for future advancements in drug discovery and other related fields.
Methods
Theoretical model of GRNG-based artificial p-bit compatible with CIM
In our architecture, sin in Eq. (2) represents the weighted input to node i, obtained through the in-memory accumulation of neighboring states via the RRAM-CIM array, while u is a stochastic threshold drawn from a Gaussian distribution. The probability density function (PDF) of u is given by:
At each update step, the comparator evaluates whether sin exceeds this random threshold u, thereby determining the updated state. The probability that \({x}_{i}^{{{\rm{new}}}}=1\) therefore follows the CDF of the Gaussian distribution:
This Gaussian CDF produces the sigmoidal transfer characteristic required for a p-bit, with its slope controlled by the variance parameter σ. For comparison, the standard Gibbs sampler in an Ising or Boltzmann machine adopts the logistic conditional distribution:
where β is an algorithmic inverse temperature parameter (1/T) that modulates the stochasticity of the system. It is well established that the Gaussian CDF can approximate the logistic function with high accuracy by introducing a scaling factor κ ≈ 1.6. Matching the slopes at the origin gives an effective temperature mapping:
Although this mapping is analytical and approximate, numerical fittings further confirm the equivalence. For example, the fitted β for different variances σ yields β ≈ 0.85 for σ = 2, β ≈ 1.15 for σ = 1.5, β ≈ 1.65 for σ = 1.0, β ≈ 3.35 for σ = 0.5, and β ≈ 8.50 for σ = 0.2. Detailed mapping visualization results for the above parameters can be found in Supplementary Note 8. Despite these quantitative differences, the Gaussian-threshold update rule, together with its PDF and CDF, is mathematically equivalent to Gibbs sampling from a near-Boltzmann distribution (see Supplementary Note 9).
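For reference, the relations described in this subsection can be written out explicitly. The sketch below is a reconstruction under stated assumptions rather than the original display equations: the threshold u is taken to be zero-mean (consistent with calibrating the output probability to about 0.5 at zero input), σ is treated as the standard deviation of the Gaussian, the node states are taken as binary, and s stands for the weighted input written as sin in the text.

```latex
% Assumptions: u ~ N(0, \sigma^2); s is the weighted input ("sin" in the text).
\begin{align*}
f(u) &= \frac{1}{\sqrt{2\pi}\,\sigma}
        \exp\!\left(-\frac{u^{2}}{2\sigma^{2}}\right), \\
P\!\left(x_i^{\mathrm{new}} = 1\right) &= P(u < s)
   = \int_{-\infty}^{s} f(u)\,\mathrm{d}u
   = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{s}{\sqrt{2}\,\sigma}\right)\right], \\
P_{\mathrm{Gibbs}}\!\left(x_i = 1\right) &= \frac{1}{1 + e^{-\beta s}}, \\
\left.\frac{\mathrm{d}P}{\mathrm{d}s}\right|_{s=0}:\quad
\frac{1}{\sqrt{2\pi}\,\sigma} = \frac{\beta}{4}
   \;&\Longrightarrow\;
   \beta = \frac{4}{\sqrt{2\pi}\,\sigma} \approx \frac{1.6}{\sigma}.
\end{align*}
```

Under these assumptions the scaling factor is κ = 4/√(2π) ≈ 1.6. For σ = 1.0 the slope-matched estimate is β ≈ 1.6, close to the fitted β ≈ 1.65 quoted above.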
In hardware, u is digitally represented as an integer sampled from a discretized Gaussian distribution produced by the GRNG. This discretization introduces only minor approximation errors, which do not alter the essential sigmoidal transfer characteristics. Importantly, for all tested values of σ, the system was experimentally calibrated such that when sin = 0 the output probability satisfies \(P({x}_{i}^{{{\rm{new}}}}=1)\) ≈ 0.5.
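The effect of an integer-valued threshold can be explored with a short Python sketch. The integer resolution `LEVELS_PER_UNIT`, the rounding scheme, and all function names are hypothetical, since the actual GRNG bit width and calibration procedure are not described here. The sketch also compares the Gaussian CDF with a logistic curve for two scaling factors: the slope-matched 4/√(2π) ≈ 1.6 and the commonly used best-fit constant 1.702.

```python
import math
import random

LEVELS_PER_UNIT = 64  # hypothetical integer resolution (LSBs per unit of input)


def gaussian_cdf(s, sigma):
    """P(u < s) for a zero-mean Gaussian threshold with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(s / (sigma * math.sqrt(2.0))))


def logistic(s, beta):
    return 1.0 / (1.0 + math.exp(-beta * s))


def pbit_update(s_in, sigma, rng):
    """One comparator decision against an integer Gaussian threshold."""
    u_int = round(rng.gauss(0.0, sigma * LEVELS_PER_UNIT))
    s_int = round(s_in * LEVELS_PER_UNIT)
    return 1 if s_int > u_int else 0


def empirical_probability(s_in, sigma, trials=20000, seed=1):
    """Fraction of updates that output 1 for a fixed input."""
    rng = random.Random(seed)
    return sum(pbit_update(s_in, sigma, rng) for _ in range(trials)) / trials


def max_gap(sigma, kappa, span=6.0, steps=1201):
    """Largest |Gaussian CDF - logistic| on a grid, with beta = kappa / sigma."""
    beta = kappa / sigma
    grid = [(-span + 2.0 * span * k / (steps - 1)) * sigma for k in range(steps)]
    return max(abs(gaussian_cdf(s, sigma) - logistic(s, beta)) for s in grid)


p_zero = empirical_probability(0.0, sigma=1.0)
p_one = empirical_probability(1.0, sigma=1.0)
gap_slope = max_gap(1.0, 4.0 / math.sqrt(2.0 * math.pi))
gap_fit = max_gap(1.0, 1.702)
print(p_zero, p_one, gap_slope, gap_fit)
```

With a strict comparison against a rounded threshold, the probability at zero input sits slightly below 0.5, which illustrates why a calibration step such as the one described above is useful. The slope-matched mapping leaves a larger maximum gap than the best-fit constant, in line with the fitted β values being somewhat higher than 1.6/σ.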
Fabrication of BEOL RRAM device integrating on 180 nm CMOS technology
The RRAM device, comprising TiN/HfOx/Ti/TiN stacked layers, was integrated into a CMOS array to develop a large-scale 1 Mb RRAM p-computer macro. During the BEOL processing, a TiN metal layer, serving as the bottom electrode (BE), was deposited using a physical vapor deposition (PVD) system on top of the tungsten plug connected to the drain node of the transistor. Subsequently, the BE was patterned and planarized through chemical mechanical polishing. A 6 nm-thick HfOx resistive switching layer was then deposited using an atomic layer deposition (ALD) system (ASM, Polygon 8200) at a temperature of 300 °C. To further enhance the device structure, a 10 nm-thick Ti buffer layer was deposited over the HfOx layer using the PVD system. Without breaking the vacuum, a 40 nm-thick TiN metal layer, functioning as the top electrode (TE), was deposited using the same PVD system. The memory stacked layers were subsequently patterned via standard lithography and passivated with low-temperature oxide. Finally, the fabricated devices were annealed at 400 °C for 5 min in an N2 environment.
Architecture of 1T1R RRAM p-computer and details for algorithm mapping
The RRAM devices show an HRS/LRS resistance ratio exceeding 10, ensuring sufficient margin for reliable binary readout. Programming is performed with a 1.7 V, 1 μs pulse, corresponding to an energy of ~0.2 pJ. Each cell operates in binary mode, with MAC operations controlled by the development board and current sensing performed by an analog-to-digital converter (ADC) capable of measuring currents from 100 nA to 1 mA. The p-computer chip, based on 1T1R binary RRAM technology, was fabricated using a 180 nm CMOS process with BEOL HfO2 RRAM integration. This chip features a 1152 × 1024 1T1R RRAM array, paired with 512 current-mode sense amplifiers (CSA) to compare the MAC outputs of every two BLs. In our 1T1R RRAM array, parallel conductance readout is realized by activating multiple WLs simultaneously, which turns on the corresponding access transistors and connects the selected RRAM cells to their respective BLs (details in Supplementary Note 1). A small read voltage is then applied to the BLs, and the resulting currents from the activated cells are collected in parallel by the CSAs. Each CSA aggregates the current from its column, enabling simultaneous readout of multiple conductance values while preserving cell selectivity. This scheme supports efficient in-situ MAC operations across the array and avoids interference between different reading paths. To solve the MWCP reformulation of the molecular docking problem, the weights and biases are encoded in paired neighboring BLs using paired RRAM devices. For a positive value, the left RRAM device is programmed to the forming state, while the right RRAM device remains in the non-forming state. Conversely, a negative value is encoded by programming the right RRAM device to the forming state, with the left RRAM device remaining in the non-forming state. 
To prepare the in-array bias region, 36 paired RRAM cells are programmed to represent both positive and negative biases, with 18 cells allocated for positive bias and 18 cells for negative bias. In the presence of process variations, additional RRAM cells can be selectively activated to counteract excess current, thereby compensating for mean shifts and maintaining statistical symmetry. The GRNG is used to determine the bias value for each iteration. The two MAC values of paired neighboring BLs are compared using either a CSA or an ADC to determine the results during each read step. A microcontroller unit (MCU)-based PCB test board is utilized to manage the data flow.
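The paired-bit-line encoding described above can be sketched in a few lines of Python. This is an idealized illustration under stated assumptions: cells are modeled as unit conductances (1 for a formed cell, 0 otherwise), weights are restricted to −1, 0, and +1, and the names `encode_signed`, `differential_mac`, and `csa_compare` are hypothetical. Device variation, the bias region, and the GRNG are omitted.

```python
def encode_signed(weights):
    """Map a signed ternary matrix to (left, right) binary cell arrays."""
    left = [[1 if w > 0 else 0 for w in row] for row in weights]
    right = [[1 if w < 0 else 0 for w in row] for row in weights]
    return left, right


def column_mac(cells, inputs, col):
    """Ideal bit-line sum over the activated word lines."""
    return sum(inputs[r] * cells[r][col] for r in range(len(inputs)))


def differential_mac(left, right, inputs):
    """Signed MAC per bit-line pair: left-column sum minus right-column sum."""
    n_cols = len(left[0])
    return [column_mac(left, inputs, c) - column_mac(right, inputs, c)
            for c in range(n_cols)]


def csa_compare(left, right, inputs):
    """Sense-amplifier style decision: 1 if the left line carries more current."""
    n_cols = len(left[0])
    return [1 if column_mac(left, inputs, c) > column_mac(right, inputs, c) else 0
            for c in range(n_cols)]


# Hypothetical 3 x 3 coupling matrix (rows: word lines, columns: bit-line pairs).
toy_weights = [[0, 1, -1],
               [1, 0, -1],
               [-1, -1, 0]]
toy_inputs = [1, 0, 1]  # activated word lines (current node states)

left_cells, right_cells = encode_signed(toy_weights)
signed_mac = differential_mac(left_cells, right_cells, toy_inputs)
decisions = csa_compare(left_cells, right_cells, toy_inputs)
print(signed_mac, decisions)  # [-1, 0, -1] [0, 0, 0]
```

The differential result equals the ordinary signed dot product of the inputs with each weight column, which is the property the paired encoding relies on.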
MCU system board
The MCU system board is designed with the CIM chip at its core, serving as the primary control and interface platform. The MCU is responsible for providing digital signal control, including critical inputs such as write addresses and input positions. These signals are first processed through a level shifter, which adjusts them to the appropriate operating voltage before transmitting them to the CIM chip via jumpers. Similarly, output signals from the CIM chip are routed back through the level shifter before being received and processed by the MCU. To facilitate stable and precise operation, a digital-to-analog converter (DAC) is integrated on the right side of the system board. This DAC supplies various analog voltages, including operating voltage (VDD), write voltage, read voltage, calibration voltage, and PAD power supply voltage, all of which are carefully regulated to ensure reliable system performance. In our system, the RRAM array and peripheral circuits (word-line/bit-line drivers and decoder) are integrated on-chip to perform in-memory MAC operations. The MCU-based development board manages high-level control, including random number generation, algorithm sequencing, data aggregation, and programmable bias delivery. The modules communicate via standard digital and analog I/O, where digital lines configure word-line selection and timing, and analog channels return summed currents for processing. This partitioning exploits the high density and low latency of RRAM computation while preserving algorithmic flexibility and real-time control through board-level logic. A full block-level description of the MCU-based system board for RRAM-CIM chip control is presented in Supplementary Note 10.
Geometric compatibility check of binding pairs
Geometric compatibility check is an essential step in constructing the BIG for the target molecular docking problem. This rigorous filtering process can minimize the inclusion of physically implausible or sterically unfavorable configurations, thereby reducing the solution space and enhancing the accuracy and reliability of downstream docking predictions performed by the p-computer.
The geometric compatibility of pharmacophore pairs is assessed based on their Euclidean distances. Specifically, for two selected pharmacophore binding pairs, (L1, P1) and (L2, P2), where D1 represents the distance between PPs L1 and L2 on the ligand and D2 represents the distance between PPs P1 and P2 on the receptor protein, the binding pairs are considered geometrically compatible if and only if the absolute difference between D1 and D2 satisfies the condition |D1 – D2| ≤ τ + 2ε. Here, τ denotes the flexibility constant, and ε signifies the interaction distance tolerance. For details on the choice of τ and ε in the compatibility check, please refer to refs. 49 and 52. In this study, τ and ε are set to 0.1 Å and 2.8 Å, respectively.
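The compatibility criterion above translates directly into code. The following Python sketch applies |D1 − D2| ≤ τ + 2ε with the stated values τ = 0.1 Å and ε = 2.8 Å; the function names, the data layout, and the example coordinates are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations

TAU = 0.1      # flexibility constant in angstrom (value stated in the text)
EPSILON = 2.8  # interaction distance tolerance in angstrom (value stated in the text)


def geometrically_compatible(l1, l2, p1, p2, tau=TAU, eps=EPSILON):
    """Check |D1 - D2| <= tau + 2*eps for binding pairs (L1, P1) and (L2, P2)."""
    d1 = math.dist(l1, l2)  # distance between the two ligand PPs
    d2 = math.dist(p1, p2)  # distance between the two receptor PPs
    return abs(d1 - d2) <= tau + 2.0 * eps


def compatible_edges(binding_pairs):
    """Return index pairs of binding pairs that pass the distance criterion."""
    edges = []
    for (i, (la, pa)), (j, (lb, pb)) in combinations(enumerate(binding_pairs), 2):
        if geometrically_compatible(la, lb, pa, pb):
            edges.append((i, j))
    return edges


# Hypothetical coordinates in angstrom: (ligand PP, receptor PP) per binding pair.
toy_pairs = [
    ((0.0, 0.0, 0.0), (0.0, 0.0, 0.0)),
    ((4.0, 0.0, 0.0), (0.0, 9.0, 0.0)),   # vs pair 0: |4 - 9| = 5 <= 5.7
    ((4.0, 0.0, 0.0), (0.0, 10.0, 0.0)),  # vs pair 0: |4 - 10| = 6 > 5.7
]
toy_edges = compatible_edges(toy_pairs)
print(toy_edges)  # [(0, 1), (1, 2)]
```

This sketch applies only the distance criterion; any additional rules used when constructing the BIG are outside its scope.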
Post-processing in graph-based molecular docking
In graph-based molecular docking, inherent losses and noise in GBS often result in output subgraphs that fail to satisfy the clique property. To address these limitations, previous GBS-related studies49,50 have implemented a post-processing algorithm that refines these suboptimal subgraphs to enhance the identification of the MWC and increase the success probability of finding the optimal docking configuration. In this hybrid GBS + post-processing workflow, GBS first performs global exploration to generate subgraphs as initial solutions, and the post-processing algorithm then takes these subgraphs as input seeds and improves solution quality through local optimization. To ensure a fair comparison of the performance between our p-computer and the GBS + post-processing method, we adopted the same post-processing algorithm described in the GBS-related works, implemented using the Python toolkit provided by the Strawberry Fields platform55.
Specifically, in each trial, the experimental output subgraph from the p-computer, defined as the most frequently visited state configuration during the iterative process, was used as the input seed for the post-processing algorithm. The post-processing algorithm consists of two key steps: shrinking and local search. During the shrinking step, the algorithm iteratively reduces the size of the input subgraph until a clique is formed. In each iteration, vertices with the lowest degree relative to the subgraph and the smallest weight are removed based on predefined criteria. This strategy retains nodes that are more likely to form the MWC while excluding redundant or noisy nodes. The local search step, based on the dynamic or phased local search algorithms described in refs. 53,54, further optimizes the clique obtained from the shrinking step. This step alternates between two phases: greedy expansion and plateau search. During the greedy expansion phase, the algorithm incrementally adds the highest-weight node connected to the current clique until no further additions are possible. In the plateau search phase, the algorithm explores the neighborhood of the current clique to identify structures with potentially higher weights. By alternating between these two phases, the local search algorithm refines the initial solution obtained from the shrinking step, approaching the global optimum.
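The shrinking and greedy-expansion steps described above can be sketched in plain Python. This is a simplified stand-in written for illustration, not the Strawberry Fields implementation used in this work: the plateau search phase is omitted, the tie-breaking rule (lowest degree first, then smallest weight) is an assumption about the "predefined criteria", and the toy graph and weights are hypothetical.

```python
def is_clique(nodes, adj):
    """True if every pair of nodes is adjacent; adj maps node -> set of neighbors."""
    nodes = list(nodes)
    return all(b in adj[a] for i, a in enumerate(nodes) for b in nodes[i + 1:])


def shrink(nodes, adj, weights):
    """Drop the lowest-degree (ties: lightest) vertex until a clique remains."""
    sub = set(nodes)
    while not is_clique(sub, adj):
        worst = min(sub, key=lambda v: (len(adj[v] & sub), weights[v]))
        sub.remove(worst)
    return sub


def greedy_expand(clique, adj, weights):
    """Add the heaviest vertex adjacent to all clique members until none is left."""
    clique = set(clique)
    while True:
        candidates = [v for v in adj if v not in clique and clique <= adj[v]]
        if not candidates:
            return clique
        clique.add(max(candidates, key=lambda v: weights[v]))


# Hypothetical toy graph: triangle {1, 2, 3} with a tail 3-4-5.
toy_adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
toy_weights = {1: 0.3, 2: 0.2, 3: 0.4, 4: 0.1, 5: 0.05}

seed_subgraph = {1, 2, 4}                      # not a clique: 4 is isolated in it
shrunk = shrink(seed_subgraph, toy_adj, toy_weights)
refined = greedy_expand(shrunk, toy_adj, toy_weights)
print(shrunk, refined)  # {1, 2} {1, 2, 3}
```

Greedy expansion alone can stall at a maximal clique that is not the maximum weighted one, which is the situation the plateau search phase is intended to address.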
The effect of applying post-processing to the p-computer’s output was evaluated with and without DSA at a stochasticity level of σ = 0.5, with detailed experimental results provided in Supplementary Tables 8 and 9.
Data availability
The generated and processed data are provided in the Supplementary Information/Source Data file. Source data are provided with this paper.
Code availability
The computer codes in MATLAB used to process the molecular docking problem instance, construct pairwise distance matrices, generate the adjacency matrix, analyze the QUBO formulation, and verify MWC solutions under the probabilistic computing paradigm are available online at https://doi.org/10.5281/zenodo.17713901, with a citable release at ref. 72.
References
Borders, W. A. et al. Integer factorization using stochastic magnetic tunnel junctions. Nature 573, 390–393 (2019).
Patel, S., Canoza, P. & Salahuddin, S. Logically synthesized and hardware-accelerated restricted Boltzmann machines for combinatorial optimization and integer factorization. Nat. Electron. 5, 92–101 (2022).
Aadit, N. A. et al. Massively parallel probabilistic computing with sparse Ising machines. Nat. Electron. 5, 460–468 (2022).
Woo, K. S. et al. Probabilistic computing using Cu0.1Te0.9/HfO2/Pt diffusive memristors. Nat. Commun. 13, 5762 (2022).
Camsari, K. Y., Faria, R., Sutton, B. M. & Datta, S. Stochastic p-bits for invertible logic. Phys. Rev. X 7, 031014 (2017).
Si, J. et al. Energy-efficient superparamagnetic Ising machine and its application to traveling salesman problems. Nat. Commun. 15, 3457 (2024).
Hong, M.-C. et al. In-memory annealing unit (IMAU): energy-efficient (2000 TOPS/W) combinatorial optimizer for solving travelling salesman problem. In 2021 IEEE International Electron Devices Meeting (IEDM) 21.3.1–21.3.4 https://doi.org/10.1109/IEDM19574.2021.9720619 (IEEE, San Francisco, CA, USA, 2021).
Bybee, C. et al. Efficient optimization with higher-order Ising machines. Nat. Commun. 14, 6033 (2023).
Nikhar, S., Kannan, S., Aadit, N. A., Chowdhury, S. & Camsari, K. Y. All-to-all reconfigurability with sparse and higher-order Ising machines. Nat. Commun. 15, 8977 (2024).
Park, T. J. et al. Efficient probabilistic computing with stochastic perovskite nickelates. Nano Lett. 22, 8654–8661 (2022).
Roques-Carmes, C. et al. Biasing the quantum vacuum to control macroscopic probability distributions. Science 381, 205–209 (2023).
Daniel, J. et al. Experimental demonstration of an on-chip p-bit core based on stochastic magnetic tunnel junctions and 2D MoS2 transistors. Nat. Commun. 15, 4098 (2024).
Whitehead, W., Nelson, Z., Camsari, K. Y. & Theogarajan, L. CMOS-compatible Ising and Potts annealing using single-photon avalanche diodes. Nat. Electron. 6, 1009–1019 (2023).
Luo, S., He, Y., Cai, B., Gong, X. & Liang, G. Probabilistic-bits based on ferroelectric field-effect transistors for probabilistic computing. IEEE Electron. Device Lett. 44, 1356–1359 (2023).
Camsari, K. Y. et al. From charge to spin and spin to charge: stochastic magnets for probabilistic switching. Proc. IEEE 108, 1322–1337 (2020).
Cai, B. et al. Unconventional computing based on magnetic tunnel junction. Appl. Phys. A 129, 236 (2023).
Yang, C.-Y. et al. Dual-function unipolar top-pSOT-MRAM for all-spin probabilistic computing with ultra-dense coupling and adaptive temporal coding. In 2024 IEEE International Electron Devices Meeting (IEDM) 1–4 https://doi.org/10.1109/IEDM50854.2024.10873301 (IEEE, San Francisco, CA, USA, 2024).
Chowdhury, S. et al. A full-stack view of probabilistic computing with p-bits: devices, architectures and algorithms. IEEE J. Explor. Solid-State Comput. Devices Circuits 1–1 https://doi.org/10.1109/JXCDC.2023.3256981 (2023).
Chowdhury, S., Niazi, S. & Camsari, K. Y. Mean-field assisted deep Boltzmann learning with probabilistic computers. Preprint at http://arxiv.org/abs/2401.01996 (2024).
Singh, N. S. et al. CMOS plus stochastic nanomagnets enabling heterogeneous computers for probabilistic inference and learning. Nat. Commun. 15, 2685 (2024).
He, Y., Fang, C., Luo, S. & Liang, G. Many-body effects-based invertible logic with a simple energy landscape and high accuracy. IEEE J. Explor. Solid-State Comput. Devices Circuits 1–1 https://doi.org/10.1109/JXCDC.2023.3320230 (2023).
Colorni, A. Heuristics from nature for hard combinatorial optimization problems. Int. Trans. Oper. Res. 3, 1–21 (1996).
Sanders, Y. R. et al. Compilation of fault-tolerant quantum heuristics for combinatorial optimization. PRX Quantum 1, 020312 (2020).
Hopfield, J. J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 79, 2554–2558 (1982).
Amit, D. J., Gutfreund, H. & Sompolinsky, H. Spin-glass models of neural networks. Phys. Rev. A 32, 1007–1018 (1985).
Cai, F. et al. Power-efficient combinatorial optimization using intrinsic noise in memristor Hopfield neural networks. Nat. Electron. 3, 409–418 (2020).
Grimaldi, A. et al. Spintronics-compatible approach to solving maximum-satisfiability problems with probabilistic computing, invertible logic, and parallel tempering. Phys. Rev. Appl. 17, 024052 (2022).
Aadit, N. A., Mohseni, M. & Camsari, K. Y. Accelerating adaptive parallel tempering with FPGA-based p-bits. In 2023 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits) 1–2 https://doi.org/10.23919/VLSITechnologyandCir57934.2023.10185207 (IEEE, Kyoto, Japan, 2023).
Okuyama, T., Hayashi, M. & Yamaoka, M. An Ising computer based on simulated quantum annealing by path integral Monte Carlo method. In 2017 IEEE International Conference on Rebooting Computing (ICRC) 1–6 https://doi.org/10.1109/ICRC.2017.8123652 (IEEE, Washington, DC, 2017).
Raimondo, E. et al. High-performance and reliable probabilistic Ising machine based on simulated quantum annealing. Phys. Rev. X 15, 041001 (2025).
Pervaiz, A. Z., Sutton, B. M., Ghantasala, L. A. & Camsari, K. Y. Weighted p-bits for FPGA implementation of probabilistic circuits. IEEE Trans. Neural Netw. Learn. Syst. 30, 1920–1926 (2018).
Sutton, B. et al. Autonomous probabilistic coprocessing with petaflips per second. IEEE Access 8, 157238–157252 (2020).
Duffee, C. et al. An integrated-circuit-based probabilistic computer that uses voltage-controlled magnetic tunnel junctions as its entropy source. Nat. Electron. 8, 784–793 (2025).
Singh, N. S. et al. Beyond Ising: mixed continuous optimization with Gaussian probabilistic bits using stochastic MTJs. In 2024 IEEE International Electron Devices Meeting (IEDM) 1–4 https://doi.org/10.1109/iedm50854.2024.10873478 (IEEE, San Francisco, CA, USA, 2024).
Yang, S. et al. 250 magnetic tunnel junctions-based probabilistic Ising machine. Preprint at https://doi.org/10.48550/arXiv.2506.14590 (2025).
Onizawa, N. & Hanyu, T. GPU-accelerated simulated annealing based on p-bits with real-world device-variability modeling. Sci. Rep. 15, 6118 (2025).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Kitchen, D. B., Decornez, H., Furr, J. R. & Bajorath, J. Docking and scoring in virtual screening for drug discovery: methods and applications. Nat. Rev. Drug Discov. 3, 935–949 (2004).
Meng, X.-Y., Zhang, H.-X., Mezei, M. & Cui, M. Molecular docking: a powerful approach for structure-based drug discovery. Curr. Comput. Aided-Drug Des. 7, 146–157 (2011).
Bahaman, A. H., Wahab, R. A., Abdul Hamid, A. A., Abd Halim, K. B. & Kaya, Y. Molecular docking and molecular dynamics simulations studies on β-glucosidase and xylanase Trichoderma asperellum to predict degradation order of cellulosic components in oil palm leaves for nanocellulose preparation. J. Biomol. Struct. Dyn. 39, 2628–2641 (2021).
Weitzner, B. D. et al. Modeling and docking of antibody structures with Rosetta. Nat. Protoc. 12, 401–416 (2017).
Malone, F. D. et al. Towards the simulation of large scale protein–ligand interactions on NISQ-era quantum computers. Chem. Sci. 13, 3094–3108 (2022).
Cao, Y., Romero, J. & Aspuru-Guzik, A. Potential of quantum computing for drug discovery. IBM J. Res. Dev. 62, 6:1–6:20 (2018).
Hamilton, C. S. et al. Gaussian boson sampling. Phys. Rev. Lett. 119, 170501 (2017).
Deng, Y.-H. et al. Solving graph problems using Gaussian boson sampling. Phys. Rev. Lett. 130, 190601 (2023).
Sempere-Llagostera, S., Patel, R. B., Walmsley, I. A. & Kolthammer, W. S. Experimentally finding dense subgraphs using a time-bin encoded Gaussian boson sampling device. Phys. Rev. X 12, 031045 (2022).
Arrazola, J. M. & Bromley, T. R. Using Gaussian boson sampling to find dense subgraphs. Phys. Rev. Lett. 121, 030503 (2018).
Zhou, L., Wang, S.-T., Choi, S., Pichler, H. & Lukin, M. D. Quantum approximate optimization algorithm: performance, mechanism, and implementation on near-term devices. Phys. Rev. X 10, 021067 (2020).
Banchi, L., Fingerhuth, M., Babej, T., Ing, C. & Arrazola, J. M. Molecular docking with Gaussian boson sampling. Sci. Adv. 6, eaax1950 (2020).
Yu, S. et al. A universal programmable Gaussian boson sampler for drug discovery. Nat. Comput. Sci. 3, 839–848 (2023).
Tan, X. et al. Scalable and programmable three-dimensional photonic processor. Phys. Rev. Appl. 20, 044041 (2023).
Ding, Q.-M., Huang, Y.-M. & Yuan, X. Molecular docking via quantum approximate optimization algorithm. Phys. Rev. Appl. 21, 034036 (2024).
Pullan, W. & Hoos, H. H. Dynamic local search for the maximum clique problem. J. Artif. Intell. Res. 25, 159–185 (2006).
Pullan, W. Phased local search for the maximum clique problem. J. Comb. Optim. 12, 303–323 (2006).
Killoran, N. et al. Strawberry fields: a software platform for photonic quantum computing. Quantum 3, 129 (2019).
Crafton, B., Talley, C., Spetalnick, S., Yoon, J.-H. & Raychowdhury, A. Characterization and mitigation of IR-drop in RRAM-based compute in-memory. In 2022 IEEE International Symposium on Circuits and Systems (ISCAS) 70–74 https://doi.org/10.1109/ISCAS48785.2022.9937307 (IEEE, Austin, TX, USA, 2022).
Jiang, H. et al. A novel true random number generator based on a stochastic diffusive memristor. Nat. Commun. 8, 882 (2017).
Lin, B. et al. A high-speed and high-reliability TRNG based on analog RRAM for IoT security application. In 2019 IEEE International Electron Devices Meeting (IEDM) 14.8.1–14.8.4 https://doi.org/10.1109/IEDM19573.2019.8993486 (IEEE, San Francisco, CA, USA, 2019).
Okuda, S. & Tokuda, H. Lipoprotein sorting in bacteria. Annu. Rev. Microbiol. 65, 239–259 (2011).
Konovalova, A. & Silhavy, T. J. Outer membrane lipoprotein biogenesis: Lol is not the end. Philos. Trans. R. Soc. B Biol. Sci. 370, 20150030 (2015).
Tang, X. et al. Structural basis for bacterial lipoprotein relocation by the transporter LolCDE. Nat. Struct. Mol. Biol. 28, 347–355 (2021).
Wang, R., Fang, X., Lu, Y. & Wang, S. The PDBbind database: collection of binding affinities for protein−ligand complexes with known three-dimensional structures. J. Med. Chem. 47, 2977–2980 (2004).
Wang, R., Fang, X., Lu, Y., Yang, C.-Y. & Wang, S. The PDBbind database: methodologies and updates. J. Med. Chem. 48, 4111–4119 (2005).
Liu, Z. et al. PDB-wide collection of binding data: current status of the PDBbind database. Bioinformatics 31, 405–412 (2015).
Lucas, A. Ising formulations of many NP problems. Front. Phys. 2, 5 (2014).
Salentin, S., Schreiber, S., Haupt, V. J., Adasme, M. F. & Schroeder, M. PLIP: fully automated protein–ligand interaction profiler. Nucleic Acids Res. 43, W443–W447 (2015).
Adasme, M. F. et al. PLIP 2021: expanding the scope of the protein–ligand interaction profiler to DNA and RNA. Nucleic Acids Res. 49, W530–W534 (2021).
Yang, K., Chen, Y.-F., Roumpos, G., Colby, C. & Anderson, J. High performance Monte Carlo simulation of Ising model on TPU clusters. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis 1–15 https://doi.org/10.1145/3295500.3356149 (ACM, Denver, Colorado, 2019).
Fang, Y. et al. Parallel tempering simulation of the three-dimensional Edwards–Anderson model with compact asynchronous multispin coding on GPU. Comput. Phys. Commun. 185, 2467–2478 (2014).
Grimaldi, A. et al. Experimental evaluation of simulated quantum annealing with MTJ-augmented p-bits. In 2022 International Electron Devices Meeting (IEDM) 22.4.1–22.4.4 https://doi.org/10.1109/IEDM45625.2022.10019530 (IEEE, San Francisco, CA, USA, 2022).
Singh, N. S. et al. Hardware demonstration of feedforward stochastic neural networks with fast MTJ-based p-bits. In 2023 International Electron Devices Meeting (IEDM) 1–4 https://doi.org/10.1109/IEDM45741.2023.10413686 (IEEE, San Francisco, CA, USA, 2023).
He, Y. Yihan529/RRAM_probabilistic_computer_for_molecular_docking: v1.0.0 – Initial Release. Zenodo https://doi.org/10.5281/ZENODO.17713901 (2025).
Acknowledgements
L.G. and H.T. acknowledge financial support from the National Science and Technology Council (NSTC) (NSTC 112-2112-M-A49-047-MY3 to L.G. and NSTC 113-2221-E-A49-089-MY3/113-2622-8-A49-012-SB to H.T.). L.G. also acknowledges support from the Co-creation Platform of the Industry-Academia Innovation School, NYCU, under the framework of the National Key Fields Industry-University Cooperation and Skilled Personnel Training Act, and from the Advanced Semiconductor Technology Research Center from The Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan. In addition, we acknowledge support from the NUS Microelectronics Trailblazer Fund.
Author information
Contributions
H.Y., H.M., H.T., and L.G. conceptualized the study and proposed the idea of solving molecular docking problems with probabilistic computing. H.Y. developed the molecular docking framework with the help of D.Q. and designed the simulation experiments. H.Y. performed the simulations with technical assistance from F.C. H.M. designed the experimental protocols and constructed the experimental setup, including the fabrication of the RRAM-based chip, with assistance from L.C.-S. and L.C.-M. H.M. conducted experiments based on the developed probabilistic computer hardware. H.Y. and H.M., with input from G.X., H.T., and L.G., analyzed the simulation and experimental data. H.Y. and H.M. drafted the manuscript with input from all other authors. All authors discussed the results and reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Sung Gap Im and the other anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
He, Y., Hong, MC., Ding, Q. et al. A hardware demonstration of a universal programmable RRAM-based probabilistic computer for molecular docking. Nat Commun 17, 702 (2026). https://doi.org/10.1038/s41467-025-67309-z
DOI: https://doi.org/10.1038/s41467-025-67309-z