Abstract
Peptide-protein docking is an important problem in structural biology whose solution facilitates rational and efficient drug design. In this work, we explore modeling and solving this problem with the quantum-amenable quadratic unconstrained binary optimization (QUBO) formalism. Our work extends recent efforts by incorporating the objectives and constraints associated with peptide cyclization and peptide-protein docking in the two-particle model on a tetrahedral lattice. We propose a “resource efficient” QUBO encoding for this problem, and baseline its performance with a novel constraint programming (CP) approach. We implement an end-to-end framework that enables the evaluation of our methods on instances from the Protein Data Bank (PDB). Our results show that the QUBO approach, using a classical simulated annealing solver, is able to find feasible conformations for problems with up to 6 peptide residues and 34 target protein residues (PDB 3WNE, 5LSO), but has trouble scaling beyond this problem size. In contrast, the CP approach can solve the largest instance in our test set, containing 11 peptide residues and 49 target protein residues (PDB 2F58). We conclude that while QUBO can be used to successfully tackle this problem, its scaling limitations and the strong performance of the CP method suggest that QUBO may not be the best choice.
Introduction
Peptide drugs are composed of short sequences of amino acids and are an important therapeutic modality for clinical treatments. These drugs have the potential to acquire higher binding affinity and selectivity, and to exhibit wider therapeutic windows than small molecule drugs, although there are drawbacks related to pharmacokinetic properties such as membrane permeability and metabolic stability1,2. Recently, it was reported that some pharmacokinetic issues could be resolved by the cyclization of peptides and incorporation of nonproteinogenic amino acids3,4,5. Therefore, cyclic peptides are increasingly expected to be a promising therapeutic modality for targeting both extracellular molecules, and intracellular biomolecules by oral administration6,7.
Structure-based drug design (SBDD), where a drug is designed based on three-dimensional structures of drug candidates and their target, is now widely applied to small-molecule drug discovery research. Thanks to advances in structural biology and artificial intelligence (AI), three-dimensional models of various target proteins are readily available for SBDD8,9,10,11,12.
In contrast to small molecules, cyclic peptides contain more rotatable bonds and thus have more possible conformations. Furthermore, their conformations depend on the surrounding environment (e.g., water or lipid membrane) and the nature of their target5. Consequently, building practical three-dimensional models of cyclic peptide-target complexes is a challenging task, though there are several docking tools for cyclic peptides tackling this challenge13,14,15,16,17,18.
A promising research direction approximates the conformation prediction problem as a combinatorial search over a fixed lattice19,20,21. Early versions of this approach model the peptide as a series of particles (points) in space, where each peptide residue is represented by a single particle, and the particles are placed on the vertices of a two-dimensional lattice grid. Any pair of particles that are within a fixed distance of each other contribute interaction potential according to whether both members of the pair are considered hydrophobic or hydrophilic (e.g., the HP model)22. Some straightforward extensions of this formulation include: i) a two-particle representation for the peptide residues, with one particle representing the residue’s main-chain backbone and the other representing the same residue’s side chain, ii) the use of Miyazawa-Jernigan (MJ) potentials as the interaction potentials between residues, and iii) the use of three-dimensional lattices23,24. The task is then to identify the placement of these peptide particles such that the MJ potential is optimized (minimized) while all known constraints, e.g., that no two particles occupy the same lattice vertex, are satisfied. This placement task is a combinatorial search problem that has been shown to be NP-complete21,25, imposing a severe limitation on the size of problems that can be solved with a classical computer. A potential avenue to overcome these scaling issues in the near term is to use noisy intermediate scale quantum (NISQ) technology such as quantum annealing26,27,28,29, quantum variational algorithms30,31, or Gaussian Boson sampling32. Common to most of these approaches is the formulation of the problem as a quadratic unconstrained binary optimization (QUBO) problem.
Early quantum computing approaches to protein folding employed a spatial encoding for amino acid conformations on 2D grids using the HP model33, though this method proved challenging to scale34. As an alternative to the spatial encoding, turn encoding approaches were explored27,35, beginning with a six-amino acid peptide implementation, followed by the introduction of the “turn ancilla” approach26. This methodology saw further development including an implementation for a 10-residue protein on a cubic lattice34, a sparse representation approach for QAOA on Rigetti Aspen36,37, an extension to two-particle coarse-grained residues on tetrahedral lattices30, and the introduction of a relative turn encoding with a qutrit mapping31.
Protein-peptide docking applications on quantum computers remain limited, with initial progress considering chaperone molecules’ impact on peptide conformations27, though this approach was not generalizable to arbitrary protein-peptide complexes. We provide an expanded summary of the relevant literature in Supplementary A, including the summary Table S1.
In this work, we extend the lattice-based conformation search from the literature to incorporate the objectives and constraints associated with peptide cyclization and peptide docking with a target protein. We explore a number of formulations, including the “turn-ancilla” (Supplementary G) and “spatial” (Supplementary F) encodings, and based on preliminary model scaling analysis (described briefly in Supplementary E, with results in Figure S1), elect to extend the “resource efficient” turn encoding approach30. To compare the QUBO with a classical optimization approach, we also derive a constraint programming (CP) model for the problem. We implement an end-to-end framework that allows us to automatically process a problem instance from the Protein Data Bank (PDB) and produce conformation solutions with our QUBO-based and CP approaches38. We conduct an empirical assessment on a set of peptide-protein docked instances from the PDB and conclude that while the QUBO-based approaches can be used to model and solve the problem, they may not be the best-suited formalism. In contrast, the CP model is able to find the optimal solutions for all considered problem instances.
Methods
Problem definition
We employ a two-particle coarse grain (CG) residue representation, describing main and side-chain particles for all residues except for a specified set of small residues (e.g., Glycine), for which we use a one-particle representation. We place these peptide particles on a three-dimensional tetrahedral lattice, for instance as depicted in Fig. 1, defined by the set of its vertices \(\mathcal {T}\).
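To make the lattice concrete, the following is a minimal sketch of how the vertex set of a tetrahedral lattice could be enumerated. It assumes a diamond-type coordinate convention in which vertices alternate between two sublattices with negated bond directions; the names `EVEN_TURNS`, `neighbours`, and `build_lattice` are illustrative and do not come from the original work, whose lattice placement is described in Algorithm S1.

```python
# Four bond directions leaving an "even" vertex (assumed convention);
# "odd" vertices use the negated directions.
EVEN_TURNS = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
ODD_TURNS = [tuple(-c for c in turn) for turn in EVEN_TURNS]


def neighbours(vertex, parity):
    """Return the four 1-NN vertices of `vertex` given its sublattice parity."""
    turns = EVEN_TURNS if parity == 0 else ODD_TURNS
    return [tuple(v + d for v, d in zip(vertex, turn)) for turn in turns]


def build_lattice(max_steps):
    """Return {vertex: parity} for all vertices within max_steps bonds of the origin."""
    lattice = {(0, 0, 0): 0}
    frontier = [(0, 0, 0)]
    for _ in range(max_steps):
        next_frontier = []
        for vertex in frontier:
            for nb in neighbours(vertex, lattice[vertex]):
                if nb not in lattice:
                    lattice[nb] = 1 - lattice[vertex]  # sublattices alternate
                    next_frontier.append(nb)
        frontier = next_frontier
    return lattice
```

In this convention every vertex has exactly four nearest neighbours, and the integer coordinates would still need to be scaled to a physical bond length before being compared with PDB coordinates.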
For a given problem instance, we consider a cyclic peptide consisting of an amino acid sequence \(A = (A_1,\ldots ,A_N)\), where \(A_i\) specifies the particular amino acid type for sequence location i, \(A_i\) is adjacent to \(A_{i+1}\), and N represents the number of amino acids. We use \(A_i\) to refer to the main chain particle and use a superscript s to denote the corresponding side-chain particle \(A^s_i\) that is assumed to be adjacent to \(A_i\). Cyclization bonds of the peptide are expressed by pairs of residue numbers \(C = \{(i_1,j_1),\ldots ,(i_K,j_K)\}\) for K specified pairs that are bonded, where typically \(K=1\).
In this work, we consider a smaller, more focused area of the external protein, namely the “active site.” More details on how the active site is constructed for our experiments are provided in Supplementary H. This external protein active site is represented as \(P=\{B_1,\ldots ,B_M\}\), where \(B_i\) specifies the particular residue (i.e., amino acid type) at sequence location i and M is the number of residues.
We model the interaction energy between amino acid residues by Miyazawa-Jernigan (MJ) potentials39,40, which specify a fixed interaction potential between unique pairs of amino acid types. For simplicity of notation, we denote \(\epsilon _{ij}\) to be the MJ interaction between residues \(A_i\) and \(A_j\). For the peptide, we restrict the interaction to non-bonded particles (i.e., main- and side-chain particles) that are first nearest neighbours (1-NN) on the lattice (i.e., a distance of one lattice edge); pairs of residues that are not 1-NN in the solution do not contribute any interaction energy.
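The 1-NN restriction can be illustrated with a short sketch that sums pairwise potentials over non-bonded particle pairs one lattice edge apart. It assumes integer lattice coordinates in which a bond has squared length 3 (a diamond-type convention), and the potential table passed in is a placeholder supplied by the caller, not real MJ data; the function names are hypothetical.

```python
from itertools import combinations


def squared_distance(a, b):
    """Squared Euclidean distance between two lattice vertices."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def peptide_interaction_energy(positions, types, bonded_pairs, potentials):
    """Sum pair potentials over non-bonded particle pairs that are 1-NN.

    positions:    list of (x, y, z) lattice vertices, one per particle
    types:        amino acid label per particle
    bonded_pairs: set of frozenset({i, j}) for covalently bonded particles
    potentials:   dict mapping frozenset({type_a, type_b}) to an energy
    """
    energy = 0.0
    for i, j in combinations(range(len(positions)), 2):
        if frozenset((i, j)) in bonded_pairs:
            continue  # bonded neighbours do not contribute
        if squared_distance(positions[i], positions[j]) == 3:  # one lattice edge (assumed)
            energy += potentials[frozenset((types[i], types[j]))]
    return energy
```

Pairs that are farther apart than one lattice edge are skipped, so they never need an entry in the potential table.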
Given the coordinates of the protein residues P, the lattice vertices \(\mathcal {T_B} = \{t^B_{k}\}_k \subset \mathcal {T}\) that are “blocked” by the protein due to steric hindrance can be calculated using a specified blocking radius (e.g., 3.8 Å). Similarly, using a specified interaction radius (e.g., 6.5 Å), we can compute the lattice vertices \(\mathcal {T_I} = \{t^I_{l}\}_{l} \subset \mathcal {T}\) on which the protein residues effect an interaction potential for the peptide. Here, we assume that all blocking points \(\mathcal {T_B}\) are removed from \(\mathcal {T_I}\), as the steric hindrance prohibits peptide residues from being placed on those vertices. We denote by \({\tilde{\epsilon }}_{il}\) the total MJ interaction energy on peptide particle \(A_i\) at the interaction site \(t^I_l\), i.e., the sum of the interactions between \(A_i\) and the protein residues \(P_l \subseteq P\) that lie within the interaction radius of \(t^I_l\).
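The construction of the blocked and interaction sets can be sketched as follows. This is a minimal illustration under stated assumptions: distances are Euclidean, the radii are inclusive thresholds, and the function name `classify_vertices` and its return layout are hypothetical rather than taken from the original pipeline.

```python
import math


def classify_vertices(lattice_xyz, protein_xyz, blocking_radius=3.8, interaction_radius=6.5):
    """Split lattice vertices into blocked sites and interaction sites.

    Returns (blocked, interacting), where `blocked` is a set of lattice indices
    and `interacting` maps a lattice index l to the indices of the protein
    residues within the interaction radius (the set P_l).
    """
    blocked = set()
    interacting = {}
    for k, vertex in enumerate(lattice_xyz):
        dists = [math.dist(vertex, residue) for residue in protein_xyz]
        if any(d <= blocking_radius for d in dists):
            blocked.add(k)  # steric hindrance: no peptide particle may sit here
            continue        # blocked vertices are excluded from the interaction set
        near = [m for m, d in enumerate(dists) if d <= interaction_radius]
        if near:
            interacting[k] = near
    return blocked, interacting
```

Given such a mapping, the aggregated potential for a peptide particle at site l would be the sum of the pair potentials between that particle and the residues listed in `interacting[l]`.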
An optimal solution to the optimization problem is a docking conformation for peptide A on tetrahedral lattice \(\mathcal {T}\) in the presence of external protein P that minimizes the aggregate MJ interaction potential while satisfying the cyclization conditions C and ensuring peptide particles do not overlap with each other or with the set \(\mathcal {T_B}\). A feasible conformation/solution is a conformation that satisfies all problem constraints but may have suboptimal MJ interaction potential. Additional details and assumptions of the problem definition can be found in Supplementary B.
QUBO model
Various formulations of the problem without the external protein interaction (i.e., without docking) with different encoding strategies have been previously introduced and analyzed. We give a detailed overview of the different approaches in Supplementary A. Because of the beneficial scaling in the number of required variables, we start from the Hamiltonian derived in30 and introduce two additional terms, one for peptide cyclization, and another for the influence of the external target protein. We then use a locality reduction step to transform the higher-order terms into a quadratic form (Supplementary D).
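As an illustration of what a locality reduction does, the sketch below replaces a cubic term with a quadratic expression by introducing one ancilla bit. It uses the Rosenberg substitution, which is a common choice and is assumed here; the reduction actually used in this work is described in Supplementary D and may differ. The function names and the coefficient values are illustrative.

```python
from itertools import product


def cubic_term(x1, x2, x3, c):
    """Original higher-order term c * x1 * x2 * x3."""
    return c * x1 * x2 * x3


def reduced_term(x1, x2, x3, y, c, strength):
    """Quadratic replacement in which the ancilla y stands in for x1 * x2.

    The Rosenberg penalty is 0 when y == x1 * x2 and at least 1 otherwise.
    """
    penalty = x1 * x2 - 2 * x1 * y - 2 * x2 * y + 3 * y
    return c * y * x3 + strength * penalty


def reduction_is_exact(c, strength):
    """Check by brute force that minimizing over y recovers the cubic term."""
    return all(
        min(reduced_term(x1, x2, x3, y, c, strength) for y in (0, 1))
        == cubic_term(x1, x2, x3, c)
        for x1, x2, x3 in product((0, 1), repeat=3)
    )
```

The check passes whenever the penalty strength exceeds the magnitude of the coefficient, which shows the cost of the reduction: each substituted product adds one binary variable and one more penalty parameter to tune.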
Our starting point is the Hamiltonian derived in30, which solves the peptide folding problem without cyclization and protein interaction:
where \(H_\textrm{comb}\) combines the inter-peptide interaction energy with the no-overlap constraint penalty term for peptide particles that are more than three turns apart, and \(H_\textrm{back}\) is the no-overlap constraint penalty term ensuring that subsequent main- and side-chain particles do not backtrack on each other. The Hamiltonian is based on a turn encoding of the peptide main- and side-chain particles, and the locations of subsequent particles are defined by the turns on the tetrahedral lattice. For the explicit form of the Hamiltonian terms, we refer to Supplementary B.
As our first extension of the model presented in30, we add a cyclization constraint penalty term \(H_\textrm{cycle}\) that enforces the cyclization bond \((i,j) \in C\), between given peptide residue pairs i and j. To do so, we define a distance function d(X, Y) that calculates the squared number of steps along the lattice between two lattice vertices X and Y, used to calculate distance between particles on the lattice. The distance is expressed in terms of the turn directions and the definition can be found in Supplementary B. We enforce cyclization by constraining the particles of the residue i and j to be 1-NN on the lattice (i.e., \(d(A_i,A_j)=1\)). For this we construct the term
where \(\lambda _\textrm{cycle}\) is a strictly positive scalar used to regulate the penalty strength and p can be set to 1 or 2 depending on the solver properties. \(H_\textrm{cycle}\) is equal to zero if the particles are one lattice step apart and adds a positive contribution otherwise; a distance of zero cannot occur because it is forbidden by the non-overlap constraint.
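The behaviour described for the cyclization term can be illustrated with a small sketch. The functional form used here, \(\lambda _\textrm{cycle} \sum _{(i,j) \in C} (d(A_i,A_j) - 1)^p\), is an assumption inferred from the surrounding description and is not taken from the original equation; the function name is hypothetical.

```python
def cycle_penalty(distances, lambda_cycle=1.0, p=1):
    """Penalty over cyclization pairs, given d(A_i, A_j) for each pair in C.

    Assumed form: lambda_cycle * sum((d - 1) ** p). It vanishes when every
    cyclization pair is exactly one lattice step apart.
    """
    if lambda_cycle <= 0:
        raise ValueError("lambda_cycle must be strictly positive")
    return lambda_cycle * sum((d - 1) ** p for d in distances)
```

With p = 1 the term would be negative at d = 0, which is why the argument relies on the non-overlap penalty excluding that case; with p = 2 the term is non-negative on its own.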
For the peptide-protein interaction, we add MJ interaction energies for peptide residues residing on lattice points within the interaction range of the protein \(\mathcal {T_I}\), and a constraint penalty term to represent steric hindrance on the sites \(\mathcal {T_B}\) blocked by the protein. We leverage knowledge of the problem geometry and assume that for any point in \(\mathcal {T_B}\) all 1-NN points not in \(\mathcal {T_B}\) are in \(\mathcal {T_I}\). For this to hold it is sufficient to choose blocking and interaction radii that differ by at least one lattice step. Necessarily, the peptide starts on a vertex outside of \(\mathcal {T_B}\), and by our assumption, for any potential overlap with a point in \(\mathcal {T_B}\) there must be an overlap with a point in \(\mathcal {T_I}\).
Borrowing from the logic in \(H_\textrm{comb}\), we design a Hamiltonian that adds a protein-peptide MJ interaction energy when a peptide particle resides on a point in \(\mathcal {T_I}\) and none of its bonded 1-NN main or side-chain residues are on a blocking point in \(\mathcal {T_B}\), and is strictly positive otherwise. If any bonded 1-NN particles do land on a point in \(\mathcal {T_B}\), they will incur a positive penalty to the system energy, driving the optimizer to turn off the particle contributions which then zeroes out the MJ interaction. Additionally, we make sure that each interaction site can contribute at most one protein-peptide MJ interaction to the system, penalizing the placement of multiple peptide particles on the same interaction site.
We introduce an interaction variable \(\eta _{il}\) (\(\eta _{i^sl}\)) for any valid combination of main (or side) chain particle i (\(i^s\)) and interaction site \(t_l^{I} \in \mathcal {T_I}\), which acts as a switch in the minimization, turning on or off the energy contribution depending on if it is beneficial (negative contribution) or harmful (positive contribution) to the system. By valid combination, we mean that the site \(t_l^{I}\) has sufficient non-blocked 1-NN sites required by the bonds of the peptide particle. We then define the peptide-protein Hamiltonian by
The outer sums run over all valid pairs of peptide main chain particle i and lattice points \(t_l^{I} \in \mathcal {T_I}\), indexed by l. \(M_X(t^I_l)\) denotes the set of all lattice sites 1-NN of \(t_l^I\) that are either in \(\mathcal {T_B}\) or do not have sufficient non-blocked 1-NN for the peptide particle X to be placed on them. In the second sum on line (4), the sum over all 1-NN of a side chain \(A_{i}^s\) can be simplified as the only bonded particle is the corresponding main chain residue. The sums on line (5) are meant over all triples l, i, j such that the corresponding pairs in the summands are valid, and thus for which interaction variables exist.
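The role of the interaction variables as switches can be illustrated in isolation. The sketch below is a toy example, not the \(H_\textrm{protein}\) of this work: it assumes a single term of the form \(\eta \cdot c\), where c collects an interaction energy plus any penalties, and shows that minimizing over \(\eta \in \{0,1\}\) keeps the term only when it lowers the energy. The names are hypothetical.

```python
def best_switch(contribution):
    """Return (eta, energy) minimizing eta * contribution over eta in {0, 1}."""
    candidates = [(eta, eta * contribution) for eta in (0, 1)]
    return min(candidates, key=lambda pair: pair[1])


def switched_site_energy(interaction_energy, penalty_strength, n_blocked_neighbours):
    """Toy contribution: interaction energy plus a penalty per blocked bonded neighbour."""
    contribution = interaction_energy + penalty_strength * n_blocked_neighbours
    return best_switch(contribution)
```

For example, a favourable interaction with no blocked neighbours leaves the switch on, while the same interaction with a sufficiently penalized blocked neighbour turns it off and contributes zero.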
The different terms in \(H_\textrm{protein}\) and a proof that the Hamiltonian satisfies the requirements discussed above can be found in Supplementary C. Therein, we also derive the sufficient bounds for the penalty parameters \(\mu _2 > -\epsilon _{il}/2\) for all i, l and
where \(|\cdot |\) applied to sets denotes the cardinality. Combining the Hamiltonian in Eq. (1) with the cyclization and protein interaction contributions, we arrive at our final Hamiltonian
CP model
For each residue \(A_i\) we associate a vector of integer variables \((x_i, y_i, z_i) \in \mathbb {Z}^3\) to describe the position of the residue on the lattice. We use s superscripts to identify side-chain elements where necessary (e.g., \(x_i^s\) is the x-coordinate of the side-chain element for residue \(A_i\)).
We also introduce integer variables that represent the pairwise dimension distance between residues: \((x^{\delta }_{ij}, y^{\delta }_{ij}, z^{\delta }_{ij}) \in \mathbb {Z}^3, \forall i \ne j\). We introduce a set of boolean variables that identify if a pair of residues (or a residue and an external protein residue) is within an interaction threshold \(\psi\) on the lattice, and thus have a contribution to the objective function: \(\alpha _{ij} \in \{0,1\}, \forall i \ne j\). Finally, we use a set of boolean variables that flag if a pair of residues have a different location along the axes: \((x^{\ne }_{i,j}, y^{\ne }_{i,j}, z^{\ne }_{i,j}) \in \{0,1\}^3, \forall i \ne j\).
In contrast to linear programming-based model-and-solve technologies, CP supports logical constraints and non-linear relationships. The core objective, constraints, and variables of our CP model with a standard energy-minimizing objective function are expressed by:
Objective (8) expresses our core model objective function, which is to minimize MJ interaction energies of adjacent particles (since these interactions have negative values, minimization is desirable). This includes components for main-main, main-side, side-side, and external protein interactions. Constraint (9) uses the \(\textsc {Table}\) constraint to ensure that the coordinates of each residue are placed on the points defined by the lattice but not affected by external protein steric hindrance, \(\mathcal {T} \setminus \mathcal {T_B}\). The \(\textsc {Table}\) constraint restricts a tuple of variables to take one of the tuples in a given set of allowed value assignments. For example, if an external protein residue was located at coordinate (1, 1, 1), we could exclude this coordinate from \(\mathcal {T}\) to ensure the solver does not place a residue there.
Constraint (10) links the pairwise distance variables to the location variables. Constraints (11)–(12) link the summation of the pairwise distance variables (for both main chain and side-chain) in each dimension to be equal to \(\Omega\), the rectilinear distance of adjacent neighbors in the lattice. Similarly, Constraint (13) enforces neighboring rectilinear distance on residues included in cyclization bonds (for simplicity we omit some constraints for side-chain residues, but these are handled in similar fashion). Constraint (14) links the no-overlap variables and location variables, and Constraint (15) enforces the various non-overlap conditions for main-main, side-side, and main-side (namely that at least one coordinate value must be different between pairs). Constraints (16)–(17) link interaction variables, \(\alpha\), to pairwise distance variables and bound their domains according to the interaction threshold. Constraint (18) enforces that a peptide residue must be located at one of the lattice vertices under the influence of the external protein (via projection), \(\mathcal {I}_j\), for it to be considered interacting for the purposes of the objective function. Finally, Constraints (19) through (22) define the model variables and their domains.
Problem instance construction
To measure the effectiveness of our approach, we take entries from the RCSB Protein Data Bank (PDB) and attempt to predict the peptide conformation found in each. By using PDB contents, we have ground truth solutions we can measure our predictions against. We chose a set of PDB instances that are short in length and have known cyclization specifications. A list of the PDB instances we consider in this work is given in Table 1.
Instances in the PDB repository describe cyclic protein complexes using atomic-level coordinates in a standardized format. Before the PDB contents can be incorporated into the problem, they must be processed and projected onto the lattice. We provide a summary of the pipeline constructed to achieve this, but further details can be found in Supplementary H. The first step is to separate the information in the PDB file into two components: i) the peptide, and ii) the target protein (i.e., the protein that the peptide docks onto), as shown in Figure S2a. The next step is to identify the protein active site (Figure S2b), which constitutes a smaller, more focused area that the peptide is likely to use for docking.
Each atom in the peptide or protein active site is then classified as belonging to main chain or side chain, and the weighted centroid of each grouping is calculated, yielding a 2-particle representation for each amino acid in each structure (Figure S2c). The last step is to produce the tetrahedral lattice that discretizes the search space for our optimization methods (Figure S2d). Our approach for placing the tetrahedral lattice is described in Algorithm S1. The lattice is then filtered according to the current active site coordinates P and steric clash radius to remove vertices that would clash with the target site \(\mathcal {T_B}\), yielding the final lattice structure. The problem instance can then be characterized by the peptide main and side chains, filtered tetrahedral lattice coordinates \(\mathcal {T}'\) (with origin \(t_1\)), and shifted protein active site coordinates, P.
Solver implementation details
To solve the QUBO model, we first use a reduction step to transform the (typically) high order polynomial Hamiltonian into a quadratic form, which can introduce overhead in the form of ancilla variables. To keep problem size tractable, we additionally employ an energy impact decomposition strategy to decompose the QUBO into smaller sub-problems (sub-QUBOs), such that they could potentially be solved using current quantum hardware. Then we submit each sub-problem to a simulated annealing solver to sample solutions and return the best solution found. The results from each sub-problem are injected back into the global solution to generate an updated solution state, and the global solution is passed to a local Tabu search for further refinement. This combined simulated annealing and Tabu search process is repeated until termination criteria are met (e.g., some number of global steps is reached). These steps are implemented by leveraging the D-Wave hybrid Python library, including the EnergyImpactDecomposer, the SimulatedAnnealingSampler, and the TabuProblemSampler.
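To show the kind of sampling performed on each sub-problem, the following is a minimal pure-Python simulated annealing sketch for a QUBO. It is a stand-in for illustration only: it does not use the D-Wave hybrid library, it omits the decomposition and Tabu refinement steps, and the cooling schedule, parameter names, and defaults are assumptions.

```python
import math
import random


def qubo_energy(Q, x):
    """Energy of a 0/1 assignment x for a QUBO given as {(i, j): weight}."""
    return sum(w * x[i] * x[j] for (i, j), w in Q.items())


def simulated_annealing(Q, n, sweeps=200, t_start=2.0, t_end=0.05, seed=0):
    """Single-bit-flip simulated annealing; returns (best_state, best_energy)."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    energy = qubo_energy(Q, x)
    best_x, best_e = x[:], energy
    for s in range(sweeps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (s / max(sweeps - 1, 1))
        for i in range(n):
            x[i] ^= 1  # propose flipping bit i
            new_e = qubo_energy(Q, x)  # recomputed in full for clarity, not speed
            delta = new_e - energy
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                energy = new_e
                if energy < best_e:
                    best_x, best_e = x[:], energy
            else:
                x[i] ^= 1  # reject the move and restore the bit
    return best_x, best_e
```

Because the QUBO is unconstrained, the state returned is simply the lowest-energy sample seen; whether it corresponds to a feasible conformation has to be checked separately.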
As part of our experimentation, we ran hyperparameter optimization (HPO) for the QUBO model over the Hamiltonian parameters \(\lambda _\textrm{cycle}\), \(\lambda _\textrm{back}\), \(\mu _{2}\), \(\mu _{3}\), p, the QUBO decomposition size \(sub\_qubo\_size\), and the simulated annealing solver parameters \(num\_reads\), \(num\_sweeps\). This HPO process sought to minimize a custom loss function, which was the sum of the Hamiltonian energy H of the best solution found and a scaled (1000x) count of violations observed in the solution. See Supplementary D and Supplementary H for more details on each step outlined here.
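The loss described above can be written down directly. The sketch below follows the stated definition (energy plus 1000 times the violation count); the function names and the trial layout used to pick a winner are illustrative assumptions.

```python
def hpo_loss(energy, num_violations, violation_scale=1000.0):
    """Hamiltonian energy of the best solution plus a scaled violation count."""
    return energy + violation_scale * num_violations


def best_trial(trials):
    """Pick the trial with the lowest loss.

    trials: list of (params, energy, num_violations) tuples (hypothetical layout).
    """
    return min(trials, key=lambda trial: hpo_loss(trial[1], trial[2]))
```

With this scaling, a single violation outweighs any energy difference smaller than 1000, so a feasible solution with a worse energy is still preferred over an infeasible one with a better energy.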
To solve the CP model, we use the Google OR-Tools CP-SAT solver41 with default settings. Logical implication (\(\rightarrow\)) is implemented using the OnlyEnforceIf method, and the logical disjunctions in Constraint (15) are implemented as BoolOr constraints.
Experiments
We conduct an experimental analysis on the peptide instances summarized in Table 1 using the instance generation pipeline and models discussed in “Methods”. We include results and analysis for instances that both exclude and include the external protein, only the latter of which can be considered peptide docking.
We assess feasible conformations produced by the approaches in terms of both their MJ interaction value and the root mean square distance (RMSD) compared against the coordinates of the real peptide (sourced from the PDB file) in Euclidean space. RMSD measures the average distance between the coarse-grained particles of the real (PDB) peptide and the peptide produced by our approach. The equation for RMSD is defined as follows:
where \(\textbf{v}\) denotes the coordinates for the real peptide and \(\textbf{w}\) denotes the coordinates of the predicted peptide, and the RMSD represents the \(\ell ^2\)-norm of the difference between both vectors.
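For reference, RMSD can be computed as in the sketch below. It assumes the standard definition, \(\sqrt{\frac{1}{n}\sum _{i=1}^{n} \Vert \textbf{v}_i - \textbf{w}_i \Vert ^2}\) over n matched particles, and applies no superposition or alignment before comparison; both points are assumptions, and the function name is illustrative.

```python
import math


def rmsd(v, w):
    """Root mean square distance between two equally long lists of 3D points."""
    if len(v) != len(w) or not v:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = sum(
        (a - b) ** 2 for point_v, point_w in zip(v, w) for a, b in zip(point_v, point_w)
    )
    return math.sqrt(squared_sum / len(v))
```

Because no alignment is applied, a predicted conformation that has the right shape but is rotated relative to the reference still receives a large RMSD.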
It is important to note that while our approach optimizes for aggregate MJ energy, the practical success of a conformation is typically gauged using RMSD or other molecular similarity metrics.
Excluding the external protein
By removing the external protein from the problem (and the associated penalty term \(H_\textrm{protein}\)), we reduce the QUBO model size significantly, thereby making it easier to solve. For instance, for peptide 3WNE, the number of variables in the QUBO formulation without the external protein is 152, and the number of variables when the external protein is included is 596.
As our resource-efficient turn encoding approach is an unconstrained formulation that leverages penalty terms in the objective, it is possible for the solver to return states corresponding to peptide conformations that violate problem constraints. As the problem size (number of variables) grows, it becomes harder for the solver to find high-quality states corresponding to feasible conformations. When the external protein is excluded, the QUBO solver is able to find the optimal solution in at least one shot for each problem instance.
Table 2 summarizes the solver’s ability to find feasible conformations, and also reports on the best conformation MJ energy found per problem instance. While we did run HPO over these problems, we found that it was possible to use one general set of hyperparameters and yield feasible results.
Table 3 provides a summary of feasible conformation quality produced by the approaches in terms of MJ interaction and RMSD values; QUBO solver shots that resulted in constraint violations are not included. To provide context for these results, we also include the optimal MJ and associated RMSD attained by our CP-based approach, where the CP model is optimizing aggregate MJ interaction energy, as in Objective (8), with the first residue anchored to the origin to facilitate fair comparison with the QUBO results.
Removing the external protein removes the external (protein) MJ interaction potentials associated with the “correct” conformation, meaning there is no force driving the solvers to a conformation that is spatially aligned with the ground truth PDB result. This also means that the peptide could fold into a stable, feasible conformation in a number of directions that it otherwise would be driven away from (in the presence of the external protein). Removal of the protein also removes the protein blocking influence over the peptide, meaning a feasible state in this context could actually be infeasible if the external protein were considered.
We note an obvious gap between the QUBO and CP MJ energies using the default interaction radius of 6.5 Å. When the CP model uses a reduced interaction radius (one grid step), we see similar performance between the QUBO and CP (1-NN) results, showing that the QUBO model is able to effectively solve the problems, though it is slightly worse on average. We note that 2F58 does not follow this trend, as the CP results are markedly better than the QUBO results. We also see that the QUBO runtimes are fairly consistent across peptides, with 3WNE being the fastest at 126 s (\(\sim\)2 min), and 3AVN being the slowest at 325 s (\(\sim\)5.5 min). This leads to the expectation that the solver can solve problems of this size without the external protein within \(\sim\)5.5 min (not including HPO time).
Including the external protein (docking)
Here we conduct docking experiments where the external protein is included. The QUBO solver was only able to find states corresponding to feasible conformations for some problem instances, so we first report on the number and type of constraint violations observed. For each problem instance, we determine the shot that resulted in the most optimized QUBO objective function value and then assess the breakdown of constraint violations for that state. We report violations for the number of overlaps involving pairs of peptide residues (# Overlaps), the number of steric clashes between a peptide residue and the external protein (# Protein Clash) and whether the cyclization constraint is satisfied by the state.
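The three reported violation types can be counted with a short checker such as the sketch below. It assumes integer lattice coordinates in which a lattice edge has squared length 3 (a diamond-type convention) and checks cyclization as a 1-NN condition; the function name and input layout are hypothetical.

```python
from itertools import combinations


def count_violations(positions, blocked_vertices, cyclization_pairs):
    """Return (# overlaps, # protein clashes, cyclization satisfied).

    positions:         list of (x, y, z) lattice vertices, one per peptide particle
    blocked_vertices:  set of vertices blocked by the protein
    cyclization_pairs: list of (i, j) particle indices that must be 1-NN
    """
    overlaps = sum(
        1 for i, j in combinations(range(len(positions)), 2) if positions[i] == positions[j]
    )
    clashes = sum(1 for pos in positions if pos in blocked_vertices)

    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    # One lattice edge has squared length 3 in the assumed coordinates.
    cyclized = all(
        squared_distance(positions[i], positions[j]) == 3 for i, j in cyclization_pairs
    )
    return overlaps, clashes, cyclized
```

A state is feasible in the sense used here only if it has zero overlaps, zero clashes, and a satisfied cyclization condition.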
As shown in Table 4, the QUBO approach yields states associated with feasible conformations for the smaller peptides 3WNE and 5LSO, but struggles as problem size increases. For instance, the best MJ state for 3AV9 has six overlaps of the peptide, two steric clashes with the external protein, and the cyclization was not satisfied. Because 3AV9 involves eight main chain particles and eight side chain particles (for a total of sixteen peptide particles), there are \(\binom{16}{2} = 120\) possible pairs of overlapping particles. This means that the state returned by the QUBO solver satisfies 114/120 \((95.0\%)\) of the peptide overlap constraints. When considering the external protein, the 3AV9 instance involves 28 external protein residues, indicating a possible \(16 \times 28 = 448\) steric clash combinations. Of these, the state violates 2, satisfying over 99.5% of these constraints. Note that this calculation provides a lower bound, and is useful for illustrative purposes; the projection of blocking influence onto the lattice often yields more blocking sites on the lattice than the number of residues present, and the peptide could potentially clash with any of those.
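The arithmetic in this paragraph can be reproduced in a few lines. The counts (16 peptide particles, 28 protein residues, 6 overlaps, 2 clashes) are taken from the text above; the variable names are illustrative.

```python
import math

n_particles = 16          # eight main-chain plus eight side-chain particles
n_protein_residues = 28   # external protein residues in the 3AV9 instance

# Peptide overlap constraints: one per unordered pair of particles.
pair_constraints = math.comb(n_particles, 2)
overlap_satisfied = (pair_constraints - 6) / pair_constraints

# Steric clash combinations: every particle against every protein residue.
clash_constraints = n_particles * n_protein_residues
clash_satisfied = (clash_constraints - 2) / clash_constraints

print(pair_constraints, f"{overlap_satisfied:.1%}")
print(clash_constraints, f"{clash_satisfied:.2%}")
```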
A summary of feasible conformation quality with the target protein included is given in Table 5. First, we note that the QUBO model yields less favorable MJ energies than the equivalent CP (1-NN) model, and is, as expected, worse than the default CP model. Second, the QUBO model produced results that were closer than the CP (1-NN) model to the PDB solution per the RMSD metric for 5LSO, and comparable RMSD for 3WNE.
This may seem surprising, but we must consider that the CP results shown here are for runs of the CP model where the first peptide residue is anchored to the lattice origin. Recall that the anchor was introduced in order to provide a more direct and fair comparison between QUBO and CP models. Because of this anchoring, while working to maximize interactions with the external protein, the CP models create high-quality solutions that are rotated off of alignment with the ground truth PDB result. An example of this is shown in Fig. 2, with additional examples in Supplementary Figures S3, S4, S5, S6, and S7. Finally, we note some significant variance in solver runtimes in this case, with 5LSO being the fastest at 213 s (\(\sim\)3.5 min), and 2F58 being the slowest at 1595 s (\(\sim\)27 min).
Our results demonstrate that mapping protein-optimization-type problems to QUBOs inherently makes them harder to solve, not only because penalty terms are hard to construct and parameters hard to tune, but also because any solver must find sufficiently high-quality (i.e., low-energy) states to avoid the violation of hard constraints, which is not always possible, especially for such large systems. This also indicates that protein-related problems are not well suited for QUBO-based solvers, e.g., quantum annealers42.
Discussion
In this work, we have explored solving the cyclic peptide docking problem on a fixed lattice in a general manner using a QUBO approach, and in parallel constructed a novel constraint programming (CP) formulation. To do this, we have built on previous work30 to include a cyclization constraint term and a general peptide-protein interaction term, which had not been done before. We find that wrapping the combined distance checks on interaction and blocking sites with an ancilla bit (to turn on/off), as described in “Methods”, is the most efficient approach to include the protein interactions.
The PDB instances used in this work, outlined in Table 1, were chosen due to their relatively short length and known cyclization specification. By choosing short length instances, we kept the problem size small to effectively evaluate the model’s properties and ability to yield feasible states as we added complexity through the external protein interactions. We note that this work included a relatively small sample set that adhered to these requirements, and so an important next step is to expand the set of PDB instances to, for example, 1NPO, 2CK0, 3VVS, and 4K8Y. Further, we note that the inclusion of nonproteinogenic amino acids should be straightforward with respect to the pipeline logic, because all existing pipeline steps would apply equally to these amino acids, with the exception of any potentially odd-shaped amino acids that might require bespoke particle centroid calculations. However, inclusion of any nonproteinogenic amino acids would require the use of some other inter-particle attraction measure, as MJ values do not exist for such amino acids. This would require further consideration before any peptides containing such amino acids could be solved for.
One of the primary difficulties of modeling the cyclic peptide docking problem in the QUBO formalism is the construction of disjunctive expressions that turn ‘on’ or ‘off’ various components of the function. The problem requires inclusion or exclusion of forces (energies) according to distance thresholds (e.g., overlap penalties when \(d_{ij}=0\) or pairwise MJ interactions when \(d_{ij}=1\)), which exist at single points in space and are zero otherwise. These would be effectively modeled by a (set of) delta function(s). Unfortunately, there is no easy or efficient way to include these types of forces naturally in a Hamiltonian expression. Instead, we must add ancilla bits, which leads to additional overhead in the problem representation and makes it harder to solve.
An important simplification to the cyclic peptide docking problem is the restriction of placement of the residues on vertices of a lattice structure. In practice, peptides are not constrained to a particular rigid underlying structure; they simply abide by steric forces that prevent the molecules from clashing with one another. Therefore, imposing any lattice structure introduces some error between the model result and the true PDB solution. This is one driver of the large RMSD values seen in Tables 3 and 5.
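To illustrate the overhead that ancilla bits introduce, the following sketch uses the standard Rosenberg reduction, which replaces a product of two binary variables with one ancilla bit and a quadratic penalty. It is a generic textbook construction shown under assumed values (the function names, `energy`, and `weight` are hypothetical) and is not claimed to be the encoding described in “Methods”.

```python
from itertools import product

def rosenberg_penalty(x, y, w):
    """Quadratic penalty that is 0 when w == x*y and at least 1 otherwise."""
    return x * y - 2 * x * w - 2 * y * w + 3 * w

def gated_energy(x, y, z, w, energy=-1.0, weight=2.0):
    """Quadratic stand-in for the cubic term energy * x * y * z.

    The ancilla w plays the role of the product x*y, so the interaction
    energy is switched on only when both x and y are set.
    """
    return energy * w * z + weight * rosenberg_penalty(x, y, w)

def min_over_ancilla(x, y, z):
    """Energy seen by an ideal solver that also optimizes the ancilla bit."""
    return min(gated_energy(x, y, z, w) for w in (0, 1))

for x, y, z in product((0, 1), repeat=3):
    print((x, y, z), min_over_ancilla(x, y, z))
```

Minimizing over the ancilla recovers the cubic term exactly, but only if `weight` exceeds the magnitude of `energy`, and every gated term of this kind costs one extra variable plus additional couplings.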
When the external protein is removed, all peptides tested can be solved with the QUBO representation. As shown in Table 3, the QUBO solver was able to find the optimal MJ potential in five out of six cases, with 2F58 being the exception with a total MJ potential of \(-6.02\) compared to \(-13.61\) from the CP model (approx. 55.77% gap). We also note similar RMSD values between the QUBO and CP (1-NN) models, with QUBO performing best per RMSD on 3AVI at 9.65Å, and CP performing best on 3WNE at 7.77Å.
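For reference, the quoted gap is consistent with a relative difference measured against the CP value. The helper below is a small sketch under that assumption; the name `relative_gap_percent` is hypothetical.

```python
def relative_gap_percent(candidate, reference):
    """Relative difference between two energies, as a percentage of the reference."""
    return abs(candidate - reference) / abs(reference) * 100.0

# QUBO vs. CP total MJ potential for 2F58 without the external protein
print(f"{relative_gap_percent(-6.02, -13.61):.2f}%")  # prints 55.77%
```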
When the external protein is included, the QUBO approach finds feasible conformations for only two out of five peptides (3WNE, 5LSO). For these two, as Table 5 shows, the MJ potentials were both worse than the CP (1-NN) solutions, with a gap of approximately 33.10% for 3WNE and approximately 16.52% for 5LSO. The RMSD values for QUBO results were comparable to those of CP (1-NN) on 3WNE, and better than CP (both models) on 5LSO. This suggests that MJ-optimized solutions may not always perform well with respect to RMSD against the PDB ground truth, especially when anchored to what might be a suboptimal starting location. Further, it is worth noting that in practice, an RMSD greater than 4Å is considered to be a non-match and not practically useful.
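As a reminder of what the RMSD metric measures, the sketch below computes it for two equally ordered sets of residue coordinates and applies the 4Å cutoff mentioned above. It assumes the structures are already expressed in a common frame and performs no superposition step; the exact alignment procedure used to produce the tabulated values is not restated here, and the function names are hypothetical.

```python
import math

MATCH_THRESHOLD_ANGSTROM = 4.0  # cutoff quoted in the text

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered point sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

def is_practical_match(value):
    """An RMSD above the threshold is treated as a non-match."""
    return value <= MATCH_THRESHOLD_ANGSTROM

# Two residues, each displaced by 3 Å along z (made-up coordinates)
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
value = rmsd(model, reference)
print(value, is_practical_match(value))
```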
We note that it is not particularly surprising that the QUBO approach struggles to find feasible states for some of the peptides. As the peptide size increases, the size of the QUBO model increases super-linearly (Supplementary E), thus making it harder to solve. Because the simulated annealing solver is a local, sequential search algorithm, the time it needs to converge increases with problem size, and so with a fixed time budget, the probability of finding a feasible state decreases. This is evident in Table 2, showing the decreasing feasible fraction with increasing peptide size. Some strategies to increase the probability of finding a feasible state would be to increase the memory of the underlying compute node, and to increase the runtime allowance of the solver.
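The local, sequential nature of the search can be made concrete with a minimal single-bit-flip simulated annealing loop over a QUBO. This is a generic sketch, not the solver used in this work: the dictionary representation, the geometric cooling schedule, and all parameter values are assumptions chosen for illustration.

```python
import math
import random

def qubo_energy(Q, x):
    """Energy x^T Q x for a QUBO stored as {(i, j): coefficient}."""
    return sum(c * x[i] * x[j] for (i, j), c in Q.items())

def simulated_annealing(Q, n, sweeps=200, t_start=2.0, t_end=0.05, seed=0):
    """Single-bit-flip annealing; returns the best state seen and its energy."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    energy = qubo_energy(Q, x)
    best_x, best_energy = x[:], energy
    for sweep in range(sweeps):
        # Geometric cooling from t_start down to t_end
        t = t_start * (t_end / t_start) ** (sweep / max(sweeps - 1, 1))
        for i in range(n):  # one sweep proposes one flip per variable
            x[i] ^= 1
            # Full recomputation keeps the sketch short; real solvers
            # update the energy incrementally.
            new_energy = qubo_energy(Q, x)
            delta = new_energy - energy
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                energy = new_energy
                if energy < best_energy:
                    best_x, best_energy = x[:], energy
            else:
                x[i] ^= 1  # reject the move and restore the bit
    return best_x, best_energy

# Toy QUBO whose minima set exactly one of the two bits
Q = {(0, 0): -1, (1, 1): -1, (0, 1): 2}
print(simulated_annealing(Q, n=2))
```

Each sweep costs work proportional to the number of variables (and, in this naive form, to the number of couplings as well), so under a fixed time budget a larger model receives fewer sweeps.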
Overall, the inability of the QUBO model to consistently find states corresponding to feasible conformations in the presence of an external protein is a significant limitation of the approach, which also implies that quantum optimization techniques42 that leverage QUBO representations might not be the most efficient for this class of application. In contrast, the CP approach is scalable and has the potential to deliver new insights into peptide-protein interactions.
Data availability
We source all our raw data from the RCSB Protein Databank (PDB), which can be found at https://www.rcsb.org/. Due to IP concerns, we do not provide exports of our processed PDB intermediates, but we do provide a description of the steps necessary to recreate the data in Supplementary H.
References
Wang, L., Wang, N., Zhang, W., Cheng, X., Yan, Z., Shao, G., Wang, X., Wang, R. & Fu, C. Therapeutic peptides: Current applications and future directions. Signal Transduct. Target. Ther. 7 (1). https://doi.org/10.1038/s41392-022-00904-4 (2022).
Zhang, H. & Chen, S. Cyclic peptide drugs approved in the last two decades (2001–2021). RSC Chem. Biol. 3, 18–31. https://doi.org/10.1039/D1CB00154J (2022).
Ohta, A. et al. Validation of a new methodology to create oral drugs beyond the rule of 5 for intracellular tough targets. J. Am. Chem. Soc. 145 (44). https://doi.org/10.1021/jacs.3c07145 (2023).
Nomura, K. et al. Broadly applicable and comprehensive synthetic method for n-alkyl-rich drug-like cyclic peptides. J. Med. Chem. 65(19), 13401–13412. https://doi.org/10.1021/acs.jmedchem.2c01296 (2022).
Corbett, K. M., Ford, L., Warren, D. B., Pouton, C. W. & Chalmers, D. K. Cyclosporin structure and permeability: From a to z and beyond. J. Med. Chem. 64(18), 13131–13151. https://doi.org/10.1021/acs.jmedchem.1c00580 (2021).
Tanada, M. et al. Development of orally bioavailable peptides targeting an intracellular protein: From a hit to a clinical KRAS inhibitor. J. Am. Chem. Soc. 145(30), 16610–16620. https://doi.org/10.1021/jacs.3c03886 (2023).
Tucker, T. J. et al. A series of novel, highly potent, and orally bioavailable next-generation tricyclic peptide pcsk9 inhibitors. J. Med. Chem. 64(22), 16770–16800. https://doi.org/10.1021/acs.jmedchem.1c01599 (2021).
Jumper, J. et al. Highly accurate protein structure prediction with alphafold. Nature 596 (7873). https://doi.org/10.1038/s41586-021-03819-2 (2021).
Lin, Z., Akin, H., Rao, R., Hie, B., Zhu, Z., Lu, W., dos Santos Costa, A., Fazel-Zarandi, M., Sercu, T., Candido, S. et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction. BioRxiv: 500902 (2022).
Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373(6557), 871–876 (2021).
Wu, R., Ding, F., Wang, R., Shen, R., Zhang, X., Luo, S., Su, C., Wu, Z., Xie, Q., Berger, B. et al. High-resolution de novo structure prediction from primary sequence. BioRxiv 2022-07 (2022).
Ahdritz, G., Bouatta, N., Floristean, C., Kadyan, S., Xia, Q., Gerecke, W., O’Donnell, T.J., Berenberg, D., Fisk, I., Zanichelli, N., et al. Openfold: Retraining alphafold2 yields new insights into its learning mechanisms and capacity for generalization. Nat. Methods 1–11 (2024).
Geng, H., Jiang, F. & Wu, Y.-D. Accurate structure prediction and conformational analysis of cyclic peptides with residue-specific force fields. J. Phys. Chem. Lett. 7(10), 1805–1810. https://doi.org/10.1021/acs.jpclett.6b00452 (2016).
Charitou, V., van Keulen, S. C. & Bonvin, A. M. J. J. Cyclization and docking protocol for cyclic peptide protein modeling using haddock2.4. J. Chem. Theory Comput. 18(6), 4027–4040. https://doi.org/10.1021/acs.jctc.2c00075 (2022).
Alogheli, H., Olanders, G., Schaal, W., Brandt, P. & Karlén, A. Docking of macrocycles: Comparing rigid and flexible docking in glide. J. Chem. Inf. Model. 57(2). https://doi.org/10.1021/acs.jcim.6b00443 (2017).
Zhang, Y. & Sanner, M. F. Docking flexible cyclic peptides with autodock crankpep. J. Chem. Theory Comput. 15(10), 5161–5168. https://doi.org/10.1021/acs.jctc.9b00557 (2019).
Kosugi, T. & Ohue, M. Design of cyclic peptides targeting protein-protein interactions using alphafold. Int. J. Mol. Sci. 24(17), 13257 (2023).
Zhang, C. et al. Highfold: Accurately predicting structures of cyclic peptides and complexes with head-to-tail and disulfide bridge constraints. Brief. Bioinform. 25(3), bbae215 (2024).
Dill, K. A. Theory for the folding and stability of globular proteins. Biochemistry 24(6), 1501–1509. https://doi.org/10.1021/bi00327a032 (1985).
Lau, K. F. & Dill, K. A. A lattice statistical mechanics model of the conformational and sequence spaces of proteins. Macromolecules 22(10), 3986–3997. https://doi.org/10.1021/ma00200a030 (1989).
Berger, B. & Leighton, T. Protein folding in the hydrophobic-hydrophilic (hp) model is np-complete. J. Comput. Biol. 5(1), 27–40. https://doi.org/10.1089/cmb.1998.5.27 (1998).
Camacho, C. J. & Thirumalai, D. Kinetics and thermodynamics of folding in model proteins. Proc. Natl. Acad. Sci. 90. https://doi.org/10.1073/pnas.90.13.6369 (1993).
Abkevich, V. I., Gutin, A. M. & Shakhnovich, E. I. Specific nucleus as the transition state for protein folding: Evidence from the lattice model. Biochemistry https://doi.org/10.1021/bi00199a029 (1994).
Pande, V. S., Grosberg, A. Y. & Tanaka, T. Heteropolymer freezing and design: Towards physical models of protein folding. Rev. Mod. Phys. 72, 259–314. https://doi.org/10.1103/RevModPhys.72.259 (2000).
Crescenzi, P., Goldman, D., Papadimitriou, C., Piccolboni, A. & Yannakakis, M. On the complexity of protein folding (abstract). In Proceedings of the Second Annual International Conference on Computational Molecular Biology, RECOMB ’98, Association for Computing Machinery, New York, NY, USA. 61–62. https://doi.org/10.1145/279069.279089 (1998).
Babbush, R., Perdomo-Ortiz, A., O’Gorman, B., Macready, W. & Aspuru-Guzik, A. Construction of energy functions for lattice heteropolymer models: Efficient encodings for constraint satisfaction programming and quantum annealing. In Advances in Chemical Physics. 201–244. (Wiley, 2014). https://doi.org/10.1002/9781118755815.ch05.
Perdomo-Ortiz, A., Dickson, N., Drew-Brook, M., Rose, G. & Aspuru-Guzik, A. Finding low-energy conformations of lattice protein models by quantum annealing. Sci. Rep. 2(1), 1–7 (2012) https://www.nature.com/articles/srep00571.
Outeiral, C. et al. Investigating the potential for a limited quantum speedup on protein lattice problems. New J. Phys. 23(10), 103030. https://doi.org/10.1088/1367-2630/ac29ff (2021).
Irbäck, A., Knuthson, L., Mohanty, S. & Peterson, C. Folding lattice proteins with quantum annealing. Phys. Rev. Res. 4, 043013. https://doi.org/10.1103/PhysRevResearch.4.043013 (2022).
Robert, A., Barkoutsos, P. K., Woerner, S. & Tavernelli, I. Resource-efficient quantum algorithm for protein folding. npj Quantum Inf. 7 (1). https://doi.org/10.1038/s41534-021-00368-4 (2021).
Boulebnane, S., Lucas, X., Meyder, A., Adaszewski, S. & Montanaro, A. Peptide conformational sampling using the quantum approximate optimization algorithm. arxiv:2204.01821 (2022).
Banchi, L., Fingerhuth, M., Babej, T., Ing, C. & Arrazola, J. M. Molecular docking with gaussian boson sampling. Sci. Adv. 6 (23). https://doi.org/10.1126/sciadv.aax1950 (2020).
Perdomo, A., Truncik, C., Tubert-Brohman, I., Rose, G. & Aspuru-Guzik, A. Construction of model Hamiltonians for adiabatic quantum computation and its application to finding low-energy conformations of lattice protein models. Phys. Rev. A 78(1). https://doi.org/10.1103/physreva.78.012320 (2008).
Babej, T., Ing, C. & Fingerhuth, M. Coarse-grained lattice protein folding on a quantum annealer. https://doi.org/10.48550/ARXIV.1811.00713 (2018).
D-Wave Systems Inc. https://www.dwavesys.com/.
Fingerhuth, M., Babej, T. & Ing, C. A quantum alternating operator ansatz with hard and soft constraints for lattice protein folding. arxiv:1810.13411 (2018).
Rigetti Computing. https://www.rigetti.com/
Protein Data Bank. https://www.rcsb.org/
Miyazawa, S. & Jernigan, R. L. Estimation of effective interresidue contact energies from protein crystal structures: Quasi-chemical approximation. Macromolecules 18(3), 534–552. https://doi.org/10.1021/ma00145a039 (1985).
Miyazawa, S. & Jernigan, R. L. Residue residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and threading. J. Mol. Biol. 256(3), 623–644. https://doi.org/10.1006/jmbi.1996.0114 (1996) https://www.sciencedirect.com/science/article/pii/S002228369690114X.
Perron, L. & Furnon, V. OR-tools, Google [Online]. https://developers.google.com/optimization (2019).
Hauke, P., Katzgraber, H. G., Lechner, W., Nishimori, H. & Oliver, W. Perspectives of quantum annealing: Methods and implementations. Rep. Prog. Phys. 83, 054401 (2020).
Acknowledgements
The authors would like to thank Grant Salton, Gili Rosenberg, Yusuke Mastumoto, and Shoko Ustunomiya for their support in this work through their rigorous feedback and various insightful discussions. The authors would also like to thank Takuya Shiraishi and Yoshihisa Murata for engaging in productive research discussions and providing suggestions, and Yuji Koriyama and Masako Kimura for their support on contractual procedures. Author Akihiko Arakawa is currently affiliated with Digital Transformation Unit, Chugai Pharmaceutical Co., Ltd., Chuo-ku, Tokyo, 103-8324, Japan.
Author information
Contributions
Members of the Amazon team, including Kyle Brubaker, Kyle Booth, Fabian Furrer, and Jayeeta Ghosh developed the pipeline code, ran the experiments, and conducted the data analysis for the experiments. Helmut Katzgraber assisted in the conceptualization of the various QUBO approaches and in the experimental design. Members of the Chugai team including Akihiko Arakawa and Tsutomu Sato supported in conceptualization of the approach and experimental design, provided supporting code implementation, and provided regular feedback and subject matter expertise.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Brubaker, J.K., Booth, K.E.C., Arakawa, A. et al. Quadratic unconstrained binary optimization and constraint programming approaches for lattice-based cyclic peptide docking. Sci Rep 15, 20395 (2025). https://doi.org/10.1038/s41598-025-05565-1
DOI: https://doi.org/10.1038/s41598-025-05565-1




