Abstract
G protein-coupled receptors (GPCRs) are a prominent class of therapeutic targets for which structure-based drug discovery (SBDD) has traditionally been challenging to apply. However, recent artificial intelligence (AI)-powered breakthroughs have opened new avenues. Here, we discuss the impact of computational models on hit discovery and lead optimization for GPCRs. We also provide best practices for generating and validating predictive models for prospective use.
Introduction
G protein-coupled receptors (GPCRs) are a prominent class of therapeutic targets, with nearly a third of the FDA-approved drugs targeting members of this protein family1,2,3. Structure-based drug discovery (SBDD) uses the three-dimensional (3D) structure of protein targets to rationally identify and optimize compounds in preclinical drug discovery4. The SBDD process consists of four key phases (Fig. 1): 1) receptor modeling, where a 3D model of the target receptor is built or selected, 2) modeling of ligand-bound receptor complex(es), where a ligand pose is generated together with receptor conformations suitable for ligand binding, 3) hit identification, where starting-point chemical matter, referred to as ‘hits’, is discovered, and 4) hit-to-lead and lead optimization, where the hit or lead compounds are optimized for potency and drug-like properties. In this review, we discuss recent innovations in artificial intelligence (AI)-based and physics-based computational methodologies that advance SBDD for GPCRs. The review is organized in four sections, each covering one key phase of SBDD.
AI-based prediction of GPCR structures
An accurate three-dimensional structure of the target protein in a relevant functional state is a central component and a critical prerequisite for structure-based drug discovery4. However, for GPCRs, high-resolution experimental structures have historically been scarce5. Until only a few years ago, challenges of experimental structure determination and accurate structure prediction largely precluded structure-based drug discovery efforts for this target class.
Since 2020, AI approaches have led to truly breakthrough advances in protein structure prediction, as demonstrated in the community-wide blind competition CASP14 (14th biennial Critical Assessment of Structure Prediction6) and as recognized by a Nobel Prize in Chemistry7. Deep-learning based methods like AlphaFold2 (AF2)8 and RoseTTAFold9 consistently deliver structural predictions approaching experimental accuracy6. These AI-based structure prediction algorithms are trained on known experimental structures deposited in the Protein Data Bank (PDB)10, and thus would not have been successful for GPCRs without the preceding explosion in the number of experimental GPCR structures in the PDB. However, in contrast to conventional homology modeling, AI methods do not directly depend on structures of homologs with high sequence identity.
As of March 2025, experimentally determined structures have been solved for about a quarter of the GPCR superfamily (235 out of ~800 GPCRs)11,12, but AF2 models are available for all superfamily members, including its largest subfamily, Class A (674 receptors, Fig. 2). High prediction confidence (average TM domain pLDDT >90) is featured not only by models of receptors with medium-to-high sequence identity to known structures (>35% identity in transmembrane (TM) domain) but also by the majority of receptors with only distant homology in the PDB (Fig. 2a). For class A GPCRs, the pLDDT scores of the TM orthosteric pocket are nearly as high as that of the TM domain (Fig. 2b), albeit slightly more variable, suggesting an overall confidence in predicted AF2 models around the ligand binding site. Importantly, a large fraction of AF2 models are in good agreement with previously available and subsequently solved experimental structures, showing root mean square deviation (RMSD) of <2 Å in both the TM domain backbone and the orthosteric pocket side chains, across all ranges of homology to known structures available at the time of model training (Fig. 2a, b).
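The confidence criterion used above (average TM domain pLDDT >90) can be reproduced from a model file with a few lines of code. The following is a minimal sketch, assuming the model is in PDB format with pLDDT stored in the B-factor column, as is the convention for AlphaFold2 output. The function name `mean_plddt` and the residue ranges passed to it are illustrative; actual TM or pocket boundaries would have to come from an annotation source such as GPCRdb.

```python
def mean_plddt(pdb_text, residue_ranges):
    """Average per-residue pLDDT over inclusive residue-number ranges.

    Assumes AlphaFold2-style PDB text, where the B-factor column
    (columns 61-66) holds pLDDT. Only CA atoms are read, so every
    residue contributes exactly once.
    """
    values = []
    for line in pdb_text.splitlines():
        # Keep only C-alpha ATOM records (atom name in columns 13-16).
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        resnum = int(line[22:26])  # residue number, columns 23-26
        if any(start <= resnum <= end for start, end in residue_ranges):
            values.append(float(line[60:66]))
    if not values:
        raise ValueError("no CA atoms found in the requested ranges")
    return sum(values) / len(values)
```

Calling the function once with the TM helix ranges and once with the pocket residues (each given as a one-residue range) gives the two quantities compared in Fig. 2.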
Scatterplots of the average AF2-reported pLDDT scores in the TM domain (a, c, e) or the class A TM orthosteric pocket (b, d, f) against the sequence identity of TM domains to the closest homolog for which experimental structures were available in the PDB on 2022-10-01. a, b show data for non-state-specific models in the AlphaFoldDB (AF2DB158, dated 2022-10-19), (c, d) for the active state models in GPCRdb (dated 2022-08-1621), and (e, f) for the inactive state models in GPCRdb (also dated 2022-08-1621). The color gradient indicates the geometric accuracy of the respective models (TM domain backbone RMSD for (a, c, e) and pocket residue heavy-atom RMSD for (b, d, f)), measured against experimental structures of the receptors in the PDB as of November 2024 for the AF2DB models (a, b) or against the structures with known activation states as annotated in GPCRdb, with the latest release date of January 2024 (c–f). The x-axis values for receptors with 100% TM domain sequence identity to 2022-10-01 PDB structures are jittered for clarity. Plots were created using R 4.4.1159, and ggplot2 3.5.0160.
Several studies have systematically examined the geometric accuracy of GPCR models predicted by AF2 and RoseTTAFold. By examining 29 GPCRs for which the structures were released after the publication of the AF2 database in 2021, He et al. established that AF2 achieves a TM domain Cα RMSD accuracy of ~1 Å13. However, AF2 models showed limitations in the extracellular loop (ECL)-TM domain assembly and at the transducer interface, as well as in the sidechain conformations of the orthosteric ligand binding site, so that ligands failed to dock in native-like poses13. Lee et al. compared the AF2 and RoseTTAFold predicted models to the experimental structures for 73 GPCRs, and found that AF2 tends to be slightly more accurate than RoseTTAFold, while both performed better than conventional homology modeling methods for receptors with no good templates14. These findings are in agreement with the experience of labs specializing in determining experimental structures of GPCRs15, and with the general evaluation of AF2 predictions across different protein families. Furthermore, despite the reputation of AF2 models having ‘near-experimental accuracy’, Terwilliger et al. concluded that the mean error in predicted models is higher than the experimental error in the determined structures16. For example, for high-confidence residues (pLDDT > 90), AF2 models had a mean prediction error of 0.6 Å Cα RMSD, vs 0.3 Å Cα RMSD for experimental structures. The side chains in moderate-to-high confidence regions (pLDDT > 70) of AF2 models had 10% (vs 6% in experimental structures) of residues with an error over 2 Å, and 20% (vs 2% in experimental structures) with conformations substantially different (>1.5 Å RMSD) from the experimental density maps16.
Aside from geometric accuracy, a critical aspect of AI-generated structural models is their physical validity which concerns both the covalent bonds (e.g. bond lengths, angles, and shapes of aromatic rings) and the non-bonded intramolecular interactions (e.g. steric clashes). In the case of AF2, non-physical contacts and geometries are often present in initial models but are sufficiently mild to be removed by model relaxation within the prediction routine8.
One major limitation of AF2 is its inability to directly model functionally distinct conformational states of the target protein17. GPCRs undergo a large conformational change upon agonist binding and thus can adopt at least two distinct states, inactive and active; however, the models often represent only one state that is biased by the experimental structures in the training database. By analyzing the predicted conformation of TM6 and TM7, indicative of receptor activation state, He et al. concluded that AF2 tended to produce an “average” conformation for class A and an active-like conformation for class B1 GPCRs, consistent with the activation state distribution of the available structures in the PDB at the time of their analyses (55% inactive/37% active for class A, 70% active for class B1)13. That said, conformational variation and local uncertainties in AF2 ensembles are often consistent with intrinsic protein dynamics18. Accordingly, for select GPCRs with a sufficient number of conformationally diverse templates in the PDB training set, AF2 can produce a conformational ensemble that spans part of the receptor conformational spectrum19.
To enable reproducible generation of state-specific GPCR models by AF2, the Feig group developed an extension, AlphaFold-MultiState, that uses activation state-annotated template GPCR databases20,21. The generated models show an excellent agreement with both previously available and subsequently solved experimental structures of GPCRs in respective states (Fig. 2c–f). Other groups have reported the generation of functionally relevant conformational state ensembles by modifying and reducing the depth of the input multiple-sequence alignments22,23,24,25,26.
In the years since the release of AF2, the computational community has developed a number of alternative implementations designed to improve AF2 scalability, accessibility, and applicability. Spearheaded by the AlQuraishi group, OpenFold is a GPU-memory-efficient reproduction of AF2 that enables retraining on a new dataset27. MassiveFold has been developed to parallelize AF2 to significantly reduce the compute time28. Finally, to address the limitations of single-conformation predictors, Microsoft Research has recently presented BioEmu29, a scalable generative deep learning model for generating protein equilibrium ensembles. While these developments are exciting, they have not yet been applied specifically to GPCR structure prediction.
Prediction of GPCR-ligand complex geometries
Another critical prerequisite for both structure-based hit identification and lead optimization is an accurate structural model for not just the receptor, but for the complex between the receptor and the relevant ligand. A model showing near-native ligand pose within the receptor binding pocket, and forming receptor-ligand interactions similar to those observed experimentally, can be instrumental for rationalizing and predicting structure-activity relationships (SAR) in a compound series, optimizing compound potency, and discovering new ligands with different scaffolds. In benchmark studies, the accuracy of predicted ligand poses is typically assessed relative to an experimental structure of the same complex by the RMSD of ligand heavy atoms after optimally superimposing the receptor binding pocket or TM domain (Fig. 3a–c), whereas receptor-ligand interactions can be evaluated by comparing the experimentally observed and predicted interatomic distances for all ligand-receptor atom pairs. For any given set of receptor-ligand complex models, ligand RMSD from the experimental structure and fraction of correctly predicted contacts are usually only in a loose agreement with each other (Fig. 3d, e). Therefore, it is convenient to combine these two metrics and to assess the result in the context of the variation of corresponding parameters observed across pairs of experimental high-resolution structures of identical composition complexes in the PDB (Fig. 3d, e)30,31. The percentile within that distribution is a quantitative expression of the geometrical ‘correctness’ of the given model.
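The two metrics described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the scoring code used in the GPCR Dock assessments. It assumes that the model has already been superimposed onto the experimental structure by its binding pocket or TM domain, that atoms are listed in the same order in both structures, and that a contact is any ligand-receptor heavy-atom pair within a 4 Å cutoff (the cutoff and all function names are assumptions made here).

```python
import math

def ligand_rmsd(model_xyz, ref_xyz):
    """Heavy-atom RMSD between two equally ordered ligand coordinate lists.

    Both structures are assumed to be in the same frame already, i.e.
    superimposed by the receptor pocket or TM domain beforehand.
    """
    if len(model_xyz) != len(ref_xyz) or not ref_xyz:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(model_xyz, ref_xyz))
    return math.sqrt(squared / len(ref_xyz))

def contacts(ligand_xyz, pocket_xyz, cutoff=4.0):
    """Set of (ligand atom index, pocket atom index) pairs within the cutoff."""
    return {
        (i, j)
        for i, a in enumerate(ligand_xyz)
        for j, b in enumerate(pocket_xyz)
        if math.dist(a, b) <= cutoff
    }

def contact_fraction(model_lig, model_pocket, ref_lig, ref_pocket, cutoff=4.0):
    """Fraction of experimental ligand-pocket contacts reproduced by the model."""
    reference = contacts(ref_lig, ref_pocket, cutoff)
    if not reference:
        raise ValueError("reference structure has no contacts at this cutoff")
    predicted = contacts(model_lig, model_pocket, cutoff)
    return len(reference & predicted) / len(reference)
```

Because the contact fraction depends on which pocket atoms the ligand touches rather than on how far it has moved, a pose with a moderate RMSD can score anywhere between 0 and 1 on contacts, which is one way the loose agreement between the two metrics arises.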
a–c Examples of computationally predicted models of dopamine D3 receptor complex with eticlopride (shades of orange) overlaid with the experimental co-crystal structure of the same complex (PDB 3pbl, white). The models from (a) to (c) show an increasing degree of ‘correctness’ relative to the experimental structure. Images were created with PyMOL (The PyMOL Molecular Graphics System, Version 3.0, Schrödinger, LLC). d, e Scatter plots of ligand RMSD (from the selected experimental answer, x-axis) vs fraction of captured ligand-pocket contacts (y-axis) for the models of D3 receptor complex with eticlopride (d) and GPR139 complex with JNJ-63533054 (e) in the GPCR Dock 2010 and 2021 assessments, respectively. Each point represents a model or an experimental structure. For D3 receptor, the plot (d) shows chains A and B of 2.9Å X-ray structure, PDB 3pbl108; for GPR139 (e), the alternative ligand poses in four available cryo-EM structures, PDB 7vug, 7vuh, 7vui, and 7vuj64, which range in resolution from 3.2Å to 3.8Å and feature profound differences in compound pose and interactions. The shaded background represents the distribution of the plot parameters observed across high-resolution X-ray structures in the PDB; solid black curves represent isolines of the model ‘correctness’, measured as percentile of the experimentally observed distribution. Plots were created with ICM 3.9-3b161, https://www.molsoft.com/icm_browser.html.
Prediction of receptor-ligand complex geometry conventionally involves docking of the ligand into the binding pocket of the receptor, by flexibly sampling the possible ligand conformations within the rigid receptor binding pocket, followed by scoring and ranking of the resulting poses. The success of this docking approach strongly depends on the accuracy of the binding pocket and the compatibility of its shape with the binding of the given ligand. It also depends on the type of ligands being docked, e.g. ligands with many rotatable bonds, such as peptides, are more challenging. Keeping the receptor rigid during docking helps computational efficiency but makes it impossible to capture any significant rearrangement of the receptor pocket conformation that may occur upon binding of the ligand, and to account for the so-called induced fit effect.
With the overall improvement in receptor model accuracy achieved by AF2 and RoseTTAFold, there was an expectation that the accuracy of ligand pose prediction by docking would also improve. However, in practice, the resulting impact turned out to be less straightforward. Karelina et al. examined the docking accuracy of 54 ligands to unrefined, non-state-specific AF2 models of 17 class A and 1 class B GPCRs32. They found that despite the improved binding pocket accuracy, the fraction of correctly predicted ligand binding poses (ligand RMSD ≤ 2.0 Å relative to experimental structure) was not significantly higher for AF2 models compared to traditional homology models. By contrast, Lee et al. evaluated the success rate of docking to AF2 models for 38 agonist and 32 antagonist ligands across 33 GPCRs spanning classes A, B1, C, and F33. They found that by considering the relevant functional state of the receptor, and by incorporating receptor side-chain flexibility, both the binding site prediction accuracy and the ligand pose prediction accuracy (ligand RMSD < 2.5 Å relative to experimental structure) are improved for AF2 models compared to homology models. This result underlines the importance of receptor refinement and induced fit modeling in ligand recognition. Despite the observed ligand pose prediction accuracy improvements in AF2 models compared to conventional homology models, both studies showed that the accuracy is much higher when docking to experimentally determined structures32,33.
Conducted in 2008, 2010 (Fig. 3d), 2013, and 2021 (Fig. 3e), GPCR Dock is a series of community-wide assessments of blind structure prediction for GPCR-ligand complexes30,31,34,35. The latest round of GPCR Dock was conducted in 2021, soon after the release of AlphaFold2, and challenged the participants with predicting the structures for five target complexes: two with small molecule ligands (apelin receptor (APJ) with Cmpd6 and GPR139 with JNJ-63533054) and three with peptides (κ-type opioid receptor (OPRK) with dynorphin, neuropeptide Y receptor type 1 (NPY1R) with NPY, and neuromedin U receptor type 2 (NMUR2) with NMU25)35. Across the board, the majority of predictions (including the most accurate ones) capitalized on AF2 models, even for those receptors that had experimental structures in the PDB at the time of the assessment35. Consistent with findings of Karelina et al.32 and Lee et al.33, the highest achieved prediction accuracy for small-molecule targets (which required ligand docking) was in the same range as in prior, pre-AF2 assessments (Fig. 4a); however, encouragingly, this accuracy was achieved even for GPR139, for which no closely homologous structures were available. By contrast with this modest improvement for the small-molecule targets, the prediction accuracy for the peptide targets (traditionally much more challenging for docking) improved unexpectedly and dramatically (Fig. 4b). For two out of three peptide targets (NPY1R and NMUR2), the most accurate predictions were generated by AlphaFold-Multimer36 which allowed the complexes to be co-folded with the use of AI-informed distograms. For the third peptide target, OPRK, the peptide might have been too short for AF2-Multimer to capture its co-evolution patterns with the receptor; consequently, the best predictions were less accurate (but still by far exceeded the best peptide results from GPCR Dock 2010, Fig. 4b) and were generated by peptide modeling based on homology rather than by AF2-Multimer35.
GPCR Dock 2021 has demonstrated that the co-folding approach implemented in AF2-Multimer36 largely solves the challenges of complex geometry prediction for large natural peptides, including receptor induced fit and ligand flexibility. On the other hand, the problems remained unsolved for small molecules. Thus, the computational community has since been seeking a comparable solution that would accurately predict the ligand pose and the associated pocket conformational rearrangements for receptor complexes with small-molecule compounds.
In 2023-24, several methods were published for “end-to-end” prediction of protein complexes with small-molecule compounds and non-protein biomolecules (ions, nucleic acids), such as DiffusionProteinLigand37, NeuralPLexer38, RoseTTAFold-AllAtom39, and most recently, AlphaFold3 (AF3)40,41,42,43. The new methods rely on diffusion-based co-folding and are designed to address the prior limitation where the inaccuracy (or an inadequate conformation) of the receptor binding site impacted ligand docking. Considering the restricted Terms of Use of AF344, many academic and industry groups have strived to implement their own versions42,45, including ChaiDiscovery Chai-146, Iambic Therapeutics NeuralPLexer 2 and 3 beta47, Ligo Biosciences AlphaFold348, Baidu HelixFold349, ByteDance Protenix50, Umol51, and Boltz-152.
The accuracy of receptor-ligand complex geometry predictions varies widely between the listed co-folding methods and is also dependent on the evaluation benchmark. For example, on the PoseBusters benchmark53, the authors of both AF3 and Chai-1 report success (pocket-aligned ligand RMSD within 2 Å of the experimental structure) for ~75% of the cases, whereas the publicly available NeuralPLexer and the proprietary NeuralPLexer 3-beta are claimed to achieve ~55% and ~97% success rate, respectively54. However, on the CASP15 dataset, the reported success rates for Chai-1 and Boltz-1 are only ~35% and 43%, respectively52. Another variable aspect is the physical validity of the predictions. Diffusion-based methods often generate non-physical models violating bond length and bond angle limits and featuring inter- and intramolecular steric clashes53,55, likely as a result of overfitting to particular data subsets in the training set. Unlike in AF2 models, these non-physical violations are not always readily removed by relaxation53. Furthermore, these new AI-based methods often struggle to preserve the specified stereochemistry and protonation state of the input ligand54. Nonetheless, graph-convolutional neural networks have been shown to be successful in generating physically valid and stereo-aware low energy conformers for small molecule compounds in isolation56, suggesting that the difficulties with physical validity in protein-ligand complex predictions are technical rather than conceptual. Consistent with these concepts, our attempts to reproduce the structures of selected GPCR Dock complexes via diffusion co-folding had mixed success due to the non-physical intra-receptor and ligand-receptor interactions (Fig. 5a,c), and variable geometric accuracy of the predictions (Fig. 5b,d,e–j). More accurate predictions were generated for the D3 receptor, which has many homologous and analogous structures in the PDB, compared to GPR139, which has few or no homologous and analogous structures (Fig. 5i, j). This is in agreement with the reported bias of the tested methods towards known ligands and complexes57.
a, c The best ‘correctness’ models of the D3-eticlopride (a) and GPR139-JNJ-63533054 (c) complexes predicted by NeuralPLexer38 deviate quite substantially from the experimental structures and also feature a large number of steric clashes both within the receptor and between the receptor and the ligand. Clashes are determined using the Schrödinger suite Maestro and shown as orange and red dotted lines. Images were created with Maestro (Schrödinger release 2024-4, Schrödinger, LLC). (b, d, e–h) The best ‘correctness’ models of the D3 receptor-eticlopride (b, e, f) and GPR139-JNJ-63533054 (d, g, h) complexes predicted by Chai-146 (b, d), Boltz-152 (e, g), and AlphaFold340 (f, h) are free of gross steric inaccuracies and reasonably close to the experimental answer for most D3 predictions (b, e, f) but only some GPR139 predictions (g). The predicted ligand poses are shown in yellow sticks, the predicted protein models are shown in orange (D3 receptor, a, b, e, f) or green (GPR139, c–d, g–h) ribbons, whereas the experimental structures are in gray sticks and ribbons. Images were created with Maestro (Schrödinger release 2024-4, Schrödinger, LLC, New York, NY). i, j The distribution of geometric accuracy for the full ensemble of NeuralPLexer (cyan, 16 models per complex), Chai-1 (blue, 10 models per complex), Boltz-1 (violet, 15 models per complex), and AlphaFold3 (pink, 15 models per complex) models of D3 receptor-eticlopride (i) and GPR139-JNJ-63533054 (j) complexes. Each point represents a model or an experimental structure. The grey and yellow points represent experimental structures or GPCR Dock models (same as Fig. 3d-e), the shaded background represents the distribution of the plot parameters observed across high-resolution X-ray structures in the PDB; solid black curves represent isolines of the model ‘correctness’, measured as percentile of the experimentally observed distribution. Plots were created with ICM 3.9-3b161, https://www.molsoft.com/icm_browser.html.
Considering the enduring limitations of end-to-end AI-based complex structure predictors, physics-based methods remain essential for the prediction of GPCR complexes with small molecule ligands, including the modeling of induced fit. Such methods include ensemble docking and scoring58,59 and the ‘IFD-MD’ methodology which combines protein structure prediction, ligand-based pharmacophore docking, and rigid receptor docking with explicit solvent molecular dynamics (MD) simulations60,61. Studies also reported successful attempts at ligand pose prediction and optimization by combining traditional force-field based scoring with deep learning62, suggesting that synergy between AI and physics may be the optimal path forward.
A key lesson from GPCR Dock 2021 was not only in the advancements in computational models but also in the increased variability in the quality of experimental structures. The cryo-EM resolution revolution63 led to a surge of structures for highly dynamic and conformationally variable GPCR complexes that completely redefined our notion of “high resolution”. We found that in many cases, experimental structures of the same receptor-ligand complex are as distinct from each other (in terms of ligand RMSD and ligand-receptor contacts), or even more distinct, than they are from the best computationally generated models35. In some cases, the differences can be attributed to variations in intracellular effectors64,65,66, but in other cases, they are hardly rationalizable and may simply reflect the dynamics of the complex67,68,69 or the resolution limits of the experiment70. In any case, it appears that the accuracy of the best modern computational predictions for GPCR-ligand complex geometries is well within the accuracy of modern experimental structures. A caveat for structure-based drug discovery is that neither predicted models nor experimental structures may be directly usable in hit discovery or lead optimization without careful evaluation of their performance in respective applications. This highlights the importance of model optimization, selection, and validation, as outlined in the following sections.
Ligand discovery based on GPCR structural models
In the hit identification phase of the drug discovery process, the goal is to find compounds with novel scaffolds that have measurable potency and desired efficacy towards the target of interest. Computational approaches to this hit-finding process involve screening of a large library of small molecules either by ligand-based or structure-based methods. Ligand-based approaches use 2D or 3D pharmacophores representing known compounds, which often leads to limited variation in scaffolds and chemotypes of the identified new chemical matter, whereas structure-based virtual ligand screening (VLS) relies on one or more 3D models of the target receptor binding pocket and can often find novel and diverse chemotypes. Top-scoring molecules from the VLS campaign are triaged to a final small and diverse set of predicted active compounds (‘actives’) that are experimentally evaluated. The success rates of VLS campaigns (defined as the fraction of predicted actives that are confirmed experimentally) vary substantially depending on the 3D models used. For GPCRs, VLS campaigns using high-resolution experimental structures have reported hit rates as high as ~60%, and nanomolar potencies71,72,73,74,75,76,77,78, whereas campaigns using computationally predicted structural models have traditionally shown lower hit rates and weaker affinities79,80,81.
A widely accepted approach to structure/model selection for prospective VLS, and to gauging the expected success rate, is to evaluate its performance in retrospective VLS where the model is challenged with discriminating a small number of known actives (agonists or antagonists) from a much larger set of decoys, consisting of unrelated compounds with similar physicochemical properties (size, charge, flexibility) but dissimilar structures to the actives82,83,84,85,86. The Database of Useful Decoys: Enhanced (DUDe)87 is often used to generate property-matched decoy sets for given sets of active compounds. The VLS performance of the model, reflective of its ability to score known actives better than the decoys, is not only considered predictive of the success rate in prospective screens but also has been reported to correlate with the ability of the model to dock known actives in a geometrically correct pose88. A caveat to this approach is a possible bias of a retrospectively validated model towards similar compound scaffolds and chemotypes in prospective VLS88, which motivates occasional inclusion of new, distinct, and non-retrospectively validated models in hit discovery campaigns89.
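Retrospective VLS performance is commonly summarized by the area under the ROC curve (ROC AUC), which equals the probability that a randomly chosen active is ranked ahead of a randomly chosen decoy. The sketch below computes it directly from that definition. The function name and the convention that lower (more negative) scores are better are assumptions made here; many docking programs follow that convention, but it should be checked for the scoring function in use.

```python
def roc_auc(active_scores, decoy_scores, lower_is_better=True):
    """ROC AUC as the fraction of (active, decoy) pairs ranked correctly.

    Ties count as half a correct pair. The pairwise loop is quadratic,
    which is fine for typical retrospective sets (tens of actives,
    thousands of decoys).
    """
    if not active_scores or not decoy_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a == d:
                wins += 0.5
            elif (a < d) == lower_is_better:
                wins += 1.0
    return wins / (len(active_scores) * len(decoy_scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation; the percentages quoted in this review are the same quantity multiplied by 100.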
Given the improved geometrical accuracy of GPCR-ligand complexes predicted with AF2, there is an expectation that such models would also have improved success rates in structure-based hit discovery campaigns. However, studies reported mixed results. Diaz-Rovira et al. established that unrefined AF2 models are not optimal for VLS because they often represent the apo state of the receptor with a collapsed binding site90. Zhang et al. subsequently showed that VLS performance can be considerably improved with refined compared to unrefined AF2 models91. On the other hand, Lyu et al. compared the performance of a large-scale VLS using an unrefined AF2 model and an experimental structure for the 5-HT2A receptor, and found that despite poor retrospective VLS performance and notable side chain rotamer changes in the binding site, the AF2 model was still quite effective in identifying novel ligands prospectively89. Surprisingly, the hit rates were comparable between the experimental structure and the AF2 model, even though there was no overlap in scaffold between the hit sets. In another study, Diaz-Holguin et al. compared the VLS performance of an AF2 model and a traditional homology model for the TA1 trace amine receptor, and found that the hit rate was two-fold higher with the AF2 model than the homology model92. Altogether, these studies support the utility of AI-generated GPCR structural models for hit discovery efforts for select targets, while emphasizing the role of model refinement and performance assessment.
An integral but rarely explicitly acknowledged component of the VLS process is the scoring function used to rank the compounds in the screening library93. Over the years, numerous scoring functions based on physics-informed force fields have been developed94,95,96,97,98,99. These scoring functions take into account physical parameters such as shape complementarity (van der Waals interactions), charge complementarity, hydrogen bonding, and conformational strain. Force-field-based scoring functions have performed well in VLS when using high-resolution experimental structures of the target receptor93, but their performance rapidly degrades with minimal/subtle pocket conformational inaccuracies88. In keeping with the AI revolution, a number of deep-learning-based scoring functions have been recently introduced, such as EquiScore100, AtomNet101,102, RTMScore103, RTCNN104,105, IGModel106, and RosettaVS107. Respective studies almost invariably report superior VLS performance of such functions on various benchmarks, compared to physics-based methods, due to higher tolerance to conformational variation and inaccuracies; however, the functions also frequently suffer from overtraining and inability to penalize non-physical interactions (e.g. assess compound conformational strain and steric clashes)88. In the GPCR drug discovery realm, the application of such functions is in its early days, and their prospective evaluation is still pending; however, retrospective studies hint at the possibility that they may compensate for minor geometric inaccuracies in the computational models and thus overcome some of the challenges in model-based hit discovery88,106.
To illustrate a broader relationship between the ‘correctness’ (geometric accuracy) of computationally predicted models and their VLS performance when assessed with an AI-based scoring function, RTCNN88,104,105, we evaluated experimental structures and representative models across a range of accuracy levels from GPCR Dock assessments 2010 and 2021, for the dopamine D3 receptor and GPR139, respectively (Fig. 3d, e). For the former, model correctness was measured in comparison with chain B of a 2.9Å X-ray structure, PDB 3pbl108; for the latter, in comparison with a 3.22 Å cryo-EM structure, PDB 7vuh64. The models in the 2010 assessment were generated by homology with existing structures of aminergic receptors (only β2-adrenoceptor and β1-adrenoceptor at that time), whereas the 2021 assessment GPR139 models were almost exclusively built by AF2, which is understandable considering the lack of homologous structures in the PDB at the time35.
For the D3 receptor, as expected, at least one of the experimental structure chains (B) and the highest-accuracy assessment models (e.g. 8004-1 and 5084-3) showed robust discrimination of 14 active compounds from the Astra series (mostly eticlopride analogs, Supplemental Data 1) from 1505 property-matched decoys from DUDe by RTCNN (Fig. 6a, b). Unexpectedly, for GPR139, almost all tested models showed excellent discrimination (ROC AUC >75-80%; Fig. 6c, d). A closer inspection revealed that known GPR139 agonists share a very tight SAR and an invariable di-peptide linkage as the key scaffold (Supplemental Data 2), recognized by the polar residues in the binding pocket of not only accurate but also geometrically incorrect computational models. Because these features were entirely absent from the DUDe-generated decoys (selected so that their Tanimoto distance (TD) from the known actives exceeded 0.5), the retrospective VLS task turned out ‘too easy’ in the case of GPR139. When the DUDe-generated decoy set was replaced by a set of ChEMBL compounds at 0.1 < TD < 0.5 without known activity at GPR139, the retrospective VLS task became more challenging, leading to overall deterioration of the structure and model VLS performance but also to more pronounced differences among experimental structures, most accurate models, and less accurate models (Fig. 6e, f). These examples demonstrate that the relationship between model correctness and its predictive capacity in VLS is not straightforward and may require the use of carefully selected target-specific compound benchmarks. They also show that experimental structures (especially relatively low resolution cryo-EM structures) can widely vary in their VLS predictive capacity and that, conversely, some models may turn out quite predictive despite the apparently low geometric accuracy.
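The decoy selection by Tanimoto distance can be illustrated as follows. This is a sketch under stated assumptions, not the procedure used to build the benchmark sets above: fingerprints are represented as Python sets of on-bit indices (any fingerprint type could be substituted), and the distance of a candidate "from the known actives" is taken to be its distance to the nearest active, which is one plausible reading of the criterion. All names are hypothetical.

```python
def tanimoto_distance(fp_a, fp_b):
    """Tanimoto distance, 1 - |A ∩ B| / |A ∪ B|, for two sets of on-bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(fp_a & fp_b) / union

def select_by_distance(candidates, actives, low, high):
    """Names of candidates whose distance to the nearest active is in (low, high).

    candidates: dict mapping compound name -> fingerprint (set of ints)
    actives:    list of fingerprints of the known actives
    """
    selected = []
    for name, fingerprint in candidates.items():
        nearest = min(tanimoto_distance(fingerprint, a) for a in actives)
        if low < nearest < high:
            selected.append(name)
    return selected
```

With `low=0.5, high=float("inf")` the filter mimics the dissimilar, DUDe-like regime, whereas `low=0.1, high=0.5` keeps compounds that resemble the actives without being near-duplicates, which is what makes the retrospective task harder.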
a, c, e Scatter plots of retrospective VLS ROC AUC values (x-axis) against ‘correctness’ (y-axis) for 23 selected GPCR Dock 2010 models and two chains of the X-ray structure of the dopamine D3 receptor-eticlopride complex (a), as well as for 10 selected GPCR Dock 2021 models and six cryo-EM structures of the GPR139-JNJ-63533054 complex (c, e). The ROC AUC values in (c) and (e) were obtained using the same set of active compounds but different sets of decoys/inactives: the DUDe-generated set of compounds that are highly chemically dissimilar from actives (Tanimoto distance (TD) > 0.5, c) vs a comparable size set of ChEMBL compounds that are more chemically similar (0.1 < TD < 0.5) to the actives but have no known activity at GPR139 (e). b, d, f Retrospective VLS ROC curves for selected experimental structures (brighter color), more accurate models (lighter color), and less accurate models (black) from (a, c, e). The figure was created using R 4.4.1159 and ggplot2 3.5.0160.
Beyond structure prediction and scoring of predicted receptor-ligand complexes, AI is being increasingly used to address the growing need for screening ultra-large (billion-size) virtual and combinatorial compound databases85,109, via AI-accelerated virtual screening platforms such as MolPAL110, Active Learning Glide111, DeepDock112,113, OpenVS107, GigaScreen114, and CP framework115. At a high level, this approach involves docking and scoring of a small fraction of the database and subsequent training of a target-specific neural net (NN) to predict compound binding scores from 2D structure alone. The NN is then applied to the entire database (which is orders of magnitude faster than evaluating the same compounds by docking), a small fraction of top-scoring compounds is re-evaluated by docking and scoring, and the process is repeated until convergence is reached or until the NN starts showing signs of overtraining. The success of such AI-accelerated pipelines in retrospective and prospective ligand discovery for the dopamine D4 receptor111 and voltage-gated sodium channel NaV1.7107, respectively, suggests their promise for membrane proteins; however, we have yet to see their prospective application to GPCRs.
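The iterative dock, train, predict, and re-dock cycle described above can be sketched as follows. This is a toy illustration and not the implementation of any of the cited platforms: the neural net is replaced by a simple k-nearest-neighbour surrogate so that the example needs only the standard library, `dock` is a hypothetical stand-in for an expensive docking call, compounds are represented by numeric feature tuples, and the stopping rule (no improvement of the best docked score in a round) is one assumed notion of convergence.

```python
import random
from typing import Callable, Dict, Sequence, Tuple

Features = Tuple[float, ...]


def knn_predict(x: Features, docked: Dict[int, float],
                library: Sequence[Features], k: int = 3) -> float:
    """Surrogate score: mean docking score of the k nearest already-docked compounds."""
    nearest = sorted(
        docked,
        key=lambda i: sum((a - b) ** 2 for a, b in zip(x, library[i])),
    )[:k]
    return sum(docked[i] for i in nearest) / len(nearest)


def accelerated_screen(library: Sequence[Features],
                       dock: Callable[[Features], float],
                       n_init: int = 20, batch: int = 20,
                       max_rounds: int = 10, seed: int = 0) -> Dict[int, float]:
    """Dock a random seed set, then repeatedly dock the surrogate's top picks.

    Returns a mapping from library index to docking score (lower is better).
    """
    rng = random.Random(seed)
    docked = {i: dock(library[i])
              for i in rng.sample(range(len(library)), n_init)}
    for _ in range(max_rounds):
        # Cheap step: score every not-yet-docked compound with the surrogate.
        predicted = {i: knn_predict(library[i], docked, library)
                     for i in range(len(library)) if i not in docked}
        if not predicted:
            break
        best_before = min(docked.values())
        # Expensive step: dock only the top-ranked fraction.
        for i in sorted(predicted, key=predicted.get)[:batch]:
            docked[i] = dock(library[i])
        if min(docked.values()) >= best_before:
            break  # assumed convergence criterion: no better compound found
    return docked
```

In this sketch the surrogate is implicitly retrained each round because its predictions always use the full set of docked compounds; a real NN would need an explicit retraining step and a held-out set to detect overtraining.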
Finally, there are growing reports of using generative AI for ligand discovery, which circumvents the challenges of library screening and complements structure-based VLS in hit discovery applications116,117,118,119,120,121,122,123,124. For example, Powers et al. reported using an SE(3)-equivariant graph neural network to directly build compounds in the binding pockets125. Although tested only on experimental structures, the method was demonstrated to produce better-scoring ligands than conventional VLS for a wide range of targets including GPCRs, and also generated compounds with better drug-likeness and predicted PK properties125. Similarly, using the dopamine D2 receptor as their case study, Thomas et al. showed that the generated molecules not only have better predicted affinity but also occupy a different physicochemical space compared to known D2 actives126. In-depth review of the use of generative AI for ligand discovery is outside of the scope of the present review and is described elsewhere127,128.
In summary, AI-based structural models of GPCRs can be as effective as experimental structures in enabling hit discovery campaigns, especially when combined with AI-powered scoring functions. While selecting structural models for prospective VLS is challenging, certain best practices can increase the chances of success, like using an ensemble docking approach and including models that perform well in retrospective VLS on appropriate benchmarks (Fig. 7). Furthermore, as the accuracy of the recent experimental (mostly cryo-EM) GPCR structures has become more variable, their use for hit discovery may require the same level of refinement and selection as for computationally generated models.
In Step 1, models of receptor complexes with a small number of diverse ligands with relevant pharmacology (e.g. orthosteric antagonists) are generated by docking, AI-based co-folding, MD, or other methods. The models are then conformationally refined to ensure physicochemical validity and favorable scoring by the selected scoring function. In Step 2, the models are assessed in retrospective VLS, initially on chemotype-specific benchmarks, after which small ensembles are formed from well-performing models and tested for their ability to discriminate actives from inactive and decoy compounds in a diverse and preferably challenging benchmark. It is important to test multiple models to gauge the range of VLS performances; if unrealistically high performance is observed for all or most models (as in Fig. 6d), the compound benchmark needs to be revised. Conformational refinement steps may be repeated for obtaining a new/broader ensemble if none of the tested models shows satisfactory performance. Models with poor or mediocre performance in retrospective VLS can still be included based on orthogonal criteria or rationale, e.g. to diversify potential hits. At the end, a small ensemble is formed and used for prospective hit discovery by VLS.
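The ensemble-forming part of Step 2 can be illustrated with a short sketch. It rests on two assumptions that are not prescribed by the workflow itself: a compound's ensemble score is taken as its best (lowest) score over the member models, and the ensemble is grown greedily, adding a model only when it improves the retrospective ROC AUC. All names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def auc(scores: Sequence[float], is_active: Sequence[bool]) -> float:
    """ROC AUC for lower-is-better scores (ties between active and decoy count 0.5)."""
    act = [s for s, a in zip(scores, is_active) if a]
    dec = [s for s, a in zip(scores, is_active) if not a]
    wins = sum(1.0 if a < d else 0.5 if a == d else 0.0 for a in act for d in dec)
    return wins / (len(act) * len(dec))


def ensemble_scores(per_model: Dict[str, Sequence[float]],
                    members: Sequence[str]) -> List[float]:
    """Best (lowest) score of each compound across the chosen models."""
    n = len(per_model[members[0]])
    return [min(per_model[m][i] for m in members) for i in range(n)]


def greedy_ensemble(per_model: Dict[str, Sequence[float]],
                    is_active: Sequence[bool],
                    max_size: int = 3) -> Tuple[List[str], float]:
    """Add models one at a time while the retrospective AUC keeps improving."""
    chosen: List[str] = []
    best = 0.0
    while len(chosen) < max_size:
        candidates = [m for m in per_model if m not in chosen]
        if not candidates:
            break
        scored = [(auc(ensemble_scores(per_model, chosen + [m]), is_active), m)
                  for m in candidates]
        top_auc, top_model = max(scored, key=lambda t: t[0])
        if top_auc <= best:
            break  # no remaining model improves the ensemble
        chosen.append(top_model)
        best = top_auc
    return chosen, best
```

Because the selection is driven by the same benchmark that is used to report performance, the resulting AUC is optimistic; as noted in the legend, models may also be retained on orthogonal grounds that such a purely score-driven procedure does not capture.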
Hit-to-lead and lead optimization based on GPCR structural models
In the hit-to-lead (H2L) and lead optimization (LO) phases of the drug discovery process, the goal is to improve the drug-like properties (such as on-target potency, selectivity over off-targets, and pharmacokinetic properties) of the hit or lead compound by exploring modifications around its scaffold. This can be guided by computational prediction of potency or binding affinity of hit/lead analogs, which is often done by building and assessing the structural models of their complexes with the target. The underlying models must have sufficient sensitivity and accuracy to discriminate between highly similar ligands with small differences in potency, sometimes below one log unit129. Arguably, this is a more challenging task than discriminating binders from non-binders in the hit discovery phase; for example, even the best AI-powered VLS scoring functions could not accurately rank order the binding affinity of actives in the same chemical series88,93.
Alchemical free-energy perturbation (FEP) is a well-validated physics-based computational method for structure-based prediction of binding free energies130,131. In relative binding FEP (RB-FEP) calculations, changes in binding free energy upon small modification of the ligand are calculated by simulating the corresponding alchemical transformation of one molecule into the other, both in solvent and in the binding pocket131, and by MD-based conformational sampling of the complex at different stages of the alchemical transformation path. Implementations of the free-energy perturbation method include GROMACS132,133, YANK134, and Schrödinger FEP+135. Retrospective benchmarking of FEP+ using crystal structures of protein-ligand complexes across multiple target classes reported RB-FEP accuracy close to experimental reproducibility, with an average mean unsigned error (MUE) of 0.90–0.98 kcal/mol, root mean square error (RMSE) of 1.11–1.25 kcal/mol, and correlation coefficient (R2) of 0.37–0.84 (see Fig. 8a for an example)136,137. These error ranges correspond to only 0.66–0.72 and 0.73–0.92 log units of binding affinity, respectively, suggesting the promise of the FEP+ methodology for structure-based lead optimization138. It should be noted, however, that in prospective applications, the reported error of FEP with respect to experiment is typically higher139. Cited reasons include pose uncertainties and unaccounted-for pocket induced-fit effects with some of the new compounds, as well as force field limitations for certain types of compound chemistry139.
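For readers less familiar with these accuracy measures, the sketch below shows how errors in kcal/mol map to log units of affinity and how MUE, RMSE and R2 are computed from paired predictions and measurements. It assumes a temperature of 298.15 K (not stated above) and takes R2 to be the squared Pearson correlation; the function names are illustrative only.

```python
import math
from typing import Sequence

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kcal_per_log_unit(temperature_k: float = 298.15) -> float:
    """Free-energy change equivalent to a tenfold change in affinity: RT ln(10)."""
    return R_KCAL * temperature_k * math.log(10.0)


def kcal_to_log_units(error_kcal: float, temperature_k: float = 298.15) -> float:
    """Express a free-energy error as log10 units of binding affinity."""
    return error_kcal / kcal_per_log_unit(temperature_k)


def mue(pred: Sequence[float], expt: Sequence[float]) -> float:
    """Mean unsigned error."""
    return sum(abs(p - e) for p, e in zip(pred, expt)) / len(pred)


def rmse(pred: Sequence[float], expt: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, expt)) / len(pred))


def r_squared(pred: Sequence[float], expt: Sequence[float]) -> float:
    """Squared Pearson correlation (undefined if either series is constant)."""
    n = len(pred)
    mean_p, mean_e = sum(pred) / n, sum(expt) / n
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(pred, expt))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_e = sum((e - mean_e) ** 2 for e in expt)
    return cov * cov / (var_p * var_e)
```

Under the stated temperature assumption, one log unit corresponds to about 1.36 kcal/mol, so that `kcal_to_log_units(0.90)` and `kcal_to_log_units(0.98)` give approximately 0.66 and 0.72, in line with the MUE range quoted above.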
a Correlation plot of FEP+ predicted vs experimental binding affinities for the D3 receptor-eticlopride system. Each point on the plot represents one ligand from the eticlopride-related congeneric series. The gray bars represent the regions of high accuracy (error below 1 or 2 kcal/mol for dark and light gray, respectively). b Change in RB-FEP predictive accuracy for models with a range of ligand RMSD, from a previous study (non-GPCR models154). Each point corresponds to a pairwise comparison between a receptor-ligand model and its corresponding experimental structure. The change in RMSE is represented on the left y-axis (blue circles), and the change in R2 on the right y-axis (red diamonds). In both cases, the difference has been calculated so that the points that are below the x-axis correspond to an improved accuracy of the model relative to the experimental structure (i.e. ΔRMSE = RMSEModel − RMSEExperimental, while ΔR2 = R2Experimental − R2Model). c, d RMSE for RB-FEP predictions as a function of ligand RMSD (left) or receptor pocket RMSD (right), for the D3-eticlopride models (c) and the GPR139-JNJ-63533054 models (d). Each point is a complex structure. Experimental structures are shown as green dots and computational models as blue and yellow dots (yellow for the putative false positive models for GPR139 as described in the text). The shaded bars indicate the regions of high accuracy of RB-FEP predictions (RMSE below 1 kcal/mol or 1.5 kcal/mol for dark and light gray, respectively). e, f Histograms of -log10 activity for the eticlopride (e) and JNJ-63533054 (f) congeneric series used in RB-FEP calculations for D3 receptor and GPR139, respectively. g, h RB-FEP accuracy plotted against AB-FEP prediction for the D3 receptor-eticlopride models (g) and GPR139-JNJ-63533054 models (h). The color coding of the dots and bars is the same as in (c, d). The dashed line indicates the experimental ΔGbinding for eticlopride to D3 receptor (g) or JNJ-63533054 to GPR139 (h).
Recently, several groups have attempted to tackle the task of relative binding energy prediction using machine learning140,141,142,143,144. However, so far, these efforts have had limited success. The predictions tend to be relatively inaccurate, with MUE of log10 potency prediction exceeding 1 in at least half of the cases145, and are poorly generalizable146,147. AI-based models that directly integrate physical knowledge (e.g. PBCNet148 or IGModel106) may have a higher prediction accuracy and also allow for better interpretability. However, even for these models, the accuracy is markedly lower than for FEP+148, possibly because they disregard the pocket conformational changes upon compound analog binding. Consequently, in the field of binding affinity prediction, physics-based approaches remain the standard.
Success of FEP calculations is tightly coupled to the accuracy of the initial protein-ligand complex structure and its ability to capture relevant interactions and their energetic contributions during simulation. GPCR-specific complexities include the dynamics of extracellular loops and the uncertainties of water positioning and displacement in the orthosteric pocket149. Nevertheless, FEP+ calculations were shown to be accurate, both retrospectively and prospectively, when applied to experimental structures of several Class A GPCRs: adenosine A2A receptor, β1-adrenoceptor, CXCR4, δ-opioid receptor, and OX2 orexin receptor150,151. Furthermore, proof-of-concept studies showed that accurate FEP+ predictions can also be obtained using homology models refined by IFD-MD or MD simulation: for the adenosine A2A receptor, the FEP+ performance on such a model was slightly degraded compared to that on experimental structure, but still predictive (MUE = 1.50 vs 0.81 kcal/mol, R2 of 0.47 vs 0.39)152, while for the mosquito neuropeptide Y-like receptor 7 (NPYLR7), FEP+ calculations showed retrospective MUE of 1.29 kcal/mol and R2 of 0.57, and enabled prospective improvement of the functional efficacy of the lead compound153. Similarly, Xu et al. demonstrated the applicability of FEP+ to IFD-MD-generated models for 14 diverse protein targets (although no GPCRs) (Fig. 8b)154. Recently, Coskun et al. extended the proof-of-concept to IFD-MD-refined AF2 models, for the somatostatin SST4 receptor (SSTR4)61. On a set of 64 congeneric ligands with a wide range (>3 log units) of activities, their final receptor-ligand complex model showed high accuracy (pairwise RMSE of 1.00 kcal/mol, R2 of 0.54). These successes indicate that FEP+ possesses a degree of tolerance to structural inaccuracies, likely via its sampling of the target and ligand conformational space by MD, which makes it applicable to computational models built by both conventional homology and AI-based methods.
An important lesson of the study by Xu et al. is that computationally generated models with relatively large geometric deviations from experimental structures can still be predictive; for example, one model with ligand RMSD of 3.1 Å relative to the experimental structure achieved FEP+ RMSE of 1.08 kcal/mol and R2 of 0.63154. Moreover, for 9 out of the 18 models, the accuracy of FEP+ predictions was either similar to or better than for the corresponding experimental structures. That said, for models with ligand RMSD > 2.5 Å, accuracy, as measured by RMSE or by R2, was generally degraded compared to experimental structures, whereas for models with ligand RMSD < 2.5 Å, it was either improved or reduced, without any obvious trend (Fig. 8b). The authors also emphasized the risk of using insufficient or inadequate retrospective activity datasets in model validation by RB-FEP, whereby the use of sparse, unevenly distributed, or narrow-range activity data may lead to a misleadingly high retrospective RB-FEP accuracy even for models that are too geometrically incorrect to allow prospective success154.
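The ligand RMSD used here as a measure of pose accuracy can be computed as in the following minimal sketch. It assumes that the model and the experimental complex have already been superimposed on the receptor, that both ligand coordinate lists contain the same heavy atoms in the same order, and that no correction for symmetry-equivalent atoms is needed; these assumptions are ours and may differ from the protocol of the cited study.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def ligand_rmsd(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Heavy-atom RMSD (in the units of the input, typically Å) without refitting."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((a - b) ** 2
                  for p, q in zip(model, reference)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(model))
```

Because the ligand is not refitted onto the reference, the value reflects both the displacement of the ligand in the pocket and differences in its conformation.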
In addition to RB-FEP, absolute binding FEP (AB-FEP) can be used to assess the models for prospective use in ligand optimization155. In contrast to RB-FEP where only the modified region between two ligands is sampled alchemically, AB-FEP alchemically couples/decouples a single ligand in its entirety, simulating its complete appearance/disappearance in the binding pocket and the solvent. AB-FEP therefore does not require a congeneric compound series or an activity dataset. When starting from an accurate model, AB-FEP is expected to closely predict the experimental binding energy of the ligand, or possibly generate a more negative number, with the difference interpreted as protein reorganization energy between the apo and the ligand-bound state155. Models with AB-FEP prediction significantly less favorable than the experimental binding affinity of the compound are likely incorrect. Therefore, when more than one model with similarly high RB-FEP performance is available, AB-FEP can help prioritize models for prospective applications61.
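This triage rule can be written down as a small sketch. It assumes that the experimental binding free energy is derived from a dissociation-type constant at 298.15 K, and it uses a hypothetical tolerance of 2 kcal/mol to decide what counts as "significantly less favorable"; neither the tolerance nor the example values come from the studies cited above.

```python
import math
from typing import Dict, List, Tuple

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(affinity_molar: float, temperature_k: float = 298.15) -> float:
    """Experimental ΔG = RT ln(K) for a dissociation-type constant in mol/L."""
    return R_KCAL * temperature_k * math.log(affinity_molar)


def triage_by_ab_fep(predicted_dg: Dict[str, float],
                     experimental_dg: float,
                     tolerance: float = 2.0) -> Tuple[List[str], List[str]]:
    """Split models into (retained, suspect) by comparing AB-FEP ΔG with experiment.

    A model is flagged only when its prediction is less favorable than the
    experimental value by more than `tolerance` kcal/mol; more negative
    predictions are retained, consistent with their interpretation as
    including protein reorganization energy.
    """
    retained, suspect = [], []
    for name, dg in predicted_dg.items():
        (suspect if dg - experimental_dg > tolerance else retained).append(name)
    return retained, suspect


# Illustrative values only: a hypothetical 10 nM ligand and three hypothetical models.
dg_exp = binding_free_energy(10e-9)
kept, flagged = triage_by_ab_fep(
    {"model_a": -13.2, "model_b": -6.3, "model_c": -10.1}, dg_exp
)
```

When only a functional potency such as EC50 is available, using it in place of a dissociation constant is an additional approximation, and the tolerance should be chosen with that uncertainty in mind.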
To further investigate the relationship between complex geometry and FEP predictive accuracy for experimental structures, homology models, and AF2 models of GPCRs, we applied both RB-FEP and AB-FEP (FEP+135) to 21 GPCR Dock 2010 models of eticlopride-bound D3 receptor and 11 GPCR Dock 2021 models of JNJ-63533054-bound GPR139, along with two X-ray (for D3) and six cryo-EM (for GPR139) structures of the respective complexes (Figs. 3d, e and 6a, c, e). As described previously, these two sets of models were selected to span a large range of geometric ‘correctness’ relative to the known experimental structures. For the D3 receptor, relative binding free energies were calculated for a series of 11 eticlopride analogs with potencies ranging from 0.1 nM to 426 nM108,156, whereas for GPR139, the study involved 21 Janssen glycine benzamides (same series as JNJ-63533054) with EC50 values from 24 nM to 1700 nM157.
The results for the two receptors and their respective sets of models were in sharp contrast. For the D3-eticlopride set, the two X-ray structures (chains A and B of PDB 3pbl) showed good retrospective accuracy in RB-FEP (RMSE = 1.3 kcal/mol and R2 = 0.66, Fig. 8c), but the accuracy degraded rapidly and significantly in the models (RMSE ≥ 2.4 kcal/mol and R2 < 0.52), even those with relatively close positioning of the ligand and globally similar pocket conformations (Fig. 8c). The most geometrically accurate of the tested models (model 5084-3: ligand RMSD of 1.2 Å and receptor pocket RMSD of only 1.6 Å) showed RB-FEP RMSE as high as 2.8 kcal/mol and an R2 of only 0.09 (Fig. 8c). This illustrates that geometric accuracy is not directly correlated with RB-FEP performance, and that small structural details can significantly impact FEP accuracy, hence justifying the need for additional refinement to improve performance as in the SSTR4 study61.
In contrast to the D3-eticlopride models, all eleven computationally generated models of the GPR139-JNJ-63533054 complex yielded reasonably accurate retrospective RB-FEP predictions that were similar to the six cryo-EM structures, as measured by RMSE (1.18–1.42 kcal/mol, Fig. 8d). This was in spite of a wide range of geometric correctness of the models (ligand RMSD of 2–9.5 Å and receptor pocket RMSD of 2–6.9 Å compared to the most predictive cryo-EM structure, PDB 7vuj). Notably, four models with poses completely different from the cryo-EM structures (ligand RMSD > 8 Å) still showed relatively accurate RB-FEP predictions, likely exemplifying false positive models and illustrating the potential risks of RB-FEP retrospective assessment. Consistent with the cautionary note from Xu et al.154, this outcome could have been anticipated given the narrow activity range and sparseness of the activity dataset used (EC50 between 24 nM and 440 nM, only 1.3 log units, for 18 out of the 21 ligands, Fig. 8f and Supplemental Data 2). This narrow and sparse activity distribution, and the uncertainty of the retrospective RB-FEP validation of the models, were also reflected by the poor correlation coefficients observed for all models (R2 from 0 to 0.17 for all complex structures assessed, including the cryo-EM structures). By contrast, the retrospective dataset for D3-eticlopride had better distributed activities covering a broader range (EC50 from 0.1 nM to 440 nM, or 3.6 log units, Fig. 8e and Supplemental Data 1). AB-FEP calculations showed that two of the GPR139 RB-FEP false positive models had very poor predicted ΔGbinding for JNJ-63533054 (−6.3 and −6.6 kcal/mol compared to −13.2 kcal/mol for the best cryo-EM structure, Fig. 8h), supporting the complementary role of AB-FEP in validating receptor-ligand complex models.
However, for the two remaining geometrically incorrect models, AB-FEP-predicted ΔGbinding values were close to the experimentally measured potency and only slightly weaker than ΔGbinding for the cryo-EM structures. In such cases, recognizing the limitation of the available ligand activity dataset should invite caution and encourage gathering of more data for more reliable validation.
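The effect of a narrow activity range on retrospective RMSE can be made concrete with a simple baseline: a 'null model' that assigns the same binding free energy (the series mean) to every ligand and therefore carries no structural information. Its RMSE equals the standard deviation of the experimental values. The sketch below assumes 298.15 K and uses hypothetical, evenly spaced -log10 activities spanning 1.3 and 3.6 log units; it does not use the actual compound data.

```python
import math
import statistics
from typing import Sequence

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def null_model_rmse(p_activities: Sequence[float],
                    temperature_k: float = 298.15) -> float:
    """RMSE (kcal/mol) of predicting the series mean for every ligand.

    `p_activities` are -log10 activities (e.g. pEC50 or pKi); the population
    standard deviation is converted to kcal/mol with RT ln(10).
    """
    kcal_per_log = R_KCAL * temperature_k * math.log(10.0)
    return statistics.pstdev(p_activities) * kcal_per_log


# Hypothetical series: 18 ligands over 1.3 log units, 11 ligands over 3.6 log units.
narrow_series = [6.4 + 1.3 * i / 17 for i in range(18)]
wide_series = [6.4 + 3.6 * i / 10 for i in range(11)]
narrow_baseline = null_model_rmse(narrow_series)
wide_baseline = null_model_rmse(wide_series)
```

For the narrow series the baseline is about 0.5 kcal/mol, that is, below the 1 kcal/mol level often regarded as accurate, whereas for the wide series it is about 1.6 kcal/mol. Comparing a model's RB-FEP RMSE with this baseline, and inspecting R2 alongside RMSE, helps reveal when a low RMSE reflects the dataset and not the model.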
In summary, FEP can achieve accurate predictions and guide potency optimization within ligand series not only for high-resolution experimental structures but also for some computationally predicted models of GPCRs. That said, there are inherent challenges and caveats that we emphasized in this section. A notable limitation is the amount of SAR known for the series of interest, as illustrated with the GPR139 study above. However, even in cases where the available SAR is insufficient to identify a single model for prospective use, it can help prioritize several models that can further be filtered as more SAR data is accumulated. In Fig. 9 and Table 1, we have gathered best practices and lessons learned, both from the literature and from our experience in real-world drug discovery projects, to address the above challenges and to generate and validate predictive models for FEP. We share a general workflow that uses an ensemble and trial-and-error approach for model selection and validation, to account for the unpredictable effects of small structural details on FEP accuracy.
In the first step (1), an ensemble of receptor-ligand structural models is built by combining multiple receptor models (including available experimental structures as well as relevant models from homology modeling or AI-based methods like AF2) with induced-fit docking on each of these starting receptor conformations. In the second step (2), each receptor-ligand complex model is assessed for its ability to retrospectively predict the SAR around the ligand of interest with RB-FEP (grey arrows). For efficiency, this can be done in a stepwise manner where a smaller, select subset of the SAR data available is used for the rapid initial assessment of all the models, and only the models showing promising accuracy at that stage are further validated on a larger dataset. If the SAR data allows, the results can also be broken down by region of ligand modification, to provide more granular insights in terms of which region of each model is predictive or not with RB-FEP (small colored squares). Oftentimes, when several models show good RB-FEP accuracy, they include variations around the same model, which can be clarified by clustering the best models based on structural similarity. In such cases, local structural refinements as well as FEP parameter optimization typically allow convergence to an optimal model for prospective FEP usage. If the available retrospective activity data is insufficient, though, it is possible that several models with significant differences show reasonable RB-FEP predictions at that stage. In such cases, the best models need to be further tested in order to discriminate those to be used prospectively from potential false positives (which will not hold prospectively, by definition). AB-FEP, as well as the extension of the dataset used for RB-FEP validation (potentially including non-binders in the same series), are typical ways to discriminate between top models.
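The stepwise assessment in step 2 amounts to a two-stage filter, sketched below. Here `evaluate` is a hypothetical callable that stands in for running RB-FEP on a model with a given ligand set and returning the RMSE in kcal/mol, and both cutoffs are illustrative assumptions, not recommended values.

```python
from typing import Callable, Dict, Hashable, Sequence


def staged_validation(models: Sequence[str],
                      evaluate: Callable[[str, Hashable], float],
                      subset: Hashable,
                      full_set: Hashable,
                      screen_cutoff: float = 1.5,
                      final_cutoff: float = 1.5) -> Dict[str, float]:
    """Screen all models on a small ligand subset, then validate survivors in full.

    Returns the full-set RMSE of every model that passes both stages.
    """
    # Stage 1: rapid assessment of every model on the small subset.
    survivors = [m for m in models if evaluate(m, subset) <= screen_cutoff]
    # Stage 2: the expensive full-dataset validation, only for promising models.
    validated: Dict[str, float] = {}
    for m in survivors:
        full_rmse = evaluate(m, full_set)
        if full_rmse <= final_cutoff:
            validated[m] = full_rmse
    return validated
```

A model that passes both stages is a candidate and not yet a validated model: as discussed in the text, further discrimination by AB-FEP or by an extended dataset is still needed when the activity data are narrow or sparse.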
Conclusion
The AI revolution has opened new avenues for structure-based drug discovery for GPCRs by dramatically increasing the availability of high-quality structural models. The AI-based co-folding methods also have the potential to solve the decades-long challenge in modeling the induced-fit effect and to deliver accurate structural models for receptor complexes with small molecules and peptides, a critical prerequisite for most drug discovery applications. However, these models still show a wide range of geometric accuracy and physical validity. On the other hand, advances in physics-based computational chemistry, including new induced-fit docking methodologies and highly accurate FEP implementations, have led to notable successes in application to GPCRs, and AI-generated structural models have considerably extended the domain of applicability of these methods. Importantly, the geometric accuracy of computational models is not always correlated with their performance in hit discovery and ligand optimization applications. Predictive models can be identified via retrospective assessments in respective applications using adequate benchmarks. Beyond structure prediction, scoring functions and VLS acceleration by score extrapolation exemplify current and future areas of method development at the interface between AI and physics, poised for application in GPCR drug discovery.
Data availability
No datasets were generated in this study. The sets of receptor structural models analyzed in Figs. 2, 3, 4, 5, 6, 8 originate from studies cited in respective figure legends and are published as part of those studies. The sets of chemical compounds used for the analyses in Figs. 6 and 8 originate from studies cited in the text as well as the ChEMBL database (release ChEMBL34, March 2024); the final compound sets for D3 receptor and GPR139 are provided as Supplemental Data 1 and 2.
Abbreviations
- GPCR: G protein-coupled receptors
- AI: artificial intelligence
- SBDD: structure-based drug discovery/design
- PDB: protein data bank
- TM: transmembrane
- AF2: AlphaFold2
- AF3: AlphaFold3
- pLDDT: predicted local distance difference test
- RMSD: root mean square deviation
- ECL: extracellular loop
- ICL: intracellular loop
- SAR: structure-activity relationship
- MD: molecular dynamics
- IFD: induced-fit docking
- VLS: virtual ligand/library screening
- TD: Tanimoto distance
- ROC: receiver-operating characteristic curve
- AUC: area under the curve
- DUDe: database of useful decoys – enhanced
- NN: neural net
- PK: pharmacokinetics
- cryo-EM: cryogenic electron microscopy
- H2L: hit-to-lead optimization
- LO: lead optimization
- FEP: free-energy perturbation
- RB-FEP: relative binding FEP
- MUE: mean unsigned error
- RMSE: root mean square error
- AB-FEP: absolute binding FEP.
References
Oprea, T. I. et al. Unexplored therapeutic opportunities in the human genome. Nat. Rev. Drug Discov. 17, 317–332 (2018).
Hauser, A. S., Attwood, M. M., Rask-Andersen, M., Schioth, H. B. & Gloriam, D. E. Trends in GPCR drug discovery: new agents, targets and indications. Nat. Rev. Drug Discov. 16, 829–842 (2017).
Lorente, J. S. et al. GPCR drug discovery: new agents, targets and indications. Nat. Rev. Drug Discov. 16, 829–842 (2025).
Kuhn, P., Wilson, K., Patch, M. G. & Stevens, R. C. The genesis of high-throughput structure-based drug discovery using protein crystallography. Curr. Opin. Chem. Biol. 6, 704–710 (2002).
Stevens, R. C. et al. The GPCR Network: a large-scale collaboration to determine human GPCR structure and function. Nat. Rev. Drug Discov. 12, 25–34 (2013).
Kryshtafovych, A., Schwede, T., Topf, M., Fidelis, K. & Moult, J. Critical assessment of methods of protein structure prediction (CASP)-Round XIV. Proteins 89, 1607–1617 (2021).
NobelPrize.org. Nobel Prize Outreach. They cracked the code for proteins’ amazing structures, <https://www.nobelprize.org/prizes/chemistry/2024/press-release/> (2024).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).
Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
Isberg, V. et al. GPCRdb: an information system for G protein-coupled receptors. Nucleic Acids Res. 44, D356–D364 (2016).
Herrera, L. P. T. et al. GPCRdb in 2025: adding odorant receptors, data mapper, structure similarity search and models of physiological ligand complexes. Nucleic Acids Res. 53, D425–D435 (2024).
He, X. H. et al. AlphaFold2 versus experimental structures: evaluation on G protein-coupled receptors. Acta Pharmacol. Sin. 44, 1–7 (2023).
Lee, C., Su, B. H. & Tseng, Y. J. Comparative studies of AlphaFold, RoseTTAFold and Modeller: a case study involving the use of G-protein-coupled receptors. Brief. Bioinform. 23, bbac308 (2022).
Callaway, E. What’s next for AlphaFold and the AI protein-folding revolution. Nature 604, 234–238 (2022).
Terwilliger, T. C. et al. AlphaFold predictions are valuable hypotheses and accelerate but do not replace experimental structure determination. Nat. Methods 21, 110–116 (2024).
Borkakoti, N. & Thornton, J. M. AlphaFold2 protein structure prediction: Implications for drug discovery. Curr. Opin. Struct. Biol. 78, 102526 (2023).
Guo, H. B. et al. AlphaFold2 models indicate that protein sequence determines both structure and dynamics. Sci. Rep. 12, 10696 (2022).
Pinheiro, I. D. M. et al. Noncanonical roles of chemokine regions in CCR9 activation revealed by structural modeling and mutational mapping. bioRxiv https://doi.org/10.1101/2024.06.04.596985 (2024).
Heo, L. & Feig, M. Multi-state modeling of G-protein coupled receptors at experimental accuracy. Proteins 90, 1873–1885 (2022).
Pandy-Szekeres, G. et al. GPCRdb in 2023: state-specific structure models using AlphaFold2 and new ligand resources. Nucleic Acids Res. 51, D395–D402 (2023).
Wayment-Steele, H. K. et al. Predicting multiple conformations via sequence clustering and AlphaFold2. Nature 625, 832–839 (2024).
Del Alamo, D., Sala, D., McHaourab, H. S. & Meiler, J. Sampling alternative conformational states of transporters and receptors with AlphaFold2. Elife 11, e75751 (2022).
Sala, D., Hildebrand, P. W. & Meiler, J. Biasing AlphaFold2 to predict GPCRs and kinases with user-defined functional or structural properties. Front. Mol. Biosci. 10, 1121962 (2023).
Bryant, P. & Noe, F. Structure prediction of alternative protein conformations. Nat. Commun. 15, 7328 (2024).
Rustamov, K. R. & Baev, A. Y. MSA clustering enhances AF-Multimer’s ability to predict conformational landscapes of protein-protein interactions. Bioinform. Adv. 5, vbae197 (2025).
Ahdritz, G. et al. OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. Nat. Methods 21, 1514–1524 (2024).
Raouraoua, N. et al. MassiveFold: unveiling AlphaFold’s hidden potential with optimized and parallelized massive sampling. Nat. Comput. Sci. 4, 824–828 (2024).
Lewis, S. et al. Scalable emulation of protein equilibrium ensembles with generative deep learning. bioRxiv https://doi.org/10.1101/2024.12.05.626885 (2024).
Kufareva, I. et al. Status of GPCR modeling and docking as reflected by community-wide GPCR Dock 2010 assessment. Structure 19, 1108–1126 (2011).
Kufareva, I., Katritch, V., Participants of GPCR Dock 2013, Stevens, R. C. & Abagyan, R. Advances in GPCR modeling evaluated by the GPCR Dock 2013 assessment: meeting new challenges. Structure 22, 1120–1139 (2014).
Karelina, M., Noh, J. J. & Dror, R. O. How accurately can one predict drug binding modes using AlphaFold models? Elife 12 (2023).
Lee, S. et al. Evaluating GPCR modeling and docking strategies in the era of deep learning-based protein structure prediction. Comput. Struct. Biotechnol. J. 21, 158–167 (2023).
Michino, M. et al. Community-wide assessment of GPCR structure modelling and ligand docking: GPCR Dock 2008. Nat. Rev. Drug Discov. 8, 455–463 (2009).
Chitsazi, R. et al. The 4th GPCR Dock: assessment of blind predictions for GPCR-ligand complexes in the era of AlphaFold. bioRxiv https://doi.org/10.1101/2025.04.18.647407 (2025).
Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. bioRxiv https://doi.org/10.1101/2021.10.04.463034 (2022).
Nakata, S., Mori, Y. & Tanaka, S. End-to-end protein-ligand complex structure generation with diffusion-based generative models. BMC Bioinforma. 24, 233 (2023).
Qiao, Z., Nie, W., Vahdat, A., Miller, T. F. & Anandkumar, A. State-specific protein–ligand complex structure prediction with a multiscale deep generative model. Nat. Mach. Intell. 6, 195–208 (2024).
Krishna, R. et al. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science 384, eadl2528 (2024).
Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
Editorial. AlphaFold3 - why did Nature publish it without its code? Nature https://doi.org/10.1038/d41586-024-01463-0 (2024).
Callaway, E. AI protein-prediction tool AlphaFold3 is now more open. Nature https://doi.org/10.1038/d41586-024-03708-4 (2024).
Google DeepMind. AlphaFold3, <https://github.com/google-deepmind/alphafold3> (2024).
Google DeepMind. AlphaFold3 License, <https://github.com/google-deepmind/alphafold3?tab=License-1-ov-file> (2024).
Callaway, E. Who will make AlphaFold3 open source? Scientists race to crack AI model. Nature https://doi.org/10.1038/d41586-024-01555-x (2024).
Boitreaud, J. et al. Chai-1: Decoding the molecular interactions of life. bioRxiv https://doi.org/10.1101/2024.10.10.615955 (2024).
Iambic Therapeutics. Transforming computational drug discovery with NeuralPLexer2, <https://www.iambic.ai/post/np2> (2024).
Ligo Biosciences. Open source implementation of AlphaFold3, <https://github.com/Ligo-Biosciences/AlphaFold3>.
Liu, L. et al. Technical Report of HelixFold3 for Biomolecular Structure Prediction. arXiv https://doi.org/10.48550/arXiv.2408.16975 (2024).
ByteDance AML AI4Science Team. A trainable PyTorch reproduction of AlphaFold 3 https://github.com/bytedance/Protenix/blob/main/Protenix_Technical_Report.pdf (2024).
Bryant, P., Kelkar, A., Guljas, A., Clementi, C. & Noe, F. Structure prediction of protein-ligand complexes from sequence information with Umol. Nat. Commun. 15, 4536 (2024).
Wohlwend, J. et al. Boltz-1 democratizing biomolecular interaction modeling. bioRxiv https://doi.org/10.1101/2024.11.19.624167 (2024).
Buttenschoen, M., Morris, G. M. & Deane, C. M. PoseBusters: AI-based docking methods fail to generate physically valid poses or generalise to novel sequences. Chem. Sci. 15, 3130–3139 (2024).
Iambic Therapeutics. Previewing NeuralPLexer3: Towards fully AI-enabled Structure-Based Drug Discovery, <https://www.iambic.ai/post/np3-preview> (2024).
Masters, M. R., Mahmoud, A. H. & Lill, M. A. Do Deep Learning Models for Co-Folding Learn the Physics of Protein-Ligand Interactions? bioRxiv https://doi.org/10.1101/2024.06.03.597219 (2024).
Raush, E., Abagyan, R. & Totrov, M. Efficient generation of conformer ensembles using internal coordinates and a generative directional graph convolution neural network. J. Chem. Theory Comput. 20, 4054–4063 (2024).
He, X. H., Li, J. R., Shen, S. Y. & Xu, H. E. AlphaFold3 versus experimental structures: assessment of the accuracy in ligand-bound G protein-coupled receptors. Acta Pharmacol. Sin. 46, 1111–1122 (2024).
Totrov, M. & Abagyan, R. Flexible ligand docking to multiple receptor conformations: a practical alternative. Curr. Opin. Struct. Biol. 18, 178–184 (2008).
Amaro, R. E. et al. Ensemble docking in drug discovery. Biophys. J. 114, 2271–2278 (2018).
Miller, E. B. et al. Reliable and accurate solution to the induced fit docking problem for protein-ligand binding. J. Chem. Theory Comput. 17, 2630–2639 (2021).
Coskun, D. et al. Using AlphaFold and experimental structures for the prediction of the structure and binding affinities of GPCR complexes via induced fit docking and free energy perturbation. J. Chem. Theory Comput. 20, 477–489 (2024).
Wang, Z. et al. A fully differentiable ligand pose optimization framework guided by deep learning and a traditional scoring function. Brief. Bioinform. 24, bbac520 (2023).
de Oliveira, T. M., van Beek, L., Shilliday, F., Debreczeni, J. E. & Phillips, C. Cryo-EM: the resolution revolution and drug discovery. SLAS Discov. 26, 17–31 (2021).
Zhou, Y. et al. Molecular insights into ligand recognition and G protein coupling of the neuromodulatory orphan receptor GPR139. Cell Res. 32, 210–213 (2022).
You, C. et al. Structural insights into the peptide selectivity and activation of human neuromedin U receptors. Nat. Commun. 13, 2045 (2022).
Zhao, W. et al. Ligand recognition and activation of neuromedin U receptor 2. Nat. Commun. 13, 7955 (2022).
Arroyo-Urea, S. et al. A bitopic agonist bound to the dopamine 3 receptor reveals a selectivity site. Nat. Commun. 15, 7759 (2024).
Wang, Y. et al. Structures of the entire human opioid receptor family. Cell 186, 413–427 e417 (2023).
Chen, Y., Chen, B., Wu, T., Zhou, F. & Xu, F. Cryo-EM structure of human kappa-opioid receptor-Gi complex bound to an endogenous agonist dynorphin A. Protein Cell 14, 464–468 (2023).
Lopez-Balastegui, M. et al. Relevance of G protein-coupled receptor (GPCR) dynamics for receptor activation, signalling bias and allosteric modulation. Br. J. Pharmacol. (2024).
Irwin, J. J. & Shoichet, B. K. Docking screens for novel ligands conferring new biology. J. Med. Chem. 59, 4103–4120 (2016).
Ballante, F., Kooistra, A. J., Kampen, S., de Graaf, C. & Carlsson, J. Structure-based virtual screening for ligands of G protein-coupled receptors: What can molecular docking do for you? Pharm. Rev. 73, 527–565 (2021).
Carlsson, J. & Luttens, A. Structure-based virtual screening of vast chemical space as a starting point for drug discovery. Curr. Opin. Struct. Biol. 87, 102829 (2024).
Liu, F. et al. Large library docking identifies positive allosteric modulators of the calcium-sensing receptor. Science 385, eado1868 (2024).
Patel, N. et al. Structure-based discovery of potent and selective melatonin receptor agonists. Elife 9, e53779 (2020).
Zheng, Z. et al. Structure-based discovery of new antagonist and biased agonist chemotypes for the kappa opioid receptor. J. Med. Chem. 60, 3070–3081 (2017).
Lane, J. R. et al. Structure-based ligand discovery targeting orthosteric and allosteric pockets of dopamine receptors. Mol. Pharm. 84, 794–807 (2013).
Sadybekov, A. A. et al. Synthon-based ligand discovery in virtual libraries of over 11 billion compounds. Nature 601, 452–459 (2022).
Kaplan, A. L. et al. Bespoke library docking for 5-HT2A receptor agonists with antidepressant activity. Nature 610, 582–591 (2022).
Bender, B. J. et al. Structure-based discovery of a NPFF1R antagonist with analgesic activity. bioRxiv https://doi.org/10.1101/2023.10.25.564029 (2023).
Smith, S. T. et al. Discovery of protease-activated receptor 4 (PAR4)-tethered ligand antagonists using ultralarge virtual screening. ACS Pharm. Transl. Sci. 7, 1086–1100 (2024).
Katritch, V., Rueda, M., Lam, P. C., Yeager, M. & Abagyan, R. GPCR 3D homology models for ligand screening: lessons learned from blind predictions of adenosine A2a receptor complex. Proteins 78, 197–211 (2010).
Katritch, V., Rueda, M. & Abagyan, R. Ligand-guided receptor optimization. Methods Mol. Biol. 857, 189–205 (2012).
Bender, B. J. et al. A practical guide to large-scale docking. Nat. Protoc. 16, 4799–4832 (2021).
Sadybekov, A. V. & Katritch, V. Computational approaches streamlining drug discovery. Nature 616, 673–685 (2023).
Liu, F. et al. Small vs. large library docking for positive allosteric modulators of the calcium sensing receptor. bioRxiv https://doi.org/10.1101/2023.12.27.573448 (2024).
Mysinger, M. M., Carchia, M., Irwin, J. J. & Shoichet, B. K. Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking. J. Med. Chem. 55, 6582–6594 (2012).
Dawson, J. R. D. et al. Molecular determinants of antagonist interactions with chemokine receptors CCR2 and CCR5. bioRxiv https://doi.org/10.1101/2023.11.15.567150 (2023).
Lyu, J. et al. AlphaFold2 structures guide prospective ligand discovery. Science 384, eadn6354 (2024).
Diaz-Rovira, A. M. et al. Are deep learning structural models sufficiently accurate for virtual screening? Application of docking algorithms to AlphaFold2 predicted structures. J. Chem. Inf. Model 63, 1668–1674 (2023).
Zhang, Y. et al. Benchmarking refined and unrefined AlphaFold2 structures for hit discovery. J. Chem. Inf. Model 63, 1656–1667 (2023).
Diaz-Holguin, A. et al. AlphaFold accelerated discovery of psychotropic agonists targeting the trace amine-associated receptor 1. Sci. Adv. 10, eadn1524 (2024).
Su, M. et al. Comparative assessment of scoring functions: the CASF-2016 update. J. Chem. Inf. Model 59, 895–913 (2019).
Neves, M. A., Totrov, M. & Abagyan, R. Docking and scoring with ICM: the benchmarking results and strategies for improvement. J. Comput. Aided Mol. Des. 26, 675–686 (2012).
Friesner, R. A. et al. Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J. Med. Chem. 47, 1739–1749 (2004).
Halgren, T. A. et al. Glide: a new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening. J. Med. Chem. 47, 1750–1759 (2004).
Coleman, R. G., Carchia, M., Sterling, T., Irwin, J. J. & Shoichet, B. K. Ligand pose and orientational sampling in molecular docking. PLoS One 8, e75992 (2013).
Jones, G., Willett, P., Glen, R. C., Leach, A. R. & Taylor, R. Development and validation of a genetic algorithm for flexible docking. J. Mol. Biol. 267, 727–748 (1997).
Park, H., Zhou, G., Baek, M., Baker, D. & DiMaio, F. Force Field Optimization Guided by Small Molecule Crystal Lattice Data Enables Consistent Sub-Angstrom Protein-Ligand Docking. J. Chem. Theory Comput. 17, 2000–2010 (2021).
Cao, D. et al. EquiScore: a generic protein-ligand interaction scoring method integrating physical prior knowledge with data augmentation modeling. bioRxiv https://doi.org/10.1101/2023.06.18.545464 (2023).
Stafford, K. A., Anderson, B. M., Sorenson, J. & van den Bedem, H. AtomNet PoseRanker: enriching ligand pose quality for dynamic proteins in virtual high-throughput screens. J. Chem. Inf. Model 62, 1178–1189 (2022).
Atomwise AIMS Program. AI is a viable alternative to high throughput screening: a 318-target study. Sci. Rep. 14, 7526 (2024).
Shen, C. et al. Boosting protein-ligand binding pose prediction and virtual screening based on residue-atom distance likelihood potential and graph transformer. J. Med. Chem. 65, 10691–10706 (2022).
Raush, E. & Totrov, M. RTCNN Performance (CASF 2016 pose rank benchmark). Molsoft ICM User Group Meeting https://doi.org/10.6084/m9.figshare.24309496.v1 (2023).
Totrov, M. New developments in ICM: neural networks and beyond in Molsoft ICM User Group Meeting. (San Diego, CA, 2023).
Wang, Z. et al. A new paradigm for applying deep learning to protein-ligand interaction prediction. Brief. Bioinform. 25, bbae145 (2024).
Zhou, G. et al. An artificial intelligence accelerated virtual screening platform for drug discovery. Nat. Commun. 15, 7761 (2024).
Chien, E. Y. et al. Structure of the human dopamine D3 receptor in complex with a D2/D3 selective antagonist. Science 330, 1091–1095 (2010).
Lyu, J. et al. Ultra-large library docking for discovering new chemotypes. Nature 566, 224–229 (2019).
Graff, D. E., Shakhnovich, E. I. & Coley, C. W. Accelerating high-throughput virtual screening through molecular pool-based active learning. Chem. Sci. 12, 7866–7881 (2021).
Yang, Y. et al. Efficient exploration of chemical space with docking and deep learning. J. Chem. Theory Comput. 17, 7106–7119 (2021).
Gentile, F. et al. Deep docking: a deep learning platform for augmentation of structure based drug discovery. ACS Cent. Sci. 6, 939–949 (2020).
Gentile, F. et al. Artificial intelligence-enabled virtual screening of ultra-large chemical libraries with deep docking. Nat. Protoc. 17, 672–697 (2022).
Raush, E. Highlights of recent ICM developments: GPU acceleration, its applications and more in Molsoft ICM User Group Meeting. (San Diego, CA, 2023).
Luttens, A. et al. Rapid traversal of vast chemical space using machine learning-guided docking screens. Nat. Comput. Sci. 5, 301–312 (2025).
Grechishnikova, D. Transformer neural network for protein-specific de novo drug generation as a machine translation problem. Sci. Rep. 11, 321 (2021).
Qian, H., Lin, C., Zhao, D., Tu, S. & Xu, L. AlphaDrug: protein target specific de novo molecular generation. PNAS Nexus 1, pgac227 (2022).
Bernatavicius, A. et al. AlphaFold meets de novo drug design: leveraging structural protein information in multitarget molecular generative models. J. Chem. Inf. Model 64, 8113–8122 (2024).
Atz, K. et al. Prospective de novo drug design with deep interactome learning. Nat. Commun. 15, 3408 (2024).
Schneuing, A. et al. Structure-based drug design with equivariant diffusion models. Nat. Comput. Sci. 4, 899–909 (2024).
Peng, X. et al. Pocket2Mol: efficient molecular sampling based on 3D protein pockets. arXiv https://doi.org/10.48550/arXiv.2205.07249 (2022).
Zhang, O. et al. ResGen is a pocket-aware 3D molecular generation model based on parallel multiscale modelling. Nat. Mach. Intell. 5, 1020–1030 (2023).
Jiang, Y. et al. PocketFlow is a data-and-knowledge-driven structure-based molecular generative model. Nat. Mach. Intell. 6, 326–337 (2024).
Zhung, W., Kim, H. & Kim, W. Y. 3D molecular generative framework for interaction-guided drug design. Nat. Commun. 15, 2688 (2024).
Powers, A. S. et al. Geometric deep learning for structure-based ligand design. ACS Cent. Sci. 9, 2257–2267 (2023).
Thomas, M., Smith, R. T., O’Boyle, N. M., de Graaf, C. & Bender, A. Comparison of structure- and ligand-based scoring functions for deep generative models: a GPCR case study. J. Cheminform. 13, 39 (2021).
Grisoni, F. et al. Combining generative artificial intelligence and on-chip synthesis for de novo drug design. Sci. Adv. 7, eabg3338 (2021).
Ivanenkov, Y. et al. The Hitchhiker’s guide to deep learning driven generative chemistry. ACS Med. Chem. Lett. 14, 901–915 (2023).
Cournia, Z., Allen, B. & Sherman, W. Relative binding free energy calculations in drug discovery: recent advances and practical considerations. J. Chem. Inf. Model 57, 2911–2937 (2017).
Straatsma, T. & McCammon, J. Computational alchemy. Annu. Rev. Phys. Chem. 43, 407–435 (1992).
York, D. M. Modern alchemical free energy methods for drug discovery explained. ACS Phys. Chem. Au 3, 478–491 (2023).
Gapsys, V. et al. Large scale relative protein ligand binding affinities using non-equilibrium alchemy. Chem. Sci. 11, 1140–1152 (2019).
Abraham, M. J. et al. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1-2, 19–25 (2015).
Wang, K., Chodera, J. D., Yang, Y. & Shirts, M. R. Identifying ligand binding sites and poses using GPU-accelerated Hamiltonian replica exchange molecular dynamics. J. Comput. Aided Mol. Des. 27, 989–1007 (2013).
Wang, L. et al. Accurate and reliable prediction of relative ligand binding potency in prospective drug discovery by way of a modern free-energy calculation protocol and force field. J. Am. Chem. Soc. 137, 2695–2703 (2015).
Abel, R., Wang, L., Harder, E. D., Berne, B. J. & Friesner, R. A. Advancing drug discovery through enhanced free energy calculations. Acc. Chem. Res. 50, 1625–1632 (2017).
Ross, G. A. et al. The maximal and current accuracy of rigorous protein-ligand binding free energy calculations. Commun. Chem. 6, 222 (2023).
Kuhn, B. et al. Prospective evaluation of free energy calculations for the prioritization of cathepsin L inhibitors. J. Med. Chem. 60, 2485–2497 (2017).
Schindler, C. E. M. et al. Large-scale assessment of binding free energy calculations in active drug discovery projects. J. Chem. Inf. Model 60, 5457–5474 (2020).
Wang, D. D., Wu, W. & Wang, R. Structure-based, deep-learning models for protein-ligand binding affinity prediction. J. Cheminform. 16, 2 (2024).
Liu, X. et al. Binding affinity prediction: from conventional to machine learning-based approaches. arXiv https://doi.org/10.48550/arXiv.2410.00709 (2024).
Zhang, Y., Li, S., Meng, K. & Sun, S. Machine Learning for Sequence and Structure-Based Protein-Ligand Interaction Prediction. J. Chem. Inf. Model 64, 1456–1472 (2024).
Karlov, D. S., Sosnin, S., Fedorov, M. V. & Popov, P. graphDelta: MPNN scoring function for the affinity prediction of protein-ligand complexes. ACS Omega 5, 5150–5159 (2020).
Mqawass, G. & Popov, P. graphLambda: fusion graph neural networks for binding affinity prediction. J. Chem. Inf. Model 64, 2323–2330 (2024).
Jones, D. et al. Improved protein-ligand binding affinity prediction with structure-based deep fusion inference. J. Chem. Inf. Model 61, 1583–1592 (2021).
McNutt, A. T. & Koes, D. R. Improving DeltaDeltaG predictions with a multitask convolutional siamese network. J. Chem. Inf. Model 62, 1819–1829 (2022).
Mohamed Abdul Cader, J., Newton, M. A. H., Rahman, J., Mohamed Abdul Cader, A. J. & Sattar, A. Ensembling methods for protein-ligand binding affinity prediction. Sci. Rep. 14, 24447 (2024).
Yu, J. & Zheng, M. Efficient prediction of relative ligand binding affinity in drug discovery. Nat. Comput. Sci. 3, 829–830 (2023).
Mason, J. S. et al. High end GPCR design: crafted ligand design and druggability analysis using protein structure, lipophilic hotspots and explicit water networks. In Silico Pharmacol. 1, 23 (2013).
Lenselink, E. B. et al. Predicting Binding Affinities for GPCR Ligands Using Free-Energy Perturbation. ACS Omega 1, 293–304 (2016).
Deflorian, F. et al. Accurate Prediction of GPCR Ligand Binding Affinity with Free Energy Perturbation. J. Chem. Inf. Model 60, 5563–5579 (2020).
Cappel, D. et al. Relative Binding Free Energy Calculations Applied to Protein Homology Models. J. Chem. Inf. Model 56, 2388–2400 (2016).
Zeledon, E. V. et al. Next-generation neuropeptide Y receptor small-molecule agonists inhibit mosquito-biting behavior. Parasit. Vectors 17, 276 (2024).
Xu, T. et al. Induced-fit docking enables accurate free energy perturbation calculations in homology models. J. Chem. Theory Comput. 18, 5710–5724 (2022).
Fajer, M., Borrelli, K., Abel, R. & Wang, L. Quantitatively accounting for protein reorganization in computer-aided drug design. J. Chem. Theory Comput. 19, 3080–3090 (2023).
Shaik, A. B. et al. Structure activity relationships for a series of eticlopride-based dopamine D(2)/D(3) receptor bitopic ligands. J. Med. Chem. 64, 15313–15333 (2021).
Dvorak, C. A. et al. Identification and SAR of glycine benzamides as potent agonists for the GPR139 receptor. ACS Med. Chem. Lett. 6, 1015–1018 (2015).
Varadi, M. et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 50, D439–D444 (2022).
R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/ (2021).
Wickham, H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag, New York. ISBN 978-3-319-24277-4. https://cran.r-project.org/web/packages/ggplot2/citation.html (2016).
Abagyan, R. & Totrov, M. Biased probability Monte Carlo conformational searches and electrostatic calculations for peptides and proteins. J. Mol. Biol. 235, 983–1002 (1994).
Acknowledgements
The authors are grateful to Drs. Mike Gilson and Adrian Jinich (UC San Diego) for valuable discussions, to Dr. Sofia Endzhievskaya and Mr. Timothy Liu in the Kufareva lab (UC San Diego) for help with software deployment, and to Dr. Ed Miller (Schrödinger) for reading the manuscript and providing valuable feedback. This work was supported by NIH grants R21 AI149369, R21 AI156662, R01 AI161880, and R01 GM136202 (to I.K.). The Sanders Tri-Institutional Therapeutics Discovery Institute (TDI) is a 501(c)(3) organization and receives financial support from its parent institutes (Memorial Sloan Kettering Cancer Center, The Rockefeller University, and Weill Cornell Medicine) and from a generous contribution from Lewis Sanders and other philanthropic sources.
Contributions
M.M., J.V., and I.K. conducted the review of the literature and wrote, revised, and reviewed the main manuscript text. I.K. and J.V. performed data analyses. I.K., M.M., and J.V. prepared Fig. 1; I.K. and M.M. prepared Figs. 2, 3, and 5; I.K. prepared Figs. 4, 6, and 7; and J.V. prepared Figs. 8 and 9. J.V. and M.M. prepared Table 1.
Ethics declarations
Competing interests
I.K. is an Editorial Board Member for npj Drug Discovery. She was not part of the peer review process or decision-making for this manuscript. The other two authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Michino, M., Vendome, J. & Kufareva, I. AI meets physics in computational structure-based drug discovery for GPCRs. npj Drug Discov. 2, 16 (2025). https://doi.org/10.1038/s44386-025-00019-0