
Primer

Computational protein design

This article has been updated

Abstract

By combining molecular modelling, machine-learned models and an increasingly detailed understanding of protein chemistry and physics, computational protein design and human expertise have produced new protein structures, assemblies and functions that do not exist in nature. Generative deep-learning methods, which exploit large databases of protein sequences and structures, are now revolutionizing the field, bringing new capabilities, improved reliability and democratized access to protein design. This Primer introduces the main approaches in computational protein design, covering both physics-based and machine-learning-based tools, and aims to be accessible to biological, physical and computer scientists alike. Emphasis is placed on the practical challenges that arise from limitations in our fundamental understanding of protein structure and function, and on recent developments and new ideas that may help overcome these limitations.


Fig. 1: Structure-based functional design with successive design steps.
Fig. 2: The three multivariate dimensions of protein state space.
Fig. 3: Computational protein design tools that relate sequence, backbone structure (including side-chain geometry) and conformational ensembles to properties and functions.
Fig. 4: Evolution of protein-design approaches over time.
Fig. 5: Diffusion-based backbone generation can be steered using conditional models and potential-based guidance.
Fig. 6: Standard workflow of protein experiments.


Change history

  • 11 March 2025

    In the version of the article initially published, Sophie Barbe’s email was incorrect and has now been amended in the HTML and PDF versions of the article.

References

  1. Jiang, L. et al. De novo computational design of retro-aldol enzymes. Science 319, 1387–1391 (2008).

    Article  ADS  MATH  Google Scholar 

  2. Arnold, F. H. Innovation by evolution: bringing new chemistry to life (nobel lecture). Angew. Chem. Int. Ed. 58, 14420–14426 (2019).

    Article  Google Scholar 

  3. Winter, G. Harnessing evolution to make medicines (nobel lecture). Angew. Chem. Int. Ed. 58, 14438–14445 (2019).

    Article  Google Scholar 

  4. Woolfson, D. N. A brief history of de novo protein design: minimal, rational, and computational. J. Mol. Biol. 433, 167160 (2021).

    Article  MATH  Google Scholar 

  5. Chu, A. E., Lu, T. & Huang, P.-S. Sparks of function by de novo protein design. Nat. Biotechnol. 42, 203–215 (2024).

    Article  MATH  Google Scholar 

  6. Arnold, F. H. Design by directed evolution. Acc. Chem. Res. 31, 125–131 (1998).

    Article  MATH  Google Scholar 

  7. Arnold, F. H. Directed evolution: bringing new chemistry to life. Angew. Chem. Int. Ed. 57, 4143–4148 (2018).

    Article  MATH  Google Scholar 

  8. Wang, Y. et al. Directed evolution: methodologies and applications. Chem. Rev. 121, 12384–12444 (2021).

    Article  Google Scholar 

  9. Zeymer, C. & Hilvert, D. Directed evolution of protein catalysts. Annu. Rev. Biochem. 87, 131–157 (2018).

    Article  Google Scholar 

  10. Korendovych, I. V. & DeGrado, W. F. De novo protein design, a retrospective. Q. Rev. Biophys. 53, e3 (2020).

    Article  MATH  Google Scholar 

  11. Pan, X. & Kortemme, T. Recent advances in de novo protein design: principles, methods, and applications. J. Biol. Chem. 296, 100558 (2021).

    Article  Google Scholar 

  12. Chen, K. & Arnold, F. H. Engineering new catalytic activities in enzymes. Nat. Catal. 3, 203–213 (2020).

    Article  MATH  Google Scholar 

  13. Suleyman, M. & Bhaskar, M. The Coming Wave: Technology, Power, and the Twenty-first Century’s Greatest Dilemma (Crown, 2023).

  14. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

    Article  ADS  MATH  Google Scholar 

  15. Tunyasuvunakool, K. et al. Highly accurate protein structure prediction for the human proteome. Nature 596, 590–596 (2021).

    Article  ADS  MATH  Google Scholar 

  16. Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).

    Article  MATH  Google Scholar 

  17. Baek, M. et al. Efficient and accurate prediction of protein structure using RoseTTAFold2. Preprint at bioRxiv https://doi.org/10.1101/2023.05.24.542179 (2023).

  18. Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).

    Article  ADS  MathSciNet  MATH  Google Scholar 

  19. Chai, C. D. et al. Chai-1: decoding the molecular interactions of life. Preprint at bioRxiv https://doi.org/10.1101/2024.10.10.615955 (2024).

  20. Wohlwend, J. et al. Boltz-1 democratizing biomolecular interaction modeling. Preprint at bioRxiv https://doi.org/10.1101/2024.11.19.624167 (2024).

  21. Wu, R. et al. High-resolution de novo structure prediction from primary sequence. Preprint at bioRxiv https://doi.org/10.1101/2022.07.21.500999 (2022).

  22. Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).

    Article  ADS  MATH  Google Scholar 

  23. Weijman, J. F. et al. Molecular architecture of the autoinhibited kinesin-1 lambda particle. Sci. Adv. 8, eabp9660 (2022).

    Article  Google Scholar 

  24. Schweke, H. et al. An atlas of protein homo-oligomerization across domains of life. Cell 187, 999–1010.e15 (2024).

    Article  Google Scholar 

  25. Shor, B. & Schneidman-Duhovny, D. CombFold: predicting structures of large protein assemblies using a combinatorial assembly algorithm and AlphaFold2. Nat. Methods 21, 477–487 (2024).

    Article  Google Scholar 

  26. Krishna, R. et al. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science 384, eadl2528 (2024).

    Article  MATH  Google Scholar 

  27. Albanese, K. I. et al. Rationally seeded computational protein design of α-helical barrels. Nat. Chem. Biol. 20, 991–999 (2024).

    Article  MATH  Google Scholar 

  28. Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023).

    Article  ADS  MATH  Google Scholar 

  29. Dauparas, J. et al. Robust deep learning-based protein sequence design using ProteinMPNN. Science 378, 49–56 (2022).

    Article  ADS  Google Scholar 

  30. Hsu, C. et al. Learning inverse folding from millions of predicted structures. In Proc. 39th International Conference on Machine Learning Vol. 162 (eds Chaudhuri, K. et al.) 8946–8970 (PMLR, 2022).

  31. Akpinaroglu, D. et al. Structure-conditioned masked language models for protein sequence design generalize beyond the native sequence space. Preprint at bioRxiv https://doi.org/10.1101/2023.12.15.571823 (2023).

  32. Gao, Z., Tan, C. & Li, S. Z. PiFold: toward effective and efficient protein inverse folding. In The Eleventh International Conference on Learning Representations, ICLR 2023 https://openreview.net/pdf?id=oMsN9TYwJ0j (OpenReview.net, 2023).

  33. Ingraham, J. B. et al. Illuminating protein space with a programmable generative model. Nature 623, 1070–1078 (2023).

    Article  ADS  MATH  Google Scholar 

  34. Ferruz, N., Schmidt, S. & Höcker, B. ProtGPT2 is a deep unsupervised language model for protein design. Nat. Commun. 13, 4348 (2022).

    Article  ADS  Google Scholar 

  35. Hayes, T. et al. Simulating 500 million years of evolution with a language model. Science https://doi.org/10.1126/science.ads0018 (2024).

  36. Sumida, K. H. et al. Improving protein expression, stability, and function with ProteinMPNN. J. Am. Chem. Soc. 146, 2054–2061 (2024).

    Article  MATH  Google Scholar 

  37. Meador, K. et al. A suite of designed protein cages using machine learning and protein fragment-based protocols. Structure 32, 751–765.e11 (2024).

    Article  MATH  Google Scholar 

  38. de Haas, R. J. et al. Rapid and automated design of two-component protein nanomaterials using ProteinMPNN. Proc. Natl Acad. Sci. USA 121, e2314646121 (2024).

    Article  MATH  Google Scholar 

  39. Ma, B. et al. A top-down design approach for generating a peptide PROTAC drug targeting androgen receptor for androgenetic alopecia therapy. J. Med. Chem. 67, 10336–10349 (2024).

    Article  MATH  Google Scholar 

  40. An, L. et al. Binding and sensing diverse small molecules using shape-complementary pseudocycles. Science 385, 276–282 (2024).

    Article  MATH  Google Scholar 

  41. Winnifrith, A., Outeiral, C. & Hie, B. L. Generative artificial intelligence for de novo protein design. Curr. Opin. Struct. Biol. 86, 102794 (2024).

    Article  Google Scholar 

  42. Carlini, N. et al. Extracting training data from diffusion models. In 32nd USENIX Security Symposium (eds Calandrino, J. A. & Troncoso, C.) 5253–5270 (USENIX Association, 2023).

  43. Yang, K. K., Wu, Z. & Arnold, F. H. Machine-learning-guided directed evolution for protein engineering. Nat. Methods 16, 687–694 (2019).

    Article  MATH  Google Scholar 

  44. Pierce, B. G. et al. ZDOCK server: interactive docking prediction of protein–protein complexes and symmetric multimers. Bioinformatics 30, 1771–1773 (2014).

    Article  MATH  Google Scholar 

  45. Goverde, C. A., Wolf, B., Khakzad, H., Rosset, S. & Correia, B. E. De novo protein design by inversion of the AlphaFold structure prediction network. Protein Sci. 32, e4653 (2023).

    Article  Google Scholar 

  46. Anfinsen, C. B. Principles that govern the folding of protein chains. Science 181, 223–230 (1973).

    Article  ADS  MATH  Google Scholar 

  47. Vanommeslaeghe, K. et al. CHARMM general force field: a force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields. J. Comput. Chem. 31, 671–690 (2010).

    Article  Google Scholar 

  48. Wang, J., Wolf, R. M., Caldwell, J. W., Kollman, P. A. & Case, D. A. Development and testing of a general amber force field. J. Comput. Chem. 25, 1157–1174 (2004).

    Article  MATH  Google Scholar 

  49. Lazaridis, T. & Karplus, M. Effective energy function for proteins in solution. Proteins 35, 133–152 (1999).

    Article  MATH  Google Scholar 

  50. Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).

    Article  ADS  MATH  Google Scholar 

  51. Alford, R. F. et al. The Rosetta All-atom energy function for macromolecular modeling and design. J. Chem. Theory Comput. 13, 3031–3048 (2017).

    Article  MATH  Google Scholar 

  52. UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 49, D480–D489 (2021).

    Article  Google Scholar 

  53. Morcos, F. et al. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc. Natl Acad. Sci. USA 108, E1293–E1301 (2011).

    Article  MATH  Google Scholar 

  54. wwPDB consortium. Protein Data Bank: the single global archive for 3D macromolecular structure data. Nucleic Acids Res. 47, D520–D528 (2019).

    Article  Google Scholar 

  55. Defresne, M., Barbe, S. & Schiex, T. Scalable coupling of deep learning with logical reasoning. In Proc. Thirty-Second International Joint Conference on Artificial Intelligence (ed. Elkind, E.) 3615–3623 (International Joint Conferences on Artificial Intelligence Organization, 2023).

  56. Tsuboyama, K. et al. Mega-scale experimental analysis of protein folding stability in biology and design. Nature 620, 434–444 (2023).

    Article  ADS  MATH  Google Scholar 

  57. Lu, L. et al. De novo design of drug-binding proteins with predictable binding energy and specificity. Science 384, 106–112 (2024).

    Article  ADS  MATH  Google Scholar 

  58. Glasscock, C. J. et al. Computational design of sequence-specific DNA-binding proteins. Preprint at bioRxiv https://doi.org/10.1101/2023.09.20.558720 (2023).

  59. Vázquez Torres, S. et al. De novo design of high-affinity binders of bioactive helical peptides. Nature 626, 435–442 (2024).

    Article  ADS  MATH  Google Scholar 

  60. Yang, E. C. et al. Computational design of non-porous pH-responsive antibody nanoparticles. Nat. Struct. Mol. Biol. 31, 1404–1412 (2024).

    Article  MATH  Google Scholar 

  61. Guo, A. B., Akpinaroglu, D., Kelly, M. J. S. & Kortemme, T. Deep learning guided design of dynamic proteins. Preprint at bioRxiv https://doi.org/10.1101/2024.07.17.603962 (2024).

  62. Cross, J. A. et al. A de novo designed coiled coil-based switch regulates the microtubule motor kinesin-1. Nat. Chem. Biol. 20, 916–923 (2024).

    Article  MATH  Google Scholar 

  63. Dou, J. et al. De novo design of a fluorescence-activating β-barrel. Nature 561, 485–491 (2018).

    Article  ADS  MATH  Google Scholar 

  64. Cao, L. et al. De novo design of picomolar SARS-CoV-2 miniprotein inhibitors. Science 370, 426–431 (2020).

    Article  ADS  MATH  Google Scholar 

  65. Sesterhenn, F. et al. De novo protein design enables the precise induction of RSV-neutralizing antibodies. Science 368, eaay5051 (2020).

    Article  MATH  Google Scholar 

  66. Bennett, N. R. et al. Atomically accurate de novo design of single-domain antibodies. Preprint at bioRxiv https://doi.org/10.1101/2024.03.14.585103 (2024).

  67. Kajava, A. V. Tandem repeats in proteins: from sequence to structure. J. Struct. Biol. 179, 279–288 (2012).

    Article  Google Scholar 

  68. Lupas, A. N. & Gruber, M. in Fibrous Proteins: Coiled-Coils, Collagen and Elastomers, Advances in Protein Chemistry 37–38 (Elsevier, 2005).

  69. Woolfson, D. N. Understanding a protein fold: the physics, chemistry, and biology of α-helical coiled coils. J. Biol. Chem. 299, 104579 (2023).

    Article  MATH  Google Scholar 

  70. Harbury, P. B., Plecs, J. J., Tidor, B., Alber, T. & Kim, P. S. High-resolution protein design with backbone freedom. Science 282, 1462–1467 (1998).

    Article  Google Scholar 

  71. Huang, P.-S. et al. High thermodynamic stability of parametrically designed helical bundles. Science 346, 481–485 (2014).

    Article  ADS  MATH  Google Scholar 

  72. Thomson, A. R. et al. Computational design of water-soluble α-helical barrels. Science 346, 485–488 (2014).

    Article  ADS  MATH  Google Scholar 

  73. Dawson, W. M. et al. Coiled coils 9-to-5: rational de novo design of α-helical barrels with tunable oligomeric states. Chem. Sci. 12, 6923–6928 (2021).

    Article  MATH  Google Scholar 

  74. Toda, M., Zhang, F. & Athukorallage, B. Elastic surface model for beta-barrels: geometric, computational, and statistical analysis. Proteins 86, 35–42 (2018).

    Article  MATH  Google Scholar 

  75. Novotný, J., Bruccoleri, R. E. & Newell, J. Twisted hyperboloid (strophoid) as a model of β-barrels in proteins. J. Mol. Biol. 177, 567–573 (1984).

    Article  Google Scholar 

  76. Naveed, H., Xu, Y., Jackups, R. Jr. & Liang, J. Predicting three-dimensional structures of transmembrane domains of β-barrel membrane proteins. J. Am. Chem. Soc. 134, 1775–1781 (2012).

    Article  Google Scholar 

  77. Huang, P.-S. et al. De novo design of a four-fold symmetric TIM-barrel protein with atomic-level accuracy. Nat. Chem. Biol. 12, 29–34 (2016).

    Article  ADS  MATH  Google Scholar 

  78. Marcos, E. et al. Principles for designing proteins with cavities formed by curved β sheets. Science 355, 201–206 (2017).

    Article  ADS  MATH  Google Scholar 

  79. Kim, D. E. et al. Parametrically guided design of beta barrels and transmembrane nanopores using deep learning. Preprint at bioRxiv https://doi.org/10.1101/2024.07.22.604663 (2024).

  80. Lasters, I., Wodak, S. J., Alard, P. & van Cutsem, E. Structural principles of parallel beta-barrels in proteins. Proc. Natl Acad. Sci. USA 85, 3338–3342 (1988).

    Article  ADS  MATH  Google Scholar 

  81. Kumar, P., Paterson, N. G., Clayden, J. & Woolfson, D. N. De novo design of discrete, stable 310-helix peptide assemblies. Nature 607, 387–392 (2022).

    Article  ADS  Google Scholar 

  82. Durairaj, J. et al. Uncovering new families and folds in the natural protein universe. Nature 622, 646–653 (2023).

    Article  ADS  MATH  Google Scholar 

  83. Kuhlman, B. et al. Design of a novel globular protein fold with atomic-level accuracy. Science 302, 1364–1368 (2003).

    Article  ADS  MATH  Google Scholar 

  84. Huang, P.-S. et al. RosettaRemodel: a generalized framework for flexible backbone protein design. PLoS ONE 6, e24109 (2011).

    Article  ADS  Google Scholar 

  85. Koga, N. et al. Principles for designing ideal protein structures. Nature 491, 222–227 (2012).

    Article  ADS  MATH  Google Scholar 

  86. Lin, Y.-R. et al. Control over overall shape and size in de novo designed proteins. Proc. Natl Acad. Sci. USA 112, E5478–85 (2015).

    Article  Google Scholar 

  87. Jacobs, T. M. et al. Design of structurally distinct proteins using strategies inspired by evolution. Science 352, 687–690 (2016).

    Article  ADS  MATH  Google Scholar 

  88. Pan, X. et al. Expanding the space of protein geometries by computational design of de novo fold families. Science 369, 1132–1136 (2020).

    Article  ADS  MATH  Google Scholar 

  89. Harteveld, Z. et al. A generic framework for hierarchical de novo protein design. Proc. Natl Acad. Sci. USA 119, e2206111119 (2022).

    Article  Google Scholar 

  90. Yang, C. et al. Bottom-up de novo design of functional proteins with complex structural features. Nat. Chem. Biol. 17, 492–500 (2021).

    Article  MATH  Google Scholar 

  91. Zhou, J. & Grigoryan, G. Rapid search for tertiary fragments reveals protein sequence–structure relationships. Protein Sci. 24, 508–524 (2015).

    Article  MATH  Google Scholar 

  92. Woolfson, D. N. et al. De novo protein design: how do we expand into the universe of possible protein structures? Curr. Opin. Struct. Biol. 33, 16–26 (2015).

    Article  MATH  Google Scholar 

  93. Taylor, W. R. A ’periodic table’ for protein structures. Nature 416, 657–660 (2002).

    Article  ADS  MATH  Google Scholar 

  94. Taylor, W. R., Chelliah, V., Hollup, S. M., MacDonald, J. T. & Jonassen, I. Probing the ‘dark matter’ of protein fold space. Structure 17, 1244–1252 (2009).

    Article  MATH  Google Scholar 

  95. Minami, S. et al. Exploration of novel αβ-protein folds through de novo design. Nat. Struct. Mol. Biol. 30, 1132–1140 (2023).

    Article  MATH  Google Scholar 

  96. Sakuma, K. et al. Design of complicated all-α protein structures. Nat. Struct. Mol. Biol. 31, 275–282 (2024).

    Article  MATH  Google Scholar 

  97. Lipsh-Sokolik, R. et al. Combinatorial assembly and design of enzymes. Science 379, 195–201 (2023).

    Article  ADS  MATH  Google Scholar 

  98. Kundert, K. & Kortemme, T. Computational design of structured loops for new protein functions. Biol. Chem. 400, 275–288 (2019).

    Article  MATH  Google Scholar 

  99. Du, H. et al. A general platform for targeting MHC-II antigens via a single loop. Preprint at bioRxiv https://doi.org/10.1101/2024.01.26.577489 (2024).

  100. Misson Mindrebo, L. et al. Fully synthetic platform to rapidly generate tetravalent bispecific nanobody-based immunoglobulins. Proc. Natl Acad. Sci. USA 120, e2216612120 (2023).

    Article  Google Scholar 

  101. Yu, Y. & Lutz, S. Circular permutation: a different way to engineer enzyme structure and function. Trends Biotechnol. 29, 18–25 (2011).

    Article  MATH  Google Scholar 

  102. Schellman, C. & Jaenicke, R. in The AlphaL Conformation at the Ends of Helices (ed. Jaenicke, R.) (Elsevier, 1980).

  103. Thornton, J. M., Sibanda, B. L., Edwards, M. S. & Barlow, D. J. Analysis, design and modification of loop regions in proteins. Bioessays 8, 63–69 (1988).

    Article  MATH  Google Scholar 

  104. Aurora, R. & Rose, G. D. Helix capping. Protein Sci. 7, 21–38 (1998).

    Article  MATH  Google Scholar 

  105. Richardson, J. S. & Richardson, D. C. Amino acid preferences for specific locations at the ends of alpha helices. Science 240, 1648–1652 (1988).

    Article  ADS  MATH  Google Scholar 

  106. Wilmot, C. M. & Thornton, J. M. Analysis and prediction of the different types of β-turn in proteins. J. Mol. Biol. 203, 221–232 (1988).

    Article  MATH  Google Scholar 

  107. Brunet, A. P. et al. The role of turns in the structure of an alpha-helical protein. Nature 364, 355–358 (1993).

    Article  ADS  MATH  Google Scholar 

  108. Efimov, A. V. Patterns of loop regions in proteins. Curr. Opin. Struct. Biol. 3, 379–384 (1993).

    Article  MATH  Google Scholar 

  109. Aurora, R., Srinivasan, R. & Rose, G. D. Rules for alpha-helix termination by glycine. Science 264, 1126–1130 (1994).

    Article  ADS  MATH  Google Scholar 

  110. Harper, E. T. & Rose, G. D. Helix stop signals in proteins and peptides: the capping box. Biochemistry 32, 7605–7609 (1993).

    Article  MATH  Google Scholar 

  111. Engel, D. E. & DeGrado, W. F. Alpha-alpha linking motifs and interhelical orientations. Proteins 61, 325–337 (2005).

    Article  MATH  Google Scholar 

  112. Hill, R. B., Raleigh, D. P., Lombardi, A. & DeGrado, W. F. De novo design of helical bundles as models for understanding protein folding and function. Acc. Chem. Res. 33, 745–754 (2000).

    Article  MATH  Google Scholar 

  113. Canutescu, A. A. & Dunbrack, R. L. Jr. Cyclic coordinate descent: a robotics algorithm for protein loop closure. Protein Sci. 12, 963–972 (2003).

    Article  Google Scholar 

  114. Cortés, J., Siméon, T., Remaud-Siméon, M. & Tran, V. Geometric algorithms for the conformational analysis of long protein loops. J. Comput. Chem. 25, 956–967 (2004).

    Article  MATH  Google Scholar 

  115. Barozet, A., Chacón, P. & Cortés, J. Current approaches to flexible loop modeling. Curr. Res. Struct. Biol. 3, 187–191 (2021).

    Article  MATH  Google Scholar 

  116. Mandell, D. J., Coutsias, E. A. & Kortemme, T. Sub-angstrom accuracy in protein loop reconstruction by robotics-inspired conformational sampling. Nat. Methods 6, 551–552 (2009).

    Article  Google Scholar 

  117. Barozet, A. et al. MoMA-LoopSampler: a web server to exhaustively sample protein loop conformations. Bioinformatics 38, 552–553 (2022).

    Article  Google Scholar 

  118. Jiang, H. et al. De novo design of buttressed loops for sculpting protein functions. Nat. Chem. Biol. 20, 974–980 (2024).

    Article  MATH  Google Scholar 

  119. Aguilar Rangel, M. et al. Fragment-based computational design of antibodies targeting structured epitopes. Sci. Adv. 8, eabp9540 (2022).

    Article  Google Scholar 

  120. Mann, S. I., Nayak, A., Gassner, G. T., Therien, M. J. & DeGrado, W. F. De novo design, solution characterization, and crystallographic structure of an abiological Mn-porphyrin-binding protein capable of stabilizing a Mn(V) species. J. Am. Chem. Soc. 143, 252–259 (2021).

    Article  Google Scholar 

  121. Anishchenko, I. et al. De novo protein design by deep network hallucination. Nature 600, 547–552 (2021).

    Article  ADS  MATH  Google Scholar 

  122. Wang, J. et al. Scaffolding protein functional sites using deep learning. Science 377, 387–394 (2022).

    Article  ADS  MATH  Google Scholar 

  123. Szegedy, C. et al. Going deeper with convolutions. In Proc. 2015 IEEE Conf. Computer Vision and Pattern Recognition (IEEE, 2015).

  124. Yeh, A. H.-W. et al. De novo design of luciferases using deep learning. Nature 614, 774–780 (2023).

    Article  ADS  MATH  Google Scholar 

  125. Wicky, B. I. M. et al. Hallucinating symmetric protein assemblies. Science 378, 56–61 (2022).

    Article  ADS  Google Scholar 

  126. Frank, C. et al. Scalable protein design using optimization in a relaxed sequence space. Science 386, 439–445 (2024).

    Article  MATH  Google Scholar 

  127. Frank, C., Schiwietz, D., Fuß, L., Ovchinnikov, S. & Dietz, H. Alphafold2 refinement improves designability of large de novo proteins. Preprint at bioRxiv https://doi.org/10.1101/2024.11.21.624687 (2024).

  128. Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems 33 (eds Larochelle, H. et al.) (NeurIPS, 2020).

  129. Song, Y. et al. Score-based generative modeling through stochastic differential equations. In 9th International Conference on Learning Representations, ICLR 2021 https://openreview.net/forum?id=PxTIG12RRHS (OpenReview.net, 2021).

  130. Lin, Y., Lee, M., Zhang, Z. & AlQuraishi, M. Out of many, one: designing and scaffolding proteins at the scale of the structural universe with Genie 2. Preprint at https://arxiv.org/abs/2405.15489 (2024).

  131. Yim, J. et al. SE(3) diffusion model with application to protein backbone generation. In Proc. Mahine Learning Research https://proceedings.mlr.press/v202/yim23a.html (OpenReview.net, 2023).

  132. Yim, J. et al. Fast protein backbone generation with SE(3) flow matching. Preprint at https://arxiv.org/abs/2310.05297 (2023).

  133. Wang, C. et al. Proteus: exploring protein structure generation for enhanced designability and efficiency. In Proc. 41st International Conference on Machine Learning https://openreview.net/forum?id=IckJCzsGVS (OpenReview.net, 2024).

  134. Huguet, G. et al. Sequence-augmented SE(3)-flow matching for conditional protein backbone generation. In Thirty-Eighth Annual Conference on Neural Information Processing Systems https://openreview.net/forum?id=paYwtPBpyZ (OpenReview.net, 2024).

  135. Campbell, A., Yim, J., Barzilay, R., Rainforth, T. & Jaakkola, T. S. Generative flows on discrete state-spaces: enabling multimodal flows with applications to protein co-design. In Proc. Forty-first International Conference on Machine Learning https://openreview.net/forum?id=kQwSbv0BR4 (OpenReview.net, 2024).

  136. Ren, M., Zhu, T. & Zhang, H. CarbonNovo: joint design of protein structure and sequence using a unified energy-based model. In Forty-first International Conference on Machine Learning, ICML 2024 https://openreview.net/forum?id=FSxTEvuFa7 (OpenReview.net, 2024).

  137. Chu, A. E. et al. An all-atom protein generative model. Proc. Natl Acad. Sci. USA 121, e2311500121 (2024).

    Article  MATH  Google Scholar 

  138. Lisanza, S. L. et al. Multistate and functional protein design using RoseTTAFold sequence space diffusion. Nat. Biotechnol. https://doi.org/10.1038/s41587-024-02395-w (2024).

  139. Qu, W. et al. P(all-atom) is unlocking new path for protein design. Preprint at bioRxiv https://doi.org/10.1101/2024.08.16.608235 (2024).

  140. Dahiyat, B. I., Sarisky, C. A. & Mayo, S. L. De novo protein design: towards fully automated sequence selection. J. Mol. Biol. 273, 789–796 (1997).

    Article  Google Scholar 

  141. Lovell, S. C., Word, J. M., Richardson, J. S. & Richardson, D. C. The penultimate rotamer library. Proteins 40, 389–408 (2000).

    Article  MATH  Google Scholar 

  142. Shapovalov, M. V. & Dunbrack, R. L. Jr. A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions. Structure 19, 844–858 (2011).

    Article  Google Scholar 

  143. Cooper, M. C., de Givry, S. & Schiex, T. Graphical models: queries, complexity, algorithms. In Proc. 37th International Symposium on Theoretical Aspects of Computer Science Vol. 154 (STACS 2020) (eds Paul, C. & Bläser, M.) 4:1–4:22 (Schloss Dagstuhl — Leibniz-Zentrum für Informatik, 2020).

  144. Hallen, M. A. et al. OSPREY 3.0: open-source protein redesign for you, with powerful new features. J. Comput. Chem. 39, 2494–2507 (2018).

    Article  MATH  Google Scholar 

  145. Hallen, M. A. & Donald, B. R. Protein design by provable algorithms. Commun. ACM 62, 76–84 (2019).

    Article  MATH  Google Scholar 

  146. Allouche, D. et al. Computational protein design as an optimization problem. Artif. Intell. 212, 59–79 (2014).

    Article  MathSciNet  MATH  Google Scholar 

  147. Pierce, N. A. & Winfree, E. Protein design is NP-hard. Protein Eng. 15, 779–782 (2002).

    Article  MATH  Google Scholar 

  148. Simoncini, D. et al. Guaranteed discrete energy optimization on large protein design problems. J. Chem. Theory Comput. 11, 5980–5989 (2015).

    Article  MATH  Google Scholar 

  149. Khatri, B., Majumder, P., Nagesh, J., Penmatsa, A. & Chatterjee, J. Increasing protein stability by engineering the nπ* interaction at the β-turn. Chem. Sci. 11, 9480–9487 (2020).

    Article  Google Scholar 

  150. Boyken, S. E. et al. De novo design of protein homo-oligomers with modular hydrogen-bond network-mediated specificity. Science 352, 680–687 (2016).

    Article  ADS  Google Scholar 

  151. Pavlovicz, R. E., Park, H. & DiMaio, F. Efficient consideration of coordinated water molecules improves computational protein–protein and protein–ligand docking discrimination. PLoS Comput. Biol. 16, e1008103 (2020).

    Article  ADS  Google Scholar 

  152. Ruffini, M. et al. Guaranteed diversity and optimality in cost function network based computational protein design methods. Algorithms 14, 168 (2021).

    Article  MATH  Google Scholar 

  153. Colom, M. S. et al. Complete combinatorial mutational enumeration of a protein functional site enables sequence–landscape mapping and identifies highly-mutated variants that retain activity. Protein Sci. 33, e5109 (2024).

    Article  Google Scholar 

  154. DiMaio, F., Leaver-Fay, A., Bradley, P., Baker, D. & André, I. Modeling symmetric macromolecular structures in Rosetta3. PLoS ONE 6, e20450 (2011).

    Article  ADS  Google Scholar 

  155. Defresne, M., Barbe, S. & Schiex, T. Protein design with deep learning. Int. J. Mol. Sci. 22, 11741 (2021).

    Article  MATH  Google Scholar 

  156. Goverde, C. A. et al. Computational design of soluble and functional membrane protein analogues. Nature 631, 449–458 (2024).

    Article  MATH  Google Scholar 

  157. Jing, B., Eismann, S., Suriana, P., Townshend, R. J. L. & Dror, R. O. Learning from protein structure with geometric vector perceptrons. In 9th International Conference on Learning Representations, ICLR 2021 https://openreview.net/forum?id=1YLJDvSx6J4 (OpenReview.net, 2021).

  158. Young, G. & Householder, A. S. Discussion of a set of points in terms of their mutual distances. Psychometrika 3, 19–22 (1938).

    Article  MATH  Google Scholar 

  159. Corso, G., Stark, H., Jegelka, S., Jaakkola, T. & Barzilay, R. Graph neural networks. Nat. Rev. Methods Primers 4, 17 (2024).

    Article  MATH  Google Scholar 

  160. Krapp, L. F., Meireles, F. A., Abriata, L. A. & Peraro, M. D. Context-aware geometric deep learning for protein sequence design. Nat. Commun. 15, 6273 (2024).

    Article  MATH  Google Scholar 

  161. Dessaux, D. et al. Designing symmetrical multi-component proteins using a hybrid generative AI approach. Preprint at bioRxiv https://doi.org/10.1101/2024.06.13.598662 (2024).

  162. Li, A. J. et al. Neural network-derived Potts models for structure-based protein design using backbone atomic coordinates and tertiary motifs. Protein Sci. 32, e4554 (2023).

    Article  ADS  Google Scholar 

  163. Silva, L. A., Meynard-Piganeau, B., Lucibello, C. & Feinauer, C. Uncovering sequence diversity from a known protein structure. Preprint at https://arxiv.org/abs/2406.11975 (2024).

  164. Durante, V., Katsirelos, G. & Schiex, T. Efficient low rank convex bounds for pairwise discrete graphical models. In Proc. Machine Learning Research Vol. 162 (eds Chaudhuri, K.) 5726–5741 (PMLR, 2022).

  165. Liu, Y. et al. Rotamer-free protein sequence design based on deep learning and self-consistency. Nat. Comput. Sci. 2, 451–462 (2022).

  166. Liu, J., Guo, Z., You, H., Zhang, C. & Lai, L. All-atom protein sequence design based on geometric deep learning. Angew. Chem. Int. Ed. 63, e202411461 (2024).

  167. Dauparas, J. et al. Atomic context-conditioned protein sequence design using LigandMPNN. Preprint at bioRxiv https://doi.org/10.1101/2023.12.22.573103 (2023).

  168. Krapp, L. F. et al. Context-aware geometric deep learning for protein sequence design. Nat. Commun. 15, 6273 (2024).

  169. Baldwin, E., Hajiseyedjavadi, O., Baase, W. & Matthews, B. The role of backbone flexibility in the accommodation of variants that repack the core of T4 lysozyme. Science 262, 1715–1718 (1993).

  170. Bordner, A. & Abagyan, R. Large-scale prediction of protein geometry and stability changes for arbitrary single point mutations. Proteins Struct. Funct. Bioinf. 57, 400–413 (2004).

  171. Boehr, D. D., Nussinov, R. & Wright, P. E. The role of dynamic conformational ensembles in biomolecular recognition. Nat. Chem. Biol. 5, 789–796 (2009).

  172. Sonaglioni, D. et al. Dynamic personality of proteins and effect of the molecular environment. J. Phys. Chem. Lett. 15, 5543–5548 (2024).

  173. Gaillard, T., Panel, N. & Simonson, T. Protein side chain conformation predictions with an MMGBSA energy function. Proteins Struct. Funct. Bioinf. 84, 803–819 (2016).

  174. Murphy, G. S. et al. Increasing sequence diversity with flexible backbone protein design: the complete redesign of a protein hydrophobic core. Structure 20, 1086–1096 (2012).

  175. Khatib, F. et al. Algorithm discovery by protein folding game players. Proc. Natl Acad. Sci. USA 108, 18949–18953 (2011).

  176. Tyka, M. D. et al. Alternate states of proteins revealed by detailed energy landscape mapping. J. Mol. Biol. 405, 607–618 (2011).

  177. Loshbaugh, A. L. & Kortemme, T. Comparison of Rosetta flexible-backbone computational protein design methods on binding interactions. Proteins Struct. Funct. Bioinf. 88, 206–226 (2020).

  178. Ollikainen, N., de Jong, R. M. & Kortemme, T. Coupling protein side-chain and backbone flexibility improves the re-design of protein–ligand specificity. PLoS Comput. Biol. 11, e1004335 (2015).

  179. Smith, C. A. & Kortemme, T. Backrub-like backbone simulation recapitulates natural protein conformational variability and improves mutant side-chain prediction. J. Mol. Biol. 380, 742–756 (2008).

  180. Sun, M. G. & Kim, P. M. Data driven flexible backbone protein design. PLoS Comput. Biol. 13, e1005722 (2017).

  181. Simoncini, D., Zhang, K. Y., Schiex, T. & Barbe, S. A structural homology approach for computational protein design with flexible backbone. Bioinformatics 35, 2418–2426 (2019).

  182. Gainza, P., Roberts, K. E. & Donald, B. R. Protein design using continuous rotamers. PLoS Comput. Biol. 8, e1002335 (2012).

  183. Hallen, M. A., Keedy, D. A. & Donald, B. R. Dead-end elimination with perturbations (DEEPer): a provable protein design algorithm with continuous sidechain and backbone flexibility. Proteins Struct. Funct. Bioinf. 81, 18–39 (2013).

  184. Hallen, M. A. & Donald, B. R. CATS (coordinates of atoms by Taylor series): protein design with backbone flexibility in all locally feasible directions. Bioinformatics 33, i5–i12 (2017).

  185. Zuckerman, D. M. Statistical Physics of Biomolecules: An Introduction (CRC Press, 2010).

  186. Jou, J. D., Holt, G. T., Lowegard, A. U. & Donald, B. R. Minimization-aware recursive K*: a novel, provable algorithm that accelerates ensemble-based protein design and provably approximates the energy landscape. J. Comput. Biol. 27, 550–564 (2020).

  187. Viricel, C., de Givry, S., Schiex, T. & Barbe, S. Cost function network-based design of protein–protein interactions: predicting changes in binding affinity. Bioinformatics 34, 2581–2589 (2018).

  188. Ojewole, A. A., Jou, J. D., Fowler, V. G. & Donald, B. R. BBK* (branch and bound over K*): a provable and efficient ensemble-based protein design algorithm to optimize stability and binding affinity over large sequence spaces. J. Comput. Biol. 25, 726–739 (2018).

  189. Silver, N. W. et al. Efficient computation of small-molecule configurational binding entropy and free energy changes by ensemble enumeration. J. Chem. Theory Comput. 9, 5098–5115 (2013).

  190. Kamisetty, H., Ramanathan, A., Bailey-Kellogg, C. & Langmead, C. J. Accounting for conformational entropy in predicting binding free energies of protein–protein interactions. Proteins Struct. Funct. Bioinf. 79, 444–462 (2011).

  191. Valiant, L. G. The complexity of enumeration and reliability problems. SIAM J. Comput. 8, 410–421 (1979).

  192. Nisonoff, H. Efficient Partition Function Estimation in Computational Protein Design: Probabilistic Guarantees and Characterization of a Novel Algorithm. PhD thesis, Duke University, Durham (2015).

  193. Viricel, C., Simoncini, D., Barbe, S. & Schiex, T. Guaranteed weighted counting for affinity computation: beyond determinism and structure. In Principles and Practice of Constraint Programming: 22nd International Conference, CP 2016, Toulouse, France, September 5–9, 2016, Proceedings Vol. 22, 733–750 (Springer, 2016).

  194. Havranek, J. J. & Harbury, P. B. Automated design of specificity in molecular recognition. Nat. Struct. Biol. 10, 45–52 (2003).

  195. Desjarlais, J. R. & Handel, T. M. Side-chain and backbone flexibility in protein core design. J. Mol. Biol. 290, 305–318 (1999).

  196. Hu, X., Wang, H., Ke, H. & Kuhlman, B. High-resolution design of a protein loop. Proc. Natl Acad. Sci. USA 104, 17668–17673 (2007).

  197. Murphy, P. M., Bolduc, J. M., Gallaher, J. L., Stoddard, B. L. & Baker, D. Alteration of enzyme specificity by computational loop remodeling and design. Proc. Natl Acad. Sci. USA 106, 9215–9220 (2009).

  198. Davis, I. W., Arendall, W. B., Richardson, D. C. & Richardson, J. S. The backrub motion: how protein backbone shrugs when a sidechain dances. Structure 14, 265–274 (2006).

  199. Friedland, G. D., Linares, A. J., Smith, C. A. & Kortemme, T. A simple model of backbone flexibility improves modeling of side-chain conformational variability. J. Mol. Biol. 380, 757–774 (2008).

  200. Ollikainen, N., Smith, C. A., Fraser, J. S. & Kortemme, T. in Methods in Enzymology Vol. 523, 61–85 (Elsevier, 2013).

  201. Fu, X., Apgar, J. R. & Keating, A. E. Modeling backbone flexibility to achieve sequence diversity: the design of novel α-helical ligands for Bcl-xL. J. Mol. Biol. 371, 1099–1117 (2007).

  202. Fung, H. K., Floudas, C. A., Taylor, M. S., Zhang, L. & Morikis, D. Toward full-sequence de novo protein design with flexible templates for human beta-defensin-2. Biophys. J. 94, 584–599 (2008).

  203. Sala, D., Engelberger, F., Mchaourab, H. & Meiler, J. Modeling conformational states of proteins with AlphaFold. Curr. Opin. Struct. Biol. 81, 102645 (2023).

  204. Del Alamo, D., Sala, D., Mchaourab, H. S. & Meiler, J. Sampling alternative conformational states of transporters and receptors with AlphaFold2. eLife 11, e75751 (2022).

  205. Wayment-Steele, H. K. et al. Predicting multiple conformations via sequence clustering and AlphaFold2. Nature 625, 832–839 (2024).

  206. Stein, R. A. & Mchaourab, H. S. SPEACH_AF: sampling protein ensembles and conformational heterogeneity with AlphaFold2. PLoS Comput. Biol. 18, e1010483 (2022).

  207. Kalakoti, Y. & Wallner, B. AFsample2: predicting multiple conformations and ensembles with AlphaFold2. Preprint at bioRxiv https://doi.org/10.1101/2024.05.28.596195 (2024).

  208. Bryant, P. & Noé, F. Structure prediction of alternative protein conformations. Nat. Commun. 15, 7328 (2024).

  209. Jing, B. et al. EigenFold: generative protein structure prediction with diffusion models. Preprint at https://arxiv.org/abs/2304.02198 (2023).

  210. Zheng, S. et al. Predicting equilibrium distributions for molecular systems with deep learning. Nat. Mach. Intell. 6, 558–567 (2024).

  211. Lu, J., Zhong, B. & Tang, J. Score-based enhanced sampling for protein molecular dynamics. In ICML 2023 Workshop on Structured Probabilistic Inference & Generative Modeling https://openreview.net/forum?id=NO3QwxuHv9#all (2023).

  212. Jing, B., Berger, B. & Jaakkola, T. S. AlphaFold meets flow matching for generating protein ensembles. In NeurIPS 2023 Workshop on Generative AI and Biology https://openreview.net/pdf?id=yQcebEgQfH (OpenReview.net, 2024).

  213. Albergo, M. S. & Vanden-Eijnden, E. Building normalizing flows with stochastic interpolants. In The Eleventh International Conference on Learning Representations, ICLR 2023 https://openreview.net/forum?id=li7qeBbCR1t (OpenReview.net, 2023).

  214. Davey, J. A. & Chica, R. A. Multistate approaches in computational protein design. Protein Sci. 21, 1241–1252 (2012).

  215. Karimi, M. & Shen, Y. iCFN: an efficient exact algorithm for multistate protein design. Bioinformatics 34, i811–i820 (2018).

  216. Vucinic, J., Simoncini, D., Ruffini, M., Barbe, S. & Schiex, T. Positive multistate protein design. Bioinformatics 36, 122–130 (2020).

  217. Davey, J. A., Damry, A. M., Euler, C. K., Goto, N. K. & Chica, R. A. Prediction of stable globular proteins using negative design with non-native backbone ensembles. Structure 23, 2011–2021 (2015).

  218. Davey, J. A. & Chica, R. A. Multistate computational protein design with backbone ensembles. Methods Mol. Biol. 1529, 161–179 (2017).

  219. Sauer, M. F., Sevy, A. M., Crowe, J. E. Jr. & Meiler, J. Multi-state design of flexible proteins predicts sequences optimal for conformational change. PLoS Comput. Biol. 16, e1007339 (2020).

  220. Ambroggio, X. I. & Kuhlman, B. Computational design of a single amino acid sequence that can switch between two distinct protein folds. J. Am. Chem. Soc. 128, 1154–1161 (2006).

  221. Sevy, A. M., Jacobs, T. M., Crowe, J. E. Jr. & Meiler, J. Design of protein multi-specificity using an independent sequence search reduces the barrier to low energy sequences. PLoS Comput. Biol. 11, e1004300 (2015).

  222. Leaver-Fay, A., Jacak, R., Stranges, P. B. & Kuhlman, B. A generic program for multistate protein design. PLoS ONE 6, e20937 (2011).

  223. Allen, B. D. & Mayo, S. L. An efficient algorithm for multistate protein design based on FASTER. J. Comput. Chem. 31, 904–916 (2010).

  224. Negron, C. & Keating, A. E. in Methods in Enzymology Vol. 523, 171–190 (Elsevier, 2013).

  225. Fromer, M., Yanover, C. & Linial, M. Design of multispecific protein sequences using probabilistic graphical modeling. Proteins Struct. Funct. Bioinf. 78, 530–547 (2010).

  226. Fromer, M. et al. SPRINT: side-chain prediction inference toolbox for multistate protein design. Bioinformatics 26, 2466–2467 (2010).

  227. Yanover, C., Fromer, M. & Shifman, J. M. Dead-end elimination for multistate protein design. J. Comput. Chem. 28, 2122–2129 (2007).

  228. Hallen, M. A. & Donald, B. R. COMETS (constrained optimization of multistate energies by tree search): a provable and efficient protein design algorithm to optimize binding affinity and specificity with respect to sequence. J. Comput. Biol. 23, 311–321 (2016).

  229. Traoré, S. et al. Fast search algorithms for computational protein design. J. Comput. Chem. 37, 1048–1058 (2016).

  230. Löffler, P., Schmitz, S., Hupfeld, E., Sterner, R. & Merkl, R. Rosetta:MSF: a modular framework for multi-state computational protein design. PLoS Comput. Biol. 13, e1005600 (2017).

  231. Nazet, J., Lang, E. & Merkl, R. Rosetta:MSF:NN: boosting performance of multi-state computational protein design with a neural network. PLoS ONE 16, e0256691 (2021).

  232. Eisenstein, M. Seven technologies to watch in 2022. Nature 601, 658–661 (2022).

  233. Porebski, B. T. & Buckle, A. M. Consensus protein design. Protein Eng. Des. Sel. 29, 245–251 (2016).

  234. Plückthun, A. Designed ankyrin repeat proteins (DARPins): binding proteins for research, diagnostics, and therapy. Annu. Rev. Pharmacol. Toxicol. 55, 489–511 (2015).

  235. Pabo, C. O., Peisach, E. & Grant, R. A. Design and selection of novel Cys2His2 zinc finger proteins. Annu. Rev. Biochem. 70, 313–340 (2001).

  236. Spence, M. A., Kaczmarski, J. A., Saunders, J. W. & Jackson, C. J. Ancestral sequence reconstruction for protein engineers. Curr. Opin. Struct. Biol. 69, 131–141 (2021).

  237. Voet, A. R. D. et al. Computational design of a self-assembling symmetrical β-propeller protein. Proc. Natl Acad. Sci. USA 111, 15102–15107 (2014).

  238. Reynolds, K. A., Russ, W. P., Socolich, M. & Ranganathan, R. in Methods in Enzymology 213–235 (Elsevier, 2013).

  239. Brender, J. R., Shultis, D., Khattak, N. A. & Zhang, Y. An evolution-based approach to de novo protein design. Methods Mol. Biol. 1529, 243–264 (2017).

  240. Russ, W. P. et al. An evolution-based model for designing chorismate mutase enzymes. Science 369, 440–445 (2020).

  241. Schmitz, S., Ertelt, M., Merkl, R. & Meiler, J. Rosetta design with co-evolutionary information retains protein function. PLoS Comput. Biol. 17, e1008568 (2021).

  242. Malbranke, C., Bikard, D., Cocco, S., Monasson, R. & Tubiana, J. Machine learning for evolutionary-based and physics-inspired protein design: current and future synergies. Curr. Opin. Struct. Biol. 80, 102571 (2023).

  243. Fram, B. et al. Simultaneous enhancement of multiple functional properties using evolution-informed protein design. Nat. Commun. 15, 5141 (2024).

  244. Verkuil, R. et al. Language models generalize beyond natural proteins. Preprint at bioRxiv https://doi.org/10.1101/2022.12.21.521521 (2022).

  245. Madani, A. et al. Large language models generate functional protein sequences across diverse families. Nat. Biotechnol. 41, 1099–1106 (2023).

  246. Munsamy, G. et al. Conditional language models enable the efficient design of proficient enzymes. Preprint at bioRxiv https://doi.org/10.1101/2024.05.03.592223 (2024).

  247. Winski, A. et al. AlphaFold2 captures the conformational landscape of the HAMP signaling domain. Protein Sci. 33, e4846 (2024).

  248. Akdel, M. et al. A structural biology community assessment of AlphaFold2 applications. Nat. Struct. Mol. Biol. 29, 1056–1067 (2022).

  249. McDonald, E. F., Jones, T., Plate, L., Meiler, J. & Gulsevin, A. Benchmarking AlphaFold2 on peptide structure prediction. Structure 31, 111–119.e2 (2023).

  250. Castorina, L. V., Petrenas, R., Subr, K. & Wood, C. W. PDBench: evaluating computational methods for protein-sequence design. Bioinformatics 39, btad027 (2023).

  251. Dallago, C. et al. FLIP: benchmark tasks in fitness landscape inference for proteins. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2) https://openreview.net/forum?id=p2dMLEwL8tF (OpenReview.net, 2021).

  252. Notin, P. et al. ProteinGym: large-scale benchmarks for protein fitness prediction and design. In 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (eds Oh, A. et al.), Vol. 36, 64331–64379 (Curran Associates, Inc., 2023).

  253. Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302–2309 (2005).

  254. Arun, K. S., Huang, T. S. & Blostein, S. D. Least-squares fitting of two 3-D point sets. IEEE Trans. Patt. Anal. Mach. Intell. 9, 698–700 (1987).

  255. Li, S. C., Bu, D., Xu, J. & Li, M. Finding nearly optimal GDT scores. J. Comput. Biol. 18, 693–704 (2011).

  256. Mariani, V., Biasini, M., Barbato, A. & Schwede, T. lDDT: a local superposition-free score for comparing protein structures and models using distance difference tests. Bioinformatics 29, 2722–2728 (2013).

  257. Wallner, B. AFsample: improving multimer prediction with AlphaFold using massive sampling. Bioinformatics 39, btad573 (2023).

  258. Roney, J. P. & Ovchinnikov, S. State-of-the-art estimation of protein model accuracy using AlphaFold. Phys. Rev. Lett. 129, 238101 (2022).

  259. Bennett, N. R. et al. Improving de novo protein binder design with deep learning. Nat. Commun. 14, 2625 (2023).

  260. Liu, C. et al. Diffusing protein binders to intrinsically disordered proteins. Preprint at bioRxiv https://doi.org/10.1101/2024.07.16.603789 (2024).

  261. Wu, K. et al. Sequence-specific targeting of intrinsically disordered protein regions. Preprint at bioRxiv https://doi.org/10.1101/2024.07.15.603480 (2024).

  262. Manfredi, M. et al. Alpha&ESMhFolds: a web server for comparing AlphaFold2 and ESMFold models of the human reference proteome. J. Mol. Biol. 436, 168593 (2024).

  263. Trott, O. & Olson, A. J. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 31, 455–461 (2010).

  264. Corso, G., Stärk, H., Jing, B., Barzilay, R. & Jaakkola, T. DiffDock: diffusion steps, twists, and turns for molecular docking. In The Eleventh International Conference on Learning Representations https://openreview.net/forum?id=kKF8_K-mBbS (ICLR 2023).

  265. Moretti, R., Bender, B. J., Allison, B. & Meiler, J. Rosetta and the design of ligand binding sites. Methods Mol. Biol. 1414, 47–62 (2016).

  266. Basu, S. & Wallner, B. DockQ: a quality measure for protein–protein docking models. PLoS ONE 11, e0161879 (2016).

  267. Dominguez, C., Boelens, R. & Bonvin, A. M. J. J. HADDOCK: a protein–protein docking approach based on biochemical or biophysical information. J. Am. Chem. Soc. 125, 1731–1737 (2003).

  268. Kanitkar, T. R. et al. Methods for molecular modelling of protein complexes. Methods Mol. Biol. 2305, 53–80 (2021).

  269. Radom, F., Plückthun, A. & Paci, E. Assessment of ab initio models of protein complexes by molecular dynamics. PLoS Comput. Biol. 14, e1006182 (2018).

  270. Wang, L. et al. Accurate and reliable prediction of relative ligand binding potency in prospective drug discovery by way of a modern free-energy calculation protocol and force field. J. Am. Chem. Soc. 137, 2695–2703 (2015).

  271. Chipot, C. Free energy methods for the description of molecular processes. Annu. Rev. Biophys. 52, 113–138 (2023).

  272. Barros, E. P. et al. Improving the efficiency of ligand-binding protein design with molecular dynamics simulations. J. Chem. Theory Comput. 15, 5703–5715 (2019).

  273. Chevalier, A. et al. Massively parallel de novo protein design for targeted therapeutics. Nature 550, 74–79 (2017).

  274. Childers, M. C. & Daggett, V. Insights from molecular dynamics simulations for computational protein design. Mol. Syst. Des. Eng. 2, 9–33 (2017).

  275. Gainza, P. et al. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat. Methods 17, 184–192 (2020).

  276. Gainza, P. et al. De novo design of protein interactions with learned surface fingerprints. Nature 617, 176–184 (2023).

  277. Gligorijević, V. et al. Structure-based protein function prediction using graph convolutional networks. Nat. Commun. 12, 3168 (2021).

  278. Sanderson, T., Bileschi, M. L., Belanger, D. & Colwell, L. J. ProteInfer, deep neural networks for protein functional inference. eLife 12, e80942 (2023).

  279. Brandes, N., Ofer, D., Peleg, Y., Rappoport, N. & Linial, M. ProteinBERT: a universal deep-learning model of protein sequence and function. Bioinformatics 38, 2102–2110 (2022).

  280. Khersonsky, O. et al. Automated design of efficient and functionally diverse enzyme repertoires. Mol. Cell 72, 178–186.e5 (2018).

  281. Weinstein, J. Y. et al. Designed active-site library reveals thousands of functional GFP variants. Nat. Commun. 14, 2890 (2023).

  282. Kumar, N. & Skolnick, J. EFICAz2.5: application of a high-precision enzyme function predictor to 396 proteomes. Bioinformatics 28, 2687–2688 (2012).

  283. Somarowthu, S., Yang, H., Hildebrand, D. G. C. & Ondrechen, M. J. High-performance prediction of functional residues in proteins with machine learning and computed input features. Biopolymers 95, 390–400 (2011).

  284. Somarowthu, S. & Ondrechen, M. J. POOL server: machine learning application for functional site prediction in proteins. Bioinformatics 28, 2078–2079 (2012).

  285. Tong, W., Wei, Y., Murga, L. F., Ondrechen, M. J. & Williams, R. J. Partial order optimum likelihood (POOL): maximum likelihood prediction of protein active site residues using 3D structure and sequence properties. PLoS Comput. Biol. 5, e1000266 (2009).

  286. Song, J. et al. PREvaIL, an integrative approach for inferring catalytic residues using sequence, structural, and network features in a machine-learning framework. J. Theor. Biol. 443, 125–137 (2018).

  287. Zou, Z., Tian, S., Gao, X. & Li, Y. mlDEEPre: multi-functional enzyme function prediction with hierarchical multi-label deep learning. Front. Genet. 9, 714 (2018).

  288. Feehan, R., Franklin, M. W. & Slusky, J. S. G. Machine learning differentiates enzymatic and non-enzymatic metals in proteins. Nat. Commun. 12, 3712 (2021).

  289. Feehan, R., Copeland, M., Franklin, M. W. & Slusky, J. S. G. MAHOMES II: a webserver for predicting if a metal binding site is enzymatic. Protein Sci. 32, e4626 (2023).

  290. van Kempen, M. et al. Fast and accurate protein structure search with Foldseek. Nat. Biotechnol. 42, 243–246 (2024).

  291. Kim, W. et al. Rapid and sensitive protein complex alignment with Foldseek-multimer. Preprint at bioRxiv https://doi.org/10.1101/2024.07.15.603480 (2024).

  292. Holm, L. in Methods in Molecular Biology (ed. Clifton, N. J.) 29–42 (Springer US, 2020).

  293. Shindyalov, I. N. & Bourne, P. E. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng. Des. Sel. 11, 739–747 (1998).

  294. Johnson, S. R. et al. Computational scoring and experimental evaluation of enzymes generated by neural networks. Nat. Biotechnol. https://doi.org/10.1038/s41587-024-02214-2 (2024).

  295. Stam, M. J. & Wood, C. W. DE-STRESS: a user-friendly web application for the evaluation of protein designs. Protein Eng. Des. Sel. 34, gzab029 (2021).

  296. Goldenzweig, A. et al. Automated structure- and sequence-based design of proteins for high bacterial expression and stability. Mol. Cell 63, 337–346 (2016).

  297. Marques, S. M., Planas-Iglesias, J. & Damborsky, J. Web-based tools for computational enzyme design. Curr. Opin. Struct. Biol. 69, 19–34 (2021).

  298. Hon, J. et al. SoluProt: prediction of soluble protein expression in Escherichia coli. Bioinformatics 37, 23–28 (2021).

  299. Ding, Z. et al. MPEPE, a predictive approach to improve protein expression in E. coli based on deep learning. Comput. Struct. Biotechnol. J. 20, 1142–1153 (2022).

  300. Thumuluri, V. et al. NetSolP: predicting protein solubility in E. coli using language models. Bioinformatics 38, 941–946 (2021).

  301. Walker, J. M. The Proteomics Protocols Handbook (Humana Press, 2005).

  302. Cock, P. J. A. et al. Biopython: freely available python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).

  303. Schavemaker, P. E., Śmigiel, W. M. & Poolman, B. Ribosome surface properties may impose limits on the nature of the cytoplasmic proteome. eLife 6, e30084 (2017).

  304. Yagi, S. et al. Seven amino acid types suffice to create the core fold of RNA polymerase. J. Am. Chem. Soc. 143, 15998–16006 (2021).

  305. Berger, S. et al. Preclinical proof of principle for orally delivered Th17 antagonist miniproteins. Cell 187, 4305–4317.e18 (2024).

  306. Structural Genomics Consortium et al. Protein production and purification. Nat. Methods 5, 135–146 (2008).

  307. Wingfield, P. T. Overview of the purification of recombinant proteins. Curr. Protoc. Protein Sci. https://doi.org/10.1002/0471140864.ps0601s80 (2015).

  308. Du, M. et al. Progress, applications, challenges and prospects of protein purification technology. Front. Bioeng. Biotechnol. https://doi.org/10.3389/fbioe.2022.1028691 (2022).

  309. Stemmer, W. P., Crameri, A., Ha, K. D., Brennan, T. M. & Heyneker, H. L. Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides. Gene 164, 49–53 (1995).

  310. Gould, N., Hendy, O. & Papamichail, D. Computational tools and algorithms for designing customized synthetic genes. Front. Bioeng. Biotechnol. 2, 41 (2014).

  311. Langan, R. A. et al. De novo design of bioactive protein switches. Nature 572, 205–210 (2019).

  312. Miles, A. J., Janes, R. W. & Wallace, B. A. Tools and methods for circular dichroism spectroscopy of proteins: a tutorial review. Chem. Soc. Rev. 50, 8400–8413 (2021).

  313. Micsonai, A. et al. Accurate secondary structure prediction and fold recognition for circular dichroism spectroscopy. Proc. Natl Acad. Sci. USA 112, E3095–E3103 (2015).

  314. Koga, R. et al. Robust folding of a de novo designed ideal protein even with most of the core mutated to valine. Proc. Natl Acad. Sci. USA 117, 31149–31156 (2020).

  315. Gao, K., Oerlemans, R. & Groves, M. R. Theory and applications of differential scanning fluorimetry in early-stage drug discovery. Biophys. Rev. 12, 85–104 (2020).

  316. Lössl, P., van de Waterbeemd, M. & Heck, A. J. R. The diverse and expanding role of mass spectrometry in structural and molecular biology. EMBO J. 35, 2634–2657 (2016).

  317. Lanucara, F., Holman, S. W., Gray, C. J. & Eyers, C. E. The power of ion mobility-mass spectrometry for structural characterization and the study of conformational dynamics. Nat. Chem. 6, 281–294 (2014).

  318. Karch, K. R., Snyder, D. T., Harvey, S. R. & Wysocki, V. H. Native mass spectrometry: recent progress and remaining challenges. Annu. Rev. Biophys. 51, 157–179 (2022).

  319. Figueroa, M. et al. The unexpected structure of the designed protein Octarellin V.1 forms a challenge for protein structure prediction tools. J. Struct. Biol. 195, 19–30 (2016).

  320. Yagi, S. & Tagami, S. An ancestral fold reveals the evolutionary link between RNA polymerase and ribosomal proteins. Nat. Commun. 15, 5938 (2024).

  321. Porter, L. L., Artsimovitch, I. & Ramírez-Sarmiento, C. A. Metamorphic proteins and how to find them. Curr. Opin. Struct. Biol. 86, 102807 (2024).

  322. Bhattacharya, S. et al. NMR-guided directed evolution. Nature 610, 389–393 (2022).

  323. Jaskolski, M., Dauter, Z. & Wlodawer, A. A brief history of macromolecular crystallography, illustrated by a family tree and its Nobel fruits. FEBS J. 281, 3985–4009 (2014).

  324. Wlodawer, A., Minor, W., Dauter, Z. & Jaskolski, M. Protein crystallography for non-crystallographers, or how to get the best (but not more) from published macromolecular structures. FEBS J. 275, 1–21 (2008).

  325. Wlodawer, A., Minor, W., Dauter, Z. & Jaskolski, M. Protein crystallography for aspiring crystallographers or how to avoid pitfalls and traps in macromolecular structure determination. FEBS J. 280, 5705–5736 (2013).

  326. Saibil, H. R. Cryo-EM in molecular and cellular biology. Mol. Cell 82, 274–284 (2022).

  327. Jacques, D. A. & Trewhella, J. Small-angle scattering for structural biology — expanding the frontier while avoiding the pitfalls. Protein Sci. 19, 642–657 (2010).

  328. Skou, S., Gillilan, R. E. & Ando, N. Synchrotron-based small-angle X-ray scattering of proteins in solution. Nat. Protoc. 9, 1727–1739 (2014).

  329. Byer, A. S., Pei, X., Patterson, M. G. & Ando, N. Small-angle X-ray scattering studies of enzymes. Curr. Opin. Chem. Biol. 72, 102232 (2023).

  330. Kobayashi, N. et al. Self-assembling nano-architectures created from a protein nano-building block using an intermolecularly folded dimeric de novo protein. J. Am. Chem. Soc. 137, 11285–11293 (2015).

  331. Morris, R., Black, K. A. & Stollar, E. J. Uncovering protein function: from classification to complexes. Essays Biochem. 66, 255–285 (2022).

  332. Zhou, M., Li, Q. & Wang, R. Current experimental methods for characterizing protein–protein interactions. ChemMedChem 11, 738–756 (2016).

  333. Poluri, K. M., Gulati, K. & Sarkar, S. Experimental Methods for Determination of Protein–Protein Interactions 197–264 (Springer Singapore, 2021).

  334. Bisswanger, H. Enzyme assays. Perspect. Sci. 1, 41–55 (2014).

  335. Chong, S. Overview of Cell-free Protein Synthesis: Historic Landmarks, Commercial Systems, and Expanding Applications 16.30.1–16.30.11 (John Wiley & Sons, Inc., 2014).

  336. Alfi, A. et al. Cell-free mutant analysis combined with structure prediction of a lasso peptide biosynthetic protease B2. ACS Synth. Biol. 11, 2022–2028 (2022).

  337. Taguchi, H. & Niwa, T. Reconstituted cell-free translation systems for exploring protein folding and aggregation. J. Mol. Biol. 436, 168726 (2024).

  338. Thornton, E. L. et al. Applications of cell free protein synthesis in protein design. Protein Sci. 33, e5148 (2024).

  339. Zielonka, S. & Krah, S. (eds) in Methods in Molecular Biology 1st edn (ed. Clifton, N. J.) (Humana Press, 2019).

  340. Newton, M. S., Cabezas-Perusse, Y., Tong, C. L. & Seelig, B. In vitro selection of peptides and proteins — advantages of mRNA display. ACS Synth. Biol. 9, 181–190 (2020).

  341. Gantz, M., Mathis, S. V., Nintzel, F. E. H., Lio, P. & Hollfelder, F. On synergy between ultrahigh throughput screening and machine learning in biocatalyst engineering. Faraday Discuss. 252, 89–114 (2024).

  342. Park, C. & Marqusee, S. Pulse proteolysis: a simple method for quantitative determination of protein stability and ligand binding. Nat. Methods 2, 207–212 (2005).

  343. Rocklin, G. J. et al. Global analysis of protein folding using massively parallel design, synthesis, and testing. Science 357, 168–175 (2017).

  344. Linsky, T. W. et al. Sampling of structure and sequence space of small protein folds. Nat. Commun. 13, 7151 (2022).

  345. Araya, C. L. & Fowler, D. M. Deep mutational scanning: assessing protein function on a massive scale. Trends Biotechnol. 29, 435–442 (2011).

  346. Forrer, P., Jung, S. & Plückthun, A. Beyond binding: using phage display to select for structure, folding and enzymatic activity in proteins. Curr. Opin. Struct. Biol. 9, 514–520 (1999).

  347. Seelig, B. & Szostak, J. W. Selection and evolution of enzymes from a partially randomized non-catalytic scaffold. Nature 448, 828–831 (2007).

  348. Layton, C. J., McMahon, P. L. & Greenleaf, W. J. Large-scale, quantitative protein assays on a high-throughput DNA sequencing chip. Mol. Cell 73, 1075–1082.e4 (2019).

  349. Markin, C. J. et al. Revealing enzyme functional architecture via high-throughput microfluidic enzyme kinetics. Science 373, eabf8761 (2021).

  350. Lee, J. et al. A broadly generalizable stabilization strategy for sarbecovirus fusion machinery vaccines. Nat. Commun. 15, 5496 (2024).

  351. Boyoglu-Barnum, S. et al. Quadrivalent influenza nanoparticle vaccines induce broad protection. Nature 592, 623–628 (2021).

  352. Walls, A. C. et al. Elicitation of potent neutralizing antibody responses by designed protein nanoparticle vaccines for SARS-CoV-2. Cell 183, 1367–1382.e17 (2020).

  353. Parkinson, J., Hard, R. & Wang, W. The RESP AI model accelerates the identification of tight-binding antibodies. Nat. Commun. 14, 454 (2023).

  354. Mason, D. M. et al. Optimization of therapeutic antibodies by predicting antigen specificity from antibody sequence via deep learning. Nat. Biomed. Eng. 5, 600–612 (2021).

  355. Makowski, E. K. et al. Co-optimization of therapeutic antibody affinity and specificity using machine learning models that generalize to novel mutational space. Nat. Commun. 13, 3788 (2022).

  356. Shanker, V. R., Bruun, T. U. J., Hie, B. L. & Kim, P. S. Inverse folding of protein complexes with a structure-informed language model enables unsupervised antibody evolution. Preprint at bioRxiv https://doi.org/10.1101/2023.12.19.572475 (2023).

  357. Shanehsazzadeh, A. et al. Unlocking de novo antibody design with generative artificial intelligence. Preprint at bioRxiv https://doi.org/10.1101/2023.01.08.523187 (2023).

  358. Mahajan, S. P., Ruffolo, J. A., Frick, R. & Gray, J. J. Hallucinating structure-conditioned antibody libraries for target-specific binders. Front. Immunol. 13, 999034 (2022).

  359. Giordano-Attianese, G. et al. A computationally designed chimeric antigen receptor provides a small-molecule safety switch for T-cell therapy. Nat. Biotechnol. 38, 426–432 (2020).

  360. Sesterhenn, F. et al. Boosting subdominant neutralizing antibody responses with a computationally designed epitope-focused immunogen. PLoS Biol. 17, e3000164 (2019).

  361. Dawson, W. M. et al. Differential sensing with arrays of de novo designed peptide assemblies. Nat. Commun. 14, 383 (2023).

  362. Quijano-Rubio, A. et al. De novo design of modular and tunable protein biosensors. Nature 591, 482–487 (2021).

  363. Zhang, J. Z. et al. Thermodynamically coupled biosensors for detecting neutralizing antibodies against SARS-CoV-2 variants. Nat. Biotechnol. 40, 1336–1340 (2022).

  364. Ng, A. H. et al. Modular and tunable biological feedback control using a de novo protein switch. Nature 572, 265–269 (2019).

  365. Lee, G. R. et al. Small-molecule binding and sensing with a designed protein family. Preprint at bioRxiv https://doi.org/10.1101/2023.11.01.565201 (2023).

  366. Rhys, G. G. et al. De novo designed peptides for cellular delivery and subcellular localisation. Nat. Chem. Biol. 18, 999–1004 (2022).

  367. Huddy, T. F. et al. Blueprinting extendable nanomaterials with standardized protein blocks. Nature 627, 898–904 (2024).

  368. Wargacki, A. J. et al. Complete and cooperative in vitro assembly of computationally designed self-assembling protein nanomaterials. Nat. Commun. 12, 883 (2021).

  369. Kratochvil, H. T. et al. Transient water wires mediate selective proton transport in designed channel proteins. Nat. Chem. 15, 1012–1021 (2023).

  370. Scott, A. J. et al. Constructing ion channels from water-soluble α-helical barrels. Nat. Chem. 13, 643–650 (2021).

  371. Shimizu, K. et al. De novo design of a nanopore for single-molecule detection that incorporates a β-hairpin peptide. Nat. Nanotechnol. 17, 67–75 (2022).

  372. Zhang, S. et al. Bottom-up fabrication of a proteasome-nanopore that unravels and processes single proteins. Nat. Chem. 13, 1192–1199 (2021).

  373. Courbet, A. et al. Computational design of mechanically coupled axle-rotor protein assemblies. Science 376, 383–390 (2022).

  374. Cao, L. et al. Design of protein-binding proteins from the target structure alone. Nature 605, 551–560 (2022).

  375. Lauko, A. et al. Computational design of serine hydrolases. Preprint at bioRxiv https://doi.org/10.1101/2024.08.29.610411 (2024).

  376. Schnettler, J. D. et al. Selection of a promiscuous minimalist cAMP phosphodiesterase from a library of de novo designed proteins. Nat. Chem. 16, 1200–1208 (2024).

  377. Röthlisberger, D. et al. Kemp elimination catalysts by computational enzyme design. Nature 453, 190–195 (2008).

  378. Siegel, J. B. et al. Computational design of an enzyme catalyst for a stereoselective bimolecular Diels–Alder reaction. Science 329, 309–313 (2010).

  379. Bjelic, S. et al. Computational design of enone-binding proteins with catalytic activity for the Morita–Baylis–Hillman reaction. ACS Chem. Biol. 8, 749–757 (2013).

  380. Rajagopalan, S. et al. Design of activated serine-containing catalytic triads with atomic-level accuracy. Nat. Chem. Biol. 10, 386–391 (2014).

  381. Khersonsky, O. et al. Evolutionary optimization of computationally designed enzymes: Kemp eliminases of the KE07 series. J. Mol. Biol. 396, 1025–1042 (2010).

  382. Khersonsky, O. et al. Optimization of the in-silico-designed Kemp eliminase KE70 by computational design and directed evolution. J. Mol. Biol. 407, 391–412 (2011).

  383. Khersonsky, O. et al. Bridging the gaps in design methodologies by evolutionary optimization of the stability and proficiency of designed Kemp eliminase KE59. Proc. Natl Acad. Sci. USA 109, 10358–10363 (2012).

  384. Blomberg, R. et al. Precision is essential for efficient catalysis in an evolved Kemp eliminase. Nature 503, 418–421 (2013).

  385. Giger, L. et al. Evolution of a designed retro-aldolase leads to complete active site remodeling. Nat. Chem. Biol. 9, 494–498 (2013).

  386. Preiswerk, N. et al. Impact of scaffold rigidity on the design and evolution of an artificial Diels-Alderase. Proc. Natl Acad. Sci. USA 111, 8013–8018 (2014).

  387. Obexer, R. et al. Emergence of a catalytic tetrad during evolution of a highly active artificial aldolase. Nat. Chem. 9, 50–56 (2017).

  388. Crawshaw, R. et al. Engineering an efficient and enantioselective enzyme for the Morita–Baylis–Hillman reaction. Nat. Chem. 14, 313–320 (2022).

  389. Lux, M. W., Strychalski, E. A. & Vora, G. J. Advancing reproducibility can ease the ‘hard truths’ of synthetic biology. Synth. Biol. 8, ysad014 (2023).

  390. Koehler Leman, J. et al. Better together: elements of successful scientific software development in a distributed collaborative community. PLoS Comput. Biol. 16, e1007507 (2020).

  391. Koehler Leman, J. et al. Ensuring scientific reproducibility in bio-macromolecular modeling via extensive, automated benchmarks. Nat. Commun. 12, 6947 (2021).

  392. Sandve, G. K., Nekrutenko, A., Taylor, J. & Hovig, E. Ten simple rules for reproducible computational research. PLoS Comput. Biol. 9, e1003285 (2013).

  393. Moreau, D., Wiebels, K. & Boettiger, C. Containers for computational reproducibility. Nat. Rev. Methods Primers 3, 50 (2023).

  394. Wilson, G. et al. Good enough practices in scientific computing. PLoS Comput. Biol. 13, e1005510 (2017).

  395. Gibney, E. Not all ‘open source’ AI models are actually open: here’s a ranking. Nature https://doi.org/10.1038/d41586-024-02012-5 (2024).

  396. Liesenfeld, A. & Dingemanse, M. Rethinking open source generative AI: open washing and the EU AI act. In The 2024 ACM Conference on Fairness, Accountability, and Transparency (ACM, 2024).

  397. Hsia, Y. et al. Design of a hyperstable 60-subunit protein dodecahedron [corrected]. Nature 535, 136–139 (2016).

  398. Alberstein, R. G., Guo, A. B. & Kortemme, T. Design principles of protein switches. Curr. Opin. Struct. Biol. 72, 71–78 (2022).

  399. Cerasoli, E., Sharpe, B. K. & Woolfson, D. N. ZiCo: a peptide designed to switch folded state upon binding zinc. J. Am. Chem. Soc. 127, 15008–15009 (2005).

  400. Zhu, J. & Lu, P. Computational design of transmembrane proteins. Curr. Opin. Struct. Biol. 74, 102381 (2022).

  401. Chakravarty, D. & Porter, L. L. AlphaFold2 fails to predict protein fold switching. Protein Sci. 31, e4353 (2022).

  402. Zambaldi, V. et al. De novo design of high-affinity protein binders with AlphaProteo. Preprint at https://arxiv.org/abs/2409.08022 (2024).

  403. Lu, H. et al. Machine learning-aided engineering of hydrolases for PET depolymerization. Nature 604, 662–667 (2022).

  404. Raissi, M., Perdikaris, P. & Karniadakis, G. E. Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019).

  405. Jones, J. A., Andreas, M. P. & Giessen, T. W. Exploring the extreme acid tolerance of a dynamic protein nanocage. Biomacromolecules 24, 1388–1399 (2023).

  406. Groenhof, G. Introduction to QM/MM simulations. Methods Mol. Biol. 924, 43–66 (2013).

  407. Majewski, M. et al. Machine learning coarse-grained potentials of protein thermodynamics. Nat. Commun. 14, 5739 (2023).

  408. Johnston, B. et al. Molecularnodes: v4.2.9 for Blender 4.2+. Zenodo https://doi.org/10.5281/zenodo.14241983 (2024).

  409. Fleuret, F. The Little Book of Deep Learning https://fleuret.org/public/lbdl.pdf (Université de Genève, 2023).

  410. Vijayakumar, A. K. et al. Diverse beam search for improved description of complex scenes. In Proc. Thirty-Second AAAI Conference on Artificial Intelligence (eds McIlraith, S. A. & Weinberger, K. Q.) https://doi.org/10.1609/aaai.v32i1.12340 (AAAI Press, 2018).

Acknowledgements

Protein structures were rendered using the ‘Molecular Nodes’ add-on for blender.org (ref. 408). This work was supported by the French ‘Investing for the Future — PIA3’ programme under the Grant agreement ANR-23-IACL-0002, by the French National Research Agency, under the Grant agreement ANR-22-CE45-0025-01, by the RCUK | Biotechnology and Biological Sciences Research Council (BBSRC) under the Grant agreement BB/V004220/1 and by the National Science Foundation under the Grant agreement 2019598.

Author information

Contributions

Introduction (K.I.A., D.N.W. and T.S.); Experimentation (K.I.A., D.N.W., S.B. and T.S.); Results (K.I.A. and S.T.); Applications (K.I.A., S.B. and T.S.); Reproducibility and data deposition (K.I.A., S.T. and T.S.); Limitations and optimizations (K.I.A. and T.S.); Outlook (T.S., D.N.W. and S.T.); conceptualization of the Primer (T.S.).

Corresponding authors

Correspondence to Katherine I. Albanese, Sophie Barbe, Shunsuke Tagami, Derek N. Woolfson or Thomas Schiex.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Reviews Methods Primers thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Responsible AI x Biodesign: https://responsiblebiodesign.ai/

Glossary

EC number

The Enzyme Commission (EC) number is a four-level numerical classification, written as four numbers separated by periods (for example, EC 3.2.1.17), used to identify and categorize enzymes based on their catalytic function.

Embeddings

Numerical vector representations of input data, extracted from the internal layers of deep-learning networks trained on that type of data.

Epistasis

Non-additive interactions between amino acid residues, whereby the effect of a mutation on the structure, function or stability of the protein depends on the residues present at other positions.
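Epistasis between two mutations is often quantified with a double-mutant cycle, which compares the measured effect of the double mutant with the sum of the single-mutant effects. A minimal sketch, assuming additivity of the measured quantity (for example, a stability change) as the null model; the function name and the example values are hypothetical, and sign conventions for stability changes differ between studies:

```python
def epistasis(ddg_a: float, ddg_b: float, ddg_ab: float) -> float:
    """Non-additive part of a double mutant's effect (double-mutant cycle).

    ddg_a, ddg_b: measured effects of single mutations A and B
                  (for example, stability changes in kcal/mol)
    ddg_ab:       measured effect of the double mutant AB
    Under a purely additive (non-epistatic) model, ddg_ab == ddg_a + ddg_b,
    so the returned value is 0.
    """
    return ddg_ab - (ddg_a + ddg_b)


# Hypothetical numbers: each single mutation costs 1.0 kcal/mol on its own,
# but the pair together costs only 0.5 kcal/mol, a compensatory interaction.
# epistasis(1.0, 1.0, 0.5) returns -1.5
```

A non-zero value signals that the two positions do not contribute independently, which is exactly what position-independent sequence models cannot represent.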

Forward folding

Forward folding consists of predicting the structure of a designed protein sequence to check whether it is predicted to fold into the target structure.
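In practice, the forward-folding check usually compares the predicted structure with the design target, for example through the root-mean-square deviation (RMSD) of Cα atoms after optimal superposition. A minimal sketch of that comparison using the Kabsch algorithm, assuming both structures are given as matched (N, 3) NumPy arrays of Cα coordinates; `kabsch_rmsd` is a hypothetical helper, not part of any particular design package, and the RMSD cutoff for calling a design successful is a user choice:

```python
import numpy as np


def kabsch_rmsd(pred: np.ndarray, target: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape or pred.ndim != 2 or pred.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of equal shape")
    # Remove translation by centring both sets on their centroids.
    p = pred - pred.mean(axis=0)
    q = target - target.mean(axis=0)
    # Kabsch algorithm: the SVD of the 3x3 covariance matrix gives the best rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Apply the rotation to each row vector of p and compare with q.
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

A rigidly rotated and translated copy of the target gives an RMSD of essentially zero, whereas any internal change in geometry gives a positive value.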

Generative model

A probabilistic model of data, \(P(X)\), \(P(X,Y)\) or \(P(X\mid Y)\), that can be sampled to generate more data-like objects. The realism of the generated objects depends both on the training data and on the ability of the model to capture the complex dependencies present in those data.
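The simplest possible generative model treats every position independently, \(P(X)=\prod_i p_i(x_i)\), and therefore captures none of the dependencies between positions; it is shown here only to make sampling from \(P(X)\) concrete. A sketch, assuming the per-position distributions are supplied as an \(L\times 20\) NumPy array (a hypothetical input format; `sample_sequences` is an illustrative name):

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sample_sequences(probs: np.ndarray, n: int, seed: int = 0) -> list:
    """Draw n sequences from a position-independent model P(X) = prod_i p_i(x_i).

    probs: (L, 20) array; row i is the distribution over amino acids at position i.
    """
    probs = np.asarray(probs, dtype=float)
    if probs.ndim != 2 or probs.shape[1] != len(AMINO_ACIDS):
        raise ValueError("probs must have shape (L, 20)")
    if not np.allclose(probs.sum(axis=1), 1.0):
        raise ValueError("each row of probs must sum to 1")
    rng = np.random.default_rng(seed)
    sequences = []
    for _ in range(n):
        # Each position is drawn on its own: no dependencies are modelled.
        idx = [rng.choice(len(AMINO_ACIDS), p=row) for row in probs]
        sequences.append("".join(AMINO_ACIDS[i] for i in idx))
    return sequences
```

Deep generative models replace the independent rows with distributions that are conditioned on the rest of the sequence or structure, which is what lets them reproduce correlated, data-like sequences.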

Graphical model

Mathematical models that represent a complex numerical function of many variables as a combination, usually the sum, of many functions that each involve only a few variables; pairwise decomposable energy functions are one example.
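A pairwise decomposable energy has the form \(E(s)=\sum_i E_i(s_i)+\sum_{i<j}E_{ij}(s_i,s_j)\): one term per position plus one term per pair of positions. A minimal sketch that evaluates such a function for a given sequence, assuming the terms are stored in plain Python dictionaries (a hypothetical layout; real design tools typically define these terms over rotamers rather than amino acid identities alone):

```python
from itertools import combinations


def pairwise_energy(seq: str, unary: list, pairwise: dict) -> float:
    """E(s) = sum_i E_i(s_i) + sum_{i<j} E_ij(s_i, s_j).

    unary:    list of dicts; unary[i][aa] is the energy of amino acid aa at position i
    pairwise: dict mapping (i, j), with i < j, to a dict {(aa_i, aa_j): energy};
              absent position pairs or residue combinations contribute 0
    """
    # One-body terms: one per position.
    energy = sum(unary[i][aa] for i, aa in enumerate(seq))
    # Two-body terms: one per pair of positions that has a table.
    for i, j in combinations(range(len(seq)), 2):
        table = pairwise.get((i, j))
        if table is not None:
            energy += table.get((seq[i], seq[j]), 0.0)
    return energy
```

Because each term involves at most two positions, the function can be represented as a graph (positions as nodes, non-zero pairwise tables as edges), which is what lets graphical-model solvers search sequence space efficiently.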

Inductive bias

Set of assumptions that a machine-learning model makes about the nature of data and the relationships within it, which influences how the model learns and generalizes from training data to new, unseen data. These assumptions are baked into the model’s architecture, learning algorithm and training process.

Interface predicted alignment error

(ipAE). Specific to multimer predictions, measuring the average pAE of interchain residue pairs.
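Given the \(N\times N\) pAE matrix of a multimer prediction and the chain to which each residue belongs, ipAE as defined here is the mean over residue pairs from different chains. A sketch assuming NumPy inputs; prediction tools may differ in details such as which off-diagonal blocks are averaged, so `interface_pae` is illustrative rather than a reimplementation of any specific program:

```python
import numpy as np


def interface_pae(pae, chain_ids) -> float:
    """Mean predicted aligned error over residue pairs from different chains.

    pae:       (N, N) matrix of predicted aligned errors; it is not symmetric in
               general, and both off-diagonal (interchain) blocks are averaged here
    chain_ids: length-N sequence with one chain label per residue
    """
    pae = np.asarray(pae, dtype=float)
    chains = np.asarray(chain_ids)
    if pae.shape != (len(chains), len(chains)):
        raise ValueError("pae must be (N, N) with one chain id per residue")
    # True where residue i and residue j belong to different chains.
    interchain = chains[:, None] != chains[None, :]
    if not interchain.any():
        raise ValueError("need residues from at least two chains")
    return float(pae[interchain].mean())
```

Lower values indicate that the network is more confident about how the chains are placed relative to one another.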

Interface predicted template modelling score

(ipTM). Specific to multimer predictions; estimates the accuracy of the predicted relative positions of the subunits forming the protein–protein complex.

Inverse-folding problem

The problem of finding a sequence that will fold onto a given backbone structure.

Multiple sequence alignment

(MSAs). An arrangement of several biological sequences in a way that highlights regions of similarity, indicative of evolutionary relationships, functional similarities or structural similarities between the sequences.

Multistate design

(MSD). A design approach in which multiple states of the designed protein are simultaneously taken into account during sequence design.

Oligomeric state

Number of peptide or protein subunits that interact non-covalently to form a functional protein assembly.

Out-of-distribution

Data that are significantly different from the data a model was trained on.

Position-specific score matrix

(PSSM). A \(20\times L\) matrix with one column for each of the \(L\) positions of a multiple sequence alignment, where each column vector contains \(\log ({f}_{{\rm{aa}}})\), in which \({f}_{{\rm{aa}}}\) is the frequency of amino acid aa in the corresponding alignment column.
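Computing such a matrix from an alignment takes only a few lines. A sketch assuming the MSA is a list of equal-length strings and that gaps are ignored when counting; the pseudocount, which keeps unobserved residues from giving \(\log 0\), is an added assumption, and many tools instead store log-odds scores relative to background amino acid frequencies:

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pssm(msa: list, pseudocount: float = 1.0) -> np.ndarray:
    """Return a (20, L) matrix of log amino acid frequencies, one column per MSA column.

    msa:         list of equal-length aligned sequences; gaps ('-') and
                 non-standard residues are not counted
    pseudocount: added to every count so that unseen residues do not give log(0)
    """
    length = len(msa[0])
    if any(len(s) != length for s in msa):
        raise ValueError("all aligned sequences must have the same length")
    index = {aa: k for k, aa in enumerate(AMINO_ACIDS)}
    counts = np.full((len(AMINO_ACIDS), length), pseudocount, dtype=float)
    for seq in msa:
        for col, aa in enumerate(seq):
            if aa in index:
                counts[index[aa], col] += 1.0
    # Normalize each column to frequencies f_aa, then take the logarithm.
    freqs = counts / counts.sum(axis=0, keepdims=True)
    return np.log(freqs)
```

For the toy alignment `["AC", "AD", "AC"]` with a pseudocount of 1, the first column has 3 + 1 counts of A out of 23 in total, so its A entry is \(\log(4/23)\).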

Predicted alignment error

(pAE). A measure of how confident the structure prediction software is in the relative position of two residues within a predicted structure.

Predicted local distance difference test

(pLDDT). Scaled from 0 to 100, this score measures the confidence in the local structure by predicting how well the prediction would agree with an experimental structure. It is based on the local distance difference test, a score that does not rely on superposition but instead measures the correctness of local distances.
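pLDDT is the network's estimate of the lDDT score. A simplified, Cα-only version of lDDT can be sketched as follows, assuming the commonly used 15 Å inclusion radius and tolerance thresholds of 0.5, 1, 2 and 4 Å; the reference implementation works on all atoms and handles details omitted here, so `ca_lddt` is a hypothetical helper for illustration:

```python
import numpy as np


def ca_lddt(pred, ref, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Per-residue lDDT-style scores (0-100) from (N, 3) C-alpha coordinates.

    No superposition is needed: only distances within each structure are compared.
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    if pred.shape != ref.shape:
        raise ValueError("pred and ref must have the same shape")
    # All pairwise distances inside each structure.
    d_ref = np.linalg.norm(ref[:, None, :] - ref[None, :, :], axis=-1)
    d_pred = np.linalg.norm(pred[:, None, :] - pred[None, :, :], axis=-1)
    # Only pairs closer than the cutoff in the reference are scored (no self-pairs).
    mask = (d_ref < cutoff) & ~np.eye(len(ref), dtype=bool)
    error = np.abs(d_ref - d_pred)
    # For each pair, the fraction of tolerance thresholds it satisfies.
    preserved = np.mean([(error < t) & mask for t in thresholds], axis=0)
    n_pairs = mask.sum(axis=1)
    # Average over each residue's scored pairs; residues with none score 0.
    per_residue = preserved.sum(axis=1) / np.maximum(n_pairs, 1)
    return 100.0 * per_residue
```

Because only internal distances are compared, a prediction with correct local geometry scores well even if distant domains are misplaced, which is why pLDDT is described as a local confidence measure.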

Predicted template modelling score

(pTM). A prediction of the template modelling (TM) score, indicating how well the modelling software has predicted the overall structure.
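pTM is the network's estimate of the TM-score between the predicted and the true structure. For two structures that are already superposed and have matching residues, the TM-score can be sketched as below; the 0.5 Å floor on \(d_0\) for short chains and the absence of a superposition search are simplifications of this example, not features of any particular program:

```python
import numpy as np


def tm_score(pred, target) -> float:
    """TM-score of pre-superposed (N, 3) C-alpha coordinates of equal length.

    Each residue contributes 1 / (1 + (d_i / d0)^2); the sum is divided by the
    target length L, with d0 = 1.24 * (L - 15)^(1/3) - 1.8 (floored at 0.5 here).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    length = len(target)
    # Length-dependent distance scale; np.cbrt also handles L < 15 without NaNs.
    d0 = max(0.5, 1.24 * np.cbrt(length - 15) - 1.8)
    d = np.linalg.norm(pred - target, axis=1)
    return float(np.sum(1.0 / (1.0 + (d / d0) ** 2)) / length)
```

The score is 1 for identical structures and approaches 0 as the deviations grow large relative to \(d_0\); the length-dependent \(d_0\) makes scores comparable across proteins of different sizes.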

Probability distribution

For discrete objects such as sequences, a probability distribution maps each object x from the considered collection of objects to its probability \(P(x)\). The probabilities of all objects in the collection must sum to 1, which can be guaranteed by normalizing the distribution.
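When objects are scored by an energy, one common way to obtain a normalized distribution is the Boltzmann form \(P(x)=e^{-E(x)/kT}/Z\), where the normalizing constant \(Z=\sum_x e^{-E(x)/kT}\) sums over the whole collection. A sketch for a small, enumerable collection; the temperature parameter `kT` is an assumption of this example, and for real sequence spaces, which contain \(20^L\) sequences of length \(L\), \(Z\) generally cannot be computed by enumeration:

```python
import numpy as np


def boltzmann_probabilities(energies, kT: float = 1.0) -> np.ndarray:
    """Turn energies of a finite set of objects into a normalized distribution.

    P(x) = exp(-E(x) / kT) / Z, with Z = sum_x exp(-E(x) / kT).
    Subtracting the maximum logit before exponentiating (the log-sum-exp trick)
    avoids overflow and underflow without changing the result.
    """
    logits = -np.asarray(energies, dtype=float) / kT
    logits -= logits.max()
    weights = np.exp(logits)
    # Dividing by the sum is the normalization step: the result sums to 1.
    return weights / weights.sum()
```

Lower-energy objects receive higher probability, and shifting all energies by the same constant leaves the distribution unchanged.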

Protein language models

Probabilistic models of protein sequences that assign a probability to each sequence.
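Autoregressive protein language models compute this probability with the chain rule, \(\log P(x)=\sum_i \log P(x_i\mid x_{<i})\). A sketch in which a hypothetical `cond_prob` function stands in for the trained network; masked language models do not factorize this way and are often scored instead with pseudo-log-likelihoods:

```python
import math


def sequence_log_prob(seq: str, cond_prob) -> float:
    """log P(x) = sum_i log P(x_i | x_<i) for an autoregressive model.

    cond_prob(prefix, aa) must return P(aa | prefix); it is a placeholder for
    the network of a real autoregressive protein language model.
    """
    return sum(math.log(cond_prob(seq[:i], aa)) for i, aa in enumerate(seq))


# With a uniform toy model, every residue has probability 1/20,
# so a sequence of length 3 gets log P = 3 * log(1/20).
def uniform_model(prefix: str, aa: str) -> float:
    return 1.0 / 20
```

Working in log space avoids numerical underflow, since the product of many per-residue probabilities quickly becomes too small to represent.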

Rational design

Design by humans following rules of thumb and physical and expert knowledge of proteins. It may be computer-assisted, for example using molecular dynamics.

Scoring functions

Functions that combine physics-based energy terms, which carry probabilistic information, with statistical terms extracted from data, assembled in a computationally tractable form to estimate the probability of observing a protein in a given conformation.

Theozyme

A theozyme is a theoretical minimal active-site model composed of a calculated transition state together with the key functional groups from amino acid side chains needed to stabilize that transition state.

Topology

Protein topology is a property of a protein that does not change under continuous deformation (that is, without breaking a bond). In biology, the term is extended to include the mutual orientation of secondary structure elements (α-helices, β-strands, etc.) in the protein structure.

Transition state

The highest-energy, transient configuration along the pathway of a chemical reaction; unlike a reaction intermediate, it is not a local energy minimum. Enzymes accelerate reactions by lowering this energy barrier.
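The link between barrier height and rate follows from transition-state theory: in the Eyring equation, \(k \propto e^{-\Delta G^{\ddagger}/RT}\), so lowering the activation free energy by \(\Delta\Delta G^{\ddagger}\) multiplies the rate constant by \(e^{\Delta\Delta G^{\ddagger}/RT}\), other factors being equal. A small sketch, assuming energies in kJ/mol; `rate_enhancement` is an illustrative name:

```python
import math

R = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def rate_enhancement(ddg_barrier_kj: float, temperature: float = 298.15) -> float:
    """Fold increase in rate when the activation free energy drops by ddg (kJ/mol).

    From transition-state theory (Eyring), k is proportional to exp(-dG / RT),
    so lowering the barrier by ddg multiplies k by exp(ddg / RT).
    """
    return math.exp(ddg_barrier_kj / (R * temperature))
```

At 298 K, RT ≈ 2.48 kJ/mol, so a barrier reduction of about 5.7 kJ/mol corresponds to roughly a tenfold rate increase; the exponential dependence is why modest improvements in transition-state stabilization can have large effects on catalysis.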

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Albanese, K.I., Barbe, S., Tagami, S. et al. Computational protein design. Nat Rev Methods Primers 5, 13 (2025). https://doi.org/10.1038/s43586-025-00383-1

  • DOI: https://doi.org/10.1038/s43586-025-00383-1
