Complete computational design of high-efficiency Kemp elimination enzymes

Listov, Dina; Vos, Eva; Hoffka, Gyula; Hoch, Shlomo Yakir; Berg, Andrej; Hamer-Rogotner, Shelly; Dym, Orly; Kamerlin, Shina Caroline Lynn; Fleishman, Sarel J.

doi:10.1038/s41586-025-09136-2

Open access
Published: 18 June 2025

Nature volume 643, pages 1421–1427 (2025)

This article has been updated

Abstract

Until now, computationally designed enzymes exhibited low catalytic rates^1,2,3,4,5 and required intensive experimental optimization to reach activity levels observed in comparable natural enzymes^5,6,7,8,9. These results exposed limitations in design methodology and suggested critical gaps in our understanding of the fundamentals of biocatalysis^10,11. We present a fully computational workflow for designing efficient enzymes in TIM-barrel folds using backbone fragments from natural proteins and without requiring optimization by mutant-library screening. Three Kemp eliminase designs exhibit efficiencies greater than 2,000 M⁻¹ s⁻¹. The most efficient shows more than 140 mutations from any natural protein, including a novel active site. It exhibits high stability (greater than 85 °C) and remarkable catalytic efficiency (12,700 M⁻¹ s⁻¹) and rate (2.8 s⁻¹), surpassing previous computational designs by two orders of magnitude^1,2,3,4,5. Furthermore, designing a residue considered essential in all previous Kemp eliminase designs increases efficiency to more than 10⁵ M⁻¹ s⁻¹ and rate to 30 s⁻¹, achieving catalytic parameters comparable to natural enzymes and challenging fundamental biocatalytic assumptions. By overcoming limitations in design methodology¹¹, our strategy enables programming stable, high-efficiency, new-to-nature enzymes with minimal experimental effort.

Main

Natural enzymes are exceptionally versatile, selective and highly efficient catalysts. Yet, computational design of enzymes that match this proficiency, particularly for non-natural reactions, remains elusive¹¹. Recent advances in computational design have enabled rapid and effective optimization of natural enzyme stability, expressibility, catalytic rate and selectivity through fully computational workflows^12,13. Furthermore, advances in fold design enabled the grafting of natural or engineered active sites into idealized de novo backbones^14,15. By contrast, enzymes designed de novo, that is, without recourse to naturally occurring enzymes that catalyse the same reaction, were orders of magnitude less active than comparable natural ones^1,2,3,4,5,11. Previous studies have therefore used repeated cycles of laboratory evolution, involving high-throughput screening of mutants, to reach effective enzymes^5,6,7,8,9. Such cycles are inefficient and are restricted to reactions that can be assayed in medium-to-high-throughput fashion¹¹. Critically, the continued reliance on large-library screening of random mutants suggests that our understanding and control of the fundamentals of biocatalysis are far from complete.

The Kemp elimination (KE) reaction (Fig. 1a), a prototype for natural base-catalysed proton abstraction, has long served as a model for studying de novo enzyme design, as no natural enzyme is known to have evolved for this reaction. Despite increasing sophistication in protein design methods, computationally designed Kemp eliminases exhibited low catalytic efficiencies (k_cat/K_M 1–420 M⁻¹ s⁻¹) and rates (k_cat 0.006–0.7 s⁻¹)^1,3 and required further optimization by iterative screening of mutant libraries to achieve catalytic parameters comparable to⁷ or above⁶ the median values of natural enzymes (k_cat/K_M 10⁵ M⁻¹ s⁻¹, k_cat 10 s⁻¹)¹⁶.

**Fig. 1: Key steps in the design workflow.**

The underlying reasons for the low efficiencies of de novo designed enzymes have been intensely studied^10,17,18,19. These analyses revealed that the designed active sites exhibited significant structural distortions relative to the design conception^17,19. Notably, catalysis is extremely sensitive to molecular details: shifting the catalytic constellation away from its optimal geometry by a few degrees or a few tenths of an ångström may decrease efficiency by orders of magnitude²⁰. Furthermore, designs often exhibited low stability and expressibility⁷, limiting their ability to accommodate activity-enhancing mutations^7,21. Other concerns were that fixed-backbone design methods fail to precisely position non-native catalytic groups¹, that the molecular details of the designed transition state (theozyme) were uncertain^6,22, and that protein dynamics²³ and long-range electrostatic interactions may be necessary for high catalytic efficiency yet are unaccounted for in the design process^6,24,25.
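The sensitivity described above can be made concrete with transition-state theory. If the pre-exponential factor is unchanged, k ∝ exp(−ΔG‡/RT), so raising the activation free energy by ΔΔG‡ divides the rate constant by exp(ΔΔG‡/RT). The Python sketch below is an illustration of this textbook relation, not part of the paper's analysis, and it assumes equal prefactors for the compared states. At 298 K a tenfold loss corresponds to only about 1.4 kcal mol⁻¹, which is why small geometric errors can cost orders of magnitude in rate.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def rate_fold_decrease(ddg_barrier_kcal: float, temperature_k: float = 298.15) -> float:
    """Factor by which a rate constant falls when the activation free energy
    rises by ddg_barrier_kcal, assuming an unchanged pre-exponential factor."""
    return math.exp(ddg_barrier_kcal / (R_KCAL_PER_MOL_K * temperature_k))


def barrier_for_fold_decrease(fold: float, temperature_k: float = 298.15) -> float:
    """Extra activation free energy (kcal/mol) that produces a given fold decrease."""
    if fold <= 0:
        raise ValueError("fold decrease must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(fold)


if __name__ == "__main__":
    for fold in (10, 100, 1000):
        print(f"{fold:>5}-fold slower  <->  +{barrier_for_fold_decrease(fold):.2f} kcal/mol")
```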

Recent analyses suggested that overcoming the shortcomings of de novo enzyme design methodology may require artificial intelligence-based approaches, more accurate physics-based energetics and data from high-throughput screening^11,26. Here, we test whether recent developments in atomistic protein design that allow accurate backbone²⁷ and sequence^28,29 design in natural protein folds address the limitations of de novo enzyme design methodology without resorting to experimental optimization or big-data analyses. To directly compare with previous design approaches, we apply the strategy to the KE reaction and generate enzymes that rival laboratory-evolved eliminases without recourse to high-throughput screening or iterative mutagenesis.

Designing stability, foldability and activity

Our working hypothesis is that effective enzyme design demands control over all protein degrees of freedom to establish stability, foldability and accurate positioning of the theozyme. Foldability, the ability of a protein to fold uniquely into the design conception, has been a long-standing challenge for de novo enzyme design. Over the past decade, foldability has been partly addressed through de novo fold design, enabling the generation of numerous stable and accurately designed proteins^30,31,32,33. These methods, however, maximize foldability by generating backbones that are dominated by ideal secondary-structure elements and that lack the non-ideal elements which may lower foldability but are nonetheless needed for sophisticated functions^12,34. Until now, functionalizing de novo generated folds has produced enzymes with rates (k_cat) well below 1 s⁻¹. In certain cases, de novo designs exhibited high catalytic efficiencies (10⁴ to 10⁵ M⁻¹ s⁻¹)^15,35, but only through very low K_M values (0.3–30 µM). Low K_M values indicate tight binding of the substrate in its ground state, suggesting that these designs optimize molecular recognition of their substrates, which precedes the catalytic step. By contrast, the turnover number (k_cat) reflects the chemical transformation following substrate binding and is a more stringent test of the ability to design high-efficiency catalysts rather than effective binders³⁶. The persistently low k_cat values, including in recent studies^15,35, highlight the challenge of achieving catalytic control in enzyme design.

Given these limitations, we focused on the TIM-barrel fold, which is one of the most prevalent protein folds found among enzymes^37,38. In this fold, the residues of the central β barrel are oriented towards the active-site cavity, providing many opportunities for optimally placing the catalytic and substrate-binding groups. We reasoned that despite the challenges in designing accurate and functional TIM barrels^39,40, this fold provides an attractive framework for engineering new enzymatic functions.

We developed a computational method that can be applied, in principle, to any reaction, given a precomputed theozyme. The workflow starts by generating thousands of backbones using combinatorial assembly and design (Fig. 1b, step 1), which combines fragments from homologous proteins to generate new backbones^27,41,42. Subsequently, Protein Repair One Stop Shop (PROSS) design calculations are applied to stabilize the designed conformation²⁹ (Fig. 1b, step 2). The resulting structures show backbone variations within the active-site pocket, increasing the likelihood of obtaining foldable backbones that position the theozyme and supporting residues in a catalytically competent and energetically relaxed constellation. Following backbone generation, we implement geometric matching⁴³ to position the KE theozyme in each of the designed structures and optimize the remainder of the active site using Rosetta atomistic calculations⁴⁴, in effect mutating all active-site positions, including the vestigial catalytic residues of the natural enzyme (Fig. 1b, step 3). The workflow results in millions of designs which are filtered using a ‘fuzzy-logic’ optimization objective function⁴⁵. This approach balances potentially conflicting objectives that are critical for design of function, such as low system energy and high desolvation of the catalytic base. Selecting a few dozen top-scoring designs, we next stabilize the active site and positions in the protein core⁴⁶ (Fig. 1b, step 4), resulting in designs with more than 100 mutations from any natural protein. Unlike previous approaches, this workflow emphasizes stability across the entire protein. It capitalizes on the ability to generate thousands of stable, natural-like TIM barrels that exhibit backbone diversity in the active site²⁷ and on automated scaffold²⁹ and active-site²⁸ sequence design methods that have been validated on dozens of natural enzymes¹².

Efficient, stable and accurate Kemp eliminases

We applied our pipeline to the indole-3-glycerol-phosphate synthase (IGPS) enzyme family, which can sterically accommodate the 5-nitrobenzisoxazole substrate and was previously used to design Kemp eliminases^1,7. The theozyme builds on a catalytic constellation derived from quantum-mechanical calculations^47,48. It includes a carboxylate, such as Asp or Glu, which serves as the base for proton abstraction from the substrate, and an aromatic sidechain that forms π-stacking interactions with the substrate in the transition state (Fig. 1b, step 3). The latter interaction has been used in all previous computational Kemp eliminase design studies to promote binding of the aromatic benzisoxazole rings^1,3. Typical design studies also introduced a polar interaction with the isoxazole oxygen to stabilize the developing negative charge in the transition state^1,3. We excluded this requirement from our theozyme because a water molecule can satisfy it, and a misplaced polar group could reduce reactivity by lowering the pK_a of the catalytic base.

We selected 73 designs for experimental testing. The designs ranged from 245 to 268 amino acids in length and were diverse, with 30–93% sequence identity to one another and 41–59% identity to their closest natural protein. In total, 66 designs were solubly expressed and 14 showed cooperative thermal denaturation (Extended Data Fig. 1). Three designs showed measurable KE activity in an initial screen; the top two, Des27 and Des61, exhibited k_cat/K_M values of 130 and 210 M⁻¹ s⁻¹, respectively, and k_cat < 1 s⁻¹ (Extended Data Fig. 2, Extended Data Table 1 and Supplementary Table 1).

The catalytic rates and efficiencies of these designs are on a par with those of previously designed enzymes^1,3 and fall several orders of magnitude short of comparable natural enzymes and of designed Kemp eliminases that were optimized through laboratory-evolution campaigns^6,7. To optimize these designs computationally, we applied FuncLib to active-site positions, excluding the theozyme residues. FuncLib restricts amino acid mutations to those likely to appear in the natural diversity of homologous proteins²⁸. To adapt this optimization to a new-to-nature reaction, we removed all homology-based restrictions in the active site, thus using atomistic energy as the sole optimization objective. We selected 6 and 12 low-energy designs for experimental testing for Des61 and Des27, respectively, each carrying 5–8 mutations relative to its parent design. All designs exhibited high expression yields and cooperative denaturation (Extended Data Fig. 1 and Supplementary Table 1). One design derived from Des61 showed a catalytic efficiency of 3,600 M⁻¹ s⁻¹ and a k_cat of 0.85 s⁻¹. Remarkably, eight Des27-based designs showed 10–70-fold increases in catalytic rate (Extended Data Table 1 and Supplementary Table 1), with Des27.7, which carries seven mutations relative to Des27, reaching k_cat/K_M 12,700 M⁻¹ s⁻¹ and k_cat 2.85 s⁻¹, a rate that is an order of magnitude greater than that of any previously reported computational design³ (Fig. 2a,b). This design diverges significantly from natural IGPSs: a pairwise sequence alignment to the closest protein in the non-redundant sequence database reveals 141 mutations and multiple insertions and deletions (Extended Data Fig. 3). It also diverges in sequence and backbone from previously designed Kemp eliminases in natural IGPS scaffolds¹ and features a different active-site constellation and position.

**Fig. 2: Improving catalytic efficiency through low-throughput screening of FuncLib designs.**

To understand the mechanistic basis for the differences in catalytic efficiency among Des27 and its FuncLib-derived variants, which span three orders of magnitude, we analysed their structural models using the Rosetta force field and molecular dynamics (MD) simulations. A sequence alignment of the FuncLib designs shows that Ile136Val, Ile216Val and Val183Ile are associated with high catalytic efficiency (Fig. 3a). Comparing the structural models of Des27 and Des27.7 suggests that these mutations increase hydrophobic packing around the catalytic Asp162, probably improving its preorganization and desolvation and thereby its reactivity (Fig. 3a,b, top). Indeed, the Rosetta-computed van der Waals (vdW) energy of Asp162 is strongly correlated with catalytic efficiency among the FuncLib designs (Spearman ρ = −0.88, P = 6 × 10⁻⁵; Fig. 3a). Further, MD simulations show that Asp162 is conformationally dynamic in the ligand-unbound models, sampling multiple metastable conformations, and that the fraction of non-productive conformations decreases in Des27.7 relative to Des27 (Extended Data Fig. 4a). The Des27 model also suggests that Leu236 may partly overlap with the substrate (Fig. 3b, middle); its mutation to Val in Des27.7 would alleviate this unfavourable interaction while increasing the pocket volume from 717 to 829 Å³ (Extended Data Fig. 4b,c). Finally, Ile54Val, Phe92His and Leu183Val may improve the solvation of the substrate's polar nitro moiety (Fig. 3b, bottom), and Phe92His may enable water-mediated polar interactions with the nitro group. Thus, although the seven mutations in Des27.7 are mostly conservative, together they markedly improve the catalytic parameters by reshaping the active-site pocket for better substrate recognition and by optimizing the preorganization and reactivity of the catalytic base.
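The reported Spearman ρ is a rank correlation, so it depends only on how the designs are ordered, not on whether efficiency is expressed on a linear or logarithmic scale. The Python sketch below computes it as the Pearson correlation of average ranks; the example inputs are hypothetical placeholders, not the Des27 FuncLib data. A negative ρ arises when designs with more favourable (more negative) vdW energy at the catalytic base tend to be more efficient. The P value quoted in the text needs a separate significance test, which this sketch does not compute.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical placeholders, NOT the paper's data.
    base_vdw = [-9.1, -8.4, -8.6, -7.2, -6.5, -7.9]   # Rosetta energy units
    efficiency = [9000, 4000, 6000, 300, 120, 900]     # M^-1 s^-1
    print(f"Spearman rho = {spearman_rho(base_vdw, efficiency):.2f}")
```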

**Fig. 3: Structural, energetic and dynamic contributions to the improved catalytic rate of Des27.7.**

To analyse the stability of the substrate within the active site, we conducted microsecond MD simulations of Des27 and Des27.7 starting from their ligand-bound design models. In both cases and across all replicas, the substrate exited and re-entered the active-site pocket multiple times (Extended Data Fig. 5), with Des27.7 showing five times more substrate retention (Fig. 3c and Supplementary Table 2). This contrasts with the typical scenario in MD simulations, in which unbinding events are terminal^49,50. Thus, the MD simulations indicate that our designs exhibit high affinity for the substrate and that Des27.7 improves it further. We also noticed that the substrate may occupy the pocket in two reactive conformations: one that closely matches the design model, with the nitro substituent at the entrance to the active site, and one rotated by approximately 180° (Fig. 3d). Empirical valence bond (EVB) calculations of reaction free energies⁵¹ show similar energy profiles for both conformations, indicating that both are catalytically competent (Fig. 3e and Supplementary Table 3). Taken together, the MD and EVB calculations suggest that the experimentally measured rates reflect the sum of both reaction modes: the ‘out’ conformation (Fig. 3c) is occupied for a greater fraction of the MD simulation time than the ‘in’ conformation, but EVB predicts the ‘in’ conformation to be slightly more reactive in the optimized Des27.7 variant (Fig. 3e). Moreover, although such desolvated-nitro-group (‘in’) conformations were observed in previous de novo designed Kemp eliminases^6,49,50, only one conformation was catalytically competent in those studies⁵⁰. Thus, the high efficiency of Des27.7 may be partly due to the high preorganization of its active-site pocket and its ability to accommodate productive substrate interactions through distinct conformations.
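Retention statistics of the kind summarized above can be extracted from an MD trajectory by thresholding a per-frame substrate-to-pocket distance and counting transitions. The Python sketch below assumes a hypothetical distance time series (for example, substrate centroid to the catalytic carboxylate) and a hypothetical 0.6 nm cutoff; the paper's actual retention metric and cutoffs are not given in this section. Practical analyses usually also require a minimum dwell time, so that single-frame flickers across the cutoff are not counted as exit events.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RetentionSummary:
    bound_fraction: float  # fraction of frames with the substrate counted as bound
    exit_events: int       # bound -> unbound transitions between consecutive frames
    reentry_events: int    # unbound -> bound transitions between consecutive frames


def summarize_retention(distances_nm: Sequence[float], cutoff_nm: float = 0.6) -> RetentionSummary:
    """Classify each frame as bound (distance <= cutoff) and count transitions.

    `distances_nm` is a hypothetical per-frame substrate-to-pocket distance;
    the 0.6 nm default cutoff is an illustrative assumption."""
    if len(distances_nm) == 0:
        raise ValueError("empty trajectory")
    bound = [d <= cutoff_nm for d in distances_nm]
    pairs = list(zip(bound, bound[1:]))
    exits = sum(1 for before, after in pairs if before and not after)
    reentries = sum(1 for before, after in pairs if not before and after)
    return RetentionSummary(sum(bound) / len(bound), exits, reentries)


if __name__ == "__main__":
    toy = [0.4, 0.5, 1.2, 1.5, 0.5, 0.4, 0.45, 2.0]  # hypothetical distances (nm)
    print(summarize_retention(toy))
```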

To verify the molecular accuracy of the design process, we determined the crystal structure of Des27.7 in the unbound form (Extended Data Table 2; PDB 9HVB). All active-site positions aligned well with the design conception (less than 0.7 Å all-atom root mean squared deviation (r.m.s.d.) across 20 residues), including the catalytic Asp162, although a slight shift (r.m.s.d. 0.78 Å) was observed in the orientation of Phe113. Outside the active-site pocket, 180 of 257 positions aligned with backbone r.m.s.d. < 0.6 Å, but 65 positions either deviated or lacked significant electron density, probably owing to backbone flexibility in this region (Extended Data Fig. 6a,b). This segment is known to be dynamic in the IGPS protein family⁵², but it lies outside the active-site pocket and probably does not contribute directly to reactivity or substrate recognition. Taken together, our results validate a fully computational pipeline that designs an accurate de novo active site and generates a stable, high-efficiency new-to-nature enzyme.
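The r.m.s.d. values above compare matched atoms of the crystal structure and the design model after optimal superposition. A standard way to compute this is the Kabsch algorithm, sketched below in Python with NumPy. It assumes both inputs list the same atoms in the same order, and it is not necessarily the tool the authors used. It also superposes and measures on the same atom set, whereas in practice the superposition set (for example, the whole backbone) and the measured set (for example, 20 active-site residues) can differ.

```python
import numpy as np


def kabsch_rmsd(mobile: np.ndarray, target: np.ndarray) -> float:
    """RMSD (same units as input) between matched (N, 3) coordinate sets after
    optimal rigid-body superposition of `mobile` onto `target` (Kabsch)."""
    p = np.asarray(mobile, dtype=float)
    q = np.asarray(target, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays with atoms in the same order")
    p_c = p - p.mean(axis=0)             # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c                      # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rotation.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model = rng.normal(size=(20, 3)) * 5.0      # hypothetical "design model" atoms (Å)
    theta = np.deg2rad(30.0)
    rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                      [np.sin(theta), np.cos(theta), 0.0],
                      [0.0, 0.0, 1.0]])
    crystal = model @ rot_z.T + np.array([1.0, -2.0, 3.0])  # same atoms, moved rigidly
    # Close to zero: only a rigid-body motion separates the two sets.
    print(f"{kabsch_rmsd(crystal, model):.6f}")
```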

Necessary and sufficient conditions for design

Our computational workflow is based on the combination of several design components, each of which introduces multiple mutations that address aspects that are critical for efficient biocatalysis, such as backbone diversity, stability, foldability and activity. We next probed whether each of these components contributes to the intended property and whether all are essential.

We started by examining whether modular assembly and design is essential for generating diverse backbones. Instead of applying modular assembly and design, we applied the subsequent steps of the workflow to 1,072 representative IGPSs modelled using AlphaFold2 (Methods). We tested 55 designs (design round 2), of which 49 were solubly expressed (89%) and 28 (51%) exhibited apparent cooperative unfolding, with apparent melting temperature (T_m) values of 47–88 °C. In total, 20 of the 28 cooperatively folded designs (71%) showed measurable KE activity, with k_cat/K_M in the range of 0.5–155 M⁻¹ s⁻¹, demonstrating that the workflow can design stable and functional Kemp eliminases from a wide range of starting points. As expected, designs that did not show cooperative unfolding lacked KE activity. We applied FuncLib to the active sites of six designs and tested 9–14 variants for each starting point. In five cases, catalytic efficiencies improved 3–10-fold (Supplementary Table 1), with the highest catalytic efficiency reaching 300 M⁻¹ s⁻¹ (R2.Des39.2). We determined the crystal structures of two designs, R2.Des39 (k_cat/K_M 100 M⁻¹ s⁻¹) and Des49 (k_cat/K_M 150 M⁻¹ s⁻¹) (Extended Data Fig. 6c–i, PDB IDs 9HVH and 9HVG and Extended Data Table 2). The active sites were close to their design conceptions (r.m.s.d. < 0.6 Å and < 0.82 Å, respectively), but in both cases several loops either lacked electron density or exhibited significant conformational changes compared with the designs, which could impede substrate entry into the active site⁵³. To explore whether the foldability of these loops could be improved, we applied FuncLib to stabilize these regions according to the design models.
Three of 16 FuncLib variants of Des39.2 showed a significant increase in catalytic efficiency, with improvements up to 20-fold compared with the original design, reaching k_cat/K_M 2,000 M⁻¹ s⁻¹ (Extended Data Table 1 and Supplementary Table 1), but none surpassed the performance of Des27.7. These results demonstrate that large-scale artificial intelligence-based structure prediction of natural enzymes provides a valuable resource for de novo enzyme design, and that the computational workflow reproducibly generates efficient enzymes. In this case, however, optimization with FuncLib reached superior catalytic parameters in the designs derived from modular assembly, which may reflect the greater structural diversity in these designs.

As a next step towards understanding the necessary and sufficient conditions for designing high-efficiency enzymes, we deconvoluted the contributions of each design component to the high stability and activity of Des27.7. As a baseline, we tested the outcome of combinatorial assembly and design alone (92 mutations relative to any natural protein), excluding both the PROSS-based stability mutations and the active-site design. This variant exhibited an apparent melting temperature of 57 °C and no detectable KE activity. Adding the 11 PROSS-designed mutations substantially improved both bacterial expression and thermal stability (69 °C) (Fig. 4 and Supplementary Fig. 1). Separately, grafting the active site of Des27.7 (15 mutations) onto the combinatorial-assembly starting point (without the PROSS stabilizing mutations) conferred high activity (2,900 M⁻¹ s⁻¹), albeit fourfold lower than that of Des27.7. Combining modular assembly, PROSS and the designed active site yielded a synergistic, higher-than-expected improvement in both stability and reactivity beyond the contributions of the individual components. Thus, despite the large number of mutations introduced by each computational component, the resulting designs did not exhibit the stability–activity trade-offs often reported in laboratory-evolution campaigns^7,21,54. Furthermore, although active-site mutations are often assumed to compromise stability²¹, in our case the designed active site contributed positively to stability. Collectively, these findings emphasize the importance of stabilizing the entire protein to obtain efficient enzymes and the potential for synergy between stability- and activity-promoting mutations when reliable sequence design methods are used²¹.
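"Higher than expected" implies a null model for how independent components combine. A common choice is additivity: for apparent T_m, the expected combined value is the baseline plus each component's individual shift; for multiplicative quantities such as k_cat/K_M, the same test is done on log10 values. The Python sketch below implements both; the numbers in the example are hypothetical, not the Fig. 4 data. Because the assembly-only baseline showed no detectable KE activity, a log-scale test against it is undefined, which is why the log version rejects non-positive inputs.

```python
import math


def additive_epistasis(base: float, with_a: float, with_b: float, with_ab: float) -> float:
    """Observed minus expected combined value under an additive null model.

    Positive values mean the combination outperforms the sum of the
    single-component effects (synergy); negative values mean it underperforms."""
    expected = base + (with_a - base) + (with_b - base)
    return with_ab - expected


def log10_epistasis(base: float, with_a: float, with_b: float, with_ab: float) -> float:
    """Same test for multiplicative quantities (e.g. k_cat/K_M), done in log10 space."""
    values = (base, with_a, with_b, with_ab)
    if min(values) <= 0:
        raise ValueError("log-scale test needs strictly positive activities")
    return additive_epistasis(*(math.log10(v) for v in values))


if __name__ == "__main__":
    # Hypothetical apparent T_m values (deg C) for baseline, +A, +B and +A+B.
    print(additive_epistasis(50.0, 60.0, 55.0, 75.0))
```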

**Fig. 4: Deconvoluting the contributions of the mutations in Des27.7 reveals necessary and sufficient components for effective de novo enzyme design.**

Finally, we evaluated the contribution of the theozyme to the activity of Des27.7. Mutating the catalytic base Asp162 to Ala completely abolished activity, verifying that the designed base is essential. Remarkably, this single-point mutation also markedly increased protein stability, raising the apparent melting temperature from 85 °C in Des27.7 to above the boiling point of water (Fig. 4). This large increase in stability underscores the strong destabilization induced by desolvating a charged group in the core of the active site and the importance of effective stability design methods.

We then tested whether the second theozyme residue, Phe113, was essential by replacing it with point mutations suggested by atomistic design. Met and Leu substitutions had Rosetta energies similar to that of the original Phe, and we subjected these point mutants to experimental analysis. The Met mutant showed catalytic parameters similar to those of Phe (Extended Data Table 1), suggesting that an aromatic residue is not essential at this position. Strikingly, Phe113Leu increased catalytic efficiency and rate by an order of magnitude, to k_cat/K_M 123,000 M⁻¹ s⁻¹ and k_cat 30 s⁻¹, surpassing by two orders of magnitude the rates of enzymes recently designed in artificial intelligence-generated proteins (k_cat = 0.03–0.7 s⁻¹)^14,15,35. To understand the reasons for this large gain in efficiency, we compared Leu113 in models of the unbound and transition states. Unlike the reorientation observed for Phe113 between the ligand-bound model and the unbound experimental structure of Des27.7 (Extended Data Fig. 7a), Leu113 exhibits almost no sidechain conformational change (Extended Data Fig. 7b), suggesting that this mutation improves active-site preorganization.
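Because both k_cat and k_cat/K_M are reported, the implied K_M can be back-calculated as K_M = k_cat/(k_cat/K_M). The short Python sketch below does this arithmetic for the values given in the text for Des27.7 and its Phe113Leu variant, giving about 224 µM and 244 µM, respectively. Both lie within the 0.05–0.75 mM substrate range used in the kinetic assays (Methods), and on these numbers the roughly tenfold gain of Phe113Leu is carried by k_cat rather than by tighter apparent binding, in line with the k_cat-centred argument made earlier. These K_M values are derived here and are not stated in the text.

```python
def implied_km_um(kcat_per_s: float, efficiency_per_m_per_s: float) -> float:
    """K_M in µM implied by k_cat (s^-1) and k_cat/K_M (M^-1 s^-1)."""
    if efficiency_per_m_per_s <= 0:
        raise ValueError("k_cat/K_M must be positive")
    return kcat_per_s / efficiency_per_m_per_s * 1e6  # M -> µM


# (k_cat, k_cat/K_M) as reported in the text.
REPORTED = {
    "Des27.7": (2.85, 12_700),
    "Des27.7 Phe113Leu": (30.0, 123_000),
}

if __name__ == "__main__":
    for name, (kcat, eff) in REPORTED.items():
        print(f"{name}: implied K_M ~ {implied_km_um(kcat, eff):.0f} µM")
```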

We note that the aromatic theozyme residue was enforced in all our design steps, following previous Kemp eliminase design studies^1,3. That a completely aliphatic active-site pocket effectively accelerates the KE reaction is in line with the observation that London dispersion forces are sufficient for transition-state stabilization⁵⁵. This finding challenges a two-decade-old assumption in computational Kemp eliminase design that an aromatic residue is important for ligand binding^1,3, demonstrating how de novo design of function can expose shortcomings in our understanding of fundamental aspects of biocatalysis.

Conclusions

De novo enzyme design has until now resulted in rudimentary catalytic rates and required iterative random mutagenesis to close the gap with enzymes found in nature. Our strategy uses recent approaches for reliable backbone and sequence design in natural folds to generate diverse TIM-barrel backbones, stabilize the protein and design preorganized active-site constellations. This comprehensive design approach allowed us to explore the principles underlying high stability and activity in KE biocatalysis. In a single step, we generated a dozen designs with activities that spanned three orders of magnitude and gained insights into the determinants of high-efficiency catalysis. The best variant showed high stability and remarkable catalytic efficiency for a fully designed enzyme (greater than 85 °C and 12,700 M⁻¹ s⁻¹, respectively), which was increased to over 10⁵ M⁻¹ s⁻¹ with a single designed mutation. Active-site preorganization combined with the ability to adopt multiple catalytically competent substrate-bound modes distinguishes this design from previously generated ones. Importantly, our best design exhibited a catalytic rate (30 s⁻¹) and efficiency on par with the median values of natural enzymes¹⁶. Thus, the ability to design large sets of diverse backbones and encode high protein stability and active-site preorganization is necessary and sufficient for generating high-efficiency enzymes of model reactions. Furthermore, contrary to recent suggestions¹¹, the results confirm that current atomistic methods are already sufficiently reliable to generate efficient enzymes in natural folds without extensive experimental screening, big-data analyses or artificial intelligence-generated scaffolds. Future improvements in modelling theozymes may enable fully programmable biocatalysis.

Methods

Backbone generation and stabilization

Modular assembly and design was applied as described in ref. ⁴². In brief, five different IGPS structures (PDB entries 1LBF, 1I4A, 1JCM, 1VC4 and 4FB7) were aligned and segmented into five fragments at points of maximum structural conservation, positions 44, 105, 154 and 206 (numbering relative to PDB entry 1I4N). The fragments were then computationally combined all against all, and Rosetta sequence design was applied to optimize the stability and compatibility between the segments, resulting in 2,500 backbones. Design calculations were constrained using a position-specific scoring matrix (PSSM) that was generated for each structure using PROSS²⁹. For designs 37–73, a further stabilization protocol was applied. This protocol, based on mutational scanning with PSSM constraints, identified the most beneficial mutations across the protein; these were combined and threaded onto the input structure. The resulting backbones were then evaluated by an activity predictor²⁷, and the top 1,000 designs were chosen.
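Reading "combined all against all" as letting each of the five segment positions come from any of the five source structures gives 5⁵ = 3,125 raw combinations, which include the five unmodified parents. The Python sketch below only enumerates these assignments; it does not model superposition, junction repair or the Rosetta design that the text says yields 2,500 backbones, and how the raw set was reduced is not described here. Which side of each breakpoint the boundary residue belongs to is left open, since the text does not specify it.

```python
from itertools import product
from typing import List, Optional, Tuple

SOURCES = ["1LBF", "1I4A", "1JCM", "1VC4", "4FB7"]  # IGPS structures listed in the text
BREAKPOINTS = [44, 105, 154, 206]                    # segment boundaries, 1I4N numbering


def segments(breakpoints: List[int]) -> List[Tuple[Optional[int], Optional[int]]]:
    """(start, end) boundaries of each segment; None marks a chain terminus."""
    starts: List[Optional[int]] = [None, *breakpoints]
    ends: List[Optional[int]] = [*breakpoints, None]
    return list(zip(starts, ends))


def all_against_all(sources: List[str], n_segments: int) -> List[Tuple[str, ...]]:
    """Every assignment of a source structure to each segment position."""
    return list(product(sources, repeat=n_segments))


if __name__ == "__main__":
    segs = segments(BREAKPOINTS)
    combos = all_against_all(SOURCES, len(segs))
    print(len(segs), "segments;", len(combos), "raw combinations")
    print("example chimera (one source per segment):", combos[7])
```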

To implement the workflow without recourse to modular assembly and design (design round 2), we used BLAST to search the non-redundant sequence database with the sequence of the Thermotoga maritima IGPS (PDB entry 1I4N), identifying 4,381 IGPS homologues. These were clustered with CD-HIT⁵⁶ at 30–90% sequence identity, and 1,200 were selected for further analysis. Their structures were modelled using ColabFold AlphaFold2 (refs. ^57,58). Models with an average predicted local distance difference test (pLDDT) score below 90 were discarded, leaving 1,072 backbones. All structures were subjected to PROSS stability design calculations²⁹, and Design 8 for each was selected for further calculations. For models generated using AlphaFold2, PROSS design was disabled at residues with low predicted confidence (pLDDT < 90) and at residues within 5 Å of them⁵⁹.
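The two pLDDT rules above, discarding models with low average confidence and freezing low-confidence residues plus their neighbours during PROSS design, can be sketched as follows. This is a hypothetical illustration rather than the PROSS implementation: it treats pLDDT as plain per-residue scores on the 0–100 scale and measures neighbourhood by Cα–Cα distance, which is an assumption (an all-atom criterion would freeze at least as many residues).

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def passes_plddt_filter(plddt: Sequence[float], cutoff: float = 90.0) -> bool:
    """Keep a model only if its average pLDDT is at least `cutoff`."""
    return sum(plddt) / len(plddt) >= cutoff


def designable_positions(plddt: Sequence[float], ca_coords: Sequence[Coord],
                         cutoff: float = 90.0, radius_a: float = 5.0) -> List[int]:
    """0-based indices of residues left open to design.

    Residues with pLDDT < cutoff are frozen, as is any residue whose C-alpha
    lies within radius_a (Å) of a low-confidence residue's C-alpha."""
    if len(plddt) != len(ca_coords):
        raise ValueError("one pLDDT value per residue is required")
    low = [i for i, score in enumerate(plddt) if score < cutoff]
    frozen = set(low)
    for i, xyz in enumerate(ca_coords):
        if i not in frozen and any(math.dist(xyz, ca_coords[j]) <= radius_a for j in low):
            frozen.add(i)
    return [i for i in range(len(plddt)) if i not in frozen]


if __name__ == "__main__":
    # Hypothetical four-residue toy model.
    scores = [95.0, 95.0, 80.0, 95.0]
    cas = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
    print(passes_plddt_filter(scores), designable_positions(scores, cas))
```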

Catalytic site generation

Theozyme geometries (Supplementary Table 4) were based on previous calculations¹. Geometric parameters that define the catalytic placements, such as tolerance, penalty coefficient, periodicity and number of matching samples to test, were manually adjusted⁶⁰. The interaction between the catalytic base and the acidic carbon on the ligand was defined as covalent to mimic transition-state geometry. Theozyme placement was carried out using the Rosetta Matcher algorithm⁴³. All positions inside or at the opening of the active site were allowed for theozyme matching (Fig. 1b, step 3).
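To show how tolerance, penalty coefficient and periodicity typically interact in a geometric restraint, the Python sketch below uses a generic flat-bottom harmonic: zero penalty within the tolerance and a quadratic penalty beyond it, with angles compared to the nearest periodic equivalent of the ideal value. This functional form is an assumption for illustration, not Rosetta Matcher's exact scoring, and the example values are hypothetical rather than the theozyme parameters of Supplementary Table 4.

```python
def periodic_deviation(value_deg: float, ideal_deg: float, periodicity_deg: float = 360.0) -> float:
    """Smallest angular distance to the ideal value or any periodic equivalent."""
    diff = (value_deg - ideal_deg) % periodicity_deg
    return min(diff, periodicity_deg - diff)


def flat_bottom_penalty(deviation: float, tolerance: float, penalty_coefficient: float) -> float:
    """Zero within +/- tolerance, quadratic in the excess beyond it."""
    excess = max(0.0, abs(deviation) - tolerance)
    return penalty_coefficient * excess ** 2


if __name__ == "__main__":
    # Hypothetical dihedral with threefold periodicity: 118 deg is 2 deg from 0 + 120.
    dev = periodic_deviation(118.0, 0.0, periodicity_deg=120.0)
    print(dev, flat_bottom_penalty(dev, tolerance=10.0, penalty_coefficient=100.0))
```

Distances are handled the same way without the periodic wrap, that is, by passing `value - ideal` directly as the deviation.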

Initial active-site design and filtering

After matching, Rosetta sequence design was performed in an 8 Å shell around the ligand and catalytic residues. The design was performed under theozyme and PSSM constraints. To constrain the sequence space, the catalytic residues of the IGPS family, as described by the M-CSA database⁶¹, underwent Rosetta computational mutation scanning, and all mutations with ΔΔG_system < +1 R.e.u. compared with the starting identity were included as allowed for design. The Rosetta Match and design steps generated 10⁵ to 10⁶ designs for each starting structure. Designs were filtered on the basis of a ‘fuzzy’-logic objective function⁴⁵ that balanced potentially conflicting criteria: energy density (system energy divided by the protein length), energy rank relative to other designs in the same backbone, active-site vdW energy, catalytic base vdW, ligand solvation and accuracy of theozyme geometry. vdW energy is defined as the sum of the Rosetta atomistic energy terms fa_atr and fa_rep (as weighted in Rosetta scoring function⁶²).
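A fuzzy-logic objective turns each criterion into a degree of satisfaction between 0 and 1 and then combines them, so that a design cannot compensate for failing one criterion by excelling at others. The Python sketch below uses linear ramp memberships and the minimum as the fuzzy AND; these choices, the criterion names and all thresholds are hypothetical, the energy-rank criterion is omitted, and the cited implementation⁴⁵ may use different membership shapes or aggregation. The `vdw_energy` helper follows the Methods definition (weighted fa_atr plus weighted fa_rep), with weights left to the caller.

```python
from typing import Dict, Mapping, Tuple


def ramp_membership(value: float, best: float, worst: float) -> float:
    """Degree (0..1) to which `value` satisfies a criterion.

    1.0 at or beyond `best`, 0.0 at or beyond `worst`, linear in between; works
    for lower-is-better (best < worst) and higher-is-better (best > worst)."""
    if best == worst:
        raise ValueError("best and worst thresholds must differ")
    t = (value - worst) / (best - worst)
    return min(1.0, max(0.0, t))


def fuzzy_and_score(design: Mapping[str, float],
                    criteria: Mapping[str, Tuple[float, float]]) -> float:
    """Combine criteria with the fuzzy AND (minimum of memberships)."""
    return min(ramp_membership(design[name], best, worst)
               for name, (best, worst) in criteria.items())


def vdw_energy(fa_atr: float, fa_rep: float, w_atr: float, w_rep: float) -> float:
    """Weighted fa_atr + weighted fa_rep; pass the weights of the scoring function in use."""
    return w_atr * fa_atr + w_rep * fa_rep


# Hypothetical thresholds (best, worst); not the values used in the paper.
CRITERIA: Dict[str, Tuple[float, float]] = {
    "energy_density": (-3.0, -2.5),       # R.e.u. per residue, lower is better
    "catalytic_base_vdw": (-6.0, -2.0),   # R.e.u., lower is better
    "theozyme_deviation": (0.1, 0.5),     # deviation from ideal geometry, lower is better
}

if __name__ == "__main__":
    designs = {
        "design_a": {"energy_density": -2.9, "catalytic_base_vdw": -5.0, "theozyme_deviation": 0.2},
        "design_b": {"energy_density": -3.1, "catalytic_base_vdw": -2.5, "theozyme_deviation": 0.1},
    }
    ranked = sorted(designs, key=lambda n: fuzzy_and_score(designs[n], CRITERIA), reverse=True)
    print(ranked)  # design_b is excellent on two criteria but weak on the third
```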

Active-site and core stabilization

To enhance active-site stability, we performed an enumeration of all low-energy mutations in the active site with ΔΔG_system < +3 R.e.u. and chose the top variant. To ensure amino acid optimality throughout the protein, a pSUFER⁴⁶ scan was performed on the whole protein excluding the active site. Flagged positions, those with at least five favourable amino acid substitutions (ΔΔG_system < 0), were redesigned using FuncLib calculations²⁸. The lowest-energy design was selected.
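The two selection rules in this step, flagging positions with at least five favourable substitutions (ΔΔG_system < 0) and keeping active-site substitutions below +3 R.e.u., can be expressed as simple filters over a mutational scan. In the Python sketch below, the scan is a hypothetical `{position: {amino_acid: ddG}}` mapping, not the pSUFER or Rosetta output format, and the subsequent FuncLib redesign and lowest-energy selection are not modelled.

```python
from typing import Dict, List, Mapping

Scan = Mapping[int, Mapping[str, float]]  # position -> {amino acid: ddG_system (R.e.u.)}


def flag_positions(scan: Scan, min_favourable: int = 5) -> List[int]:
    """Positions with at least `min_favourable` substitutions of negative ddG."""
    return sorted(pos for pos, muts in scan.items()
                  if sum(1 for ddg in muts.values() if ddg < 0) >= min_favourable)


def tolerated_mutations(scan: Scan, max_ddg: float = 3.0) -> Dict[int, List[str]]:
    """Per position, substitutions below the ddG cutoff (+3 R.e.u. in the active-site step)."""
    return {pos: sorted(aa for aa, ddg in muts.items() if ddg < max_ddg)
            for pos, muts in scan.items()}


if __name__ == "__main__":
    # Hypothetical scan values.
    toy_scan = {
        10: {"A": -0.5, "V": -1.2, "L": -0.3, "I": -0.8, "M": -0.1, "W": 4.0},
        11: {"A": 0.4, "G": 2.9, "P": 6.0},
    }
    print(flag_positions(toy_scan), tolerated_mutations(toy_scan))
```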

Computational validation

Active-site preorganization was analysed by performing extensive rigid-body minimization in the absence of the ligand. Structures in which the catalytic base exhibited an r.m.s.d. > 1.2 Å relative to the ligand-unbound model were discarded. For the R2 series the workflow included an extra validation step comparing the bound model and the AlphaFold2-predicted model. Designs were accepted if the r.m.s.d. between the AlphaFold2 model and the Rosetta model was less than 1 Å.

Active-site optimization

All functional variants identified through experimental screening were optimized by identifying diverse and stable active-site constellations using FuncLib²⁸. FuncLib uses two filters to constrain the enumerated sequence space: a filter based on homologous sequences and exclusion of destabilizing point mutations. However, in de novo design of function, the homologous sequence filter is irrelevant and was omitted. For experimental screening, the 10–15 lowest-energy designs were selected.
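FuncLib itself relies on Rosetta modelling; the following is only a schematic of the filter, enumerate and rank logic described above. It assumes a hypothetical point-mutation ΔΔG table, a hypothetical ΔΔG cut-off of 1.0 R.e.u. and a toy additive energy function in place of Rosetta; positions and energies are illustrative.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Assignment = Dict[int, str]  # position -> amino acid identity


def enumerate_designs(allowed: Dict[int, List[str]],
                      point_ddg: Dict[Tuple[int, str], float],
                      energy_fn: Callable[[Assignment], float],
                      ddg_cutoff: float = 1.0,
                      n_keep: int = 10) -> List[Tuple[float, Assignment]]:
    """Enumerate combinations of allowed identities and return the n_keep lowest-energy ones."""
    # Step 1: drop destabilizing point mutations (identities absent from point_ddg count as neutral).
    filtered = {pos: [aa for aa in aas if point_ddg.get((pos, aa), 0.0) < ddg_cutoff]
                for pos, aas in allowed.items()}
    positions = sorted(filtered)
    # Step 2: score every combination of the remaining identities.
    scored = []
    for identities in product(*(filtered[pos] for pos in positions)):
        assignment = dict(zip(positions, identities))
        scored.append((energy_fn(assignment), assignment))
    # Step 3: keep the lowest-energy designs for experimental screening.
    scored.sort(key=lambda item: item[0])
    return scored[:n_keep]


# Toy, additive energy model standing in for Rosetta; values are illustrative only.
TOY_TERMS = {(10, "A"): -1.0, (10, "S"): -0.5, (10, "T"): -3.0,
             (20, "D"): -2.0, (20, "E"): -1.5}


def toy_energy(assignment: Assignment) -> float:
    return sum(TOY_TERMS[(pos, aa)] for pos, aa in assignment.items())


top = enumerate_designs(allowed={10: ["A", "S", "T"], 20: ["D", "E"]},
                        point_ddg={(10, "T"): 2.5, (20, "E"): 0.4},
                        energy_fn=toy_energy, n_keep=2)
print(top)  # -> [(-3.0, {10: 'A', 20: 'D'}), (-2.5, {10: 'A', 20: 'E'})]
```

In the toy example, Thr at position 10 has the most favourable combined energy but is removed by the point-mutation filter, which is the role the stability filter plays before combinations are ranked.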

Protein expression

The designed genes were ordered from Twist Bioscience and cloned into a pET28 plasmid with an N-terminal His-tag followed by a bdSUMO tag. Plasmids were transformed into Escherichia coli BL21 (DE3) cells. For expression, 50 ml of 2YT medium supplemented with 50 μg ml⁻¹ kanamycin was inoculated with 500 μl of an overnight culture grown from a single colony and incubated at 37 °C until the optical density at 600 nm (OD₆₀₀) reached 0.6–0.8. Overexpression was induced by adding 1 mM IPTG, and the cultures were grown for 20 h at 16 °C and harvested; the cell pellet was frozen at −20 °C. The cells were resuspended in basic buffer (50 mM Tris-Cl pH 7.25, 200 mM NaCl) supplemented with 10 µg ml⁻¹ lysozyme, protease-inhibitor cocktail (Sigma) and benzonase, lysed by sonication and centrifuged at 20,000g for 30 min at 4 °C. The soluble fraction was loaded onto a Ni-NTA (nitrilotriacetic acid) column and washed twice with basic buffer containing 20 mM imidazole. The protein was cleaved on-column overnight at 4 °C with SUMO protease (5 μg ml⁻¹ in basic buffer). Protein purity was assessed by SDS–PAGE, and protein concentration was determined using the Pierce BCA protein assay kit. For crystallography, large-scale expression was performed in 1,500 ml of culture. After Ni-NTA purification and bdSUMO cleavage, the protein was further purified by gel filtration (HiLoad 26/600 Superdex 75 preparative grade column, GE).

Activity assay and determination of kinetic parameters

Product formation was monitored spectrophotometrically at 380 nm in 200-μl reaction volumes in 96-well plates. For initial screening, reactions were started by adding 150 μl of 1 mM 5-nitrobenzisoxazole in basic buffer to 50 μl of purified protein. 5-Nitrobenzisoxazole was added from a 0.1 M stock in acetonitrile. For kinetic characterization, 150 μl of 5-nitrobenzisoxazole at various concentrations (final 0.05–0.75 mM in basic buffer with 1% acetonitrile) was mixed with 50 μl of purified protein. Kinetic parameters were obtained by fitting the data to the Michaelis–Menten equation, v₀ = k_cat[E]₀[S]₀/([S]₀ + K_M). When [S]₀ ≪ K_M, the data were fitted to the linear regime of the Michaelis–Menten model, v₀ = (k_cat/K_M)[E]₀[S]₀, and k_cat/K_M was inferred from the slope. All measurements in the main text were performed in biological duplicates or triplicates.
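The source does not state which fitting software was used. As a dependency-free sketch, the Michaelis–Menten fit can exploit the fact that, for a fixed K_M, the model is linear in V_max = k_cat[E]₀, so only K_M needs a one-dimensional search (golden-section search on log K_M here). The search bounds, iteration count and synthetic data are assumptions. With concentrations in mM, k_cat/K_M comes out in mM⁻¹ s⁻¹ and must be multiplied by 1,000 to give M⁻¹ s⁻¹.

```python
import math
from typing import List, Tuple


def _vmax_and_sse(s: List[float], v: List[float], km: float) -> Tuple[float, float]:
    """For fixed KM the model is linear in Vmax = kcat*[E]0, so Vmax has a closed form."""
    x = [si / (si + km) for si in s]
    vmax = sum(xi * vi for xi, vi in zip(x, v)) / sum(xi * xi for xi in x)
    sse = sum((vi - vmax * xi) ** 2 for xi, vi in zip(x, v))
    return vmax, sse


def fit_michaelis_menten(s: List[float], v: List[float], e0: float,
                         km_bounds: Tuple[float, float] = (1e-3, 10.0),
                         iterations: int = 100) -> Tuple[float, float]:
    """Least-squares fit of v0 = kcat*[E]0*[S]0/([S]0 + KM); returns (kcat, KM).

    KM is found by golden-section search on log(KM) within km_bounds.
    """
    lo, hi = math.log(km_bounds[0]), math.log(km_bounds[1])
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    for _ in range(iterations):
        if _vmax_and_sse(s, v, math.exp(c))[1] < _vmax_and_sse(s, v, math.exp(d))[1]:
            hi, d = d, c  # minimum lies in [lo, d]
            c = hi - g * (hi - lo)
        else:
            lo, c = c, d  # minimum lies in [c, hi]
            d = lo + g * (hi - lo)
    km = math.exp(0.5 * (lo + hi))
    vmax, _ = _vmax_and_sse(s, v, km)
    return vmax / e0, km


def fit_kcat_over_km(s: List[float], v: List[float], e0: float) -> float:
    """Linear regime ([S]0 << KM): v0 = (kcat/KM)[E]0[S]0, slope fitted through the origin."""
    slope = sum(si * vi for si, vi in zip(s, v)) / sum(si * si for si in s)
    return slope / e0


# Synthetic, noise-free data (concentrations in mM, rates in mM s^-1); not measured values.
e0 = 1e-4                              # 0.1 uM enzyme
s = [0.05, 0.1, 0.2, 0.3, 0.5, 0.75]   # substrate range used in the text
v = [3.0 * e0 * si / (si + 0.5) for si in s]
kcat, km = fit_michaelis_menten(s, v, e0)
print(round(kcat, 3), round(km, 3))    # -> 3.0 0.5
```

For noisy experimental data, a library routine such as SciPy's `curve_fit`, which also reports parameter uncertainties, would usually be preferable.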

Thermal stability

Apparent T_m values were measured by nanoscale differential scanning fluorimetry (nanoDSF; Prometheus NT.Plex instrument, NanoTemper Technologies), heating samples from 20 to 95 °C at 1.0 °C min⁻¹.
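The text does not say how the apparent T_m was extracted from the nanoDSF traces. A common convention, assumed here, is to take the inflection point of the unfolding curve (for example, of the 350/330 nm fluorescence ratio), that is, the temperature of the steepest change; the sketch estimates it with central differences on synthetic data.

```python
import math
from typing import List


def apparent_tm(temps: List[float], signal: List[float]) -> float:
    """Temperature of the steepest change in `signal` (largest |first derivative|),
    estimated with central differences."""
    if len(temps) < 3 or len(temps) != len(signal):
        raise ValueError("need at least three paired points")
    best_t, best_slope = temps[1], -1.0
    for i in range(1, len(temps) - 1):
        # abs() so that both rising and falling unfolding signals are handled.
        slope = abs((signal[i + 1] - signal[i - 1]) / (temps[i + 1] - temps[i - 1]))
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t


# Synthetic sigmoidal unfolding curve with a midpoint at 70 C (not measured data).
temps = [20.0 + 0.5 * i for i in range(151)]  # 20-95 C
ratio = [0.8 + 0.2 / (1.0 + math.exp(-(t - 70.0) / 2.0)) for t in temps]
print(apparent_tm(temps, ratio))  # -> 70.0
```

On real, noisy traces some smoothing is usually applied before differentiation; the resolution of this estimate is limited to the temperature sampling step.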

Crystallization, data collection and structure determination

Crystals were grown at 19 °C using the sitting-drop vapour diffusion method. Diffraction data were collected from a single crystal flash-cooled to 100 K, using a wavelength of 1.34 Å, on an in-house Rigaku liquid-metal-jet X-ray Synergy System with a HyPix Arc 150° detector. AlphaFold2 (ref. ⁵⁷) was used to generate the search models for molecular replacement of all three structures (Extended Data Table 2). Initial models were iteratively rebuilt and refined using COOT⁶³ and PHENIX⁶⁴, and model geometry was evaluated using MOLPROBITY⁶⁵. Atomic coordinates and structure factors for Des27.7, R2.Des39 and R2.Des49 have been deposited in the PDB under accession numbers 9HVB, 9HVH and 9HVG, respectively.

Specific crystallization conditions

Des27.7: the well solution contained 0.15 M lithium sulfate monohydrate, 0.1 M citric acid (pH 3.5) and 18% polyethylene glycol (PEG) 6000. Diffraction data were collected to 2.0 Å resolution. Des27.7 crystallized in the P6₁ space group, with one subunit in the asymmetric unit.

R2.Des39: the well solution contained 0.07 M citric acid, 0.03 M Bis-Tris propane (pH 3.4) and 14% PEG 3350. Diffraction data were collected to 2.1 Å resolution. R2.Des39 crystallized in the P2₁ space group, with two subunits in the asymmetric unit.

R2.Des49: the well solution contained 8% Tacsimate (pH 7.0) and 20% PEG 3350. Diffraction data were collected to 1.9 Å resolution. R2.Des49 crystallized in the C222₁ space group, with one subunit in the asymmetric unit.

MD and EVB simulations

System setup

Two designed Kemp eliminases, Des27 and Des27.7, were simulated in both their ligand-bound and unbound forms, using MD simulations to model their dynamics and EVB simulations⁵¹ to model the chemical step. All simulations were initiated from the FuncLib design models for each variant. Ligand-bound simulations were performed in complex with the substrate 5-nitrobenzisoxazole. The partial charges for the substrate were calculated using restrained electrostatic potential⁶⁶ fitting at the HF/6-31G(d) level of theory with Antechamber⁶⁷, on the basis of gas-phase geometries optimized at the B3LYP/6-31G(d) level of theory in Gaussian 16 Rev. B.01 (ref. ⁶⁸). All other force-field parameters for the substrate were obtained from the General AMBER Force Field (GAFF2)⁶⁹. Residue protonation states were checked using PROPKA 3.0 (refs. ^70,71) to estimate side-chain pK_a values, together with visual inspection in PyMOL; on this basis, all residues were kept in their standard protonation states at physiological pH. For simulations of the unbound systems, the catalytic residue Asp162 was modelled in its protonated form. All systems were solvated in a truncated octahedral box of OPC water molecules⁷² extending 11.0 Å from the protein in all directions. Neutralization was achieved using 12 Mg²⁺ and 12 Cl⁻ counterions for the three variants. Protonation patterns of histidine residues for each system are collected in Supplementary Table 5. Non-standard substrate parameters are provided in Supplementary Table 6 and in the Zenodo data package available at https://doi.org/10.5281/zenodo.14563437 (ref. ⁷³).

Classical MD simulations

All MD simulations in this work were performed using the HIP-accelerated version of Amber24 (ref. ⁷⁴) with the ff19SB force field⁷⁵ and the OPC water model⁷². MD simulations for all systems followed the same protocol, which was also used in previous work modelling designed Kemp eliminases⁵⁰ (see ref. ⁵⁰ for a detailed description). In brief, each system was first energy minimized with 100 steps of the steepest-descent algorithm, followed by 900 steps of conjugate gradient minimization, applying a 100-kcal mol⁻¹ Å⁻² restraint to all solute (protein and substrate) atoms. The system was then heated from 50 to 300 K in an NVT ensemble using simulated annealing, reaching 300 K within the first 100 ps and continuing for a total of 1 ns with a 1-fs time step. Langevin temperature control⁷⁶ was used with a collision frequency of 1 ps⁻¹. During this stage, the 100-kcal mol⁻¹ Å⁻² solute restraints were maintained; they were subsequently reduced to 10 kcal mol⁻¹ Å⁻² in later equilibration steps. A second energy minimization and heating step followed, with positional restraints applied to solute heavy atoms. During subsequent equilibration, the restraints were progressively reduced from 10 to 1 to 0.1 kcal mol⁻¹ Å⁻² before being fully removed. The unrestrained systems then underwent a final 1-ns equilibration in an NPT ensemble (300 K, 1 atm) using a Berendsen barostat⁷⁷ with a 1-ps pressure relaxation time and Langevin temperature control (collision frequency of 1 ps⁻¹). The SHAKE algorithm⁷⁸ was applied to constrain all bonds involving hydrogen atoms, and all equilibration simulations used a 1-fs time step. Production MD runs were performed using a 4-fs time step, enabled by hydrogen mass repartitioning⁷⁹ and the SHAKE algorithm⁷⁸, with an 8 Å direct-space non-bonded cutoff, Langevin temperature control (collision frequency of 1 ps⁻¹) and a Berendsen barostat (pressure relaxation time of 1 ps). Equilibration of these trajectories is shown in Extended Data Fig. 4d,e. The final production trajectories were 1 µs long for each system, with 5 independent replicas per system, resulting in a total of 5 µs of simulation time per system and 15 µs across all systems.
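The 4-fs production time step relies on hydrogen mass repartitioning⁷⁹: hydrogen masses are increased and the same mass is subtracted from the bonded heavy atom, which slows the fastest bond vibrations while keeping the total mass unchanged. The sketch below illustrates this bookkeeping on a hypothetical methyl fragment. The target hydrogen mass of 3.024 Da is a commonly used value and an assumption here; in practice the change is typically applied to the topology file with ParmEd rather than with code like this.

```python
from typing import Dict, List, Tuple


def repartition_hydrogen_masses(masses: Dict[str, float], elements: Dict[str, str],
                                bonds: List[Tuple[str, str]],
                                h_mass: float = 3.024) -> Dict[str, float]:
    """Set every hydrogen to h_mass (Da), removing the added mass from its bonded heavy atom,
    so that the total mass of the fragment is unchanged."""
    new = dict(masses)
    for a, b in bonds:
        for h, heavy in ((a, b), (b, a)):
            if elements[h] == "H" and elements[heavy] != "H":
                new[heavy] -= h_mass - masses[h]
                new[h] = h_mass
    return new


# Hypothetical methyl fragment (masses in Da).
masses = {"C": 12.011, "H1": 1.008, "H2": 1.008, "H3": 1.008}
elements = {"C": "C", "H1": "H", "H2": "H", "H3": "H"}
bonds = [("C", "H1"), ("C", "H2"), ("H3", "C")]
new = repartition_hydrogen_masses(masses, elements, bonds)
print(round(new["C"], 3), new["H1"])  # -> 5.963 3.024

# Steps per 1-us production replica at a 4-fs time step (1 us = 1e9 fs).
steps_per_replica = 1_000_000_000 // 4
print(steps_per_replica)  # -> 250000000
```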

EVB simulations

EVB simulations were performed on the Des27 and Des27.7 variants, using both substrate conformers (‘in’ and ‘out’; Fig. 3c) that were dominant in the MD simulations, following the same protocol described in detail in ref. ⁵⁰. Reactive in and out conformers were extracted from our MD simulations and overlaid onto the FuncLib-predicted structures of Des27 and Des27.7 as starting coordinates for the EVB simulations. Before the EVB simulations, in all cases, the enzyme–substrate complex was minimized in vacuum with Amber24 (ref. ⁷⁴), using 2,500 steps of the steepest-descent algorithm followed by 2,500 steps of conjugate gradient minimization, with 10-kcal mol⁻¹ Å⁻² positional restraints on all heavy (non-hydrogen) atoms. The minimization was then repeated with the same number of steps and 5-kcal mol⁻¹ Å⁻² positional restraints on protein C_α atoms and substrate heavy atoms, and finally with twice as many steps, keeping restraints only on the substrate heavy atoms.

All EVB simulations were performed using the Q6 simulation package⁸⁰, the OPLS-AA force field⁸¹, the TIP3P water model⁸² and the surface-constrained all-atom solvent (SCAAS) model⁸³ to describe the solvent. Long-range interactions were treated using the local reaction field approach⁸⁴. Protonation states of ionizable residues within the explicit simulation sphere, as well as histidine protonation patterns (both validated by PROPKA 3.0 (refs. ^70,71) and visual inspection), are listed in Supplementary Table 7. Each system was equilibrated in 30 independent 30-ns replicas, with 5-kcal mol⁻¹ Å⁻² distance-based harmonic restraints applied between the substrate hydrogen-donor carbon and the acceptor oxygen of Asp162. Each equilibration was followed by 10.2 ns of EVB simulations (51 discrete EVB windows of 200 ps each) without the distance restraint, giving a cumulative 612 ns of EVB simulation time per system (including ‘in’ and ‘out’ substrate conformations), and 3.6 µs of EVB equilibration time and 1.2 µs of EVB simulation time across all systems studied in this work (4.8 µs of simulation time in total). The corresponding r.m.s.d. values of the equilibration phase (calculated with the QCalc6 module of Q6) are shown in Extended Data Fig. 8a,b. We note that in one of the initial 30 EVB trajectories for Des27.7 with the substrate in the out conformation, we observed active-site distortion, with the catalytic Asp moving into a non-reactive conformation. This trajectory was excluded from further analysis, and an extra trajectory was run to complete a full set of 30 replicas.
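To make the two-state picture behind EVB concrete, a textbook sketch follows: the ground-state energy is the lowest eigenvalue of a 2 × 2 Hamiltonian whose diagonal elements are the reactant and product diabatic energies and whose off-diagonal element H₁₂ couples them, and a mapping potential interpolates between the diabatic states across the windows. The even spacing of the 51 mapping parameters is an assumption (the text gives only the number of windows), and the code does not perform the free-energy perturbation/umbrella-sampling analysis that Q6 carries out. The final lines reproduce the simulation-time totals stated above from the per-replica numbers.

```python
import math
from typing import List


def evb_ground_state(h11: float, h22: float, h12: float) -> float:
    """Lowest eigenvalue of the two-state EVB Hamiltonian [[h11, h12], [h12, h22]]."""
    mean = 0.5 * (h11 + h22)
    half_gap = 0.5 * (h11 - h22)
    return mean - math.sqrt(half_gap * half_gap + h12 * h12)


def lambda_schedule(n_windows: int = 51) -> List[float]:
    """Evenly spaced mapping parameters from 0 (reactant state) to 1 (product state)."""
    return [m / (n_windows - 1) for m in range(n_windows)]


def mapping_potential(e1: float, e2: float, lam: float) -> float:
    """Biasing potential that drives the system along the reaction in window lam."""
    return (1.0 - lam) * e1 + lam * e2


# Simulation-time bookkeeping with the numbers given in the text (integer ps/ns).
replicas, windows, window_ps, equil_ns = 30, 51, 200, 30
evb_ps_per_replica = windows * window_ps                 # 10,200 ps = 10.2 ns
evb_ps_per_system = 2 * replicas * evb_ps_per_replica    # 'in' + 'out': 612,000 ps = 612 ns
equil_ns_all = 2 * 2 * replicas * equil_ns               # 2 systems x 2 conformers: 3,600 ns
evb_ps_all = 2 * evb_ps_per_system                       # 1,224,000 ps, about 1.2 us
print(evb_ps_per_replica, evb_ps_per_system, equil_ns_all, evb_ps_all)
```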

Representative stationary points for the KE reaction catalysed by Des27, extracted from EVB simulations of this system, are shown in Extended Data Fig. 8c,d. To extract the conformations representing each stationary point, all 30 replicas were evaluated together. MDTraj (version 1.10.0)⁸⁵ was used to convert the trajectories to a CPPTRAJ-compatible format, and clustering was performed on the r.m.s.d. of the substrate heavy atoms using the average-linkage clustering method with an ε value of 0.75. We note that the key stationary points for Des27.7 are visually similar to those for Des27. Sample input files, parameter files, starting structures and simulation snapshots have all been made available on Zenodo at https://doi.org/10.5281/zenodo.14563437 (ref. ⁷³).
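A minimal pure-Python illustration of average-linkage hierarchical clustering with a stopping distance ε follows. It assumes that ε acts as the distance at which merging stops (as in CPPTRAJ's hierarchical agglomerative clustering) and that frames are compared without superposition; neither detail is specified in the text, and the toy frames are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd_no_fit(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD over matched substrate heavy atoms, without superposition (simplifying assumption)."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def average_linkage(frames: List[Sequence[Coord]],
                    epsilon: float) -> Tuple[List[List[int]], List[int]]:
    """Merge the two closest clusters (mean pairwise RMSD) until the closest pair is > epsilon.
    Returns the clusters and, for each, the member with the lowest summed RMSD to the rest."""
    n = len(frames)
    dist = [[rmsd_no_fit(frames[i], frames[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best = None
        for (ia, ca), (ib, cb) in combinations(enumerate(clusters), 2):
            d = sum(dist[i][j] for i in ca for j in cb) / (len(ca) * len(cb))
            if best is None or d < best[0]:
                best = (d, ia, ib)
        d, ia, ib = best
        if d > epsilon:
            break
        clusters[ia] = clusters[ia] + clusters[ib]
        del clusters[ib]  # ib > ia, so index ia is unaffected
    reps = [min(c, key=lambda i: sum(dist[i][j] for j in c)) for c in clusters]
    return clusters, reps


# Hypothetical two-atom "substrate" frames forming two well-separated groups.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)],
    [(0.2, 0.0, 0.0), (1.2, 0.0, 0.0)],
    [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
    [(5.1, 0.0, 0.0), (6.1, 0.0, 0.0)],
]
clusters, reps = average_linkage(frames, epsilon=0.75)
print(clusters, reps)  # -> [[0, 1, 2], [3, 4]] [1, 3]
```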

Simulation analysis

Unless otherwise stated, all MD analyses were performed using the CPPTRAJ module⁸⁶ of AmberTools24 (ref. ⁸⁷). Trajectory frames were extracted every 400 ps, and results (where applicable) are reported as averages and standard deviations over 5 × 1-µs trajectories per system. The fractions of unbound and bound modes during the simulations were determined by counting trajectory frames. A bound mode was defined on the basis of the distance between the ligand and the centre of mass of the active site, comprising residues 54, 84, 86, 92, 136, 162, 183 and 236, using a 4 Å threshold to classify frames as bound or unbound. In unbound modes, the ligand has left the active-site pocket but has not necessarily dissociated from the protein (it samples non-productive conformations outside the active site). The substrate orientation was defined using the distances between the C_α atom of Leu41 and the N1 and N2 atoms of the substrate: conformations with a Leu41–N1 distance between 0 and 15 Å and a Leu41–N2 distance between 15 and 20 Å were classified as ‘out’, and all others as ‘in’. Pocket volumes of the Des27 and Des27.7 systems were calculated using MDpocket^88,89, with snapshots taken every 4 ns of the simulations of the unbound systems. Additionally, the volume of the ligand was calculated using the mol_volume package in VMD⁹⁰. Finally, PyMOL was used for all visualizations.
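The frame classification rules above can be sketched as follows. The ligand position is taken as its centre of mass and the distance windows are treated as inclusive; both are assumptions, as is the toy geometry in the example. In a real analysis, the active-site centre of mass would be computed over residues 54, 84, 86, 92, 136, 162, 183 and 236.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def centre_of_mass(coords: Sequence[Coord], masses: Sequence[float]) -> Coord:
    total = sum(masses)
    return tuple(sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3))


def is_bound(ligand_com: Coord, site_com: Coord, threshold: float = 4.0) -> bool:
    """Bound if the ligand lies within `threshold` angstrom of the active-site centre of mass."""
    return math.dist(ligand_com, site_com) <= threshold


def orientation(leu41_ca: Coord, n1: Coord, n2: Coord) -> str:
    """'out' if Leu41 CA-N1 is 0-15 A and Leu41 CA-N2 is 15-20 A; otherwise 'in'."""
    d1, d2 = math.dist(leu41_ca, n1), math.dist(leu41_ca, n2)
    return "out" if 0.0 <= d1 <= 15.0 and 15.0 <= d2 <= 20.0 else "in"


def bound_fraction(flags: List[bool]) -> float:
    return sum(flags) / len(flags)


# Toy geometry, for illustration only.
print(orientation((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (18.0, 0.0, 0.0)))  # -> out
print(bound_fraction([True, True, False, True]))                           # -> 0.75
frames_per_replica = 1_000_000 // 400  # 1 us = 1e6 ps, one frame every 400 ps
print(frames_per_replica)              # -> 2500
```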

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All data generated and analysed during the study are available within the paper and its Supplementary Information. The crystal structures of Des27.7, R2.Des39 and R2.Des49 are deposited in the Protein Data Bank (PDB) under accession codes 9HVB, 9HVH and 9HVG, respectively. The crystal structures for all IGPS enzymes are available through the PDB with accession codes 1LBF, 1I4A, 1JCM, 1VC4 and 4FB7.

Code availability

Custom Python scripts, RosettaScripts⁹¹, command lines, Jupyter notebooks and datasets used for de novo enzyme design are available at https://github.com/Fleishman-Lab/denovoKemp. Code and specifications used for MD analysis are available at Zenodo (https://doi.org/10.5281/zenodo.14563437)⁷³.

Change history

24 November 2025
In the originally published Supplementary Information, some sequence names in Supplementary Table 1 did not correspond to the correct sequences. Supplementary Table 1 has now been corrected, and information about plasmid availability through Addgene has been added where relevant.

References

Röthlisberger, D. et al. Kemp elimination catalysts by computational enzyme design. Nature 453, 190–195 (2008).
Siegel, J. B. et al. Computational design of an enzyme catalyst for a stereoselective bimolecular Diels-Alder reaction. Science 329, 309–313 (2010).
Privett, H. K. et al. Iterative approach to computational enzyme design. Proc. Natl Acad. Sci. USA 109, 3790–3795 (2012).
Jiang, L. et al. De novo computational design of retro-aldol enzymes. Science 319, 1387–1391 (2008).
Yeh, A. H.-W. et al. De novo design of luciferases using deep learning. Nature 614, 774–780 (2023).
Blomberg, R. et al. Precision is essential for efficient catalysis in an evolved Kemp eliminase. Nature 503, 418–421 (2013).
Khersonsky, O. et al. Bridging the gaps in design methodologies by evolutionary optimization of the stability and proficiency of designed Kemp eliminase KE59. Proc. Natl Acad. Sci. USA 109, 10358–10363 (2012).
Giger, L. et al. Evolution of a designed retro-aldolase leads to complete active site remodeling. Nat. Chem. Biol. 9, 494–498 (2013).
Patsch, D. et al. Enriching productive mutational paths accelerates enzyme evolution. Nat. Chem. Biol. 20, 1662–1669 (2024).
Korendovych, I. V. & DeGrado, W. F. Catalytic efficiency of designed catalytic proteins. Curr. Opin. Struct. Biol. 27, 113–121 (2014).
Lovelock, S. L. et al. The road to fully programmable protein catalysis. Nature 606, 49–58 (2022).
Listov, D., Goverde, C. A., Correia, B. E. & Fleishman, S. J. Opportunities and challenges in design and optimization of protein function. Nat. Rev. Mol. Cell Biol. 25, 639–653 (2024).
Marques, S. M., Planas-Iglesias, J. & Damborsky, J. Web-based tools for computational enzyme design. Curr. Opin. Struct. Biol. 69, 19–34 (2021).
Braun, M. et al. Computational design of highly active de novo enzymes. Preprint at bioRxiv https://doi.org/10.1101/2024.08.02.606416 (2024).
Lauko, A. et al. Computational design of serine hydrolases. Science https://doi.org/10.1126/science.adu2454 (2025).
Bar-Even, A. et al. The moderately efficient enzyme: evolutionary and physicochemical trends shaping enzyme parameters. Biochemistry 50, 4402–4410 (2011).
Khare, S. D. & Fleishman, S. J. Emerging themes in the computational design of novel enzymes and protein-protein interfaces. FEBS Lett. 587, 1147–1154 (2013).
Baker, D. An exciting but challenging road ahead for computational enzyme design. Protein Sci. 19, 1817–1819 (2010).
Kries, H., Blomberg, R. & Hilvert, D. De novo enzymes by computational design. Curr. Opin. Chem. Biol. 17, 221–228 (2013).
Kiss, G., Çelebi-Ölçüm, N., Moretti, R., Baker, D. & Houk, K. N. Computational enzyme design. Angew. Chem. Int. Ed. Engl. 52, 5700–5725 (2013).
Goldenzweig, A. & Fleishman, S. J. Principles of protein stability and their application in computational design. Annu. Rev. Biochem. 87, 105–129 (2018).
Frushicheva, M. P., Cao, J., Chu, Z. T. & Warshel, A. Exploring challenges in rational enzyme design by simulating the catalysis in artificial Kemp eliminase. Proc. Natl Acad. Sci. USA 107, 16869–16874 (2010).
Otten, R. et al. How directed evolution reshapes the energy landscape in an enzyme to boost catalysis. Science 370, 1442–1446 (2020).
Nagel, Z. D. & Klinman, J. P. A 21st century revisionist’s view at a turning point in enzymology. Nat. Chem. Biol. 5, 543–550 (2009).
Vaissier Welborn, V. & Head-Gordon, T. Computational design of synthetic enzymes. Chem. Rev. 119, 6613–6630 (2019).
Chu, A. E., Lu, T. & Huang, P.-S. Sparks of function by de novo protein design. Nat. Biotechnol. 42, 203–215 (2024).
Lipsh-Sokolik, R. et al. Combinatorial assembly and design of enzymes. Science 379, 195–201 (2023).
Khersonsky, O. et al. Automated design of efficient and functionally diverse enzyme repertoires. Mol. Cell 72, 178–186.e5 (2018).
Goldenzweig, A. et al. Automated structure- and sequence-based design of proteins for high bacterial expression and stability. Mol. Cell 63, 337–346 (2016).
Rocklin, G. J. et al. Global analysis of protein folding using massively parallel design, synthesis, and testing. Science 357, 168–175 (2017).
Polizzi, N. F. & DeGrado, W. F. A defined structural unit enables de novo design of small-molecule-binding proteins. Science 369, 1227–1233 (2020).
Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023).
Frank, C. et al. Scalable protein design using optimization in a relaxed sequence space. Science 386, 439–445 (2024).
Lu, T., Liu, M. H., Chen, Y., Kim, J. & Huang, P.-S. Assessing generative model coverage of protein structures with SHAPES. Preprint at bioRxiv https://doi.org/10.1101/2025.01.09.632260 (2025).
Kim, D. et al. Computational design of metallohydrolases. Preprint at bioRxiv https://doi.org/10.1101/2024.11.13.623507 (2024).
Copeland, R. A. Enzymes: A Practical Introduction to Structure, Mechanism, and Data Analysis (Wiley-VCH, 1996).
Sillitoe, I. et al. CATH: expanding the horizons of structure-based functional annotations for genome sequences. Nucleic Acids Res. 47, D280–D284 (2019).
Nagano, N., Orengo, C. A. & Thornton, J. M. One fold with many functions: the evolutionary relationships between TIM barrel families based on their sequences, structures and functions. J. Mol. Biol. 321, 741–765 (2002).
Huang, P. S. et al. De novo design of a four-fold symmetric TIM-barrel protein with atomic-level accuracy. Nat. Chem. Biol. 12, 29–34 (2016).
Eisenbeis, S. et al. Potential of fragment recombination for rational design of proteins. J. Am. Chem. Soc. 134, 4019–4022 (2012).
Lapidoth, G. et al. Highly active enzymes by automated combinatorial backbone assembly and sequence design. Nat. Commun. 9, 2780 (2018).
Lipsh-Sokolik, R., Listov, D. & Fleishman, S. J. The AbDesign computational pipeline for modular backbone assembly and design of binders and enzymes. Protein Sci. 30, 151–159 (2021).
Zanghellini, A. et al. New algorithms and an in silico benchmark for computational enzyme design. Protein Sci. 15, 2785–2794 (2006).
Leaver-Fay, A. et al. ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol. 487, 545–574 (2011).
Warszawski, S., Netzer, R., Tawfik, D. S. & Fleishman, S. J. A ‘fuzzy’-logic language for encoding multiple physical traits in biomolecules. J. Mol. Biol. 426, 4125–4138 (2014).
Listov, D. et al. Assessing and enhancing foldability in designed proteins. Protein Sci. 31, e4400 (2022).
Na, J., Houk, K. N. & Hilvert, D. Transition state of the base-promoted ring-opening of isoxazoles. Theoretical prediction of catalytic functionalities and design of haptens for antibody production. J. Am. Chem. Soc. 118, 6462–6471 (1996).
Tantillo, D. J., Chen, J. & Houk, K. N. Theozymes and compuzymes: theoretical models for biological catalysis. Curr. Opin. Chem. Biol. 2, 743–750 (1998).
Hong, N. S. et al. The evolution of multiple active site configurations in a designed enzyme. Nat. Commun. 9, 3900 (2018).
Gutierrez-Rus, L. I. et al. Enzyme enhancement through computational stability design targeting NMR-determined catalytic hotspots. J. Am. Chem. Soc. https://doi.org/10.1021/jacs.4c09428 (2025).
Warshel, A. & Weiss, R. M. An empirical valence bond approach for comparing reactions in solutions and in enzymes. J. Am. Chem. Soc. 102, 6218–6226 (1980).
Hupfeld, E. et al. Conformational modulation of a mobile loop controls catalysis in the (βα)8-barrel enzyme of histidine biosynthesis HisF. JACS Au 4, 3258–3276 (2024).
Schlee, S. et al. Relationship of catalysis and active site loop dynamics in the (βα)8-barrel enzyme indole-3-glycerol phosphate synthase. Biochemistry 57, 3265–3277 (2018).
Goldsmith, M. et al. Overcoming an optimization plateau in the directed evolution of highly efficient nerve agent bioscavengers. Protein Eng. Des. Sel. 30, 333–345 (2017).
Kemp, D. S., Cox, D. D. & Paul, K. G. Physical organic chemistry of benzisoxazoles. IV. Origins and catalytic nature of the solvent rate acceleration for the decarboxylation of 3-carboxybenzisoxazoles. J. Am. Chem. Soc. 97, 7312–7318 (1975).
Li, W., Jaroszewski, L. & Godzik, A. Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics 17, 282–283 (2001).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).
Barber-Zucker, S. et al. Stable and functionally diverse versatile peroxidases designed directly from sequences. J. Am. Chem. Soc. 144, 3564–3571 (2022).
Richter, F., Leaver-Fay, A., Khare, S. D., Bjelic, S. & Baker, D. De novo enzyme design using Rosetta3. PLoS ONE 6, e19230 (2011).
Ribeiro, A. J. M. et al. Mechanism and Catalytic Site Atlas (M-CSA): a database of enzyme reaction mechanisms and active sites. Nucleic Acids Res. 46, D618–D623 (2018).
Alford, R. F. et al. The Rosetta all-atom energy function for macromolecular modeling and design. J. Chem. Theory Comput. 13, 3031–3048 (2017).
Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of Coot. Acta Crystallogr. D Biol. Crystallogr. 66, 486–501 (2010).
Adams, P. D. et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. D Biol. Crystallogr. 66, 213–221 (2010).
Williams, C. J. et al. MolProbity: more and better reference data for improved all-atom structure validation. Protein Sci. 27, 293–315 (2018).
Woods, R. J. & Chappelle, R. Electrostatic potential atomic partial charges for condensed-phase simulations of carbohydrates. J. Mol. Struct. 527, 149–156 (2000).
Wang, J., Wang, W., Kollman, P. A. & Case, D. A. Automatic atom type and bond type perception in molecular mechanical calculations. J. Mol. Graph. Model. 25, 247–260 (2006).
Frisch, M. J. et al. Gaussian 16, Revision B.01 (Gaussian, 2016).
Wang, J., Wolf, R. M., Caldwell, J. W., Kollman, P. A. & Case, D. A. Development and testing of a general amber force field. J. Comput. Chem. 25, 1157–1174 (2004).
Olsson, M. H. M., Søndergaard, C. R., Rostkowski, M. & Jensen, J. H. PROPKA3: consistent treatment of internal and surface residues in empirical pKa predictions. J. Chem. Theory Comput. 7, 525–537 (2011).
Søndergaard, C. R., Olsson, M. H. M., Rostkowski, M. & Jensen, J. H. Improved treatment of ligands and coupling effects in empirical calculation and rationalization of pKa values. J. Chem. Theory Comput. 7, 2284–2295 (2011).
Izadi, S., Anandakrishnan, R. & Onufriev, A. V. Building Water Models: A Different Approach. J. Phys. Chem. Lett. 5, 3863–3871 (2014).
Listov, D. et al. Complete computational design of high-efficiency Kemp elimination enzymes. Zenodo https://doi.org/10.5281/zenodo.14563437 (2025).
Case, D. A. et al. Amber 2025 (Univ. California, San Francisco, 2025).
Tian, C. et al. ff19SB: amino-acid-specific protein backbone parameters trained against quantum mechanics energy surfaces in solution. J. Chem. Theory Comput. 16, 528–552 (2020).
Schneider, T. & Stoll, E. Molecular-dynamics study of a three-dimensional one-component model for distortive phase transitions. Phys. Rev. B 17, 1302–1322 (1978).
Berendsen, H. J. C., Postma, J. P. M., Van Gunsteren, W. F., DiNola, A. & Haak, J. R. Molecular dynamics with coupling to an external bath. J. Chem. Phys. 81, 3684–3690 (1984).
Ryckaert, J.-P., Ciccotti, G. & Berendsen, H. J. C. Numerical integration of the Cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. J. Comput. Phys. 23, 327–341 (1977).
Hopkins, C. W., Le Grand, S., Walker, R. C. & Roitberg, A. E. Long-time-step molecular dynamics through hydrogen mass repartitioning. J. Chem. Theory Comput. 11, 1864–1874 (2015).
Bauer, P. et al. Q6: a comprehensive toolkit for empirical valence bond and related free energy calculations. SoftwareX 7, 388–395 (2018).
Kaminski, G. A., Friesner, R. A., Tirado-Rives, J. & Jorgensen, W. L. Evaluation and reparametrization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides. J. Phys. Chem. B 105, 6474–6487 (2001).
Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W. & Klein, M. L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 79, 926–935 (1983).
King, G. & Warshel, A. A surface constrained all-atom solvent model for effective simulations of polar solvents. J. Chem. Phys. 91, 3647–3661 (1989).
Lee, F. S. & Warshel, A. A local reaction field method for fast evaluation of long-range electrostatic interactions in molecular simulations. J. Chem. Phys. 97, 3100–3107 (1992).
McGibbon, R. T. et al. MDTraj: a modern open library for the analysis of molecular dynamics trajectories. Biophys. J. 109, 1528–1532 (2015).
Roe, D. R. & Cheatham, T. E. 3rd PTRAJ and CPPTRAJ: software for processing and analysis of molecular dynamics trajectory data. J. Chem. Theory Comput. 9, 3084–3095 (2013).
Case, D. A. et al. AmberTools. J. Chem. Inf. Model. 63, 6183–6191 (2023).
Schmidtke, P., Bidon-Chanal, A., Luque, F. J. & Barril, X. MDpocket: open-source cavity detection and characterization on molecular dynamics trajectories. Bioinformatics 27, 3276–3285 (2011).
Wagner, J. R. et al. POVME 3.0: software for mapping binding pocket flexibility. J. Chem. Theory Comput. 13, 4584–4592 (2017).
Humphrey, W., Dalke, A. & Schulten, K. VMD: visual molecular dynamics. J. Mol. Graph. 14, 33–38 (1996).
Fleishman, S. J. et al. RosettaScripts: a scripting language interface to the Rosetta macromolecular modeling suite. PLoS ONE 6, e20161 (2011).


Acknowledgements

We thank D. Hilvert for discussions and R. Lipsh-Sokolik, Z. Avizemer, A. Tennenhouse and O. Khersonsky for critical reading. We also thank O. Khersonsky, M. Goldsmith and M. Ovadis for technical help. This work was funded by the Volkswagen Foundation grant no. 94747 (S.J.F.), the Israel Science Foundation grant no. 1844 (S.J.F.), the European Research Council through Consolidator and Advanced Award grants no. 815379 and no. 101140394 (S.J.F.), the European Innovation Council Pathfinder grant no. 101129798, W-BioCat (S.J.F.), the Institute for Environmental Sustainability at the Weizmann Institute of Science (S.J.F.), the Knut and Alice Wallenberg Foundation (S.C.L.K.), the Sven and Lily Lawski Foundation (G.H.) and a donation in memory of Sam Switzer (S.J.F.). We acknowledge the National Academic Infrastructure for Supercomputing in Sweden (NAISS), partially funded by the Swedish Research Council through grant agreement no. 2022-06725, for awarding this project access to the LUMI supercomputer, owned by the EuroHPC Joint Undertaking and hosted by CSC (Finland) and the LUMI consortium, as well as the Tetralith supercomputer at NSC Linköping. This work used the Hive cluster, which is supported by the National Science Foundation under grant no. 1828187, access to which was provided by the Partnership for an Advanced Computing Environment (PACE) at the Georgia Institute of Technology, Atlanta, GA, USA. G.H. acknowledges KIFÜ for awarding access to a resource based in Hungary (Komondor HPC). S.C.L.K. is the Georgia Research Alliance - Vasser Wooley Chair of Molecular Design at Georgia Tech.

Author information

Authors and Affiliations

Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
Dina Listov, Shlomo Yakir Hoch & Sarel J. Fleishman
School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA
Eva Vos & Shina Caroline Lynn Kamerlin
Department of Chemistry, Lund University, Lund, Sweden
Gyula Hoffka & Shina Caroline Lynn Kamerlin
Department of Biochemistry and Molecular Biology, University of Debrecen, Debrecen, Hungary
Gyula Hoffka
Department of Chemistry - BMC, Uppsala University, Uppsala, Sweden
Andrej Berg
Structural Proteomics Unit, Weizmann Institute of Science, Rehovot, Israel
Shelly Hamer-Rogotner & Orly Dym

Contributions

D.L. and S.J.F. conceived the idea. D.L. developed the de novo design algorithm with help from S.Y.H. D.L. performed expression, purification and biochemical characterization. S.H.-R. performed crystallization experiments and O.D. determined crystal structures. E.V., G.H. and A.B. performed formal analysis, investigations and visualizations. S.J.F. and S.C.L.K. were responsible for funding acquisition, methodology, supervision, and reviewing and editing the manuscript. D.L. and S.J.F. wrote the paper with input from all authors.

Corresponding author

Correspondence to Sarel J. Fleishman.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks Hans Bunzel and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Expression and stability of the initial design round.

a. 66 designs were solubly expressed. High-functioning designs were expressed independently 2–5 times; all other designs were expressed once. For gel source data, see Supplementary Data 1. b. Temperature melts of representative cooperatively folded (left) and unfolded (right) designs. All temperature melts were performed in technical duplicate.
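The caption does not state how apparent stability is read from a melt curve. As a minimal sketch, assuming a single two-state-like transition, flat user-supplied folded and unfolded baselines, and a generic spectroscopic signal, the technical duplicates can be averaged point by point and the apparent midpoint located where the normalized signal crosses 0.5. The helper names are hypothetical; a curve with no crossing (returned as `None`) corresponds to a design that shows no cooperative transition over the measured range.

```python
# Hypothetical helpers (not the authors' pipeline): apparent melt midpoint by
# linear interpolation of a baseline-normalized signal.
def apparent_midpoint(temps_c, signal, folded_baseline, unfolded_baseline):
    """Return the temperature (°C) where the normalized signal crosses 0.5.

    Assumes one monotonic, two-state-like transition and flat baselines that
    differ from each other. Returns None when no crossing occurs in range.
    """
    span = unfolded_baseline - folded_baseline
    frac = [(s - folded_baseline) / span for s in signal]  # apparent fraction unfolded
    for i in range(1, len(temps_c)):
        f0, f1 = frac[i - 1], frac[i]
        # Bracket the 0.5 crossing between two consecutive temperature points.
        if f0 != f1 and (f0 - 0.5) * (f1 - 0.5) <= 0:
            t0, t1 = temps_c[i - 1], temps_c[i]
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    return None


def mean_of_replicates(*curves):
    """Point-by-point mean of replicate melt curves recorded at the same temperatures."""
    return [sum(values) / len(values) for values in zip(*curves)]


# Usage with made-up technical duplicates (arbitrary signal units):
temps = [25, 35, 45, 55, 65, 75]
rep1 = [0.00, 0.04, 0.18, 0.78, 0.95, 1.00]
rep2 = [0.00, 0.06, 0.22, 0.82, 0.95, 1.00]
print(apparent_midpoint(temps, mean_of_replicates(rep1, rep2), 0.0, 1.0))
```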

Extended Data Fig. 2 Representative Michaelis-Menten plots.

The data were fitted to the Michaelis-Menten equation, $v_0 = k_{\mathrm{cat}}[\mathrm{E}]_0[\mathrm{S}]_0/([\mathrm{S}]_0 + K_\mathrm{M})$. When substrate saturation could not be attained because of limited substrate solubility, the data were fitted to the linear region of the Michaelis-Menten model, $v_0 = (k_{\mathrm{cat}}/K_\mathrm{M})[\mathrm{E}]_0[\mathrm{S}]_0$, and $k_{\mathrm{cat}}/K_\mathrm{M}$ was deduced from the slope. Kinetic experiments were performed for all functional designs in this study with at least two technical replicates, and high-functioning designs were further evaluated with 2–5 biological replicates. Data presented are the mean of two technical replicates.
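The two fitting regimes can be illustrated with a short Python sketch. The full equation gives the initial rate at any substrate concentration; below saturation, $k_{\mathrm{cat}}/K_\mathrm{M}$ is the slope of $v_0/[\mathrm{E}]_0$ against $[\mathrm{S}]_0$. The function names, the origin-constrained least-squares slope, and the enzyme and substrate concentrations are assumptions for illustration, not the authors' analysis code.

```python
def michaelis_menten_v0(s0, e0, kcat, km):
    """Initial rate v0 = kcat * [E]0 * [S]0 / ([S]0 + KM)."""
    return kcat * e0 * s0 / (s0 + km)


def kcat_over_km_from_slope(s0_values, v0_values, e0):
    """Estimate kcat/KM from data in the linear (sub-saturating) regime.

    For [S]0 << KM, v0 ≈ (kcat/KM) * [E]0 * [S]0, so kcat/KM is the slope of
    v0/[E]0 against [S]0. This sketch uses a least-squares line forced through
    the origin (an assumption; a fit with a free intercept is also common).
    """
    y = [v / e0 for v in v0_values]
    num = sum(s * yi for s, yi in zip(s0_values, y))
    den = sum(s * s for s in s0_values)
    return num / den


# Illustrative values only: kcat = 2.8 s^-1 and kcat/KM = 12,700 M^-1 s^-1
# (the values quoted in the abstract for the most efficient design) imply
# KM of about 220 µM.
kcat = 2.8
km = kcat / 12_700          # M
e0 = 10e-9                  # hypothetical enzyme concentration, M
s0 = [5e-6, 10e-6, 20e-6]   # hypothetical substrate concentrations, M
v0 = [michaelis_menten_v0(s, e0, kcat, km) for s in s0]
print(kcat_over_km_from_slope(s0, v0, e0))
```

Because these substrate concentrations are not far below $K_\mathrm{M}$, the slope estimate falls several percent below 12,700 M⁻¹ s⁻¹; the linear model is a good approximation only when $[\mathrm{S}]_0 \ll K_\mathrm{M}$.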

Extended Data Fig. 3 Protein BLAST search using the Des27.7 sequence as query against the NCBI “nr” database.

Alignment to the top hit in nr. Mutations are shown in red; hyphens indicate insertions and deletions.
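Counting mutations against the top BLAST hit amounts to walking the gapped alignment column by column. The sketch below assumes the query and subject are already aligned as equal-length strings with `-` marking gaps, and it counts each gapped column once; this is one of several possible conventions and is not taken from the paper. The sequences in the usage line are toy examples.

```python
def alignment_differences(query, subject):
    """Count differences in a gapped pairwise alignment.

    Both strings must have equal length; '-' marks a gap. Returns
    (substitutions, gap_columns, identity), where identity is the fraction of
    alignment columns holding identical residues.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    subs = gaps = same = 0
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            gaps += 1          # insertion or deletion column
        elif q == s:
            same += 1          # identical residue
        else:
            subs += 1          # substitution (a "mutation" in red)
    return subs, gaps, same / len(query)


print(alignment_differences("MKV-LAE", "MRVGLSE"))
```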

Extended Data Fig. 4 Simulations of dynamics, Asp162 conformations, and active-site pocket volume of Des27 and Des27.7.

a. Joint distribution of the χ₁ and χ₂ dihedral angles of the Asp162 side chain in the unbound state, sampled along MD simulations of Des27 and Des27.7. In Des27, Asp162 adopts three distinct metastable conformations, which are illustrated in the right panel. The conformation numbers labeled on the plots correspond to the stick representations on the right, with 5-nitrobenzisoxazole shown as white sticks for reference. b-c. Calculated active-site pockets of b. Des27 and c. Des27.7, visualized as MDPocket isosurfaces. Yellow spheres represent the pocket volume; Asp162 is shown as sticks. The Des27.7 active-site pocket can better accommodate 5-nitrobenzisoxazole. d-e. Root mean square deviations (rmsd, Å) of the Cα atoms during MD simulations. Data were collected every 400 ps from five replicas of 1 µs each. Gray lines show the five individual runs, and the colored solid line shows a rolling average of the rmsd over all five replicas for each system. d. rmsd for the unbound systems Des27 (left) and Des27.7 (right). e. rmsd for the bound systems Des27 (left) and Des27.7 (right).
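The χ₁ and χ₂ values in panel a are side-chain dihedral angles; for aspartate, χ₁ is conventionally defined by atoms N-CA-CB-CG and χ₂ by CA-CB-CG-OD1. The following is a minimal pure-Python sketch of the standard signed-dihedral formula. Trajectory-analysis libraries provide equivalent routines, and the atom-name dictionary used as input here is a hypothetical format, not the authors' analysis setup.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in the range [-180, 180].

    Uses atan2(|b2| * b1 . (b2 x b3), (b1 x b2) . (b2 x b3)), which follows the
    IUPAC sign convention (clockwise rotation of the front bond is positive).
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, _cross(b2, b3))
    x = _dot(_cross(b1, b2), _cross(b2, b3))
    return math.degrees(math.atan2(y, x))


def asp_chi_angles(atoms):
    """(chi1, chi2) for one Asp side chain in one frame.

    `atoms` is a hypothetical input: a dict mapping PDB atom names to (x, y, z).
    chi1 uses N-CA-CB-CG; chi2 uses CA-CB-CG-OD1.
    """
    chi1 = dihedral_deg(atoms["N"], atoms["CA"], atoms["CB"], atoms["CG"])
    chi2 = dihedral_deg(atoms["CA"], atoms["CB"], atoms["CG"], atoms["OD1"])
    return chi1, chi2
```

Applying `asp_chi_angles` to every saved frame yields the (χ₁, χ₂) pairs whose joint histogram gives a plot like panel a. Because OD1 and OD2 are chemically equivalent, χ₂ is often reported modulo 180°; that folding is not applied here.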

Extended Data Fig. 5 Time evolution of the distance between the active-site center of mass (COM) and the ligand during 5 × 1 μs of MD simulations per system.

a. Des27 and b. Des27.7. The dotted gray line at 4 Å marks the threshold for defining the ligand as being within the active site. In two of the ten replicas across the two systems, the ligand dissociates toward the end of the trajectory and does not rebind on the timescale of our simulations; in a third replica (Des27.7), the ligand substantially dissociates mid-trajectory and rebinds within the simulation time.
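The distance traced in these panels can be sketched per frame as the separation between two mass-weighted centers, with each frame scored against the 4 Å cutoff. Using the ligand COM as the ligand position, the atom selections, the inclusive cutoff, and the `tail_frames` window used to flag dissociation without rebinding are all assumptions for illustration; the caption does not specify them.

```python
import math


def center_of_mass(coords, masses):
    """Mass-weighted center of a list of (x, y, z) coordinates."""
    total = sum(masses)
    return tuple(
        sum(m * c[k] for c, m in zip(coords, masses)) / total for k in range(3)
    )


def site_ligand_distance(site_coords, site_masses, lig_coords, lig_masses):
    """Distance (Å) between the active-site COM and the ligand COM in one frame."""
    return math.dist(center_of_mass(site_coords, site_masses),
                     center_of_mass(lig_coords, lig_masses))


def fraction_in_site(distances, cutoff=4.0):
    """Fraction of frames with the ligand within `cutoff` Å of the active-site COM."""
    return sum(d <= cutoff for d in distances) / len(distances)


def dissociated_at_end(distances, cutoff=4.0, tail_frames=10):
    """True if every one of the last `tail_frames` frames lies beyond `cutoff`.

    A crude, hypothetical flag for "dissociation without rebinding" by the
    end of a replica.
    """
    return all(d > cutoff for d in distances[-tail_frames:])
```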

Extended Data Fig. 6 Model and crystallographic structure of Des27.7, Des39 and Des49.

The substrate, 5-nitrobenzisoxazole, is shown as yellow sticks. Loop regions in wheat represent misfolded or missing regions in the experimentally determined structure. a. Des27.7 model. b. Des27.7 crystallographic structure (PDB 9HVB). The wheat-colored loop corresponds to the one in panel a. c. R2.Des39 model. d. R2.Des39 crystallographic structure (PDB 9HVH). R2.Des39 crystallized as a dimer in the asymmetric unit, with few crystallographic contacts that stabilize the wheat-colored loop. The two monomers are in pink and white. The wheat-colored loop corresponds to the one in panel c. e. Active-site model (blue sticks) versus structure (white sticks). The rmsd between the active sites is 0.61 Å. Catalytic Glu168 matches the modeled rotamer, and other active-site residues show only subtle rotameric changes, which do not induce clashes with the modeled ligand. f. Asp62 stabilizes an alternative conformation in the crystal structure. Tyr97 (pink sticks) is from the second monomer. Colors as in panel d. g. R2.Des49 model. h. R2.Des49 crystallographic structure (PDB 9HVG). The wheat-colored loop corresponds to the one in panel g. i. Active-site model (blue sticks) versus structure (white sticks). The rmsd between the active sites is 0.82 Å. Trp115 adopts a rotamer different from the modeled one and might overlap with the ligand.
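The active-site rmsd values in panels e and i compare matched atoms between model and crystal structure. The sketch below assumes the two structures are already superimposed in a common frame and that atoms are matched by a hypothetical (residue number, atom name) key; it performs no fitting, so it does not replace a full structural alignment.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (Å) between two equally ordered lists of (x, y, z) coordinates.

    No fitting is done: both sets are assumed to be already superimposed
    (for example, after a global backbone alignment).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def active_site_rmsd(model_atoms, crystal_atoms):
    """RMSD over atoms present in both structures.

    Both arguments map a hypothetical atom key, e.g. (168, "OE1"), to
    coordinates; atoms missing from either structure are skipped.
    """
    shared = sorted(set(model_atoms) & set(crystal_atoms))
    return rmsd([model_atoms[k] for k in shared],
                [crystal_atoms[k] for k in shared])
```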

Extended Data Fig. 7 Comparison of substrate-bound and unbound structure of Des27.7 Phe113Leu.

a. A slight side-chain conformational change at position 113 is observed between the substrate-bound and unbound models of Des27.7 (rmsd 0.28 Å). b. No side-chain shift is observed for Des27.7 Leu113. White sticks represent the substrate-bound model, wheat sticks the unbound model, and blue sticks the unbound crystal structure.

Extended Data Fig. 8 EVB equilibration dynamics and representative structures of the Des27 system.

a-b. Root mean square deviations (rmsd, Å) of all solute atoms of Des27 and Des27.7, calculated for the equilibration phase prior to our EVB simulations. Data were collected every 10 ps from the initial equilibration runs and are shown as averages and standard deviations over ten individual 25 ns MD simulations per system (i.e., 750 ns cumulative simulation time per system). The colored solid line shows the average rmsd per system, and the shaded area shows the per-point standard deviation over all trajectories. a. rmsd for EVB equilibration from the “out” substrate conformation. b. rmsd for EVB equilibration from the “in” substrate conformation. c-d. Representative EVB structures of the Des27 system for c. the “in” and d. the “out” ligand conformation. Each panel shows the Michaelis complex (MC, left), transition state (TS, middle) and product complex (PC, right) of the Kemp elimination (KE) reaction catalyzed by this enzyme, extracted from EVB trajectories. Representative structures were selected by clustering analysis, performed independently for the MC, TS and PC. Donor-acceptor distances (Å) are shown for each stationary point; these values are averages over snapshots taken every 5 ps, combined over 30 replicas.
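The averaged traces in panels a-b reduce, per saved frame, to a mean and a standard deviation across replicate runs. This sketch assumes equal-length rmsd series and uses the sample standard deviation; the caption does not specify which estimator was used, and the helper names are hypothetical.

```python
import statistics


def per_frame_mean_sd(replicas):
    """Mean and sample standard deviation across replicas at each saved frame.

    `replicas` is a list of equal-length rmsd time series (one per MD run);
    at least two replicas are needed for a standard deviation.
    """
    n_frames = len(replicas[0])
    if any(len(r) != n_frames for r in replicas):
        raise ValueError("all replicas must contain the same number of frames")
    means, sds = [], []
    for frame in zip(*replicas):  # values from every replica at one time point
        means.append(statistics.fmean(frame))
        sds.append(statistics.stdev(frame))
    return means, sds


def frames_per_run(run_length_ns, save_interval_ps):
    """Number of saved frames for one run (1 ns = 1000 ps)."""
    return round(run_length_ns * 1000 / save_interval_ps)


print(frames_per_run(25, 10))  # a 25 ns run saved every 10 ps gives 2500 frames
```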

Extended Data Table 1 Apparent thermal stability and catalytic parameters of designed Kemp eliminases


Extended Data Table 2 Data collection and refinement statistics (molecular replacement)


Supplementary information

Supplementary Data 1

Unprocessed gels.

Reporting Summary

Supplementary Data 2

This file contains Supplementary Fig. 1 and Supplementary Tables 2–7, summarizing simulation parameters, activation energies, catalytic geometries and protonation states for Des27 and Des27.7.

Supplementary Table 1

Kinetic, thermostability and sequence data for all evaluated designs.

Peer Review File

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Listov, D., Vos, E., Hoffka, G. et al. Complete computational design of high-efficiency Kemp elimination enzymes. Nature 643, 1421–1427 (2025). https://doi.org/10.1038/s41586-025-09136-2


Received: 25 January 2025
Accepted: 09 May 2025
Published: 18 June 2025
Version of record: 18 June 2025
Issue date: 31 July 2025
DOI: https://doi.org/10.1038/s41586-025-09136-2
