Enhanced sampling of protein conformational changes via true reaction coordinates from energy relaxation

Li, Huiyu; Ma, Ao

doi:10.1038/s41467-025-55983-y


Article
Open access
Published: 17 January 2025


Nature Communications volume 16, Article number: 786 (2025) Cite this article


Abstract

The bottleneck in enhanced sampling lies in finding collective variables that effectively accelerate protein conformational changes; true reaction coordinates that accurately predict the committor are the well-recognized optimal choice. However, identifying them requires unbiased natural reactive trajectories, which, paradoxically, require effective enhanced sampling. Using the generalized work functional method, we uncover that true reaction coordinates control both conformational changes and energy relaxation, enabling us to compute them from energy relaxation simulations. Biasing true reaction coordinates accelerates conformational changes and ligand dissociation in PDZ2 domain and HIV-1 protease by 10⁵ to 10¹⁵-fold. The resulting trajectories follow natural transition pathways, enabling efficient generation of unbiased reactive trajectories. In contrast, biased trajectories from empirical collective variables display non-physical features. Furthermore, our method uses a single protein structure as input, enabling predictive sampling of conformational changes. These findings unlock access to a broader range of protein functional processes in molecular dynamics simulations.


Introduction

The main goal of molecular biophysics is to understand how proteins function. Protein behavior is governed by a rugged energy landscape featuring numerous valleys separated by barriers¹. The valleys correspond to functionally important metastable conformations, with the deepest valley being the native structure. Transitions between conformations are critical for protein function^1,2,3, such as enzymatic reactions, allostery, substrate binding, and protein-protein interaction.

With the development of AlphaFold^4,5, the long-standing structure prediction problem has been solved, making the native structures of proteins readily available. The major challenge now is to identify the other functionally important conformations and understand the transition dynamics between them. Molecular dynamics (MD) simulation is a pivotal tool, as it can provide full atomic details—an advantage unmatched by experimental techniques. However, due to the large gap between the time scales of MD simulation (approaching microseconds) and functional processes (milliseconds to hours), MD simulation of functional processes has been infeasible except in a few special cases^6,7.

To overcome this time scale challenge, intensive efforts have been focused on developing methods to enhance the sampling of functionally important regions of the conformational space, including two branches. One branch focuses on sampling important metastable conformations without addressing the transition dynamics between them^8,9. The other branch focuses on sampling the transition dynamics between different metastable conformations^10,11.

Enhanced sampling of important conformations consists of two components: accelerating conformational changes to effectively explore the conformational space and reweighting to extract the correct thermodynamics. The former is the bottleneck and the focus of the current work. Many methods, such as umbrella sampling, adaptive biasing force, and metadynamics^12,13,14,15, use bias potentials on user-selected collective variables (CVs) to accelerate conformational changes. Their efficacy hinges on finding suitable CVs; without them, they provide no more acceleration than standard MD simulations⁸. Traditionally, CVs were chosen by user intuition, involving geometric parameters (e.g., radius of gyration), principal components, evolutionary correlations, and RMSD from reference structures^8,16,17. In recent years, it has become increasingly clear that intuition is inadequate for identifying suitable CVs^8,18. Consequently, significant effort has been made to develop systematic methods for CV identification, yielding many innovative methods¹⁹. The predominant strategy has been to use machine learning to extract slow modes from simulation data^18,20,21,22. Despite these efforts, finding CVs that can effectively accelerate protein conformational changes remains a formidable challenge.

Among the methods for sampling transition dynamics, the transition path sampling (TPS) method is an important example^10,11. It offers unique advantages over methods like milestoning, Markov state modeling, and string method^23,24,25 because it generates natural reactive trajectories (NRTs)—unbiased MD trajectories that connect the reactant and product basins without any assumption or approximation, providing the full details of transition dynamics that mirror the physical reality. NRTs also achieve optimal efficiency by covering the entire transition period while skipping the prolonged waiting period in the reactant basin. The waiting period is orders of magnitude longer than the transition period, causing MD simulations to be trapped in the reactant basin without reaching the transition region, thus preventing simulation of functional processes. TPS generates NRTs by initiating MD trajectories from conformations on existing NRTs close to the transition state (TS). However, finding an initial NRT or TS conformations poses a formidable challenge, and TPS does not provide an effective solution for this, hindering applications of TPS to complex systems.

The solutions to the bottleneck challenges in both branches of enhanced sampling are provided by true reaction coordinates (tRCs), the few essential protein coordinates that fully determine the committor of any system conformation^10,26,27. The committor (${p}_{B}$) is the probability that a trajectory initiated from a given system conformation, with momenta drawn from the Boltzmann distribution, will reach the product state before the reactant state^10,26,28,29. It precisely tracks the progression of a conformational change, marking the key states: ${p}_{B}=0$ for the reactant, ${p}_{B}=1$ for the product, and ${p}_{B}=0.5$ for the TS. tRCs are coordinates that can accurately predict the committor for any conformation, rendering all other system coordinates irrelevant.
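The committor definition can be made concrete on a toy model. The sketch below estimates ${p}_{B}$ by launching many short trajectories from one starting point and counting how many reach the product side first. It is only an illustration under stated assumptions: a one-dimensional double well replaces the protein, overdamped Langevin dynamics replaces MD with Boltzmann-distributed momenta, and the function names, basin boundary and all parameters are hypothetical.

```python
import random

def force(x):
    """-dU/dx for the toy double well U(x) = (x**2 - 1)**2."""
    return -4.0 * x * (x * x - 1.0)

def committor(x0, n_trials=500, kT=0.5, dt=2e-3, edge=0.8, seed=0):
    """Fraction of noisy trajectories from x0 that reach x > edge (product)
    before x < -edge (reactant). Overdamped dynamics is an assumption here."""
    rng = random.Random(seed)
    noise = (2.0 * kT * dt) ** 0.5
    hits_b = 0
    for _ in range(n_trials):
        x = x0
        while -edge <= x <= edge:
            x += force(x) * dt + noise * rng.gauss(0.0, 1.0)
        if x > edge:
            hits_b += 1
    return hits_b / n_trials

p_top = committor(0.0)     # start on the barrier top of the symmetric well
p_near_b = committor(0.7)  # start close to the product side
print(p_top, p_near_b)
```

By symmetry the estimate at the barrier top should scatter around 0.5, the TS value, while a starting point near the product side should give a value close to 1.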

tRCs are widely regarded as the optimal CVs for accelerating conformational changes^8,9, as they are expected to not only provide efficient acceleration but also generate accelerated trajectories that follow the natural transition pathway. This is due to their role in energy activation—the critical step in protein conformational change where rare fluctuations channel energy into tRCs to propel the system over the activation barrier (i.e. the TS) to the product basin^30,31. Applying bias potentials on tRCs maximizes energy transfer into tRCs, leading to highly accelerated barrier crossing. In contrast, if the CVs significantly deviate from the tRCs, the bias potential will miss the actual activation barrier, resulting in the infamous “hidden barrier”^32,33 that prevents effective sampling.

Furthermore, tRCs provide an effective solution to the bottleneck problem in TPS. As we showed in Fig. 7a of ref. ³⁴, bias applied to tRCs generates trajectories that pass through the full range of intermediate committor values ${p}_{B}\in [0.1,\,0.9]$, which marks the transition period. This provides an efficient means to obtain TS conformations, which are otherwise difficult to attain, thereby addressing the key limitation of TPS.

Given the importance of tRCs, identifying them in complex biomolecules has been a central challenge in chemical physics and molecular biophysics since the pioneering work by Du et al. and Chandler and colleagues in the late 1990s^{10,19,26,27,35,36,37,38,39}. In recent years, we have made important progress by developing rigorous, physics-based methods: energy flow theory and the generalized work functional (GWF) method^{31,40,41,42,43}. Using these methods, we have identified the tRCs for the flap opening process of HIV-1 protease (HIV-PR) in implicit solvent³⁴, marking the first successful identification of tRCs in proteins. However, tRC identification required NRTs, so these methods could not by themselves resolve the bottlenecks in enhanced sampling.

In this work, we show that tRCs control both conformational changes and energy relaxation, enabling their computation from energy relaxation alone. Biasing tRCs in HIV-PR in explicit solvent accelerates flap opening and ligand unbinding, a process with an experimental lifetime of $8.9\times {10}^{5}$ s, to 200 ps. The resulting trajectories, termed RC-uncovered trajectories, follow natural transition pathways and pass through TS conformations, enabling NRT generation via TPS. These findings provide an effective solution to the bottlenecks in both accelerating protein conformational changes and harvesting NRTs. In contrast, biased trajectories using a standard CV follow non-physical transition pathways, reinforcing the well-accepted assumption that tRCs are optimal CVs for enhanced sampling. Our method computes tRCs from a single protein structure, enabling predictive sampling. Applying it to PDZ allostery, a long-standing puzzle for over 20 years⁴⁴, we observe previously unrecognized large-scale transient conformational changes at PDZ allosteric sites in NRTs for ligand dissociation. This discovery reveals an intuitive mechanism for PDZ allostery, where effectors affect ligand binding by interfering with these fleeting conformational changes.

Results

We use two methods to identify tRCs. First, potential energy flows (PEFs) through individual coordinates measure the importance of each coordinate in driving protein conformational changes, with the highest PEFs indicating the critical coordinates. Second, the generalized work functional (GWF) generates an orthonormal coordinate system, called singular coordinates (SCs), that disentangles tRCs from non-RCs by maximizing the PEFs through individual coordinates. Consequently, tRCs are identified as the SCs with the highest PEFs^34,40,43.

Potential energy flow

The motion of a coordinate is governed by its equation of motion (EoM). For a coordinate ${q}_{i}$, its EoM is generated by the mechanical work done on ${q}_{i}$, given by: $d{W}_{i}=-\frac{\partial U\left({{\bf{q}}}\right)}{\partial {q}_{i}}d{q}_{i}$⁴⁰. The EoM of ${q}_{i}$ (Hamiltonian equation) is: ${\dot{p}}_{i}=-\frac{\partial H}{\partial {q}_{i}}=-\frac{\partial K}{\partial {q}_{i}}-\frac{\partial U}{\partial {q}_{i}}$, where ${p}_{i}$ is the conjugate momentum of ${q}_{i}$, $K({{\bf{q}}},\,\dot{{{\bf{q}}}})$ is the kinetic energy, and $U({{\bf{q}}})$ is the potential energy of the system. Moving $\frac{\partial K}{\partial {q}_{i}}$ to the left-hand side results in:

$${\dot{p}}_{i}+\frac{\partial K}{\partial {q}_{i}}=-\frac{\partial U\left({{\bf{q}}}\right)}{\partial {q}_{i}}=\frac{d{W}_{i}}{d{q}_{i}}$$

(1)

The left-hand side of Eq. (1) represents the total change in the motion of ${q}_{i}$, meaning that $d{W}_{i}$ is the energy cost of the motion of ${q}_{i}$, which we refer to as the PEF through ${q}_{i}$. The PEF through ${q}_{i}$ during a finite period is given by the integration of $d{W}_{i}$: $\Delta {W}_{i}\left({t}_{1},{t}_{2}\right)={\int }_{{t}_{1}}^{{t}_{2}}d{W}_{i}=- {\int }_{{q}_{i}\left({t}_{1}\right)}^{{q}_{i}\left({t}_{2}\right)}\frac{\partial U\left({{\bf{q}}}\right)}{\partial {q}_{i}}d{q}_{i}$.
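The finite-time PEF can be evaluated numerically along any stored trajectory. The sketch below does so for a hypothetical two-coordinate quadratic potential, using a midpoint rule on each step; the potential, its parameters and the short path are assumptions chosen only to make the integral easy to check.

```python
def U(q, k=(1.0, 4.0), c=0.5):
    """Toy potential U = k0*q0^2/2 + k1*q1^2/2 + c*q0*q1 (an assumption)."""
    return 0.5 * k[0] * q[0] ** 2 + 0.5 * k[1] * q[1] ** 2 + c * q[0] * q[1]

def grad_U(q, k=(1.0, 4.0), c=0.5):
    """Partial derivatives of the toy potential with respect to q0 and q1."""
    return (k[0] * q[0] + c * q[1], k[1] * q[1] + c * q[0])

def potential_energy_flows(path):
    """Accumulate Delta W_i = -integral of dU/dq_i dq_i along a list of points."""
    flows = [0.0, 0.0]
    for a, b in zip(path[:-1], path[1:]):
        mid = tuple(0.5 * (x + y) for x, y in zip(a, b))
        g = grad_U(mid)  # midpoint gradient; exact per step for a quadratic U
        for i in range(2):
            flows[i] += -g[i] * (b[i] - a[i])
    return flows

path = [(1.0, 1.0), (0.8, 0.5), (0.4, 0.2), (0.0, 0.0)]
flows = potential_energy_flows(path)
print(flows, sum(flows), U(path[0]) - U(path[-1]))
```

With the sign convention $d{W}_{i}=-\frac{\partial U}{\partial {q}_{i}}d{q}_{i}$, the per-coordinate PEFs add up to the total drop in potential energy along the path, which gives a simple consistency check for an implementation.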

Intuitively, coordinates with a higher energy cost require more effort from the system and play a more significant role in dynamic processes. In protein conformational changes, which are activated processes, tRCs need to overcome the activation barrier, incurring the highest energy cost for their motion. Therefore, tRCs correspond to coordinates with the highest PEFs.

Generalized work functional

The remaining challenge in identifying tRCs is the entanglement between tRCs and non-RCs in general-purpose coordinate systems (e.g., internal coordinates), where many coordinates have components in both tRC and non-RC directions. To cleanly distinguish tRCs from non-RCs, we need to transform internal coordinates into an “ideal” coordinate system that disentangles them. Since PEFs measure the importance of individual coordinates, this “ideal” coordinate system should maximize the differences in PEFs, thereby maximally separating the most important coordinates (tRCs) from the unimportant ones (non-RCs). This can be achieved by maximizing the PEFs through individual coordinates in an orthogonal coordinate system.

Finding the “ideal” coordinate system starts with understanding how PEFs transform between different coordinate systems. The transformation of $d{W}_{i}={F}_{i}d{q}_{i}$ in coordinates ${{\bf{q}}}$ to $d{W}_{\alpha }={F}_{\alpha }d{r}_{\alpha }$ in coordinates ${{\bf{r}}}$, related by $d{{\bf{r}}}={{\bf{A}}}\cdot d{{\bf{q}}}$ and $d{{\bf{q}}}={{{\bf{A}}}}^{-1}\cdot d{{\bf{r}}}$, is given by the chain rule:

$$d{W}_{\alpha }=-\frac{\partial U}{\partial {r}_{\alpha }}d{r}_{\alpha }=- \sum _{i,k=1}^{N}\frac{\partial U}{\partial {q}_{i}}\frac{\partial {q}_{i}}{\partial {r}_{\alpha }}\frac{\partial {r}_{\alpha }}{\partial {q}_{k}}d{q}_{k}=\sum _{i,k=1}^{N}{A}_{\alpha i}\left({F}_{i}d{q}_{k}\right){A}_{k\alpha }^{-1}$$

(2)

Here, ${A}_{\alpha i}={A}_{i\alpha }^{-1}=\frac{\partial {r}_{\alpha }}{\partial {q}_{i}}=\frac{\partial {q}_{i}}{\partial {r}_{\alpha }}$ because A is an orthogonal matrix. Equation (2) introduces a conceptually new physical quantity ${F}_{i}d{q}_{k}$, which is not a mechanical work. Therefore, the coordinate transformation of $d{W}_{i}$ requires generalizing the concept of mechanical work to incorporate quantities like ${F}_{i}d{q}_{k}$.

The GWF generalizes the concept of mechanical work. Its differential in coordinates ${{\bf{q}}}$ is defined as:

$$d{{\mathbb{W}}}_{q}={{\bf{F}}}\otimes {{\rm{d}}}{{\bf{q}}}$$

(3)

Here, $\otimes$ denotes the tensor product, making $d{{\mathbb{W}}}_{q}$ an asymmetric tensor and ${F}_{i}d{q}_{k}$ its elements. The GWFs in coordinates ${{\bf{r}}}$ and ${{\bf{q}}}$ are related by a similarity transformation:

$${{d}}{{\mathbb{W}}}_{r}={{\bf{A}}}\cdot {{d}}{{\mathbb{W}}}_{q}\cdot {{{\bf{A}}}}^{-1}$$

(4)

Therefore, the coordinate transformation of GWF does not introduce any new quantities, making it a self-contained fundamental concept, with mechanical work as its sub-concept. All the important mechanical quantities, such as $d{{\bf{W}}}=(d{W}_{1},d{W}_{2},\ldots,d{W}_{N})$ and ${dU}$, are encompassed in GWF:

$$d{{{\bf{W}}}}_{{{\rm{r}}}}={diag}({{\rm{d}}}{{\mathbb{W}}}_{{{\rm{r}}}})={diag}({{\bf{A}}}\cdot d{{\mathbb{W}}}_{q}\cdot {{{\bf{A}}}}^{-1})$$

(5)

$${dU}={Tr}(d{{\mathbb{W}}}_{r})={Tr}(d{{\mathbb{W}}}_{q})$$

(6)

where diag(⋅) denotes diagonal vector and Tr(⋅) denotes trace. Equation (5) further shows that $d{\mathbb{W}}$ is the operator for transforming dW between different coordinate systems.
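The transformation rules in Eqs. (3) to (6) are easy to verify numerically. The sketch below builds the GWF differential from made-up force and displacement vectors, rotates it with an arbitrary orthogonal matrix, and compares the diagonal with the work computed from separately transformed forces and displacements. The random numbers carry no physical meaning; NumPy and the variable names are choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
F = rng.normal(size=n)    # forces F_i in coordinates q (made-up numbers)
dq = rng.normal(size=n)   # displacements dq_k in coordinates q

dW_q = np.outer(F, dq)    # Eq. (3): elements F_i * dq_k

# Any orthogonal A will do; this one comes from a QR factorization.
A, _ = np.linalg.qr(rng.normal(size=(n, n)))

dW_r = A @ dW_q @ A.T     # Eq. (4), using A^{-1} = A^T

work_r = np.diag(dW_r)        # Eq. (5): per-coordinate work in coordinates r
direct = (A @ F) * (A @ dq)   # same quantity from transformed F and dq

print(np.allclose(work_r, direct), np.trace(dW_q), np.trace(dW_r))
```

The trace is unchanged by the rotation while the individual diagonal elements are not, which is why the choice of coordinate system matters for the per-coordinate PEFs but not for their sum.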

The singular value decomposition of GWF is:

$$d{\mathbb{W}}={{\bf{U}}}\cdot {{\mathbf{\Lambda }}}\cdot {{{\bf{V}}}}^{{{\rm{T}}}}$$

(7)

It decomposes $d{\mathbb{W}}$ into its optimal basis tensors ${{{\bf{u}}}}_{i}\otimes {{{\bf{v}}}}_{i}$ as: $d{\mathbb{W}}={\sum }_{i=1}^{N}{\lambda }_{i}{{{\bf{u}}}}_{i}\otimes {{{\bf{v}}}}_{i}$. Here, ${{{\bf{u}}}}_{i}$ and ${{{\bf{v}}}}_{i}$ are the i-th column vectors of the left and right singular matrices ${{\bf{U}}}$ and ${{\bf{V}}}$, respectively; ${\lambda }_{i}$ is the i-th singular value. The optimality of ${{{\bf{u}}}}_{i}\otimes {{{\bf{v}}}}_{i}$ means that ${\lambda }_{i}{{{\bf{u}}}}_{i}\otimes {{{\bf{v}}}}_{i}$ represents the i-th largest contribution to $d{\mathbb{W}}$, and the sum ${\sum }_{i=1}^{m\ll N}{\lambda }_{i}{{{\bf{u}}}}_{i}\otimes {{{\bf{v}}}}_{i}$ provides the optimal m-dimensional reduced description. Since $d{\mathbb{W}}={{\bf{F}}}\otimes d{{\bf{q}}}$, ${{{\bf{u}}}}_{i}$ and ${{{\bf{v}}}}_{i}$ are the optimal basis vectors for the force and displacement spaces, respectively.

The collection of all ${{{\bf{u}}}}_{i}$ forms an orthonormal coordinate system: $d{{\bf{u}}}={{{\bf{U}}}}^{{{\rm{T}}}}\cdot d{{\bf{q}}}$, which we refer to as the SCs. ${F}_{i}=-\frac{\partial U}{\partial {u}_{i}}$ represents the force with the i-th largest impact on the system’s dynamics, and $d{W}_{i}={F}_{i}d{u}_{i}={\lambda }_{i}({{{\bf{u}}}}_{i}\cdot {{{\bf{v}}}}_{i})$ is the i-th highest PEF in the system. Consequently, ${\sum }_{i=1}^{m\ll N}d{W}_{i}$ provides the optimal m-dimensional reduced description of ${dU}$, the generating function of all equations of motion in the system. This is the condition tRCs would meet if they exist in a protein. Therefore, the SCs provide the “ideal” coordinate system; the leading SCs are tRCs, as they represent the directions of forces with the highest impact on the system’s dynamics.
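The algebra behind the SCs can be checked with a standard SVD routine. In the sketch below, a GWF matrix is accumulated from random force and displacement samples (placeholders with no physical meaning), decomposed with `numpy.linalg.svd`, and the PEF through each SC is computed both as ${\lambda }_{i}({{{\bf{u}}}}_{i}\cdot {{{\bf{v}}}}_{i})$ and as the diagonal of the GWF rotated into the SC basis. The sample sizes and variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, steps = 5, 200
forces = rng.normal(size=(steps, n))        # placeholder F(t)
disps = 0.01 * rng.normal(size=(steps, n))  # placeholder dq(t)
W = forces.T @ disps                        # sum over t of F(t) (x) dq(t)

U, lam, Vt = np.linalg.svd(W)               # W = U @ diag(lam) @ Vt

# PEF through each singular coordinate: lambda_i * (u_i . v_i)
pef_sc = lam * np.einsum("ij,ij->j", U, Vt.T)

# Same numbers from the diagonal of the GWF in the SC basis (A = U^T)
W_sc = U.T @ W @ U

# Rank-m truncation and its Frobenius-norm error
m = 2
W_m = (U[:, :m] * lam[:m]) @ Vt[:m]
err = np.linalg.norm(W - W_m)

print(pef_sc, pef_sc.sum(), np.trace(W), err)
```

The truncation error equals the root sum of squares of the discarded singular values, which is the usual sense in which an SVD truncation is the best rank-m approximation of a matrix.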

A hypothesis on activation and energy relaxation

In ref.³⁴, we demonstrated that the leading SCs of the GWF calculated from an ensemble of NRTs for the flap-opening of HIV-PR in implicit solvent are tRCs, as they can determine the committor with high accuracy. This result confirms that the GWF method can effectively identify tRCs in proteins. In addition, applying bias potentials along the tRCs led to RC-uncovered trajectories that open the flaps within 4 ps—an acceleration of over 10⁴-fold compared to MD³⁴. This result supports the long-standing hypothesis that tRCs are the optimal CVs for enhanced sampling of protein conformational changes.

In our application of the GWF method to energy relaxation in myoglobin⁴², we observed a significant gap between the PEFs of the leading SCs and the others, resembling the PEF pattern of tRCs (Fig. 2C in ref. ³⁴). Energy relaxation and activation are closely related, as a system needs to relax into the stable basin after crossing the activation barrier. Drawing on Onsager’s regression hypothesis⁴⁵, we propose that the coordinates essential for energy relaxation (i.e. its leading SCs) are identical to those governing activation (i.e. the tRCs).

Leading SCs of energy relaxation are the same as tRCs of conformational transition

To determine if the leading SCs of HIV-PR energy relaxation are also the tRCs for its activation (e.g. flap opening; Fig. 1), we simulated energy relaxation of ligand-free HIV-PR in implicit solvent, the same system as in ref. ³⁴. To mimic energy relaxation after ligand binding or enzymatic reaction, we deposit excess kinetic energy into the active site of HIV-PR (details in the Methods) and generate an ensemble of N MD trajectories, each 5 ps in length⁴². We then compute the GWF and PEFs by averaging over this ensemble:

$$\left\langle \Delta {\mathbb{W}}\left(0\to t\right)\right\rangle=\frac{1}{N}\sum _{\alpha=1}^{N}\Delta {\mathbb{W}}(0\to {t;}\, \alpha )$$

(8)

$$\left\langle \Delta {{W}}_{i}\left(0\to t\right)\right\rangle=\frac{1}{N}\sum _{\alpha=1}^{N}\Delta {W}_{i}(0\to {t;}\, \alpha )$$

(9)

Here, t = 0 is the starting point of energy relaxation when the excess kinetic energy is deposited to the active site; $\Delta {\mathbb{W}}\left(0\to {t;}\, \alpha \right)={\int }_{0}^{t}d{\mathbb{W}}$ and $\Delta {{W}}_{i}\left(0\to {t;}\, \alpha \right)={\int }_{\!0}^{t}d{W}_{i}$ are GWF and PEF through q_i along trajectory α, respectively.
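Equations (8) and (9) amount to a time integration per trajectory followed by an ensemble mean. A minimal sketch is given below, assuming that forces and coordinates have already been extracted as arrays of shape (trajectories, frames, coordinates); the array layout, the trapezoid-style force average and the harmonic test data are assumptions for illustration, not the protocol used in this work.

```python
import numpy as np

def ensemble_gwf(forces, coords):
    """Average GWF over an ensemble.

    forces, coords: arrays of shape (n_traj, n_frames, n_coords).
    Returns an (n_coords, n_coords) matrix; its diagonal holds the mean PEFs.
    """
    dq = np.diff(coords, axis=1)                    # frame-to-frame displacements
    f_mid = 0.5 * (forces[:, 1:] + forces[:, :-1])  # force averaged over each step
    gwf_per_traj = np.einsum("ati,atk->aik", f_mid, dq)
    return gwf_per_traj.mean(axis=0)

# Synthetic test data: random walks in an uncoupled harmonic potential.
rng = np.random.default_rng(2)
k = np.array([1.0, 3.0, 0.2])
coords = np.cumsum(rng.normal(scale=0.05, size=(8, 50, 3)), axis=1)
forces = -k * coords

mean_gwf = ensemble_gwf(forces, coords)
mean_pef = np.diag(mean_gwf)
print(mean_pef)
```

For the uncoupled harmonic test case the mean PEF through each coordinate must equal the mean drop in that coordinate's own potential energy, which provides a closed-form check.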

**Fig. 1: Representative structures of the closed and open states of DRV-bound HIV-PR protease.**

Figure 2a shows the inner product between the six leading SCs for the energy relaxation of HIV-PR in implicit solvent and the six tRCs of its flap opening we identified in ref. ³⁴. Each SC (${u}_{a}$) or tRC (${r}_{a}$) is a linear combination of the backbone dihedrals (${\chi }_{i}$) of HIV-PR: ${u}_{a}={\sum }_{i=1}^{N}{U}_{{ia}}{\chi }_{i}$. All inner products are close to 1, confirming our hypothesis that the leading SCs of energy relaxation are the same as the tRCs for conformational transitions.

**Fig. 2: Singular vectors of energy relaxation of HIV-PR.**
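A comparison of this kind reduces to a matrix of inner products between two sets of unit vectors. The sketch below uses synthetic vectors in place of the actual SCs and tRCs: one orthonormal set, and a slightly perturbed and re-orthonormalized copy of it. Taking absolute values is an assumption made here because the overall sign of a singular vector is arbitrary; the dimensions and the noise level are likewise hypothetical.

```python
import numpy as np

def overlap_matrix(set_a, set_b):
    """Absolute inner products |a_i . b_j| between the columns of two bases."""
    return np.abs(set_a.T @ set_b)

rng = np.random.default_rng(3)
n_dihedrals, n_lead = 20, 6

# Synthetic stand-in for the tRC vectors: six orthonormal columns.
trc, _ = np.linalg.qr(rng.normal(size=(n_dihedrals, n_lead)))

# Synthetic stand-in for the relaxation SCs: same vectors plus small noise.
sc, _ = np.linalg.qr(trc + 0.01 * rng.normal(size=trc.shape))

overlaps = overlap_matrix(sc, trc)
print(np.round(np.diag(overlaps), 3))
```

Diagonal entries near 1 together with small off-diagonal entries indicate that the two sets span the same directions one-to-one.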

This result revealed a fundamental connection between activation and energy relaxation, which, in our understanding, stems from the need for optimal efficiency in protein function. Both processes rely on systematic protein dynamics driven by systematic PEFs through individual coordinates, which are absent during equilibrium fluctuations. The importance of a coordinate is determined by the magnitude of its PEF, which is an intrinsic physical property determined by the protein structure and fine-tuned through evolution. tRCs, with the highest PEFs, serve as optimal energy flow channels.

During activation, the protein delivers energy to the active site to perform its function. For optimal delivery, energy systematically flows from low-capacity channels (non-RCs) to high-capacity channels (tRCs). During energy relaxation, the protein dissipates excess energy from the active site to the rest of the protein and eventually to the solvent. For efficient dissipation, energy first flows into the tRCs for rapid removal and then disperses into the non-RCs. Thus, energy flows through the same channels in both activation and relaxation but in opposite directions. This dual role of tRCs allows us to compute them from energy relaxation as well as from activated processes.

tRCs provide effective enhanced sampling of conformational changes

A common assumption is that tRCs are the optimal CVs for enhanced sampling of protein conformational changes⁸. If we can compute tRCs from energy relaxation, it will be highly valuable for applications. To explore this possibility, we simulated energy relaxation in two systems, HIV-PR bound with DRV and MA/CA peptide in explicit solvent. Figure 2b shows the PEFs of SCs of energy relaxation of DRV-bound HIV-PR. The first seven SCs (${u}_{0}$ to ${u}_{6}$; red lines in Fig. 2b) show much higher PEFs than the rest (gray lines in Fig. 2b), suggesting they are the tRCs. The same holds true for the SCs of MA/CA-bound HIV-PR. To distinguish, we will refer to them as tRCs from now on.

By applying bias potentials on the tRCs (details in Methods section and ref. ³⁴), we generated RC-uncovered trajectories for flap opening and ligand unbinding of HIV-PR in ligand-free, DRV-bound, and MA/CA-bound states. The time evolution of the flap distance ${d}_{f}$ (Fig. 1a) along these trajectories is shown in Fig. 3. For all tRCs and across all systems, ${d}_{f}$ quickly increases to the range of 2.5 to 3 nm within 20 ps, corresponding to wide-open conformations in the literature^46,47. Moreover, the ligand dislodges from the active site and is well on its way to exit the flaps within 200 ps (Fig. 4a,c; Supplementary Videos S1 and S2).

**Fig. 3: Efficiency of flap opening along RC-uncovered trajectories in ligand-free, DRV-bound and MA/CA-bound HIV-PR.**

To evaluate the efficiency of enhanced sampling by the tRCs, we compare the time scales of flap opening and ligand unbinding in RC-uncovered trajectories with those observed in other simulations and experiments^47,48,49. Sadiq and De Fabritiis conducted 461 MD trajectories, each 50 ns long, totaling 23 µs simulation time⁴⁷. They observed four wide-open events in ligand-free HIV-PR, averaging 5.8 µs per event, in line with the 100 µs observed by NMR⁴⁶. In comparison, RC-uncovered trajectories show wide-open flaps within 20 ps, at least a $2.9\times {10}^{5}$-fold acceleration compared to MD for ligand-free HIV-PR. For MA/CA-bound HIV-PR, Sadiq et al. conducted 100 MD simulations totaling 10 µs and observed only thermal fluctuations⁴⁹. Thus, flap opening in this system requires much longer than 10 µs. In stark contrast, RC-uncovered trajectories demonstrate wide-open flaps within 20 ps, marking an acceleration far exceeding ${10}^{6}$-fold. Additionally, while the experimentally measured half-life of DRV unbinding is $8.9\times {10}^{5}$ s⁵⁰, RC-uncovered trajectories achieved this in 200 ps (Fig. 4a). Together, these results demonstrate the efficiency of RC-uncovered trajectories.
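The quoted acceleration factors are ratios of a reference time to the time needed in an RC-uncovered trajectory. The short calculation below reproduces two of them from the numbers given in the text, taking 20 ps as the flap-opening time (the window within which ${d}_{f}$ reaches the wide-open range) and 200 ps for DRV unbinding; the helper name is hypothetical.

```python
PS = 1e-12  # one picosecond in seconds
US = 1e-6   # one microsecond in seconds

def acceleration(reference_time_s, accelerated_time_s):
    """Ratio of an unbiased (or experimental) time to the accelerated time."""
    return reference_time_s / accelerated_time_s

# Ligand-free flap opening: 5.8 us per wide-open event in MD vs 20 ps.
ligand_free = acceleration(5.8 * US, 20 * PS)

# DRV unbinding: experimental 8.9e5 s vs 200 ps.
drv = acceleration(8.9e5, 200 * PS)

print(f"{ligand_free:.2e}", f"{drv:.2e}")
```

The first ratio is about $2.9\times {10}^{5}$ and the second about $4.5\times {10}^{15}$, matching the orders of magnitude stated in the abstract.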

Reversible transitions along individual tRCs

Beyond sampling the transition paths, an important goal of enhanced sampling is to compute thermodynamic variables, especially free energy surfaces. This requires that the system undergo reversible transitions. The current system has multiple tRCs, with different tRCs opening the flaps in distinct ways, as shown in Fig. 6 in ref. ³⁴. Therefore, it is crucial that the flaps can open and close reversibly along a single tRC so that, with proper application of bias potentials on multiple tRCs, a multi-dimensional free energy surface could be computed. Figure 5 shows time evolution of both ${u}_{0}$ and the flap distance along an RC-uncovered trajectory obtained from applying bias on ${u}_{0}$ of ligand-free HIV-PR, featuring multiple rounds of flap opening and closing. The high similarity in time dependence of both $\Delta {u}_{0}(t)$ and ${d}_{f}(t)$ across different transition cycles demonstrates the robustness of transition reversibility.

**Fig. 5: RC-uncovered trajectory of HIV-PR with reversible transitions.**

RC-uncovered trajectories follow natural transition pathways

A major advantage of RC-uncovered trajectories is that they follow natural transition pathways, the same pathways traced by NRTs. This is crucial for understanding protein functions, the main goal of MD simulations. A rigorous way to verify whether a trajectory follows natural pathways is through the committor. Along an NRT, the committor covers the full range of values: ${p}_{B}\in [0,\,1]$, with ${p}_{B}\in (0.1,\,0.9)$ marking barrier crossing. By contrast, a trajectory deviating from natural pathways will show an abrupt jump in committor values from 0 to 1 or will not manifest well-defined committor values, as it bypasses the actual activation barrier and TS. Instead, it explores a non-physical region of the conformational space, where committor values are not well-defined. Therefore, a progression through intermediate committor values is a clear signature that a trajectory follows natural transition pathways.

In Fig. 7a of ref. ³⁴, RC-uncovered trajectories of all 6 tRCs span the full range of committor values, confirming that they follow natural transition pathways. Therefore, an RC-uncovered trajectory that follows natural pathways can validate the corresponding RC as a tRC. For HIV-PR in explicit solvent, calculating the committor is prohibitively expensive. As a computationally efficient but rigorous alternative, we use the shooting move of TPS¹⁰ to show that RC-uncovered trajectories of the current systems pass through the TS region. The TS is the bottleneck along a natural transition pathway and represents the most critical intermediate committor value: ${p}_{B}\simeq 0.5$. It is difficult to envision a scenario where a trajectory reaches the TS but still diverges from the natural pathway.

In the shooting move¹⁰, we select a conformation ${{{\bf{R}}}}_{0}$ from an RC-uncovered trajectory and draw momenta ${{{\bf{p}}}}_{0}$ from the Boltzmann distribution. We then launch a pair of MD trajectories from ${{{\bf{R}}}}_{0}$ with initial momenta ${{{\bf{p}}}}_{0}$ and $-{{{\bf{p}}}}_{0}$, respectively. If these trajectories reach opposite basins, we leverage the time reversibility of classical mechanics to create a reactive trajectory by reversing the momenta along the trajectory that ends in the reactant basin and merging the two trajectories at ${{{\bf{R}}}}_{0}$¹⁰. Since no bias is used, trajectories generated by the shooting move are NRTs, as demonstrated by Chandler and colleagues^10,11.
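The shooting move can be sketched on a one-dimensional toy system. In the example below a unit-mass particle in a double well is propagated with velocity Verlet from a chosen point with momenta $+{p}_{0}$ and $-{p}_{0}$, and the two halves are spliced when they end in opposite basins. The potential, the basin boundary, the time step and the function names are assumptions, and the momentum is passed in by hand rather than drawn from the Boltzmann distribution so that the outcome is reproducible.

```python
def force(x):
    """-dU/dx for the toy double well U(x) = (x**2 - 1)**2."""
    return -4.0 * x * (x * x - 1.0)

def run(x, p, dt=1e-3, max_steps=200_000, edge=0.8):
    """Unbiased velocity-Verlet run (unit mass) until a basin is reached."""
    xs = [x]
    f = force(x)
    for _ in range(max_steps):
        p += 0.5 * dt * f
        x += dt * p
        f = force(x)
        p += 0.5 * dt * f
        xs.append(x)
        if x < -edge:
            return "A", xs  # reactant basin
        if x > edge:
            return "B", xs  # product basin
    return None, xs

def shoot(x0, p0):
    """Launch a pair with momenta +p0 and -p0; splice if the basins differ."""
    end_f, xs_f = run(x0, p0)
    end_b, xs_b = run(x0, -p0)
    if end_f is None or end_b is None or end_f == end_b:
        return None  # both halves ended in the same basin: not reactive
    if end_f == "B":
        return xs_b[::-1] + xs_f[1:]  # time-reversed backward half, then forward
    return xs_f[::-1] + xs_b[1:]

reactive = shoot(0.3, 1.0)  # enough kinetic energy to recross the barrier top
failed = shoot(0.3, 0.3)    # too little energy: both halves end in basin B
print(reactive is not None, failed is None)
```

Reversing the order of the backward half is the position-space counterpart of reversing its momenta, so the spliced path runs from the reactant basin through the shooting point to the product basin.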

The likelihood of successfully generating NRTs from a conformation ${{{\bf{R}}}}_{0}$ is determined by $p\left({{\rm{RT}}}\,|\,{{{\bf{R}}}}_{0}\right)$, the probability that a dynamic trajectory passing through ${{{\bf{R}}}}_{0}$ is an NRT. This probability is related to the committor by $p({{\rm{RT}}}\,|\,{{{\bf{R}}}}_{0})=2{p}_{B}({{{\bf{R}}}}_{0})\,\left(1-{p}_{B}({{{\bf{R}}}}_{0})\right)$^51,52, which reaches a maximum of 0.5 at ${p}_{B}\left({{{\bf{R}}}}_{0}\right)=0.5$, and drops to 0 at ${p}_{B}\left({{{\bf{R}}}}_{0}\right)=0$ or 1. Given this, the likelihood of generating NRTs via shooting is significant only when ${{{\bf{R}}}}_{0}$ is close to the TS, where the free energy difference between ${{{\bf{R}}}}_{0}$ and the TS is within the thermal energy ${{{\rm{k}}}}_{{{\rm{B}}}}{{\rm{T}}}$. Successfully generating NRTs through the shooting move provides objective validation that RC-uncovered trajectories pass through the TS, thereby confirming that they follow natural pathways.

We obtained NRTs for flap opening and complete ligand dissociation in DRV-bound and MA/CA-bound HIV-PRs (Fig. 4b, d; Supplementary videos S3, S4) by shooting from the respective RC-uncovered trajectories. To our knowledge, this is the first successful attainment of NRTs for ligand dissociation from HIV-PR. For computational efficiency, only five pairs of shooting trajectories were attempted per conformation. Consequently, conformations that successfully generate NRTs are likely close to the TS. These results demonstrate that the RC-uncovered trajectories pass through TS and follow the natural pathways, validating the tRCs computed from energy relaxation simulations.
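The link between the committor of the shooting point and the chance of harvesting an NRT can be put in numbers. The sketch below evaluates $p({{\rm{RT}}}\,|\,{{{\bf{R}}}}_{0})=2{p}_{B}(1-{p}_{B})$ and, as an added assumption, treats the five shooting pairs per conformation as independent attempts to estimate the chance that at least one succeeds. The function names are hypothetical.

```python
def p_reactive(p_b):
    """Probability that a trajectory through R0 is reactive: 2 p_B (1 - p_B)."""
    return 2.0 * p_b * (1.0 - p_b)

def p_any_success(p_b, n_pairs=5):
    """Chance that at least one of n_pairs shooting pairs is reactive,
    assuming the attempts are independent (an assumption of this sketch)."""
    return 1.0 - (1.0 - p_reactive(p_b)) ** n_pairs

for p_b in (0.5, 0.1, 0.01, 0.001):
    print(p_b, round(p_reactive(p_b), 4), round(p_any_success(p_b), 4))
```

Under these assumptions a conformation at the TS yields an NRT within five attempts about 97% of the time, whereas the chance falls to roughly 1% at ${p}_{B}=0.001$, so a small number of attempts selects strongly for conformations with intermediate committor values.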

Comparison with enhanced sampling by empirically derived CVs

tRCs are widely recognized as the optimal CVs for enhanced sampling of protein conformational changes. To test this, we simulate MA/CA dissociation from HIV-PR by applying bias potentials on commonly used empirical CVs and compare the results with NRT and RC-uncovered trajectories. The NRT provides the correct mechanism of this process because it is an unbiased MD trajectory that covers the entire transition period.

The empirical CV we use is the distance s_l between the center of mass (CoM) of C_α atoms of MA/CA and the CoM of C_α atoms of active site residues (residues 24, 26, 27). We chose s_l because it was employed in ref. ¹⁷, an extensive metadynamics-based bias-exchange simulation of the dissociation of a six-residue peptide from HIV-PR that has an accumulated simulation time of 1.6 µs.

We first present the results from steered MD simulation using the umbrella pulling method implemented in GROMACS, similar to how we applied bias to tRCs. The pulling strength on s_l was adjusted to achieve dissociation in 1.4 ns. As shown in Fig. 6, the resulting trajectory is fundamentally different from the NRT and RC-uncovered trajectory (Fig. 4c,d).

**Fig. 6: Non-physical features of enhanced sampling with empirical CVs.**

In the steered MD trajectory (Fig. 6a), ${s}_{l}$ and the flap distance ${d}_{f}$ increase together, indicating that the ligand is actively pushing the flaps open, causing elastic distortion of the flaps (Fig. 6d). Moreover, ${d}_{f}$ only reaches 1.8 nm, just enough for the ligand to slip through. We consider it an elastic distortion rather than a true flap-opening because it involves only the bending of the flaps (Fig. 1). This behavior contradicts the established mechanism of HIV-PR function, which requires full flap-opening.

By contrast, in both NRTs and RC-uncovered trajectories (Fig. 4), the flaps open to 2 nm as the ligand begins to dislodge (192 ns for the NRT, 7 ps for the RC-uncovered trajectory). Afterwards, the influx of water molecules into the active site drives ligand motion and further flap opening. Full ligand dissociation occurs when the flaps open to 3 nm (Fig. 6c). Furthermore, flap opening in NRT and RC-uncovered trajectory (Fig. 4) is driven by global, collective protein structural changes, causing the flaps to swivel open rather than just bend. The main difference between the NRT and RC-uncovered trajectory is the longer time scale and larger fluctuations in ${d}_{f}$ along the NRT (Fig. 6c).

These results demonstrate that the NRT and RC-uncovered trajectory follow the same transition pathway, which aligns with the established mechanism of ligand dissociation from HIV-PR and contrasts sharply with the non-physical behavior seen in the steered MD trajectory, a typical issue when applying bias on non-RC CVs. A more detailed discussion is provided in Supplementary Note 2 (Supplementary Fig. 2).

For fair comparison, we need to ensure that the non-physical features shown in Fig. 6d are due to the empirical CV, not the specifics of how the bias is applied. For this purpose, we simulated the same process using the same CV with the well-tempered metadynamics implemented in PLUMED2⁵³. Unlike steered MD, metadynamics applies bias potentials in a highly adaptive and flexible manner, minimizing the risk of artifacts from the non-adaptive bias protocol used in steered MD. Supplementary Fig. 4 shows the time evolution of ${d}_{f}$ and ${s}_{l}$ under varying strength of Gaussian bias. To achieve flap-opening within 2 ns, a Gaussian height at 10,000-fold the recommended value (0.2 ${{\rm{kJ\; mo}}}{{{\rm{l}}}}^{-1}$) is needed. The resulting trajectory displays the same features as the steered MD simulation, evident in the time evolutions of ${s}_{l}$ and ${d}_{f}$ (Fig. 6b) and the snapshots (Fig. 6e). Notably, ref. ¹⁷ reported peptide dissociation with minimal flap opening, in line with our observations here. These results confirm that the non-physical characteristics of the MA/CA dissociation trajectories are the consequence of using ${s}_{l}$ as the CV.

Efficiency of enhanced sampling by tRCs

In the results above, the time scales for DRV dissociation in RC-uncovered trajectories, NRTs, and experiments differ by orders of magnitude. This is because they emphasize different phases of the same process. The experimental half-life corresponds to the average time for observing a DRV dissociation event in an extremely long MD trajectory containing many rounds of DRV binding and unbinding. In this context, NRTs represent a ${10}^{12}$-fold acceleration over MD. This is enabled by the shooting move’s focus on the actual barrier crossing phase while bypassing the extended waiting time in the reactant basin, which is the determinant of the half-life of an activated process^10,11. To bypass waiting, the shooting move must start from a TS conformation (the least populated conformational state, which requires a long waiting period to reach), and such conformations are provided by RC-uncovered trajectories. In contrast, enhanced sampling trajectories obtained with empirically derived CVs do not sample TS conformations. Finally, RC-uncovered trajectories are ~${10}^{3}$-fold shorter than NRTs, which combines with the ${10}^{12}$-fold factor into a ${10}^{15}$-fold acceleration over MD. This is achieved by reducing diffusive motions, the most time-consuming aspect of NRTs⁵⁴.
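The way the two factors combine can be written out as a short arithmetic sketch. The variable names are illustrative, and the 200-hour half-life is used only as a round number for an order-of-magnitude check; the implied duration is not a value reported in this work.

```python
import math

HOURS_TO_SECONDS = 3600.0


def combined_acceleration(*factors):
    """Successive speed-up factors multiply."""
    return math.prod(factors)


nrt_over_md = 1e12   # NRTs relative to waiting for an event in plain MD
rc_over_nrt = 1e3    # RC-uncovered trajectories relative to NRTs
total = combined_acceleration(nrt_over_md, rc_over_nrt)

# Order-of-magnitude check with a round, illustrative half-life of 200 hours.
half_life_s = 200 * HOURS_TO_SECONDS
implied_nrt_s = half_life_s / nrt_over_md   # sub-microsecond scale

print(f"total acceleration ~ 10^{math.log10(total):.0f}")
print(f"implied NRT duration ~ {implied_nrt_s * 1e9:.0f} ns")
```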

The efficiency of RC-uncovered trajectories is enabled by an intriguing physical mechanism. The critical step of an activated process is energy activation, where rare fluctuations channel energy into the tRCs, enabling the system to cross the activation barrier^30,31. The efficiency of enhanced sampling hinges upon effective transfer of energy into the tRCs to expedite activation. tRCs are optimal CVs because, in this case, the bias potentials directly inject energy into tRCs, thereby maximizing the efficiency of energy activation.

Predictive enhanced sampling by tRCs: PDZ domain allostery

Energy relaxation requires only a single protein structure, yet the resulting tRCs enable enhanced sampling of the protein’s inherent conformational transitions, underscoring the predictive capability of our approach. In our HIV-PR simulations, the only input was the flap-closed conformation, but we successfully obtained NRTs of the flap opening process and the ensemble of flap-open conformations. To further validate this predictive capability, we use the PDZ domain as a blind test, because it has only one known structure, and its allostery has been a challenging puzzle for over two decades.

PDZ domains are a large family of protein-ligand interaction modules⁵⁵. They share the same canonical fold (Fig. 7) and function as organization centers in multi-protein signaling complexes. Two examples of PDZ allostery have been extensively studied. The first is a ~13-fold increase in ligand binding affinity when Cdc42 binds to the Par6 CRIB-PDZ (Fig. 7b)⁵⁶ at the interface formed by ${\alpha }_{1}$-helix and ${\beta }_{1}$-sheet. The second example involves an extra ${\alpha }_{3}$-helix appended to the C-terminus of PDZ3 (pink helix in Fig. 7c). Two studies reached a consistent conclusion⁵⁷: the presence of ${\alpha }_{3}$ increases PDZ3 ligand affinity by 21- to 120-fold, despite the lack of direct interactions between ${\alpha }_{3}$ and the ligand binding groove formed by ${\alpha }_{2}$ and ${\beta }_{2}$ (Fig. 7a)⁵⁸.

**Fig. 7: Structures of PDZ domains and allosteric sites.**

The difficulty in understanding PDZ allostery is that crystal structures across different PDZ domains in both ligand-bound and ligand-free states are virtually identical (Fig. 7)⁵⁵, challenging the conventional view that allosteric effectors modify ligand affinity by exploiting the structural differences between apo and holo states. It is difficult to understand how the perturbations introduced by effectors—Cdc42 or ${\alpha }_{3}$—at distal sites are communicated to the binding groove and alter ligand affinity. Despite intensive efforts for over two decades, the mechanism of PDZ allostery remains elusive. This ongoing challenge is encapsulated by the title of a recent review: Allostery Frustrates the Experimentalist⁴⁴.

The process underlying PDZ domain allostery is its ligand binding; it has been speculated that conformational changes during this process are responsible for allostery^59,60. A solid understanding of ligand binding could resolve the PDZ puzzle unequivocally. However, the time scale for PDZ ligand unbinding—estimated to be 10 to 100 ms—is too slow for MD⁵⁹. Consistent with this estimation, in an extensive MD simulation that totaled 0.5 ms, no ligand unbinding or discernable conformational change in PDZ2 was observed⁵⁹. Therefore, ligand unbinding from PDZ2 provides an ideal blind test for our method’s predictive capability.

Mechanism of PDZ allostery from predictive enhanced sampling

Supplementary Fig. 5a shows the potential energy flows along SCs from energy relaxation of PDZ2 bound to an eight-residue (ENEQVSAV) RA-GEF2 peptide⁴². There is a clear gap between the first six SCs (${u}_{0}$ to ${u}_{5}$) and the rest, suggesting that these six are the tRCs. To distinguish them from the remaining SCs, we refer to them as the tRCs from now on.

RC-uncovered trajectories show significant PDZ2 conformational changes and ligand dislocation from the binding groove. Supplementary Fig. 5b presents end structures along RC-uncovered trajectories of the six RCs. They show a consistent pattern: the ligand-binding site opens between ${\alpha }_{2}$ and ${\beta }_{2}$, while the Cdc42 binding face spanning over the ${\alpha }_{1}$-${\beta }_{1}$ cleft significantly expands due to the large shift of ${\alpha }_{1}$ and the ${\alpha }_{1}$-${\beta }_{4}$ loop (Figs. 8, 9). These results strongly suggest that PDZ allostery is due to effectors interfering with conformational changes critical for ligand binding. To validate this hypothesis and pin down the allosteric mechanism, we generate NRTs (Fig. 8) for ligand unbinding by applying the shooting move on TS conformations from RC-uncovered trajectories^10,11. Figure 9b shows example TS conformations, which are identified by their ability to successfully produce NRTs. They all show the opening of the ${\alpha }_{1}$-${\beta }_{1}$ cleft and the binding groove, highlighting the critical importance of these changes to ligand unbinding. Across the TS ensemble, opening of the binding groove is virtually identical, whereas opening of the ${\alpha }_{1}$-${\beta }_{1}$ cleft spans a small range marked by the TS conformations in cyan and yellow in Fig. 9b.

**Fig. 8: Snapshots from a natural trajectory for PDZ2 ligand unbinding.**

**Fig. 9: Conformational dynamics of PDZ2 during ligand unbinding.**

The NRT for ligand unbinding in Fig. 8 (Supplementary Video S5) shows a transient large-scale conformational change in PDZ2 that lasts 20 to 30 ns. This conformational change shares the same features shown in RC-uncovered trajectories: opening in both the ${\alpha }_{1}$-${\beta }_{1}$ cleft and the binding groove. These two conformational changes are concerted. Figure 9a shows the time evolution of three CVs for characterizing PDZ2 conformational changes and ligand dissociation along the NRT in Fig. 8. The distance ${d}_{1}$ between the CoMs of ${\alpha }_{1}$ and upper ${\beta }_{1}$ marks the opening of the ${\alpha }_{1}$-${\beta }_{1}$ cleft, ${d}_{2}$ between the CoMs of ${\alpha }_{2}$ and ${\beta }_{2}$ characterizes the opening of the binding groove, and ${d}_{3}$ between CoMs of the ligand and the binding groove delineates ligand dissociation. During ligand unbinding, the ${\alpha }_{1}$-${\beta }_{1}$ cleft opens first, marked by the increase of ${d}_{1}$ from 8 ${{\text{\AA}}}$ in the crystal structure to 12 ${{\text{\AA}}}$ in the TS. After reaching the TS, ${d}_{1}$ starts to decrease while ${d}_{2}$ starts to increase from its value of 10 ${{\text{\AA}}}$ in the crystal structure until it reaches 16 ${{\text{\AA}}}$. The period of ${d}_{2}$ increase coincides with the dislodging of the ligand, which begins to dissociate rapidly from the binding groove once ${d}_{2}$ reaches 16 ${{\text{\AA}}}$. After the ligand fully dissociates, both the ${\alpha }_{1}$-${\beta }_{1}$ cleft and the binding groove return to their crystal structure conformation in 10 ns. As shown in Fig. 8, the opening of the ${\alpha }_{1}$-${\beta }_{1}$ cleft and the binding groove occur only during the barrier crossing process, leaving the PDZ2 conformation at the beginning and the end of the NRT effectively the same as the crystal structure. 
The transient duration of this critical conformational change explains why it has never been observed in experiments, demonstrating the unparalleled value of NRTs in providing mechanistic insights into protein function.

The ${\alpha }_{1}$-${\beta }_{1}$ cleft is the major component of the Cdc42 binding interface (Fig. 7b); thus, Cdc42 binding will hinder its opening. As shown in Fig. 9c, opening of the ${\alpha }_{1}$-${\beta }_{1}$ cleft moves the ${\alpha }_{1}$-${\beta }_{3}$-${\beta }_{2}$ block toward ${\alpha }_{3}$, while opening of the binding groove moves the ${\beta }_{2}$-${\beta }_{3}$ loop toward ${\alpha }_{3}$. The combined effect is severe steric clashes in the TS (Fig. 9c): between ${\alpha }_{3}$ and two residues (Tyr36, Lys38) in ${\beta }_{3}$, and between ${\alpha }_{3}$ and two residues (Val30, Arg31) in the ${\beta }_{2}$-${\beta }_{3}$ loop. These results suggest that both Cdc42 and ${\alpha }_{3}$ modify ligand affinity to PDZ by interfering with the transient conformational changes during ligand unbinding, potentially slowing down the dissociation process. This interaction provides a straightforward and intuitive mechanism for PDZ allostery. Overall, our simulation results demonstrate the predictive capability of our method by enabling the detection of large-scale conformational changes critical for PDZ allostery that eluded intensive studies for two decades⁴⁴.

Discussion

In this work, we discovered that tRCs control both activation and energy relaxation, revealing a surprising reciprocity between these processes despite their significant differences in energy, timescale, and motion. Energy dissipated in relaxation is on the order of thermal energy, whereas activation requires energy far beyond thermal levels. Additionally, while energy relaxation occurs within a few picoseconds, ligand unbinding in HIV-PR can take over 200 hours. Energy relaxation is also characterized by small-amplitude vibrational motions, in contrast to the large-scale conformational changes typical of activation. These differences highlight the complexity of the underlying mechanisms. Yet, both processes are governed by the same set of essential coordinates—the tRCs—demonstrating a fundamental unity between these seemingly disparate phenomena. If this reciprocity is confirmed across other proteins, it could lead to a generalization of the fluctuation-dissipation theorem in proteins, extending its applicability to macroscopic energy scales.

Our finding has significant applications in enhanced sampling, the primary method for bridging the time-scale gap between MD simulations and functionally important protein processes. The main challenge in enhanced sampling is identifying CVs that can effectively accelerate protein conformational changes. While tRCs are widely regarded as the optimal CVs, their identification previously required NRTs, which themselves required effective enhanced sampling, creating a paradox. Our discovery allows for the computation of tRCs from energy relaxation at minimal computational cost, enabling predictive and efficient enhanced sampling of protein conformational changes. This significantly broadens the range of protein functional processes accessible to MD simulations.

The GWF is a general, flexible method that can identify nonlinear tRCs through piecewise linearization, as detailed in ref. ⁴³. In practice, for all processes we have studied, the tRCs have been linear within numerical error^34,43. It remains to be seen whether this linearity is a general feature of tRCs in proteins and if there is a fundamental physical reason for it.

Methods

The workflow for computing tRCs and generating NRTs of protein conformational changes consists of six steps. (1) Simulate the energy relaxation of the input protein structure. (2) Compute GWF from the energy relaxation trajectories using Eq. (8). (3) Apply singular value decomposition to $\langle \Delta {\mathbb{W}}\left(0\to t\right)\rangle$ and compute the PEFs of the singular coordinates. (4) Identify the leading singular coordinates as the tRCs, distinguished by a clear separation between their PEFs and those of the other singular coordinates. (5) Generate RC-uncovered trajectories by applying bias potentials to move the system along the tRCs. (6) Generate NRTs by applying the shooting move to selected conformations on the RC-uncovered trajectories.
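Steps (3) and (4) can be illustrated with a minimal sketch. It is not the authors' implementation: the matrix below is a random stand-in for $\langle \Delta {\mathbb{W}}\left(0\to t\right)\rangle$, the PEF magnitudes are invented toy numbers, and selecting the leading group at the largest drop is only one simple way to formalize "a clear separation".

```python
import numpy as np


def singular_coordinates(gwf_matrix):
    """Left singular vectors (as columns) and singular values of a matrix."""
    U, S, _ = np.linalg.svd(np.asarray(gwf_matrix, dtype=float))
    return U, S


def leading_by_gap(pef_magnitudes):
    """Indices of the coordinates above the largest drop in the sorted magnitudes."""
    pef = np.asarray(pef_magnitudes, dtype=float)
    order = np.argsort(pef)[::-1]              # largest magnitude first
    drops = pef[order][:-1] - pef[order][1:]   # gaps between neighbours
    cut = int(np.argmax(drops)) + 1            # size of the leading group
    return order[:cut]


rng = np.random.default_rng(0)
toy_gwf = rng.normal(size=(9, 9))              # hypothetical stand-in matrix
U, S = singular_coordinates(toy_gwf)

toy_pef = [9.0, 8.5, 8.8, 9.2, 8.1, 8.6, 1.0, 0.8, 0.5]   # hypothetical values
trc_indices = leading_by_gap(toy_pef)
print(sorted(trc_indices.tolist()))
```

In this toy input the first six coordinates are separated from the remaining three by a large gap, so they are returned as the leading group.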

All simulations are run in the constant-NVE ensemble and use the CHARMM36m force field and TIP3P water model^61,62. For HIV-PR, the simulation system consists of 71,589 atoms from water and 3,280 atoms from protein and ligand. For PDZ2, the simulation system consists of 22,101 atoms from water and 1,482 atoms from protein and ligand. Simulations of energy relaxation and RC-uncovered trajectories are performed with GROMACS 2019.2 using CPUs, whereas natural trajectories are generated using GROMACS 2022 on GPUs⁶³. All the bonds involving hydrogen atoms are constrained using the LINCS algorithm⁶⁴. The time step used in all simulations is 1 fs.

Simulating energy relaxation

To simulate energy relaxation of ligand-bound HIV-PR and PDZ2 domain in explicit solvent, extra kinetic energy is deposited into the ligand to increase the temperature of each ligand coordinate by 400 K, the same as we did for myoglobin⁴². For energy relaxation of ligand-free HIV-PR in implicit solvent, extra kinetic energy is instead injected into atoms of active site residues (residues 25, 28, 29). Afterwards, we run a regular MD simulation of the system for 5 ps. This mimics the dissipation of excess energy at the active site after ligand binding or enzymatic reaction. We chose a 5 ps simulation time because, in our study of myoglobin⁴², systematic energy flows in the system stopped after 5 ps. To test whether the SCs depend on the specific amount of excess energy deposited into the ligand, we also tried increasing the ligand temperature by 150 K and 800 K. The results are the same.
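One way to raise the kinetic temperature of every selected coordinate by a fixed amount is sketched below. This is an assumption about how such a deposition could be done, not a description of the procedure actually used: each Cartesian velocity component is treated as one coordinate with kinetic temperature $m{v}^{2}/{k}_{B}$, constraints are ignored, and GROMACS-style units (amu, nm/ps, kJ/mol) are assumed.

```python
import numpy as np

KB = 0.0083144626  # Boltzmann constant in kJ mol^-1 K^-1


def coordinate_temperatures(velocities, masses):
    """Per-component kinetic temperature m * v**2 / kB (velocities: N x 3)."""
    v = np.asarray(velocities, dtype=float)
    m = np.asarray(masses, dtype=float)[:, None]
    return m * v**2 / KB


def heat_coordinates(velocities, masses, delta_T):
    """Raise every component's kinetic temperature by delta_T, keeping its sign."""
    v = np.asarray(velocities, dtype=float)
    m = np.asarray(masses, dtype=float)[:, None]
    v2_new = v**2 + KB * delta_T / m
    return np.copysign(np.sqrt(v2_new), v)


# Hypothetical two-atom "ligand": velocities in nm/ps, masses in amu.
v_old = np.array([[0.1, -0.2, 0.3], [0.0, 0.5, -0.4]])
masses = [12.011, 1.008]
v_new = heat_coordinates(v_old, masses, 400.0)
```

The modified velocities would then serve as the initial condition of an otherwise regular constant-energy run.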

Clustering energy relaxation trajectories

For the GWF analysis, 8000 energy relaxation trajectories are divided into 6 clusters (${E}_{0}\left({q}_{i}\right)$ to ${E}_{5}\left({q}_{i}\right)$) using k-means clustering. Visual inspection of PEFs suggests the presence of six clusters (Supplementary Fig. 1), a pattern consistently observed across all energy relaxation systems studied, including myoglobin⁴², HIV-PR and the PDZ2 domain. Clustering starts with seed trajectories identified by visual inspection. Each energy relaxation trajectory is assigned to the cluster with the closest seed trajectory, resulting in six initial clusters. Trajectories are then re-assigned to the cluster for which their average distance to that cluster’s trajectories is the shortest. This process repeats until cluster assignments no longer change. The GWF was computed using cluster ${E}_{0}\left({q}_{i}\right)$, which contains 3300 trajectories. More details are provided in Supplementary Note 1.
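The seed-and-reassign procedure described above can be sketched as follows. The distance measure between trajectories is not specified in this section, so the sketch takes a precomputed pairwise distance matrix as input; the function name and the one-dimensional toy data are hypothetical.

```python
import numpy as np


def cluster_from_seeds(D, seed_indices, max_iter=100):
    """Seeded clustering on a pairwise distance matrix D (N x N).

    Items start in the cluster of their closest seed and are then moved to the
    cluster with the smallest average distance to its current members, until
    the assignment stops changing.
    """
    D = np.asarray(D, dtype=float)
    k = len(seed_indices)
    labels = np.argmin(D[:, seed_indices], axis=1)
    for _ in range(max_iter):
        avg = np.full((D.shape[0], k), np.inf)
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                avg[:, c] = D[:, members].mean(axis=1)
        new_labels = np.argmin(avg, axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


# Toy data: two well-separated groups on a line, one seed in each group.
x = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
D = np.abs(x[:, None] - x[None, :])
labels = cluster_from_seeds(D, seed_indices=[0, 3])
```

Note that each item's zero distance to itself is included in the average for its own cluster, a simplification that slightly favors keeping the current assignment.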

Generating RC-uncovered trajectories

To generate RC-uncovered trajectories, we apply bias potentials to tRCs ${u}_{a}\,(a=0,1,\ldots ,6)$. Because of the periodicity of angles, we replace ${u}_{a}$ by ${Q}_{a}={\sum }_{i=1}^{{N}_{a}}{U}_{{ia}}\cos \left({\chi }_{i}-{\chi }_{i}^{*}\right)$⁶⁵, where ${U}_{{ia}}$ is an element of the left singular matrix of the GWF, ${N}_{a}$ is the number of coordinates included in the definition of ${Q}_{a}$, ${\chi }_{i}$ denotes a dihedral, ${\chi }_{i}^{*}={\chi }_{i}^{0}+c\cdot {U}_{{ia}}$ is the target value of ${\chi }_{i}$, ${\chi }_{i}^{0}$ is the value of ${\chi }_{i}$ in the starting structure, and $c$ is a constant that sets how many units of ${u}_{a}$ the system is moved. The cosine function removes the discontinuity caused by the periodicity of angular coordinates. Because the GWF inevitably contains noise, we do not include all the backbone dihedrals in ${Q}_{a}$. Instead, we only include ${\chi }_{i}$ if $\left|{U}_{{ia}}\right| > \varepsilon$, which removes dihedrals that enter ${u}_{a}$ only because of noise in the GWF. In the current calculations, we used $\varepsilon=0.03$, resulting in ${N}_{a}\in \left(240,\,290\right)$ for different RCs of HIV-PR.
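The definition of ${Q}_{a}$ translates directly into a few lines of code. The sketch below assumes dihedrals in radians and uses invented toy values for ${U}_{{ia}}$ and ${\chi }_{i}^{0}$; the function name is hypothetical.

```python
import numpy as np


def q_a(chi, chi0, U_a, c, eps=0.03):
    """Q_a = sum_i U_ia * cos(chi_i - chi_i*), restricted to |U_ia| > eps.

    chi   : current dihedral values (radians)
    chi0  : dihedral values in the starting structure (radians)
    U_a   : column a of the left singular matrix
    c     : displacement along u_a that defines the target chi_i* = chi0_i + c * U_ia
    """
    chi = np.asarray(chi, dtype=float)
    chi0 = np.asarray(chi0, dtype=float)
    U_a = np.asarray(U_a, dtype=float)
    keep = np.abs(U_a) > eps                    # drop noise-level components
    chi_target = chi0[keep] + c * U_a[keep]
    return float(np.sum(U_a[keep] * np.cos(chi[keep] - chi_target)))


# Toy input (hypothetical): the third component falls below the threshold.
U_a = np.array([0.5, -0.4, 0.01, 0.3])
chi0 = np.array([0.1, 1.0, 2.0, -3.0])
chi_at_target = chi0 + 1.0 * U_a
q_target = q_a(chi_at_target, chi0, U_a, c=1.0)
```

Because only cosines of angle differences enter, shifting any dihedral by $2\pi$ leaves the value unchanged, which is the property the substitution is meant to provide.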

To apply the bias force gradually, we divide the interval between the minimum value $-c{\sum }_{i}{U}_{{ia}}$ and the target value $c{\sum }_{i}{U}_{{ia}}$ of ${Q}_{a}$ into 100 bins, with the n-th bin spanning the interval $\left[{Q}_{a,n},\,{Q}_{a,n+1}\right)$. At any instant t, when the system configuration is at ${Q}_{a}\left(t\right)\in \left[{Q}_{a,n},\,{Q}_{a,{n}+1}\right)$, it feels a bias potential of the form $V\left({Q}_{a}\right)=\frac{1}{2}k{\left({Q}_{a}\left(t\right)-{Q}_{a,n+1}\right)}^{2}$. In this way, the system always feels a gentle force pulling it towards the target value, with the center of the bias potential shifting adaptively as the system moves from one bin to another. Even though the relationship between ${Q}_{a}$ and ${u}_{a}$ is nonlinear, Fig. 5 shows that applying a bias potential to ${Q}_{a}$ moves ${u}_{a}$ continuously and efficiently, suggesting that the bias force on ${Q}_{a}$ is effectively translated into a force acting on ${u}_{a}$. Because ${Q}_{a}$ is a unitless variable, the spring constant $k$ has a unit of ${{\rm{kJ\; mo}}}{{{\rm{l}}}}^{-1}$.
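The bin-wise harmonic bias can be sketched as below. The function name is hypothetical, the interval end points are passed in as plain numbers, and an ascending interval (start below target) is assumed; the returned force is the generalized force $-{\rm{d}}V/{\rm{d}}{Q}_{a}$, which would still have to be mapped onto atoms through the chain rule.

```python
import numpy as np


def binned_bias(q, q_start, q_target, k, n_bins=100):
    """Harmonic bias centered at the upper edge of the bin that contains q.

    Returns (energy, generalized_force, bin_index). Assumes q_start < q_target.
    """
    edges = np.linspace(q_start, q_target, n_bins + 1)
    n = int(np.clip(np.searchsorted(edges, q, side="right") - 1, 0, n_bins - 1))
    center = edges[n + 1]                       # Q_{a,n+1}
    energy = 0.5 * k * (q - center) ** 2
    force = -k * (q - center)                   # points toward the target
    return float(energy), float(force), n


# Toy numbers: interval [0, 1] in Q_a, spring constant 2000 kJ/mol.
energy, force, n = binned_bias(q=0.005, q_start=0.0, q_target=1.0, k=2000.0)
```

Since the center is never more than one bin width away from the current value, the restoring force stays bounded by $k$ times the bin width while the system remains inside the interval.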

To achieve multiple rounds of reversible transitions, we reverse the direction of the bias potential once the system reaches the terminal bins centered at $\Delta {u}_{0}=0$ and 2.6 radians, respectively, and has stabilized there for 50 ps. To balance the duration of the flap opening and closing transitions and minimize the overall duration of each cycle, we empirically selected $k=2000\,{{\rm{kJ\; mo}}}{{{\rm{l}}}}^{-1}$ for flap opening and $k=500\,{{\rm{kJ\; mo}}}{{{\rm{l}}}}^{-1}$ for flap closing.

Generating NRTs

To generate NRTs using the shooting move¹⁰, we first identify a set of candidate TS conformations from an RC-uncovered trajectory based on visual inspection and intuition. From each candidate conformation, we launch 5 pairs of MD trajectories with opposite initial momenta. A natural reactive trajectory is successfully generated when a pair of MD trajectories launched from a candidate conformation reaches opposite basins. This procedure also allows us to identify TS conformations by their success in generating natural trajectories.
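The logic of the shooting test can be illustrated on a toy model. The sketch below uses a one-dimensional double-well potential with unit mass and velocity Verlet integration, all of which are assumptions made for illustration; it is unrelated to the protein simulations, and basins are simply defined as the regions beyond the two minima.

```python
def force(x):
    """Force for the toy double well V(x) = (x**2 - 1)**2 (barrier at x = 0)."""
    return -4.0 * x * (x * x - 1.0)


def run_until_basin(x, v, dt=1e-3, max_steps=200_000):
    """Velocity Verlet with unit mass; returns 'A' (x <= -1), 'B' (x >= 1) or None."""
    f = force(x)
    for _ in range(max_steps):
        v_half = v + 0.5 * dt * f
        x = x + dt * v_half
        f = force(x)
        v = v_half + 0.5 * dt * f
        if x <= -1.0:
            return "A"
        if x >= 1.0:
            return "B"
    return None


def shoot_pair(x0, v0):
    """Launch two trajectories from x0 with opposite initial velocities."""
    return run_until_basin(x0, v0), run_until_basin(x0, -v0)


def first_reactive_velocity(x0, velocities):
    """Return the first trial velocity whose pair ends in opposite basins, else None."""
    for v0 in velocities:
        forward, backward = shoot_pair(x0, v0)
        if forward is not None and backward is not None and forward != backward:
            return v0
    return None


trial_velocities = [0.5, 0.3, 0.2, 0.4, 0.1]              # five hypothetical draws
from_barrier = first_reactive_velocity(0.0, trial_velocities)
from_slope = first_reactive_velocity(0.5, trial_velocities)
```

Starting at the barrier top, the two halves commit to opposite wells and together form a reactive path; starting on the slope with the same small velocities, both halves fall into the same well and the candidate is rejected.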

Enhanced sampling with empirical CVs

For metadynamics simulation of MA/CA dissociation, we used the widely adopted well-tempered metadynamics implemented in PLUMED2⁶⁶. The parameters used in the simulations were: Gaussian height 0.2 ${{\rm{kJ\; mo}}}{{{\rm{l}}}}^{-1}$, Gaussian width 0.25 nm, and deposition of a Gaussian every 0.5 ps, all based on PLUMED2 recommendations and ref. ⁶⁷. Additional Gaussian heights of 1, 100, 1000, and 2000 ${{\rm{kJ\; mo}}}{{{\rm{l}}}}^{-1}$ were also tested (Supplementary Fig. 4) to achieve MA/CA dissociation.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The PDB files for DRV and MA/CA-bound HIV-PR and PDZ2, Par6 CRIB-PDZ, PDZ3 domains were obtained from wwPDB under accession codes 1T3R, 1KJ4, 3LNY, 1NF3, and 1TP5, respectively. The trajectory data without water coordinates, together with simulation parameters and initial and final structures of trajectories have been deposited in Zenodo under accession code (https://doi.org/10.5281/zenodo.14531159). The raw simulation data have not been deposited due to their size; access can be obtained by contacting the authors. Source data are provided with this paper.

Code availability

All the custom codes used in this study are deposited in Code Ocean (https://codeocean.com/capsule/6873439/tree).

References

Frauenfelder, H., Sligar, S. G. & Wolynes, P. G. The energy landscapes and motions of proteins. Science 254, 1598–1603 (1991).
Henzler-Wildman, K. & Kern, D. Dynamic personalities of proteins. Nature 450, 964–972 (2007).
Xie, T., Saleh, T., Rossi, P. & Kalodimos, C. G. Conformational states dynamically populated by a kinase determine its function. Science 370, eabc2754 (2020).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Senior, A. W. et al. Improved protein structure prediction using potentials from deep learning. Nature 577, 706–710 (2020).
Dror, R. O. et al. Structural basis for modulation of a G-protein-coupled receptor by allosteric drugs. Nature 503, 295–299 (2013).
Shan, Y. et al. Molecular basis for pseudokinase-dependent autoinhibition of JAK2 tyrosine kinase. Nat. Struct. Mol. Biol. 21, 579–584 (2014).
Bussi, G. & Laio, A. Using metadynamics to explore complex free-energy landscapes. Nat. Rev. Phys. 2, 200–212 (2020).
Henin, J., Lelievre, T., Shirts, M. R., Valsson, O. & Delemotte, L. Enhanced sampling methods for molecular dynamics simulations. Preprint at arXiv https://doi.org/10.48550/arXiv.2202.04164 (2022).
Bolhuis, P. G., Chandler, D., Dellago, C. & Geissler, P. L. Transition path sampling: throwing ropes over rough mountain passes, in the dark. Annu. Rev. Phys. Chem. 53, 291–318 (2002).
Schwartz, S. D. Perspective: Path sampling methods applied to enzymatic catalysis. J. Chem. Theory Comput. 18, 6397–6406 (2022).
Laio, A. & Parrinello, M. Escaping free-energy minima. Proc. Natl Acad. Sci. USA 99, 12562–12566 (2002).
Torrie, G. M. & Valleau, J. P. Non-physical sampling distributions in Monte Carlo free-energy estimation - umbrella sampling. J. Comput. Phys. 23, 187–199 (1977).
Darve, E. & Pohorille, A. Calculating free energies using average force. J. Chem. Phys. 115, 9169–9183 (2001).
Invernizzi, M., Piaggi, P. M. & Parrinello, M. Unified approach to enhanced sampling. Phys. Rev. X 10, 041034 (2020).
Levy, R. M., Srinivasan, A. R., Olson, W. K. & McCammon, J. A. Quasi-harmonic method for studying very low frequency modes in proteins. Biopolymers 23, 1099–1112 (1984).
Pietrucci, F., Marinelli, F., Carloni, P. & Laio, A. Substrate binding mechanism of HIV-1 protease from explicit-solvent atomistic simulations. J. Am. Chem. Soc. 131, 11811–11818 (2009).
Bonati, L., Piccini, G. & Parrinello, M. Deep learning the slow modes for rare events sampling. Proc. Natl Acad. Sci. USA 118, e2113533118 (2021).
Kang, P., Trizio, E. & Parrinello, M. Computing the committor with the committor to study the transition state ensemble. Nat. Comput. Sci. 4, 451–460 (2024).
Bonati, L., Rizzi, V. & Parrinello, M. Data-driven collective variables for enhanced sampling. J. Phys. Chem. Lett. 11, 2998–3004 (2020).
Belkacemi, Z., Gkeka, P., Lelievre, T. & Stoltz, G. Chasing collective variables using autoencoders and biased trajectories. J. Chem. Theory Comput. 18, 59–78 (2022).
Mardt, A., Pasquali, L., Wu, H. & Noe, F. VAMPnets for deep learning of molecular kinetics. Nat. Commun. 9, 5 (2018).
Kirmizialtin, S. & Elber, R. Revisiting and computing reaction coordinates with directional milestoning. J. Phys. Chem. A 115, 6137–6148 (2011).
Pande, V. S., Beauchamp, K. & Bowman, G. R. Everything you wanted to know about Markov state models but were afraid to ask. Methods 52, 99–105 (2010).
Weinan, E., Ren, W. Q. & Vanden-Eijnden, E. String method for the study of rare events. Phys. Rev. B 66, 052301 (2002).
Du, R., Pande, V. S., Grosberg, A. Y., Tanaka, T. & Shakhnovich, E. S. On the transition coordinate for protein folding. J. Chem. Phys. 108, 334–350 (1998).
Ma, A. & Dinner, A. R. Automatic method for identifying reaction coordinates in complex systems. J. Phys. Chem. B 109, 6769–6779 (2005).
Onsager, L. Initial recombination of ions. Phys. Rev. 54, 554–557 (1938).
Ryter, D. On the eigenfunctions of the Fokker-Planck operator and of its adjoint. Phys. A 142, 103–121 (1987).
Berne, B. J., Borkovec, M. & Straub, J. E. Classical and modern methods in reaction-rate theory. J. Phys. Chem. 92, 3711–3725 (1988).
Wu, S. & Ma, A. Mechanism for the rare fluctuation that powers protein conformational change. J. Chem. Phys. 156, 054119 (2022).
Zheng, L. Q., Chen, M. G. & Yang, W. Random walk in orthogonal space to achieve efficient free-energy simulation of complex systems. Proc. Natl Acad. Sci. USA 105, 20227–20232 (2008).
Zheng, L., Chen, M. & Yang, W. Simultaneous escaping of explicit and hidden free energy barriers: application of the orthogonal space random walk strategy in generalized ensemble based conformational sampling. J. Chem. Phys. 130, 234105 (2009).
Wu, S., Li, H. & Ma, A. Exact reaction coordinates for flap opening in HIV-1 protease. Proc. Natl Acad. Sci. USA 119, e2214906119 (2022).
Bolhuis, P. G., Dellago, C. & Chandler, D. Reaction coordinates of biomolecular isomerization. Proc. Natl Acad. Sci. USA 97, 5877–5882 (2000).
Ma, A., Nag, A. & Dinner, A. R. Dynamic coupling between coordinates in a model for biomolecular isomerization. J. Chem. Phys. 124, 144911 (2006).
Li, W. & Ma, A. Recent developments in methods for identifying reaction coordinates. Mol. Simul. 40, 784–793 (2014).
Best, R. B. & Hummer, G. Reaction coordinates and rates from transition paths. Proc. Natl Acad. Sci. USA 102, 6732–6737 (2005).
Jung, H. et al. Machine-guided path sampling to discover mechanisms of molecular self-organization. Nat. Comput. Sci. 3, 334–345 (2023).
Li, W. & Ma, A. Reaction mechanism and reaction coordinates from the viewpoint of energy flow. J. Chem. Phys. 144, 114103 (2016).
Li, H. & Ma, A. Kinetic energy flows in activated dynamics of biomolecules. J. Chem. Phys. 153, 094109 (2020).
Li, H., Wu, S. & Ma, A. Origin of protein quake: energy waves conducted by a precise mechanical machine. J. Chem. Theory Comput. 18, 5692–5702 (2022).
Wu, S., Li, H. & Ma, A. A rigorous method for identifying one-dimensional reaction coordinate in complex molecules. J. Chem. Theory Comput. 18, 2836–2844 (2022).
Gianni, S. & Jemth, P. Allostery frustrates the experimentalist. J. Mol. Biol. 435, 167934 (2023).
Onsager, L. Reciprocal relations in irreversible processes. II. Phys. Rev. 38, 2265–2279 (1931).
Ishima, R., Freedberg, D. I., Wang, Y. X., Louis, J. M. & Torchia, D. A. Flap opening and dimer-interface flexibility in the free and inhibitor-bound HIV protease, and their implications for function. Structure 7, 1047–1055 (1999).
Sadiq, S. K. & De Fabritiis, G. Explicit solvent dynamics and energetics of HIV-1 protease flap opening and closing. Proteins 78, 2873–2885 (2010).
Miao, Y., Huang, Y. M., Walker, R. C., McCammon, J. A. & Chang, C. A. Ligand binding pathways and conformational transitions of the HIV protease. Biochemistry 57, 1533–1541 (2018).
Sadiq, S. K., Noe, F. & De Fabritiis, G. Kinetic characterization of the critical step in HIV-1 protease maturation. Proc. Natl Acad. Sci. USA 109, 20449–20454 (2012).
Dierynck, I. et al. Binding kinetics of darunavir to human immunodeficiency virus type 1 protease explain the potent antiviral activity and high genetic barrier. J. Virol. 81, 13845–13851 (2007).
Hummer, G. From transition paths to transition states and rate coefficients. J. Chem. Phys. 120, 516–523 (2004).
Jung, H., Okazaki, K. & Hummer, G. Transition path sampling of rare events by shooting from the top. J. Chem. Phys. 147, 152716 (2017).
Barducci, A., Bussi, G. & Parrinello, M. Well-tempered metadynamics: a smoothly converging and tunable free-energy method. Phys. Rev. Lett. 100, 020603 (2008).
Berezhkovskii, A. M. & Szabo, A. Diffusion along the splitting/commitment probability reaction coordinate. J. Phys. Chem. B 117, 13115–13119 (2013).
Liu, X. & Fuentes, E. J. Emerging themes in PDZ domain signaling: structure, function, and inhibition. Int. Rev. Cell Mol. Biol. 343, 129–218 (2019).
Peterson, F. C., Penkert, R. R., Volkman, B. F. & Prehoda, K. E. Cdc42 regulates the Par-6 PDZ domain through an allosteric CRIB-PDZ transition. Mol. Cell 13, 665–676 (2004).
Bozovic, O., Jankovic, B. & Hamm, P. Sensing the allosteric force. Nat. Commun. 11, 5841 (2020).
Petit, C. M., Zhang, J., Sapienza, P. J., Fuentes, E. J. & Lee, A. L. Hidden dynamic allostery in a PDZ domain. Proc. Natl Acad. Sci. USA 106, 18249–18254 (2009).
Bozovic, O. et al. Real-time observation of ligand-induced allosteric transitions in a PDZ domain. Proc. Natl Acad. Sci. USA 117, 26031–26039 (2020).
Buchli, B. et al. Kinetic response of a photoperturbed allosteric protein. Proc. Natl Acad. Sci. USA 110, 11725–11730 (2013).
Huang, J. et al. CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nat. Methods 14, 71–73 (2017).
Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W. & Klein, M. L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 79, 926–935 (1983).
Hess, B., Kutzner, C., van der Spoel, D. & Lindahl, E. GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation. J. Chem. Theory Comput. 4, 435–447 (2008).
Hess, B., Bekker, H., Berendsen, H. J. C. & Fraaije, J. G. E. M. LINCS: A linear constraint solver for molecular simulations. J. Comput. Chem. 18, 1463–1472 (1997).
Tiwary, P. & Berne, B. J. Spectral gap optimization of order parameters for sampling complex molecular systems. Proc. Natl Acad. Sci. USA 113, 2839–2844 (2016).
Tribello, G. A., Bonomi, M., Branduardi, D., Camilloni, C. & Bussi, G. PLUMED 2: New feathers for an old bird. Comput. Phys. Commun. 185, 604–613 (2014).
Pietrucci, F., Marinelli, F., Carloni, P. & Laio, A. Substrate binding mechanism of HIV-1 protease from explicit-solvent atomistic simulations. J. Am. Chem. Soc. 131, 11811–11818 (2009).

Acknowledgements

This work was supported by NIH (R21 AI162197, R21 AI186936) and NSF (CHE-1665104) awards to A.M. We thank Jie Liang for his critical reading of an early version of the manuscript and helpful comments. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.

Author information

Authors and Affiliations

Center for Bioinformatics and Quantitative Biology, Richard and Loan Hill Department of Biomedical Engineering, The University of Illinois Chicago, 851 South Morgan Street, Chicago, IL, 60607, USA
Huiyu Li & Ao Ma

Authors

Huiyu Li
Ao Ma

Contributions

A.M. designed the research. H.L. and A.M. conducted research and analyzed data. A.M. wrote the manuscript.

Corresponding author

Correspondence to Ao Ma.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Reporting Summary

Transparent Peer Review file

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Li, H., Ma, A. Enhanced sampling of protein conformational changes via true reaction coordinates from energy relaxation. Nat Commun 16, 786 (2025). https://doi.org/10.1038/s41467-025-55983-y

Received: 26 April 2024
Accepted: 02 January 2025
Published: 17 January 2025
Version of record: 17 January 2025
DOI: https://doi.org/10.1038/s41467-025-55983-y