NEP-MB-pol: a unified machine-learned framework for fast and accurate prediction of water’s thermodynamic and transport properties

Xu, Ke; Liang, Ting; Xu, Nan; Ying, Penghua; Chen, Shunda; Wei, Ning; Xu, Jianbin; Fan, Zheyong

doi:10.1038/s41524-025-01777-1

Article
Open access
Published: 27 August 2025

Ke Xu^1,2,
Ting Liang²,
Nan Xu³,
Penghua Ying⁴,
Shunda Chen⁵,
Ning Wei⁶,
Jianbin Xu² &
…
Zheyong Fan¹

npj Computational Materials volume 11, Article number: 279 (2025)

Abstract

The complex interatomic interactions and strong nuclear quantum effects in water pose significant challenges for accurately modeling its structural, thermodynamic, and transport behavior across varied conditions. While machine-learned potentials have improved the prediction of either static or transport properties individually, a unified computational framework that accurately captures both has remained elusive. Here, we introduce a machine-learned framework with a highly accurate and efficient neuroevolution potential trained on extensive many-body polarization reference data approaching coupled-cluster-level accuracy, combined with path-integral molecular dynamics and quantum-correction techniques. By capturing the quantum nature of water, this framework accurately predicts its structural, thermodynamic, and transport properties across a broad temperature range, enabling fast, accurate, and simultaneous prediction of self-diffusion coefficient, viscosity, and thermal conductivity. This work represents a major stride in water modeling, providing a unified and robust approach for exploring water’s thermodynamic and transport properties, with broad applications across multiple scientific disciplines.

Introduction

Water is the foundation of life and its unique properties make it a central focus of research across numerous scientific disciplines, including physical, chemical, materials, biological, geological, environmental, climate sciences, as well as in energy, food, and technological applications^1,2. Water’s structural complexity and anomalous behavior have motivated significant research efforts and extensive studies to characterize and understand its fundamental properties, which are crucial across fields where water serves as a primary or critical component^3,4,5,6. Despite the critical need to accurately model water’s behavior across a wide range of temperatures and pressures, water’s intrinsic complexity has made this a long-standing challenge⁷.

Advances in computer simulations of water and aqueous systems have provided powerful tools to study water’s properties at the atomistic level, offering insights even beyond experimental capabilities. However, a key obstacle in atomistic simulations is achieving both accuracy and computational efficiency. Empirical potentials provide low-cost calculations but often lack the fidelity needed to capture water’s complex properties, while quantum-mechanical approaches are usually more accurate but come with prohibitive computational costs for large-scale simulations. Machine-learned potentials (MLPs)⁸ have recently transformed this landscape by combining high accuracy with reduced computational cost, allowing simulations that were previously out of reach. Particularly, MLPs have shown promise for calculating various properties of water⁹.

Despite this progress, previous works have primarily focused on structural and thermodynamic properties of water^{10,11,12,13,14,15}, while relatively few studies address transport properties such as self-diffusion coefficient, viscosity, and thermal conductivity^{10,16,17,18,19,20}. Morawietz et al.¹⁰ calculated the self-diffusion coefficient and viscosity of water at a few temperatures using a MLP trained on reference data based on the revised Perdew-Burke-Ernzerhof (RPBE) or Becke-Lee-Yang-Parr (BLYP) functionals, revealing that van der Waals interactions in the reference data play a crucial role in accurately calculating transport properties. However, the results based on both RPBE and BLYP reference data achieve only qualitative agreement with experimental measurements. Yao and Kanai¹⁶ calculated the self-diffusion coefficient using a MLP trained on strongly constrained and appropriately normed (SCAN) reference data, obtaining results that are significantly lower than experimental values. Malosso et al.¹⁷ computed viscosity using a deep potential (DP) model trained on SCAN reference data, finding that their predictions are significantly overestimated compared to experimental values; however, a temperature shift could bring the predictions closer to experimental data. For thermal transport, both DP^18,19 and neuroevolution potential (NEP)²⁰ models trained on SCAN reference data have been used to predict the temperature-dependent thermal conductivity. Only the NEP model, when combined with a quantum correction scheme, achieves quantitative agreement with experimental results²⁰.

The lack of quantitative agreement between calculations and measurements highlights two potential limitations in previous works. First, the quality of the reference data has a significant impact on the accuracy of MLP predictions, as indeed demonstrated by the markedly different results for transport coefficients^10,16,17 obtained using MLPs trained on reference data based on different functionals in density-functional theory calculations. Second, nuclear quantum effects (NQEs)²¹ play a crucial role in determining water’s transport properties. While NQEs have been shown to impact many static properties of water in atomistic simulations using MLPs^11,12, the influence of NQEs on transport properties is not yet fully understood.

In this work, we present NEP-MB-pol, a neuroevolution potential model trained on a highly accurate coupled-cluster-level MB-pol dataset, combined with path-integral molecular dynamics (PIMD)^22,23 and quantum-correction techniques to account for NQEs. MB-pol^24,25 is built upon the many-body expansion of the interatomic interactions and has been parameterized according to highly accurate quantum-chemistry calculations at the coupled-cluster level, including single, double, and perturbative triple excitations [CCSD(T)], as illustrated in Fig. 1a. MB-pol has been demonstrated to be capable of describing the properties of water from the gas to the condensed phases²⁶. Although more efficient than typical quantum-mechanical calculations, MB-pol is still too computationally demanding for direct calculation of water’s transport properties over a broad temperature range. To accelerate these calculations, we employ the NEP approach²⁷ to train a MLP using MB-pol reference data (Fig. 1b), achieving an interatomic potential model for water with quantum-chemistry-level accuracy and empirical-potential-like speed (Fig. 2). This NEP model enables extensive large-scale and long-duration molecular dynamics (MD) simulations that robustly account for NQEs (Fig. 1c). For the first time, this NEP-MB-pol framework enables quantitative predictions of all three transport coefficients, including self-diffusion coefficient, viscosity, and thermal conductivity, as well as heat capacity and structural properties like density and radial distribution function across a temperature range simultaneously. For comparison, we also train a NEP model on SCAN reference data, finding it accurately predicts only the thermal conductivity, highlighting the advantage of MB-pol reference data in capturing water’s transport behavior. This NEP-MB-pol framework represents a significant advance in modeling water’s thermodynamic and transport properties, with great potential for broader applications.

**Fig. 1: NEP-MB-pol Workflow/Framework.**

**Fig. 2: Evaluation of force accuracy and computational speed.**

Results

The trained NEP models

The NEP approach²⁷ is a framework for generating highly efficient MLPs. It is a neural network potential trained using an evolutionary algorithm. The site energy U_i of atom i is taken as a function of a descriptor vector with a number of radial and angular descriptors, which are constructed based on the Chebyshev polynomials and the spherical harmonics. The descriptors are invariant with respect to translation, rotation, and permutation of atoms of the same species. The total energy is obtained by summing the site energies across atoms $U=\sum _{i}{U}_{i}$. The force on an atom is derived as the negative gradient of the total energy with respect to the atom’s position, F_i = − ∂U/∂r_i.

MLPs for water based on DP have been developed using both MB-pol²⁸ and SCAN¹³ reference data. Both datasets are very comprehensive. The MB-pol dataset²⁸ includes 59181 molecular structures (water monomers, dimers, etc.), 43494 bulk water structures, and 1932 bulk-molecule interface structures, totaling 104607 structures (Fig. 1a). The SCAN dataset¹³ contains 48419 structures sampled from MD simulations with a wide range of temperature (150–2000 K) and pressure (10⁻⁴ to 50 GPa). In this work, we utilize these two datasets, with sub-sampling as detailed below, to train NEP models.

In principle, we could train NEP models using the full datasets that were originally used for training the DP models^13,28. However, this is not an optimal strategy for the very data-efficient NEP approach, which is based on smaller-scale neural network models. By employing farthest-point sampling in the descriptor space, we obtained significantly reduced yet representative training datasets, containing only about 1% of the original full datasets (1250 and 601 structures for the reduced MB-pol and SCAN datasets, respectively). These reduced datasets are visualized via principal component analysis (PCA) of descriptor space, as shown in Fig. 1b and Fig. S1. The NEP models trained on these reduced datasets are referred to as NEP-MB-pol and NEP-SCAN, respectively, while the DP models trained on the full datasets are denoted as DP-MB-pol²⁸ and DP-SCAN¹³, respectively. One advantage of using a reduced training dataset is the availability of many unseen structures, which enables robust validation against over-fitting. For NEP-MB-pol, the root-mean-square errors of energy and force evaluated on the entire dataset are 1.9 meV atom⁻¹ and 47.7 meV Å⁻¹, respectively. In comparison, the corresponding values for DP-MB-pol²⁸ (specifically, the model labeled DNN-seed2) are 23.3 meV atom⁻¹ and 48.2 meV Å⁻¹, respectively (see Fig. S2 for parity plots). For NEP-SCAN, the root-mean-square errors of energy and force evaluated on the whole dataset are 3.3 meV atom⁻¹ and 85.1 meV Å⁻¹, respectively, while the corresponding values for DP-SCAN¹³ are 2.3 meV atom⁻¹ and 121.1 meV Å⁻¹, respectively (see Fig. S3 for parity plots). These comparisons demonstrate that the NEP models make highly accurate predictions for unseen structures. Another benefit of using a reduced training dataset is flexibility for future extensions; adding a small number of structures into a large existing dataset typically does not lead to significant improvements, while a reduced dataset allows for more effective extension.
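The farthest-point sampling step can be illustrated with a minimal sketch. It assumes that each structure is represented by a single descriptor vector and that distances are Euclidean; the function name, the choice of the first point, and the toy two-dimensional data are hypothetical and are not taken from the actual training workflow.

```python
import math
import random


def farthest_point_sampling(points, n_select, first_index=0):
    """Greedy farthest-point sampling; returns indices of the selected points."""
    if not 0 < n_select <= len(points):
        raise ValueError("n_select must be between 1 and len(points)")
    selected = [first_index]
    # min_dist[i]: distance from point i to its nearest already-selected point
    min_dist = [math.dist(p, points[first_index]) for p in points]
    while len(selected) < n_select:
        # Choose the point that is farthest from the current selection.
        next_index = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(next_index)
        for i, p in enumerate(points):
            d = math.dist(p, points[next_index])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Hypothetical toy data: one 2D "descriptor vector" per structure.
random.seed(0)
descriptors = [(random.random(), random.random()) for _ in range(500)]
subset = farthest_point_sampling(descriptors, n_select=5)  # 1% of 500 structures
```

Each new point maximizes its distance to the points already chosen, so the subset spreads over the descriptor space instead of clustering in densely sampled regions.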

Further accuracy validation of NEP models

To further evaluate the accuracy of our NEP models, we generated an independent validation dataset at the CCSD(T) level of theory (see Methods for details). Using CCSD(T) as the reference gold standard, we present parity plots (Fig. 2a–f) that compare reference forces from CCSD(T) with those predicted by various approaches. Our results demonstrate that MB-pol is indeed significantly more accurate than density-functional theory calculations using the SCAN functional (see Methods for details). Consequently, the NEP-MB-pol model, trained on the MB-pol dataset, is more accurate than the NEP-SCAN model, which was trained on the SCAN dataset. Among the four MLPs, our NEP-MB-pol model stands out, achieving the best accuracy, with a force root-mean-square error of 69.77 meV Å⁻¹, which is only slightly higher than the MB-pol model’s value of 50.56 meV Å⁻¹. For comparison, the DP-MB-pol²⁸ and DP-SCAN¹³ models are found to be less accurate. The DP-MB-pol model discussed here was selected based on its relatively good performance in predicting water density²⁸, as will be further discussed below. Error metrics for additional DP-MB-pol models²⁸ are provided in Supplementary Table S1.
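As an illustration of the error metric quoted above, the following sketch computes a force root-mean-square error. It assumes that the error is averaged over all Cartesian force components of all atoms; whether the reported values use exactly this convention is not stated here, and the function name is hypothetical.

```python
import math


def force_rmse(reference, predicted):
    """RMSE over all Cartesian force components (same units as the inputs).

    reference and predicted are sequences of (fx, fy, fz) tuples, one per atom.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    squared_error = 0.0
    n_components = 0
    for f_ref, f_pred in zip(reference, predicted):
        for a, b in zip(f_ref, f_pred):
            squared_error += (a - b) ** 2
            n_components += 1
    return math.sqrt(squared_error / n_components)
```

Applied to reference and predicted forces in meV Å⁻¹, the function returns a value in the same units.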

Computational speed of NEP-MB-pol

The computational efficiency of a MLP is crucial for its effective application in large-scale and long-duration MD simulations, especially when numerous replicas per atom are needed to capture the NQEs. The NEP model, as implemented in GPUMD²⁹, exhibits excellent computational performance. Our NEP model achieves a computational speed of about 1 × 10⁷ atom step s⁻¹ in MD simulations with systems containing more than 12000 atoms, using a single GeForce RTX 4090 GPU card paired with a single i5-14600K CPU core (Fig. 2g). This is about 100 times faster than DP running on the same GPU. It is also faster (for systems with more than 5000 atoms) than the SPC/E empirical potential³⁰ running on 16 i5-14600K CPU cores or on the same single RTX 4090 GPU paired with a single i5-14600K CPU core using LAMMPS compiled with the KOKKOS package, with a cutoff of 10 Å in real space and the PPPM method with a target accuracy of 10⁻⁴ in reciprocal space. It is important to note that the MB-pol model can only simulate up to about 12000 atoms and its computational cost scales quadratically with respect to system size. At this scale, NEP-MB-pol is about 5 orders of magnitude faster than MB-pol. This dramatic speedup gives us a water potential model with quantum-chemical accuracy and empirical-potential-like speed.
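To put the quoted throughput in perspective, the sketch below converts it into simulated time per day. It uses the 24567-atom cell and the 0.5 fs time step given in Methods, and it assumes that the cost of a path-integral run grows linearly with the number of beads; that scaling and the function name are assumptions, not measured results.

```python
def simulated_ns_per_day(atom_steps_per_second, n_atoms, time_step_fs, n_beads=1):
    """Simulated time per wall-clock day, in ns, for a given throughput."""
    # Assumption: every bead costs as much as one classical replica.
    steps_per_second = atom_steps_per_second / (n_atoms * n_beads)
    fs_per_day = steps_per_second * time_step_fs * 86400.0
    return fs_per_day * 1e-6  # 1 ns = 1e6 fs


classical = simulated_ns_per_day(1e7, n_atoms=24567, time_step_fs=0.5)
ring_polymer = simulated_ns_per_day(1e7, n_atoms=24567, time_step_fs=0.5, n_beads=32)
```

Under these assumptions, a classical run advances by roughly 18 ns per day on a single GPU and a 32-bead run by roughly 0.5 ns per day.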

Structural and thermodynamic properties

Prominent features of water can be reflected by the radial distribution function g(r). For O-O pairs, NEP-MB-pol can very accurately reproduce the experimental data, even at the classical MD level, with PIMD introducing minimal changes. This indicates that NQEs are negligible for oxygen. In contrast, NEP-SCAN overshoots both the first and the second peaks of the O-O distribution, as shown in Fig. 3a. However, the O-H and H-H distributions exhibit very strong NQEs, as shown in Fig. 3b, c. With classical MD, both NEP-MB-pol and NEP-SCAN significantly underestimate the width of the first peaks in g_OH and g_HH, reflecting the absence of zero-point motion in classical MD simulations. Remarkably, PIMD brings NEP-MB-pol results much closer to experimental data, particularly for g_HH. Note that the small differences around the second peak in g_OH from NEP-MB-pol with classical MD have also been observed in the prior work using MB-pol²⁶. On the other hand, the results from the NEP-SCAN model with classical MD show more pronounced deviations from the experimental data, which cannot be totally attributed to the absence of NQEs. The excellent agreement between NEP-MB-pol with PIMD and experimental data highlights both the high accuracy of our NEP-MB-pol model and the critical role of NQEs in describing O-H and H-H bonds.

**Fig. 3: Radial distribution functions, density, and isobaric heat capacity of water.**

The accurate prediction of density further demonstrates the reliability of the NEP-MB-pol potential. In the temperature range from 280 to 370 K at 1 atm pressure, the density of water monotonically decreases according to experimental data³¹, which have very small uncertainties (Fig. 3d). NEP-MB-pol, even with classical MD simulations, reproduces this trend well. The agreement between NEP-MB-pol and experiments is further improved by considering NQEs using PIMD simulations. This is consistent with the radial distribution function results: NQEs result in lower density at a given temperature. The maximum difference between PIMD simulation with NEP-MB-pol and experiments is at 280 K, which is about 0.4%. We have checked that this difference cannot be made smaller by using a larger number of replicas in the PIMD simulations (Fig. S4). It thus represents a small degree of inaccuracy of the NEP-MB-pol model, likely inherited from the MB-pol model. Indeed, density predictions using the original MB-pol model^32,33 at the classical MD level agree well with our NEP-MB-pol results. Notably, the classical MD results from DP-MB-pol²⁸ show poor agreement with those from MB-pol, reflecting the relatively higher accuracy of our NEP-MB-pol model over the DP-MB-pol model.

Figure 3d also presents density results obtained by classical MD and PIMD simulations using the NEP-SCAN model. It is evident that the NEP-SCAN model predicts a drastically different trend than experiments, significantly overestimating the density when T > 300 K. This is consistent with the overestimation of the melting point by SCAN, which originates from its overestimated strength of hydrogen bonds³⁴.

Figure 3e shows the heat capacity C_p of water as a function of the number of beads, as predicted by the NEP-MB-pol model. For systems with 64 beads, C_p obtained from NEP-MB-pol is in excellent agreement with experimental data³¹. The relationship between C_p and temperature is presented in Fig. 3f. As observed, predictions from PIMD simulations with NEP-MB-pol, incorporating NQEs, match experimental observations very well. In contrast, predictions from classical MD simulations with NEP-MB-pol show noticeable deviations, and the results from classical MD simulations with NNP-SCAN³⁴ exhibit even greater deviations. These results highlight the importance of using a high-quality dataset and accurately incorporating NQEs to achieve precise predictions of water’s heat capacity.

Self-diffusion coefficient

We now move on to the study of water’s transport properties, starting from the self-diffusion coefficient. Note that prior to the calculation of each transport property, the system is equilibrated for 50 ps. The diffusion coefficient can be calculated either as the time derivative of the mean-square displacement, or equivalently, the time integral of the velocity autocorrelation function. The running diffusion coefficient as a function of the correlation time, calculated using the NEP-MB-pol model in the temperature range of 280 to 370 K (all at 1 atm), is shown in Fig. S5. The running diffusion coefficient converges well up to a correlation time of 5 ps, which is then taken as the upper limit in the time integral. A convergence test with respect to the number of beads in thermostatted ring-polymer MD (TRPMD) simulations is provided in Fig. S6, showing good convergence at 32 beads. Finite-size effects may influence diffusion coefficient calculations in MD simulations. We have carefully checked the finite-size effects in Fig. S7, and found that they are negligible when the number of atoms in the simulation cell exceeds 10000. We have chosen a very safe calculation cell with 24567 atoms in all the diffusion coefficient and subsequent transport calculations to ensure accuracy.

The time-converged diffusion coefficient values from 280 to 370 K are presented in Fig. 4a. Using classical MD simulations, NEP-MB-pol already achieves good agreement with experimental results³⁵. Incorporating NQEs using thermostatted ring-polymer MD (TRPMD)^23,36 further improves the agreement, particularly at higher temperatures. Minor deviations between NEP-MB-pol results and experimental data likely stem from the small inaccuracies in the NEP-MB-pol model, as noted earlier. Note that time-correlation functions and transport properties should be calculated using the TRPMD algorithm, in which the centroid degree of freedom is not thermostatted. In contrast, static thermodynamic properties are best calculated using the PIMD algorithm, which properly samples the quantum statistical distribution.

**Fig. 4: Transport coefficients of water.**

Prior theoretical predictions of the diffusion coefficient of water using MLPs also exhibit limitations. For example, using classical MD simulations, the predicted diffusion coefficients from the ANN potential trained with SCAN reference data (ANN-SCAN)¹⁶ are significantly underestimated across the whole temperature range. This trend is also confirmed by our NEP-SCAN model, which was trained on a different SCAN reference dataset¹³. The discrepancy between ANN-SCAN and NEP-SCAN models may arise from differences in dataset generation, model architecture, hyperparameters, or training protocols. The quantum effects as revealed by the ANN-SCAN simulations are also rather small. Using multiple NNP models trained using RPBE (NNP-RPBE) reference data^10,37,38, the calculated diffusion coefficient values from classical MD simulations significantly overestimate the experimental results. Given the minimal overall NQEs in diffusion coefficient calculations, the results in Fig. 4a strongly suggest that NEP-MB-pol has a higher accuracy than previous MLPs.

The capability of accurately predicting the diffusion coefficient across a wide range of temperatures using NEP-MB-pol suggests that this model has predictive power and that our computational framework could be very useful for exploring water properties at extreme conditions that are inaccessible to experimental measurements.

Shear viscosity

We next examine the shear viscosity η of water, which can be calculated as a time integration of the shear stress autocorrelation function (see Methods for details). We consider the temperatures from 280 to 370 K and a constant pressure of 1 atm. Convergence tests with respect to the number of beads and system size in TRPMD simulations are shown in Fig. S6 and Fig. S7, respectively. The running shear viscosity as a function of the correlation time, calculated using the NEP-MB-pol model for selected temperatures, is shown in Fig. S8. Similar to the running diffusion coefficient, the running viscosity η(t) also converges with respect to the correlation time t. One difference between diffusion coefficient and viscosity is that the latter involves a collective correlation function, which exhibits higher statistical error with the same length of trajectory. Therefore, we had to perform more independent runs for viscosity calculations. Here we have performed 30 independent runs for each temperature, and the results are well converged.
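One common way to combine independent runs is to report their mean together with the standard error of the mean, as sketched below. The estimator and the function name are assumptions; the text does not specify how the statistical error is evaluated.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Mean and standard error of the mean over independent runs."""
    mean = statistics.fmean(values)
    # Sample standard deviation (n - 1 in the denominator) divided by sqrt(n).
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean, standard_error
```

With 30 independent runs, the standard error is smaller than the run-to-run standard deviation by a factor of √30, about 5.5.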

The time-converged shear viscosity values calculated from NEP-MB-pol and NEP-SCAN are presented in Fig. 4b. The results are compared with experimental data³¹ and previous results from NNP-SCAN¹⁷. Using classical MD simulations, the NEP-MB-pol model produces results generally smaller than experimental values. The maximum deviation is about 40% at 280 K. Incorporating NQEs via thermostatted ring-polymer MD simulations significantly improves the calculated results, although there is still an underestimation of about 20% at 280 K. This underestimation is likely related to the small inaccuracy of the NEP-MB-pol model mentioned above. This suggests that viscosity is a physical property that is highly sensitive to the accuracy of the interatomic force in the MD simulations. Indeed, the NNP-RPBE model with classical MD simulations¹⁰ shows a larger underestimation than our NEP-MB-pol model, while both the NEP-SCAN model and the previous NNP-SCAN model predict viscosity values that are several times larger than the experimental results.

Thermal conductivity

Finally, we investigate the thermal conductivity κ of water, which can be calculated as a time integral of the heat current auto-correlation function. Because the heat current can be decomposed into potential (p) and kinetic (k) parts, the heat current auto-correlation function, and hence the running thermal conductivity, can be decomposed into three terms: the p-p term (κ^pp), the k-k term (κ^kk), and the cross term (κ^pk). The running thermal conductivities of all three terms converge well up to a correlation time of 1 ps, as shown in Fig. S9. We observe that the cross term κ^pk is essentially zero and the k-k term κ^kk contributes a small but non-negligible portion.

The total thermal conductivity from both NEP-MB-pol and NEP-SCAN, calculated from 280 to 370 K (with a constant pressure of 1 atm), is shown in Fig. 4c. It is clear that both sets of results are quite consistent with each other and are significantly overestimated compared to the experimental results. The deviation between calculations and experiments increases at lower temperatures, which again indicates strong NQEs. Unfortunately, the heat current is a nonlinear operator and there are so far no feasible path-integral techniques that can account for the NQEs in thermal conductivity calculations. However, we notice that the simpler quantum-correction method based on the harmonic approximation³⁹ has proven feasible for disordered materials, which naturally include liquid water²⁰. Here, the harmonic approximation means that one assumes that NQEs are negligible for the anharmonic characteristics in a system. This can be well confirmed by the calculation of heat capacity³⁹. As a prerequisite for applying this quantum-correction scheme, we need to first obtain a spectral decomposition κ(ω) of the thermal conductivity, which, fortunately, can be conveniently achieved within the homogeneous nonequilibrium MD formalism⁴⁰. Here the part that needs to be quantum-corrected is the p-p component, which becomes ${\kappa }^{pp}(\omega ){x}^{2}{e}^{x}/{({e}^{x}-1)}^{2}$ after quantum correction, where x = ℏω/k_BT and ℏ is the reduced Planck constant. Adding up the quantum corrected κ^pp and the original κ^kk gives the total thermal conductivity that agrees well with experiments as shown in Fig. 4c.
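The quantum-correction step can be sketched as follows. The factor x²eˣ/(eˣ − 1)² is taken from the text; the trapezoidal integration, the normalization of the spectrum (assumed here to be such that its plain integral over ω gives κ^pp), and the function names are illustrative assumptions.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J s
K_B = 1.380649e-23      # Boltzmann constant, J/K


def quantum_correction_factor(omega, temperature):
    """x^2 e^x / (e^x - 1)^2 with x = hbar*omega/(k_B*T); omega in rad/s."""
    x = HBAR * omega / (K_B * temperature)
    if x < 1e-6:
        return 1.0  # classical limit; also avoids 0/0 at omega = 0
    # Equivalent form x^2 e^(-x) / (1 - e^(-x))^2, which cannot overflow.
    decay = math.exp(-x)
    return x * x * decay / (1.0 - decay) ** 2


def corrected_thermal_conductivity(omegas, kappa_pp_spectrum, kappa_kk, temperature):
    """Integrate the corrected p-p spectrum and add the uncorrected k-k term."""
    weighted = [
        kappa * quantum_correction_factor(omega, temperature)
        for omega, kappa in zip(omegas, kappa_pp_spectrum)
    ]
    integral = 0.0
    for i in range(1, len(omegas)):
        integral += 0.5 * (weighted[i - 1] + weighted[i]) * (omegas[i] - omegas[i - 1])
    return integral + kappa_kk
```

The factor equals 1 for ℏω ≪ k_BT and decays to zero for ℏω ≫ k_BT, so the correction suppresses the contribution of high-frequency vibrations and grows stronger at lower temperatures.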

Discussion

In this work, we achieved quantitative predictions of a broad range of physical properties of liquid water across various temperatures. The physical properties evaluated include not only structural characteristics, such as density and radial distribution functions, but also thermodynamic properties such as isobaric heat capacity, and transport properties including the self diffusion coefficient, viscosity, and thermal conductivity. The high level of agreement between our calculations and experiments can be attributed to two key factors.

First, we developed the highly accurate and efficient NEP-MB-pol model, which serves as the foundation of this work. The high accuracy of our NEP-MB-pol model, further validated by using an independent CCSD(T) dataset prepared in this study, is inherited from the underlying training data based on the MB-pol model, which has been shown to be highly accurate^32,33. While the MB-pol model itself is very computationally intensive and has been impractical for calculating transport properties that require extensive simulations, our NEP-MB-pol model not only retains MB-pol’s high accuracy but also achieves computational speeds several orders of magnitude faster, making it feasible to predict transport properties that require extensive large-scale simulations.

Second, the reliable prediction of water’s properties benefits from the use of appropriate methods that accurately account for nuclear quantum effects, which are especially strong in water. For static properties like density (equation of state), path-integral molecular dynamics correctly captures the nuclear quantum effects related to zero-point motion, resulting in density values that align closely with experimental observations. For dynamic properties, thermostatted ring-polymer molecular dynamics effectively captures nuclear quantum effects in the diffusion and viscosity calculations, while the quasi-harmonic quantum corrections address overpopulated high-frequency vibrations in the calculation of thermal conductivity.

An important aspect of our approach is that our NEP-MB-pol model was developed without fitting to experimental data, yet it accurately predicts multiple experimentally validated properties. Our approach distinguishes itself from empirical models, which are often tailored to fit specific experimental properties and may have limited applicability across different thermodynamic conditions. By relying on first principles calculations and machine learning, the predictive power of our NEP-MB-pol model is constrained only by the accuracy of the MB-pol model and the breadth and diversity of the training data.

Importantly, this framework is not a mere variant of MB-pol. It is certainly not limited to MB-pol data. Our framework can also be applied to other coupled-cluster-level datasets, for example, q-AQUA⁴¹ datasets, which include higher-order interactions. To further improve the model’s accuracy, additional training data from direct CCSD(T) calculations could be incorporated.

While this study focused on the water properties under ambient conditions, our approach is extendable. With an extended training dataset, NEP-MB-pol has the potential to model water’s behavior under more extreme thermodynamic conditions.

In conclusion, by capturing the quantum nature of water, our framework fills a long-standing gap in water modeling, achieving both high accuracy and computational efficiency. This work represents a significant advance, offering a versatile and scalable approach with broad applications in chemistry, materials science, biophysics, and beyond.

Methods

The NEP model

In the NEP approach²⁷, the site energy U_i of atom i can be written as

$${U}_{i}=\mathop{\sum }\limits_{\mu =1}^{{N}_{{\rm{neu}}}}{w}_{\mu }^{(1)}\tanh \left(\mathop{\sum }\limits_{\nu =1}^{{N}_{{\rm{des}}}}{w}_{\mu \nu }^{(0)}{q}_{\nu }^{i}-{b}_{\mu }^{(0)}\right)-{b}^{(1)},$$

(1)

where $\tanh (x)$ is the activation function, and w⁽⁰⁾, w⁽¹⁾, b⁽⁰⁾, and b⁽¹⁾ are the weight and bias parameters. The descriptor ${q}_{\nu}^{i}$ is an abstract vector whose components group into radial and angular parts. The radial descriptor components ${q}_{n}^{i}$ $(0\le n\le {n}_{\max }^{{\rm{R}}})$ are defined as

$${q}_{n}^{i}=\sum _{j\ne i}{g}_{n}({r}_{ij}),$$

(2)

where r_ij is the distance between atoms i and j and g_n(r_ij) are a set of radial functions, each of which is formed by a linear combination of Chebyshev polynomials. The angular components include n-body (n = 3, 4, 5) correlations. For the 3-body part, the descriptor components are defined as ($0\le n\le {n}_{\max }^{{\rm{A}}}$, $1\le l\le {l}_{\max }^{{\rm{3body}}}$)

$${q}_{nl}^{i}=\sum _{m}{(-1)}^{m}{A}_{nlm}^{i}{A}_{nl(-m)}^{i};$$

(3)

$${A}_{nlm}^{i}=\sum _{j\ne i}{g}_{n}({r}_{ij}){Y}_{lm}({\hat{{\bf{r}}}}_{ij}).$$

(4)

Here, Y_lm are the spherical harmonics and ${\hat{{\bf{r}}}}_{ij}$ is the unit vector of r_ij. Note that the radial functions g_n(r_ij) for the radial and angular descriptor components can have different cutoff radii, which are denoted as ${r}_{{\rm{c}}}^{{\rm{R}}}$ and ${r}_{{\rm{c}}}^{{\rm{A}}}$, respectively.
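Equation (1) can be evaluated with a few lines of code once the descriptor vector of an atom is known. The sketch below is a direct transcription of that formula for a single hidden layer; the function names and the list-based parameter layout are hypothetical and do not reflect the GPUMD implementation.

```python
import math


def site_energy(q, w0, b0, w1, b1):
    """Site energy U_i of Eq. (1) for one atom.

    q  : descriptor vector, length N_des
    w0 : hidden-layer weights, N_neu rows of length N_des
    b0 : hidden-layer biases, length N_neu
    w1 : output weights, length N_neu
    b1 : output bias (scalar)
    """
    energy = -b1
    for w1_mu, w0_mu, b0_mu in zip(w1, w0, b0):
        pre_activation = sum(w * q_nu for w, q_nu in zip(w0_mu, q)) - b0_mu
        energy += w1_mu * math.tanh(pre_activation)
    return energy


def total_energy(descriptors, w0, b0, w1, b1):
    """Total energy U as the sum of the site energies over all atoms."""
    return sum(site_energy(q, w0, b0, w1, b1) for q in descriptors)
```

Because the same parameters are applied to every atom, the cost of evaluating the total energy grows linearly with the number of atoms.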

Generating a validation dataset at CCSD(T) level of theory

The CFOUR program package⁴², with aug-cc-pVTZ (aVTZ) basis set, was utilized to generate an independent validation dataset at the coupled-cluster level of theory, including single, double, and perturbative triple excitations [CCSD(T)]. This validation dataset comprises a total of 262 structures, including 56 clusters containing up to six water molecules and 206 bulk structures sampled by MD simulations driven by the NEP-MB-pol model.

Density-functional theory calculations using SCAN functional

Density-functional theory calculations using the strongly constrained and appropriately normed (SCAN) functional were performed with the Vienna Ab initio Simulation Package (VASP, version 6.3.0)⁴³, to predict the energies, forces, and virials of the 262 structures in the independent validation CCSD(T) dataset (see above). To account for the non-spherical contributions to the gradient correction within the projector-augmented-wave sphere, the flag LASPH was set to TRUE. A kinetic energy cutoff of 1500 eV was applied for the plane waves, and a reciprocal space sampling grid spacing of 0.5 Å⁻¹ was used. The self-consistent field convergence threshold was set to 10⁻⁶ eV.

Molecular dynamics simulations

For all the MD simulations conducted to compute the physical properties reported in this study, the simulation system consists of 24,567 atoms in a periodic cubic box with dimensions of about 6.2 nm in each direction. The system pressure is maintained at 1 atm. A time step of 0.5 fs is used for integration in classical, path-integral, and thermostatted ring-polymer molecular dynamics simulations. To calculate the density of water at each temperature, the system is equilibrated for 50 ps in the isothermal-isobaric ensemble at 1 atm, with the temperature varied from 280 K to 370 K in increments of 10 K. All the molecular dynamics simulations were performed using the graphics processing units molecular dynamics (GPUMD) package²⁹.

Isobaric heat capacity

The isobaric heat capacity C_p is defined as the rate of change of enthalpy H with respect to temperature at constant pressure:

$${C}_{p}={\left(\frac{\partial H}{\partial T}\right)}_{p},$$

(5)

where H = E + pV, and E, T, p, and V represent the internal energy, temperature, pressure, and volume, respectively. To compute C_p, the value of H is calculated at a series of temperature points with the same pressure p. A quadratic function is then fitted to the relationship between H and T, and its first derivative yields C_p as a function of T.
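
The fitting step described above can be sketched as follows. This is a minimal illustration using NumPy, not the authors' script; the function name and the synthetic enthalpy values are assumptions made for the example, and only the temperature grid (280 K to 370 K in 10 K steps) is taken from the text.

```python
import numpy as np

def heat_capacity_from_enthalpy(temperatures, enthalpies):
    """Fit H(T) = a*T**2 + b*T + c and return C_p(T) = dH/dT = 2*a*T + b."""
    coeffs = np.polyfit(temperatures, enthalpies, deg=2)  # highest power first
    derivative = np.polyder(coeffs)                       # coefficients of dH/dT
    return np.polyval(derivative, temperatures)

# Synthetic (hypothetical) enthalpy data on the temperature grid used in the text.
temperatures = np.arange(280.0, 371.0, 10.0)
enthalpies = 1.0e-4 * temperatures**2 + 0.05 * temperatures - 10.0

cp = heat_capacity_from_enthalpy(temperatures, enthalpies)
```

Because the fit is quadratic, the resulting C_p is linear in T by construction, so any curvature of C_p over the temperature range is not resolved by this procedure.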

Self-diffusion coefficient

The running self-diffusion coefficient for water is calculated using the following Green-Kubo relation:

$$D(t)=\frac{1}{3}\mathop{\int}\nolimits_{\!\!\tau = 0}^{t}{C}_{vv}(\tau ){\rm{d}}\tau$$

(6)

where the velocity auto-correlation function is defined as ${C}_{vv}(\tau )=\frac{1}{N}\mathop{\sum }\nolimits_{i}^{N}\langle {{\bf{v}}}_{i}(0)\cdot {{\bf{v}}}_{i}(\tau )\rangle$. Here, N is the number of atoms in the system and v_i is the velocity of atom i.
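
A minimal sketch of Eq. (6) is given below, assuming the velocities are available as a NumPy array of shape (frames, atoms, 3) sampled at a fixed interval. The averaging over time origins and the trapezoidal integration are choices made for this illustration, and the function name and test data are hypothetical.

```python
import numpy as np

def running_diffusion(velocities, dt, max_lag):
    """Return C_vv(tau) and D(t) for velocities of shape (n_frames, n_atoms, 3)."""
    n_frames = velocities.shape[0]
    vacf = np.empty(max_lag)
    for lag in range(max_lag):
        # v_i(t0) . v_i(t0 + lag), averaged over time origins t0 and atoms i
        dots = np.sum(velocities[: n_frames - lag] * velocities[lag:], axis=2)
        vacf[lag] = dots.mean()
    # Trapezoidal running integral with D(0) = 0, then the factor 1/3 of Eq. (6)
    increments = 0.5 * (vacf[1:] + vacf[:-1]) * dt
    d_running = np.concatenate(([0.0], np.cumsum(increments))) / 3.0
    return vacf, d_running

# Toy input: time-independent velocities, for which C_vv is constant and D(t) grows linearly.
rng = np.random.default_rng(0)
v0 = rng.normal(size=(5, 3))
velocities = np.tile(v0, (20, 1, 1))
vacf, d_running = running_diffusion(velocities, dt=0.5, max_lag=10)
```

In a production run the diffusion coefficient would be read off where D(t) reaches a plateau; the toy input above has no plateau and only serves to check the arithmetic.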

Shear viscosity

The shear viscosity is defined as $\eta =\frac{1}{3}({\eta }_{xy}+{\eta }_{xz}+{\eta }_{yz})$, where each running component η_αβ(t) is calculated using the following Green-Kubo relation:

$${\eta }_{\alpha \beta }(t)=\frac{V}{{k}_{{\rm{B}}}T}\mathop{\int}\nolimits_{\!\!0}^{t}{C}_{pp}(\tau ){\rm{d}}\tau .$$

(7)

Here, C_pp(τ) = 〈(p_αβ(0) − 〈p_αβ〉)(p_αβ(τ) − 〈p_αβ〉)〉 is the pressure auto-correlation function, V is the volume, k_B is Boltzmann’s constant, T is temperature, and p_αβ is the pressure tensor.
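
The following sketch evaluates Eq. (7) for the three off-diagonal components and averages them. It assumes SI units for all inputs (pressure in Pa, time in s, volume in m³) and uses random numbers in place of a real pressure-tensor time series; the function names are hypothetical.

```python
import numpy as np

K_B = 1.380649e-23  # Boltzmann constant in J/K

def running_viscosity(p_ab, dt, volume, temperature, max_lag):
    """Running Green-Kubo integral for one off-diagonal pressure component."""
    fluct = np.asarray(p_ab, dtype=float) - np.mean(p_ab)  # p - <p>
    n = fluct.size
    acf = np.array([np.mean(fluct[: n - lag] * fluct[lag:]) for lag in range(max_lag)])
    integral = np.concatenate(([0.0], np.cumsum(0.5 * (acf[1:] + acf[:-1]) * dt)))
    return volume / (K_B * temperature) * integral

def shear_viscosity(p_xy, p_xz, p_yz, dt, volume, temperature, max_lag):
    """Average of the xy, xz, and yz running viscosities."""
    parts = [running_viscosity(p, dt, volume, temperature, max_lag)
             for p in (p_xy, p_xz, p_yz)]
    return np.mean(parts, axis=0)

# Synthetic (hypothetical) pressure series; box size and time step follow the text.
rng = np.random.default_rng(1)
p_xy, p_xz, p_yz = rng.normal(scale=1.0e7, size=(3, 200))
dt, volume, temperature = 0.5e-15, (6.2e-9) ** 3, 300.0
eta = shear_viscosity(p_xy, p_xz, p_yz, dt, volume, temperature, max_lag=50)
```

Averaging the three components reduces statistical noise because they are equivalent in an isotropic liquid.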

Thermal conductivity

Similarly, we can use a Green-Kubo relation to calculate thermal conductivity:

$$\kappa (t)=\frac{1}{{k}_{{\rm{B}}}{T}^{2}V}\mathop{\int}\nolimits_{\!\!0}^{t}\langle {\bf{J}}(0)\cdot {\bf{J}}(\tau )\rangle {\rm{d}}\tau ,$$

(8)

where J(t) is the heat current and 〈J(0) ⋅ J(τ)〉 is the heat current auto-correlation function. For a liquid system, the heat current has two contributions, J = J^k + J^p. The kinetic term is ${{\bf{J}}}^{{\rm{k}}}=\sum _{i}{{\bf{v}}}_{i}{E}_{i}$, where E_i and v_i are the total energy and velocity of atom i, respectively. The potential term for many-body potentials is²⁷ ${{\bf{J}}}^{{\rm{p}}}=\sum _{i}{{\bf{W}}}_{i}\cdot {{\bf{v}}}_{i}$, where ${{\bf{W}}}_{i}=\sum _{j\ne i}{{\bf{r}}}_{ij}\otimes \frac{\partial {U}_{j}}{\partial {{\bf{r}}}_{ji}}$ is the virial tensor of atom i and r_ij = r_j − r_i, r_i being the position of atom i. According to the decomposition of the heat current, the thermal conductivity can be decomposed into three terms: κ(t) = κ^pp(t) + κ^kk(t) + κ^pk(t), where the potential-potential term κ^pp, the kinetic-kinetic term κ^kk, and the cross term κ^pk correspond to the following heat current auto-correlation functions: 〈J^p(0) ⋅ J^p(τ)〉, 〈J^k(0) ⋅ J^k(τ)〉, and 〈J^p(0) ⋅ J^k(τ)〉 + 〈J^k(0) ⋅ J^p(τ)〉.
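
The decomposition follows from the linearity of the correlation function, which the sketch below checks numerically. It is an illustration only: the heat currents are random numbers, the function names are hypothetical, and the constant in front of the integral in Eq. (8) is passed in as `prefactor` rather than computed, so no unit convention is assumed.

```python
import numpy as np

def correlation(a, b, max_lag):
    """<a(0) . b(tau)> averaged over time origins; a and b have shape (n_frames, 3)."""
    n = a.shape[0]
    return np.array([np.mean(np.sum(a[: n - lag] * b[lag:], axis=1))
                     for lag in range(max_lag)])

def running_integral(c, dt):
    """Trapezoidal running integral starting from zero."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (c[1:] + c[:-1]) * dt)))

def kappa_decomposition(j_k, j_p, dt, prefactor, max_lag):
    """Return the running kk, pp, and pk contributions to kappa(t)."""
    kk = prefactor * running_integral(correlation(j_k, j_k, max_lag), dt)
    pp = prefactor * running_integral(correlation(j_p, j_p, max_lag), dt)
    cross = correlation(j_p, j_k, max_lag) + correlation(j_k, j_p, max_lag)
    pk = prefactor * running_integral(cross, dt)
    return {"pp": pp, "kk": kk, "pk": pk}

# Synthetic (hypothetical) heat-current series.
rng = np.random.default_rng(0)
j_k = rng.normal(size=(500, 3))
j_p = rng.normal(size=(500, 3))
terms = kappa_decomposition(j_k, j_p, dt=0.5, prefactor=1.0, max_lag=20)

# Reference: the same integral evaluated directly for the total current J = J^k + J^p.
total = kappa_decomposition(j_k + j_p, np.zeros_like(j_p), 0.5, 1.0, 20)["kk"]
```

The sum `terms["pp"] + terms["kk"] + terms["pk"]` reproduces `total` up to floating-point rounding.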

In addition, we use the homogeneous nonequilibrium MD method⁴⁰ to calculate κ^pp. In this method, an external driving force ${{\bf{F}}}_{i}^{{\rm{ext}}}={{\bf{F}}}_{{\rm{e}}}\cdot {{\bf{W}}}_{i}$ is exerted on each atom i, driving the system out of equilibrium. Here, F_e (with magnitude F_e) is the driving force parameter with the dimension of inverse length. In this work, F_e was chosen as 0.001 Å⁻¹, which was verified to be small enough to keep the system within the linear response regime. The driving force induces an ensemble-averaged steady-state non-equilibrium heat current J^p (with magnitude J^p) of the potential term, which is related to the thermal conductivity: ${\kappa }^{{\rm{pp}}}=\frac{{J}^{{\rm{p}}}}{TV{F}_{{\rm{e}}}}$. The thermal conductivity can be further decomposed with respect to the vibrational frequency ω to obtain the spectral thermal conductivity κ^pp(ω)⁴⁰.
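
As a worked example of the relation κ^pp = J^p/(T V F_e), the sketch below converts the result to W m⁻¹ K⁻¹. The input units (heat current in eV Å ps⁻¹, volume in Å³, F_e in Å⁻¹) are an assumption made for this illustration, and the heat-current value is hypothetical; only F_e = 0.001 Å⁻¹ and the approximate 6.2 nm box length come from the text.

```python
# 1 eV / (ps * Angstrom * K) expressed in W / (m * K)
EV_PER_PS_A_K_TO_W_PER_M_K = 1.602176634e-19 / (1.0e-12 * 1.0e-10)

def kappa_pp_hnemd(j_p, temperature, volume, f_e):
    """kappa^pp = J^p / (T * V * F_e), returned in W/(m K).

    Assumed input units: j_p in eV*Angstrom/ps, temperature in K,
    volume in Angstrom^3, f_e in 1/Angstrom.
    """
    return j_p / (temperature * volume * f_e) * EV_PER_PS_A_K_TO_W_PER_M_K

volume = 62.0 ** 3        # cubic box of about 6.2 nm edge, in Angstrom^3
f_e = 0.001               # driving force parameter from the text, in 1/Angstrom
j_p_example = 25.0        # hypothetical steady-state heat current, eV*Angstrom/ps
kappa_example = kappa_pp_hnemd(j_p_example, 300.0, volume, f_e)
```

One way to check the linear-response assumption is to repeat the calculation for several values of F_e and confirm that the resulting κ^pp does not change within the statistical error.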

Data availability

The training and test datasets, the trained machine-learned potential models, the compilation and installation guide, as well as demos for MD simulations utilizing the trained models, have been deposited in a Zenodo repository, accessible at https://zenodo.org/records/15033656. The source code for the graphics processing units molecular dynamics (GPUMD-v3.9.3) package is available at https://zenodo.org/records/11122339.

References

Levy, Y. & Onuchic, J. N. Water mediation in protein folding and molecular recognition. Annu. Rev. Biophys. 35, 389–415 (2006).
Cremer, P. S., Flood, A. H., Gibb, B. C. & Mobley, D. L. Collaborative routes to clarifying the murky waters of aqueous supramolecular chemistry. Nat. Chem. 10, 8–16 (2018).
Sellberg, J. A. et al. Ultrafast X-ray probing of water structure below the homogeneous ice nucleation temperature. Nature 510, 381–384 (2014).
Thämer, M., Marco, L. D., Ramasesha, K., Mandal, A. & Tokmakoff, A. Ultrafast 2D IR spectroscopy of the excess proton in liquid water. Science 350, 78–82 (2015).
Yang, J. et al. Direct observation of ultrafast hydrogen bond strengthening in liquid water. Nature 596, 531–535 (2021).
Flór, M. et al. Dissecting the hydrogen bond network of water: charge transfer and nuclear quantum effects. Science 386, eads4369 (2024).
Pettersson, L. G. M., Henchman, R. H. & Nilsson, A. Water-the most anomalous liquid. Chem. Rev. 116, 7459–7462 (2016).
Deringer, V. L., Caro, M. A. & Csányi, G. Machine learning interatomic potentials as emerging tools for materials science. Adv. Mater. 31, 1902765 (2019).
Omranpour, A., Montero De Hijes, P., Behler, J. & Dellago, C. Perspective: atomistic simulations of water and aqueous systems with machine learning potentials. J. Chem. Phys. 160, 170901 (2024).
Morawietz, T., Singraber, A., Dellago, C. & Behler, J. How van der Waals interactions determine the unique properties of water. Proc. Natl. Acad. Sci. 113, 8368–8373 (2016).
Cheng, B., Behler, J. & Ceriotti, M. Nuclear quantum effects in water at the triple point: using theory as a link between experiments. J. Phys. Chem. Lett. 7, 2210–2215 (2016).
Cheng, B., Engel, E. A., Behler, J., Dellago, C. & Ceriotti, M. Ab initio thermodynamics of liquid and solid water. Proc. Natl. Acad. Sci. 116, 1110–1115 (2019).
Zhang, L., Wang, H., Car, R. & E, W. Phase diagram of a deep potential water model. Phys. Rev. Lett. 126, 236001 (2021).
Bore, S. L. & Paesani, F. Realistic phase diagram of water from “first principles” data-driven quantum simulations. Nat. Commun. 14, 3349 (2023).
Chen, Z., Berrens, M. L., Chan, K.-T., Fan, Z. & Donadio, D. Thermodynamics of water and ice from a fast and scalable first-principles neuroevolution potential. J. Chem. Eng. Data 69, 128–140 (2024).
Yao, Y. & Kanai, Y. Temperature dependence of nuclear quantum effects on liquid water via artificial neural network model based on SCAN meta-GGA functional. J. Chem. Phys. 153, 044114 (2020).
Malosso, C., Zhang, L., Car, R., Baroni, S. & Tisi, D. Viscosity in water from first-principles and deep-neural-network simulations. npj Comput. Mater. 8, 139 (2022).
Tisi, D. et al. Heat transport in liquid water from first-principles and deep neural network simulations. Phys. Rev. B 104, 224202 (2021).
Zhang, C., Puligheddu, M., Zhang, L., Car, R. & Galli, G. Thermal conductivity of water at extreme conditions. J. Phys. Chem. B 127, 7011–7017 (2023).
Xu, K. et al. Accurate prediction of heat conductivity of water by a neuroevolution potential. J. Chem. Phys. 158, 204114 (2023).
Markland, T. E. & Ceriotti, M. Nuclear quantum effects enter the mainstream. Nat. Rev. Chem. 2, 0109 (2018).
Parrinello, M. & Rahman, A. Study of an F center in molten KCl. J. Chem. Phys. 80, 860–867 (1984).
Ying, P. et al. Highly efficient path-integral molecular dynamics simulations with GPUMD using neuroevolution potentials: case studies on thermal properties of materials. J. Chem. Phys. 162, 064109 (2025).
Babin, V., Leforestier, C. & Paesani, F. Development of a “first principles” water potential with flexible monomers: dimer potential energy surface, VRT spectrum, and second virial coefficient. J. Chem. Theory Comput. 9, 5395–5403 (2013).
Babin, V., Medders, G. R. & Paesani, F. Development of a “first principles” water potential with flexible monomers. II: Trimer potential energy surface, third virial coefficient, and small clusters. J. Chem. Theory Comput. 10, 1599–1607 (2014).
Medders, G. R., Babin, V. & Paesani, F. Development of a “first-principles” water potential with flexible monomers. III. liquid phase properties. J. Chem. Theory Comput. 10, 2906–2910 (2014).
Fan, Z. et al. Neuroevolution machine learning potentials: combining high accuracy and low cost in atomistic simulations and application to heat transport. Phys. Rev. B 104, 104309 (2021).
Zhai, Y., Caruso, A., Bore, S. L., Luo, Z. & Paesani, F. A “short blanket” dilemma for a state-of-the-art neural network potential for water: reproducing experimental properties or the physics of the underlying many-body interactions? J. Chem. Phys. 158, 084111 (2023).
Fan, Z., Chen, W., Vierimaa, V. & Harju, A. Efficient molecular dynamics simulations with many-body potentials on graphics processing units. Comput. Phys. Commun. 218, 10–16 (2017).
Berendsen, H. J. C., Grigera, J. R. & Straatsma, T. P. The missing term in effective pair potentials. J. Phys. Chem. 91, 6269–6271 (1987).
Huber, M. L. et al. New international formulation for the viscosity of H₂O. J. Phys. Chem. Ref. Data 38, 101–125 (2009).
Riera, M. et al. MBX: a many-body energy and force calculator for data-driven many-body simulations. J. Chem. Phys. 159, 054802 (2023).
Paesani, F. Getting the right answers for the right reasons: toward predictive molecular simulations of water with many-body potential energy functions. Acc. Chem. Res. 49, 1844–1851 (2016).
Piaggi, P. M., Panagiotopoulos, A. Z., Debenedetti, P. G. & Car, R. Phase equilibrium of Water with hexagonal and cubic ice using the SCAN functional. J. Chem. Theory Comput. 17, 3065–3077 (2021).
Holz, M., Heil, S. R. & Sacco, A. Temperature-dependent self-diffusion coefficients of water and six selected molecular liquids for calibration in accurate ¹H NMR PFG measurements. Phys. Chem. Chem. Phys. 2, 4740–4742 (2000).
Rossi, M., Ceriotti, M. & Manolopoulos, D. E. How to remove the spurious resonances from ring polymer molecular dynamics. J. Chem. Phys. 140, 234116 (2014).
Montero de Hijes, P., Dellago, C., Jinnouchi, R., Schmiedmayer, B. & Kresse, G. Comparing machine learning potentials for water: Kernel-based regression and Behler-Parrinello neural networks. J. Chem. Phys. 160, 114107 (2024).
Montero de Hijes, P., Romano, S., Gorfer, A. & Dellago, C. The kinetics of the ice-water interface from ab initio machine learning simulations. J. Chem. Phys. 158, 204706 (2023).
Berens, P. H., Mackay, D. H. J., White, G. M. & Wilson, K. R. Thermodynamics and quantum corrections from molecular dynamics for liquid water. J. Chem. Phys. 79, 2375–2389 (1983).
Fan, Z., Dong, H., Harju, A. & Ala-Nissila, T. Homogeneous nonequilibrium molecular dynamics method for heat transport and spectral decomposition with many-body potentials. Phys. Rev. B 99, 064308 (2019).
Yu, Q. et al. q-AQUA: A many-body CCSD(T) water potential, including four-Body interactions, demonstrates the quantum nature of water from clusters to the liquid phase. J. Phys. Chem. Lett. 13, 5068–5074 (2022).
Matthews, D. A. et al. Coupled-cluster techniques for computational chemistry: the CFOUR program package. J. Chem. Phys. 152, 214108 (2020).
Kresse, G. & Furthmüller, J. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys. Rev. B 54, 11169–11186 (1996).
Skinner, L. B. et al. Benchmark oxygen-oxygen pair-distribution function of ambient water from x-ray diffraction measurements with a wide Q-range. J. Chem. Phys. 138, 074506 (2013).
Soper, A. The radial distribution functions of water and ice from 220 to 673 K and at pressures up to 400 MPa. Chem. Phys. 258, 121–137 (2000).
Wagner, W. & Pruß, A. The IAPWS formulation 1995 for the thermodynamic properties of ordinary water substance for general and scientific use. J. Phys. Chem. Ref. Data 31, 387–535 (2002).
Huber, M. L. et al. New international formulation for the thermal conductivity of H₂O. J. Phys. Chem. Ref. Data 41, 033102 (2012).


Acknowledgements

This work was supported by the National Science and Technology Advanced Materials Major Program of China (No. 2024ZD0606900). KX, TL, and JX acknowledge support from the National Key R&D Project from the Ministry of Science and Technology of China (No. 2022YFA1203100), the Research Grants Council of Hong Kong (No. AoE/P-701/20), and RGC GRF (No. 14220022).

Author information

Authors and Affiliations

College of Physical Science and Technology, Bohai University, Jinzhou, PR China
Ke Xu & Zheyong Fan
Department of Electronic Engineering and Materials Science and Technology Research Center, The Chinese University of Hong Kong, Shatin, PR China
Ke Xu, Ting Liang & Jianbin Xu
College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, PR China
Nan Xu
Department of Physical Chemistry, School of Chemistry, Tel Aviv University, Tel Aviv, Israel
Penghua Ying
Department of Civil and Environmental Engineering, George Washington University, Washington, DC, USA
Shunda Chen
Jiangsu Key Laboratory of Advanced Food Manufacturing Equipment and Technology, Jiangnan University, Wuxi, PR China
Ning Wei

Contributions

K.X. trained the machine-learned models and performed the molecular dynamics simulations. T.L. did the density functional theory calculations. N.X. did the quantum chemistry level calculations. P.Y. did the training data sampling. S.C., N.W., J.X. and Z.F. supervised the project. K.X., T.L., S.C. and Z.F. drafted the manuscript. All authors proofread the manuscript.

Corresponding authors

Correspondence to Shunda Chen, Jianbin Xu or Zheyong Fan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information (download PDF )

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Xu, K., Liang, T., Xu, N. et al. NEP-MB-pol: a unified machine-learned framework for fast and accurate prediction of water’s thermodynamic and transport properties. npj Comput Mater 11, 279 (2025). https://doi.org/10.1038/s41524-025-01777-1

Received: 24 April 2025
Accepted: 14 August 2025
Published: 27 August 2025
Version of record: 27 August 2025
DOI: https://doi.org/10.1038/s41524-025-01777-1
