Abstract
Machine learning potentials (MLPs) have become a popular topic in recent years for their ability to replace expensive first-principles calculations in large systems. Meanwhile, message passing networks have gained significant attention due to their remarkable accuracy, and a wave of message passing networks based on Cartesian coordinates has emerged. However, the node information in these models is usually limited to scalars and vectors. In this work, we propose the High-order Tensor message Passing interatomic Potential (HotPP), an E(n) equivariant message passing neural network that extends the node embedding and message to tensors of arbitrary order. With a few basic equivariant operations, high-order tensors can be coupled very simply, so the model can directly predict high-order tensors such as dipole moments and polarizabilities without any modifications. Tests on several datasets show that HotPP not only achieves high accuracy in predicting target properties but also successfully performs tasks such as calculating phonon spectra, infrared spectra, and Raman spectra, demonstrating its potential as a tool for future research.
Introduction
Molecular dynamics (MD) is a powerful computational technique that allows the exploration of various physical and chemical phenomena at the atomic level and the study of the behavior of molecules and materials over time. It bridges the gap between theoretical predictions and experimental observations, enabling researchers to gain a comprehensive understanding of the behavior, properties, and interactions of molecules and materials. With sufficient computational resources, first-principles calculations based on Density Functional Theory (DFT)1 can simulate systems with hundreds or even thousands of atoms, but they struggle with larger systems. Another approach to computing atomic interactions is empirical force fields, which provide much quicker calculations and the ability to handle significantly larger systems. Nevertheless, many of these force fields rely on empirical observations, limiting their applicability to specific ranges and lacking universality and transferability. Machine learning potentials (MLPs)2,3,4,5,6,7,8, which aim to accurately describe the potential energy surface of atomic configurations, combine the advantages of both DFT and empirical force fields. A well-trained machine learning force field can achieve accuracy close to, or even beyond, DFT9,10 and can perform very large-scale, long-time simulations, offering a glimpse into the future of research on complex dynamical problems.
Most existing machine learning potentials are based on the framework proposed by Behler2, which fits the total energy as a sum of atomic energies \(E=\sum {E}_{i}\), where each atomic energy is determined by the atomic environment within a certain cutoff radius. This format ensures the scalability of the potential, allowing the network to be trained on small systems and extrapolated to larger ones. The quality of such a model is highly dependent on the choice of descriptors that describe the atomic environments11. A reasonable descriptor should first possess invariance to rotations, translations, and permutations of atoms of the same species; thus, the same atomic environment yields the same atomic energy. A common approach is to construct a series of symmetric functions based on interatomic distances and angles, since these two quantities are naturally invariant under rotations and translations. Depending on the number of atoms involved, a series of so-called two-body and three-body descriptors can be obtained, such as atom-centered symmetry functions (ACSF)2,12,13, the NEP descriptor7,14,15,16, the smooth overlap of atomic positions (SOAP)17, and the DeePMD descriptor5,18. However, these descriptors are not complete19, as different atomic environments can yield the same descriptor. The atomic cluster expansion (ACE)6,20 and the moment tensor potential (MTP)4,21 provide complete descriptors that can account for interactions of arbitrary order, but the number of descriptors can easily grow to tens of thousands as the order increases. Another issue is that such descriptors depend only on the coordinate information within the cutoff radius. When dealing with long-range interactions, simply increasing the cutoff radius would significantly raise the computational cost, since the number of atoms grows with the cube of the cutoff radius.
Message passing networks (MPNs)22 can help address both of these issues. In the context of MLPs, an MPN represents a molecule or crystal structure as a graph, where atoms are nodes and bonds are edges. The key idea behind MPNs is the iterative passing of messages between nodes, allowing information to be exchanged and aggregated. Such message passing can, on one hand, bring multiple atoms into the final descriptor (thus yielding n-body symmetric functions); on the other hand, it allows information from atoms beyond the cutoff radius to be transmitted to the current atom. As a result, many machine learning potentials based on message passing networks have achieved high levels of accuracy8,23,24,25,26,27,28,29,30,31. It is worth noting that as long as the energy (or other property) obtained at the end satisfies the symmetry requirements, the messages used in the network do not necessarily have to be scalars24,32. For example, NequIP8, BotNet30, and MACE31 utilize high-order tensors based on spherical harmonics in the message passing, coupling them through Clebsch-Gordan (CG) coefficients to construct equivariant networks. These methods have shown significant improvements in accuracy compared to approaches that only use scalar messages. Another category of methods, including PaiNN26 and torchMD-Net28, directly utilizes vectors in Cartesian space as messages and obtains equivariant results through a series of designed layers. This approach does not require coupling through CG coefficients, but the vectors are only equivalent to l = 1 tensors in the spherical harmonics approach. TeaNet33 can pass matrix information equivalent to l = 2 tensors, but this introduces a multitude of artificially designed duplications, resulting in a highly intricate network structure that is challenging to extend to higher-order tensors. Machine learning potentials that use Cartesian tensors of arbitrary order as messages have not yet been proposed.
In this work, we propose the High-order Tensor message Passing interatomic Potential (HotPP), which can utilize Cartesian tensors of arbitrary order as messages. By combining a few basic equivariant operations between tensors, all the high-order tensors used in the network are E(n)-equivariant, so the output is consistent with rotations of the coordinates. In other words, a scalar output remains invariant under rotations, a vector output rotates in accordance with the rotation of the coordinates, and a matrix output transforms as \({M}^{{\prime} }={RM}{R}^{T}\). Therefore, the method can directly predict high-order tensors such as dipole moments and polarizability tensors without any modifications. We validate HotPP on three prediction tasks: the energies and forces of molecular dynamics trajectories of small molecules; the energies, forces, and stresses of carbon with periodic boundary conditions; and the dipole moments and polarizability tensors of small molecules at coupled cluster singles and doubles (CCSD) accuracy. In these tests, our model achieves performance comparable to other high-order models with fewer parameters, providing a framework for equivariant networks based on Cartesian coordinates.
Results
Equivariant functions of Cartesian tensors
Cartesian tensors are tensors that transform under rotations in Euclidean space in a simple way. In other words, a Cartesian tensor is a tensor whose components transform as a product of vectors and covectors under rotations, without any additional factors that depend on the rotation matrix itself. Specifically, an n-th rank tensor transforms under rotation as:

\({T}_{{i}_{1}\cdots {i}_{n}}\to {T}_{{i}_{1}\cdots {i}_{n}}^{{\prime} }={R}_{{i}_{1}{j}_{1}}\cdots {R}_{{i}_{n}{j}_{n}}{T}_{{j}_{1}\cdots {j}_{n}}\)
where R is an orthogonal matrix. Under this definition, it is easy to find that since \({{{\bf{v}}}}_{i}\to {{{\bf{v}}}}_{i}^{{\prime} }={{{\boldsymbol{R}}}}_{{ij}}{{{\bf{v}}}}_{j}\), the vectors are first-order tensors, and the dyadic product of two vectors is a second-order tensor since \({({{\bf{u}}}{{\bf{v}}})}_{{i}_{1}{i}_{2}}{\longrightarrow }^{{{\boldsymbol{R}}}}{({{\bf{u}}}{{\bf{v}}})}_{{i}_{1}{i}_{2}}^{{\prime} }=({{{\boldsymbol{R}}}}_{{i}_{1}{j}_{1}})({{{\boldsymbol{R}}}}_{{i}_{2}{j}_{2}}){({{\bf{u}}}{{\bf{v}}})}_{{j}_{1} \, {j}_{2}}\).
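As a quick numerical check of this transformation rule (an illustrative sketch not present in the original text, using NumPy and SciPy), one can verify that a dyadic product transforms with one rotation matrix per index:

```python
import numpy as np
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(0)
u, v = rng.normal(size=3), rng.normal(size=3)
R = Rotation.random(random_state=0).as_matrix()     # a random orthogonal matrix

T = np.outer(u, v)                                  # second-order tensor u ⊗ v
T_rot = np.einsum("ia,jb,ab->ij", R, R, T)          # rotate each index with R
T_from_rotated_vectors = np.outer(R @ u, R @ v)     # rotate the vectors first

assert np.allclose(T_rot, T_from_rotated_vectors)   # both routes agree
```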
Equivariance is a property of functions or transformations between two spaces, where the transformation preserves the relationships between the elements of those spaces. More formally, a function \({{\rm{\phi }}}:{{\bf{X}}}\to {{\bf{Y}}}\) is said to be equivariant with respect to a group G acting on two sets X and Y if for all \(g\in {{\bf{G}}}\) and \(x\in {{\bf{X}}}\), we have:

\(\phi (g\circ x)=g\circ \phi (x)\)
This means that applying the function ϕ to an object x and then applying a group element g to the result gives the same outcome as first applying the group element to the object and then applying the function. The composition of equivariant maps is also equivariant:

\(\psi (\phi (g\circ x))=\psi (g\circ \phi (x))=g\circ \psi (\phi (x))\)
Therefore, by providing some basic equivariant functions between Cartesian tensors and combining them, we can obtain an equivariant neural network. Here, we use the following three equivariant operations, whose equivariances are proven in the Supplementary Note 1:
1. Linear combinations of tensors with the same order: \(f\left({T}_{1},{T}_{2},\cdots,{T}_{m}\right)=\sum {c}_{i}{T}_{i}\).

2. Contraction of two tensors.
The contraction of tensors is a mathematical operation that reduces the rank of tensors by summing over one or more pairs of indices. For example, consider a 3-order tensor A and a 2-order tensor B; one possible contraction is \({{{\boldsymbol{C}}}}_{{ijk}}={{{\boldsymbol{A}}}}_{{ijl}}{{{\boldsymbol{B}}}}_{{kl}}\), which reduces the total rank by 2. More generally, if we sum over z pairs of indices between an x-order tensor T1 and a y-order tensor T2, such as:

\({T}_{{a}_{1}\cdots {a}_{x-z}{b}_{1}\cdots {b}_{y-z}}={\sum }_{{c}_{1}\cdots {c}_{z}}{{T}_{1}}_{{a}_{1}\cdots {a}_{x-z}{c}_{1}\cdots {c}_{z}}\,{{T}_{2}}_{{b}_{1}\cdots {b}_{y-z}{c}_{1}\cdots {c}_{z}}\)
We obtain a new tensor of order \(x+y-2z\), where \(0\le z\le \min (x,y)\). When z = 0, none of the indices are contracted and Eq. (4) becomes the tensor product \({{{\boldsymbol{T}}}}_{{a}_{1}\cdots {a}_{x}{b}_{1}\cdots {b}_{y}}={{{{\boldsymbol{T}}}}_{1}}_{{a}_{1}\cdots {a}_{x}}\cdot {{{{\boldsymbol{T}}}}_{2}}_{{b}_{1}\cdots {b}_{y}}\) (see the einsum sketch after this list).
3. Partial derivative with respect to another Cartesian tensor: \(\frac{\partial }{\partial {{{\boldsymbol{T}}}}_{{j}_{1}{j}_{2}\cdots {j}_{n}}}\).
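The contraction in operation 2 maps directly onto a tensordot/einsum over the shared indices. Below is a minimal NumPy sketch (the helper name contract_last is ours, for illustration only):

```python
import numpy as np

def contract_last(T1, T2, z):
    """Contract the last z indices of T1 with the last z indices of T2.

    T1 has order x, T2 has order y; the result has order x + y - 2*z.
    For z = 0 this reduces to the ordinary tensor (outer) product.
    """
    x, y = T1.ndim, T2.ndim
    assert 0 <= z <= min(x, y)
    return np.tensordot(T1, T2, axes=(list(range(x - z, x)), list(range(y - z, y))))

A = np.random.rand(3, 3, 3)              # a 3-order tensor
B = np.random.rand(3, 3)                 # a 2-order tensor
print(contract_last(A, B, 1).shape)      # (3, 3, 3): order 3 + 2 - 2
print(contract_last(A, B, 0).shape)      # (3, 3, 3, 3, 3): plain tensor product
```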
By combining these operations, we can obtain many equivariant functions. Many common operations used in other equivariant neural networks that operate on vectors, such as scaling of a vector \({{\rm{s}}}\cdot \vec{{{\rm{v}}}}\), scalar products \(\left\langle \vec{{v}_{1}},\,\vec{{v}_{2}}\right\rangle \), and vector products \(\vec{{v}_{1}}\times \vec{{v}_{2}}\) (the upper-triangular part of \(\vec{{v}_{1}}\otimes \vec{{v}_{2}}-\vec{{v}_{2}}\otimes \vec{{v}_{1}}\)), can all be viewed as special cases of these three operations. Some more complex descriptors, such as the MTP descriptors4,21, can also be obtained through combinations of these operations, as shown in Supplementary Note 3.
Equivariant message passing neural network
To obtain an end-to-end machine learning model for predicting material properties, the input should be the positions \(\{{{{\bf{r}}}}_{i}\}\) and chemical elements \(\{{Z}_{i}\}\) of all atoms, and for periodic crystals, the lattice parameters should also be considered. To apply graph neural networks, we first transform the structure into a graph \(\{{n}_{i},{e}_{{ij}}\}\), where each node \({n}_{i}\) corresponds to an atom i in the unit cell, and all atoms j within a given cutoff distance rcut are considered connected to \({n}_{i}\), labeled with their relative positions rij. For periodic structures, since atom j and its equivalent atom j' may both lie within the cutoff distance of atom i, \({n}_{i}\) may have more than one edge connected to \({n}_{j}\). To extract the information of the nodes, we use the MPN scheme. A normal MPN can be described as:

\({m}_{i}^{t+1}={\bigoplus }_{j\in N(i)}{{{\rm{M}}}}_{t}\left({h}_{i}^{t},{h}_{j}^{t},{e}_{{ij}}\right)\)

\({h}_{i}^{t+1}={{{\rm{U}}}}_{t}\left({h}_{i}^{t},{m}_{i}^{t+1}\right)\)
where \({h}_{i}^{t}\) is the hidden feature of \({n}_{i}\) at layer t that captures its local information. Messages are passed between nodes along edges, with the message at each edge \({e}_{{ij}}\) being a function \({{{\rm{M}}}}_{t}\) of the features of the nodes connected by that edge. The ⨁ is a differentiable, permutation-invariant function such as sum, mean, or max that aggregates the messages arriving at each node into an updated message \({m}_{i}^{t+1}\), which in turn is used to update the hidden feature through the function \({{{\rm{U}}}}_{t}\) for the next iteration.
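A minimal, framework-agnostic sketch of one such message passing layer is shown below (plain Python; message_fn and update_fn stand in for \({{{\rm{M}}}}_{t}\) and \({{{\rm{U}}}}_{t}\) and are illustrative names, not HotPP's actual API):

```python
def message_passing_step(features, edges, message_fn, update_fn):
    """One MPN layer: aggregate messages over edges, then update each node.

    features: dict {node_id: feature}; edges: list of (i, j, e_ij) tuples.
    The sum over neighbors plays the role of the permutation-invariant aggregation.
    """
    aggregated = {i: 0.0 for i in features}
    for i, j, e_ij in edges:
        aggregated[i] += message_fn(features[i], features[j], e_ij)       # M_t
    return {i: update_fn(features[i], aggregated[i]) for i in features}   # U_t

# toy usage with scalar features and symmetric edges
feats = {1: 1.0, 2: 2.0, 3: 3.0}
edges = [(1, 2, 0.5), (2, 1, 0.5), (2, 3, 0.8), (3, 2, 0.8)]
new_feats = message_passing_step(feats, edges,
                                 message_fn=lambda hi, hj, e: hj * e,
                                 update_fn=lambda h, m: h + m)
```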
A concrete example illustrating the principles of MPNs is presented in Fig. 1. We first determine the connectivity of a structure based on a given cutoff radius and convert it into a graph (a quotient graph for periodic structures). The messages on the nodes can then be passed through the edges by a two-body interaction. As message passing proceeds, information from atoms beyond the cutoff radius can also be conveyed to the central atom. As illustrated in Fig. 1b, the blue arrows represent the first message passing step, while the yellow arrows denote the second. Note that each layer of message passing is performed simultaneously on all atoms; here, we focus on a specific atom in each layer for ease of analysis. During the first step, information of atom 1 is encoded into the hidden information of atom 2. Subsequently, in the second step, the information of atom 2, including some information of atom 1, is collectively transmitted to atom 3, thereby achieving non-local effects from atom 1 to atom 3. On the other hand, because the interaction between atom 4 and atom 2 in the second step also contains information from atom 1, the effective interaction is elevated from a two-body to a three-body interaction. This encapsulates the two advantages of the message-passing architecture.
However, using only scalar hidden features, messages, and edge information (usually the relative distance between two atoms) limits the expressive capacity and may cause incompleteness of the atomic structure representations. As shown in Fig. 2a and d, if we only use scalar information \({h}_{i}\), \({h}_{j}\), and \({d}_{{ij}}\) to pass the message in Eq. (5) and update the feature in Eq. (6), all nodes will always produce the same embedding information. As a result, the network will be unable to distinguish between these two structures and will give the same total energy. Even if 3-body messages including the angles \({\alpha }_{{ijk}}\) are taken into consideration, some structures with only 4 atoms cannot be distinguished26, as shown in Fig. 2b and e. Due to the identical atomic environments within the cutoff radius, no matter how many message passing iterations are performed, these two different structures will only yield the same result. To alleviate this problem, a series of models that use high-order geometric tensors during message passing have been proposed. For example, allowing vectors in the message passing process can differentiate Fig. 2b and e, but in the case of Fig. 2c and f, which have different α, the summation in Eq. (5) would cause the network to confuse these two structures (a more detailed explanation can be found in Supplementary Note 5). It can be anticipated that increasing the order of tensors in message passing would enhance the expressive power of the network. Previously, the order of high-order tensor networks based on Cartesian space was typically limited to 233,34, while our method can work with Cartesian tensors of any order. In the following, we use \({\scriptstyle{l}\atop}\!{{\boldsymbol{h}}}_{i}^{t}\) to represent the l-order Cartesian tensor feature of node i in the t-th layer, and \({{{\bf{r}}}}^{\otimes n}\) to represent the n-fold tensor product of a vector r: r⊗r⊗⋯⊗r. In particular, for n = 0 we define this to be a learnable function of \({||}{{\bf{r}}}{||}:\,{{{\bf{r}}}}^{\otimes 0}={{\rm{f}}}({||}{{\bf{r}}}{||})\).
a, d cannot be distinguished by using two-body scalar information; b, e cannot be distinguished by using three-body scalar information26; and c, f cannot be distinguished by using two-body vector information. c, f have different α and β, so they have the same center of mass but different moment of inertia.
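For concreteness, the tensor power \({{{\bf{r}}}}^{\otimes n}\) defined above can be built by repeated outer products; here is a small NumPy sketch in which the n = 0 case uses a placeholder in place of the learnable radial function:

```python
import numpy as np

def tensor_power(r, n, f=lambda d: d):
    """Return r ⊗ r ⊗ ... ⊗ r with n factors; for n = 0 return f(||r||)."""
    if n == 0:
        return f(np.linalg.norm(r))      # stand-in for the learnable radial function
    out = r
    for _ in range(n - 1):
        out = np.multiply.outer(out, r)  # append one more Cartesian index
    return out

r = np.array([1.0, 2.0, 3.0])
print(tensor_power(r, 2).shape)          # (3, 3)
print(tensor_power(r, 3).shape)          # (3, 3, 3)
```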
Initialization of node features
The scalar features in the first layer \({\scriptstyle0\atop}\!h_{i}^{0}\) should be invariant to rotation, translation, and permutation of atoms of the same chemical species. This is also the requirement for most descriptors used in machine learning potentials, so descriptors such as ACSF, SOAP, ACE, MTP, etc., can be used directly to expedite feature extraction. Here, we use a trainable chemical embedding similar to SchNet23 to minimize human-designed elements as much as possible. Specifically, the atomic numbers are first one-hot encoded and then multiplied by a learnable weight matrix, resulting in a learnable embedding for each element Zi. For high-order features \({\scriptstyle{l}\atop}\!{{\boldsymbol{h}}}_{i}^{0}\) with l > 0, we set them all to 0 at the beginning.
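A minimal PyTorch sketch of this initialization (a one-hot encoding followed by a learnable weight matrix is equivalent to an embedding lookup; the variable names are illustrative, not the actual HotPP code):

```python
import torch
import torch.nn as nn

n_elements, n_channels = 100, 64
chemical_embedding = nn.Embedding(n_elements, n_channels)   # one-hot × learnable W

Z = torch.tensor([6, 1, 1, 1, 1])                    # e.g. methane: C, H, H, H, H
h0_scalar = chemical_embedding(Z)                    # (n_atoms, 64) scalar features, l = 0
h0_vector = torch.zeros(len(Z), n_channels, 3)       # l = 1 features start at zero
h0_matrix = torch.zeros(len(Z), n_channels, 3, 3)    # l = 2 features start at zero
```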
Message and aggregate
To combine the information of neighboring nodes, we need to design a message passing function \({{{\rm{M}}}}_{t}\) in Eq. (5). The hidden features \({\scriptstyle{l}_{i}\atop}\!{{\boldsymbol{h}}}_{i}^{t}\) and \({\scriptstyle{l}_{j}\atop}\!{{\boldsymbol{h}}}_{j}^{t}\), the bond information \({e}_{{ij}}\), and the target message \({\scriptstyle{l}_{{{\rm{out}}}}\atop}\!{{\boldsymbol{m}}}_{{ij}}^{t+1}\) can all be tensors of arbitrary order. Therefore, we need an equivariant way to combine two tensors into a new tensor of a different order, and Eq. (4) is such an operation. In our model, we write \({{{\rm{M}}}}_{t}\) in Eq. (5) as:
where \({d}_{{ij}}={||}{r}_{{ij}}{||}\) is the relative distance between atoms i and j, \({{{\bf{u}}}}_{{ij}}=\frac{{{{\bf{r}}}}_{{ij}}}{{d}_{{ij}}}\) is the direction vector, and \(0\le {l}_{c}\le \min ({l}_{i},{l}_{r})\) is the number of index pairs summed over during the contraction. \({{{\rm{f}}}}_{{l}_{r}}^{t}\left({d}_{{ij}}\right)\) is the radial function, a learnable multi-layer perceptron applied to radial basis functions such as Bessel or Chebyshev bases. The result is a Cartesian tensor of order \({l}_{o}=|{l}_{i}+{l}_{r}-2{l}_{c}|\), which lies between \(|{l}_{i}-{l}_{r}|\) and \({l}_{i}+{l}_{r}\). Since \({l}_{r}\) can be chosen arbitrarily, we can obtain an equivariant tensor of order 0 up to any arbitrary order.
We use a summation operation as the aggregation function for the messages in Eq. (5); that is, we directly add all the messages obtained from neighboring nodes. For tensors of the same order obtained from different \(({l}_{i},{l}_{r})\), we add them together with different coefficients. Because \({l}_{o}\) and \({l}_{r}\) can be arbitrary, we need to specify their maximum values. With given \({l}_{{{\rm{omax}}}}^{t}\) and \({l}_{{{\rm{rmax}}}}^{{{\rm{t}}}}\), we sum up all possible \(({l}_{i},{l}_{r})\):
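To illustrate how a single message term of this kind can be formed, the sketch below contracts an l = 1 neighbor feature with a filter tensor \({{{\rm{f}}}}_{{l}_{r}}({d}_{{ij}})\,{{{\bf{u}}}}_{{ij}}^{\otimes {l}_{r}}\) for \({l}_{r}=2\) using torch.einsum. It is a simplified, hedged example: the real layers also mix channels and sum over all allowed \(({l}_{i},{l}_{r})\) combinations.

```python
import torch

n_channels = 64
h_j = torch.randn(n_channels, 3)          # an l_i = 1 feature of neighbor j
u_ij = torch.randn(3)
u_ij = u_ij / u_ij.norm()                 # unit bond vector
f_r = torch.randn(n_channels)             # radial weights f_{l_r}(d_ij) for l_r = 2

filt = f_r[:, None, None] * torch.einsum("a,b->ab", u_ij, u_ij)   # f(d) * u ⊗ u

# contract l_c = 1 index pair: order 1 + 2 - 2*1 = 1, a vector message
msg_l1 = torch.einsum("ca,cab->cb", h_j, filt)
# contract l_c = 0 index pairs: tensor product, an order-3 message
msg_l3 = torch.einsum("ca,cbd->cabd", h_j, filt)
print(msg_l1.shape, msg_l3.shape)         # torch.Size([64, 3]) torch.Size([64, 3, 3, 3])
```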
Update
For the scalar message \({\scriptstyle0\atop}\!m_{i}^{t+1}\), we feed it to a fully connected layer followed by a non-linear activation function to extract the information, and update the hidden feature with a residual connection:
where σ is the nonlinear activation function and \({\scriptstyle0\atop}\!{{\rm{W}}}^{t}\) is a linear layer with bias in layer t for the scalar message. However, for tensors of order above 0, both the bias and the activation function would break the equivariance (Supplementary Note 2). Therefore, we only apply a bias when l = 0.
For the high-order activation function, as shown in Eq. (3), multiplying a tensor by a scalar is equivariant. Hence, we need a mapping from an l-order tensor to a scalar. One simple choice is the squared norm of the tensor \({{||}{{\boldsymbol{T}}}{||}}^{2}={\sum }_{{i}_{1}\cdots {i}_{l}}{{{\boldsymbol{T}}}}_{{i}_{1}\cdots {i}_{l}}^{2}\), since it is invariant by definition. Therefore, for \(l \, > \, 0\), we write the element-wise non-linear function as:
It should be noted that different notations σ and σ' are used for the activation functions in Eqs. (9) and (10), as the choice of activation function may differ between scalars and high-order tensors. Take the SiLU function \({{\rm{\sigma }}}\left(x\right)=x\cdot {{\rm{sigmoid}}}(x)\) as an example and suppose \({\scriptstyle{l}\atop}\!{{\rm{W}}}\) is the identity function. For scalars, SiLU maps x approximately to x itself when \(x\gg 0\). However, for higher-order tensors, Eq. (10) maps \({{{\boldsymbol{T}}}}_{{i}_{1}\cdots {i}_{l}}\) to \({{||}{{\boldsymbol{T}}}{||}}^{2}\cdot {{{\boldsymbol{T}}}}_{{i}_{1}\cdots {i}_{l}}\) instead of \({{{\boldsymbol{T}}}}_{{i}_{1}\cdots {i}_{l}}\) when the components are large. This is because applying the higher-order formula to a scalar, i.e. \({y}_{i}=\sigma \left({{\rm{W}}}({x}_{i})\right)\cdot {x}_{i}\), multiplies in an extra factor of \({x}_{i}\). Therefore, if we use the SiLU function for \({{\rm{\sigma }}}\), we should use the sigmoid function for \({{\rm{\sigma }}}^{\prime} \). Other activation functions can be handled using a similar approach, and hence the update function for high-order tensors is:
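A minimal PyTorch sketch of such a gated update for l > 0 tensors (assuming a channel-mixing linear layer without bias and sigmoid as σ'; the class is illustrative rather than the actual HotPP implementation):

```python
import torch
import torch.nn as nn

class GatedTensorUpdate(nn.Module):
    """Scale an l-order tensor feature by a scalar gate built from its squared norm."""
    def __init__(self, n_channels):
        super().__init__()
        self.mix = nn.Linear(n_channels, n_channels, bias=False)   # no bias for l > 0

    def forward(self, m):                                # m: (n_atoms, channels, 3, ..., 3)
        m = self.mix(m.movedim(1, -1)).movedim(-1, 1)    # mix channels only
        norm2 = m.pow(2).flatten(2).sum(-1)              # ||T||^2 per channel, invariant
        gate = torch.sigmoid(norm2)                      # sigma'(||T||^2), a scalar gate
        return gate.reshape(*gate.shape, *([1] * (m.dim() - 2))) * m

update = GatedTensorUpdate(64)
out = update(torch.randn(5, 64, 3, 3))                   # l = 2 features for 5 atoms
print(out.shape)                                         # torch.Size([5, 64, 3, 3])
```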
Readout
For a target n-order property, we apply a two-layer nonlinear MLP to the n-order tensor at the last hidden layer. For the same reason as above, biases and element-wise nonlinear functions are not used for the high-order tensors.
Performance of HotPP
We validate the accuracy of our method on a diverse range of systems, including small organic molecules, periodic structures, and predictions of dipole moments and polarizability tensors. For each system, we trained the HotPP model on a commonly used dataset and compared the results with other models. To demonstrate the robustness of our model, most models were trained using the same network architecture shown in Fig. 3 and similar hyperparameters. More training details can be found in the Methods section.
After embedding atomic information into scalars, vectors, and tensors, 4 propagation layers are used to further extract information and output the target properties. For higher-order tensors or additional propagation layers, the framework remains similar. The inputs to the network are the atomic numbers Zi and the relative displacements between atoms rij. After the Embedding layer, the scalar node embedding \({\scriptstyle0\atop}\!h^{0}\) is initialized based on Zi and the higher-order embeddings are set to zero. In each Propagate layer, the inputs are the node embeddings \({\scriptstyle{l}_{{in}}\atop}\!h\) and the filter tensors \({\scriptstyle{l}_{r}\atop}\!f_{{ij}}\) obtained by the Filter layer as \({f}_{{l}_{r}}\left({d}_{{{\rm{ij}}}}\right)\cdot {{{\bf{u}}}}_{{ij}}^{\otimes {l}_{r}}\), where \({d}_{{{\rm{ij}}}}\) is the relative distance and uij is the unit vector of rij. The output messages \({\scriptstyle{l}_{{out}}\atop}\!m_{i}\) are transformed through a linear layer and activated by the nonlinear function \(\sigma \) or \(\sigma {\prime} \), depending on the order of the message tensors. In the Readout layer, the node embeddings are transformed and summed up to obtain the target properties \({\scriptstyle{l}\atop}\!o\), such as \({\scriptstyle0\atop}\!o\) for the energy and \({}^{1}o\) for the dipole.
Small organic molecule
We first test our model on molecular dynamics trajectories of small organic molecules. The ANI-1x dataset35,36 contains DFT calculations for approximately five million diverse molecular conformations obtained through an active learning algorithm. To evaluate the extrapolation capability of HotPP, we train our model on 70% of the ANI-1x dataset and test on the COmprehensive Machine-learning Potential (COMP6) benchmark35, which samples the chemical space of molecules larger than those included in the training set. The results are shown in Table 1. Compared to the ANI-1x model, our model demonstrates superior performance across the majority of prediction tasks.
Periodic systems
After testing HotPP on small molecule datasets without periodicity, we evaluated its performance on periodic systems. We selected the carbon system with its various phases as an example37. It is a complicated dataset with a wide range of structures, containing structural snapshots from ab initio MD, structures iteratively extended from GAP-driven simulations, and randomly distorted unit cells of the crystalline allotropes diamond and graphite. We show the results in Table 2. Clearly, our model with about 150k parameters demonstrates advantages in predicting forces and virials compared to most models and achieves accuracy close to the l = 3 NequIP model with around 2 million parameters. When the parameter count is expanded to 600k, HotPP achieves the best results on this dataset.
Next, we verify the accuracy of our potential in calculating the phonon dispersion of diamond, which was not well predicted by some of the previous models for carbon38. We can obtain the force constant matrix of the structure directly through automatic differentiation, \(\frac{{\partial }^{2}E}{\partial {r}_{i\alpha }\partial {r}_{j\beta }}\). We used the Phonopy Python package39,40 to calculate the phonon spectrum of diamond and compared it with the results from DFT in Fig. 4. The results show that HotPP describes the vibrational behavior well. Although there are relatively large errors in the high-frequency part at the gamma point, this could be attributed to the inaccuracy of the DFT calculations in the training dataset. We retrained the model using a more accurate dataset38, and the newly calculated phonon spectrum almost perfectly matches the DFT results, which demonstrates the reliability of our model.
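The force-constant matrix can be obtained from any differentiable potential by taking the Hessian of the energy with respect to the atomic positions. A minimal PyTorch sketch is given below; energy_fn is a placeholder for the trained model, and in practice the resulting matrix is passed to Phonopy to obtain the phonon spectrum:

```python
import torch

def force_constants(energy_fn, positions):
    """Second derivatives d^2E / dr_ia dr_jb via automatic differentiation.

    positions: (n_atoms, 3) tensor; energy_fn maps positions to a scalar energy.
    Returns a (3*n_atoms, 3*n_atoms) force-constant matrix.
    """
    flat = positions.reshape(-1)
    return torch.autograd.functional.hessian(
        lambda x: energy_fn(x.reshape(-1, 3)), flat)

# toy pair potential standing in for the trained model
def toy_energy(pos):
    diff = pos.unsqueeze(0) - pos.unsqueeze(1)                   # (n, n, 3) displacements
    dist = (diff.pow(2).sum(-1) + torch.eye(len(pos))).sqrt()    # pad diagonal to avoid sqrt(0)
    return ((dist - 1.0) ** 2).triu(1).sum()                     # harmonic bonds, rest length 1

pos = torch.tensor([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [0.0, 1.1, 0.0]])
print(force_constants(toy_energy, pos).shape)                    # torch.Size([9, 9])
```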
Dipole moment and polarizability tensor
Since our model can directly output vectors and matrices, we attempted to directly predict the dipole moments and polarizability tensors of structures in this final section. We consider water systems including the water monomer, water dimer, Zundel cation, and bulk liquid water41. The dipoles and polarizabilities of the aperiodic systems were calculated with CCSD theory and those of liquid water with DFT. Each system contains 1000 structures; we use 70% of them as a training set and the rest as a testing set. We calculate the RMSEs relative to the standard deviation of the testing samples to compare with previous results obtained by other models41,42,43, as shown in Table 3. In most cases, HotPP gets the best results, except for the polarizability tensor of the water monomer. Compared to T-EANN42 and REANN43, HotPP performs particularly well for the dipole moment of liquid water. This may be because those models fit the dipole moment by learning \({q}_{i}\) and calculating \({{\boldsymbol{\mu }}}={\sum }_{i=1}^{N}{q}_{i}{{{\bf{r}}}}_{i}\), which is inappropriate for periodic systems. In contrast, the outputs of our model are all obtained through relative coordinates, so we avoid the need to select a reference point.
Since we can now obtain the dipole moment and polarizability with HotPP, we can calculate the infrared (IR) absorption spectrum and Raman spectrum of liquid water. We separately trained a machine learning potential to learn the energies, forces, and stresses of liquid water to drive the dynamics simulations. With this potential, we performed a classical MD simulation under ambient conditions (300 K, 1 bar) for 100 ps and calculated the dipole moment and the polarizability tensor every 1 fs. We then computed the IR and Raman spectra by Fourier transforming the autocorrelation functions (ACF) of these quantities, and the results are compared to the experimental data44,45 in Fig. 5. We observe that both the HotPP model and the DeePMD model46,47 closely approximate the experimental IR spectrum. Our results accurately fit the first three peaks, corresponding to the hindered translation, libration, and H-O-H bending, respectively, but there is a long tail in comparison to the experimental data for the O-H stretching mode. This discrepancy may arise from not accounting for quantum effects in our classical molecular dynamics simulation. For the Raman spectrum, our model also gives results in agreement with the experimental data.
a Comparison of the infrared absorption spectra. b Comparison of the reduced anisotropic Raman spectra. The black circles are the experimental data44,45, and the lines are calculated from molecular dynamics trajectories performed with HotPP, MB-pol56, DeePMD46,47, and T-EANN42. Source data are provided as a Source Data file.
Discussion
In this work, we introduce HotPP, an E(n) equivariant high-order tensor message passing network based directly on Cartesian space. Compared to other Cartesian-space-based high-order tensor networks, HotPP can utilize tensors of arbitrary order, providing enhanced expressive power. In contrast to high-order tensor networks based on spherical harmonics and coupled through CG coefficients, HotPP employs simple tensor contraction operations, resulting in fewer parameters and better computational efficiency for small l (Supplementary Note 6). Moreover, the network's output can be a tensor of any order, enabling convenient prediction of vector or tensor properties. With its ability to achieve high accuracy while saving substantial computational time compared to first-principles calculations, HotPP holds great promise for scenarios such as molecular dynamics simulations and structure optimizations, where exploration of potential energy landscapes is essential. In future work, we will investigate approaches to eliminate redundancies in high-order Cartesian tensors to further enhance accuracy. Additionally, due to its E(n) equivariance (rather than E(3)), HotPP can be explored for high-dimensional structure optimization48 to expedite potential energy surface exploration. It can also serve as a foundation for generative models to directly generate structures or predict wave functions. Overall, HotPP is a promising neural network that we believe can facilitate further explorations in physical chemistry, biology, and related fields.
Methods
Software
All experiments were run with the HotPP software available at https://gitlab.com/bigd4/hotpp, git commit be36dae6c2b35148ba214d5626f9960a8eaf5a07. The PyTorch version was 2.0.1+cu117, PyTorch Lightning was version 2.0.7, and Python was version 3.9.17.
Datasets
Ani-1x: The ANI-1x dataset35,36 contains approximately five million diverse molecular conformations obtained through an active learning algorithm. We use 70% of the data in the ani1x.h5 dataset downloaded from https://doi.org/10.6084/m9.figshare.c.4712477.v1 to train the model, and then test on the COmprehensive Machine-learning Potential (COMP6) benchmark34. The reference energies are extracted from the "wb97x_dz.energy" label, and the forces from the "wb97x_dz.forces" label.
Carbon: The GAP-17 dataset37 consists of a training set of 4080 structures and a test set of 450 structures. The reference data were obtained from single-point DFT-LDA computations with dense reciprocal-space meshes. The GAP-20 dataset38 contains 6088 structures calculated with the optB88-vdW dispersion-inclusive exchange-correlation functional. We use 70% of the data as the training set.
Water: This dataset includes the water monomer, water dimer, Zundel cation, and bulk liquid water41; each system contains 1000 configurations, 70% of which are used for training. The water monomer, water dimer, and Zundel cation are calculated at the CCSD/d-aug-cc-pVTZ level, and bulk liquid water at the DFT/PBE-USPP level.
Training details
The maximum output tensor order \({l}_{{{\rm{omax}}}}^{t}\) and the maximum order of the tensor product of the relative coordinates \({{{\rm{l}}}}_{{{\rm{rmax}}}}^{{{\rm{t}}}}\) are both set to 2 in the models; a discussion of the effect of these values can be found in Supplementary Note 4. The number of chemical embedding channels and node message features is 64. The radial function is a 3-layer MLP of dimensions [64, 64, 64] with SiLU nonlinearity, and the basis function is 8 trainable Bessel functions similar to NequIP8. The readout layer is a 2-layer MLP of dimensions [64, 64] with SiLU nonlinearity. The models were trained with the Adam optimizer49 in PyTorch with default parameters. We used a learning rate of 0.01, and the learning rate was reduced using an on-plateau scheduler based on the validation loss with a decay factor of 0.8.
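For reference, these optimizer and scheduler settings correspond to a standard PyTorch configuration; a minimal sketch is given below (model is a placeholder module, and the training loop itself is omitted):

```python
import torch

model = torch.nn.Linear(64, 1)              # placeholder for the HotPP model
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.8)

for epoch in range(100):
    # ... optimize on the training set with optimizer.step() ...
    val_loss = torch.rand(1).item()         # placeholder for the epoch's validation loss
    scheduler.step(val_loss)                # multiply the lr by 0.8 when the loss plateaus
```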
Ani-1x: We used 5 propagation layers, a radial cutoff of 4.5 Å, a batch size of 128, and the following loss function with \({\lambda }_{E}\) = 0.1 and \({\lambda }_{f}=1\):
where N, \(E\), \(\hat{E}\), and \({{{\bf{F}}}}_{i\alpha }\) denote the number of atoms, the target energy, the predicted energy, and the force on atom i in direction \(\alpha \).
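A hedged sketch of such a weighted energy-and-force loss is shown below; the exact normalization may differ from the loss used in the paper:

```python
import torch

def energy_force_loss(E_pred, E_ref, F_pred, F_ref, n_atoms,
                      lambda_E=0.1, lambda_F=1.0):
    """Weighted MSE loss over per-atom energies and force components."""
    loss_E = ((E_pred - E_ref) / n_atoms).pow(2).mean()
    loss_F = (F_pred - F_ref).pow(2).mean()
    return lambda_E * loss_E + lambda_F * loss_F

# toy usage: a batch of 2 structures with 5 atoms each
loss = energy_force_loss(torch.randn(2), torch.randn(2),
                         torch.randn(10, 3), torch.randn(10, 3),
                         n_atoms=torch.tensor([5.0, 5.0]))
```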
Carbon: The cell is multiplied by a unit matrix S before the calculations. We used 5 propagation layers; the difference between the models with different parameter counts is that, for the model with more parameters, we updated both node information and edge information in the propagation layers. We used a radial cutoff of 3 Å, a batch size of 16, and the following loss function with \({\lambda }_{E}\) = 0.5, \({\lambda }_{f}=1\), and \({\lambda }_{v}=0.2\):
where \({{{\boldsymbol{V}}}}_{\alpha \beta }\) and \({{{\boldsymbol{s}}}}_{\alpha \beta }\) denote the \(\alpha,\,\beta \) components of the virial and of the unit matrix S, respectively.
The NequIP model was trained on the same dataset with an NVIDIA GeForce RTX 4090 24GB. We used 4 layers with 64 channels of even and odd parity for each of l = 1, l = 2, and l = 3. Radial features were generated using 8 trainable Bessel basis functions and a polynomial envelope for the cutoff with p = 6; the numbers of invariant layers and neurons were set to 2 and 64. These hyperparameter settings result in parameter counts of 389k, 971k, and 1,970k, respectively. The models were trained with the Adam optimizer with default parameters. We initialized the learning rate to 0.01 and used an on-plateau scheduler based on the validation loss with a patience of 100 and a decay factor of 0.8. We used an exponential moving average with weight 0.99.
Water: We used 4 propagation layers, a radial cutoff of 4 Å, a batch size of 4, and the following loss functions:
where \({{\bf{P}}}{{\boldsymbol{,}}}\,\hat{{{\bf{P}}}}{{\boldsymbol{,}}}\,{{\boldsymbol{\alpha }}}{{\boldsymbol{,}}}\,\hat{{{\boldsymbol{\alpha }}}}\) are the target dipoles, predicted dipoles, target polarizabilities, and predicted polarizabilities.
Molecular dynamics simulations
We constructed the PES of liquid water using HotPP with the dataset provided by DeePMD50, including 1888 structures computed with the strongly constrained and appropriately normed (SCAN) functional51. Our model gives RMSEs of 2 meV/atom for the per-atom energy, 49 meV/Å for the forces, and 11 meV/atom for the per-atom virials.
Next, we used LAMMPS52 to perform machine learning MD on a system of 512 water molecules in a cubic supercell of ≈24.8 Å to keep the density close to 1 \({{\rm{g\; c}}}{{{\rm{m}}}}^{-3}\). The system was equilibrated at ambient conditions using a Nosé-Hoover chain thermostat53,54 for 200 ps. Ten statistically independent initial conditions were then sampled from the last 100 ps of the simulation to initialize NVE trajectories of 200 ps length using a time step of 0.5 fs. The initial and final configurations of these 10 simulations can be found in Supplementary Data 1. The dipole moment and polarizability tensor were calculated every 1 fs.
Calculation of infrared absorption spectrum and Raman spectrum
We can obtain different types of vibrational spectra by Fourier transforming the time autocorrelation functions (ACF) of different physical properties along a trajectory sampled by molecular dynamics. The IR absorption spectrum can be computed from the ACF of the dipole moment as:
where \(\mu \) is the dipole moment, \(\omega \) is the vibrational frequency, and the brackets denote an average over time origins.
For the Raman spectrum, the isotropic component can be calculated by:
where \({\alpha }_{{iso}}\equiv {Tr}(\alpha )/3\) is the isotropic component of the polarizability tensor, and
where \(\beta \) is the anisotropic traceless tensor \(\beta=\alpha -{\alpha }_{{iso}}I\).
The IR spectrum and Raman spectrum are scaled to make the maximum values consistent with the experimental data.
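A minimal NumPy sketch of this procedure for the IR spectrum is shown below (the Raman components follow the same pattern with \({\alpha }_{{iso}}\) and \(\beta \)); physical prefactors and windowing are omitted, so only peak positions and relative intensities are meaningful:

```python
import numpy as np

def ir_spectrum(dipoles, dt_fs=1.0):
    """IR lineshape from the autocorrelation of a dipole time series.

    dipoles: (n_steps, 3) array sampled every dt_fs femtoseconds.
    Returns (wavenumber in cm^-1, intensity in arbitrary units).
    """
    mu = dipoles - dipoles.mean(axis=0)
    n = len(mu)
    # <mu(0) . mu(t)> averaged over time origins, summed over Cartesian components
    acf = sum(np.correlate(mu[:, k], mu[:, k], mode="full")[n - 1:] for k in range(3))
    acf = acf / np.arange(n, 0, -1)
    intensity = np.abs(np.fft.rfft(acf))
    wavenumber = np.fft.rfftfreq(n, d=dt_fs * 1e-15) / 2.99792458e10   # Hz -> cm^-1
    return wavenumber, intensity / intensity.max()

# toy usage: a noisy 100 THz oscillation gives a peak near ~3336 cm^-1
t = np.arange(0, 20000) * 1e-15
mu = np.stack([np.cos(2 * np.pi * 1e14 * t)] * 3, axis=1) + np.random.normal(0, 0.1, (20000, 3))
w, s = ir_spectrum(mu)
```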
Data availability
The datasets used in this paper (ANI-1x, carbon, and water) are publicly available (see "Methods"). Source data are provided with this paper.
Code availability
The HotPP code used in the current study is available at GitLab (https://gitlab.com/bigd4/hotpp) and Zenodo (https://doi.org/10.5281/zenodo.12952612), ref. 55.
References
Hohenberg, P. & Kohn, W. Inhomogeneous electron gas. Phys. Rev. 136, B864–B871 (1964).
Behler, J. & Parrinello, M. Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett. 98, 146401 (2007).
Bartók, A. P., Payne, M. C., Kondor, R. & Csányi, G. Gaussian approximation potentials: the accuracy of quantum mechanics, without the electrons. Phys. Rev. Lett. 104, 136403 (2010).
Shapeev, A. V. Moment tensor potentials: a class of systematically improvable interatomic potentials. Multiscale Model. Simul. 14, 1153–1173 (2016).
Zhang, L., Han, J., Wang, H., Saidi, W. A. & Car, R. End-to-end Symmetry Preserving Inter-atomic Potential Energy Model for Finite and Extended Systems. In Proceedings of the 32nd International Conference on Neural Information Processing Systems. 4441–4451 (Curran Associates Inc., Red Hook, NY, USA, 2018).
Drautz, R. Atomic cluster expansion for accurate and transferable interatomic potentials. Phys. Rev. B 99, 014104 (2019).
Fan, Z. et al. GPUMD: a package for constructing accurate machine-learned potentials and performing highly efficient atomistic simulations. J. Chem. Phys. 157, 114801 (2022).
Batzner, S. et al. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nat. Commun. 13, 2453 (2022).
Smith, J. S. et al. Approaching coupled cluster accuracy with a general-purpose neural network potential through transfer learning. Nat. Commun. 10, 2903 (2019).
Daru, J., Forbert, H., Behler, J. & Marx, D. Coupled cluster molecular dynamics of condensed phase systems enabled by machine learning potentials: liquid water benchmark. Phys. Rev. Lett. 129, 226001 (2022).
Musil, F. et al. Physics-inspired structural representations for molecules and materials. Chem. Rev. 121, 9759–9815 (2021).
Behler, J. Atom-centered symmetry functions for constructing high-dimensional neural network potentials. J. Chem. Phys. 134, 074106 (2011).
Gastegger, M., Schwiedrzik, L., Bittermann, M., Berzsenyi, F. & Marquetand, P. WACSF - weighted atom-centered symmetry functions as descriptors in machine learning potentials. J. Chem. Phys. 148, 241709 (2018).
Fan, Z. et al. Neuroevolution machine learning potentials: combining high accuracy and low cost in atomistic simulations and application to heat transport. Phys. Rev. B 104, 104309 (2021).
Fan, Z. Improving the accuracy of the neuroevolution machine learning potential for multi-component systems. J. Phys.: Condens. Matter 34, 125902 (2022).
Xu, N. et al. Tensorial properties via the neuroevolution potential framework: fast simulation of infrared and raman spectra. J. Chem. Theory Comput. https://doi.org/10.1021/acs.jctc.3c01343 (2024).
Bartók, A. P., Kondor, R. & Csányi, G. On representing chemical environments. Phys. Rev. B 87, 184115 (2013).
Zhang, L. et al. Deep potential molecular dynamics: a scalable model with the accuracy of quantum mechanics. Phys. Rev. Lett. 120, 143001 (2018).
Pozdnyakov, S. N. et al. Incompleteness of atomic structure representations. Phys. Rev. Lett. 125, 166001 (2020).
Drautz, R. Atomic cluster expansion of scalar, vectorial, and tensorial properties including magnetism and charge transfer. Phys. Rev. B 102, 024104 (2020).
Novikov, I. S., Gubaev, K., Podryabinkin, E. V. & Shapeev, A. V. The MLIP package: moment tensor potentials with MPI and active learning. Mach. Learn.: Sci. Technol. 2, 025002 (2021).
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. Neural message passing for Quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning. Vol. 70, 1263–1272 (JMLR.org, Sydney, NSW, Australia, 2017).
Schütt, K. T. et al. SchNet: A continuous-filter convolutional neural network for modeling quantum interactions. Adv. Neural Inf. Process. Syst. 2017, 992–1002 (2017).
Thomas, N. et al. Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds. Preprint at http://arxiv.org/abs/1802.08219 (2018).
Unke, O. T. & Meuwly, M. PhysNet: a neural network for predicting energies, forces, dipole moments, and partial charges. J. Chem. Theory Comput. 15, 3678–3693 (2019).
Schütt, K., Unke, O. & Gastegger, M. Equivariant message passing for the prediction of tensorial properties and molecular spectra. in Proceedings of the 38th International Conference on Machine Learning (eds. Meila, M. & Zhang, T.) 139 9377–9388 (PMLR, 2021).
Gasteiger, J., Groß, J. & Günnemann, S. Directional Message Passing for Molecular Graphs. In International Conference on Learning Representations (ICLR) (2020).
Thölke, P. & Fabritiis, G. D. Equivariant Transformers for Neural Network based Molecular Potentials. In International Conference on Learning Representations (2022).
Haghighatlari, M. et al. NewtonNet: a Newtonian message passing network for deep learning of interatomic potentials and forces. Digital Discov. 1, 333–343 (2022).
Batatia, I. et al. The Design Space of E(3)-Equivariant Atom-Centered Interatomic Potentials. Preprint at http://arxiv.org/abs/2205.06643 (2022).
Batatia, I., Kovacs, D. P., Simm, G. N. C., Ortner, C. & Csanyi, G. MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields. In Advances in Neural Information Processing Systems (eds. Oh, A. H., Agarwal, A., Belgrave, D. & Cho, K.) (2022).
Finkelshtein, B., Baskin, C., Maron, H. & Dym, N. A Simple and Universal Rotation Equivariant Point-cloud Network. In TAG-ML (2022).
Takamoto, S., Izumi, S. & Li, J. TeaNet: universal neural network interatomic potential inspired by iterative electronic relaxations. Comput. Mater. Sci. 207, 111280 (2022).
Simeon, G. & Fabritiis, G. D. TensorNet: Cartesian Tensor Representations for Efficient Learning of Molecular Potentials. In Thirty-seventh Conference on Neural Information Processing Systems (2023).
Smith, J. S., Nebgen, B., Lubbers, N., Isayev, O. & Roitberg, A. E. Less is more: sampling chemical space with active learning. J. Chem. Phys. 148, 241733 (2018).
Smith, J. S. et al. The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules. Sci. Data 7, 134 (2020).
Deringer, V. L. & Csányi, G. Machine learning based interatomic potential for amorphous carbon. Phys. Rev. B 95, 094203 (2017).
Rowe, P., Deringer, V. L., Gasparotto, P., Csányi, G. & Michaelides, A. An accurate and transferable machine learning potential for carbon. J. Chem. Phys. 153, 034702 (2020).
Togo, A. First-principles phonon calculations with phonopy and phono3py. J. Phys. Soc. Jpn. 92, 012001 (2023).
Togo, A., Chaput, L., Tadano, T. & Tanaka, I. Implementation strategies in phonopy and phono3py. J. Phys.: Condens. Matter 35, 353001 (2023).
Grisafi, A., Wilkins, D. M., Csányi, G. & Ceriotti, M. Symmetry-adapted machine learning for tensorial properties of atomistic systems. Phys. Rev. Lett. 120, 036002 (2018).
Zhang, Y. et al. Efficient and accurate simulations of vibrational and electronic spectra with symmetry-preserving neural network models for tensorial properties. J. Phys. Chem. B 124, 7284–7290 (2020).
Zhang, Y., Xia, J. & Jiang, B. REANN: a PyTorch-based end-to-end multi-functional deep neural network package for molecular, reactive, and periodic systems. J. Chem. Phys. 156, 114801 (2022).
Bertie, J. E. & Lan, Z. Infrared intensities of liquids xx: the intensity of the oh stretching band of liquid water revisited, and the best current values of the optical constants of H2O(l) at 25 °C between 15,000 and 1 cm−1. Appl. Spectrosc., 50, 1047–1057 (1996).
Brooker, M. H., Hancock, G., Rice, B. C. & Shapter, J. Raman frequency and intensity studies of liquid H2O, H218O and D2O. J. Raman Spectrosc. 20, 683–694 (1989).
Zhang, C. et al. Modeling liquid water by climbing up jacob’s ladder in density functional theory facilitated by using deep neural network potentials. J. Phys. Chem. B 125, 11444–11456 (2021).
Sommers, G. M., Calegari Andrade, M. F., Zhang, L., Wang, H. & Car, R. Raman spectrum and polarizability of liquid water from deep neural networks. Phys. Chem. Chem. Phys. 22, 10592–10602 (2020).
Pickard, C. J. Hyperspatial optimization of structures. Phys. Rev. B 99, 054102 (2019).
Kingma, D. P. & Ba, J. Adam: A Method for Stochastic Optimization. Preprint at https://doi.org/10.48550/arXiv.1412.6980 (2017).
Zhang, L. et al. Phase diagram of a deep potential water model. Phys. Rev. Lett. 126, 236001 (2021).
Sun, J., Ruzsinszky, A. & Perdew, J. P. Strongly constrained and appropriately normed semilocal density functional. Phys. Rev. Lett. 115, 036402 (2015).
Thompson, A. P. et al. LAMMPS - a flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales. Computer Phys. Commun. 271, 108171 (2022).
Nosé, S. A molecular dynamics method for simulations in the canonical ensemble. Mol. Phys. 52, 255–268 (1984).
Hoover, W. G. Canonical dynamics: equilibrium phase-space distributions. Phys. Rev. A 31, 1695–1697 (1985).
Wang, J. et al. E(n)-equivariant cartesian tensor passing potential: HotPP. Zenodo https://doi.org/10.5281/zenodo.12952612 (2024).
Medders, G. R. & Paesani, F. Infrared and raman spectroscopy of liquid water through “first-principles” many-body molecular dynamics. J. Chem. Theory Comput. 11, 1145–1154 (2015).
Acknowledgements
This work was supported by the National Natural Science Foundation of China (grant number. 12125404), the Basic Research Program of Jiangsu (Grant BK20233001, BK20241253), the Jiangsu Funding Program for Excellent Postdoctoral Talent (2024ZB075), the Postdoctoral Fellowship Program of CPSF (Grant GZC20240695), and the Fundamental Research Funds for the Central Universities. Y. W. was partially supported by the Computational Chemical Sciences Center: Chemistry in Solution and at Interfaces (CSI) funded by DOE Award DE-SC0019394, and used resources of the National Energy Research Scientific Computing Center (NERSC) operated under Contract No.DE-AC02-05CH11231 using NERSC award ERCAP0021510. The calculations were carried out using supercomputers at the High-Performance Computing Center of Collaborative Innovation Center of Advanced Microstructures, the high-performance supercomputing center of Nanjing University.
Author information
Authors and Affiliations
Contributions
J.Sun. conceived and led the project. J.W. and Y.W. deduced the formula and completed the coding. H.Z. helped to optimize the efficiency of the code. Z.Y., Z.L. and J.Shi. trained the model in different datasets. J.W., Y.W., H.-T.W., D.X. and J. Sun wrote the manuscript. All authors discussed the results and commented on the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Wilson Gregory and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, J., Wang, Y., Zhang, H. et al. E(n)-Equivariant cartesian tensor message passing interatomic potential. Nat Commun 15, 7607 (2024). https://doi.org/10.1038/s41467-024-51886-6