Introduction

Machine learning interatomic potentials (MLIPs) have been widely employed in molecular simulations to investigate the properties of a wide range of materials, from small organic molecules to inorganic crystals and biological systems. MLIPs offer a balance between accuracy and efficiency1,2,3,4, occupying a middle ground between first-principles methods such as density functional theory (DFT) and classical interatomic potentials such as the embedded atom method (EAM)5.

Since the introduction of the pioneering Behler–Parrinello neural network (BPNN) potential6 and the Gaussian approximation potential (GAP)7, many successful MLIPs have been developed8,9,10,11,12,13,14,15,16,17,18,19. At the same time, techniques have emerged to assess the reliability of MLIP predictions by quantifying uncertainties and evaluating their influence on downstream properties20,21,22,23,24. Despite this progress, the pursuit of enhancing and developing new MLIPs remains ongoing. The current leading MLIPs in terms of accuracy and stability in molecular dynamics (MD) simulations are those based on graph neural networks (GNNs). These models represent an atomic structure as a graph and perform message passing between atoms to propagate information25. The passed messages can be either scalars10,12, vectors13, or, more generally, tensors16,17,18,19. Using GNNs, a number of universal MLIPs have recently been developed26,27,28,29,30,31,32, covering a wide range of materials with chemical species across the periodic table.

At the core of any MLIP lies a description of the atomic environment, which converts the information contained in an atomic neighborhood to a numerical representation that can then be updated and used in an ML regression algorithm. The representation must satisfy certain symmetries (invariance to permutation of atoms, and invariance or equivariance to translation, rotation, and inversion) inherent to atomic systems. Despite the diverse choice of ML regression algorithms, the atomic environment representation is typically constructed by expanding atomic positions on some basis functions. Early works employ manually crafted basis functions for the bond distances and bond angles, such as the atom-centered symmetry functions used in BPNN6 and ANI-133. More systematic representations have been developed and adopted in GAP7, SNAP34, ACE35, NequIP17, and MACE16, among others. In these models, a representation is obtained by first expanding the atomic environment using a radial basis (e.g., Chebyshev polynomials) and the spherical harmonics angular basis to obtain tensorial equivariant features. The equivariant features are then updated and finally contracted to yield the scalar potential energy.

Another approach seeks to develop atomic representations and MLIPs entirely in Cartesian space, bypassing the need for spherical harmonics. This can potentially offer greater simplicity than models built on spherical representations. The moment tensor potential (MTP) by Shapeev36 adopted this approach, constructing Cartesian moment tensors to capture the angular information of an atomic environment. MTP and ACE have been shown to be closely related, given that a moment tensor can be expressed using spherical harmonics37. TeaNet18 and TensorNet19 have explored the use of Cartesian tensors limited to the second rank to create MLIPs. The recent CACE model38 has extended the atomic cluster expansion to Cartesian space and subsequently built MLIPs on this foundation. These works represent excellent efforts toward building MLIPs using a fully Cartesian representation of atomic environments. However, their performance has not yet matched that of models built using spherical tensors.

In this work, we propose the Cartesian Atomic Moment Potential (CAMP), a systematically improvable MLIP developed in Cartesian space under the GNN framework. Inspired by MTP36, CAMP employs physically motivated moment tensors to characterize atomic environments. It then generates hyper moments through tensor products of these moment tensors, effectively capturing higher-order interactions. We have devised specific construction rules for atomic and hyper moment tensors that direct message flow from higher-rank to lower-rank tensors, ultimately to scalars. This approach aligns with our objective of modeling potential energy (a scalar quantity) and significantly reduces the number of moment tensors, enhancing computational efficiency. These moment tensors are subsequently integrated into a message-passing GNN framework, undergoing iterative updates and refinement to form the final CAMP. To evaluate the performance of the model, we have conducted tests across diverse material systems, including periodic LiPS inorganic crystals17 and bulk water39, non-periodic small organic molecules8, and partially periodic two-dimensional (2D) graphene40. These comprehensive benchmark studies demonstrate that CAMP is an accurate, efficient, and stable MLIP for molecular simulations.

Results

Model Architecture

CAMP is a GNN model that processes atomic structures, utilizing atomic coordinates, atomic numbers, and cell vectors as input, and predicts the total potential energy, stresses on the structure, and the forces acting on individual atoms.

Figure 1 presents a schematic overview of the model architecture. In the following, we discuss the key components and model design choices.

Fig. 1: Schematic overview of the CAMP model architecture.

a An atomic structure is represented as a graph. Each atom is associated with a feature h, and all atoms within a distance of rcut to a center atom i constitute its local atomic neighborhood. b Atomic numbers zi and zj of a pair of atoms and their relative position vector r are expanded into the radial part R and angular part D. c Atomic moment M of the central atom is obtained by aggregating information from neighboring atoms. d Hyper moment H of the central atom is computed by self-interactions between atomic moments. e Hyper moments Ht of different layers t are used to construct the energy of the central atom via some mapping function V.

Atomic graph

An atomic structure is represented as a graph G(V, E) consisting of a set of nodes V and a set of edges E. Each node in V represents an atom, and an edge in E is created between two atoms if they are within a cutoff distance of rcut (Fig. 1a). An atom node i is described by a 3-tuple (ri, zi, hi), where ri is the position of the atom, zi is the atomic number, and hi is the atom feature. The vector rij = rj − ri associated with the edge from atom i to atom j gives their relative position.
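As an illustration, a minimal sketch of the edge construction is given below; the function name and the O(N2) double loop are ours for brevity, and a production implementation would use a cell-list algorithm and, for periodic systems, also loop over image cells.

```python
import numpy as np

# Minimal sketch of building the edge set E under a cutoff for a
# non-periodic structure. Names and the O(N^2) loop are illustrative;
# periodic systems additionally require looping over cell images.
def build_edges(positions: np.ndarray, r_cut: float):
    n = len(positions)
    edges = [(i, j) for i in range(n) for j in range(n)
             if i != j and np.linalg.norm(positions[j] - positions[i]) < r_cut]
    # Edge vector r_ij = r_j - r_i for each directed edge (i, j).
    vectors = {(i, j): positions[j] - positions[i] for (i, j) in edges}
    return edges, vectors
```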

The atom feature hi is a set of tensors that carry two indices u and v, becoming \({{\boldsymbol{h}}}_{uv}^{i}\) in its full form, where u and v denote the channel and rank of each feature tensor, respectively. The use of channels allows for multiple copies of tensors of the same rank, helping to increase the expressiveness of the feature.

The atom features are mixed across different channels using a linear mapping:

$${{\boldsymbol{h}}}_{uv}^{i}=\sum _{{u}^{{\prime} }}{W}_{u{u}^{{\prime} }}{{\boldsymbol{h}}}_{{u}^{{\prime} }v}^{i},$$
(1)

where \({W}_{u{u}^{{\prime} }}\) are trainable parameters.
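For concreteness, the channel mixing of Eq. (1) amounts to a single einsum over the channel index; the sketch below assumes hypothetical shapes (rank-1 features with 32 channels) and is not the actual CAMP implementation.

```python
import torch

n_atoms, n_channels = 8, 32
h = torch.randn(n_atoms, n_channels, 3)          # rank-1 features h^i_{u'v}
W = torch.randn(n_channels, n_channels)          # trainable weights W_{uu'}

# Mix across channels only; the trailing Cartesian indices are untouched.
h_mixed = torch.einsum("uw,nw...->nu...", W, h)  # sum over u' (labeled "w")
```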

Radial basis

The edge length \({r}^{ij}=| {{\boldsymbol{r}}}^{ij}|\) is expanded using a set of radial basis functions, Bu(rij, zi, zj), indexed by the channel u. Following MTP36,41, each basis function is a linear combination of the Chebyshev polynomials of the first kind Qβ:

$${B}_{u}({r}^{ij},{z}^{i},{z}^{j})=\mathop{\sum }\limits_{\beta =0}^{{N}_{\beta }}{W}_{u{z}^{i}{z}^{j}}^{\beta }{Q}^{\beta }\left(\frac{{r}^{ij}}{{r}_{{\rm{cut}}}}\right),$$
(2)

where Nβ denotes the maximum degree of the Chebyshev polynomials. Separate trainable parameters \({W}_{u{z}^{i}{z}^{j}}^{\beta }\) are used for different atom pairs, enabling a customized radial basis that depends on the atomic numbers zi and zj. Moreover, the radial basis Bu allows the model to scale to any number of chemical species without increasing the model size.
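A minimal PyTorch sketch of Eq. (2) is shown below; the species are encoded as integer indices, and all names, shapes, and data are illustrative assumptions rather than the actual CAMP code.

```python
import torch

def chebyshev_basis(x: torch.Tensor, n_max: int) -> torch.Tensor:
    """Chebyshev polynomials of the first kind Q_0..Q_{n_max}, via the
    recurrence Q_{k+1}(x) = 2 x Q_k(x) - Q_{k-1}(x)."""
    Q = [torch.ones_like(x), x]
    for _ in range(2, n_max + 1):
        Q.append(2 * x * Q[-1] - Q[-2])
    return torch.stack(Q[: n_max + 1], dim=-1)     # (..., n_max + 1)

n_species, n_beta, n_channels, r_cut = 3, 8, 32, 5.0
# One weight per (channel, species pair, degree): W^beta_{u z_i z_j}.
W = torch.randn(n_channels, n_species, n_species, n_beta + 1)

r_ij = torch.tensor([1.2, 2.7, 4.9])               # edge lengths
z_i = torch.tensor([0, 1, 2])                      # species index of atom i
z_j = torch.tensor([1, 1, 0])                      # species index of atom j

Q = chebyshev_basis(r_ij / r_cut, n_beta)          # (n_edges, n_beta + 1)
B = torch.einsum("ueb,eb->eu", W[:, z_i, z_j], Q)  # B_u per edge, Eq. (2)
```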

Angular part

The angular information of an atomic environment is contained in the normalized edge vectors, \(\hat{{\boldsymbol{r}}}={\boldsymbol{r}}/r\), where we have omitted the atom indices i and j in \({{\boldsymbol{r}}}^{ij}\) and \({r}^{ij}\) for simplicity. Unlike many existing models that expand it on spherical harmonics, we directly adopt the Cartesian basis. The rank-v polyadic tensor of \(\hat{{\boldsymbol{r}}}\) is constructed as

$${{\boldsymbol{D}}}_{v}=\hat{{\boldsymbol{r}}}\otimes \hat{{\boldsymbol{r}}}\otimes \cdots \otimes \hat{{\boldsymbol{r}}}\quad (v\,{\rm{of}}\,\hat{{\boldsymbol{r}}}),$$
(3)

where ⊗ denotes a tensor product. For example, D0 = 1 is a scalar, \({{\boldsymbol{D}}}_{1}=\hat{{\boldsymbol{r}}}=[{\hat{r}}_{x},{\hat{r}}_{y},{\hat{r}}_{z}]\) is a vector, and \({{\boldsymbol{D}}}_{2}=\hat{{\boldsymbol{r}}}\otimes \hat{{\boldsymbol{r}}}\) is a rank-2 tensor, which can be written in matrix form as

$${{\boldsymbol{D}}}_{2}=\left[\begin{array}{ccc}{\hat{r}}_{x}^{2}&{\hat{r}}_{x}{\hat{r}}_{y}&{\hat{r}}_{x}{\hat{r}}_{z}\\ {\hat{r}}_{y}{\hat{r}}_{x}&{\hat{r}}_{y}^{2}&{\hat{r}}_{y}{\hat{r}}_{z}\\ {\hat{r}}_{z}{\hat{r}}_{x}&{\hat{r}}_{z}{\hat{r}}_{y}&{\hat{r}}_{z}^{2}\end{array}\right],$$
(4)

where \({\hat{r}}_{x}\), \({\hat{r}}_{y}\), and \({\hat{r}}_{z}\) are the Cartesian components of \(\hat{{\boldsymbol{r}}}\) (Fig. 1b). Any Dv is symmetric, as can be seen from the definition in Eq. (3). In CAMP, the angular information of an atomic environment is captured by D0, D1, D2, and so on, which are straightforward to evaluate. In full notation with atom indices, a polyadic tensor is written as \({{\boldsymbol{D}}}_{v}^{ij}\).
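The polyadic tensors of Eq. (3) reduce to repeated outer products, as the following minimal sketch shows (function name and data are ours, for illustration):

```python
import torch

def polyadic(r_hat: torch.Tensor, v: int) -> torch.Tensor:
    """Rank-v polyadic tensor D_v of Eq. (3): v repeated tensor products
    of the unit vector; D_0 is the scalar 1."""
    D = torch.ones(())                  # D_0 = 1
    for _ in range(v):
        D = D.unsqueeze(-1) * r_hat     # outer product appends one index
    return D

r = torch.tensor([1.0, 2.0, 2.0])
r_hat = r / r.norm()
D2 = polyadic(r_hat, 2)                 # the 3x3 matrix of Eq. (4)
assert torch.allclose(D2, D2.T)         # every D_v is symmetric
```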

Atomic moment

A representation of the local atomic environment of an atom is constructed from the radial basis, angular part, and atom feature, which we call the atomic moment:

$${{\boldsymbol{M}}}_{uv,p}^{i}=\sum _{j\in {{\mathcal{N}}}_{i}}{R}_{uv{v}_{1}{v}_{2}}{{\boldsymbol{h}}}_{u{v}_{1}}^{j}{\odot }^{c}{{\boldsymbol{D}}}_{{v}_{2}}^{ij},$$
(5)

where \({{\mathcal{N}}}_{i}\) denotes the set of neighboring atoms within a distance of rcut to atom i. The radial part is obtained by passing the radial basis through a multilayer perceptron (MLP), \({R}_{uv{v}_{1}{v}_{2}}={\rm{MLP}}({B}_{u})\). For different combinations of v, v1, and v2, different MLPs are used. The symbol \({\odot }^{c}\) denotes a degree-c contraction between two tensors. For example, for tensors A and B of rank 2 and rank 3, respectively, \({\boldsymbol{A}}{\odot }^{2}{\boldsymbol{B}}\) means \({C}_{k}={\sum }_{ij}{A}_{ij}{B}_{ijk}\), resulting in a rank-1 tensor. In other words, c is the number of indices contracted away in each of the two tensors involved in the contraction; thus, the output tensor has a rank of v = v1 + v2 − 2c. We denote the combination of v1, v2, and c as a path: p = (v1, v2, c). In Eq. (5), multiple paths can result in output tensors of the same rank v. For example, p = (2, 3, 2) and p = (1, 2, 1) both lead to atomic moment rank v = 1. This is why the index p is used in \({{\boldsymbol{M}}}_{uv,p}^{i}\) to denote the path from which the atomic moment is obtained.

The atomic moment carries significant physical implications. For a physical quantity Q at a distance r from a reference point, the nth moment of Q is defined as rnQ. For example, when Q represents a force and n = 1, rQ corresponds to the torque. Eq. (5) generalizes this concept, with the feature \({{\boldsymbol{h}}}_{u{v}_{1}}^{j}\) of atom j acting as Q, and \({{\boldsymbol{D}}}_{{v}_{2}}^{ij}\) serving as the “distance” r. The radial part \({R}_{uv{v}_{1}{v}_{2}}\) functions as a weighting factor, modulating the relative contribution of different atoms j. This is why we refer to the output in Eq. (5) as the atomic moment.

By construction, the atomic moment is symmetric. This is because, first, both \({{\boldsymbol{h}}}_{u{v}_{1}}^{j}\) (explained below around Eq. (9)) and \({{\boldsymbol{D}}}_{{v}_{2}}^{ij}\) are symmetric; second, an additional constraint c = v1 < v2 is imposed. The constraint means that all indices of \({{\boldsymbol{h}}}_{u{v}_{1}}^{j}\) are contracted away, and thus the output atomic moment has v = v2 − v1 indices, all coming from \({{\boldsymbol{D}}}_{{v}_{2}}^{ij}\). A detailed example of evaluating Eq. (5) is provided in the Supplementary Information (SI).
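To illustrate, the sketch below evaluates a single path p = (v1, v2, c) = (1, 2, 1) of Eq. (5) for one center atom, using torch.tensordot for the degree-c contraction; the shapes and random data are assumptions for illustration only.

```python
import torch

def contract(a: torch.Tensor, b: torch.Tensor, c: int) -> torch.Tensor:
    """Degree-c contraction a (.)^c b: the last c indices of a are
    summed against the first c indices of b."""
    return torch.tensordot(a, b, dims=c)

# One path p = (v1, v2, c) = (1, 2, 1), obeying c = v1 < v2, so the
# output rank is v = v2 - v1 = 1. Four neighbors, one channel.
n_nbr = 4
h_j = torch.randn(n_nbr, 3)        # rank-1 neighbor features h^j_{u v1}
D_ij = torch.randn(n_nbr, 3, 3)    # rank-2 polyadic tensors D^{ij}_{v2}
R = torch.randn(n_nbr)             # radial weights R_{u v v1 v2} per edge

# Weight each neighbor contribution and sum over j, as in Eq. (5).
M = sum(R[j] * contract(h_j[j], D_ij[j], c=1) for j in range(n_nbr))
```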

Atomic moment tensors of the same rank v from different paths are combined using a linear layer:

$${{\boldsymbol{M}}}_{uv}=\sum _{p}{W}_{uv,p}{{\boldsymbol{M}}}_{uv,p},$$
(6)

in which Wuv,p are trainable parameters, and the atom index i is omitted for simplicity.

Hyper moment

From atomic moments, we create the hyper moment as

$${{\boldsymbol{H}}}_{uv,p}={{\boldsymbol{M}}}_{u{v}_{1}}{\odot }^{{c}_{1}}{{\boldsymbol{M}}}_{u{v}_{2}}{\odot }^{{c}_{2}}...{\odot }^{{c}_{n-1}}{{\boldsymbol{M}}}_{u{v}_{n}}.$$
(7)

Atomic moments capture two-body interactions between a center atom and its neighbors. Three-body, four-body, and higher-body interactions are incorporated into the hyper moments via the self-interactions between atomic moments in Eq. (7). The hyper moments can provide a complete description of the local atomic environment by increasing the order of interactions36,37, which is crucial for constructing systematically improvable MLIPs. The hyper moments are analogous to the B-basis used in ACE35 and MACE16.

The primary target of an MLIP model is the potential energy, a scalar quantity. Aligned with this, we construct hyper moments such that information flows from higher-rank atomic moment tensors to lower-rank ones, ultimately to scalars. Specifically, we impose the following rule: let \({{\boldsymbol{M}}}_{u{v}_{n}}\) be the tensor of the highest rank among all atomic moment tensors in Eq. (7); then contractions are allowed only between it and the other atomic moments, but not between \({{\boldsymbol{M}}}_{u{v}_{1}}\) and \({{\boldsymbol{M}}}_{u{v}_{2}}\), and so forth. This design choice substantially reduces the number of hyper moments, which simplifies both the model architecture and the overall training process. Moreover, similar to the atomic moment, the indices of the lower-rank atomic moment tensors \({{\boldsymbol{M}}}_{u{v}_{1}},{{\boldsymbol{M}}}_{u{v}_{2}},\ldots \) are completely contracted away, and therefore we have v = vn − (v1 + v2 + ⋯ + vn−1). This guarantees that Huv,p is symmetric. Multiple contractions can result in hyper moments of the same rank v; they are indexed by p. A detailed example of evaluating Eq. (7) to create hyper moments is provided in the SI.
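As a concrete illustration of this rule, the sketch below builds two hyper moments from assumed atomic moments of ranks 1 and 3; only contractions against the highest-rank tensor are performed.

```python
import torch

M1 = torch.randn(3)         # rank-1 atomic moment
M1b = torch.randn(3)        # another rank-1 atomic moment
M3 = torch.randn(3, 3, 3)   # rank-3 atomic moment (the highest rank)

# Two factors: M1 fully contracted against M3 gives rank v = 3 - 1 = 2.
H2 = torch.tensordot(M1, M3, dims=1)        # H_ab = sum_c M1_c M3_cab
assert H2.shape == (3, 3)

# Three factors: both rank-1 moments contracted against M3 gives
# rank v = 3 - (1 + 1) = 1; M1 and M1b are never contracted together.
H1 = torch.tensordot(M1b, torch.tensordot(M1, M3, dims=1), dims=1)
assert H1.shape == (3,)
```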

Message Passing

The messages to an atom i from all neighboring atoms are chosen to be a linear expansion over the hyper moments:

$${{\boldsymbol{m}}}_{uv}=\sum _{p}{W}_{uv,p}{{\boldsymbol{H}}}_{uv,p}.$$
(8)

Atom features are then updated using a residual connection42 by linearly combining the message and the atom feature of the previous layer:

$${{\boldsymbol{h}}}_{uv}^{t}=\sum _{{u}^{{\prime} }}{W}_{1,u{u}^{{\prime} }}{{\boldsymbol{m}}}_{{u}^{{\prime} }v}+\sum _{{u}^{{\prime} }}{W}_{2,u{u}^{{\prime} }}{{\boldsymbol{h}}}_{{u}^{{\prime} }v}^{t-1},$$
(9)

where t is the layer index. Atom features in the input layer \({{\boldsymbol{h}}}_{u{v}^{{\prime} }}^{0}\) consist only of scalars \({h}_{u0}^{0}\), obtained as a u-dimensional embedding of the atomic number z. By construction, \({{\boldsymbol{h}}}_{uv}^{t}\) is symmetric.
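A minimal sketch of Eqs. (8) and (9) for one tensor rank follows; the shapes and the pre-computed hyper moments are hypothetical stand-ins, not the actual implementation.

```python
import torch

n_atoms, n_channels, n_paths = 8, 32, 4
H = torch.randn(n_paths, n_atoms, n_channels, 3)   # H_{uv,p}, rank v = 1
h_prev = torch.randn(n_atoms, n_channels, 3)       # h^{t-1}_{uv}

W_p = torch.randn(n_channels, n_paths)             # W_{uv,p} in Eq. (8)
W1 = torch.randn(n_channels, n_channels)           # W_1 in Eq. (9)
W2 = torch.randn(n_channels, n_channels)           # W_2 in Eq. (9)

# Eq. (8): per-channel linear expansion over paths p.
m = torch.einsum("up,pnu...->nu...", W_p, H)
# Eq. (9): residual update mixing channels of message and old feature.
h_t = (torch.einsum("uw,nw...->nu...", W1, m)
       + torch.einsum("uw,nw...->nu...", W2, h_prev))
```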

Output

The message passing is performed T times. We find that T = 2 or 3 is typically sufficient, based on the benchmark studies discussed in the following sections. The atomic energy of atom i is then obtained from the scalar atom features \({h}_{u0}^{i,t}\) from all layers:

$${E}^{i}=\mathop{\sum }\limits_{t=1}^{T}V({{\boldsymbol{h}}}_{u0}^{i,t}).$$
(10)

V is set to an MLP for the last layer T, and a linear function for the other layers, that is, \(V({{\boldsymbol{h}}}_{u0}^{i,t})=\sum _{u}{W}_{u}^{t}{{\boldsymbol{h}}}_{u0}^{i,t}\) for t < T. The atomic energies of all atoms are summed to obtain the total potential energy

$$E=\sum _{i}{E}^{i}.$$
(11)

Forces on atom i can then be computed as

$${{\boldsymbol{F}}}_{i}=-\frac{\partial E}{\partial {{\boldsymbol{r}}}_{i}}.$$
(12)
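The readout and force evaluation of Eqs. (10)–(12) can be sketched as follows; the position-dependent scalar features are a stand-in for the message-passing layers, and all names and shapes are illustrative assumptions.

```python
import torch

n_atoms, n_channels, T = 8, 32, 3
pos = torch.randn(n_atoms, 3, requires_grad=True)

# Placeholder scalar features h^{i,t}_{u0} that depend on positions
# (through squared pairwise distances, for illustration only).
sq = (pos[:, None, :] - pos[None, :, :]).pow(2).sum(-1)
h0 = [torch.tanh(sq.sum(dim=1, keepdim=True) / n_atoms).expand(-1, n_channels)
      for _ in range(T)]

W_lin = [torch.randn(n_channels) for _ in range(T - 1)]       # linear V, t < T
mlp = torch.nn.Sequential(torch.nn.Linear(n_channels, 16),    # MLP V, t = T
                          torch.nn.SiLU(), torch.nn.Linear(16, 1))

# Eq. (10): per-layer readouts summed into atomic energies.
E_atom = sum(h0[t] @ W_lin[t] for t in range(T - 1)) + mlp(h0[-1]).squeeze(-1)
E = E_atom.sum()                                              # Eq. (11)
forces = -torch.autograd.grad(E, pos)[0]                      # Eq. (12)
```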

Computational complexity

The most computationally demanding part of CAMP is creating the atomic moments and hyper moments. Let \({v}_{\max }\) represent the maximum rank of tensors used in the model. This implies that all features h, atomic moments M, and hyper moments H have ranks up to \({v}_{\max }\). The time complexity for constructing both the atomic moment in Eq. (5) and the hyper moment in Eq. (7) is \({\mathcal{O}}({3}^{{v}_{\max }})\) (detailed analysis in the SI). In contrast, for models based on spherical tensors (such as NequIP17 and MACE16), the time complexity of the Clebsch–Gordan tensor product is \({\mathcal{O}}({L}^{6})\), where L is the maximum degree of the irreducible spherical tensors. A Cartesian tensor (e.g., M and H) of rank v can be decomposed as a sum of spherical tensors of degrees up to v43; therefore, as far as computational cost is concerned, \({v}_{\max }\) and L can be regarded as equivalent. Asymptotically, as L (or \({v}_{\max }\)) increases, the exponentially scaling Cartesian models become less efficient than the polynomially scaling spherical models. Nevertheless, empirical evidence suggests that, for many material systems, the adopted values of L (or \({v}_{\max }\)) are below five16,17,44. In such cases, Cartesian models should be more computationally efficient. We note that there exist techniques to reduce the computational cost of the Clebsch–Gordan tensor product in spherical models from \({\mathcal{O}}({L}^{6})\) to \({\mathcal{O}}({L}^{3})\), such as reducing SO(3) convolutions to SO(2)45 and replacing SO(3) convolutions with matrix multiplications46. When such techniques are adopted, spherical models can be more efficient.

Inorganic crystals

We first test CAMP on a dataset of inorganic lithium phosphorus sulfide (LiPS) solid-state electrolytes17. Multiple models were trained using varying numbers of training samples (10, 100, 1000, and 2500), validated on 1000 samples, and tested on 5000 samples (training details in Methods). Mean absolute errors (MAEs) of energy and forces on the test set are listed in Table 1. Compared with NequIP17, CAMP yields smaller or equal errors on seven of the eight tasks. The learning curve in Fig. 2 exhibits a linear decrease in both energy and forces MAEs on a log-log scale as the number of training samples increases. These results demonstrate CAMP’s accuracy and its capacity for further improvement with increased data.

Table 1 Performance of CAMP on the LiPS dataset
Fig. 2: Learning curve of CAMP.

On a log-log scale, the MAEs of both energy (meV/atom) and forces (meV/Å) decrease linearly with the training set size.

We further examine the performance in MD simulations, focusing on computing the diffusivity of lithium ions (Li+) in LiPS. Diffusivity is a crucial characteristic for assessing the potential of solid materials as electrolytes in next-generation solid-state batteries. Following existing benchmark studies47, we trained another model on 19,000 structures, using the same validation and test sets discussed above. Five structures were randomly selected from the test set, and an independent MD simulation was performed for each. As seen in Fig. 3, CAMP accurately captures the structural information of LiPS, reproducing the radial distribution function (RDF) and the angular distribution function (ADF) of the S–P–S tetrahedral angle (Fig. S1 in the SI) from ab initio molecular dynamics (AIMD) simulations.

Fig. 3: MD simulation results on LiPS.

a Radial distribution function. b Angular distribution function for the S–P–S tetrahedral angle. c Mean squared displacement of Li+ in LiPS. Five MD simulations are performed, and the calculated Li+ diffusion coefficient is D = (1.08 ± 0.08) × 10−5 cm2/s.

CAMP yields stable MD simulations. It is known that MLIPs that perform well on energy and forces do not necessarily produce high-quality MD simulations; in particular, simulations can collapse due to model instability. Therefore, before calculating the diffusivity, we checked the simulation stability. Stability is measured as the difference between the RDF averaged over a time window and the RDF of the entire simulation (see Methods). CAMP shows excellent stability, remaining stable for the entire simulation time of 50 ps in all five simulations (Table 2). In ref. 47, a small timestep of 0.25 fs was adopted in the MD simulations using the models in Table 2. We also tested CAMP with a larger timestep of 1 fs and found that all five simulations remain stable up to 50 ps (see Fig. S2 in the SI).

Table 2 Results on LiPS for models developed with 19000 training samples

The diffusion coefficient D of Li+ in LiPS calculated from CAMP agrees well with the AIMD result. After confirming the stability, D was calculated using the Einstein equation by fitting a line to the mean squared displacement (MSD) of Li+ versus the correlation time (see Methods). The MSDs of the five MD runs are shown in Fig. 3c, from which the diffusion coefficient is calculated as D = (1.08 ± 0.08) × 10−5 cm2/s. This agrees reasonably well with the AIMD result of 1.37 × 10−5 cm2/s17.

Bulk water

To evaluate CAMP's ability to model complex liquid systems, we test it on a dataset of bulk water39. Dataset and training details are provided in Methods. Root-mean-square errors (RMSEs) of energy and forces on the test set are listed in Table 3. In general, GNN models (REANN48, NequIP17, MACE16, and CACE38) have smaller errors than the single-layer ACE35 and the descriptor-based BPNN39. CAMP achieves the best performance on both energy and forces, with RMSEs of 0.59 meV/atom and 34 meV/Å, respectively. CACE38, a recent model also developed entirely in Cartesian space, demonstrates energy RMSEs comparable to those of CAMP. Similar to CAMP, it first constructs the atomic basis (Eq. (5)), then builds the product basis on top of the atomic basis (Eq. (7)), and iteratively updates the features under a GNN framework to refine the representations. A major difference is that CACE is formulated based on the atomic cluster expansion, while CAMP is designed using the atomic moment. The connections between CAMP and CACE, as well as other models, are further elaborated in Discussion.

Table 3 Model performance on the water dataset

CAMP accurately reproduces experimental structural and dynamical properties of water. Figure 4 shows the oxygen–oxygen RDF of water from MD simulations using CAMP (computation details in Methods), together with experimental observations obtained from neutron diffraction49 and x-ray diffraction50. The RDF by CAMP almost overlaps with the x-ray diffraction result, demonstrating its ability to capture the delicate structural features of water. We also calculated the diffusion coefficient of water (1 g/cm3) at 300 K to be D = (2.79 ± 0.18) × 10−5 cm2/s, in excellent agreement with the AIMD result of 2.67 × 10−5 cm2/s51. The model also shows excellent stability for water, producing stable MD simulations at temperatures up to 1500 K. The RDF and the diffusion coefficient at various high temperatures are presented in Fig. S3 in the SI.

Fig. 4: Oxygen–Oxygen RDF of water.

Experimental neutron diffraction49 and x-ray diffraction50 results are shown for comparison.

We examine the efficiency of CAMP by measuring the number of completed steps per second in MD simulations. It is no surprise that DeePMD9 runs much faster owing to its simpler model architecture, although its accuracy falls short when compared with more recent models (Table 3). Among the more accurate models, CAMP is about 1.4 and 3.6 times faster than CACE38 and NequIP17, respectively, on the water system of 192 atoms. Running speeds for additional models are provided in Table S1 in the SI.

Small organic molecules

The MD17 dataset consists of AIMD trajectories of several small organic molecules8. For each molecule, we trained CAMP on 950 random samples, validated on 50 samples, and used the rest for testing (dataset and training details in Methods). MAEs of energy and forces by CAMP and other MLIPs are listed in Table 4. While NequIP17 generally achieves the lowest errors, CAMP is highly competitive: it surpasses NequIP in energy predictions for three of the seven molecules and ranks second in both energy and force predictions in many of the remaining cases.

Table 4 Model performance on the MD17 dataset

We further tested the ability of CAMP to maintain long-time stable MD simulations. Following the benchmark study in ref. 47, 9500, 500, and 10,000 configurations were randomly sampled for training, validation, and testing, respectively. With the trained models, we performed MD simulations at 500 K for 300 ps and examined the stability by measuring the change in bond lengths during the simulations (see Methods). Figure 5 presents the results for aspirin and ethanol (numerical values in Table S2 in the SI). In general, a small forces MAE does not guarantee stable MD simulations47, as seen from a comparison between ForceNet and GemNet-T: GemNet-T has a much smaller forces MAE, but its stability is not as good as ForceNet's. The forces MAE of CAMP is slightly larger than those of, e.g., GemNet-T and SphereNet, but CAMP maintains stable MD simulations for the entire simulation time of 300 ps, a significant improvement over most existing models.

Fig. 5: Comparison of MAE of forces, MD stability, and running speed on small molecules.

The forces MAE, stability, and running speed are normalized by their maximum values for easier visualization. All numerical values are provided in Table S2 in the SI. Error bars indicate the standard deviation from five MD runs using different starting atomic positions. Computational speed is measured on a single NVIDIA V100 GPU.

In terms of running speed, similar patterns are observed here as in the water system. Models based on simple scalar features, such as SchNet10 and DeepPot-SE52, are faster than those based on tensorial features, although they are far less accurate (Fig. 5). For both aspirin and ethanol, CAMP completes about 33 steps per second in MD simulations on a single NVIDIA V100 GPU, approximately 1.2, 1.4, and 3.9 times faster than GemNet-T53, SphereNet54, and NequIP17, respectively.

The MD17 dataset was adopted due to the excellent benchmark study by Fu et al.47, which provides a consistent basis for comparing different models. However, it has been observed that the MD17 dataset contains significant numerical errors, prompting the introduction of the rMD17 dataset to address this issue55. We evaluated CAMP on the aspirin and ethanol molecules from the rMD17 dataset, and it demonstrates competitive performance. While CAMP has errors higher than those of leading spherical models like NequIP17, Allegro56, and MACE16, it outperforms models such as GAP7, FCHL57, ACE35, and PaiNN13. Detailed results are presented in Table S3 of the SI.

Two-dimensional materials

In addition to periodic systems and small organic molecules, we evaluate the performance of CAMP on partially periodic 2D materials, which exhibit physical and chemical properties distinct from those of their bulk counterparts. Despite their significance, to the best of our knowledge, no standardized benchmark dataset exists for MLIPs for 2D materials. We have thus constructed a new DFT dataset of bilayer graphene, building upon our previous investigations of carbon systems40. See Methods for detailed information on this new dataset.

Widely used empirical potentials for carbon systems, such as AIREBO58, AIREBO-M59, and LCBOP60, have large errors in predicting both the energy and forces of bilayer graphene: the MAEs on the test set against DFT results are on the order of hundreds of meV/atom for energy and hundreds of meV/Å for forces (Table 5). This is not too surprising given that, first, these potentials are general-purpose models for carbon systems, not specifically designed for multilayer graphene; and second, their training data consisted of experimental properties and/or DFT calculations using density functionals different from that used for the test set. MLIPs such as hNN40 (a hybrid model that combines BPNN6 and Lennard–Jones61), which was trained on the same data as used here, can significantly reduce the errors, to 1.4 meV/atom for energy and 46.0 meV/Å for forces. CAMP further drives the MAEs down to 0.3 meV/atom for energy and 6.3 meV/Å for forces.

Table 5 Results on the bilayer graphene dataset

It is interesting to investigate the interlayer interaction between the graphene layers, which controls many structural, mechanical, and electronic properties of 2D materials62,63. Here, we focus on the energetics in different stacking configurations: AB, AA, and saddle point (SP) stackings (Fig. 6a). Empirical models such as LCBOP cannot distinguish between the different stacking states at all. The interlayer energy versus layer distance curves are almost identical between AB and AA (Fig. 6b), and the generalized stacking fault energy surface is nearly flat (not shown), with a maximum value on the order of 0.01 meV/atom. This is also the case for AIREBO and AIREBO-M (we refer to ref. 40 for plots). On the contrary, CAMP and hNN can clearly distinguish between the stackings. Both the interlayer energy versus layer distance curves and the generalized stacking fault energy surface agree well with DFT references, and CAMP has a slightly better prediction of the energy barrier (ΔESP-AB) and the overall energy corrugation (ΔEAA-AB). The hNN model is specifically designed as an MLIP for 2D materials, and its training process is complex, requiring separate training of the Lennard–Jones and BPNN components. In contrast, CAMP does not require such special treatment and is straightforward to train.

Fig. 6: Interlayer energetics of bilayer graphene.

a Bilayer graphene in AB, AA, and saddle point (SP) stacking, where the blue solid dots represent atoms in the bottom layer, and the green hollow dots represent atoms in the top layer. b Interlayer energy Eint versus layer distance d of bilayer graphene in AB and AA stacking. The dots represent DFT results. The layer distance is shifted such that Δd = dd0, where d0 = 3.4 Å is the equilibrium layer distance. c Generalized stacking fault energy of bilayer graphene. This is obtained by sliding the top layer against the bottom layer at a fixed layer distance of d0, where Δa1 and Δa2 indicate the shifts of the top layer against the bottom layer, along the lattice vectors a1 and a2, respectively.

Discussion

In this work, we develop CAMP, a new class of MLIP that is based on atomic moment tensors and operates entirely in Cartesian space. CAMP is designed to be physically inspired, flexible, and expressive, and it can be applied to a wide range of systems, including periodic structures, small organic molecules, and 2D materials. Benchmark tests on these systems demonstrate that CAMP achieves performance surpassing or comparable to current leading models based on spherical tensors in terms of accuracy, efficiency, and stability. It is robust and straightforward to train, without the need to tune a large number of hyperparameters. In all the tested systems, we only need to set four hyperparameters to achieve good performance: the number of channels u, the maximum tensor rank \({v}_{\max }\), the number of layers T, and the cutoff radius rcut.

CAMP is related to existing models in several ways. As shown in ref. 37, ACE35 and MTP36 can be viewed as the same model, with the former constructed in spherical space while the latter in Cartesian space. Both are single-layer models without iterative feature updates. Loosely speaking, MACE16 and CAMP can be regarded as a generalization of the spherical ACE and the Cartesian MTP, respectively, to multilayer GNNs with iterative feature updates and refinement. The atomic moment in Eq. (5) and hyper moment in Eq. (7) are related to the A-basis and B-basis in ACE and MACE. The recent CACE model is related to MACE and CAMP. Both MACE and CACE are based on atomic cluster expansion, but MACE implements this expansion in spherical space while CACE builds it in Cartesian space. Both CAMP and CACE develop features in Cartesian space, but CAMP uses atomic moments while CACE employs atomic cluster expansion. Moreover, CAMP generalizes existing Cartesian tensor models such as TeaNet18 and TensorNet19 that use at most second-rank tensors to tensors of arbitrary rank. Despite these connections, CAMP is unique in its design of the atomic moment and hyper moment, and the selection rules that govern the tensor contractions. These characteristics make CAMP a physically inspired, flexible, and expressive model.

Beyond energy and forces, CAMP can be extended to model tensorial properties. The atom features used in CAMP are symmetric Cartesian tensors, which can be used to output tensorial properties with slight modifications to the output block. For example, NMR tensors can be modeled by selecting and summing the scalar, vector, and symmetric rank-2 tensor components of the hyper moments, rather than using only the scalar component for potential energies. However, the current implementation of CAMP in PyTorch has certain limitations. All feature tensors are stored in full form, without exploiting the fact that they are symmetric. In addition, it is possible to extend CAMP to use irreducible representations of Cartesian tensors (i.e., symmetric traceless tensors), analogous to the spherical irreducible representations used in spherical tensor models. Leveraging these symmetries and irreducible representations could further enhance model accuracy and improve time and memory efficiency. These are directions for future work.

Methods

Dataset

The LiPS dataset17 consists of 25,001 structures of lithium phosphorus sulfide (Li6.75P3S11) obtained from an AIMD trajectory. Each structure has 83 atoms, with 27 Li, 12 P, and 44 S. The data are randomized before splitting into training, validation, and test sets. The water dataset39 consists of 1593 water structures, each with 192 atoms, generated with AIMD simulations at 300 K. The data are randomly split into training, validation, and test sets with a ratio of 90:5:5. The MD17 dataset8 consists of AIMD trajectories of several small organic molecules. The number of data points for each molecule ranges from 133,000 to 993,000. Our new bilayer graphene dataset is derived from ref. 40, with 6178 bilayer graphene configurations from stressed structures, atomic perturbations, and AIMD trajectories. The data were generated using DFT calculations with the PBE functional64, and the many-body dispersion correction method65 was used to account for van der Waals interactions. The data are split into training, validation, and test sets with a ratio of 8:1:1.

Model training

The models are trained by minimizing a loss function of energy and forces. For an atomic configuration \({\mathcal{C}}\), the loss is

$$l(\theta ;{\mathcal{C}})={w}_{{\rm{E}}}{\left(\frac{E-\hat{E}}{N}\right)}^{2}+{w}_{{\rm{F}}}\frac{\mathop{\sum }\nolimits_{i = 1}^{N}\parallel {{\boldsymbol{F}}}_{i}-{\hat{{\boldsymbol{F}}}}_{i}{\parallel }^{2}}{3N},$$
(13)

where N is the number of atoms, E and \(\hat{E}\) are the predicted and reference energies, respectively, and Fi and \({\hat{{\boldsymbol{F}}}}_{i}\) are the predicted and reference forces on atom i, respectively. Both the energy weight wE and force weight wF are set to 1. The total loss to minimize at each optimization step is the sum of the losses of multiple configurations,

$$L(\theta )=\mathop{\sum }\limits_{j=1}^{B}l(\theta ;{{\mathcal{C}}}_{j}),$$
(14)

where B is the mini-batch size.
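A minimal sketch of Eqs. (13) and (14), with hypothetical names and shapes, is given below.

```python
import torch

def config_loss(E_pred, E_ref, F_pred, F_ref, w_E=1.0, w_F=1.0):
    """Per-configuration loss of Eq. (13). F_pred and F_ref have
    shape (N, 3) for a configuration of N atoms."""
    N = F_pred.shape[0]
    e_term = w_E * ((E_pred - E_ref) / N) ** 2
    f_term = w_F * (F_pred - F_ref).pow(2).sum() / (3 * N)
    return e_term + f_term

def batch_loss(batch):
    """Mini-batch loss of Eq. (14); `batch` is an iterable of
    (E_pred, E_ref, F_pred, F_ref) tuples."""
    return sum(config_loss(*cfg) for cfg in batch)
```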

The CAMP model is implemented in PyTorch66 and trained with PyTorch Lightning67. We trained all models using the Adam optimizer68 with an initial learning rate of 0.01, which is reduced by a factor of 0.8 if the validation error does not decrease for 100 epochs. The training batch size and the maximum number of training epochs vary for different datasets. Training is stopped if the validation error does not decrease for a certain number of epochs (see Table S4 in the SI). We use an exponential moving average of the model parameters with a weight of 0.999 for evaluating the validation set and for the final model.

Regarding model structure, Chebyshev polynomials of degrees up to Nβ = 8 are used for the radial basis functions. Other hyperparameters include the number of channels u, maximum tensor rank \({v}_{\max }\), number of layers T, and cutoff distance rcut. Optimal values of these hyperparameters are searched for each dataset, and typical values are around u = 32, \({v}_{\max }=3\), T = 3, and rcut = 5 Å. These result in small, parameter-efficient models with fewer than 125k parameters. Detailed hyperparameters for each dataset are provided in Table S5 in the SI.

Diffusivity

The diffusivity can be computed from MD simulations via the Einstein equation, which relates the MSD of particles to their diffusion coefficient:

$$D=\mathop{\lim }\limits_{t\to \infty }\frac{\left\langle \frac{1}{N}\mathop{\sum }\limits_{i=1}^{N}{\left\vert {{\boldsymbol{r}}}_{i}(t)-{{\boldsymbol{r}}}_{i}(0)\right\vert }^{2}\right\rangle }{2nt},$$
(15)

where the MSD is computed as the ensemble average (denoted by 〈 〉) over the diffusing atoms, ri(t) represents the position of atom i at time t, N is the total number of diffusing atoms, n denotes the number of dimensions (three here), and D is the diffusion coefficient. To solve for D, we employ a linear fitting approach as implemented in ASE69, where D is obtained as the slope of the MSD versus 2nt.
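For illustration, the following sketch performs the linear fit; a single time origin is used for brevity (production analyses average over time origins), and the inputs are hypothetical.

```python
import numpy as np

def diffusion_coefficient(pos: np.ndarray, dt: float, n_dim: int = 3) -> float:
    """Einstein-relation estimate of D from Eq. (15).
    pos: trajectory of the diffusing species, shape (n_frames, n_atoms, 3).
    dt: time between frames."""
    disp = pos - pos[0]                          # displacement from t = 0
    msd = (disp ** 2).sum(axis=2).mean(axis=1)   # MSD per frame
    t = np.arange(len(msd)) * dt
    # D is the slope of the MSD versus 2 * n * t (linear least squares).
    slope, _ = np.polyfit(2 * n_dim * t, msd, deg=1)
    return slope
```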

The diffusivity of Li+ in LiPS was computed from MD simulations under the canonical ensemble using the Nosé–Hoover thermostat. Simulations were performed at a temperature of 520 K for a total of 50 ps, with a timestep of 0.25 fs, consistent with the settings reported in the benchmark study in ref. 47. We similarly computed the diffusivity of water; five simulations were performed at 300 K, each using a timestep of 1 fs and running for a total of 50 ps.

Stability criteria

For periodic systems, stability is monitored via the RDF g(r), such that a simulation is considered “unstable” at time t when47

$$\mathop{\int}\nolimits_{0}^{\infty }\parallel {\langle g(r)\rangle }_{t}^{t+\tau }-\langle g(r)\rangle \parallel \,\,\text{d}\,r > \Delta ,$$
(16)

where 〈 〉 denotes the average over the entire trajectory, \({\langle \cdot \rangle }_{t}^{t+\tau }\) denotes the average over a time window [t, t + τ], and Δ is a threshold. In other words, when the integrated difference between the RDF obtained from the time window and that from the entire trajectory exceeds Δ, the simulation is considered unstable.

This criterion cannot be applied to characterize the stability of MD simulations where large structural changes are expected, such as in phase transitions or chemical reactions. However, for the LiPS system studied here, no such events are expected, and the RDF-based stability criterion is appropriate. We adopted τ = 1 ps and Δ = 1, as proposed in ref. 47.
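In code, the criterion of Eq. (16) amounts to an integrated absolute difference between two RDFs on a common grid; the sketch below assumes pre-computed RDFs (names and inputs are ours).

```python
import numpy as np

def is_unstable(g_window: np.ndarray, g_full: np.ndarray,
                dr: float, delta: float = 1.0) -> bool:
    """Eq. (16): g_window is the RDF averaged over [t, t + tau],
    g_full the RDF of the entire trajectory, both on a grid of bin
    width dr. Unstable if the integrated difference exceeds delta."""
    return float(np.sum(np.abs(g_window - g_full)) * dr) > delta
```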

For molecular systems, the stability is monitored through the bond lengths, and a simulation is considered “unstable” at time T when47

$$\mathop{\max }\limits_{(i,j)\in {\mathcal{B}}}| {r}^{ij}(T)-{b}^{ij}\,|\, >\, \Delta ,$$
(17)

where \({\mathcal{B}}\) denotes the set of all bonds, Δ is a threshold, rij(T) is the bond length between atoms i and j at time T, and bij is the corresponding equilibrium bond length, computed as the average bond length using the reference DFT data. For the MD17 dataset, we adopted Δ = 0.5 Å as in ref. 47, and the MD simulations were performed at 500 K for 300 ps with a timestep of 0.5 fs, using the Nosé–Hoover thermostat.
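Similarly, Eq. (17) can be evaluated in a few lines; `bonds` and `b_eq` are hypothetical inputs holding the bonded atom pairs and their DFT-averaged equilibrium lengths.

```python
import numpy as np

def max_bond_deviation(pos: np.ndarray, bonds, b_eq: np.ndarray) -> float:
    """Left-hand side of Eq. (17) for one frame: the largest deviation
    of any bond length from its equilibrium value."""
    r = np.array([np.linalg.norm(pos[i] - pos[j]) for i, j in bonds])
    return float(np.abs(r - b_eq).max())

# A simulation is flagged "unstable" once max_bond_deviation(...) > 0.5 (Å).
```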