Introduction

Machine Learning Interatomic Potentials (MLIPs) are powerful tools for modeling interatomic interactions. They can achieve near-quantum mechanical accuracy at a fraction of the computational cost1 and linear scaling with the number of particles. Thus, highly scalable and accurate MLIPs can enable precise simulations of larger systems and allow computational studies of complex phenomena that would otherwise be computationally inaccessible2,3,4,5. Particularly, equivariant Graph Neural Network (GNN) MLIPs such as Allegro6 and MACE7 are highly expressive models and can learn potential energy surfaces end-to-end from data8. Therefore, these models generalize well even for chemically highly diverse datasets9,10. However, strictly local MLIPs, which assume that atomic interactions are dominated by their immediate environment1, and message-passing MLIPs, which propagate information beyond the immediate environment11, cannot model interactions beyond a strict effective cutoff radius12. Thus, these effectively local MLIPs can accurately capture short-range interactions6, but cannot capture long-range electrostatic interactions, charge transfer, and dispersion effects3,13.

Without additional mechanisms to address long-range interactions, the effective locality of most MLIPs greatly limits their application to many real-world scenarios1,14. Long-range interactions are crucial in several key physical phenomena, including molecular aggregation, protein folding, or the behavior of ionic liquids15,16. For instance, in proteins, long-range interactions have been shown to play a significant role in their structure and function, with most residues participating in such interactions15,17, some of which can theoretically span up to 15 Å18. As a result, even MLIPs with near quantum-level accuracy for short-range interactions would be unable to fully capture the behavior of such proteins without additional schemes considering these long-range effects.

The challenge of modeling long-range effects is a recognized obstacle in developing MLIPs1,19. Thus, several approaches to incorporate long-range effects into GNN MLIPs have been proposed. Reciprocal space methods model long-range interactions by processing structural20 or learned features12,21 in Fourier space. However, these methods based on lattice vectors have limited generalizability, as they struggle with differently oriented structures or other supercells and cannot easily be applied to simulations in realistic conditions2. The Euclidean Fast Attention (EFA) scheme2 overcomes this limitation and respects relevant physical symmetries but requires integrating possible lattice orientations over the unit sphere, increasing computational costs. Methods such as Long-Short-Range Message-Passing, RANGE, and Erwin aim to model long-range interactions by improving the efficiency of message passing by utilizing coarse-grained or hierarchical representations22,23,24. However, these methods are not directly generalizable to bulk systems.

On the other hand, physics-driven approaches have been proposed. The simplest approaches treat short-range interactions with MLIPs separately from long-range interactions. Therefore, long-range contributions, such as van der Waals25 or electrostatic interactions26,27, are subtracted from MLIP training data and added to MLIP predictions. However, long-range interactions often correlate with the immediate environments of atoms. For example, capturing long-range interactions through electrostatic effects requires atomic charges that depend on the dynamic chemical environment. Therefore, methods have since emerged that predict charges28,29 or directly model long-range interactions30, using features of the local environment. Still, these point charge-based methods generally assume chemical locality, which becomes problematic in systems where non-local effects dominate31. Moreover, direct charge prediction often requires additional correction schemes to ensure charge conservation and prevent unphysical behavior28,29. Thus, traditional methods fail to account for, e.g., local bonding environments and the global electrostatic landscape32. The Charge Equilibration Neural Network (CENT)32 was introduced by Ghasemi et al. and later adapted by Ko et al. and Shaidu et al. to address these challenges in coupling long-range and short-range effects. The CENT method globally distributes charges via the Charge Equilibration method (Qeq) based on electrostatic features of the local environment extracted via a Behler-type neural network26. Therefore, the CENT method correlates short- and long-range effects. Moreover, explicitly predicting charge distributions can be advantageous, as charge transfer can be observed, and systems in external electric fields can be simulated using predicted charges. Still, the CENT method does not explicitly account for short-range non-electrostatic interactions. To overcome this issue, the fourth-generation high-dimensional neural network potentials (4GHDNNPs) comprise a second neural network, modeling short-ranged interactions dependent on the charge state3. This method accurately captures global charge distributions in simple systems with non-local effects but requires training an additional neural network on ambiguously defined reference point charges3,13. Thus, alternative methods predict electrostatic features alongside the short-range corrections using a single Behler-type neural network and a more fidelity Qeq scheme without reference charges13 or replace the Qeq method with a self-consistent method to represent electrostatic interactions using well-defined Maximally Localized Wannier Function Centers33. Still, these methods require multi-step training procedures and employ Behler-type neural networks relying on hand-crafted descriptors. Therefore, they are not simple to generalize to chemically diverse datasets8.

Previous machine-learning approaches are often costly, hard to scale, or not simple to generalize to chemically diverse systems. Thus, this work introduces the Charge Equilibration Layer for Long-range Interactions (CELLI), a novel architectural building block for equivariant GNN MLIPs. By generalizing the Qeq method to chemically diverse systems, CELLI enables MLIPs to model long-range interactions and condition short-range interactions on the local charge environment via learned representations. In a series of experiments with crucial charge-transfer and charge-state dependence3, we show that integrating CELLI with Allegro6 and MACE7 can overcome the inherent locality of state-of-the-art MLIPs. Moreover, on the OE62 dataset34, we demonstrate that CELLI can generalize across a more diverse chemical space while only marginally increasing computational costs. In addition, we employ CELLI on subsets of the SPICE dataset35 to prove that it can produce stable molecular dynamics simulations.

Results

Charge equilibration layer for long-range interactions (CELLI)

With the Charge Equilibration Layer for Long-range Interactions (CELLI), we introduce the classical non-local Qeq method to recent expressive equivariant GNN architectures. Similar to Ko et al., we split the total potential energy U = UCoul + ΔU into an electrostatic component UCoul and a correction ΔU. Following the CENT approach32, CELLI leverages the Qeq method, accounting for non-local charge transfer, to compute partial charges Q and electrostatic energy UCoul (Subsection “Charge equilibration method (Qeq)”) using features of the GNN. Subsequently, the GNN learns the correction ΔU dependent on non-local features provided through the equilibrated partial charges embedded by CELLI. Thus, instead of learning charges and the potential energy through separate NNs3, CELLI enables flexible integration of Qeq-based charge prediction and non-local interaction modeling within a single model.

We describe CELLI using the example of the Allegro architecture6, visualized in Fig. 1. Allegro is a strictly local equivariant GNN that learns scalar features \({{\boldsymbol{x}}}_{ij}^{l}\) and tensorial features Vij for the directed edges ij of a graph through a sequence of L tensor-product layers (outlined in Subsection “Graph neural networks”), which we call Interaction Layers in the following.

Fig. 1: Charge equilibration layer for long-range interactions (CELLI).
figure 1

Left: CELLI updates scalar latent features \({{\boldsymbol{x}}}_{ij}^{l}\) from the previous layer l using node species Zi and covalent radii γi. σ+ denotes the generalized softplus function, an elementwise multiplcation, Σk the sum over all incident edges, and \ a split in the feature dimension. The dashed line denotes an optional connection. Right: CELLI included in the strictly local Allegro architecture. Allegro predicts the total energy U = UCoul + ΔU and partial charges Q using scalar \({{\boldsymbol{x}}}_{ij}^{l}\) and tensorial \({V}_{ij}^{l}\) edge features of an input graph with node positions R, node species Z, and connectivity \({\mathcal{N}}\). denotes a weighted residual update. Radii \({\gamma }_{i}^{\exp }\) are only used by CELLI.

To extend Allegro, we insert one instance of CELLI at a location between the Interaction Layers. CELLI embeds and applies the Qeq method to learn long-range electrostatic interactions and partial charges using the latent scalar features. The latent scalar features encode many-body information of the environment to an order increasing with the number of previous Interaction Layers6. As global charge transfer might affect the local electronic structure and thus the local many-body interactions3, CELLI embeds the charge environment into the latent features \({{\boldsymbol{x}}}_{ij}^{l+1}\) passed to the following Interaction Layers. The following Interaction Layers can correlate charge information from neighbor atoms up to an order determined by the number of subsequent Interaction Layers. Thus, the placement of CELLI within the Interactions Layers determines the body-order (additionally, the receptive field for not strictly local models such as MACE36) of the local environment description passed to the Qeq method and the local charge environment description learned by the following Interaction Layers. Finally, the readout layer uses the latent features to predict per-edge energies, summing up to the correction potential ΔU.

Environment embedding

The latent features from the previous l Interaction Layers encode scalar descriptions of the local edge environments. We use a multi-layer perceptron \({{\rm{MLP}}}_{{\mathcal{R}}}\) to predict a per-edge contribution to the electronegativity, and optionally the hardness \(\left({\tilde{\chi }}_{ij},{\tilde{J}}_{ij}^{{\mathcal{R}}}\right)={{\rm{MLP}}}_{{\mathcal{R}}}\left({{\boldsymbol{x}}}_{ik}^{l}\right)\). Summing up the contributions of all directed edges from i to j, multiplied by a species-invariant learnable factor f, yields the electronegativity \({\chi }_{i}=f{\sum }_{k\in {\mathcal{N}}(i)}{\tilde{\chi }}_{ij}\) of particle i.

Species embedding

We encode the particle species to an environmentally independent contribution to the hardnesses \({\tilde{J}}_{i}^{Z}\). To ensure positive hardnesses, we use the generalized soft plus activation function denoted as \({\sigma }_{+}({x}_{1},\ldots ,{x}_{k})=\log (1+\exp ({x}_{1})+\ldots +\exp ({x}_{k}))\) to combine the species-dependent with the environment-dependent hardness contributions. Therefore, we obtain the particle hardness \({J}_{i}={\sigma }_{+}\left({\tilde{J}}_{i}^{Z},{\sum }_{j\in {\mathcal{N}}(i)}{\tilde{J}}_{ij}^{{\mathcal{R}}}\right)\). Appropriate radii can crucially determine whether the optimization converges. Therefore, we base the charge radii on single-bond covalent radii γi37 and learn a positive species-dependent scaling factor \({\tilde{s}}_{i}\) to obtain \({\gamma }_{i}=\frac{{\sigma }_{+}({\tilde{s}}_{i})}{\log (2)}{\gamma }_{i}^{\exp }\). Additionally, we embed species as features ci(Zi) for later use in the charge embedding.

Charge equilibration method (Qeq)

The charge equilibration method takes the environment-dependent electronegativities, species or environment-dependent hardnesses, and species-dependent charge radii to predict partial charges and the coulombic potential (Subsection “Charge equilibration method (Qeq)”). As different systems require different treatments of the long-range electrostatic interactions, the method optionally employs, e.g., the Smooth Particle Mesh method38 for periodic systems.

Charges embedding and latent feature update

We embed the equilibrated charge environment into the scalar features to provide non-local information to the network. Therefore, a first multi-layer perceptron MLPQ generates charge-dependent features yij = MLPQ(Qi, Qj, ci, cj) using the equilibrated charges and species embedding ci of the central and neighbor atoms. These charge-dependent features yij are used by a second multi-layer-perceptron MLPx to update the scalar features from the previous layer \({{\boldsymbol{x}}}_{ij}^{l}\) to \({{\boldsymbol{x}}}_{ij}^{l+1}={{\rm{MLP}}}_{{\boldsymbol{x}}}\left({{\boldsymbol{y}}}_{ij},{{\boldsymbol{x}}}_{ij}^{l}\right){p}_{{\rm{env}}}\left(\left\Vert {{\boldsymbol{R}}}_{i}-{{\boldsymbol{R}}}_{j}\right\Vert \right)\), where penv is a polynomial envelope function.

Benchmark systems with strictly local models

First, we test our approach on four benchmark systems introduced by Ko et al. These systems were constructed to be unsolvable by strictly local and charge-independent methods, and allow for visual inspection to verify that the model does not exhibit unphysical behavior, and include: Carbon Chains, Silver Clusters, Sodium Chloride Clusters and Gold Dimers on MgO(001) surface (Methods, Section “Systems and datasets”). They have also been used in related works, enabling a direct comparison with other approaches based on strictly local models3,13,30.

In almost all cases, we observed significant improvements over the models presented by Ko et al., Shaidu et al., Kim et al. (Table 1). This improvement might arise due to the use of equivariant GNNs instead of Behler-Parrinello Neural Networks utilized in 4GHDNN-based models and a learned embedding of the charge environment. In the Supplementary Information, we performed additional studies indicating that environment charge embedding compared to local charge embedding (see Supplementary Table 3) and inserting CELLI in a central position in the model (see Supplementary Table 4) in most cases yields the model with the highest accuracy. Moreover, in most cases—except for the AuMgO system, where CACE-LR employed an increased cutoff—we achieved substantial accuracy gains not only over the baseline Allegro model, where our RMSE is in some case several orders of magnitude better, but also over the non-4GHDNN-based CACE-LR model, which captures long-range effects solely through local feature augmentation. These results suggest that the improved performance stems from a novel integration of environment-dependent charges via the Qeq mechanism into the equivariant GNN framework, enabling accurate modeling of long-range interactions. Notably, the largest error reductions were achieved with CELLI when using environment-dependent hardnesses, suggesting that this variant should be preferred in future applications. However, in some cases, the improved charge predictions from the environment-dependent hardness version did not lead to substantial changes in force or energy accuracy, indicating that prioritizing charge accuracy may not always result in a better overall model.

Table 1 Root mean square errors (RMSE) in units of meV/atom, meV/Å, and me, for CELLI applied to strictly local Allegro model in comparison to the baseline Allegro model and the previous local descriptor methods 4G3, LRSR13, and CACE-LR30 modeling long-range interactions

In addition, we conducted experiments on these systems to verify whether the model could accurately represent long-range effects and the resulting changes in the PES. Allegro consistently fails these tests, showing significantly higher errors and producing unphysical results (see Fig. 2). In contrast, CELLI yields predictions that closely match DFT calculations and align with theoretical expectations. The results confirm CELLI’s ability to model can effectively capture long-range charge transfer and electrostatics (carbon chains, NaCl clusters), handle differences between charged states (silver clusters), and accurately model energies and forces in charge-sensitive environments (gold dimers on MgO(001) surfaces). These findings underscore CELLI’s strength in representing critical phenomena in diverse systems.

Fig. 2: Long-range and charge-dependent interactions benchmarks.
figure 2

a Visualization of three benchmark systems used in experiments: gold dimers on a MgO(001) surface, positively and negatively charged silver clusters, and sodium chloride clusters. In the gold dimers, colors represent atom types (Ag - yellow, Mg - green, O - red). For the other systems, colors visualize partial charges (red corresponding to positive and blue to negative charge). b Predicted minimum energy conformations for charged silver clusters. The baseline Allegro fails to distinguish between charge states, while CELLI-enhanced Allegro closely matches the DFT predicted minimum energy conformations. c Relative energies for \({{\rm{Na}}}_{8}{{\rm{Cl}}}_{8}^{+}\) and \({{\rm{Na}}}_{9}{{\rm{Cl}}}_{8}^{+}\) clusters as a function of the Na-Na distance along a predefined path (indicated by arrow). CELLI-enhanced Allegro closely reproduces DFT energy profiles and correctly identifies distinct minima for the two charge states, unlike baseline Allegro. d Predicted bond energies for a gold dimer on an MgO(001) surface in the upright (non-wetting) geometry, with and without Al doping. CELLI-enhanced Allegro matches DFT results for both cases, while baseline Allegro fails to differentiate between doped and undoped substrates.

Long-range interactions for message-passing models

To demonstrate that CELLI is applicable beyond strictly local architectures, we integrate it into the message-passing network MACE7. Using the scalar node features \({h}_{i}^{(l)}\) and bessel radial basis edge embeddings erbf,ij (see Methods, Section “Graph neural networks”), we construct edge features \({x}_{ij}^{(0)}=({p}_{{\rm{env}}}({r}_{ij}){h}_{i}| | {e}_{{\rm{rbf}},ij})\), where penv is the envelope function of the model and is a concatenation. The output \({x}_{ij}^{(1)}\) from CELLI is aggregated to a weighted residual update of the scalar node features \({h}_{i}^{(l+1)}={h}_{i}^{(l)}+\varepsilon {\sum }_{j\in {\mathcal{N}}(i)}{x}_{ij}^{(1)}\), where ε is a learnable weight for message aggregation.

We assess the effect of CELLI using the benchmarks from the previous section, excluding the silver clusters due to already being fully contained in the receptive field of the strictly local Allegro model. Additionally, we compare our results to SpookyNet31 (Table 2).

Table 2 Root mean square errors (RMSE) in units of meV/atom, meV/Å, and me, for CELLI applied to the message-passing model MACE vs. baseline MACE and SpookyNet31

CELLI significantly enhances the performance of the baseline MACE model, in fact, in some cases it achieves errors almost ten times lower. Both SpookyNet and CELLI(6) achieve particularly low errors compared to models with two message-passing steps, likely due to their deeper architectures with six message-passing layers, which allow them to capture complex interactions even in smaller systems. However, this depth may limit their applicability to larger systems, as using many message-passing layers can become computationally impractical6.

Interestingly, while CELLI tends to produce significantly larger errors in partial charge predictions compared to SpookyNet, these discrepancies do not consistently correlate with errors in energy or force predictions. This may reflect differences in how each model utilizes charge information and is consistent with previous findings that highlight limitations of models based on charge partitioning schemes13. This observation highlights the decoupling between partial charge prediction accuracy and overall energy/force performance, reinforcing the idea that incorporation of charge partitioning schemes may constrain model expressiveness13. Such a behavior could be due to several factors: first, the predicted charges are solutions to a constrained quadratic minimization problem; second, partial charges do not fully capture the underlying electronic structure relevant to molecular energetics; and third, the incorporation of predefined physical equations, while interpretable, may reduce model flexibility.

Overall, this comparison underscores CELLI’s effectiveness in modeling systems with non-local interactions, even within message-passing neural networks.

Generalization to chemically diverse systems

The previous benchmarks consist of relatively small and simple systems. Therefore, they cannot demonstrate the generalizability across a wide chemical space and the advantageous scalability of our scheme with the size of the system. To this end, we included the OE62 dataset34, which allows us to evaluate the performance of our model on larger and more diverse systems of varying sizes, providing a complementary assessment to the simpler benchmarks focused on single structures with different charge states. In addition, we evaluate the scalability of CELLI by measuring forward-pass times across systems with varying atom counts. We compare three versions of CELLI (two using Allegro and one using MACE) against four Allegro baselines, which differ in model size and the inclusion of an additional tensor product layer. We also include various MPNN architectures, including models that incorporate other long-range correction schemes: Ewald and Neural P3M12,21 as well as baseline DimeNet++ with hyperparameters as in Kosmala et al. (Detailed Results in Supplementary Table 1).

Our results show that CELLI outperformed the baseline Allegro models (Table 3) with only a marginal increase in computational cost (Fig. 3). Notably, even the small CELLI variant outperformed the largest Allegro baseline model, demonstrating that applying the Qeq scheme is more effective than merely increasing model size, as certain effects cannot be captured without an appropriate long-range correction method. In fact introduction of CELLI to strictly local Allegro model makes its performance comparable to MPNN architectures with state-of-the-art long-range correction schemes. Moreover, CELLI combined with Allegro not only improves upon baseline models but also achieves results comparable with state-of-the-art Dimnet++ with Neural P3M and significantly outperforms PaiNN models with Ewald and Neural P3M corrections, which have substantially more parameters and message-passing steps. Reducing the number of parameters and memory requirements of the models helps avoid memory-related issues39 in large-scale simulations. Moreover, CELLI’s compatibility with a strictly local baseline model could increase its potential to scale efficiently across multiple GPUs39. In the case of MACE, the improvement over the baseline is noticeably smaller, which is likely due to good performance of baseline and presence of message passing. While CELLI exhibits a slightly higher MAE than DimeNet++, it achieves a lower RMSE and substantially improved computational efficiency, reducing runtime by approximately a factor of two (Fig. 3). The discrepancy between the RMSE and MAE results could be due to more outliers in DimeNet++. Additionally, the MACE model has a higher potential to achieve high scalability on multi-GPU simulations than DimeNet++ and PaiNN, due to fewer message passing steps. Thus, CELLI offers significant improvements for highly local baseline models with only a marginal increase in computational cost, which is mainly determined by the underlying architecture. Therefore, CELLI promises efficient and accurate MD simulations of large, complex structures.

Table 3 Summary of accuracy for all trained models (cutoff 0.6 nm) on the OE62 dataset compared to other long-range modeling approaches
Fig. 3: Computational cost of CELLI.
figure 3

Average forward pass runtime per structure for the smallest Allegro and MACE models in the baseline (B) and CELLI (C) extended variants, and Dimnet++ for different systems with different numbers of atoms. To obtain a more reliable average, several structure sizes were binned together. a Shows runtime per structure for a single sample, b shows the optimal runtime per structure for batch sizes [1, 10, 25, 50, 100]. Allegro and MACE exhibit significantly better performance for larger structures with a marginal impact of CELLI.

It is worth noting that in both the carbon chain benchmark and the OE62 dataset, CELLI combined with Allegro performs worse than the baseline MACE model without long-range corrections. These datasets feature small organic molecules where electrostatic interactions and long-range effects are not as pronounced as in other benchmark cases. Additionally, the OE62 dataset consists of ground-state geometries, further reducing the relevance of dynamic charge redistribution. Since these systems are largely dominated by local interactions, introducing message passing steps can improve the model more than CELLI, particularly when electrostatics play a limited role. Nevertheless, this case shows that CELLI can generalize to diverse chemical spaces and allows for a comparison to different models and long-range correction schemes.

Verifying simulation stability

Performing MD simulations requires models to be stable for many timesteps. To validate the robustness of CELLI, we perform a series of MD simulations at ambient conditions (see Method section). Therefore, we train a baseline and a CELLI-enhanced Allegro on the SPICE dataset. We selected this dataset because it provides forces for non-equilibrium low and high-energy structures. Therefore, the dataset promotes model stability by providing much information about conformations encountered in MD simulations, compared to the OE62 dataset, which contains only minimum energy structures.

Replacing one interaction layer by CELLI reduced the energy and force mean absolute errors from 15.5 meV/atom to 9.4 meV/atom and from 81.4 meV/Å to 72.5 meV/Å. In the MD simulations, none of the 16 selected structures suffered from instabilities such as broken bonds or overlapping particles for both Allegro variants. Additionally, we analyzed the physical interpretability of the model by comparing the predicted electronegativities and hardnesses against computed references40 and the predicted atomic radii against experimental references37 (see Supplementary Fig. 2). The predictions correlate well with literature values within a physically reasonable margin (see Supplementary Note 1). Therefore, CELLI efficiently increases simulation accuracy at high efficiency for chemically diverse systems, capturing physically meaningful interactions without introducing artifacts for samples unseen in training.

Discussion

This paper presents CELLI, a model-agnostic building block introducing the established Qeq method into highly descriptive equivariant GNN MLIPs. Using equivariant GNNs, CELLI can propose accurate parameters for chemically highly diverse environments. Through the Qeq method, CELLI integrates information about long-range electrostatic interactions and charge transfer into effectively local MLIPs. Therefore, CELLI offers a solution to the long-standing challenge of accurately modeling long-range interactions with MLIPs for chemically diverse systems and applications.

In a series of benchmark cases, we showed that strictly and effectively local MLIPs struggle with modeling long-range electrostatic effects and charge-state dependence. These models can effectively learn complex electrostatic environments through CELLI, significantly enhancing their predictive accuracy and physical validity. Moreover, we showed that CELLI can generalize to chemically diverse datasets and large molecules, marginally increasing the computational costs of the baseline model. Furthermore, in a series of molecular dynamics simulations, we demonstrated that CELLI provides robust predictions for samples unseen in training, which is crucial to running long and stable simulations.

Our method addresses crucial limitations of existing methods to model long-range interactions. On the one hand, by leveraging highly expressive equivariant GNNs, CELLI does not rely on hand-crafted descriptors as used in Behler-Parrinello type Neural Networks26. Thus, CELLI-enhanced models can be trained end-to-end, making the Qeq approach applicable for modeling large and complex chemical systems. Moreover, end-to-end trained CELLI-enhanced models can learn representations for the charge environment, which is crucial to achieve state-of-the-art accuracies for strictly local GNNs. On the other hand, it is also significantly more cost-effective and generally applicable than other proposed machine learning methods. For example, CELLI does not require artificially defined periodicity and anisotropy as seen in lattice-based methods2,12,21, but can be applied to systems with an arbitrary number of periodic dimensions. Additionally, CELLI can be applied to strictly local GNNs, which are highly parallelizable across multiple GPUs39. Therefore, CELLI’s flexibility, in combination with high accuracy, generalizability, and efficiency, makes it ideal to run large-scale and accurate molecular dynamics simulations of complex systems under strict computational cost constraints.

In the feature, we plan to interface CELLI with the large-scale molecular dynamics simulation framework LAMMPS41. LAMMPS provides efficient algorithms for charge equilibration and enables running molecular dynamics simulations in parallel on multiple GPUs. Therefore, this integration would simplify deploying CELLI to large-scale simulations. Moreover, we plan to extend CELLI with other physics-based priors1,42,43, which might further reduce its costs while increasing accuracy and robustness. Finally, we plan to assess CELLI’s capabilities in predicting simulation observables, such as IR spectra in vacuum44 and under electric fields45, which require accurate modeling of dynamics and electrostatics.

Methods

Graph neural networks

Molecular systems can be represented as graphs by describing atoms as nodes and defining edges between neighboring atoms within a fixed cutoff radius, allowing GNNs to learn atom-centered representations. In the first step, GNNs embed this graph, assigning initial features \({{\boldsymbol{h}}}_{i}^{0}\) to the nodes and features xij to the edges from atom species Zi and atom displacements. Subsequently, GNNs encode the graph by iteratively updating edge and node features that are finally read out to obtain node, edge, and graph property predictions.

The popular class of Message-passing neural networks (MPNNs) class, first formalized by Gilmer et al., encodes the graph by iteratively performing message-passing

$${{\boldsymbol{m}}}_{i}^{l+1}=\sum _{j\in {\mathcal{N}}(i)}{{\mathcal{M}}}^{l}({{\boldsymbol{h}}}_{i}^{l},{{\boldsymbol{h}}}_{j}^{l},{{\boldsymbol{x}}}_{ij}),$$
(1)
$${{\boldsymbol{h}}}_{i}^{l+1}={{\mathcal{U}}}^{l}({{\boldsymbol{m}}}_{i}^{l+1},{{\boldsymbol{h}}}_{i}^{l}),$$
(2)

where \({{\mathcal{M}}}^{l}\) and \({{\mathcal{U}}}^{l}\) are learnable functions of the layer l. As messages \({{\boldsymbol{m}}}_{i}^{l}\) contain information from all graph neighbors \(j\in {\mathcal{N}}(i)\) of a particle i, MPNNs pass information of each atom’s neighborhood along the graph. Therefore, message-passing gradually expands the atom’s receptive field and enables the capture of many-body correlations6,46.

Unfortunately, this information propagation complicates parallelized implementations of GNNs, e.g., in large-scale atomistic MD frameworks such as LAMMPS41. Therefore, strictly local architectures such as Allegro6 have been proposed. Conceptually, Allegro updates the directed edge features through the steps

$${{\boldsymbol{w}}}_{ij}^{l+1}=\sum _{k\in {\mathcal{N}}(i)}{{\mathcal{W}}}^{l}\left({{\boldsymbol{x}}}_{ij}^{l},{{\boldsymbol{x}}}_{ik}^{l}\right),$$
(3)
$${{\boldsymbol{x}}}_{ij}^{l+1}={{\mathcal{U}}}^{l}\left({{\boldsymbol{w}}}_{ij}^{l+1},{{\boldsymbol{x}}}_{ij}^{l+1}\right),$$
(4)

where \({{\boldsymbol{w}}}_{ij}^{l+1}\) contains information from all edges that originate from the same node. Corresponding to the message-passing framework, the function \({{\mathcal{W}}}^{l}\) encodes information about the environment of an edge into the update function \({{\mathcal{U}}}^{l}\). However, as two directed edges between nodes can contain different information (\({{\boldsymbol{x}}}_{ij}^{l}\ne {{\boldsymbol{x}}}_{ji}^{l}\)), no information is passed along the graph.

Efficient computation of electrostatic interactions

Electrostatic effects are commonly approximated by coulombic interactions. For a system of N charges Q with Gaussian density, located at the centers of the particles R, the coulombic interaction potential is

$${U}_{{\rm{Coul}}}({\boldsymbol{R}},{\boldsymbol{Q}})=\mathop{\sum }\limits_{i}^{N}\mathop{\sum }\limits_{j > i}^{N}\frac{{\rm{erf}}({\alpha }_{ij}{r}_{ij})}{{r}_{ij}}{Q}_{i}{Q}_{j}+\mathop{\sum }\limits_{i=1}^{N}\frac{2{\alpha }_{ii}}{\sqrt{\pi }}{Q}_{i}^{2},$$
(5)

where \({\alpha }_{ij}=\frac{1}{\sqrt{2}}{({\gamma }_{i}^{2}+{\gamma }_{j}^{2})}^{-1/2}\) depends on the radii γi of the charges separated by a distance \({r}_{ij}=\left\Vert | {{\boldsymbol{R}}}_{i}-{{\boldsymbol{R}}}_{j}\right\Vert\)32. These interactions can extend over larger distances as the interaction decays approximately with the factor 1/r. Moreover, the contributions from distant charges must be accurately captured without truncation or oversimplification1. Therefore, coulombic interactions are more challenging to model efficiently than, e.g., short-ranged van-der-Waals interactions.

Nevertheless, classical approaches have been proposed to model long-range interactions efficiently without computing direct pairwise interactions beyond a small cutoff. Essentially, these methods decompose the interaction potential into a rapidly decaying short-range part and a smooth but slowly decaying long-range part. The methods then treat the short-range part directly like other short-range interactions. However, as the long-range part still accounts for contributions from distant charges, a more efficient computation requires a different treatment. For example, the Fast Multipole Method47 hierarchically groups particles and computes distant interactions between these clusters collectively to achieve a O(N) scaling with respect to the number of particles. Especially for periodic systems, the Smooth Particle Mesh Ewald (SPME) method38 computes long-ranged interactions more efficiently in the reciprocal space. Similar to the short-ranged part in real space, the long-ranged part decays quickly in the reciprocal space and can be truncated without losing accuracy. Additionally, by mapping charges to a grid leveraging B-spline interpolation for smooth gradients and employing fast Fourier transforms, it achieves a computational complexity of \(O(N\log N)\). Notably, SPME is not limited to periodic systems but can be generalized to systems with partial or fully non-periodic boundary conditions, e.g., to treat isolated clusters48.

Charge equilibration method (Qeq)

Several approaches can compute long-ranged electrostatic interactions accurately and efficiently in many systems. Nevertheless, these interactions must be adequately parametrized for the respective systems by assigning partial charges to the atoms. Assigning fixed partial charges can introduce significant errors due to charge transfer induced by changes in the chemical environment32. Therefore, methods with dynamic partial charge assignment are necessary to accurately model molecular systems with significant electrostatic interactions.

To model environment-dependent partial charges, the Charge Equilibration (Qeq) method40 proposes to redistribute charges in the system to minimize the total energy while maintaining charge conservation \(\mathop{\sum }\nolimits_{i = 1}^{N}{Q}_{i}={Q}_{{\rm{tot}}}\). In the Qeq method, the contribution of charges to the total energy

$${U}_{{\rm{Qeq}}}({\boldsymbol{R}},{\boldsymbol{Q}})={U}_{{\rm{Coul}}}({\boldsymbol{R}},{\boldsymbol{Q}})+\mathop{\sum }\limits_{i=1}^{N}\left[{\chi }_{i}{Q}_{i}+\frac{{J}_{ii}}{2}{Q}_{i}^{2}\right]$$
(6)

consists of the coulombic interaction between charges UCoul given in Eq. (5) and a second-order approximation of the charge-core interaction determined by the electronegativities χi and chemical hardnesses Ji. Due to the form of the coulombic interaction, the charge energy is quadratic in Q. Consequently, the minimum of UQeq is the solution of the linear system

$$\left[{\left.\frac{{\partial }^{2}{U}_{{\rm{Coul}}}}{\partial {Q}_{i}\partial {Q}_{j}}\right| }_{{\boldsymbol{R}}}+{J}_{ii}\right]{Q}_{j}=-{\chi }_{i},$$
(7)

subject to the charge conserving equality constraint 1TQ = Qtot. For smaller systems, direct linear solvers can determine the optimal charges within a short runtime. However, due to the cubic scaling with the number of particles O(N3), several other approaches have been proposed to solve the system in quadratic49 or quasi-linear time50, leveraging efficient treatments of long-range interactions outlined in Section “Efficient computation of electrostatic interactions”.

Systems and datasets

The benchmark datasets for long-range and electrostatic interactions comprise four organic and inorganic systems with up to four different species in free and periodic boundary conditions3. For each system, DFT computations of energies and forces were obtained with the PBE functional, while charges were generated with Hirshfeld population analysis. The datasets are available at https://doi.org/10.24435/materialscloud:f3-yh.

The first benchmark system consists of neutral and charged carbon chains. C10H2 is a neutral linear chain of carbons terminated with hydrogen atoms, while \({{\rm{C}}}_{10}{{\rm{H}}}_{3}^{+}\) is obtained by protonating one end of the chain, leading to global charge redistribution. This system highlights how a given model accounts for long-range charge transfer caused by local perturbations.

The second benchmark involves triangular and linear silver trimers (Ag3) with total charges of +1 and −1, respectively. These systems test the model’s ability to handle differences in charge states, geometries, and identification of energetically favorable conformations.

We also evaluated the sodium chloride clusters benchmark, consisting of \({{\rm{Na}}}_{8}{{\rm{Cl}}}_{8}^{+}\) and \({{\rm{Na}}}_{9}{{\rm{Cl}}}_{8}^{+}\). In these systems, moving a sodium atom along a predefined path reveals two distinct energy minima, which are sensitive to long-range electrostatics and charge redistribution and can demonstrate the model’s ability to accurately predict changes in the potential energy surface.

The final benchmark is a periodic system consisting of a gold dimer (Au2) adsorbed on MgO(001) surfaces, both undoped and Al-doped. Two configurations were considered: “wetting,” where both Au atoms lie near Mg atoms, and “non-wetting,” where one Au atom binds to an O atom while the other remains farther away. These configurations assess the model’s ability to capture adsorption energies and forces in charge-sensitive periodic settings.

The OE62 dataset provides a diverse benchmark for the evaluation of our model, as it consists of 62,000 organic molecules extracted from the Cambridge Structural Database, with DFT-optimized geometries at the PBE level, including van der Waals corrections34. OE62 spans a broad chemical space, with up to 174 atoms and 16 elements, offering a comprehensive test case for assessing the scalability and generalizability of models on chemically complex and diverse systems. The dataset is available at https://doi.org/10.14459/2019mp1507656.

The SPICE dataset51 (v2.0.1) spans a large chemical space of peptides and drug-like molecules consisting of 17 different chemical species, including low and high-energy conformations and systems with non-zero net charge. Each sample provides DFT computed energies and forces using the ωB97M − D3(BJ) functional with dispersion correction and the def2-TZVPPD basis set as well as MBIS charges. For this paper, we selected the Amino Acid Ligand, PubChem Sets, DES370K, DES Monomers, and Dipeptides subsets. The full dataset is available at https://doi.org/10.5281/zenodo.10975225.

Model optimization

We performed all experiments in the deep-learning framework JAX using chemtrain52 to train the models. Therefore, we adapted JAX-MD53, and JAX compatible implementations of Allegro54 and DimeNet++55,56, and MACE7,57.

The reference energies in the datasets contain large negative shifts. Therefore, we shift the reference energies U by species-dependent constant shifts Us to obtain the target energies

$${\hat{U}}_{i}={U}_{i}-\mathop{\sum }\limits_{s=1}^{S}{U}_{s}{N}_{s,i}$$
(8)

where Ns,i counts the occurrences of species s in sample i. We determined the shifts Us through a ridge-regression fit to the dataset.

We train the models via the Force Matching method52,58. Therefore, we optimize the parameters θ to minimize the loss function

$${\mathcal{L}}(\theta )=\frac{1}{D}\mathop{\sum }\limits_{i=1}^{D}\left[{\gamma }_{U}{\left\Vert {U}_{\theta }({{\boldsymbol{R}}}_{i})-{\hat{U}}_{i}\right\Vert }^{2}+\frac{{\gamma }_{F}}{3{N}_{i}}{\left\Vert {{\boldsymbol{F}}}_{\theta }({{\boldsymbol{R}}}_{i})-\hat{{\boldsymbol{F}}}\right\Vert }^{2}+\frac{{\gamma }_{Q}}{{N}_{i}}{\left\Vert {{\boldsymbol{Q}}}_{\theta }({{\boldsymbol{R}}}_{i})-\hat{{\boldsymbol{Q}}}\right\Vert }^{2}\right]$$
(9)

between the reference values \(\hat{U},\hat{{\boldsymbol{F}}},\hat{{\boldsymbol{Q}}}\) and the model predictions U, F, Q for D samples R of the training dataset via stochastic optimization using the ADAM optimizer59 and a polynomial step-size schedule with weight decay. The parameters γU, γF, γQ balance the contributions of the targets to the loss and are set problem-specific. We monitor the convergence by empirically estimating the loss on a disjoint validation split and select the parametrization θ that yielded the lowest error on the validation split.

Hyperparameters

Model cutoffs were chosen similar to Ko et al. for the four benchmark systems and to Kosmala et al. for the OE62 dataset (Supplementary Table 2). In the four benchmark systems and for the SPICE dataset, CELLI is replaced by an additional interaction layer to obtain the baseline Allegro model. For the OE62 dataset, CELLI is excluded without replacement (S, L) or replaced by an additional Interaction Layer (S+, L+) to obtain the Allegro baseline variants. For the MACE model, CELLI is always excluded without replacement. DimeNet++ hyperparameters are similar to Kosmala et al., except for the loss function, which is chosen to comply with Eq. (9).

Timing OE62 forward passes

We evaluate the forward pass run times for all models on a single NVIDIA A100. For evaluating the computational performance, we partition the training split at [25, 50, 75, 115, 174] atoms per molecule and choose the 5000 largest structures. For each subset, we choose the maximum number of edges and triplets to account for the maximum required by any sample in the subset. We then time the forward pass for batch sizes of [1, 10, 25, 50, 100] for the ahead-of-time compiled model.

Simulating SPICE systems

We run MD simulations for 16 different systems drawn equally from the four different subsets PubChem Sets, Amino Acid Ligands, Dipeptides, and DES370K. Starting from a randomly selected conformation from the testing set, we run simulations for 1 ns with a step size of 0.5 fs at 300 K using a stochastic thermostat60 with a friction coefficient of 100 ps−1. Therefore, we perform 2 million update steps per model.