Ab initio calculation of real solids via neural network ansatz

Li, Xiang; Li, Zhe; Chen, Ji

doi:10.1038/s41467-022-35627-1

Article
Open access
Published: 22 December 2022

Nature Communications volume 13, Article number: 7895 (2022)

Abstract

Neural networks have been applied to tackle many-body electron correlations for small molecules and physical models in recent years. Here we propose an architecture that extends molecular neural networks with the inclusion of periodic boundary conditions to enable ab initio calculation of real solids. The accuracy of our approach is demonstrated in four different types of systems, namely the one-dimensional periodic hydrogen chain, the two-dimensional graphene, the three-dimensional lithium hydride crystal, and the homogeneous electron gas, where the obtained results, e.g. total energies, dissociation curves, and cohesive energies, reach a competitive level with many traditional ab initio methods. Moreover, electron densities of typical systems are also calculated to provide physical intuition of various solids. Our method of extending a molecular neural network to periodic systems can be easily integrated into other neural network structures, highlighting a promising future of ab initio solution of more complex solid systems using neural network ansatz, and more generally endorsing the application of machine learning in materials simulation and condensed matter physics.

Introduction

Solving the many-body electronic structure of real solids from first principles is one of the grand challenges in condensed matter physics and materials science¹. More accurate ab initio solutions can push the limits of our understanding of many fundamental and mysterious emergent phenomena, such as superconductivity, light–matter interaction, and heterogeneous catalysis, to name just a few². The current workhorse method is density functional theory (DFT), whose accuracy depends quite sensitively on the choice of the so-called exchange-correlation functional; unfortunately, there is no systematic route towards the exact functional³,⁴. Other commonly used ab initio quantum chemistry methods, such as the coupled-cluster and configuration interaction theories⁵, can provide more accurate solutions for molecules but face severe difficulty when applied to solid systems because of their high computational complexity. Recently, several breakthroughs have been made in applying these quantum chemistry methods to solids⁶,⁷, driving the study of solid systems towards a new frontier.

Meanwhile, in the last few years, different groups have reported many attempts to tackle the correlated wavefunction problem in molecules or model Hamiltonians using neural network-based approaches⁸,⁹,¹⁰,¹¹,¹²,¹³,¹⁴,¹⁵,¹⁶. The key idea is to use the neural network as the wavefunction ansatz in quantum Monte Carlo (QMC) simulations. The stochastic nature of QMC enables a considerably economical time scaling and efficient parallelization⁶,¹⁷,¹⁸,¹⁹. The expressiveness of a neural network-based ansatz, underpinned by the universal approximation theorem, significantly improves the accuracy of the traditional QMC method. This strategy has proved successful in studying small molecules¹⁰,¹¹,¹²,¹³ in the first and second quantization, and solids in the second quantization¹⁴. However, how to apply such a neural network ansatz to real solids in continuous space, and whether it can describe the long-range electron correlations in extended systems, remain open questions.

Here we propose a powerful periodic neural network ansatz for solids, which combines periodic distance features²⁰ with existing molecular neural networks¹⁰. Based on that, we develop a highly efficient QMC method for accurate ab initio calculation of real solids and general periodic systems. We apply our method to periodic hydrogen chains, graphene, lithium hydride (LiH) crystals, and the homogeneous electron gas. These systems cover a wide range of interests, including material dimensions from one to three, electronic structures from metallic to insulating, and bonding types from covalent to ionic. Standard techniques are employed to reduce finite-size errors. The calculated dissociation curve, cohesive energy, and correlation energy compare well with available experimental values and other state-of-the-art computational approaches. Electron densities of typical systems are further calculated to test our neural network and explore the underlying physics. All the results demonstrate that our method can achieve accurate electronic structure calculations of solid and periodic systems. In parallel with our work, refs. 21, 22 also developed periodic versions of neural networks to study the homogeneous electron gas system and obtained high-accuracy results. A more detailed comparison is discussed in the following sections.

Results

Neural network for a solid system

Periodicity and anti-symmetry are two fundamental properties of the wavefunction of a solid system. The anti-symmetry can be ensured by the Slater determinant, which has been commonly used as the basic block in molecular neural networks. We also approximate the wavefunction by two Slater determinants of one spin-up channel and one spin-down channel,

$$\psi(\mathbf{r})=\mathrm{Det}_{\uparrow}\left[e^{i\mathbf{k}\cdot\mathbf{r}_{\uparrow}}u_{\mathrm{mol}}^{\uparrow}(d)\right]\mathrm{Det}_{\downarrow}\left[e^{i\mathbf{k}\cdot\mathbf{r}_{\downarrow}}u_{\mathrm{mol}}^{\downarrow}(d)\right].$$

(1)

In this regard, our ansatz resembles the structure of FermiNet¹⁰,¹¹, whereas other neural network wavefunction ansatzes may include extra terms in addition to the Slater determinants¹². Each determinant is constructed from a set of periodic orbitals, which inherit the physics captured by the generalized collective Bloch function, formed as the product of a phase factor $e^{i\mathbf{k}\cdot\mathbf{r}}$ and a collective molecular orbital $u_{\mathrm{mol}}$. The generalized many-body Bloch function incorporates electron correlations and goes beyond the single-electron approximation¹⁸.
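To make the role of the determinant concrete, the following is a minimal NumPy sketch of one spin channel of Eq. (1). The collective factor `u` here is a hand-written toy function, not FermiNet, and the names `toy_orbitals` and `psi` are hypothetical. The only properties it is meant to show are that a determinant of Bloch-like orbitals changes sign when two electrons are exchanged, and is unchanged when one electron is translated by a supercell lattice vector, provided the collective factor is built from permutation-symmetric, periodic features.

```python
import numpy as np

def toy_orbitals(r, ks, L):
    """Orbital matrix phi[i, j] = exp(i k_j . r_i) * u_j(r_i; all electrons).

    r:  (n, 3) electron positions in a cubic cell of side L
    ks: (n, 3) wavevectors on the reciprocal grid of that cell
    The factor u is a toy stand-in for a learned collective orbital.
    """
    feat = np.cos(2.0 * np.pi * r / L)               # periodic single-electron features
    pooled = feat.mean(axis=0)                       # symmetric under electron exchange
    strength = 0.1 * (np.arange(ks.shape[0]) + 1)    # makes each orbital distinct
    u = 1.0 + np.outer(feat @ pooled, strength)      # (n_electrons, n_orbitals)
    phase = np.exp(1j * r @ ks.T)                    # plane-wave phase factors
    return phase * u

def psi(r, ks, L):
    """Single-determinant wavefunction value for one spin channel."""
    return np.linalg.det(toy_orbitals(r, ks, L))

L = 3.0
ks = 2.0 * np.pi / L * np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0]], dtype=float)
r = np.random.default_rng(0).uniform(0.0, L, size=(3, 3))
swapped = r[[1, 0, 2]]            # exchange electrons 0 and 1
shifted = r.copy()
shifted[0] += [L, 0.0, 0.0]       # translate one electron by a lattice vector
print(psi(r, ks, L), psi(swapped, ks, L), psi(shifted, ks, L))
```

Exchanging two electrons swaps two rows of the orbital matrix, which is what flips the sign; the full ansatz multiplies one such determinant per spin channel.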

Figure 1 displays more details on the structure of our neural network. Building an efficient and accurate periodic ansatz is the key step in developing ab initio methods for solids. Here we follow the recently proposed scheme of Whitehead et al.²⁰ to construct a set of periodic distance features $d(\mathbf{r})$ using the lattice vectors in real and reciprocal space $(\mathbf{a}_i, \mathbf{b}_i)$,

$$\begin{aligned}d(\mathbf{r})&=\frac{\sqrt{\mathbf{A}\mathbf{M}\mathbf{A}^{T}}}{2\pi},\quad \mathbf{A}=(\mathbf{a}_{1},\,\mathbf{a}_{2},\,\mathbf{a}_{3}),\\ \mathbf{M}_{ij}&=f^{2}(\omega_{i})\delta_{ij}+g(\omega_{i})g(\omega_{j})(1-\delta_{ij}),\quad \omega_{i}=\mathbf{r}\cdot\mathbf{b}_{i}.\end{aligned}$$

(2)

The periodic metric matrix M is constructed by periodic functions f, g, which retain ordinary distances at the origin and regulate them to periodic ones at far distances, ensuring asymptotic cusp form, continuity, and periodicity requirement at the same time.
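As an illustration, Eq. (2) can be evaluated directly with NumPy, using the polynomial forms of $f$ and $g$ given in Eq. (9) of the Methods. This is a sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, the reciprocal vectors are taken to satisfy $\mathbf{a}_i\cdot\mathbf{b}_j = 2\pi\delta_{ij}$, and $\mathbf{A}\mathbf{M}\mathbf{A}^{T}$ is read as $\sum_{ij}\mathbf{M}_{ij}\,\mathbf{a}_i\cdot\mathbf{a}_j$.

```python
import numpy as np

def f(w):
    """Even periodic function of Eq. (9); behaves like |w| near the origin."""
    return np.abs(w) * (1.0 - np.abs(w / np.pi) ** 3 / 4.0)

def g(w):
    """Odd periodic function of Eq. (9); behaves like w near the origin."""
    return w * (1.0 - 1.5 * np.abs(w / np.pi) + 0.5 * np.abs(w / np.pi) ** 2)

def wrap(w):
    """Reduce the argument into [-pi, pi)."""
    return (w + np.pi) % (2.0 * np.pi) - np.pi

def periodic_distance(r, a):
    """Periodic distance d(r) of Eq. (2).

    r: (3,) displacement vector
    a: (3, 3) array whose rows are the lattice vectors a_1, a_2, a_3
    """
    b = 2.0 * np.pi * np.linalg.inv(a).T   # rows b_i with a_i . b_j = 2 pi delta_ij
    w = wrap(b @ r)                        # omega_i = r . b_i
    gw = g(w)
    M = np.outer(gw, gw)                   # off-diagonal entries g(w_i) g(w_j)
    np.fill_diagonal(M, f(w) ** 2)         # diagonal entries f(w_i)^2
    gram = a @ a.T                         # a_i . a_j
    return np.sqrt(np.sum(M * gram)) / (2.0 * np.pi)

a = 4.0 * np.eye(3)                        # simple cubic cell, side 4 (arbitrary units)
r_small = np.array([0.01, -0.02, 0.015])
r_far = np.array([1.3, 0.4, -0.7])
print(periodic_distance(r_small, a), np.linalg.norm(r_small))
print(periodic_distance(r_far, a), periodic_distance(r_far + a[0], a))
```

Close to the origin the feature reduces to the ordinary Euclidean distance, while a shift by any lattice vector leaves it unchanged, which are the two properties stated above.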

**Fig. 1: Sketch of neural network architecture.**

The constructed periodic distance features $d(\mathbf{r})$ can then be fed into molecular neural networks to form the collective orbitals $u_{\mathrm{mol}}$. Specifically, in this work, we use FermiNet¹⁰ as the molecular network, which incorporates electron–electron interactions. The inclusion of all-electron features promotes the traditional single-particle orbitals to collective ones, so the description of the wavefunction and of correlation effects improves while fewer Slater determinants are required. In addition, the wavefunction of a solid system is necessarily complex-valued, and we introduce two sets of molecular orbitals to represent its real and imaginary parts, respectively. The plane-wave phase factors $e^{i\mathbf{k}\cdot\mathbf{r}}$ in Fig. 1 are used to construct the Bloch function-like orbitals, and the corresponding $\mathbf{k}$ points are selected to minimize the Hartree–Fock (HF) energy.

Based on the variational principle, our neural network is trained using the variational Monte Carlo (VMC) approach. To optimize the network efficiently, we adopt a modified version of the Kronecker-factored approximate curvature (KFAC) optimizer²³ implemented by the DeepMind team²⁴, which significantly outperforms traditional energy minimization algorithms. The calculations are also accelerated by efficient, massive parallelization over multiple nodes of high-performance GPUs. More details on the theory, methods, and computations are included in the Methods section and the supplementary information.
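For readers unfamiliar with VMC, the sketch below shows the basic loop on a textbook problem, a single hydrogen atom with the trial wavefunction $\psi_\alpha(\mathbf{r}) = e^{-\alpha|\mathbf{r}|}$ in atomic units. It is not the authors' network or optimizer: there is no neural network and no KFAC step, only Metropolis sampling of $|\psi_\alpha|^2$ and averaging of the local energy $E_L = -\alpha^2/2 + (\alpha-1)/|\mathbf{r}|$. The function names and sampling parameters are assumptions chosen for illustration.

```python
import numpy as np

def local_energy(r, alpha):
    """Local energy (H psi)/psi for psi = exp(-alpha |r|) in the hydrogen atom."""
    d = np.linalg.norm(r, axis=-1)
    return -0.5 * alpha ** 2 + (alpha - 1.0) / d

def vmc_energy(alpha, n_walkers=512, n_steps=400, step=0.5, seed=0):
    """Estimate <E> for a given alpha by Metropolis sampling of |psi|^2."""
    rng = np.random.default_rng(seed)
    r = rng.normal(size=(n_walkers, 3))                  # initial walker positions
    log_p = -2.0 * alpha * np.linalg.norm(r, axis=-1)    # log |psi|^2
    energies = []
    for it in range(n_steps):
        proposal = r + step * rng.normal(size=r.shape)
        log_p_new = -2.0 * alpha * np.linalg.norm(proposal, axis=-1)
        accept = np.log(rng.uniform(size=n_walkers)) < log_p_new - log_p
        r[accept] = proposal[accept]
        log_p[accept] = log_p_new[accept]
        if it >= n_steps // 2:                           # discard the first half as burn-in
            energies.append(local_energy(r, alpha).mean())
    return float(np.mean(energies))

e_exact = vmc_energy(1.0)   # alpha = 1 is the exact ground state
e_trial = vmc_energy(0.8)   # any other alpha gives a higher energy
print(e_exact, e_trial)
```

At $\alpha = 1$ the local energy is constant, so the estimate has zero variance; in a neural network ansatz the same energy estimate is minimized with respect to the network parameters instead of a single $\alpha$.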

Hydrogen chain

The hydrogen chain is one of the simplest models in condensed matter research. Despite its simplicity, it is a challenging and interesting system, serving as a benchmark for electronic structure methods and featuring intriguing correlated phenomena²⁵. The calculated energy of the periodic H₁₀ chain as a function of the bond length is shown in Fig. 2a. The results from lattice-regularized diffusion Monte Carlo (LR-DMC) and traditional VMC are also plotted for comparison²⁵. Our results nearly coincide with the LR-DMC results and significantly outperform traditional VMC (see Supplementary Table 3). In Fig. 2b, the energies of hydrogen chains with different numbers of atoms are calculated for extrapolation to the thermodynamic limit (TDL). The shaded bar in Fig. 2b shows the extrapolated TDL energy of the periodic hydrogen chain from auxiliary field quantum Monte Carlo (AFQMC), which is considered the current state of the art along with LR-DMC. Our TDL result is comparable with both AFQMC and LR-DMC (see Supplementary Table 4).

**Fig. 2: Calculated results of neural network.**

Graphene

Graphene is arguably the most famous two-dimensional system (Fig. 2c), having received broad attention in the past two decades for its mechanical, electronic, and chemical applications²⁶. Here we carry out simulations to estimate its cohesive energy, which measures the strength of C-C chemical bonding and long-range dispersion interactions. The calculations are performed on a 2 × 2 supercell of graphene using twist-averaged boundary conditions (TABC)²⁷ in conjunction with a structure factor $S(\mathbf{k})$ correction²⁸ (see Supplementary Fig. 3) to reduce the finite-size error. The calculated results are plotted in Fig. 2d along with the experimental value²⁹. Our neural network handles graphene very well, producing a cohesive energy within 0.1 eV/atom of the experimental reference (see Supplementary Table 6). We also plot the result with periodic boundary conditions (PBC), namely the Γ point-only result, which deviates from the experimental data by 1.25 eV/atom.

Lithium hydride crystal

For a three-dimensional system, we consider the LiH crystal with a rock-salt structure (Fig. 2e), another benchmark system for accurate ab initio methods⁶,³⁰,³¹. Despite consisting of only simple elements, LiH exhibits both ionic and covalent bonding character as the lattice constant is varied. Using our neural network, we first simulate the equation of state of LiH on a 2 × 2 × 2 supercell, as shown in Fig. 2f. In addition, we employ a standard finite-size correction based on Hartree–Fock calculations of a large supercell (see Supplementary Fig. 5). In Fig. 2f we also show the Birch–Murnaghan fit to the equation of state, from which we obtain thermodynamic quantities such as the cohesive energy, the bulk modulus, and the equilibrium lattice constant of LiH. As shown in the inset, our results for the thermodynamic quantities agree very well with experimental data³⁰ (see Supplementary Tables 8, 9).
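The fitting step can be sketched as follows, assuming the commonly used third-order Birch–Murnaghan form $E(V) = E_0 + \frac{9V_0B_0}{16}\{[(V_0/V)^{2/3}-1]^3B_0' + [(V_0/V)^{2/3}-1]^2[6-4(V_0/V)^{2/3}]\}$; the text does not state which order was used. Because this expression is a cubic polynomial in $x = V^{-2/3}$, an ordinary polynomial fit is sufficient. The data below are synthetic, generated from made-up parameters purely to check the fit, and are not the LiH results of this work; the function names are hypothetical.

```python
import numpy as np

def birch_murnaghan(V, E0, V0, B0, B0p):
    """Third-order Birch-Murnaghan energy as a function of volume."""
    eta = (V0 / V) ** (2.0 / 3.0)
    return E0 + 9.0 * V0 * B0 / 16.0 * ((eta - 1.0) ** 3 * B0p
                                         + (eta - 1.0) ** 2 * (6.0 - 4.0 * eta))

def fit_birch_murnaghan(V, E):
    """Fit E(V) as a cubic in x = V^(-2/3); return (E0, V0, B0) at the minimum."""
    x = V ** (-2.0 / 3.0)
    p = np.polynomial.Polynomial.fit(x, E, 3).convert()
    dp, d2p = p.deriv(1), p.deriv(2)
    roots = dp.roots()
    real_roots = roots[np.isreal(roots)].real
    # Keep the stationary point that is a minimum inside the sampled range.
    x0 = [xr for xr in real_roots if d2p(xr) > 0 and x.min() <= xr <= x.max()][0]
    V0 = x0 ** (-1.5)
    B0 = 4.0 / 9.0 * d2p(x0) * V0 ** (-7.0 / 3.0)   # B0 = V d^2E/dV^2 at V0
    return p(x0), V0, B0

# Synthetic check with made-up parameters (arbitrary consistent units).
V = np.linspace(14.0, 20.0, 9)
E = birch_murnaghan(V, E0=-8.0, V0=17.0, B0=0.02, B0p=3.5)
E0_fit, V0_fit, B0_fit = fit_birch_murnaghan(V, E)
print(E0_fit, V0_fit, B0_fit)
```

The equilibrium lattice constant follows from the fitted $V_0$ through the cell geometry, and the cohesive energy from $E_0$ minus the isolated-atom energies.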

For further validation, we have also directly computed the 3 × 3 × 3 supercell of LiH at its equilibrium lattice constant of 4.061 Å, which contains 108 electrons. To the best of our knowledge, this is the largest electronic system computed using a high-quality neural network ansatz. The 3 × 3 × 3 supercell calculation predicts a total energy per unit cell of −8.160 Hartree and a cohesive energy per unit cell of −4.770 eV after the finite-size correction (see Supplementary Table 10), which is also very close to the experimental value of −4.759 eV³⁰.

Homogeneous electron gas

In addition to solids containing nuclei, our computational framework applies straightforwardly to model systems such as the homogeneous electron gas (HEG). The HEG has long been studied to understand the fundamental behavior of metals and electronic phase transitions³². Several seminal ab initio works have reported the energy of the HEG at different densities²¹,²²,³²,³³,³⁴,³⁵. Recently, two other works have extended the neural network ansatz to study the HEG²¹,²². Although our computational framework was designed independently for solids, the network structures of this work and of refs. 21, 22 employ similar ideas. Different physics-inspired envelope functions and periodic features are used in these works, which suit the features of solids and of the homogeneous electron gas, respectively. We compare these networks with ours on the HEG and observe consistent performance, which further supports the effectiveness of neural network-based QMC. In this section, we present the results calculated on a simple cubic cell containing 54 electrons in a closed-shell configuration, the largest HEG system studied in this work (Fig. 2g). More results and comparisons with other works on smaller systems are discussed in the section “Network comparison” and Supplementary Table 13.

Figure 2h shows our calculated correlation error for the 54-electron HEG at six different densities from $r_s = 0.5$ Bohr to $20.0$ Bohr. The state-of-the-art results, namely VMC with backflow correlation (BF)³³, distinguishable cluster with double excitations (DCD)³⁴, and transcorrelated full configuration interaction quantum Monte Carlo (TC-FCIQMC)³⁵, are also plotted for comparison; the BF-DMC result is often used as the reference energy for the correlation error. Overall, our neural network performs very well, with an error of less than 1% over a wide range of densities (see Supplementary Table 14). Generally, the correlation error increases as the density of the HEG decreases and correlation effects become stronger, a trend that also appears in the DCD calculations.

Electron density

Besides the total energy of solid systems, the electron density is also a key quantity to calculate. For example, the electron density is crucial for characterizing different solids, such as the difference between insulators and conductors and the distinct nature of ionic and covalent crystals. In DFT, the one-to-one correspondence between the many-body wavefunction and the electron density, proved by Hohenberg and Kohn in 1964, suggests that the electron density is a fundamental quantity of materials. However, an interesting survey found that while new functionals in DFT improve the calculated energy, the resulting density can deviate from the exact one³⁶. Here, with our accurate neural network wavefunction, we can also obtain accurate electron densities of solids and provide a valuable benchmark and guidance for method development.

A conductor features free-moving electrons, which move macroscopically under external electric fields. In contrast, electrons in insulators are localized and constrained, causing considerable electrical resistance. In Fig. 3, as an example, we show the calculated electron density of the hydrogen chain at two different bond lengths. For the compressed hydrogen chain ($L = 2$ Bohr), the electron density is rather uniform, with small fluctuations. As the chain is stretched, the electrons become more localized and the density profile shows larger variations. This observation is consistent with the well-known insulator-conductor transition of the hydrogen chain as the bond length is varied. To measure the transition more quantitatively, we further calculate the complex polarization $Z$ as the order parameter for the insulator-conductor transition³⁷. A conducting state is characterized by a vanishing complex polarization modulus, $|Z| \sim 0$, while an insulating state has a finite modulus, $|Z| \sim 1$. According to the calculated results, the insulator-conductor transition of the hydrogen chain occurs at a bond length of around 3 Bohr, which is consistent with previous studies³⁷.
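A minimal sketch of how such an order parameter can be estimated from Monte Carlo samples is given below. It assumes the standard many-body definition $Z = \langle \exp(i\frac{2\pi}{L}\sum_j x_j)\rangle$ for a chain of supercell length $L$; the exact estimator used in this work is not spelled out in the text. The two sample sets are synthetic caricatures (electrons pinned near lattice sites versus electrons spread uniformly), not hydrogen-chain data, and the function name is hypothetical.

```python
import numpy as np

def complex_polarization(x, L):
    """Estimate Z = <exp(i 2 pi / L * sum_j x_j)> from sampled configurations.

    x: (n_samples, n_electrons) electron coordinates along the chain
    L: length of the periodic supercell
    """
    phase = np.exp(1j * 2.0 * np.pi / L * x.sum(axis=1))
    return phase.mean()

rng = np.random.default_rng(0)
n_samples, n_elec, a = 4000, 10, 2.0
L = n_elec * a

# Caricature of an insulator: each electron fluctuates tightly around its own site.
localized = a * np.arange(n_elec) + 0.05 * rng.normal(size=(n_samples, n_elec))
# Caricature of a delocalized state: electrons spread uniformly over the cell.
delocalized = rng.uniform(0.0, L, size=(n_samples, n_elec))

z_loc = abs(complex_polarization(localized, L))
z_del = abs(complex_polarization(delocalized, L))
print(z_loc, z_del)
```

With real VMC samples of $|\psi|^2$ in place of the synthetic arrays, the same average gives the modulus $|Z|$ discussed above.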

**Fig. 3: Electron density of H₁₀ chains.**

Ionic and covalent bonds are the most fundamental chemical bonds in solids. While the physical pictures of these two types of bonding are very different, both originate in the behavior of electrons as the “quantum glue”, and the electron density distribution is a simple way to visualize different bonding types. Here we calculate the electron density of diamond-structured Si, rock-salt NaCl, and LiH crystals at their equilibrium geometries. They are representative of covalent and ionic crystals, and have also been investigated by other high-level wavefunction methods, e.g., AFQMC³⁸. Note that in the calculations of NaCl and Si, a correlation-consistent effective core potential (ccECP) is employed to reduce the cost; it removes the inert core electrons while retaining the behavior of the active valence electrons¹⁵,³⁹.

The electron density of diamond-structured Si in its $(01\bar{1})$ plane is plotted in Fig. 4b. The valence electrons are shared between nearest-neighbor Si atoms, forming apparent Si-Si covalent bonds. In contrast, the valence electrons in the NaCl crystal are located around atoms, as Fig. 4c shows: all the valence electrons are attracted to the Cl atoms, forming effective Na⁺ and Cl⁻ ions in the crystal. The electron density of the LiH crystal is also calculated and plotted in Fig. 4d. LiH is an intermediate case between a typical ionic and a typical covalent crystal. According to our network, the electrons are nearly equally distributed near the Li and H atoms. A detailed Bader charge analysis⁴⁰ shows that the atoms in the crystal become Li$^{0.67+}$ and H$^{0.67-}$ ions, respectively (resolution ~0.015 Å), which deviates slightly from the stable closed-shell configuration (see Supplementary Note 7 for more details).

Network comparison

In refs. 21, 22, neural networks are also used to simulate the homogeneous electron gas, with a different choice of periodic feature function. In Fig. 5 we plot the correlation error computed on the 14-electron HEG system, which can be compared with the results of those works. All three networks go beyond the BF-DMC level for high-density systems. For all systems tested, our correlation errors are about 2% with the TC-FCIQMC result as the reference³⁵, whereas the results of refs. 21, 22 are within 1%. This is understandable: the networks of refs. 21, 22 are specially designed for HEG systems, so they can achieve slightly better accuracy than our network. In their works, multiple phase factors $e^{i\mathbf{k}\cdot\mathbf{r}}$ are used in the constructed orbitals, which improves the expressiveness of the network. In comparison, our network contains an additional exponential decay term, which models the attraction between atoms and electrons in solids containing nuclei (see the Methods section for more details). Furthermore, the choice of periodic distance and the domain of the constructed wavefunction (complex or real-valued) also differ among these three works, which may contribute to the differences in performance. In the future, it would be interesting to combine the insights from these three works and design a better network ansatz for periodic systems.

**Fig. 5: Correlation error of the 14-electron HEG system at different $r_s$.**

Metallic lithium

We have also carried out preliminary calculations on metallic lithium. Real metals remain notoriously difficult for accurate wavefunction approaches⁷,⁴¹,⁴²,⁴³,⁴⁴. The zero gap of a metal leads to a discontinuity in the Brillouin zone integral. As a consequence, a significantly larger simulation cell is required for metals than for insulators to reach the thermodynamic limit. Shortcut approaches for simulating metals have been proposed that employ a special twist angle⁷,⁴³, which helps to reduce the simulation size and the finite-size error. Here we employ our network to simulate lithium in the body-centered cubic (bcc) structure, a typical metal with zero gap. A 2 × 2 × 2 conventional cell of bcc-Li at the Γ point is employed (see Supplementary Table 11). In Supplementary Table 12, we list the computed total energy and cohesive energy. As expected, the error in the cohesive energy of lithium with such a limited supercell is larger than in non-metallic solids such as LiH, and further developments are needed to treat the large finite-size errors in metals.

Discussion

The construction of a wavefunction for solid systems is a crucial but unsolved problem in the neural network community. The core mechanism of our neural network is the use of the periodic distance feature, which promotes molecule neural networks elegantly to the corresponding periodic ones and avoids time-consuming lattice summation. Considering the high-accuracy results obtained in this work, our neural network can be further applied to study more delicate physics and materials problems, such as the phase transitions of solids, surfaces, interfaces, and disordered systems, to name just a few. Our ansatz also offers a flexible extension to other neural networks and an easy integration into traditional computational techniques. The naturally evolved many-body wavefunction from the neural network may provide more physical and chemical insights into emergent phenomena of complex materials.

For the further development of neural network-based QMC, the most crucial task is to enlarge its simulation size while retaining reasonable accuracy, which would allow a more accurate simulation of metals and high-temperature superconductors. Employing pseudopotentials helps to enlarge the simulation size¹⁵, but a better solution is a more efficient neural network, and related work is in progress.

Methods

Supercell approximation

Simulating a solid system requires solving the Schrödinger equation of many electrons within a large bulk. Supercell approximation is usually adopted to simplify the problem, considering a finite number of electrons and nuclei with periodic boundary conditions, whose Hamiltonian reads

$$\begin{aligned}\hat{H}_{S}=&\sum_{i}-\frac{1}{2}\Delta_{i}+\frac{1}{2}\sum_{\mathbf{L}_{S},i,j}^{\prime}\frac{1}{|\mathbf{r}_{i}-\mathbf{r}_{j}+\mathbf{L}_{S}|}\\ &-\sum_{\mathbf{L}_{S},i,I}\frac{Z_{I}}{|\mathbf{r}_{i}-\mathbf{R}_{I}+\mathbf{L}_{S}|}+\frac{1}{2}\sum_{\mathbf{L}_{S},I,J}^{\prime}\frac{Z_{I}Z_{J}}{|\mathbf{R}_{I}-\mathbf{R}_{J}+\mathbf{L}_{S}|},\end{aligned}$$

(3)

where $\mathbf{r}_i$ denotes the spatial position of the $i$th electron in the supercell, $\mathbf{R}_I$ and $Z_I$ are the spatial position and charge of the $I$th nucleus, and $\{\mathbf{L}_S\}$ is the set of supercell lattice vectors, which is usually a subset of the primitive cell lattice vectors $\{\mathbf{L}_p\}$. To simulate the real environment of electrons in solids, the interactions between the particles and their periodic images are also included in $\hat{H}_{S}$, and the prime on a summation means that the $i = j$ terms are omitted for $\mathbf{L}_S = 0$.

The supercell Hamiltonian $\hat{H}_{S}$ is invariant under the translation of any single electron by a vector in $\{\mathbf{L}_S\}$, as well as under a simultaneous translation of all electrons by a vector in $\{\mathbf{L}_p\}$. As a consequence, the ground-state wavefunction $\psi$ must satisfy two periodicity conditions⁴⁵,

$$\begin{aligned}\psi(\mathbf{r}_{1}+\mathbf{L}_{p},\ldots,\mathbf{r}_{N}+\mathbf{L}_{p})&=\exp(i\mathbf{k}_{p}\cdot\mathbf{L}_{p})\,\psi(\mathbf{r}_{1},\ldots,\mathbf{r}_{N}),\\ \psi(\mathbf{r}_{1}+\mathbf{L}_{S},\ldots,\mathbf{r}_{N})&=\exp(i\mathbf{k}_{S}\cdot\mathbf{L}_{S})\,\psi(\mathbf{r}_{1},\ldots,\mathbf{r}_{N}),\end{aligned}$$

(4)

where k_S, k_p denote the momentum vectors reduced in the first Brillouin zone of the supercell and the primitive cell, respectively. Eq. (4) and the anti-symmetry condition together form the fundamental requirements for ψ. As the size of the supercell increases, simulation results gradually converge to the thermodynamic limit of a real solid system.

Wavefunction ansatz

In conventional QMC simulation of solids, Hartree–Fock type wavefunction ansatz composed of Bloch functions is often used, which reads

$$\psi_{\mathbf{k}_{S},\mathbf{k}_{p}}^{\mathrm{HF}}(\mathbf{r})=\mathrm{Det}\left|\begin{array}{ccc}e^{i\mathbf{k}_{1}\cdot\mathbf{r}_{1}}u_{\mathbf{k}_{1}}(\mathbf{r}_{1})&\cdots&e^{i\mathbf{k}_{N}\cdot\mathbf{r}_{1}}u_{\mathbf{k}_{N}}(\mathbf{r}_{1})\\ \vdots&&\vdots\\ e^{i\mathbf{k}_{1}\cdot\mathbf{r}_{N}}u_{\mathbf{k}_{1}}(\mathbf{r}_{N})&\cdots&e^{i\mathbf{k}_{N}\cdot\mathbf{r}_{N}}u_{\mathbf{k}_{N}}(\mathbf{r}_{N})\end{array}\right|.$$

(5)

In order to satisfy Eq. (4), the $\mathbf{k}_i$ in the determinant should lie on the grid of supercell reciprocal lattice vectors $\{\mathbf{G}_S\}$ offset by $\mathbf{k}_S$ within the first Brillouin zone of the primitive cell. Moreover, the $u_{\mathbf{k}}$ functions in Eq. (5) should be invariant under translation by the primitive cell lattice vectors,

$$u_{\mathbf{k}}(\mathbf{r}+\mathbf{L}_{p})=u_{\mathbf{k}}(\mathbf{r}).$$

(6)
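The allowed $\mathbf{k}_i$ can be enumerated explicitly. The sketch below assumes a supercell that is an $n_1\times n_2\times n_3$ repetition of the primitive cell, so that the supercell reciprocal grid is $\sum_i (m_i/n_i)\,\mathbf{b}^p_i$ with integers $0 \le m_i < n_i$. The points are generated inside the primitive reciprocal cell, which is equivalent to the first Brillouin zone up to primitive reciprocal lattice vectors; the function name and the example lattice are hypothetical.

```python
import itertools
import numpy as np

def supercell_kpoints(a_prim, n, k_s=None):
    """k-points on the supercell reciprocal grid, offset by the twist k_s.

    a_prim: (3, 3) array, rows are primitive lattice vectors
    n:      repetitions (n1, n2, n3) of the primitive cell in the supercell
    k_s:    (3,) twist vector; defaults to the Gamma point
    """
    n = np.asarray(n)
    k_s = np.zeros(3) if k_s is None else np.asarray(k_s, dtype=float)
    b_prim = 2.0 * np.pi * np.linalg.inv(a_prim).T   # rows b_i, a_i . b_j = 2 pi delta_ij
    m = np.array(list(itertools.product(*[range(ni) for ni in n])), dtype=float)
    return k_s + (m / n) @ b_prim                    # (n1*n2*n3, 3)

# Example: fcc-like primitive cell (conventional side 4.0, arbitrary units), 2x2x2 supercell.
a_prim = 2.0 * np.array([[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]])
n = (2, 2, 2)
k_s = np.array([0.1, 0.2, 0.3])
ks = supercell_kpoints(a_prim, n, k_s)
a_super = np.asarray(n)[:, None] * a_prim            # supercell lattice vectors

# Every k-point picks up the same phase exp(i k_s . L_S) under a supercell translation.
phases = np.exp(1j * ks @ a_super.T)
print(ks.shape, np.allclose(phases, np.exp(1j * a_super @ k_s)))
```

A determinant built from plane-wave factors on this grid therefore satisfies the second condition in Eq. (4) with twist $\mathbf{k}_S$.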

Following the strategy of FermiNet¹⁰, Bloch functions in Eq. (5) can be promoted with collective distances,

$$e^{i\mathbf{k}\cdot\mathbf{r}_{i}}u_{\mathbf{k}}(\mathbf{r}_{i})\to e^{i\mathbf{k}\cdot\mathbf{r}_{i}}u_{\mathbf{k}}(\mathbf{r}_{i};\mathbf{r}_{\ne i}),$$

(7)

where $\mathbf{r}_{\ne i}$ denotes all the electron coordinates except $\mathbf{r}_i$. These collective orbitals are constructed to be equivariant under electron permutations $P$,

$$P_{i,j}u_{\mathbf{k}_{i}}(\mathbf{r}_{j};\mathbf{r}_{\ne j})=u_{\mathbf{k}_{j}}(\mathbf{r}_{i};\mathbf{r}_{\ne i}),$$

(8)

which, combined with the Slater determinant, ensures the antisymmetry of the electronic wavefunction. Moreover, we use the periodic distance features $d(\mathbf{r})$ in Eq. (2) in place of the ordinary $|\mathbf{r}|$ in the molecular neural network. The periodic functions $f$, $g$ used in Eq. (2) read

$$\begin{aligned}f(\omega)&=|\omega|\left(1-\frac{|\omega/\pi|^{3}}{4}\right),\\ g(\omega)&=\omega\left(1-\frac{3}{2}|\omega/\pi|+\frac{1}{2}|\omega/\pi|^{2}\right),\end{aligned}$$

(9)

and their arguments $\omega$ are reduced into $[-\pi, \pi]$. Eq. (6) can then be satisfied without causing discontinuity²⁰. The constructed periodic features $\{\sum_i g(\omega_i)\mathbf{a}_i,\ d(\mathbf{r})\}$ are substituted into FermiNet¹⁰ to build a periodic wavefunction. Specifically, electron-atom features $\mathbf{h}_e$ and electron–electron features $\mathbf{h}_{e,e'}$ are constructed as follows,

$$\begin{aligned}\mathbf{h}_{e}&=\left\{\sum_{i=1}^{3}g(\omega_{e,I}^{i})\,\mathbf{a}_{i}^{p},\ d(\omega_{e,I})\right\},\\ \mathbf{h}_{e,e'}&=\left\{\sum_{i=1}^{3}g(\omega_{e,e'}^{i})\,\mathbf{a}_{i}^{S},\ d(\omega_{e,e'})\right\},\end{aligned}$$

(10)

where $\omega_{e,I}$ and $\omega_{e,e'}$ are defined as

$$\begin{aligned}\omega_{e,I}&=(\mathbf{r}_{e}-\mathbf{R}_{I})\cdot\left\{\mathbf{b}_{1}^{p},\,\mathbf{b}_{2}^{p},\,\mathbf{b}_{3}^{p}\right\},\\ \omega_{e,e'}&=(\mathbf{r}_{e}-\mathbf{r}_{e'})\cdot\left\{\mathbf{b}_{1}^{S},\,\mathbf{b}_{2}^{S},\,\mathbf{b}_{3}^{S}\right\},\end{aligned}$$

(11)

and the superscripts $p$, $S$ denote the primitive cell and the supercell, respectively. A permutation-equivariant feature $\mathbf{f}_{e}^{\alpha}$ is further constructed from $\mathbf{h}_{e}$ and $\mathbf{h}_{e,e'}$,

$$\mathbf{f}_{e}^{\alpha}=\mathrm{concat}(\mathbf{h}_{e},\,\mathbf{g}^{\uparrow},\,\mathbf{g}^{\downarrow},\,\mathbf{g}_{e}^{\alpha,\uparrow},\,\mathbf{g}_{e}^{\alpha,\downarrow}),$$

(12)

where $\alpha$ denotes the spin index (↑, ↓), and $\mathbf{g}^{\uparrow}$, $\mathbf{g}^{\downarrow}$ and $\mathbf{g}_{e}^{\alpha,\uparrow}$, $\mathbf{g}_{e}^{\alpha,\downarrow}$ read

$$\begin{aligned}(\mathbf{g}^{\uparrow},\,\mathbf{g}^{\downarrow})&=\left(\frac{1}{n^{\uparrow}}\sum_{e}\mathbf{h}_{e}^{\uparrow},\ \frac{1}{n^{\downarrow}}\sum_{e}\mathbf{h}_{e}^{\downarrow}\right),\\ (\mathbf{g}_{e}^{\alpha,\uparrow},\,\mathbf{g}_{e}^{\alpha,\downarrow})&=\left(\frac{1}{n^{\uparrow}}\sum_{e'}\mathbf{h}_{e,e'}^{\alpha,\uparrow},\ \frac{1}{n^{\downarrow}}\sum_{e'}\mathbf{h}_{e,e'}^{\alpha,\downarrow}\right).\end{aligned}$$

(13)

$\mathbf{f}_{e}^{\alpha}$ and $\mathbf{h}_{e,e'}$ are subsequently passed recursively through a series of fully connected layers,

$$\begin{aligned}\mathbf{h}_{e}^{l+1,\alpha}&=\tanh(\mathbf{V}^{l}\cdot\mathbf{f}_{e}^{l,\alpha}+\mathbf{b}^{l})+\mathbf{h}_{e}^{l,\alpha},\\ \mathbf{h}_{e,e'}^{l+1,\alpha,\beta}&=\tanh(\mathbf{W}^{l}\cdot\mathbf{h}_{e,e'}^{l,\alpha,\beta}+\mathbf{c}^{l})+\mathbf{h}_{e,e'}^{l,\alpha,\beta},\end{aligned}$$

(14)

where $l$ denotes the layer index, and $\{\mathbf{V}^{l}, \mathbf{b}^{l}\}$, $\{\mathbf{W}^{l}, \mathbf{c}^{l}\}$ denote the corresponding weights and biases of layer $l$.
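One layer of Eqs. (12)–(14) can be sketched in NumPy as follows. This is an illustrative reading of the equations, not the authors' code: the function name, the array layout (spin-up electrons stored first), and the choice of equal input and output widths so that the residual term can always be added are assumptions.

```python
import numpy as np

def equivariant_layer(h1, h2, n_up, V, b, W, c):
    """One layer of Eqs. (12)-(14).

    h1: (n, d1) one-electron features h_e; the first n_up rows are spin-up
    h2: (n, n, d2) two-electron features h_{e,e'}
    V:  (3*d1 + 2*d2, d1) weights, b: (d1,) bias for the one-electron stream
    W:  (d2, d2) weights, c: (d2,) bias for the two-electron stream
    """
    n, d1 = h1.shape
    up, dn = slice(0, n_up), slice(n_up, n)
    g_up = np.broadcast_to(h1[up].mean(axis=0), (n, d1))   # g^up, shared by all electrons
    g_dn = np.broadcast_to(h1[dn].mean(axis=0), (n, d1))   # g^down
    ge_up = h2[:, up].mean(axis=1)                         # g_e^{alpha,up}, one row per electron
    ge_dn = h2[:, dn].mean(axis=1)                         # g_e^{alpha,down}
    f = np.concatenate([h1, g_up, g_dn, ge_up, ge_dn], axis=1)   # Eq. (12)
    h1_new = np.tanh(f @ V + b) + h1                       # Eq. (14), one-electron stream
    h2_new = np.tanh(h2 @ W + c) + h2                      # Eq. (14), two-electron stream
    return h1_new, h2_new

rng = np.random.default_rng(1)
n, n_up, d1, d2 = 4, 2, 4, 3
h1, h2 = rng.normal(size=(n, d1)), rng.normal(size=(n, n, d2))
V, b = rng.normal(size=(3 * d1 + 2 * d2, d1)), rng.normal(size=d1)
W, c = rng.normal(size=(d2, d2)), rng.normal(size=d2)

out1, out2 = equivariant_layer(h1, h2, n_up, V, b, W, c)
perm = np.array([1, 0, 2, 3])                              # swap the two spin-up electrons
p1, p2 = equivariant_layer(h1[perm], h2[perm][:, perm], n_up, V, b, W, c)
print(np.allclose(p1, out1[perm]), np.allclose(p2, out2[perm][:, perm]))
```

Because the pooled terms are means over electrons of a given spin, exchanging two same-spin electrons only reorders the outputs, which is the property the Slater determinant relies on.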

The functions $u$ in Eq. (7) are built from the output $\mathbf{h}_{e}^{L}$ of the last layer $L$,

$$u=\mathrm{Orb}^{\mathrm{Re}}\cdot\mathbf{h}_{e}^{L}+i\,\mathrm{Orb}^{\mathrm{Im}}\cdot\mathbf{h}_{e}^{L},$$

(15)

where $\mathrm{Orb}^{\mathrm{Re}}$ and $\mathrm{Orb}^{\mathrm{Im}}$ denote the weight parameters of the real part and the imaginary part, respectively.

Moreover, the function $u$ is multiplied by an additional phase factor $\exp(i\mathbf{k}\cdot\mathbf{r})$, which mimics Bloch functions and encodes the occupied $\mathbf{k}$-point information from the HF calculation. Inspired by the tight-binding model in solid-state physics, a periodic-generalized envelope term $\exp[-d(\mathbf{r})]$ is also added to the molecular orbitals, which accounts for the attractive interaction between atoms and electrons. The final molecular orbitals $\phi$ read

$$\phi ({{{{{{{{\bf{r}}}}}}}}}_{i};{{{{{{{{\bf{r}}}}}}}}}_{\ne i})=\exp (i{{{{{{{\bf{k}}}}}}}}\cdot {{{{{{{{\bf{r}}}}}}}}}_{i})\exp [-d({{{{{{{{\bf{r}}}}}}}}}_{i})]u({{{{{{{{\bf{r}}}}}}}}}_{i};{{{{{{{{\bf{r}}}}}}}}}_{\ne i}).$$

(16)

For an overall sketch of the neural network, see the section “Pseudocode of network”. Note that the electron–nucleus distance features are omitted for the HEG system, since it contains no nuclei. The specific hyperparameters of each system are listed in Supplementary Note 1.
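To make the structure of Eq. (16) concrete, here is a minimal Python sketch that assembles one orbital value from its three factors. The function name `orbital_value` and all numerical values are hypothetical, and the periodic distance function $d$ is not implemented here: its value is simply passed in by the caller.

```python
import cmath
import math

def orbital_value(u, k, r, d_r):
    """Eq. (16): phi = exp(i k.r) * exp(-d(r)) * u.

    u   : complex network output u(r_i; r_{!=i})
    k   : occupied k-point (3 floats)
    r   : electron position (3 floats)
    d_r : value of the periodic distance function d at r, supplied by the caller
    """
    k_dot_r = sum(ki * ri for ki, ri in zip(k, r))
    return cmath.exp(1j * k_dot_r) * math.exp(-d_r) * u

# Toy check on a cubic cell of side a: shifting r by a lattice vector multiplies
# the orbital by the Bloch phase exp(i k.a), provided u and d are cell-periodic.
a = 2.0
k = (math.pi / a, 0.0, 0.0)      # zone-boundary k-point, so exp(i k.a) = -1
u, d_r = 0.3 + 0.4j, 0.5         # placeholder values, identical at r and r + a
phi_0 = orbital_value(u, k, (0.25, 0.0, 0.0), d_r)
phi_a = orbital_value(u, k, (0.25 + a, 0.0, 0.0), d_r)
```

In this toy case `phi_a` equals `-phi_0` up to rounding, which is the Bloch behaviour the phase factor is meant to impose.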

Pseudocode of network

For clarity, the pseudocode of the network is given below:

Require: electron positions $\{{{{{{{{{\bf{r}}}}}}}}}_{1}^{\uparrow },\, \cdots \,,\, {{{{{{{{\bf{r}}}}}}}}}_{{n}^{\uparrow }}^{\uparrow },\, {{{{{{{{\bf{r}}}}}}}}}_{1}^{\downarrow },\cdots,\, {{{{{{{{\bf{r}}}}}}}}}_{{n}^{\downarrow }}^{\downarrow }\}$

Require: nuclear positions {R_I} in the primitive cell

Require: lattice vector $\{{{{{{{{{\bf{a}}}}}}}}}_{1}^{p,S},\, {{{{{{{{\bf{a}}}}}}}}}_{2}^{p,S},\, {{{{{{{{\bf{a}}}}}}}}}_{3}^{p,S}\}$ of primitive cell and supercell

Require: reciprocal lattice vector $\{{{{{{{{{\bf{b}}}}}}}}}_{1}^{p,S},\, {{{{{{{{\bf{b}}}}}}}}}_{2}^{p,S},\, {{{{{{{{\bf{b}}}}}}}}}_{3}^{p,S}\}$ of primitive cell and supercell

Require: occupied {k_i} points offered by Hartree–Fock method

For each electron e, atom I:

${\omega }_{e,I}=({{{{{{{{\bf{r}}}}}}}}}_{e}-{{{{{{{{\bf{R}}}}}}}}}_{I})\cdot \{{{{{{{{{\bf{b}}}}}}}}}_{1}^{p},\, {{{{{{{{\bf{b}}}}}}}}}_{2}^{p},\, {{{{{{{{\bf{b}}}}}}}}}_{3}^{p}\}$

${\omega }_{e,{e}^{{\prime} }}=({{{{{{{{\bf{r}}}}}}}}}_{e}-{{{{{{{{\bf{r}}}}}}}}}_{{e}^{{\prime} }})\cdot \{{{{{{{{{\bf{b}}}}}}}}}_{1}^{S},\, {{{{{{{{\bf{b}}}}}}}}}_{2}^{S},\, {{{{{{{{\bf{b}}}}}}}}}_{3}^{S}\}$

End For

For each electron e:

${{{{{{{{\bf{h}}}}}}}}}_{e}=\{{\Sigma }_{i=1}^{3}g({\omega }_{e,I}^{i})\,{{{{{{{{\bf{a}}}}}}}}}_{i}^{p},\, d({\omega }_{e,I})\}$

${{{{{{{{\bf{h}}}}}}}}}_{e,{e}^{{\prime} }}=\{{\Sigma }_{i=1}^{3}g({\omega }_{e,{e}^{{\prime} }}^{i})\,{{{{{{{{\bf{a}}}}}}}}}_{i}^{S},\, d({\omega }_{e,{e}^{{\prime} }})\}$

End For

For each layer l:

${{{{{{{{\bf{g}}}}}}}}}^{l,\uparrow }=\frac{1}{{n}^{\uparrow }}{\sum }_{e}{{{{{{{{\bf{h}}}}}}}}}_{e}^{l,\uparrow }$

${{{{{{{{\bf{g}}}}}}}}}^{l,\downarrow }=\frac{1}{{n}^{\downarrow }}{\sum }_{e}{{{{{{{{\bf{h}}}}}}}}}_{e}^{l,\downarrow }$

For each electron e, spin α:

${{{{{{{{\bf{g}}}}}}}}}_{e}^{l,\alpha,\uparrow }=\frac{1}{{n}^{\uparrow }}{\sum }_{{e}^{{\prime} }}{{{{{{{{\bf{h}}}}}}}}}_{e,{e}^{{\prime} }}^{l,\alpha,\uparrow }$

${{{{{{{{\bf{g}}}}}}}}}_{e}^{l,\alpha,\downarrow }=\frac{1}{{n}^{\downarrow }}{\sum }_{{e}^{{\prime} }}{{{{{{{{\bf{h}}}}}}}}}_{e,{e}^{{\prime} }}^{l,\alpha,\downarrow }$

${{{{{{{{\bf{f}}}}}}}}}_{e}^{l,\alpha }={{{{{{{\rm{concat}}}}}}}}({{{{{{{{\bf{h}}}}}}}}}_{e}^{l,\alpha },\, {{{{{{{{\bf{g}}}}}}}}}^{l,\uparrow },\, {{{{{{{{\bf{g}}}}}}}}}^{l,\downarrow },\, {{{{{{{{\bf{g}}}}}}}}}_{e}^{l,\alpha,\uparrow },\, {{{{{{{{\bf{g}}}}}}}}}_{e}^{l,\alpha,\downarrow })$

${{{{{{{{\bf{h}}}}}}}}}_{e}^{l+1,\alpha }=\tanh ({{{{{{{{\bf{V}}}}}}}}}^{l}\cdot {{{{{{{{\bf{f}}}}}}}}}_{e}^{l,\alpha }+{{{{{{{{\bf{b}}}}}}}}}^{l})+{{{{{{{{\bf{h}}}}}}}}}_{e}^{l,\alpha }$

${{{{{{{{\bf{h}}}}}}}}}_{e,{e}^{{\prime} }}^{l+1,\alpha,\beta }=\tanh ({{{{{{{{\bf{W}}}}}}}}}^{l}\cdot {{{{{{{{\bf{h}}}}}}}}}_{e,{e}^{{\prime} }}^{l,\alpha,\beta }+{{{{{{{{\bf{c}}}}}}}}}^{l})+{{{{{{{{\bf{h}}}}}}}}}_{e,{e}^{{\prime} }}^{l,\alpha,\beta }$

End For

End For

For each orbital i:

For each electron e, spin α:

${u}_{i,e}^{\alpha }={{{{{{{{\rm{Orb}}}}}}}}}_{i,\alpha }^{{{{{{{{\rm{Re}}}}}}}}}\cdot {{{{{{{{\bf{h}}}}}}}}}_{e}^{L}+{{{{{{{\bf{i}}}}}}}}\times {{{{{{{{\rm{Orb}}}}}}}}}_{i,\alpha }^{{{{{{{{\rm{Im}}}}}}}}}\cdot {{{{{{{{\bf{h}}}}}}}}}_{e}^{L}$

${p}_{i,e}^{\alpha }=\exp ({{{{{{{\bf{i}}}}}}}}{{{{{{{{\bf{k}}}}}}}}}_{i}\cdot {{{{{{{{\bf{r}}}}}}}}}_{e}^{\alpha })$

${{{{{{{{\rm{enve}}}}}}}}}_{i,e}^{\alpha }={\sum }_{I}{\pi }_{i}^{I,\alpha }\exp [-{\sigma }_{i}^{I,\alpha }d({\omega }_{e,I})]$

${\phi }_{i,e}^{\alpha }={p}_{i,e}^{\alpha }{u}_{i,e}^{\alpha }{{{{{{{{\rm{enve}}}}}}}}}_{i,e}^{\alpha }$

End For

End For

$\psi={{{{{{{\rm{Det}}}}}}}}[{\phi }^{\uparrow }]{{{{{{{\rm{Det}}}}}}}}[{\phi }^{\downarrow }]$

Neural network optimization

Parameters θ within the neural network can be optimized to minimize the energy expectation value 〈E_l〉, and the gradient ∇_θ〈E_l〉 reads

$$\begin{aligned}\nabla_{\theta}\langle E_{l}\rangle &= \mathrm{Re}\left[\langle E_{l}\nabla_{\theta}\ln\psi^{*}\rangle - \langle E_{l}\rangle\langle\nabla_{\theta}\ln\psi^{*}\rangle\right],\\ E_{l} &= \psi^{-1}\hat{H}_{S}\psi,\end{aligned}$$

(17)

where E_l denotes the local energy of the neural network ansatz ψ. Besides direct energy minimization, stochastic reconfiguration⁴⁶ has been widely adopted and has proved to be much more efficient; its preconditioned gradient reads

$$\begin{aligned}\mathrm{Grad} &= F^{-1}\nabla_{\theta}\langle E_{l}\rangle,\\ F_{ij} &= \mathrm{Re}\left[\left\langle \frac{\partial \ln\psi^{*}}{\partial\theta_{i}}\frac{\partial\ln\psi}{\partial\theta_{j}}\right\rangle - \left\langle\frac{\partial\ln\psi^{*}}{\partial\theta_{i}}\right\rangle\left\langle\frac{\partial\ln\psi}{\partial\theta_{j}}\right\rangle\right].\end{aligned}$$

(18)
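As a minimal sketch of the estimator in Eq. (17), the following Python function forms the covariance between the local energy and the conjugated log-derivatives from a batch of Monte Carlo samples. The function name `energy_gradient` and the sample values are assumptions for illustration only; the preconditioning by $F^{-1}$ in Eq. (18) is not included.

```python
def energy_gradient(local_energies, grad_log_psi):
    """Monte Carlo estimate of Eq. (17).

    local_energies : list of (complex) local energies E_l, one per sample
    grad_log_psi   : grad_log_psi[s][p] = d ln(psi) / d theta_p for sample s
    Returns Re[<E_l conj(O)> - <E_l><conj(O)>] per parameter, with O = grad ln(psi).
    For real parameters theta, grad ln(psi^*) is the complex conjugate of O.
    """
    n = len(local_energies)
    n_params = len(grad_log_psi[0])
    e_mean = sum(local_energies) / n
    grad = []
    for p in range(n_params):
        o_conj = [g[p].conjugate() for g in grad_log_psi]
        eo_mean = sum(e * o for e, o in zip(local_energies, o_conj)) / n
        o_mean = sum(o_conj) / n
        grad.append((eo_mean - e_mean * o_mean).real)
    return grad

# Two samples, one parameter: <E O*> = (1 - 3) / 2 = -1 and <O*> = 0.
grad_a = energy_gradient([1.0, 3.0], [[1.0 + 0j], [-1.0 + 0j]])

# Constant local energy (an exact eigenstate): the estimate vanishes.
grad_b = energy_gradient([2.0, 2.0], [[1.0 + 1.0j], [-1.0 + 2.0j]])
```

The second call illustrates the zero-variance property: when the local energy does not fluctuate, the covariance, and hence the estimated gradient, is zero.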

In this work, we adopt a modified KFAC optimizer, which approximates F as

$$\begin{aligned}F &= \mathrm{Re}\left[\left\langle \frac{\partial \ln\psi^{*}}{\partial\mathrm{vec}(W_{l})}\frac{\partial\ln\psi^{T}}{\partial\mathrm{vec}(W_{l})}\right\rangle - \left\langle\frac{\partial\ln\psi^{*}}{\partial\mathrm{vec}(W_{l})}\right\rangle\left\langle\frac{\partial\ln\psi^{T}}{\partial\mathrm{vec}(W_{l})}\right\rangle\right]\\ &= \mathrm{Re}\left[\langle (a_{l}\otimes e_{l}^{*})(a_{l}\otimes e_{l})^{T}\rangle - \langle (a_{l}\otimes e_{l}^{*})\rangle\langle (a_{l}\otimes e_{l})\rangle^{T}\right]\\ &\approx \mathrm{Re}\left[\langle a_{l}a_{l}^{T}\rangle\otimes\langle e_{l}^{*}e_{l}^{T}\rangle\right],\end{aligned}$$

(19)

where W_l denotes the weight parameters of layer l, and vec means vectorized form. a_l, e_l denote the activation and sensitivity of layer l respectively. Note that activation a_l is always real-valued, which explains the absence of conjugation of a_l in the second line. The first term in the bracket of Eq. (19) is approximated as the Kronecker product of the expectation values, and the second term is omitted for simplification.
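The Kronecker structure behind Eq. (19) can be checked numerically on a toy layer. The sketch below, in plain Python, builds the last line of Eq. (19) from sampled activations and sensitivities; the helper names (`kron_vec`, `outer`, `kron_mat`, `mean_matrix`, `kfac_block`) and the numbers are hypothetical and are not the API of any KFAC library. For a single sample the factorization is exact, while for many samples replacing the average of the product by the product of averages is the approximation.

```python
def kron_vec(a, e):
    """Kronecker product of two vectors, a (x) e, as a flat list."""
    return [ai * ej for ai in a for ej in e]

def outer(x, y):
    """Outer product x y^T as a nested list."""
    return [[xi * yj for yj in y] for xi in x]

def kron_mat(A, B):
    """Kronecker product of two matrices given as nested lists."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in A for row_b in B]

def mean_matrix(mats):
    """Entry-wise mean of a non-empty list of equally shaped matrices."""
    n = len(mats)
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[i][j] for m in mats) / n for j in range(cols)]
            for i in range(rows)]

def kfac_block(activations, sensitivities):
    """Last line of Eq. (19): Re[<a a^T> (x) <e^* e^T>] for one layer."""
    A = mean_matrix([outer(a, a) for a in activations])
    S = mean_matrix([outer([x.conjugate() for x in e], e) for e in sensitivities])
    return [[x.real for x in row] for row in kron_mat(A, S)]

# Single sample: real activation a, complex sensitivity e (toy values).
a = [1.0, 2.0]
e = [1.0 + 1.0j, 2.0 - 1.0j]
e_conj = [x.conjugate() for x in e]

# First term of the second line of Eq. (19), evaluated without factorization.
exact = outer(kron_vec(a, e_conj), kron_vec(a, e))
approx = kfac_block([a], [e])
max_diff = max(abs(exact[i][j].real - approx[i][j])
               for i in range(4) for j in range(4))
```

The practical benefit of the factorized form is that only the two small factors need to be stored and inverted, rather than the full matrix over all weights of the layer.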

Twist average boundary condition

TABC is a conventional technique to reduce the finite-size error due to the constrained size of the supercell²⁷. It averages the contributions from different periodic images of the supercell and improves the convergence of the total energy. The formula reads

$$E_{\mathrm{TABC}}=\frac{\Omega_{S}}{(2\pi)^{3}}\int_{\mathrm{1.B.Z.}}d^{3}\mathbf{k}_{S}\,\frac{\psi_{\mathbf{k}_{S}}^{*}\hat{H}_{S}\psi_{\mathbf{k}_{S}}}{\psi_{\mathbf{k}_{S}}^{*}\psi_{\mathbf{k}_{S}}},$$

(20)

where 1.B.Z. denotes the first Brillouin zone of the supercell, and in practice the integral is approximated by a discrete sum over a Monkhorst–Pack mesh (see Supplementary Note 3.2 for more details).
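The discrete version of Eq. (20) can be sketched as an equal-weight average over mesh points. In the Python example below, the mesh uses the standard Monkhorst–Pack fractions $u_r=(2r-q-1)/(2q)$; the function names and the `toy_energy` model are assumptions for illustration, standing in for the variational energy of $\psi_{\mathbf{k}_S}$ at each twist.

```python
def monkhorst_pack_fractions(q):
    """Fractional coordinates u_r = (2r - q - 1) / (2q), r = 1..q, along one axis."""
    return [(2 * r - q - 1) / (2 * q) for r in range(1, q + 1)]

def twist_mesh(q1, q2, q3):
    """All twists k_S of a q1 x q2 x q3 mesh, in units of the supercell reciprocal vectors."""
    return [(u1, u2, u3)
            for u1 in monkhorst_pack_fractions(q1)
            for u2 in monkhorst_pack_fractions(q2)
            for u3 in monkhorst_pack_fractions(q3)]

def twist_averaged_energy(energy_at_twist, mesh):
    """Equal-weight average replacing the Brillouin-zone integral of Eq. (20)."""
    return sum(energy_at_twist(k) for k in mesh) / len(mesh)

# Placeholder energy model, used only to exercise the code: a real calculation
# would evaluate the energy of the wavefunction psi_{k_S} for each twist.
def toy_energy(k):
    return -1.0 + 0.1 * sum(c * c for c in k)

mesh = twist_mesh(2, 2, 2)
e_tabc = twist_averaged_energy(toy_energy, mesh)
```

Because the prefactor $\Omega_S/(2\pi)^3$ is the inverse volume of the supercell Brillouin zone, the integral is a plain average, which is why no explicit weights appear in the sum.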

Structure factor correction

Finite-size error can be further reduced via the structure factor S(k) correction²⁸, which is usually calculated to correct the exchange-correlation potential V_xc and the formula reads

$$\begin{aligned}\frac{\Delta V_{\mathrm{xc}}}{N_{e}} &= \frac{2\pi}{\Omega_{S}}\lim_{\mathbf{k}\to 0}\frac{S(\mathbf{k})}{\mathbf{k}^{2}},\\ S(\mathbf{k}) &= \frac{1}{N_{e}}\Big[\langle\rho(\mathbf{k})\rho^{*}(\mathbf{k})\rangle - \langle\rho(\mathbf{k})\rangle\langle\rho^{*}(\mathbf{k})\rangle\Big],\end{aligned}$$

(21)

where $\mathop{\lim }_{{{{{{{{\bf{k}}}}}}}}\to 0}$ is practically estimated via interpolation (see Supplementary Note 3.4 for more details).
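A minimal Python sketch of the second line of Eq. (21) is given below: it estimates $S(\mathbf{k})$ from sampled electron configurations. The sign convention $\rho(\mathbf{k})=\sum_j \exp(-i\mathbf{k}\cdot\mathbf{r}_j)$, the function names, and the two-sample toy data are assumptions for this example; the $\mathbf{k}\to 0$ interpolation is not shown.

```python
import cmath
import math

def rho_k(k, positions):
    """Fourier component of the density, rho(k) = sum_j exp(-i k.r_j)."""
    return sum(cmath.exp(-1j * sum(ki * ri for ki, ri in zip(k, r)))
               for r in positions)

def structure_factor(k, samples):
    """S(k) of Eq. (21) from a list of sampled electron configurations."""
    n_e = len(samples[0])
    rhos = [rho_k(k, conf) for conf in samples]
    mean_abs2 = sum(abs(x) ** 2 for x in rhos) / len(rhos)
    mean_rho = sum(rhos) / len(rhos)
    return (mean_abs2 - abs(mean_rho) ** 2) / n_e

# One electron in a unit box, sampled at x = 0 and x = 1/2, with k = 2*pi along x:
# rho(k) takes the values +1 and -1, so <|rho|^2> = 1 and <rho> = 0.
k = (2.0 * math.pi, 0.0, 0.0)
samples = [[(0.0, 0.0, 0.0)], [(0.5, 0.0, 0.0)]]
s_k = structure_factor(k, samples)
```

In a periodic calculation `k` would be restricted to reciprocal lattice vectors of the supercell, so that $\rho(\mathbf{k})$ is well defined under the periodic boundary conditions.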

Empirical correction formula

Empirical formulas are also commonly employed to reduce the finite-size error¹⁸, one of which reads

$$E_{\infty}=E_{\mathrm{N}}^{\mathrm{Net}}+\left(E_{\infty}^{\mathrm{HF}}-E_{\mathrm{N}}^{\mathrm{HF}}\right).$$

(22)

The simulation size of high-accuracy methods is usually limited by their high computational cost. Hence, cheaper methods such as HF are usually used to give an a posteriori estimate of the finite-size error. All the results for LiH are corrected using this empirical formula with HF results in a cc-pVDZ basis (see Supplementary Note 4.3 for more details).
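Eq. (22) is plain arithmetic, as the short Python sketch below shows. It reads $E_{\mathrm{N}}$ as the energy of the finite supercell and $E_{\infty}$ as the thermodynamic-limit value, which is an interpretation of the notation; the function name and the numbers are made up for illustration and are not results from this work.

```python
def hf_finite_size_correction(e_net_n, e_hf_n, e_hf_inf):
    """Eq. (22): E_inf = E_N^Net + (E_inf^HF - E_N^HF)."""
    return e_net_n + (e_hf_inf - e_hf_n)

# Illustrative numbers only (not results from the paper), in Hartree.
e_inf = hf_finite_size_correction(e_net_n=-8.10, e_hf_n=-8.00, e_hf_inf=-8.05)
```

Here the HF finite-size shift of -0.05 Hartree is transferred unchanged to the network energy, giving -8.15 Hartree.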

Electron density analysis

Electron density ρ(r) is defined as

$$\rho(\mathbf{r})=N\int d^{3}\mathbf{r}_{2}\cdots d^{3}\mathbf{r}_{N}\,|\psi(\mathbf{r},\mathbf{r}_{2},\cdots,\mathbf{r}_{N})|^{2},$$

(23)

and in practice it is evaluated by accumulating Monte Carlo samples of electrons on a uniform grid over the simulation cell. The complex polarization Z is defined as³⁷

$$Z=\left\langle\exp\left(i\sum_{i}\frac{2\pi}{L}\mathbf{r}_{i}^{\parallel}\right)\right\rangle,$$

(24)

where r^∥ denotes the projection of electron coordinate along the chain direction. Moreover, Bader charge is employed to estimate the charge partition on each atom⁴⁰. The convergence test of Bader charge is shown in the Supplementary Fig. 8.
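Both estimators can be sketched in a few lines of Python for a one-dimensional toy system. The code below accumulates a density histogram in the spirit of Eq. (23) and evaluates Eq. (24) from sampled configurations; the function names, the bin count, and the sample coordinates are assumptions for illustration, with each configuration given as a list of electron coordinates along the chain.

```python
import cmath
import math

def complex_polarization(samples, length):
    """Eq. (24): Z = < exp(i * (2 pi / L) * sum_i r_i_parallel) > over samples."""
    phases = [cmath.exp(1j * 2.0 * math.pi / length * sum(conf))
              for conf in samples]
    return sum(phases) / len(phases)

def density_histogram(samples, length, n_bins):
    """1D analogue of Eq. (23): binned electron counts, normalised to integrate to N."""
    counts = [0] * n_bins
    for conf in samples:
        for x in conf:
            # Wrap the coordinate into the cell before binning.
            counts[int((x % length) / length * n_bins) % n_bins] += 1
    bin_width = length / n_bins
    return [c / (len(samples) * bin_width) for c in counts]

# Two sampled configurations of two electrons in a cell of length 4 (toy values).
L_cell = 4.0
samples = [[0.5, 1.5], [2.5, 3.5]]
z = complex_polarization(samples, L_cell)
rho = density_histogram(samples, L_cell, n_bins=4)
total = sum(rho) * (L_cell / 4)   # integrates to the electron number N = 2
```

In this toy case both configurations give the same phase, so `z` has unit modulus, and the histogram integrates to the number of electrons, matching the normalisation of Eq. (23).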

Workflow and computational details

This work is built upon the open-source FermiNet⁴⁷ and PyQMC⁴⁸ packages on GitHub, and uses the deep learning framework JAX⁴⁹, which supports flexible and powerful complex-number calculations. Ground-state energy calculations are all-electron. Diamond-structured Si and the NaCl crystal are simulated with ccECP[Ne]³⁹. The neural network is pretrained on the Hartree–Fock ansatz obtained with the PySCF software⁵⁰. All the k points used are the occupied k points from a Hartree–Fock calculation in the cc-pVDZ basis on a Monkhorst–Pack mesh offset by k_S, and the mesh size is the same as the supercell. All the expectation values over the distribution ∣ψ∣² are evaluated via the Monte Carlo approach, and the energy and wavefunction are then optimized using the modified KFAC optimizer²⁴ (see Supplementary Figs. 1, 2, 4, 6, 7). The Ewald summation technique is implemented for the lattice summation in the energy calculation. After training has converged, the energy is calculated in a separate inference phase.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The data generated in this study are provided in the Supplementary Information.

Code availability

The source code for this work is available on GitHub (https://github.com/bytedance/DeepSolid).

References

1. Kohn, W. Nobel lecture: electronic structure of matter—wave functions and density functionals. Rev. Mod. Phys. 71, 1253–1266 (1999).
2. Martin, R. M. Electronic Structure: Basic Theory and Practical Methods (Cambridge Univ. Press, 2004).
3. Jones, R. O. Density functional theory: its origins, rise to prominence, and future. Rev. Mod. Phys. 87, 897–923 (2015).
4. Kirkpatrick, J. et al. Pushing the frontiers of density functionals by solving the fractional electron problem. Science 374, 1385–1389 (2021).
5. Williams, K. T. et al. Direct comparison of many-body methods for realistic electronic Hamiltonians. Phys. Rev. X 10, 011041 (2020).
6. Booth, G. H., Grüneis, A., Kresse, G. & Alavi, A. Towards an exact description of electronic wavefunctions in real solids. Nature 493, 365–370 (2013).
7. Mihm, T. N. et al. A shortcut to the thermodynamic limit for quantum many-body calculations of metals. Nat. Comput. Sci. 1, 801–808 (2021).
8. Han, J., Zhang, L. & E, W. Solving many-electron Schrödinger equation using deep neural networks. J. Comput. Phys. 399, 108929 (2019).
9. Carleo, G. & Troyer, M. Solving the quantum many-body problem with artificial neural networks. Science 355, 602–606 (2017).
10. Pfau, D., Spencer, J. S., Matthews, A. G. D. G. & Foulkes, W. M. C. Ab initio solution of the many-electron Schrödinger equation with deep neural networks. Phys. Rev. Res. 2, 033429 (2020).
11. Spencer, J. S., Pfau, D., Botev, A. & Foulkes, W. M. C. Better, faster fermionic neural networks. Preprint at arXiv:2011.07125 (2020).
12. Hermann, J., Schätzle, Z. & Noé, F. Deep-neural-network solution of the electronic Schrödinger equation. Nat. Chem. 12, 891–897 (2020).
13. Choo, K., Mezzacapo, A. & Carleo, G. Fermionic neural-network states for ab-initio electronic structure. Nat. Commun. 11, 2368 (2020).
14. Yoshioka, N., Mizukami, W. & Nori, F. Solving quasiparticle band spectra of real solids using neural-network quantum states. Commun. Phys. 4, 1–8 (2021).
15. Li, X., Fan, C., Ren, W. & Chen, J. Fermionic neural network with effective core potential. Phys. Rev. Res. 4, 013021 (2022).
16. Ren, W., Fu, W. & Chen, J. Towards the ground state of molecules via diffusion Monte Carlo on neural networks. Preprint at arXiv:2204.13903 (2022).
17. Guther, K. et al. NECI: N-electron configuration interaction with an emphasis on state-of-the-art stochastic methods. J. Chem. Phys. 153, 034107 (2020).
18. Foulkes, W. M. C., Mitas, L., Needs, R. J. & Rajagopal, G. Quantum Monte Carlo simulations of solids. Rev. Mod. Phys. 73, 33 (2001).
19. Shi, H. & Zhang, S. Some recent developments in auxiliary-field quantum Monte Carlo for real materials. J. Chem. Phys. 154, 024107 (2021).
20. Whitehead, T. M., Michael, M. H. & Conduit, G. J. Jastrow correlation factor for periodic systems. Phys. Rev. B 94, 035157 (2016).
21. Wilson, M. et al. Wave function ansatz (but periodic) networks and the homogeneous electron gas. Preprint at arXiv:2202.04622 (2022).
22. Cassella, G. et al. Discovering quantum phase transitions with fermionic neural networks. Preprint at arXiv:2202.05183 (2022).
23. Martens, J. & Grosse, R. Optimizing neural networks with Kronecker-factored approximate curvature. In Proc. 32nd International Conference on Machine Learning 2408–2417 (JMLR.org, 2015).
24. Botev, A. & Martens, J. KFAC-JAX. (2022).
25. Motta, M. et al. Towards the solution of the many-electron problem in real materials: equation of state of the hydrogen chain with state-of-the-art many-body methods. Phys. Rev. X 7, 031059 (2017).
26. Geim, A. K. Nobel lecture: random walk to graphene. Rev. Mod. Phys. 83, 851–862 (2011).
27. Lin, C., Zong, F. H. & Ceperley, D. M. Twist-averaged boundary conditions in continuum quantum Monte Carlo algorithms. Phys. Rev. E 64, 016702 (2001).
28. Chiesa, S., Ceperley, D. M., Martin, R. M. & Holzmann, M. Finite-size error in many-body simulations with long-range interactions. Phys. Rev. Lett. 97, 076404 (2006).
29. Dappe, Y. et al. Local-orbital occupancy formulation of density functional theory: application to Si, C, and graphene. Phys. Rev. B 73, 235124 (2006).
30. Nolan, S. J., Gillan, M. J., Alfè, D., Allan, N. L. & Manby, F. R. Calculation of properties of crystalline lithium hydride using correlated wave function theory. Phys. Rev. B 80, 165109 (2009).
31. Binnie, S. J. et al. Bulk and surface energetics of crystalline lithium hydride: benchmarks from quantum Monte Carlo and quantum chemistry. Phys. Rev. B 82, 165431 (2010).
32. Ceperley, D. M. & Alder, B. J. Ground state of the electron gas by a stochastic method. Phys. Rev. Lett. 45, 566–569 (1980).
33. López Ríos, P., Ma, A., Drummond, N. D., Towler, M. D. & Needs, R. J. Inhomogeneous backflow transformations in quantum Monte Carlo calculations. Phys. Rev. E 74, 066701 (2006).
34. Liao, K., Schraivogel, T., Luo, H., Kats, D. & Alavi, A. Towards efficient and accurate ab initio solutions to periodic systems via transcorrelation and coupled cluster theory. Phys. Rev. Res. 3, 033072 (2021).
35. Luo, H. & Alavi, A. Combining the transcorrelated method with full configuration interaction quantum Monte Carlo: application to the homogeneous electron gas. J. Chem. Theory Comput. 14, 1403–1411 (2018).
36. Medvedev, M. G., Bushmarinov, I. S., Sun, J., Perdew, J. P. & Lyssenko, K. A. Density functional theory is straying from the path toward the exact functional. Science 355, 49–52 (2017).
37. Stella, L., Attaccalite, C., Sorella, S. & Rubio, A. Strong electronic correlation in the hydrogen chain: a variational Monte Carlo study. Phys. Rev. B 84, 245117 (2011).
38. Chen, S., Motta, M., Ma, F. & Zhang, S. Ab initio electronic density in solids by many-body plane-wave auxiliary-field quantum Monte Carlo calculations. Phys. Rev. B 103, 075138 (2021).
39. Annaberdiyev, A., Melton, C. A., Bennett, M. C., Wang, G. & Mitas, L. Accurate atomic correlation and total energies for correlation consistent effective core potentials. J. Chem. Theory Comput. 16, 1482–1502 (2020).
40. Tang, W., Sanville, E. & Henkelman, G. A grid-based Bader analysis algorithm without lattice bias. J. Phys. Condens. Matter 21, 084204 (2009).
41. Yao, G., Xu, J. G. & Wang, X. W. Pseudopotential variational quantum Monte Carlo approach to bcc lithium. Phys. Rev. B 54, 8393–8397 (1996).
42. Sugiyama, G., Zerah, G. & Alder, B. J. Ground-state properties of metallic lithium. Phys. A Stat. Mech. Appl. 156, 144–168 (1989).
43. Dagrada, M., Karakuzu, S., Vildosola, V. L., Casula, M. & Sorella, S. Exact special twist method for quantum Monte Carlo simulations. Phys. Rev. B 94, 245108 (2016).
44. Azadi, S. & Foulkes, W. M. C. Efficient method for grand-canonical twist averaging in quantum Monte Carlo calculations. Phys. Rev. B 100, 245142 (2019).
45. Rajagopal, G., Needs, R. J., James, A., Kenny, S. D. & Foulkes, W. M. C. Variational and diffusion quantum Monte Carlo calculations at nonzero wave vectors: theory and application to diamond-structure germanium. Phys. Rev. B 51, 10591–10600 (1995).
46. Sorella, S. Green function Monte Carlo with stochastic reconfiguration. Phys. Rev. Lett. 80, 4558–4561 (1998).
47. Pfau, D., Spencer, J. S. & FermiNet Contributors. FermiNet. (2020).
48. Wheeler, W. A. et al. PyQMC: an all-Python real-space quantum Monte Carlo module in PySCF. Preprint at arXiv:2212.01482 (2022).
49. Bradbury, J. et al. JAX: composable transformations of Python+NumPy programs. (2018).
50. Sun, Q. et al. PySCF: the Python-based simulations of chemistry framework. WIREs Comput. Mol. Sci. 8, e1340 (2018).

Acknowledgements

The authors thank Matthew Foulkes, David Ceperley, Lucas Wagner, Gareth Conduit, Mario Motta, and Ke Liao for helpful discussions. We thank Gino Cassella for providing Hartree–Fock energies of HEG. We thank the ByteDance AML team specially for their technical and computing support. We also thank ByteDance AI-Lab LIT Group and the rest of the ByteDance AI-Lab research team for inspiration and encouragement. This work is directed and supported by Hang Li and ByteDance AI-Lab. J.C. is supported by the National Natural Science Foundation of China under Grant No. 92165101.

Author information

Authors and Affiliations

ByteDance Inc, Zhonghang Plaza, No. 43, North 3rd Ring West Road, Haidian District, Beijing, China
Xiang Li & Zhe Li
School of Physics, Interdisciplinary Institute of Light-Element Quantum Materials, Frontiers Science Center for Nano-Optoelectronics, Peking University, Beijing, 100871, P. R. China
Ji Chen

Contributions

X.L. and J.C. conceived the study; X.L. developed the method, performed implementations, simulations, and data analyses; Z.L. contributed to the code development and simulation of HEG; J.C. supervised the project. X.L., Z.L., and J.C. wrote the paper.

Corresponding author

Correspondence to Xiang Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Li, X., Li, Z. & Chen, J. Ab initio calculation of real solids via neural network ansatz. Nat Commun 13, 7895 (2022). https://doi.org/10.1038/s41467-022-35627-1

Received: 24 May 2022
Accepted: 13 December 2022
Published: 22 December 2022
DOI: https://doi.org/10.1038/s41467-022-35627-1
