Introduction

Computational methods that can solve the physics of strongly correlated electrons play an important role in the study of molecular and condensed matter systems, where common perturbative or empirical density functional approaches fail. In these systems, the interactions between electrons in some or all of their degrees of freedom contend with the kinetic energy of the electrons, leading to competition between localization and delocalization of the electronic structure, and the emergence of many remarkable properties and low-energy phases. Electronic structure poses a particular challenge amongst the broader umbrella of quantum many-body problems1 due to both charge and spin degrees of freedom, as well as the requirement for antisymmetry, which significantly complicates the form of the solution. However, these features are crucial in the understanding of the emergent physical behavior in many technologically relevant advanced materials, from high-temperature superconductors to catalytic transition metal complexes.

A recent trend has emerged in the use of systematically improvable parameterized ansätze for quantum states, which hold the promise of an exact limit, providing confidence and the ability to internally validate results. Naturally, the universal approximators devised in the field of machine learning (ML) have inspired ansätze for this purpose, leading to the development of Neural Quantum States (NQS)2 with a wide range of network architectures and of differing depths and widths3,4,5,6,7. Concurrently, approaches based on kernel methods have also been considered8,9. All these parameterized ansätze can in principle approximate the complex functional dependencies between the probability amplitudes of the electronic configurations, and have proven capable of obtaining accurate results with minimal user intervention over a variety of systems relying on an optimization based on the techniques of variational Monte Carlo (VMC).

It should be stressed that the use of systematically improvable ansätze in electronic structure is certainly not a new phenomenon with the emergence of NQS. One of the most successful variational methods relies on tensor network states, which provide an improvable tensor factorization of the many-body amplitudes. For a one-dimensional network, the efficient contraction of these amplitudes has led to the prominence of Matrix Product State (MPS) descriptions of correlated systems. The single (hyper)parameter controlling the expressivity of the model is the “bond dimension”, which can be quasi-continuously enlarged to describe higher levels of entanglement towards a complete model. Importantly, the simple structure of these states also allows for additional probes and insights into the emergent many-body physics of the model, and is able to characterize entanglement measures and structures10,11. Insights into the nature of the correlations and the entanglement can be harder to quantify for NQS, where the diversity of different architectures and model parameters can also cloud a clear path towards practical improvability for the states, while it can also be unclear how to precisely design optimal parameterizations.

In this work we return to tensor factorizations to develop an alternative wave function parameterization, inspired by the developments in the class of NQS descriptions of Fermionic quantum matter. There has been much research to indicate that the considerable flexibility of complex NQS architectures is not being fully exploited for many correlated problems, due to the challenges in their optimization and initialization within the VMC framework12,13. Many simple parameterizations have performed as accurately as more complex forms, and a premium is placed on the compactness of the ansatz for ease of practical optimization alongside the overall flexibility. The simpler form for these models can also potentially provide tools for easier interpretability and improvability of the many-body physics, and open avenues to alternative optimization strategies. An example of this is the Gaussian Process State (GPS), which was originally motivated via Bayesian regression as a systematically improvable kernel model with a single model parameter controlling the expressibility. It was shown to achieve similar quality results to NQS, while often being more compact and open to novel insights9,14,15. Further development exposed a duality of the GPS wave function model to an exponential of a CANDECOMP/PARAFAC (CP) tensor-rank decomposition of the wave function amplitudes in second quantization8. This represented an interesting simplification of the model, and suggests further developments in the use of tensor decompositions for systematically improvable descriptions of correlated states. The potential for synergies between the two domains of variational state parameterizations in NQS and tensor decompositions has in fact already motivated the development of NQS architectures that incorporate MPS parameterizations16,17.

This is the topic of this work, where we consider a simple and systematically improvable variational quantum state based on tensor factorization, with application to general Fermionic systems, which have proven a particular challenge for NQS methods to date. We work in a fixed basis and second quantization, in contrast to the real-space formulations of other Fermionic ansätze18,19,20. While this introduces a (necessarily incomplete) basis set approximation, it also allows for more flexibility in the choice of model (permutational invariance and antisymmetry are automatically enforced). This also allows for problems to be defined in a finite and discrete space for the stochastic sampling where additional approximations can be devised and chemical insights from atomic orbital correlators are easily accessible. Furthermore, this formulation allows for a straightforward treatment of core electrons and direct comparison to established quantum chemical methods, as well as natural application to multi-resolution and quantum embedding methodologies21,22,23. The basis set approximation, in common with traditional quantum chemical methods, is also much studied with a number of approaches available which can substantially ameliorate it24,25,26,27.

Direct application of NQS-like ansätze in second quantization has often struggled to clearly extend beyond state-of-the-art quantum chemistry, such as coupled-cluster methods (CCSD) or exact diagonalization (FCI), with results often restricted to small molecules and/or minimal basis sets4,5,28,29. The commutation relations of second quantized operators enforced by a necessarily unphysical choice of ordering of the degrees of freedom can induce highly non-local and high-rank parity flips to the probability amplitudes. In principle, these long-range structures can be described by NQS, but in practice are very difficult to model and to appropriately optimize within VMC frameworks, which have mainly been developed for quantum spin systems4. As such, finding better Fermion to spin (qubit) mappings to reduce the rank or range of these non-local parity changes is an active area of research30,31,32,33. Alternatively, NQS-like states can be multiplied by an explicitly antisymmetric state (e.g. Slater determinant, Pfaffian or antisymmetrized geminal power) that will subsume much of the impact of these parity flips, at the cost of potentially limiting the rigorous systematic improvability of the resulting state. Nevertheless, this approach in combination with symmetry-breaking and restoration has achieved impressive results in Fermionic models3,8,15.

Parallel to these developments, backflow transformations have been parameterized via neural networks as an alternative approach to describe Fermionic correlations in strongly interacting systems34,35. This approach modifies the single-electron functions of a Slater determinant (or other antisymmetric function) to depend parametrically on many (potentially all N) electron coordinates in a configuration-dependent way. A closely related approach modifies the Slater determinant by coupling the physical degrees of freedom to a set of configuration-dependent auxiliary or “hidden” fermions, which can similarly be parameterized as a neural network36,37. The parameterization of these backflow-type states has undergone a similar development to other classes of variational wave functions, starting initially from physically-motivated few-body parameterizations38,39,40,41, to a more general ML architecture which allows (in principle) for systematic improvability to exactness where each orbital and electron can arbitrarily change based on all other electronic positions. These configuration-dependent orbitals in backflow states have been defined by a number of different ML architectures, both in real-space and discrete Fock space models, with and without an additional Jastrow factor in the parameterization18,37,42,43. They were shown to be effective in describing the ground states of Fermi-Hubbard models36,44,45, homogeneous electron gases42, ultra-cold Fermi gases46 and (primarily in first quantization) ab initio molecular systems18,19,20,47, achieving energies comparable to or surpassing those from Diffusion Monte Carlo, as well as high accuracy coupled-cluster quantum chemical methods.

Here, we consider a particularly simple CP tensor rank decomposition for these configuration-dependent backflow orbitals, which allows for a straightforward yet systematically improvable form for the introduction of explicit many-body correlations into the overall state48,49. We also develop a practical approach for ab initio systems to truncate the length scale of the backflow correlations, providing a further compression of the model with minimal loss of accuracy. We apply this variational ansatz to find the ground state of (doped) Fermi-Hubbard models and the water molecule, outperforming comparable neural network backflow parameterizations, as discussed in the “Fermi-Hubbard model” and “Water molecule” subsections of the “Results and discussion”. In common with other studies, we find increasing the sampling of the VMC optimization important to improve results, indicating that despite the simple form of the state it is still challenging to optimize to the expressibility limit of the ansatz4,47. In “Towards hydrogen materials”, we consider a 6 × 6 2D lattice of hydrogen atoms as a step towards extended systems, with the CPD backflow state comparing favorably to state-of-the-art density matrix renormalization group (DMRG) calculations and significantly beyond the scope of exact approaches. Finally, in “Scaling” we discuss the computational scaling of the method and approaches to reduce this as an outlook towards larger systems and widespread application.

Methods

Backflow determinants via CP tensor-rank decomposition

The wave function for a system of N interacting electrons can be defined in first quantization by assigning a unique label to each electron, and introducing real-space \({{{{{\bf{r}}}}}}_{\alpha }\in {{\mathbb{R}}}^{3}\) and spin σα ∈ {↑, ↓} coordinates, so that Ψ(x) = Ψ(x1, …, xN), with xα = (rα, σα). The simplest wave function that satisfies the required antisymmetry is a Slater determinant of N single-particle spin-orbitals, ϕi(xα):

$${\Phi }_{0}({{{{{\bf{x}}}}}}_{1},\ldots ,{{{{{\bf{x}}}}}}_{N})= \, \frac{1}{\sqrt{N!}}\left| \begin{array}{cccc}{\phi }_{i}({{{{{\bf{x}}}}}}_{1})&{\phi }_{j}({{{{{\bf{x}}}}}}_{1})&\cdots \,&{\phi }_{k}({{{{{\bf{x}}}}}}_{1})\\ {\phi }_{i}({{{{{\bf{x}}}}}}_{2})&{\phi }_{j}({{{{{\bf{x}}}}}}_{2})&\cdots \,&{\phi }_{k}({{{{{\bf{x}}}}}}_{2})\\ \vdots &\vdots &\ddots &\vdots \\ {\phi }_{i}({{{{{\bf{x}}}}}}_{N})&{\phi }_{j}({{{{{\bf{x}}}}}}_{N})&\cdots \,&{\phi }_{k}({{{{{\bf{x}}}}}}_{N})\end{array}\right| ,\\ = \, {{{{\mathcal{A}}}}}[{\phi }_{i}({{{{{\bf{x}}}}}}_{1}){\phi }_{j}({{{{{\bf{x}}}}}}_{2})\ldots {\phi }_{k}({{{{{\bf{x}}}}}}_{N})],$$
(1)

where \({{{{\mathcal{A}}}}}\) antisymmetrizes and normalizes the subsequent product of orbitals with respect to exchange of their arguments. We can consider these single-particle (molecular) orbitals as linear combinations of an underlying basis (e.g. atomic orbitals, AOs) χμ(r), as:

$${\phi }_{i}({{{{\bf{r}}}}})=\sum\limits_{\mu =1}^{L}{\varphi }_{\mu i}{\chi }_{\mu }({{{{\bf{r}}}}}),$$
(2)

where L is the size of this basis and φμi are the coefficients of the linear combination.

The key idea of backflow ansätze is to extend the Slater determinant by generalizing the single-particle orbitals to functions with non-linear parametric dependencies on all electron coordinates. Historically this meant transforming the electron coordinates rα to a new set of coordinates \({{{{{\bf{r}}}}}}_{\alpha }^{bf}={{{{{\bf{r}}}}}}_{\alpha }+{\sum}_{\beta \ne \alpha }\eta (| {{{{{\bf{r}}}}}}_{\beta }-{{{{{\bf{r}}}}}}_{\alpha }| )({{{{{\bf{r}}}}}}_{\beta }-{{{{{\bf{r}}}}}}_{\alpha })\), where the function η(r) describes the effective displacement of the α electron due to the instantaneous position of the other electrons50. This configurational dependence on all other electron positions can also be directly encoded into the variational parameters of a linear expansion of single-particle orbitals, as first introduced by ref. 34 for lattice models, yielding a new set of backflow orbitals \({\phi }_{i}^{bf}({{{{{\bf{r}}}}}}_{\alpha };\{{{{{{\bf{r}}}}}}_{/\alpha }\})\). In an effort to improve the systematic description of these configuration-dependent backflow orbitals, recent work has proposed to model \({\phi }_{i}^{bf}({{{{{\bf{r}}}}}}_{\alpha };\{{{{{{\bf{r}}}}}}_{/\alpha }\})\) using neural networks18,20,44,51, ensuring that these functions are invariant under permutation of the electron labels in {r/α} to retain overall antisymmetry of the state.

Within a second quantization representation, the permutational invariance of electrons and antisymmetry of the state is automatically ensured by the action and commutation relations of the second quantized operators, independent of the ansatz chosen. A Slater determinant can thus be obtained from the vacuum state \(\left\vert 0\right\rangle\) by creating N electrons in the corresponding single-particle orbitals as:

$$\left\vert {\Phi }_{0}\right\rangle =\prod\limits_{i=1}^{N}{\hat{c}}_{i}^{{{{\dagger}}} }\left\vert 0\right\rangle =\prod\limits_{i=1}^{N}\left(\sum\limits_{\mu =1}^{L}{\varphi }_{\mu i}{\hat{c}}_{\mu }^{{{{\dagger}}} }\right)\left\vert 0\right\rangle ,$$
(3)

where \({\hat{c}}_{\mu }^{{{{\dagger}}} }\) (\({\hat{c}}_{\mu }\)) is now the operator that creates (annihilates) an electron in the μ-th basis state. To model electron correlation, Eq. (3) can now be straightforwardly extended via analogy to the backflow transformations by including in each orbital a parametric dependence on the full instantaneous orbital occupation vector, n = (n1, …, nL), where nμ indexes instantaneous occupancy of the four Fock states of spin-\(\frac{1}{2}\) fermions in the chosen orthonormal representation of degree of freedom μ. This modifies the creation operator of orbital i to be:

$${\hat{c}}_{i}^{{{{\dagger}}} }({{{{\bf{n}}}}})=\sum\limits_{\mu =1}^{L}{\varphi }_{\mu i;{{{{\bf{n}}}}}}{\hat{c}}_{\mu }^{{{{\dagger}}} },$$
(4)

resulting in an exact model, as each orbital can vary independently according to the instantaneous occupation over the full state. However, it is of limited use as it is an over-parameterization of the full state, with an exponential number of variables. We therefore consider a specific tensor-rank decomposition, the Canonical Decomposition (CANDECOMP) or Parallel Factor (PARAFAC) decomposition (CPD)48,49. This allows for a systematic and improvable decomposition of this tensor for each orbital into a polynomial and low-rank form that is independent of the choice of ordering of the degrees of freedom defining the occupation vector, n. The CP decomposition factorizes the occupation number vector over all states of Eq. (4) into a sum of M tensor products, with each term in the product depending on each degree of freedom in the full occupation number vector, as:

$${\varphi }_{\mu i;{{{{\bf{n}}}}}}^{{{{{\rm{CPD}}}}}}=\sum\limits_{m=1}^{M}\prod\limits_{\nu =1}^{L}{\epsilon }_{\mu i;{n}_{\nu }\nu m}.$$
(5)

We now have a polynomially complex tensor of variational parameters for each orbital, \({\epsilon }_{\mu i;{n}_{\nu }\nu m}\), which encodes the correlation-driven modifications to orbital i for the specific occupied degree of freedom μ, based on the fact that state ν has a local occupation of nν. M represents an improvable parameter describing the systematic coupling of the occupations across all possible occupation strings, providing an increasingly flexible description of higher-rank correlations in the state towards exactness. We denote this single parameter controlling the flexibility of the model as its “support dimension”, by analogy with the CP decomposition within Gaussian process states and kernel model definitions of quantum states8,9,14. This CP decomposition splits the L-dimensional indices indicating the n-dependence of the orbital into a sum of products of rank-3 tensors, depending on each orbital and its occupation. Since this is a simple product rather than matrix product, there is no change in the flexibility of these backflow orbitals with the ordering of the degrees of freedom, ensuring that there should be no explicit dependence on this choice (as found in tensor network states) or dimensionality of the system.
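As an illustration, Eq. (5) can be evaluated for a single configuration with a few lines of NumPy. This is a minimal sketch rather than the implementation used in this work: the array layout `eps[mu, i, s, nu, m]`, the integer encoding of the four local occupations as `s = 0, 1, 2, 3`, and the function name `cpd_orbitals` are assumptions made for the example.

```python
import numpy as np

def cpd_orbitals(eps, n):
    """Evaluate Eq. (5) for one configuration.

    eps : array of shape (L, N, 4, L, M), assumed layout eps[mu, i, s, nu, m]
    n   : integer array of length L with entries in {0, 1, 2, 3}
    Returns phi[mu, i] = sum_m prod_nu eps[mu, i, n[nu], nu, m].
    """
    L = n.shape[0]
    # For every site nu pick the slice matching its occupation n[nu].
    # The two adjacent advanced indices collapse into one axis of length L.
    selected = eps[:, :, n, np.arange(L), :]      # shape (L, N, L, M)
    # Product over sites nu, then sum over the M support terms.
    return selected.prod(axis=2).sum(axis=-1)     # shape (L, N)

# Illustrative sizes only.
rng = np.random.default_rng(0)
L, N, M = 4, 2, 3
eps = rng.normal(size=(L, N, 4, L, M))
n = np.array([0, 1, 3, 2])
phi = cpd_orbitals(eps, n)
```

Because the product over ν is a plain elementwise product, permuting the site axis of `eps` together with `n` leaves the result unchanged, which is the ordering independence described above.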

The proposed “CPD” backflow wave function is obtained by replacing the orbitals of the Slater determinant in Eq. (3) by those of Eq. (5), giving an explicitly antisymmetric state where all orbitals depend on the instantaneous occupation of all degrees of freedom:

$$\left\vert {\Psi }^{{{{{\rm{CPD}}}}}}\right\rangle =\sum\limits_{{{{{\bf{n}}}}}}{\Psi }^{{{{{\rm{CPD}}}}}}({{{{\bf{n}}}}})\left\vert {{{{\bf{n}}}}}\right\rangle ,$$
(6)

with

$${\Psi }^{{{{{\rm{CPD}}}}}}({{{{\bf{n}}}}})={{{{\mathcal{A}}}}}[{\varphi }_{{\mu }_{1}1;{{{{\bf{n}}}}}}^{{{{{\rm{CPD}}}}}}{\varphi }_{{\mu }_{2}2;{{{{\bf{n}}}}}}^{{{{{\rm{CPD}}}}}}\ldots {\varphi }_{{\mu }_{N}N;{{{{\bf{n}}}}}}^{{{{{\rm{CPD}}}}}}],$$
(7)

where the antisymmetrizer acts with respect to the N occupied orbitals of the configuration n, given by μ1, μ2, …, μN. This model can be evaluated naively via building a matrix and computing a determinant in \({{{{\mathcal{O}}}}}[{N}^{2}ML+{N}^{3}]\) cost, with each orbital evaluated according to Eq. (5). However, for low-rank changes to n where only \({{{{\mathcal{O}}}}}[1]\) orbital occupations change, a fast updating scheme can be devised to reduce the scaling in the matrix build by a factor of L. The update for each orbital in Eq. (5) can be found in \({{{{\mathcal{O}}}}}[M]\) time by dividing out contributions from the previous occupations and multiplying by the new occupations, analogous to the approach in ref. 15. Since all configurational updates in VMC can be formulated in this way, the evaluation of configurational amplitudes of this CPD state can be reduced to \({{{{\mathcal{O}}}}}[{N}^{2}M+{N}^{3}]\).
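The fast update can be sketched as follows, under the assumption that the per-term products over sites are cached between Monte Carlo steps. The names `site_products` and `update_products`, the array layout `eps[mu, i, s, nu, m]` and the integer occupation encoding are hypothetical choices for this example. The division also assumes that the affected entries of ε are non-zero. The determinant step is unchanged and is not shown.

```python
import numpy as np

def site_products(eps, n):
    """Cache P[mu, i, m] = prod_nu eps[mu, i, n[nu], nu, m] (O(L) work per entry)."""
    L = n.shape[0]
    return eps[:, :, n, np.arange(L), :].prod(axis=2)

def update_products(P, eps, nu, old, new):
    """Update the cache when site nu changes occupation old -> new.

    Divides out the old factor and multiplies in the new one, so the work
    per (mu, i) entry is O(M) rather than O(M L).
    Assumes eps[:, :, old, nu, :] contains no zeros.
    """
    return P * eps[:, :, new, nu, :] / eps[:, :, old, nu, :]

def orbitals_from_products(P):
    """Recover phi[mu, i] of Eq. (5) by summing over the M support terms."""
    return P.sum(axis=-1)

# Illustrative usage with strictly positive parameters.
rng = np.random.default_rng(1)
L, N, M = 4, 2, 3
eps = rng.uniform(0.5, 1.5, size=(L, N, 4, L, M))
n = np.array([0, 1, 2, 3])
P = site_products(eps, n)
P_new = update_products(P, eps, nu=2, old=2, new=0)   # site 2: occupation 2 -> 0
phi_new = orbitals_from_products(P_new)
```

For brevity the sketch updates all L rows μ. Restricting the update to the N occupied rows that enter the determinant gives the \({{{{\mathcal{O}}}}}[{N}^{2}M]\) matrix build quoted above.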

The total number of variational parameters in this state (which in this work are all real) is therefore \({{{{\mathcal{O}}}}}[4{L}^{2}NM]\), where L is the size of the underlying basis, N is the number of electrons, and M the “support dimension” controlling the flexibility of the model. This scaling in terms of the evaluation of the model and number of parameters allows the standard techniques of VMC to be used for its sampling, optimization and extraction of observables. We note that extending this CPD form to an explicit antisymmetrization of geminal two-particle states within a Pfaffian or antisymmetrized geminal power rather than single-particle orbitals of a determinant would also be possible in this framework and will be explored in the future46,51.
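As a quick check of the parameter count quoted above, one can assume that every entry of ε is an independent real parameter stored in an array of shape (L, N, 4, L, M). The layout, the function name `cpd_parameter_count` and the sizes below are illustrative assumptions only.

```python
def cpd_parameter_count(L, N, M, local_states=4):
    """Number of entries in eps[mu, i, s, nu, m]: L * N * local_states * L * M."""
    return L * N * local_states * L * M

# Hypothetical sizes chosen only for illustration.
count = cpd_parameter_count(L=10, N=10, M=2)
print(count)  # 8000
```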

Unless otherwise indicated, in this work, we conserve a definite spin-polarization quantum number for each of the N orbitals labeled 1, 2, 3, …, N in the product in Eq. (7), in which case the overall state must conserve \({\hat{S}}_{z}\) symmetry. This is ensured by only allowing spin-orbital degrees of freedom with the same spin-polarization to be included in the expansion coefficients of the orbital (i.e. the μ labels in Eq. (5)). This allows the state for each n to factorize into a product of spin-up and spin-down determinants (which are allowed to independently optimize, analogous to an “unrestricted” single determinant). An alternative (which is considered in the “Fermi-Hubbard model” subsection of the “Results and discussion”) is to form a “generalized” determinant by allowing spin-polarization to mix in each orbital definition, formally breaking \({\hat{S}}_{z}\) symmetry in the state. This symmetry is nevertheless restored via the sampling of configurations with definite \({\hat{S}}_{z}\) in the Markov chain during the VMC procedure. This \({\hat{S}}_{z}\) symmetry-breaking and projective restoration can improve results by allowing further flexibility in the state, but increases the cost in the evaluation of the determinant defining the amplitude by a factor of eight, and doubles the number of parameters (as μ labels spin-orbitals, not spatial orbitals). Importantly, regardless of whether this spin symmetry is broken or not in the orbital definition, the backflow correlations act both for same-spin and opposite-spin correlations, with the orbital dependence in Eq. (5) running over the spin-full occupations of all other degrees of freedom, nν, ensuring that spin-dependent correlated physics is captured.

Finally, we note that, although the functional form of the configuration-dependent orbitals of Eq. (5) is linear in M, it does not reduce to an uncorrelated determinant in the limit of M = 1. Correlated physics such as that captured via Gutzwiller or Jastrow correlators are included even in this limit, since the dependence between the instantaneous occupation of sites μ and ν can be independently addressed in a product form. Indeed, full non-trivial N-body correlations are included even at M = 1, as the exponentially large sum of products of these orbitals formed from the determinant in Eq. (7) builds in an exponential sum of these N-fold products of variational parameters for each orbital. This results in an expressive state even for very low M, which is systematically improvable to exactness as M is increased.

Two approaches combining tensor decompositions with backflow parameterizations have previously been considered in the literature: a backflow-inspired extension of the MPS ansatz52 to build non-local entanglement beyond the native MPS tensor ordering constraints for spin systems, and a fixed tensor representation of a two-body Fermionic backflow form45. The latter study did not factorize these backflow correlations and was constrained to a two-body form for these correlations. In contrast, the CP decomposition of the backflow parameterization introduced in this work overcomes these issues, providing a simple and improvable form for arbitrary rank correlations which is invariant to orbital ordering. We consider the expressibility of these states from a formal perspective further below, separating the practical challenges associated with the faithful optimization of these states in correlated systems.

Universality of the CPD backflow ansatz

In this section, we consider the formal universality of the proposed CPD backflow ansatz, where “universality” in this context refers to the ability to describe any antisymmetric state within the defined Hilbert space of the problem. The universality is simple to prove, and directly stems from the universality of the CP decomposition. This means that in the large-M limit, the CP decomposition employed to model the orbitals \({\varphi }_{\mu i;{{{{\bf{n}}}}}}^{{{{{\rm{CPD}}}}}}\) according to Eq. (5), can be chosen such that they are allowed to vary independently for each many-electron configuration n. This limit also implies a mapping between basis states (Slater determinants) and wavefunction amplitudes after anti-symmetrization which can represent any antisymmetric state within the Hilbert space defined for the problem without approximation error, according to the definition of the CPD backflow ansatz:

$${\Psi }^{{{{{\rm{CPD}}}}}}({{{{\bf{n}}}}})={{{{\mathcal{A}}}}}[{\varphi }_{{\mu }_{1}1;{{{{\bf{n}}}}}}^{{{{{\rm{CPD}}}}}}{\varphi }_{{\mu }_{2}2;{{{{\bf{n}}}}}}^{{{{{\rm{CPD}}}}}}\ldots {\varphi }_{{\mu }_{N}N;{{{{\bf{n}}}}}}^{{{{{\rm{CPD}}}}}}].$$
(8)

This however formally requires M to scale with the number of many-body configurations in the space (as expected for any exact parameterization), as we expand on below.

To show this and make contact with other forms of parameterized states, we consider the subset of CPD states in which the backflow (many-electron) orbitals \({\varphi }_{\mu i;{{{{\bf{n}}}}}}^{{{{{\rm{CPD}}}}}}\) can factorize into a term which is independent of the specific configuration (an n-independent “static” molecular orbital), and a term which is independent of the site index, but yet can depend on a CP decomposition of the specific many-electron configuration. This can be written as:

$${\varphi }_{\mu i;{{{{\bf{n}}}}}}^{{{{{\rm{factored-CPD}}}}}}={\varphi }_{\mu i}\times \left(\sum\limits_{m=1}^{M}\prod\limits_{\nu =1}^{L}{\epsilon }_{i;{n}_{\nu }\nu m}\right)$$
(9)

With this construction, a product of CP decompositions can be factored out of the determinant, bringing the backflow ansatz into the form of a Slater-Jastrow wavefunction:

$$\Psi ({{{{\bf{n}}}}}) = \left(\sum\limits_{m=1}^{M}\prod\limits_{\nu =1}^{L}{\epsilon }_{1;{n}_{\nu }\nu m}\right)\left(\sum\limits_{m=1}^{M}\prod\limits_{\nu =1}^{L}{\epsilon }_{2;{n}_{\nu }\nu m}\right)\cdots \\ \quad \,\left(\sum\limits_{m=1}^{M}\prod\limits_{\nu =1}^{L}{\epsilon }_{N;{n}_{\nu }\nu m}\right)\times {{{{\mathcal{A}}}}}[{\varphi }_{{\mu }_{1}1}{\varphi }_{{\mu }_{2}2}\ldots {\varphi }_{{\mu }_{N}N}].$$
(10)

In this, a Slater determinant common to all configurations (defined by the N orbitals labeled 1, 2, …, N) is multiplied by a product of N CP decompositions, each of which depends on the local occupations of each site (nν). This product of CP decompositions takes the place of the Jastrow factor. Assuming that no two rows of the configuration-independent Slater determinant orbital matrix, φμi, are linearly dependent such that the determinant always evaluates to a non-zero value36, the universal approximator property of the CP decompositions in the prefactor allows this ansatz to define a one-to-one mapping from each many-electron configuration to an arbitrary wavefunction amplitude.
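The factorization in Eq. (10) is simply the multilinearity of the determinant in its columns, which can be checked numerically. The sketch below is illustrative only: the layout `e[i, s, nu, m]`, the integer occupation encoding, and the argument `rows` (a list of the N occupied basis indices μ1, …, μN, with spin bookkeeping omitted) are assumptions of the example.

```python
import numpy as np

def jastrow_factors(e, n):
    """J[i] = sum_m prod_nu e[i, n[nu], nu, m], the bracket in Eq. (9)."""
    L = n.shape[0]
    return e[:, n, np.arange(L), :].prod(axis=1).sum(axis=-1)   # shape (N,)

def factored_backflow_amplitude(phi_static, e, n, rows):
    """Determinant of the configuration-dependent orbitals of Eq. (9)."""
    J = jastrow_factors(e, n)
    A = phi_static[rows, :] * J[None, :]    # column i is scaled by J[i]
    return np.linalg.det(A)

def slater_jastrow_amplitude(phi_static, e, n, rows):
    """Right-hand side of Eq. (10): prefactor times a static determinant."""
    J = jastrow_factors(e, n)
    return J.prod() * np.linalg.det(phi_static[rows, :])

# Illustrative sizes and occupied indices only.
rng = np.random.default_rng(2)
L, N, M = 5, 3, 2
phi_static = rng.normal(size=(L, N))
e = rng.normal(size=(N, 4, L, M))
n = rng.integers(0, 4, size=L)
rows = [0, 2, 4]
lhs = factored_backflow_amplitude(phi_static, e, n, rows)
rhs = slater_jastrow_amplitude(phi_static, e, n, rows)
```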

The CPD backflow ansatz in Slater-Jastrow form

This approach to factoring out a CP decomposition from the Slater determinant shows that we formally require a support dimension M which scales as the size of the Hilbert space for a universal approximator, and is therefore of little practical use. Nonetheless, the representation according to Eq. (10) still provides insights into the ability of the ansatz to represent electronic quantum states of interest at smaller support dimensions, M. From the consideration of the restricted version of the CPD state in this Slater-Jastrow form as shown in Eq. (10), we find that M = 1 is sufficient to represent any single Slater determinant within the given basis, as well as a site-dependent penalty function depending on its local occupation. This encapsulates physically-relevant electronic correlation beyond the mean-field picture. As a specific example, we can consider a parameterization of a Gutzwiller factor of:

$${\epsilon }_{i;{n}_{\nu }\nu m}=\left\{\begin{array}{l}{e}^{{g}_{\nu }}\quad \,{\mbox{if}}\quad i=1{\mbox{ and }}\,{n}_{\nu }\equiv \uparrow \downarrow \quad \\ 1 \quad \quad \!{\mbox{otherwise}}\,\hfill\end{array}\right.,$$
(11)

where nν ≡ ↑↓ indicates a double occupancy of the νth site53,54. This modulates the Slater determinant with a factor depending on the double occupancy of the sites in each configuration, \(\Psi ({{{{\bf{n}}}}}) \sim {e}^{{\sum}_{\nu }{g}_{\nu }{n}_{\nu ,\uparrow }{n}_{\nu ,\downarrow }}\times {{{{\mathcal{A}}}}}[{\varphi }_{{\mu }_{1}1}{\varphi }_{{\mu }_{2}2}\ldots {\varphi }_{{\mu }_{N}N}]\), with parameters gν. General forms for the \({\epsilon }_{i;{n}_{\nu }\nu m}\) parameters even at M = 1 will however also admit factorized non-local dependence on the site occupations beyond Gutzwiller form.
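A short numerical check of this Gutzwiller limit is given below, assuming a hypothetical layout `e[i, s, nu, m]` with M = 1 and an integer encoding in which `s = 3` denotes double occupancy. The function names are illustrative.

```python
import numpy as np

DOUBLE = 3  # assumed encoding: 0 empty, 1 up, 2 down, 3 doubly occupied

def gutzwiller_eps(g, N):
    """Build e[i, s, nu, 0] following Eq. (11): exp(g_nu) for the first
    orbital on doubly occupied sites, and 1 everywhere else."""
    L = g.shape[0]
    e = np.ones((N, 4, L, 1))
    e[0, DOUBLE, :, 0] = np.exp(g)
    return e

def jastrow_prefactor(e, n):
    """Product over orbitals i of sum_m prod_nu e[i, n[nu], nu, m]."""
    L = n.shape[0]
    per_orbital = e[:, n, np.arange(L), :].prod(axis=1).sum(axis=-1)
    return per_orbital.prod()

# Illustrative values only.
g = np.array([-0.3, -0.1, -0.7, -0.2])
n = np.array([3, 1, 3, 0])               # sites 0 and 2 are doubly occupied
prefactor = jastrow_prefactor(gutzwiller_eps(g, N=3), n)
expected = np.exp(g[n == DOUBLE].sum())  # exp(sum of g_nu over double occupancies)
```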

In general, this simple factorization of the CPD state only represents a small subset of the parametrizations possible, which has a significantly larger variational flexibility even for M = 1. This is because the factorization into a site-dependent term and Slater determinant does not need to be imposed, allowing a non-trivial coupling between these “orbital” and “site” effects at the level of the ansatz. This enlarges the span of states accessible within this decomposition at small M, and can therefore outperform “Slater-Jastrow” type factorizations where the Jastrow is taken to have a flexible form, such as those previously considered within the GPS family of states15.

Initialization

A practical bottleneck in working with parameterized quantum states with many variational parameters can often be their reliable stochastic optimization. This can be particularly sensitive to the initialization of the state, since random initialization of the parameters does not always guarantee a good overlap with the ground state, which can slow down or even prevent the optimization from converging to the true ground state. The simple functional form of the CPD backflow orbitals in Eq. (5) allows for a straightforward and effective initialization of the variational parameters, without the requirement for pre-training18,47. Specifically, the tensor \({\epsilon }_{\mu i;{n}_{\nu }\nu m}\) can be initialized to ensure that the CPD wave function exactly spans a given single determinant such as that found from a prior mean-field solution. For most practical cases, this provides a good starting point for the VMC optimization.

In this work, we initialize from a restricted Hartree–Fock state, extracting the molecular orbital coefficients \({\varphi }_{\mu i}^{{{{{\rm{HF}}}}}}\) in the basis in which the state is to be sampled. The CPD variational parameter tensor can then be initialized as follows:

$${\epsilon }_{\mu i;{n}_{\nu }\nu m}={{{{\mathcal{N}}}}}(0,\sigma )+\left\{\begin{array}{ll}{\varphi }_{\mu i}^{{{{{\rm{HF}}}}}}\quad &\,{\mbox{if}}\quad m=1\, {\mbox{ and }}\,\nu =1,\\ 1\quad &\,{\mbox{if}}\quad m=1\, {\mbox{ and }}\,\nu \, > \, 1 \hfill\\ 0\quad &\,{\mbox{if}}\quad m \, > \, 1\, \,\, \, {\mbox{ and }}\,\nu \ge 1 \hfill\end{array}\right.$$
(12)

where \({{{{\mathcal{N}}}}}(0,\sigma )\) is a random number drawn from a normal distribution with standard deviation σ. This small amount of random noise is optional, but is added to the initialization in case the Hartree–Fock solution is too close to a local minimum of the optimization surface.
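The initialization of Eq. (12) can be sketched as follows. This is a minimal illustration, not the code used in this work: the array layout `eps[mu, i, s, nu, m]`, the function names, and the input `phi_hf` (an L × N array of mean-field orbital coefficients) are assumptions. With σ = 0 the resulting orbitals reproduce `phi_hf` for every configuration.

```python
import numpy as np

def init_cpd_from_hf(phi_hf, M, sigma=0.0, rng=None):
    """Initialize eps following Eq. (12) from an (L, N) coefficient matrix."""
    L, N = phi_hf.shape
    eps = np.zeros((L, N, 4, L, M))           # m > 1 terms start at zero
    eps[:, :, :, :, 0] = 1.0                  # m = 1, nu > 1
    eps[:, :, :, 0, 0] = phi_hf[:, :, None]   # m = 1, nu = 1 carries the HF orbital
    if sigma > 0:
        rng = np.random.default_rng() if rng is None else rng
        eps += rng.normal(0.0, sigma, size=eps.shape)   # optional noise
    return eps

def cpd_orbitals(eps, n):
    """phi[mu, i] = sum_m prod_nu eps[mu, i, n[nu], nu, m], as in Eq. (5)."""
    L = n.shape[0]
    return eps[:, :, n, np.arange(L), :].prod(axis=2).sum(axis=-1)

# Illustrative check with a random stand-in for the Hartree-Fock coefficients.
rng = np.random.default_rng(3)
phi_hf = rng.normal(size=(4, 2))
eps0 = init_cpd_from_hf(phi_hf, M=3)
n = np.array([3, 0, 1, 2])
phi = cpd_orbitals(eps0, n)   # equals phi_hf when sigma = 0
```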

Backflow truncation via exchange cutoff

While the CPD backflow state only has a polynomial number of parameters, the \({{{{\mathcal{O}}}}}[4{L}^{2}NM]\) scaling is still significantly higher than that of the native non-backflow (e.g. GPS) state, and there are significant benefits in attempting to reduce this further with a controllable compromise on the flexibility of the state. Largely redundant parameters in VMC add to statistical noise without improving accuracy and can be particularly deleterious in the optimization of the state55,56. In particular, the scaling with respect to the underlying basis size (L) is quadratic in the CPD state, and in this section we motivate a physical and black-box truncation of this scaling to further improve the overall performance of the state and enable access to larger systems.

We do this by restricting the number of degrees of freedom that the backflow parameterization considers for each μ-indexed site, reducing it from L to a new parameter K. This can be motivated as a range-truncation of the backflow correlations, as has also been considered in other truncated expansions35. If this methodology were applied purely to local lattice models, then strictly truncating by a distance criterion would likely be sufficient to capture the dominant correlations. However, we intend the methodology to be applied equally across lattice models and ab initio systems and therefore seek an alternative proxy to define the choice of entangled orbital subspace in which these backflow correlations are defined for each degree of freedom. This is because an ab initio basis will necessarily be extended in space and perhaps not even able to be uniquely associated with an atomic center. Additionally, the inclusion of the long-range Coulomb interaction in these systems does not necessarily favor purely distance-based criteria. We therefore take inspiration from ab initio formulations of the Density Matrix Renormalization Group (DMRG), where heuristics for the entanglement between two orbitals are necessary in order to find an approximately optimal ordering of the extended orbitals for an effective MPS ansatz. While there are a number of options in the literature, it has been found that the importance of one orbital in describing the dominant correlations with another can be reasonably quantified by the magnitude of the exchange integral between them57, as:

$${{{{{\mathcal{K}}}}}}_{\mu \nu }=\int\int\,d{{{{{\bf{r}}}}}}_{1}d{{{{{\bf{r}}}}}}_{2}{\chi }_{\mu }^{* }({{{{{\bf{r}}}}}}_{1}){\chi }_{\nu }^{* }({{{{{\bf{r}}}}}}_{1})\frac{1}{| {{{{{\bf{r}}}}}}_{1}-{{{{{\bf{r}}}}}}_{2}| }{\chi }_{\mu }({{{{{\bf{r}}}}}}_{2}){\chi }_{\nu }({{{{{\bf{r}}}}}}_{2}).$$
(13)

This exchange-based metric should decay exponentially between localized orbitals, tending towards a flexible locality-based truncation in the limit of fully local orbitals, while including the full range of the Coulomb interaction in the kernel. More rigorous definitions of entanglement between orbitals such as their mutual information (pair entanglement entropy)58 could also be used, but require an initial correlated level of theory on which to build these metrics. Since we initialize the CPD backflow molecular orbitals from a Hartree–Fock calculation, the exchange matrix \({{{{{\mathcal{K}}}}}}_{\mu \nu }\) is readily available at no additional cost. The set of K most entangled orbitals for each orbital χμ(r) according to this metric is selected, defining an L × K lookup table which maps to the relevant orbital indices xμν ∈ {1, …, L}. The choice of orbitals in the CPD decomposition of Eq. (5) is therefore restricted as:

$${\varphi }_{\mu i;{{{{\bf{n}}}}}}=\sum\limits_{m=1}^{M}\prod\limits_{\nu =1}^{K}{\epsilon }_{\mu i;{n}_{{x}_{\mu \nu }}\nu m},$$
(14)

thus reducing the number of variational parameters to \({{{{\mathcal{O}}}}}(LKMN)\) and formally linear with the size of the system, assuming that K is sufficiently large to capture the range of correlations around each degree of freedom. As K tends to L, the state returns to the original definition (albeit with an inconsequential reordering of sites in the backflow) giving the full flexibility of backflow correlations.
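The construction of the L × K lookup table can be sketched as follows, assuming the exchange matrix is already available as a dense symmetric list of lists. The function name and the toy matrix (an exponential decay along a one-dimensional chain of six orbitals) are hypothetical stand-ins and do not come from a real Hartree–Fock calculation.

```python
from math import exp


def exchange_lookup(exchange, K):
    """For each orbital mu, return the indices of the K orbitals with the
    largest exchange magnitude |K_{mu nu}|, most strongly coupled first."""
    L = len(exchange)
    table = []
    for mu in range(L):
        # sorted() is stable, so ties keep their original index order.
        ranked = sorted(range(L), key=lambda nu: -abs(exchange[mu][nu]))
        table.append(ranked[:K])
    return table


# Toy exchange matrix: exponential decay with separation on a 6-site chain.
L = 6
toy_exchange = [[exp(-abs(mu - nu)) for nu in range(L)] for mu in range(L)]
lookup = exchange_lookup(toy_exchange, K=3)
```

In this toy case the diagonal element is the largest entry in each row, so every orbital selects itself first and then its nearest neighbors, mirroring the locality-based limit described above. Setting `K = L` returns a permutation of all orbitals for each row, which corresponds to the untruncated state up to a reordering of sites.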

To illustrate the action of this exchange cutoff heuristic, in Fig. 1 we consider the electron density of the K = 5 most entangled orbitals about a specific atom for a 6 × 6 square grid of ab initio hydrogen atoms in a Boys localized basis59 at two different interatomic distances, d. This truncation is used later for numerical results in the “Towards hydrogen materials” subsection of the “Results and discussion” to assess the accuracy of the truncation scheme. A choice of K = 5 respects the local symmetries of each atom, as it enables each atom to be explicitly correlated via the backflow transformations with its four nearest neighbor atoms. As hoped, we find that the exchange cutoff protocol described automatically performs this selection of the nearest-neighbor atomic-localized orbitals around the chosen hydrogen atom in both geometries considered, providing a black-box metric to select the backflow subspace of correlations for each orbital via exploitation of locality of these correlations. We note again that the product structure of the CPD ansatz will build longer-ranged and higher-rank correlations outside the chosen subset implicitly, albeit no longer explicitly for each orbital independently.

Fig. 1: Electron density of orbital subspaces selected via exchange truncation.

The K = 5 orbital subspace (red) is shown for a central atom in a 6 × 6 lattice of hydrogen atoms in a STO-6G basis at both compressed (1.0 Å) and extended (2.0 Å) geometries.

Results and discussion

In all results below we initialize the CPD backflow state from the restricted Hartree–Fock solution as outlined in Eq. (12), with a noise scale value σ = 0.01. We optimize parameters using the Stochastic Reconfiguration (SR) method60, and when the number of parameters is larger than that of the samples, we take advantage of the recently introduced kernel formulation from ref. 61 to reduce the computational cost of the optimization, as outlined further in the “Scaling” subsection. On the Fermi-Hubbard model and the water molecule, we found that an SR optimizer with RMSProp momentum regularization, as introduced in ref. 62, outperforms standard SR, and we therefore use this optimizer for the results presented in the “Fermi-Hubbard model” and “Water molecule” subsections. The final energies presented in the results are computed as averages over 50 independent energy evaluations with the final optimized parameters and a large sample size (\({2}^{16}\) for the Fermi-Hubbard model and the water molecule, and \({2}^{14}\) for the 6 × 6 hydrogen lattice). Error bars are computed as the standard error of these independent energy evaluations.
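The reported mean and error bar can be illustrated with a short sketch, assuming the error bar is the sample standard deviation of the independent evaluations divided by the square root of their number. The function name and the four toy values are hypothetical and are not results from this work.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_standard_error(energies):
    """Return the mean and the standard error of the mean for a list of
    independent energy evaluations (sample standard deviation / sqrt(n))."""
    n = len(energies)
    return mean(energies), stdev(energies) / sqrt(n)


# Toy values only; in practice the list would hold the 50 evaluations.
toy_energies = [1.0, 2.0, 3.0, 4.0]
average, error_bar = mean_and_standard_error(toy_energies)
```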

The VMC calculations are implemented in the NetKet package63,64, which we interface with our own plugin module, GPSKet, for the required custom functionality. For ab initio systems, Hartree–Fock orbital coefficients and Hamiltonians are supplied from PySCF65,66.

Fermi-Hubbard model

While the main ambition of this work is to apply the newly developed CPD backflow ansatz to ab initio systems, we first consider a small Fermi-Hubbard model on a 2D square lattice as a prototypical system for strongly correlated electrons, where comparison to exact results and neural-network parameterized backflow states from the literature are both available. The Hamiltonian for this system is defined as:

$$\hat{H}=-t\sum\limits_{\langle i,j\rangle ,\sigma }{\hat{c}}_{i,\sigma }^{{{{\dagger}}} }{\hat{c}}_{j,\sigma }+U\sum\limits_{i}{\hat{n}}_{i,\uparrow }{\hat{n}}_{i,\downarrow },$$
(15)

where \({\hat{c}}_{i,\sigma }^{{{{\dagger}}} }\) (\({\hat{c}}_{i,\sigma }\)) is the operator that creates (annihilates) a fermion with spin σ on site i, \({\hat{n}}_{i,\sigma }={\hat{c}}_{i,\sigma }^{{{{\dagger}}} }{\hat{c}}_{i,\sigma }\) is the number operator, t is the hopping amplitude, and U is the on-site interaction strength. We apply the CPD backflow state, allowing for spin-polarization breaking and restoration of the orbitals as described in “Methods”, in the strong interaction regime at U/t = 8 on a 4 × 4 lattice with periodic boundary conditions, at half-filling (n = N/L = 1.0) and in the hole doped case (n = 0.875). This hole-doped case is of particular interest as the point at which superconductivity and striped orders strongly compete and is much debated in the literature to date67,68. We compare our results with those obtained by backflow ansätze based on neural networks (NNB) with similar numbers of parameters (~35,000) taken from ref. 44 as well as exact diagonalization (ED)69.

In Fig. 2 we show the percentage relative energy error compared to exact diagonalization for the CPD state of this system, plotted against the number of samples used in the Markov chain for each update of the parameters in the SR steps. Our results significantly improve upon the comparable published neural-network backflow results for this system, even when these are extrapolated with respect to the complexity of the network architecture in the NNB ansatz. We find percentage relative errors as low as 0.5% for the doped case and 0.1% for the half-filled case, which is competitive and within the scatter of other state-of-the-art techniques in the literature for this correlation regime70, albeit with this system too small to be compared in the thermodynamic limit.

Fig. 2: Performance of the CPD and neural network backflow on the Fermi-Hubbard model.

Percentage relative energy error for the ground state of the 4 × 4 square Fermi-Hubbard model at U/t = 8 compared to exact diagonalization results69 at (a) a hole-doped filling of n = 0.875 and (b) half-filling. CPD backflow results (ΨCPD) are shown as a function of the configurational sample size in the optimization of the parameters, for two different model complexities of M = 1 (blue circles) and M = 2 (green circles). Neural network backflow results (NNB, red dashed lines) are taken from ref. 44. CPD backflow energies are obtained as averages over 50 independent evaluations using the optimized parameters and a sample size of \({2}^{16}\), with error bars represented by the standard error across these evaluations.

We also show the variational improvability as the support dimension M of the CPD decomposition is increased from M = 1 to M = 2, with a systematic lowering of all energies found, leading to a maximum of 65,536 parameters. Nevertheless, we find that it is still generally more advantageous to increase the number of configurational samples in the Markov chain than to formally increase the flexibility of the state by increasing M. This is due to noise in the estimates of the expectation values required for the optimization of the CPD parameters, which amongst other things affects the inversion of the sampled quantum geometric tensor. This indicates that we cannot be confident of a complete optimization to the global minimum of this state, despite the simple parameterization of the CPD form, with the optimization still limited more by noise in the samples than flexibility in the model, as found in many other studies of comparable states. We will consider this behavior more in the following section, but note that emerging optimization approaches, such as the SPRING algorithm71, can be transferred to this setting and hold promise to boost the resulting performance of the CPD backflow state. However, despite these current limitations we do find a reliable and systematic improvement in the optimized state as the number of samples is increased, and a high level of accuracy overall for this correlated state.

Water molecule

We now consider ab initio molecular systems, which are described in second quantization by an electronic Hamiltonian of the form

$$\hat{H}=\sum\limits_{ij,\sigma }{h}_{ij}^{(1)}{\hat{c}}_{i,\sigma }^{{{{\dagger}}} }{\hat{c}}_{j,\sigma }+\frac{1}{2}\sum\limits_{ijkl,\sigma \tau }{h}_{ijkl}^{(2)}{\hat{c}}_{i,\sigma }^{{{{\dagger}}} }{\hat{c}}_{j,\tau }^{{{{\dagger}}} }{\hat{c}}_{l,\tau }{\hat{c}}_{k,\sigma },$$
(16)

where the sums run over the degrees of freedom in the system and σ, τ are binary spin variables. The \({h}_{ij}^{(1)}\) matrix elements describe the kinetic energy operator and interaction with the external potential in these degrees of freedom, while the \({h}_{ijkl}^{(2)}\) terms model the Coulomb interaction between particles. Compared to Fermi-Hubbard models, the computational complexity of these Hamiltonians is significantly increased by the \({N}^{2}{(2L-N)}^{2}\) scaling of the number of connected configurations required in evaluating the local energy (compared to \({{{{\mathcal{O}}}}}[N]\) terms in Hubbard and other lattice models). Since the evaluation of the CPD wave function model at each configuration is \({{{{\mathcal{O}}}}}[{N}^{2}M+{N}^{3}]\) with the fast update (see “Methods”), this constrains the number of configurational samples that can be afforded.
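To give a sense of the size of the connected space, the sketch below counts the single and double excitations available to N electrons in 2L spin-orbitals. It deliberately ignores spin conservation and point-group symmetry, so the numbers are upper bounds; the function names are hypothetical. The example values correspond to ten electrons in thirteen spatial orbitals, which is the size of the water molecule in a 6-31G basis.

```python
from math import comb


def count_single_excitations(n_electrons, n_spatial_orbitals):
    """Occupied spin-orbitals times unoccupied spin-orbitals."""
    n_empty = 2 * n_spatial_orbitals - n_electrons
    return n_electrons * n_empty


def count_double_excitations(n_electrons, n_spatial_orbitals):
    """Pairs of occupied spin-orbitals times pairs of unoccupied ones."""
    n_empty = 2 * n_spatial_orbitals - n_electrons
    return comb(n_electrons, 2) * comb(n_empty, 2)


# Ten electrons in thirteen spatial orbitals (26 spin-orbitals).
singles = count_single_excitations(10, 13)
doubles = count_double_excitations(10, 13)
```

The double-excitation count grows as the product of two quadratic factors, one in the number of electrons and one in the number of empty spin-orbitals, which is the origin of the quartic growth of the connected space for two-body Hamiltonians.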

For our initial benchmark system, we consider the water molecule in the 6-31G basis set at the equilibrium geometry used in ref. 4. While this seems an unassuming system from an electron correlation perspective, it has emerged as somewhat of a benchmark system in the Quantum Monte Carlo (QMC) community, where it has been studied extensively using a variety of ansätze55,72,73. Recently developed NQS architectures have struggled to reach state-of-the-art accuracy for this molecule, despite it still being of a size where exact diagonalization is possible. Part of the issue with this comes from the fact that the weakly correlated physics and compact nature of the molecule mean that it is hard to define an appropriate representation for the basis which can enable efficient, faithful and representative sampling of the state with few configurational samples.

Minimizing the number of samples relies on finding a representation which can be faithfully approximated by a small stochastic selection of configurations, necessitating an orbital representation of the basis in which the wave function amplitudes are as flat as possible throughout the Hilbert space. This maximizes the acceptance rates of the Metropolis-Hastings Markov chain growth, and ensures that as small a sample as possible can represent the wave function distribution. Canonical bases of mean-field (e.g. Hartree–Fock) theories are therefore particularly poorly suited, as they are (away from very strong correlation) dominated by the configuration of a single Slater determinant. These bases have been found for NQS with restricted Boltzmann machine architectures to obtain relatively large correlation energy errors (≈ 5–10%) despite scaling up to \(1{0}^{6}\) configurational samples4. The development of autoregressive NQS models has been able to improve upon this by allowing a direct sampling algorithm of unique configurations that is not constrained by the limitations of the Metropolis-Hastings algorithm5. However, these models still require large sample sizes and have only been benchmarked in a STO-3G minimal basis set for this system5,28,29. Rather than changing the sampling algorithm, in a previous work, we considered the effect of different orbital representations for the configurational basis15. Following this, we consider orthogonal Foster-Boys orbitals for the configurations, localized over all degrees of freedom to minimize the physical spread of the resulting orbitals59.

The results in Fig. 3a show that the CPD backflow ansatz formulated in this local basis exhibits a clear systematic improvability, with the error decreasing inversely with the support dimension of the model, M. We can use this empirical scaling to extrapolate the results to the infinite support dimension limit, which results in a relative correlation energy error of below 2%, for the ansatz optimized with \({{{{\mathcal{O}}}}}[1{0}^{4}]\) configurational samples. At infinite M the model is complete, and the error therefore must arise from the incomplete optimization of the finite-M models. We therefore also consider the improvability in M for two different numbers of configurational samples, NS in the Markov chains used for each optimization step. The 1/M decay of the error is clearly seen in both of these sample sizes, with the extrapolated model result decreasing towards exactness for increasing NS.
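An extrapolation of this kind can be sketched as a linear least-squares fit of the error against 1/M, with the intercept giving the estimate at infinite support dimension. The function name is hypothetical and the data below are synthetic, generated from an exact 1/M law purely for illustration; they are not the values plotted in Fig. 3.

```python
def extrapolate_in_inverse_m(support_dims, errors):
    """Least-squares fit of errors ~ intercept + slope / M.

    Returns (intercept, slope); the intercept is the extrapolated error
    in the limit of infinite support dimension.
    """
    xs = [1.0 / m for m in support_dims]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(errors) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, errors))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Synthetic, made-up data obeying error = 0.02 + 0.06 / M exactly.
dims = [1, 2, 4]
errs = [0.02 + 0.06 / m for m in dims]
intercept, slope = extrapolate_in_inverse_m(dims, errs)
```

With noisy data the intercept carries its own uncertainty, so an extrapolated value obtained this way is best read as an estimate rather than a bound.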

Fig. 3: Systematic improvability of the CPD backflow ansatz for the water molecule.

Relative ground state correlation energy error (compared to exact diagonalization) for the CPD backflow ansatz (ΨCPD) on the water molecule (6-31G basis, equilibrium geometry as specified in ref. 4) as a function of (a) support dimension M and (b) number of configurational samples NS. We also extrapolate these results to infinite M or NS to provide the values in their infinite limit as shown by diamonds. CPD backflow energies are obtained as averages over 50 independent evaluations using the optimized parameters and a sample size of \({2}^{16}\), with error bars represented by the standard error across these evaluations.

We analyze this trend more systematically in Fig. 3b, where we show the convergence in NS for two different model complexity parameters M, showing a relatively robust \({N}_{S}^{-\frac{1}{2}}\) scaling in the error. This indicates that doubling the support dimension has a similar effect on reducing the error as quadrupling NS. This robustness and reliability in the error reduction is to be expected with increasing M, but is more surprising with increasing NS. It indicates that the noise introduced into the sampling at finite NS values is not simply changing the variance in the resulting energy, or indeed resulting in different optimized states due to convergence to different local minima in the landscape (where we would expect a wider scatter of optimized energies). Instead, the robustness and systematic trend in the results indicates that NS is controlling the intrinsic error of the optimization of the state in a more systematic fashion. This could potentially arise from non-linear steps in the optimization protocol, and is something which requires further scrutiny going forwards.
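The equivalence between doubling M and quadrupling NS follows directly from the two empirical power laws. As a rough model, and assuming the two contributions are additive with hypothetical fit constants A and B (an assumption made here for illustration, not a form fitted in this work), the error and the effect of each change can be written as:

$$\Delta E(M,{N}_{S})\approx \frac{A}{M}+\frac{B}{\sqrt{{N}_{S}}},\qquad \frac{A}{2M}=\frac{1}{2}\,\frac{A}{M},\qquad \frac{B}{\sqrt{4{N}_{S}}}=\frac{1}{2}\,\frac{B}{\sqrt{{N}_{S}}}.$$

Under this assumed form, each operation halves its own contribution to the error, so the two are comparable whenever the two terms are of similar magnitude.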

Comparing the accuracies obtained to previous state-of-the-art results in Fig. 4, we find that the CPD backflow state with M = 1 (6.7k parameters) already outperforms both the Gaussian process state augmented by a symmetry-broken Pfaffian (585 parameters)15 and the restricted Boltzmann machine NQS state (728 parameters)4 when optimized with \({{{{\mathcal{O}}}}}[1{0}^{3}]\) configurational samples. The accuracy is further improved when larger support dimensions and sample sizes are considered, with the CPD model at M = 4 (27k parameters) and \({N}_{S} \sim {{{{\mathcal{O}}}}}[1{0}^{4}]\) outperforming the best NQS by 2% in the relative correlation energy error. While we still do not quite reach the level of accuracy of coupled cluster methods with singles and doubles (and the significant “chemical accuracy” hurdle, albeit defined with respect to the finite basis set energy), to the best of our knowledge, these results represent the state-of-the-art for an NQS-like variational ansatz for this system.

Fig. 4: Performance of the CPD backflow and other models on the water molecule.

Relative ground state correlation energy error (compared to exact diagonalization) for CPD backflow ansätze (ΨCPD) on the water molecule (6-31G basis, equilibrium geometry as specified in ref. 4) as the number of configurational samples NS in each Markov chain is increased in the parameter optimization steps. Two support dimensions corresponding to M = 1 (blue circles) and M = 4 (green circles) are shown. Comparison energies for the Gaussian process state augmented by a symmetry-broken Pfaffian (\({\Psi }^{GPS}\times \left\vert \,{\mbox{Pf}}\,\right\rangle\), red diamond) and restricted Boltzmann machine NQS (ΨRBM, orange squares) are taken from refs. 15 and 4, respectively, while the CCSD energy (solid line) is calculated with PySCF65,66. CPD backflow energies are obtained as averages over 50 independent evaluations using the optimized parameters and a sample size of \({2}^{16}\), with error bars represented by the standard error across these evaluations.

Towards hydrogen materials

Extending the CPD backflow ansatz beyond benchmark studies and comparison to exact results, we consider a two-dimensional ab initio lattice of hydrogen atoms as a step towards combining strong correlation, long-range interactions and extended systems. These hydrogenic systems have been studied by a variety of methods in recent years given their simple specification and challenge of realistic interactions, whilst maintaining a close connection to the Fermi-Hubbard model15,74,75,76,77,78,79. In particular, different correlation regimes can be probed by simply changing the interatomic distance of the lattice, similar to tuning the interaction strength in the Fermi-Hubbard model. However, crucially these hydrogen lattices require the accurate treatment of realistic long-range Coulomb interactions and their effects, which are not present in the Fermi-Hubbard model. Accurately capturing the ground state of these systems for different interatomic distances is thus a challenging task for most quantum chemistry methods, as it requires a flexible and expressive model with a treatment of long-range interactions and high-energy scattering physics that gives rise to states of significantly different character.

In Fig. 5, we report the ground state energy per atom obtained from the CPD backflow ansätze (with and without the rank and range truncation introduced in the “Backflow truncation via exchange cutoff” subsection of the “Methods”) for a 6 × 6 hydrogen lattice in a minimal basis (STO-6G) with open boundary conditions. We choose K = 5 for the range cutoff of the backflow to ensure that the local symmetries of quantum fluctuations about each atomic site are preserved. We consider both compact (lower effective U/t) and extended (higher effective U/t) lattice structures by varying the interatomic distances all the way to essentially dissociated non-interacting hydrogen atoms. We compare our results to energies obtained with restricted Hartree-Fock (RHF) and unrestricted coupled-cluster with single and double excitations (UCCSD), as well as an efficient ab initio implementation of density matrix renormalization group (DMRG) going up to bond dimension of 1024 in a fully spin-adapted basis implemented in the block2 package80,81. For an additional comparison between contrasting approaches to Fermionic variational wave functions, we also include the results obtained from a GPS ansatz with support dimension M = 72 acting as a Jastrow in front of a co-optimized Slater determinant15, to compare the CPD backflow to this approach. We optimize the CPD backflow and GPS multiplied by Slater determinant ansätze in a Boys localized basis for the orbitals, whereas for the DMRG results we rely on a split-localized basis, in which occupied and virtual orbitals are localized separately. We found this choice to give the most consistent results for DMRG across the range of geometries studied. Each optimization of the CPD backflow wave functions took  ≈ 250 GPU hours across 4 Nvidia A100 devices, whereas the DMRG runs took a total of  ≈ 500 CPU hours on an Intel(R) Core(TM) i9 device.

Fig. 5: Performance of the CPD backflow and other methods on the hydrogen lattice.

(a) Ground state energy per atom of a 6 × 6 square hydrogen lattice in a STO-6G basis for increasing lattice constants. Shown are RHF (orange line) and UCCSD energies (violet line), while the horizontal gray line indicates the exact energy of the fully dissociated limit in this basis. The CPD backflow ansatz (ΨCPD) with support dimension M = 1 (solid black line) is compared to a GPS ansatz augmented with a Slater determinant (\({\Psi }^{GPS}\times \left\vert \Phi \right\rangle\)) with support dimension M = 72 (ref. 15, dotted black line), as well as energies obtained with DMRG (red dots and line) using a bond dimension of M = 1024. A further CPD backflow ansatz with support dimension M = 1 and a truncation in the backflow subspace to K = 5 orbitals is shown as a dashed black line. The backflow CPD (GPS augmented by Slater determinant) VMC ansätze were optimized with 4096 (10,000) samples, while DMRG results were obtained using the block2 package80,81. (b) Energy difference per atom relative to DMRG of the CPD backflow ansätze and the GPS times Slater determinant ansatz. To aid the comparison, we have removed the outlier DMRG energy at 2.5 Å. The energies of the CPD backflow and GPS times Slater determinant ansätze are obtained as averages over 50 independent evaluations using the optimized parameters and a sample size of \({2}^{14}\), with error bars represented by the standard error across these evaluations.

The RHF and UCCSD description of this equation of state qualitatively breaks down quite early in this stretching coordinate, with UCCSD failing to converge beyond 1.5 Å. Furthermore, the UCCSD exhibits quantitative error of  ~ 2 mEh per atom even around equilibrium geometries, confirming that substantial correlation effects are present even in this regime. As another point of reference, the fully dissociated limit can be computed via exact diagonalization, where the assumption of simple energy extensivity from a single atom can be applied. In this limit the energy is  ≈ 0.03 Eh above the analytic result for the hydrogen atom due to the basis set incompleteness error, which nevertheless will exhibit a large degree of cancellation for energy differences along this changing geometry. The DMRG provides the best variational comparison for this system, with (apart from 2.5 Å) the CPD and GPS results being within 2 mEh per atom of this value.

The CPD backflow ansatz manages to quantitatively capture the features of the expected potential energy surface, reaching the correct dissociation limit at large interatomic distances, and showing an overall smooth transition from weak to strong correlation regimes. We can directly compare different systematically improvable variational ansätze (DMRG, GPS multiplied by a Slater determinant and the CPD backflow), all of which are competitive and variationally optimal at different points in the changing physics of this system. Around the equilibrium of this system, the CPD backflow and DMRG states are almost identical and variationally optimal amongst the comparison. In the intermediate regime (1.5 Å ≤ d < 2.0 Å), the GPS ansatz augmented with a Slater determinant provides the best variational energies, despite (or perhaps because of) the smaller number of parameters (≈12k vs. ≈187k for the CPD backflow ansatz without truncation and ≈26k for the one with). An outlier appears to be the nearly dissociated limit of 2.5 Å interatomic distance, where the DMRG energy appears erroneously high. This could be due to a particularly large impact of the one-dimensional MPS topology used, the choice of basis for the orbitals or the DMRG sweep getting stuck in a local minimum. Nevertheless, the other variational ansätze largely agree at this point.

Comparing the CPD backflow curves with and without the backflow truncation, we find (as expected) that the K = 5 results are all variationally higher than the parent CPD backflow. This truncation has a very small effect on the energies at larger interatomic distances, but becomes more significant around the equilibrium distance and mildly stretched geometries where it reaches a maximum error of 2 mEh per atom. This is expected as the range of the correlations in the compressed lattice will extend further than the stretched limit. Nonetheless, even with this restriction the ansatz is able to reach the coupled-cluster level of accuracy around equilibrium, and to outperform it on stretched geometries, with a reduction in the number of parameters compared to the parent model by more than a factor of seven. This validates the exchange cutoff as a practical parameter reduction scheme for the CPD backflow ansatz, suggesting benefits in the study of larger systems, and potentially allowing for an increase in the support dimension of the model.

To further compare the physical properties of the potential energy surface of this lattice as described by the different levels of theory, we fit a simple Morse potential at different interatomic distances (r), given by:

$$V(r)={D}_{e}{\left(1-{e}^{-a(r-{r}_{e})}\right)}^{2}+u,$$
(17)

where De and a control the depth and width of the well, re is the equilibrium bond length, and u is the energy offset. Although the Morse potential is generally used for diatomic molecules, the symmetric stretching coordinate of this system is nevertheless well modeled by this form. The differences in the quantum chemistry and VMC methods used to obtain the potential energy data are reflected in the variations of dissociation energy (De), equilibrium bond length (re), harmonic vibrational frequency (ωe), and anharmonicity constant (ωeχe) presented in Table 1.
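The role of the fit parameters in Eq. (17) can be made concrete with a short sketch that evaluates the Morse form and checks numerically that its curvature at the minimum equals 2 De a², the harmonic force constant from which a harmonic frequency follows once an effective mass is chosen. The effective mass convention behind Table 1 is not stated in this section, so none is assumed here; the parameter values below are hypothetical and are not the fitted values.

```python
from math import exp


def morse(r, d_e, a, r_e, u=0.0):
    """Morse potential of Eq. (17): well depth d_e, width a, minimum at r_e,
    and energy offset u."""
    return d_e * (1.0 - exp(-a * (r - r_e))) ** 2 + u


def second_derivative(f, x, h=1e-3):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h ** 2


# Hypothetical parameters, chosen only to exercise the functions.
d_e, a, r_e, u = 0.03, 2.0, 1.22, -0.5
curvature = second_derivative(lambda r: morse(r, d_e, a, r_e, u), r_e)
force_constant = 2.0 * d_e * a ** 2  # analytic curvature at the minimum
```

The potential equals `u` at `r_e` and approaches `d_e + u` at large separation, so `d_e` measures the well depth relative to the dissociated limit.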

Table 1 Physical properties of the hydrogen lattice as obtained by different methods

UCCSD is expected to accurately describe the correlated physics near equilibrium geometries, however the rapid divergence after this point renders even the harmonic vibrational frequencies unreliable. In contrast, DMRG, which handles both the strong and weak correlations on a consistent level, presents a more accurate dissociation energy (De = 0.031 eV) and harmonic vibrational frequency (ωe = 1900.038 cm−1), while yielding anharmonicity of the vibrational motion of the atomic lattice describing the beyond-parabolic nature of the binding as ωeχe = 36.409 cm−1. The values obtained from CPD backflow ansätze with and without truncation closely track those from DMRG. On the other hand, while agreeing on the dissociation energy, the GPS multiplied by Slater determinant ansatz stands out amongst the variational methods with a marginally softer bond, with a larger equilibrium lattice parameter (re = 1.237 Å) and the lowest harmonic vibrational frequency (ωe = 1748.939 cm−1). Overall, the methods agree on an equilibrium bond length around 1.22 Å and a dissociation energy of 0.03 eV (except UCCSD), with variations in the harmonic and anharmonic wavenumbers.

These variations highlight the strengths and limitations of each method in modeling the potential energy surfaces and vibrational properties of hydrogen materials. Taking all these results into consideration, the CPD backflow ansatz emerges as a competitive method for the study of strong electron correlation, providing a variational description of the ground state of a two-dimensional lattice of hydrogen atoms that is in good agreement with other state-of-the-art methods, while being able to capture strong correlations and anharmonic effects in the system in a low-energy basis.

Spin-spin correlations

The local atomic basis framework of the CPD backflow ansatz allows for the straightforward computation of atom-resolved expectation values for further insights into the electronic structure. In the context of hydrogen materials, local spin-spin correlation functions are of particular interest, as they can provide insights into the nature of the ground state of the system, and the emergence of magnetic order. By analogy with Hubbard models, we would expect some anti-ferromagnetic order to emerge in the electronic structure of this system, with this order decaying algebraically in the thermodynamic limit. However, in the presence of long-range interactions this behavior is far from confirmed in two-dimensions. While admittedly far from this thermodynamic limit, we consider the two-point spin-spin correlation function C(r) between the center of the 6 × 6 hydrogen lattice, and atoms at a distance r from the center. We can define this function via instantaneous (equal-time) spin-spin correlators \(\langle {\hat{S}}_{{\vec{r}}_{a}}^{z}{\hat{S}}_{{\vec{r}}_{b}}^{z}\rangle\) between two atoms as:

$$C(r)=\frac{1}{{N}_{bulk}}\sum\limits_{{\vec{r}}_{a}\in \,{\mbox{bulk}}\,}\sum\limits_{| {\vec{r}}_{a}-{\vec{r}}_{b}| =r}\left\langle {\hat{S}}_{{\vec{r}}_{a}}^{z}{\hat{S}}_{{\vec{r}}_{b}}^{z}\right\rangle ,$$
(18)

where \({\vec{r}}_{a}\) and \({\vec{r}}_{b}\) are the positions of atoms a and b, and Nbulk is the number of equivalent atoms in the center that we average over (four). We use the atom-centered atomic orbitals themselves as natural projectors for the spin operators of each atom. More details about the calculation of the instantaneous spin-spin correlation function can be found in the Supplementary Note 1.
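A simplified estimator for Eq. (18) can be sketched under two assumptions that go beyond the text: each atom carries a single orbital, so that the z-component of its spin is half the difference of the up and down occupations, and the configurations are sampled from the squared wave function amplitude, so that this diagonal operator is estimated by a plain sample average. The integer encoding of local states, the function name, and the toy Néel samples on a 4 × 4 lattice are all hypothetical; the projector-based procedure actually used is described in Supplementary Note 1.

```python
from math import hypot, isclose, sqrt

# Assumed encoding of local states: 0 empty, 1 up, 2 down, 3 doubly occupied.
SZ = {0: 0.0, 1: 0.5, 2: -0.5, 3: 0.0}


def spin_correlation(samples, positions, bulk, r, tol=1e-9):
    """Sample estimate of C(r): sum over bulk atoms a and over all atoms b
    at distance r from a, divided by the number of bulk atoms."""
    total = 0.0
    for a in bulk:
        xa, ya = positions[a]
        for b, (xb, yb) in enumerate(positions):
            if isclose(hypot(xa - xb, ya - yb), r, abs_tol=tol):
                total += sum(SZ[s[a]] * SZ[s[b]] for s in samples) / len(samples)
    return total / len(bulk)


# Toy 4 x 4 lattice (unit spacing) with the two Neel configurations.
positions = [(x, y) for x in range(4) for y in range(4)]
neel = [1 if (x + y) % 2 == 0 else 2 for x, y in positions]
flipped = [2 if s == 1 else 1 for s in neel]
bulk = [positions.index(p) for p in [(1, 1), (1, 2), (2, 1), (2, 2)]]

c_nearest = spin_correlation([neel, flipped], positions, bulk, 1.0)
c_diagonal = spin_correlation([neel, flipped], positions, bulk, sqrt(2.0))
```

For this perfectly ordered toy state the nearest-neighbor value is negative and the diagonal value is positive, the sign alternation expected for antiferromagnetic order. Note that, following Eq. (18) as written, the inner sum runs over all atoms at a given distance rather than averaging over them.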

In Fig. 6, we compare the radial spin-spin correlation function for the ground state approximation obtained with the CPD backflow ansatz at near-equilibrium interatomic distance d = 1.2 Å, and at a large stretching of d = 3.0 Å, normalized for the changing inter-atomic distances. We find the emergence of short-range anti-ferromagnetic order in the material, as anticipated by analogy with Hubbard models. The magnitude of this antiferromagnetic order increases with increasing interatomic separation, again in keeping with the anticipated Hubbard behavior for increasing U/t values. However, this order is very short ranged, with the spin in the extended lattice not directly affecting a lattice site beyond its nearest neighbors. At more compressed geometries, this order does extend beyond this to the outer atoms in the lattice (next-next-nearest-neighbors), due to the shorter distance in real space, but the overall magnitude of these magnetic correlations is reduced.

Fig. 6: Spin-spin correlations in the hydrogen lattice.

Two-point instantaneous spin-spin correlation function for the ground state of the 6 × 6 hydrogen lattice at two different interatomic distances, d = 1.2 Å (blue) and d = 3.0 Å (green). The correlation function is given as a function of the normalized radial distance between the atoms in the correlator, showing nearest, next-nearest and next-next-nearest neighbor magnetic correlations. The correlators were computed from the optimized CPD backflow ansatz with M = 1.

Scaling

As illustrated in the results above, the CPD backflow ansatz performs well on small fermionic systems, but further developments for scaling to significantly larger systems are still required for this to become a clearly competitive method for the wider electronic structure community. Simplifying the scaling to assume a general growth of both the basis and electron number such that N ~ L, and assuming M and K are independent of system size, the number of parameters grows with system size as \({N}_{P} \sim {{{{\mathcal{O}}}}}[{N}^{3}]\) for the full ansatz, with the subspace truncation of the backflow reducing this asymptotically to \({N}_{P} \sim {{{{\mathcal{O}}}}}[{N}^{2}]\) (see the “Backflow truncation via exchange cutoff” subsection of the “Methods”). The fast updating of backflow orbitals also enables the evaluation of the wave function log-amplitudes to be performed in \({{{{\mathcal{O}}}}}[M{N}^{2}+{N}^{3}]\) (regardless of whether a backflow truncation is applied). However, we find in practice that the determinant evaluation has a significantly smaller prefactor than the construction of the orbitals, so that the dominant scaling is rather \({{{{\mathcal{O}}}}}[{N}^{2}]\) for small to medium-sized systems. Given that the number of terms in general second quantized ab initio Hamiltonians scales as \({{{{\mathcal{O}}}}}[{N}^{4}]\), the resulting scaling of the local energy evaluation is then \({{{{\mathcal{O}}}}}[{N}^{7}]\) for the CPD backflow state in the asymptotic limit, or \({{{{\mathcal{O}}}}}[{N}^{6}]\) for small to medium-sized systems. While this formal scaling should be competitive with accurate quantum chemical methods such as coupled-cluster, the prefactor is clearly significantly larger.
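The local energy scaling quoted above can be written out explicitly. This is only a restatement of the counting argument, under the assumption that one log-amplitude evaluation is required for each configuration connected by the Hamiltonian:

$$\underbrace{{{\mathcal{O}}}[{N}^{4}]}_{\text{connected configurations}}\times \underbrace{{{\mathcal{O}}}[M{N}^{2}+{N}^{3}]}_{\text{log-amplitude evaluation}}={{\mathcal{O}}}[M{N}^{6}+{N}^{7}].$$

With M independent of system size, the determinant term gives \({{\mathcal{O}}}[{N}^{7}]\) asymptotically, while in the regime where the orbital construction dominates the cost of each amplitude, the first term gives the observed \({{\mathcal{O}}}[{N}^{6}]\).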

Beyond the local energy evaluation, we should also consider the computational scaling of the parameter update. For large numbers of parameters (such as the ~187,000 of the hydrogen lattice above), this update used to be the main bottleneck for VMC with large-scale ansätze when using the original SR algorithm60. For a model with NP parameters, SR would scale as \({{{{\mathcal{O}}}}}[{N}_{P}^{3}]\), since it involves inverting the NP × NP quantum geometric tensor matrix. This is not the case for recently introduced alternative formulations of SR, such as minimum-step SR82, the kernel formulation of SR61 or SPRING71. In particular, for minimum-step SR and the kernel formulation of SR, simple linear algebra identities were used to reduce the dimension of the matrix that is inverted in the SR algorithm from NP to NS, i.e. the number of samples used during the optimization. When NP ≫ NS, as in large-scale models, the scaling of the parameter update becomes \({{{{\mathcal{O}}}}}[{N}_{S}^{2}{N}_{P}+{N}_{S}^{3}]\), i.e. linear in the number of parameters. Thus, the evaluation of the local energy remains the computational bottleneck of the algorithm in the case of ab initio systems.
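The kind of linear algebra identity referred to here can be illustrated with a small numerical check. The sketch below is not the algorithm of the cited works. It assumes a real-valued, already centered matrix of log-derivatives `O` of shape NS × NP, a vector `eps` of centered local energies, and a diagonal regularization `shift`, and it verifies the standard push-through identity (OᵀO + λI)⁻¹Oᵀε = Oᵀ(OOᵀ + λI)⁻¹ε. All names are hypothetical.

```python
import numpy as np

def sr_update_parameter_space(O, eps, shift):
    """Solve (O^T O + shift * I) delta = O^T eps; the matrix is N_P x N_P."""
    n_p = O.shape[1]
    return np.linalg.solve(O.T @ O + shift * np.eye(n_p), O.T @ eps)

def sr_update_sample_space(O, eps, shift):
    """Same update as O^T (O O^T + shift * I)^{-1} eps; the matrix is N_S x N_S."""
    n_s = O.shape[0]
    return O.T @ np.linalg.solve(O @ O.T + shift * np.eye(n_s), eps)

# Random stand-in data with many more parameters than samples (N_P >> N_S).
rng = np.random.default_rng(0)
n_s, n_p = 16, 200
O = rng.normal(size=(n_s, n_p))   # stand-in for centered log-derivatives
eps = rng.normal(size=n_s)        # stand-in for centered local energies

d1 = sr_update_parameter_space(O, eps, shift=1e-2)
d2 = sr_update_sample_space(O, eps, shift=1e-2)
max_deviation = float(np.max(np.abs(d1 - d2)))
```

The two routes should agree to numerical precision, but the second only factorizes an NS × NS matrix, and forming OOᵀ costs NS²NP operations, consistent with the scaling quoted above.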

We show this scaling explicitly in Fig. 7, where we measure the mean runtime for a full VMC parameter update step for the CPD backflow ansatz, including the Markov chain sampling of a fixed number of configurations, the evaluation of the local energy, and the subsequent parameter update. By increasing the number of hydrogen atoms in a chain with fixed equilibrium interatomic distances (d = 1.68 Å) in a STO-6G basis up to 90 atoms, we can extract a realistic asymptotic scaling of the approach. We set the support dimension of the ansatz to M = 1 and choose a sample size of NS = 128, in order to fit the data in memory even for the largest system sizes. For each system size, we let the VMC algorithm run for 50 iterations on a single Nvidia A100 GPU with 40 GB of memory. Extracting the scaling from the large-system limit gives \({{{{\mathcal{O}}}}}[{N}^{6.5}]\), which only becomes evident for this system beyond 40 atoms.
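An empirical exponent of this kind is typically obtained from a straight-line fit on a log-log scale, restricted to the largest system sizes. The following is a minimal sketch of such a fit. The runtimes are synthetic numbers generated for this example (a low-order overhead plus an N^6.5 term), not the measurements of Fig. 7, and the function name `fit_scaling_exponent` is hypothetical.

```python
import numpy as np

def fit_scaling_exponent(sizes, runtimes, n_min=0):
    """Fit runtime ~ prefactor * N**exponent by least squares in log-log space.

    Only points with N >= n_min enter the fit, so the exponent can be
    extracted from the large-system regime alone.
    """
    sizes = np.asarray(sizes, dtype=float)
    runtimes = np.asarray(runtimes, dtype=float)
    mask = sizes >= n_min
    slope, intercept = np.polyfit(np.log(sizes[mask]), np.log(runtimes[mask]), 1)
    return slope, np.exp(intercept)

# Synthetic runtimes (not measured data): overhead ~ N^2 plus a term ~ N^6.5.
sizes = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
runtimes = 1e-2 * sizes**2 + 1e-9 * sizes**6.5

exponent_all, _ = fit_scaling_exponent(sizes, runtimes)
exponent_tail, _ = fit_scaling_exponent(sizes, runtimes, n_min=40)
```

For data of this shape the fit over all sizes underestimates the asymptotic exponent, while the fit restricted to the tail moves towards it, which is why the scaling is read off from the largest systems only.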

Fig. 7: Scalability of the CPD backflow ansatz.

Mean VMC step runtime for the CPD backflow ansatz as a function of the number of atoms in a hydrogen chain with fixed interatomic distances (1.68 Å) in a STO-6G basis. The support dimension is M = 1 and the sample size is NS = 128. The blue points are obtained with the full Hamiltonian, while the green and orange points are obtained with the local energy evaluated with a Hamiltonian pruned with thresholds of 10−9 and 10−5 Eh. The dashed lines show the observed scaling for the largest system sizes. Average runtimes are obtained over 50 iterations, with error bars given by the standard error across these runtimes.

As discussed in the context of the GPS model in ref. 15 and building on other works in this area74,83,84,85, this scaling for ab initio systems can be reduced further by truncating the number of terms in the sum over connected configurations at each evaluation of the local energy. This truncation is based on the magnitude of the Hamiltonian matrix element connecting the configurations. By presorting the electron repulsion integrals between the degrees of freedom, this truncation can be implemented without having to consider the entire set of Hamiltonian matrix elements for each evaluation of the local energy. Formally, the exponentially decreasing overlap between the orbitals in the sampled space should reduce the number of connected determinants which contribute to the local energy asymptotically to \({{{{\mathcal{O}}}}}[{N}^{2}]\) rather than \({{{{\mathcal{O}}}}}[{N}^{4}]\), a scaling which then matches that of the local energy evaluation in a first quantized perspective. This results in a practical scaling of the CPD ansatz of \(\sim {{{{\mathcal{O}}}}}[{N}^{4\text{--}6}]\). However, since this method comes with a certain overhead in terms of data structures, the lower end of this scaling range only materializes beyond a certain crossover system size, which depends on the system and on the truncation threshold on the Hamiltonian matrix elements.
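The benefit of presorting can be seen in a deliberately simplified sketch. It treats the connected matrix elements as a flat list, whereas in the actual method the sorting is applied to the electron repulsion integrals and the connected configurations depend on the sampled configuration. All function names are hypothetical and the numbers are made up for illustration. Once the magnitudes are sorted in decreasing order, the retained terms form a prefix whose length is found by bisection, so discarded terms are never visited.

```python
import numpy as np

def presort_by_magnitude(elements):
    """Sort matrix elements once by decreasing magnitude (done up front)."""
    elements = np.asarray(elements, dtype=float)
    order = np.argsort(-np.abs(elements))
    return order, np.abs(elements[order])

def count_retained(sorted_magnitudes, threshold):
    """Number of leading entries with magnitude >= threshold, via bisection."""
    # searchsorted needs ascending order, so search on the negated magnitudes.
    return int(np.searchsorted(-sorted_magnitudes, -threshold, side="right"))

def truncated_local_energy(elements, amplitude_ratios, order, sorted_magnitudes, threshold):
    """Sum H_ij * psi(j) / psi(i) over the retained prefix only."""
    n_keep = count_retained(sorted_magnitudes, threshold)
    keep = order[:n_keep]
    elements = np.asarray(elements, dtype=float)
    ratios = np.asarray(amplitude_ratios, dtype=float)
    return float(np.dot(elements[keep], ratios[keep])), n_keep

# Made-up matrix elements (in Eh) and amplitude ratios for illustration.
h_elements = [0.5, -1e-6, 0.1, 1e-10]
psi_ratios = [1.0, 2.0, -1.0, 3.0]
order, mags = presort_by_magnitude(h_elements)
e_loose, n_loose = truncated_local_energy(h_elements, psi_ratios, order, mags, 1e-5)
e_tight, n_tight = truncated_local_energy(h_elements, psi_ratios, order, mags, 1e-9)
```

With the looser threshold only two of the four terms are evaluated, while the tighter threshold keeps three and changes the result only at the level of the discarded element.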

Figure 7 also shows the analogous results with energetic thresholds of 10−5 and 10−9 Eh, where the tighter threshold is expected to incur a negligible change in the sampled energy for a given state. The results show that a practical crossover point, after which the pruning of Hamiltonian elements below a small threshold yields a speed-up, is already reached at system sizes as low as 20–30 electrons for this ansatz, although we note that a one-dimensional chain is a favorable geometry for realizing an advantage from this approach. For large system sizes, the speed-up is more than an order of magnitude and provides the expected asymptotic quadratic improvement, giving an overall scaling of \({{{{\mathcal{O}}}}}[{N}^{3\text{--}4}]\) up to ~100 electrons, a scaling competitive with hybrid Density Functional Theory (DFT) techniques. Overall, the combination of this ansatz with the various scaling reductions outlined represents real potential for a second quantized, systematically improvable VMC algorithm with a practical and competitive \({{{{\mathcal{O}}}}}[{N}^{4}]\) scaling for medium to large ab initio systems. Clearly, further developments for prefactor reductions are key to taking advantage of this improved scaling and accessing these system sizes.

Conclusions

In this work, we introduce a general and simple ansatz suitable for ab initio fermions, based on a systematically improvable tensor rank decomposition of a general backflow form. This systematically builds configuration-dependent orbitals of a single antisymmetric Slater determinant in second quantization, directly encoding non-trivial N-body electron-electron correlations with a parameter scaling of \({{{{\mathcal{O}}}}}[{N}^{2\text{--}3}]\). We have shown that the ansatz can achieve competitive accuracy on small fermionic systems, such as the Fermi-Hubbard model and the water molecule, and that it can be used to model larger strongly correlated lattices of ab initio hydrogen atoms with an accuracy comparable to state-of-the-art DMRG techniques. Finally, we have discussed the scalability of the ansatz and shown that we can effect various reductions in a practical fashion to demonstrate \({{{{\mathcal{O}}}}}[{N}^{4}]\) scaling on medium to large ab initio systems.

We are working on further improvements in the accuracy and efficiency of the ansatz, as well as on exploiting the benefits of a second quantized formalism to integrate with multiscale methods and quantum embedding methodologies, providing a practical route to the modeling of truly extended systems within this CPD backflow framework86,86. These techniques could also be integrated within the “hidden fermion” model of correlated states as an alternative parameterization of the correlations36,37. It is also natural to ask whether alternative tensor factorization techniques could be applied to the description of second-quantized backflow correlations. These could naturally be fitted into the framework described above, and will also be explored in the future.