Introduction

Recent experiments on twisted bilayer MoTe2 (tMoTe2) reported the observation of the fractional quantum anomalous Hall (FQAH) effect at fractional fillings \(\nu =-\frac{2}{3}\) and \(-\frac{2}{5}\) of the moiré band1,2,3,4. The realization of the FQAH effect in twisted transition metal dichalcogenide bilayers (tTMDs) was theoretically proposed5,6,7 as a consequence of band topology8 and electron interaction. Specifically, spontaneous ferromagnetism and electron correlation in a spin-valley-polarized Chern band lead to the emergence of fractional Chern insulators (FCIs) that exhibit the FQAH effect at zero magnetic field. The observed FQAH effect in tMoTe2 is remarkably robust, existing over an unexpectedly wide range of twist angles and persisting up to 2 K (ref. 3). The experimental realization of the long-sought fractional quantum Hall effect at zero magnetic field9,10,11,12,13,14,15,16,17,18 not only expands the realm of fractionalized topological phases, but also holds promise for anyon-based topological quantum computation19,20.

While theoretical studies have provided important insights into the FQAH effect in tTMDs, the underlying moiré band structures of tMoTe2 over the experimentally accessible twist angle range have not been systematically studied. A number of first-principles studies report different bandwidths at the commensurate angle θ = 3.89°, ranging from 9 to 18 meV for the lowest moiré band21,22,23. Importantly, lattice relaxation at the moiré length scale can significantly impact the band structure and even the band topology. While the effect of the out-of-plane corrugation has been considered, the in-plane lattice relaxation and the effect of the resulting strain field have not been incorporated into the continuum model. The strain-induced pseudomagnetic field, as well as higher-harmonic moiré potentials, strongly affects higher moiré bands, and is therefore crucial for studying band-mixing FCI states23,24,25 and interaction-induced phases at higher filling factors. Finally, first-principles electronic structure calculations for twist angles below θ = 3.89° are entirely lacking, so the accuracy of the continuum model at small twist angles remains to be assessed.

In this work, we perform extensive first-principles simulations to study moiré lattice relaxation and electronic structures. Our calculations encompass a wide range of twist angles, reaching as small as 2.88° using a plane-wave basis and 1.1° using a transfer-learning technique and a local basis. In addition to interlayer corrugation, we observe significant in-plane displacement26,27,28,29,30 reaching around 0.5 Å at small angles. To capture the significant effect of lattice relaxation, we extend the continuum model to include second-harmonic moiré potentials and a pseudomagnetic field up to 250 T from in-plane strain27. Remarkably, all four topmost moiré valence bands over the entire range of experimentally relevant twist angles (θ = 2.6°–5°; refs. 1,2,3,4) are accurately reproduced by our continuum model with a single set of parameters. With the transfer-learning model, we further calculate the topological edge states and Wilson loop around 2°, revealing a series of C = 1 Chern bands. These findings provide a starting point for studying even-denominator non-Abelian states.

Results

Two types of van der Waals corrections

For accurate lattice relaxations in two-dimensional (2D) multi-layer systems, it is essential to incorporate van der Waals (vdWs) dispersion corrections into the total energy, potential, interatomic forces, and stress tensor calculations. The choice of vdWs corrections, therefore, influences the lattice parameters of unit cells and the interlayer distances. Typically, vdWs corrections fall into two categories: (a) charge-density independent methods such as DFT-D2/D3 and (b) charge-density-dependent methods. The latter category accounts for charge-density variations in vdWs contributions of atoms influenced by their local chemical environments.

The DFT-D2 method31 adds an empirical single-shot dispersion correction to the conventional density functional theory (DFT) calculations. The correction term for the dispersion energy Edisp is given by

$${E}_{disp}=-{s}_{6}{{\sum}_{i=1}^{N-1}}{{\sum}_{j=i+1}^{N}}\frac{{C}_{6,ij}}{{R}_{ij}^{6}}\cdot {f}_{damp}({R}_{ij}).$$
(1)

Here, \(s_6\) denotes a global scaling factor that depends only on the density functional used, N is the number of atoms in the system, \(C_{6,ij}\) are the dispersion coefficients for the atom pair (ij), and \(R_{ij}\) is the distance between atoms i and j. The damping function \(f_{damp}\) is used to avoid the divergence of the dispersion term at short interatomic distances. \(C_{6,ij}\) and \(f_{damp}\) are determined by the local geometry alone, so they do not change during the self-consistent iteration.
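As an illustration of Eq. (1), the following minimal sketch evaluates the pairwise sum for a small, non-periodic cluster. The Fermi-type damping function, the geometric-mean combination rule for \(C_{6,ij}\), and the defaults \(s_6 = 0.75\) and d = 20 follow the usual D2 conventions and are assumptions here, since the text above does not spell them out. The function name and the numerical inputs are hypothetical, and the sketch is not the VASP implementation.

```python
import numpy as np

def d2_dispersion_energy(positions, c6, r0, s6=0.75, d=20.0):
    """Pairwise D2-style dispersion energy in the form of Eq. (1).

    positions : (N, 3) Cartesian coordinates
    c6        : (N,) per-atom C6 coefficients
    r0        : (N,) per-atom van der Waals radii
    s6, d     : global scaling and damping steepness (assumed defaults)
    """
    positions = np.asarray(positions, dtype=float)
    c6 = np.asarray(c6, dtype=float)
    r0 = np.asarray(r0, dtype=float)
    n_atoms = len(positions)
    energy = 0.0
    for i in range(n_atoms - 1):
        for j in range(i + 1, n_atoms):
            r_ij = np.linalg.norm(positions[i] - positions[j])
            c6_ij = np.sqrt(c6[i] * c6[j])   # assumed combination rule
            r_sum = r0[i] + r0[j]            # sum of vdW radii
            # Fermi-type damping: tends to 0 at short range, 1 at long range
            f_damp = 1.0 / (1.0 + np.exp(-d * (r_ij / r_sum - 1.0)))
            energy -= s6 * c6_ij / r_ij**6 * f_damp
    return energy

# Toy two-atom example in arbitrary units (values are illustrative only)
e_far = d2_dispersion_energy([[0, 0, 0], [0, 0, 4.0]], c6=[1.0, 1.0], r0=[1.0, 1.0])
e_near = d2_dispersion_energy([[0, 0, 0], [0, 0, 1.0]], c6=[1.0, 1.0], r0=[1.0, 1.0])
print(e_far, e_near)
```

Because every quantity entering the sum depends only on atomic positions, the correction can be evaluated once per ionic step, outside the self-consistent loop.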

Unlike the D2 method, the density-dependent screened Coulomb (dDsC) method32,33 involves a density-dependent screening function to modulate the Coulomb interaction, which allows for a more realistic representation of vdWs interactions as a function of the local chemical environment. The correction term for dDsC can be expressed by

$${E}_{{{{\mathrm{disp}}}}}=-{{\sum}_{i=1}^{N-1}}{{\sum}_{j=i+1}^{N}}\frac{{C}_{6,ij}}{{R}_{ij}^{6}}\cdot {f}_{{{{\mathrm{damp}}}}}(b{R}_{ij}).$$
(2)

The key difference between dDsC and DFT-D2 lies in the damping function \(f_{damp}\), which in dDsC depends on the damping factor b. This damping factor is determined by the local electron density, the gradient of the electron density, and other environment-specific parameters. The method is therefore particularly useful for systems (e.g., the strongly correlated moiré systems studied here) where vdWs interactions are sensitive to the local electronic environment.

For untwisted bulk structures, these two vdWs correction methods often give similar results. As shown in Supplementary Note 1, for bulk MoTe2, the lattice constants and the vertical layer distances predicted by both the DFT-D2 and dDsC methods agree well with the experimental results (a = 3.52 Å and d = 6.99 Å)34,35. For the moiré superlattice, however, the dDsC method yields more reliable structural relaxation, because it better describes the rich local chemical environments of the superlattice, such as position-dependent electrical dipoles.

Large-scale DFT and lattice relaxation effect

Starting from an initial moiré structure generated by deep potential molecular dynamics (DeePMD)36, large-scale structural relaxations can be achieved at a significantly reduced computational cost. Remarkably, the relaxation of the θ = 2.88° twisted structure comprising 2382 atoms was completed in just 5 h, with 17 DFT ionic steps, using the DeePMD-generated structure on four NVIDIA H100 GPUs. The self-consistent calculation and band diagonalization of this 2382-atom system (IBAND = 17,160 and plane-wave number 11,469,590) can be done within 80 min on 20 NVIDIA H100 GPUs, showcasing the massive speedup of the GPU platform for large-scale first-principles simulation.
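The system sizes quoted above can be cross-checked with the standard commensurability relation for twisted hexagonal bilayers, \(\cos\theta_i = (3i^2+3i+\tfrac{1}{2})/(3i^2+3i+1)\), for which each layer contributes \(3i^2+3i+1\) primitive cells per moiré cell. The short sketch below assumes this relation and three atoms (one Mo, two Te) per primitive cell per layer; the function names are hypothetical.

```python
import math

def commensurate_angle_deg(i):
    """Twist angle (degrees) of the i-th commensurate structure."""
    n_cells = 3 * i * i + 3 * i + 1          # primitive cells per layer
    cos_theta = (n_cells - 0.5) / n_cells
    return math.degrees(math.acos(cos_theta))

def atoms_in_moire_cell(i, atoms_per_cell=3, n_layers=2):
    """Atom count of the bilayer moiré cell (MoTe2: 3 atoms per cell per layer)."""
    return n_layers * atoms_per_cell * (3 * i * i + 3 * i + 1)

# Tabulate a few commensurate structures in the experimentally relevant range
for i in range(7, 12):
    print(i, round(commensurate_angle_deg(i), 2), atoms_in_moire_cell(i))
```

For i = 11 the formula gives the 2382 atoms quoted for θ = 2.88°, and i = 8 corresponds to θ = 3.89°.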

To demonstrate the relaxation effect in tMoTe2, we compare the relaxed moiré structures at twist angles 3.89° and 2.88°. First, the interlayer spacing (ILS) varies strongly across the moiré cell (Fig. 1), indicating a large structural transformation. For tMoTe2 with a twist angle of 3.89° (Fig. 1a), the maximum ILS is 7.8 Å. It occurs in the MM region, where the Te/Mo atoms of the top layer sit directly above those of the bottom layer, and the strong repulsion raises the energy of this region. The minimum ILS is 7.0 Å, found in the MX region, where the Mo atoms of the top layer stack over the Te atoms of the bottom layer. Figure 1b shows the ILS for 2.88° tMoTe2, which exhibits clear domain walls connecting MM regions; these become more pronounced at lower twist angles, as shown in Supplementary Figs. 25.

Fig. 1: Lattice relaxation of interlayer and intralayer distances.

The relaxed structures of 3.89° and 2.88° tMoTe2 from density functional theory with the density-dependent screened Coulomb van der Waals correction. a, b Interlayer distances, defined as the distance between Mo atoms in the top and bottom layers. c, d Intralayer displacements, showing the in-plane displacements of the Mo atoms in the top layer after relaxation.

Concerning the intralayer strain, both structures exhibit similar behavior. As depicted in Fig. 1c, d, the in-plane displacement pattern displays a helical chirality, with an amplitude that grows as the twist angle decreases. We observe a large displacement of up to 0.5 Å for θ = 2.88°, which generates a pseudomagnetic field up to 200 T (see Supplementary Note 3).

Symmetry analysis of moiré band structures

The space group of the relaxed structures is P321 (No. 150), whose point group is generated by a twofold rotation about the y axis (\(C_{2y}\)) and a threefold rotation about the z axis (\(C_{3z}\)). In crystal momentum space, the \(C_{2y}\) symmetry protects twofold degeneracies only on the invariant lines or points of the Brillouin zone, defined by the relation \(C_{2y}\boldsymbol{k}\to \boldsymbol{k}\). Within this invariant space, the Hamiltonian commutes with the symmetry operation, allowing it to be block-diagonalized into two distinct sectors, characterized by the eigenvalue phases ±π. Due to the constraints imposed by the symmetry, a band represented by \(e^{i\pi}\) is inherently degenerate with another band represented by \(e^{-i\pi}\), forming a doubly degenerate band structure. The only lines invariant under \(C_{2y}\) within the two-dimensional Brillouin zone are the ΓK lines (satisfying \(2k_1 + k_2 = 0\)). When the \(C_{3z}\) rotational symmetry is taken into account, the lines satisfying \(k_1 + 2k_2 = 0\) and \(k_1 - k_2 = 0\) also emerge as symmetry-invariant lines. As a result, bands along the ΓK and MK lines are always doubly degenerate, while a clear splitting is observed along the ΓM line, as shown in Fig. 2.

Fig. 2: Fitting results from the continuum model.

Comparison of the band structures for twist angles a 3.15°, b 3.48°, c 3.89°, and d 4.41°. Blue points/lines show the results of density functional theory calculations, while the black lines show the fit from the continuum model.

In Fig. 3, we plot the angle-dependent bandwidth and direct gap obtained with the two types of vdW corrections. At twist angle θ = 3.89°, the D2 correction gives a narrow bandwidth of 12 meV, which is close to a previous calculation using the local-basis SIESTA package37 with the D2 correction21. Under the dDsC correction, by contrast, we obtain a bandwidth of 18 meV, and the overall trend of the angle-dependent bandwidth follows a parabolic continuum behavior with a single set of parameters, as we discuss later. At the smallest calculated twist angle, θ = 2.88°, the width of the top moiré valence band is reduced to 6 meV.

Fig. 3: Variation of bandwidth and direct band gap under different van der Waals corrections.

a Bandwidth of the first and second moiré bands as a function of twist angle. b Direct band gap between the first two bands along the high-symmetry lines in Fig. 2. The blue and red dashed curves denote the D2 (IVDW = 10) and density-dependent screened Coulomb (IVDW = 4) van der Waals corrections, while the red solid line represents the results from the continuum model.

Complete continuum model

We now introduce a more comprehensive continuum model to depict the moiré band structure. The key low-energy states originate from the hole bands in the K and \({K}^{{\prime} }\) valleys of the two MoTe2 layers. Considering that these valleys are connected through time-reversal symmetry (\({{{\mathcal{T}}}}\)), analyzing one valley is sufficient to infer the band structure. For tTMD systems with rotational (C3z) and layer-exchange symmetry (\({C}_{2y}{{{\mathcal{T}}}}\)), we derive the following form:

$$\hat{H}=\left[\begin{array}{cc}-\frac{(\hat{\boldsymbol{k}}-\boldsymbol{K}_{t}+e\boldsymbol{A})^{2}}{2m^{*}}+\Delta_{t}(\boldsymbol{r}) & \Delta_{T}(\boldsymbol{r})\\ \Delta_{T}^{\dagger}(\boldsymbol{r}) & -\frac{(\hat{\boldsymbol{k}}-\boldsymbol{K}_{b}-e\boldsymbol{A})^{2}}{2m^{*}}+\Delta_{b}(\boldsymbol{r})\end{array}\right]$$
(3)

with:

$$\begin{aligned}\Delta_{t}(\boldsymbol{r}) &= 2V_{1}\sum_{i=1,3,5}\cos(\boldsymbol{g}_{i}^{1}\cdot\boldsymbol{r}+l\phi_{1})+2V_{2}\sum_{i=1,3,5}\cos(\boldsymbol{g}_{i}^{2}\cdot\boldsymbol{r}),\\ \Delta_{T} &= w_{1}\sum_{i=1,2,3}e^{-i\boldsymbol{q}_{i}^{1}\cdot\boldsymbol{r}}+w_{2}\sum_{i=1,2,3}e^{-i\boldsymbol{q}_{i}^{2}\cdot\boldsymbol{r}},\\ \boldsymbol{A}(\boldsymbol{r}) &= A\left(\boldsymbol{a}_{2}\sin(\boldsymbol{G}_{1}\cdot\boldsymbol{r})-\boldsymbol{a}_{1}\sin(\boldsymbol{G}_{3}\cdot\boldsymbol{r})-\boldsymbol{a}_{3}\sin(\boldsymbol{G}_{5}\cdot\boldsymbol{r})\right),\end{aligned}$$
(4)

where \(\hat{\boldsymbol{k}}\) is the momentum measured from the Γ point of single-layer MoTe2, \(\boldsymbol{K}_{t}\) (\(\boldsymbol{K}_{b}\)) is the high-symmetry momentum K of the top (bottom) layer, \(\Delta_{t}(\boldsymbol{r})\) (\(\Delta_{b}(\boldsymbol{r})\)) is the layer-dependent moiré potential, \(\Delta_{T}(\boldsymbol{r})\) is the interlayer tunneling, \(\boldsymbol{G}_{i}\) are the moiré reciprocal vectors, and \(\boldsymbol{A}(\boldsymbol{r})\) is the strain-induced gauge field, which gives a periodic pseudomagnetic field27,38: \(\boldsymbol{B}(\boldsymbol{r})=B\sum_{i=1,3,5}\cos(\boldsymbol{G}_{i}\cdot\boldsymbol{r})\). \(\boldsymbol{g}_{i}^{1}\) and \(\boldsymbol{g}_{i}^{2}\) represent the momentum differences between the nearest and second-nearest plane-wave bases within the same layer. Similarly, \(\boldsymbol{q}_{i}^{1}\) and \(\boldsymbol{q}_{i}^{2}\) denote the momentum differences between the nearest and second-nearest plane-wave bases across different layers. \(\boldsymbol{a}_{1}\), \(\boldsymbol{a}_{2}\), and \(\boldsymbol{a}_{3}\) are the moiré lattice vectors. The wave vectors are related by \(\boldsymbol{G}_{1}=\frac{4\pi}{\sqrt{3}a_{M}}\left(\frac{1}{2},-\frac{\sqrt{3}}{2}\right)^{T}\), \(\boldsymbol{G}_{3}=\frac{4\pi}{\sqrt{3}a_{M}}\left(\frac{1}{2},\frac{\sqrt{3}}{2}\right)^{T}\), \(\boldsymbol{G}_{5}=\frac{4\pi}{\sqrt{3}a_{M}}(-1,0)^{T}\), \(\boldsymbol{q}_{1}^{1}=\frac{4\pi}{3a}\,2\sin\left(\frac{\theta}{2}\right)(0,1)^{T}\), \(\boldsymbol{q}_{2}^{1}=C_{3}\boldsymbol{q}_{1}^{1}\), \(\boldsymbol{q}_{3}^{1}=C_{3}^{2}\boldsymbol{q}_{1}^{1}\), \(\boldsymbol{q}_{1}^{2}=\boldsymbol{q}_{1}^{1}-\boldsymbol{G}_{5}\), \(\boldsymbol{q}_{2}^{2}=C_{3}\boldsymbol{q}_{1}^{2}\), and \(\boldsymbol{q}_{3}^{2}=C_{3}^{2}\boldsymbol{q}_{1}^{2}\).
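The wave-vector relations listed above can be checked numerically. The sketch below builds the moiré reciprocal vectors and the two shells of interlayer momentum transfers for a given twist angle. It assumes the rigid-twist moiré period \(a_M = a/(2\sin(\theta/2))\), a counterclockwise convention for \(C_3\), and the bulk lattice constant a = 3.52 Å; the function name and the returned dictionary layout are hypothetical.

```python
import numpy as np

def moire_vectors(theta_deg, a=3.52):
    """Moire reciprocal vectors G and interlayer momentum shells q (1/Angstrom)."""
    theta = np.radians(theta_deg)
    a_m = a / (2.0 * np.sin(theta / 2.0))       # assumed rigid-twist moire period
    g_norm = 4.0 * np.pi / (np.sqrt(3.0) * a_m)
    G1 = g_norm * np.array([0.5, -np.sqrt(3.0) / 2.0])
    G3 = g_norm * np.array([0.5, np.sqrt(3.0) / 2.0])
    G5 = g_norm * np.array([-1.0, 0.0])

    # Counterclockwise 120-degree rotation (convention assumed here)
    c, s = np.cos(2.0 * np.pi / 3.0), np.sin(2.0 * np.pi / 3.0)
    C3 = np.array([[c, -s], [s, c]])

    q11 = 4.0 * np.pi / (3.0 * a) * 2.0 * np.sin(theta / 2.0) * np.array([0.0, 1.0])
    q21 = q11 - G5                              # second-shell momentum transfer
    first_shell = [q11, C3 @ q11, C3 @ C3 @ q11]
    second_shell = [q21, C3 @ q21, C3 @ C3 @ q21]
    return {"a_M": a_m, "G1": G1, "G3": G3, "G5": G5,
            "q1": first_shell, "q2": second_shell}

vecs = moire_vectors(3.89)
print("moire period (Angstrom):", vecs["a_M"])
print("|q^2| / |q^1| =", np.linalg.norm(vecs["q2"][0]) / np.linalg.norm(vecs["q1"][0]))
```

With these conventions the second shell lies at twice the distance of the first, and the difference of two first-shell vectors is itself a moiré reciprocal vector, which is what makes the plane-wave basis close under interlayer tunneling.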

To obtain accurate parameters for the continuum model, we perform large-scale calculations with dDsC vdWs corrections (IVDW = 4), then fit the DFT moiré band structure at 3.15° to obtain the following continuum parameters: \(m^{*} = 0.62\,m_{e}\), \(V_{1}\) = 10.3 meV, \(V_{2}\) = 2.9 meV, \(w_{1}\) = −7.8 meV, \(w_{2}\) = 6.9 meV, \(\phi_{1}\) = −75°, and \(\Phi/\Phi_{0}\) = 0.737 (\(\Phi_{0}\) is the flux quantum and Φ is the flux through the moiré unit cell). The continuum model parameters with D2 vdW corrections (IVDW = 10) are presented in Supplementary Note 4. In our subsequent analysis of the continuum model, we use the parameters from IVDW = 4, as this correction provides the more reliable structural relaxation discussed above. With these parameters, we can solve for the moiré band structures at various small twist angles.

Next, we examine the topology of these moiré bands from 1.6° to 5°. At twist angles below 2.5°, the Chern numbers of the top three bands, as calculated using the continuum model, are 1, −1, 0, as shown in Fig. 4c (see Supplementary Note 2). For larger twist angles, these Chern numbers change to 1, 1, −2. We emphasize that the arrangement of Chern numbers for θ > 2.83° agrees with experimental data. So far, in all experiments where the twist angle θ ranges between 3.5° and 3.9° (refs. 1,2), both the Hall conductance and the reflective magnetic circular dichroism increase once the doping exceeds ν = −1. In addition, a double quantum spin Hall effect has been observed at ν = −4. These results suggest that the second band shares the same Chern number as the first band.
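The text does not spell out how the Chern numbers are evaluated numerically. One common choice is the gauge-invariant plaquette method of Fukui, Hatsugai, and Suzuki, sketched below as an illustration. The two-band Qi-Wu-Zhang lattice model is used purely as a stand-in for the continuum Hamiltonian, and all function names are hypothetical.

```python
import numpy as np

def qwz_hamiltonian(kx, ky, m):
    """Two-band Qi-Wu-Zhang toy model (stand-in, not the tMoTe2 model)."""
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
    sz = np.array([[1, 0], [0, -1]], dtype=complex)
    return np.sin(kx) * sx + np.sin(ky) * sy + (m + np.cos(kx) + np.cos(ky)) * sz

def chern_number(hamiltonian, band=0, n_k=24):
    """Chern number of one isolated band from plaquette Berry fluxes."""
    ks = 2.0 * np.pi * np.arange(n_k) / n_k
    dim = hamiltonian(0.0, 0.0).shape[0]
    u = np.empty((n_k, n_k, dim), dtype=complex)
    for ix, kx in enumerate(ks):
        for iy, ky in enumerate(ks):
            _, vecs = np.linalg.eigh(hamiltonian(kx, ky))
            u[ix, iy] = vecs[:, band]            # eigenvector of the chosen band

    def link(a, b):
        overlap = np.sum(np.conj(a) * b, axis=-1)
        return overlap / np.abs(overlap)         # unit-modulus link variable

    ux = link(u, np.roll(u, -1, axis=0))         # k -> k + dk_x
    uy = link(u, np.roll(u, -1, axis=1))         # k -> k + dk_y
    plaquette = (ux * np.roll(uy, -1, axis=0)
                 * np.conj(np.roll(ux, -1, axis=1)) * np.conj(uy))
    flux = np.angle(plaquette)                   # Berry flux per plaquette
    return int(np.rint(flux.sum() / (2.0 * np.pi)))

c_topological = chern_number(lambda kx, ky: qwz_hamiltonian(kx, ky, 1.0))
c_trivial = chern_number(lambda kx, ky: qwz_hamiltonian(kx, ky, 3.0))
print(c_topological, c_trivial)
```

The product of link variables around each plaquette is independent of the arbitrary phase returned by the eigensolver, so no gauge fixing is needed. The same routine applies to a continuum model once its Bloch Hamiltonian is sampled on a periodic grid of the moiré Brillouin zone.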

Fig. 4: Band topology of the continuum model.

a The bandwidth and the global gap from the continuum model. Gap closure happens between the second and third bands around θp ≈ 2.5°. The bandwidth of the first band is denoted by the purple curve, which reaches zero at the magic angle θm ≈ 2.2°. b The direct gap between the first and second bands, which always lies below the upper bound54. The red and purple curves denote the direct band gap (Edg) and the upper bound (Eub), respectively. The blue curve represents the ratio between Edg and Eub. c The Chern numbers of the top three bands. There is a topological phase transition around 2.5° owing to gap closure between the second and third bands. d T describes the violation of the trace condition, and σF reflects the fluctuation of the Berry curvature, both shown as a function of twist angle. The red region denotes the range of angles where the fractional quantum anomalous Hall effect is observed in transport experiments.

We additionally verify the trace condition for the uppermost moiré band. The band’s geometry is encapsulated in the quantum geometry tensor:

$${\eta }^{uv}:= {A}_{BZ}\langle {\partial }^{u}{u}_{{{{\boldsymbol{k}}}}}| (1-| {u}_{{{{\boldsymbol{k}}}}}\rangle \langle {u}_{{{{\boldsymbol{k}}}}}| )| {\partial }^{v}{u}_{{{{\boldsymbol{k}}}}}\rangle$$
(5)

where \(A_{BZ}\) is the area of the Brillouin zone. The antisymmetric and symmetric parts of the quantum geometry tensor give the Berry curvature (\(\Omega(\boldsymbol{k})=-2\,\mathrm{Im}(\eta^{xy})\)) and the quantum metric (\(g^{uv}(\boldsymbol{k})=\mathrm{Re}(\eta^{uv})\)), respectively. To quantify the geometric properties, one can calculate two figures of merit39,40,41:

$$\begin{aligned}\sigma_{F} &:= \left[\frac{1}{A_{BZ}}\int d^{2}\boldsymbol{k}\left(\frac{\Omega(\boldsymbol{k})}{2\pi}-1\right)^{2}\right]^{\frac{1}{2}},\\ T &:= \frac{1}{A_{BZ}}\int d^{2}\boldsymbol{k}\left[\mathrm{tr}\,g(\boldsymbol{k})-|\Omega(\boldsymbol{k})|\right],\end{aligned}$$
(6)

where σF describes the fluctuations of the Berry curvature and T quantifies the violation of the trace condition. When both σF and T tend towards 0, the Chern band can be mapped exactly to a Landau-level problem, allowing for an intuitive understanding of the fractional state. We calculate both quantities as a function of twist angle, as depicted in Fig. 4d.
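As a minimal sketch of how the two figures of merit in Eq. (6) could be evaluated, the function below takes the Berry curvature and the trace of the quantum metric sampled on a uniform Brillouin-zone grid, so that the normalized integrals reduce to grid averages. It assumes the \(A_{BZ}\)-normalized convention of Eq. (5) and the absolute value of Ω in T; the function name and test inputs are hypothetical.

```python
import numpy as np

def geometry_figures_of_merit(berry_curvature, metric_trace):
    """Return (sigma_F, T) from samples on a uniform Brillouin-zone grid.

    Both inputs use the A_BZ-normalized convention of Eq. (5), so a
    C = 1 band has mean Berry curvature 2*pi.
    """
    omega = np.asarray(berry_curvature, dtype=float)
    tr_g = np.asarray(metric_trace, dtype=float)
    # (1/A_BZ) * integral over the BZ becomes a plain average on a uniform grid
    sigma_f = np.sqrt(np.mean((omega / (2.0 * np.pi) - 1.0) ** 2))
    t_violation = np.mean(tr_g - np.abs(omega))
    return sigma_f, t_violation

# Landau-level-like limit: uniform curvature, saturated trace condition
n_k = 64
flat = np.full((n_k, n_k), 2.0 * np.pi)
print(geometry_figures_of_merit(flat, flat))

# Curvature with a 50% cosine modulation along k_x (illustrative only)
kx = 2.0 * np.pi * np.arange(n_k) / n_k
modulated = 2.0 * np.pi * (1.0 + 0.5 * np.cos(kx))[:, None] * np.ones((1, n_k))
print(geometry_figures_of_merit(modulated, modulated))
```

In the uniform case both quantities vanish, which is the Landau-level limit described above; the modulated case keeps the trace condition saturated while σF becomes finite.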

Transfer learning structure relaxation

To address the problem of structural relaxation, we adopt the ab initio DeePMD method, which combines first-principles accuracy with empirical-potential efficiency for large-scale systems36. We begin with 3 × 3 × 1 MM, MX, and XM configurations, along with 28 distinct intermediate transition states, all of which have been relaxed at fixed volume. For each of the 31 configurations, we introduce random perturbations to generate 200 distinct structures. The random perturbations are applied both to the atomic coordinates, with values drawn from a uniform distribution over [−0.01 Å, 0.01 Å], and to the lattice constants, through a deformation matrix constructed from an identity matrix distorted by values in [−0.03, 0.03]. In addition, we run 20 fs of ab initio molecular dynamics to gather VASP-calculated energies, forces, and virial tensors, which together constitute the initial training set.
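The perturbation step described above can be sketched as follows. The amplitudes are taken from the text, while the symmetrization of the strain matrix, the order of operations (deform the cell first, then displace atoms), and the toy cell are assumptions made only for illustration; the function name is hypothetical.

```python
import numpy as np

def perturb_structure(cell, frac_coords, rng, atom_amp=0.01, cell_amp=0.03):
    """Return a randomly deformed cell and perturbed Cartesian coordinates.

    cell        : (3, 3) lattice vectors as rows, in Angstrom
    frac_coords : (N, 3) fractional coordinates
    atom_amp    : max coordinate perturbation per component (Angstrom)
    cell_amp    : max entry of the distortion added to the identity matrix
    """
    cell = np.asarray(cell, dtype=float)
    frac = np.asarray(frac_coords, dtype=float)
    strain = rng.uniform(-cell_amp, cell_amp, size=(3, 3))
    strain = 0.5 * (strain + strain.T)       # assumption: drop rigid rotations
    new_cell = cell @ (np.eye(3) + strain)   # deformation matrix = I + strain
    cart = frac @ new_cell                   # atoms follow the deformed cell
    cart = cart + rng.uniform(-atom_amp, atom_amp, size=cart.shape)
    return new_cell, cart

# Generate 200 perturbed copies of one toy configuration (values illustrative)
rng = np.random.default_rng(0)
toy_cell = np.array([[10.56, 0.0, 0.0], [-5.28, 9.145, 0.0], [0.0, 0.0, 30.0]])
toy_frac = rng.uniform(0.0, 1.0, size=(6, 3))
structures = [perturb_structure(toy_cell, toy_frac, rng) for _ in range(200)]
print(len(structures))
```

Each perturbed structure would then be passed to a single-point DFT calculation to label it with energy, forces, and virial.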

Next, we train the initial neural network model on the initial training set and run molecular dynamics simulations at different pressures (−100 to 10000 bar) and temperatures (10 to 500 K). A set of trajectories is generated in this process, and we label each configuration as failure, candidate, or accurate according to the model deviation: \({\sigma }_{f}^{\max }=\max \sqrt{\langle {\vert {{{{\rm{F}}}}}_{i}-\langle {{{{\rm{F}}}}}_{i}\rangle \vert }^{2}\rangle }.\) During the process, 3 to 200 candidate configurations are selected for self-consistent DFT calculations, and the resulting data are collected for training in the next iteration.
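The model-deviation criterion can be made concrete with a short sketch. Here the average ⟨·⟩ is taken over an ensemble of independently trained models and the maximum over atoms, which is the usual reading of this formula. The threshold values used for labeling are hypothetical placeholders, since the text does not quote them.

```python
import numpy as np

def max_force_deviation(forces):
    """Maximum over atoms of the ensemble standard deviation of the force.

    forces : (n_models, n_atoms, 3) forces predicted by each model
    """
    forces = np.asarray(forces, dtype=float)
    mean_force = forces.mean(axis=0)                          # <F_i>
    sq_dev = np.sum((forces - mean_force) ** 2, axis=-1)      # |F_i - <F_i>|^2
    per_atom = np.sqrt(sq_dev.mean(axis=0))                   # average over models
    return per_atom.max()

def label_configuration(sigma, lower=0.05, upper=0.20):
    """Classify a configuration; thresholds (eV/Angstrom) are hypothetical."""
    if sigma < lower:
        return "accurate"
    if sigma < upper:
        return "candidate"
    return "failure"

# Two models disagreeing on a single atom by +/- 1 along x (illustrative)
demo_forces = np.array([[[1.0, 0.0, 0.0]], [[-1.0, 0.0, 0.0]]])
demo_sigma = max_force_deviation(demo_forces)
print(demo_sigma, label_configuration(demo_sigma))
```

Only configurations in the intermediate band are sent to DFT: small deviations mean the model is already reliable there, and very large deviations usually indicate unphysical geometries.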

Although the neural network model performs well for the IVDW = 10 correction, it does not yield successful results for the IVDW = 4 correction, which we attribute largely to the complex dependence on charge density. To address the issue, we augment our training datasets with data from twisted structures at 2.88°, encompassing 118 sets of forces, energies, and virial information, as illustrated in Fig. 5. Following the principles of transfer learning, we freeze the parameters of the embedding layers and train only the hidden and output layers. This approach significantly improves the performance of the pre-trained model, enabling it to adapt more effectively to the complexities of IVDW = 4.

Fig. 5: Scheme of transfer learning.

The initial dataset consists only of non-twisted structures, and we use density functional theory calculations to collect the energy and force data. A neural network is then trained with DeePMD. However, this neural network struggles to accurately predict the forces in twisted structures. To address this, we build another dataset with twisted structures and use the energy and force data from this new dataset to perform a transfer learning based on the original neural network. This new neural network demonstrates a significantly improved ability to predict forces accurately.

Conclusion

In this paper, we study the lattice relaxation and single-particle electronic structure of the tMoTe2 system. We present a comprehensive exploration of the moiré band structure under two types of vdW corrections, combining large-scale DFT calculations with transfer learning and GPU acceleration. Building on the angle-dependent moiré band structures, we construct a more complete continuum model that includes higher-harmonic potentials and a strain-induced gauge field. Our calculations reveal that, at experimentally relevant twist angles, the intralayer displacement induces a sizeable gauge field, and the top two moiré bands consistently display nontrivial Chern numbers.

The continuum model parameters have a strong impact on interaction-induced phases in tMoTe2, as shown by previous numerical studies21,22,23,24,25,42,43,44,45,46. With the continuum model fitted from the D2 type of vdW correction21, the integer quantum anomalous Hall effect appears only at a large dielectric constant ϵ > 15 (refs. 24,25), and \(\nu =-\frac{2}{3}\) and \(\nu =-\frac{1}{3}\) are both found to be FCIs21,43. With the continuum model fitted from the dDsC type of vdW correction22,23, the integer quantum anomalous Hall effect has been shown to occur at experimentally studied twist angles24,25, while \(\nu =-\frac{2}{3}\) and \(\nu =-\frac{1}{3}\) are FCI and charge-density-wave states, respectively22,25.

Note: Upon the completion of this work, a related work appeared47, which overlaps with some of our calculations with IVDW = 10.

Method

Plane-wave basis first-principles calculations

The large-scale plane-wave basis first-principles calculations are carried out with the Perdew–Burke–Ernzerhof (PBE) functional using the Vienna Ab initio Simulation Package (VASP)48,49,50. We choose the projector augmented wave potentials, incorporating six electrons for each of the Mo and Te atoms. During the structural relaxation, we set the plane-wave cutoff energy and the energy convergence criterion to 250 eV and \(1\times 10^{-6}\) eV, respectively. Larger energy cutoffs of 350 and 500 eV have been tested for θ = 3.89° and lead to less than 1 meV change in the bandwidth of the topmost valence band. The structure is considered fully relaxed when the maximum force on each atom is less than 10 meV/Å.

Local-basis first-principles calculations

Apart from the calculations using the plane-wave basis, our DFT study of tMoTe2 has also been performed in a pseudo-atomic orbital (PAO) basis. Using the relaxed atomic structures from VASP and DeePMD, we use the OpenMX package51,52 to conduct the self-consistent calculation and obtain the band structure, with PAOs chosen to be Mo7.0-s3p2d1 and Te7.0-s3p2d2. Here, 7.0 denotes a cutoff radius of 7.0 Bohr, and s3p2d1 denotes three sets of s-orbitals, two sets of p-orbitals, and one set of d-orbitals, giving 3 × 1 + 2 × 3 + 1 × 5 = 14 atomic orbitals for each Mo atom. The PBE exchange-correlation functional and norm-conserving pseudopotentials53 are employed, with single Γ-point k-sampling and a convergence criterion no lower than \(6\times 10^{-5}\) Hartree.

Machine learning workflow

We use the DeePMD-kit code to train the neural network36. We adopt the two-body embedding smooth edition of the DeepPot-SE descriptor, which is constructed from both angular and radial atomic configurations. The cutoff and smoothing radii for neighbor searching are set to 8.0 and 2.0 Å, with a maximum of 100 Mo and 100 Te neighboring atoms. We then construct a neural network that maps the descriptors to atomic energies through three embedding layers of size (25, 50, 100) and three hidden layers of size (240, 240, 240). To measure the quality of the neural network, we construct a loss function as a sum of different root-mean-square errors (RMSE):

$$L\left({p}_{\epsilon },{p}_{f},{p}_{\xi }\right)=\frac{{p}_{\epsilon }}{N}\Delta {E}^{2}+\frac{{p}_{f}}{3N}{{\sum}_{i}}{\left\vert \Delta {{{{\boldsymbol{F}}}}}_{i}\right\vert }^{2}+\frac{{p}_{\xi }}{9N}\parallel \Delta \Xi {\parallel }^{2},$$
(7)

where ΔE, ΔFi, and ΔΞ refer to the RMSE of energy, force, and virial, respectively. During the training process, the prefactor pf decreases from 1000 to 1, pϵ and pξ increase from 0.02 to 1. To improve the efficiency of network training, we adopt an exponentially decaying learning rate to minimize the loss function. After 1,700,000 training steps, the learning rate decreases from 1e−3 to a small value of 3.6e−8.
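The training schedule above can be summarized in a short sketch. It assumes a continuous exponential decay between the quoted start and end learning rates, and prefactors that interpolate between their start and end values in proportion to the current learning rate, which is the convention commonly used with DeePMD-kit. Both choices are assumptions rather than a transcription of the actual training input, and the function names are hypothetical.

```python
import numpy as np

LR_START, LR_END, N_STEPS = 1e-3, 3.6e-8, 1_700_000

def learning_rate(step):
    """Exponentially decaying learning rate between the quoted endpoints."""
    return LR_START * (LR_END / LR_START) ** (step / N_STEPS)

def prefactor(step, p_start, p_limit):
    """Prefactor tied to the learning rate (assumed schedule)."""
    ratio = learning_rate(step) / LR_START
    return p_limit * (1.0 - ratio) + p_start * ratio

def loss(delta_e, delta_f, delta_virial, step):
    """Loss of Eq. (7) for one configuration.

    delta_e      : energy error (scalar)
    delta_f      : (N, 3) force errors
    delta_virial : (3, 3) virial errors
    """
    delta_f = np.asarray(delta_f, dtype=float)
    delta_virial = np.asarray(delta_virial, dtype=float)
    n_atoms = delta_f.shape[0]
    p_e = prefactor(step, 0.02, 1.0)
    p_f = prefactor(step, 1000.0, 1.0)
    p_v = prefactor(step, 0.02, 1.0)
    return (p_e / n_atoms * delta_e ** 2
            + p_f / (3 * n_atoms) * np.sum(delta_f ** 2)
            + p_v / (9 * n_atoms) * np.sum(delta_virial ** 2))

for step in (0, N_STEPS // 2, N_STEPS):
    print(step, learning_rate(step), prefactor(step, 1000.0, 1.0))
```

Under this schedule the forces dominate the loss early in training, when the learning rate is large, and the energy and virial terms gain weight as the learning rate decays.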