Background & Summary

Ionic conductivity (σ) is one of the most important properties that is used to characterize materials used for electrochemical applications, such as a battery electrode or an electrolyte1,2,3,4. Typically, ionic conduction is a thermally activated process defined by the Nernst-Einstein equation as,

$$\sigma =\frac{{q}^{2}xD(x)}{{k}_{B}T}$$
(1)

where q and x are the charge and concentration of the intercalant (or the electroactive ion), respectively. D(x) is the diffusion coefficient of the intercalant that varies with x, T is the temperature and kB is the Boltzmann constant. D(x) relates the diffusive flux (J) and the concentration gradient ( x of the intercalating species via the Fick’s first law (J = −D(x) x)5. Further, D(x) can be written as,

$$D(x)={D}_{J}(x)\theta (x)$$
(2)

DJ is the jump diffusion coefficient, which captures all the cross correlations among the individual atomic migrations and θ is the thermodynamic factor that captures the non-ideality of the solid solution (i.e., the interactions between the migrating ions and the host framework). θ is defined as \(\theta =\frac{\partial (\mu /{k}_{B}T)}{\partial lnx}\), where μ is the chemical potential of the migrating ion. In solid electrodes and electrolytes, x is typically the site fraction of the migrating ion. For an ideal solid solution where each ionic hop has an identical hop frequency that is independent of the local concentration/configuration, D(x) becomes,

$$D=fg{a}^{2}\nu \exp \left(-\frac{{E}_{m}}{{k}_{B}T}\right)$$
(3)

g is the geometric factor that determines how the diffusion channels are connected, f is the correlation factor, a and ν, are the hop distance and vibrational prefactor, respectively, and Em is the activation energy of migration. Ion transport within a crystalline lattice occurs through ionic migration events, where an ion moves from its original or interstitial site in a lattice to a neighboring vacant site, via a transition state. The migration process is influenced by the energy landscape encountered by the ion during its movement, with the Em playing a crucial role in determining the ease of ionic mobility and, consequently, the material’s σ.

Extensive research has focused on enhancing ionic conductivity by minimizing Em, as this directly improves the rate capabilities of battery systems. Previous studies have shown the underlying host structure to play a vital role in influencing D, such as the presence of interconnected prismatic sites leading to improved Na+ mobility in P2-type layered structures6. In compositions like LiNiO2, Li off-stoichiometry leading to Ni2+ ions in the Li layers obstructing diffusion pathway can effect the Li-ion conductivity significantly7. Indeed, dopants that stabilize the layered structure, such as Ti4+ have been used to improve Na+ mobility8. In the case of phosphates, nuclear magnetic resonance (NMR)9,10 studies reveal that intercalant diffusivity is not governed by a single, uniform barrier but by a distribution of local energy barriers that are dictated by the arrangement of neighboring transition metal cations11. Additionally, subtle electrostatic distortions that screen electrostatic interactions between the intercalant and the anion framework have been shown to improve intercalant mobility in polyanionic structures12.

Galvanostatic intermittent titration technique (GITT)13,14 measurements in Mn and Fe rich disordered rocksalt structures have revealed the importance of Li-exccess compositions, particle size, and the underlying redox process as some of the important factors that affect the intercalant diffusivity15,16,17. Bonnick et al. illustrated the influence of poor electronic conductivity resulting in strong electrostatic interactions within thiospinel lattices (e.g., MgZr2S4) resulting in a reduction of ionic diffusivity18. In summary, various structural and chemical modifications have been explored across different types of intercalation compounds, including layered, spinel, olivine, polyanionic, and other frameworks, to enhance ionic conductivity1,19,20,21, with some approaches using targeted machine learning (ML) techniques as well22,23,24. However, developing universal optimization strategies across a wide range of intercalation systems remains challenging due to the interplay between structure, composition, and other factors besides the lack of a robust Em dataset that spans a diverse range of materials.

In general, estimating diffusivities or Em using experimental techniques like variable temperature NMR, GITT, and electrochemical impedance spectroscopy (EIS)25,26, are either experimentally challenging or resource intensive. This is due to the extremely short time scales (10−12 s) or small length scales ( ~ few Å) of the elementary process of ionic hopping, influence of surface and structural chemistry of electrodes/electrolytes on the measurement, variations in sample preparation and measurement conditions resulting in differences in interfacial formation, bulk stoichiometry and defects, and specific equipment requirements (e.g., the need for inert ion-blocking electrodes in EIS). Thus, experimental NMR/GITT/EIS data documenting Em is unavailable for a wide range of materials.

Computational methodologies to estimate Em have gained prominence, since calculated Em can be used as a screening metric within high-throughput workflows before experimental validation. Computational techniques include empirical approaches such as bond valence sum (BVS27,28) analysis and nudged elastic band (NEB29,30) calculations (usually based on first principles simulations) or molecular dynamics (MD31,32,33). BVS analysis, though computationally swift, has accuracy limitations as it relies on an ionic bond model, making it more suitable for close-packed lattices with highly electronegative anions34,35. NEB calculations strive to estimate the migration barrier within a potential energy surface (PES) constructed by either density functional theory (DFT36,37) or interatomic potentials by modeling the ionic migration path using intermediate images that are connected by fictitious spring forces and subsequently relaxing the images to identify the saddle point that corresponds to the transition state. NEB calculations when performed in conjunction with DFT typically provide accurate Em. However, the DFT-NEB approach is computationally intensive for large systems (>100 atoms), and its accuracy/convergence can depend on the chosen exchange-correlation (XC) functional within DFT38. Classical MD (based on interatomic potentials) and ab-initio MD techniques can directly estimate D(x) at multiple T, thus yielding Em from Equation (3), but are computationally demanding due to the time and length scales that need to be captured39. Note that ab-initio MD calculations are generally more accurate in estimating Em or D(x) compared to classical MD due to the more accurate PES constructed by first principles.

Some strategies have been explored to reduce the computational costs and constraints, while retaining the accuracy, of both the DFT-NEB and ab-initio MD approaches. For example, the ‘pathfinder’ approach in conjunction with the ‘ApproxNEB’ scheme40 aims to reduce the computational constraints of DFT-NEB by mitigating convergence issues via selection of a ‘better’ initial migration path for calculation. However, the scheme still requires performing a full DFT-NEB calculation and is prone to the underlying constraints of the DFT-NEB approach. Another pathway is integrating ab-initio MD simulations with machine learned interatomic potentials (MLIPs), where the MLIPs can theoretically provide higher computational speeds with the accuracy comparable to DFT41. While several MLIP frameworks that are accurate remain chemistry-specific (i.e., there are high computational costs associated with training the MLIPs)42,43,44, the foundational or universal MLIPs45,46,47,48,49,50 have not been tested rigorously on D(x) or Em predictions, yet. More importantly, we need datasets of Em that are available over a wide-range of chemistries and structures to be able to test universal MLIPs in their utility in predicting Em and/or build ML models that are tailored to accurately predict Em that can be used for screening through materials.

In this work, we present a literature-based curated dataset of 621 DFT-NEB-derived Em values across various compounds that have been studied as electrodes or solid electrolytes in lithium, sodium, potassium, and multivalent ion based battery systems. Our dataset includes fully charged and discharged states of electrode materials, with intermediate (non-stoichiometric) compositions considered in 30 cases. Additionally, we provide structural information for each compound, including the initial and final positions of the migrating ion, along with its corresponding energy barrier, which can be used in the construction of ML models that require structural inputs, such as graph-based models leveraging transfer learning51. Our dataset includes a total of 275 distinct entries contributed by 99 systems exhibiting multiple migration pathways. We envision our dataset to be a powerful resource for the scientific research and industrial communities, enabling the development of robust ML models and MLIPs that can eventually accelerate materials discovery for batteries and other applications.

Methods

We conducted a thorough manual review of battery research articles published over the past two decades to compile computationally estimated Em values. To ensure consistency and reliability, we focused exclusively on DFT-NEB calculated Em values, as this method strikes a balance between accuracy and availability in the literature compared to other computational approaches. We note that the generalized gradient approximation (GGA52) is the most widely used XC functional among DFT-NEB calculations. Other functionals that are commonly used include Hubbard U53 corrected GGA (i.e., GGA+U54), local density approximation (LDA55), strongly constrained and appropriately normed (SCAN56) and its Hubbard U-corrected variant (SCAN+U57). Since GGA (and GGA+U) is computationally less intensive than SCAN/SCAN+U and gives reasonably accurate Em estimations38, GGA (or GGA+U) is the preferred choice for DFT-NEB calculations. Our dataset reflects this preference, with 88.05% of the collected Em values calculated using GGA, followed by GGA+U (7.27%), SCAN (3.07%) and LDA (1.61%). In cases where multiple XC functionals were used to calculate Em within the same or different research articles, we prioritised GGA-calculated values to maintain consistency with the majority of the dataset. Note that for structures where the GGA/GGA+U-calculated Em was not available, we included the Em value as calculated with the different XC functional. Thus, the Em barriers of non-GGA functionals have been used as is, and we have not done any re-calculation of such systems to ensure that all Em have been calculated at the same level of theory. Given that Em is primarily dependent on the local atomic environment, and temperature effects usually contribute to the pre-exponential factors in Equation (3), we do not expect entropic effects (such as configurational entropy) to play a major role in modifying Em and hence do not include any temperature effects within our dataset. Nevertheless, if the presence of additional possible configurations in a structure unlocks new migration pathways, the Em for such pathways can be calculated and the dataset be subsequently expanded.

We considered a research article for inclusion in our dataset only if it satisfied specific criteria ensuring the reliability and completeness of the reported DFT-NEB Em values, i.e., we included studies that provided a comprehensive methodology for Em calculations. Details on the methods we looked for in articles included the XC functional used, the number of intermediate images used to describe the migration pathway, and the supercell size employed in the NEB calculations. Articles that lacked the aforementioned details were excluded to maintain data consistency and accuracy. Structural information was another key criterion for selecting articles, where we only considered studies that provided explicit description of the parent structure(s), including the space group(s) and lattice parameters. In case of multiple articles reporting different Em values for a given structure, we included the paper (and the corresponding Em) with the most details on the structure and the methods. If the structural parameters were missing and could not be retrieved from related works or structural repositories, we excluded the corresponding study from the dataset. For materials exhibiting multiple migration pathways, we ensured that the reported Em were appropriately distinguished for each path via clear descriptions of the corresponding pathways. In cases where Em values were not explicitly stated in text but were presented through minimum energy pathway (MEP) plots or other visualizations, only articles with clear, well-labeled plots featuring correct units and axis scales were selected to ensure accurate digital data extraction.

Workflow

Figure 1 presents an overview of the workflow for generating the structural information for our dataset. We analyzed the collected structural information for each datapoint, and ensured that the target structures were download from either the inorganic crystal structure database (ICSD58) or the materials project (MP59). If a target structure was available in both ICSD and MP, we downloaded the computationally relaxed structure from the MP. While DFT relaxation, such as the calculations performed within MP, can change the underlying lattice parameters and ionic positions compared to an experimentally refined crystal structure, we prioritized DFT-relaxed structures so that our dataset is internally consistent with DFT-calculated Em. For all electrode materials, where possible, we downloaded the structure of the discharged composition (i.e., structures with high concentrations of intercalant ions, relevant for electrode materials) preferably over the charged composition.

Fig. 1
figure 1

Flowchart illustrating the structural data generation process for each data point in the database. Rectangle and diamond symbols represent processes and decisions, respectively. GS refers to ground state. Relax refers to the structural relaxation calculation done with DFT.

If the target structure was not available in either ICSD or MP, we searched for a parent structure that shared the same space group, migrating ion concentration, and site occupancies, but containing ions that are different from the target structure in both ICSD and MP. The parent structure was then used as a template, where we substituted the occupant ions with the target ions and used the reference lattice scaling (RLS60) scheme, as implemented by the RLSVolumePredictor class in the pymatgen package61, to obtain a target structure with scaled lattice parameters and the right composition. If target structures were available but contained sites with partial occupancies, we enumerated all possible symmetrically unique configurations that satisfied the target stoichiometry of the reported structure by using the OrderDisorderStructureTransformation class in pymatgen. For all the RLS-generated and/or enumerated structures that we obtained, we performed structure relaxations with DFT to obtain the ground state (i.e., lowest energy) configuration.

For all structure relaxations with DFT, we used the GGA XC functional as available in the Vienna ab initio simulation package (VASP, version 6.1.2)62,63. The effects of core electrons were described via projector augmented wave potentials64. We relaxed the lattice vectors, cell volume, and ionic positions of all input structures, without preserving any symmetry, until the total energy and atomic forces were below 0.01 meV and |0.03| eV/Å, respectively. We used a Γ-centred k-point mesh with a density of at least 48 k-points per Å−1 for sampling the irreducible Brillouin zone, a kinetic energy cutoff of 520 eV for describing the plane-wave basis set, and a Gaussian smearing of width 0.05 eV to intergrate the Fermi surface. For select structures that exhibited convergence difficulties with GGA, we performed the structural relaxations with GGA+U, utilizing the optimized values reported in Ref. 54 All our structures are compatible with the structure relaxation calculation settings of MP (version 2020).

Upon obtaining the ground states for all systems in our dataset, we reviewed the migration path information reported in the corresponding research articles and initialized the initial and final configurations of the migration path within each structure. Note that the initial configuration represents the migrating ion occupying the starting site while leaving the destination site vacant, while the final configuration depicts the inverse arrangement. Thus, we have assumed that all migration events for all structures considered in our dataset occur via a vacancy-based mechanism and not an interstitial-based mechanism. All the solid electrolyte materials in our dataset are stoichiometric and ordered compositions, hence we treated them as equivalent to discharged compositions of electrode materials in our workflow.

For electrode structures where the Em was reported for a charged composition (i.e., the dilute ion limit), we removed the intercalant ions from the ground state discharged structure and subsequently relaxed the structure using DFT. The initial and final configurations were then defined in this relaxed charged structure and were DFT-relaxed again to obtain their true ground state descriptions. In the case of intermediate intercalant compositions, all symmetrically distinct positional configurations corresponding to the composition were enumerated, followed by DFT relaxation to identify the ground state, and the corresponding pathway initializations were carried out in the ground state. For generating all initial and final configurations, we selected appropriate supercell sizes to ensure that the migrating ion does not experience spurious interactions by maintaining a minimum distance of at least 8 Å with its periodic images. In structures with large unit cells, such as NaSICONs (sodium superionic conductors), weberites, and oxyfluorides, we did not generate supercells to reduce computational costs.

Data Records

The dataset is available at our GitHub65 and Zenodo66 repositories, with this section describing the availability and content of the data. We report computationally calculated Em of 621 systems that have been explored as battery materials along with their structural information for each possible ionic migration event. The data can be easily downloaded in the form of a JSON file from our GitHub65 or Zenodo66 repositories.

File format

Each datapoint in the dataset is associated with specific tags that provide essential information, as summarized in Table 1. For each datapoint, we include its composition, crystal class, space group, unique system identifier, and a JSON ID that differentiates each datapoint within the database and allows easier access to different migration paths within a given structure. We assign the system name for each datapoint using a standardized format: reduced_chemical_formula + ‘_’ + path_number. For instance, MgCoSiO4 has two possible migration paths for Mg2+ diffusion, so we represent each path as MgCoSiO4_1 and MgCoSiO4_2. However, if a composition has only one active migration path, it is identified using only the reduced chemical formula. Parentheses and subscripts are omitted in the system name generation.

Table 1 Description of each tag associated with the datapoints in the Em dataset.

When multiple polymorphs of the same composition exist, the system name is modified to include the space group: reduced_chemical_formula + ‘_’ + space_group + ‘_’ + path_number. Additionally, for layered structures where both monovalent and divalent hops are considered67, we treated both hops as distinct migration paths. Charged state structures are labeled using the format: charged_state_reduced_chemical_formula + _ + intercalant, allowing clear differentiation from the discharged state configurations. In order to be compatible with the notations used in the original papers, certain prefixes, such as O3:O3-layered structure, O:olivines, M:maricite, d:δ, e:ϵ, g:γ, b:β, a:α, and a1:α1, were added to the corresponding system names to ease the identification. We did not modify our notation for intermediate compositions as compared to the discharged compositions.

The dataset provides structural information for both the initial and final configurations of each migration path in the ‘POSCAR’ format, which is compatible with pymatgen and the atomic simulation environment (ASE68) packages, enabling easy conversion into other structural representations. Each datapoint also includes a “bibtex” tag, which contains the citation details of the article from which the Em values and the migration path information were sourced. Additionally, the XC functional used to calculate the Em in the respective study is provided under the “XC” tag for reference. We acknowledge that other calculation settings, such as the potential used for describing core electrons, the mesh density and scheme used for sampling k-points in the irreducible Brillouin zone and integrating the Fermi surface, and convergence criteria employed on energies and forces, can influence the calculated Em as well, albeit to a lesser extent compared to the XC functional. To capture such DFT calculation settings in the dataset, we provide the metadata for both structural relaxations and NEB calculations, as detailed in the respective articles, in the form of a dictionary under the tag “calc_metadata”. If any specific information is not available, we populate the corresponding entry with a ‘NaN’.

Data Overview

The distribution of Em across the seven crystal systems, and the corresponding space groups within each crystal system, in our dataset is illustrated as a contour plot in Fig. 2. Each crystal system is visually distinguished using color-coded sectors in Fig. 2, with the solid concentric rings representing different Em values (in eV). Overall, our dataset includes Em values spanning 58 space groups and ranging from 0.03 eV to 8.77 eV. Out of the 621 datapoints, 528 are electrodes and 91 are electrolytes. The lowest Em (0.03 eV) corresponds to charged-LiTiO2 (P42/mnm), while the highest (8.77 eV) is observed in discharged-LiRuO2 (Pnnm), respectively. Notably, both LiTiO2 and LiRuO2 adopt the rutile-type structure and consequently exhibit the widest range of Em values in the dataset. In contrast, space groups \(Ia\overline{3}d\) and I41/acd, which contain two and three datapoints, respectively, show the narrowest range of Em values.

Fig. 2
figure 2

Contour plot illustrating the distribution of the Em dataset over different space groups from each of the seven crystal systems. Individual colored sectors represent individual crystal systems, with space groups indicated by text notations. White circles indicate invidual data points. Concentric circles correspond to different Em values (in eV), as highlighted by the blue text notations.

The space group \(Fd\overline{3}m\), corresponding to cubic spinels, has the highest number of datapoints, contributing 94 entries to the dataset. It is followed by the Pmna space group (72 datapoints), belonging to the orthorhombic system, and the P1 space group (39 datapoints) from the triclinic system. Additionally, 15 space groups contribute only a single datapoint each. Among the different crystal systems, orthorhombic accounts for the highest (205) number of datapoints, whereas hexagonal (6) contributes the least. The dataset exhibits a mean Em value of 0.848 eV with a standard deviation of 0.824 eV, indicating a highly skewed distribution. A majority (73.4%) of the Em values are below 1 eV, while 19.4% fall within the 1-2 eV range, and 7.2% exceed 2 eV. 71.4% and 23.6% of the entries are contributed by discharged (high intercalant content) and charged (low intercalant content) state structures respectively, corresponding to 106 distinct charged and discharged pairs. Intermediate intercalant compositions correspond to 5% of the dateset.

Figure 3 presents a bar chart that illustrates the number of datapoints and the range of Em values across 27 different structural groups (e.g., spinels, olivines, NaSICONs, etc.). Each bar is stacked to represent contributions from nine different intercalating ions, with the stack length indicating the number of datapoints associated with each ion. The inset pie chart provides a breakdown of the percentage contribution of each intercalating ion to the overall dataset. Additionally, the solid square and circle markers, connected by a vertical line, denote the maximum and minimum Em values observed within each structural group, thus representing the range of Em.

Fig. 3
figure 3

Illustration of the Em distribution within each of the different structural groups. The y axis on the left and right represent the number of datapoints and Em values (in eV), respectively. Stacked bar charts correspond to the counts within each structural group, with the colors indicating the split across various intercalants. The black vertical lines represent the range of Em values for a given structural group with the squares and circles representing the maxima and minima, respectively. The inset shows a pie-chart with the contributions from each intercalant (i.e., Al, Ca, K, Li, Mg, Na, Rb, Sr, and Zn, as represented by the different colors) to the total dataset.

Lithium-based intercalant compounds constitute 28.27% of the dataset, making Li the most prevalent intercalating ion, which is expected given the extensive research done on Li-based electrodes and solid electrolytes. Li is followed by calcium (Ca), sodium (Na), and magnesium (Mg) in terms of contribution. The least represented intercalants are strontium (Sr), aluminum (Al), and rubidium (Rb), with only 2, 7, and 8 datapoints, respectively. 17 out of the 27 structural groups include compounds intercalating Li, whereas Ca-based compounds are primarily found in layered structures, NaSICONs, oxides, and weberites. Na-based compounds are more widely distributed than Li, appearing in 19 of the 27 structural groups, while Mg-based compounds are predominantly found in spinel chalcogenides, layered structures, and orthosilicates. Layered structures contribute the highest number of datapoints, with 98 entries, followed by spinel chalcogenides, phosphates, and oxides. Other structural groups, such as alluaudites, Prussian blue analogs, and carbonates, are also represented but in significantly smaller numbers. The highest and lowest Em ranges among the structural groups are observed in rutiles and thiosulphates, respectively.

Technical Validation

A benchmarking of DFT-NEB Em (using different XC functionals) against experimentally reported values has been performed in our previous work38. To estimate Em, we fully relaxed the endpoint geometries representing the initial and final states of migration using DFT. Subsequently, the MEP was initialized by linearly interpolating both atomic positions and lattice vectors to create seven intermediate images between the endpoints, with a spring force constant of 5 eV/Å2 maintained between adjacent images. The images constituting the NEB were optimized along the reaction coordinate using the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS)69 method until the force component perpendicular to the elastic band fell below |0.05| eV/Å. The total energy of each image was also converged to within 0.01 meV. All Em values were determined assuming a vacancy-mediated mechanism in the dilute-vacancy limit.

On comparing the calculated Em against experimental values, we observed the SCAN functional to exhibit a higher accuracy on an average relative to other XC functionals. Albeit the higher accuracy of SCAN is counterbalanced by increased computational expense and potential convergence problems. Also, we found GGA to be a suitable alternative for quick and qualitative Em predictions.

Approximately 15.8% of the dataset has been calculated from scratch using the GGA or SCAN XC functional (depending on the computational feasibility for the respective system), using the above described methodology and have also been reported in other works38,70,71,72,73,74. The remaining 84.2% of the dataset has been collected from various literature sources that report DFT-NEB methodologies similar to the above description. In certain cases, articles that used fewer number of images (3 or 5) were also considered if a fully convereged MEP was reported.

Usage Notes

We present a literature-curated DFT-NEB-calculated dataset comprising 621 distinct Em values over 443 chemistries and 27 distinct structural groups, spanning a diverse set of electrode and solid electrolyte materials studied for battery applications. This dataset, which includes structural information and calculated Em values for each system, is provided in both .xlsx and JSON formats for easier and direct data extraction within our GitHub65 repository. The JSON version of the dataset is also available on Zenodo at the DoI66. The following code snippet demonstrates the conversion of structural data from the dataset into pymatgen and ASE objects for subsequent analyses. Our dataset can be effectively utilized to construct ML models for Em estimation, using either structural, compositional, or combined inputs. The snippet illustrates that the dataset can be imported directly into a pandas DataFrame, enabling its use in further machine learning model development, using libraries such as ‘scikit-learn’ and ‘pytorch’.

Conversion of JARVIS structure to pymatgen and ASE objects.

We are hopeful that the dataset will be expanded in the near future with calculated Em contributions from the scientific community. We have provided instructions on contributing to the dataset in our Zenodo repository, which will be used for maintaining version-control as well. With further expansion, the dataset can be employed to develop accurate machine learning models capable of replacing on-the-fly DFT-NEB estimations of Em, which would significantly accelerate kinetic Monte Carlo (kMC) simulations75. In turn, faster kMC simulations can bridge the gap between the high accuracy of DFT and the long timescales accessible by kMC, ultimately providing a powerful tool for quantifying ion transport dynamics. Additionally, the dataset will be suitable for fine-tuning pre-trained foundational models and for benchmarking the performance of various universal MLIPs on Em prediction tasks.