Introduction

The discovery of novel materials drives industrial innovation1,2,3, although the pace of discovery tends to be slow due to the infrequency of “Eureka!” moments4,5. These moments are typically tangential to the original target of the experimental work: “accidental discoveries”. Here we demonstrate the acceleration of intentional materials discovery—targeting material properties of interest while generalizing the search to a large materials space with machine learning (ML) methods combined with experiment in a feedback loop. We demonstrate a closed-loop joint ML-experimental discovery process targeting unreported superconducting materials, which have industrial applications ranging from quantum computing to sensors to power delivery6,7,8,9. By closing the loop, i.e., by experimentally testing the ML-generated superconductivity predictions and feeding the results back to refine the ML model, we demonstrate that success rates for superconductor discovery can be more than doubled10. In four closed-loop cycles, we discovered an unreported superconductor in the Zr-In-Ni system, re-discovered five superconductors unknown to the training datasets, and identified two additional phase diagrams of interest for superconducting materials. Our work demonstrates the critical role experimental feedback plays in ML-driven discovery, and provides definitive evidence that such technologies can accelerate discovery even in the absence of knowledge of the underlying physics.

Statistical approaches have long aimed to better understand and predict superconductivity11, most recently through the use of black-box ML methods12,13,14,15,16,17,18. Although resulting in numerous predictions, these studies have not yielded previously unreported families of superconductors, likely not only because of difficulties in extrapolating beyond known families, but also because the predicted materials have chemical attributes that make them unlikely to be superconducting—whether it is highly localized chemical bonding, e.g., those containing polyatomic anions, or an extreme metastability that precludes synthesizability. Further, existing works have treated materials and databases of material properties as fixed snapshots rather than evolving systems, which limits the ability of ML models to learn over sparse data.

Here we report on combining ML techniques with materials science and physics expertise to “close the loop” of materials discovery (Fig. 1). We demonstrate how to make ML models generalize across diverse materials spaces in order to identify superconductors that are dissimilar to those in the training corpus. By alternating between ML property prediction and experimental verification, we are able to systematically improve the fidelity of ML property prediction in regimes sparsely represented by existing materials databases. Crucially, this adds both negative data (materials incorrectly predicted to be superconductors) and positive data (materials correctly predicted) to the ML training set, enabling the ML model’s overall representation of the space of materials to be iteratively refined. The result is an ML model for predicting superconductivity that more than doubles the rate of successful predictions10, demonstrating the acceleration of materials discovery by combining human and machine insight.

Fig. 1: The closed loop discovery process.
figure 1

Starting from curated experimental data of known superconductors (1), compositional information is first transformed into a representation suitable for learning using the RooSt27 framework (2). After initial training of the ML model (3), we provide new compositions not known to the ML model from other sources, and obtain predictions of superconducting behavior (4). The synthesizability of these predictions is assessed using a combination of computational thermodynamic data and expert insight (5). Materials downselection (6) occurs with human input, based on multiple criteria, to maximize the impact experimental work has on model improvement. Chosen materials are then synthesized, and their structural and physical properties measured (7). Results are then fed back into the learning process, in addition to generating discoveries. Further details on the closed-loop process are in “Closed-loop discovery process” in Methods. RooSt images used with permission, CC-BY-4.0 license27.

Our process uses active learning19 to iteratively select data points to be added to a training set. In particular, we select materials that are both predicted to be high transition temperature (Tc) superconductors and are sufficiently distinct from known superconductors. We also leverage human domain expertise to further refine selections. When the predictive model incorrectly predicts non-superconductors as superconductors, this valuable negative data helps refine the model’s prediction surface.

A key attribute of our work is that the training data used in the ML models is not static, but evolves as the closed-loop process proceeds. An ML model employed in a closed-loop framework, actively sampling previously unexplored regions of materials space and continually acquiring new data, has no fixed notion of convergence: the target distribution changes with every loop. Thus, instead of a traditional convergence metric (e.g., looking for a flattening of loss versus number of training epochs for a convolutional neural network), we leverage goal-based metrics—stopping when the model successfully predicts superconductors not in the training set, or when the human in the loop assesses that model outputs are chemically plausible yet sufficiently distinct from prior predictions. This helps avoid model overfitting by terminating the process earlier than a traditional metric would, while maximizing the usefulness of the new experimental data for further refining the model.

Utilizing this iterative “closed-loop” approach, we rediscovered five known superconductors outside of the ML model’s training set (Table 1). These materials come from a wide variety of families: iron pnictides, doped 2D ternary transition-metal nitride halides, and intermetallics (Table 2). We then report the discovery of a previously unreported superconductor in the Zr-In-Ni phase diagram, and identify two other phase diagrams of interest (Zr-In-Cu and Zr-Fe-Sn).

Table 1 Superconductors rediscovered by machine learning.
Table 2 Distribution of Tc values in SuperCon.

Results and discussion

Model generation

For the initial prediction step of the closed-loop approach, we trained an ML model to predict the superconducting transition temperature, Tc, of candidate materials. Our primary source of training data, SuperCon20, contains compositions of known superconductors. Only the materials’ compositions were used to train the ML model for predicting Tc since SuperCon did not contain additional structural information. Materials Project (MP)21 and Open Quantum Materials Database (OQMD)22, some of the largest public sets of computational materials data, supplied candidate compositions to be screened for superconductivity. These two databases do not contain any Tc data. These three datasets are visualized in Fig. 2 using a joint representation. Crucially, the amount of data for which we have superconducting information is much smaller than our other sources of data and is not uniformly sampled across the joint space.

Fig. 2: Training data sparseness and finding non-derivative superconductors.
figure 2

Histograms of the concentration of materials from a Uniform Manifold Approximation and Projection (UMAP)60 embedding of OQMD (without superconductivity information), MP (without superconductivity information), and SuperCon (superconductivity information), based on Magpie30 descriptors for the datasets. The embedding is learned from the concatenation of Magpie descriptors obtained from all three datasets; the same axis limits are used across each subplot. These maps show the sparseness of data about superconductivity compared to that of all known and predicted compounds in these open databases. Tc is not part of the Magpie descriptors and, therefore, did not influence the representation. The five black symbols indicate rediscovered superconductors (Table 1), and the red symbol our superconductor, near “ZrNiIn4”. The inset on the right highlights the local region in which “ZrNiIn4” is found, which is sparse and far from the known and rediscovered superconductors.

It is well-known23 that when ML methods make predictions on data outside of their training data distribution, accuracy often suffers; this is often called the out-of-distribution generalization problem. In cheminformatics24, it is common to assess whether a dataset is within the distribution of a training dataset by seeing how far, in some representative metric space, its points are from the training dataset: as the difference between the distribution of new data and the training data increases, the likelihood that a model will accurately predict their properties decreases. To improve assessment of generalization, out-of-distribution data may be simulated by creating validation sets that split based on non-random criteria like Murcko scaffold25 or cluster identity, the latter being the leave-one-cluster-out cross-validation (LOCO-CV) strategy26.

In “Model Validation” in Methods, we apply LOCO-CV in a simulated superconductor-identification problem. We show that, although a strong ML model is capable of fitting the training set well and generalizing to held-out in-distribution test data, it fails to make accurate predictions of superconducting status on out-of-distribution data. Because existing superconductor datasets are not sufficient to enable accurate identification of unreported superconductors, this motivates the need for multiple iterations of model training, candidate selection, candidate synthesis, and model retraining.

We rely on a recent ML model for chemical property prediction, Representation learning from Stoichiometry (RooSt)27 (see “Computational Methods and Uncertainty” in Methods and the SI), to predict a material’s superconductivity using only its stoichiometry (i.e., ignoring the material’s crystal structure). Although not as powerful as approaches incorporating structural information16,28,29, this choice enables much broader screening, because candidate compositions can be evaluated without knowledge of their crystal structures.

Superconductivity-specific considerations

After training an ensemble of RooSt models using the SuperCon database, we apply them to our set of potential superconductors (i.e., MP and OQMD). We filter for materials likely to be high-Tc superconductors, and then selected materials are synthesized and characterized, enabling the ML model to be retrained in further loop iterations.

A risk of searching for superconductors from a static list of candidates is that, while a material in MP or OQMD may not have exactly the same composition as a known superconductor, it may be extremely close in stoichiometry, such as MgB2 vs. Mg33B67. Thus, every time we produce a new list of candidates, we identify each candidate’s minimal Euclidean distance, in Magpie-space30, to any point in our training data, and we remove candidates too close to SuperCon.
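The near-duplicate screen described above can be sketched in a few lines. Here plain NumPy arrays stand in for Magpie descriptor vectors, and the distance cutoff `min_dist` is an illustrative value, not one reported in the text:

```python
import numpy as np

def filter_near_duplicates(candidates, training, min_dist=1.0):
    """Drop candidate feature vectors whose minimal Euclidean distance
    to any training-set vector falls below min_dist.

    candidates: (n, d) array of candidate descriptors
    training:   (m, d) array of training-set (SuperCon) descriptors
    """
    # Pairwise distances via broadcasting: (n, m), then per-candidate minimum
    diff = candidates[:, None, :] - training[None, :, :]
    d_min = np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)
    return candidates[d_min >= min_dist], d_min

# Toy example: 2-D stand-ins for Magpie descriptor vectors
train = np.array([[0.0, 0.0], [1.0, 1.0]])
cands = np.array([[0.1, 0.0],   # near-duplicate of a training point
                  [5.0, 5.0]])  # genuinely distant candidate
kept, dists = filter_near_duplicates(cands, train)
```

For large candidate pools a k-d tree (e.g., `scipy.spatial.cKDTree`) would avoid materializing the full pairwise distance array.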

It is not practical to experimentally verify all ML predictions. The costs associated with fabricating and characterizing a new material are high; hence we are only able to experimentally analyze a small subset of the ML predictions.

The MP and OQMD databases both contain calculated stability information not used by the ML model. Of 190 predicted superconductors in a given prediction round, only 39 compounds were calculated to be stable (energy above the convex hull, Ehull = 0.00 eV/atom), but 83 were nearly stable (Ehull < 0.05 eV/atom). Stable materials and those with prior experimental reports were prioritized to increase the likelihood that targeted compounds could be successfully synthesized. Prioritizing these materials ensured that failures to observe superconductivity were indicative of the behavior of the targeted compound rather than a failure to synthesize that compound.
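As a minimal illustration of this stability triage (the formulas and hull energies below are made up; only the 0.00 and 0.05 eV/atom thresholds come from the text):

```python
# Hypothetical candidate records with MP/OQMD-style energy-above-hull values
candidates = [
    {"formula": "A3B",  "e_hull": 0.00},   # stable
    {"formula": "AB4",  "e_hull": 0.03},   # nearly stable
    {"formula": "A2B5", "e_hull": 0.21},   # likely hard to synthesize
]

# Apply the two thresholds used in the text ("nearly stable" includes stable)
stable        = [c for c in candidates if c["e_hull"] == 0.0]
nearly_stable = [c for c in candidates if c["e_hull"] < 0.05]
```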

Insulating materials like β-ZrNCl and the cuprates superconduct with high Tcs only once they are doped into a metallic state31. One long-running challenge for machine-learning approaches to predicting high-Tc superconductivity is that large-bandgap insulators incapable of superconductivity tend to be given overweighted classification scores, likely due to the high Tcs of the cuprates16. Therefore, metals and easily doped materials were favored for testing. Similarly, for some predicted metals, we investigated nearby compounds with similar structures that were known in the literature but were not found in MP or OQMD (e.g., Zr3Fe4Sn4 and Hf3Fe4Sn432,33) and isostructural compounds with promising band structures (e.g., ZrNi2In).

Since the Tcs of compounds are very sensitive to alloy disorder and lattice parameter, we explored several compositions near each prediction34. We also considered the ease and safety of synthesizing the target materials (e.g., by excluding extremely high-pressure syntheses). Powder X-ray diffraction (XRD) was used to ensure that the target material was successfully synthesized and temperature-dependent AC magnetic susceptibility was used to screen for superconductivity. Superconductors are perfectly diamagnetic below their Tc with minimal applied field.

Material candidate experimental verification

To illustrate the sensitivity of experimentally measured Tcs to processing conditions, we made and tested samples with A3B stoichiometry (Fig. 3a), including many known superconductors from the A15 family35. Similar compositional sensitivity is common in other systems beyond A15 compounds. For example, as x varies between 0 and 0.35, La2−xSrxCuO4 can vary from not superconducting to having a Tc up to 36 K15. Our experiments show that high-throughput synthesis and characterization techniques can reliably and quickly screen systems for superconductivity; optimization of many superconducting phases, however, requires much lower-throughput techniques for preparing phase-pure and fully superconducting samples.

Fig. 3: Experimental data for feedback and discovery.
figure 3

a Evaluation of our high-throughput synthesis of compounds with A3B stoichiometry (including A15 compounds) demonstrates the effects of processing on the measured Tc and our ability to positively identify superconductors quickly. For the superconductor in the Zr-In-Ni phase diagram, samples of various compositions were tested (b and c). The size of the datapoints in (c) reflects the fraction of the superconducting phase present, as estimated from the magnitude of the transition in magnetization between 5 and 10 K (orange region). This transition was distinct from the indium-related transition (green). The compositions of samples with the strongest superconducting signals cluster near the composition “ZrNiIn4”. The metastability of the superconducting phase precluded its isolation as a single phase.

Using this closed-loop method and high-throughput synthesis, we re-discovered five known superconductors that were not represented in the ML training dataset; a list of these is found in Table 1. Alongside these successful predictions, the ML model also returned compositions that experts could readily identify as unlikely superconductor candidates. Therefore, it was important to compare the successful prediction rates of the combined human-expert-machine approach and the machine-only approach. If one considers all predictions (including those not identified as promising by the human in the loop), the rate of discovery is 5/190 (2.6%), comparable to expert-driven success rates of 3%10. When materials that experts quickly identified as unrealistic superconductor candidates were excluded (the human-machine combined approach), the successful prediction rate rose to 5/65 (7.7%), more than double that of previous expert-driven approaches10. This is particularly remarkable given the chemical diversity of the predicted candidates.

We were then able to use this ML model to discover unreported superconductors. Specifically, we find a superconducting phase in the Zr-In-Ni system, with a Tc of ~9 K (Fig. 3b, c and Extended Data) and approximate composition ZrNiIn4. No known elemental, binary, or ternary phases in the Zr-In-Ni system would explain a superconducting transition temperature this high, and the elements and binaries have been extensively investigated12,35,36,37. Unfortunately, the phase responsible for superconductivity is extremely metastable, and we have not yet found a synthesis route to obtain it in single-phase form (see SI).

Conclusions

We have presented the first ever “closed-loop” ML-based directed discovery of a superconductor with experimental verification (within the Zr-Ni-In system), identified two additional systems of interest (Zr-Cu-In and Zr-Fe-Sn), and rediscovered five others not represented in our ML training set.

Past revolutionary discoveries tended to happen by serendipity: finding something in material families outside of what was known at the time. Our approach, relying only on stoichiometry and a measure of “distance” from what is currently known, is more likely than ML-guided approaches confined to a single family of materials to find unreported materials of interest, and it provides a sense of where unexplored but promising materials lie.

This approach improves performance with experience, in that with every closing of the loop, the ML model undergoes feedback and refinement, enabling efficient exploration of materials space. These improvements ultimately will reduce the cost of materials development and discovery. The success of this approach has been demonstrated by discoveries and rediscoveries coming from vastly different families, illustrating the potential of this tool for the discovery of materials with targeted properties. This methodology can be expanded to target more than one desired property, and applied to domains beyond superconductors as long as a mechanism for new data acquisition based on ML-based predictions can be leveraged.

Further, we engaged in only a small number of total prediction/experimental-measurement iterations; to maximize the superconducting transition temperatures of superconductors discovered over further iterations, we can use acquisition functions developed for Bayesian optimization38,39. Our approach retains a human in the loop for synthesizing and characterizing materials, but further automation is possible, involving, e.g., ML systems selecting experiments to be conducted, or robot-powered self-driving laboratories40,41,42. Thus we demonstrate a viable path for these methods to accelerate materials discovery.

Methods

Data

Our initial data source containing the superconducting transition temperature, Tc, of many known compounds is the SuperCon database20, published by the Japanese National Institute for Materials Science. More details and analyses about SuperCon are available in the SI.

In this work, we use the version of SuperCon released by Stanev et al.12, available online. This contains 16,414 material compositions and associated critical temperature measurements. However, some of these compositions are invalid (e.g., Y2C2Br0.5!1.5) and were removed prior to analysis. Our final training dataset has 16,304 valid compositions. In the Extended Data and the SI, we give additional detail about our training dataset. Supplementary Fig. 1 shows the distribution of Tc values in our training data—note that the distribution is weighted toward low-Tc compositions.

We use MP21 and OQMD22 as the set of candidates to screen with ML for superconducting potential. MP and OQMD are some of the largest public sets of computational materials data. Their records contain full crystallographic information for material structures, along with some associated electronic and mechanical properties (but not, importantly, Tc). We scraped MP for material records present in it as of October 2020 using the MPRester class from the pymatgen43 package, obtaining 89,341 unique compositions. We later downloaded the entire OQMD v1.4 database, obtaining 252,978 unique compositions. The Extended Data contains a table of MP and OQMD material IDs used in this study.

Computational methods and uncertainty

RooSt27 is a graph neural network44 that relates material composition to properties by applying a message-passing scheme45 to a weighted graph representation of the composition’s stoichiometry, producing a real-valued embedding vector. To make a prediction, this embedding is then passed through a feedforward network.

In this work, we make use of the publicly available implementation of RooSt, which is written in PyTorch46. Furthermore, we use the default hyperparameters recommended by the RooSt authors, including basing the initial species representation vectors on the matscholar embedding47. Since we seek materials likely to be high-Tc superconductors, and we expect RooSt’s classification model to generalize poorly on out-of-distribution data, we filter for materials predicted to be in the highest Tc tertile (Tc ≥ 20 K) with a classification score of at least 0.66 (see SI).
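The tertile-plus-score filter can be sketched as follows; the class ordering and the toy probabilities are assumptions for illustration, with only the 0.66 threshold and the Tc ≥ 20 K tertile taken from the text:

```python
import numpy as np

def select_candidates(class_probs, high_tc_index=2, threshold=0.66):
    """Keep candidates whose predicted probability of the highest-Tc
    class (assumed to be Tc >= 20 K) meets the score threshold."""
    scores = class_probs[:, high_tc_index]
    return np.flatnonzero(scores >= threshold)

# Toy softmax outputs over the three Tc tertiles (<2 K, 2-20 K, >=20 K)
probs = np.array([
    [0.10, 0.20, 0.70],  # confident high-Tc prediction -> keep
    [0.40, 0.40, 0.20],  # low score -> discard
    [0.20, 0.14, 0.66],  # exactly at the threshold -> keep
])
selected = select_candidates(probs)
```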

RooSt models incorporate two sources of uncertainty in their Tc predictions: we account for aleatoric uncertainty (randomness inherent in the input data) by letting a model estimate a mean and standard deviation for each label’s logit48, and we incorporate epistemic uncertainty (uncertainty in the model itself) by averaging over an ensemble of independently trained RooSt models49.
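A minimal numerical sketch of how these two uncertainty sources combine for a single prediction, assuming each ensemble member outputs a mean logit and an aleatoric standard deviation (all numbers below are invented):

```python
import numpy as np

# Hypothetical per-model outputs for one candidate: each ensemble member
# predicts a mean logit and a standard deviation (aleatoric) for a class.
ensemble_means  = np.array([1.8, 2.1, 1.9, 2.2])   # mean logits, one per model
ensemble_sigmas = np.array([0.3, 0.4, 0.2, 0.3])   # predicted aleatoric std devs

mean_logit = ensemble_means.mean()          # ensemble-averaged prediction
aleatoric  = (ensemble_sigmas ** 2).mean()  # data noise, averaged over models
epistemic  = ensemble_means.var()           # model disagreement
total_var  = aleatoric + epistemic          # combined predictive variance
```

This mean/variance decomposition is a common way to combine the two uncertainty types in deep ensembles; the exact aggregation used with RooSt may differ in detail.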

Problem formulation

We formulate our prediction problem as an uncertainty-aware classification task. As shown in the Supplementary Information, the distribution of Tc values in SuperCon is skewed, with a large number of materials having Tcs close to 0 K. Although we could have used a regression approach and had models estimate Tc directly, the skewed and heavy-tailed Tc distribution instead prompted us to discretize Tc into three categories, based roughly on tertiles: materials with a measured Tc less than 2 K, materials with a Tc between 2 K and 20 K, and materials with a Tc of at least 20 K. This is similar to earlier work by Stanev et al.12, who use a two-stage prediction approach in which they first classify whether a material has a Tc greater than 10 K. Depending on the specifics of the target property, our closed-loop discovery process can be used with other ML prediction formulations as well.
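The discretization itself reduces to a simple binning rule; the handling of the exact boundaries shown here is an assumption consistent with the Tc ≥ 20 K tertile used elsewhere in the text:

```python
def tc_category(tc_kelvin):
    """Discretize a measured Tc into the three tertile-based classes
    used for classification (boundary values taken from the text)."""
    if tc_kelvin < 2.0:
        return 0   # low:  Tc < 2 K
    elif tc_kelvin < 20.0:
        return 1   # mid:  2 K <= Tc < 20 K
    else:
        return 2   # high: Tc >= 20 K

labels = [tc_category(t) for t in (0.5, 9.0, 36.0)]
```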

In this work, we characterize the similarity between material compositions using both the RooSt latent embedding (for predicting material properties) and the Euclidean distance between material compositions’ Magpie30 representations (for determining whether superconductor candidates are sufficiently different from known superconductors to be considered discoveries). The choice of metric is not critical, as it is imposed simply to help broaden the range of materials space explored. Other works have considered alternative mechanisms for material similarity, such as representations based on element fractions50 or the earth mover’s distance51. Our discovery process does not rely on a specific similarity measure and can adopt other measures as desired.

Model validation

SuperCon provides the data for a validation experiment for our model—can RooSt successfully predict the Tc tertile of unknown materials? We evaluate this question in two settings: the first under a standard uniform cross-validation (Uniform-CV) split of SuperCon, and the second with the LOCO-CV strategy26. In the latter approach, we apply K-means clustering to the Magpie30 representation of SuperCon and then train K RooSt models, iteratively holding out each cluster as a test set. Since the clustering will put materials that are similar to each other in the same cluster, LOCO-CV is a better proxy for assessing how well our model will perform when used to identify superconductor candidates in MP.
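A sketch of generating the LOCO-CV splits, assuming scikit-learn's KMeans and a Magpie-like feature matrix (the toy data below uses three artificially well-separated groups so the clusters are unambiguous):

```python
import numpy as np
from sklearn.cluster import KMeans

def loco_cv_splits(features, k=3, seed=0):
    """Yield (train_idx, test_idx) pairs for leave-one-cluster-out CV:
    cluster the features with K-means, then hold out each cluster in
    turn as the test set."""
    labels = KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(features)
    for cluster in range(k):
        test = np.flatnonzero(labels == cluster)
        train = np.flatnonzero(labels != cluster)
        yield train, test

# Toy feature matrix with three well-separated groups of four points each
X = np.vstack([np.zeros((4, 2)), np.ones((4, 2)) * 5, np.ones((4, 2)) * 10])
splits = list(loco_cv_splits(X))
```

Each of the K models is then trained on the `train` indices and evaluated on the held-out cluster's `test` indices.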

In this study, we set K = 3 for the clustering and summarize cluster characteristics in Table 3 and Fig. 4. Note that even this simple clustering procedure has produced inter-cluster heterogeneity—e.g., Cluster 0 is significantly smaller than the other clusters, and Cluster 1 contains the bulk of the Tc ≥ 20 K superconductors.

Table 3 LOCO-CV clustering.
Fig. 4: Statistics of Tc.
figure 4

Statistics of Tc across clusters used in the LOCO-CV study, obtained from Stanev et al.12’s version of SuperCon.

In Figs. 5 and 6, we show the results of our study. In the Uniform-CV setting, our model does well—it shows little evidence of overfitting and performs well for all three Tc categories. In LOCO-CV, however, performance degrades significantly and is also much more variable, depending on which cluster is used as the test set. Our result here echoes that of Stanev et al.12, who show that models trained only on iron-based superconductors fail to accurately predict properties of cuprates, and vice versa.

Fig. 5: Training vs. test accuracy.
figure 5

Training and test set accuracies for uniform cross-validation (Uniform-CV) vs. LOCO-CV, averaged over each fold and cluster. Bars show 95% confidence intervals based on the standard error of the mean. The model severely overfits in the LOCO-CV case, and its test set accuracy is much more cluster-dependent and variable.

Fig. 6: Test set precision and recall.
figure 6

Test set precision and recall analysis for each Tc category for the uniform vs. LOCO-CV study, averaged over each fold and cluster. Bars show 95% confidence intervals based on the standard error of the mean. The model’s metrics are much more variable and cluster-dependent for the LOCO-CV model.

These results indicate that we should not expect an ML model trained only on SuperCon to consistently identify superconductors in out-of-distribution data, and, as points in SuperCon are more similar to each other than points in MP and OQMD (Fig. 2), the LOCO-CV results here are optimistic compared to our actual problem of interest. This motivates our need for multiple iterations of model training, candidate selection, candidate synthesis, and model retraining.

Closed-loop discovery process

The initial loop iteration used Stanev et al.’s version of SuperCon12 as training data (“Data”). After the Tc-prediction model was trained, candidates were selected from MP21 based on predicted scores (“Computational Methods and Uncertainty”). The second loop iteration used SuperCon, as well as additional measurements from the first loop, as training data, and it again used MP as the set of possible candidates. The third and fourth loops again used prior iterations’ measurements as supplementary training data, but they also added OQMD22 to MP to obtain the set of possible candidates. The number of materials synthesized and characterized per loop iteration varied across loops, based on domain-expert intuition and feasibility of synthesis. This process is summarized in Table 4.
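The bookkeeping of the loop described above can be sketched schematically; everything here (the mock ground truth, the shortlist size, and the function shape) is illustrative stand-in code rather than the actual pipeline:

```python
# Mock "experiments": whether each hypothetical candidate superconducts
GROUND_TRUTH = {"A": False, "B": True, "C": False, "D": True}

def closed_loop(train_data, pool, n_per_iter=2):
    """Each iteration measures a shortlist, feeds BOTH positive and
    negative results back into the training data, and shrinks the pool."""
    discoveries = []
    while pool:
        shortlist, pool = pool[:n_per_iter], pool[n_per_iter:]  # downselection stand-in
        results = [(c, GROUND_TRUTH[c]) for c in shortlist]     # synthesis + screening
        train_data = train_data + results                       # feedback step
        discoveries += [c for c, is_sc in results if is_sc]
    return discoveries, train_data

found, data = closed_loop(train_data=[], pool=["A", "B", "C", "D"])
```

In the real process, the downselection step is where model retraining, stability filtering, and expert input enter between iterations.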

Table 4 A summary of the closed-loop iterations.

Experiment

To synthesize compounds in a medium-throughput manner, arc melting and solid state techniques were used. The standard sample size was 500–700 mg. A list of precursors used in this project is found in Supplementary Table 1 in the SI and details of the synthetic procedures are found in the SI. Additional heat treatments were performed on an as-needed basis when isolating superconducting phases.

Powder XRD patterns were collected at room temperature on the as-melted samples using a Bruker D8 Focus powder diffractometer with Cu-Kα radiation (λKα1 = 1.540596 Å, λKα2 = 1.544493 Å), Soller slits, and a LynxEye detector to verify the presence of the target phase. As an initial screen, we measured from 2θ = 5–60° with a step size of 0.018563° over 4 min. When gathering XRD patterns of samples in preparation for Rietveld refinement, 4-h measurements were performed from 2θ = 5–120° with a step size of 0.01715°.

AC-susceptibility measurements were conducted using either a Quantum Design Magnetic Property Measurement System (MPMS) (HDC = 10 Oe, HAC = 1–3 Oe, 900 Hz) or a Quantum Design Physical Property Measurement System (HDC = 10 Oe, HAC = 3 Oe, 1 kHz), measuring T ≥ 2 K. Since prior density functional theory (DFT) calculations52 suggested that CaAg2Ge2 would superconduct near T = 1.5 K, we used the 3He option with the MPMS to measure from 0.4 K to 1.7 K for that sample, in addition to our standard measurement above 2 K.