Introduction

Phase-transforming ferroelectric materials are extensively employed in functional devices, such as piezoelectric converters and pyroelectric generators1,2,3. Doping has long been considered an effective strategy for fine-tuning these phase transformations, as the transformation temperature and crystal structure are sensitive to the dopant's composition. Through systematic doping experiments with comprehensive property characterizations, the transformation temperature and crystal structure can be mapped, outlining the phase boundaries at varying compositions. This traditional route to construct a phase diagram is crucial because it highlights the conditions under which phase transformations occur. Identifying the phase boundaries, together with the associated chemical, thermal, and physical properties, helps determine the dopant composition that enhances material properties such as the dielectric constant, piezoelectric coefficient, and pyroelectric coefficient4,5,6. The construction of precise and comprehensive phase diagrams not only reveals the fundamental effects of doping but also provides a rational strategy for the design of high-performance ferroelectric materials7,8,9,10.

Typical phase diagrams of ferroelectrics, which involve multiple metallic elements in one or two stoichiometric oxide compounds, are represented as composition-temperature diagrams spanning the two compounds. Phase boundaries are often determined by linearly extrapolating the experimentally measured transition temperatures of a compound as the content of a selected element varies. Finer compositional steps enhance the precision of the phase diagram, while a broader chemical tuning range improves its comprehensiveness. Theoretical approaches, such as phase-field simulations, are commonly employed for phase diagram construction, relying on semi-empirical thermodynamic parameters fitted to experimental data11,12,13. However, predicting phase diagrams for complex materials, such as ferroelectric oxides with morphotropic phase boundaries (MPBs), remains challenging due to the lack of well-established thermodynamic parameters14. Conventional methods based on experimental extrapolation and phase-field simulations become less reliable when applied to new material systems with unknown thermodynamic properties. With the increasing demand for novel ferroelectric materials in sensing, actuation, and energy applications, more accurate phase boundary prediction methods are essential for guiding material design. Therefore, alternative strategies that do not rely heavily on pre-existing thermodynamic data are needed to construct reliable phase diagrams for complex ferroelectrics.
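
As a minimal illustration of the conventional extrapolation route, the sketch below fits a straight line T = a + bx to a few measured transition temperatures along one boundary, extrapolates it to an unmeasured composition, and computes where two fitted boundaries cross. The function names and the numbers in the usage note are hypothetical and serve only to show the arithmetic.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all compositions are identical; slope is undefined")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def extrapolate(xs: Sequence[float], temps: Sequence[float], x_new: float) -> float:
    """Transition temperature predicted at x_new by the fitted boundary line."""
    a, b = fit_line(xs, temps)
    return a + b * x_new


def crossing(line1: Tuple[float, float], line2: Tuple[float, float]) -> float:
    """Composition x where two fitted boundaries T = a + b*x meet."""
    (a1, b1), (a2, b2) = line1, line2
    if b1 == b2:
        raise ValueError("parallel boundaries never cross")
    return (a2 - a1) / (b1 - b2)
```

For example, hypothetical points (x, T) = (0, 400 K), (0.05, 390 K), and (0.10, 380 K) give a slope of −200 K per unit x, so `extrapolate` returns 360 K at x = 0.2. The linearity assumption built into this calculation is exactly what becomes unreliable far from the measured compositions.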

Recently, artificial intelligence (AI) has emerged as a promising approach for constructing composition-temperature phase diagrams in ferroelectric materials. Pioneering studies have leveraged machine learning (ML) algorithms to predict crystal structures and phase transformation temperatures based on human-selected atomic features15,16,17. While these models perform well for specific material systems, their predictive accuracy tends to decrease when applied to a broader range of ferroelectrics due to limited generalization. This limitation arises from the insufficient cross-material-family data available in current training datasets.

Although publicly available materials databases, such as the Materials Project18 and the Crystallography Open Database19, provide extensive information on chemical compositions, crystal structures, and electronic properties, phase transformation data are rarely available in these databases, particularly critical information such as temperature-dependent transport properties, latent heat, and symmetry relations among phases. Without a comprehensive and accurately labeled dataset of phase transformations, existing AI models are limited to specific material classes and face challenges in constructing wide-range phase diagrams that are broadly applicable to diverse ferroelectric systems. Thus, developing a general AI framework for phase diagram prediction requires not only robust models but also a well-curated dataset that captures the full complexity of phase transformations across various material families.

In this work, we construct a phase transformation dataset encompassing various ferroelectric materials and develop an AI model, FerroAI, to generate wide-range composition-temperature phase diagrams. The model is systematically optimized to enhance its generalization capability, demonstrating strong predictive performance across diverse ferroelectric systems. To validate its effectiveness, we compare the wide-range, high-resolution phase diagrams constructed by FerroAI with discrete phase transformation data reported in the literature. Furthermore, we fabricate a new family of ferroelectrics and use experimental characterization to evaluate whether FerroAI can accurately predict their phase diagrams. The results confirm that FerroAI not only provides accurate predictions but also scales well to modeling phase transformations in unexplored materials, establishing it as a powerful and generalizable AI tool for ferroelectric materials discovery.

Results

Construction of phase transformation training set by text-mining

A comprehensive dataset of ferroelectric phase transformations is essential for training AI models. However, such datasets covering a wide range of ferroelectric materials are scarce, owing to the limited connections established among the crystal symmetries of complex ferroelectric systems. To address this, we systematically compiled a large-scale dataset by extracting phase transformation information from the published literature. Using natural language processing (NLP) techniques, we identified key details such as chemical compositions, crystal structures, and transition temperatures in thousands of research articles. The text-mining methodology is detailed in Methods.

The dataset construction process is illustrated in Fig. 1. The phase transformation dataset for ferroelectric materials was constructed by text-mining 41,597 research articles using Elsevier’s official Application Programming Interface (API). Through this process, we extracted phase transformation information, including chemical compositions represented by chemical formulas, crystal structures, and transformation temperatures associated with symmetry sequences across phase transitions.

Fig. 1

Data-mining process for ferroelectric materials with symmetry-breaking phase transformations using natural language processing (NLP).

After automated extraction and verification, we compiled a dataset comprising 2838 phase transformations across approximately 800 ferroelectric materials. The distribution of ferroelectric materials in our dataset is illustrated in Fig. 2a. Clustering analysis highlights that potassium sodium niobate, barium titanate, lead zirconate titanate, and lead magnesium niobate are the most extensively studied ferroelectric and piezoelectric materials in Elsevier publications.

Fig. 2: Statistics of classified ferroelectric phase transformation dataset.

a data population from different families of doped ferroelectric materials; b chord diagram of the transformation relations between different crystal structures; c–g transformation temperature distributions for different types of phase transformation.

The relationships among the seven crystal systems represented in the dataset are visualized in the chord diagram in Fig. 2b, which reveals that cubic-to-tetragonal and tetragonal-to-rhombohedral transformations are the most frequently observed transitions in ferroelectrics. The temperature statistics for specific symmetry-breaking transformations are shown in Fig. 2c–g. These histograms indicate that most phase transitions occur between 100 K and 700 K. Furthermore, the transformation temperature generally decreases as the crystal symmetry lowers.

Because different material systems exhibit different phase transformation sequences, and a single material commonly undergoes multiple transitions among various symmetries, the phase transformation (PT) dataset is further converted into a well-structured crystal dataset through data augmentation to improve data consistency for training, as illustrated in Fig. 3. To label the dataset, we assign symmetry labels to specific temperature ranges according to the phase transformation sequence in the PT dataset. We then introduce an augmentation factor, N, that uniformly divides each labeled temperature range into N + 1 smaller intervals, and we extract the boundary temperature points as input data. This ensures that all labeled temperature ranges are sampled proportionally, maintaining consistent relative weighting with respect to the original phase transformation data. Additionally, at every transformation temperature, we add data points corresponding to the crystal structures immediately before and after the transition, specifically at 1 K below the transformation temperature. This step improves the accuracy of phase boundary representation. Through these procedures, the original unstructured PT dataset is systematically converted into a well-structured, augmented crystal dataset.
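
The augmentation step can be sketched as follows. This is one reading of the text, not the authors' code: we assume that the N + 1 sub-intervals contribute their N interior boundary temperatures (so that no point sits exactly on a transition with two labels), that the higher-temperature phase is assigned at the transformation temperature T_c and the lower-temperature phase at T_c − 1 K, and that hypothetical bounds `t_min` and `t_max` close the open-ended lowest and highest temperature ranges.

```python
from typing import List, Tuple

Sample = Tuple[float, str]  # (temperature in K, crystal-system label)


def augment_record(phases: List[str], t_transitions: List[float], n_aug: int,
                   t_min: float = 0.0, t_max: float = 1000.0) -> List[Sample]:
    """Sketch of range sampling plus transition points for one PT record.

    phases: crystal systems ordered from the highest- to the lowest-temperature phase.
    t_transitions: transition temperatures in K, strictly decreasing, len(phases) - 1.
    t_min, t_max: hypothetical bounds closing the open-ended end ranges.
    """
    if len(t_transitions) != len(phases) - 1:
        raise ValueError("expected one transition temperature between consecutive phases")
    edges = [t_max, *t_transitions, t_min]
    if any(hi <= lo for hi, lo in zip(edges, edges[1:])):
        raise ValueError("temperatures must decrease strictly from t_max to t_min")

    samples: List[Sample] = []
    for i, phase in enumerate(phases):
        hi, lo = edges[i], edges[i + 1]
        step = (hi - lo) / (n_aug + 1)
        # N interior boundaries of N + 1 equal sub-intervals of this labeled range.
        samples.extend((lo + k * step, phase) for k in range(1, n_aug + 1))

    for i, tc in enumerate(t_transitions):
        samples.append((tc, phases[i]))            # assumed: higher-temperature phase at Tc
        samples.append((tc - 1.0, phases[i + 1]))  # lower-temperature phase 1 K below Tc
    return sorted(samples, reverse=True)
```

With phases ["cubic", "tetragonal"], one transition at 400 K, N = 3, and bounds of 0 K and 800 K, the sketch returns 500, 600, and 700 K labeled cubic, 100, 200, and 300 K labeled tetragonal, plus the two transition points at 400 K and 399 K.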

Fig. 3

The workflow illustrating the construction of the augmented crystal dataset for deep neural network training and the development of the FerroAI model.

FerroAI model for predicting phase diagram

To develop the FerroAI model for phase diagram prediction, we trained a deep neural network on the augmented crystal dataset. The overall process flow is illustrated in Fig. 3. In this framework, each input material is tagged by its chemical formula and atomic composition. These tagged data serve as the foundation for training the neural network, allowing it to learn the compositional range of stable phases at specific temperatures.

We design a six-layer deep neural network with the chemical vector and temperature as inputs and the crystal symmetry as the output. The chemical vector encodes the material system by embedding its chemical formula. Specifically, we construct the chemical vector as a 118-dimensional vector whose entries are ordered by atomic number in the periodic table. For a given compound, each entry is assigned a value based on the atomic ratio of the corresponding element if that element appears in the compound; otherwise, it is set to zero, as shown in Fig. 3. This approach transforms unstructured chemical formulas into vectorized representations suitable for training neural networks. Model performance is optimized by systematically tuning key hyperparameters, including the augmentation factor, number of hidden layers, number of neurons per layer, and learning rate, using the controlled variable method, in which one hyperparameter is varied at a time while the others are held fixed. The predictive accuracy is evaluated using the weighted F1 score20, which accounts for the uneven distribution of data across crystal structures. Figure 4a–c present the mean and standard deviation of the weighted F1 score from 10-fold cross-validation; the corresponding results for the learning rate are included in the Supplementary Materials. The hyperparameter set yielding the highest F1 score is selected for model training. A summary of the model architecture and primary hyperparameters is provided in Table 1. The optimized model has 811,015 parameters in total, which is sufficient to capture the influence of chemical composition on phase transformations across a wide range of ferroelectric families.
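
The 118-dimensional chemical vector can be sketched in a few lines of Python. The parser below is a hypothetical helper, not the FerroAI preprocessing code: it handles decimal subscripts and parentheses such as (K0.5Na0.5)NbO3 but not fractional subscripts like In1/2. Whether the raw subscripts or atomic fractions are stored is not specified, so both are offered through the `normalize` flag.

```python
import re
from collections import defaultdict
from typing import Dict, List

# 118 element symbols ordered by atomic number (Z = index + 1).
ELEMENTS = (
    "H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn "
    "Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Ce "
    "Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po At Rn "
    "Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg Cn Nh Fl "
    "Mc Lv Ts Og"
).split()
INDEX = {sym: i for i, sym in enumerate(ELEMENTS)}

# An element symbol or a parenthesis, optionally followed by a (decimal) amount.
TOKEN = re.compile(r"([A-Z][a-z]?|\(|\))(\d+(?:\.\d+)?|\.\d+)?")


def parse_formula(formula: str) -> Dict[str, float]:
    """Element -> amount, supporting decimal subscripts and nested parentheses."""
    stack: List[Dict[str, float]] = [defaultdict(float)]
    pos = 0
    while pos < len(formula):
        m = TOKEN.match(formula, pos)
        if m is None:
            raise ValueError(f"cannot parse {formula!r} at position {pos}")
        sym, num = m.group(1), m.group(2)
        count = float(num) if num else 1.0
        if sym == "(":
            stack.append(defaultdict(float))
        elif sym == ")":
            if len(stack) == 1:
                raise ValueError(f"unbalanced ')' in {formula!r}")
            group = stack.pop()
            for el, n in group.items():  # apply the group multiplier
                stack[-1][el] += n * count
        else:
            stack[-1][sym] += count
        pos = m.end()
    if len(stack) != 1:
        raise ValueError(f"unbalanced '(' in {formula!r}")
    return dict(stack[0])


def chemical_vector(formula: str, normalize: bool = False) -> List[float]:
    """118-d vector: amount of each element at index Z - 1, zero elsewhere."""
    vec = [0.0] * len(ELEMENTS)
    for el, n in parse_formula(formula).items():
        if el not in INDEX:
            raise ValueError(f"unknown element symbol {el!r}")
        vec[INDEX[el]] += n
    if normalize:  # optional: convert amounts to atomic fractions
        total = sum(vec)
        vec = [v / total for v in vec]
    return vec
```

For BaTiO3, this places 1 at the Ba (Z = 56) and Ti (Z = 22) positions and 3 at the O (Z = 8) position, with zeros in the remaining 115 entries.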

Fig. 4: Optimization of primary hyperparameters.

a augmentation factor; b number of hidden layers; c number of neurons per layer; d optimization of secondary hyperparameters via the Hyperband approach.

Table 1 Optimized architecture and list of primary hyperparameters in FerroAI

The secondary hyperparameters, including the weight decay coefficient and dropout rate for each layer, are further optimized using the Successive Halving approach within the Hyperband algorithm21,22. In this process, over 200 hyperparameter combinations are tested, and the one yielding the highest accuracy is selected for final model training. The model demonstrates robustness to variations in secondary hyperparameters, as multiple top-performing configurations yield comparable accuracy. More tuning results are given in the Supplementary Materials. The predictive capability of the model is primarily governed by the choice of primary hyperparameters listed in Table 1. The activation functions used are ReLU23 for hidden layers and Softmax24 for the output layer.
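
A single successive-halving bracket, the building block that Hyperband repeats with different trade-offs between the number of configurations and the budget per configuration, is sketched below. The scoring function `toy_score` and the dropout/weight-decay grid are hypothetical stand-ins; in practice the score would be the validation accuracy after training for `budget` epochs. The outer Hyperband loop over several brackets is not shown.

```python
import math
from typing import Callable, List, Sequence


def successive_halving(configs: Sequence[dict],
                       evaluate: Callable[[dict, int], float],
                       min_budget: int = 1, eta: int = 3) -> dict:
    """Evaluate all survivors at the current budget, keep the best 1/eta,
    multiply the budget by eta, and repeat until one configuration remains."""
    if not configs:
        raise ValueError("need at least one configuration")
    survivors: List[dict] = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda cfg: evaluate(cfg, budget), reverse=True)
        survivors = ranked[:max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0]


def toy_score(cfg: dict, budget: int) -> float:
    """Hypothetical stand-in for validation accuracy (ignores the budget)."""
    return -((cfg["dropout"] - 0.2) ** 2) - (math.log10(cfg["weight_decay"]) + 4.0) ** 2


# Hypothetical search space: 5 dropout rates x 3 weight-decay coefficients.
GRID = [{"dropout": d, "weight_decay": w}
        for d in (0.0, 0.1, 0.2, 0.3, 0.4) for w in (1e-5, 1e-4, 1e-3)]
```

With eta = 3, the 15 configurations are reduced to 5 and then to 1, so most of the compute budget is spent on the few most promising candidates.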

Assessment of the FerroAI model for phase diagram prediction

Using the optimal hyperparameters, we proceed with the final training of the model. Cross-entropy loss is used to evaluate model performance during training, quantifying the difference between the predicted and true probability distributions of the input data. The training objective is to minimize this loss. As shown in Fig. 5a, cross-entropy loss decreases rapidly with increasing training epochs, indicating that the model effectively learns from the input data.
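
For a sample with logits z over the seven crystal systems and true class y, the softmax output is p_k = exp(z_k) / Σ_j exp(z_j), and the cross-entropy loss is −log p_y, averaged over the batch. The minimal standard-library sketch below uses the usual log-sum-exp shift for numerical stability; it illustrates the quantity plotted in Fig. 5a rather than reproducing the training code.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Class probabilities; subtracting the max avoids overflow in exp."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(logits: Sequence[float], true_class: int) -> float:
    """-log softmax(logits)[true_class], computed via log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[true_class]


def mean_cross_entropy(batch_logits: Sequence[Sequence[float]],
                       labels: Sequence[int]) -> float:
    """Average loss over a batch of samples."""
    return sum(cross_entropy(z, y) for z, y in zip(batch_logits, labels)) / len(labels)
```

Uniform logits give a loss of ln 7 ≈ 1.95, which is the value expected from an untrained seven-class model; confident correct predictions drive the loss toward zero.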

Fig. 5: Performance of FerroAI in training and testing.

a cross-entropy loss versus training epochs for the best model; b confusion matrix of predicted crystal structures on the unseen test dataset.

To assess the performance of our model in predicting phase diagrams for ferroelectric materials, we evaluate the trained model on a test dataset that was not used during training. The resulting confusion matrix, shown in Fig. 5b, gives the success rate of crystal structure prediction. In this matrix, the horizontal axis denotes the crystal structures predicted by FerroAI at specific temperatures and compositions for the given ferroelectric materials, while the vertical axis represents the corresponding labeled structures in the test set. FerroAI predicts crystal structures with high accuracy, particularly for cubic and rhombohedral phases. Although it also achieves high accuracy for triclinic structures, the limited number of triclinic entries in the dataset constrains the reliability of this observation. A small fraction of tetragonal and orthorhombic structures are misclassified as cubic or other symmetries, likely because of the marginal differences in their lattice parameters. Overall, the model achieves over 80% accuracy across all crystal structures, demonstrating that the well-trained FerroAI model effectively captures phase transformations and structural changes. The FerroAI model is available on Hugging Face (see Methods).
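
The confusion matrix and the weighted F1 score used for hyperparameter selection can both be computed from paired lists of labeled and predicted structures. The sketch below is a plain-Python version for illustration only. It follows the same support-weighted definition as scikit-learn's `f1_score(..., average="weighted")`, but it is not the evaluation code used in this work.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Rows are labeled (true) structures, columns are predicted structures."""
    idx = {lab: i for i, lab in enumerate(labels)}
    mat = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        mat[idx[t]][idx[p]] += 1
    return mat


def per_class_accuracy(mat: List[List[int]]) -> List[float]:
    """Diagonal of the row-normalized matrix, i.e. per-class recall."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(mat)]


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str],
                labels: Sequence[str]) -> float:
    """Per-class F1 averaged with weights equal to each class's share of the data."""
    mat = confusion_matrix(y_true, y_pred, labels)
    support = [sum(row) for row in mat]
    total = sum(support)
    score = 0.0
    for i in range(len(labels)):
        tp = mat[i][i]
        fp = sum(mat[r][i] for r in range(len(labels))) - tp
        fn = support[i] - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * support[i] / total
    return score
```

For a toy test set with labels [C, C, T, T, O] and predictions [C, T, T, T, O], the per-class accuracies are 0.5, 1.0, and 1.0, and the weighted F1 score is 59/75 ≈ 0.787.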

To illustrate the effects of doping elements on phase transformations in ferroelectric materials, we use FerroAI to generate high-resolution phase diagrams for common ferroelectrics, as shown in Fig. 6. The phase diagrams are constructed with a compositional resolution of 0.01 at.% and a temperature resolution of 1 K, enabling detailed visualization of phase boundaries. Predicting such a high-resolution phase diagram with FerroAI typically takes less than 20 seconds on a standard laptop, which is significantly faster than conventional simulations. In the predicted phase diagrams, different colors represent distinct crystal symmetries, delineating phase boundaries at specific temperatures and compositions. The phase boundaries and crystal structures predicted by FerroAI align closely with experimental data extracted from the literature; the discrete experimental data points25,26,27,28,29,30 belong to the test dataset. The consistency between the predicted and experimentally observed composition-temperature phase diagrams demonstrates that FerroAI effectively captures the impact of compositional variations on phase transformations across diverse ferroelectric families.
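
Generating such a diagram amounts to evaluating the classifier on a dense composition-temperature grid. The sketch below assumes that the composition of (1 − x)A − xB is the mole-fraction-weighted sum of the two end members. It uses a hypothetical `toy_predict` function in place of FerroAI; with the real model, each mixed composition would first be converted to the 118-dimensional chemical vector and scaled together with temperature.

```python
from typing import Callable, Dict, List, Tuple

Composition = Dict[str, float]  # element symbol -> amount per formula unit
Predictor = Callable[[Composition, float], str]


def mix(a: Composition, b: Composition, x: float) -> Composition:
    """Assumed composition of (1 - x)A - xB: mole-fraction-weighted sum of end members."""
    return {el: (1 - x) * a.get(el, 0.0) + x * b.get(el, 0.0) for el in set(a) | set(b)}


def phase_map(a: Composition, b: Composition, predict: Predictor, x_steps: int = 100,
              t_lo: int = 100, t_hi: int = 600
              ) -> Tuple[List[float], List[int], List[List[str]]]:
    """Predicted label on a grid with composition step 1/x_steps and 1 K temperature step."""
    xs = [i / x_steps for i in range(x_steps + 1)]  # integer steps avoid float drift
    temps = list(range(t_lo, t_hi + 1))
    grid = [[predict(mix(a, b, x), float(t)) for t in temps] for x in xs]
    return xs, temps, grid


def boundaries(xs: List[float], temps: List[int],
               grid: List[List[str]]) -> List[Tuple[float, int]]:
    """(x, T) points where the predicted label changes between adjacent temperatures."""
    return [(x, temps[j]) for x, row in zip(xs, grid)
            for j in range(1, len(row)) if row[j] != row[j - 1]]


def toy_predict(comp: Composition, t: float) -> str:
    """Hypothetical stand-in for FerroAI: one boundary that drops with Ca content."""
    tc = 400.0 - 200.0 * comp.get("Ca", 0.0)
    return "cubic" if t >= tc else "tetragonal"


A = {"Ba": 1.0, "Zr": 0.2, "Ti": 0.8, "O": 3.0}  # BaZr0.2Ti0.8O3
B = {"Ba": 0.7, "Ca": 0.3, "Ti": 1.0, "O": 3.0}  # Ba0.7Ca0.3TiO3
```

The default of 100 composition steps gives a step of 0.01 in x, and the grid from 100 K to 600 K in 1 K steps contains 101 × 501 = 50,601 points. Because each point is a single forward pass, the whole map is cheap to evaluate once a model is trained.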

Fig. 6: Phase diagrams predicted by FerroAI for ferroelectrics with respect to compositional variable x.

a BaZr0.2Ti0.8O3-xBa0.7Ca0.3TiO3 (ref. 25), b BaSn0.5Hf0.5O3-xBa0.93Ca0.07TiO3 (ref. 26), c PbZrxTi1−xO3 (ref. 27), d 0.36Pb(In1/2Nb1/2)O3-(0.54-x)Pb(Mg1/3Nb2/3)O3-xPbTiO3 (ref. 28), e K0.5Na0.5NbO3-xBi0.5Na0.5Zr0.85Sn0.15O3 (ref. 29), and f 0.90(K0.5Na0.5)NbO3-0.10(Bi0.5Li0.5)TiO3-xFe2O3 (ref. 30).

Figures 6a and 6b present phase diagrams for complex barium titanate-based compounds. For both BaZr0.2Ti0.8O3-xBa0.7Ca0.3TiO3 (ref. 25) and BaSn0.5Hf0.5O3-xBa0.93Ca0.07TiO3 (ref. 26), the phase diagrams reveal morphotropic phase boundaries (MPBs) among three polar phases: rhombohedral, orthorhombic, and tetragonal. Notably, the orthorhombic region is narrower than the tetragonal and rhombohedral regions. The predicted phase boundaries align well with experimental data, demonstrating the accuracy of the FerroAI model.

Figure 6c presents the common piezoelectric PZT system with the Zr content x varying from 0.3 to 0.5. The MPB between the tetragonal and rhombohedral phases is accurately captured, in good agreement with the literature27. Figure 6d shows the phase diagram of a complex Pb-based ternary compound. In this system, the phase boundary remains insensitive to the PbTiO3 content, and no MPB appears. Similar behavior is observed in ternary potassium sodium niobate with varying Fe2O3 content (Fig. 6f). In contrast, the binary potassium sodium niobate compound (Fig. 6e) exhibits an MPB among the tetragonal, orthorhombic, and rhombohedral phases upon the addition of a small atomic fraction of Bi0.5Na0.5Zr0.85Sn0.15O3.

To understand how different chemical elements influence the predicted phase symmetry, we used Shapley Additive Explanations (SHAP) analysis to quantify each feature's contribution. The results, shown in Fig. 7, reveal the A-site and B-site elemental contributions to the formation of cubic and tetragonal phases for the ferroelectric materials in our dataset. Among B-site elements, Ti is the most influential contributor to both cubic and tetragonal symmetries. The A-site elements exhibit more uniform contributions, with Ba showing marginally higher importance than other A-site cations. This analysis suggests that the crystal structure and distortions of perovskites are more sensitive to B-site doping, especially by Ti and Nb.
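
SHAP approximates Shapley values, which attribute a prediction to the input features by averaging each feature's marginal contribution over all coalitions of the other features. For a handful of features, they can be computed exactly, as in the sketch below. This sketch replaces absent features with a single baseline input, a simplification of SHAP, which averages over a background dataset. With 118 composition features plus temperature, exact enumeration over 2^119 coalitions is infeasible, which is why approximate SHAP estimators are used in practice.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(model: Callable[[List[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of model(x) relative to model(baseline).

    Cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition: set) -> float:
        # Features outside the coalition are replaced by their baseline values.
        return model([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Fraction of feature orderings in which exactly this subset precedes i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

For a linear model, each feature's Shapley value equals its weight times its deviation from the baseline, and for any model the values sum to model(x) − model(baseline). These two properties are a quick sanity check on any attribution method.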

Fig. 7

Shapley Additive Explanations of the most influential A-site and B-site elements of perovskite structures for predicting cubic and tetragonal structures.

Discussion

To demonstrate the potential of FerroAI for discovering new ferroelectric materials, we use it to predict phase diagrams for compositions included in neither the training set nor the test set. The Supplementary video demonstrates a graphical user interface (GUI) for using the FerroAI model to predict phase diagrams. Specifically, we explore two ferroelectric systems: (1) Ba(Ce0.005Zr0.005Ti0.99)O3-xBa0.7Ca0.3TiO3 (abbreviated as BCeZrT-xBCT) and (2) Ba(Zr0.1Hf0.1Ti0.8)O3-xBa0.7Ca0.3TiO3 (abbreviated as BZrHfT-xBCT). While a few compositions of BCeZrT-xBCT have been investigated for pyroelectric energy conversion31, its phase diagram remains largely unexplored.

The phase diagram for BCeZrT-xBCT is predicted across the full range of BCT compositions, as shown in Fig. 8a, with a resolution of 0.01 at.% and 1 K. The prediction reveals a sequential phase transformation from cubic to tetragonal, then to orthorhombic, and finally to rhombohedral between 550 K and 100 K. No MPB is observed in this binary compound system.

Fig. 8: Study of phase transformations in a binary ferroelectric system using FerroAI.

a Predicted phase diagram of the synthesized ferroelectric material BCeZrT-xBCT, verified by b DSC measurements of the polycrystalline samples whose grains are shown in c.

To experimentally validate these predictions, we synthesized BCeZrT-xBCT samples with x ranging from 0.08 to 0.9, following the synthesis method detailed in ref. 2. Phase transformation temperatures were measured by differential scanning calorimetry (DSC) using a TA Instruments DSC 250, and the results are presented in Fig. 8b. The extracted transformation temperatures are overlaid on Fig. 8a for direct comparison. FerroAI accurately predicts the composition-dependent phase transformation sequence and the influence of BCT additions on the transformation temperatures in this binary system, highlighting its capability to capture complex thermodynamic behavior in multicomponent ferroelectrics. Additionally, the microstructures of the synthesized samples, shown in Fig. 8c, provide further insight into the room-temperature phases: samples with x = 0.3–0.7 exhibit a grain morphology distinct from that of the x = 0.9 sample. Porosity was observed in all samples and was highest for x = 0.9. Because DSC revealed consistent phase transformation behavior across all compositions, the influence of porosity on the transformation temperature is likely marginal.

Figure 9a presents the FerroAI-predicted phase diagram of BZrHfT-xBCT for 0≤x≤0.7, using the same temperature and composition resolutions. The diagram reveals a sequential phase transformation from cubic to tetragonal, then to orthorhombic, and finally to rhombohedral (C-T-O-R). Unlike BCeZrT-xBCT, this binary system exhibits MPBs among the three low-symmetry ferroelectric phases, emerging from x = 0.3. This prediction has significant implications for the design of new ferroelectrics, as MPBs play a crucial role in enhancing the functional properties of piezoelectric devices32,33,34.

Fig. 9: Study of the morphotropic phase boundary and dielectric properties of a ferroelectric system using FerroAI.

a Predicted phase diagram of BZrHfT-xBCT materials with 0.1≤x≤0.7, evaluated and verified by b the temperature-dependent dielectric constant of selected compositions synthesized in this work.

To verify the prediction, we synthesized BZrHfT-xBCT samples with x varying from 0.2 to 0.55 around the predicted MPB. The temperature-dependent dielectric constants were measured with the aixACCT TF2000E ferroelectric analyzer at 1 kHz and 100 Hz, as shown in Fig. 9b. At the predicted MPB, the x = 0.3 sample exhibits the highest dielectric constant, 9535, at its transformation temperature, significantly higher than those of the other compositions. The x = 0.3 sample also shows a frequency-dependent dielectric response: at the lower frequency, its dielectric constant at the transformation temperature increases to 11,051. The extracted transformation temperatures, plotted as markers in Fig. 9a, agree well with the FerroAI predictions.
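
The text does not state how the transformation temperatures are read from the dielectric curves; a common convention, assumed here, is to take the temperature of a permittivity maximum. The sketch below finds the global maximum and, for curves with several successive transitions, all interior local maxima above a threshold. The function names and the `min_height` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def permittivity_peak(temps: Sequence[float], eps: Sequence[float]) -> Tuple[float, float]:
    """Temperature and value of the global permittivity maximum."""
    i = max(range(len(eps)), key=lambda k: eps[k])
    return temps[i], eps[i]


def local_maxima(temps: Sequence[float], eps: Sequence[float],
                 min_height: float = 0.0) -> List[Tuple[float, float]]:
    """Interior local maxima at or above min_height, e.g. for C-T, T-O, and O-R anomalies."""
    return [(temps[i], eps[i]) for i in range(1, len(eps) - 1)
            if eps[i] > eps[i - 1] and eps[i] >= eps[i + 1] and eps[i] >= min_height]
```

With a 1 °C temperature step, temperatures extracted this way are resolved to 1 °C at best, and noisy curves may need smoothing or a higher `min_height` before the peaks are picked.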

We summarize the MPBs predicted by FerroAI and confirmed through experimental characterization for various lead-free and lead-based materials in Table 2. The comparison shows that FerroAI accurately predicts the MPB types, and the compositions at which the MPBs emerge align well with experimental results. These findings demonstrate that FerroAI serves as a fast and reliable tool for identifying MPB compositions, providing valuable guidance for designing high-performance phase-transforming ferroelectric materials.

Table 2 Comparison between FerroAI-predicted morphotropic phase boundaries and experimental results in lead-free and lead-based ferroelectric materials

In conclusion, we have developed a deep learning neural network, FerroAI, trained on a comprehensive phase transformation dataset of ferroelectric materials. Leveraging 2838 phase transformations from approximately 800 ferroelectric materials with reliable references, FerroAI effectively captures the influence of dopants on phase transformations and enables the rapid construction of wide-range composition-temperature phase diagrams.

To validate its accuracy, we synthesized and characterized two ferroelectric material systems, demonstrating strong agreement between FerroAI-predicted phase diagrams and experimentally measured transformation temperatures. Based on the predicted MPB, we discovered that the BZrHfT-0.3BCT material exhibits a dielectric constant of 11,051 at its phase transformation, significantly surpassing neighboring compositions. These findings establish FerroAI as not only a robust tool for generating wide-range composition-temperature phase diagrams but also a valuable resource for guiding the design of high-performance phase-transforming lead-free ferroelectric materials.

Admittedly, the current FerroAI model still has several limitations. First, the data source is limited to research articles from Elsevier journals, while a substantial number of relevant publications from other sources remain unincorporated, thereby limiting the generality and versatility of the model. Second, the text-mined training data are limited to composition- and temperature-driven phase transitions, excluding other critical driving factors such as external electric fields. When such field-driven transitions can be systematically extracted from literature and incorporated into the training dataset, the current FerroAI model could be extended to predict more general phase transitions.

Methods

Training dataset

We extract the main text from each mined article, removing references and other irrelevant sections. We use the spaCy library to process paragraphs and figure captions and to extract core phrases, which are enumerated for information identification. We define a set of rules that standardize the expressions in order to identify key information, as shown in Step 3 of Fig. 1. Finally, the extracted information, including chemical formulas, symmetries, and the associated temperature sequences, is compiled into the phase transformation dataset.
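
The rule-based step can be illustrated with plain regular expressions. This sketch swaps spaCy for Python's `re` module. Its patterns and its acceptance rule (exactly one formula, at least two symmetry names, and at least one temperature in a sentence) are hypothetical simplifications of the rules described above. It records symmetries in order of appearance and does not resolve heating versus cooling sequences.

```python
import re
from typing import Dict, List

# Hypothetical patterns; a production pipeline would also validate element symbols.
SYMMETRY = re.compile(
    r"\b(cubic|tetragonal|orthorhombic|rhombohedral|monoclinic|triclinic|hexagonal)\b",
    re.IGNORECASE)
FORMULA = re.compile(r"\b(?:[A-Z][a-z]?\d*(?:\.\d+)?){2,}\b")
TEMPERATURE = re.compile(r"(?<![\w.])(-?\d+(?:\.\d+)?)\s*(K|°C)(?![A-Za-z])")


def to_kelvin(value: float, unit: str) -> float:
    return value + 273.15 if unit == "°C" else value


def extract_records(text: str) -> List[Dict]:
    """One record per sentence that satisfies the hypothetical acceptance rule."""
    records = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        # Keep formula-like tokens containing a digit (drops acronyms such as DSC).
        formulas = [f for f in FORMULA.findall(sentence) if any(c.isdigit() for c in f)]
        phases = [p.lower() for p in SYMMETRY.findall(sentence)]
        temps = [to_kelvin(float(v), u) for v, u in TEMPERATURE.findall(sentence)]
        if len(formulas) == 1 and len(phases) >= 2 and temps:
            records.append({"formula": formulas[0], "sequence": phases,
                            "temperatures_K": temps})
    return records
```

For the sentence "BaTiO3 transforms from cubic to tetragonal at 393 K.", the sketch yields one record with formula BaTiO3, the sequence cubic, tetragonal, and a temperature of 393 K. Real literature text needs far more rules (ranges, "below/above" phrasing, multiple compounds per sentence), which is where automated verification becomes essential.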

Material synthesis

BZrHfT-xBCT and BCeZrT-xBCT materials were synthesized via the solid-state reaction method. The raw materials were BaCO3 (Alfa Aesar, 99.8%), CaCO3 (Alfa Aesar, 99.5%), TiO2 (Alfa Aesar, 99.8%), CeO2 (Alfa Aesar, 99.9%), ZrO2 (Sigma Aldrich, 99%), and HfO2 (ZNXC, 99.99%). The raw powders were weighed according to the stoichiometric ratio of each composition. The weighed powders were mixed, dispersed in ethanol, and ball-milled with zirconia balls in a planetary ball mill at 600 rpm for 12 hours. The ball-milled slurry was dried and calcined at 1000 °C for 10 hours to obtain the ferroelectric oxides through solid-state reactions. The calcined oxides were cold-pressed into rod-shaped green bodies of 5–6 mm diameter and 10–15 mm length under 30 MPa hydrostatic pressure for 30 minutes. The green body rods were sintered at around 1400 °C for 3 hours in air using an optical infrared (IR) furnace (Quantum Design IRF11-001-00).
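
Weighing the raw powders is a stoichiometry calculation. The sketch below is illustrative rather than the recipe used here: it blends the precursor moles of two end members by mole fraction and converts them to grams. The molar masses are standard approximate values, no correction for powder purity is applied, and the batch size passed to `precursor_masses` is hypothetical. The CO2 released by the carbonates during calcination does not enter the calculation, because only moles of cation-bearing precursor per mole of perovskite are counted.

```python
from typing import Dict

Recipe = Dict[str, float]  # moles of each precursor per mole of perovskite product

# Approximate molar masses (g/mol) of the precursors listed above.
MOLAR_MASS = {"BaCO3": 197.34, "CaCO3": 100.09, "TiO2": 79.866,
              "CeO2": 172.115, "ZrO2": 123.218, "HfO2": 210.49}

BZRHFT: Recipe = {"BaCO3": 1.0, "ZrO2": 0.1, "HfO2": 0.1, "TiO2": 0.8}        # Ba(Zr0.1Hf0.1Ti0.8)O3
BCEZRT: Recipe = {"BaCO3": 1.0, "CeO2": 0.005, "ZrO2": 0.005, "TiO2": 0.99}  # Ba(Ce0.005Zr0.005Ti0.99)O3
BCT: Recipe = {"BaCO3": 0.7, "CaCO3": 0.3, "TiO2": 1.0}                      # Ba0.7Ca0.3TiO3


def blend(a: Recipe, b: Recipe, x: float) -> Recipe:
    """Precursor moles for (1 - x)A - xB, assuming mixing by mole fraction."""
    return {p: (1 - x) * a.get(p, 0.0) + x * b.get(p, 0.0) for p in set(a) | set(b)}


def precursor_masses(recipe: Recipe, batch_mol: float) -> Dict[str, float]:
    """Grams of each precursor for batch_mol moles of product (no purity correction)."""
    return {p: batch_mol * n * MOLAR_MASS[p] for p, n in recipe.items()}
```

For example, BZrHfT-0.3BCT requires 0.91 mol BaCO3, 0.09 mol CaCO3, 0.07 mol each of ZrO2 and HfO2, and 0.86 mol TiO2 per mole of product, which keeps both the A site and the B site fully occupied.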

Material characterization

The microstructures of the synthesized samples were examined using reflected polarized-light differential interference contrast microscopy35. All samples were polished with 1 μm diamond suspension and etched with 37% hydrochloric acid to reveal grain boundaries. The etched surface was monitored using an optical microscope, and the etching time was carefully optimized to prevent damage to the grain structure and surface degradation. The temperature-dependent heat flow was measured by DSC under a nitrogen atmosphere at heating and cooling rates of 7 °C/min. The temperature-dependent dielectric constant was measured with the aixACCT TF2000E ferroelectric analyzer using its capacitance-voltage module, with a temperature step of 1 °C.

Availability of FerroAI model

The best-trained model, FerroAI, together with the scalers for temperature and the embedded element vector, is available on Hugging Face at huggingface.co/FerroAI/FerroAI, facilitating broader access and further exploration of phase diagram predictions for ferroelectric materials. The Supplementary video demonstrates how to generate a phase diagram by providing an appropriate chemical vector and temperature range as input.