Introduction

Structural alloys are indispensable in modern engineering, serving critical roles across industries such as aerospace1,2,3, automotive4,5,6, and energy7,8,9. There is a continuous pursuit of novel alloys with enhanced properties10,11,12,13,14,15, including strength10,11, toughness12,13, and resistance to failure under various working environments14,15. Recent developments in complex alloys, e.g., high entropy alloys16,17 or multi-principal element alloys18,19, are actively enlarging the composition palette for alloy design, pushing the boundaries of traditional compositions. Concurrently, rapid advances in additive manufacturing (AM) have unlocked unprecedented control over alloy compositions, microstructures, and geometries at voxel-scale resolution while shortening manufacturing supply chains, thus enabling further tailored solutions for demanding applications20,21. Together, these advancements are opening an expansive design space of numerous composition choices and manufacturing procedures and hold promise for achieving combinations of superior properties. However, experimental fabrication and testing across the whole design space are extremely costly in both resources and time22,23. Therefore, efficient yet affordable methods to navigate this broad design space for desired performance are of great interest, not only to advance scientific understanding but also to enable novel engineering applications with next-generation superalloys.

Recent progress in artificial intelligence (AI) has rapidly expanded the horizon of solving scientific problems and engineering applications using data-driven methodologies. Large language models (LLMs), e.g., the GPT series from OpenAI24, the LLaMA models from Meta25 and the DeepSeek models26,27, trained on general language corpora have demonstrated intelligent capabilities in perceiving, understanding, and applying natural languages28,29 as well as coding languages30,31, thus assisting human researchers in accelerating and automating workflows in various tasks32,33,34. For solving scientific challenges, AI-based models, such as AlphaFold235,36,37 and RoseTTAFold38,39, can predict the atomic structures of proteins based on their sequences, reaching experimental accuracy at a much-reduced cost36. In material design and engineering, end-to-end generative AI (GAI) models have been developed to design novel protein sequences that meet desired structures or properties40,41,42. Similar explorations have been actively pursued for various material systems43,44, including functional molecules45,46, polymers47,48, inorganic crystals49,50 as well as metallic alloys51,52. For structural alloys, various data analysis techniques and machine learning methods (e.g., correlation analysis, symbolic regression and transfer learning) have been applied to help predict important mechanical properties, including bulk modulus53, tensile strength54, and fatigue strength55. However, data on experimentally measured mechanical properties remain relatively sparse due to their time-consuming and costly acquisition. Since language models have the unique strength of integrating data from various sources as general text and of further being multi-modal, they are of particular interest for alloy design, yet in-depth studies remain rare.

In this work, we explore the possibility of solving the alloy design challenge from a language modeling perspective by leveraging GAI and alloy data. Specifically, we encode physics-rich alloy data as one-dimensional (1D) sequences of text and develop an attention-based56 LLM to capture the underlying patterns in this alloy-specific language. At deployment, our model, AlloyGPT, can accurately predict structures and properties for given compositions, achieving R2 values ranging from 0.86 to 0.99 on the test set. When tested with compositions beyond the learned domain, we observe that the accuracy tends to degrade, but in a gradual and stable manner with respect to the distance in the composition space. The same model can also tackle alloy design challenges: for the same given property target, it generates various composition choices while maintaining high design accuracy. Through a sampling parameter, we can further boost the creativity of the model and tune the balance between the diversity of the suggested compositions and the accuracy of the delivered properties. Opening the model, we further unveil that AlloyGPT “thinks” with complex attention patterns between input and output information, some of which may hint at the underlying physics of the alloy systems. These numerical results demonstrate that language modeling with an LLM specialized for alloys can provide an accurate and robust representation of alloy physics. As a probabilistic model by nature, our model is particularly suitable for alloy design tasks with high degeneracy. For example, it can be exploited for designing gradient compositions or microstructures in alloys: at each voxel of the design, the possible degenerate solutions provide candidates for further downselection based on local constraints, leading to improved overall performance and design metrics. These capabilities can be valuable for accelerating alloy discovery in AM and reducing time and resource costs.
Our methodology is expected to be generalizable for new alloys and other materials with broad design spaces, potentially leading to a foundation language model with comprehensive material knowledge and suitable for integrated multi-material designs or alloys with gradient compositions/microstructures in the future.

Results

An alloy-specific language dataset and AlloyGPT model

Natural languages are powerful tools that enable human communication. In scientific communities, research can be conducted in various forms, including experiments, simulations, and/or theories (Fig. 1a). Communication and documentation of research have so far been carried out predominantly in the form of language, either as written text or oral speech. Inspired by this universal flexibility, here we explore the possibility of using language-like 1D sequences to directly represent physics-rich data for alloys. Note that the intended speakers of such languages are LLMs, instead of humans.

Fig. 1: Convert physics-rich data as sentences of an alloy-specific language.
figure 1

a Curate scientific data on alloys from various sources, including experiments, simulations, and/or theories. b Example entries in the Al-based alloy dataset include quantitative information on the compositions (red rectangle), structures (green rectangle), and properties (blue rectangle) of the alloy samples. c, d Reformat an example data entry as a 1D sequence of information blocks, which is referred to as a “sentence” of this alloy-specific language. Inside each block, the physical quantity is presented as a pair of key and value. c For forward prediction, the sentence is stated in the order of task type, composition (shaded in red), structure (green), and property (blue). d For inverse design, the sentence follows the order of task type, property (blue), structure (green), and composition (red).

Without loss of generality, in this work we adopt a database of Al-based alloys57 as an example of scientific data. The figures in our study related to this database are adapted from ref.57. The database has been developed and successfully applied to design additively manufacturable, high-strength aluminum alloys57,58. Using CALPHAD (CALculation of PHAse Diagrams) simulations with the experimentally validated database59, Al alloys with five alloying elements (i.e., Ni, Er, Zr, Y, and Yb) were studied using non-equilibrium Scheil solidification and single equilibrium calculations, modeling the as-built condition after rapid cooling in the laser-based AM process and the fully aged condition, respectively. The dataset is structured to group quantitative data of the compositions, phase structures, and properties of those Al alloys and embody the hidden physical relationship between them. The optimized composition design based on the database has been validated experimentally on their phase structures and enhanced properties57,60,61.

On the statistics of the database, 523,599 compositions were randomly sampled over the targeted composition space in terms of the alloying elements (i.e., Ni (0–4%), Er (0–2%), Zr (0–2%), Y (0–1%), and Yb (0–1%) in mole percentage (mol%)) using the Latin hypercube sampling method. As shown in Supplementary Fig. 1a, for the alloying elements, the sampled compositions show uniform distributions over the full targeted intervals, while the main element, Al, compensates the remaining mol%. The sampled compositions are confirmed to be uncorrelated, as the Pearson linear correlation coefficients are zero for any pair of the alloying elements (Supplementary Fig. 1b). On the phase structures, we focus on the four key precipitate phases that affect the mechanical properties of the alloys: the L12-structured strengthening phase (Al3M, M = Er, Zr, Y, Yb), the metastable ternary phase (Al23Ni6M4, M = Er, Zr, Y, Yb), the brittle Al3Zr phase with D023 structure, and the relatively soft Al3Ni phase. The strengthening phase is called the L12 phase for simplicity in this study. As shown in Supplementary Fig. 2, their mol% can be affected by the manufacturing process, thus differing between the as-built condition after the rapid solidification in laser-based AM (Supplementary Fig. 2a) and the fully aged condition (Supplementary Fig. 2b). On the properties, the dataset covers diffusion resistivity, misfit strain, coarsening rate metric, freezing range, crack susceptibility coefficient (CSC), and hot cracking susceptibility (HCS). Among these, the first three characterize the thermal stability of the L12 phase, which is important for high-temperature performance; the latter three are comprehensive indicators of the AM printability (crack-free solidification) of the composition. For simplicity of comparison, we normalize the values of properties with non-trivial units with respect to a benchmark printable Al alloy59 (i.e., Al-2.14Ni-1.15Zr-1.35Er-0.25Y-0.73Yb, wt.%). As shown in Supplementary Fig. 3, the sampled compositions cover broad intervals in the property space and show non-trivial distributions. With such a set of phase structures and properties, novel Al alloys with enhanced high-temperature strength and free of hot cracking during printing have been discovered and experimentally validated57,60. Further details of this dataset, including the definitions of all the properties, can be found in the “Methods” section.

To apply this dataset in our study, we reformat the numerical entries from the dataset into “sentences” of the proposed alloy-specific language. Specifically, we group information according to the underlying physics and order the information blocks based on the nature of the tasks. As shown in Fig. 1b, locally we group the physical quantities into text blocks on compositions (in the red dashed rectangle), structures (green), and properties (blue) separately, given that compositions lead to specific phases and phase distributions affect the properties. Globally, in the sentence structure, we order these blocks by putting the given information ahead of the queried part. For example, for forward prediction, compositions are listed before structures and properties (Fig. 1c, sentence structure), while in inverse design, the sequence starts with properties, followed by structures and compositions (Fig. 1d, sentence structure). To distinguish different tasks, we add a task block at the beginning of the sentence to label the task name. We also include simple signs to improve the readability of the sentences for human researchers (Fig. 1c, d, examples). Curly brackets enclose the individual blocks; square brackets enclose the list of physical quantities inside each block. The physical quantities are documented as individual pairs of “Key: Value”, where the key is a readable name tag of the quantity and the numerical value is expressed in scientific format. To connect the blocks, we use the directional sign “=>” to indicate the reading or inferring direction and the equal sign “=” for equivalence. With these simple rules, we can efficiently convert the numerical dataset of Al-based alloys into a corpus of sentences for different tasks. The statistics of the dataset for this alloy-specific language on Al-based alloys can be found in the SI (Supplementary Figs. 1–3).
We expect these rules can be straightforwardly generalized to include more information blocks or applied to other alloys or material systems.
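The encoding rules above can be sketched as a simple serializer. A minimal sketch follows; the task name, key strings, and example values here are illustrative placeholders, not the exact vocabulary of the published dataset.

```python
def encode_sentence(task, blocks):
    """Serialize an alloy record as a 1D 'sentence'.

    task   -- task-name string, e.g. "C-to-SP" (illustrative)
    blocks -- list of (block_name, {key: value}) pairs, ordered with
              the given information first and the queried part last.
    """
    parts = [f"{{Task: [{task}]}}"]
    for name, quantities in blocks:
        # "Key: Value" pairs, values in scientific format
        pairs = ", ".join(f"{k}: {v:.3e}" for k, v in quantities.items())
        parts.append(f"{{{name}: [{pairs}]}}")
    # "=>" marks the reading/inferring direction between blocks
    return " => ".join(parts)

# Forward-prediction ordering: composition before structure and property
sentence = encode_sentence(
    "C-to-SP",
    [("Composition", {"(Ni)": 2.1, "(Zr)": 1.0}),
     ("Structure",   {"L12_mol%": 4.2}),
     ("Property",    {"HCS_norm": 0.8})],
)
print(sentence)
```

For inverse design, the same serializer would simply be called with the property (and optionally structure) blocks first and the composition block last.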

It should be noted that the new alloy-specific language dataset keeps almost all the information of the original numerical dataset. The key difference is that it enables us to view alloy prediction and design challenges as the same sentence-completion task. Specifically, given only the beginning blocks, accurately completing these sentences is equivalent to predicting alloy structures and properties based on the given composition, or to designing alloy compositions that fulfill the given properties.

To handle such language tasks, we develop an attention-based56 autoregressive LLM, AlloyGPT, and test its performance on both forward prediction and inverse design tasks for alloys. The AlloyGPT model adopts an architecture similar to GPT-262 with a tokenizer63 customized for the alloy language. To balance the efficiency and expressiveness of the tokenizer, we developed a character-word mixing tokenizer. We reserved 118 word-based tokens for all elements in the periodic table (i.e., “(A)” for element A) so that their names are not divided during tokenization. Current LLMs are known to struggle with numbers due to the lack of effective tokenization schemes64, and it has been observed that single-digit tokenization outperforms other popular methods in learning arithmetic operations on numerical data65. Therefore, character-based tokens are adopted here for all numbers (0–9), supporting symbols (e.g., “+”, “-”, “.”, “=” and so on) and the English alphabet (in lower and upper cases) to express arbitrary numerical values and keys. With additional token positions reserved for future usage, our tokenizer has a compact vocabulary size of 256. For the GPT model configuration, we take a medium-size GPT-2 model as the prototype and modify its configuration to have an embedding dimension of 1024 and 36 multi-head attention (MHA) layers, each with 16 attention heads. No special embedding strategy is applied for the numerical text. With about 453.35 million trainable parameters, we will demonstrate in this work that such an AlloyGPT v1.0 model with the customized tokenizer can learn and handle the proposed alloy-specific language effectively. We expect that this model and tokenizer can further learn other alloy systems, given the flexibility of the tokenizer and the depth of the model.
It should be noted that there also exist other advanced tokenization schemes for numerical data, including continuous encoding64 and enhanced positional embeddings66, whose integration with our model we reserve for future study. We train AlloyGPT from scratch using the curated alloy-specific language database. Sentences for both forward prediction and inverse design are included and randomly mixed in the training batches. More details of the model architecture and training can be found in the “Methods” section and Supplementary Fig. 4.
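The character-word mixing scheme described above can be sketched in a few lines: element names wrapped as “(A)” stay whole as word-based tokens, while everything else is split into single characters. This is an illustrative sketch, not the production tokenizer; the element set is truncated here for brevity (the real vocabulary reserves all 118 elements).

```python
import re

# Illustrative subset; the real tokenizer reserves 118 element tokens
ELEMENTS = {"(Al)", "(Ni)", "(Er)", "(Zr)", "(Y)", "(Yb)"}

def tokenize(text):
    """Character-word mixing tokenization sketch: keep "(A)" element
    tags whole, split all other text (digits, symbols, letters)
    into single-character tokens."""
    tokens = []
    # Capturing group keeps the "(A)" separators in the split output
    for piece in re.split(r"(\([A-Z][a-z]?\))", text):
        if piece in ELEMENTS:
            tokens.append(piece)      # one word-based token per element
        else:
            tokens.extend(piece)      # character-based tokens
    return tokens

toks = tokenize("(Ni): 2.1e+00")
print(toks)
```

Single-digit tokenization of the numerals (“2”, “.”, “1”, …) follows the observation cited above that per-digit tokens help models learn arithmetic on numerical data.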

Learn the composition-structure-property (C-S-P) relationships with AlloyGPT

To investigate whether our AlloyGPT model can learn the underlying composition-structure-property relationship through the alloy-specific language and autoregressive training, here we apply the AlloyGPT model to solve forward prediction problems for Al-based alloys. In contrast to conventional methods, we pose the problem as a language task. As shown in Fig. 2a, we provide a text prompt with the information blocks of the task type and the composition as the input. Our model is able to continue the sentence in the correct format of the alloy-specific language and finish it at the right sequence length. This behavior indicates that AlloyGPT can learn the grammar rules via training.

Fig. 2: Solve forward prediction tasks for the composition-structure-property relationship in alloys using AlloyGPT.
figure 2

a Pose the alloy prediction problem as a language task of sentence completion and solve it using AlloyGPT. Compare the predicted values and the ground truth for example physical quantities from the structure block (b) and the property block (c) using the test set. Better accuracy is indicated by a larger R2, smaller mean squared error (MSE) and smaller mean absolute error (MAE). d Compare the prediction accuracy of the property block using different input prompt lengths in the C-to-SP and CS-to-P tasks.

Quantitative results can then be extracted from the text. For the case depicted in Fig. 2a, the predicted values of the alloy phases and properties are close to the ground truth, thus solving this composition-to-structure-and-property (C-to-SP) task. Following this procedure, we performed the C-to-SP task for the whole test set and observed consistently high accuracy. Representative results for examples from the structure and property blocks are shown in Fig. 2b (as-built L12 phase) and c (hot cracking susceptibility), respectively. Good agreement is observed between prediction and ground truth for both the overall distributions and the individual values. R2 values larger than 0.95 are observed for the as-built L12 phase mol% and the normalized HCS. These positive results indicate that language modeling can be competitive with conventional numerical methods (see the “Discussion” section for more details) in capturing the hidden pattern between compositions, structures and properties in this Al-based alloy family, thus potentially opening new avenues for alloy research.
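Extracting quantitative results from a completed sentence and scoring them against the ground truth can be sketched as follows. The regular expression and key names are illustrative (they assume the “Key: Value” scientific-notation format described for the alloy-specific language), and the R2 formula is the standard coefficient of determination used throughout the paper.

```python
import re

def extract_values(sentence):
    """Pull 'Key: value' pairs (value in scientific notation) out of
    a generated sentence. Key names here are illustrative."""
    pat = r"([A-Za-z0-9_%]+):\s*([-+]?\d+\.\d+e[-+]\d+)"
    return {k: float(v) for k, v in re.findall(pat, sentence)}

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

vals = extract_values(
    "{Structure: [L12_mol%: 4.200e+00]} => {Property: [HCS_norm: 8.000e-01]}"
)
print(vals)
print(r_squared([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```

Applying the same extraction over every completed test-set sentence and collecting per-quantity value lists yields the per-quantity R2, MSE and MAE statistics reported in Fig. 2.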

As an autoregressive language model67, AlloyGPT may possess length dependence in its prediction accuracy, which also suggests convenient strategies to enhance its performance. Specifically, AlloyGPT completes the given sentences by regressively predicting the next token, so the prediction error may accumulate and influence the accuracy of content predicted at a later stage. In Fig. 2d, we plot the R2 values (blue bars for C-to-SP tasks) of all the physical quantities in their prediction order from left to right. Our model achieves high accuracy across various physical quantities, with R2 values ranging from 0.86 to 0.99 (see Supplementary Table 1 for details). At the same time, the property block (shaded in blue), predicted later in the output, shows overall smaller R2 values compared with the earlier-predicted structure block (shaded in green). Therefore, one straightforward way to improve the accuracy of property predictions is to increase the input prompt length. For example, providing not only the composition but also the structure blocks for property prediction (i.e., a CS-to-P task) reduces the prediction length for the language model. As shown in Fig. 2d, such CS-to-P tasks performed by the same model indeed show improved accuracy with larger R2 values (red bars). This strategy also aligns well with a physics-based consideration, because composition and structure together better define the alloy and determine its properties. It should also be noted that other factors may affect the prediction accuracy. For example, by definition, CSC values are determined by the whole solidification process68, which makes them relatively challenging to predict; indeed, we observe that CSC has the lowest R2 value even though it is not predicted last.

Design Al-based alloys for target properties using AlloyGPT

As a language model of a probabilistic nature, AlloyGPT provides unique advantages in addressing the inverse design challenges for alloys. On one hand, the AlloyGPT model unifies the forward prediction and inverse design tasks under the same language task. From the format point of view, the sentence-completion task developed for the forward prediction tasks in the previous section can be straightforwardly applied to the inverse design tasks. As shown in Fig. 3a (left panel), the updated input prompt can include only the property information for a property-to-structure-and-composition (P-to-SC) design task, or both the property and structure blocks for a property-and-structure-to-composition (PS-to-C) design. Our model is tasked to complete the remainder of the sentence, with the composition information included. On the other hand, the probability-based prediction of the AlloyGPT model can be leveraged to capture degenerate solutions in alloy design challenges, which will be discussed later in this subsection.
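Building the inverse-design prompts is just the forward-prediction serialization with the block order reversed. A minimal sketch follows; the task labels, block names and keys are illustrative stand-ins for the dataset's actual vocabulary, and the trailing "=>" marks where the model is asked to continue.

```python
def build_design_prompt(properties, structures=None):
    """Build an inverse-design prompt: task block first, then the
    given blocks; the model completes the composition. Task names
    and keys are illustrative."""
    def block(name, quantities):
        pairs = ", ".join(f"{k}: {v:.3e}" for k, v in quantities.items())
        return f"{{{name}: [{pairs}]}}"

    if structures is None:
        # P-to-SC: only the property target is prescribed
        parts = ["{Task: [P-to-SC]}", block("Property", properties)]
    else:
        # PS-to-C: both property and structure blocks are given
        parts = ["{Task: [PS-to-C]}", block("Property", properties),
                 block("Structure", structures)]
    # Open-ended "=>" invites the model to complete the sentence
    return " => ".join(parts) + " =>"

prompt = build_design_prompt({"HCS_norm": 0.5})
print(prompt)
```

Feeding such a prompt to the trained model and parsing the completion recovers the suggested composition, exactly mirroring the forward-prediction workflow.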

Fig. 3: Generate alloy compositions based on property and structure information using AlloyGPT.
figure 3

a R2 values between the known physical quantities and the suggested ones for different inverse design tasks. When both property and structure information are provided as the input (i.e., PS-to-C tasks), higher R2 values are observed (red bars in (a)), and the compositions suggested by AlloyGPT are similar to the known cases in the test set (c). However, when only the property target is provided in a P-to-SC task, the suggested compositions can be very different from the known ones (b). b, c Compare the Y mol% suggested by the AlloyGPT model and the known values from the test set for the P-to-SC (b) and the PS-to-C (c) design tasks, respectively.

To evaluate the performance of this unified pathway for solving design challenges, we perform P-to-SC and PS-to-C design tasks on the test set using AlloyGPT. By setting the properties of data entries from the test set as the goal, we make sure there exists at least one composition as a feasible solution. The R2 values calculated between the known compositions from the test set and those suggested by our model are shown in Fig. 3a (right panel). When both properties and structures are provided as the input prompt or design goal, in the PS-to-C tasks the suggested compositions achieve high R2 values (red bars in Fig. 3a), similar to those observed in the previous section for forward prediction tasks, and agree well with the known compositions from the test set (taking element Y in Fig. 3c as the weakest example). This result demonstrates that AlloyGPT can “rediscover” compositions known to the test set based on conditioning on properties and structures.

However, when only properties are prescribed as the design goal in the P-to-SC tasks, the suggested physical quantities, for both structures and compositions, achieve low R2 values (blue bars in Fig. 3a). As shown in Fig. 3b, the suggested mol% of element Y tends to diverge from the known records and spread broadly. Thus, AlloyGPT can also suggest compositions different from the known ones.

It should be clarified that the R2 values of the compositions in Fig. 3a (right panel, shaded in red) should not be taken as indicators of design accuracy. By definition, they represent the composition recoverability with respect to the test set, i.e., how similar the proposed compositions are to the known ones from the test set (Fig. 4a). Design accuracy, instead, should be evaluated by comparing the targeted goal and the delivered performance, i.e., the prescribed properties and the achieved values for the P-to-SC design task (Fig. 4a).

Fig. 4: Evaluate the design accuracy of AlloyGPT model in the P-to-SC design tasks.
figure 4

a Distinguish composition recoverability and design accuracy for alloy design tasks. We define the similarity between the suggested composition and the known one from the test set as composition recoverability, not design accuracy. Instead, design accuracy is evaluated by comparing the design goals and the delivered performances, i.e., the prescribed properties and the achieved properties for P-to-SC tasks. b Design accuracy and composition recoverability of AlloyGPT for the P-to-SC task. R2 values between the targeted properties and achieved properties (blue bars) are high and indicate good design accuracy, while the R2 values between the known compositions and the designed ones (red bars) are low, suggesting low composition recoverability with respect to the test set. Comparison of the targeted values and the achieved values for some example properties, including the coarsening metric (c) and hot cracking susceptibility (d). Candidates with low HCS values are highlighted by a blue rectangle in (d) and can be promising for overcoming hot cracking under rapid solidification in additive manufacturing.

To truly estimate the design accuracy, we first obtain the property and structure results of the designed compositions by applying the same protocol used for the initial dataset creation, and then calculate the R2 values between these and the input design targets. As shown in Fig. 4b, high R2 values (blue bars) are obtained between the input property goals and the achieved properties, indicating that AlloyGPT indeed achieves high design accuracy for the P-to-SC design tasks. Comparisons of some example properties, including the normalized coarsening metric and HCS, are shown in Fig. 4c, d. Good agreement in both distributions and individual values is observed. At the same time, the low composition recoverability (red bars in Fig. 4b) suggests that, for the given properties, there can be different compositions as design solutions. By achieving high design accuracy and low composition recoverability, AlloyGPT demonstrates promising potential in addressing the challenge of degenerate solutions in inverse design problems, maintaining design accuracy and diversity at the same time.

It should also be noted that AlloyGPT addresses the inverse design tasks in an efficient end-to-end manner. Thus, it bypasses the iterative search steps that are usually required, and sometimes time-consuming, in conventional design methods based on optimization57,69,70. For example, consider the task of designing for a low diffusion resistivity value of 0.1. As a traditional baseline, we adopted a Bayesian optimization method with an XGBoost model as the surrogate to find possible solutions. We observed that dozens of iterations (e.g., about 90 iterations, as shown in Supplementary Fig. 5) may be needed before the optimization reaches a solution. In contrast, by providing the targeted diffusion resistivity as the input prompt, the AlloyGPT model can directly complete the sentence and provide a composition with high confidence, thus offering a concise and straightforward design strategy. At the same time, AlloyGPT remains computationally heavy and often relies on capable GPU hardware to achieve smooth generation speed. Future studies may investigate how to further reduce the computational costs of such LLMs. By solving forward prediction and inverse design tasks with the same model, AlloyGPT presents a unique strength compared to traditional machine learning techniques71.

Discussion

Taken together, the results above demonstrate that language modeling can provide a unified format to address the forward prediction and inverse design tasks in alloy research. Trained on physics-rich alloy data, our AlloyGPT model can capture the underlying composition-structure-property relationships and accurately predict the microstructural phases and properties in the as-built and aged conditions. For inverse design, the AlloyGPT model can not only produce accurate designs but also generate diverse compositions. As an initial step towards language-modeling-powered material research, we expect our model and methodology to be integrated with many engineering applications and to continuously improve with growing datasets, enhanced language model architectures and deepened scientific understanding. To that end, some discussion of its robustness, tunability, interpretability, comparison, and expandability is in order.

We start the discussion by investigating the robustness of AlloyGPT when going beyond the learned domain. As with any data-driven method, the capability learned by our AlloyGPT model is strongly affected by the available data. For example, the dataset adopted in the current AlloyGPT model covers a finite hypercube domain in the composition space57. For many practical applications, it is important to evaluate how the model performs when pushed beyond its learned domain. To do so, we intentionally sample compositions beyond the learned domain and perform C-to-SP prediction tasks to evaluate the accuracy. As shown in Fig. 5a, we enlarge the mol% of elements Yb, Zr and Er by one-fold and randomly sample compositions from this enlarged region. Since our model has not been trained on any data from this domain, we borrow from biology and refer to these compositions as de novo compositions. We define the L2 distance of a sampled composition to the nearest boundary of the learned domain as the mutation distance, dM. As the mutation distance increases, we observe that the variance of the relative L1 errors of the predicted properties gradually grows (Fig. 5b), indicating that the model loses predictive capability when moving away from the learned domain. However, it should be noted that the variance increases in a gradual and stable manner. Therefore, in the thin neighborhood of the learned domain (e.g., dM < 0.5), the model is expected to still behave relatively well without dramatic increases in error. Caution is recommended when making predictions beyond this region. In the long term, retraining the model with a growing dataset over an enlarged composition space is expected to better address this issue.
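The mutation distance dM, the L2 distance from a sampled composition to the nearest point of the learned hypercube, can be computed directly from the dataset's sampling bounds. A minimal sketch, using the mol% ranges stated in the dataset description (the exact distance convention used in the paper may differ, e.g. in normalization):

```python
import math

# Learned composition domain in mol%, from the dataset description
BOUNDS = {"Ni": (0, 4), "Er": (0, 2), "Zr": (0, 2), "Y": (0, 1), "Yb": (0, 1)}

def mutation_distance(composition):
    """L2 distance from a composition to the nearest point of the
    learned hypercube; zero for compositions inside the domain."""
    sq = 0.0
    for el, (lo, hi) in BOUNDS.items():
        x = composition.get(el, 0.0)
        if x < lo:
            sq += (lo - x) ** 2
        elif x > hi:
            sq += (x - hi) ** 2
        # inside [lo, hi]: this element contributes nothing
    return math.sqrt(sq)

# A de novo composition with Er and Zr pushed beyond the learned ranges
print(mutation_distance({"Ni": 2.0, "Er": 2.3, "Zr": 2.4, "Y": 0.5, "Yb": 0.2}))
```

Grouping de novo test points into bins of dM and plotting the mean and variance of the relative L1 prediction errors per bin reproduces the kind of analysis shown in Fig. 5b.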

Fig. 5: Test AlloyGPT beyond the learned domain with the prediction tasks.
figure 5

a Sample composition points beyond the learned domain as de novo composition test points and define the mutation distance. b Means and variances of the relative L1 error of property predictions for the de novo compositions of an increasing mutation distance.

We continue our discussion by studying the design accuracy and composition diversity of AlloyGPT. In contrast to conventional deterministic models, AlloyGPT predicts the probability distribution of the next token based on the previous text and then samples from this distribution62. Therefore, depending on the shape of the probability distribution, different outcomes at the token level can be observed when repeating the generation process with the same input or prompt. Moreover, this probability distribution can be rescaled by dividing the logits by a sampling temperature parameter, which is termed the prediction temperature, Tp, in our AlloyGPT model. A Tp larger than 1 tends to flatten the probability distribution and boost diverse predictions. As shown in Supplementary Fig. 6a, repeating the same generation task can lead to slightly different responses (rows 2 and 3); by increasing Tp to a higher value (e.g., 3.0), the generated sentence fails to follow the grammar rules of the alloy-specific language and becomes unusable for quantity extraction.
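Temperature-scaled sampling is the standard decoding mechanism described above: divide the logits by Tp, apply a softmax, and sample the next token from the resulting distribution. A self-contained sketch (the logits are made up for illustration):

```python
import math
import random

def sample_with_temperature(logits, t_p, rng=random.random):
    """Rescale logits by the prediction temperature, softmax, and
    sample a token index. t_p > 1 flattens the distribution (more
    diverse tokens); t_p -> 0 approaches greedy decoding."""
    scaled = [l / t_p for l in logits]
    m = max(scaled)                        # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r, cum = rng(), 0.0
    for i, p in enumerate(probs):          # inverse-CDF sampling
        cum += p
        if r < cum:
            return i, probs
    return len(probs) - 1, probs

logits = [2.0, 1.0, 0.1]                   # illustrative next-token logits
_, sharp = sample_with_temperature(logits, 0.5)
_, flat = sample_with_temperature(logits, 2.0)
print(sharp[0], flat[0])
```

At low Tp the top token dominates and generations are nearly deterministic; at high Tp low-probability tokens are sampled often enough that, as observed in Supplementary Fig. 6a, the output can break the grammar of the alloy-specific language.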

Given those observations, it is important to understand the stability of AlloyGPT’s predictions and how to potentially leverage it for design purposes. We randomly sample 200 data entries from the test set to perform P-to-SC design tasks. For each input prompt, we repeatedly generate 20 designs at the same prediction temperature. This procedure is then performed for a series of prediction temperatures ranging from 0.001 to 3. It is observed that a prediction temperature \({T}_{p}\le 2\) can produce readable outputs with a high probability (\(\ge 87 \%\)) and is recommended for real applications. For designs under such conditions, we analyze the diversity of the suggested compositions and the accuracy of the achieved properties using their coefficient of variation, CV, and relative L1 error, \({L}_{1}^{{{rela}}}\), respectively (the definitions can be found in the “Methods” section). To investigate the effect of prediction temperature, we group the results for different design targets at the same prediction temperature and show the mean values in Fig. 6. As the prediction temperature increases, the mean CV values for all alloying elements generally increase (Fig. 6a), indicating boosted diversity of the designed compositions. At the same time, the mean values of \({L}_{1}^{{{rela}}}\) for the achieved properties also increase clearly for Tp > 1 (Fig. 6b), suggesting that the design accuracy may decrease with the prediction temperature. It should be noted that, for many properties (except the crack susceptibility coefficient and hot cracking susceptibility), the normalized error increases with the prediction temperature with a small and stable slope. These trends together indicate that AlloyGPT can generate diverse designs through its probabilistic nature. For practical design problems with intrinsic degeneracy, 0 < Tp ≤ 1 is often enough to capture multiple solutions while maintaining design accuracy.
As shown in Fig. 6, in this region the design error (Fig. 6b) for all the properties remains flat with weak oscillations while nontrivial variance exists in the generated compositions (Fig. 6a). On the other hand, for physical problems that may lack real degeneracy, a Tp between 1 and 2 can be tried to intentionally boost diversity and explore possible solutions. However, the suggested compositions may then need further screening for design accuracy. Both the AlloyGPT model as a forward predictor and CALPHAD-based simulations can serve as initial screening tools, while final validation is reserved for experiments.

Fig. 6: The effects of prediction temperature on the diversity of compositions and the accuracy of the achieved properties in solving P-to-SC design tasks using AlloyGPT.
figure 6

a The diversity of the suggested compositions at different prediction temperatures. b The design error in the delivered properties of the compositions suggested by AlloyGPT at different prediction temperatures.

For the inverse design task, we clarify that the seeming trade-off between diversity and accuracy does not always exist. For degenerate design problems (i.e., for a given set of properties, there exist multiple compositions that can achieve it), solving them successfully naturally achieves diversity in compositions and accuracy in properties at the same time. Our AlloyGPT model is trained to learn the joint probability distribution of all possible compositions under the given property prompt. Thus, for example, when two composition choices can lead to the same properties, the well-trained AlloyGPT model tends to produce a probability distribution in which both candidates have comparably high probability and can be chosen during multiple attempts. That is the underlying reason for what we observed in Fig. 4 (with Tp = 1), where the AlloyGPT model achieves both accuracy and diversity due to the degenerate nature of the alloy system itself, not the choice of Tp. As shown in Fig. 6, for 0 < Tp ≤ 1, the design error (Fig. 6b) remains flat with some oscillations while there exists nontrivial diversity in the proposed compositions (Fig. 6a). Therefore, the trade-off between accuracy and diversity arises mainly for Tp > 1.

For forward prediction tasks, it should be noted that there are no degenerate solutions by nature. Therefore, the high accuracy in forward prediction tasks over the test set (Fig. 2 with a trivial Tp = 1) indicates that, although probabilistic by nature, AlloyGPT can behave like a deterministic model when there is little degeneracy in the physical problem itself. To test the stability and accuracy of the forward predictions explicitly, we randomly selected 50 compositions, applied the AlloyGPT model to perform C-to-SP predictions 10 times for each composition, and calculated the relative L1 error of the predictions. The means and variances of the relative L1 error for phase structures and properties are shown in Supplementary Fig. 7. For reliable predictions (e.g., mean relative L1 error less than 0.06), the observed variances remain small (i.e., less than 0.01), while for less accurate predictions (e.g., predictions on AsBuilt_TernaryMol% and AsBuilt_Al3NiMol% in the randomly selected cases), relatively larger variances (0.02–0.04) are observed. Combining these results, we observe that for reliable predictions the AlloyGPT model tends to generate stable and consistent results, even though it is a probabilistic model by nature.
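The stability check described above can be sketched as follows (a hypothetical helper, not the actual analysis code; it aggregates the relative L1 error over repeated predictions of the same quantity):

```python
import statistics

def prediction_stability(repeats, truth):
    """Mean and population variance of the relative L1 error over
    repeated forward predictions of the same ground-truth quantity."""
    errors = [abs(p - truth) / abs(truth) for p in repeats]
    return statistics.mean(errors), statistics.pvariance(errors)
```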

Next, we focus our discussion on understanding how the AlloyGPT model reasons. Built upon its high prediction accuracy, our AlloyGPT model brings unique opportunities to look inside the trained model and try to understand how it reasons, i.e., how it combines the provided information with its trained parameters to form predictions.

To understand how the AlloyGPT model reasons when making predictions, we open the model and analyze the attention weights between information blocks. As the backbone of the AlloyGPT model, the MHA layers are expected to play key roles in learning the underlying relationships between tokens. The attention weights calculated within them often provide insights into the “thinking” process of LLMs for natural language applications72. Here, we focus on a simple representative test, in which, given the composition, the AlloyGPT model predicts the mol% of the L12 phase under the as-built condition. The input prompt and the output are shaded in pink and blue in Fig. 7a, respectively. After the prediction is done, we extract and analyze the attention weights from the individual heads and MHA layers. It should be noted that the original attention weights are formed between individual tokens. Here, for clarity, we group the raw individual tokens into information blocks (see rectangles in Fig. 7a) and coarse-grain the attention weight matrix accordingly. For individual heads, some clues found in the attention weights suggest reasoning paths that agree with the underlying physics. For example, in the last MHA layer (i.e., Layer 35), some attention weights between the L12 phase information block (shaded in gray in the left columns) and all information blocks (shaded in color in the right columns) are shown in Fig. 7b, c as the color depth of the connection lines. In Head 1 (Fig. 7b), relatively high attention is paid to the information blocks of elements Zr and Yb, while in Head 10 (Fig. 7c), among elements, relatively high attention is received by elements Er and Y. These attention priorities agree with the underlying physics that Zr, Yb, Er and Y are all potential L12-forming elements, and indicate that the AlloyGPT model has learned, via training, to relate them strongly with the L12 phase information block.

Fig. 7: Visualize attention weights as AlloyGPT makes predictions.
figure 7

a A representative test where AlloyGPT predicts the L12 phase mol% under the as-built condition for the given composition. The input prompt is shaded in pink and the output prediction in blue. For clarity, the tokens in the text are grouped into information blocks as indicated by the rectangles. b, c Attention weights between the output information block and all information blocks are highlighted by the depth of the colors. For the last multi-head attention layer (i.e., Layer 35), in Head 1 the output L12 information block pays high attention to elements Zr and Yb (b), while in Head 10 it pays high attention to elements Er and Y (c). Note that Zr, Yb, Er and Y are all potential elements for forming L12 phases with the main element Al. d Attention weights between all information blocks for the 16 heads in all 36 multi-head attention layers are visualized in a bird’s-eye view, and a collection of complex patterns is observed. The AlloyGPT model is expected to combine multiple attention patterns to collectively reason and make the prediction for the next token.

It should also be pointed out that not all attention weights can be directly correlated with known physics, and there exists a collection of different attention patterns across the parallel attention heads and serial MHA layers inside AlloyGPT. For instance, in Fig. 7c, the L12 phase information block pays the highest attention to the beginning information block of the text. In Fig. 7d, the attention weights between all information blocks are shown for the 16 heads through the 36 MHA layers. It can be observed that the attention pattern in each head (see each column) evolves across the layers; in the final MHA layer, the patterns from the 16 heads present strong diversity. It is the collective contribution of these patterns that AlloyGPT adopts to predict the next token, indicating an intertwined reasoning pathway. This complexity makes it challenging to provide a clear or simple explanation of how AlloyGPT reasons toward a prediction, but it echoes the known trade-off between the expressiveness and interpretability of language-based models73.

To quantitatively investigate the “thinking” pattern of AlloyGPT in relating the inputs and outputs, we further examine the influence of the input features on the predictions of the AlloyGPT model by calculating the input saliency74 and input attributions75. Input saliency has been used to distinguish the importance of the input features in the prompt for the output texts of LLMs by evaluating the derivatives of the model output with respect to the inputs76. Here, we leverage the back-propagation mechanism and an integrated gradient method to calculate the input saliency for AlloyGPT when predicting the mol% of the as-built L12 phase (Fig. 8a–c) for the given input composition. As shown in Fig. 8a, when starting to predict the first token “+” in the numerical result, the information block “AsBuilt_L12Mol%” demonstrates the highest input saliency, indicating that AlloyGPT first identifies what phase is being considered at this moment; when predicting the leading tokens that strongly affect the numerical value given their positions (e.g., “6” in b and “2” in c), relatively high input saliencies are observed at the information blocks of elements Er and Yb (Fig. 8b, c), suggesting that these two elements strongly affect the value of the L12 phase mol%. This result agrees well with the high Pearson coefficients observed between the printed L12 phase mol% and these two elements among all alloying elements reported recently58. These and similar observations strongly indicate that, learning solely from data, AlloyGPT can form “thinking” patterns that pay uneven consideration to information blocks in the input prompts to form specific predictions; some of these patterns resonate with known alloy physics while others may prompt new understanding of the underlying physics. To understand how the final choices of the output tokens evolve through the depth of the AlloyGPT model, we trace the rankings of those tokens after each MHA layer. As shown in Fig. 8d, the important numerical tokens (e.g., “6”, “2” and the last two “0”s highlighted by the blue rectangles) often only become the clear first choice within the last three layers, while other supporting tokens, which usually remain unchanged (e.g., “e” and “+” highlighted by the orange rectangles), emerge as the top choice earlier (at Layer 21 for “e” and at Layer 2 for “+”) and keep the position thereafter. These differences in the evolution of the decision making inside AlloyGPT indicate a diverse yet comprehensive “thinking” process across the multiple MHA layers to reach the conclusion for the choice of tokens at different positions.
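Tracing token rankings across layers boils down to ranking a candidate token within a score vector after each layer. A minimal helper might look like the following (a sketch; the layer-wise readout of intermediate logits is model-specific and omitted here):

```python
def token_rank(logits, token_id):
    """Rank (1 = top choice) of token_id in a vector of per-token scores,
    used to trace how a token's ranking evolves after each MHA layer."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return order.index(token_id) + 1
```

Applying this helper to the intermediate scores after every layer yields the ranking trajectories plotted in Fig. 8d.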

Fig. 8: Quantitative evaluations of how input features affect the output of AlloyGPT.
figure 8

a–c The input saliency (between 0 and 100%) is calculated as the normalized derivatives of the predicted tokens (with thick underscores, “+” in (a), “6” in (b) and “2” in (c)) with respect to the preexisting information blocks using a back-propagation gradient method. d The evolution of the rankings of the predicted tokens (listed along the x axis) within the AlloyGPT model through the 36 multi-head attention layers (listed along the y axis). Blue rectangles highlight tokens that are only finalized in the last three layers, while the orange rectangle highlights those that are chosen in earlier layers and remain unchanged. e The attributions of the input features of alloying elements to the predicted phases and properties are calculated using a perturbation method for many input compositions. For each row of predictions, only the relative rankings of the attribution values of the alloying elements matter.

While gradient-based input saliency provides rich details about the influence of input features, it is often expensive to calculate and demands heavy computational resources (e.g., to calculate and store the gradients across the layers in LLMs) as the predicted sequence becomes longer. To collectively evaluate the influence of many compositions on the full predictions of the phases and properties, here we switch to an affordable perturbation method to calculate the LLM attributions75 for AlloyGPT in forward prediction tasks. The input attribution is defined as the change in the direct output of the AlloyGPT model (i.e., the probability of tokens) due to the ablation of the information blocks in the input prompts, and only the relative values matter. As shown in Fig. 8e, the attributions of the alloying element information blocks to the predictions of phases and properties are evaluated and averaged over 1000 randomly selected compositions from the learned composition domain. The observed attributions demonstrate intriguing patterns between inputs and outputs, which may suggest new clues about the composition-structure-property relationship in the alloys. For instance, for Al3Ni phases in both as-built and aged conditions, element Ni shows the highest attribution, which agrees well with physical intuition. For the printability indicators, freezing range and CSC, element Yb shows the strongest attribution, which matches well with the Pearson coefficient evaluation based purely on data analysis57. Thus, future alloy research may take this as a working hypothesis and plan to validate and understand the potential hidden mechanisms via physics-based modeling and experiments. It should be noted that it remains unclear whether or how AlloyGPT captures the meaning of the numerical texts correctly or similarly to how a human does. Future studies may combine the observations here on the interpretability of AlloyGPT to formulate affordable methods to systematically define and investigate the sensitivity of AlloyGPT or similar LLMs trained on alloy-specific language or other data-intensive languages.
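The perturbation-based attribution described above can be sketched as follows. This is a minimal illustration with a toy stand-in model, not the actual AlloyGPT interface; the function and block names are ours:

```python
def input_attribution(model_prob, blocks, ablate_value=""):
    """Perturbation-based attribution: ablate each input information block
    in turn and record the drop in the model's output token probability.
    Only the relative attribution values matter."""
    base = model_prob(blocks)
    return {key: base - model_prob({**blocks, key: ablate_value})
            for key in blocks}

# Toy stand-in "model": output probability driven mostly by the Ni block.
toy = lambda b: 0.9 * bool(b["Ni"]) + 0.1 * bool(b["Er"])
attrs = input_attribution(toy, {"Ni": "4 wt%", "Er": "1 wt%"})
```

For the toy model, the Ni block receives the largest attribution, mirroring the intuition that the dominant input feature dominates the attribution ranking.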

We continue our discussion by briefly comparing AlloyGPT with conventional methods. With the AlloyGPT model, our work aims to introduce a novel language-based modeling approach for alloy prediction and design and to highlight its unique advantages over conventional methods. It is well recognized that conventional methods, including classical machine learning models, can excel in many alloy prediction tasks. For example, we have constructed and optimized a decision tree model using the XGBoost (eXtreme Gradient Boosting) algorithm with multiple outputs to solve the C-to-SP tasks and compared its performance with our AlloyGPT model. As shown in Supplementary Table 2, after training with the equivalent dataset, the XGBoost method can achieve satisfying accuracy for predictions of both phases and properties (with R2 values ranging from 0.79 to 0.99), similar to the performance of AlloyGPT. Also, as a lighter model, the XGBoost method demands fewer computational resources for training (e.g., minutes of CPU time with XGBoost vs. tens of hours of GPU computing for AlloyGPT) and inference. However, with AlloyGPT, we observe some improvements in prediction accuracy, as shown in terms of the mean absolute errors (MAEs) in the last two columns of Supplementary Table 2.

More importantly, the AlloyGPT model presents a few novel features that conventional methods rarely show. (1) Overloading functionalities within one unified model. By treating both as sequence-completion tasks, AlloyGPT can solve forward predictions and inverse designs with one set of model weights, which is rarely achieved with conventional physics-based models and classical machine learning methods. Given the depth of attention layers in the model, we expect more alloy systems and tasks can be overloaded into the same AlloyGPT instance via continuous learning, which makes it a compelling candidate for a foundation model for alloy research. (2) Handling degenerate problems via its probabilistic nature. Facing degenerate problems with multiple solutions, conventional methods often require extra effort to recover different solutions, such as multiple search attempts with different starting points. In contrast, AlloyGPT learns the underlying probability distribution of the next token to be predicted and can naturally generate different solutions that possess comparable probabilities. These two unique features distinguish AlloyGPT and similar language-based models from conventional methodologies and open new possibilities in alloy and materials science research.

At the same time, it is acknowledged that there exist different pathways to train LLMs. In this work, we trained the AlloyGPT model from scratch so that it can serve as a clean benchmark to demonstrate its novel potential in alloy research. For other applications, one may prefer to teach the proposed alloy-specific language to LLMs that are pre-trained on natural languages (e.g., the open-sourced LLaMA25 and DeepSeek26 models) using various fine-tuning strategies77,78. We expect comparable performance, as well as possibly new functionalities, to emerge from those models and reserve the in-depth investigation for future study.

Finally, we discuss the potential expansion and further development of our AlloyGPT model. We have developed the AlloyGPT model with applications to additively manufacturable Al alloys in mind. Therefore, we have intentionally selected printability (i.e., crack-free additive manufacturing) indices as part of the properties of interest. It is well known that high-strength Al alloys (e.g., Al7075) are often prone to hot cracking during the rapid solidification in AM79,80. Thus, we include the freezing range, CSC and HCS in the database to evaluate the printability81 of various compositions. After training, the AlloyGPT model can be applied to screen or suggest compositions that are optimized for printability in AM. Built upon the validated reliability of AlloyGPT, one can start with compositions generated by AlloyGPT under a prompt with low values of HCS (e.g., points highlighted in the blue rectangle in Fig. 4d), down-select based on additional criteria, and then perform experimental validation. Going beyond, other parameters that are key to AM can be further integrated into future AlloyGPT versions. Given the learning capabilities of AlloyGPT models and the expressiveness of the alloy-specific language, we expect information on the AM processing82,83 (e.g., laser power, scanning speed, hatch spacing, etc.) can be further integrated into the “sentences” as a new block of processing information, letting the AlloyGPT model learn the hidden relationships between composition, process, structures and properties. Facing the inevitable noise and deviations in real experiments, we expect uncertainty quantification techniques58,84 can also be applied to gain a deeper understanding of the trained AlloyGPT model as well as the alloy system and to provide guidance for more robust designs that remain insensitive to various uncertainties in realistic engineering scenarios. We leave in-depth investigation along this direction for future study.

Our current prototype AlloyGPT model has been trained on Al-based alloys. Given the generality of the alloy-specific language, more numerical data of relevant physics and other alloy systems can be easily reformatted as new text corpora with little or no modification. The GPT architecture and models similar to our AlloyGPT have demonstrated promising scaling laws85,86 in learning from more training data and achieving better performance. Combining these two strengths, we expect our AlloyGPT model to be further improved by training with enlarged text corpora for different alloys and materials. Not only numerical data but also text statements or descriptions69 can be integrated at the same time in this language modeling process. This mixture may help to take full advantage of the available data in different formats and with various fidelities. Going beyond Al-based alloys, it will be interesting to let the AlloyGPT model learn multiple alloy systems and to investigate its extrapolation capabilities in prediction or design for novel hybrids of alloys as well as alloys with spatial gradients87, thus building towards a foundation model for alloys. We consider these studies as promising research directions in language-modeling-based materials research and design.

In summary, we introduce AlloyGPT, an autoregressive language model specifically tailored to alloy design and prediction tasks. By encoding comprehensive alloy data into an alloy-specific language, our approach bridges the gap between traditional numerical modeling and modern GAI techniques. AlloyGPT successfully learns the intricate C-S-P relationships within alloys, demonstrates high predictive accuracy and robust inverse design capabilities for Al-based alloys, and suggests promising clues about alloy physics via intriguing attention patterns. Beyond its ability to rediscover known compositions, the model excels at generating diverse alloy designs that achieve targeted properties with high accuracy, effectively addressing the degeneracy challenge in inverse design.

Our results highlight the versatility of AlloyGPT in navigating the vast compositional design space of additively manufacturable alloys, with gradual degradation in performance observed only when operating outside the learned domain. The probabilistic nature of the model allows multiple composition designs to achieve the given design goal. Through parameters such as prediction temperature, the “creativity” of the model can be further boosted by tuning the balance between design diversity and accuracy, thus providing a flexible tool for alloy discovery. These findings underscore the potential of language modeling as a unified framework for both forward prediction and inverse design in materials science.

While this work establishes a foundation for integrating GAI into alloy and material design, we expect the current model can incorporate more knowledge with more training tokens. With a size similar to that of the GPT-2 medium model, AlloyGPT v1.0 would ideally be trained on tens of billions of tokens88,89. However, the dataset we created here provides about 0.5 billion tokens, which is relatively small. It remains challenging to curate large, consistent and high-quality alloy-specific datasets. We plan to continuously grow our alloy dataset by including more information blocks (e.g., blocks that describe processing conditions) and covering more alloy systems. With such a growing dataset, we expect to systematically test the model performance with various combinations of model configurations and training data sizes, in the hope of understanding the scaling law of AlloyGPT models, triggering emergent capabilities90 and achieving optimal performance91. Therefore, future efforts may focus on expanding the training datasets to include diverse alloy systems and processing parameters, enhancing model architectures, and integrating textual and numerical data for a more comprehensive understanding. By leveraging the scalability of language models and the generalizability of the alloy-specific language, AlloyGPT is expected to serve as a cornerstone in the development of next-generation alloys, accelerating innovation and reducing resource-intensive experimental workflows.

Methods

The Al-based alloy database

As a prototype case study, in this work, we adopt a database on Al-based alloys with five alloying elements, including Ni (0–4%), Er (0–2%), Zr (0–2%), Y (0–1%) and Yb (0–1%), based on CALPHAD simulations using Thermo-Calc92. In a previous study57, this database has been used to design printable Al alloys with high strength, which have been validated via experiments. Focusing on AM applications, we document microstructures and properties for both as-built and fully aged conditions. The key microstructure features include mol% of L12 phase, ternary phase, Al3Ni phase, and Al3Zr phase, while the properties cover diffusion resistivity93, misfit strain, coarsening rate metric, freezing range, CSC, and HCS.

Diffusion resistivity is defined as

$$D=\mathop{\sum }\limits_{i=2}^{N}\frac{{\left({\bar{C}}_{i}^{\beta }-{\bar{C}}_{i}^{\alpha }\right)}^{2}}{{M}_{i}}$$
(1)

where \(i=2,\ldots ,N\) refers to the alloying elements, \({\bar{C}}_{i}^{\alpha }\) and \({\bar{C}}_{i}^{\beta }\) refer to the average concentrations of alloying element i in the matrix phase \(\alpha\) and the precipitate phase \(\beta\), respectively, and \({M}_{i}\) refers to the diagonal component of the mobility matrix for element i. The misfit, \(\bar{\epsilon }\), is one minus the ratio between the average lattice parameters of the precipitate phases and the matrix phase. The coarsening metric, CM, is defined as the ratio of the misfit strain to the diffusion resistivity:

$${CM}=\frac{\bar{\epsilon }}{{\sum }_{i=2}^{N}{\left({\bar{C}}_{i}^{\beta }-{\bar{C}}_{i}^{\alpha }\right)}^{2}/{M}_{i}}$$
(2)
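Eqs. (1) and (2) can be computed directly from the per-element concentration differences and mobilities; for example (a sketch with our own variable names):

```python
def diffusion_resistivity(dC, M):
    """Eq. (1): D = sum_i (Cbar_i^beta - Cbar_i^alpha)^2 / M_i, where dC[i]
    is the concentration difference of alloying element i between the
    precipitate and matrix phases and M[i] its mobility component."""
    return sum(d * d / m for d, m in zip(dC, M))

def coarsening_metric(misfit, dC, M):
    """Eq. (2): CM = misfit / D, the ratio of the misfit strain to the
    diffusion resistivity."""
    return misfit / diffusion_resistivity(dC, M)
```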

The freezing range, FR, is measured as the temperature difference between the state at which the alloy is completely liquid and that at which it is 99% solidified:

$${FR}={T}_{0}^{S}-{T}_{0.99}^{S}$$
(3)

where \({T}_{0}^{S}\) and \({T}_{0.99}^{S}\) are the lowest temperatures at which the system is fully liquid and 99% solid, respectively. The CSC, introduced in ref. 94, is defined as

$${CSC}=\frac{{t}_{v}}{{t}_{R}}$$
(4)

where \({t}_{v}\) is the normalized cracking-vulnerable time corresponding to a liquid volume fraction between 0.1 and 0.01 and \({t}_{R}\) is the normalized relaxation time corresponding to a liquid volume fraction between 0.6 and 0.1. The HCS is defined as

$${HCS}={FR}\times {CSC}$$
(5)
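Eqs. (3)-(5) amount to simple arithmetic on the quantities defined above; a minimal sketch (variable names are ours):

```python
def freezing_range(T_full_liquid, T_99_solid):
    """Eq. (3): FR = T_0^S - T_0.99^S."""
    return T_full_liquid - T_99_solid

def crack_susceptibility(t_v, t_R):
    """Eq. (4): CSC = t_v / t_R, the ratio of the cracking-vulnerable time
    to the relaxation time."""
    return t_v / t_R

def hot_cracking_susceptibility(fr, csc):
    """Eq. (5): HCS = FR * CSC."""
    return fr * csc
```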

To ensure printability, compositions that minimize FR, CSC or HCS are preferred. For these physical quantities with nontrivial units, we normalize their values with respect to those of the benchmark alloy, i.e., “Alloy 1” in ref. 59. The distributions of these quantities are shown in Supplementary Fig. 13. More details can be found in ref. 57.

The alloy-specific language

The alloy-specific language proposed in this work plays the key role in representing scientific data in alloy research. Different from many natural languages, materials science data are numerically intensive and carry physics-based causality. We design our alloy-specific language and grammar to be expressive, efficient and flexible. As exemplified in Fig. 1c, d, the key-value pairs can express many quantitative features in alloy research and manufacturing; the scientific notation adopted for the value part covers a broad range (i.e., from −9.999e+99 to +9.999e+99) with a flexible resolution (i.e., ranging from 0.001e-99 to 1.000e+99). The grouping of text into composition, structure, and property parts makes it straightforward to encode the logic flow of forward prediction and inverse design using the sequence order. The language is modular, and thus it is straightforward to include more features and details (e.g., text blocks on the processing condition, additional microstructural features, and more properties) while remaining readable for human researchers.
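The signed scientific notation and key-value grouping described above can be illustrated with a small encoder; the exact token layout below (sign, three decimal places, signed two-digit exponent, `key=value` pairs joined by semicolons) is our assumption for illustration, not the published grammar:

```python
def format_value(x):
    """Render a value in signed fixed-precision scientific notation
    (e.g., '+6.200e-01'), covering the broad range described in the text."""
    return f"{x:+.3e}"

def encode_blocks(pairs):
    """Encode key-value pairs as one alloy-language-style text fragment."""
    return "; ".join(f"{k}={format_value(v)}" for k, v in pairs.items())
```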

AlloyGPT model and training

AlloyGPT includes 36 MHA layers and ~400 M trainable parameters. The full structure of the model is shown in Supplementary Fig. 4a. We augment a character-level tokenizer to include words for all elements and signs used in the alloy-specific language. The alloy-specific language dataset includes sentences for both forward prediction and inverse design. To prepare it, we start with the 523,599 compositions with their structures and properties and randomly separate them into a training set (90%, with 471,239 data entries) and a test set (10%, with 52,360 entries). For training/testing, each data entry is converted into two sentences, one in composition-structure-property order and one in property-structure-composition order, using the alloy-specific language (as shown in Fig. 1c, d). Accordingly, 942,478 sentences corresponding to the 471,239 training entries are used to train the AlloyGPT model. We train the AlloyGPT model from scratch on the training set for 4 epochs with an AdamW optimizer95 using an NVIDIA A40 GPU. No overfitting has been observed (Supplementary Fig. 4b). We developed our code based on nanoGPT96.
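The 90/10 random split described above can be sketched as follows (a minimal illustration; the seed and function names are ours, and each surviving entry is later rendered as two sentences in the two task orders):

```python
import random

def split_dataset(entries, train_frac=0.9, seed=0):
    """Randomly split data entries into training and test sets
    (90%/10% here, matching the split used in the text)."""
    idx = list(range(len(entries)))
    random.Random(seed).shuffle(idx)
    cut = int(len(entries) * train_frac)
    return [entries[i] for i in idx[:cut]], [entries[i] for i in idx[cut:]]
```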

Design accuracy and diversity evaluation

We use relative L1 errors to measure the design accuracy for the properties.

$${L}_{1}^{{rela}}\left[x,\,y\right]=\frac{\left|y-x\right|}{\left|x\right|}$$
(6)

where x is the ground truth or input value, and y is the achieved value based on the designed composition.
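Eq. (6) translates directly into code (a one-line sketch with our own function name):

```python
def relative_l1(x, y):
    """Eq. (6): relative L1 error |y - x| / |x| between the target or
    ground-truth value x and the achieved value y."""
    return abs(y - x) / abs(x)
```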

For the repeated designs with the same input target and a fixed prediction temperature, we use the coefficient of variation, CV, to measure the diversity of the suggested compositions.

$${CV}=\frac{\sigma }{\mu }$$
(7)

where \(\sigma\) is the standard deviation and \(\mu\) is the mean of the designed mol% of the individual elements.
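Eq. (7) can be sketched as follows (we assume the population standard deviation here; whether the sample or population estimator was used is not specified in the text):

```python
import statistics

def coefficient_of_variation(values):
    """Eq. (7): CV = sigma / mu over the designed mol% values of one
    element across repeated designs (population standard deviation
    assumed)."""
    return statistics.pstdev(values) / statistics.mean(values)
```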