Abstract
Zinc (Zn) alloys offer advantages such as abundant resources and low cost. Nevertheless, their current mechanical properties limit application in more advanced fields. Due to the lack of clear compositional design methods, the development of high-performance Zn alloys is urgently needed. To this end, this work proposes a fast and effective design strategy for Zn alloys based on machine learning (ML). The prediction models for the ultimate tensile strength, elongation, and hardness were successfully developed, with accuracies exceeding 90%. Interpretability analysis of the models was performed using the SHAP method with particle swarm optimization (PSO). Furthermore, a ML-based Zn alloy composition design system (ZACDS) was proposed by integrating the Bayesian optimization algorithm. A novel high-strength Zn alloy was successfully designed using ZACDS, demonstrating good agreement between predicted and experimental mechanical properties. This approach offers a new strategy for Zn alloy design under different compositional constraints and performance requirements.
Similar content being viewed by others

Introduction
Zinc (Zn), as the fourth most widely used metal worldwide, offers abundant resources, affordability, low melting point, excellent fluidity, superior corrosion resistance, and high recyclability1,2,3. Nevertheless, its low mechanical strength, poor ductility, and brittleness at low temperatures have hindered its further development and application4,5. To overcome these limitations and expand the application field of Zn, alloying has been employed as a strategy to improve the strength, corrosion resistance, and wear resistance of pure Zn through the addition of elements such as aluminum (Al)6,7, copper (Cu)8, magnesium (Mg)9, silicon (Si)10, lithium (Li)11, manganese (Mn)12, and rare earths (REs)13. Nonetheless, the wide variety of elements involved in alloying and the broad range of compositional combinations render traditional trial-and-error methods time-consuming, labor-intensive, and inefficient. Therefore, there is an urgent need for a more intelligent and efficient approach to assist in the compositional design of Zn alloys.
In recent years, machine learning (ML) has demonstrated exceptional potential and application value as an important data processing technology in materials science14,15,16. With powerful datasets and algorithms, ML can not only construct predictive models for desired properties quickly and accurately, but also reveal the underlying relationships between material features17,18, thereby helping researchers explore mechanistic issues in depth. Furthermore, compared to the trial-and-error approach traditionally used in alloy design, machine learning offers significant advantages in terms of cost-effectiveness and efficiency. And as a complement to traditional simulation methods such as density functional theory and molecular dynamics, machine learning not only provides new approaches for research in materials science but also serves as a driving force for the innovation and development of these traditional simulation methods. The successful application of ML has facilitated the development of novel alloys across various systems19. For instance, Jiang et al. 20 used interpretable ML models to identify the intrinsic factors and dominant patterns of alloying elements, as well as their effects on the strength, toughness, and stress corrosion resistance of Al alloys, leading to the successful design of new high-performance alloys. Similarly, Wei et al. 21 proposed an ML-based design model for predicting new Lead (Pb)-free solder alloys with high strength and ductility, and its reliability was validated experimentally. In addition, ML has been applied in the development of Mg alloys22,23, Cu alloys24,25, high-entropy alloys26,27, and Nickel (Ni)-based alloys28,29. However, research on the application of machine learning (ML) in zinc alloys is still limited. Heiss et al. 30 applied unsupervised machine learning methods to guide the zinc laser powder bed fusion (LPBF) manufacturing process. By optimizing process parameters, they achieved lower porosity in the Zn-4Al-1Cu alloy formed via LPBF, thereby improving the quality of the final product. Moreover, based on their self-constructed biodegradable zinc alloy database, Chen et al. 31 successfully used machine learning algorithms to screen for novel biodegradable zinc alloys with high ultimate tensile strength and low immersion corrosion rate. Although they have already attempted to use machine learning methods to assist in zinc alloy design, these studies are primarily based on alloy performance after post-processing. In contrast, for the development of new alloys, there is a greater need to understand the as-cast properties of zinc alloys, as this can significantly reduce the time and economic costs associated with frequent post-processing during the early stages of experimentation. To advance the design and development of Zn alloys, it is crucial to fully utilize the potential of ML in this area.
In this study, a rapid and effective ML-based design strategy for Zn alloys is proposed to achieve performance-driven compositional design. Figure 1 illustrates a flowchart of this strategy, which comprises four parts: predictive model development, model interpretability analysis, alloy design system construction, and experimental validation. Initially, leveraging the constructed database comprising the composition, ultimate tensile strength (UTS), elongation at break (EL), and hardness of different Zn alloys, three predictive models for different properties are developed through feature selection, model selection, and optimization. Subsequently, to clearly reveal the relationships between alloy compositional features and their effects on target properties, Shapley additive explanations (SHAP) is employed, which is based on game theory. Moreover, an expanded dataset generated through particle swarm optimization (PSO) is utilized to conduct interpretability analysis on the trained models. This process identifies the alloy compositional features that significantly impact performance and the effects of changes in the content of each component. Furthermore, by leveraging the performance prediction capabilities of high-precision models and the robust search capabilities of the Bayesian optimization algorithm (BOA), an ML-based Zn alloy composition design system (ZACDS) is developed, capable of handling multiple performance optimization tasks. This system can successfully predict a novel high-performance Zn alloy that meets the specified requirements. Lastly, the reliability of the model is validated via tensile testing and hardness measurements.
After the establishment of the three performance prediction models, the subsequent application tasks are carried out: data augmentation of the original dataset based on the PSO algorithm, followed by exploration of potential relationships between model features using SHAP values; development of a novel high-performance Zn alloy using an ML-based Zn alloy composition design system (ZACDS) and experimental evaluation of its actual performance. ZACM is the name of the new zinc alloy.
Results
Alloy performance dataset and data distribution
In this work, three datasets were constructed based on laboratory test data, relevant literature32,33,34,35,36,37,38,39,40,41,42, and the Matweb database, each containing the same types of compositional features and different performance metrics: UTS, EL, and hardness. Missing values in the datasets were filled with zeros. Detailed data can be found in Tables S1–S3 of the Supplementary Materials. Three predictive models were trained to predict the mechanical properties based on alloy composition under constant forming conditions. Consequently, the compositional features in the datasets were used as inputs, while the UTS, EL, and hardness properties were used as outputs.
Figure 2a–c present the distribution of data based on the amount of elements in the alloys for the three mechanical properties. The samples in the UTS, EL, and hardness datasets primarily consisted of ternary to six-component alloys. The alloy composition diversity contributes to a broader information space for the ML models, thereby enhancing their generalization ability and robustness and providing an effective basis for subsequent application work.
a–c Distribution of alloy composition quantity. The x-axis represents the quantity of the main elements in the alloy, while the y-axis represents the proportion in the dataset; d–f Distribution of element types and contents. The x-axis represents the categories of all elements present in the dataset, while the y-axis represents their proportion in the dataset; g–i Distribution of mechanical property ranges. Different colors represent statistical ranges for various properties, and the size of the sectors indicates their proportion in the dataset.
The distribution of element types and contents in the datasets is exhibited in Fig. 2d–f. The UTS, EL, and hardness datasets involved 10 alloying elements. In all three datasets, the element with the widest content range was Al, ranging from 0 to 43 wt.%, followed by Cu, with a range of 0–13 wt.%, and Mg, with a range of 0–3 wt.%. The content of the other elements ranged generally between 0 and 1.2 wt.%. This aligns with the extensive research on doping Zn alloys with Al, Cu, and Mg43,44,45. Furthermore, to enhance the reliability and accuracy of ML model applications, the composition search space was defined based on the element content ranges in the dataset, thereby reducing the risk of model performance degradation due to extrapolation.
The distribution of the three mechanical properties in the dataset is plotted in Fig. 2g–i. The UTS, EL, and hardness values ranged within 25–457 MPa, 0–20.1%, and 53–200 HV, respectively. The proportion of samples with a UTS between 0 and 300 MPa was 72.8%, with an EL between 0-5% was 76.5%, and with a hardness between 40 and 115 HV was 62.7%. It is evident that the majority of Zn alloy samples exhibited relatively low performance and failed to demonstrate well-balanced mechanical properties, indicating significant room for improvement and optimization. Introducing ML models to guide Zn alloy design based on data-driven insights is essential for accelerating the development of high-performance Zn alloys.
Feature selection
Prior to feature selection, this work first divided the dataset into a training set and a test set in an 8:2 ratio using random sampling. All feature analysis and selection steps were performed only on the training set, with the test set not involved in this process. This was intended to preserve the independence of the test set during the evaluation and analysis phase, thereby effectively reducing the risk of data leakage.
Here, the correlation coefficients between features and target properties were calculated and visualized as proportions (Figure S1). The Pearson correlation coefficient analysis indicated that Cu and Mg have a strong linear correlation with the UTS and hardness of Zn alloys, while the RE elements La and Ce exhibited a highly linear correlation with EL (Figure S1a). Similarly, the Spearman correlation coefficient indicated that Cu is strongly correlated with both UTS and hardness (Figure S1b). Nevertheless, most compositional features exhibited varying correlation strengths with the target properties across the two correlation coefficients, with some, e.g., Mg, even exhibiting an inverse correlation trend. This implies that relying solely on a single correlation coefficient may not comprehensively capture the true relationships between variables. Thus, using both correlation coefficients for feature selection would be both necessary and effective. To reduce the dimensionality of the input features, enhance model training efficiency, and mitigate the risk of overfitting, the Ca and B features, which had a correlation with the target of less than 0.25, were eliminated.
To conduct a more comprehensive feature selection, the input features were ranked using two common evaluation metrics for feature importance based on random forest (RF) (Fig. 3). The feature ranking based on the mean decrease in impurity (MDI) indicated that Al, Cu, and Mg have the most significant impact on UTS and Hardness, while La and Mn have a considerable effect on EL (Fig. 3a–c). Similarly, the ranking based on permutation importance (PI) revealed that Cu has the strongest effect on UTS, while La and Mg are the features that most significantly impact EL and Hardness, respectively (Fig. 3d–f). To reduce model complexity, the Ca, Ti, and B features, which consistently ranked lowest across six evaluations via the two RF-based metrics, were selected for further consideration. In conjunction with the correlation analysis results, it was decided to exclude the Ca and B features from the final model.
MDI: (a) UTS; (b) EL; (c) Hardness. PI: (d) UTS; (e) EL; (f) Hardness. The MDI and PI results are obtained from the average values of 100 random runs. The boxplot exhibits the first quartile (Q1, bottom of the boxes), second quartile (Q2, also known as the median), and third quartile (Q3, top of the boxes). The interquartile range (IQR) is defined as (Q3 - Q1), and the whiskers represent the range (Q1 - 1.5 IQR, Q3 + 1.5 IQR). Values outside the whiskers are considered outliers and have been removed from the figure.
Model selection and hyperparameter optimization
In this work, a performance evaluation was conducted on four standard statistical models: three linear regression models (unregularized linear regression, regularized Lasso, and Ridge) and a response surface model. Multiple random splits of the dataset were used for training, and the prediction performance of the models was evaluated based on RMSE and the coefficient of determination (R²) for both the training and testing sets. The evaluation results are shown in Tables S4, S5, and S6. For predicting UTS and EL, the R² values of the linear regression models are all below 0.5, with some negative R² values in the test set. This indicates that the linear regression model is not suitable for predicting the mechanical properties of zinc alloys. Although the linear regression model achieved an R² of around 0.75 for predicting hardness, it still does not meet the requirements for practical application. As for the response surface model, it showed large negative values and high standard deviations across all test set R² values, indicating poor performance in predicting the alloy properties.
To improve the efficiency of this process, three ensemble algorithms were employed: Extreme Gradient Boosting (XGBoost)46, RF47, and Adaptive Boosting (AdaBoost)48. Each algorithm was used to build prediction models for the three mechanical properties. The models were evaluated by comparing the average R² over 100 random dataset splits. Moreover, the performance variations of the models under different training/testing set split ratios were also investigated.
Figure 4a–c present the prediction accuracy of the three algorithms for each mechanical property under different split ratios. The models constructed by the same algorithm exhibited similar R² trends across different target performance metrics. The R² values on the test sets of the three models improved with increasing training set proportion, reaching relatively high values when the training set ratio was 0.8. As regards the training set R², the models built using the RF and XGBoost algorithms exhibited a consistent upward trend, similar to the test set, while the model built with the AdaBoost algorithm presented a notable decline. This can be attributed to the strong dependence of the model on the dataset, where increasing the training data results in greater model complexity and a higher likelihood of overfitting. Based on the performance evaluation of models for different target metrics, the model with the highest average R² at a training set ratio of 0.8 was selected for each target. In particular, XGBoost was chosen for predicting UTS and Hardness, while RF was selected for predicting EL.
Mean and standard deviation of R² for the three ensemble algorithms under different training set ratios: (a) UTS; (b) EL; (c) Hardness. Predictive performance of the optimized model: (d) UTS; (e) EL; (f) Hardness. The red points represent the sample points from the training set, and the green points represent the sample points from the test set.
To tune the hyperparameters of the prediction models for different target performances, Bayesian optimization49 was applied. To avoid overfitting or underfitting while improving model accuracy, the hyperparameter search space was constrained. The specific hyperparameters and their respective ranges for XGBoost and RF are listed in Table 1.
Figure 4 presents the optimal hyperparameter combinations and model accuracy after optimization. In these plots, red points represent the training set samples, while green points represent the test set samples. In addition, the dashed lines represent the ideal fit line with a slope of 1; the closer the predicted points to this line, the higher the accuracy of the model. The UTS prediction model achieved an R² of 0.93 on the training set and 0.92 on the test set, with a root mean square error (RMSE) of 22.29 MPa (Fig. 4d). The EL prediction model reached an R² of 0.97 on the training set and 0.95 on the test set, with an RMSE of 0.95% (Fig. 4e). The Hardness prediction model achieved an R² of 0.96 on the training set and 0.93 on the test set, with an RMSE of 6.91 HV (Fig. 4f). All three models demonstrated high predictive accuracy on both the training and test sets, with R² differences of less than 0.03, indicating strong generalization ability. No significant overfitting or underfitting was observed, providing a stable and reliable basis for the further application of these prediction models.
Model interpretability analysis
The interpretability analysis of ML-based alloy performance prediction models not only helps understand the impact of different alloy compositions on performance but also reveals the potential interactions and relationships between compositional features. This analysis is fundamentally based on the training and prediction sample points, where both the quality and quantity of these points significantly affect its effectiveness.
Inspired by the strategies employed by Chen et al. 50 and Dong et al. 51 to improve the accuracy of predictive models via virtual sample generation, this study utilized PSO to generate virtual samples before conducting interpretability analysis of the models. This approach expanded the original dataset, enriching the original feature space and providing a more robust data basis for the interpretability analysis. Figure S2a–c illustrate the distribution of the original dataset across features for UTS, EL, and Hardness, respectively. It is evident that the distribution of the original dataset in the feature space was uneven, limiting the extraction of relational information among model features. In contrast, Figure S2d–f depict the distribution of the augmented datasets across features, exhibiting a more uniform distribution with richer sample information. This enhanced distribution helps better elucidate the relationships among model features.
Furthermore, the SHAP values for each model were calculated based on the augmented dataset, facilitating a more intuitive analysis of how input features affect target performance. Figure 5a visualizes the relative importance of all compositional features for the predictive models of UTS, EL, and Hardness, along with the overall importance ranking for the combined mechanical properties. Features located further outward in the ring indicate greater importance for the overall mechanical properties of the Zn alloy, while the relative importance of a compositional feature reflects its effect on the corresponding target performance in the predictive model. By observing the relative importance of the features in each model, it can be deduced that Cu and Al were the dominant features affecting UTS performance. As regards EL, Ce and La had a significant impact, with Mg and Cu exhibiting nearly equal importance. In the Hardness prediction model, Mg and Cu were the most dominant features. From the overall importance ranking, it becomes evident that Cu, Mg, and Al were the three most critical elements affecting the mechanical properties of Zn alloys. Figure 5b–d visualize the SHAP values for all samples in the augmented datasets of the three mechanical properties. The horizontal axis represents the SHAP value, with its magnitude indicating the extent to which a feature affects target performance. Positive or negative SHAP values indicate the direction of the feature effect on target performance, with positive values suggesting a positive contribution to the performance prediction. The color of each sample point reflects the magnitude of a feature for that specific sample. In the UTS model, higher values of Al, Cu, and Si are associated with higher predicted target values, suggesting that these three elements contribute positively to enhancing the strength of Zn alloys. In the EL model, higher values of Ce, La, and Mn are associated with higher predicted elongation, suggesting that the addition of RE elements and Mn contributes positively to the toughness of Zn alloys. In the Hardness model, Mg, Cu, and Si are positively correlated with the predicted hardness, highlighting their significant role in improving the hardness of Zn alloys. Although Al, Ti, Ce, and La also contribute to hardness improvement, their effects are less pronounced. Furthermore, combining the SHAP value analyses from both the UTS and EL models reveals that a high Mg content affects negatively both target performances. However, a positive correlation with both target properties is observed with decreasing Mg content, suggesting that trace amounts of Mg are beneficial for developing high-strength, tough Zn alloys.
a Ringed bar chart visualizing the relative and comprehensive importance of all features. The red, yellow, and blue colors correspond to the relative importance of each feature in the UTS, EL, and Hardness prediction models, respectively. SHAP values of all feature sample points in the prediction models: b UTS model; c EL model; d Hardness model.
Composition design system development and application
In alloy design, it is essential not only to enhance a single target property but also to comprehensively optimize multiple properties. In certain specific application scenarios, performance must meet requirements within a specified range. For instance, when Zn alloys are used as biodegradable materials in the medical field, the implant material must possess human bone-like mechanical properties along with a moderate degradation rate in the body. To enhance the application of ML in alloy design, this work introduces ZACDS, which is based on the previously developed predictive models for three Zn alloy properties and the BOA, addressing compositional design tasks under various target property constraints. The flowchart of ZACDS is illustrated in Figure S3. The core concept involves using the BOA to iteratively search within a defined space, minimizing the error between predicted and target performances to achieve optimal composition design. The specific method is described below.
First, it is crucial to establish the constraints. After defining the target performance metrics for UTS, EL, and Hardness, the ranges of all compositional features should be determined based on additional criteria, e.g., cost, thereby constructing the search space for the composition design scheme. Notably, to reduce the potential uncertainty and prediction errors in the model during extrapolation, the feature ranges should be aligned with the fluctuation intervals of the content of each element in the dataset, enhancing the reliability of the output results. The objective function for the BOA is defined as follows:
where \({Score}\) is the objective value in BOA, \({W}_{{UTS}}\), \({W}_{{EL}}\), and \({W}_{{HB}}\) are the importance coefficients for UTS, EL, and Hardness, respectively, with values of either 1 or 0, and \({R}_{{UTS}}\), \({R}_{{EL}}\), and \({R}_{{HB}}\) are the ratios of predicted to target values for UTS, EL, and Hardness, respectively.
Due to the significant differences in the magnitudes of the three performance metrics, simply summing their prediction errors makes it difficult to obtain valid results via error minimization. To address this issue, the ratio of predicted to target values was utilized. Furthermore, an importance coefficient was assigned to each performance metric, reflecting its significance. For instance, when a single target performance is specified for UTS, its coefficient is set to 1, while the coefficients for the other two performances are set to 0. This allows the composition design to focus on a single target. In scenarios requiring multi-objective design tasks with two or three specified performances, the corresponding coefficients are set to 1, while those for the other performances are set to 0. This approach effectively addresses all potential specific performance design tasks. Finally, iterative Bayesian optimization is used to minimize the objective value score, achieving minimal error in target performance. Once the maximum number of iterations is reached, the minimum error value is compared with the allowable deviation; if the condition is met, the design solution is provided; otherwise, “no solution” is output. In cases where a design solution is not obtained during runtime, the number of iterations is increased or the allowable deviation is relaxed. In addition, adjusting the search space can also improve solution efficiency.
Considering the alloy design cost and the interpretability analysis results, Al, Cu, and Mg were selected as the search elements. Referring to the distributions in the dataset, the search ranges were set as 3-8 wt.% for the Al content, 0-6 wt.% for the Cu content, and 0-0.05 wt.% for the Mg content.
The mechanical properties were derived from samples in the dataset that contain doped Al, Cu, and Mg elements, as well as the performance of commercial Zn alloys outlined in GB/T 1175-2018 (Fig. 6). The visualization of these sample points provides a design space with optimized overall mechanical properties. UTS was given as 300 MPa, with a permissible deviation of 20 MPa; EL was specified at 3.5%, with an allowable deviation of 0.5%; hardness was given as 130 HV, with a permissible deviation of 10 HV. After 50 iterations, the desired composition was successfully identified and designated as ZACM alloy, with predicted Al content of 3.96 wt%, Cu content of 5.83 wt%, and Mg content of 0.05 wt%. The predicted UTS, EL, and hardness performances were 298.52 MPa, 3.44%, and 128.64 HV, respectively, represented by the yellow star in Fig. 6. To verify the accuracy of the prediction, the ZACM alloy was prepared, and component testing was conducted using inductively coupled plasma optical emission spectroscopy (ICP-OES). The actual Al, Cu, and Mg contents were found to be 4.01 wt%, 5.92 wt%, and 0.02 wt%, respectively. The UTS, EL, and hardness values of the prepared ZACM alloy were 298 ± 14 MPa, 3.2 ± 0.2%, and 133 ± 2 HV, with the test results shown in Figure S4 and represented by the red star in Fig. 6. The errors between the experimental and predicted UTS and hardness values were both less than 3%, while that concerning the EL performance was slightly higher (7.5%). All errors were within the acceptable range, and the performance was within the initially specified range.
a Three-dimensional distribution of samples based on their UTS, EL, and Hardness properties. b Two-dimensional distribution of samples based on their UTS and Hardness properties. c Two-dimensional distribution of samples based on their Hardness and EL properties. The blue data points represent the training points in the dataset, the green data points correspond to commercial alloys, while the yellow and red stars represent the predicted and actual ZACM alloy samples, respectively. The overlapping areas of the color blocks are the specified performance ranges.
Microstructure analysis
To further evaluate the performance of the newly designed alloys, this work includes a comparison with the commercial zinc alloy ACuZinc5, known for its high strength and wide application. According to the ASTM 240-13 standard, the Al content of ACuZinc5 is 2.8–3.3 wt%, the Cu content is 5.2–6 wt%, and the Mg content is 0.035–0.05 wt%. The alloy was cast using the same melting process as the newly designed zinc alloy ZACM. The composition was analyzed using ICP-OES, and the results showed that the Al content was 3.3 wt%, the Cu content was 5.5 wt%, and the Mg content was 0.04 wt%, all of which conform to the standard. The tensile strength, elongation, and hardness values of ACuZinc5 were 253 MPa, 0.8%, and 128 HV, respectively. Compared to ACuZinc5, the newly designed ZACM zinc alloy exhibited a 17.8% improvement in tensile strength, along with increased elongation and hardness.
To further investigate the reasons for the performance enhancement of the new alloy, microstructural characterization was conducted. As shown in Figure S5, the optical microscopy results at different magnifications are presented. Both ZACM and ACuZinc5 exhibit the precipitation of a second phase. The precipitates in ZACM appear shorter and smaller, while those in ACuZinc5 are more elongated. In addition to the precipitate phase, some regions with finer microstructures (such as the small area indicated by the red box) can be observed in Figure S5d. A similar region also appears to be visible in Figure S5h, although it is very small in area.
To clarify the composition of the precipitate phase and better observe the regions with fine microstructures, further observations were conducted using a scanning electron microscope. As shown in Figure S6, the SEM and EDS characterization results of the two alloys are presented. Points 2 and 4 in the figure correspond to the precipitate phases observed under the optical microscope, and their composition analysis results are shown in Table S7. The Zn and Cu contents at these two points are highly consistent with the CuZn4 phase reported in the literature52,53. Furthermore, these studies have shown that CuZn4 exhibits high hardness, which contributes to the improvement of the alloy hardness properties. Additionally, from the overall elemental analysis and the characterization results at points 3 and 6, we observed that the regions with fine microstructures mainly consist of Zn and Al elements. This microstructure is highly consistent with the Zn-Al eutectic region reported by Priya et al. 54. Their work suggests that, with a higher zinc content in the alloy, each eutectic colony can be considered a grain containing a zinc-rich layer. Compared to ACuZinc5, the ZACM alloy exhibits a more distinct and abundant Zn-Al eutectic region. The presence of the Zn-Al eutectic region refines the Zn phase in the ZACM matrix and impedes the growth of CuZn4, making it shorter and smaller. This change leads to finer grains in ZACM, which, according to the Hall-Petch55 grain refinement strengthening effect, significantly enhances the strength of the material.
Discussion
This study introduces a alloy design strategy developed using classical ML algorithms. The strategy ensures model reliability while avoiding the limitations associated with model specialization56, making it more user-friendly and easier to understand. This is of great significance in advancing the widespread application of ML in traditional alloy design processes and achieving practical benefits. In addition, inverse design is a key approach for accelerating material development and discovering superior new materials57,58,59. To this end, this study also proposes a framework for reverse design of alloys under multiple constraints. It not only enables efficient alloy design, but also allows for precise adjustments based on specific requirements by optimizing the objective function. This further enhances the adaptability and efficiency of the design process. The data-augmented model interpretability method makes the impact of each feature on target performance more visually intuitive and clear. This approach also effectively narrows the complex compositional design space, significantly reducing computation time and costs. Furthermore, This is a successful attempt at the initial application of ML for predicting and optimizing the mechanical properties of Zn alloys, which paves the way for the development of high-strength Zn alloys. Meanwhile, established a comprehensive dataset of Zn alloys, incorporating composition and performance information from the literature, the MatWeb database, and laboratory data, which will provide robust support for future data-driven zinc alloy design. Meanwhile, a comprehensive dataset of Zn alloys was established, incorporating composition and performance information from the literature, the MatWeb database, and laboratory data, which will provide robust support for future data-driven zinc alloy design. Finally, the significant potential of machine learning in guiding the development of Zn alloys was further validated by testing the mechanical properties of the designed alloys.
Methods
Performance prediction models
Establishing a reliable predictive model is fundamental to any ML application, serving as the foundation upon which all different application pathways are built and developed. The construction of the predictive models for this study can be broadly divided into the following steps:
In the first step, three datasets were constructed for three Zn alloy properties: UTS, EL, and hardness. The data were obtained from laboratory experiments, relevant literature, and publicly available databases. To eliminate the influence of post-processing techniques, such as die-casting and heat treatment, on alloy performance, only data from cast Zn alloys were retained. Each dataset contains both alloy composition and the corresponding mechanical properties. The UTS dataset consists of 185 alloy samples, the EL dataset contains 173 alloy samples, and the hardness dataset includes 155 alloy samples. These datasets will be split into training and testing sets. The training set will primarily be used for constructing prediction models, while the testing set will be used to evaluate the accuracy of the models.
The second step was feature selection, which is critical for identifying input features that can accurately predict the desired properties. In this work, Pearson correlation, Spearman correlation, and RF-based feature importance ranking were used to comprehensively evaluate and select input features. Pearson and Spearman correlations provide insights into both linear and monotonic relationships between feature variables60. The RF-based feature importance ranking was assessed via two metrics: MDI and PI61. This balanced approach mitigates bias and limitations in feature evaluation, facilitating highly accurate identification of key features. Furthermore, to avoid potential data leakage risks, feature selection must be performed on the training set.
The third step involved model training and selection. To select an appropriate dataset partition ratio and algorithm for constructing the predictive model, this study performed 100 random splits of the dataset under seven different partition ratios (training set to test set sample ratios of 3:7, 4:6, 5:5, 6:4, 7:3, 8:2, and 9:1). Three commonly used ensemble algorithms for predictive modeling were chosen as candidates: XGBoost, RF, and AdaBoost. The three algorithms were trained under different partition ratios, and the prediction performance of the models was evaluated based on the RMSE and coefficient of determination R² for both the training and testing sets. The equations for the two evaluation metrics are provided in Eqs. (2) and (3):
where \(n\) is the total number of datapoints, \({y}_{i}\) is the actual value, \({\hat{y}}_{i}\) is the value predicted by the model, and \({\bar{y}}_{i}\) is the mean of the actual sample data.
In the fourth and final step, the hyperparameters of the selected model were optimized. Appropriate hyperparameters not only enhance the predictive performance of a model but also prevent overfitting and underfitting, thereby increasing model reliability. Given that ensemble algorithms involve numerous hyperparameters, BOA was chosen to conduct the search within this extensive high-dimensional space, improving the efficiency of the optimization process.
Virtual sample generation
This study employed a virtual sample generation method based on the PSO62 algorithm to augment the original dataset with sample points in conjunction with the Zn alloy performance prediction model. Initially, the asymmetric acceptance domain was calculated based on the sample points from the original dataset, restricting the range of generated feature values. The lower bound (LB) and upper bound (UB) of the range were calculated as follows:
where \({AVG}\), \(\min\), and \(\max\) represent the mean, minimum, and maximum values of the corresponding features for all samples, respectively, \({N}_{L}\) is the number of sample points with values smaller than \({AVG}\), and \({N}_{U}\) is the number of sample points with values larger than \({AVG}\).
Subsequently, the feature values of virtual samples were iteratively generated using the PSO algorithm. Then, these feature values were input into the prediction model to obtain the corresponding target performance, ultimately forming a complete virtual sample. In addition, to ensure the quality of the virtual samples, the dataset underwent normalization and denormalization, and the final virtual sample set was subjected to outlier detection utilizing the Z-Score algorithm.
Tools for model interpretability
SHAP63 is a game theory-based interpretability method, widely applied in ML model interpretability research. In this study, SHAP analysis was introduced into the Zn alloy performance prediction models to quantify the contribution of each component to the prediction outcome, thereby revealing the effects of different elements in the alloy on mechanical properties. By calculating the marginal contribution of each feature under different input combinations and visualizing these values, SHAP can intuitively demonstrate the impact of various components on performance predictions. SHAP analysis not only enhances model interpretability but also helps identify key components and their underlying mechanisms.
Experiment
The novel alloy ZACM and the commercial alloy ACuZinc were cast under argon protective atmosphere using a vacuum induction furnace with an alumina crucible. The raw materials included high-purity Zn ingots (99.99%), pure Al (99.5%), pure Mg (99.95%), and CuZn25 master alloy. After placing the raw materials into the crucible, the metals were heated to 700 °C and, following a 2 min refinement and temperature adjustment, were cast into ingots. Their composition was analyzed via ICP-OES, and test specimens for mechanical property evaluation were prepared via wire cutting.
Alloy performance was assessed via strength and hardness testing. The tensile testing specimens were dog bone-shaped, with a gauge length of 20 mm and cross-sectional area of 5 mm × 2 mm. The UTS and EL were measured at room temperature using a KV-100KNW universal testing machine at a constant strain rate of 0.02 mm·s−1. For hardness testing, block specimens measuring 10 mm × 10 mm × 3 mm were prepared. Their surface was ground using SiC sandpapers with grit sizes ranging from 400 to 2000, followed by polishing with diamond suspensions of 3.5, 1.5, and 0.5 microns. Hardness was measured using an HVS-1000 Vickers hardness tester. The sample was tested at five different locations, and the mean value was taken as the final hardness value.
Evaluation of the microstructure of the studied alloys was carried out using an optical microscopy and a scanning electron microscopy (SEM, SU-5000) with an energy dispersive spectrometer (EDS). All the samples for microscopic observation were polished, etched with an etchant composed of 4 ml nitric acid and 95 ml ethanol for 2–3 seconds.
Data availability
The datasets used and analysed during the current study are included in the supplementary information files of this article.
Code availability
The source code used in this article can be accessed at https://github.com/pcf2000/MAD.git.
References
Bagheri, B., Mehrabani, V. J. & Farrokhpay, S. Recovery of sphalerite from a high zinc grade tailing. J. Hazard. Mater. 381, 120946 (2020).
Asgari, H., Toroghinejad, M. R. & Golozar, M. A. On texture, corrosion resistance and morphology of hot-dip galvanized zinc coatings. Appl. Surf. Sci. 253, 6769–6777 (2007).
Apelian, D., Paliwal, M. & Herrschaft, D. C. Casting with Zinc Alloys. JOM 33, 12–20 (1981).
Shi, Z. Z. et al. Design biodegradable Zn alloys: Second phases and their significant influences on alloy properties. Bioact. Mater. 5, 210–218 (2020).
Ren, K. et al. Effect of ECAP temperature on formation of triple heterogeneous microstructure and mechanical properties of Zn–1Cu alloy. Mater. Sci. Eng.: A 826, 141990 (2021).
Yan, S. et al. Influence of Different Al Contents on Microstructure, Tensile and Wear Properties of Zn-based Alloy. J. Mater. Sci. Technol. 26, 648–652 (2010).
Ares, A. E. & Schvezov, C. E. The effect of structure on tensile properties of directionally solidified Zn-based alloys. J. Cryst. Growth 318, 59–65 (2011).
Li, L. et al. Microstructures, mechanical properties and in vitro corrosion behavior of biodegradable Zn alloys microalloyed with Al, Mn, Cu, Ag and Li elements. J. Mater. Sci. Technol. 103, 244–260 (2022).
Chen, T. J., Hao, Y., Sun, J. & Li, Y. D. Effects of Mg and RE additions on the semi-solid microstructure of a zinc alloy ZA27. Sci. Technol. Adv. Mater. 4, 495–502 (2003).
Savaskan, T. & Bican, O. Effects of silicon content on the microstructural features and mechanical and sliding wear properties of Zn–40Al–2Cu–(0–5)Si alloys. Mater. Sci. Eng: A 404, 259–269 (2005).
Gancarz, T. & Cempura, G. Characterization of ZnAl cast alloys with Li addition. Mater. Des. 104, 51–59 (2016).
Sun, S. et al. Abnormal effect of Mn addition on the mechanical properties of as-extruded Zn alloys. Mater. Sci. Eng.: A 701, 129–133 (2017).
Li, H., Wang, P. & Liu, X. Effect of rare earth elements Ce and Yb on the in vitro properties of biodegradable Zn alloys. J. Mater. Res. Technol-JMRT 30, 8228–8239 (2024).
Schmidt, J., Marques, M. R. G., Botti, S. & Marques, M. A. L. Recent advances and applications of machine learning in solid-state materials science. npj Comput. Mater. 5, 83 (2019).
Choudhury, A. The Role of Machine Learning Algorithms in Materials Science: A State of Art Review on Industry 4.0. Arch. Comput. Method Eng. 28, 3361–3381 (2021).
Himanen, L., Geurts, A., Foster, A. S. & Rinke, P. Data-Driven Materials Science: Status, Challenges, and Perspectives. Adv. Sci. 6, 1900808 (2019).
Allen, A. E. A. & Tkatchenko, A. Machine learning of material properties: Predictive and interpretable multilinear models. Sci. Adv. 8, eabm7185 (2022).
Tang, Y. C. et al. Accurate prediction of magnetocaloric effect in NiMn-based Heusler alloys by prioritizing phase transitions through explainable machine learning. Rare Metals 44, 639–651 (2024).
Hart, G. L. W., Mueller, T., Toher, C. & Curtarolo, S. Machine learning for alloys. Nat. Rev. Mater. 6, 730–755 (2021).
Jiang, L. et al. Synchronously enhancing the strength, toughness, and stress corrosion resistance of high-end aluminum alloys via interpretable machine learning. Acta Mater 270, 119873 (2024).
Wei, Q. et al. Divide and conquer: Machine learning accelerated design of lead-free solder alloys with high strength and high ductility. npj Comput. Mater. 9, 201 (2023).
Yang, B., Valentin, V. G. & Llorca, J. Application of machine learning to assess the influence of microstructure on twin nucleation in Mg alloys. npj Comput. Mater. 10, 26 (2024).
Mi, X. et al. Accelerated design of high-performance Mg-Mn-based magnesium alloys based on novel Bayesian optimization. J. Magnes. Alloy. 12, 750–766 (2024).
Wang, C., Fu, H., Jiang, L., Xue, D. & Xie, J. A property-oriented design strategy for high performance copper alloys via machine learning. npj Comput. Mater. 5, 87 (2019).
Zhang, H., Fu, H., Zhu, S., Yong, W. & Xie, J. Machine learning assisted composition effective design for precipitation strengthened copper alloys. Acta Mater 215, 117118 (2021).
Zhou, Z. et al. Machine learning guided appraisal and exploration of phase design for high entropy alloys. npj Comput. Mater. 5, 128 (2019).
Rao, Z. et al. Machine learning–enabled high-entropy alloy discovery. Science 378, 78–85 (2022).
Zhou, C. L. et al. Creep rupture life predictions for Ni-based single crystal superalloys with automated machine learning. Rare Metals 43, 2884–2890 (2024).
Liu, F. et al. High-Throughput Method–Accelerated Design of Ni-Based Superalloys. Adv. Funct. Mater. 32, 2109367 (2022).
Heiss, A., Thatikonda, V. S. & Klotz, U. E. Multi-objective optimization of LPBF manufacturing with Zn-4Al-1Cu alloy for technical applications. J. Manuf. Process. 134, 193–206 (2025).
Chen, W. et al. Machine learning design of 400 MPa grade biodegradable Zn-Mn based alloys with appropriate corrosion rates. Int. J. Miner. Metall. Mater. 31, 2727–2736 (2024).
Li, H. F. et al. Development of biodegradable Zn-1X binary alloys with nutrient alloying elements Mg. Ca and Sr. Sci. Rep. 5, 10719 (2015).
Li, H. et al. Design and characterizations of novel biodegradable ternary Zn-based alloys with IIA nutrient alloying elements Mg. Ca and Sr. Mater. Des. 83, 95–102 (2015).
Dambatta, M. S. et al. Influence of thermal treatment on microstructure, mechanical and degradation properties of Zn–3Mg alloy as potential biodegradable implant material. Mater. Des. 85, 431–437 (2015).
Liu, X. et al. Micro-alloying with Mn in Zn–Mg alloy for future biodegradable metals application. Mater. Des. 94, 95–104 (2016).
Zou, Y., Chen, X. & Chen, B. Effects of Ca concentration on degradation behavior of Zn-x Ca alloys in Hank’s solution. Mater. Lett. 218, 193–196 (2018).
Wang, K. et al. Binary Zn–Ti alloys for orthopedic applications: Corrosion and degradation behaviors, friction and wear performance, and cytotoxicity. J. Mater. Sci. Technol. 74, 216–229 (2021).
Lin, J. et al. A biodegradable Zn-1Cu-0.1Ti alloy with antibacterial properties for orthopedic applications. Acta Biomater 106, 410–427 (2020).
Vojtěch, D., Kubásek, J., Šerák, J. & Novák, P. Mechanical and corrosion properties of newly developed biodegradable Zn-based alloys for bone fixation. Acta Biomater 7, 3515–3522 (2011).
Zhang, L., Liu, X. Y., Huang, H. & Zhan, W. Effects of Ti on microstructure, mechanical properties and biodegradation behavior of Zn-Cu alloy. Mater. Lett. 244, 119–122 (2019).
Liu, X. et al. Microstructure, mechanical properties, in vitro degradation behavior and hemocompatibility of novel Zn–Mg–Sr alloys as biodegradable metals. Mater. Lett. 162, 242–245 (2016).
Liu, X. et al. Effects of alloying elements (Ca and Sr) on microstructure, mechanical property and in vitro corrosion behavior of biodegradable Zn–1.5Mg alloy. J. Alloy. Compd. 664, 444–452 (2016).
Zhang, Y., Yang, L., Zeng, X., Zheng, B. & Song, Z. The mechanism of anneal-hardening phenomenon in extruded Zn–Al alloys. Mater. Des. 50, 223–229 (2013).
Vida, T. A. et al. Near-eutectic Zn-Mg alloys: Interrelations of solidification thermal parameters, microstructure length scale and tensile/corrosion properties. Curr. Appl. Phys. 19, 582–598 (2019).
Xu, Z. et al. Revealing the abnormal softening mechanisms of Zn-xCu (x=2, 3) wrought alloys by gradually increasing ECAP numbers. Mater. Sci. Eng.: A 856, 143962 (2022).
Gong, J., Chu, S., Mehta, R. K. & McGaughey, A. J. H. XGBoost model for electrocaloric temperature change prediction in ceramics. npj Comput. Mater. 8, 140 (2022).
Schwenker, F. Ensemble Methods: Foundations and Algorithms [Book Review]. IEEE Comput. Intell. Mag. 8, 77–79 (2013).
Mehta, P. et al. A high-bias, low-variance introduction to Machine Learning for physicists. Phys. Rep. 810, 1–124 (2019).
Greenhill, S., Rana, S., Gupta, S., Vellanki, P. & Venkatesh, S. Bayesian optimization for adaptive experimental design: a review. IEEE Access 8, 13937–13948 (2020).
Chen, Z. S., Zhu, B., He, Y. L. & Yu, L. A. A PSO based virtual sample generation method for small sample sets: Applications to regression datasets. Eng. Appl. Artif. Intell. 59, 236–243 (2017).
Dong, Q. et al. Virtual sample generation empowers machine learning-based effluent prediction in constructed wetlands. J. Environ. Manage. 346, 118961 (2023).
Jiang, J., Qian, Y., Huang, H., Niu, J. & Yuan, G. Biodegradable Zn-Cu-Mn alloy with suitable mechanical performance and in vitro degradation behavior as a promising candidate for vascular stents. Biomater. Adv. 133, 112652 (2022).
Li, R., Ding, Y., Zhang, H., Wang, X. & Gao, Y. 300 MPa grade high-strength ductile biodegradable Zn-2Cu-xMg (x = 0.08, 0.15, 0.5, 1) alloys: The role of Mg in bimodal grain formation. J. Mater. Sci. Technol. 221, 168–186 (2025).
Tiwari, P., Dehury, K. R., Gupta, V., Kumar, R. & Joshi, Y. G. Influence of aluminium in wetting characteristics of Zn-Al alloy. Mater. Lett. 372, 137079 (2024).
Figueiredo, R. B., Kawasaki, M. & Langdon, T. G. Seventy years of Hall-Petch, ninety years of superplasticity and a generalized approach to the effect of grain size on flow stress. Prog. Mater. Sci. 137, 101131 (2023).
Unni, R., Zhou, M., Wiecha, P. R. & Zheng, Y. Advancing materials science through next-generation machine learning. Curr. Opin. Solid State Mat. Sci. 30, 101157 (2024).
Chai, Z. et al. Tailoring Stress–Strain Curves of Flexible Snapping Mechanical Metamaterial for On-Demand Mechanical Responses via Data-Driven Inverse Design. Adv. Mater. 36, 2404369 (2024).
Dijkstra, M. & Luijten, E. From predictive modelling to machine learning and reverse engineering of colloidal self-assembly. Nat. Mater. 20, 762–773 (2021).
Sanchez-Lengeling, B. & Aspuru-Guzik, A. Inverse molecular design using machine learning: Generative models for matter engineering. Science 361, 360–365 (2018).
Schober, P., Boer, C. & Schwarte, L. A. Correlation Coefficients: Appropriate Use and Interpretation. Anesth. Analog. 126, 1763–1768 (2018).
Nembrini, S., König, I. R. & Wright, M. N. The revival of the Gini importance? Bioinformatics 34, 3711–3718 (2018).
Gad, A. G. Particle Swarm optimization algorithm and its applications: a systematic review. Arch. Comput. Method Eng. 29, 2531–2561 (2022).
Mangalathu, S., Hwang, S.-H. & Jeon, J.-S. Failure mode and effects analysis of RC members based on machine-learning-based SHapley Additive exPlanations (SHAP) approach. Eng. Struct. 219, 110927 (2020).
Acknowledgements
Authors would like to thank the support of Project funded by Genesis Alloys (Ningbo) Ltd (HK2023000500), National Natural Science Foundation of China (12302179,22108316), Natural Science Foundation of Zhejiang Province (LY24E010001), Major Science and Technology Projects in Ningbo (2024Z070, 2024Z158), 2025 Ningbo Yongjiang Talent Programme (2024A-120-G), Mechanics Interdisciplinary Fund for Outstanding Young Scholars of Ningbo University (ZX2025000397).
Author information
Authors and Affiliations
Contributions
L.R. and W.L. supervised the project. C.P. and J.Z. conceived the idea. K.C. and Y.C. collated the previous experimental data and established the dataset. C.P. and W.L. performed the ML studies and developed the framework. J.Z. carried out the experimental fabrication and measurements. L.R., W.L. and W.J. discussed the results. C.P. produced the manuscript with suggestions from L.R. and W.L. All authors read and approved the final manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Pan, C., Lin, W., Zhou, J. et al. Novel machine learning driven design strategy for high strength Zn Alloys optimization with multiple constraints. npj Comput Mater 11, 169 (2025). https://doi.org/10.1038/s41524-025-01666-7
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s41524-025-01666-7







