Introduction

Zinc (Zn), as the fourth most widely used metal worldwide, offers abundant resources, affordability, low melting point, excellent fluidity, superior corrosion resistance, and high recyclability1,2,3. Nevertheless, its low mechanical strength, poor ductility, and brittleness at low temperatures have hindered its further development and application4,5. To overcome these limitations and expand the application field of Zn, alloying has been employed as a strategy to improve the strength, corrosion resistance, and wear resistance of pure Zn through the addition of elements such as aluminum (Al)6,7, copper (Cu)8, magnesium (Mg)9, silicon (Si)10, lithium (Li)11, manganese (Mn)12, and rare earths (REs)13. Nonetheless, the wide variety of elements involved in alloying and the broad range of compositional combinations render traditional trial-and-error methods time-consuming, labor-intensive, and inefficient. Therefore, there is an urgent need for a more intelligent and efficient approach to assist in the compositional design of Zn alloys.

In recent years, machine learning (ML) has demonstrated exceptional potential and application value as an important data processing technology in materials science14,15,16. With powerful datasets and algorithms, ML can not only construct predictive models for desired properties quickly and accurately, but also reveal the underlying relationships between material features17,18, thereby helping researchers explore mechanistic issues in depth. Furthermore, compared to the trial-and-error approach traditionally used in alloy design, machine learning offers significant advantages in terms of cost-effectiveness and efficiency. And as a complement to traditional simulation methods such as density functional theory and molecular dynamics, machine learning not only provides new approaches for research in materials science but also serves as a driving force for the innovation and development of these traditional simulation methods. The successful application of ML has facilitated the development of novel alloys across various systems19. For instance, Jiang et al. 20 used interpretable ML models to identify the intrinsic factors and dominant patterns of alloying elements, as well as their effects on the strength, toughness, and stress corrosion resistance of Al alloys, leading to the successful design of new high-performance alloys. Similarly, Wei et al. 21 proposed an ML-based design model for predicting new Lead (Pb)-free solder alloys with high strength and ductility, and its reliability was validated experimentally. In addition, ML has been applied in the development of Mg alloys22,23, Cu alloys24,25, high-entropy alloys26,27, and Nickel (Ni)-based alloys28,29. However, research on the application of machine learning (ML) in zinc alloys is still limited. Heiss et al. 30 applied unsupervised machine learning methods to guide the zinc laser powder bed fusion (LPBF) manufacturing process. By optimizing process parameters, they achieved lower porosity in the Zn-4Al-1Cu alloy formed via LPBF, thereby improving the quality of the final product. Moreover, based on their self-constructed biodegradable zinc alloy database, Chen et al. 31 successfully used machine learning algorithms to screen for novel biodegradable zinc alloys with high ultimate tensile strength and low immersion corrosion rate. Although they have already attempted to use machine learning methods to assist in zinc alloy design, these studies are primarily based on alloy performance after post-processing. In contrast, for the development of new alloys, there is a greater need to understand the as-cast properties of zinc alloys, as this can significantly reduce the time and economic costs associated with frequent post-processing during the early stages of experimentation. To advance the design and development of Zn alloys, it is crucial to fully utilize the potential of ML in this area.

In this study, a rapid and effective ML-based design strategy for Zn alloys is proposed to achieve performance-driven compositional design. Figure 1 illustrates a flowchart of this strategy, which comprises four parts: predictive model development, model interpretability analysis, alloy design system construction, and experimental validation. Initially, leveraging the constructed database comprising the composition, ultimate tensile strength (UTS), elongation at break (EL), and hardness of different Zn alloys, three predictive models for different properties are developed through feature selection, model selection, and optimization. Subsequently, to clearly reveal the relationships between alloy compositional features and their effects on target properties, Shapley additive explanations (SHAP) is employed, which is based on game theory. Moreover, an expanded dataset generated through particle swarm optimization (PSO) is utilized to conduct interpretability analysis on the trained models. This process identifies the alloy compositional features that significantly impact performance and the effects of changes in the content of each component. Furthermore, by leveraging the performance prediction capabilities of high-precision models and the robust search capabilities of the Bayesian optimization algorithm (BOA), an ML-based Zn alloy composition design system (ZACDS) is developed, capable of handling multiple performance optimization tasks. This system can successfully predict a novel high-performance Zn alloy that meets the specified requirements. Lastly, the reliability of the model is validated via tensile testing and hardness measurements.

Fig. 1: Schematic diagram of the proposed ML-based Zn alloy design strategy.
figure 1

After the establishment of the three performance prediction models, the subsequent application tasks are carried out: data augmentation of the original dataset based on the PSO algorithm, followed by exploration of potential relationships between model features using SHAP values; development of a novel high-performance Zn alloy using an ML-based Zn alloy composition design system (ZACDS) and experimental evaluation of its actual performance. ZACM is the name of the new zinc alloy.

Results

Alloy performance dataset and data distribution

In this work, three datasets were constructed based on laboratory test data, relevant literature32,33,34,35,36,37,38,39,40,41,42, and the Matweb database, each containing the same types of compositional features and different performance metrics: UTS, EL, and hardness. Missing values in the datasets were filled with zeros. Detailed data can be found in Tables S1S3 of the Supplementary Materials. Three predictive models were trained to predict the mechanical properties based on alloy composition under constant forming conditions. Consequently, the compositional features in the datasets were used as inputs, while the UTS, EL, and hardness properties were used as outputs.

Figure 2a–c present the distribution of data based on the amount of elements in the alloys for the three mechanical properties. The samples in the UTS, EL, and hardness datasets primarily consisted of ternary to six-component alloys. The alloy composition diversity contributes to a broader information space for the ML models, thereby enhancing their generalization ability and robustness and providing an effective basis for subsequent application work.

Fig. 2: Data distribution in the UTS, EL, and Hardness datasets.
figure 2

ac Distribution of alloy composition quantity. The x-axis represents the quantity of the main elements in the alloy, while the y-axis represents the proportion in the dataset; df Distribution of element types and contents. The x-axis represents the categories of all elements present in the dataset, while the y-axis represents their proportion in the dataset; gi Distribution of mechanical property ranges. Different colors represent statistical ranges for various properties, and the size of the sectors indicates their proportion in the dataset.

The distribution of element types and contents in the datasets is exhibited in Fig. 2d–f. The UTS, EL, and hardness datasets involved 10 alloying elements. In all three datasets, the element with the widest content range was Al, ranging from 0 to 43 wt.%, followed by Cu, with a range of 0–13 wt.%, and Mg, with a range of 0–3 wt.%. The content of the other elements ranged generally between 0 and 1.2 wt.%. This aligns with the extensive research on doping Zn alloys with Al, Cu, and Mg43,44,45. Furthermore, to enhance the reliability and accuracy of ML model applications, the composition search space was defined based on the element content ranges in the dataset, thereby reducing the risk of model performance degradation due to extrapolation.

The distribution of the three mechanical properties in the dataset is plotted in Fig. 2g–i. The UTS, EL, and hardness values ranged within 25–457 MPa, 0–20.1%, and 53–200 HV, respectively. The proportion of samples with a UTS between 0 and 300 MPa was 72.8%, with an EL between 0-5% was 76.5%, and with a hardness between 40 and 115 HV was 62.7%. It is evident that the majority of Zn alloy samples exhibited relatively low performance and failed to demonstrate well-balanced mechanical properties, indicating significant room for improvement and optimization. Introducing ML models to guide Zn alloy design based on data-driven insights is essential for accelerating the development of high-performance Zn alloys.

Feature selection

Prior to feature selection, this work first divided the dataset into a training set and a test set in an 8:2 ratio using random sampling. All feature analysis and selection steps were performed only on the training set, with the test set not involved in this process. This was intended to preserve the independence of the test set during the evaluation and analysis phase, thereby effectively reducing the risk of data leakage.

Here, the correlation coefficients between features and target properties were calculated and visualized as proportions (Figure S1). The Pearson correlation coefficient analysis indicated that Cu and Mg have a strong linear correlation with the UTS and hardness of Zn alloys, while the RE elements La and Ce exhibited a highly linear correlation with EL (Figure S1a). Similarly, the Spearman correlation coefficient indicated that Cu is strongly correlated with both UTS and hardness (Figure S1b). Nevertheless, most compositional features exhibited varying correlation strengths with the target properties across the two correlation coefficients, with some, e.g., Mg, even exhibiting an inverse correlation trend. This implies that relying solely on a single correlation coefficient may not comprehensively capture the true relationships between variables. Thus, using both correlation coefficients for feature selection would be both necessary and effective. To reduce the dimensionality of the input features, enhance model training efficiency, and mitigate the risk of overfitting, the Ca and B features, which had a correlation with the target of less than 0.25, were eliminated.

To conduct a more comprehensive feature selection, the input features were ranked using two common evaluation metrics for feature importance based on random forest (RF) (Fig. 3). The feature ranking based on the mean decrease in impurity (MDI) indicated that Al, Cu, and Mg have the most significant impact on UTS and Hardness, while La and Mn have a considerable effect on EL (Fig. 3a–c). Similarly, the ranking based on permutation importance (PI) revealed that Cu has the strongest effect on UTS, while La and Mg are the features that most significantly impact EL and Hardness, respectively (Fig. 3d–f). To reduce model complexity, the Ca, Ti, and B features, which consistently ranked lowest across six evaluations via the two RF-based metrics, were selected for further consideration. In conjunction with the correlation analysis results, it was decided to exclude the Ca and B features from the final model.

Fig. 3: RF-based feature importance ranking.
figure 3

MDI: (a) UTS; (b) EL; (c) Hardness. PI: (d) UTS; (e) EL; (f) Hardness. The MDI and PI results are obtained from the average values of 100 random runs. The boxplot exhibits the first quartile (Q1, bottom of the boxes), second quartile (Q2, also known as the median), and third quartile (Q3, top of the boxes). The interquartile range (IQR) is defined as (Q3 - Q1), and the whiskers represent the range (Q1 - 1.5 IQR, Q3 + 1.5 IQR). Values outside the whiskers are considered outliers and have been removed from the figure.

Model selection and hyperparameter optimization

In this work, a performance evaluation was conducted on four standard statistical models: three linear regression models (unregularized linear regression, regularized Lasso, and Ridge) and a response surface model. Multiple random splits of the dataset were used for training, and the prediction performance of the models was evaluated based on RMSE and the coefficient of determination (R²) for both the training and testing sets. The evaluation results are shown in Tables S4, S5, and S6. For predicting UTS and EL, the R² values of the linear regression models are all below 0.5, with some negative R² values in the test set. This indicates that the linear regression model is not suitable for predicting the mechanical properties of zinc alloys. Although the linear regression model achieved an R² of around 0.75 for predicting hardness, it still does not meet the requirements for practical application. As for the response surface model, it showed large negative values and high standard deviations across all test set R² values, indicating poor performance in predicting the alloy properties.

To improve the efficiency of this process, three ensemble algorithms were employed: Extreme Gradient Boosting (XGBoost)46, RF47, and Adaptive Boosting (AdaBoost)48. Each algorithm was used to build prediction models for the three mechanical properties. The models were evaluated by comparing the average R² over 100 random dataset splits. Moreover, the performance variations of the models under different training/testing set split ratios were also investigated.

Figure 4a–c present the prediction accuracy of the three algorithms for each mechanical property under different split ratios. The models constructed by the same algorithm exhibited similar R² trends across different target performance metrics. The R² values on the test sets of the three models improved with increasing training set proportion, reaching relatively high values when the training set ratio was 0.8. As regards the training set R², the models built using the RF and XGBoost algorithms exhibited a consistent upward trend, similar to the test set, while the model built with the AdaBoost algorithm presented a notable decline. This can be attributed to the strong dependence of the model on the dataset, where increasing the training data results in greater model complexity and a higher likelihood of overfitting. Based on the performance evaluation of models for different target metrics, the model with the highest average R² at a training set ratio of 0.8 was selected for each target. In particular, XGBoost was chosen for predicting UTS and Hardness, while RF was selected for predicting EL.

Fig. 4: Performance evaluation and optimization of predictive models.
figure 4

Mean and standard deviation of R² for the three ensemble algorithms under different training set ratios: (a) UTS; (b) EL; (c) Hardness. Predictive performance of the optimized model: (d) UTS; (e) EL; (f) Hardness. The red points represent the sample points from the training set, and the green points represent the sample points from the test set.

To tune the hyperparameters of the prediction models for different target performances, Bayesian optimization49 was applied. To avoid overfitting or underfitting while improving model accuracy, the hyperparameter search space was constrained. The specific hyperparameters and their respective ranges for XGBoost and RF are listed in Table 1.

Table 1 Hyperparameters and their corresponding ranges for the optimization of the predictive models

Figure 4 presents the optimal hyperparameter combinations and model accuracy after optimization. In these plots, red points represent the training set samples, while green points represent the test set samples. In addition, the dashed lines represent the ideal fit line with a slope of 1; the closer the predicted points to this line, the higher the accuracy of the model. The UTS prediction model achieved an R² of 0.93 on the training set and 0.92 on the test set, with a root mean square error (RMSE) of 22.29 MPa (Fig. 4d). The EL prediction model reached an R² of 0.97 on the training set and 0.95 on the test set, with an RMSE of 0.95% (Fig. 4e). The Hardness prediction model achieved an R² of 0.96 on the training set and 0.93 on the test set, with an RMSE of 6.91 HV (Fig. 4f). All three models demonstrated high predictive accuracy on both the training and test sets, with R² differences of less than 0.03, indicating strong generalization ability. No significant overfitting or underfitting was observed, providing a stable and reliable basis for the further application of these prediction models.

Model interpretability analysis

The interpretability analysis of ML-based alloy performance prediction models not only helps understand the impact of different alloy compositions on performance but also reveals the potential interactions and relationships between compositional features. This analysis is fundamentally based on the training and prediction sample points, where both the quality and quantity of these points significantly affect its effectiveness.

Inspired by the strategies employed by Chen et al. 50 and Dong et al. 51 to improve the accuracy of predictive models via virtual sample generation, this study utilized PSO to generate virtual samples before conducting interpretability analysis of the models. This approach expanded the original dataset, enriching the original feature space and providing a more robust data basis for the interpretability analysis. Figure S2a–c illustrate the distribution of the original dataset across features for UTS, EL, and Hardness, respectively. It is evident that the distribution of the original dataset in the feature space was uneven, limiting the extraction of relational information among model features. In contrast, Figure S2d–f depict the distribution of the augmented datasets across features, exhibiting a more uniform distribution with richer sample information. This enhanced distribution helps better elucidate the relationships among model features.

Furthermore, the SHAP values for each model were calculated based on the augmented dataset, facilitating a more intuitive analysis of how input features affect target performance. Figure 5a visualizes the relative importance of all compositional features for the predictive models of UTS, EL, and Hardness, along with the overall importance ranking for the combined mechanical properties. Features located further outward in the ring indicate greater importance for the overall mechanical properties of the Zn alloy, while the relative importance of a compositional feature reflects its effect on the corresponding target performance in the predictive model. By observing the relative importance of the features in each model, it can be deduced that Cu and Al were the dominant features affecting UTS performance. As regards EL, Ce and La had a significant impact, with Mg and Cu exhibiting nearly equal importance. In the Hardness prediction model, Mg and Cu were the most dominant features. From the overall importance ranking, it becomes evident that Cu, Mg, and Al were the three most critical elements affecting the mechanical properties of Zn alloys. Figure 5b–d visualize the SHAP values for all samples in the augmented datasets of the three mechanical properties. The horizontal axis represents the SHAP value, with its magnitude indicating the extent to which a feature affects target performance. Positive or negative SHAP values indicate the direction of the feature effect on target performance, with positive values suggesting a positive contribution to the performance prediction. The color of each sample point reflects the magnitude of a feature for that specific sample. In the UTS model, higher values of Al, Cu, and Si are associated with higher predicted target values, suggesting that these three elements contribute positively to enhancing the strength of Zn alloys. In the EL model, higher values of Ce, La, and Mn are associated with higher predicted elongation, suggesting that the addition of RE elements and Mn contributes positively to the toughness of Zn alloys. In the Hardness model, Mg, Cu, and Si are positively correlated with the predicted hardness, highlighting their significant role in improving the hardness of Zn alloys. Although Al, Ti, Ce, and La also contribute to hardness improvement, their effects are less pronounced. Furthermore, combining the SHAP value analyses from both the UTS and EL models reveals that a high Mg content affects negatively both target performances. However, a positive correlation with both target properties is observed with decreasing Mg content, suggesting that trace amounts of Mg are beneficial for developing high-strength, tough Zn alloys.

Fig. 5: SHAP analysis of the prediction model.
figure 5

a Ringed bar chart visualizing the relative and comprehensive importance of all features. The red, yellow, and blue colors correspond to the relative importance of each feature in the UTS, EL, and Hardness prediction models, respectively. SHAP values of all feature sample points in the prediction models: b UTS model; c EL model; d Hardness model.

Composition design system development and application

In alloy design, it is essential not only to enhance a single target property but also to comprehensively optimize multiple properties. In certain specific application scenarios, performance must meet requirements within a specified range. For instance, when Zn alloys are used as biodegradable materials in the medical field, the implant material must possess human bone-like mechanical properties along with a moderate degradation rate in the body. To enhance the application of ML in alloy design, this work introduces ZACDS, which is based on the previously developed predictive models for three Zn alloy properties and the BOA, addressing compositional design tasks under various target property constraints. The flowchart of ZACDS is illustrated in Figure S3. The core concept involves using the BOA to iteratively search within a defined space, minimizing the error between predicted and target performances to achieve optimal composition design. The specific method is described below.

First, it is crucial to establish the constraints. After defining the target performance metrics for UTS, EL, and Hardness, the ranges of all compositional features should be determined based on additional criteria, e.g., cost, thereby constructing the search space for the composition design scheme. Notably, to reduce the potential uncertainty and prediction errors in the model during extrapolation, the feature ranges should be aligned with the fluctuation intervals of the content of each element in the dataset, enhancing the reliability of the output results. The objective function for the BOA is defined as follows:

$${Score}={W}_{{UTS}}\left|{1-R}_{{UTS}}\right|+{W}_{{EL}}\left|{1-R}_{{EL}}\right|+{W}_{{HB}}\left|{1-R}_{{HB}}\right|$$
(1)

where \({Score}\) is the objective value in BOA, \({W}_{{UTS}}\), \({W}_{{EL}}\), and \({W}_{{HB}}\) are the importance coefficients for UTS, EL, and Hardness, respectively, with values of either 1 or 0, and \({R}_{{UTS}}\), \({R}_{{EL}}\), and \({R}_{{HB}}\) are the ratios of predicted to target values for UTS, EL, and Hardness, respectively.

Due to the significant differences in the magnitudes of the three performance metrics, simply summing their prediction errors makes it difficult to obtain valid results via error minimization. To address this issue, the ratio of predicted to target values was utilized. Furthermore, an importance coefficient was assigned to each performance metric, reflecting its significance. For instance, when a single target performance is specified for UTS, its coefficient is set to 1, while the coefficients for the other two performances are set to 0. This allows the composition design to focus on a single target. In scenarios requiring multi-objective design tasks with two or three specified performances, the corresponding coefficients are set to 1, while those for the other performances are set to 0. This approach effectively addresses all potential specific performance design tasks. Finally, iterative Bayesian optimization is used to minimize the objective value score, achieving minimal error in target performance. Once the maximum number of iterations is reached, the minimum error value is compared with the allowable deviation; if the condition is met, the design solution is provided; otherwise, “no solution” is output. In cases where a design solution is not obtained during runtime, the number of iterations is increased or the allowable deviation is relaxed. In addition, adjusting the search space can also improve solution efficiency.

Considering the alloy design cost and the interpretability analysis results, Al, Cu, and Mg were selected as the search elements. Referring to the distributions in the dataset, the search ranges were set as 3-8 wt.% for the Al content, 0-6 wt.% for the Cu content, and 0-0.05 wt.% for the Mg content.

The mechanical properties were derived from samples in the dataset that contain doped Al, Cu, and Mg elements, as well as the performance of commercial Zn alloys outlined in GB/T 1175-2018 (Fig. 6). The visualization of these sample points provides a design space with optimized overall mechanical properties. UTS was given as 300 MPa, with a permissible deviation of 20 MPa; EL was specified at 3.5%, with an allowable deviation of 0.5%; hardness was given as 130 HV, with a permissible deviation of 10 HV. After 50 iterations, the desired composition was successfully identified and designated as ZACM alloy, with predicted Al content of 3.96 wt%, Cu content of 5.83 wt%, and Mg content of 0.05 wt%. The predicted UTS, EL, and hardness performances were 298.52 MPa, 3.44%, and 128.64 HV, respectively, represented by the yellow star in Fig. 6. To verify the accuracy of the prediction, the ZACM alloy was prepared, and component testing was conducted using inductively coupled plasma optical emission spectroscopy (ICP-OES). The actual Al, Cu, and Mg contents were found to be 4.01 wt%, 5.92 wt%, and 0.02 wt%, respectively. The UTS, EL, and hardness values of the prepared ZACM alloy were 298 ± 14 MPa, 3.2 ± 0.2%, and 133 ± 2 HV, with the test results shown in Figure S4 and represented by the red star in Fig. 6. The errors between the experimental and predicted UTS and hardness values were both less than 3%, while that concerning the EL performance was slightly higher (7.5%). All errors were within the acceptable range, and the performance was within the initially specified range.

Fig. 6: ZACDS application and composition optimization.
figure 6

a Three-dimensional distribution of samples based on their UTS, EL, and Hardness properties. b Two-dimensional distribution of samples based on their UTS and Hardness properties. c Two-dimensional distribution of samples based on their Hardness and EL properties. The blue data points represent the training points in the dataset, the green data points correspond to commercial alloys, while the yellow and red stars represent the predicted and actual ZACM alloy samples, respectively. The overlapping areas of the color blocks are the specified performance ranges.

Microstructure analysis

To further evaluate the performance of the newly designed alloys, this work includes a comparison with the commercial zinc alloy ACuZinc5, known for its high strength and wide application. According to the ASTM 240-13 standard, the Al content of ACuZinc5 is 2.8–3.3 wt%, the Cu content is 5.2–6 wt%, and the Mg content is 0.035–0.05 wt%. The alloy was cast using the same melting process as the newly designed zinc alloy ZACM. The composition was analyzed using ICP-OES, and the results showed that the Al content was 3.3 wt%, the Cu content was 5.5 wt%, and the Mg content was 0.04 wt%, all of which conform to the standard. The tensile strength, elongation, and hardness values of ACuZinc5 were 253 MPa, 0.8%, and 128 HV, respectively. Compared to ACuZinc5, the newly designed ZACM zinc alloy exhibited a 17.8% improvement in tensile strength, along with increased elongation and hardness.

To further investigate the reasons for the performance enhancement of the new alloy, microstructural characterization was conducted. As shown in Figure S5, the optical microscopy results at different magnifications are presented. Both ZACM and ACuZinc5 exhibit the precipitation of a second phase. The precipitates in ZACM appear shorter and smaller, while those in ACuZinc5 are more elongated. In addition to the precipitate phase, some regions with finer microstructures (such as the small area indicated by the red box) can be observed in Figure S5d. A similar region also appears to be visible in Figure S5h, although it is very small in area.

To clarify the composition of the precipitate phase and better observe the regions with fine microstructures, further observations were conducted using a scanning electron microscope. As shown in Figure S6, the SEM and EDS characterization results of the two alloys are presented. Points 2 and 4 in the figure correspond to the precipitate phases observed under the optical microscope, and their composition analysis results are shown in Table S7. The Zn and Cu contents at these two points are highly consistent with the CuZn4 phase reported in the literature52,53. Furthermore, these studies have shown that CuZn4 exhibits high hardness, which contributes to the improvement of the alloy hardness properties. Additionally, from the overall elemental analysis and the characterization results at points 3 and 6, we observed that the regions with fine microstructures mainly consist of Zn and Al elements. This microstructure is highly consistent with the Zn-Al eutectic region reported by Priya et al. 54. Their work suggests that, with a higher zinc content in the alloy, each eutectic colony can be considered a grain containing a zinc-rich layer. Compared to ACuZinc5, the ZACM alloy exhibits a more distinct and abundant Zn-Al eutectic region. The presence of the Zn-Al eutectic region refines the Zn phase in the ZACM matrix and impedes the growth of CuZn4, making it shorter and smaller. This change leads to finer grains in ZACM, which, according to the Hall-Petch55 grain refinement strengthening effect, significantly enhances the strength of the material.

Discussion

This study introduces a alloy design strategy developed using classical ML algorithms. The strategy ensures model reliability while avoiding the limitations associated with model specialization56, making it more user-friendly and easier to understand. This is of great significance in advancing the widespread application of ML in traditional alloy design processes and achieving practical benefits. In addition, inverse design is a key approach for accelerating material development and discovering superior new materials57,58,59. To this end, this study also proposes a framework for reverse design of alloys under multiple constraints. It not only enables efficient alloy design, but also allows for precise adjustments based on specific requirements by optimizing the objective function. This further enhances the adaptability and efficiency of the design process. The data-augmented model interpretability method makes the impact of each feature on target performance more visually intuitive and clear. This approach also effectively narrows the complex compositional design space, significantly reducing computation time and costs. Furthermore, This is a successful attempt at the initial application of ML for predicting and optimizing the mechanical properties of Zn alloys, which paves the way for the development of high-strength Zn alloys. Meanwhile, established a comprehensive dataset of Zn alloys, incorporating composition and performance information from the literature, the MatWeb database, and laboratory data, which will provide robust support for future data-driven zinc alloy design. Meanwhile, a comprehensive dataset of Zn alloys was established, incorporating composition and performance information from the literature, the MatWeb database, and laboratory data, which will provide robust support for future data-driven zinc alloy design. Finally, the significant potential of machine learning in guiding the development of Zn alloys was further validated by testing the mechanical properties of the designed alloys.

Methods

Performance prediction models

Establishing a reliable predictive model is fundamental to any ML application, serving as the foundation upon which all different application pathways are built and developed. The construction of the predictive models for this study can be broadly divided into the following steps:

In the first step, three datasets were constructed for three Zn alloy properties: UTS, EL, and hardness. The data were obtained from laboratory experiments, relevant literature, and publicly available databases. To eliminate the influence of post-processing techniques, such as die-casting and heat treatment, on alloy performance, only data from cast Zn alloys were retained. Each dataset contains both alloy composition and the corresponding mechanical properties. The UTS dataset consists of 185 alloy samples, the EL dataset contains 173 alloy samples, and the hardness dataset includes 155 alloy samples. These datasets will be split into training and testing sets. The training set will primarily be used for constructing prediction models, while the testing set will be used to evaluate the accuracy of the models.

The second step was feature selection, which is critical for identifying input features that can accurately predict the desired properties. In this work, Pearson correlation, Spearman correlation, and RF-based feature importance ranking were used to comprehensively evaluate and select input features. Pearson and Spearman correlations provide insights into both linear and monotonic relationships between feature variables60. The RF-based feature importance ranking was assessed via two metrics: MDI and PI61. This balanced approach mitigates bias and limitations in feature evaluation, facilitating highly accurate identification of key features. Furthermore, to avoid potential data leakage risks, feature selection must be performed on the training set.

The third step involved model training and selection. To select an appropriate dataset partition ratio and algorithm for constructing the predictive model, this study performed 100 random splits of the dataset under seven different partition ratios (training set to test set sample ratios of 3:7, 4:6, 5:5, 6:4, 7:3, 8:2, and 9:1). Three commonly used ensemble algorithms for predictive modeling were chosen as candidates: XGBoost, RF, and AdaBoost. The three algorithms were trained under different partition ratios, and the prediction performance of the models was evaluated based on the RMSE and coefficient of determination R² for both the training and testing sets. The equations for the two evaluation metrics are provided in Eqs. (2) and (3):

$${RMSE}=\sqrt{\frac{1}{n}\mathop{\sum }\limits_{i=1}^{n}{(\,{y}_{i}-{\hat{y}}_{i})}^{2}}$$
(2)
$${R}^{2}=1-\frac{{\sum }_{i=1}^{n}{(\,{y}_{i}-{\hat{y}}_{i})}^{2}}{{\sum }_{i=1}^{n}{(\,{y}_{i}-{\bar{y}}_{i})}^{2}}$$
(3)

where \(n\) is the total number of datapoints, \({y}_{i}\) is the actual value, \({\hat{y}}_{i}\) is the value predicted by the model, and \({\bar{y}}_{i}\) is the mean of the actual sample data.

In the fourth and final step, the hyperparameters of the selected model were optimized. Appropriate hyperparameters not only enhance the predictive performance of a model but also prevent overfitting and underfitting, thereby increasing model reliability. Given that ensemble algorithms involve numerous hyperparameters, BOA was chosen to conduct the search within this extensive high-dimensional space, improving the efficiency of the optimization process.

Virtual sample generation

This study employed a virtual sample generation method based on the PSO62 algorithm to augment the original dataset with sample points in conjunction with the Zn alloy performance prediction model. Initially, the asymmetric acceptance domain was calculated based on the sample points from the original dataset, restricting the range of generated feature values. The lower bound (LB) and upper bound (UB) of the range were calculated as follows:

$${LB}={AVG}-\frac{{N}_{L}+{N}_{U}+1}{{N}_{U}}\times ({AVG}-\min )$$
(4)
$${UB}={AVG}+\frac{{N}_{L}+{N}_{U}+1}{{N}_{L}}\times (\max -{AVG})$$
(5)

where \({AVG}\), \(\min\), and \(\max\) represent the mean, minimum, and maximum values of the corresponding features for all samples, respectively, \({N}_{L}\) is the number of sample points with values smaller than \({AVG}\), and \({N}_{U}\) is the number of sample points with values larger than \({AVG}\).

Subsequently, the feature values of virtual samples were iteratively generated using the PSO algorithm. Then, these feature values were input into the prediction model to obtain the corresponding target performance, ultimately forming a complete virtual sample. In addition, to ensure the quality of the virtual samples, the dataset underwent normalization and denormalization, and the final virtual sample set was subjected to outlier detection utilizing the Z-Score algorithm.

Tools for model interpretability

SHAP63 is a game theory-based interpretability method, widely applied in ML model interpretability research. In this study, SHAP analysis was introduced into the Zn alloy performance prediction models to quantify the contribution of each component to the prediction outcome, thereby revealing the effects of different elements in the alloy on mechanical properties. By calculating the marginal contribution of each feature under different input combinations and visualizing these values, SHAP can intuitively demonstrate the impact of various components on performance predictions. SHAP analysis not only enhances model interpretability but also helps identify key components and their underlying mechanisms.

Experiment

The novel alloy ZACM and the commercial alloy ACuZinc were cast under argon protective atmosphere using a vacuum induction furnace with an alumina crucible. The raw materials included high-purity Zn ingots (99.99%), pure Al (99.5%), pure Mg (99.95%), and CuZn25 master alloy. After placing the raw materials into the crucible, the metals were heated to 700 °C and, following a 2 min refinement and temperature adjustment, were cast into ingots. Their composition was analyzed via ICP-OES, and test specimens for mechanical property evaluation were prepared via wire cutting.

Alloy performance was assessed via strength and hardness testing. The tensile testing specimens were dog bone-shaped, with a gauge length of 20 mm and cross-sectional area of 5 mm × 2 mm. The UTS and EL were measured at room temperature using a KV-100KNW universal testing machine at a constant strain rate of 0.02 mm·s1. For hardness testing, block specimens measuring 10 mm × 10 mm × 3 mm were prepared. Their surface was ground using SiC sandpapers with grit sizes ranging from 400 to 2000, followed by polishing with diamond suspensions of 3.5, 1.5, and 0.5 microns. Hardness was measured using an HVS-1000 Vickers hardness tester. The sample was tested at five different locations, and the mean value was taken as the final hardness value.

Evaluation of the microstructure of the studied alloys was carried out using an optical microscopy and a scanning electron microscopy (SEM, SU-5000) with an energy dispersive spectrometer (EDS). All the samples for microscopic observation were polished, etched with an etchant composed of 4 ml nitric acid and 95 ml ethanol for 2–3 seconds.