Introduction

In modern engineering and manufacturing, Al-Si alloys are among the most widely used aluminum alloys due to their exceptional properties, including low density, high specific strength, excellent plasticity, superior electrical and thermal conductivity, and corrosion resistance1,2,3,4. These characteristics make Al-Si alloys highly suitable for applications in the automotive, aerospace, and defense industries5,6. As the demand for lightweight and high-performance materials continues to increase, developing and optimizing high-strength Al-Si alloys have become increasingly important7.

Designing high-performance Al-Si alloys remains challenging due to compositional and processing complexity. Alloying with various elements forms strengthening phases8,9,10,11, creating a large compositional space with over 1000 possible combinations, even when considering only common metallic and rare earth elements12,13. Additionally, Al-Si alloys can be processed through diverse manufacturing routes, such as heat treatment, hot extrusion, and rolling14. The combination of these compositional and processing variables results in a large design space, making it difficult to identify optimal solutions using traditional approaches, such as trial-and-error experiments combined with phase diagram calculations and density functional theory analysis15,16,17,18. These methods often lead to prolonged development cycles, substantial costs, and intensive labor requirements19,20,21.

Recent advances in machine learning (ML) have shown promise in accelerating materials design22,23,24,25,26,27 by directly linking alloy properties with composition and processing parameters, avoiding the cumulative errors that often arise from multiple-mechanism complexity28. ML approaches have been successfully applied to various alloy designs with high performance29,30,31,32,33,34,35, such as high entropy alloys24,33,34,35,36, Al-based alloys29,30,31,37, and Cu-based alloys32. These approaches primarily focused on the composition design of a specific process or the optimization of process parameters to achieve desired performance31,32,33,35. However, they are largely limited to alloy design within a single process, requiring the construction of distinct datasets for different processes, which resulted in significant labor and time costs. Apparently, the above ML approaches cannot satisfy the demand for designing Al-Si alloys, which need diverse processes to achieve specific strength characteristics for different service environments, especially when requiring expensive and complex processes to design high-strength alloys for special applications, such as aerospace38, where limited data hinders effective ML training. Although some conventional PRs, such as simple casting, can generate abundant data due to their accessibility, advanced PRs, such as specialized hot extrusion, yield limited experimental data due to their complexity and cost39. This data imbalance across different PRs prevents traditional ML models from establishing robust composition-processing-property relationships40.

Active learning has emerged as a promising solution to address small-sample challenges in alloy design41,42, employing a surrogate model to iteratively select the most informative, though unseen, data points to accelerate high-performance alloy design43. However, current active learning applications in alloy design primarily use simple surrogate models and Bayesian optimization44,45, which are better suited for low-dimensional data and often require multiple iterations to achieve substantial performance improvements46,47. Furthermore, these approaches typically treat different PRs independently and fail to leverage potential synergies between readily available and scarce experimental data.

To address these issues, in this work, we propose a process-synergistic active learning (PSAL) framework to enable the efficient design of high-strength Al-Si alloys through multi-process collaborative prediction. This novel framework uniquely consolidates data across different PRs, leveraging abundant data from accessible PRs to enhance prediction accuracy for complex processes with limited data. This approach integrates generative models to explore the most informative compositional space across various PRs, along with Bayesian-optimized surrogate models to select high-strength compositions. This framework significantly reduces the required number of iterations through iterative experimental validation and feedback loops while simultaneously accelerating the design of high-strength Al-Si alloys across multi-PRs.

Results

PSAL framework

To address the diverse processing requirements for optimizing Al-Si alloy performance and mitigating data imbalances associated with complex PRs, we introduce the PSAL framework. This framework enables the efficient design of high-strength alloys across multi-PRs, while minimizing iteration cycles and enhancing prediction accuracy, especially for processes with limited experimental data. The PSAL framework consists of five integrated components (Fig. 1): dataset construction (Fig. 1a), composition generation (Fig. 1b), surrogate model development (Fig. 1c), candidate selection (Fig. 1d), and experimental validation. The framework operates through an iterative workflow: dataset construction first provides a foundational dataset for downstream tasks. Next, composition generation produces initial candidate alloy formulations, followed by surrogate model development to build predictive models for alloy performance. Candidate selection then identifies the most promising formulations based on model predictions, and experimental validation tests these selected candidates, feeding results back into the model for subsequent optimization cycles.

Fig. 1: Overview of the process-synergistic active learning framework.
figure 1

The framework comprises following main components. a Database construction through integration of experimental with literature-derived composition-process-property data, encompassing seven principal alloy elements beyond Al and Si, with PRs encoded via one-hot encoding. b Generation of process-specific alloy compositions latent space utilizing a c-WAE architecture. c Implementation of an ensemble surrogate model for performance prediction based on PRs and alloy compositions. d Systematic selection of top-ranked alloy compositions for experimental validation, followed by iterative database enrichment for model refinement until high-strength alloys are found.

By integrating experimental and literature-derived data, a comprehensive database comprising 140 composition-process-property entries can be established. The dataset consists of seven principal alloying elements, namely, Mg, Cu, Ni, Zn, Fe, Mn, and Cr, in addition to Al and Si. These are studied across four distinct PRs: gravity casting (GC), GC with T6 heat treatment (GC + T6), GC with hot extrusion (GC + HE), and GC with combined hot extrusion and T6 treatment (GC + HE + T6). The detailed database construction methodology is presented in the Methods section.

The PSAL framework employs c-WAE architecture to generate a large number of potential Al-Si alloy compositions tailored to different processing requirements48. The encoder transforms high-dimensional compositional data into a low-dimensional latent space while preserving critical physical attributes, thus enabling efficient representation learning. PRs, encoded as conditional variables through one-hot encoding, allow the model to establish process-specific alloy compositional clusters within distinct regions of the latent space. By sampling from these conditional regions, the model generates compositions potentially suitable for various PRs. The decoder then reconstructs these low-dimensional latent representations into specific alloy compositions. Visualization of the latent space reveals the intrinsic relationships among alloy compositions, PRs, and strength metrics, thus accelerating the identification of high-strength combinations across different PRs.

The PSAL framework develops an ensemble surrogate model by integrating the strength predictions from neural networks (NN) and extreme gradient boosting decision trees (XGBDT), leveraging their complementary advantages49,50,51. Trained with alloy compositions and PRs as input features and strength as outputs, the model capitalizes on the NN to capture complex nonlinear relationships in high-dimensional spaces, while XGBDT enhances predictive accuracy through feature importance evaluation. For better prediction, the hyperparameters of both models are fine-tuned via Bayesian optimization and random search over iterations. The predicted strength of sampled compositions is then determined by averaging the outputs of both models with all hyperparameter configurations (Supplementary Fig. S1).

For optimizing candidate selection, a ranking criterion based on the exploration-exploitation strategy is implemented due to the probable bias and uncertainty between the predicted and experimental results. This strategy leverages the mean predicted strength value to facilitate exploration by identifying compositions with high strength. Meanwhile, the standard deviation is used to guide the selection towards candidates with low uncertainty. By integrating these two factors, the ranking criterion ensures a systematic balance between exploring new compositions and exploiting high-strength candidates (detailed ranking criterion is presented in the Methods)12. Thus, the candidate compositions with high-ranking scores are prioritized. To ensure compositional diversity and mitigate experimental bias, alloy compositions within each validation set maintain a minimum differential of 0.5% in mass percent (MP) for at least one constituent element, as the following expression showed.

$$\left|{{\rm{MP}}}_{j,i}-{{\rm{MP}}}_{k,i}\right|\ge 0.5 \% ,$$
(1)

where j and k mean the j-th and k-th compositions in the set, respectively. The i represents the i-th element in the compositions.

The top-ranked three candidates undergo experimental validation, with results iteratively incorporated into the database for further model refinement. Through these iterative optimization cycles, the PSAL framework demonstrates exceptional effectiveness in identifying Al-Si alloy compositions with enhanced tensile strength.

Data distribution imbalance

From the analysis of the collected dataset, a significant imbalance in data volumes across different PRs is revealed (Fig. 2a). The data volumes for GC, GC + T6, GC + HE, and GC + HE + T6 PRs are 38, 70, 13, and 19, respectively. This imbalance primarily comes from the varying inherent complexity and economic considerations associated with each PR.

Fig. 2: Dataset distribution analysis and multi-process synergistic effect on model performance.
figure 2

Key data analyses for dataset are shown in (ac). a Sample distribution across PRs (striped bars) and corresponding UTS values (scatter points). b Pearson correlation heatmap illustrating compositional influence on UTS for the GC PR. c The t-SNE visualization of dataset distribution for multi-PRs synergistic data. The synergistic effect is presented in (d) and (e). d Comparative analysis of the NN and XGBDT model performance across single-process and multi-PRs synergistic datasets. The 3D bar chart presents the MAE in UTS predictions for each model and PR. The zoomed-in regions highlight the MAE and fluctuation range for GC PR predictions under both single-process and multi-PRs synergistic conditions. e Parity plots of predicted vs. experimental UTS values, with R² values quantifying the prediction accuracy for individual single-process and multi-PRs scenarios.

The GC PR is primarily used for casting ingots, which serve as precursors for heat treatment and hot extrusion. The GC + T6 PR incorporates T6 heat treatment of the cast ingots, optimizing the alloy’s grain structure and reinforcing precipitates (e.g., Si particles), thereby enhancing ultimate tensile strength (UTS), hardness, and corrosion resistance9. The T6 heat treatment significantly improves the UTS while maintaining cost efficiency, making it suitable for a wide range of applications, including automotive and construction industries.

In contrast, the GC + HE PR is more complex, requiring high-temperature mechanical extrusion of the cast ingots to improve the alloy’s mechanical properties. Hot extrusion demands higher equipment investment, greater energy consumption, and longer processing times, resulting in increased costs39. The GC + HE + T6 PR, involving further heat treatment on the GC + HE-treated alloys, could continuously improve mechanical properties; however, it is more complex and costlier. Therefore, these processes have been typically used only when higher strength and specialized performance are required, such as in aerospace applications52. As a result, the data volumes for GC and GC + T6 PRs are generally larger than those for GC + HE and GC + HE + T6 PRs.

ML-based predictive models are traditionally constrained by the scarcity of experimental data. To overcome this limitation, we propose a novel approach that systematically leveraged system-level similarities and synergistic data effects to enhance model prediction accuracy, thereby enabling rapid iterative optimization of high-strength Al-Si alloys, particularly for GC + HE and GC + HE + T6 PRs.

Multi-process synergistic effect

The limited availability of process-specific experimental data can hinder the effective training and refinement of ML-based models. To address this, we propose an innovative approach that enhances model performance by systematic integration of data across multi-PRs. Crucially, the effectiveness of this approach fundamentally depends on two key factors: the existence of inherent correlations within the dataset and the potential for meaningful data synergies across different PRs.

Comprehensive correlation analysis reveals that the influence of alloy elements on mechanical properties varies significantly across different PRs (Fig. 2b and Supplementary Fig. S2a–c), indicating that single-process data alone inadequately captures the complete composition-performance relationships across various PRs and prevents direct generalization. Our multi-PRs synergistic approach addresses this limitation by enabling comprehensive mapping of composition-performance correlations across multi-PRs, thereby enhancing the predictive capability of the model.

We subsequently conduct t-distributed stochastic neighbor embedding analysis, which demonstrates that data points from different PRs exhibit overlapping distributions due to their shared alloy system and elemental composition (Fig. 2c). This intertwining indicates that the model successfully captures cross-process shared and similar characteristics53, facilitating the improved alloy generalization of the model. Notably, the reconstructed data points in the 2D latent space maintain similar distribution patterns with the original dataset (Supplementary Fig. S3), validating the preservation of key compositional features and enhancing both the stability, accuracy and reliability of model predictions

Quantitative evaluation of model performance through comparative analysis of single-PR versus multi-PRs synergistic approaches reveals significant improvements in both accuracy and stability. Performance metrics, including mean absolute error (MAE) and coefficient of determination (R²) (detailed definitions provided in the Methods section), demonstrate substantially reduced prediction errors and improved prediction stability for multi-PRs synergistic data (Fig. 2d)54,55. Specifically, for the synergistic GC PR, the XGBDT model achieves UTS prediction error of 9.3 MPa, with an error fluctuation of 6.9 MPa, both notably lower than those for single-process predictions (i.e., 17.7 and 13.2 MPa). The NN model exhibits a similar trend. The model selection and implementation details can be found in the Methods section.

Furthermore, the NN and XGBDT models both demonstrate superior performance on the multi-PRs synergistic test set, achieving R² values exceeding 0.9 (i.e., NN: 0.914, XGBDT: 0.912), significantly outperforming the single-process test sets (Fig. 2e). These findings strongly suggest that multi-PRs synergy effectively enhances both prediction accuracy and stability, which could be attributed to the increased data diversity and representativeness across multi-PRs, enabling a more robust understanding of the composition-performance relationships.

Process-aware compositional latent space

The c-WAE encoder effectively reduces dimensionality while enabling PRs to guide the organization of alloy data within distinct regions of the latent space (Fig. 3a). Four well-defined clusters can be distinguished, each corresponding to a specific PR with clear boundaries. Notably, alloys processed using GC + T6 PR demonstrate consistently superior UTS compared to other PRs, consistent with observations in Fig. 2a. Within each cluster, a systematic arrangement emerges from left to right based on Si content, with compositions featuring similar Si content naturally clustering together (Fig. 3a). This inherent organization facilitates targeted design of Al-Si alloys with varying Si content for specific PRs, such as hypoeutectic and hypereutectic Al-Si alloys56,57.

Fig. 3: Process-aware latent space analysis and sampling.
figure 3

The panels highlight how the latent space guides process-specific sampling and optimization. a Visualization of alloy compositions in the process-aware latent space, where distinct clusters represent different PRs, with data points color-coded according to UTS values. Symbol shapes indicate varying ranges of Si content. b Sampling workflow in the compositional latent space for a specified PR (exemplified by GC + T6). c Progressive optimization through three iterative cycles for GC + T6 PR, showing the latent space evolution and GMM probability density map refinement toward high-strength alloy compositions. d Accelerated optimization pathway for GC + HE PR, achieving high-strength alloy compositions within only a single iteration.

A particularly significant feature of the latent space organization is the central positioning of GC PR data points, serving as a foundational reference nucleus for overall distribution. Other PRs generate clusters aligned diagonally around the central GC cluster, reflecting a hierarchical relationship in processing complexity. Notably, alloys produced via the GC + T6 PR are categorized as cast aluminum alloys, while those prepared via GC + HE and GC + HE + T6 PRs are classified as deformed aluminum alloys. The observed correlation between cluster distance from the GC reference and processing complexity aligns with established metallurgical principles (Fig. 2a)58, where the greater the separation of the cluster from the central GC cluster, the more sophisticated the PRs, and the greater the enhancement in UTS.

This process-aware latent space organization demonstrates the proposed PSAL framework’s capability to capture and represent complex interactions among alloy compositions, PRs, and resultant UTS. The physically meaningful pattern of the latent space provided a robust foundation for sampling and designing high-strength alloy compositions through targeted exploration of specific regions associated with desired PRs and property combinations.

Latent space sampling and active learning iteration

By leveraging four distinct processing-specific clusters identified within the latent space (Fig. 3a), we aim to design high-strength Al-Si alloys using the GC + T6 and GC + HE PRs. The GC + T6 PR is selected for its superior mechanical properties, while the GC + HE PR is chosen to demonstrate the current model’s ability to effectively address the challenge of data scarcity.

The sampling workflow for the GC + T6 PR is illustrated in Fig. 3b. Initially, the Gaussian Mixture Model (GMM) is used to define the boundaries of the GC + T6 clusters (Supplementary Fig. S4). Then, the Markov Chain Monte Carlo (MCMC) method is employed to sample compositions within these clusters59,60. During the sampling process, a trained binary classifier is used to validate the features of each sampled point (Supplementary Fig. S5). The validated compositions, which are marked as yellow symbol “+” in the GMM probability density map (Fig. 3c), are then reconstructed into high-dimensional space by the decoder, revealing their elemental and processing details.

The sampled alloy compositions and their respective PRs are employed to predict the UTS using the well-trained surrogate model and ranked by the ranking criterion ((details provided in the Methods section). Using this approach, top-ranked Al-Si alloy composition candidates can be well-identified (Supplementary Table S1), with the top three (A1, A2, and A3) selected for experimental validation (Table 1). The positions of these alloys are highlighted in the GMM probability density map (lower row of Fig. 3c).

Table 1 Composition (wt%) and UTS (MPa) of sampled Al-Si alloys in iterations

Subsequently, alloy samples of these three compositions are cast into dog-bone-shaped specimens (Fig. 4b and Supplementary Fig. S6), and experimental tensile tests are performed to determine their actual UTS, as shown for the A1–A3 groups in Table 1. Notably, the experimental values do not fully align with the predicted range during the first iteration, underscoring the need to incorporate the new composition-performance data back into the original dataset. This feedback allows the PSAL framework to adjust model parameters and enhance prediction accuracy.

Fig. 4: Experimental validation and mechanical performance of high-strength Al-Si alloys.
figure 4

a combination of structural analysis and mechanical testing is presented to evaluate alloy performance. a XRD patterns, and b and c SEM-EDS analyses for alloys C2 (GC + T6 PR) and D2 (GC + HE PR). d Stress-strain curves of selected high-strength alloys from GC + T6 and GC + HE PRs. e Iterative prediction and validation accuracy assessment across three cycles for GC + T6 PR and a single iteration for GC + HE PR. f Comparison of UTS versus Si content between literature data, previous experimental results, and the high-strength Al-Si alloys developed through this work for GC + T6 and GC + HE PRs.

During the second iteration, the latent space distribution and probability density of the GC + T6 PR are updated, indicating that the model could self-adjust and optimize data feature extraction for improved identification of high-strength Al-Si alloys. By the third iteration, the PSAL framework successfully predicts Al-Si alloys with UTS surpassing the original dataset. A comparison of experimental and predicted values for the GC + T6 PR over three iterations (Table 1, A1 to C3) shows that while the experimental values for one group do not fall within the predicted range in the first and second iterations, all experimental values fall within the predicted range in the third iteration. This reflects the continuous improvement in the model’s prediction accuracy and search ability for high-strength Al-Si alloys with further iterations (Fig. 4e and Supplementary Fig. S7).

For the GC + HE PR, only a single iteration is required to identify high-strength Al-Si alloys, as indicated in Table 1 (D1 to D3) and Fig. 3d. Note that the number of validated compositions (yellow “+”) for the GC + HE PR is much lower than for the GC + T6 PR, due to the smaller size of the initial GC + HE dataset. Experimental validation shows that alloys D1 and D2 outperform the original dataset, and their positions are marked within the probability density map. The need for only a single iteration can be attributed to the generally lower and more consistent performance distribution of the GC + HE PR, which enables the model to effectively capture the relationships among composition, PR, and strength, ultimately predicting Al-Si alloys with superior strength compared with the original dataset.

High-strength Al-Si alloys

Iterative optimization demonstrates the effectiveness of the PSAL framework in designing high-strength Al-Si alloys across multi-PRs within just a few iterations. As shown in Fig. 4d, f, for the GC + T6 PR, the UTS of the optimized alloys C1, C2, and C3 reach 436.6, 459.8, and 454.6 MPa, respectively, exceeding both the highest UTS reported in the literature data (425 MPa) and our own previous results (405 MPa). Similarly, for the GC + HE PR, alloys D1 and D2 achieve UTS values of 208.5 and 220.5 MPa, respectively, outperforming the literature maximum of 198 MPa as well as our prior experiments (191.8 MPa). Notably, our PSAL framework provides valuable insights into the optimal Si content ranges for high-strength Al-Si alloys, suggesting 6–8 wt% for the GC + T6 PR and around 12 wt% for the GC + HE PR. These findings aligned with previously proven values widely employed in practical applications, such as engine cylinder block and piston61,62.

To gain deeper insights into the mechanisms responsible for the enhanced performance of alloys C2 and D2, comprehensive microstructural analyses are conducted, as shown in Fig. 4a–c. The X-ray diffractometer (XRD) results (Fig. 4a) indicate that, similar to conventional Al-Si alloys, the primary phases of both C2 and D2 predominantly consist of α-Al and eutectic Si phases. In alloy D2, the increased magnesium (Mg) content promotes the formation of amounts of Mg₂Si, which plays a critical role in improving its mechanical properties. In contrast, the relatively low concentrations of trace elements, such as iron (Fe), nickel (Ni), and manganese (Mn) in alloy C2 prevent the detection of corresponding phases in the XRD pattern. To further investigate the strengthening phases and their contribution to the alloy’s performance, scanning electron microscopy (SEM) combined with energy-dispersive X-ray spectroscopy (EDS) is employed. The SEM images and EDS mapping analyses of the tensile test specimens of the optimized Al-Si alloys are shown in Figs. 4b, c. In addition to eutectic Si, alloy C2 contains strengthening phases such as Q-Al5Cu2Mg8Si6, γ-Al7Cu4Ni, and α-Al(FeMn)Si. These phases are primarily located at the grain boundaries, strengthening them. The synergistic effects of these phases significantly contribute to the observed enhancement in mechanical properties, aligning with the predictions made by the PSAL framework. The SEM and EDS analysis of D2 (Fig. 4c) further corroborates the XRD findings, revealing that the microstructure primarily consists of eutectic Si and Mg₂Si phases, which are uniformly distributed within the aluminum matrix, thereby enhancing the mechanical properties of D2. Additional characterizations for other alloys (C1, C3, D1) can be found in Supplementary Fig. S8–S10. Spectral analysis of all the Al-Si alloy samples (C1–C3, D1–D2) confirms that the compositional deviations between the experimental compositions and predictions are within 0.5%, highlighting the accuracy of the framework’s predictions and the strong correlation between the designed and realized alloy compositions (Supplementary Table S2).

Discussions

The proposed PSAL framework demonstrates several key advantages in high-strength Al-Si alloy design. Primarily, the integration of multi-PRs into a unified model provides an effective solution for alloy design across diverse PRs. Instead of handling each process independently, and requiring separate models and datasets, our approach consolidates information from various PRs into a shared latent space. This integration not only provides a comprehensive understanding of how different PRs affect alloy performance, but also enhances predictive accuracy by enabling cross-pollination of insights across these PRs. For example, our PSAL framework successfully reconciles disparate mechanical properties resulting from casting versus hot extrusion and identifies high-strength compositions suitable for both processes. Notably, the universality of this approach can extend to other system-level attributes that share correlations or transferable information, such as testing temperature and stress levels of materials.

Subsequently, the PSAL framework effectively addresses the challenge of data scarcity, which often constrains alloy design studies, especially for specialized PRs. By leveraging GMM and MCMC sampling methods, the framework efficiently uses the available scarce data to generate new potential composition candidates. This capability may be particularly valuable for alloy materials with complex processing requirements, where data acquisition is often limited by experimental difficulty and cost, such as high-temperature Al-Si alloys used in aerospace applications52. Our framework mitigates these challenges by effectively modeling and generating high-performance candidates from limited initial data.

Finally, the PSAL framework is found to be particularly advantageous for designing alloys that require complex and expensive experimental validation. Many alloy design scenarios involve resource-intensive or time-consuming processes, such as specialized heat treatments or fatigue endurance testing. To address these challenges, our PSAL framework intentionally incorporates data from simpler and less costly experiments within the same alloy system, thereby augmenting the overall dataset and optimizing the target system. For example, when the hot extrusion process of heat-resistant Al-Cu alloy is too expensive for comprehensive exploration63, data from room temperature heat treatment of the same alloy system can expand the database. This strategy not only conserves resources but also accelerates the discovery and design of high-performance materials.

Despite these advantages, the PSAL framework has several areas that warrant further improvement. One notable limitation stems from the limited availability of yield strength and elongation data in the reference dataset. During the data collection process for specific PRs, UTS data are present in nearly 100% of the collected alloy compositions, whereas yield strength and elongation data are frequently incomplete or missing. Especially in data-scarce and complex PRs, the absence of yield strength or elongation data can reach as high as 52.6% (Supplementary Table S13). Given that UTS is the most extensively studied mechanical property in aluminum alloy research, the current framework focuses on optimizing this single performance metric. Future developments should incorporate multiple performance metrics concurrently to facilitate the development of alloys with an optimal balance of properties. This would ultimately produce alloys with more comprehensive and industry-relevant performance profiles. However, these performances may also depend on other conditions, such as temperature, time, or stress levels, which are not always strictly related to PRs. Future improvements should consider a broader set of conditions in the optimization process, which would provide a more holistic view of alloy performance across different scenarios.

Another potential limitation is the reliance on surrogate models, such as NN and XGBDT, which may not always fully capture the complexity of the relationships between composition, PRs, and performance. Additionally, the optimization of process parameters has not been handled with sufficient granularity, and a more detailed treatment of processes may be required to improve the overall design accuracy. This lack of precision in process optimization could eventually necessitate the use of more sophisticated models. To address these challenges, incorporating more advanced ML techniques, such as graph neural networks or Transformer-based architectures64,65, could enhance the prediction accuracy by better capturing the intricate interactions within the material system. Furthermore, the integration of additional data types, such as image-based information from field emission scanning electron microscopy (FESEM) with an EDS, could provide valuable insights into the microstructural behavior of the alloys. These data types would require more complex models to effectively interpret the information and improve the model’s predictive capabilities.

Unveiling the intrinsic relationships among composition, process, microstructure, and performance has long been a central challenge in materials design66. This complexity arises from the inheritable characteristics of microstructure and performance, their dynamic evolution over time, and the nonlinear interactions between different processing steps. However, the proposed PSAL framework, with further optimization and extension, holds the potential to effectively address this issue, offering novel insights and methodologies for materials design and performance regulation.

In summary, the proposed PSAL framework offers an efficient and data-driven approach to the design of high-strength Al-Si alloys, particularly for PRs with scarce experimental data. By integrating diverse PRs into a unified model, this framework significantly enhances predictive accuracy and stability, especially for processes with limited datasets, enabling the identification of potential high-strength compositions. Experimental validations of the designed alloys, which achieve UTS surpassing existing benchmarks, demonstrate the effectiveness of this ML-driven, multi-PRs synergy approach in accurately predicting and optimizing alloy performance, even when data is limited.

Methods

Data creation and collection

The Al-Si alloys exhibited a wide range of compositions due to their various grades. To create a unified and high-quality dataset, alloy compositions and properties were compiled from our experimental data, supplemented by relevant data from existing literature with matching compositions, yielding an initial set of 170 records. To ensure dataset quality, the remaining data were stratified by processing route and randomly split into an 80% training set and a 20% test set. Within the training set, we performed five-fold cross-validation and calculated the regression residual for each sample; those that consistently exhibited large deviations across folds were cross-checked against their source publications, and if their reported processing conditions or performance values were physically implausible, they were excluded. This effort resulted in a dataset containing 140 composition-property pairs (Supplementary Table S13), with the specific process parameter ranges detailed in Supplementary Table S3. We considered four common PRs for Al-Si alloys, GC, GC with T6 heat treatment (GC + T6), GC with hot extrusion (GC + HE), and GC with hot extrusion followed by T6 heat treatment (GC + HE + T6). Due to the different parameters of each process, the high-dimensional parameter space necessitated a larger dataset, which proved to be impractical for small-sample and complex processes.

To address this, we introduced process codes (1, 2, 3, 4) to represent the processing flow and its internal parameters. In situations where literature data lacked specific processing parameters, we analyzed our accumulated experimental data to evaluate the impact of alloy composition, key processing parameters, and process codes on performance using Shapley Addictive exPlanation (SHAP) plots67. Our findings revealed that the SHAP values for the process codes closely aligned with the combined SHAP values of multiple parameters (Supplementary Fig. S11), suggesting that the process codes could effectively replace these parameters. To remove implicit relationships between different processes, one-hot encoding was employed to convert process codes into numerical data suitable for computational analysis68, represented as GC {0 0 0 1}, GC + T6 {0 0 1 0}, GC + HE {0 1 0 0}, and GC + HE + T6 {1 0 0 0} (Fig. 1a).

Implementation of surrogate model

In alloy composition-performance prediction, a wide variety of ML algorithms were employed. In this study, we compared four commonly used models, NN, XGBDT, random forest, and support vector machines. The dataset was randomly split into training FV² were used to evaluate the predictive accuracy of the models. The specific expressions for MAE and R² were provided below.

$${\rm{MAE}}=\frac{1}{n}\mathop{\sum}\limits_{i=1}^{n}\frac{\left|{y}_{i}-{\hat{y}}_{i}\right|}{{y}_{i}}$$
(2)
$${{\rm{R}}}^{2}=1-\frac{{\sum }_{i=1}^{n}{({y}_{i}-{\hat{y}}_{i})}^{2}}{{\sum }_{i=1}^{n}{({y}_{i}-{\bar{y}}_{i})}^{2}}$$
(3)

where yi denoted the actual UTS value, ŷi was the model-predicted value, ȳ represented the mean of the actual values, and n was the sample size. MAE reflected the average absolute deviation between the predicted and actual values without considering the FV lower MAE value indicated smaller deviations between predicted and actual values, implying higher predictive accuracy of the model. R² measured the degree of fit between predicted and actual values, with a typical range between 0 and 1. An R² value of 1 represented a perfect fit, while R² = 0 indicated that the model does not explain the variance in the data.

The comparison results of the four models are shown in Supplementary Fig. S12. The NN and XGBDT models demonstrated low errors and high R² values on this dataset, indicating excellent training capability and generalization ability. Therefore, we used the same dataset to train these two models, and used the trained models to predict the performance respectively. Then, the predicted performance was combined, and the average value was the final predicted value of our surrogate model.

We applied a Bayesian optimization algorithm to fine-tune the hyperparameters of the two models, thereby improving their predictive performance. The detailed process was shown in Supplementary Fig. S1. First, we trained the NN and XGBDT models using the original dataset. It was important to note that the one-hot encoding step ensured the model allocated equal weight to each processing route. During training, Bayesian optimization was applied to explore the hyperparameters of the NN model (e.g., batch_size, learning_rate, num_hidden_layers) and the XGBDT model (e.g., colsample_bytree, learning_rate, max_depth) through 30 rounds of random parameter selection and 150 iterations of model training. A total of 180 sets of hyperparameter combinations and their corresponding minimized loss values were optimized. The models were then trained separately on the dataset and test set using these 180 hyperparameter configurations.

According to the R2 value of the test set and the negative test loss in Bayesian optimization of the corresponding hyperparameters, the optimal 6 groups of hyperparameters were selected from high to low. The model hyperparameters in iterations of the PSAL framework were listed in attached Supplementary Tables S4–S9, and the corresponding model training results were also shown in Supplement Figs. S13–S18. Obviously, in each iteration, the six groups of hyperparameters optimized by Bayesian methods made the models achieve better prediction effects. Then, the trained six groups of NN models and six groups of XGBDT models were used to predict the latent space sample alloy composition, and their mean (pre_mean) was used as the predicted performance value, while their standard deviation (pre_std) was used as the floating range of the performance value.

Sampling in process-aware latent space

The PSAL framework first utilized c-WAE to reduce high-dimensional Al-Si alloy data from the dataset into a two-dimensional latent space. The architecture comprised an encoder and a decoder, both constructed from multiple fully connected layers. The encoder took a combined representation of alloy compositions and PRs, managed them through successive linear layers and ReLU activation functions69, and output a two-dimensional representation of the latent space. The decoder then reconstructed the sampled alloy compositions from the joint representation of the latent space and PRs’ information.

The network was initialized using a specific weight initialization strategy to optimize the training process. The c-WAE model was trained with the Adam optimizer, a learning rate of 0.0005 and a weight decay of 0.000170, aiming to minimize a composite loss function that included reconstruction error (Recon_loss) and maximum mean discrepancy (MMD_loss)71. The model underwent training over 350 epochs, with a batch_size set to 50. Additionally, the MMD loss ensured that, while preserving the data structure, the distribution of the latent representations approximated the prior distribution. The progression of MMD_loss and Recon_loss throughout the training, depicted in Supplementary Fig. S19, demonstrated a steady decrease in error rate as the epochs increased, indicating effective training of the c-WAE model. Finally, a scatter plot visualized the data points in the latent space, with different colors representing different PRs, vividly illustrating how the model discriminated between data under various conditions in the latent space.

To further explore the latent space and generate new data points, we implemented an MCMC sampling strategy. Initially, sampling points were initialized from a trained GMM, and these points were iteratively updated using an acceptance-rejection algorithm. In each iteration, the acceptance probability of new sampling points was evaluated based on the probability score obtained from a binary classifier (similarity of the point’s features to the original data’s features) and the probability density function of the GMM.

Specifically, in MCMC sampling, the acceptance probability of a new sample point znext was determined by comparing its relative probability to the current sample point z. Let P(z) denote the probability of the current sample point, and P(znext) denote the probability of the newly generated sample point. These probabilities were derived by combining the probability density value of the GMM with the binary classifier’s score. If log P(znext) exceeded log P(z), the new point was accepted, and it would be marked as yellow symbol “+” in GMM probability density map (Fig. 3c, d); otherwise, the new point’s acceptance probability was exp (log P(znext) − log P(z)), i.e., P(znext)/ P(z).

This process enabled efficient extraction of samples from the complex latent distribution, which were subsequently used for data reconstruction. Using the decoder part of the c-WAE model, the MCMC-sampled latent points were transformed back into the original data space. This step was performed by the conditional decoder, which took the latent variables and corresponding conditions and output the reconstructed data.

Ranking criteria

In the optimization of high-strength Al-Si alloy compositions, uncertainty played a pivotal role as it served as an indicator of how much a new composition differed from the existing dataset. By capturing this ‘gap,’’ uncertainty helped identify areas that remained unexplored, making it a valuable tool in guiding the discovery process. The process was supported by a strategic framework that balanced two essential components72: exploration and exploitation. Exploration focused on identifying novel compositions by prioritizing those with higher uncertainty, thus encouraging the discovery of unconventional solutions. In contrast, exploitation honed in on compositions that demonstrated high predicted strength, enabling incremental improvements in the design process.

Even with well-trained prediction models, experimental results exhibited significant deviations from the predicted values due to the nonlinear relationship between composition and strength or the limitations of small datasets. To overcome these challenges, this study adopted the ranking criterion, which prioritized candidates by their relative position rather than their absolute values, balancing the contributions of predictions and uncertainty through a weighted approach. The ranking score (Rs) for each candidate was computed as a weighted combination of prediction and uncertainty rankings. This method enhanced the robustness and reliability of the model, ensuring more accurate alloy composition optimization. The details were as follows:

$${R}_{s}=p* {\rm{pre\_mean}}+q* {\rm{pre\_std}},$$
(4)

where p and q were represented importance assigned to prediction accuracy and uncertainty, which satisfied the relationship: p + q = 1. The pre_mean was the mean prediction of the surrogate model under several different hyperparameters, which represented the prediction accuracy, and the pre_std was the standard deviation of the surrogate model’s predictions, which represented the uncertainty. To further optimize the process, this study assigned higher weight to predicted performance (p = 0.85) compared to uncertainty (q = 0.15), reflecting a preference for refining known high-strength compositions while still allowing room for exploratory innovation. This ranking-based approach enabled a balanced exploration and exploitation, ensuring that the optimized Al-Si alloy compositions achieved both innovative breakthroughs and performance stability. The ranking criterion helped reduce the impact of model inaccuracies on candidate selection, making the high-strength Al-Si alloy optimization became not only more efficient but also more reliable. Notably, in each iteration of the active learning scheme in this study, candidate compositions were selected based on a descending rank of predicted performance (Supplementary Tables S1, S10–S12).

Material preparation and processing

Industrial-grade aluminum was melted in a graphite-clay crucible at 750 °C in an electric resistance furnace. Once fully melted, pure copper wrapped in aluminum foil and sponge zirconium were added, and the melt was held for at least 30 min. The molten alloy was poured into a metal mold preheated to 300 °C and into a cylindrical graphite mold (Φ90 mm). The metal mold was used for testing the mechanical properties of cast rods, while the graphite mold castings were used for hot extrusion. The heat treatment for cast rods involved a solution treatment at 480 °C for 6 h, increasing to 510 °C for an additional 4.5 h, followed by water quenching, and then aging at 175 °C for 6 h. The hot extrusion was performed at 510 °C with an extrusion ratio of 45:1.

Tensile testing

Room-temperature tensile tests were conducted on a universal testing machine (WDW-100D) with a crosshead speed of 2 mm/min. The room-temperature tensile specimen dimensions are shown in Supplementary Fig. S6. For each sample, tensile strength was averaged over 2–3 specimens.

Spectral composition analysis

Chemical composition significantly impacted the physical, chemical, and mechanical properties of the alloy. Each alloy composition was tested for its spectral composition before processing, performance testing, and microstructural analysis. Given that the spectral signal was sensitive to surface impurities, the sample testing surfaces were machined to ensure cleanliness and accuracy. An inductively coupled plasma optical emission spectrometer (ICP-OES, SpectroMax PE8000) was used to analyze the alloy composition after calibration with standard samples. Three to five surface points were randomly selected and averaged to obtain the actual composition. The carbon content in the alloys was determined using a carbon-sulfur analyzer (CS-2800) after grinding the alloy into fine chips and igniting it with tungsten flux in a ceramic crucible.

Microstructure observation

Samples were taken from the cast rods, mounted, ground, and polished. High-magnification morphology and point/area compositional analysis were conducted using a FESEM (FESEM, JSM-7800F) equipped with an energy-dispersive spectroscopy (EDS, OXFORD XMAX-80) detector at an accelerating voltage of 15 kV.

Software implementation

The proposed PSAL framework for high-strength Al-Si alloy design leveraged several well-established ML techniques and libraries. The GMM and cross-validation algorithms were implemented using the sklearn library. Bayesian optimization was performed using the bayes_opt library. The XGBDT model was implemented using the xgboost library. All algorithms and implementations in this work were conducted within the PyTorch v1.13.0 framework, running on an NVIDIA RTX 4090 GPU.