Abstract
Generating reduced-order synthetic grain structure datasets that accurately represent the measured grain structure of a material is important for reducing the cost and increasing the accuracy of computational crystal plasticity studies. This study introduces a machine-learning-based approach, termed texture adaptive clustering and sampling (TACS), for generating representative Euler angle datasets that accurately mimic the crystallographic texture. The TACS approach employs K-means clustering and density-based sampling in a closed-loop iteration to create representative Euler angle datasets. Proof-of-principle experiments were performed on a rolled and recrystallized low-carbon steel. Validation of the TACS approach was extended to twenty-two datasets spanning varied lattice structures and complex crystallographic textures, thereby encompassing a broad range of materials and crystal structures. Kolmogorov-Smirnov (K-S) test comparisons underscore the superior performance of the TACS approach over traditional electron backscatter diffraction (EBSD) dataset reduction techniques, with average K-S test scores nearing 0.9, indicating a high-fidelity representation of the original datasets. In contrast, conventional methods display scores below 0.3, indicating a less reliable representation of the structure. The independence of the TACS approach from material texture and its capability to autonomously generate datasets with a predetermined number of data points demonstrate its potential to streamline unbiased dataset preparation for crystallographic analysis.
Introduction
The microstructure of polycrystalline metallic materials is a key factor in determining their physical properties1,2,3,4. The grain structure is the most readily observable microstructural feature in most polycrystalline metals and has a significant influence on their structural behavior1,5,6. Specifically, grain shape (morphological texture) and lattice orientation (crystallographic texture) can lead to directional anisotropy in mechanical properties5,6,7,8. Morphological and crystallographic texture can also influence the contribution of different deformation mechanisms, such as dislocation slip and deformation twinning, to the overall deformation behavior of a metallic material7,8. As such, the grain structure can potentially be engineered to enhance specific properties such as strength and ductility7,9,10,11. Accurately defining the correlation between the grain structure of a metal and its properties is critical for developing useful computational methods to predict and design the mechanical response of materials.
Crystal plasticity simulations offer a powerful tool for studying the influence of grain morphology and lattice orientation on the mechanical response of crystalline materials12,13,14,15,16,17,18. Combined with a constitutive theory describing the elastic-plastic response of single crystals, crystal plasticity simulations are performed on a synthesized grain structure that mimics the microstructural characteristics of the material. Typically, the information for these grain structures is first obtained from electron backscatter diffraction (EBSD) datasets, which provide detailed maps of grain orientations. Considerable effort is required to ensure that the synthesized grain structure accurately reflects the actual microstructure. Grain morphology is usually modeled using 3D software such as DREAM.3D, where the number of grains in the synthetic structure can vary from a few tens to thousands, depending on the available computational resources and the choice of crystal plasticity simulation approach13,14,15,16.
However, while typical EBSD datasets may encompass hundreds to thousands of grains, the number of grains included in a representative volume element (RVE) is often reduced to keep computational expense manageable. The extent of this reduction depends on the specific requirements and objectives of the simulation. Such reductions require careful attention to ensure that the full grain information is accurately represented in the RVE13,15,16,17,19. Capturing detailed microstructural information, and in particular the crystallographic texture, within a reduced-order dataset without overwhelming computational resources therefore remains a challenge in crystal plasticity simulations13,15,16,17,19.
Generating accurate reduced-order Euler angle datasets is relevant not only for crystal plasticity simulations but also for enhancing the effectiveness of other modeling techniques such as reduced-order modeling and surrogate modeling20,21,22,23,24,25,26. These methods aim to streamline complex physical models, facilitating faster simulations while maintaining high accuracy. For instance, in reduced-order modeling, key characteristics of a material’s behavior can be efficiently encapsulated with fewer degrees of freedom, proving essential for real-time simulations and iterative design processes21,22,23,24,26. Similarly, surrogate models, which act as proxies for more intricate simulations, can be optimized through the use of reduced-order datasets that capture microstructural characteristics critical to material properties20,25. As these modeling approaches are still evolving, particularly in their capacity to predict material deformation behavior, they stand to gain significantly from advanced methods that efficiently compress dataset sizes while accurately representing the microstructure of the material.
Machine learning (ML) based approaches have already been used to solve complex problems associated with EBSD datasets27,28,29,30. For instance, K. Kaufman et al. demonstrated the use of deep learning, specifically a few-shot transfer learning approach, for classifying electron backscatter diffraction patterns (EBSPs)27. By leveraging transfer learning, the authors significantly accelerated model training with limited data and classified complex EBSPs efficiently. Britton et al. introduced an unsupervised ML approach to segment EBSPs of the γ matrix from γ′ precipitates in a Cu/Ni-based superalloy28. Backscatter patterns produced by an alloy solid solution matrix and its ordered superlattice exhibit only extremely subtle differences, due to the inelastic scattering that precedes coherent diffraction. Hence, distinguishing the precipitate from the matrix is challenging, and the developed method successfully achieved this separation. K. Krishna et al. successfully used conditional generative adversarial networks (c-GANs) to de-noise EBSPs, improving indexing success rates and pattern accuracy29. Z. Ding et al. used generative models to simulate EBSPs, showcasing the potential for more flexible analysis frameworks30. These contributions reflect the evolving application of ML to the challenges associated with traditional EBSD analysis, enabling more precise and efficient materials characterization.
In this research, we introduce an unsupervised ML approach, termed texture adaptive clustering and sampling (TACS), which incorporates K-means clustering and density-based sampling to generate reduced-order datasets from large EBSD datasets. K-means clustering and density-based sampling have been used to address complex problems in various research domains. For instance, these techniques have been used in cancer research to classify patient subgroups, aiding personalized treatment strategies31,32,33,34. Similarly, density-based sampling has been used in agriculture to improve crop yield predictions by analyzing field spatial variability35. These methods have also been applied in composite engineering to analyze material properties and predict composite behavior under stress36,37. A. Leitherer et al. used K-means clustering to segment atomic signals from background noise in scanning transmission electron microscopy images, further showcasing the versatility of the method in materials science research38. These examples illustrate the adaptability of K-means clustering and density-based sampling in extracting useful information from complex datasets across different fields.
The algorithm developed in this study consists of two main steps: first, identifying the optimal number of clusters using K-means clustering; second, performing density-based sampling from these clusters with iterative refinement to generate a representative dataset. We first demonstrate the technique using EBSD datasets from a quenched and tempered steel. The TACS approach is then tested on a broad array of materials with distinct crystallographic textures and lattice structures, demonstrating the robustness of the developed method.
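The two steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' TACS implementation: the criterion used to pick the optimal number of clusters is not given in this section, so the sketch assumes the mean silhouette score; "density-based sampling" is read here as allocating samples to clusters in proportion to their populations, so dense orientation regions keep their weight; the closed-loop refinement is omitted; and Euler angles are treated as plain Euclidean coordinates, ignoring angular periodicity and crystal symmetry. All function names are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's K-means with k-means++ seeding; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # k-means++: each new seed is drawn with probability proportional to D^2.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = ((X[:, None, :] - np.array(centroids)[None]) ** 2).sum(axis=2).min(axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centroids[None]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    labels = ((X[:, None, :] - centroids[None]) ** 2).sum(axis=2).argmin(axis=1)
    return labels, centroids


def silhouette(X, labels):
    """Mean silhouette coefficient (O(n^2) memory, so call it on a subsample)."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() < 2:
            continue  # singleton cluster: silhouette taken as 0
        a = D[i, same].sum() / (same.sum() - 1)  # D[i, i] = 0 is excluded by the -1
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return float(s.mean())


def choose_k(X, k_range=range(2, 9), max_points=1000, seed=0):
    """Pick the k with the highest mean silhouette (an assumed criterion)."""
    rng = np.random.default_rng(seed)
    sub = X if len(X) <= max_points else X[rng.choice(len(X), max_points, replace=False)]
    scores = {k: silhouette(sub, kmeans(sub, k, seed=seed)[0]) for k in k_range}
    return max(scores, key=scores.get)


def allocate(counts, n_target):
    """Split n_target across clusters in proportion to cluster size (largest remainder)."""
    quota = counts / counts.sum() * n_target
    alloc = np.floor(quota).astype(int)
    leftover = int(n_target - alloc.sum())
    alloc[np.argsort(quota - alloc)[::-1][:leftover]] += 1
    return alloc


def cluster_sample(X, n_target, k=None, seed=0):
    """Return n_target rows of X, drawn cluster by cluster without replacement."""
    if n_target > len(X):
        raise ValueError("n_target exceeds the number of available rows")
    k = choose_k(X, seed=seed) if k is None else k
    labels, _ = kmeans(X, k, seed=seed)
    alloc = allocate(np.bincount(labels, minlength=k), n_target)
    rng = np.random.default_rng(seed)
    picks = [rng.choice(np.flatnonzero(labels == j), size=a, replace=False)
             for j, a in enumerate(alloc) if a > 0]
    return X[np.concatenate(picks)]
```

For an M × 3 Euler angle array `euler`, `cluster_sample(euler, 350)` would return a 350 × 3 subset. A refinement loop could wrap this call, for example by retrying with new seeds until a K-S criterion is met, but that part is not shown.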
Results and discussion
Proof-of-principle evaluation
The pole figure (PF) maps of the EBSD dataset of a rolled and recrystallized low-carbon (0.23 wt.% C) steel, which was used to develop the algorithm, are depicted in Fig. 1a. The rolling direction (RD) and normal direction (ND) of the 12.7 mm thick plate are indicated for reference. The EBSD analysis identified only the BCC iron phase, resulting from the quenching step in the manufacturing process. EBSD may have failed to detect retained austenite or carbides in the material, or mis-indexed their patterns39; however, this is outside the scope of this study. The data were collected with a step size of 0.250 µm over a region of roughly 120 × 90 µm², yielding 165088 indexed points. The dataset contains 2068 grains, which is large enough to represent the grain structure of the material. The PF maps reveal a weak (100) texture, about twice the random distribution, in the rolling direction of the plate. The (110) and (111) directions also show weak textures at an angle of 45 degrees to the RD, possibly as a consequence of the (100) texture in the RD. The weak texture of this dataset, characterized by largely random grain orientations with a broad spread, poses a significant challenge for generating a representative reduced-order synthetic Euler angle dataset. Accurately capturing the diverse orientations of such weakly textured materials typically requires a large number of grains, whereas fewer grains suffice for strongly textured materials.
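A quick consistency check on these scan numbers, assuming a square sampling grid (the grid type is not stated; a hexagonal grid would place more points in the same area and lower the indexed fraction):

```latex
N_{\mathrm{grid}} \approx \frac{120\ \mu\mathrm{m}}{0.25\ \mu\mathrm{m}} \times \frac{90\ \mu\mathrm{m}}{0.25\ \mu\mathrm{m}} = 480 \times 360 = 172\,800,
\qquad
\frac{165\,088}{172\,800} \approx 0.955,
\qquad
\frac{165\,088}{2068} \approx 80\ \text{points per grain}
```

That is, roughly 95% of the scanned grid points were indexed, and each grain is sampled by about 80 points on average.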
Considering that crystal plasticity simulations typically use 100 to 500 grains to model polycrystalline materials, a dataset size of 350 grains was chosen for this case study. To evaluate the effectiveness of the TACS approach, datasets were also generated by probability distribution function (PDF) mapping and by the kernel density estimation (KDE) approach, both using the MTEX toolbox in MATLAB. The PF maps of the generated representative datasets are presented in Fig. 1. Visual examination indicates that the PF maps derived from the PDF mapping method (Fig. 1b) fail to adequately capture the texture of the grain structure, even with 512 data points. While the PDF method can provide a general statistical overview of orientation data, it inherently lacks the granularity and spatial consideration required to capture the complex information present in EBSD datasets. Moreover, the PDF approach discretizes continuous data into bins or intervals. When applied to the orientations in an EBSD dataset, this process can inadvertently introduce artificial periodicity into the generated PF map if the bin sizes or distribution assumptions do not align with the actual data distribution, which is the reason for the periodic pattern observed in the (100) PF map. The PF maps from the KDE approach (Fig. 1c) are a significant improvement over those from PDF mapping, but they still do not align well with the PF maps of the raw dataset; in particular, the weak texture of the grain structure is not captured by the generated dataset. This can be attributed to the bandwidth sensitivity of the KDE method, since the bandwidth defines the smoothness of the estimated density function. If the bandwidth is too large, the KDE oversmooths the data, losing local texture details and failing to capture sharp features of the pole figure. Conversely, a narrow bandwidth can lead to overfitting, where noise in the data is mistaken for actual textural features. Therefore, neither the PDF nor the KDE method is effective at generating representative reduced-order synthetic Euler angle datasets for the EBSD dataset of this rolled and recrystallized low-carbon (0.23 wt.% C) steel. While it may be possible to tune the KDE bandwidth to obtain a more accurate representative dataset, these results indicate that neither method can reliably generate representative reduced-order Euler angle datasets for an arbitrary EBSD dataset without significant manual, and potentially biased, adjustment.
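Both failure modes described above can be reproduced in one dimension. The sketch below is a simplified analogue, not MTEX's orientation-space implementation: it treats a single Euler angle as a periodic variable on [0, 2π), samples from a binned PDF by returning bin centres, and samples from a Gaussian KDE by the smoothed bootstrap (resample a data point, then add Gaussian kernel noise of width `bandwidth`), which draws exactly from a Gaussian KDE. Function names, the toy data, and parameter values are illustrative assumptions.

```python
import numpy as np


def sample_binned_pdf(x, n, bins=36, seed=0):
    """Sample from a histogram PDF by returning bin centres.

    Every output lies on a fixed lattice of bin centres, the kind of
    discretisation that can appear as periodic artefacts in a pole figure.
    """
    rng = np.random.default_rng(seed)
    counts, edges = np.histogram(x, bins=bins, range=(0.0, 2 * np.pi))
    centres = 0.5 * (edges[:-1] + edges[1:])
    return rng.choice(centres, size=n, p=counts / counts.sum())


def sample_kde(x, n, bandwidth, seed=0):
    """Draw n values from a Gaussian KDE of x (smoothed bootstrap), wrapped to [0, 2*pi)."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(x, size=n, replace=True)
    return np.mod(picks + rng.normal(0.0, bandwidth, size=n), 2 * np.pi)


# Toy "texture": a sharp peak at pi on top of a weak uniform background.
rng = np.random.default_rng(0)
angles = np.concatenate([rng.normal(np.pi, 0.05, 800), rng.uniform(0, 2 * np.pi, 200)])
binned = sample_binned_pdf(angles, 350)
narrow = sample_kde(angles, 350, bandwidth=0.01)  # keeps the peak (and any sampling noise)
wide = sample_kde(angles, 350, bandwidth=1.0)     # smears the peak into the background
```

With the narrow bandwidth most samples stay within 0.2 rad of the peak, whereas the wide bandwidth spreads them out; the binned sampler never returns more distinct values than there are bins.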
The PF maps produced using the TACS approach, shown in Fig. 1d, replicate the PF maps of the raw dataset substantially better than the two earlier methods. They are almost identical to the PF maps of the raw dataset, and the weak (100) texture in the RD is accurately captured by the generated dataset. The weak texture in the (110) direction is also captured, while the texture in the (111) direction is slightly stronger than in the raw dataset. Overall, a visual comparison of the PF maps generated by the different techniques suggests that the TACS approach significantly outperforms the PDF and KDE mapping methods.
The statistical distribution of the generated datasets was quantitatively assessed using the Kolmogorov-Smirnov (K-S) test. The K-S test is well suited to comparing the generated datasets because it is non-parametric: it assesses whether the generated data are consistent with the original distribution without assuming any particular distribution shape40,41,42. The K-S test was performed separately for each of the three Euler angles, and the resulting p-values were averaged. The averaged p-values for the PDF mapping, KDE approach, and TACS approach are 0.001, 0.23, and 0.94, respectively. The lower p-values for the PDF mapping and KDE approach suggest that these methods may not generate datasets that accurately reflect the raw EBSD dataset. The PDF method in particular, with by far the lowest p-value, is the least effective at generating representative Euler angle datasets. Consequently, the PDF method was excluded from further analysis, and only the KDE approach was retained as a comparison for the TACS approach. In contrast, the p-value of the TACS approach, close to 1, indicates that it can generate representative Euler angle datasets that accurately mimic the weak texture of this dataset.
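The per-angle comparison can be sketched as follows. This is a NumPy-only illustration that uses the asymptotic Kolmogorov distribution for the p-value; `scipy.stats.ks_2samp` computes exact small-sample p-values, so its numbers can differ slightly. The function names, and the reading of the "K-S test score" as the mean of the three per-angle p-values, are assumptions based on the description above.

```python
import numpy as np


def ks_statistic(a, b):
    """Two-sample K-S statistic D: largest gap between the two empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])  # both ECDFs only jump at these points
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value: Q(lam) = 2 * sum_k (-1)^(k-1) exp(-2 k^2 lam^2)."""
    lam = np.sqrt(n * m / (n + m)) * d
    if lam < 0.2:  # the series converges slowly here, and Q(lam) is ~1 anyway
        return 1.0
    k = np.arange(1, terms + 1)
    p = 2.0 * np.sum((-1.0) ** (k - 1) * np.exp(-2.0 * (k * lam) ** 2))
    return float(np.clip(p, 0.0, 1.0))


def average_ks_score(raw, generated):
    """Mean K-S p-value over the Euler-angle columns (phi1, Phi, phi2)."""
    pvals = [ks_pvalue(ks_statistic(raw[:, j], generated[:, j]), len(raw), len(generated))
             for j in range(raw.shape[1])]
    return float(np.mean(pvals))
```

Calling `average_ks_score(raw, generated)` on an M × 3 raw array and an N × 3 generated array returns a value in [0, 1], with values near 1 meaning no detectable difference in any of the three marginal distributions. Note that comparing marginals angle by angle does not test the joint orientation distribution.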
These results show that the TACS approach can accurately capture weak textures, a critical aspect often overlooked by traditional methods. In the case of the rolled and recrystallized low-carbon steel with its weak (100) texture, the method effectively replicates these subtle orientation characteristics. Unlike conventional approaches, which tend to overlook or inadequately represent such weak textures, this method reflects even subtle variations in grain orientation in the synthetic datasets. This precision is particularly important for materials in which weak textures play a significant role in overall properties and behavior.
Robustness assessment across diverse EBSD datasets
The robustness of the TACS approach was assessed using twenty varied EBSD datasets. Nine datasets were collected by the authors, while eleven were sourced from the literature, covering a range of crystallographic textures and lattice structures (cubic, hexagonal, tetragonal). These datasets included intricate PF map patterns and ranged from 10² to 10⁴ grains, offering a comprehensive statistical foundation for validation. The smallest datasets, numbers 18 and 20, contained 400 and 277 grains, respectively; owing to the limited availability of larger EBSD datasets for these specific minerals, these were the largest datasets that could be sourced. Further details about the datasets are provided in the Methods section. Overall, this diversity allowed the adaptability of the method to different intrinsic material characteristics to be tested.
As in the proof-of-principle evaluation on the rolled and recrystallized low-carbon steel, 350 data points were initially used to generate representative Euler angles for each dataset. For comparison, the pole figure (PF) maps of raw datasets 4, 9, 13, and 19, along with the PF maps of the corresponding representative datasets generated with the TACS and KDE approaches, are presented in Fig. 2. These four datasets were selected as a representative subset of the twenty datasets used in this study. Datasets 4 and 9 exhibit strong crystallographic textures, whereas datasets 13 and 19 do not. Moreover, dataset 4 (SS316L) has a face-centered cubic (FCC) lattice, dataset 9 (Ti-6Al-4V) is hexagonal close-packed (HCP), dataset 13 (low-carbon steel) is body-centered cubic (BCC), and dataset 19 (quartz) has a hexagonal lattice. The PF maps of the representative Euler angle datasets generated via the TACS approach are almost identical to the PF maps of raw datasets 4 (Fig. 2a), 9 (Fig. 2b), and 13 (Fig. 2c). In particular, the strong textures in datasets 4 and 9 are accurately mimicked by the generated datasets. However, there is a deviation in the texture of the (11\(\bar{2}\)0) PF map of dataset 19 (Fig. 2d), where the maximum texture intensity is overestimated by approximately 10% in the TACS approach.
a Dataset 4, LPBF SS316L, b Dataset 9, DED Ti-6Al-4V, c Dataset 13, low carbon steel (0.18%C)23, d Dataset 19, quartz24. The PF maps of the datasets generated by the TACS approach are almost identical to the PF maps of the raw datasets whereas the maps generated by the KDE approach failed to capture the texture of the raw dataset accurately.
In contrast, the PF maps of the representative datasets generated via the KDE approach show significant deviations in crystallographic texture. The strong textures in datasets 4 (Fig. 2a) and 9 (Fig. 2b) are considerably weaker in the generated datasets, and the textures in datasets 13 (Fig. 2c) and 19 (Fig. 2d) are also underestimated. For instance, the (10\(\bar{1}\)0) texture in dataset 19 (Fig. 2d) is not reflected in the generated dataset. The K-S test scores for the four datasets generated with the TACS approach are 0.89, 0.95, 0.90, and 0.95, respectively, whereas those for the KDE approach are 0.01, 0.16, 0.01, and 0.02. This further indicates that the datasets generated by the TACS approach are a better statistical representation of the raw datasets. The PF maps of the other datasets are presented in Supplementary Information 1.
It is important to note that the datasets were generated by providing only the raw datasets and the required number of data points for the representative dataset. No parameters were modified in the algorithm to obtain these representative datasets. Therefore, the TACS approach is capable of autonomously generating accurate representative datasets without the need for human intervention or bias. This contrasts markedly with techniques such as the KDE approach, which often require manual adjustments of parameters to accurately capture the crystallographic textures of materials. The autonomous nature of TACS not only streamlines the dataset generation process but also eliminates potential biases and inconsistencies inherent in manual parameter tuning. This capability ensures that TACS can consistently produce high-quality datasets, reflecting the actual grain structures with minimal user input, thus significantly improving the practicality and applicability of the method in diverse material science research contexts. Moreover, this study demonstrated that TACS could generate representative datasets for EBSD datasets with different lattice structures, such as body-centered cubic, face-centered cubic, hexagonal, orthorhombic, and monoclinic. This demonstrates the versatility of the approach and its capability to handle complex lattice structures, suggesting promising potential for application to low-symmetry lattices such as triclinic systems.
The versatility of the TACS approach is further demonstrated by its ability to handle datasets of varying dimensions and complexity, in the same way that it processes \(M\times 3\) matrices of Euler angles, where M is the number of orientation measurements (Euler angle triplets) in the EBSD dataset. The TACS algorithm is designed to recognize and maintain the relationships between coupled data columns, ensuring that each set of parameters, whether Euler angles or other microstructural features such as grain size, aspect ratio, and number of neighbors, is treated as a linked entity. This capability is relevant when the algorithm is fed a \(P\times Q\) matrix, where P is the number of grains and Q is the number of parameters. By combining K-means clustering with density-based sampling, the TACS approach can preserve the intrinsic coupling of these parameters, maintaining the integrity and coherence of the microstructural characteristics in the reduced-order datasets. This adaptability underscores the potential of TACS to serve as a foundational tool for advanced material models that require the integration of complex and varied data inputs.
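Why each row must be treated as a linked entity can be illustrated with a toy \(P\times Q\) feature matrix. The sketch below is an assumption rather than the TACS code: it contrasts sampling whole rows with resampling each column independently, and it includes the column standardisation one would typically apply before Euclidean K-means when features carry different units (angles in radians, grain size in µm). The features and their relationship are synthetic.

```python
import numpy as np


def standardise(F):
    """Z-score each column so large-unit features (e.g. grain size) do not
    dominate Euclidean distances in K-means."""
    sd = F.std(axis=0)
    return (F - F.mean(axis=0)) / np.where(sd > 0, sd, 1.0)


def sample_rows(F, n, seed=0):
    """Sample whole rows, so each grain keeps its own parameter set."""
    rng = np.random.default_rng(seed)
    return F[rng.choice(len(F), size=n, replace=False)]


def sample_columns_independently(F, n, seed=0):
    """Anti-pattern: resampling columns separately destroys their coupling."""
    rng = np.random.default_rng(seed)
    return np.column_stack([rng.choice(F[:, j], size=n, replace=False)
                            for j in range(F.shape[1])])


# Synthetic P x Q matrix: phi1, grain size (um), and a size-dependent aspect ratio.
rng = np.random.default_rng(0)
size = rng.uniform(5.0, 50.0, 2000)
F = np.column_stack([rng.uniform(0, 2 * np.pi, 2000), size,
                     1.0 + 0.05 * size + rng.normal(0, 0.1, 2000)])
coupled = sample_rows(F, 350)
broken = sample_columns_independently(F, 350)
print(np.corrcoef(coupled[:, 1], coupled[:, 2])[0, 1])  # stays close to 1
print(np.corrcoef(broken[:, 1], broken[:, 2])[0, 1])    # collapses towards 0
```

In a clustering workflow, one would cluster on `standardise(F)` but return the selected rows of `F` itself, so the reduced dataset keeps its original units.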
Furthermore, the developed TACS approach is versatile enough to meet diverse research needs. For instance, it can be employed to assign Euler angles to grains within the three-dimensional space of an RVE. This advanced application is particularly important for ensuring that the microstructure synthesized within the RVE accurately represents the textural characteristics across its entire volume, a necessary feature for crystal plasticity simulations. As a proof of concept, we have successfully applied this extended TACS method to Dataset 4, LPBF SS316L. A detailed demonstration of this methodology is presented in Supplementary Information 2. This example not only highlights the capability of the method to transition from 2D to 3D textural representations but also underscores the potential of TACS to facilitate complex modeling tasks. However, it is important to note that the current implementation best handles scenarios with uniform textural distributions. True 3D textural data, which accounts for variations across the thickness of a sample, would require additional experimental methods such as serial sectioning with FIB-EBSD or 3D X-ray imaging. The challenge of inferring 3D information from 2D maps cannot be completely resolved through computational approaches alone due to the inherent lack of depth information in 2D analyses. This limitation underscores the need for further experimental studies to fully capture the three-dimensional architecture of microstructures when such detail is necessary.
Performance analysis
The performance of the TACS approach was evaluated by varying the number of data points in the Euler angle datasets from 10 to 500. This enables a quantitative evaluation of how well the TACS approach reduces dataset size while capturing the nuanced characteristics of the crystallographic texture. Eleven test cases were created by incrementing the number of data points by 50, plus two additional conditions at 10 and 25 points. Because datasets 18 and 20 originally contained only 400 and 277 grains, respectively, the TACS approach was used to generate additional data points for cases requiring more data points than the original dataset contains. The additional grains are synthesized to be consistent with the observed crystallographic textures and densities within each cluster, inferred from the statistical properties and spatial distributions of the data points in that cluster, which effectively increases the granularity of the dataset. This could also be helpful when an EBSD dataset is not large enough to develop computational models. The K-S test was used as the metric to assess the statistical representation of the raw data, with results illustrated in Fig. 3. Datasets from the KDE approach had K-S values under 0.35, indicating a poor statistical representation, even with 500 data points. Conversely, the TACS approach yielded K-S values above 0.5, with 80% exceeding 0.7 even for datasets with only 50 data points, demonstrating a significant improvement.
The K-S test scores achieved using the TACS approach consistently exceed 0.5, demonstrating its ability to statistically represent raw datasets. In contrast, the KDE approach fails to reach even 0.3, underlining its limitations in accurately characterizing the datasets even with increased data points.
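One way to realise the cluster-wise synthesis described above is to fit a multivariate Gaussian to each cluster and draw the extra rows from it, topping up each cluster in proportion to its size. This is a hypothetical sketch: the section does not specify the generative model TACS uses, the Gaussian choice is an assumption, and the draws are neither wrapped back into Euler-angle bounds nor reduced by crystal symmetry.

```python
import numpy as np


def synthesize_from_cluster(points, n_new, rng, jitter=1e-9):
    """Draw n_new rows from a Gaussian fitted to one cluster's rows (assumed model)."""
    mean = points.mean(axis=0)
    if len(points) < 2:
        cov = jitter * np.eye(points.shape[1])  # no spread information available
    else:
        cov = np.cov(points, rowvar=False) + jitter * np.eye(points.shape[1])
    return rng.multivariate_normal(mean, cov, size=n_new)


def upsample(X, labels, n_target, seed=0):
    """Return X topped up to n_target rows; the original rows are kept unchanged."""
    extra = n_target - len(X)
    if extra <= 0:
        return X
    rng = np.random.default_rng(seed)
    clusters, counts = np.unique(labels, return_counts=True)
    share = np.floor(counts / counts.sum() * extra).astype(int)
    share[: extra - share.sum()] += 1  # hand the few leftover rows to the first clusters
    new = [synthesize_from_cluster(X[labels == c], s, rng)
           for c, s in zip(clusters, share) if s > 0]
    return np.vstack([X, *new])
```

For example, a 277-grain dataset with known cluster labels could be extended to 500 rows with `upsample(X, labels, 500)`; the first 277 rows are the measured ones.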
For datasets with strong textures, such as 4, 5, and 9, datasets with 10 and 25 data points resulted in K-S values below 0.5. For raw datasets with weak textures, by contrast, even datasets with 10 and 25 data points maintained K-S values above 0.5 owing to their inherent randomness. With 50 data points, the generated datasets for all twenty raw datasets reported K-S values higher than 0.7. This indicates that the TACS approach can reduce the dataset size by one to two orders of magnitude without losing the statistical integrity of the raw dataset. The K-S scores did not correlate consistently with the number of data points, varying between 0.6 and 1, which reflects the random nature of sampling, the sensitivity of the clustering algorithm, and the complexity of EBSD data.
The variation in the PF maps with the number of data points was also studied. For comparison, PF maps of the generated datasets with 25 and 50 data points for the same datasets presented in Fig. 2 are depicted in Fig. 4. The PF maps of the 25-point datasets for datasets 4 and 9 resemble the crystallographic texture of the raw datasets. However, the strong texture of dataset 4 in the (110) direction is underestimated in the generated dataset, with a peak of six times the random distribution compared to ten times in the raw dataset. Similarly, the strength of the crystallographic texture in dataset 9 is also underestimated. The PF maps of the 50-point datasets for datasets 4 and 9 not only mimic the crystallographic texture of the raw datasets but also match its strength. Based on this comparison, 50 data points is the smallest dataset size that accurately mimics the crystallographic texture of these strongly textured raw datasets. The raw datasets 5 and 9 consist of 2588 and 8572 grains, respectively. Hence, the TACS approach reduced the raw dataset size by one to two orders of magnitude without losing the textural information of the two raw datasets. Similar results were achieved for the other datasets with strong crystallographic textures (texture strength higher than 2.5). In contrast, the PF maps of the generated datasets for datasets 13 and 19, which have weak crystallographic textures, do not resemble the weak textures present in the raw datasets. Instead, the small generated datasets are overfitted by the TACS approach, a behavior that can be attributed to the higher randomness of these raw datasets. Therefore, larger datasets, such as 350 data points, are required to mimic the crystallographic texture of a weakly textured material. This observation was consistent across the other datasets with weak crystallographic textures (texture strength below 2.5). Overall, the TACS approach successfully generates representative datasets that are one to two orders of magnitude smaller than the raw datasets, while accurately mimicking both the crystallographic texture and the statistical characteristics of the raw datasets.
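The "one to two orders of magnitude" figure follows directly from the grain counts quoted above (2588 and 8572) and the 50-point target size:

```latex
\log_{10}\!\left(\frac{2588}{50}\right) = \log_{10}(51.8) \approx 1.71,
\qquad
\log_{10}\!\left(\frac{8572}{50}\right) = \log_{10}(171.4) \approx 2.23
```

That is, the reduced datasets are roughly 52 and 171 times smaller than the raw datasets.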
a Dataset 4, Laser powder bed fusion (LPBF) SS316L, b Dataset 9, Directed energy deposition (DED) Ti-6Al-4V, c Dataset 13, low carbon steel (0.18%C)23, d Dataset 19, quartz24. The PF maps of datasets 4 and 9 resemble the strong texture of the raw datasets. The PF maps of datasets 13 and 19 do not represent the weak texture of the raw datasets.
The performance analysis can also help determine the optimal number of data points to represent a grain structure. While TACS can significantly reduce the size of datasets without losing textural information, the K-S score provides a systematic way to identify the minimum dataset size that maintains the integrity of the crystallographic texture and statistical characteristics of the raw datasets. In addition to the K-S test score, the orientation distribution function (ODF) could also be used to determine the optimal number of data points.
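The search for a minimum dataset size can be expressed as a simple sweep. In this sketch, `score_fn` is a hypothetical callable that generates an n-point dataset for a given random seed and returns its K-S score. The 0.7 threshold mirrors the value discussed above, and averaging over several seeds is an assumed way to damp the seed-to-seed scatter in K-S scores noted earlier.

```python
import numpy as np

# Assumed reading of the sweep above: 10, 25, then 50 to 500 in steps of 50.
SIZES = [10, 25] + list(range(50, 501, 50))


def minimum_size(score_fn, sizes=SIZES, threshold=0.7, repeats=5):
    """Return (n, mean_score) for the smallest n whose mean score over
    `repeats` seeds reaches `threshold`, or (None, None) if none does."""
    for n in sorted(sizes):
        mean_score = float(np.mean([score_fn(n, seed) for seed in range(repeats)]))
        if mean_score >= threshold:
            return n, mean_score
    return None, None


def toy_score(n, seed):
    """Toy stand-in for 'generate n points, then score them against the raw data'."""
    return min(1.0, n / 100) - 0.01 * seed
```

In practice, `score_fn` would wrap whichever generator is being evaluated together with the averaged per-angle K-S p-value. The same loop could use an ODF-based error instead of the K-S score.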
To summarize, the TACS method developed in this study offers an improved approach for replicating weak texture in low-carbon steel and outperforms traditional techniques, particularly in accurately representing subtle crystallographic textures. Comprehensive validation on additional materials and dataset sizes confirmed the statistical accuracy of TACS, and its K-S scores indicate a clear advantage over existing methods. Furthermore, the TACS method generates accurate datasets autonomously, without human intervention and bias, streamlining data preparation and reducing the potential for manual error. By accurately mimicking actual grain structures, the proposed approach enables the development of more representative RVEs. These enhanced RVEs lead to simulations that are not only more precise but should also require fewer computational resources. As a result, this approach promises substantial improvements in the predictive modeling of metal behavior under various stress conditions, which is crucial for advancing materials science and engineering.
Methods
The details of the dataset preparation and of the algorithms for the TACS approach, PDF mapping, and KDE approach (the latter two using MTEX built-in functions) are presented below. The raw EBSD datasets were pre-processed using the MTEX toolbox for MATLAB, and the clustering was performed in Python.
Preparation of the raw datasets
The datasets gathered by the authors were collected using two microscopes. Datasets 1-6 were collected using a Zeiss Gemini 300 field emission scanning electron microscope (FESEM) equipped with an Oxford Instruments CNano EBSD detector. Datasets 8-9 were collected using a JEOL 7400 FESEM equipped with an Oxford Instruments Symmetry EBSD detector. Data were collected at 30 kV with various step sizes; for each dataset, the step size was determined by a low-resolution, fast EBSD scan of the selected location. Samples for EBSD were prepared by mechanical polishing in multiple steps, beginning with coarse polishing on standard silicon carbide papers of 240, 320, 400, 600, 800, and 1200 grit, followed by fine polishing with 3 µm and 1 µm polycrystalline diamond suspensions. Finally, vibratory polishing with 50 nm colloidal silica was performed to remove the work-hardened surface layer introduced by the previous polishing steps. The details of the datasets gathered from the literature are provided in the relevant sources.
Preparing the EBSD datasets for analysis involves a series of steps, each of which is needed to ensure that the data are accurate and usable in subsequent analyses. First, the datasets, stored in the .ctf format, are processed using the MTEX toolbox in MATLAB, which is designed for reading and interpreting EBSD scan data. This phase focuses on extracting the Euler angles from the .ctf files and arranging them into an M × 3 matrix, where M is the number of measurements (data points) in the dataset. Each row of this matrix contains the three Euler angles that describe the three-dimensional crystal orientation at one measurement point. For materials with more than one phase, the primary phase, i.e., the phase with the highest indexing percentage, was selected for the analysis. Grains smaller than five pixels were excluded from the analysis. Once extracted and organized, the Euler angle matrix is saved to a MATLAB file with a .mat extension.
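For readers working entirely in Python, the extraction step can be approximated without MATLAB. The sketch below is not the authors' MTEX workflow. It assumes the common Oxford Instruments .ctf text layout: free-form header lines, followed by a tab-separated column header starting with `Phase` that includes `Euler1`, `Euler2`, `Euler3` in degrees, with phase 0 marking unindexed points. The function name `read_ctf_euler` is hypothetical. The function returns the M × 3 Euler angle matrix, in radians, for the most frequently indexed phase, which serves as a proxy for the primary phase. It does not reconstruct grains, so the five-pixel grain filter would still need a grain-segmentation step such as MTEX's calcGrains.

```python
import numpy as np

def read_ctf_euler(lines, primary_phase=None):
    """Return ((M, 3) Euler angles in radians, phase id) for one phase of a .ctf scan.

    Assumed layout (hypothetical helper, not the authors' code): header lines,
    then a tab-separated column header whose first field is 'Phase' and which
    contains 'Euler1', 'Euler2', 'Euler3' in degrees. Phase 0 = not indexed.
    """
    lines = list(lines)
    start = next((i for i, ln in enumerate(lines)
                  if ln.split("\t")[0].strip() == "Phase"), None)
    if start is None:
        raise ValueError("no 'Phase' column header found")
    header = [h.strip() for h in lines[start].split("\t")]
    data = np.loadtxt(lines[start + 1:], delimiter="\t", ndmin=2)
    phase = data[:, header.index("Phase")].astype(int)
    cols = [header.index(c) for c in ("Euler1", "Euler2", "Euler3")]
    if primary_phase is None:
        # Most frequently indexed phase; unindexed points (phase 0) are ignored.
        ids, counts = np.unique(phase[phase > 0], return_counts=True)
        primary_phase = int(ids[np.argmax(counts)])
    keep = phase == primary_phase
    return np.deg2rad(data[keep][:, cols]), primary_phase
```

Typical use would be `with open("scan.ctf", encoding="latin-1") as f: euler, phase_id = read_ctf_euler(f)`, where the file name is a placeholder.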
The data are then converted into a format better suited to Python-based analysis. Python scripts read each .mat file, extract the Euler angle array, and save it as a .npy file, the native binary format of NumPy, a fundamental package for scientific computing in Python. The .npy format stores large arrays efficiently and loads quickly into Python. This makes the data directly accessible to Python libraries for data analysis and machine learning.
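Here is a minimal sketch of the .mat-to-.npy conversion. It assumes that the Euler angles were saved in MATLAB under the hypothetical variable name `euler_angles` and that SciPy is available. The function name `mat_to_npy` is also illustrative.

```python
from pathlib import Path

import numpy as np
from scipy.io import loadmat

def mat_to_npy(mat_path, var_name="euler_angles", npy_path=None):
    """Copy an (M, 3) Euler-angle matrix from a .mat file into a .npy file.

    var_name is a hypothetical MATLAB variable name; use the name given to save().
    """
    mat_path = Path(mat_path)
    euler = np.asarray(loadmat(str(mat_path))[var_name], dtype=float)
    if euler.ndim != 2 or euler.shape[1] != 3:
        raise ValueError(f"expected an (M, 3) array, got shape {euler.shape}")
    # Default output: same folder and stem as the input, with a .npy suffix.
    npy_path = Path(npy_path) if npy_path else mat_path.with_suffix(".npy")
    np.save(npy_path, euler)
    return npy_path
```

`scipy.io.loadmat` reads MAT-files up to version 7.2. Files saved with MATLAB's `-v7.3` flag are HDF5-based and need a reader such as h5py instead.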
The datasets are summarized in Table 1, which lists metadata for each dataset: material name, fabrication method, number of grains, texture strength, primary phase(s), and lattice structure. The fabrication methods include quenching, tempering, and two additive manufacturing techniques, DED and LPBF. The metals include SS316L, Ti-6Al-4V, and low-carbon steel. Datasets 10-15 have a duplex microstructure with two primary phases, ferrite and bainite. In this study, the two were treated as a single phase. Both have a BCC lattice structure with similar lattice parameters, so EBSD cannot distinguish between them. In the original study43, the authors used deep learning and kernel average misorientation (KAM) maps to separate these two phases. Additionally, datasets 17-20 are minerals, which allows the method to be validated on non-metallic materials as well. These datasets also help verify that the approach is independent of the underlying characteristics of the EBSD dataset. Texture strength varied substantially among the datasets, with the highest value, 5.89, reported for dataset 9. Datasets 1, 2, and 10-16 showed the weakest texture, with strengths close to 1, indicating a nearly random orientation distribution. Moreover, each dataset showed a distinct pattern in its pole figure (PF) maps. The largest dataset, dataset 8, consisted of 14,973 grains. All datasets except 18 and 20 contained more than 1000 grains.
TACS approach
The algorithm, illustrated in Fig. 5, starts from an EBSD dataset and produces a representative subset of its data points. It first determines the optimal number of clusters for K-means clustering, so that the inherent variability of the data is captured. Because the within-cluster sum of squares (WCSS) generally decreases as clusters are added, the optimal count is taken as the point at which one more cluster reduces the WCSS by less than 1% (relative), i.e., the point of diminishing returns. This threshold was chosen for the stability it gives the clustering process, balancing detail against computational efficiency. K-means results depend on the initial centroids, so the clustering is run multiple times to reduce the effect of initialization.
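The cluster-count selection can be sketched in NumPy alone. This is an illustration, not the TACS source code. It applies plain Euclidean K-means to the raw M × 3 Euler angle matrix, which ignores angle periodicity and crystal symmetry, just as clustering that matrix directly would. It uses k-means++ seeding and several restarts, keeping the lowest-WCSS run. The 1% rule is interpreted as follows: if going from k − 1 to k clusters cuts the WCSS by less than 1%, the answer is k − 1. All function names are hypothetical. In practice, scikit-learn's `KMeans` (with its `n_init` parameter and `inertia_` attribute) would be the usual library route.

```python
import numpy as np

def _sq_dists(X, centers):
    """Squared Euclidean distance from every point to every centre, shape (M, k)."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

def _kmeanspp_init(X, k, rng):
    """k-means++ seeding: each new centre is drawn with probability ~ D^2."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = _sq_dists(X, np.array(centers)).min(axis=1)
        total = d2.sum()
        idx = rng.integers(len(X)) if total == 0 else rng.choice(len(X), p=d2 / total)
        centers.append(X[idx])
    return np.array(centers, dtype=float)

def kmeans(X, k, n_init=5, max_iter=100, seed=0):
    """Lloyd's algorithm with n_init restarts; returns (labels, centres, WCSS) of the best run."""
    rng = np.random.default_rng(seed)
    best = (None, None, np.inf)
    for _ in range(n_init):
        centers = _kmeanspp_init(X, k, rng)
        for _ in range(max_iter):
            labels = _sq_dists(X, centers).argmin(axis=1)
            # Empty clusters keep their previous centre.
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        d2 = _sq_dists(X, centers)
        labels = d2.argmin(axis=1)
        wcss = d2[np.arange(len(X)), labels].sum()
        if wcss < best[2]:
            best = (labels, centers, wcss)
    return best

def optimal_k(X, k_max=30, rel_tol=0.01, **kmeans_kw):
    """Return k - 1 as soon as adding cluster k lowers WCSS by less than rel_tol."""
    k_max = min(k_max, len(X))
    prev = None
    for k in range(1, k_max + 1):
        wcss = kmeans(X, k, **kmeans_kw)[2]
        if prev is not None and (prev == 0 or (prev - wcss) / prev < rel_tol):
            return k - 1
        prev = wcss
    return k_max
```

The pairwise-distance array grows as M × k. For full EBSD scans, a library implementation or mini-batch variant would be more economical.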
Once the optimal number of clusters is determined, the algorithm estimates the density of data points within each cluster and uses it to sample a representative subset (density-based sampling). This step ensures that the reduced dataset preserves the orientation distribution, as characterized by the ODF.
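The text does not spell out how the per-cluster density becomes a sample, so the following is one plausible reading rather than the TACS implementation. It makes two hypothetical assumptions. First, each cluster receives a share of the N output points proportional to its population, rounded by the largest-remainder method so the total is exactly N. Second, points within a cluster are drawn without replacement with probability proportional to a k-nearest-neighbour density estimate: the inverse distance to the k-th neighbour, with k = 5 chosen arbitrarily. The helper names are hypothetical.

```python
import numpy as np

def allocate_per_cluster(labels, n_total):
    """Split n_total samples across clusters in proportion to cluster size.

    Largest-remainder rounding makes the counts sum exactly to n_total.
    """
    ids, counts = np.unique(labels, return_counts=True)
    quotas = counts / counts.sum() * n_total
    alloc = np.floor(quotas).astype(int)
    leftover = n_total - alloc.sum()
    alloc[np.argsort(quotas - alloc)[::-1][:leftover]] += 1
    return dict(zip(ids.tolist(), alloc.tolist()))

def knn_density(points, k=5):
    """Inverse distance to the k-th nearest neighbour, used as a local density proxy."""
    n = len(points)
    if n <= 1:
        return np.ones(n)
    k = min(k, n - 1)
    d = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2))
    kth = np.sort(d, axis=1)[:, k]  # column 0 is each point's zero self-distance
    return 1.0 / (kth + 1e-12)

def density_sample(X, labels, n_total, k=5, seed=0):
    """Indices of a density-weighted subset with per-cluster proportional counts."""
    rng = np.random.default_rng(seed)
    chosen = []
    for cid, n_c in allocate_per_cluster(labels, n_total).items():
        idx = np.flatnonzero(labels == cid)
        w = knn_density(X[idx], k)
        chosen.extend(rng.choice(idx, size=n_c, replace=False, p=w / w.sum()))
    return np.sort(np.array(chosen, dtype=int))
```

In this reading, the proportional allocation carries the cluster-level weights of the orientation distribution into the subset. The density weighting only biases the choice within a cluster toward its denser core. The per-cluster distance matrix is O(n²), so large clusters would call for a tree-based neighbour search.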
The next phase is iterative refinement. In each iteration, the ODF of the newly created reduced dataset is computed and compared with the ODF from the previous iteration. The ODF is approximated by dividing the range of each Euler angle into bins (typically 10) and building a histogram for each angle, which reflects the density distribution of that angle. The selection of representative data points from the clusters is adjusted iteratively based on this comparison. The process repeats until the relative change in the ODF between consecutive iterations falls below 10%, a threshold set to ensure a stable ODF while avoiding unnecessary computation.
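The convergence test can be sketched under the following assumptions. Bunge Euler angles are in radians over their full ranges (φ1 and φ2 in [0, 2π), Φ in [0, π]) with no symmetry reduction, and each angle is split into 10 bins. The relative change is measured as the L1 difference between consecutive sets of marginal histograms, divided by the L1 norm of the previous set. The text does not detail how TACS adjusts the selection between iterations. Here, `draw_subset` is therefore a hypothetical callable that simply returns a fresh subset, for example a new density-based draw.

```python
import numpy as np

# Assumed Bunge Euler-angle ranges in radians (phi1, Phi, phi2), no symmetry reduction.
EULER_RANGES = ((0.0, 2 * np.pi), (0.0, np.pi), (0.0, 2 * np.pi))

def marginal_histograms(euler, bins=10, ranges=EULER_RANGES):
    """One normalised histogram per Euler angle, a coarse stand-in for the ODF."""
    return np.stack([
        np.histogram(euler[:, i], bins=bins, range=ranges[i])[0] / len(euler)
        for i in range(3)
    ])

def relative_change(h_new, h_old):
    """L1 difference between two histogram sets, relative to the older set."""
    return np.abs(h_new - h_old).sum() / np.abs(h_old).sum()

def refine(draw_subset, tol=0.10, max_iter=50):
    """Redraw subsets until consecutive histogram sets differ by less than tol."""
    subset = draw_subset()
    prev = marginal_histograms(subset)
    for it in range(1, max_iter + 1):
        subset = draw_subset()
        cur = marginal_histograms(subset)
        if relative_change(cur, prev) < tol:
            return subset, it
        prev = cur
    return subset, max_iter
```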
Upon convergence, the algorithm returns a dataset that, despite its reduced size, accurately represents the crystallographic texture of the original dataset. This balance between size reduction and texture fidelity keeps the final dataset suitable for subsequent analyses.
To handle multi-phase materials, the TACS approach offers flexible options suited to different analytical needs. Because phase proportions matter in representative volume elements, users can choose between two strategies. They can select a single phase for focused analysis, or they can generate a dataset that preserves the phase fractions of the original dataset. The latter option follows the hypothesis that a representative volume should reflect the same phase fractions as the actual material. For example, Ti-6Al-4V comprises both α and β phases, and TACS can generate a 350-point dataset for it that mirrors the phase distribution of the original sample. The resulting dataset would contain 350 × (α phase fraction) α-phase points and 350 × (β phase fraction) β-phase points.
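As a worked example of the phase-fraction option, take a hypothetical α-phase fraction f_α = 0.92. This is an illustrative value, not a measurement from this study. Rounding one phase's count and assigning the remainder to the other keeps the total at exactly N = 350, even when N f_α is not an integer:

```latex
N_\alpha = \operatorname{round}(N f_\alpha) = \operatorname{round}(350 \times 0.92) = 322,
\qquad
N_\beta = N - N_\alpha = 350 - 322 = 28 = N f_\beta, \quad f_\beta = 1 - f_\alpha = 0.08 .
```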
Probability distribution function mapping
The flowchart in Fig. 6 outlines the process for generating representative data points from the Euler angle probability distributions of an EBSD dataset, the type of dataset typically used to analyze the crystallographic orientation of materials. The process begins by defining the dimension N of the representative Euler angle space and determining the resolution of the representative dataset. The Euler angles are then extracted from the EBSD dataset, normalized to their valid ranges, and sorted into bins. A three-dimensional histogram of the binned Euler angles is constructed, assigning each bin a weight that represents the frequency of the corresponding orientations. Next, the Euler angle probability distributions (PDs) are calculated from the EBSD dataset. These PDs are histograms of the frequency of specific orientations (ϕ1, Φ, ϕ2) within the material. Weights are then assigned within the N³ Euler space to mimic these PDs, yielding a weighted model that reflects the original orientation data.
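The following is a minimal NumPy sketch of the binning and weighting steps. It assumes Bunge Euler angles in radians, full angle ranges without crystal-symmetry reduction, and N bins per angle. The function name `pdf_map` is hypothetical. As a simple form of "normalized to valid ranges", φ1 and φ2 are wrapped into [0, 2π) and Φ is clipped to [0, π].

```python
import numpy as np

EULER_RANGES = ((0.0, 2 * np.pi), (0.0, np.pi), (0.0, 2 * np.pi))  # assumed, radians

def pdf_map(euler, n=10, ranges=EULER_RANGES):
    """Weight an n x n x n grid of Euler-angle bin centres by measured frequency.

    Returns (grid, weights): grid holds the (n**3, 3) bin centres, weights sum to 1.
    """
    e = np.array(euler, dtype=float)        # copy, so the caller's array is untouched
    e[:, 0] = np.mod(e[:, 0], 2 * np.pi)    # phi1 is periodic
    e[:, 2] = np.mod(e[:, 2], 2 * np.pi)    # phi2 is periodic
    e[:, 1] = np.clip(e[:, 1], 0.0, np.pi)  # Phi is bounded
    counts, edges = np.histogramdd(e, bins=n, range=ranges)
    weights = counts / counts.sum()
    centres = [0.5 * (edge[:-1] + edge[1:]) for edge in edges]
    g = np.meshgrid(*centres, indexing="ij")  # same C-order as counts.ravel()
    grid = np.column_stack([axis.ravel() for axis in g])
    return grid, weights.ravel()
```

A reduced dataset of a chosen size could then be drawn from the weighted grid, e.g. `grid[np.random.default_rng(0).choice(len(grid), size=350, p=weights)]`. The text does not say whether the original method samples the grid this way or uses the weights directly.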
Kernel density estimation using MTEX built-in functions
The flowchart in Fig. 7 illustrates the approach for synthesizing a reduced-order representative dataset using MTEX built-in functions. As in the previous method, the process starts by specifying the number of representative data points (N), which sets the size of the resulting Euler angle dataset. The ODF is then computed with the calcODF function of MTEX, a MATLAB toolbox for texture analysis. calcODF supports several algorithms, including direct kernel density estimation (KDE), KDE via Fourier series, and Bingham estimation. It also allows grain areas to be used as orientation weights, as well as the choice of a specific kernel function such as SO3AbelPoissonKernel. The ODF can be computed as a Fourier series up to a specified order, with options for the weights, halfwidth, resolution, and kernel function. This makes calcODF versatile for building detailed, customized ODF representations from EBSD data. In this study, kernel density estimation with a kernel size of 5° was used to model the probability density of crystal orientations within each dataset.
Once the ODF is established, the calcOrientations function draws N random orientations from it. The function performs probabilistic sampling of the ODF so that the reduced dataset mirrors the overall orientation characteristics of the material. The final output is a set of representative data points, each comprising three Euler angles (ϕ1, Φ, ϕ2).
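MTEX performs density estimation and sampling on the orientation space with crystal symmetry taken into account. The Python sketch below is only a simplified Euclidean stand-in for calcODF followed by calcOrientations. It treats the Euler angles as points in ℝ³ and uses a Gaussian kernel instead of MTEX's kernels. It also assumes that the 5° kernel size is a half-width at half-maximum, converted to a standard deviation via σ = hw / √(2 ln 2). Sampling from a KDE then reduces to picking a measured orientation at random and perturbing it with kernel noise. The function name `kde_sample` is hypothetical.

```python
import numpy as np

def kde_sample(euler, n_samples, halfwidth_deg=5.0, seed=0):
    """Draw n_samples Euler-angle triplets (radians) from a Gaussian KDE of the data.

    Simplified Euclidean stand-in for MTEX calcODF + calcOrientations: it ignores
    crystal/sample symmetry and the non-Euclidean geometry of orientation space.
    """
    rng = np.random.default_rng(seed)
    euler = np.asarray(euler, dtype=float)
    # Assumption: the halfwidth is a half-width at half-maximum of the Gaussian.
    sigma = np.deg2rad(halfwidth_deg) / np.sqrt(2 * np.log(2))
    # KDE sampling: choose a measured point uniformly, then add kernel noise.
    picks = euler[rng.integers(0, len(euler), size=n_samples)]
    out = picks + rng.normal(scale=sigma, size=picks.shape)
    out[:, 0] = np.mod(out[:, 0], 2 * np.pi)  # phi1 is periodic
    out[:, 2] = np.mod(out[:, 2], 2 * np.pi)  # phi2 is periodic
    phi = np.abs(out[:, 1])                   # reflect Phi back into [0, pi]
    out[:, 1] = np.where(phi > np.pi, 2 * np.pi - phi, phi)
    return out
```

Because each sample is a jittered copy of a measured orientation, the reduced set follows the smoothed measured distribution. The kernel width controls how far samples may stray from the measurements.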
Data availability
The authors declare that the data supporting the findings of this study are available within the article and its supplementary information files. The raw EBSD datasets, except for Ti-6Al-4V, are available at https://github.com/janithwanni12/Texture-adaptive-clustering-and-sampling-TACS-algorithm.
Code availability
The texture adaptive clustering and sampling (TACS) algorithm, the algorithm for assigning Euler angles to an RVE, and the other supplementary algorithms used to generate the datasets are available at https://github.com/janithwanni12/Texture-adaptive-clustering-and-sampling-TACS-algorithm.
References
Cho, K. K., Chung, Y. H., Lee, C. W., Kwun, S. I. & Shin, M. C. Effects of grain shape and texture on the yield strength anisotropy of Al-Li alloy sheet. Scr. Mater. 40, 651–657 (1999).
Liu, L. et al. Dislocation network in additive manufactured steel breaks strength–ductility trade-off. Mater. Today 21, 354–361 (2018).
Tsai, M.-H. & Yeh, J.-W. High-entropy alloys: a critical review. Mater. Res. Lett. 2, 107–123 (2014).
Bache, M. R. A review of dwell sensitive fatigue in titanium alloys: the role of microstructure, texture and operating conditions. Int. J. Fatigue 25, 1079–1087 (2003).
Jiang, M., Devincre, B. & Monnet, G. Effects of the grain size and shape on the flow stress: A dislocation dynamics study. Int. J. Plast. 113, 111–124 (2019).
Delannay, L. & Barnett, M. R. Modelling the combined effect of grain size and grain shape on plastic anisotropy of metals. Int. J. Plast. 32–33, 70–84 (2012).
Sun, Z., Tan, X., Tor, S. B. & Chua, C. K. Simultaneously enhanced strength and ductility for 3D-printed stainless steel 316L by selective laser melting. NPG Asia Mater. 10, 127–136 (2018).
Wang, X., Muñiz-Lerma, J. A., Sánchez-Mata, O., Attarian Shandiz, M. & Brochu, M. Microstructure and mechanical properties of stainless steel 316L vertical struts manufactured by laser powder bed fusion process. Mater. Sci. Eng. A 736, 27–40 (2018).
Todaro, C. J. et al. Grain structure control during metal 3D printing by high-intensity ultrasound. Nat. Commun. 11, 1–9 (2020).
Tan, L., Allen, T. R. & Busby, J. T. Grain boundary engineering for structure materials of nuclear reactors. J. Nucl. Mater. 441, 661–666 (2013).
Zhang, Q., Zhu, Y., Gao, X., Wu, Y. & Hutchinson, C. Training high-strength aluminum alloys to withstand fatigue. Nat. Commun. 11, 5198 (2020).
Roters, F. et al. Overview of constitutive laws, kinematics, homogenization and multiscale methods in crystal plasticity finite-element modeling: Theory, experiments, applications. Acta Mater. 58, 1152–1211 (2010).
Kalidindi, S. R., Bronkhorst, C. A. & Anand, L. Crystallographic texture evolution in bulk deformation processing of FCC metals. J. Mech. Phys. Solids 40, 537–569 (1992).
Herath, C., Wanni, J., Arnold, S. M. & Achuthan, A. A microstructure-informed constitutive model for hierarchical materials with subgrain features. Int. J. Mech. Sci. 261, 108691 (2023).
Charmi, A. et al. Mechanical anisotropy of additively manufactured stainless steel 316L: An experimental and numerical study. Mater. Sci. Eng. A 799, 140154 (2021).
Bulgarevich, D. S., Nomoto, S., Watanabe, M. & Demura, M. Crystal plasticity simulations with representative volume element of as-build laser powder bed fusion materials. Sci. Rep. 13, 1–15 (2023).
O’Brien, B. J. & McLellan, R. B. The electronic properties of metals with quenched-in disorder. Philos. Trans. R. Soc. A 341, 401–411 (1992).
McDowell, D. L. & Dunne, F. P. E. Microstructure-sensitive computational modeling of fatigue crack formation. Int. J. Fatigue 32, 1521–1542 (2010).
Yaghoobi, M. et al. PRISMS-Fatigue computational framework for fatigue analysis in polycrystalline metals and alloys. npj Comput. Mater. 7, 1–12 (2021).
Pandey, A. & Pokharel, R. Machine learning-based surrogate modeling approach for mapping crystal deformation in three dimensions. Scr. Mater. 193, 1–5 (2021).
Ibragimova, O. et al. A convolutional neural network based crystal plasticity finite element framework to predict localized deformation in metals. Int. J. Plast. 157, 103374 (2022).
Bishara, D., Xie, Y., Liu, W. K. & Li, S. A state-of-the-art review on machine learning-based multiscale modeling, simulation, homogenization and design of materials. Arch. Comput. Methods Eng. 30, 191–222 (2022).
Yuan, M., Paradiso, S., Meredig, B. & Niezgoda, S. R. Machine learning–based reduce order crystal plasticity modeling for ICME applications. Integr. Mater. Manuf. Innov. 7, 214–230 (2018).
Zhao, J. et al. Establishing reduced-order process-structure linkages from phase field simulations of dendritic grain growth during solidification. Comput. Mater. Sci. 214, 111694 (2022).
Khandelwal, S., Basu, S. & Patra, A. A Machine Learning-based surrogate modeling framework for predicting the history-dependent deformation of dual phase microstructures. Mater. Today Commun. 29, 102914 (2021).
Bugas, D. & Runnels, B. Grain boundary network plasticity: Reduced-order modeling of deformation-driven shear-coupled microstructure evolution. J. Mech. Phys. Solids 184, 105541 (2024).
Kaufmann, K., Lane, H., Liu, X. & Vecchio, K. S. Efficient few-shot machine learning for classification of EBSD patterns. Sci. Rep. 11, 1–10 (2021).
McAuliffe, T. P., Dye, D. & Britton, T. B. Spherical-angular dark field imaging and sensitive microstructural phase clustering with unsupervised machine learning. Ultramicroscopy 219, 113132 (2020).
Krishna, K. V. M., Madhavan, R., Pantawane, M. V., Banerjee, R. & Dahotre, N. B. Machine learning based de-noising of electron back scatter patterns of various crystallographic metallic materials fabricated using laser directed energy deposition. Ultramicroscopy 247, 113703 (2023).
Ding, Z. & De Graef, M. Parametric simulation of electron backscatter diffraction patterns through generative models. npj Comput. Mater. 9, 1–10 (2023).
Gönen, M. & Margolin, A. A. Localized data fusion for Kernel k-means clustering with application to cancer biology. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, & K. Q. Weinberger (Eds.), Adv. Neural Inf. Process. Syst. 27 (NIPS, 2014).
de Souto, M. C. P., Costa, I. G., de Araujo, D. S. A., Ludermir, T. B. & Schliep, A. Clustering cancer gene expression data: A comparative study. BMC Bioinforma. 9, 1–14 (2008).
Zheng, B., Yoon, S. W. & Lam, S. S. Breast cancer diagnosis based on feature extraction using a hybrid of K-means and support vector machine algorithms. Expert Syst. Appl. 41, 1476–1482 (2014).
Nidheesh, N., Abdul Nazeer, K. A. & Ameer, P. M. An enhanced deterministic K-Means clustering algorithm for cancer subtype prediction from gene expression data. Comput. Biol. Med. 91, 213–221 (2017).
Huang, L. et al. Data mining tools | rapidminer: K-means method on clustering of rice crops by province as efforts to stabilize food crops in Indonesia. IOP Conf. Ser. Mater. Sci. Eng. 420, 012089 (2018).
Zhou, W., Zhao, W., Zhang, Y. & Ding, Z. Cluster analysis of acoustic emission signals and deformation measurement for delaminated glass fiber epoxy composites. Compos. Struct. 195, 349–358 (2018).
Shrifan, N. H. M. M., Jawad, G. N., Isa, N. A. M. & Akbar, M. F. Microwave nondestructive testing for defect detection in composites based on K-means clustering algorithm. IEEE Access 9, 4820–4828 (2021).
Leitherer, A., Yeo, B. C., Liebscher, C. H. & Ghiringhelli, L. M. Automatic identification of crystal structures and interfaces via artificial-intelligence-based electron microscopy. npj Comput. Mater. 9, 1–11 (2023).
Karthikeyan, T., Dash, M. K., Saroja, S. & Vijayalakshmi, M. Evaluation of misindexing of EBSD patterns in a ferritic steel. J. Microsc. 249, 26–35 (2013).
Fasano, G. & Franceschini, A. A multidimensional version of the Kolmogorov–Smirnov test. Mon. Not. R. Astron. Soc. 225, 155–170 (1987).
Massey, F. J. The Kolmogorov-Smirnov Test for Goodness of Fit. J. Am. Stat. Assoc. 46, 68–78 (1951).
Berger, V. W. & Zhou, Y. Kolmogorov–Smirnov Test: Overview. Wiley StatsRef: Stat. Ref. Online (2014).
Breumier, S. et al. Leveraging EBSD data by deep learning for bainite, ferrite and martensite segmentation. Mater. Charact. 186, 111805 (2022).
Agrawal, A. K., Meric de Bellefon, G. & Thoma, D. High-throughput experimentation for microstructural design in additively manufactured 316L stainless steel. Mater. Sci. Eng. A 793, 139841 (2020).
Lopez-Sanchez, M. A. & Llana-Fúnez, S. A cavitation-seal mechanism for ultramylonite formation in quartzofeldspathic rocks within the semi-brittle field (Vivero fault, NW Spain). Tectonophysics 745, 132–153 (2018).
Bachmann, F., Hielscher, R. & Schaeben, H. Texture Analysis with MTEX – Free and open source software toolbox. Solid State Phenom. 160, 63–68 (2010).
Ott, J. N., Condit, C. B., Schulte-Pelkum, V., Bernard, R. & Pec, M. Seismic Anisotropy of Mafic Blueschists: EBSD-based constraints from the exhumed rock record. J. Geophys. Res. Solid Earth 129, e2023JB027679 (2024).
Demouchy, S. et al. Dislocation and disclination densities in experimentally deformed polycrystalline olivine. Eur. J. Mineral. 35, 219–242 (2023).
Acknowledgements
The authors acknowledge that funding for this work was provided by the Center for Extreme Events in Structurally Evolving Material under U.S. Army CC-APG-RTP Division contract number W011NF-23-2-0073. The funder played no role in the study design, data collection, analysis and interpretation of data, or the writing of this manuscript. The authors also wish to acknowledge Prof. A. Achuthan of Clarkson University for providing the Ti-6Al-4V EBSD datasets for analysis.
Author information
Contributions
J.W.: Conceptualization, Methodology, Data curation, Writing – original draft. C.A.B.: Conceptualization, Supervision, Resources, Writing – review & editing, Funding acquisition. D.J.T.: Conceptualization, Supervision, Resources, Funding acquisition, Writing – original draft.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wanni, J., Bronkhorst, C.A. & Thoma, D.J. Machine learning enhanced analysis of EBSD data for texture representation. npj Comput Mater 10, 133 (2024). https://doi.org/10.1038/s41524-024-01324-4
This article is cited by
- Reply to: Comment on “Machine learning enhanced analysis of EBSD data for texture representation”, npj Computational Materials (2025)
- Comment on “Machine learning enhanced analysis of EBSD data for texture representation”, npj Computational Materials (2025)









