Introduction

In actual working conditions, gears are typically subjected to complex cyclic loads such as bending, contact, and torsion1,2,3. To enhance the overall performance of gears, carburizing and quenching are commonly employed to increase surface hardness and improve the internal structure of the material4,5. Atmosphere carburizing is widely used because of its easily adjustable carbon potential, simple process formulation, and low production cost6. In this process, activated carbon atoms decomposed from the carburizing medium infiltrate the surface layer of the material, producing a high carbon content at the surface while the core retains its original composition. Carburized parts therefore exhibit strong wear resistance, surface contact fatigue resistance, and bending fatigue resistance7. The carburizing process thus has significant practical value and economic benefit and has been widely applied to key mechanical parts such as crankshafts, turbine blades, gears and forgings used in ships, machine tools and aircraft4,8.

Currently, optimization of the carburizing process relies primarily on trial and error, in which the relationship between the carburizing process parameters and the gradient of carbon content is identified through experiments9. However, this method requires substantial manpower, materials and financial resources, resulting in high experimental costs and a long research cycle. To overcome these problems, researchers have employed numerical simulation10 to calculate the carbon concentration distribution under given carburizing process parameters. This approach can effectively determine the optimal carburizing conditions, thereby reducing experimental costs and shortening the research cycle. Although numerical simulation has the potential to optimize the carburizing process, operators with a weak theoretical background may find it difficult to use in actual production. Numerical simulation also poses challenges in the production process, including slow calculation speed and large consumption of time and human resources11. To address these issues, some scholars have used machine learning13,14 to predict the cutting force and surface finish in titanium alloy milling12, and others have adopted data-driven prediction methods based on simulation or experimental data15,16. Furthermore, some researchers have used a BPNN model to establish the nonlinear relationship between the surface carbon concentration, the effective carburized layer depth, and the process parameters after vacuum carburizing in order to find the optimal vacuum carburizing conditions17. However, the carbon concentration gradient may differ slightly even when the carburized layer depth is the same, which makes it inaccurate to predict the carbon concentration gradient based solely on the depth of the carburized layer. A 2D carburization cloud map provides more comprehensive information than a 1D carburization distribution, covering the carbon concentration at different locations and regions and thus highlighting spatial changes. Therefore, this study aims to predict the carbon concentration of three 2D shapes, a square, a circle, and a trapezoid, under various process parameters.

In materials prediction tasks, SVMs have demonstrated high accuracy under small-sample conditions, particularly in predicting friction material properties18, but tend to suffer from reduced precision and larger errors on large datasets19. In contrast, BPNNs have been applied to material process quality prediction because of their strong nonlinear modeling capability. To improve their convergence and accuracy, genetic algorithms (GA) have been used to optimize the initial weights and thresholds. GA-BPNN models have shown higher accuracy and faster convergence than standard BPNNs, making them effective tools for predicting process deviations and enhancing quality control20. However, neural networks often train slowly on large datasets, which can hinder their practical application.

To predict the relationship between the carbon concentration distribution and the process parameters after atmosphere carburizing heat treatment, this paper explores a method to address the slow training of neural networks caused by the large amount of carbon concentration data in the 2D carburizing model. To deal with slow training on large datasets, some scholars have proposed combining clustering analysis with neural networks21. In the field of ultrasonic testing, a prediction method combining mean shift clustering (MSC) and BPNN has been applied successfully22. MSC partitions the original dataset into a smaller set of cluster centers based on the features and trends of the data; these cluster centers are then used as training data for the BPNN, thereby accelerating training.

However, that method did not optimize the parameters of the BPNN, which can easily lead to local optima and slow convergence. A hybrid of heuristic algorithms23 and BPNN can avoid insufficient exploration and premature stagnation during later iterations24. This study addresses the aforementioned issues and makes two contributions. First, it proposes a novel prediction method for the BPNN to improve the accuracy, stability, and computational efficiency of the predictions. Second, the algorithm is tested on a large-scale dataset of different carburized 2D models and demonstrates excellent generalization performance.

The slime mould algorithm (SMA) is a metaheuristic algorithm based on the behavior of slime mould found in nature25. It uses the collective behavior exhibited by slime mould when searching for food to explore optimal solutions, and its uniqueness lies in simulating the movement, communication, and fitness-updating processes of slime mould. Compared with traditional algorithms such as the dragonfly algorithm (DA)26, the grey wolf optimizer (GWO)27 and particle swarm optimization (PSO)28, SMA has demonstrated superior optimization performance. In this study, SMA is employed to optimize the parameters of the BPNN and improve its convergence speed. A parameter-adaptive BPNN algorithm based on mean shift clustering (MSC) and SMA (MSMABP) is proposed. MSC clusters the data to form cluster centers that serve as the training set for the neural network, and SMA then optimizes the initial weights and thresholds of the BPNN to improve the stability and prediction accuracy of the network. Compared with highly similar algorithms in terms of prediction accuracy, stability, and training time, MSMABP ranks first in stability and second in prediction accuracy, which demonstrates the feasibility and effectiveness of the method.

Carbon concentration prediction in carburized steel based on MSMABP

The main idea of MSMABP is as follows: (1) divide the large data sample into a training set and a test set; (2) use MSC to process the training set and obtain the cluster centers; (3) set the BPNN parameters (weights and thresholds): multiple parameter groups are generated randomly to form the BPNN parameter population, and each parameter group is a search individual; (4) use the BPNN to predict the training-set samples and obtain the prediction accuracy for each parameter group based on the cluster centers; (5) use SMA to update the BPNN parameter groups according to the prediction accuracy, and loop steps (4) and (5) until the cut-off condition is met (generally the maximum number of iterations), yielding the optimal BPNN parameter group whose weights and thresholds are then assigned to the BPNN for training; (6) input the test set into the trained BPNN to obtain the carbon concentration values. MSMABP is therefore a combinatorial method: MSC reduces the size of the large training set to form cluster centers, SMA optimizes the BPNN parameters based on these cluster centers, and the optimized initial weights and thresholds are finally assigned to the BPNN for training.

The BPNN parameter population is denoted X = {X1, X2, …, Xn}, composed of search individuals Xi, where n is the number of search individuals. The attributes of Xi are the weights and thresholds of each layer of the neural network. Before applying the heuristic algorithm, the initial parameters of the neural network are set randomly; the heuristic algorithm then iteratively optimizes these initial values, using the prediction accuracy as the fitness value. The flowchart of MSMABP is shown in Fig. 1.
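As an illustration of this encoding, the following sketch (a minimal, hypothetical Python example, not the authors' implementation) flattens the weights and thresholds of a BPNN with the layer sizes adopted later in this paper into a single search-individual vector and evaluates its fitness as the RMSE on the cluster centers; the helper names and the tanh/linear activations are assumptions made for illustration.

```python
import numpy as np

# Hypothetical helpers: a search individual X_i is a flat vector holding all weights
# and thresholds of the BPNN; its fitness is the prediction error on the cluster centers.
LAYERS = [10, 20, 20, 20, 20, 20, 1]   # layer sizes of the network adopted later in this paper

def n_params(layers=LAYERS):
    """Length of one search individual: all weights plus all thresholds (biases)."""
    return sum(layers[i] * layers[i + 1] + layers[i + 1] for i in range(len(layers) - 1))

def unpack(individual, layers=LAYERS):
    """Split a flat parameter vector into per-layer weight matrices and threshold vectors."""
    weights, biases, pos = [], [], 0
    for i in range(len(layers) - 1):
        size_w = layers[i] * layers[i + 1]
        weights.append(individual[pos:pos + size_w].reshape(layers[i], layers[i + 1]))
        pos += size_w
        biases.append(individual[pos:pos + layers[i + 1]])
        pos += layers[i + 1]
    return weights, biases

def forward(individual, X):
    """Forward pass with tanh hidden layers and a linear output (assumed tansig/purelin counterparts)."""
    weights, biases = unpack(individual)
    a = X
    for w, b in zip(weights[:-1], biases[:-1]):
        a = np.tanh(a @ w + b)
    return a @ weights[-1] + biases[-1]

def fitness(individual, Xc, yc):
    """Fitness used by the SMA loop: RMSE of the network on the cluster centers (Xc, yc)."""
    return float(np.sqrt(np.mean((forward(individual, Xc).ravel() - yc) ** 2)))

# Usage: each SMA search individual is a vector of length n_params(); lower fitness is better.
```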

Fig. 1 MSMABP prediction process.

Mean shift clustering

After obtaining the normalized training set, it is necessary to perform mean shift clustering analysis to obtain the cluster centers of the training set. Mean shift clustering is a non-parametric clustering method used to discover dense regions from unlabeled data. This algorithm estimates the density of data points and determines the cluster centers in the data space by following the direction of maximum density gradient. The specific steps are as follows:

Step 1 Set the iterative error (error1), merging error (error2) and the search bandwidth (BW);

Step 2 Randomly select a point from the unmarked data points as the cluster center (center);

Step 3 Search for data points in the search dataset that fall within the bandwidth (BW) range of the current center point (center) and label them as set M.

Step 4 For each data point in the set M, calculate the vector from the current center point (center) to the data point, and add these vectors to obtain a total offset vector (shift);

Step 5 Add the current center point (center) and the offset vector (shift) to get the new center point position;

Step 6 Check whether the offset vector (shift) is smaller than the iteration error error1. If yes, the center point is stable enough, go to step 7 for the next step. If not, the center point is still moving, then return to step 3 and continue to iterate this process;

Step 7 Check whether the distance between the current center point (center) and the existing cluster center is smaller than the set merge error error2. If yes, the distance between the two clusters is very close, and they are merged into one cluster. This can reduce the number of clusters and improve the conciseness of the clustering results;

Step 8 If all points have been visited (marked), stop searching and output the centers of all clusters as CCs; otherwise, return to step 2.
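A minimal Python sketch of steps 1 to 8 is given below, assuming Euclidean distance and a flat kernel; the parameter values are placeholders, and scikit-learn's MeanShift class offers an equivalent off-the-shelf implementation.

```python
import numpy as np

def mean_shift(data, bw=0.5, error1=1e-3, error2=0.5):
    """Flat-kernel mean shift following Steps 1-8; returns the cluster centers (CCs).
    bw, error1 and error2 are placeholder values."""
    centers = []
    visited = np.zeros(len(data), dtype=bool)
    while not visited.all():
        # Step 2: start from a random unvisited (unmarked) data point.
        center = data[np.random.choice(np.where(~visited)[0])]
        while True:
            # Step 3: points within bandwidth BW of the current center form the set M.
            in_bw = np.linalg.norm(data - center, axis=1) < bw
            if not in_bw.any():
                break
            visited |= in_bw                      # mark the points covered by this search
            # Steps 4-5: the total offset vector moves the center to the local mean.
            new_center = data[in_bw].mean(axis=0)
            shift = np.linalg.norm(new_center - center)
            center = new_center
            # Step 6: stop once the center has stabilised.
            if shift < error1:
                break
        # Step 7: merge with an existing cluster center if they are closer than error2.
        for i, c in enumerate(centers):
            if np.linalg.norm(c - center) < error2:
                centers[i] = (c + center) / 2
                break
        else:
            centers.append(center)
    return np.array(centers)                      # Step 8: output all cluster centers

# Example: cluster 2D points; scikit-learn's MeanShift(bandwidth=bw) gives a comparable result.
# centers = mean_shift(np.random.rand(1000, 2), bw=0.2)
```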

BP neural network

The neural network model for optimizing the heat treatment of carburized steel utilizes the BPNN algorithm29,30. The model takes carburizing and diffusion temperature, time, and carbon potential as input parameters, along with shape and x-y coordinates, with carbon concentration as the output layer parameter. Among these, the input shape is represented numerically: 1 for square, 2 for circle and 3 for trapezoid. These three selected shapes have broad applications in actual production processes. Specifically, the square shape resembles the cross-section of a turning tool, the circular shape resembles the cross-section of a shaft, and the trapezoid shape resembles the cross-section of a gear.

A total of 243 sets of simulation data with different process parameters are generated in ABAQUS; each set contains the carbon concentration value at every coordinate of every shape. In total, 530,226 data points are used as samples, of which 371,158 are used to train the model and the remaining 159,068 are used to test its generalization and stability.

Before establishing the BPNN model, the maximum number of iterations is set to 200, the learning rate to 0.01, and the momentum factor to 0.001. The trainlm function is selected as the training function, and the backpropagation rule is used to train the neural network. The activation function of the output layer is set to purelin, and that of all hidden layers to the S-shaped tangent function tansig. Next, the number of hidden layers and neurons is determined. Compared with a single hidden layer, a multi-hidden-layer neural network model performs better overall, although training takes longer. This article considers 4, 5, and 6 hidden layers. The number of neurons is chosen in the range of 5 to 14 according to the empirical formula31, and in the range of 15 to 24 outside the formula range. The formula is as follows:

$$\:l=\sqrt{n+m}+a$$
(1)

where l is the number of nodes in a single hidden layer, a is a constant (1 < a < 10), and n and m are the numbers of input and output nodes respectively.
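For the network considered here, Eq. (1) gives a concrete range: with n = 10 input nodes and m = 1 output node, \(\:\sqrt{n+m}\approx\:3.3\), so 1 < a < 10 yields roughly 5 to 13 neurons per hidden layer, which corresponds to the 5 to 14 range examined first; the 15 to 24 range deliberately steps outside the formula.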

As can be seen from Fig. 2, as the number of neuron nodes and hidden layers increases, the root mean square error of the BPNN decreases while the running time gradually increases. As illustrated by the red box in Fig. 2b, when a model with this structure is selected, the root mean square error on the test set is less than 4 × 10−5, while the calculation time of the same structures in Fig. 2a exceeds 3000 s, often by a wide margin (except for the model with 5 hidden layers and 20 neurons per layer). Considering both prediction accuracy and running time, this paper adopts a 10 × 20 × 20 × 20 × 20 × 20 × 1 BPNN structure, as shown in Fig. 3.
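For orientation, the sketch below builds an analogous 10 × 20 × 20 × 20 × 20 × 20 × 1 regressor with scikit-learn; it is a stand-in rather than the setup used in the paper, since MLPRegressor does not provide the trainlm (Levenberg-Marquardt) training function, and tanh hidden layers with a linear output are used here as counterparts of tansig and purelin.

```python
from sklearn.neural_network import MLPRegressor

# Analogous 10 x 20 x 20 x 20 x 20 x 20 x 1 regressor: five hidden layers of 20 neurons each.
# The paper trains with MATLAB's trainlm; scikit-learn has no Levenberg-Marquardt solver,
# so 'sgd' with the paper's learning rate and momentum factor is used here as a stand-in.
model = MLPRegressor(
    hidden_layer_sizes=(20, 20, 20, 20, 20),
    activation="tanh",        # counterpart of tansig
    solver="sgd",
    learning_rate_init=0.01,  # learning rate
    momentum=0.001,           # momentum factor
    max_iter=200,             # maximum number of iterations
    random_state=0,
)
# model.fit(X_train, y_train)  # the output layer of MLPRegressor is linear, matching purelin
```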

Fig. 2 Comparison of model performance with different number of neuron nodes under different hidden layers.

Fig. 3 Schematic diagram of the BP neural network structure.

Slime mould optimization algorithm

The slime mould algorithm (SMA) is an optimization algorithm that simulates the spreading and foraging behavior of slime mould25. It is inspired by the vegetative stage of slime mould, including its feeding behavior and morphological changes. SMA uses an adaptive weight factor to simulate the positive and negative feedback behavior of slime mould, forming three contraction modes of foraging behavior. The foraging behavior of slime mould comprises three steps: looking for food, approaching food, and secreting enzymes to digest food. The process of SMA is shown in Fig. 4, and Table 1 provides the pseudo code of the algorithm.

Fig. 4 The steps of SMA.

Approaching food

Slime mould judges the concentration of food based on the smell in the air and then approaches the food. To simulate this behavior, its approach behavior is formulated using the following equation to simulate the contraction mode25:

$$\:\overrightarrow{S\left(t+1\right)}=\left\{\begin{array}{c}\overrightarrow{{S}_{b}\left(t\right)}+\overrightarrow{vb}\bullet\:\left(\overrightarrow{W}\bullet\:\overrightarrow{{S}_{A}\left(t\right)}-\overrightarrow{{S}_{B}\left(t\right)}\right),\:x<q\\\:\overrightarrow{vc}\bullet\:\overrightarrow{S\left(t\right)},\:x\ge\:q\end{array}\right.$$
(2)

where \(\:\overrightarrow{vb}\) and \(\:\overrightarrow{vc}\) are control parameters, t is the current iteration number, \(\:\overrightarrow{{S}_{b}}\) denotes the individual location with the highest odor concentration found so far, \(\:\overrightarrow{{S}_{A}}\) and \(\:\overrightarrow{{S}_{B}}\) are two random individuals, and \(\:\overrightarrow{W}\) represents the weight of the slime mould.

$$\:q=\text{tanh}\left|P\left(i\right)-TF\right|$$
(3)

where \(\:i\in\:\text{1,2},\dots\:,n\), \(\:P\left(i\right)\) is the fitness value of \(\:\overrightarrow{S}\), and \(\:TF\) is the best fitness value obtained over all iterations.

$$\:\overrightarrow{W\left(SmellIndex\right(i\left)\right)}=\left\{\begin{array}{c}1+x \bullet\:{log}\left(\frac{bF-P\left(i\right)}{bF-wF}+1\right),half\\\:1-x \bullet\:{log}\left(\frac{bF-P\left(i\right)}{bF-wF}+1\right),\:others \end{array}\right.$$
(4)
$$\:SmellIndex=sort\left(P\right)$$
(5)


where half refers to the individuals whose P(i) ranks in the top half of the population and others to the remaining individuals; x is a random number in \(\:\left[0,\:1\right]\); \(\:bF\) and \(\:wF\) are the best and worst fitness values of the current iteration; and \(\:SmellIndex\) is the sorted sequence of fitness values.

Wrapping food

The formula for updating the slime mould position is as follows:

$$\:\overrightarrow{{S}^{\text{*}}}=\left\{\begin{array}{c}rand\bullet\:\left(UB-LB\right)+LB,\:rand<z\\\:\overrightarrow{{S}_{b}\left(t\right)}+\overrightarrow{vb}\bullet\:\left(\overrightarrow{W}\bullet\:\overrightarrow{{S}_{A}\left(t\right)}-\overrightarrow{{S}_{B}\left(t\right)}\right),\:x<q\\\:\overrightarrow{vc}\bullet\:\overrightarrow{S\left(t\right)},\:x\ge\:q\end{array}\right.$$
(6)

where LB and UB are the lower and upper boundaries of the search range respectively, rand is a random value in \(\:\left[0,\:1\right]\), and z is a small probability that controls random re-initialization of individuals.

Access to food

The control parameter \(\:\overrightarrow{vb}\) is chosen randomly within [-d, d]. The control parameter \(\:\overrightarrow{vc}\) oscillates randomly within \(\:\left[-1,\:1\right]\) and gradually converges to zero.

$$\:\overrightarrow{vb}=[-d,\:d]$$
(7)
$$\:d=\text{a}\text{r}\text{c}\text{t}\text{a}\text{n}\text{h}\left(-\left(\frac{t}{\text{m}\text{a}\text{x}\_t}\right)+1\right)$$
(8)
Table 1 SMA Pseudo code.
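The sketch below condenses Eqs. (2)-(8) into a single SMA position update in Python; it is an illustrative reading of the pseudo code in Table 1 rather than a verbatim reproduction, and details such as taking TF from the current population and the value of z are assumptions.

```python
import numpy as np

def sma_step(S, fit, t, max_t, lb, ub, z=0.03):
    """One SMA position update implementing Eqs. (2)-(8).
    S: (n, dim) population; fit: (n,) fitness values (lower is better); t runs from 1 to max_t."""
    n, dim = S.shape
    order = np.argsort(fit)                       # SmellIndex, Eq. (5) (ascending: best first)
    bF, wF = fit[order[0]], fit[order[-1]]
    Sb = S[order[0]]                              # location with the best fitness found so far

    # Weight W, Eq. (4): the better half gets 1 + x*log(...), the rest 1 - x*log(...).
    W = np.empty((n, dim))
    ratio = np.log((bF - fit[order]) / (bF - wF - 1e-12) + 1.0)
    x = np.random.rand(n, dim)
    W[order[:n // 2]] = 1 + x[:n // 2] * ratio[:n // 2, None]
    W[order[n // 2:]] = 1 - x[n // 2:] * ratio[n // 2:, None]

    q = np.tanh(np.abs(fit - bF))                 # Eq. (3); TF approximated by the best fitness so far
    d = np.arctanh(1.0 - t / max_t)               # Eq. (8); t starts from 1 so the argument stays below 1
    vc_range = 1.0 - t / max_t                    # vc oscillates in a range that shrinks toward zero

    new_S = np.empty_like(S)
    for i in range(n):
        if np.random.rand() < z:                  # Eq. (6), first branch: random relocation
            new_S[i] = lb + np.random.rand(dim) * (ub - lb)
        elif np.random.rand() < q[i]:             # Eq. (2), upper branch: approach food
            A, B = np.random.randint(n, size=2)   # two random individuals S_A, S_B
            vb = np.random.uniform(-d, d, dim)    # Eq. (7)
            new_S[i] = Sb + vb * (W[i] * S[A] - S[B])
        else:                                     # Eq. (2), lower branch
            vc = np.random.uniform(-vc_range, vc_range, dim)
            new_S[i] = vc * S[i]
    return np.clip(new_S, lb, ub)
```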

Case study

Experiment description

The atmosphere carburizing experiment in this paper is carried out in the wx-1000 atmosphere carburizing multifunctional furnace fabricated by Jiangsu Yixin Gear Manufacturing Co., Ltd. This furnace meets the production requirements for carburized layer depths of 1.5~3.5 mm on small and medium-sized parts, with a relatively high production rate and efficiency. It features a relatively high working temperature, a stable furnace atmosphere, and a high degree of automation, and is especially suitable for processing large quantities of carburized heat-treated parts such as gears, bearings, and knives. The experimental material in this study is the gear steel 20Cr2Ni4A, prepared as a cylindrical sample of \(\:\varnothing\:\)20 mm \(\:\times\:\) 48 mm. Table 2 shows the chemical composition of 20Cr2Ni4A gear steel.

Table 2 Chemical compositions of 20Cr2Ni4A steel (wt%).

For the carburizing production process of different carburized layers, the cylindrical sample is placed into the furnace for carburizing. The carburizing process mainly involves two stages: carburizing and diffusion, followed by subsequent cooling to the appropriate temperature for quenching. The process curve is shown in Fig. 5.

The first stage is preheating, with the temperature raised to 800 °C and held isothermally for 30 min to ensure uniform heating. The second stage is carburizing, with the carbon potential set to cp1%, the temperature raised to T1 °C, and the duration set to t1 minutes. The third stage is diffusion, with the carbon potential set to cp2%, the temperature maintained at T2 °C, and the duration set to t2 minutes. In the fourth stage, the temperature is lowered to the quenching temperature T3 °C and held for t3 minutes. The fifth stage is quenching for 40 min, with the quenching oil temperature set at 70 °C and the actual temperature rising to about 120 °C. The sixth stage is air cooling until the sample reaches room temperature. The sample placement within the furnace is shown in Fig. 6.

Fig. 5 Carburizing and quenching process curve.

According to the determined carburizing process for spiral bevel gears, cylindrical samples were carburized in the furnace to verify the accuracy of the simulation model. As shown in Fig. 6, three cylindrical specimens were installed on racks under three different sets of process parameters, with each gear rack weighing about 1 ton. The carburizing and diffusion temperature of processes 1 and 2 is 910 °C, that of process 3 is 920 °C, and the quenching temperature of all three processes is 820 °C. The carburizing carbon potential of the three processes is 1.2%, the diffusion carbon potential of processes 1 and 3 is 0.8%, and that of process 2 is 0.9%. Carburizing time is given in hours; process 3 has the longest set time, while process 2 has the shortest. The specific process parameters are shown in Table 3.

Table 3 Carburizing and quenching process parameters.
Fig. 6 Pictures of furnace samples of different processes.

Carburized layer carbon concentration gradient test

The carbon concentration is measured at three different positions on the cylindrical sample after carburizing, as shown in Fig. 7b. The test equipment is a Bruker Q2 ION direct reading spectrometer (Germany), shown in Fig. 7a. First, the carbon concentration of the surface is measured. Second, the marks left by the spectrometer test are ground away with a grinding machine. The thickness loss is then measured with a micrometer, followed by measurement of the carbon concentration at the same location. The same procedure is repeated until the substrate of the sample is reached.

Fig. 7 Test equipment and test locations: (a) direct reading spectrometer; (b) carbon concentration test location.

Numerical simulation

Atmospheric carburizing can be divided into three processes. First, the carburizing medium undergoes a decomposition reaction at high temperature. Subsequently, carbon atoms are continuously transferred to the surface of the steel part, causing the carbon content on the surface of the steel part to increase. Finally, a concentration gradient is formed between the surface and the core of the steel part, providing a driving force for carbon atoms to move into the steel part17.

The governing equation of the carburizing diffusion model is Fick's second law, described by the following partial differential equation32:

$$\:\frac{\partial\:C}{\partial\:t}=D\frac{{\partial\:}^{2}C}{\partial\:{x}_{i}^{2}}$$
(9)

Here, C is the carbon concentration, D is the carbon diffusion coefficient, which is a function related to temperature and carbon concentration. For alloy steel, the carbon diffusion coefficient can be expressed as:

$$\:D=(0.0047{e}^{-1.6C}\times\:{e}^{\left[\frac{6600C-37000}{RT}\right]})\times\:S$$
(10)

Where R is the gas constant, T is the Kelvin temperature, and S is a constant related to the alloy element content.

$$\:D\frac{\partial\:C}{\partial\:{x}_{i}}=\beta\:(C-{C}_{g})$$
(11)

where Cg is the environmental carbon potential, C is the carbon content on the workpiece surface, and β is the carbon transfer coefficient between the gas phase and the solid steel part.
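To make the model concrete, the following sketch solves Eq. (9) on a 1D domain with the diffusion coefficient of Eq. (10) and the surface flux condition of Eq. (11) using an explicit finite-difference scheme; the geometry, β, S and time step are illustrative assumptions and are not taken from the paper, which performs the actual simulations in ABAQUS.

```python
import numpy as np

R = 8.314  # universal gas constant, J/(mol K)

def diff_coeff(C, T, S=1.0):
    """Diffusion coefficient of Eq. (10); C in wt%, T in K. S = 1 and mm^2/s units are assumptions."""
    return 0.0047 * np.exp(-1.6 * C) * np.exp((6600.0 * C - 37000.0) / (R * T)) * S

def carburize_1d(T=910 + 273.15, Cg=1.2, beta=1.25e-4, C0=0.2,
                 depth=4.0, nx=201, t_total=4 * 3600.0, dt=1.0):
    """Explicit finite-difference solution of Eq. (9) with the surface flux condition of Eq. (11).
    Depth in mm, time in s; beta and the time step are illustrative, not the paper's values."""
    dx = depth / (nx - 1)
    C = np.full(nx, C0)
    for _ in range(int(t_total / dt)):
        D = diff_coeff(C, T)
        Cn = C.copy()
        # Interior nodes, Eq. (9): dC/dt = D * d2C/dx2 (dt must satisfy dt < dx^2/(2D) for stability).
        Cn[1:-1] = C[1:-1] + dt * D[1:-1] * (C[2:] - 2.0 * C[1:-1] + C[:-2]) / dx**2
        # Surface node, Eq. (11): carbon flux from the atmosphere, D*dC/dx = beta*(C - Cg).
        Cn[0] = C[0] + dt * 2.0 / dx * (beta * (Cg - C[0]) - D[0] * (C[0] - C[1]) / dx)
        # Core end: zero-flux (symmetry) condition.
        Cn[-1] = C[-1] + dt * 2.0 * D[-1] * (C[-2] - C[-1]) / dx**2
        C = Cn
    return np.linspace(0.0, depth, nx), C

x_mm, carbon = carburize_1d()
print(f"surface carbon content ~ {carbon[0]:.3f} wt%")
```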

As shown in Fig. 8a, the simulated object is a small cylindrical 20Cr2Ni4A sample with a height of 48 mm and a diameter of 40 mm. To simplify the calculation of the 1D diffusion process, half of the cylindrical sample is taken as the simulation object and the finite element model is established, as shown in Fig. 8b. AB is the axis of symmetry, and temperature and carburizing carbon potential boundary conditions are imposed on the remaining three sides. For the carburizing heat treatment model, the mesh within 4 mm of the surface is refined to accurately capture the carbon gradient, giving a denser surface mesh and a sparser core mesh. The elements are quadrilateral, with 4600 elements and 4747 nodes, as shown in Fig. 8c. The initial condition is a uniform concentration field inside the sample, with the carbon concentration at each node set to C0 = 0.2% and the density set to 7.98 g/cm3. The boundary conditions are set as surface flux conditions.

The testing points on the top face of the finite element model of the cylindrical sample are shown in Fig. 8d. Testing points are spaced 0.13 mm apart from the surface inward, and a total of 31 testing points are taken to measure the carbon content.

Fig. 8 Cylindrical sample and model: (a) schematic diagram of a cylindrical sample; (b) finite element model; (c) finite element model meshing; (d) carbon concentration distribution cloud chart.

Comparison between simulation results and experiments

The carbon concentration was measured on cylindrical samples produced with the three different sets of process parameters. The effective carbon concentration depths corresponding to a carbon content of 0.35% were determined: 1.372 mm, 0.858 mm, and 1.819 mm for process 1, process 2, and process 3 respectively. The comparison between the simulated and experimental carbon content gradients is shown in Fig. 9. Both the simulated and experimental values decrease with increasing depth, and the decreasing trends agree.

The highest carbon content among the three processes lies between 0.81% and 0.91%, with the carbon concentration evenly distributed on the surface and gradually decreasing from the surface toward the center. The surface carbon content in process 2 is higher than in the other two processes because the relatively short diffusion time causes the absorption rate of carbon atoms on the workpiece surface to exceed their diffusion rate into the interior; carbon atoms therefore accumulate at the surface, raising the surface carbon concentration. In contrast, the carburized layer in process 3 is the deepest, as both its strong carburizing stage and its diffusion stage are the longest, which facilitates the continuous inward diffusion of carbon atoms.

Fig. 9 Carbon concentration gradient in three different processes.

To verify the accuracy of the carburizing numerical model, linear interpolation was used to calculate the simulated carbon concentration values at the three experimental test locations, corresponding to the positions of the simulation coordinates. The standard deviation and average of the carbon concentration values at the three positions were then calculated. As shown in Fig. 10, the percentage error curves between the experimental and simulated values for the three processes reveal that the maximum percentage error for process 1 is 7.79%, occurring at a distance of 1.86 mm from the surface. For all three processes the maximum percentage error lies within 0~0.2 mm of the surface. This discrepancy is attributed to the error of the test equipment and to the use of the 0.02% substrate material as the standard sample; in areas with higher carbon concentrations there may therefore be some measurement error.
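A brief sketch of this verification step, with hypothetical measurement values, is shown below: the simulated profile is linearly interpolated onto the experimental depths with np.interp, after which the average, standard deviation and percentage error are computed.

```python
import numpy as np

# Simulated profile: carbon concentration vs. depth at the model's testing points (illustrative values).
sim_depth = np.linspace(0.0, 4.0, 31)           # 31 testing points, ~0.13 mm apart
sim_carbon = 0.2 + 0.7 * np.exp(-sim_depth / 0.8)

# Experimental depths measured with the micrometer at the three test positions (hypothetical data).
exp_depth = np.array([0.05, 0.40, 0.90, 1.40, 1.86])
exp_carbon = np.array([[0.88, 0.74, 0.55, 0.42, 0.33],   # position 1
                       [0.90, 0.75, 0.56, 0.43, 0.34],   # position 2
                       [0.87, 0.73, 0.54, 0.41, 0.32]])  # position 3

# Linear interpolation of the simulated profile onto the experimental depths.
sim_at_exp = np.interp(exp_depth, sim_depth, sim_carbon)

# Average and standard deviation over the three positions, then percentage error vs. simulation.
exp_avg = exp_carbon.mean(axis=0)
exp_std = exp_carbon.std(axis=0)
pct_error = np.abs(exp_avg - sim_at_exp) / exp_avg * 100
print(np.round(pct_error, 2), np.round(exp_std, 4))
```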

Fig. 10 Percent error between experimental value and simulation value.

This article chooses a cylindrical end face for carbon concentration gradient testing. Under the same process parameters, the end-face results are the same as the 1D carbon concentration gradients of the square and trapezoid in the direction normal to the surface. For 2D shapes, the finite element software accounts for the effect of differently shaped surfaces on carbon concentration diffusion. Using an accurate carbon diffusion model of the carburizing process, three shapes (circle, square, and trapezoid) were modeled, as shown in Fig. 11. The numbers of grid points for the three shapes are 729, 728, and 725 respectively. An orthogonal combination of process parameters is designed for large-scale calculation according to the carburizing process parameter range: for carburizing and diffusion, the temperature is fixed at three levels, the time settings are the same, and the carbon potential is set at three levels respectively, giving a total of 3\(\:\times\:\)3\(\:\times\:\)3\(\:\times\:\)3\(\:\times\:\)3 = 243 groups of different carburizing process parameters, as shown in Table 4. The carbon concentration distributions calculated under these process parameters serve as training samples for the neural network.

The carbon concentration distributions of the three shapes under the 243 sets of process parameters were generated with the simulation software, giving a total of 243\(\:\times\:\)(729 + 728 + 725) = 530,226 data points as samples. Finally, a prediction model based on the BPNN was established, with the carburizing process parameters, coordinates and shape value as inputs and the carbon concentration value as output.

Table 4 Simulation scheme for generating the training set.
Fig. 11 Meshing with three different shapes.

Feasibility verification experiment

To better display the distribution cloud map of carburizing carbon concentration prediction results, we randomly selected input data of three different shapes under different process parameters (as shown in Table 5) from the test set and input them into the trained network. The carbon concentration distribution cloud diagrams of the three different shapes were obtained, as shown in Fig. 12.

Table 5 Process parameters for three different shape selections.
Fig. 12 Three shape simulation results and prediction result cloud charts.

The comparison of the percentage errors between simulation values and predicted values is shown in Fig. 13. The area with larger errors for the circular shape is within a ring 1.5–4.5 mm from the surface. The areas with larger errors for the square and trapezoidal shape are all in the four corner areas. The maximum percentage errors of the square, circle, and trapezoid shapes are 11%, 5.9%, and 10.45% respectively. The reason the circular shape has smaller errors than the other two shapes is that the surface of the circular shape is relatively uniform. Square and trapezoidal surfaces have corners, and the carbon concentration distribution in these areas is more complex.

Fig. 13 Comparison of simulation and prediction percentage errors.

To better observe the gradient change of the carburized carbon content, the carbon concentration of each shape is sampled every 0.35 mm along the vertical direction, starting from the coordinate position (7.5, 15) and ending at the substrate. The percentage errors between the prediction and simulation results for the square, circle, and trapezoid are 0.071%-2.399%, 0.33%-3.39%, and 0.013%-3.33% respectively, as shown in Fig. 14, indicating that the proposed algorithm generalizes well in carbon concentration prediction. The effective carburized layer depths of the square, circular and trapezoidal shapes are 1.35 mm, 2.32 mm and 1.72 mm respectively. The effective carburized layer depth depends on the carburizing process parameters: generally, the higher the temperature, the longer the time, and the higher the carbon potential, the deeper the carburized layer15. The effective carburized layer of the circular shape is deeper than that of the other two shapes because its carburizing temperature is higher, which accelerates the diffusion of carbon atoms; in addition, the arc shape of its surface also accelerates carbon diffusion. The percentage error curves of the three shapes show a larger error in the range of 2\(\:\sim\)3.5 mm from the surface, because there are fewer grid points in this range, so the corresponding training samples from the simulation model are insufficient and the error increases.
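The effective case depth quoted here can be extracted from any predicted profile by locating the 0.35% C crossing with linear interpolation, as in the short sketch below; the sampled profile is hypothetical.

```python
import numpy as np

def effective_case_depth(depth_mm, carbon_wt, threshold=0.35):
    """Depth at which the carbon profile first falls below the 0.35 wt% threshold,
    found by linear interpolation; assumes the profile decreases and actually crosses the threshold."""
    idx = int(np.argmax(carbon_wt < threshold))     # first sampling point below the threshold
    d0, d1 = depth_mm[idx - 1], depth_mm[idx]
    c0, c1 = carbon_wt[idx - 1], carbon_wt[idx]
    return d0 + (c0 - threshold) * (d1 - d0) / (c0 - c1)

# Example: sample a (hypothetical) predicted profile every 0.35 mm from the surface toward the substrate.
depth = np.arange(0.0, 4.0, 0.35)
carbon = 0.2 + 0.65 * np.exp(-depth / 1.1)
print(f"effective case depth ~ {effective_case_depth(depth, carbon):.2f} mm")
```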

Fig. 14 Carbon concentration gradient error comparison.

To enhance the interpretability of the carbon concentration prediction model, we applied SHAP (SHapley Additive Explanations) analysis to quantify the contribution of each process parameter. As shown in Fig. 15, the feature importance derived from SHAP values provides a clear and consistent explanation of how each input feature affects the model’s output.

Among all input features, the X and Y coordinates exhibit the highest importance scores. This finding is consistent with the physical mechanism of the carburizing process: the carbon concentration tends to be higher closer to the surface, particularly within the 0\(\:\sim\)2 mm depth range from the part's surface. In addition, the diffusion carbon potential and quenching carbon potential also demonstrate significant influence, because the quenching stage is the final phase of the carburizing process and directly determines the surface carbon concentration after treatment.
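A sketch of such a SHAP analysis is given below; it uses a small synthetic stand-in model and an assumed ordering of the ten input features, since the trained network and exact feature ordering are not reproduced here, and KernelSHAP is used as a model-agnostic explainer.

```python
import numpy as np
import shap
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Assumed ordering of the ten input features, for illustration only.
feature_names = ["carb. temperature", "carb. time", "carb. potential", "diff. temperature",
                 "diff. time", "diff. potential", "quench temperature", "shape", "x", "y"]

# Stand-in for the trained carbon-concentration model: a small MLP fitted on synthetic data.
X = rng.random((500, 10))
y = 0.2 + 0.7 * np.exp(-3.0 * X[:, 9]) + 0.05 * X[:, 5]   # toy response, not real carburizing data
model = MLPRegressor(hidden_layer_sizes=(20, 20), max_iter=500, random_state=0).fit(X, y)

# KernelSHAP: model-agnostic explanation against a small background sample to keep the cost manageable.
background = shap.sample(X, 50)
explainer = shap.KernelExplainer(model.predict, background)
X_explain = X[:100]
shap_values = explainer.shap_values(X_explain, nsamples=100)

# Mean |SHAP| value per feature gives the importance ranking visualised in Fig. 15.
shap.summary_plot(shap_values, X_explain, feature_names=feature_names, plot_type="bar")
```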

Fig. 15 Analysis of the contribution of process parameter characteristics to carbon concentration prediction.

Comparative experiment

Some initial parameters involved in this method are shown in Table 6. Figure 16 shows the BPNN model training performance curves for each hybrid method. After 200 rounds of learning and iteration, each method reduces the error to 2.14\(\:\times\:\)10−5, demonstrating that each network model trains very quickly and achieves good training results. The training performance curves show that in roughly the first 15 rounds the mean square error drops sharply and the networks converge rapidly. After 50 rounds the curves become smooth with no obvious fluctuations, reflecting that the networks have captured the correspondence between the 10 input layer parameters and the output layer parameter. Therefore, the above combined methods can be used for carburizing process optimization.

Table 6 Parameter settings.

The partially enlarged view in Fig. 16 shows that the training performance curve of the MSC-BP neural network, whose initial weights and thresholds are not optimized, only flattens after 80 iterations, which is clearly slower than the other methods. Optimizing the initial values of the BPNN with a heuristic algorithm thus significantly accelerates network convergence. Comparing the training performance curves of the methods that use heuristic algorithms to optimize the initial values of the BPNN, the loss function of the MSMABP algorithm decreases fastest during the first 40 iterations.

Fig. 16 BPNN model training performance curves of each hybrid method.

To verify the prediction performance of the proposed method (MSMABP), it is compared with MGWOBP, MPSOBP, and MDABP. With a training-to-test allocation ratio of 7:3, each method was run 5 times. Table 7 lists the average (Avg) and standard deviation (Std) of the prediction error and running time; Std is mainly used to evaluate the stability of each method. As can be seen from Fig. 17, the violin contour of the MSMABP method is widest around 1.29 × 10−4, indicating that its results are concentrated near this value, while the MPSOBP method has the largest fluctuation range and the least stable prediction accuracy. In terms of training time, the MSMABP method fluctuates around 650 s and is more stable than the other methods, as shown in Fig. 18. The proposed method performs best in the stability of both prediction accuracy and running time, ranking first in overall stability. In prediction accuracy it is second only to the MGWOBP method, but it is slightly inferior to the other methods in neural network training time.

Compared with other machine learning methods, the performance of the MSMASVR method is significantly worse. As shown in Table 7, its RMSE reaches 2.46 × 10−3, one order of magnitude higher than the other methods, indicating poor prediction accuracy on large-scale datasets. Additionally, its average running time exceeds 5800 s, far longer than the neural-network-based approaches. This is primarily because SVR involves solving a quadratic programming problem with a computational complexity of O(n3), making it highly sensitive to the training sample size. Furthermore, SVR typically requires computing kernel functions against all support vectors, which becomes costly as the data grows.

Mean shift clustering drastically reduces the number of training samples and thus the running time. However, this introduces a new problem: the loss of original data reduces accuracy. Therefore, the hybrid of MSC and BP (MBP) is formed, and the comparison results are shown in Table 7. A total of 522,955 training samples can be clustered into 77,321 cluster centers with mean shift clustering. Using the cluster centers as the training set reduces the running time of the BPNN by a factor of nearly 5.7. In terms of prediction accuracy, the RMSE (root mean square error) of MBP is still relatively small, about 4.5 times that of the BPNN. The average percentage error of the BPNN is 0.48%, and that of MBP is 2.18%. To some extent, this loss is acceptable.
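The statistics reported in Table 7 can be reproduced from repeated runs with a few lines of code; the sketch below (with hypothetical run results) shows the RMSE, mean percentage error, and Avg/Std calculations used for the comparison.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mean_pct_error(y_true, y_pred):
    """Average percentage error, as quoted for BPNN (0.48%) and MBP (2.18%)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(np.abs(y_true - y_pred) / np.abs(y_true)) * 100)

# Hypothetical RMSE results of 5 repeated runs of one method, summarised as Avg and Std (Table 7 style).
runs_rmse = np.array([1.28e-4, 1.30e-4, 1.29e-4, 1.31e-4, 1.27e-4])
print(f"Avg = {runs_rmse.mean():.2e}, Std = {runs_rmse.std(ddof=1):.2e}")
```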

Table 7 Comparison of MSMABP with other algorithms.
Fig. 17 Violin and scatter plots of the RMSE results of each method.

Fig. 18 Violin and scatter plots of the running time of each method.

Conclusion

To solve the problem of efficiently predicting carburizing carbon concentration under large-sample conditions, a parameter-adaptive BPNN prediction method combining MSC and SMA is proposed in this paper. MSC accelerates the whole method, and SMA improves its prediction accuracy and stability. By training on a large amount of historical data, the model learns the complex nonlinear relationship between the atmosphere carburizing carbon concentration and the corresponding process parameters, showing high prediction accuracy. The average percentage error of the trained model on the test set is 2.18%, the correlation coefficient between the target values and the predicted values reaches 0.9998, and the root mean square error is 1.29 × 10−4, which verifies the superiority of the model. In addition, the prediction accuracy and stability of MSMABP, MBP, MGWOBP, MPSOBP and MDABP were compared; the method proposed in this paper shows a strong competitive advantage and excellent stability of results, and can be applied directly to practical processing. Future research is expected to apply this method to predict three-dimensional carburized carbon concentration fields and to consider the influence of material factors and more complex shapes on the carbon concentration after carburizing. By reducing trial production costs, this approach is also of significance for reducing carbon emissions and protecting the environment.