Gaussian process regression with physics-guided pseudo-sample augmentation for wear prediction under sparse measurements in milling

Nguyen, Hai-Phong; Nguyen, Duc-Thuan; Kim, Jong-Myon

doi:10.1038/s41598-026-38067-9

Article
Open access
Published: 04 February 2026

Hai-Phong Nguyen¹,
Duc-Thuan Nguyen¹ &
Jong-Myon Kim^1,2

Scientific Reports volume 16, Article number: 7231 (2026)

Abstract

Tool wear prediction is essential to ensure machining quality and sustainability. Hybrid physics-data Gaussian process regression (GPR) methods integrate domain knowledge with data-driven learning, but a fundamental challenge remains due to an inherent GPR characteristic: when trained on sparse measurements, GPR struggles to extrapolate accurately as tool wear progresses beyond the training distribution, leading to increased uncertainty and prediction errors. This work proposes Gaussian process regression with physics-guided pseudo-samples (GPR-PPS), which addresses this extrapolation issue by enriching the training set with synthetic wear labels at intermediate cuts between sparse measurements. Pseudo-samples are generated by fitting a physics-based flank-wear function to recent GPR predictions and realigning the fitted curve to measured values. These samples are then incorporated into the GPR training set alongside real measurements to predict tool flank wear values across the tool’s operational life. The proposed framework is evaluated on high-speed milling experiments using multi-sensor signals, and the results demonstrate that the proposed method accurately forecasts the entire tool life cycle while using as little as 9.5% of the tool’s life span as initial labeled training data. Compared to conventional machine learning baselines, the proposed approach exhibits superior predictive performance and robustness under limited data conditions.

Introduction

Computer numerical control (CNC) machining has emerged as a fundamental technology in modern production, allowing for the creation of complex, high-precision components for sectors such as aerospace, healthcare, electronics, and energy. Despite advancements in cutting tool materials and machining technology, tool wear remains an unavoidable challenge caused by mechanical stress, heat, chemical reactions, and abrasive interactions at the tool-workpiece interface¹. In high-speed and precision milling, wear not only degrades surface quality and dimensional accuracy but can also lead to tool failure, unplanned downtime, and increased production costs. While replacing tools early is considered a safe and cautious approach, studies show that this often results in tools being discarded after only 50–80% of their life span, mainly due to the absence of reliable wear prediction methods². Conversely, extending tool usage to its full potential life span, without compromising quality, can significantly reduce downtime, enhance productivity, and promote sustainable manufacturing³.

To ensure machining quality and efficiency, process monitoring systems have been widely adopted in modern manufacturing. These systems employ multi-sensor measurements to detect various anomalies such as chatter vibrations, tool breakage, surface defects, and dimensional errors in real time^4,5. Within this framework, tool wear monitoring and prediction has become particularly critical given its direct impact on manufacturing outcomes^6,7. Tool wear assessment can be performed using direct and indirect methods. Direct methods consist of measuring tool wear through physical inspection techniques, such as gravimetric analysis⁸, optical imaging^9,10, laser scanning^11,12, or profilometry¹³. These approaches can accurately capture the cutter’s geometry to quantify wear parameters such as flank wear or crater depth; however, they are often limited in industrial applications¹⁴. Direct methods typically rely on offline measurements, which require pausing the machining process and can reduce overall production efficiency¹⁵.

In contrast, indirect methods utilize signals measured from the machining processes themselves to infer tool wear. These approaches dominate industrial applications due to their non-invasiveness, cost-effectiveness, and suitability for in-process monitoring. Commonly used signals include cutting forces^16,17, vibrations^18,19, acoustic emissions (AE)^20,21, spindle current/power¹⁰, and temperature²². Among indirect methods, some employ physical modeling to relate measured signals to tool wear. Physical models describe wear evolution using mathematical formulations grounded in mechanics, such as friction, crack growth, or wear mechanisms (abrasive, adhesive, diffusive)²³, and can be divided into physical law-based models and mechanism-based models¹. Physical law-based models assume that wear progression follows general physical principles, and the governing equations typically contain empirical or tunable coefficients obtained through experimental calibrations. Classic examples include Taylor’s extended tool life equation, logarithmic wear laws²⁴, and generalized parametric wear models for flank wear^25,26. While interpretable and computationally simple, these models primarily relate wear to fixed cutting parameters (spindle speed, feed rate, and axial/radial depth of cut) and time. As a result, they often fail to capture dynamic variations such as sudden force changes or abnormal wear progression, and their accuracy degrades under variable conditions. Mechanism-based models explicitly represent the underlying physical processes of wear. These models integrate stress distributions, temperature fields^27,28, or finite element analysis (FEA)^23,29 to describe material removal phenomena, including abrasion, adhesion, and diffusion.
These approaches provide realistic descriptions of wear progression, but their main challenge lies in practicality: they require extensive assumptions, high computational cost, and detailed material parameters, making them difficult to apply in real-time or industrial shop-floor scenarios. Both model types, despite their differences, rely on static assumptions and calibrated parameters, limiting their ability to handle real-world variability such as tool-workpiece interactions, material inhomogeneity, or stochastic chip formation.

Data-driven approaches use machine learning (ML) to map signal characteristics to tool wear states, wear values, or remaining useful life (RUL). ML is well suited to this task as it can capture nonlinear dependencies and uncover latent structures in complex data. As a result, various methods, based on diverse ML architectures such as support vector machines (SVM)^30,31,32, convolutional neural networks (CNN)^33,34, long short-term memory (LSTM) networks^35,36, and gated recurrent units (GRU)^37,38, have been developed to address this mapping challenge. For instance, Gomes et al.³⁰ achieved 97.5% accuracy using SVM with vibration and sound signals for real-time wear classification, while Bazi et al.³⁴ employed a hybrid CNN-BiLSTM model to extract critical features from noisy force and acoustic emission signals despite signal interference. Recent studies have also improved the interpretability of data-driven models through intrinsically transparent architectures and post-hoc explainability techniques. Intrinsically interpretable methods like random forests^39,40 provide transparency through their decision structures, while post-hoc techniques enable explanation of complex models. For example, Yao et al.⁴¹ used SHAP and Grad-CAM to reveal how neural networks make predictions in process monitoring, while Sotubadi et al.⁴² developed multi-modal explainable AI combining these techniques with deep neural networks for tool wear detection. However, purely data-driven models still lack explicit physical structure and operate solely by learning statistical patterns from training data. This fundamental limitation makes them data-intensive, requiring extensive, varied, and high-quality datasets for effective training⁴³. Furthermore, most data-driven methods provide only point predictions without quantifying uncertainty, making risk-aware maintenance decisions difficult in practice. 
Consequently, data-driven approaches remain less suitable when only limited experimental measurements are available or when encountering new conditions not well represented in the training data.

Recognizing that physics-based and data-driven models each have complementary limitations, recent studies have increasingly adopted hybrid physics-data frameworks that combine physical insight with flexible learning to improve tool wear prediction. Representative approaches include physics-informed and physics-guided neural network methods that incorporate mechanistic wear models, physical constraints, or domain knowledge through modified loss functions, network architectures, or training procedures^6,44,45,46. These hybrid methods demonstrate that embedding physical structure within learning models can enhance prediction stability, interpretability, and robustness under varying machining conditions. Despite these advantages, such physics-informed approaches often require careful tuning of loss weight hyperparameters to balance physics and data terms, and their performance can degrade when applied to conditions that deviate significantly from the training distribution or embedded physical assumptions. Moreover, most existing hybrid methods still require substantial labeled data for effective training and typically provide only point predictions without explicit uncertainty quantification for risk-aware decision-making. Within this hybrid landscape, Gaussian process regression (GPR) is adopted as our base framework due to its distinct capability for effective learning under sparse labeled data. Unlike other hybrid methods that require substantial labeled datasets, GPR enables accurate modeling from limited measurements, making it particularly suitable when obtaining wear values requires stopping production and is therefore costly and disruptive. Critically, GPR also addresses the lack of uncertainty quantification in existing hybrid approaches by naturally providing predictive confidence intervals, enabling risk-aware maintenance decisions. 
Additionally, GPR’s kernel-based structure flexibly captures nonlinear signal-wear relationships while explicitly accounting for measurement noise.

Recent studies have utilized GPR’s strengths to address tool wear challenges. For example, Kong et al.⁴⁷ integrated kernel principal component analysis with GPR to monitor tool flank wear, effectively reducing noise and refining confidence intervals, which improved the model’s suitability for real-time monitoring. Zhu et al.⁴⁸ proposed a physics-informed GPR (PI-GPR) that embeds parametric physical wear models into the mean function, constraining predictions to better reflect actual tool degradation. Sun et al.⁴⁹ developed a hybrid PI-GPR that incorporates a health indicator from filtered sensor features and an explicit physical wear model as the prior mean, achieving high prediction accuracy and reduced confidence interval variance.

These GPR-based studies demonstrate that incorporating physical insight can improve predictive stability and interpretability. However, they also face a persistent challenge: in wear prediction, with continued operation, the signal features and corresponding wear states gradually diverge from the training distribution, leaving new predictions without a nearby reference. Sun et al.⁴⁹ relied on the initial portion of tool life to set up the model, while Zhu et al.⁴⁸ periodically updated predictions with new measurements to re-anchor forecasts. These strategies partially mitigate sparsity but remain limited once forecasting moves far into out-of-range regions. In such cases, GPR must extrapolate along the evolving trajectory, but without nearby points, forecasts tend to revert toward the prior mean and accumulate uncertainty. Even with physically informed structures, sparse data constrain GPR’s ability to remain aligned with the true wear path, leaving predictions vulnerable to noise, drift, and costly decision errors^50,51.

To address the limitations of conventional physics-guided GPR models, this study proposes Gaussian process regression with physics-guided pseudo-samples (GPR-PPS). Unlike prior methods^47,48,49 that embed physics primarily as a constraining prior within the mean or kernel function, the proposed framework actively addresses extrapolation uncertainty in sparse-data regimes by repurposing the physical wear model to generate synthetic training data.

The main contributions of this manuscript to tool wear prediction are as follows:

1. Pseudo-sample generation mechanism: Transforms intermediate GPR predictions between measured points into training samples using a physics-based wear function, addressing GPR’s extrapolation limitations without requiring additional real measurements.
2. Closed-loop predict-update-augment strategy: Integrates prediction, physics-based fitting, and training enrichment into a continuous iterative update cycle, which enables reliable long-horizon wear forecasting even when labeled data is severely limited.
3. Experimental validation: Demonstrates that pseudo-samples closely match real measurements, enabling the model to outperform standard GPR and machine learning baselines in sparse-data regimes.

The remainder of this paper is organized as follows. The "Theoretical background" section reviews the Gaussian process preliminaries and the physics-based flank-wear model. The "Methods" section details the proposed GPR-PPS framework. The "Results" section presents the experimental evaluation. The "Conclusion" section summarizes our findings, limitations, and directions for future work.

Theoretical background

Gaussian process regression

A Gaussian process (GP) is a collection of random variables, any finite subset of which follows a joint Gaussian distribution. In the context of regression, we consider an unknown function $\:f\left(x\right)$ that we wish to infer from observed data (e.g., true tool wear as a function of input signal features). A GP provides a probabilistic framework to model this function, offering both predictions and uncertainty estimates. Before incorporating observations, a GP specifies a prior distribution over such functions and is fully characterized by a mean function and a covariance function (kernel):

$$f\left(x\right)\sim GP\left(m\left(x\right),k\left(x_i,x_j\right)\right)$$

(1)

Here, the mean function $\:m\left(x\right)$ represents the expected value of $\:f\left(x\right)$ at input $\:x$:

$$m\left(x\right)=E\left[f\left(x\right)\right]$$

(2)

The mean function is commonly assumed to be zero when the underlying relationship between inputs and $f\left(x\right)$ is not well known, reflecting no prior bias about the function’s shape. The kernel $k\left(x_i,x_j\right)$ defines the covariance between function values at different inputs $x_i$ and $x_j$:

$$k\left(x_i,x_j\right)=E\left[\left(f\left(x_i\right)-m\left(x_i\right)\right)\left(f\left(x_j\right)-m\left(x_j\right)\right)\right]$$

(3)

The kernel captures relationships between the inputs, such as their distance or periodic alignment, to infer similarity in outputs (e.g., tool wear values). For example, the radial basis function (RBF) kernel, discussed later, assumes that inputs that are closer in feature space (e.g., similar sensor readings) correspond to more similar tool wear values. In real-world applications, observations are corrupted by noise, so the model incorporates additive Gaussian noise. Given a training dataset $\:D=\{\left({x}_{i},{y}_{i}\right){\}}_{i=1}^{n}$ where $\:{x}_{i}\in\:{R}^{d}$ are vectors of input features and $\:{y}_{i}\in\:R$ are measurements affected by noise, the model assumes:

$$y_i=f\left(x_i\right)+\epsilon,\quad\epsilon\sim\mathcal{N}\left(0,\sigma_n^2\right)$$

(4)

Equivalently, for a finite set of training inputs $\:X=\{\left({x}_{i}\right){\}}_{i=1}^{n}$, the training outputs $\:Y=\{\left({y}_{i}\right){\}}_{i=1}^{n}$ follow the GP prior:

$$\:Y\:\sim\:GP\:\left(m\left(x\right),\:k\left({x}_{i},{x}_{j}\right)+\:{\sigma\:}_{n}^{2}{\delta\:}_{ij}\right)$$

(5)

where $\delta_{ij}$ is the Kronecker delta, so the noise variance $\sigma_n^2$ is added only on the diagonal ($i=j$). For a finite training set, this GP prior reduces to a multivariate normal distribution:

$$Y\sim\mathcal{N}\left(m\left(X\right),K+\sigma_n^2 I\right)$$

(6)

In this expression, $\:K$ denotes the $\:n\times\:n$ covariance matrix of elements $\:{K}_{ij}=k\left({x}_{i},{x}_{j}\right)$. This distribution reflects prior knowledge of the outputs before any observations are made. When making predictions at a new input $\:{x}_{\text{*}}$, we consider the joint prior over both the training outputs $\:Y$ and the underlying function value $\:f\left({x}_{\text{*}}\right)$. Extending the prior in this way gives:

$$\begin{bmatrix}Y\\ f\left(x_*\right)\end{bmatrix}\sim\mathcal{N}\left(\begin{bmatrix}m\left(X\right)\\ m\left(x_*\right)\end{bmatrix},\begin{bmatrix}K+\sigma_n^2 I & k_*\\ k_*^{\top} & k\left(x_*,x_*\right)\end{bmatrix}\right)$$

(7)

where $k_*=\left[k\left(x_1,x_*\right),\dots,k\left(x_n,x_*\right)\right]^{\top}$ captures correlations between the training inputs and the test input. Conditioning on the observed training outputs $Y$ yields the posterior predictive distribution at $x_*$. The mean $\mu_*$ is shaped by the values of $Y$, while the variance $\sigma_*^2$ reflects the input geometry of $X$ and $x_*$ through the kernel. Thus:

$$f\left(x_*\right)\mid X,Y,x_*\sim\mathcal{N}\left(\mu_*,\sigma_*^2\right)$$

(8)

with:

$$\:{\mu\:}_{*}=m\left({x}_{*}\right)+{k}_{*}^{\top\:}{\left[K+{\sigma\:}_{n}^{2}I\right]}^{-1}\left(Y-m\left(X\right)\right)$$

(9)

$$\:{\sigma\:}_{*}^{2}=k\left({x}_{*},{x}_{*}\right)-{k}_{*}^{\top\:}{\left[K+{\sigma\:}_{n}^{2}I\right]}^{-1}{k}_{*}$$

(10)

The posterior mean $\mu_*$ is the predicted tool wear at input point $x_*$, while the variance $\sigma_*^2$ quantifies uncertainty, enabling the confidence intervals that are critical for industrial applications. Unlike the prior, which serves as a general assumption before observing data, the posterior incorporates training data and is used exclusively for predictions.
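Eqs. (9) and (10) can be illustrated with a minimal sketch in plain Python. Everything here is an assumption made for illustration and not the authors' implementation: the function names (`rbf`, `cholesky`, `chol_solve`, `gp_posterior`), the toy one-dimensional data, the unit signal variance and length scale, and the use of an RBF covariance with the noise term added separately on the diagonal.

```python
import math

def rbf(a, b, sigma_f=1.0, length=1.0):
    """RBF covariance between two feature vectors (noise is added separately)."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return sigma_f ** 2 * math.exp(-0.5 * d2 / length ** 2)

def cholesky(A):
    """Lower-triangular L with A = L L^T, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(X, Y, x_star, sigma_n=0.1, mean=lambda x: 0.0):
    """Posterior mean (Eq. 9) and variance (Eq. 10) at a single test input."""
    n = len(X)
    # K + sigma_n^2 I
    Ky = [[rbf(X[i], X[j]) + (sigma_n ** 2 if i == j else 0.0)
           for j in range(n)] for i in range(n)]
    L = cholesky(Ky)
    k_star = [rbf(xi, x_star) for xi in X]
    # alpha = [K + sigma_n^2 I]^{-1} (Y - m(X))
    alpha = chol_solve(L, [y - mean(x) for x, y in zip(X, Y)])
    mu = mean(x_star) + sum(k * a for k, a in zip(k_star, alpha))
    v = chol_solve(L, k_star)
    var = rbf(x_star, x_star) - sum(k * vi for k, vi in zip(k_star, v))
    return mu, var

# Toy data (hypothetical): one scalar feature, three wear labels.
X = [[0.0], [1.0], [2.0]]
Y = [0.10, 0.20, 0.35]
mu_near, var_near = gp_posterior(X, Y, [1.0], sigma_n=1e-3)   # at a training input
mu_far, var_far = gp_posterior(X, Y, [50.0], sigma_n=1e-3)    # far outside the data
```

At the training input the posterior mean stays close to the observed label and the variance is small. Far from the data, `k_star` vanishes, so the mean falls back to the prior mean and the variance returns to the prior signal variance. This is the extrapolation behavior that motivates the pseudo-sample strategy.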

The training of GPR

The choice of kernel function significantly influences the predictive accuracy of GPR. Among various kernel options, the RBF kernel is the most widely used due to its smoothness and infinite differentiability, and because it encodes the assumption that samples close in feature space should exhibit similar wear values, making nearby points exert stronger influence on predictions. This locality property aligns well with tool wear behavior under similar cutting conditions. The RBF kernel is expressed as:

$$k\left(x_i,x_j\right)=\sigma_f^2\exp\left(-\frac{1}{2l^2}\Vert x_i-x_j\Vert^2\right)+\sigma_n^2\delta_{ij}$$

(11)

where $\:\Vert\:{x}_{i}-{x}_{j}{\Vert\:}^{2}={\sum\:}_{k=1}^{d}{\left({x}_{ik}-{x}_{jk}\right)}^{2}$ is the squared Euclidean distance between input vectors $\:{x}_{i}$ and $\:{x}_{j}$ in d-dimensional space. $\:{\sigma\:}_{f}^{2}$ is the signal variance, controlling the amplitude of the function, $\:l$ is the length-scale parameter controlling how quickly correlation decays with distance in the input space, and $\:{\sigma\:}_{n}^{2}$ adds the noise variance. The parameter set $\:\varTheta\:=\{l,{\sigma\:}_{f}^{2},{\sigma\:}_{n}^{2}\}$ defines the hyperparameters of the GP model, which are central to model training.

Learning the hyperparameters is performed by maximizing the marginal likelihood, conditioned on the training outputs $\:Y$ and inputs $\:X$. Under the GP prior $\:f\sim\:\mathcal{\:}\mathcal{N}\left(m\left(X\right),K\right)$, the marginal likelihood is obtained by integrating over the underlying function values $\:f$:

$$\:p(Y\mid\:X,\varTheta\:)=\int\:p(Y\mid\:f,X)p(f\mid\:X,\varTheta\:)df$$

(12)

Assuming Gaussian noise, $p\left(Y\mid f\right)=\mathcal{N}\left(f,\sigma_n^2 I\right)$, the log-marginal likelihood becomes:

$$\log p\left(Y\mid X,\varTheta\right)=-\frac{1}{2}\left(Y-m\left(X\right)\right)^{\top}\left[K+\sigma_n^2 I\right]^{-1}\left(Y-m\left(X\right)\right)-\frac{1}{2}\log\left|K+\sigma_n^2 I\right|-\frac{n}{2}\log 2\pi$$

(13)

The optimal hyperparameters $\varTheta^{\prime}$ are obtained by maximizing this log-marginal likelihood:

$$\varTheta^{\prime}=\arg\max_{\varTheta}\log p\left(Y\mid X,\varTheta\right)$$

(14)

This optimization can be performed using gradient-based methods or other numerical approaches. Once the hyperparameters are determined, the covariance matrix $\:K$ is fully specified, allowing predictions at new inputs $\:{x}_{\text{*}}$ with their corresponding posterior mean and variance to be computed using the standard GPR formulas.
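As a rough illustration of Eqs. (13) and (14), the sketch below evaluates the log-marginal likelihood through a Cholesky factor and picks hyperparameters by exhaustive grid search. The grid search is only a simple stand-in for the gradient-based optimizers mentioned above. The function names, the toy data, and the candidate grids are hypothetical, and the noise variance is added once on the diagonal.

```python
import math

def rbf(a, b, sigma_f, length):
    """RBF covariance without the noise term."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return sigma_f ** 2 * math.exp(-0.5 * d2 / length ** 2)

def cholesky(A):
    """Lower-triangular L with A = L L^T."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def log_marginal_likelihood(X, Y, length, sigma_f, sigma_n, mean=lambda x: 0.0):
    """Eq. (13) evaluated with Ky = K + sigma_n^2 I = L L^T."""
    n = len(X)
    Ky = [[rbf(X[i], X[j], sigma_f, length) + (sigma_n ** 2 if i == j else 0.0)
           for j in range(n)] for i in range(n)]
    L = cholesky(Ky)
    r = [y - mean(x) for x, y in zip(X, Y)]
    # Solve L z = r, so that r^T Ky^{-1} r = z^T z.
    z = [0.0] * n
    for i in range(n):
        z[i] = (r[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    data_fit = -0.5 * sum(v * v for v in z)
    # log|Ky| = 2 * sum(log L_ii), hence -0.5 * log|Ky| = -sum(log L_ii).
    complexity = -sum(math.log(L[i][i]) for i in range(n))
    return data_fit + complexity - 0.5 * n * math.log(2.0 * math.pi)

def grid_search(X, Y, lengths, sigma_fs, sigma_ns):
    """Eq. (14) restricted to a finite candidate grid (illustrative only)."""
    candidates = [(l, sf, sn) for l in lengths for sf in sigma_fs for sn in sigma_ns]
    return max(candidates, key=lambda th: log_marginal_likelihood(X, Y, *th))

# Toy data (hypothetical).
X = [[0.0], [1.0], [2.0], [3.0]]
Y = [0.0, 0.8, 0.9, 0.1]
lengths, sigma_fs, sigma_ns = [0.3, 1.0, 3.0], [0.5, 1.0], [0.05, 0.2]
best = grid_search(X, Y, lengths, sigma_fs, sigma_ns)
```

The first term of the likelihood rewards data fit and the second penalizes model complexity, so maximizing their sum balances the two without a separate validation set.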

Generic average flank wear model

The generic tool wear model (GTWM) proposed by Zhu et al.⁵² predicts average flank wear in high-speed milling. It addresses the limitations of traditional empirical and intelligent models by incorporating adjustable coefficients that improve flexibility and generalizability across machining conditions. The model divides the tool’s life cycle into three wear zones (an initial zone, a steady-state region, and an accelerated wear zone), separated by critical times derived from the wear rate and acceleration rate profiles. This structure aligns with the typical progression reported for cutting tools in milling operations. While other parametric wear models could potentially be employed, GTWM offers a balance of physical interpretability, generality, and mathematical tractability for our framework.

The construction of the model begins with empirical observations of tool flank wear $VB$ over milling time $t$, where the wear rate $VB^{\prime}$ (first derivative) and acceleration rate $VB^{\prime\prime}$ (second derivative) are analyzed to identify transition points. The model represents overall wear as:

$$\:VB\left(t\right)={VB}_{E}\left(t\right)+{VB}_{L}\left(t\right)$$

(15)

where the early wear rate is defined as $VB_E^{\prime}=a_1/\left(t+b_1\right)+c_1$, which captures the rapid decline in wear rate at the beginning of tool life, and the late acceleration rate as $VB_L^{\prime\prime}=a_2 t+b_2$, which captures the rapid wear increase toward the end. Integrating these rate functions under the boundary condition $VB\left(0\right)=0$ and reparameterizing by absorbing integration constants into the coefficients yields an early-stage wear function of the form $VB_E\left(t\right)=A\ln\left(Bt+1\right)$ and a late-stage function dominated by a cubic term, $VB_L\left(t\right)=Ct^3$. Together, these components form the model:

$$VB\left(t\right)=VB_E\left(t\right)+VB_L\left(t\right)=A\ln\left(Bt+1\right)+Ct^3$$

(16)

providing a smooth representation of tool flank wear throughout milling.

To improve adaptability, simple conditions based on the turning points of the wear curve are established and their positive roots are solved to determine the coefficients A, B, and C, resulting in a refined and re-parameterized formula:

$$VB\left(t\right)=A\ln\left[\left[\frac{v_E}{v_L+1}\right]^{\left(v_L t\right)^3}\cdot\left(v_E t+1\right)\right]$$

(17)

Here, $\:A$ sets the overall wear level, while $\:{v}_{E}$ and $\:{v}_{L}\text{}$ are adjusted to reflect the wear rates across the tool’s early and late stages, offering more physical insight into the wear process than Eq. (16).
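To make the link between Eqs. (16) and (17) concrete, the following sketch evaluates both forms and checks that they agree. Expanding the logarithm in Eq. (17) gives $B=v_E$ and $C=A\,v_L^3\ln\left[v_E/\left(v_L+1\right)\right]$. The function names and the parameter values are arbitrary illustrations, not values from the paper.

```python
import math

def vb_eq16(t, A, B, C):
    """Eq. (16): logarithmic early-stage term plus cubic late-stage term."""
    return A * math.log(B * t + 1.0) + C * t ** 3

def vb_eq17(t, A, v_E, v_L):
    """Eq. (17), evaluated literally as written."""
    base = v_E / (v_L + 1.0)
    return A * math.log(base ** ((v_L * t) ** 3) * (v_E * t + 1.0))

def eq17_as_eq16(A, v_E, v_L):
    """Coefficients (A, B, C) of Eq. (16) implied by expanding the log in Eq. (17)."""
    return A, v_E, A * v_L ** 3 * math.log(v_E / (v_L + 1.0))

# Hypothetical parameter values, chosen only so both forms stay finite.
A, v_E, v_L = 50.0, 2.0, 0.01
coeffs = eq17_as_eq16(A, v_E, v_L)
times = [0.0, 10.0, 50.0, 100.0, 150.0]
wear_17 = [vb_eq17(t, A, v_E, v_L) for t in times]
wear_16 = [vb_eq16(t, *coeffs) for t in times]
```

Two observations follow from the expansion. With $A>0$ and $v_L>0$, the cubic coefficient is positive only when $v_E>v_L+1$, which a fitting routine may need to enforce as a bound. The expanded form is also numerically safer, because the literal power in Eq. (17) can overflow for long horizons.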

Methods

Framework overview

The overall workflow of the proposed GPR-PPS is presented in Fig. 1 and operates through a closed-loop predict-update-augment strategy. Multi-sensor signals (e.g., cutting force, vibration, acoustic emission) are continuously collected during milling and processed to extract features in time, frequency, and time-frequency domains. After feature selection, the GPR model is initialized using a labeled dataset from the early stage of tool life, $\left\{\left(x_t,y_t^{\text{real}}\right):t\in\left[1,n_{\text{init}}\right]\right\}$, where $x_t$ represents features from milling signals, $y_t^{\text{real}}$ denotes measured tool wear values, and $n_{\text{init}}$ is the number of cuts in the dense initial training segment. A parametric mean function is fitted to capture the dominant feature-wear trend, and the GPR is trained on residuals to model nonlinear deviations and uncertainty. The trained model then generates wear predictions and confidence intervals for the subsequent interval until a new measurement is made.

When a new real wear measurement (update value) becomes available at the end of the prediction interval, the framework enters the pseudo-sample generation phase. The physics-based flank wear function (Eq. (17)) is fitted to previous GPR predictions, then aligned via affine transformation to pass exactly through both anchoring real measurements. This aligned curve is evaluated at every intermediate cut to produce pseudo-wear values $\:{y}_{t}^{\text{pseudo}}$ paired with their corresponding signal features $\:{x}_{t}$. Critically, the number of pseudo-samples generated equals the number of intermediate cuts in the prediction interval: each cut where signal features are available but real wear measurements are not obtained receives a pseudo-label, ensuring complete temporal coverage between sparse measurements.

The enriched training set combines all real measurements and interpolated pseudo-samples, assigning them equal weight for GPR model updates, conceptually expressed as:

$$\:\text{Training\:set}=\left\{\left({x}_{t},{y}_{t}^{\text{real}}\right)\::t\in\:\mathcal{M}\right\}\cup\:\left\{\left({x}_{t},{y}_{t}^{\text{pseudo}}\right)\::t\in\:\mathcal{G}\right\}$$

(18)

The set $\mathcal{M}=\left[1,n_{\text{init}}\right]\cup\mathcal{U}$ combines the initial dense segment with sparse update points $\mathcal{U}=\left\{n_1,n_2,\dots\right\}$ obtained at periodic intervals, while $\mathcal{G}=\bigcup_i\left(n_i,n_{i+1}\right)$, taking $n_0=n_{\text{init}}$ so that the first interval is included, contains all cuts strictly between consecutive elements of $\mathcal{M}$. This ensures pseudo-samples always interpolate between two real measurement anchors, preventing drift and maintaining physical consistency.

The GPR is retrained on this augmented dataset, updating mean function parameters and GP hyperparameters via marginal likelihood maximization. The updated model predicts the next interval, and the cycle repeats until tool end of life.
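The bookkeeping behind Eq. (18) can be sketched as follows. This is one reading of the index sets, in which the last cut of the initial segment serves as the first anchor. The function names and the toy features and wear values are hypothetical.

```python
def build_index_sets(n_init, update_cuts):
    """Return (measured, pseudo): the cut indices in M and G of Eq. (18)."""
    updates = sorted(update_cuts)
    measured = list(range(1, n_init + 1)) + updates        # set M
    anchors = [n_init] + updates                           # consecutive real anchors
    pseudo = [t for a, b in zip(anchors, anchors[1:])      # set G: strictly between anchors
              for t in range(a + 1, b)]
    return measured, pseudo

def build_training_set(features, real_wear, pseudo_wear, measured, pseudo):
    """Union of real and pseudo-labeled pairs, all with equal weight."""
    pairs = [(features[t], real_wear[t]) for t in measured]
    pairs += [(features[t], pseudo_wear[t]) for t in pseudo]
    return pairs

# Toy example: 5 initial labeled cuts, then wear measured at cuts 10 and 15.
measured, pseudo = build_index_sets(5, [10, 15])
features = {t: [float(t)] for t in range(1, 16)}           # placeholder feature vectors
real_wear = {t: 2.0 * t for t in measured}                 # placeholder measurements
pseudo_wear = {t: 2.0 * t for t in pseudo}                 # placeholder pseudo-labels
training = build_training_set(features, real_wear, pseudo_wear, measured, pseudo)
```

Every cut up to the latest measurement ends up with exactly one label, real or pseudo, which is the complete temporal coverage described above.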

Data processing

The data processing pipeline follows a streamlined sequence. Multi-sensor signals are first collected from the milling process, then divided into time-domain segments (e.g., cutting passes or time intervals). From these segments, the fast Fourier transform (FFT) and wavelet packet decomposition are applied⁵³ to obtain the frequency-domain and time-frequency-domain representations (as illustrated in Fig. 2), from which analytical features are extracted.

Time-domain features capture waveform shape and statistical properties, including mean, RMS, standard deviation, skewness, kurtosis, and various factors characterizing peaks and signal shape. Frequency-domain features describe spectral characteristics such as spectral moments, energy distribution, and relative peak measures. Time-frequency features are obtained via wavelet packet decomposition, providing localized information on how signal energy evolves across both time and frequency. These multi-domain features provide a comprehensive representation of the tool condition, enabling subsequent analysis and modeling.
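For illustration, a few of the time-domain features named above can be computed as in the sketch below. The exact feature list and normalization used in the study are not specified here, so the population (1/n) moment definitions, the non-excess kurtosis, and the crest factor as the example "peak" factor are assumptions.

```python
import math

def time_domain_features(x):
    """Basic statistical features of one signal segment (non-constant signal assumed)."""
    n = len(x)
    mean = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)      # population std
    skew = sum((v - mean) ** 3 for v in x) / (n * std ** 3)
    kurt = sum((v - mean) ** 4 for v in x) / (n * std ** 4)   # non-excess: 3 for a Gaussian
    peak = max(abs(v) for v in x)
    return {
        "mean": mean,
        "rms": rms,
        "std": std,
        "skewness": skew,
        "kurtosis": kurt,
        "crest_factor": peak / rms,                           # peak relative to RMS
    }

# A square-like toy segment whose features are easy to verify by hand.
feats = time_domain_features([1.0, -1.0, 1.0, -1.0])
```

Applying such a function to each cutting pass and each sensor channel yields one feature vector per cut, which is the form of input the GPR model consumes.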

After feature extraction, a selection process is performed to filter out the most significant features for modeling. The choice of selection strategy depends on dataset characteristics and is guided by an analysis of feature-wear relationships. The selected features are then used to train the GPR model.

GPR initialization and iterative update

The GPR model is initially trained using a labeled dataset $\left\{\left(x_t,y_t\right)\right\}_{t=1}^{n_{\text{init}}}$, corresponding to the early stage of the tool’s life. This initial training establishes a prior model capable of generating wear predictions based on the extracted signal features. As milling progresses and new tool wear measurements become available, the GPR model is iteratively updated to enhance prediction accuracy and adapt to evolving wear dynamics.

At each training stage, whether initialization or update, a parametric mean function $\:m\left({x}_{t}\right)$ is fitted to the training data. The choice of the mean function’s form is determined from domain knowledge or empirical analysis of signal-wear relationships, and the parameters are optimized to capture the dominant trend in the available data. The GPR is then trained on the residuals between the measured wear values and the fitted prior mean:

$$\:{y}_{t}^{\text{res}}={y}_{t}-m\left({x}_{t}\right),\hspace{1em}t=1,\dots\:,n$$

(19)

This allows the GPR to learn nonlinear deviations from the physical trend. Together, the prior mean function and the optimized hyperparameters of the GP form the trained model, which is subsequently used to predict wear values for new input features. For a new input feature vector $\:{x}_{t}^{\text{*}}$, the predicted tool wear is given by:

$$\:{\widehat{y}}_{t}^{*}=m\left({x}_{t}^{*}\right)+{f}_{\text{GPR}}\left({x}_{t}^{*}\right),\hspace{1em}t=n+1,\:\dots\:,n+{\Delta\:}n$$

(20)

Here, $\:{\widehat{y}}_{t}^{*}$ denotes the predicted value, $\:{f}_{\text{GPR}}\left({x}_{t}^{\text{*}}\right)$ represents the GPR prediction of the residual and $\:{\Delta\:}n$ represents the interval between measurements. This process is repeated, enabling the GPR model to continuously adapt to the tool’s evolving condition throughout its operational life.
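The mean-plus-residual decomposition of Eqs. (19) and (20) can be sketched as below. A linear mean on a single scalar feature is assumed purely for illustration, since the form of the mean function is left to domain knowledge. The residual model is a pluggable callable where a trained GP would go. All names and data are hypothetical.

```python
def fit_linear_mean(xs, ys):
    """Ordinary least squares for an assumed mean m(x) = a + b*x on a scalar feature."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return lambda x: a + b * x

def residuals(mean, xs, ys):
    """Eq. (19): the targets on which the GP is trained."""
    return [y - mean(x) for x, y in zip(xs, ys)]

def predict(mean, residual_model, x_new):
    """Eq. (20): parametric trend plus the predicted residual."""
    return mean(x_new) + residual_model(x_new)

# Toy data (hypothetical): scalar feature versus wear.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [10.5, 19.0, 31.0, 39.5]
m = fit_linear_mean(xs, ys)
res = residuals(m, xs, ys)
# Stand-in residual model returning zero, which is what a zero-mean GP
# tends toward far from its training inputs.
y_hat = predict(m, lambda x: 0.0, 5.0)
```

With the zero stand-in, the prediction equals the parametric trend alone. This shows why the quality of the mean function matters most where the GP has no nearby training points.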

Pseudo-sample generation

Since GPR is initialized on data from the early stage of tool life, as milling progresses, both tool wear and the associated signal features progressively move beyond the range of the training set, leaving GPR with limited information for accurate extrapolation. While subsequent measurements can be incorporated to update the model, such data are inherently sparse because obtaining wear values requires stopping the machine, interrupting production and incurring additional cost. Relying on single-point updates provides only limited constraint for the evolving wear trajectory, which can lead to prediction drift, horizon instability, or error accumulation.

This challenge motivates the GPR-PPS approach, which addresses the extrapolation limitation by reusing past predictions to enrich the training data at each update step. A visual example of our proposed framework is shown in Fig. 3, where the dashed orange line represents GPR predictions across a milling interval, the gray points show actual wear values, and the black training points and blue/red anchor points mark real measurements available for training. Although the predicted wear trajectories may contain fluctuations or shifts, they still capture valuable information about the current tool dynamics. Therefore, rather than discarding those predictions and the corresponding signal features, they are repurposed to generate pseudo-sample points, enhancing the dataset near the tool’s current state.

The process begins by fitting the parameters of Eq. (17) to the predicted trajectory of the GPR method using least-squares minimization. This physics-based flank-wear function is chosen because, unlike other parametric functions (e.g., polynomials or splines), it captures flank wear evolution over the tool’s full life cycle within a single continuous, interpretable curve. This ensures the fitted curve (solid orange line) remains smooth, monotonic, and physically realistic while suppressing implausible oscillations or artifacts.

After obtaining the best-fit parameters, the resulting curve is realigned to anchor both the last observed training sample and the update point (blue and red points), which are the real measurements at either end of the prediction interval. Alignment is performed by computing the offsets between the measured values and the fitted physics curve (teal line) at these two endpoints, then applying a linear correction (affine transformation) so that the final aligned curve passes exactly through both real points. Although this realignment adjusts both position and slope to match the endpoint measurements, and therefore modifies intermediate wear rates, it prevents biased predictions from propagating as training labels across intervals. This aligned curve is then evaluated at each cut $\:t$ in the prediction interval to generate pseudo-wear values $\:{y}_{t}^{\text{pseudo}}$, which are paired with the corresponding signal feature vectors $\:{x}_{t}$ to form enriched training pairs (Eq. (18)). This dual use of past predictions and sparse new measurements ensures that updates remain physically grounded while enriching the data available for GPR retraining in the next cycle.
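The realignment step can be sketched as follows. This is one plausible reading of the "linear correction": the endpoint offsets are interpolated linearly in the cut index and added to the fitted curve. The function `realign` and the toy `fitted` curve are hypothetical; `fitted` is only a stand-in for the fitted Eq. (17), whose actual form is not reproduced here.

```python
def realign(curve, t0, y0, t1, y1):
    """Return curve(t) plus a correction linear in t, chosen so that the
    result passes exactly through the anchors (t0, y0) and (t1, y1)."""
    d0 = y0 - curve(t0)   # offset at the last observed training sample
    d1 = y1 - curve(t1)   # offset at the update point
    def aligned(t):
        w = (t - t0) / (t1 - t0)          # 0 at t0, 1 at t1
        return curve(t) + (1.0 - w) * d0 + w * d1
    return aligned

# Hypothetical smooth, monotonic curve standing in for the fitted Eq. (17).
fitted = lambda t: 40.0 + 0.5 * t + 1e-5 * t ** 3

# Anchors: measured wear at cut 50 and cut 100 (illustrative values only).
aligned = realign(fitted, 50, 70.0, 100, 98.0)

# Pseudo-wear labels at the intermediate cuts of the interval.
pseudo = {t: aligned(t) for t in range(51, 100)}
```

Because the correction is linear in the cut index, it shifts and tilts the fitted curve without changing its curvature, so the shape supplied by the physics-based function is retained between the two anchors.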

Through this iterative process of GPR prediction, parametric function fitting, and pseudo-sample updating, our method:

Mitigates GPR’s extrapolation weaknesses over sparse horizons.
Reduces reliance on frequent measurements, minimizing costly process interruptions.
Maintains forecasts consistent with the physical dynamics of wear progression.

This framework enables reliable tool wear prediction even under sparse or intermittent measurement conditions.

Results

Experiment setup

To evaluate the proposed GPR-PPS model, experiments were conducted using data from the PHM2010 high-speed milling dataset⁵⁴, which comprises trials using three-flute 6 mm ball-nose tungsten-carbide cutters. The machining parameters were fixed at a spindle speed of 10,400 rpm and a feed rate of 1555 mm/min, with the radial and axial depths of cut set to 0.125 mm and 0.2 mm, respectively. During each cut, seven sensor signals were acquired at 50 kHz, consisting of three force components $\:({F}_{x},{F}_{y},\:{F}_{z})$, three vibration responses $\:({V}_{x},{V}_{y},\:{V}_{z})$, and one acoustic-emission root mean square (AE-RMS) channel. Each signal consists of 200,000 samples per cut. After every cutting pass, flank wear on each flute was measured offline to provide reference values for tool wear progression. Three cutter records (C1, C4, C6) were selected for this study, each comprising 315 cuts. The experimental setup and monitoring process are shown in Fig. 4.

Feature extraction and selection

From the sensor signals of the dataset, time-domain features and frequency-domain features were extracted following the methodology of Lei et al.⁵⁵, where the mathematical formulations are documented. For the time-frequency domain, eight level-3 wavelet packet energy features were extracted in natural order from low to high frequency using the db5 wavelet. In total, 11 time-domain, 13 frequency-domain, and 8 time-frequency-domain features were selected, yielding 224 features per cutter (32 feature types × 7 signals). Details of the features are shown in Table 1.

Table 1 Detail of the features extracted in time, frequency, and time-frequency domain.

Preliminary inspection via scatter plots and correlation analysis revealed that many features exhibited an approximately linear relationship with measured flank wear. This observation motivated the use of a linear mean function in GPR, reflecting prior knowledge that wear progression can be explained, to first order, by linear trends in the extracted signals. The GP component models deviations from this trend, capturing residual nonlinearities and uncertainties.

Feature selection was performed using the Pearson correlation coefficient (PCC) as the relevance metric. For each cutter (C1, C4, C6), the absolute PCC was computed for every feature-signal combination. Figure 5 shows the PCC scores between extracted features and the wear values of cutter C1 as the representative case. The AE-RMS features exhibited systematically weaker correlations (max |PCC| ≤ 0.56) than the force and vibration features and were therefore excluded. The |PCC| values of the remaining 192 feature-channel combinations (32 feature types × 6 force/vibration channels) were averaged across channels to identify feature types that perform consistently across measurement channels, as shown in the bottom row of Fig. 5. Three features ($\:{F}_{1}$, $\:{F}_{6}$, and $\:{F}_{13}$) consistently ranked highest, achieving average |PCC| values above 0.95 for this cutter. Consistent patterns were observed for cutters C4 and C6. These three frequency-domain feature types were selected as final GPR inputs, yielding 18 individual features (3 types × 6 channels).
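The ranking procedure described above can be sketched in a few lines. The data layout (a dictionary of feature types, each holding one series per channel) and the names `pearson` and `rank_feature_types` are assumptions made for illustration. The toy values are not taken from the PHM2010 dataset.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_feature_types(features, wear, top_k=3):
    """features: {feature_type: {channel: [value per cut]}} (assumed layout).

    Averages |PCC| over channels for each feature type and returns the
    top_k feature types together with all averaged scores.
    """
    scores = {}
    for ftype, channels in features.items():
        vals = [abs(pearson(series, wear)) for series in channels.values()]
        scores[ftype] = sum(vals) / len(vals)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores

# Toy example with two feature types and two channels.
wear = [1, 2, 3, 4, 5]
features = {
    "A": {"Fx": [2, 4, 6, 8, 10], "Fy": [5, 4, 3, 2, 1]},
    "B": {"Fx": [1, 3, 2, 5, 4], "Fy": [2, 1, 4, 3, 5]},
}
top, scores = rank_feature_types(features, wear, top_k=1)
```

Taking the absolute value before averaging means a feature that decreases with wear on one channel and increases on another still counts as consistently informative.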

Implementation of the GPR-PPS model

GPR model setting

As described in the Feature selection section, since the extracted features exhibited approximately linear relationships with flank wear, a linear mean function was applied individually for each feature:

$$m_{i}\left(x_{i,t}\right)=a_{i}x_{i,t}+b_{i}$$

(21)

where $\:{x}_{i,t}$ is the $\:i$-th selected feature at cut $\:t$, and $\:{a}_{i}$ and $\:{b}_{i}$ are regression coefficients fitted separately for each of the $\:n$ selected features. The final mean function was computed as the average of these individual mean functions:

$$m\left(x_{t}\right)=\frac{1}{n}\sum_{i=1}^{n}m_{i}\left(x_{i,t}\right)$$

(22)

This method aggregates individual linear trends into a single mean, which serves as the baseline for the GPR model. The GP component, using an RBF kernel with white noise and hyperparameters optimized via maximum marginal likelihood, then models the residual nonlinearities and uncertainties, enabling flexible adaptation to local variations and smooth extrapolation.
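Equations (21) and (22) amount to fitting one straight line per feature and averaging the resulting predictions. A minimal sketch, assuming ordinary least squares for each line (the fitting method is not stated explicitly in the text) and using the hypothetical helper names `fit_line` and `fit_ensemble_mean`:

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a * x + b for one feature."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return a, my - a * mx

def fit_ensemble_mean(X, y):
    """X: list of feature vectors (one per cut); y: wear values.

    Returns m(x), the average of the per-feature linear fits (Eq. (22)).
    """
    cols = list(zip(*X))                       # one column per feature
    coefs = [fit_line(col, y) for col in cols]
    def m(x):
        return sum(a * xi + b for (a, b), xi in zip(coefs, x)) / len(coefs)
    return m

# Toy data: wear is exactly linear in each of the two features.
X = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
y = [1.0, 3.0, 5.0]
m = fit_ensemble_mean(X, y)
```

In this toy case the per-feature fits are y = 2x + 1 and y = x + 1, so `m([3.0, 6.0])` averages two predictions of 7.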

Online monitoring on PHM2010 dataset

The framework in Sect. 3.1 is implemented as an online simulation on the PHM2010 milling dataset, where each tool undergoes 315 cuts with multi-sensor features available at every cut, but flank wear is measured only at sparse inspection points. In each prediction run, two parameters are fixed: the initial training length $\:{n}_{\text{init}}$ and the prediction interval $\:{\Delta\:}n$. The process proceeds as follows:

An initial segment of $\:{n}_{\text{init}}$ cuts with their corresponding features and wear values is used to fit the ensemble mean and train the GPR on residuals.
The model ingests features from all cuts in the next interval of length $\:{\Delta\:}n$ to predict wear and uncertainty at each cut until the scheduled measurement.
At the interval endpoint, the revealed wear value anchors pseudo-sample generation: the physics-based function (Eq. (17)) is fitted to predictions, realigned through the two most recent measurements, and evaluated at intermediate cuts to produce pseudo-wear labels paired with their corresponding features.
The augmented training set (Eq. (18)) combining real measurements and pseudo-samples is used to retrain the GPR.
This predict-update-augment cycle repeats until the end of the 315 cuts, yielding a continuous sequence of wear predictions and uncertainty estimates.
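The timing of the cycle above can be sketched as a simple schedule of training and prediction ranges. The function name `update_schedule` is hypothetical, and truncating the final interval at cut 315 is an assumption about how the last, shorter interval is handled.

```python
def update_schedule(n_init, delta_n, total_cuts=315):
    """Return (train_end, pred_start, pred_end) for each prediction interval.

    After each interval the measurement at pred_end is revealed, so the
    next training range extends to that cut.
    """
    steps = []
    train_end = n_init
    while train_end < total_cuts:
        # Assumption: the last interval is cut short at the end of tool life.
        pred_end = min(train_end + delta_n, total_cuts)
        steps.append((train_end, train_end + 1, pred_end))
        train_end = pred_end
    return steps

steps = update_schedule(n_init=50, delta_n=50)
for train_end, pred_start, pred_end in steps:
    print(f"train on cuts 1-{train_end}, predict cuts {pred_start}-{pred_end}")
```

With an initial length of 50 and an interval of 50, this gives six intervals, with training ranges growing as 1–50, 1–100, 1–150, and so on.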

To evaluate performance comprehensively, this protocol is applied across three cutters (C1, C4, C6), seven $\:{n}_{\text{init}}$ values (30, 40, 50, 75, 100, 125, 150 cuts), and eight $\:{\Delta\:}n$ values (30, 40, 50, 60, 75, 80, 90, 100 cuts), yielding 3 × 7 × 8 = 168 evaluation cases. All comparison models presented later are evaluated on the same 168 cases to ensure a fair comparison. Experiments were performed on a personal computer with an Intel Core i5-12400F processor running at 2.50 GHz and 32 GB of RAM.

Error metrics and uncertainty quantification

The predictive performance of the GPR-PPS model and other compared methods was evaluated using three error metrics and uncertainty quantification. Here, $\:{y}_{t}$ denotes the true wear value at cut $\:t$, $\:{\widehat{y}}_{t}$ denotes the predicted value, and $\:{\widehat{\sigma\:}}_{t}$ the predictive standard deviation (from the posterior variance $\:{\sigma\:}_{*}^{2}$ in Eq. (10)):

Mean absolute error (MAE): represents the average magnitude of prediction errors, without considering their direction. A smaller MAE value indicates more accurate predictions. It is calculated as:

$$\text{MAE}=\frac{1}{n}\sum_{t=1}^{n}\left|y_{t}-\widehat{y}_{t}\right|$$

(23)

Mean absolute percentage error (MAPE): measures the average relative deviation of predictions from the true values, expressed as a percentage. A smaller MAPE means better predictive accuracy.

$$\text{MAPE}=\frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{y_{t}-\widehat{y}_{t}}{y_{t}}\right|$$

(24)

Root mean square error (RMSE): reflects the standard deviation of the residuals, quantifying how much the predictions deviate from the true values. A smaller RMSE indicates that predictions are more tightly clustered around the ground truth. It is given by:

$$\text{RMSE}=\sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_{t}-\widehat{y}_{t}\right)^{2}}$$

(25)

Uncertainty Quantification (95% confidence interval (CI)): derived from GPR’s predictive posterior (Eq. (8) at test input $\:{x}_{\text{*}}$), providing probabilistic bounds on wear estimates to assess reliability under sparse data:

$$\:\left[{\widehat{y}}_{t}-1.96{\widehat{\sigma\:}}_{t},{\widehat{y}}_{t}+1.96{\widehat{\sigma\:}}_{t}\right]$$

(26)
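The metrics in Eqs. (23)–(26) can be computed with a few short functions. This is a sketch with hypothetical function names. MAPE is returned in percent, matching its description, and `coverage_and_width` is an added helper that reports the share of true values falling inside the 95% interval and the mean interval width.

```python
import math

def mae(y, yhat):
    """Mean absolute error, Eq. (23)."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def mape(y, yhat):
    """Mean absolute percentage error, Eq. (24), in percent."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, yhat)) / len(y)

def rmse(y, yhat):
    """Root mean square error, Eq. (25)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def ci95(yhat, sigma):
    """95% interval bounds from the predictive standard deviation, Eq. (26)."""
    return [(m - 1.96 * s, m + 1.96 * s) for m, s in zip(yhat, sigma)]

def coverage_and_width(y, yhat, sigma):
    """Percentage of true values inside the 95% interval and mean width."""
    bounds = ci95(yhat, sigma)
    inside = sum(lo <= a <= hi for a, (lo, hi) in zip(y, bounds))
    width = sum(hi - lo for lo, hi in bounds) / len(bounds)
    return 100.0 * inside / len(y), width

# Illustrative values only (not from the dataset).
y_true = [100.0, 200.0]
y_pred = [110.0, 190.0]
sigma = [10.0, 2.0]
cov, width = coverage_and_width(y_true, y_pred, sigma)
```

In this example the first interval (90.4 to 129.6) contains the true value while the second (186.08 to 193.92) does not, which shows how a narrow interval around a biased prediction lowers coverage.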

Result and comparison

Results of the proposed model

The model is first evaluated on a representative configuration using an initial training set of 50 cuts and prediction intervals of 50 cuts. Table 2 summarizes the model prediction performance on the above configuration. The top rows outline the training and prediction ranges per iteration: the initial set (cut 1–50), followed by subsequent ranges (1–100, 1–150, etc.) incorporating revealed measurements and pseudo-samples. Subsequent rows report MAE, MAPE, and RMSE for cutters C1, C4, and C6.

Table 2 Prediction result using 50-point initial training range and 50-point prediction range.

Figure 6 illustrates tool wear prediction, where the black line depicts the actual tool wear values, the red line represents the GPR prediction, the shaded region shows the 95% confidence intervals derived from the GPR posterior variance (Eq. (26)), and the blue points show the real measured points that were used as the training set. After each update, the red curve aligns closely with the black curve, reflecting adaptive corrections from new data and pseudo-samples, while the green line (absolute error) generally decreases following each update as new data and pseudo-samples are integrated.

For cutter C1, the model maintains stable performance with a moderate increase in errors during the later stages, illustrating how GPR-PPS effectively adapts to accelerated wear phases through pseudo-sample augmentation. Cutter C4 exhibits higher variability, reflecting irregular wear patterns that challenge extrapolation. Cutter C6 shows consistent performance, accurately capturing its smooth wear trend through residual modeling. The prediction errors fluctuate across intervals, with some later intervals exhibiting larger deviations, highlighting how tool wear progression affects model accuracy over time. Occasional non-monotonic behavior is observed due to noise or temporary variations in signal features, which can affect both the mean and kernel components of the GPR. Overall, the average metrics (MAE: 3.5259–5.4969 $\:{\upmu\:}\text{m}$, MAPE: 3.05%–4.95%, RMSE: 4.3474–7.8563 $\:{\upmu\:}\text{m}$) confirm GPR-PPS’s ability to handle consistent trends despite variability, driven by incremental updates and pseudo-samples that reduce confidence intervals in later stages.

Ablation study and model robustness

To isolate the contribution of pseudo-samples, we conduct an ablation study by comparing GPR-PPS with GPR-1P, a variant identical in architecture (signal-informed mean function, kernel structure, hyperparameter optimization) and initialization but trained only with real wear measurements at each update, excluding pseudo-sample augmentation. Figure 7 illustrates the training data composition for both models. To evaluate robustness under varying data availability, both models are evaluated across the 168 test cases described earlier, with results averaged by initial training interval (across all cutters and prediction intervals, 24 cases each) and by prediction interval (across all cutters and training intervals, 21 cases each).
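The case counts quoted above follow directly from the evaluation grid, as the short sketch below illustrates. The variable names are illustrative.

```python
from itertools import product

cutters = ["C1", "C4", "C6"]
n_init_values = [30, 40, 50, 75, 100, 125, 150]
delta_n_values = [30, 40, 50, 60, 75, 80, 90, 100]

# Every (cutter, initial training length, prediction interval) combination.
cases = list(product(cutters, n_init_values, delta_n_values))

# Number of cases averaged together for one initial training length
# and for one prediction interval.
per_n_init = sum(1 for _, n_init, _ in cases if n_init == 30)
per_delta_n = sum(1 for _, _, delta_n in cases if delta_n == 30)

print(len(cases), per_n_init, per_delta_n)
```

Each averaged bar therefore pools 3 cutters × 8 prediction intervals when grouped by initial training length, and 3 cutters × 7 initial lengths when grouped by prediction interval.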

Figure 8a (averaged by training interval) shows the proposed GPR-PPS method (red bars) maintaining stable MAE values between 5.22 $\:{\upmu\:}\text{m}$ and 6.01 $\:{\upmu\:}\text{m}$ across initial training lengths. In contrast, GPR-1P (blue bars) starts at 8.19 $\:{\upmu\:}\text{m}$ and only reaches 6.15 $\:{\upmu\:}\text{m}$ at 150 points (47.6% of the tool’s life). This suggests that GPR-1P requires a larger initial dataset, up to 150 points, to achieve performance comparable to GPR-PPS, whereas GPR-PPS maintains consistent accuracy even with only 30 points (9.5% of the tool’s life), highlighting its efficiency with sparse data. Figure 8b (averaged by prediction interval) reveals that GPR-1P exhibits deteriorating performance with wider prediction intervals, whereas GPR-PPS consistently achieves lower MAE values, ranging from 4.30 $\:{\upmu\:}\text{m}$ to 6.68 $\:{\upmu\:}\text{m}$. This gradual increase in MAE with interval width suggests that pseudo-samples enable the model to capture underlying trends, enhancing long-range prediction robustness.

The results affirm that incorporating pseudo-samples into GPR-PPS addresses inherent limitations of standard GPR. This enhancement provides robustness in scenarios with sparse training data and versatility for extended predictions, making GPR-PPS a superior approach for practical applications.

Pseudo-sample efficiency

To evaluate the reliability of the pseudo-sample points, this analysis focuses on their proximity to real wear measurements, their predictive agreement with a model trained solely on real data, and their dynamic stability over time. The evaluation is conducted using results from the 168 evaluation cases described earlier.

A boxplot analysis of per-case MAE, RMSE, and MAPE (Fig. 9) quantifies the differences between individual pseudo-sample points and their real counterparts across all experiments. The results show median errors of 1.216 $\:{\upmu\:}\text{m}$ (MAE), 1.677 $\:{\upmu\:}\text{m}$ (RMSE), and 1.085% (MAPE), with interquartile ranges tightly constrained between Q1 = 0.581–0.879 $\:{\upmu\:}\text{m}$ and Q3 = 1.925–2.736 $\:{\upmu\:}\text{m}$. This indicates strong alignment between pseudo-samples and real wear data, confirming that pseudo-samples serve as effective, physically consistent substitutes.

To further validate the predictive reliability, GPR-PPS is compared against GPR-R, a variant updated exclusively with real wear values at each iteration. Unlike the proposed method, which employs a hybrid training set of real and pseudo-data, GPR-R relies solely on real data, as depicted in Fig. 10. This comparison extends the earlier assessments by focusing on the prediction results.

Bland-Altman analysis (Fig. 11a) is a statistical tool to compare the agreement between two measurement methods. Here, it evaluates the case-wise agreement between GPR-PPS and GPR-R using the same cutter, initial training set, and prediction window, across the 168 test cases. In this plot, the MAE for each model, calculated as the difference between its predictions and the true wear values, is compared, with each pair plotted as a point. The x-axis shows the average of the two MAEs, while the y-axis displays their difference. The central red line indicates a mean bias of −0.38 $\:{\upmu\:}\text{m}$, with limits of agreement ranging from −1.62 $\:{\upmu\:}\text{m}$ to 0.87 $\:{\upmu\:}\text{m}$. The tight clustering of points around zero, without any noticeable trend, suggests that both models deviate from the true values in a similar manner. This similarity indicates that the Gaussian process regression in GPR-PPS perceives the hybrid training set (including pseudo-points) as not significantly different from a set with only real data, leading to consistent agreement in their predictions.
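The Bland-Altman quantities can be computed as in the sketch below. It uses the conventional definition of the limits of agreement (mean difference ± 1.96 sample standard deviations), which is assumed here rather than stated in the text. The sign convention (first model minus second) and the function name `bland_altman` are also assumptions, and the MAE values are illustrative only.

```python
import statistics

def bland_altman(a, b):
    """Per-case means and differences, mean bias, and limits of agreement.

    a, b: paired per-case scores (e.g. the MAE of two models on each case).
    """
    diffs = [x - y for x, y in zip(a, b)]        # plotted on the y-axis
    means = [(x + y) / 2 for x, y in zip(a, b)]  # plotted on the x-axis
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)                 # sample standard deviation
    return means, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Illustrative per-case MAE values for two models.
mae_model_a = [5.0, 6.0, 4.0, 7.0]
mae_model_b = [5.5, 6.0, 5.0, 7.0]
means, diffs, bias, (lower, upper) = bland_altman(mae_model_a, mae_model_b)
```

Under this definition the limits are symmetric about the mean bias, so the bias always lies midway between the lower and upper limits.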

The distribution of point-wise prediction differences between GPR-PPS and GPR-R, evaluated at identical timestamps and configurations (Fig. 11b), examines whether their predictions align on a point-by-point basis. These distributions exhibit narrow, symmetric shapes centered at zero, with tails tapering within ±5–10 $\:{\upmu\:}\text{m}$. The absence of skewness or multimodality confirms that the point-wise predictions of GPR-PPS mirror those of GPR-R, highlighting consistent time-series behavior.

The analyses from the boxplot (Fig. 9), Bland-Altman (Fig. 11a), and distribution (Fig. 11b) plots collectively demonstrate that pseudo-sample points in GPR-PPS are highly reliable. The boxplot shows their close alignment with real wear data, the Bland-Altman analysis confirms case-wise predictive similarity to models using real data, and the distribution plot validates pointwise consistency. Together, these findings establish that pseudo-points can be trusted as effective substitutes for real points in data-scarce scenarios, reinforcing the robustness of the GPR-PPS method.

Comparative performance

To assess GPR-PPS’s performance against standard machine learning approaches, the model is compared with a baseline GPR (standard GPR with neither signal-informed prior mean nor pseudo-samples), GPR-1P (as introduced in the earlier section), support vector regression (SVR), and neural networks (NN). For the baseline GPR, a composite kernel combining dot product, RBF, and noise terms is employed. The SVR model is optimized through a grid search over RBF and linear kernels, with hyperparameters C, $\epsilon$, and $\:{\upgamma\:}$ selected via time-series-aware cross-validation. For NN, a multi-layer perceptron with three hidden layers (50–50–25 neurons) is used, adopting ReLU activations, the Adam optimizer, and early stopping to enhance generalization. Prediction performance is evaluated using average results from the 168 evaluation cases (consistent with previous sections), spanning varying cutters, initial training intervals, and prediction intervals with an update point applied at the end of each prediction interval.

Table 3 Performance comparison with other methods.

Table 3 presents prediction accuracy (MAE, RMSE, MAPE), uncertainty quantification (coverage probability and average width for 95% confidence intervals), and computational cost (training and inference times per session). In terms of accuracy, GPR-PPS outperforms all models, with pseudo-sample points enhancing extrapolation by augmenting the training data. The GPR-1P variant, which benefits from a signal-informed prior mean, improves upon the baseline GPR but falls short of GPR-PPS. In contrast, SVR and NN exhibit higher and more variable MAE values, reflecting their limitations in extrapolation without physical constraints. The trends in MAPE and RMSE align with MAE, indicating consistent error patterns across metrics. GPR-PPS demonstrates consistent performance advantage across diverse initial training intervals and prediction ranges, likely due to the synergy between pseudo-sample points, which address data scarcity, and the signal-informed linear mean, which provides a robust prior. Conversely, baseline GPR, SVR, and NN, optimized primarily for interpolation, exhibit reduced accuracy in out-of-sample predictions.

Beyond accuracy, we also evaluated the uncertainty quantification of GPR-based models by assessing coverage probability (the percentage of true wear values falling within the 95% CI in Eq. (26)) and average interval width (the mean width of the 95% CI). Baseline GPR achieves 88.94% coverage but with excessively wide intervals (51.70 $\:{\upmu\:}\text{m}$). Introducing the signal-informed mean function (GPR-1P) substantially improves prediction accuracy but reduces coverage to 69.09%. GPR-PPS partially recovers calibration (79.41% coverage, 16.69 $\:{\upmu\:}\text{m}$ width), indicating that pseudo-sample augmentation not only improves point predictions but also helps maintain uncertainty quantification relative to the single-update approach. While coverage remains below the nominal 95% level across all signal-informed variants, GPR-PPS demonstrates the best balance between prediction accuracy and interval informativeness.

Regarding computational complexity, GPR-PPS continuously accumulates real and pseudo-samples, expanding covariance matrix storage from 900 to 96,100 parameters. Despite this growth, average training time is 3.00 s with 11.02 s maximum, comparable to baseline GPR’s 7.02 s maximum. This confirms computational feasibility under continuous pseudo-sample accumulation. GPR-1P trains faster (1.16 s average) without pseudo-samples, while SVR and NN are faster still (0.12 s and 0.045 s) but sacrifice accuracy. Fast inference times across methods suggest potential for real-time applications. The results demonstrate that integrating pseudo-sample augmentation with signal-informed priors improves accuracy and uncertainty quantification while maintaining practical computational feasibility.
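As a rough check on the storage figures, and assuming that "parameters" refers to the entries of the $n\times n$ training covariance matrix (an interpretation, not stated explicitly in the text), the two quoted values correspond to training sets of 30 and 310 points:

$$n^{2}=900\;\Rightarrow\;n=30,\qquad n^{2}=96{,}100\;\Rightarrow\;n=310$$

The lower figure matches the smallest initial training length of 30 cuts, and the upper figure is consistent with a training set that has grown to nearly the full 315-cut tool life.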

Conclusion

This study introduces a novel Gaussian process regression with physics-guided pseudo-samples, integrating a data-physical hybrid approach to predict tool wear values under data-limited scenarios.

1. The model enhances forecasting by incorporating a signal-informed GPR to provide wear-related predictive guidance, paired with a physics-based pseudo-sample generation strategy that enriches the training dataset with synthetic data, thereby improving degradation forecast accuracy with restricted data.
2. This approach also overcomes a key tool wear prediction challenge by utilizing as little as 9.5% of the tool’s life data and sparse update points to forecast the full tool’s life cycle, demonstrating robustness across varying data sparsity conditions.
3. Pseudo-samples closely approximate real samples, with median MAE of 1.216 $\:{\upmu\:}\text{m}$, RMSE of 1.677 $\:{\upmu\:}\text{m}$, and MAPE of 1.085%, and their prediction accuracy and dynamics remain nearly identical to models trained solely on real data.

The effectiveness of the proposed model was verified through comparisons with other machine learning models, demonstrating superior predictive accuracy as evidenced by the reported performance metrics. In contrast to prior physics-informed GPR approaches that incorporate physical knowledge primarily as constraining priors within mean or kernel functions, GPR-PPS repurposes physics as a data-augmentation mechanism. By actively generating synthetic training points between sparse measurements, this strategy offers an alternative approach to the extrapolation challenges inherent in data-scarce forecasting regimes. While other methods required dense measurements or frequent updates, this data-augmentation approach reduces measurement dependency while maintaining computational efficiency.

Several limitations of the current study should be acknowledged and will inform future research. First, the pseudo-sample generation relies on a three-stage physical wear model (GTWM), which may not fully capture abrupt tool failures or highly irregular wear progression. Second, the validation used only the PHM2010 dataset with fixed cutting parameters, and feasibility under varying machining conditions remains unverified. Third, while GPR-PPS improved prediction accuracy, its uncertainty quantification indicates some over-confidence, likely due to the tight coupling between the physics prior and the GP likelihood. This suggests the need for adaptive prior weighting or conformal calibration in future implementations. Future work will address these limitations to refine the proposed model’s architecture and to generalize it to other operating conditions.

Data availability

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.

References

Zhang, X., Shi, B., Feng, B., Liu, L. & Gao, Z. A hybrid method for cutting tool RUL prediction based on CNN and multistage Wiener process using small sample data. Measurement 213, 112739 (2023).
Wiklund, H. Bayesian and regression approaches to on-line prediction of residual tool life. Qual. Reliab. Eng. Int. 14, 303–309 (1998).
Hanachi, H., Yu, W., Kim, I. Y., Liu, J. & Mechefske, C. K. Hybrid data-driven physics-based model fusion framework for tool wear prediction. Int. J. Adv. Manuf. Technol. 101, 2861–2872 (2019).
Yao, Z., Luo, M., Mei, J. & Zhang, D. Position dependent vibration evaluation in milling of thin-walled part based on single-point monitoring. Measurement 171, 108810 (2021).
Li, X. et al. Chatter-free milling of aerospace thin-walled parts. J. Mater. Process. Technol. 341, 118903 (2025).
Pashmforoush, F., Ebrahimi Araghizad, A. & Budak, E. Tool wear prediction in milling process using physics-informed machine learning and thermo-mechanical force model with monitoring applications. J. Manuf. Syst. 82, 1192–1212 (2025).
Li, X., Liu, X., Yue, C., Liang, S. Y. & Wang, L. Systematic review on tool breakage monitoring techniques in machining operations. Int. J. Mach. Tools Manuf. 176, 103882 (2022).
Krilek, J. et al. Assessment of the chipping process of beech (Fagus sylvatica L.) wood: knives wear, chemical and microscopic analysis of wood. Wood Mater. Sci. Eng. 19, 473–484 (2024).
Peng, R., Pang, H., Jiang, H. & Hu, Y. Study of tool wear monitoring using machine vision. Autom. Control Comput. Sci. 54, 259–270 (2020).
Yuan, J., Liu, L., Yang, Z., Bo, J. & Zhang, Y. Tool wear condition monitoring by combining spindle motor current signal analysis and machined surface image processing. Int. J. Adv. Manuf. Technol. 116, 2697–2709 (2021).
Betts, J., Sadrafshari, S., Mohammadi, A. & Shokrani, A. On machine 3D reconstruction of endmill tool wear. Procedia CIRP 133, 340–345 (2025).
Agebo, S. W., Zieliński, D. & Deja, M. Comparison of different optical measurement methods in the evaluation of the wear of SLS-fabricated tool used for free abrasive machining. Int. J. Adv. Manuf. Technol. 138, 5165–5182 (2025).
Storchak, M., Zakiev, I., Zakiev, V. & Manokhin, A. Coatings strength evaluation of cutting inserts using advanced multi-pass scratch method. Measurement 191, 110745 (2022).
Cheng, Y. et al. Tool wear intelligent monitoring techniques in cutting: a review. J. Mech. Sci. Technol. 37, 289–303 (2023).
Wong, S. Y., Chuah, J. H. & Yap, H. J. Technical data-driven tool condition monitoring challenges for CNC milling: a review. Int. J. Adv. Manuf. Technol. 107, 4837–4857 (2020).
Peng, D. & Li, H. Intelligent monitoring of milling tool wear based on milling force coefficients by prediction of instantaneous milling forces. Mech. Syst. Signal. Process. 208, 111033 (2024).
Zhang, P. et al. Systematic review of cutting force measuring systems in machining: Principles, design, filtering techniques and applications. Int. J. Mach. Tools Manuf. 210, 104308 (2025).
Zhou, C., Guo, K. & Sun, J. An integrated wireless vibration sensing tool holder for milling tool condition monitoring with singularity analysis. Measurement 174, 109038 (2021).
Maliuk, A. S., Ahmad, Z., Kim, J. & Kim, J. M. A technique for flank wear monitoring in milling machines based on the novel tool wear indicator. Int. J. Comput. Integr. Manuf. 38, 1295–1311 (2025).
Liu, M. K., Tseng, Y. H. & Tran, M. Q. Tool wear monitoring and prediction based on sound signal. Int. J. Adv. Manuf. Technol. 103, 3361–3373 (2019).
Ahmad, Z., Ullah, S., Maliuk, A. S. & Kim, J. M. Milling machine fault detection and identification based on a novel vitality index and temporal-residual network. Appl. Acoust. 239, 110861 (2025).
Souflas, T. et al. Tool condition monitoring in milling using cutting force and temperature data from an instrumented milling head. Int. J. Adv. Manuf. Technol. 139, 6049–6072 (2025).
Yang, S., Zhu, G., Xu, J. & Fu, Y. Tool wear prediction of machining hydrogenated titanium alloy Ti6Al4V with uncoated carbide tools. Int. J. Adv. Manuf. Technol. 68, 673–682 (2013).
Laakso, S. V. A. & Johansson, D. There is logic in logit – including wear rate in Colding’s tool wear model. Procedia Manuf. 38, 1066–1073 (2019).
Zhang, Y., Zhu, K., Duan, X. & Li, S. Tool wear estimation and life prognostics in milling: model extension and generalization. Mech. Syst. Signal. Process. 155, 107617 (2021).
John, R., Lin, R., Jayaraman, K. & Bhattacharyya, D. Modified Taylor’s equation including the effects of fiber characteristics on tool wear when machining natural fiber composites. Wear 468–469, 203606 (2021).
Ding, H., Shen, N. & Shin, Y. C. Thermal and mechanical modeling analysis of laser-assisted micro-milling of difficult-to-machine alloys. J. Mater. Process. Technol. 212, 601–613 (2012).
Das, R., Joshi, S. S. & Barshilia, H. C. Analytical model of progression of flank wear land width in drilling. J. Tribol. 141, 011601 (2019).
Yen, Y. C., Söhner, J., Lilly, B. & Altan, T. Estimation of tool wear in orthogonal cutting using the finite element analysis. J. Mater. Process. Technol. 146, 82–91 (2004).
Gomes, M. C., Brito, L. C., Bacci da Silva, M. & Viana Duarte, M. A. Tool wear monitoring in micromilling using support vector machine with vibration and sound sensors. Precis. Eng. 67, 137–151 (2021).
Cheng, Y. et al. A new method based on a WOA-optimized support vector machine to predict the tool wear. Int. J. Adv. Manuf. Technol. 121, 6439–6452 (2022).
Liao, X., Zhou, G., Zhang, Z., Lu, J. & Ma, J. Tool wear state recognition based on GWO–SVM with feature selection of genetic algorithm. Int. J. Adv. Manuf. Technol. 104, 1051–1063 (2019).
Xu, X., Wang, J., Zhong, B., Ming, W. & Chen, M. Deep learning-based tool wear prediction and its application for machining process using multi-scale feature fusion and channel attention mechanism. Measurement 177, 109254 (2021).
Bazi, R., Benkedjouh, T., Habbouche, H., Rechak, S. & Zerhouni, N. A hybrid CNN-BiLSTM approach-based variational mode decomposition for tool wear monitoring. Int. J. Adv. Manuf. Technol. 119, 3803–3817 (2022).
Marani, M., Zeinali, M., Songmene, V. & Mechefske, C. K. Tool wear prediction in high-speed turning of a steel alloy using long short-term memory modelling. Measurement 177, 109329 (2021).
Chan, Y. W. et al. Tool wear prediction using convolutional bidirectional LSTM networks. J. Supercomput. 78, 810–832 (2022).
Li, W., Fu, H., Han, Z., Zhang, X. & Jin, H. Intelligent tool wear prediction based on informer encoder and stacked bidirectional gated recurrent unit. Robot. Comput.-Integr. Manuf. 77, 102368 (2022).
Ma, J. et al. Tool wear mechanism and prediction in milling TC18 titanium alloy using deep learning. Measurement 173, 108554 (2021).
Cheng, Y. et al. Research on tool wear prediction based on the random forest optimized by NGO algorithm. Mach. Sci. Technol. 28, 523–546 (2024).
Wu, D., Jennings, C., Terpenny, J., Gao, R. X. & Kumara, S. A comparative study on machine learning algorithms for smart manufacturing: tool wear prediction using random forests. J. Manuf. Sci. Eng. 139, 071018 (2017).
Yao, Z., Wu, M., Qian, J. & Reynaerts, D. Intelligent discharge state detection in micro-EDM process with cost-effective radio frequency (RF) radiation: integrating machine learning and interpretable AI. Expert Syst. Appl. 291, 128607 (2025).
Sotubadi, S. V., Pallissery, S. S. & Nguyen, V. Multi-modal explainable artificial intelligence for neural network-based tool wear detection in machining. Eng. Appl. Artif. Intell. 144, 110141 (2025).
Li, S., Li, J. & Zhu, K. Application of physics-guided deep learning model in tool wear monitoring of high-speed milling. Mech. Syst. Signal. Process. 224, 111949 (2025).
Zhu, K., Guo, H., Li, S. & Lin, X. Physics-informed deep learning for tool wear monitoring. IEEE Trans. Ind. Inf. 20, 524–533 (2024).
Hua, J., Li, Y., Liu, C., Wan, P. & Liu, X. Physics-informed neural networks with weighted losses by uncertainty evaluation for accurate and stable prediction of manufacturing systems. IEEE Trans. Neural Netw. Learn. Syst. 35, 11064–11076 (2024).
Han, Z. et al. Digital twin with dynamic mechanistic simulation core for milling tool wear prediction. J. Manuf. Process. 149, 1138–1150 (2025).
Kong, D., Chen, Y. & Li, N. Gaussian process regression for tool wear prediction. Mech. Syst. Signal. Process. 104, 556–574 (2018).
Zhu, K., Huang, C., Li, S. & Lin, X. Physics-informed Gaussian process for tool wear prediction. ISA Trans. 143, 548–556 (2023).
Sun, M. et al. Tool wear monitoring based on physics-informed Gaussian process regression. J. Manuf. Syst. 77, 40–61 (2024).
Bauer, M., van der Wilk, M. & Rasmussen, C. E. Understanding probabilistic sparse Gaussian process approximations. Preprint at https://doi.org/10.48550/arXiv.1606.04820 (2017).
Eriksson, D., Dong, K., Lee, E., Bindel, D. & Wilson, A. G. Scaling Gaussian process regression with derivatives. In Advances in Neural Information Processing Systems Vol. 31 (Curran Associates, Inc., 2018).
Zhu, K. & Zhang, Y. A generic tool wear model and its application to force modeling and wear monitoring in high speed milling. Mech. Syst. Signal. Process. 115, 147–161 (2019).
Yao, Z., Zhang, P. & Luo, M. Extreme learning machine oriented surface roughness prediction at continuous cutting positions based on monitored acceleration. Mech. Syst. Signal. Process. 219, 111633 (2024).
2010 PHM Society Conference Data Challenge. PHM Society https://phmsociety.org/phm_competition/2010-phm-society-conference-data-challenge/
Lei, Y., He, Z. & Zi, Y. A new approach to intelligent fault diagnosis of rotating machinery. Expert Syst. Appl. 35, 1593–1600 (2008).

Funding

This work was supported by the Technology Innovation Program (20023566, Development and Demonstration of Industrial IoT and AI-Based Process Facility Intelligence Support System in Small and Medium Manufacturing Sites) funded by the Ministry of Trade, Industry, and Energy (Korea). This work was also supported by the Korea Institute of Energy Technology Evaluation and Planning (KETEP) grant funded by the Korea government (MOTIE) (‘RS-2024-00449107’, ‘Development of Flexible Pipe and Connector for Hydrogen gas’).

Author information

Authors and Affiliations

Department of Electrical, Electronic and Computer Engineering, University of Ulsan, 44610, Ulsan, South Korea
Hai-Phong Nguyen, Duc-Thuan Nguyen & Jong-Myon Kim
Prognosis and Diagnostics Technologies Co., Ltd, 44610, Ulsan, South Korea
Jong-Myon Kim

Contributions

H.-P.N.: conceptualization, methodology, data curation, software, investigation, visualization, writing–original draft; D.-T.N.: methodology, data curation, validation, writing–review & editing; J.-M.K.: conceptualization, supervision, funding acquisition, writing–review & editing.

Corresponding author

Correspondence to Jong-Myon Kim.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Nguyen, HP., Nguyen, DT. & Kim, JM. Gaussian process regression with physics-guided pseudo-sample augmentation for wear prediction under sparse measurements in milling. Sci Rep 16, 7231 (2026). https://doi.org/10.1038/s41598-026-38067-9

Received: 22 November 2025
Accepted: 28 January 2026
Published: 04 February 2026
Version of record: 20 February 2026
DOI: https://doi.org/10.1038/s41598-026-38067-9
