Mondragon Unibertsitatea face-milling dataset for smart tool condition monitoring

Peralta Abadia, Jose Joaquin; Cuesta Zabaljauregui, Mikel; Larrinaga Barrenechea, Felix

doi:10.1038/s41597-025-05168-5

Data Descriptor
Open access
Published: 23 May 2025

Scientific Data volume 12, Article number: 855 (2025)

Abstract

This article presents a dataset of face-milling experiments for smart tool condition monitoring (TCM) performed under varying cutting conditions in the High-Performance Machining laboratory of Mondragon Unibertsitatea (MU). The experiments collected raw internal signals from the machine. Cutting forces, vibration signals, and acoustic emission signals were collected with external sensors. Tool wear was measured before each experiment and annotated accordingly, providing tool wear progression throughout the dataset. The dataset was technically validated using Python scripts to ensure its quality and reproducibility. The resulting MU-TCM face-milling dataset offers a reproducible design of experiments and associated data to carry out and advance smart TCM of milling processes. The dataset supports applications such as training machine learning and deep learning models for TCM, enables sensor fusion research with diverse signal combinations, and facilitates the development of TCM solutions using only internal CNC signals for industrial environments. By supporting these applications, the dataset is expected to help reduce the gap between research and industry in smart TCM applications.

Background & Summary

Machining processes, such as milling, form the backbone of modern manufacturing, particularly in the production of machine parts¹. Maintaining the quality and performance of machining processes requires effective tool condition monitoring (TCM) systems, which track tool wear and performance in computer numerical control (CNC) machines. The introduction of such systems has led to an improvement in product quality, process reliability, and production efficiency.

In TCM, signals can be acquired from either external sensors, attached to the workpiece or spindle², or internal sensors, placed on the table and spindles of CNC machines³. Although external sensors have proven to be effective in research environments, their complexity, cost, and difficulties in integration make them impractical for industrial applications, contributing to a significant gap between academic research and industry⁴. On the other hand, internal signals are easier to access, but are seldom employed in TCM because of their low resolution and restricted access, as commercial CNC systems often lack APIs or data extraction tools to retrieve the signals.

To overcome the limitations of both external and internal sensors, sensor fusion–the process of combining multiple sensor signals–has become a critical factor in improving the accuracy and reliability of TCM. Sensor fusion allows various aspects of machining processes to be monitored simultaneously, enhancing the quality of the data and improving overall process efficiency. This is particularly crucial in industrial settings, where internal signals such as motor torque, current, and power are more practical for implementation. By leveraging sensor fusion with internal signals, TCM systems can bridge the gap between academic research and industrial application, providing more reliable and efficient monitoring solutions for real-world manufacturing environments⁴.

A number of recent studies have proposed deep learning (DL)-based TCM systems, which take advantage of the generalisation potential of DL models⁵,⁶. Nevertheless, data collection for training DL models to date has been based on limited experiments or specific process conditions, since data collection and labelling are costly and time-consuming in machining processes. Moreover, training DL models for specific or limited process conditions widens the gap between research and industry. This is because DL models can only generalise knowledge from the data they are fed and are not easily transferable to other process conditions.

To reduce the burden of data collection during milling processes, a number of open-access datasets covering varied milling operations have been published for several monitoring scopes. Milling processes can differ significantly depending on the type of operation, such as face milling and side milling. These datasets were created by collecting data in both laboratory and industrial environments. The datasets collected in laboratory experiments focus on TCM and have limited ranges of cutting conditions and/or signal quantity⁷,⁸,⁹,¹⁰. The datasets collected in industrial environments, on the other hand, consider process health and workpiece quality monitoring¹¹,¹². The latter cover varying cutting conditions but collect only up to two signal types. Such a limited range of cutting conditions and signal types hinders the transfer of knowledge extracted from the datasets to industrial shop floors. In addition, some of these datasets were published more than a decade ago using what is now outdated equipment. Table 1 presents a comparison of the MU-TCM face-milling dataset with the milling datasets that focus on TCM.

Table 1 A comparison of the MU-TCM face-milling dataset with existing benchmark milling datasets.

This paper, therefore, presents the Mondragon Unibertsitatea-TCM (MU-TCM) milling dataset collected in a laboratory environment in the High-Performance Machining (HPM) laboratory of Mondragon Unibertsitatea (MU). Milling experiments were performed on a LAGUN L1000 CNC vertical machining centre. The dataset includes internal and external signals encompassing several cutting conditions and materials, and provides a balance between varying cutting conditions and signal variety. In addition, several high-frequency external signals and an extensive number of internal CNC signals were collected. The main contributions of the MU-TCM milling dataset to future TCM research are:

Modern reproducible milling environment: The face milling experiments were performed in a modern CNC vertical machining centre using state-of-the-art milling cutters and workpiece materials. Thus, the knowledge that can be extracted from the dataset reflects modern industrial milling environments and can be reproduced.
Sensor fusion applicability: The dataset includes an extensive number of external and internal CNC signals, which can assist in future identification of combinations of signals to exploit sensor fusion potential in TCM.
Transfer and continual learning potential: The dataset is an ideal resource for developing and testing transfer learning and continual learning techniques for DL models. DL models can leverage the dataset to improve knowledge generalisation across a wide range of cutting conditions and operations, as well as continuously adapt to new data in milling environments.

The dataset approach is suitable for multiple applications, such as: training state-of-the-art machine learning and DL-based TCM systems for milling without the need for laboratory experiments, exploring the potential of sensor fusion with various combinations of signals, and training and testing DL-based TCM systems for industrial face milling applications with only internal CNC signals. Targeting these research efforts is expected to help reduce the gap between research and industry in TCM for machining. In addition, the dataset opens up new avenues for research in adaptive and resilient smart TCM systems.

Methods

The MU-TCM dataset is a face-milling dataset for tool condition monitoring, specifically measuring tool wear. To create this dataset, a four-step methodology was defined: design of experiments (DOE), experimental setup, data collection, and data synchronisation.

Design of experiments

The DOE for the MU-TCM dataset was based on industrial applicability and the recommended manufacturer settings of tools and workpiece materials. The DOE consisted of 8 combinations of cutting conditions, as presented in Table 2. Two materials were employed: cast iron (dry, without lubrication) and stainless steel (minimum quantity lubrication, MQL). Each material was tested at two cutting speeds v_C (100 and 200 m/min for cast iron and 50 and 100 m/min for stainless steel) and two feed rates f (0.1 and 0.2 mm/rev for cast iron and 0.05 and 0.1 mm/rev for stainless steel). The axial depth of cut a_p was 1.5 mm and the radial depth of cut a_e was 58.5 mm for all experiments.

Table 2 Design of experiments for the MU-TCM dataset.

The procedure was designed to test four wear levels (0.0, 0.1, 0.2, and 0.3 mm) for each combination of cutting conditions, resulting in a total of 32 experiments. These wear levels were selected to simulate progressive tool degradation under realistic machining conditions. A three-dimensional representation of the experimental setup is illustrated in Fig. 1, with feed rate (f) on the X axis, cutting speed (v_C) on the Y axis, and tool wear (VB) on the Z axis. Both materials were tested under an identical combination of cutting conditions, with a v_C of 100 m/min and an f of 0.1 mm/rev. This combination of cutting conditions was specifically designed to explore the potential for transfer learning and continual learning, by testing whether knowledge gained from one material can be applied to another.
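The planned grid described above can be enumerated in a few lines. This is a minimal sketch of the design only: the dictionary layout and the names `DOE`, `TARGET_VB_MM`, and `build_doe` are illustrative assumptions, and the grid does not account for the repetitions or the deviations between planned and measured VB.

```python
from itertools import product

# Cutting conditions per material, as listed in the design of experiments.
DOE = {
    "cast iron": {"vc_m_min": (100, 200), "f_mm_rev": (0.1, 0.2)},
    "stainless steel": {"vc_m_min": (50, 100), "f_mm_rev": (0.05, 0.1)},
}
TARGET_VB_MM = (0.0, 0.1, 0.2, 0.3)


def build_doe():
    """Return one dict per planned experiment (material, v_C, f, target VB)."""
    rows = []
    for material, cond in DOE.items():
        for vc, f in product(cond["vc_m_min"], cond["f_mm_rev"]):
            for vb in TARGET_VB_MM:
                rows.append({"material": material, "vc": vc, "f": f, "vb": vb})
    return rows


experiments = build_doe()
combinations = {(r["material"], r["vc"], r["f"]) for r in experiments}

# (v_C, f) pairs that occur for more than one material.
shared = {
    (vc, f)
    for (_, vc, f) in combinations
    if sum(1 for (_, v, g) in combinations if (v, g) == (vc, f)) > 1
}

print(len(combinations), len(experiments), shared)
```

The only shared pair is the 100 m/min and 0.1 mm/rev condition, which is the one the text singles out for transfer and continual learning.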

Experimental setup

Figure 2 illustrates the experimental setup. The experiments were performed in a LAGUN L1000 vertical machining centre with a Fagor CNC 8065 in the HPM laboratory of MU. An 80 mm face mill with one Ayma SPKR M55 1203EDSR AFT720 cutting insert was used for material removal. Blocks of 200 × 292.5 × 50 mm (width × depth × height) were used for each material, as depicted in Fig. 3.

Signals

As shown in Fig. 2, three sensors were fixed to the table of the machining centre to acquire external signals:

Cutting forces: A Kistler 9139 AA dynamometer was mounted to the table fixture and connected to a Kistler 5223 amplifier. Cutting forces were measured along the X, Y, and Z axes.
Vibration: A PCB J356A45 triaxial accelerometer was coupled to the side of the dynamometer. Vibration signals were measured along the X, Y, and Z axes without pre-processing.
Acoustic emission (AE): A Kistler 8152C AE sensor was coupled to the side of the dynamometer. The AE sensor was connected to a Kistler 5125 piezotron coupler, which amplified the signal and filtered it with high-pass (50 kHz) and low-pass (1 MHz) filters.

The dynamometer, AE sensor, and accelerometer outputs were then connected to an NI cDAQ 9178 data acquisition (DAQ) unit, via NI 9239, NI 9223, and NI 9234 analog-to-digital converter acquisition cards, respectively. The DAQ unit was connected to an external workstation via Ethernet, and the data were collected with a MATLAB script. In addition, a root-mean-squared (RMS) function with a moving average was applied to the AE signal to reduce noise and emphasise significant signal trends, such as tool wear progression and cutting anomalies.
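As an illustration of the RMS step, a moving RMS can be computed as the square root of a moving average of the squared signal. The sketch below is in Python rather than the MATLAB used for acquisition. The function name `moving_rms`, the window length, and the synthetic signal are all assumptions, since the source does not state the parameters that were used.

```python
import numpy as np


def moving_rms(signal, window):
    """RMS over a sliding window: sqrt of the moving average of the squared signal."""
    x = np.asarray(signal, dtype=float)
    kernel = np.ones(window) / window
    # mode="same" keeps the output the same length as the input.
    return np.sqrt(np.convolve(x ** 2, kernel, mode="same"))


# Synthetic AE-like noise whose amplitude doubles halfway through.
rng = np.random.default_rng(0)
ae = np.concatenate([rng.normal(0, 1.0, 5000), rng.normal(0, 2.0, 5000)])

# The window length (500 samples) is an arbitrary choice for this example.
ae_rms = moving_rms(ae, window=500)
```

The smoothed envelope roughly tracks the standard deviation of each half of the signal, which is the kind of trend the RMS step is meant to expose.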

Internal CNC signals were acquired using the OptiTwin system¹³, in parallel with the external sensors. This system enabled the real-time collection of internal signals. The OptiTwin system consists of two key components: the experiment designer, which communicates with the Fagor CNC via a custom API, ensuring real-time data acquisition, and the experiment manager, which stores the internal CNC signals for further analysis. Table 3 summarises the details of the external and internal signals.

Table 3 Details of the internal (I) and external (E) signals. (*In contrast to the programmed value in the CNC.).

Tool wear measurement

Tool wear takes various forms, such as cutting edge rounding, crater wear on the rake face, and flank wear due to friction. Flank wear VB was measured in this study, as it is a generally accepted parameter to evaluate tool wear⁷. VB is calculated as the distance from the cutting edge on the flank face of the tool to the end of the abrasive wear, as shown in Fig. 4. In these experiments, a Leica Z16 APO macroscope was used to measure the VB of the cutting inserts before each trial.

Cutting process

The cutting process consisted of horizontal face milling cuts along the X axis of the workpiece. Entrance cuts with a length of half the diameter of the face mill (40 mm) were made to prepare the workpiece and maintain consistent measurements throughout the experiments. This ensured that the tool would be in full contact with the material from the beginning to the end of each experiment. The cutting process is depicted in Fig. 5, with both a top-down view (Fig. 5a) and a 3D view (Fig. 5b). Axes are included to show the positioning of the material and the direction of the cutting speed (v_C). A zoom-in of the VB has been added to Fig. 5b to demonstrate the location of flank wear during the cutting process.

The defined radial depth of cut (a_e) of 58.5 mm allowed for five complete passes through the workpiece with an axial depth of cut (a_p) of 1.5 mm. This, together with the positioning of the face mill in the workpiece after the entrance cut, is illustrated in Fig. 5a. Figure 6 shows the cutting action with photographs of the actual cutting process for both stainless steel (Fig. 6a) and cast iron (Fig. 6b).
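The pass count and the spindle settings implied by these values can be checked with a short calculation. The sketch assumes that the radial step-over runs along the 292.5 mm side of the block and uses the standard relation n = 1000 · v_C / (π · D) between cutting speed and spindle speed. The function names are illustrative.

```python
import math

TOOL_DIAMETER_MM = 80.0   # face mill diameter
BLOCK_DEPTH_MM = 292.5    # block side assumed to run along the step-over direction
AE_MM = 58.5              # radial depth of cut per pass


def passes_per_layer(block_depth_mm, ae_mm):
    """Number of side-by-side passes needed to cover one layer of the block."""
    return round(block_depth_mm / ae_mm)


def spindle_speed_rpm(vc_m_min, diameter_mm):
    """Spindle speed from cutting speed: n = 1000 * v_C / (pi * D)."""
    return 1000.0 * vc_m_min / (math.pi * diameter_mm)


def feed_velocity_mm_min(f_mm_rev, rpm):
    """Table feed velocity from feed per revolution and spindle speed."""
    return f_mm_rev * rpm


passes = passes_per_layer(BLOCK_DEPTH_MM, AE_MM)
rpm = spindle_speed_rpm(100.0, TOOL_DIAMETER_MM)   # shared condition, v_C = 100 m/min
feed = feed_velocity_mm_min(0.1, rpm)              # shared condition, f = 0.1 mm/rev
print(passes, round(rpm, 1), round(feed, 2))
```

For the condition shared by both materials this gives roughly 398 rpm and a feed velocity of about 40 mm/min.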

The face milling experiments were conducted in accordance with the tool life criteria recommended in ISO 8688-1:1989: until tool wear reaches 0.3 mm or until fatal tool failure¹⁴. Table 4 details the number of experiments performed for each combination of cutting conditions and target VB, together with the total cutting time. Complete tool life experiments were not feasible due to practical constraints (time and cost). Therefore, to capture the behaviour of the tool across various wear stages efficiently, additional roughing cuts were performed to pre-wear the tool to approximate the next target VB value. This allowed for efficient data collection at various wear stages, although slight deviations existed between planned and actual values.

Table 4 Experiment count by cutting conditions and VB.

Nine cutting inserts were used for the experiments. Table 5 summarises the usage details by cutting conditions, including the number of experiments per insert and edge and the minimum, mean, and maximum VB values recorded. The same tool and edge were reused for various conditions and, in some cases, materials. Reusing the same tool across a range of conditions and materials can offer several advantages. First, it isolates the impact of cutting conditions and material properties on wear progression, eliminating tool variation as a factor. Second, utilising the same tool for various conditions can provide insight into how wear patterns and rates change with different parameters, and thus offer a deeper understanding of tool-material interactions and wear mechanisms. This could be beneficial for transfer and continual learning approaches in DL-based TCM for industrial applications, as it allows DL models to leverage knowledge gained from one set of conditions to more effectively adapt to new situations.

Table 5 Experiment count by cutting conditions.

Data acquisition

The data acquisition process and the interaction between components are illustrated in Fig. 7. To capture external signals, a MATLAB script that interfaced with the DAQ unit was executed on the external workstation. This script configured the acquisition parameters (sensor sensitivity, range, sampling frequency, and acquisition time) of the DAQ unit. The script then waited for a trigger from the CNC, which was an analogue output configured to indicate when the process had started. Once the trigger signal was received, data acquisition ran until the acquisition time elapsed and the face milling stopped.

Running directly on the CNC, the OptiTwin experiment designer acquired the internal CNC signals using a custom API within the CNC software. This ensured seamless integration with the CNC system and its inherent sampling rate of 250 Hz. In addition, experiment definition data, i.e., process data, cutting conditions, and tool data, were entered in the OptiTwin experiment designer. Specifically, the following data were recorded:

(a) Process data: Experiment identifier, repetition number, process type (milling, drilling, and turning), machine, workpiece material, lubrication type.
(b) Cutting conditions: Cutting speed (v_C), feed rate (f), axial depth of cut (a_p), and radial depth of cut (a_e).
(c) Tool data: Tool identifier, diameter, material, coating, manufacturer, part reference number, and initial VB.

The sampling rates for signal acquisition were chosen based on the capacity of the setup, with the aim of acquiring the highest quantity of data. Internal CNC signals were sampled at 250 Hz, which was limited by the sampling capabilities of the CNC system. However, for external sensors that measure vibration, force, and AE signals, higher sampling rates were employed to capture more detailed information about tool-material interaction. Vibration and force signals were acquired at 50 kHz, while AE signals were sampled at a higher rate of 1 MHz, capturing high-frequency information about cutting dynamics, tool wear progression, and potential anomalies. The sampling rates were maintained consistent across all experiments to ensure data comparability and facilitate further analysis.

Synchronisation issues arose because of technical limitations in communication between the CNC machine and the external workstation, resulting in unsynchronised internal and external signals. This is further detailed in the “Data synchronisation” subsection. Due to this issue, four experiments did not capture external signals during the first repetition, as the MATLAB script was not manually initiated. Specifically, for cutting insert 3 with edge 1, at cutting conditions of v_C 50 m/min and f 0.05 mm/rev with an approximate VB of 0.2 mm, and for insert 6 with edge 1, at v_C 200 m/min and f 0.1 mm/rev with an approximate VB of 0.3 mm, a third repetition was conducted to ensure that at least two complete repetitions with external signals were available. In the case of insert 9 with edge 2, for v_C 100 m/min and feed rates f 0.05 and 0.1 mm/rev, where the VB was close to 0.0 mm, an additional repetition was not performed to avoid altering the VB condition.

Data annotation

Tool VB was measured with the macroscope at the beginning of each experiment, to 3 decimal places (Fig. 8). Images were captured at 2x zoom level. Two images were captured per measurement, one with annotation and the other without. The unannotated images offer researchers the opportunity to implement varying image processing and annotation techniques, free from pre-existing markings. The images were then stored alongside the acquired sensor data in the experiment manager. The value was also recorded in the OptiTwin experiment designer subsystem in the CNC, linking it to the corresponding experiment.

Data storage

The internal signals, acquired by the OptiTwin experiment designer on the CNC machine, were transmitted via TCP sockets to the centralised OptiTwin experiment manager. There, the data were stored in a custom comma-separated-value (CSV) file format, which includes a header that records the VB, process data, and cutting conditions, all collected by the experiment designer.

The external signals were stored during acquisition directly on the external workstation using the MATLAB script. Upon completion, the script stored the acquired signals and their corresponding timestamps in MATLAB-formatted data files (.mat extension), facilitating efficient data access and analysis. The files were subsequently uploaded to the OptiTwin experiment manager to ensure centralised storage and seamless integration of both internal and external data within the OptiTwin framework for comprehensive analysis.

Data synchronisation

Internal sensor data were synchronised with the external sensor data. This synchronisation ensured comprehensive monitoring of the machining process by correlating internal machine states with external measurements.

Automatic synchronisation was not possible due to technical limitations in communication between the CNC machine and the external workstation. Additionally, delays were observed between the AE signals and the cutting force and vibration signals, probably caused by the preprocessing through the piezotron coupler. As a result, the data were synchronised manually to ensure accurate alignment between the signals, as described in the “Technical Validation” section.

First, the internal signals and the experiment definition data were merged into the external signals files, maintaining MATLAB formatting for consistency. Then, synchronisation was carried out by analysing both sets of signals to identify initial and final peaks, which corresponded to the entry and exit of the tool in the workpiece. To ensure alignment, the first peak was skipped to synchronise when the tool had fully entered the workpiece, as the initial peak is small until full entry. Since the signals reflect different aspects of the process, such as spindle speed, cutting forces, and vibration, slight variations in peak location were expected as a result of tool-material interaction dynamics.

The internal SREAL signal (actual spindle speed) was used as the reference, as it showed distinct markers when the spindle speed changed at the entry and exit points of the tool. For external signals, the cutting force on the Z axis (Fz) was chosen as reference for synchronisation, despite the fact that the cuts were performed horizontally along the X axis. This was because material resistance causes the tool to exert a downward force on the Z axis as it engages the material (enters the cut). Similarly, as the tool exits the material, there is a release of resistance, resulting in a decrease in the force along the Z axis. These distinct changes in the Z axis force provided clear and sharp peaks, making it ideal for synchronisation. Moreover, the AE_RMS signal was aligned with Fz, as both signals accurately captured the entry and exit points. This approach effectively synchronised the internal and external datasets, ensuring accurate analysis of the machining process.

Figure 9 illustrates an example of the signal synchronisation process. On the left of the figure, the unsynchronised signals are shown. From top to bottom, the first two signals correspond to the internal signals: SREAL and TV50 (spindle motor power feedback), while the following three are the external signals: Fz (cutting force on the Z axis), Az (vibration in the Z axis), and AE_RMS (RMS acoustic emission). In this unsynchronised view, it is evident that the external and internal signals are misaligned, and the AE_RMS signal displays a noticeable delay compared to the other external signals. On the right of the figure, the signals are presented after synchronisation. It can be observed that both internal and external signals are now properly aligned, ensuring accurate correlation between the measurements.

Data Records

The MU-TCM face-milling dataset is available at the digital repository (eBiltegia) of MU¹⁵, with this section being the primary source of information on the availability and content of the data being described. Due to the large size of the dataset, a smaller subset has been included in the digital repository to allow users to evaluate the data before committing to downloading the full dataset. This subset includes the data for experiments 13 to 20, covering both materials under identical cutting conditions. The dataset and the subset are both organised into three main folders: (i) unsynchronised signals, (ii) synchronised signals, and (iii) VB images.

The unsynchronised and synchronised signals folders are composed of files with .mat extension, which include both external and internal signals, as well as the experiment definition data. The VB images folder is organised into subfolders, with each folder corresponding to a specific cutting insert and edge. These subfolders contain images of the measured VB for each experiment. In addition to the three folders, two CSV files with .csv extension are provided: (i) signal synchronisation details and (ii) cutting conditions and extracted features in time, frequency, and time-frequency domains. Figure 10 illustrates the folder structure of the MU-TCM dataset.

In the unsynchronised and synchronised signals folders, the files are named using a combination of the cutting insert number, edge number, cutting conditions, VB, and repetition number for each experiment. For example, for the first repetition of the cutting insert 2 and edge 1, with a v_C of 200 m/min, an f of 0.2 mm/rev, an a_p of 1.5 mm and a VB of 0.125 mm, the file name would be:

Insert2Edge1_Vc200.0_fz0.2_ap1.5_VB0.125_Rep1.mat.
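The naming convention lends itself to a regular expression. The following is a minimal sketch that assumes every signals file follows the pattern of the example above exactly. The names `FILE_PATTERN` and `parse_signal_file_name` are hypothetical helpers and are not part of the dataset tooling.

```python
import re

# One named group per field of the documented file name.
FILE_PATTERN = re.compile(
    r"Insert(?P<insert>\d+)Edge(?P<edge>\d+)"
    r"_Vc(?P<vc>\d+(?:\.\d+)?)"
    r"_fz(?P<fz>\d+(?:\.\d+)?)"
    r"_ap(?P<ap>\d+(?:\.\d+)?)"
    r"_VB(?P<vb>\d+(?:\.\d+)?)"
    r"_Rep(?P<rep>\d+)\.mat"
)


def parse_signal_file_name(name):
    """Split a signals file name into its insert, edge, conditions, VB and repetition."""
    match = FILE_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    fields = match.groupdict()
    return {
        "insert": int(fields["insert"]),
        "edge": int(fields["edge"]),
        "vc_m_min": float(fields["vc"]),
        "f_mm_rev": float(fields["fz"]),
        "ap_mm": float(fields["ap"]),
        "vb_mm": float(fields["vb"]),
        "repetition": int(fields["rep"]),
    }


info = parse_signal_file_name("Insert2Edge1_Vc200.0_fz0.2_ap1.5_VB0.125_Rep1.mat")
print(info)
```

Raising on a non-matching name, rather than returning a partial result, makes deviations from the convention visible early when iterating over a folder.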

The VB images are stored in subfolders, which are named based on the cutting insert and edge numbers. Each subfolder contains two images with “.jpg” extension per experiment: one with annotated VB values and one without, together with images of a final measurement for each cutting edge and insert. The images are named using a combination of VB and cutting conditions, as well as a tag label for the annotated images. For example, for the measurement of the cutting insert 3 and edge 1, with a v_C of 50 m/min, an f of 0.05 mm/rev, and a VB of 0.202 mm, the file names would be:

Insert3Edge1\VB0.202_Vc50.0_fz0.05.jpg.

Insert3Edge1\VB0.202_Vc50.0_fz0.05_tag.jpg.

The images for the final measurement are named using a combination of VB and an end tag, as well as a tag label for the annotated images. For example, for the final measurement of the cutting insert 3 and edge 1, with a VB of 0.213 mm, the file names would be:

Insert3Edge1\VB0.213_end.jpg.

Insert3Edge1\VB0.213_end_tag.jpg.

Finally, the two CSV files are located at the root of the dataset structure. The file for signal synchronisation is named signals_sync.csv and the file for cutting conditions and extracted features is named signals_stats.csv.
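Both CSV files use semicolons as separators, so a reader has to be told the delimiter explicitly. The sketch below uses only the Python standard library. The helper name `read_semicolon_csv` is an assumption, and the in-memory row is made up for illustration and is not taken from the dataset. Values are kept as strings because the source does not state the decimal separator.

```python
import csv
import io


def read_semicolon_csv(handle):
    """Return the rows of a semicolon-separated CSV as a list of dicts."""
    return list(csv.DictReader(handle, delimiter=";"))


# Made-up stand-in for signals_sync.csv; the RPM value is illustrative only.
demo = io.StringIO(
    "file_name;RPM_avg\n"
    "Insert2Edge1_Vc200.0_fz0.2_ap1.5_VB0.125_Rep1.mat;795.8\n"
)
rows = read_semicolon_csv(demo)
print(rows[0]["file_name"], rows[0]["RPM_avg"])
```

With the real files, the same function would be called on a handle opened with `open(path, newline="")`.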

Data organisation of unsynchronised and synchronised signals

The data are stored in files with .mat extension, which organises information in a dictionary-like structure. These files contain both internal and external signals, as well as the experiment definition data. Table 6 details the keys and their corresponding values (datatype, logical type, description, and unit).

Table 6 Organisation of the unsynchronised and synchronised signals files.
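In Python, such files can typically be read with `scipy.io.loadmat`, which returns a dictionary keyed by variable name. The sketch below writes a small stand-in file first so that it runs without the dataset. The keys `Fz` and `VB` in the stand-in are placeholders, and the real key names are those listed in Table 6. The source does not state the MAT-file version. If the files were saved in MATLAB v7.3 (HDF5) format, `loadmat` would not read them and an HDF5 reader such as `h5py` would be needed instead.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat


def load_signals(path):
    """Load a .mat file and drop the metadata keys that loadmat adds."""
    raw = loadmat(path, squeeze_me=True)
    return {key: value for key, value in raw.items() if not key.startswith("__")}


# Stand-in file with placeholder keys, written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.mat")
    savemat(demo_path, {"Fz": np.arange(5.0), "VB": 0.125})
    data = load_signals(demo_path)

print(sorted(data))
```

Passing `squeeze_me=True` collapses MATLAB's 1 × N arrays into one-dimensional NumPy arrays, which is usually more convenient for signal processing.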

Data organisation of VB images

The VB images data are organised into subfolders corresponding to each cutting insert and edge. In each folder, images are stored for every experiment execution, with two images per experiment: one showing the annotated VB measurement and one without annotations. Additionally, each folder contains a final set of images taken for each cutting edge and insert at the conclusion of the experiment. All images are stored in JPEG format (.jpg extension) for ease of access and analysis.

Data organisation of CSV files

The files are structured in a CSV format, with each row corresponding to a specific experiment file. Values are separated by semicolons (;). The signals_sync.csv file contains key reference points and values identified during the manual signal synchronisation process. The signals_stats.csv file summarises cutting conditions, as well as time, frequency, and time-frequency domain features extracted for each signal of the experiments. The extracted features are based on the methodology proposed by Wang et al.¹⁶ for training ML models for TCM. Detailed steps for signal synchronisation and feature extraction are outlined in the “Technical Validation” section. The files are organised as follows:

signals_sync.csv:
- file_name: Name of the signals file.
- RPM_avg: Average value calculated from the SREAL signal.
For the internal (i) and external signals (e1 for signals at 50 kHz and e2 for signals at 1 MHz),
- (e1-e2-i)_signal: Signals selected as reference for synchronisation.
- (e1-e2-i)_start: The start index of the signals selected after synchronisation.
- (e1-e2-i)_end: The end index of the signals selected after synchronisation.
- (e1-e2-i)_peak_distance: The average distance between peaks.
- (e1-e2-i)_freq_peaks: The calculated frequency of the signals. This is calculated as RPM_avg ÷ 60 × peak_dist, where peak_dist is the value of peak_distance.
- (e1-e2-i)_peak_first: The index of the first peak.
- (e1-e2-i)_peak_last: The index of the last peak.
- (e1-e2-i)_peak_qty: The quantity of peaks between start and end.
- (e1-e2-i)_peak_height: The minimum height (value) to identify peaks selected during the synchronisation process.
- (e1-e2-i)_peaks_value_avg: The average value of the peaks.
- (e1-e2-i)_peaks_value_max: The maximum value of the peaks.
- (e1-e2-i)_peaks_value_min: The minimum value of the peaks.
For the e1 and e2 signals,
- (e1-e2)_sec_search: Number of seconds of the start of the signal selected to look for the first peak and to identify (e1-e2)_peak_height.
- (e1-e2)_sec_between_peaks: Number of seconds of the start of the signal selected to look for the first peak and to identify (e1-e2)_peak_height.
signals_stats.csv:
- file_name: Name of the signals file.
- RPM_avg: Average value calculated from the SREAL signal.
- material: Workpiece material.
- VB: The VB measured before the beginning of the experiment.
- Vc: The v_C of the experiment.
- ae: The a_e of the experiment.
- ap: The a_p of the experiment.
- fz: The f per tooth of the experiment.
For each signal,
- (signal)_start: Indicates the start index of the signal used to extract the features.
- (signal)_end: Indicates the end index of the signal used to extract the features.
Time domain features:
- (signal)_max: Indicates the maximum value.
- (signal)_kurt: Indicates the kurtosis value.
- (signal)_rms: Indicates the RMS value.
- (signal)_skew: Indicates the skewness value.
- (signal)_var: Indicates the variance value.
- (signal)_ptp: Indicates the peak-to-peak value.
Frequency domain features:
- (signal)_speckurt: Indicates the spectral kurtosis value.
- (signal)_specskew: Indicates the spectral skewness value.
Time-frequency domain feature:
- (signal)_wavenergy: Indicates the wavelet energy value.
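The six time domain features listed above can be reproduced approximately with NumPy and SciPy. This is a sketch under assumptions: the source does not say which kurtosis definition or variance normalisation was used, so the SciPy and NumPy defaults are applied here (Fisher kurtosis and population variance). The spectral and wavelet features are left out because their parameters are not given. The function name `time_domain_features` is illustrative.

```python
import numpy as np
from scipy.stats import kurtosis, skew


def time_domain_features(x):
    """Compute the time domain features named in signals_stats.csv for one signal."""
    x = np.asarray(x, dtype=float)
    return {
        "max": float(np.max(x)),
        "kurt": float(kurtosis(x)),             # Fisher definition (SciPy default)
        "rms": float(np.sqrt(np.mean(x ** 2))),
        "skew": float(skew(x)),
        "var": float(np.var(x)),                # population variance (NumPy default)
        "ptp": float(np.ptp(x)),                # peak-to-peak: max minus min
    }


# Five full periods of a unit sine wave as a simple check signal.
t = np.arange(1000) / 1000
features = time_domain_features(np.sin(2 * np.pi * 5 * t))
print(features)
```

For a unit sine wave the expected values are known analytically (RMS of about 0.707, variance of 0.5, peak-to-peak of 2), which makes it a convenient sanity check before applying the function to the start and end indices recorded for each signal.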

Technical Validation

The technical validation of the MU-TCM dataset focused on signal accuracy and reliability. The first stage of validation involved checking the synchronisation of internal and external signals. As mentioned in the “Data synchronisation” subsection, technical limitations in communication between the CNC machine and the external workstation meant that the signals were not automatically synchronised. Therefore, a manual signal synchronisation process was carried out based on peak analysis. This approach ensured that key reference points in both the internal and external signals were aligned, facilitating accurate correlation across datasets. The steps applied to synchronise the signals were the following:

1. Initial visual inspection: Internal and external signals were loaded and plotted to visually inspect the misalignment level. The user was prompted to optionally trim the external signals, as they often contained additional noise at the end due to the lack of automatic synchronisation between the CNC and external workstation. This trimming removed any recorded noise after the experiment had ended.
2. Selection of reference internal signal: The internal signal to be used as the reference for synchronisation was selected. The SREAL signal, representing the actual spindle RPM, was suggested by default.
3. Peak identification in internal signals: The peaks in the selected internal signal were identified using the find_peaks function from the SciPy library. The minimum height parameter for peak detection was set manually by the user to ensure accurate peak selection. Given the noisy nature of the signals, the find_peaks function often detected neighbouring local peaks. To compensate for this, an average of neighbouring peaks was calculated to determine a stable reference point for synchronisation.
4. Selection of reference 50 kHz external signal: The external signal sampled at 50 kHz to be used as reference was then selected, with the cutting forces in the Z-axis (Fz) suggested by default. The user was asked to specify how many seconds of the start of the signal should be analysed to identify the appropriate minimum peak height.
5. Peak identification in 50 kHz external signals: Similar to the internal signal, the peaks in the 50 kHz external signal were identified using the find_peaks function. Again, noisy signals were managed by averaging neighbouring peaks to provide a reliable synchronisation reference.
6. Calculation of trimming signal portions: Since the internal signals were sampled at a lower frequency than the external signals, the time difference between the start of the acquisition and the first peak was adjusted proportionally using the ratio of their respective sampling frequencies. The same was done for the time difference between the last peak and the end of the acquisition. Based on this ratio, the cutoff sections of the external signals at the start and end were calculated to align with the internal signal, ensuring that both sets of signals matched in time.
7. Synchronisation of 1 MHz signals: Since the AE signals had acquisition delays due to filtering, steps 5 and 6 were repeated for these signals, using AE_RMS as reference.
8. Final signal alignment: After identifying the trimming sections of both the internal and external signals, the signals were synchronised in time. This was validated by plotting the signals, as shown in Fig. 9.
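
The peak-based alignment in steps 3 to 6 can be illustrated with a short sketch. It is not the Signal_sync.py implementation: the helper names, the max_gap clustering rule, and the sampling frequencies in the usage note are assumptions made for illustration, and the sketch assumes that the external recording starts earlier and ends later than the internal one. Only find_peaks and its height argument are taken from SciPy.

```python
import numpy as np
from scipy.signal import find_peaks


def stable_peaks(signal, min_height, max_gap):
    """Detect peaks above min_height and average clusters of neighbouring peaks.

    Peaks separated by at most max_gap samples are treated as one noisy peak
    and replaced by the mean of their indices (a hypothetical grouping rule).
    """
    peaks, _ = find_peaks(np.asarray(signal, dtype=float), height=min_height)
    if peaks.size == 0:
        return np.array([], dtype=float)
    breaks = np.where(np.diff(peaks) > max_gap)[0] + 1
    return np.array([cluster.mean() for cluster in np.split(peaks, breaks)])


def external_trim(n_int, int_peaks, fs_int, n_ext, ext_peaks, fs_ext):
    """Number of external samples to cut at the start and at the end.

    The internal lead (start to first peak) and tail (last peak to end) are
    rescaled to external samples with the ratio of the sampling frequencies,
    then subtracted from the corresponding external lead and tail.
    """
    ratio = fs_ext / fs_int
    lead_int = int_peaks[0] * ratio
    tail_int = (n_int - 1 - int_peaks[-1]) * ratio
    trim_start = int(round(ext_peaks[0] - lead_int))
    trim_end = int(round((n_ext - 1 - ext_peaks[-1]) - tail_int))
    # Negative values would mean the internal signal is the longer one;
    # that case is clamped here rather than handled.
    return max(trim_start, 0), max(trim_end, 0)
```

As a usage example with made-up numbers, an internal signal of 100 samples at 10 Hz with reference peaks at indices 20 and 80, and an external signal of 1500 samples at 100 Hz with peaks at indices 400 and 1000, gives `external_trim(100, [20, 80], 10, 1500, [400, 1000], 100)`, which returns `(200, 309)`. The trimmed external signal then spans the same duration as the internal one.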

After synchronisation, the next step involved extracting time, frequency, and time-frequency features from the signals to evaluate trends and behaviours in the machining process. The features to be extracted were defined following the methodology proposed by Wang et al.¹⁶ to train ML models for TCM. Table 7 summarises the extracted features.

Table 7 Features extracted from the MU-TCM dataset.

For each experiment, the extracted features were analysed against the corresponding VB measurements to identify correlated or inversely correlated trends. Pearson and Spearman correlation coefficients were calculated to quantify the strength and direction of these relationships. Features showing significant correlations with VB values were identified, providing insight into the potential of these signals for monitoring tool wear during the machining process.
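
A correlation check of this kind can be sketched with SciPy's pearsonr and spearmanr functions. The helper name and the example values below are hypothetical and are not taken from the dataset; the sketch only shows how the two coefficients are obtained for one feature column and the matching VB column.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr


def correlate_with_wear(feature_values, vb_values):
    """Return (Pearson r, Spearman rho) between one feature and VB."""
    pearson_r, _ = pearsonr(feature_values, vb_values)
    spearman_rho, _ = spearmanr(feature_values, vb_values)
    return float(pearson_r), float(spearman_rho)


# Made-up example: a feature that grows monotonically but non-linearly with VB.
vb = np.array([0.05, 0.10, 0.15, 0.20, 0.30])
feature = vb ** 2
r, rho = correlate_with_wear(feature, vb)
```

Pearson's coefficient measures linear association, whereas Spearman's coefficient measures monotonic association, so a feature that rises with wear in a non-linear way scores higher on Spearman than on Pearson.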

Table 8 presents the correlation coefficients between the extracted features of external signals and the VB for cast iron and stainless steel, and Table 9 presents the same for the extracted features of internal signals. In the case of the external signals, correlations between the extracted features and the VB values varied across the signals, with both positive and negative relationships observed. Correlations were consistently strong across several time-domain and frequency-domain features. In particular, for the cutting force signals, strong correlations (above 0.8) were found for RMS, variance, peak-to-peak (ptp), and wavelet energy features in Fx and Fz, with slightly lower but still strong correlations in Fy (0.75 to 0.89). This is in agreement with the existing literature, in which cutting force sensors have been effectively used to monitor machining processes⁴. In contrast, correlations for the AE signals were generally lower and more variable. AE_F displayed a moderate to strong negative correlation with VB for SS in spectral skewness and wavelet energy (around -0.7), while a positive correlation (0.77) was observed for CI in the frequency-domain features. In the vibration signals, weaker correlations were identified.

Table 8 Correlation coefficients between extracted features of external signals and VB for cast iron (CI) and stainless steel (SS).

Table 9 Correlation coefficients between extracted features of internal signals and VB for cast iron (CI) and stainless steel (SS).

The correlations of the internal signals were generally weaker than those of the external sensors, likely due to the loss of data quality resulting from lower sampling frequencies. Nevertheless, strong correlations were still identified. CV3_X demonstrated strong positive correlations across RMS, variance, and peak-to-peak (ptp), where coefficients exceeded 0.8, indicating a stable relationship. TV2_X exhibited a similarly strong correlation. CV3_Y and TV2_Y showed moderate correlations in SS, particularly in peak-to-peak and maximum values, although these signals presented weaker and sometimes moderate negative correlations in CI. The other signals exhibited weak to moderate correlations.

Code availability

All scripts used for synchronisation, feature extraction, and signal analysis are available in a public GitHub repository¹⁷. The code is written in Python (version 3.11) and the library dependencies are listed in a requirements.txt file. The repository includes three main scripts designed for use with the MU-TCM face-milling dataset. The Signal_sync.py script synchronises internal and external signals. The Signal_feature_extraction.py script extracts features from the synchronised signals. Finally, the Signal_evaluator.py script assesses the extracted features against tool wear data. Researchers are encouraged to adapt the workflows of the scripts as needed for their specific use cases.

References

1. Soori, M., Arezoo, B. & Dastres, R. Machine learning and artificial intelligence in CNC machine tools, A review. Sustain. Manuf. Serv. Econ. 2, 100009, https://doi.org/10.1016/j.smse.2023.100009 (2023).
2. Korkmaz, M. E. et al. Indirect monitoring of machining characteristics via advanced sensor systems: A critical review. The Int. J. Adv. Manuf. Technol. 120, 7043–7078, https://doi.org/10.1007/s00170-022-09286-x (2022).
3. Duo, A., Basagoiti, R., Arrazola, P. J. & Cuesta, M. Sensor signal selection for tool wear curve estimation and subsequent tool breakage prediction in a drilling operation. Int. J. Comput. Integr. Manuf. 35, 203–227, https://doi.org/10.1080/0951192X.2021.1992661 (2022).
4. Teti, R., Mourtzis, D., D’Addona, D. M. & Caggiano, A. Process monitoring of machining. CIRP Annals 71, 529–552, https://doi.org/10.1016/j.cirp.2022.05.009 (2022).
5. Zhou, Y. et al. A new tool wear condition monitoring method based on deep learning under small samples. Measurement 189, 110622, https://doi.org/10.1016/j.measurement.2021.110622 (2022).
6. Pillai, S. & Vadakkepat, P. Deep learning for machine health prognostics using Kernel–based feature transformation. J. Intell. Manuf. 33, 1–16, https://doi.org/10.1007/s10845-021-01747-6 (2021).
7. Agogino, A. & Goebel, K. Milling data set. NASA Ames Prognostics Data Repository https://data.nasa.gov/Raw-Data/Milling-Wear/vjv9-9f3x/data (2007).
8. PHM Society. 2010 PHM Society conference data challenge. https://phmsociety.org/phm_competition/2010-phm-society-conference-data-challenge/ (2010).
9. Liu, C., Li, Y., Li, J. & Hua, J. A meta-invariant feature space method for accurate tool wear prediction under cross conditions. IEEE Transactions on Ind. Informatics 18, 922–931, https://doi.org/10.1109/TII.2021.3070109 (2022).
10. Denkena, B., Klemme, H. & Stiehl, T. H. Multivariate time series data of milling processes with varying tool wear and machine tools. Data Brief 50, 109574, https://doi.org/10.1016/j.dib.2023.109574 (2023).
11. Tnani, M.-A., Feil, M. & Diepold, K. Smart data collection system for brownfield cnc milling machines: A new benchmark dataset for data-driven machine monitoring. Procedia CIRP 107, 131–136, https://doi.org/10.1016/j.procir.2022.04.022 (2022).
12. Proteau, A. et al. CNC machining quality prediction using variational autoencoder: A novel industrial 2 TB dataset. In 2022 Prognostics and Health Management Conference, 360–367, https://doi.org/10.1109/PHM2022-London52454.2022.00069 (IEEE, London, UK, 2022).
13. Peralta Abadia, J. J. et al. OptiTwin: Data-driven machining process optimization platform for SMEs. In 29th IEEE International Conference on Emerging Technologies and Factory Automation, https://doi.org/10.1109/ETFA61755.2024.10711032 (IEEE, Padova, Italy, 2024).
14. ISO. ISO 8688-1:1989 - Tool Life Testing in Milling - Part 1: Face Milling. Tech. Rep., ISO, Columbia, WA, USA (1989).
15. Peralta Abadia, J. J., Cuesta Zabaljauregui, M. & Larrinaga Barrenechea, F. MU-TCM face-milling dataset. Mondragon Unibertsitatea eBiltegia https://doi.org/10.48764/3hdp-gf23 (2025).
16. Wang, J., Xie, J., Zhao, R., Zhang, L. & Duan, L. Multisensory fusion based virtual tool wear sensing for ubiquitous manufacturing. Robotics computer-integrated manufacturing 45, 47–58, https://doi.org/10.1016/j.rcim.2016.05.010 (2017).
17. Peralta Abadia, J. J., Cuesta Zabaljauregui, M. & Larrinaga Barrenechea, F. Technical validation code for the MU-TCM face-milling dataset (V1.2). Zenodo https://doi.org/10.5281/zenodo.14055658 (2024).

Acknowledgements

This project has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 814078 and from the Department of Education, Universities and Research of the Basque Government under the projects Ikerketa Taldeak (Grupo de Ingeniería de Software y Sistemas IT1519-22 and Grupo de investigación de Mecanizado de Alto Rendimiento IT1443-22).

Author information

Authors and Affiliations

Mondragon Goi Eskola Politeknikoa, Faculty of Engineering, Arrasate, 20500, Spain
Jose Joaquin Peralta Abadia, Mikel Cuesta Zabaljauregui & Felix Larrinaga Barrenechea

Authors

Jose Joaquin Peralta Abadia
Mikel Cuesta Zabaljauregui
Felix Larrinaga Barrenechea

Contributions

All authors conceived the experiments, J.J.P.A. and M.C.Z. conducted the experiments, and all authors analysed the results. All authors reviewed the manuscript.

Corresponding authors

Correspondence to Mikel Cuesta Zabaljauregui or Felix Larrinaga Barrenechea.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Peralta Abadia, J.J., Cuesta Zabaljauregui, M. & Larrinaga Barrenechea, F. Mondragon Unibertsitatea face-milling dataset for smart tool condition monitoring. Sci Data 12, 855 (2025). https://doi.org/10.1038/s41597-025-05168-5

Received: 20 November 2024
Accepted: 08 May 2025
Published: 23 May 2025
Version of record: 23 May 2025
DOI: https://doi.org/10.1038/s41597-025-05168-5