Background & Summary

Metal milling is crucial for producing high-precision parts, but tool wear over time reduces machine accuracy1. An effective tool replacement strategy is essential for maintaining high-quality production and prolonging tool life. Monitoring tool wear informs tool replacement decisions, coating improvements, and new material development. Researchers assess the machining performance of new materials by analyzing wear, surface roughness, and cutting temperature2,3. Many researchers use indirect signals from the machining process, such as cutting force4, cutting temperature5, and spindle current and power6, and map these signals to tool wear to achieve good monitoring results. The emergence of artificial intelligence (AI) has opened up new opportunities for tool wear forecasting, and many works are beginning to use AI for intelligent forecasting7,8.

With the rapid advancement of big data and deep learning, datasets have become indispensable in studying the full lifecycle of tool wear9,10,11. These datasets provide rich samples and diverse working conditions, enabling models to predict wear under various scenarios accurately. They capture different wear stages and subtle changes, offering extensive training data to improve model robustness and accuracy. Furthermore, the accumulation of datasets facilitates in-depth research and comparison of different evaluation methods, driving advancements in tool wear detection technologies12,13.

Many datasets on tool wear have been proposed in recent years. Table 1 compares existing tool wear datasets with the proposed QIT-CEMC dataset. As shown, open-source datasets are limited and predominantly use linear cutting methods. Only the IEEE NUAA IDEAHOUSE14 and SDU-QIT15 datasets feature non-linear cutting. The PHM2010 dataset16 focuses on tool wear and remaining useful life prediction under constant experimental conditions, terminating at a wear value of 150 micrometers. The Milling Data Set17 involves multi-layer coated carbide tools with signals like acoustic emission, vibration, and current, primarily addressing single-edge wear. The IEEE NUAA IDEAHOUSE dataset14 captures vibration, spindle current, and power signals, suitable for single-edge wear analysis in square spiral cutting paths, but lacks force signals. Other datasets also face limitations, such as single cutting paths or limited signal types, and some are not open-source.

Table 1 Comparison of currently available datasets used for tool wear estimation.

This paper introduces QIT-CEMC, a comprehensive dataset on the full lifecycle wear of coated end mills. The data was collected during the milling of Ti6Al4V, a challenging material commonly used in the aerospace and medical industries18,19. Unlike traditional linear milling, this study employs a complex circumferential milling path, generating periodic signals that offer new insights into tool path and wear analysis. Cutting force and torque are measured directly with a rotary dynamometer, and the dataset combines these with vibration and sound signals to provide multidimensional data: three-channel force, torque, vibration, and sound. QIT-CEMC is fully open-source, providing a valuable resource for research into tool wear under varied conditions. The advantages of the QIT-CEMC dataset are as follows:

  • Diverse Signal Types: QIT-CEMC captures a wide range of signals, including forces (in x, y, and z directions), torque, sound, and vibrations (in x, y, and z directions). In addition, offline images of various types of tool wear are provided. This provides comprehensive data for tool wear analysis. In contrast, datasets like PHM201016 and MATW120 focus on fewer signal types, such as cutting forces, vibrations, and acoustic emissions.

  • Advanced Sensors: QIT-CEMC employs several advanced sensors, including a rotary dynamometer, sound sensors, and accelerometers, offering precise measurements. Other datasets, such as IEEE NUAA-IDEAHOUSE14 and SDU-QIT15, typically rely on one or two sensors, often excluding sound or force sensors.

  • Varied Machining Paths: QIT-CEMC uses a circular machining path to simulate complex real-world conditions, facilitating research on tool wear under different paths. Most other datasets, like MATW120 and NUJST21, employ linear paths, missing out on the unique wear information circular paths provide.

  • Comprehensive Wear Analysis: QIT-CEMC covers not only main cutting edge wear but also wear on the end teeth and in four tool regions, offering a more holistic analysis. In contrast, existing datasets such as MATW120 and Ross et al.22 generally emphasize single-edge wear or surface morphology.

Methods

Data acquisition

The QIT-CEMC dataset was generated using a four-flute coated carbide end mill to machine Ti6Al4V titanium alloy along a circular path23,24, as shown in Fig. 1. This alloy is widely used in the aerospace industry for its excellent properties but presents significant machining challenges. The workpiece was a titanium alloy rod with a diameter of 11.3 cm and a height of 400 mm, yielding a base area of 100 cm2. The end mill had a diameter of 10 mm, made from YG3X carbide (composed of 96.5% WC, 0.5% TaC/NbC, and 3% Co), with a TiAlN coating and a 55° helix angle.

Fig. 1 Machining of the Ti6Al4V titanium alloy along a circular path.

The milling cutter used in the experiment has four cutting edges. On the flank face of each of the four peripheral edges, we measured the wear at the maximum wear point, the wear at the 1/2 ap point (half the depth of cut), and the area of the wear zone. On the flank face of each of the four bottom edges, we measured the maximum wear and the area of the wear zone. The tool had a rake angle of 10°, a flank angle of 15°, and a corner radius of 0.8 mm. As shown in Fig. 2, it has a diameter of 10 mm with a tolerance of ±0.020 mm, an overall length of 75 mm, a cutting length of 25 mm, and helix angles of 38° and 41°.

Fig. 2 Detailed geometry of the milling tool (mm).

The process involves three main stages: equipment setup and calibration, signal acquisition, and tool wear measurement. As shown in Fig. 3, the experiment used a cylindrical titanium alloy workpiece. First, the rotary dynamometer was mounted on the spindle of the vertical machining center, with accelerometers and a microphone positioned to capture vibration and audio signals accurately. The tool and workpiece were then securely mounted on the worktable. During signal acquisition, the rotary dynamometer measured cutting forces in the x, y, and z directions as well as torque, the accelerometers captured accelerations in the x, y, and z directions, and the microphone recorded sound. These signals were transmitted wirelessly to a PC, where the software recorded, processed, and displayed them in real time for preliminary analysis. After machining, the tool was examined under a microscope to measure wear on the side and end teeth, with metrics such as maximum wear width (VBmax) and wear area (SVB) documented. The specific parameters of the conditions are shown in Table 2.

Fig. 3 Schematic diagram of the data acquisition for the QIT-CEMC dataset.

Table 2 Tool wear experiment platform processing parameters.

The tool wear measurement procedure is illustrated in Fig. 4. On the right side, the workflow for wear measurement is presented. First, the system, including the microscope, camera, and lighting, is calibrated. The tool is positioned on the platform, aligning the wear area with X- and Y-axis controllers. Lighting and focus adjustments are made for optimal image clarity. A CCD color camera captures high-resolution images of the wear area, which are then analyzed by software to quantify key wear metrics, such as maximum wear and wear area, allowing for accurate tool wear assessment.

Fig. 4 Tool wear measurement procedure.

For wear measurement, we used a 19JPC-V microscope with a resolution of 0.0005 mm (glass grating). The maximum inaccuracy is defined as \(\left(1+\frac{L}{100}\right)\,\mu {\rm{m}}\), where L represents the measurement length in mm, and the measurement range is 200 × 100 mm. To maintain accuracy, the environment was controlled at a temperature of 20 ± 1 °C, with fluctuations limited to 0.5 °C per hour and a temperature difference of no more than 0.5 °C between the measured part and the instrument. We recorded maximum wear values for each of the four circumferential edges, wear values at mid-depth, and the wear zone area. Additionally, we measured maximum wear and wear zone area on the bottom edge and calculated average wear values across the four edges.
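As a small worked example of the inaccuracy formula above, the following sketch evaluates it for two lengths. It assumes the formula is read as (1 + L/100) µm with L in mm; the function name and the 0.3 mm wear width are illustrative and not taken from the dataset.

```python
def max_inaccuracy_um(length_mm: float) -> float:
    """Maximum inaccuracy in micrometres for a measured length given in mm.

    Assumes the microscope specification reads (1 + L/100) um, L in mm.
    """
    if length_mm < 0:
        raise ValueError("length_mm must be non-negative")
    return 1.0 + length_mm / 100.0


# Hypothetical 0.3 mm wear width: the length-dependent term is negligible.
print(f"{max_inaccuracy_um(0.3):.3f} um")
# Full 200 mm travel of the measurement range stated in the text.
print(f"{max_inaccuracy_um(200.0):.3f} um")
```

For wear widths well below 1 mm the bound is dominated by the constant 1 µm term, so the measured length barely affects it.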

The QIT-CEMC dataset was gathered using a Kistler 9170B251 dynamometer and includes 68 milling samples, each generating approximately 5 million records. These records contain data on vibration, sound, cutting force, torque, and detailed tool wear measurements. The milling parameters used are listed in Table 3. Tool wear can be measured online or offline. Online methods are efficient but may suffer from noise and vibration-related inaccuracies. By contrast, the QIT-CEMC dataset employs offline methods, where the machining process is halted to allow precise measurements with high-accuracy instruments. For cutting force measurements, the Kistler 9170B251 dynamometer was operated at a 10 kHz sampling rate to capture dynamic force variations accurately. Vibration data was recorded with an accelerometer near the tool holder, also sampled at 10 kHz, and acoustic emission (AE) signals were captured by an AE sensor mounted on the spindle housing. To ensure signal clarity, a low-pass filter at 1 kHz was applied to the force and vibration data to minimize high-frequency noise, while a high-pass filter at 50 kHz was used on AE data to isolate relevant emission events.
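The 1 kHz low-pass step at a 10 kHz sampling rate can be illustrated with a minimal sketch. The text does not state the filter type or order, so the windowed-sinc FIR design, the 101-tap length, and all function names below are assumptions chosen for illustration only.

```python
import math


def lowpass_fir_taps(cutoff_hz, fs_hz, num_taps=101):
    """Windowed-sinc (Hamming) low-pass FIR taps, normalised to unity DC gain."""
    fc = cutoff_hz / fs_hz                      # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response, with the x = 0 limit handled explicitly
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def apply_fir(signal, taps):
    """Convolve and keep only fully overlapped output samples (no padding)."""
    m = len(taps)
    return [sum(taps[k] * signal[i + m - 1 - k] for k in range(m))
            for i in range(len(signal) - m + 1)]


FS = 10_000      # Hz, sampling rate stated in the text
CUTOFF = 1_000   # Hz, low-pass cutoff stated in the text

taps = lowpass_fir_taps(CUTOFF, FS)
t = [n / FS for n in range(1000)]
slow = [math.sin(2 * math.pi * 50 * x) for x in t]          # synthetic in-band component
fast = [0.5 * math.sin(2 * math.pi * 4000 * x) for x in t]  # synthetic out-of-band noise
filtered = apply_fir([s + f for s, f in zip(slow, fast)], taps)
```

A symmetric FIR filter delays the signal by (num_taps - 1) / 2 samples, here 50 samples or 5 ms, which matters when force and vibration channels are aligned with other signals.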

Table 3 Processing parameters and sample count.

Data process

The raw data contain invalid signals and unavoidable baseline wander25, so signal processing is required. Figure 5 illustrates the distribution of the invalid signals, which occur primarily during initial tool engagement and withdrawal. Because cutting follows a circular path, invalid signals also appear during path transitions. Figure 6 shows the data processing workflow: a two-component Gaussian mixture model (GMM)26 clusters the signals based on energy frequency, and the segments identified as invalid are removed from the raw data.
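The clustering idea can be sketched as follows. This is not the authors' script: the feature (log energy of non-overlapping windows), the window length, and every function name are assumptions, and the signal is synthetic. A two-component one-dimensional GMM is fitted with expectation-maximisation, and windows assigned to the higher-energy component are kept as valid cutting data.

```python
import math
import random


def window_log_energy(signal, win):
    """Log mean-squared amplitude of consecutive non-overlapping windows."""
    return [math.log(sum(v * v for v in signal[s:s + win]) / win + 1e-12)
            for s in range(0, len(signal) - win + 1, win)]


def fit_gmm2(x, iters=50, var_floor=1e-4):
    """EM for a two-component 1-D Gaussian mixture: (weights, means, variances)."""
    s = sorted(x)
    mu = [s[len(s) // 4], s[(3 * len(s)) // 4]]       # start at the quartiles
    var = [max((s[-1] - s[0]) ** 2 / 16, var_floor)] * 2
    w = [0.5, 0.5]
    for _ in range(iters):
        resp = []
        for v in x:                                    # E-step: responsibilities
            p = [w[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(v - mu[k]) ** 2 / (2 * var[k])) for k in range(2)]
            tot = (p[0] + p[1]) or 1e-300
            resp.append((p[0] / tot, p[1] / tot))
        for k in range(2):                             # M-step: update parameters
            nk = sum(r[k] for r in resp) + 1e-12
            w[k] = nk / len(x)
            mu[k] = sum(r[k] * v for r, v in zip(resp, x)) / nk
            var[k] = max(sum(r[k] * (v - mu[k]) ** 2
                             for r, v in zip(resp, x)) / nk, var_floor)
    return w, mu, var


def cutting_mask(x, w, mu, var):
    """True where a window is more likely under the higher-energy component."""
    hi = 0 if mu[0] > mu[1] else 1

    def logp(v, k):
        return (math.log(w[k]) - 0.5 * math.log(2 * math.pi * var[k])
                - (v - mu[k]) ** 2 / (2 * var[k]))

    return [logp(v, hi) > logp(v, 1 - hi) for v in x]


# Synthetic example: idle noise, a cutting segment, then idle noise again.
rng = random.Random(0)
FS, WIN = 10_000, 200


def idle(n):
    return [rng.gauss(0, 0.02) for _ in range(n)]


cut = [math.sin(2 * math.pi * 100 * i / FS) + rng.gauss(0, 0.02) for i in range(6000)]
signal = idle(2000) + cut + idle(2000)

feats = window_log_energy(signal, WIN)
mask = cutting_mask(feats, *fit_gmm2(feats))
valid = [v for i, v in enumerate(signal) if mask[i // WIN]]
```

Working on log energy keeps the idle and cutting clusters roughly Gaussian and well separated, which is what a two-component mixture needs; other features could be substituted without changing the structure.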

Fig. 5 Invalid signals in the raw data.

Fig. 6 Flowchart for removing invalid signals from the raw data.

To correct the baseline wander, this work uses the moving average method27. Figure 7 shows the signal after moving-average correction; the baseline wander is well corrected.
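A minimal sketch of moving-average baseline correction is given below. The window length, the edge handling, and the function names are assumptions, and the signal is synthetic; the authors' script may differ. The baseline is estimated with a centred moving average and subtracted from the raw signal.

```python
import math


def moving_average(signal, window):
    """Centred moving average; the window shrinks near the two edges."""
    half = window // 2
    prefix = [0.0]
    for v in signal:                       # prefix sums give O(1) window sums
        prefix.append(prefix[-1] + v)
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def remove_baseline(signal, window):
    """Subtract the moving-average estimate of the baseline from the signal."""
    baseline = moving_average(signal, window)
    return [v - b for v, b in zip(signal, baseline)]


# Synthetic example: a periodic cutting signal riding on a slow linear drift.
N, PERIOD = 2000, 100
cutting = [math.sin(2 * math.pi * i / PERIOD) for i in range(N)]
drift = [5.0 * i / N for i in range(N)]
raw = [c + d for c, d in zip(cutting, drift)]

corrected = remove_baseline(raw, window=2 * PERIOD + 1)
```

The window should span several whole periods of the cutting signal, as in this example; a window shorter than one period would remove part of the cutting content along with the drift.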

Fig. 7 Removal of baseline wander from the raw data.

Data Records

The QIT-CEMC dataset records four types of signal (vibration, sound, cutting force, and torque) along with wear measurements for the end and side cutting edges. The vibration and sound data are collected using Donghua DH5922D sensors and stored in .csv files. The cutting force and torque data are gathered with the Kistler 9170B251 dynamometer and saved in .txt files. The detailed content and structure of the QIT-CEMC dataset are shown in Fig. 8. The contents of each part of the dataset are listed below:

  • Force and torque data folder: The collected data was renamed and saved in txt format. Each file contains five columns of data: the first column represents time, while the remaining four columns correspond to the X, Y, and Z direction cutting force and the torque.

  • Vibration and sound data folder: Each sample acquisition is recorded in a csv file containing five columns of data. The first column represents time, while the remaining four columns correspond to the X, Y, and Z direction vibration and the sound. The data for the second sample was lost because the acquisition software disconnected.

  • Image data folder: The folder contains 68 subfolders, each documenting 8 images of tool wear from both the end and side teeth.

  • Tool Wear Health indicators: The first 12 columns of the CSV file give the maximum flank wear width (VBmax), the wear width at half the depth of cut (VB in 1/2 ap), and the wear area (S) for each of the tool’s four cutting edges. The subsequent columns contain the maximum wear width (VBmax) and wear area (S) for the end teeth of these cutting edges.
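The file layouts listed above could be read along the following lines. This is a hedged sketch: the column names, the delimiter handling, the per-edge column ordering, and the count of eight end-tooth columns (two metrics for each of four edges) are assumptions, since the description gives only the column contents. The parser accepts comma-, tab-, or space-separated numbers and skips any non-numeric header line.

```python
import io
import re

# Hypothetical column labels for the five-column layouts described above.
FORCE_COLUMNS = ["time", "Fx", "Fy", "Fz", "torque"]
VIBRATION_COLUMNS = ["time", "ax", "ay", "az", "sound"]


def wear_columns(n_edges=4):
    """Hypothetical names: 3 side-edge metrics per edge, then 2 end-tooth metrics per edge."""
    side = [f"side{e}_{m}" for e in range(1, n_edges + 1)
            for m in ("VBmax", "VB_half_ap", "S")]
    end = [f"end{e}_{m}" for e in range(1, n_edges + 1) for m in ("VBmax", "S")]
    return side + end


def read_numeric_table(stream, columns):
    """Parse numeric rows split on commas or whitespace; skip other lines."""
    rows = []
    for line in stream:
        parts = [p for p in re.split(r"[,\s]+", line.strip()) if p]
        if len(parts) != len(columns):
            continue                      # wrong width: blank or malformed line
        try:
            values = [float(p) for p in parts]
        except ValueError:
            continue                      # header or comment line
        rows.append(dict(zip(columns, values)))
    return rows


# In-memory stand-in for a force/torque .txt file (values are made up).
sample = "time Fx Fy Fz torque\n0.0000\t1.5\t-2.0\t30.0\t0.12\n0.0001\t1.6\t-2.1\t30.5\t0.13\n"
rows = read_numeric_table(io.StringIO(sample), FORCE_COLUMNS)
print(len(rows), rows[0]["Fz"])
```

For a real file, the same function can be called with an open file handle in place of the `io.StringIO` object. With roughly 5 million records per sample, an array-based reader would be more practical than a list of dictionaries.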

Fig. 8 QIT-CEMC dataset structure and content.

Technical Validation

To verify the data’s usability, we analyzed the distribution of the collected data and the tool wear trends. Figure 9 displays data distribution histograms and KDE distributions. Each distribution is generally symmetrical, indicating a near-normal shape without significant skewness or extreme outliers. This pattern suggests that the data is consistent and free from major anomalies, affirming the dataset’s reliability and usability for further analysis. The absence of irregularities or unexpected patterns in the distributions confirms that the data acquisition process was conducted properly. This validation supports the dataset’s suitability for subsequent computational modeling and ensures its readiness for high-precision applications.

Fig. 9 Data distribution maps.

Figure 10 illustrates the tool wear trend in the dataset. The wear rate is slow during the initial stage, remains relatively stable in the middle stage, and accelerates sharply in the final stage. This pattern aligns with real-world observations and matches the wear curves of the datasets in refs. 16 and 20, supporting the dataset’s reliability.
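The wear trend described above can be examined numerically by averaging wear across the four edges and differencing consecutive samples. The sketch below uses made-up placeholder values, not values from the dataset, and the function names are illustrative.

```python
def mean_across_edges(per_edge_wear):
    """Average wear over the edges for each sample (one row per sample)."""
    return [sum(row) / len(row) for row in per_edge_wear]


def wear_rate(series):
    """Wear increment between consecutive samples."""
    return [b - a for a, b in zip(series, series[1:])]


# Placeholder VBmax values in mm for four edges over five samples
# (synthetic, for illustration only).
vb_max = [
    [0.02, 0.03, 0.02, 0.03],
    [0.04, 0.04, 0.05, 0.05],
    [0.06, 0.06, 0.07, 0.07],
    [0.08, 0.09, 0.08, 0.09],
    [0.15, 0.16, 0.17, 0.16],
]

average_wear = mean_across_edges(vb_max)
rates = wear_rate(average_wear)
print([f"{r:.3f}" for r in rates])
```

A sharp rise in the last increments relative to the earlier ones is the signature of the final accelerated-wear stage.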

Fig. 10 Tool wear curve.

Usage Notes

This dataset facilitates a comprehensive, data-driven study of the full life cycle of tool wear. The raw dataset is available for download at https://doi.org/10.6084/m9.figshare.27323346 (ref. 28). Additionally, we provide a Python script, data processing.py, which processes the raw data, including the removal of invalid data and baseline wander correction. The tool wear.csv file contains wear measurements for the four tool edges, covering both side and end edges. All data and the script are accessible in the repository.