Identification of gear fault by weighted Mahalanobis distance method based on multi-scale permutation entropy

Zhou, Xintao; Ma, Na; Zhang, Jialing; Liu, Jiamin

doi:10.1038/s41598-025-33051-1

Article
Open access
Published: 19 December 2025

Scientific Reports volume 16, Article number: 3104 (2026)

Abstract

Accurate identification of weak fault signals is critical for gear fault detection, yet particularly challenging. This study proposes a gear fault diagnosis method that uses Mutual Information (MI) and an Improved False Nearest Neighbor (IFNN) algorithm to optimize the delay time (τ) and embedding dimension (m) for Multiscale Permutation Entropy (MPE) calculation. The MPE values of various fault samples are computed using this optimized approach. Classification by the minimum Mahalanobis distance (min-MD_Maha) of each sample achieves a fault identification accuracy of 76.87%. Information entropy is then employed to extract useful information from the different fault samples, and this information serves as the weights for MD_Maha. Experiments on gear pitting and wear faults validate the method: the weighted MD_Maha improves the accuracy to 99.72%. The results demonstrate the effectiveness of the proposed weighted MD_Maha-enhanced MPE framework in characterizing vibration signatures induced by gear faults.

Introduction

As critical components in rotating machinery, gear assemblies transmit power and enable the operation of mechanical systems. Surface degradation initiating at gear tooth interfaces induces detrimental vibration signatures and acoustic emissions, and the resulting progressive deterioration in mechanical performance ultimately compromises operational reliability^1,2. Gears also operate in harsh environments and are prone to failure, resulting in substantial maintenance costs. Hence, it is advantageous to prevent incipient failures during the gearbox assembly process, and the early detection of minor faults is critical to averting cascading failures and catastrophes^3,4.

Gear trains typically operate under harsh conditions, such as fluctuating loads and high levels of background noise, so abnormal vibration induced by gear defects is often masked. Moreover, vibration signals associated with gear wear are inherently weak and susceptible to background noise interference^5,6. Consequently, extracting fault features and identifying defect patterns from vibration signals relies heavily on advanced signal processing techniques. Gearbox fault diagnosis primarily consists of two stages: feature extraction and fault recognition. In recent years, numerous methods for analyzing gearbox vibration characteristics have emerged, with several finding practical application. Feature extraction is a crucial step in gearbox fault diagnosis, as selecting appropriate signal features not only reduces computation time but also improves the efficiency of fault identification. Entropy has recently become a popular tool for extracting fault features in mechanical equipment owing to its computational simplicity, noise tolerance, and robustness^7,8. Because practical problems are complex and variable, various enhanced entropy-based techniques have been developed, including approximate entropy, fuzzy entropy, sample entropy, and multiscale permutation entropy (MPE)^9,10. Approximate entropy, fuzzy entropy, and sample entropy can all be used to assess the complexity and regularity of a time series. These methods do not require coarse-graining of the time series, and their fault detection results are independent of the sub-series length, which improves analysis efficiency. However, the accuracy of these three methods in time series detection is relatively limited and depends strongly on the amount of effective information contained in the samples.

By expanding from a single-scale to a multi-scale phase space, the MPE method preserves both the local and the global information of the vibration characteristics^11,12. This approach not only mitigates the interference of noise with fault characteristics but is also adept at detecting faint fault signals. In addition, the MPE method can resolve the multi-scale coupling among multiple faults and accurately reflect the dynamic mutation behavior of the faulty system^13,14. Despite these advantages, the accuracy of the MPE method depends on several parameters: the delay time of the time series, the embedding dimension of the phase space, and the scale factor. Previously, reasonable values of these parameters were determined through simulation and repeated trial calculations. This way of choosing parameter values is subjective to a certain degree, which reduces the accuracy of the MPE method in time series detection.

The MPE computation is highly dependent on its algorithm parameters: different embedding dimensions and delay times significantly influence the resulting entropy values, and hence the accuracy of time series detection. For the delay time τ: when τ is too small, all points containing feature information concentrate on the main diagonal of the feature matrix, leading to strong correlation among the features; when τ is too large, the points carrying characteristic information exhibit no significant correlation^15,16. Hence, when selecting τ, the influence of this parameter on the signal’s characteristic information must be evaluated. Research has shown that the mutual information method can be used to calculate a near-optimal τ for a signal^17,18. Mutual information (MI) is an important tool in information theory for assessing the interdependence between two sets of events. The first local minimum of the signal’s MI function is generally adopted as the optimal delay; the method has a sound theoretical basis and is independent of the embedding dimension. As for the embedding dimension m, current research indicates that the optimal range is between 3 and 7. When m is too large, phase space reconstruction becomes computationally intensive and the algorithm runtime increases. When m is too small, the reconstructed vectors contain too little information to extract and detect mutation signals^19. In addition, vibrations caused by mechanical system failures exist in a multi-dimensional space. When the dimension of the dynamic system is reduced, two points that are non-adjacent in the multi-dimensional space may appear to be neighbors in the low-dimensional space, a phenomenon known as False Nearest Neighbors (FNN)^20.
Altering the value of m can also lead to misinterpretation of the actual positions of two neighboring points in the phase space. Selecting the optimal m enables these false neighbors to be identified and eliminated. However, the accuracy of FNN detection is affected by various thresholds, which makes the outcome strongly subjective^21. To address this issue, Cao improved the algorithm so that the determination of m depends only on the value of τ, effectively overcoming the influence of these thresholds^22. This method is used to calculate the optimal m within the signal’s phase space, and the improved FNN method is commonly referred to as Cao’s algorithm^23.

Although the MPE method can fully extract weak fault feature information, it struggles to distinguish such information when the differences among multiple characteristic features are subtle. Research has shown that the Mahalanobis distance (MD_Maha) method can address this issue. The MD_Maha algorithm takes into account the correlations among the characteristic indicators and eliminates interference caused by the relationships between feature variables. In addition, MD_Maha is dimensionless and independent of the scale of the feature measurements, which contributes to more accurate fault diagnosis results^24. Nevertheless, when the MD_Maha method is used alone to identify fault characteristic information, it treats all characteristic samples as equally important. This leads to a significant discrepancy between the theoretical MD_Maha values and the actual situation, which impairs the accuracy of fault feature identification^25. To improve the accuracy of the MD_Maha method, the importance of different characteristic samples should be differentiated according to practical conditions. Using the useful information contained in the samples, information entropy can be used to weight samples with different affiliations, yielding weighted MD_Maha values that match the actual scenario. The information entropy method is adopted to compute the objective weight of each sample. When the values of an indicator vary more widely across samples, its information entropy is smaller and the indicator carries greater weight in the decision; conversely, higher entropy indicates lower importance^26,27.

In this paper, a novel gear fault identification approach is developed that applies a weighted MD_Maha to MPE features whose parameters are selected by the MI and improved FNN techniques. The accuracy of the MPE method in identifying the dynamic mutation characteristics of fault signals depends on whether parameters such as τ and m take reasonable values. First, building on the principle and strengths of the MI method, the minimum delay time τ is calculated for the fault vibration signal. The optimal value of m is then computed with an improved FNN method, which reduces the subjectivity of parameter selection and enhances the precision with which MPE characterizes fault features. Second, the min-MD_Maha of the different fault characteristics is calculated to identify the fault type. To improve the accuracy of fault feature recognition by the MD_Maha method, the information entropy approach is adopted to quantify the amount of useful information contained in samples of different attributes, which is taken as the weight of MD_Maha. Finally, gear fault experiments are conducted to validate the reliability of the proposed framework. The results confirm the effectiveness of combining the weighted MD_Maha with MPE optimized by MI and the improved FNN for accurate fault identification.

Main contributions and innovations in this paper:

(1) This paper proposes a new method for gear fault identification based on the weighted Mahalanobis distance combined with multi-scale permutation entropy. The method integrates the advantages of multi-scale permutation entropy for feature extraction and of the Mahalanobis distance for fault classification, providing a more accurate and robust approach for identifying gear faults.
(2) Through extensive parametric studies, we determine the optimal embedding dimension and delay parameters for the multi-scale permutation entropy calculation. This optimization ensures that the extracted features are highly representative of the underlying fault patterns, leading to improved fault identification performance.
(3) The method demonstrates strong robustness and generalizability across various experimental setups, load conditions, and fault types and severities. We have conducted comprehensive validation experiments, including tests on different gear types under different load scenarios (no-load, light-load, medium-load, and heavy-load conditions) and with various fault types (pitting, wear, misalignment, and broken teeth). The results show that the method can reliably identify gear faults under diverse real-world conditions.
(4) This paper provides a thorough validation of the method by comparing it with existing fault identification methods. The comparative analysis is based on key performance indicators such as accuracy, sensitivity, and specificity. The method outperforms several existing methods in gear fault identification.

Analysis of the theoretical underpinnings

Fundamental principle of MPE

This section briefly reviews the basic principle of calculating the permutation entropy of a time series, as a theoretical reference for the later use of the MPE approach to extract fault feature information. The calculation process is shown in Fig. 1.

(1) Construct the phase space matrix from the time series

A time series X of a signal is defined as in Eq. (1):

$$X = \{ x(i)|i = 1,2, \cdots ,K + (m - 1)\tau \}$$

(1)

where x(i) is the i-th sample of the time series, m is the embedding dimension of the phase space, τ is the delay time of the time series, and K is the number of reconstructed components in the phase space.

The time series is then reconstructed in phase space, giving the reconstruction matrix Y shown in Eq. (2):

$$Y = \left[ {\begin{array}{*{20}c} {Y_{1}} \\ {Y_{2}} \\ \vdots \\ {Y_{j}} \\ \vdots \\ {Y_{K}} \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {x(1)} & {x(1 + \tau )} & \cdots & {x(1 + (m - 1)\tau )} \\ {x(2)} & {x(2 + \tau )} & \cdots & {x(2 + (m - 1)\tau )} \\ \vdots & \vdots & \vdots & \vdots \\ {x(j)} & {x(j + \tau )} & \cdots & {x(j + (m - 1)\tau )} \\ \vdots & \vdots & \vdots & \vdots \\ {x(K)} & {x(K + \tau )} & \cdots & {x(K + (m - 1)\tau )} \\ \end{array} } \right]$$

(2)

where Y_j is the j-th reconstructed component (the j-th row of Y), and m, τ, and K are as defined for Eq. (1).

(2) Arrange the elements of each reconstructed component in ascending order

For illustration, the elements of the j-th component of the reconstruction matrix Y are arranged in ascending order, and the position indices of the elements within that component are recorded as j_(1), j_(2), …, j_(m). The resulting ordering of the j-th component is shown in Eq. (3):

$$x(j + (j_{{(1)}} - 1)\tau ) \le x(j + (j_{{(2)}} - 1)\tau ) \le \cdots \le x(j + (j_{{(m)}} - 1)\tau )$$

(3)

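(3) Compute and normalize the permutation entropy of the reconstructed components

Each reconstructed component is thereby mapped to one ordinal pattern (j_(1), j_(2), …, j_(m)). With P_g denoting the relative frequency of the g-th pattern, the normalized permutation entropy is given by Eq. (4):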

$$H_{P} = - (\sum\limits_{{g = 1}}^{K} {P_{g} \ln P_{g} } )/\ln (m!)$$

(4)

(4) Coarse-grain the time series and calculate the permutation entropy of the coarse-grained sequences

The permutation entropy of each coarse-grained sequence is calculated, which defines the multiscale permutation entropy as shown in Eq. (5):

$$MPE(X,s,m,\tau ) = PE(\frac{1}{s}\sum\limits_{{i = (j - 1)s + 1}}^{{js}} {x_{i} } ,m,\tau )$$

(5)

where X is the time series, s is the scale factor, m is the embedding dimension, τ is the delay time, and j indexes the elements of the coarse-grained sequence.
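The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, ties between equal values are broken by position (the text does not say how ties are handled), and the entropy sum runs over the ordinal patterns that actually occur.

```python
import math
from collections import Counter


def coarse_grain(x, s):
    """Average consecutive, non-overlapping windows of length s (Eq. 5)."""
    n = len(x) // s
    return [sum(x[j * s:(j + 1) * s]) / s for j in range(n)]


def permutation_entropy(x, m, tau):
    """Normalized permutation entropy of the series x (Eqs. 2-4)."""
    k = len(x) - (m - 1) * tau  # number of reconstructed components K
    if k <= 0:
        raise ValueError("series too short for the chosen m and tau")
    patterns = Counter()
    for j in range(k):
        window = [x[j + i * tau] for i in range(m)]
        # Ordinal pattern: the positions that sort the window in ascending
        # order. sorted() is stable, so ties are broken by position
        # (an assumption of this sketch).
        patterns[tuple(sorted(range(m), key=lambda i: window[i]))] += 1
    h = -sum((c / k) * math.log(c / k) for c in patterns.values())
    return h / math.log(math.factorial(m))


def multiscale_permutation_entropy(x, m, tau, max_scale):
    """Permutation entropy of the coarse-grained series for s = 1..max_scale."""
    return [permutation_entropy(coarse_grain(x, s), m, tau)
            for s in range(1, max_scale + 1)]
```

A strictly increasing series produces a single ordinal pattern and therefore an entropy of zero, which is a quick sanity check before applying the functions to vibration data.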

The approach to determining crucial threshold parameters

Fundamental principle of the MI approach

The relationship between the MI of a time series and the delay time is given in Eq. (6):

$$I(\tau ) = I(x_{i} ,x_{{i + \tau }} ) = \sum\limits_{{i = 1}}^{N} {P(x_{i} ,x_{{i + \tau }} )\log _{2} \left[ {\frac{{P(x_{i} ,x_{{i + \tau }} )}}{{P(x_{i} )P(x_{{i + \tau }} )}}} \right]}$$

(6)

where $P(x_{i})$ is the probability distribution of the time series $x_{i}$; $P(x_{i + \tau})$ is the probability distribution of the delayed series $x_{i + \tau}$; and $P(x_{i}, x_{i + \tau})$ is their joint probability distribution. Equivalently, $I(\tau) = H(x_{i}) + H(x_{i + \tau}) - H(x_{i}, x_{i + \tau})$, where $H(x_{i})$ and $H(x_{i + \tau})$ are the information entropies of $x_{i}$ and $x_{i + \tau}$, and $H(x_{i}, x_{i + \tau})$ is their joint information entropy.
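A rough sketch of how Eq. (6) can be estimated and used to pick τ is given below. The text does not specify the probability estimator, so the equal-width histogram, the `bins` parameter, and the function names are assumptions made for illustration only.

```python
import math
from collections import Counter


def mutual_information(x, tau, bins=16):
    """Histogram estimate of I(tau) in bits between x(i) and x(i + tau), tau >= 1."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant series

    def bin_of(v):
        return min(int((v - lo) / width), bins - 1)

    pairs = [(bin_of(a), bin_of(b)) for a, b in zip(x[:-tau], x[tau:])]
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(a for a, _ in pairs)
    py = Counter(b for _, b in pairs)
    # P(a, b) / (P(a) P(b)) simplifies to c * n / (px[a] * py[b]).
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def first_local_minimum(values):
    """1-based position of the first interior local minimum, or None."""
    for k in range(1, len(values) - 1):
        if values[k] < values[k - 1] and values[k] <= values[k + 1]:
            return k + 1
    return None


def select_delay(x, max_tau=20, bins=16):
    """Delay at the first local minimum of I(tau) for tau = 1..max_tau."""
    curve = [mutual_information(x, t, bins) for t in range(1, max_tau + 1)]
    return first_local_minimum(curve)
```

The histogram estimate is sensitive to the number of bins and to the series length, so the selected delay is best checked against a plot of I(τ) rather than taken on trust.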

Theoretical basis of the improved FNN method

The improved FNN method computes the relationship between E(m) and m, as given in Eqs. (7) and (8):

$$E(m) = \frac{1}{{N - m\tau }}\sum\limits_{{i = 1}}^{{N - m\tau }} {\frac{{\left\| {X_{{m + 1}} (i) - X_{{m + 1}}^{f} (i)} \right\|}}{{\left\| {X_{m} (i) - X_{m}^{f} (i)} \right\|}}}$$

(7)

$$E_{1} (m) = \frac{{E(m + 1)}}{{E(m)}}$$

(8)

where $X_{m}(i)$ is the i-th reconstructed vector in the m-dimensional phase space and $X_{m}^{f} (i)$ is its nearest neighbor; $X_{m + 1}(i)$ and $X_{{m + 1}}^{f} (i)$ are the corresponding vectors in the (m + 1)-dimensional space; and $\left\| \bullet \right\|$ is the $\infty$-norm.

In practical applications, however, it is difficult to judge whether E₁(m) has stabilized as m increases. Therefore, a supplementary criterion is added, as shown in Eqs. (9) and (10):

$${E^*}(m)=\frac{1}{{N - m\tau }}\sum\limits_{{i=1}}^{{N - m\tau }} {\left\| {{X_{m+1}}(i) - X_{{m+1}}^{f}(i)} \right\|}$$

(9)

$$E_{2} (m) = \frac{{E^{*} (m + 1)}}{{E^{*} (m)}}$$

(10)

Studies show that for random time series, whose data are mutually independent, E₂(m) stays equal to 1 for all m. For deterministic time series, E₂(m) depends on the embedding dimension m, so it is not equal to 1 for every m.
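Eqs. (7) to (10) can be illustrated with the brute-force sketch below. It follows the equations as written here, using the ∞-norm and a nearest-neighbor search in the m-dimensional space. The function names are hypothetical, neighbors at zero distance are skipped to avoid division by zero (an assumption), and the O(N²) search is only practical for short series.

```python
def embed(x, m, tau, count):
    """First `count` delay vectors of dimension m."""
    return [[x[i + k * tau] for k in range(m)] for i in range(count)]


def chebyshev(a, b):
    """Infinity-norm distance between two vectors."""
    return max(abs(p - q) for p, q in zip(a, b))


def cao_statistics(x, m, tau):
    """Return (E(m), E*(m)) following Eqs. (7) and (9)."""
    count = len(x) - m * tau  # vectors available in both m and m + 1 dimensions
    low = embed(x, m, tau, count)
    high = embed(x, m + 1, tau, count)
    ratio_sum, dist_sum, used = 0.0, 0.0, 0
    for i in range(count):
        best_j, best_d = None, float("inf")
        for j in range(count):
            if j == i:
                continue
            d = chebyshev(low[i], low[j])
            if 0.0 < d < best_d:  # skip exact duplicates
                best_j, best_d = j, d
        if best_j is None:
            continue
        d_high = chebyshev(high[i], high[best_j])
        ratio_sum += d_high / best_d
        dist_sum += d_high
        used += 1
    if used == 0:
        raise ValueError("no usable nearest neighbors found")
    return ratio_sum / used, dist_sum / used


def cao_e1_e2(x, tau, max_m):
    """E1(m) and E2(m) for m = 1..max_m (Eqs. 8 and 10)."""
    stats = [cao_statistics(x, m, tau) for m in range(1, max_m + 2)]
    e1 = [stats[k + 1][0] / stats[k][0] for k in range(max_m)]
    e2 = [stats[k + 1][1] / stats[k][1] for k in range(max_m)]
    return e1, e2
```

In use, the embedding dimension is read off where E₁(m) stops changing, with E₂(m) inspected alongside it to check that the series is not purely random.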

Principles of the Mahalanobis distance (MD_Maha) method

The MD_Maha

The Mahalanobis distance is calculated as follows. Given a test sample X = [x₁, x₂, …, x_n]^T and a fault feature sample Y = [y₁, y₂, …, y_n]^T, the Mahalanobis distance MD_Maha between the test sample X and the feature sample Y is given by Eq. (11):

$$M{D_{Maha}}=\sqrt {{{(X - {\mu _Y})}^T}C_{Y}^{{-1}}(X - {\mu _Y})}$$

(11)

where µ_Y is the mean vector, $\mu _{Y} = \frac{1}{{n_{Y} }}\varsigma \sum\limits_{{i = 1}}^{{n_{Y} }} {y_{i} }$; C_Y is the covariance matrix, $C_{Y} = \frac{1}{{n_{Y} - 1}}\sum\limits_{{i = 1}}^{{n_{Y} }} {(y_{i} - \mu _{Y} )} (y_{i} - \mu _{Y} )^{T}$; and ς = [1, 1, …, 1]^T.
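As an illustration of Eq. (11), the sketch below computes the distance from a test vector to a set of reference feature vectors using only the Python standard library. The function names are hypothetical, the covariance uses the n_Y - 1 denominator, and the linear system is solved by Gaussian elimination instead of forming the inverse explicitly; it assumes the covariance matrix is non-singular.

```python
import math


def mean_vector(samples):
    """Component-wise mean of a list of feature vectors."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]


def covariance_matrix(samples):
    """Sample covariance matrix with the (n - 1) denominator."""
    n = len(samples)
    mu = mean_vector(samples)
    d = len(mu)
    return [[sum((s[a] - mu[a]) * (s[b] - mu[b]) for s in samples) / (n - 1)
             for b in range(d)] for a in range(d)]


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z


def mahalanobis(x, samples):
    """Distance from x to the distribution of `samples` (Eq. 11)."""
    mu = mean_vector(samples)
    diff = [xi - mi for xi, mi in zip(x, mu)]
    z = solve(covariance_matrix(samples), diff)  # z = C^{-1} (x - mu)
    return math.sqrt(sum(d * zi for d, zi in zip(diff, z)))
```

With few reference samples and many scales the covariance matrix can be singular or poorly conditioned, in which case the sketch raises an error instead of returning a misleading distance.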

Weighted MD_Maha by information entropy method

The formula for calculating the weighted MD_Maha of samples (X, Y) is as follows:

$$MD_{{Maha}}^{W} = \left[ {(X - \mu _{Y} )^{T} \Im C_{Y}^{{ - 1}} \Im (X - \mu _{Y} )} \right]^{{\frac{1}{2}}}$$

(12)

where µ_Y is the sample mean, C_Y is the sample covariance matrix, and $\Im$ is the weight of the sample.
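As a worked reading of Eq. (12), assume (the text does not state this explicitly) that $\Im$ is the diagonal matrix of the entropy weights, $\Im = \mathrm{diag}(w_{1}, w_{2}, \ldots, w_{n})$, and write $d = X - \mu_{Y}$. Because a diagonal matrix is symmetric, the weighted distance can then be expanded as:

$$MD_{{Maha}}^{W} = \sqrt{(\Im d)^{T} C_{Y}^{-1} (\Im d)} = \sqrt{\sum\limits_{j = 1}^{n} \sum\limits_{k = 1}^{n} w_{j} w_{k} d_{j} d_{k} \left[ C_{Y}^{-1} \right]_{jk}}$$

Under this assumption the weighted distance is the ordinary distance of Eq. (11) applied to the element-wise rescaled deviation $w_{j} d_{j}$, and it reduces to Eq. (11) when every $w_{j} = 1$.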

Then, the calculation steps of the information entropy weight algorithm are shown as follows:

(1) Construct the original matrix A with n samples and m decision indicators, A = [A₁, A₂, …, A_i, …, A_m]^T, where a_ij is the attribute value of the j-th indicator of the i-th sample, as shown in Eq. (13):

$$A = \left[ {\begin{array}{*{20}{c}} {A_{1}} \\ {A_{2}} \\ \vdots \\ {A_{i}} \\ \vdots \\ {A_{m}} \end{array}} \right] = \left[ {\begin{array}{*{20}{c}} {a_{11}} & {a_{12}} & \cdots & {a_{1n}} \\ {a_{21}} & {a_{22}} & \cdots & {a_{2n}} \\ \vdots & \vdots & \ddots & \vdots \\ {a_{i1}} & {a_{i2}} & \cdots & {a_{in}} \\ \vdots & \vdots & \ddots & \vdots \\ {a_{m1}} & {a_{m2}} & \cdots & {a_{mn}} \end{array}} \right]$$

(13)

(2) Determine the positive indicators A⁺ and negative indicators A⁻ of matrix A and normalize them as follows:

Positive indicator set:

$$A^{ + } = \frac{{a_{{ij}} - \min \left[ {a_{{1j}} ,a_{{2j}} , \cdots ,a_{{nj}} } \right]}}{{\max \left[ {a_{{1j}} ,a_{{2j}} , \cdots ,a_{{nj}} } \right] - \min \left[ {a_{{1j}} ,a_{{2j}} , \cdots ,a_{{nj}} } \right]}}$$

(14)

Negative indicator set:

$$A^{ - } = \frac{{\max \left[ {a_{{1j}} ,a_{{2j}} , \cdots ,a_{{nj}} } \right] - a_{{ij}} }}{{\max \left[ {a_{{1j}} ,a_{{2j}} , \cdots ,a_{{nj}} } \right] - \min \left[ {a_{{1j}} ,a_{{2j}} , \cdots ,a_{{nj}} } \right]}}$$

(15)

(3) Calculate the proportion p_ij of the i-th sample under the j-th indicator, as shown in Eq. (16):

$$p_{{ij}} = \frac{{a_{{ij}} }}{{\sum\limits_{{i = 1}}^{n} {a_{{ij}} } }}$$

(16)

(4) Calculate the information entropy of the j-th indicator, as shown in Eq. (17):

$$e_{j} = - \frac{1}{{\ln n}}\sum\limits_{{i = 1}}^{n} {(p_{{ij}} )} \ln (p_{{ij}} )$$

(17)

(5) Calculate the objective weight of the sample, as shown in Eq. (18):

$$\Im = \frac{{1 - e_{j} }}{{\sum\limits_{{j = 1}}^{m} {(1 - e_{j} )} }}$$

(18)
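The five steps above can be sketched as follows. This is an illustrative reading, not the authors' code: the function name and the `positive` flag are hypothetical, rows are taken as samples and columns as indicators, the proportions of Eq. (16) are computed from the normalized values, the term 0·ln 0 is treated as 0, and the entropy is normalized by ln n with n the number of samples (an assumption about the intended normalization).

```python
import math


def entropy_weights(matrix, positive=True):
    """Entropy weights for matrix[i][j] = value of indicator j for sample i."""
    n = len(matrix)
    if n < 2:
        raise ValueError("at least two samples are required")
    m = len(matrix[0])
    divergence = []
    for j in range(m):
        col = [row[j] for row in matrix]
        lo, hi = min(col), max(col)
        span = hi - lo
        if span == 0:
            norm = [1.0] * n  # constant indicator: uniform proportions
        elif positive:
            norm = [(v - lo) / span for v in col]  # Eq. (14)
        else:
            norm = [(hi - v) / span for v in col]  # Eq. (15)
        total = sum(norm)
        p = [v / total for v in norm]  # Eq. (16)
        e = -sum(q * math.log(q) for q in p if q > 0) / math.log(n)  # Eq. (17)
        divergence.append(1.0 - e)
    total_div = sum(divergence)
    if total_div <= 0:
        raise ValueError("all indicators are constant; weights are undefined")
    return [d / total_div for d in divergence]  # Eq. (18)
```

An indicator that takes the same value for every sample has entropy 1 and therefore receives a weight of essentially zero, which matches the role of the factor 1 - e_j in Eq. (18).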

The proposed fault identification method based on weighted Mahalanobis distance and multiscale permutation entropy differs from traditional methods in terms of computational complexity. The computational complexity of multiscale permutation entropy primarily depends on the embedding dimension. In the field of mechanical engineering, the embedding dimension is usually small, thus making the computational complexity of permutation entropy relatively manageable. The computational complexity of the weighted Mahalanobis distance mainly relies on the dimension of the feature vector. In this method, the dimension of the feature vector is also small, so the computational complexity of the weighted Mahalanobis distance is within an acceptable range. Compared with traditional time-domain and frequency-domain feature extraction methods, although this method has slightly higher computational complexity, it significantly improves the accuracy and robustness of fault identification.

The calculation method for the fault identification accuracy rate

To evaluate the performance of fault identification, we calculated the fault identification accuracy. Accuracy is obtained by dividing the number of correctly identified fault samples by the total number of fault samples and multiplying by 100%. The specific calculation formula is:

$$Ac = \frac{C}{T} \times 100\%$$

(19)

where Ac is the fault identification accuracy rate, C is the number of correctly identified fault samples, and T is the total number of fault samples.

The accuracy of fault identification is an indicator that can intuitively reflect the performance of the method in fault identification tasks.
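Eq. (19) amounts to a simple count. The helper below is a hypothetical illustration, and the labels in the usage example are invented and are not the experimental data.

```python
def identification_accuracy(predicted, actual):
    """Accuracy in percent: correctly identified samples over all samples (Eq. 19)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Illustrative labels only: three of four samples are identified correctly.
predicted = ["pitting", "wear", "wear", "pitting"]
actual = ["pitting", "wear", "pitting", "pitting"]
print(identification_accuracy(predicted, actual))  # 75.0
```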

Experimental verification and result analysis

Figure 2 shows the experimental bench used to simulate gear surface pitting and gear surface wear failures (1. variable frequency motor, 2. gearbox, 3. magnetic powder loading device, 4. signal controller, 5. acquisition card, 6. acquisition computer, 7. industrial personal computer, 8. operation panel). Four test points, I, II, III, and IV, are set on the outside of the gearbox.

Experimental parameters: operating speed 1500 r/min; magnetic powder loader current 0.1 A; gear module 2 mm; tooth numbers z₁ = 19 and z₂ = 95; tooth width 20 mm; gear material 18CrNiMo7 steel. In this experiment, time-domain vibration acceleration data were extracted for the gearbox pitting fault and the gear tooth wear fault, respectively. A total of 30 groups of experimental data were extracted, of which the first 20 groups are used as the training sample set and the last 10 groups as the test sample set.

Calculate τ of the signal

The results were obtained through simulation analysis and are presented in Fig. 3. At τ = 2, the MI value I(τ) of the experimental data is smallest at all four measuring points of the gearbox. This indicates that when τ is larger than 2, noise dominates the time series and the correlation within the series is small. Additionally, the mutual information I(τ) at positions III and IV is higher than at the other positions.

This shows that the time series at measuring points I and II have little mutual correlation and are significantly disturbed by background noise. In conclusion, the analysis demonstrates that τ = 2 for the time series at all four measuring points on the gearbox.

Calculate m of the signal

Following the calculation principle of the improved FNN method, the results illustrated in Fig. 4 were obtained via simulation analysis. In this figure, the E₁(m) curves corresponding to the time series of the four measuring points are basically consistent over the domain 1 ≤ m ≤ 20, and all of them reach a plateau after m = 15. At this point, E₁(m) alone cannot be used to determine m for the time series. Therefore, the supplementary criterion E₂(m) is used to determine m. The E₂(m) values corresponding to the time series at the four measuring points fluctuate slightly within the domain 1 ≤ m ≤ 5, and E₂(m) reaches 1 for the first time at m = 5. In summary, m = 5 is the most appropriate embedding dimension for the time series at the four measuring points on the gearbox.

Multi-scale permutation entropy method to extract fault features of sample set

First, using the MI method and the improved FNN method, the minimum τ = 2 and the optimal phase space dimension m = 7 are calculated for the 20 samples. Then, the MPE method is used to extract the dynamic mutation characteristics of the 20 training samples of each fault; the results are shown in Fig. 5. The figure shows that the variation pattern of the MPE value over the entire measurement scale is approximately the same for every sample. Additionally, the characteristic information of pitting faults on the tooth surface is mainly distributed over the measurement scales S = 1 to 5. The fault information content is highly consistent across samples, indicating that the selected samples are of good quality.

To further weaken the influence of interference on the precision of fault feature extraction, the average values of the sample sets for pitting failure and wear failure are calculated separately. The average value of each sample set is used as the standard value for fault feature identification.

Figure 6 shows the distribution of the standard MPE values of the pitting and tooth wear faults. At a single scale (scale factor S = 1), the difference between the permutation entropy of the tooth surface pitting fault and that under normal operating conditions is small, so it is difficult to determine the type of tooth failure intuitively. For the gear tooth wear fault, the permutation entropy differs from the normal value over the scale factors S = 1 to 6, with a maximum MPE difference of 0.0493. When the gear teeth are worn, the actual meshing clearance between the teeth is larger than the theoretical design value, so the tooth side clearance impact vibration during gear operation is greater. However, when the scale factor S ≥ 6, the permutation entropy of the pitting and tooth wear faults is the same as that of the gear system under normal conditions. This shows that the fault vibration signal already contains the complete fault characteristic information within S = 1 to 5.

Furthermore, Fig. 7 shows the relative error of the characteristic vibration signal MPE for the tooth surface pitting failure and the gear tooth wear failure relative to normal operating conditions. The dynamic characteristics of the pitting failure fluctuate only slightly over the whole measurement scale, and the peak relative error does not exceed 2%. By contrast, the relative error between the gear tooth wear failure and the normal gear system fluctuates strongly within the scale factors S = 1 to 5, reaching a maximum of 6.79% at S = 2. This is because, after a certain amount of tooth surface wear, the tooth side clearance increases, causing the fault information to overlap with the tooth side clearance impact. Additionally, when the scale factor is 2, the fault information in the phase space matrix is compressed toward the main diagonal, increasing the difficulty of fault identification and leading to larger errors. When S ≥ 6, the relative error of the gear tooth wear failure changes in the same way as that of the tooth surface pitting failure.
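The averaging of each sample set into a standard value and the comparison with the normal condition can be sketched as below. The relative-error formula is an assumption, since the text does not state it explicitly, and the function names and numbers are hypothetical.

```python
def class_standard(mpe_samples):
    """Scale-wise mean of the MPE curves of one fault class."""
    n = len(mpe_samples)
    return [sum(col) / n for col in zip(*mpe_samples)]


def relative_error_percent(fault_standard, normal_standard):
    """Assumed definition: |fault - normal| / |normal| * 100 at each scale."""
    return [abs(f - h) / abs(h) * 100.0
            for f, h in zip(fault_standard, normal_standard)]


# Invented two-scale MPE curves for illustration only.
standard = class_standard([[0.90, 0.80], [0.94, 0.84]])
errors = relative_error_percent(standard, [1.00, 0.80])
```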

Identification of fault characteristics by the min - MD_Maha method

In this section, the min-MD_Maha from each test sample to the standard failure value is calculated from the MPE values of samples No. 18 to No. 30 in the test sample set. The results are shown in Fig. 8. “pitting corrosion standard - pitting corrosion detection” is the min-MD_Maha between the pitting failure evaluation standard and the pitting failure test samples; “pitting corrosion standard - wear detection” is the min-MD_Maha between the pitting failure evaluation standard and the wear failure test samples. According to the principle that the MD_Maha between faults of the same kind is the smallest, only samples No. 1, 6, and 12 fail to be identified, as marked by the red “□”. For these three samples the difference between the two min-MD_Maha values is very slight, ranging only from 1.6178 to 3.5559, so their failure characteristics were not successfully identified. As a result, the min-MD_Maha method identifies only 69.23% of the pitting fault test samples correctly.

This shows that the Mahalanobis distance method failed to identify the tooth surface pitting faults of samples No. 1, 6, and 12, so the method needs to be improved.

Correspondingly, Fig. 9 shows the min-MD_Maha results for wear failures. In the figure, “wear standard - wear test” is the min-MD_Maha between the wear failure evaluation standard and the wear failure test samples; “wear standard-pitting detection” is the min-MD_Maha between the wear failure evaluation standard and the pitting failure test samples. According to the principle that the MD_Maha between faults of the same kind is the smallest, the identification of samples No. 3 and No. 12 fails, as marked by the red “□”. The differences between the min-MD_Maha values of samples No. 3 and No. 12 are 6.0233 and 7.5069, respectively. The fault characteristics of these two samples should have been identifiable, but identification ultimately failed. The resulting accuracy of wear fault identification by the min-MD_Maha method is 84.72%.
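The decision rule used in this section, assigning a test sample to the fault class whose standard lies at the smallest distance, can be sketched as follows. The function name is hypothetical and the distances in the usage example are invented; the returned margin corresponds to the difference between the two smallest distances discussed above.

```python
def classify_by_min_distance(distances):
    """Return (label, margin) for a dict of label -> distance to that class standard."""
    ranked = sorted(distances.items(), key=lambda item: item[1])
    label, best = ranked[0]
    # Margin to the runner-up; a small margin signals an uncertain decision.
    margin = ranked[1][1] - best if len(ranked) > 1 else float("inf")
    return label, margin


# Illustrative distances only.
label, margin = classify_by_min_distance({"pitting": 4.2, "wear": 9.7})
```

A small margin flags samples such as those marked in Figs. 8 and 9, where the two distances are too close for the unweighted rule to separate the classes reliably.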

Identification of fault characteristics by weighted MD_Maha method

(1) Determine the weight parameter by the information entropy method

Figure 10 shows the weight distribution of the two fault-type sample sets at each measurement scale. The black line is the weight of the pitting fault sample set, and the blue line is the weight of the gear wear fault sample set. The weight distributions of the two sample sets are roughly the same over the whole measurement scale, except at S = 6 and S = 9.

(2) Identification of fault characteristics by weighted MD_Maha method

Figure 11a shows the identification of tooth surface pitting fault characteristics by the weighted MD_Maha method. “Pitting standard - pitting corrosion detection” denotes the weighted MD_Maha between the pitting fault feature evaluation standard and the pitting fault test sample set, shown as the green “○” mark; “pitting standard - wear detection” denotes the weighted MD_Maha between the pitting fault feature evaluation standard and the wear fault test sample set, shown as the orange “□” mark. Correspondingly, Fig. 11b shows the weighted MD_Maha results for wear fault characteristics. “Wear standard - wear corrosion detection” denotes the weighted MD_Maha between the wear fault feature evaluation standard and the wear fault test sample set, shown as the blue “∆” mark; “wear standard-pitting corrosion detection” denotes the weighted MD_Maha between the wear fault feature evaluation standard and the pitting fault test sample set, shown as the blank “□” mark. The analysis above shows that, with the weighted MD_Maha computed for each fault vibration sample, the fault characteristic identification accuracy reaches about 99.72%.
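To make the effect of the weights concrete, the sketch below applies per-scale weights to the Mahalanobis distance. The exact weighted form is not restated in this section, so the code assumes one plausible variant in which each deviation component is scaled by the square root of its weight before the quadratic form is evaluated. The function names and all numbers are hypothetical.

```python
import numpy as np

def weighted_mahalanobis(x, mean, cov, weights):
    """Weighted Mahalanobis distance (assumed form, see the note above).

    Each deviation component is scaled by sqrt(weight) before the usual
    quadratic form, so a zero weight removes that scale entirely.
    """
    dev = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    scaled = np.sqrt(np.asarray(weights, dtype=float)) * dev
    # Pseudo-inverse keeps the sketch usable for near-singular covariances.
    return float(np.sqrt(scaled @ np.linalg.pinv(cov) @ scaled))

def classify_weighted(x, standards, weights):
    """standards: {label: (mean, cov)}; returns the closest label and all distances."""
    dists = {k: weighted_mahalanobis(x, m, c, weights)
             for k, (m, c) in standards.items()}
    return min(dists, key=dists.get), dists

# Toy illustration with made-up numbers: the second scale is misleading
# for this sample, whose true class is taken to be "pitting".
standards = {
    "pitting": (np.array([0.80, 0.60]), np.eye(2) * 0.01),
    "wear": (np.array([0.70, 0.90]), np.eye(2) * 0.01),
}
sample = np.array([0.79, 0.80])
unweighted = classify_weighted(sample, standards, [1.0, 1.0])[0]
weighted = classify_weighted(sample, standards, [0.9, 0.1])[0]
```

In this constructed case the unweighted rule picks "wear", while down-weighting the misleading scale yields "pitting". This is the kind of correction the weighting is intended to provide, although the toy numbers say nothing about the accuracy figures reported here.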

Discussion

The proposed gear fault diagnosis method integrating MI, IFNN algorithm, and weighted Mahalanobis distance based on MPE has demonstrated significant efficacy in identifying gear faults, as evidenced by the substantial improvement in fault identification accuracy from 76.87% to 99.72% with the application of weights derived from information entropy.

Comparison with existing methods

Compared with the deep-learning-based gear fault diagnosis scheme proposed by Wang et al. under controlled laboratory conditions²⁸, the present strategy has a “lightweight” advantage that makes it better suited to real industrial sites. Deep learning models, while powerful, often require extensive training data and computational resources. In contrast, our method, which relies on MI, IFNN, and MPE, is more computationally efficient and can achieve high accuracy with relatively few data samples. This is particularly advantageous in practical industrial settings where large-scale labeled data may not always be readily available.

Gao et al.²⁹ employed wavelet transforms to extract gear-fault features; while effective at capturing transient impacts, the approach offers insufficient characterization of the strongly nonlinear, multiscale-coupled information commonly present in fault-induced vibrations. Our method, by utilizing MPE, is specifically designed to quantify the complexity of time series signals across multiple scales, thereby providing a more comprehensive representation of the underlying fault patterns. This enhanced representation capability is crucial for distinguishing between different types of gear faults, as demonstrated by the high accuracy achieved in our experiments.
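For readers unfamiliar with MPE, the following minimal sketch shows the standard construction: the signal is coarse-grained at each scale factor S, and the normalized permutation entropy of the coarse-grained series is computed with embedding dimension m and delay τ. The default values of `m`, `tau`, and `scales` are placeholders, not the optimized values of this study.

```python
import math
from collections import Counter
import numpy as np

def coarse_grain(x, scale):
    """Average consecutive, non-overlapping windows of length `scale`."""
    x = np.asarray(x, dtype=float)
    n = len(x) // scale
    return x[:n * scale].reshape(n, scale).mean(axis=1)

def permutation_entropy(x, m=3, tau=1):
    """Normalized permutation entropy in [0, 1]."""
    x = np.asarray(x, dtype=float)
    n_vec = len(x) - (m - 1) * tau
    if n_vec <= 0:
        raise ValueError("series too short for the chosen m and tau")
    # Ordinal pattern of each delay vector (x[i], x[i+tau], ..., x[i+(m-1)tau]).
    patterns = Counter(
        tuple(np.argsort(x[i:i + (m - 1) * tau + 1:tau], kind="stable"))
        for i in range(n_vec)
    )
    p = np.array(list(patterns.values()), dtype=float) / n_vec
    h = -(p * np.log(p)).sum() / math.log(math.factorial(m))
    return float(h) + 0.0  # "+ 0.0" turns a possible -0.0 into 0.0

def multiscale_permutation_entropy(x, m=3, tau=1, scales=range(1, 11)):
    """Permutation entropy of the coarse-grained series at each scale."""
    return [permutation_entropy(coarse_grain(x, s), m, tau) for s in scales]
```

A strictly monotonic series yields an entropy of 0, since only one ordinal pattern occurs, while white noise yields values close to 1. A feature vector for one vibration sample is then the list returned by `multiscale_permutation_entropy`.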

Although Wen et al.³⁰ present a promising gear fault diagnosis framework that couples vibration analysis with machine learning, it still lacks a systematic recipe for selecting optimal feature-extraction parameters and for taming the curse of dimensionality inherent in high-dimensional vibration signals. Our pipeline addresses both bottlenecks in one shot: mutual information (MI) and improved false nearest neighbors (IFNN) are jointly employed to pinpoint the globally optimal delay τ and embedding dimension m, guaranteeing that multiscale permutation entropy always captures fault evolution at its most sensitive scale. A subsequent weighted Mahalanobis distance performs a second-round feature compression and weight redistribution, stripping off redundant dimensions while amplifying the separability of incipient faults. Experimental results demonstrate that this strategy markedly boosts diagnostic robustness and accuracy, offering a self-tuning parameter framework that can be directly replicated for similar rotating-machinery fault-identification tasks characterized by high dimensionality and small sample sizes.
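The delay-selection step can be sketched as follows. This is a simplified illustration that assumes a histogram estimate of the mutual information I(τ) and the common "first local minimum" criterion; the bin count, the maximum delay, and the function names are assumptions, and the IFNN step for choosing m is not shown.

```python
import numpy as np

def mutual_information(x, tau, bins=16):
    """Histogram estimate (in nats) of the MI between x(t) and x(t + tau), tau >= 1."""
    x = np.asarray(x, dtype=float)
    a, b = x[:-tau], x[tau:]
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x(t), shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of x(t + tau), shape (1, bins)
    mask = pxy > 0
    return float((pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])).sum())

def first_minimum_delay(x, max_tau=50, bins=16):
    """Return (tau, curve): the first local minimum of I(tau) and the full curve.

    Falls back to the global minimum if no interior local minimum is found.
    """
    curve = [mutual_information(x, t, bins) for t in range(1, max_tau + 1)]
    for k in range(1, len(curve) - 1):
        if curve[k] < curve[k - 1] and curve[k] <= curve[k + 1]:
            return k + 1, curve
    return int(np.argmin(curve)) + 1, curve
```

The returned delay would then be passed to the embedding-dimension search and to the MPE calculation. Histogram estimates are biased for short records, so the bin count should be chosen with the available sample length in mind.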

Strengths and limitations

A notable strength of our method is its ability to adaptively optimize key parameters for MPE calculation, which is crucial for accurately characterizing gear fault-induced vibration signatures. The incorporation of information entropy as weights for Mahalanobis distance adds another layer of sophistication, allowing the method to better capture the subtle differences between various fault types. This adaptability and enhanced feature extraction capability contribute to the high fault identification accuracy observed in our experiments.

However, our method is not without limitations. The computation of MPE and the subsequent optimization process may be relatively time-consuming compared to some simpler methods, especially when dealing with large datasets. Additionally, the performance of our method is highly dependent on the quality and quantity of the training data. In scenarios where the training data are insufficient or not representative of the actual operating conditions, the accuracy of fault identification may be compromised.

Future work

Future research will focus on addressing these limitations. One potential direction is to explore more efficient algorithms for MPE calculation and parameter optimization to reduce computational time. Another area of interest is to investigate the use of additional types of data, such as acoustic emission signals or temperature data, in conjunction with vibration data to further enhance the fault identification capability. Moreover, efforts will be made to develop more robust training strategies to improve the generalizability of the method under various operating conditions and fault scenarios.

In conclusion, the proposed weighted Mahalanobis distance-enhanced MPE framework has proven to be a highly effective tool for gear fault identification. Its ability to accurately characterize gear fault-induced vibration signatures and distinguish between different fault types holds great potential for practical applications in gear fault diagnosis.

Conclusions

This paper develops a novel approach for gear fault identification using the weighted Mahalanobis distance (MD_Maha) based on an MPE method whose parameters are optimized by MI and the improved false nearest neighbor (IFNN) algorithm. The accurate identification of weak fault characteristic signals is validated experimentally. The main contributions of this paper are as follows:

(1) A parameter selection strategy is proposed to determine the optimal pair of m and τ for MPE. This parameter optimization enhances the fault feature extraction ability of MPE.
(2) The MPE values of the various fault samples are calculated by the MPE method based on MI and the IFNN algorithm. Experiments on tooth surface pitting faults and tooth wear faults show that the MPE method describes the dynamic characteristics of vibration signals well.
(3) The min-MD_Maha of each fault vibration sample is computed, giving a fault characteristic identification accuracy of about 76.87%. The useful information contained in the different fault samples is then extracted by the information entropy method and used as the weights of the MD_Maha. Finally, the weighted MD_Maha of each fault vibration sample is computed, giving an identification accuracy of about 99.72%.
(4) The experimental results indicate that the weighted MD_Maha applied to MPE features efficiently identifies the vibration characteristics of gear faults, demonstrating the effectiveness of the method for fault characteristic identification.

Data availability

All simulation data and the majority of experimental results are provided in this article and supplementary materials. Selected datasets are withheld due to confidentiality agreements. The data are available from the corresponding author upon reasonable request.

Abbreviations

IFNN: Improved false nearest neighbor
MPE: Multi-scale permutation entropy
X: Signal time series
Y: Reconstruction matrix
m: Embedding dimension of the phase space
τ: Delay time of a time sequence
K: Number of reconstructed components in phase space
S: Scale factor
H_p: Normalization of entropy value
MI: Mutual information
I(τ): Mutual information entropy
E(m): False nearest neighbors criterion
min-MD_Maha: The minimum Mahalanobis distance
MD_Maha: Mahalanobis distance
µ_Y: Vector set
C_Y: Covariance matrix
$MD^{W}_{Maha}$: Weighted Mahalanobis distance
$\Im$: Weight of the sample
A⁺: Positive indicator set
A⁻: Negative indicator set
e_j: Information entropy
Ac: Fault identification accuracy rate
C: Number of correctly identified faulty samples
T: Total number of fault samples

References

1. Han, Y. Hierarchical decomposed dual-domain deep learning for sparse-view CT reconstruction. Phys. Med. Biol. 69 (8), 17 (2024).
2. Li, Y. et al. A method based on refined composite multi-scale symbolic dynamic entropy and ISVM-BT for rotating machinery fault diagnosis. Neurocomputing 13 (315), 246–260 (2018).
3. Zhou, X. et al. Hierarchical multiscale Fluctuation-Based symbolic fuzzy entropy: A novel tensor health indicator for mechanical fault diagnosis. Sens. J. IEEE. 25 (3), 5013–5030 (2025).
4. Singh, A. & Parey, A. Gearbox fault diagnosis under non-stationary conditions with independent angular re-sampling technique applied to vibration and sound emission signals. Appl. Acoust. 144, 11–22 (2019).
5. Bai, L. et al. Weak fault feature extraction of the rotating machinery using flexible analytic wavelet transform and nonlinear quantum permutation entropy. Comput. Mater. Continua 79 (3) (2024).
6. Hou, F., Chen, J. & Dong, G. Weak fault feature extraction of rolling bearings based on globally optimized sparse coding and approximate SVD. Mech. Syst. Signal Process. 111, 234–250 (2018).
7. Watt, S. & Politi, A. Permutation entropy revisited. Chaos Solitons Fractals. 120, 95–99 (2019).
8. Xu, M., Wei, Y., Li, Y. & Huang, W. Gearbox fault diagnosis based on local mean decomposition, permutation entropy and extreme learning machine. J. VibroEng. 18 (3), 1459–1473 (2016).
9. Chen, D. et al. Fault diagnosis based on FVMD multi-scale permutation entropy and GK fuzzy clustering. J. Mech. Eng. 54 (14), 16–27 (2018).
10. Yan, X. & Jia, M. Intelligent fault diagnosis of rotating machinery using improved multiscale dispersion entropy and mRMR feature selection. Knowl. Based Syst. 163, 450–471 (2019).
11. Yue, C. et al. A novel fault diagnosis model for rolling bearings in electric drive systems using multi-scale improved weighted permutation entropy and radial basis function neural network. Insight-Non-Destructive Test. Condition Monit. 67 (1), 9 (2025).
12. Shen, C. et al. Optimal Weighted multi-scale entropy-energy Ratio Feature for Rolling Bearing Degradation Assessment (IOP Publishing Ltd, 2025).
13. Sun, Y. et al. Fault diagnosis for railway point machines using VMD Multi-Scale permutation entropy and relieff based on vibration signals. Chin. J. Electron. 34 (1), 204–211 (2025).
14. Shen, C. et al. Optimal weighted multi-scale entropy-energy ratio feature for machine fault diagnosis. Measurement 242 (Part A) (2025).
15. Gibson, J. D. Entropy and Mutual Information (Springer, 2025).
16. Wu, B. et al. Strong asymptotic composition theorems for mutual information measures. Inf. Theory IEEE Trans. (T-IT) 70 (5), 10 (2024).
17. Liang, J. & Mao, X. Rectifier fault diagnosis based on euclidean norm fusion multi-frequency bands and multi-scale permutation entropy. Electronics 14 (3), 2079–9292 (2025).
18. Shen, L. & Zhu, Q. Application of singular spectrum analysis to nonstationary time series in flow-induced vibrations of a circular cylinder (2023).
19. Xiong, L. et al. Dynamic adaptive graph convolutional transformer with broad learning system for multi-dimensional chaotic time series prediction. Appl. Soft Comput. 157 (000), 13 (2024).
20. De Souza, S. M. & Rojas, O. Quasi-phases and pseudo-transitions in one-dimensional models with nearest neighbor interactions. Solid State Commun. 269, 131–134 (2018).
21. Pei, J. et al. K-homogeneous nearest neighbor-driven discriminant graph coupled nonnegative matrix factorization for low-resolution image recognition. Pattern Anal. Appl. 27 (3) (2024).
22. Cao, L., Wu, S. & Zhao, H. Chaos in topological Markov chains. Syst. Sci. Math. Sci. 7 (2), 97–105 (1994).
23. Zhou, X. et al. Fault identification technology for gear tooth surface wear based on MPE method by MI and improved FNN algorithm. Vibroeng. PROCEDIA. 28, 24–29 (2019).
24. Weng, J. M. et al. Research on an improved convolutional neural network fault diagnosis method for exciter system. Australian J. Electr. Electron. Eng. 20 (3), 10 (2023).
25. Hassan Sarmadi, A. et al. Energy-based damage localization under ambient vibration and non-stationary signals by ensemble empirical mode decomposition and Mahalanobis-squared distance. J. Vib. Control 1–16 (2019).
26. Cabana, E., Lillo, R. E. & Laniado, H. Multivariate outlier detection based on a robust Mahalanobis distance with shrinkage estimators. Stat. Pap. 62 (2021).
27. Lin, J. & Chen, Q. Fault diagnosis of rolling bearings based on multifractal detrended fluctuation analysis and Mahalanobis distance criterion. Mech. Syst. Signal. Process. 38 (2), 515–533 (2013).
28. Wang, H., Wang, H. & Tang, X. A review of deep learning in rotating machinery fault diagnosis and its prospects for Port applications. Appl. Sci. 15, 11303. https://doi.org/10.3390/app152111303 (2025).
29. Gao, H., Xu, T., Li, R. & Cai, C. Gearbox fault diagnosis based on ICEEMDAN-MPE-AWT and SE-ResNeXt50 transfer learning model. Appl. Sci. 14, 2565. https://doi.org/10.3390/app14062565 (2024).
30. Wen, J. & Gao, H. Remaining useful life prediction of the ball screw system based on weighted Mahalanobis distance and an exponential model. J. VibroEng. 20 (4), 1691–1707 (2018).


Acknowledgements

This work was supported by grants from the Scientific Research Program Funded by Education Department of Shaanxi Provincial Government of China (Program No. 24JR016), which the authors gratefully acknowledge.

Funding

This work was supported by grants from the Scientific Research Program Funded by Education Department of Shaanxi Provincial Government of China (Program No. 24JR016).

Author information

Authors and Affiliations

School of Mechanical Engineering, Shaanxi Polytechnic University, Xianyang, 712000, China
Xintao Zhou, Jialing Zhang & Jiamin Liu
Engineering Research Center of Composite Movable Robot, Universities of Shaanxi Province, Xianyang, 712000, China
Xintao Zhou & Jiamin Liu
School of Machinery and Precision Instrument Engineering, Xi’an University of Technology, Xi’an, 710000, China
Xintao Zhou, Na Ma & Jialing Zhang

Authors

Xintao Zhou
Na Ma
Jialing Zhang
Jiamin Liu

Contributions

1. Xintao Zhou: writing, research methods, investigation, validation, the feature information extraction algorithm, fault identification algorithms, and experimental verification.
2. Na Ma: writing, mechanical modeling, and simulation analysis.
3. Jialing Zhang: experimental data analysis, the noise reduction algorithm, and translation of the manuscript.
4. Jiamin Liu: study of the nonlinear dynamics of the transmission system, translation of the manuscript, and fault identification algorithms.

Corresponding author

Correspondence to Xintao Zhou.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1 (download ZIP )

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Zhou, X., Ma, N., Zhang, J. et al. Identification of gear fault by weighted Mahalanobis distance method based on multi-scale permutation entropy. Sci Rep 16, 3104 (2026). https://doi.org/10.1038/s41598-025-33051-1


Received: 08 September 2025
Accepted: 15 December 2025
Published: 19 December 2025
Version of record: 23 January 2026
DOI: https://doi.org/10.1038/s41598-025-33051-1