Introduction

As critical components in rotating machinery, gear assemblies transmit power and enable the operational functions of mechanical systems. Surface degradation initiating at gear tooth interfaces induces detrimental vibration signatures and acoustic emissions, ultimately compromising operational reliability through progressive deterioration of mechanical performance1,2. Gears also operate in harsh environments and are prone to failure, resulting in substantial maintenance costs. Hence, it is advantageous to prevent incipient failures during the gearbox assembly process, and the early detection of minor faults is critical to avert cascading failures and catastrophes3,4.

Gear trains typically operate under harsh conditions, such as fluctuating loads and high levels of background noise, so abnormal vibration induced by gear defects is often masked. Moreover, vibration signals associated with gear wear are inherently weak and susceptible to background noise interference5,6. Consequently, extracting fault features and identifying defect patterns from vibration signals relies heavily on advanced signal processing techniques. Gearbox fault diagnosis primarily consists of two stages: feature extraction and fault recognition. In recent years, numerous methods for analyzing gearbox vibration characteristics have emerged, with several finding practical application. Feature extraction is a crucial step in gearbox fault diagnosis, as selecting appropriate signal features not only reduces computational time but also improves the efficiency of fault identification. Recently, entropy has become a popular tool for extracting fault features in mechanical equipment owing to its computational simplicity, noise immunity, and robustness7,8. Owing to the complexity and variability of practical problems, various enhanced entropy-based techniques have been developed, including approximate entropy, fuzzy entropy, sample entropy and multiscale permutation entropy (MPE)9,10. Approximate entropy, fuzzy entropy, and sample entropy can all be used to assess the complexity and regularity of time series. These methods do not require coarse-graining of the time series, and their fault detection results are independent of sub-series length, which improves analysis efficiency. However, the accuracy of these three methods is relatively limited and depends strongly on the amount of useful information contained in the samples.
By expanding from a single-scale to a multi-scale phase space, the MPE method preserves both the local and the global information of the vibration characteristics11,12. This approach not only mitigates the interference of noise with fault characteristics but is also adept at detecting faint fault signals. At the same time, the MPE method can resolve the multi-scale coupling among multiple faults and accurately reflect the dynamic mutation behavior of the faulty system13,14. Despite these advantages, the accuracy of the MPE method depends on several parameters, including the delay time of the time series, the embedding dimension of the phase space and the scale factor. Previously, reasonable values of these parameters were determined through simulation and repeated trial calculations. This way of choosing parameter values is subjective to a certain degree, which reduces the accuracy of the MPE method in time series detection.

The computation of the MPE method is highly dependent on algorithm parameters, with different embedding dimensions and delay times significantly influencing the resulting entropy values. Consequently, the accuracy of time series detection is affected by the delay time of the signal and the embedding dimension of the phase space. For the minimum delay time τ: when τ is too small, all points containing feature information concentrate on the main diagonal of the feature matrix, leading to strong correlation among the features; when τ is too large, the points carrying characteristic information exhibit no significant correlation15,16. Hence, when selecting τ, the influence of this parameter on the signal’s characteristic information must be evaluated. Research has shown that the mutual information method can be used to calculate a relatively optimal τ for a signal17,18. Mutual information (MI) is an important tool in information theory for assessing the interdependence between two sets of events. The first local minimum of the signal’s MI function is generally adopted as the optimal delay; the method has a sound theoretical basis and is independent of the embedding dimension. As for the embedding dimension m: current research indicates that the optimal range for m is between 3 and 7. When m is too large, phase space reconstruction becomes computationally intensive and algorithm runtime increases. When m is too small, the reconstructed vectors carry too little information to support the extraction and detection of mutation signals19. In addition, vibrations caused by mechanical system failures exist in a multi-dimensional space. When the dimension of the dynamic system is reduced, two points that are non-adjacent in the multi-dimensional space may appear to be neighbors in the low-dimensional space, a phenomenon known as False Nearest Neighbors (FNN)20.
Altering the value of m can also lead to misinterpretation of the actual positions of two neighboring points in the phase space. Selecting the optimal m enables the identification and elimination of these FNN points. However, the accuracy of FNN detection is affected by various thresholds, which makes the discrimination results strongly subjective21. To address this issue, Cao improved the algorithm so that the determination of m depends only on the value of τ, effectively overcoming the influence of these thresholds22. This method is used to calculate the optimal m value within the signal’s phase space, and the improved FNN method is commonly referred to as Cao’s algorithm23.

Although the MPE method can fully extract weak fault feature information, it struggles to distinguish such information when the differences among multiple characteristic features are subtle. Research has shown that the Mahalanobis distance (MDMaha) method can address this issue. The MDMaha algorithm takes into account the correlations among the characteristic indicators and eliminates the interference caused by the relationships between feature variables. In addition, the calculation of MDMaha is unaffected by parameter dimensions and is independent of the scale of feature measurement, which gives it advantages such as higher accuracy in fault diagnosis results24. Nevertheless, when the MDMaha method is used alone to identify fault characteristic information directly, it treats all characteristic samples as equally important. This leads to a significant discrepancy between the theoretical MDMaha values and the actual situation, which impairs the accuracy of fault feature identification25. To improve the accuracy of the MDMaha method, the importance of different characteristic samples should be treated differently according to practical conditions. Leveraging the useful information contained in the samples, information entropy can be used to weight samples with different affiliations, so that the weighted MDMaha values are calculated in line with the actual scenario. The information entropy method is adopted to compute the objective weight of each sample. When the randomness of a set index is high, the information entropy value is larger, and the indicator carries greater importance in the weighting of decision-making samples. Conversely, lower entropy indicates lower importance26,27.

In this paper, a novel gear fault identification approach is developed that applies the weighted MDMaha to MPE features whose parameters are selected by the MI and improved FNN techniques. The accuracy of the MPE method in identifying the dynamic mutation characteristics of fault signals depends on whether the values of parameters such as τ and m are reasonable. First, leveraging the basic principle and strengths of the MI method, the minimum value of the delay time τ is calculated for the fault vibration signal. The optimal value of m is then computed via an improved FNN method, which reduces the subjectivity in parameter selection and enhances the precision of MPE in characterizing fault features. Second, the min - MDMaha of different fault characteristics is calculated to identify fault characteristics. To improve the accuracy of fault feature recognition by the MDMaha method, the information entropy approach is adopted to quantify the amount of useful information contained in different attribute samples, which is taken as the weight of the MDMaha. Finally, gear fault experiments are conducted to validate the reliability of the proposed framework. The results confirm the effectiveness of combining weighted MDMaha with MI and improved FNN-optimized MPE for accurate fault identification.

Main contributions and innovations of this paper:

(1) This paper proposes a new method for gear fault identification based on the weighted Mahalanobis distance combined with multi-scale permutation entropy. The method integrates the advantages of multi-scale permutation entropy for feature extraction and of the Mahalanobis distance for fault classification, providing a more accurate and robust approach for identifying gear faults.

(2) Through extensive parametric studies, we determine the optimal embedding dimension and delay parameters for the multi-scale permutation entropy calculation. This optimization ensures that the extracted features are highly representative of the underlying fault patterns, leading to improved fault identification performance.

(3) The method demonstrates strong robustness and generalizability across various experimental setups, load conditions, and fault types and severities. We have conducted comprehensive validation experiments, including tests on different gear types under different load scenarios (no-load, light-load, medium-load, and heavy-load conditions) and with various fault types (pitting, wear, misalignment, and broken teeth). The results show that the method can reliably identify gear faults under diverse real-world conditions.

(4) This paper provides a thorough validation of the method by comparing it with existing fault identification methods. The comparative analysis is based on key performance indicators such as accuracy, sensitivity, and specificity. The method outperforms several existing methods in gear fault identification.

Analysis of the theoretical underpinnings

Fundamental principle of MPE

In this section, the basic principle of calculating the permutation entropy of a time series is briefly introduced, to provide a theoretical reference for the subsequent use of this method to extract feature information of the fault mechanism and for the application of the MPE approach. The calculation process is shown in Fig. 1.

(1) Formulate the phase space matrix from the time series

A time series X of a signal is defined in Eq. (1):

$$X = \{ x(i)|i = 1,2, \cdots ,K + (m - 1)\tau \}$$
(1)

where x(i) is the i-th sample of the time series; m is the embedding dimension of the phase space; τ is the delay time of the time series; K is the number of reconstructed components in phase space.

Phase space reconstruction of the time series is then carried out to obtain the reconstruction matrix Y, as shown in Eq. (2):

$$Y = \left[ {\begin{array}{*{20}c} {Y_{1}} \\ {Y_{2}} \\ \vdots \\ {Y_{j}} \\ \vdots \\ {Y_{K}} \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {x(1)} & {x(1 + \tau )} & \cdots & {x(1 + (m - 1)\tau )} \\ {x(2)} & {x(2 + \tau )} & \cdots & {x(2 + (m - 1)\tau )} \\ \vdots & \vdots & \vdots & \vdots \\ {x(j)} & {x(j + \tau )} & \cdots & {x(j + (m - 1)\tau )} \\ \vdots & \vdots & \vdots & \vdots \\ {x(K)} & {x(K + \tau )} & \cdots & {x(K + (m - 1)\tau )} \\ \end{array} } \right]$$
(2)

where the j-th row of Y is the j-th reconstructed component, and m, τ and K have the same meaning as in Eq. (1).

(2) Arrange the elements of each reconstructed component in ascending order

For illustration, the elements of the j-th component of the reconstruction matrix Y are arranged in ascending order, and the position indices of the elements within the component are recorded as j(1), j(2), …, j(m). The resulting ordering of the j-th component is shown in Eq. (3):

$$x(j + (j_{{(1)}} - 1)\tau ) \le x(j + (j_{{(2)}} - 1)\tau ) \le \cdots \le x(j + (j_{{(m)}} - 1)\tau )$$
(3)
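
(3) Compute the entropy of the reconstructed components and normalize it

With \(P_{g}\) denoting the relative frequency of the g-th ordinal pattern among the reconstructed components, the normalized permutation entropy is given in Eq. (4):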

$$H_{P} = - (\sum\limits_{{g = 1}}^{K} {P_{g} \ln P_{g} } )/\ln (m!)$$
(4)

(4) Coarse-grain the time series and calculate the permutation entropy of the coarse-grained sequences

The permutation entropy of each coarse-grained sequence is calculated, which yields the multiscale permutation entropy shown in Eq. (5):

$$MPE(X,s,m,\tau ) = PE(\frac{1}{s}\sum\limits_{{i = (j - 1)s + 1}}^{{js}} {x_{i} } ,m,\tau )$$
(5)

where X is the time series, s is the scale factor, m is the embedding dimension, and τ is the delay time.
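Steps (1) to (4) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names `permutation_entropy`, `coarse_grain` and `multiscale_permutation_entropy` are hypothetical, ties between equal samples are broken by their order of appearance, and the entropy sum runs over the ordinal patterns that actually occur.

```python
import math
from collections import Counter

import numpy as np


def permutation_entropy(x, m=3, tau=1):
    """Normalized permutation entropy of a 1-D series, following Eqs. (1)-(4)."""
    x = np.asarray(x, dtype=float)
    k = len(x) - (m - 1) * tau  # number of reconstructed components K
    if k < 1:
        raise ValueError("series too short for the chosen m and tau")
    # Row j of Y is [x(j), x(j + tau), ..., x(j + (m - 1) * tau)], Eq. (2)
    idx = np.arange(k)[:, None] + tau * np.arange(m)[None, :]
    y = x[idx]
    # Ordinal pattern of each row: the indices that sort it ascending, Eq. (3)
    patterns = np.argsort(y, axis=1, kind="stable")
    counts = Counter(map(tuple, patterns))
    p = np.array(list(counts.values()), dtype=float) / k
    # Shannon entropy of the pattern frequencies, normalized by ln(m!), Eq. (4)
    return float(-np.sum(p * np.log(p)) / math.log(math.factorial(m)))


def coarse_grain(x, s):
    """Average consecutive non-overlapping windows of length s, as in Eq. (5)."""
    x = np.asarray(x, dtype=float)
    n = len(x) // s
    return x[: n * s].reshape(n, s).mean(axis=1)


def multiscale_permutation_entropy(x, m=3, tau=1, max_scale=5):
    """Permutation entropy of the coarse-grained series for s = 1 .. max_scale."""
    return [permutation_entropy(coarse_grain(x, s), m, tau)
            for s in range(1, max_scale + 1)]
```

A strictly increasing series contains a single ordinal pattern and therefore has zero permutation entropy, while white noise approaches the upper bound of 1; the two cases are a convenient sanity check before applying the function to vibration data.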

Fig. 1. Flow chart of algorithm structure.

The approach to determining crucial threshold parameters

Fundamental principle of the MI approach

The relationship between the MI of a time series and the delay time is given in Eq. (6):

$$I(\tau ) = I(x_{i} ,x_{{i + \tau }} ) = \sum\limits_{{i = 1}}^{N} {P(x_{i} ,x_{{i + \tau }} )\log _{2} \left[ {\frac{{P(x_{i} ,x_{{i + \tau }} )}}{{P(x_{i} )P(x_{{i + \tau }} )}}} \right]}$$
(6)

where \(P(x_{i})\) is the probability distribution of the time series \(x_{i}\); \(P(x_{i+\tau})\) is the probability distribution of the delayed time series \(x_{i+\tau}\); and \(P(x_{i}, x_{i+\tau})\) is their joint probability distribution. Equivalently, \(I(\tau) = H(x_{i}) + H(x_{i+\tau}) - H(x_{i}, x_{i+\tau})\), where \(H(x_{i})\) and \(H(x_{i+\tau})\) are the information entropies of \(x_{i}\) and \(x_{i+\tau}\), and \(H(x_{i}, x_{i+\tau})\) is their joint information entropy.
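As an illustration of how Eq. (6) can be evaluated on sampled data, the sketch below estimates the probabilities with a two-dimensional histogram and returns the first local minimum of I(τ). The histogram estimator, the number of bins, the fallback to the global minimum, and the names `mutual_information` and `first_local_minimum_delay` are assumptions made for this example; the estimator used for Fig. 3 is not specified in the text.

```python
import numpy as np


def mutual_information(x, tau, bins=16):
    """Histogram estimate of I(tau), in bits, between x(i) and x(i + tau)."""
    x = np.asarray(x, dtype=float)
    a, b = x[:-tau], x[tau:]
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()                # joint distribution P(x_i, x_{i+tau})
    p_a = p_ab.sum(axis=1, keepdims=True)     # marginal P(x_i), shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)     # marginal P(x_{i+tau}), shape (1, bins)
    nz = p_ab > 0                             # empty cells contribute nothing
    return float(np.sum(p_ab[nz] * np.log2(p_ab[nz] / (p_a @ p_b)[nz])))


def first_local_minimum_delay(x, max_tau=50, bins=16):
    """Return (tau, curve): the first local minimum of I(tau) and the whole curve."""
    mi = [mutual_information(x, t, bins) for t in range(1, max_tau + 1)]
    for k in range(1, len(mi) - 1):
        if mi[k] < mi[k - 1] and mi[k] <= mi[k + 1]:
            return k + 1, mi                  # delays are counted from 1
    # Assumed fallback when the curve has no interior local minimum
    return int(np.argmin(mi)) + 1, mi
```

The histogram estimate is biased upward for short records, so the number of bins should be kept small relative to the number of samples.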

Theoretical basis of the improved FNN method

The improved FNN method computes the relationship between E(m) and m, as given in Eqs. (7) and (8):

$$E(m) = \frac{1}{{N - m\tau }}\sum\limits_{{i = 1}}^{{N - m\tau }} {\frac{{\left\| {X_{{m + 1}} (i) - X_{{m + 1}}^{f} (i)} \right\|}}{{\left\| {X_{m} (i) - X_{m}^{f} (i)} \right\|}}}$$
(7)
$$E_{1} (m) = \frac{{E(m + 1)}}{{E(m)}}$$
(8)

where \(X_{m}(i)\) is the i-th vector of the m-dimensional reconstructed time series and \(X_{m}^{f} (i)\) is its nearest neighbor point; \(X_{m+1}(i)\) is the i-th vector of the (m + 1)-dimensional reconstructed time series and \(X_{{m + 1}}^{f} (i)\) is its nearest neighbor point; \(\left\| \bullet \right\|\) is the \(\infty\)-norm.

In practical applications, however, it is difficult to judge whether E1(m) has stabilized as m increases. Therefore, a supplementary criterion is added, as shown in Eqs. (9) and (10):

$${E^*}(m)=\frac{1}{{N - m\tau }}\sum\limits_{{i=1}}^{{N - m\tau }} {\left\| {{X_{m+1}}(i) - X_{{m+1}}^{f}(i)} \right\|}$$
(9)
$$E_{2} (m) = \frac{{E^{*} (m + 1)}}{{E^{*} (m)}}$$
(10)

For random time series, whose data are mutually independent, the value of E2(m) is equal to 1 for every m. For deterministic time series, the value of E2(m) depends on the embedding dimension m, so E2(m) is not identically equal to 1.
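The two criteria can be sketched as follows. This is an illustrative brute-force version under stated assumptions: the nearest neighbor is searched in the m-dimensional space with the maximum norm, and E*(m) is evaluated in the form used in Cao's original formulation, that is, as the mean absolute difference of the samples mτ steps ahead of each point and of its nearest neighbor, which may differ in detail from Eq. (9) as printed. The names `cao_statistics` and `_embed` are hypothetical, and the pairwise distance table limits the sketch to short records.

```python
import numpy as np


def _embed(x, m, tau, n_vectors):
    """First n_vectors delay vectors of dimension m."""
    idx = np.arange(n_vectors)[:, None] + tau * np.arange(m)[None, :]
    return x[idx]


def cao_statistics(x, tau, max_m=10):
    """Return (E1, E2) as arrays for m = 1 .. max_m - 1."""
    x = np.asarray(x, dtype=float)
    e, e_star = [], []
    for m in range(1, max_m + 1):
        n_vec = len(x) - m * tau              # vectors available in m and m + 1 dims
        xm = _embed(x, m, tau, n_vec)
        xm1 = _embed(x, m + 1, tau, n_vec)
        # Pairwise max-norm distances in the m-dimensional space
        d = np.max(np.abs(xm[:, None, :] - xm[None, :, :]), axis=2)
        np.fill_diagonal(d, np.inf)           # a point is not its own neighbor
        d[d == 0] = np.inf                    # skip duplicates to avoid dividing by 0
        nn = np.argmin(d, axis=1)             # index of the nearest neighbor
        rows = np.arange(n_vec)
        d_m = d[rows, nn]
        d_m1 = np.max(np.abs(xm1 - xm1[nn]), axis=1)
        e.append(np.mean(d_m1 / d_m))         # E(m), Eq. (7)
        # E*(m) in Cao's original form (assumption, see the lead-in)
        e_star.append(np.mean(np.abs(x[rows + m * tau] - x[nn + m * tau])))
    e, e_star = np.array(e), np.array(e_star)
    return e[1:] / e[:-1], e_star[1:] / e_star[:-1]   # E1(m), E2(m)
```

For white noise the returned E2 values stay close to 1 for every m, whereas for a deterministic signal such as a sine wave at least one E2 value falls well below 1, which is the distinction the supplementary criterion relies on.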

Principles of the Mahalanobis distance (MDMaha) method

The MDMaha

The calculation principle of the Mahalanobis distance algorithm is as follows. Given a test sample X = [x1, x2, …, xn]T and a failure feature sample Y = [y1, y2, …, yn]T, the Mahalanobis distance MDMaha between the test sample X and the feature sample Y is given in Eq. (11):

$$M{D_{Maha}}=\sqrt {{{(X - {\mu _Y})}^T}C_{Y}^{{-1}}(X - {\mu _Y})}$$
(11)

where \(\mu _{Y}\) is the mean vector, \(\mu _{Y} = \frac{1}{{n_{Y} }}\varsigma \sum\limits_{{i = 1}}^{{n_{Y} }} {y_{i} }\); \(C_{Y}\) is the covariance matrix, \(C_{Y} = \frac{1}{{n_{Y} - 1}}\sum\limits_{{i = 1}}^{{n_{Y} }} {(y_{i} - \mu _{Y} )} (y_{i} - \mu _{Y} )^{T}\); \(n_{Y}\) is the number of samples in Y; and ς = [1, 1, …, 1]T.
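Eq. (11) can be illustrated with NumPy as below. The sketch assumes that the failure feature samples are stored one per row, and it uses the Moore-Penrose pseudo-inverse in place of the plain inverse so that a singular covariance matrix (for example, fewer samples than features) does not raise an error; both choices, and the name `mahalanobis_distance`, are assumptions of this example rather than details given in the text.

```python
import numpy as np


def mahalanobis_distance(x, samples):
    """Mahalanobis distance from test vector x to the class given by `samples`.

    samples has shape (n_samples, n_features), one feature vector per row.
    """
    samples = np.asarray(samples, dtype=float)
    x = np.asarray(x, dtype=float)
    mu = samples.mean(axis=0)                              # mean vector
    cov = np.atleast_2d(np.cov(samples, rowvar=False))     # unbiased, 1 / (n - 1)
    diff = x - mu
    # Pseudo-inverse instead of inverse: an assumption that tolerates singular cov
    return float(np.sqrt(diff @ np.linalg.pinv(cov) @ diff))
```

For the four samples (0, 0), (2, 0), (0, 2) and (2, 2), the mean is (1, 1) and the covariance is (4/3)·I, so the distance of the point (3, 1) from the class is √3.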

Weighted MDMaha by information entropy method

The formula for calculating the weighted MDMaha of samples (X, Y) is as follows:

$$MD_{{Maha}}^{W} = \left[ {(X - \mu _{Y} )^{T} \Im C_{Y}^{{ - 1}} \Im (X - \mu _{Y} )} \right]^{{\frac{1}{2}}}$$
(12)

where \(\mu _{Y}\) is the mean of the sample, \(C_{Y}\) is the covariance of the sample, and \(\Im\) is the weight of the sample.

Then, the calculation steps of the information entropy weight algorithm are shown as follows:

(1) Construct the original matrix A with n samples and m decision indicators, A = [A1, A2, …, Ai, …, An]T, where aij is the attribute value of the j-th indicator for the i-th sample, as shown in Eq. (13).

$$A=\left[ {\begin{array}{*{20}{c}} {{A_1}} \\ {{A_2}} \\ \vdots \\ {{A_i}} \\ \vdots \\ {{A_n}} \end{array}} \right]=\left[ {\begin{array}{*{20}{c}} {{a_{11}}}&{{a_{12}}}& \cdots &{{a_{1m}}} \\ {{a_{21}}}&{{a_{22}}}& \cdots &{{a_{2m}}} \\ \vdots & \vdots & \ddots & \vdots \\ {{a_{i1}}}&{{a_{i2}}}& \cdots &{{a_{im}}} \\ \vdots & \vdots & \ddots & \vdots \\ {{a_{n1}}}&{{a_{n2}}}& \cdots &{{a_{nm}}} \end{array}} \right]$$
(13)

(2) Determine the positive indicator set A+ and the negative indicator set A− of matrix A and normalize them as follows:

Positive indicator set:

$$A^{ + } = \frac{{a_{{ij}} - \min \left[ {a_{{1j}} ,a_{{2j}} , \cdots ,a_{{nj}} } \right]}}{{\max \left[ {a_{{1j}} ,a_{{2j}} , \cdots ,a_{{nj}} } \right] - \min \left[ {a_{{1j}} ,a_{{2j}} , \cdots ,a_{{nj}} } \right]}}$$
(14)

Negative index set:

$$A^{ - } = \frac{{\max \left[ {a_{{1j}} ,a_{{2j}} , \cdots ,a_{{nj}} } \right] - a_{{ij}} }}{{\max \left[ {a_{{1j}} ,a_{{2j}} , \cdots ,a_{{nj}} } \right] - \min \left[ {a_{{1j}} ,a_{{2j}} , \cdots ,a_{{nj}} } \right]}}$$
(15)

(3) Calculate the proportion pij of the i-th sample under the j-th indicator, as shown in Eq. (16):

$$p_{{ij}} = \frac{{a_{{ij}} }}{{\sum\limits_{{i = 1}}^{n} {a_{{ij}} } }}$$
(16)

(4) Calculate the information entropy of the j-th indicator, as shown in Eq. (17):

$$e_{j} = - \frac{1}{{\ln n}}\sum\limits_{{i = 1}}^{n} {(p_{{ij}} )} \ln (p_{{ij}} )$$
(17)

(5) Calculate the objective weight value, as shown in Eq. (18):

$$\Im = \frac{{1 - e_{j} }}{{\sum\limits_{{j = 1}}^{m} {(1 - e_{j} )} }}$$
(18)
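The weighting steps and Eq. (12) can be combined in a short sketch. Several choices here are assumptions made for illustration: the data matrix is stored with one sample per row and one indicator per column, the entropy is normalized by the logarithm of the number of samples (the usual convention of the entropy weight method), 0·ln 0 is taken as 0, a constant indicator is assigned entropy 1, the weight ℑ is applied as a diagonal matrix, and the pseudo-inverse replaces the inverse of the covariance matrix. The names `entropy_weights` and `weighted_mahalanobis` are hypothetical.

```python
import numpy as np


def entropy_weights(a, positive=True):
    """Entropy weights of the columns of an (n_samples, m_indicators) matrix."""
    a = np.asarray(a, dtype=float)
    n = a.shape[0]
    lo, hi = a.min(axis=0), a.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)                  # guard constant columns
    norm = (a - lo) / span if positive else (hi - a) / span  # Eq. (14) or Eq. (15)
    col_sum = norm.sum(axis=0)
    p = norm / np.where(col_sum > 0, col_sum, 1.0)          # Eq. (16)
    plogp = p * np.log(np.where(p > 0, p, 1.0))             # 0 * ln(0) taken as 0
    e = -plogp.sum(axis=0) / np.log(n)                      # Eq. (17), assumed ln(n)
    e[col_sum == 0] = 1.0                                   # constant column: no information
    d = 1.0 - e
    return d / d.sum()                                      # Eq. (18)


def weighted_mahalanobis(x, samples, weights):
    """Weighted Mahalanobis distance of Eq. (12) with a diagonal weight matrix."""
    samples = np.asarray(samples, dtype=float)
    diff = np.asarray(x, dtype=float) - samples.mean(axis=0)
    cov_inv = np.linalg.pinv(np.atleast_2d(np.cov(samples, rowvar=False)))
    w = np.diag(np.asarray(weights, dtype=float))
    return float(np.sqrt(diff @ w @ cov_inv @ w @ diff))
```

With all weights equal to 1 the weighted distance reduces to the ordinary Mahalanobis distance, which is a simple way to check an implementation before applying entropy-derived weights.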

The proposed fault identification method based on weighted Mahalanobis distance and multiscale permutation entropy differs from traditional methods in terms of computational complexity. The computational complexity of multiscale permutation entropy primarily depends on the embedding dimension. In the field of mechanical engineering, the embedding dimension is usually small, thus making the computational complexity of permutation entropy relatively manageable. The computational complexity of the weighted Mahalanobis distance mainly relies on the dimension of the feature vector. In this method, the dimension of the feature vector is also small, so the computational complexity of the weighted Mahalanobis distance is within an acceptable range. Compared with traditional time-domain and frequency-domain feature extraction methods, although this method has slightly higher computational complexity, it significantly improves the accuracy and robustness of fault identification.

The calculation method for the fault identification accuracy rate

To evaluate the performance of fault identification, we calculated the fault identification accuracy. Accuracy is obtained by dividing the number of correctly identified fault samples by the total number of fault samples and multiplying by 100%. The specific calculation formula is:

$$Ac = \frac{C}{T} \times 100\%$$
(19)

where Ac is the fault identification accuracy rate; C is the number of correctly identified fault samples; and T is the total number of fault samples.

The accuracy of fault identification is an indicator that can intuitively reflect the performance of the method in fault identification tasks.
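As a small worked illustration of Eq. (19), the sketch below assigns each test sample to the fault standard with the smallest distance and then computes the accuracy. The function names `nearest_standard` and `identification_accuracy` and the numbers in the usage example are hypothetical and are not taken from the experiments.

```python
def nearest_standard(distances):
    """Return the label of the fault standard with the smallest distance.

    distances maps a fault label to the distance between the test sample
    and the standard value of that fault.
    """
    return min(distances, key=distances.get)


def identification_accuracy(predicted, actual):
    """Accuracy in percent: correctly identified samples over all samples, Eq. (19)."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("predicted and actual must be non-empty and equally long")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Hypothetical usage: four test samples, three identified correctly
predicted = ["pitting", "wear", "wear", "pitting"]
actual = ["pitting", "wear", "pitting", "pitting"]
accuracy = identification_accuracy(predicted, actual)
```

In this made-up example three of the four samples are identified correctly, so the accuracy is 75%.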

Experimental verification and result analysis

Figure 2 shows the experimental bench used to simulate gear surface pitting and gear surface wear failures. Four test points, labeled I, II, III and IV, are set on the outside of the gearbox. The bench comprises: 1. Variable frequency motor, 2. Gearbox, 3. Magnetic powder loading device, 4. Signal controller, 5. Acquisition card, 6. Acquisition computer, 7. Industrial personal computer, 8. Operation panel.

Experimental parameters: operational speed 1500 r/min; magnetic powder loader current 0.1 A; gear module 2 mm; tooth numbers z1 = 19 and z2 = 95; tooth width 20 mm; gear material 18CrNiMo7 steel. In this experiment, the time-domain vibration acceleration data of the gearbox pitting fault and the gear tooth wear fault were collected separately, giving a total of 30 groups of experimental data. The first 20 groups are used as the training sample set, and the last 10 groups are used as the test sample set.

Fig. 2. Structural layout of test bed.

Calculate τ of the signal

The results of the simulation analysis are presented in Fig. 3. For the experimental data at all four measuring points of the gearbox, the MI value I(τ) is smallest at τ = 2. This indicates that, for τ larger than 2, noise dominates the time series and the correlation of the time series is small. Additionally, the MI values I(τ) at positions III and IV are higher than those at the other positions.

This shows that the time series at measuring points I and II have little mutual correlation and are significantly disturbed by background noise. In conclusion, the analysis demonstrates that the delay time of the time series at all four measuring points on the gearbox is τ = 2.

Fig. 3. The MI curve at each measuring point.

Calculate m of the signal

Following the calculation principle of the improved FNN method, the results illustrated in Fig. 4 were obtained via simulation analysis. In this figure, the curves of the improved FNN proportion E1(m) for the time series of the four measuring points are basically consistent over the domain 1 ≤ m ≤ 20, and all of them reach a plateau after m = 15. At this point, the value of E1(m) alone cannot be used to determine m. Therefore, the supplementary FNN criterion E2(m) is used to determine m. The E2(m) values for the time series at the four measuring points fluctuate slightly within the domain 1 ≤ m ≤ 5, and at m = 5 the value of E2(m) reaches 1 for the first time. In summary, m = 5 is the most appropriate embedding dimension for the time series at the four measuring points on the gearbox.

Fig. 4. The improved FNN proportion at each measuring point.

Multi-scale permutation entropy method to extract fault features of sample set

First, using the MI method and the improved FNN method, the minimum delay τ = 2 and the optimal phase-space embedding dimension m = 7 are calculated for the 20 samples. Then, the MPE method is used to extract the dynamic mutation characteristics of the 20 samples of each fault, and the extraction results are shown in Fig. 5. The figure shows that the variation pattern of the MPE value over the entire measurement scale is approximately the same for every sample. Additionally, the characteristic information of pitting faults on the tooth surface is mainly distributed over the measurement scales S = 1 ~ 5. The fault information content of the samples is highly consistent, indicating that the selected samples are of good quality.

Fig. 5. Distribution of MPE of fault sample set.

To further reduce the influence of interfering information on the precision of fault feature extraction, the average values of the sample sets of pitting failure and wear failure are calculated separately. The average value of each sample set is used as the standard value for fault feature identification.

Figure 6 shows the distribution of the standard MPE values of the pitting and tooth wear faults. At a single scale (scale factor S = 1), the difference between the permutation entropy of the tooth surface pitting fault and that under normal operating conditions is small, and it is difficult to determine the type of tooth failure intuitively. Similarly, the permutation entropy of the gear tooth wear fault deviates within the scale factors S = 1 ~ 6, with a maximum MPE difference of 0.0493. When the gear teeth have a wear fault, the actual meshing clearance between the teeth is larger than the theoretical design value, so the backlash impact vibration during gear operation is greater. However, for scale factors S ≥ 6, the permutation entropy of the pitting and tooth wear faults is exactly the same as that of the gear system under normal conditions. This shows that for S = 1 ~ 5 the fault vibration signal already contains the complete fault characteristic information.

Fig. 6. Distribution of MPE standard value of tooth surface fault characteristics.

Furthermore, Fig. 7 shows the relative error of the MPE of the tooth surface pitting failure and of the gear tooth wear failure with respect to normal operating conditions. The dynamic characteristics of the pitting failure fluctuate slightly over the whole measurement scale, and the peak relative error does not exceed 2%. In contrast, the relative error between the gear tooth wear failure and the normal gear system fluctuates strongly over the scale factors S = 1 to 5, reaching a maximum of 6.79% at S = 2. This is because, after a certain amount of tooth surface wear, the tooth side clearance increases, causing the fault information to overlap with the tooth side clearance impact. Additionally, when the scale factor is 2, the fault information in the phase space matrix is compressed toward the main diagonal, increasing the difficulty of fault identification and leading to larger errors. For S ≥ 6, the relative error of the gear tooth wear failure changes consistently with that of the tooth surface pitting failure.
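The standard value and the relative error discussed here can be sketched as follows. The example assumes that the MPE values of a sample set are stored with one sample per row and one scale per column, and that the relative error is defined as the absolute difference from the normal-condition value divided by the normal-condition value; this definition, the function names `standard_value` and `relative_error_percent`, and the numbers in the usage lines are assumptions made for illustration.

```python
import numpy as np


def standard_value(sample_mpe):
    """Average MPE curve of a sample set, one value per scale factor."""
    return np.asarray(sample_mpe, dtype=float).mean(axis=0)


def relative_error_percent(fault_standard, normal_standard):
    """Relative deviation of the fault MPE from the normal MPE, in percent."""
    fault_standard = np.asarray(fault_standard, dtype=float)
    normal_standard = np.asarray(normal_standard, dtype=float)
    return 100.0 * np.abs(fault_standard - normal_standard) / np.abs(normal_standard)


# Hypothetical usage with made-up MPE values at two scales
fault_std = standard_value([[0.96, 0.81], [0.94, 0.79]])
errors = relative_error_percent(fault_std, [1.00, 0.80])
```

With these made-up values the averaged fault curve is (0.95, 0.80), which deviates from the normal curve by about 5% at the first scale and not at all at the second.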

Fig. 7. Distribution diagram of relative error between fault and normal characteristics.

Identification of fault characteristics by the min - MDMaha method

In this section, the MPE values of samples No. 18 to No. 30 in the test sample set are used to calculate the min - MDMaha from each test sample to the standard failure value. The calculation results are shown in Fig. 8. “pitting corrosion standard - pitting corrosion detection” is the min - MDMaha between the evaluation standard of the pitting failure characteristics and the pitting failure test sample set; “pitting corrosion standard - wear detection” is the min - MDMaha between the evaluation standard of the pitting failure characteristics and the wear failure test sample set. According to the principle that the MDMaha to a fault of the same kind is the smallest, only samples No. 1, 6, and 12 fail to be identified, shown by the red “□” marks. For these three samples the difference between the min - MDMaha values is very slight, in the range 1.6178 ~ 3.5559, so their failure characteristics were not successfully identified. As a result, the identification accuracy of the min - MDMaha for the pitting fault sample set is only 69.23%.

This shows that the Mahalanobis distance method failed to identify the tooth surface pitting faults of samples No. 1, 6, and 12, which makes an improvement to the Mahalanobis distance method necessary.

Fig. 8. The min - MDMaha of pitting fault characteristics.

Correspondingly, Fig. 9 shows the min - MDMaha results for wear failures. In the figure, “wear standard - wear test” is the min - MDMaha between the evaluation standard of the wear failure characteristics and the wear failure test sample set; “wear standard-pitting detection” is the min - MDMaha between the evaluation standard of the wear failure characteristics and the pitting failure sample set. According to the principle that the MDMaha to a fault of the same kind is the smallest, the fault identification of samples No. 3 and No. 12 fails, shown by the red “□” marks. The differences between the min - MDMaha values for samples No. 3 and No. 12 are 6.0233 and 7.5069, respectively. The fault characteristics of these two samples should have been identifiable, but the identification ultimately failed. The accuracy of wear fault identification by the min - MDMaha is 84.72%.

Fig. 9. The min - MDMaha of wear fault characteristics.

Identification of fault characteristics by weighted MDMaha method

(1) Determine the weight parameter by the information entropy method

Figure 10 shows the weight distribution of the two fault-type sample sets on each measurement scale. The black line is the weight of the pitting fault sample set, and the blue line is the weight of the gear wear fault sample set. The weight distributions of the two fault sample sets are roughly the same over the whole measurement scale, except at S = 6 and S = 9.

Fig. 10. Distribution of entropy weight on multiple scales.

(2) Identification of fault characteristics by the weighted MDMaha method

Figure 11a shows the recognition of the tooth surface pitting fault characteristics by the weighted MDMaha method. “pitting standard - pitting corrosion detection” is the weighted MDMaha between the evaluation standard of the pitting failure characteristics and the pitting failure test sample set, shown by the green “” marks; “pitting standard - wear detection” is the weighted MDMaha between the evaluation standard of the pitting failure characteristics and the wear failure test sample set, shown by the orange “□” marks. Correspondingly, Fig. 11b shows the weighted MDMaha results for the wear fault characteristics. “wear standard - wear corrosion detection” is the weighted MDMaha between the evaluation standard of the wear failure characteristics and the wear failure test sample set, shown by the blue “∆” marks; “wear standard-pitting corrosion detection” is the weighted MDMaha between the evaluation standard of the wear failure characteristics and the pitting failure sample set, shown by the blank “□” marks. The analysis above shows that, with the weighted MDMaha of each fault vibration sample, the identification accuracy of the fault characteristics is about 99.72%.

Fig. 11. Weighted MDMaha value of fault characteristics.
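A weighted Mahalanobis distance can be sketched as follows. The form used here, in which a diagonal weight matrix scales the deviation vector before the inverse covariance is applied, is one common definition and is an assumption; it is not confirmed to be the exact expression used in this work. The function name and the numerical values are hypothetical.

```python
import numpy as np

def weighted_mahalanobis(x, mean, cov, weights):
    """Weighted Mahalanobis distance sqrt(d^T W S^-1 W d), where d = x - mean,
    S is the covariance matrix and W = diag(weights)."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    w = np.diag(np.asarray(weights, dtype=float))
    cov_inv = np.linalg.pinv(np.asarray(cov, dtype=float))
    return float(np.sqrt(diff @ w @ cov_inv @ w @ diff))

# With unit weights the expression reduces to the ordinary Mahalanobis distance.
d_plain = weighted_mahalanobis([3.0, 4.0], [0.0, 0.0], np.eye(2), [1.0, 1.0])
# A zero weight removes the corresponding scale from the distance.
d_first_only = weighted_mahalanobis([3.0, 4.0], [0.0, 0.0], np.eye(2), [1.0, 0.0])
print(d_plain, d_first_only)
```

In this sketch the weights act as a per-scale emphasis, so scales that carry more discriminative information contribute more to the distance used for identification.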

Discussion

The proposed gear fault diagnosis method integrating MI, IFNN algorithm, and weighted Mahalanobis distance based on MPE has demonstrated significant efficacy in identifying gear faults, as evidenced by the substantial improvement in fault identification accuracy from 76.87% to 99.72% with the application of weights derived from information entropy.

Comparison with existing methods

Compared with the deep-learning-based gear fault diagnosis scheme proposed by Wang et al. under controlled laboratory conditions28, the present strategy has a “lightweight” advantage that makes it better suited to real industrial sites. Deep learning models, while powerful, often require extensive training data and computational resources. In contrast, our method, which relies on MI, IFNN, and MPE, is more computationally efficient and can achieve high accuracy with relatively few data samples. This is particularly advantageous in practical industrial settings where large-scale labeled data may not be readily available.

Gao et al.29 employed wavelet transforms to extract gear-fault features; while effective at capturing transient impacts, the approach offers insufficient characterization of the strongly nonlinear, multiscale-coupled information commonly present in fault-induced vibrations. Our method, by utilizing MPE, is specifically designed to quantify the complexity of time series signals across multiple scales, thereby providing a more comprehensive representation of the underlying fault patterns. This enhanced representation capability is crucial for distinguishing between different types of gear faults, as demonstrated by the high accuracy achieved in our experiments.
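For readers unfamiliar with MPE, the following minimal sketch shows the usual two-step computation: the signal is coarse-grained at each scale, and the normalised permutation entropy of each coarse-grained series is then evaluated. The default values of m, τ, and the number of scales are placeholders rather than the parameters selected in this work, and the function names are hypothetical.

```python
import math
import numpy as np

def coarse_grain(signal, scale):
    """Average consecutive, non-overlapping windows of length `scale`."""
    x = np.asarray(signal, dtype=float)
    n = len(x) // scale
    return x[: n * scale].reshape(n, scale).mean(axis=1)

def permutation_entropy(signal, m=3, tau=1):
    """Normalised permutation entropy with embedding dimension m and delay tau."""
    x = np.asarray(signal, dtype=float)
    n_vectors = len(x) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("signal too short for the chosen m and tau")
    # Each row of idx selects one m-dimensional delay vector.
    idx = np.arange(n_vectors)[:, None] + tau * np.arange(m)[None, :]
    # The ordinal pattern of a vector is the permutation that sorts it.
    patterns = np.argsort(x[idx], axis=1, kind="stable")
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum() / math.log(math.factorial(m)))

def multiscale_permutation_entropy(signal, m=3, tau=1, max_scale=5):
    """Permutation entropy of the coarse-grained signal at scales 1..max_scale."""
    return [permutation_entropy(coarse_grain(signal, s), m, tau)
            for s in range(1, max_scale + 1)]

rng = np.random.default_rng(1)
noise_mpe = multiscale_permutation_entropy(rng.normal(size=5000))
ramp_mpe = multiscale_permutation_entropy(np.arange(5000.0))
print(noise_mpe, ramp_mpe)
```

White noise visits all ordinal patterns almost equally often and therefore yields values near 1 at every scale, whereas a monotonic ramp produces a single pattern and an entropy of 0.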

Although Wen et al.30 present a promising gear fault diagnosis framework that couples vibration analysis with machine learning, it still lacks a systematic procedure for selecting optimal feature-extraction parameters and for mitigating the curse of dimensionality inherent in high-dimensional vibration signals. Our pipeline addresses both bottlenecks together: mutual information (MI) and improved false nearest neighbors (IFNN) are jointly employed to determine the optimal delay τ and embedding dimension m, so that multiscale permutation entropy captures fault evolution at its most sensitive scale. A subsequent weighted Mahalanobis distance performs a second round of feature compression and weight redistribution, suppressing redundant dimensions while amplifying the separability of incipient faults. Experimental results demonstrate that this strategy markedly improves diagnostic robustness and accuracy, offering a self-tuning parameter framework that can be directly replicated for similar rotating-machinery fault-identification tasks characterized by high dimensionality and small sample sizes.
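The MI-based choice of the delay τ can be illustrated with a minimal sketch. It assumes a histogram estimate of the mutual information between the signal and its lagged copy and takes the first local minimum of that curve as the delay, which is a widely used criterion but is not confirmed to be the exact rule applied in this work. The IFNN step for the embedding dimension m is not covered. The function names, bin count, and synthetic test signal are hypothetical.

```python
import numpy as np

def mutual_information(x, lag, bins=16):
    """Histogram estimate (in nats) of the MI between x(t) and x(t + lag)."""
    a, b = x[:-lag], x[lag:]
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x(t)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of x(t + lag)
    mask = pxy > 0
    return float((pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])).sum())

def first_minimum_delay(signal, max_lag=30, bins=16):
    """Return the lag at the first local minimum of the MI curve, plus the curve.
    Falls back to the global minimum when no local minimum is found."""
    x = np.asarray(signal, dtype=float)
    mi = [mutual_information(x, lag, bins) for lag in range(1, max_lag + 1)]
    for i in range(1, len(mi) - 1):
        if mi[i] < mi[i - 1] and mi[i] <= mi[i + 1]:
            return i + 1, mi  # mi[i] corresponds to lag i + 1
    return int(np.argmin(mi)) + 1, mi

# Synthetic correlated signal (first-order autoregressive process).
rng = np.random.default_rng(2)
noise = rng.normal(size=5000)
signal = np.zeros(5000)
for t in range(1, 5000):
    signal[t] = 0.9 * signal[t - 1] + noise[t]

tau, mi_curve = first_minimum_delay(signal, max_lag=30)
print(tau, mi_curve[:5])
```

The delay obtained in this way would then be passed, together with an embedding dimension chosen by a false-nearest-neighbor test, to the permutation entropy calculation.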

Strengths and limitations

A notable strength of our method is its ability to adaptively optimize key parameters for MPE calculation, which is crucial for accurately characterizing gear fault-induced vibration signatures. The incorporation of information entropy as weights for Mahalanobis distance adds another layer of sophistication, allowing the method to better capture the subtle differences between various fault types. This adaptability and enhanced feature extraction capability contribute to the high fault identification accuracy observed in our experiments.

However, our method is not without limitations. The computation of MPE and the subsequent optimization process may be relatively time-consuming compared to some simpler methods, especially when dealing with large datasets. Additionally, the performance of our method is highly dependent on the quality and quantity of the training data. In scenarios where the training data are insufficient or not representative of the actual operating conditions, the accuracy of fault identification may be compromised.

Future work

Future research will focus on addressing these limitations. One potential direction is to explore more efficient algorithms for MPE calculation and parameter optimization to reduce computational time. Another area of interest is to investigate the use of additional types of data, such as acoustic emission signals or temperature data, in conjunction with vibration data to further enhance the fault identification capability. Moreover, efforts will be made to develop more robust training strategies to improve the generalizability of the method under various operating conditions and fault scenarios.

In conclusion, the proposed weighted Mahalanobis distance-enhanced MPE framework has proven to be a highly effective tool for gear fault identification. Its ability to accurately characterize gear fault-induced vibration signatures and distinguish between different fault types holds great potential for practical applications in gear fault diagnosis.

Conclusions

This paper develops a novel approach for gear fault identification using the weighted Mahalanobis distance (MDMaha), based on an MPE method whose parameters are selected by MI and the improved false nearest neighbor (IFNN) algorithm. The accurate identification of weak fault characteristic signals is validated by experimental tests. The main contributions of this paper are as follows:

(1) A parameter selection strategy is proposed to determine the optimum pair of m and τ for MPE. This parameter optimization process enhances the fault feature extraction ability of MPE.

(2) The MPE values of the various fault characteristic samples are calculated by the MPE method based on MI and the IFNN algorithm. Experiments on tooth surface pitting faults and tooth wear faults show that the MPE method exhibits better performance in describing the dynamic characteristics of vibration signals.

(3) The min-MDMaha of each fault vibration sample is obtained, giving a fault characteristic identification accuracy of about 76.87%. The useful information contained in the different fault samples is then obtained by the information entropy method and used as the weight of the MDMaha. Finally, the weighted MDMaha of each fault vibration sample is obtained, giving a fault characteristic identification accuracy of about 99.72%.

(4) The experimental results indicate that the weighted MDMaha combined with the MPE method identifies the vibration characteristics of gear faults efficiently, demonstrating the effectiveness of this method for fault characteristic identification.