Abstract
Environmental effects often cause variability in dynamic features, obscuring actual damage indicators and leading to false alarms in damage detection. The Gaussian mixture model (GMM) based method is an effective solution, but challenges such as selecting initial model parameters and determining the optimal number of Gaussian components can hinder its performance. To address these challenges, we propose a two-step method that combines sequential iteration with the GMM approach. In the first step, sequential iteration is employed to determine the initial model parameters and the optimal number of Gaussian components for a reliable GMM. In the second step, the expectation-maximization (EM) algorithm is used to establish the GMM, clustering the training data into local subsets. For each subset, the Mahalanobis squared distance (MSD) between each sample point and the center of its Gaussian component is calculated. This distance is used to create a novelty index based on the minimum Mahalanobis squared distance (MMSD), which enables effective damage detection with a statistical control chart. Moreover, a generalized extreme value (GEV) distribution modeling method is presented to determine an accurate control limit. We validate our method using real data from two bridges, demonstrating its effectiveness through comparative analysis.
Introduction
Bridge structural health monitoring (SHM) plays a vital role in ensuring the safety and longevity of transportation infrastructure. Over time, bridges are subjected to various factors such as traffic loads, environmental conditions, and natural aging, which can cause deterioration. Early detection of damage through SHM enables timely maintenance, reducing the risk of catastrophic failures. By continuously assessing the condition of bridge components, the need for costly repairs can be minimized, and service life can be extended1,2. This proactive approach ultimately enhances public safety, reduces disruptions to transportation, and promotes sustainable infrastructure management.
In recent years, there has been extensive research on vibration-based damage identification methods within the field of SHM3. These methods operate on the principle that damage alters the dynamic characteristics of structures, including frequencies and mode shapes. Structural damage can then be detected from changes in the identified dynamic features by applying inverse problem-solving techniques4. However, in practical engineering applications, the dynamic characteristics of structures are not solely influenced by damage; they are also significantly affected by diverse environmental and operational conditions, such as temperature5,6, humidity7, wind8, and traffic loads9. These influences can even completely mask the changes in dynamic characteristics caused by damage, making accurate damage detection challenging if environmental factors are not duly considered10.
Accordingly, researchers have proposed various methods to remove environmental effects from the damage identification process. Depending on whether the environmental parameters are measured, these methods can be categorized as input-output and output-only methods11. Input-output methods attempt to establish an explicit relationship between environmental variables and dynamic characteristics in order to quantify the impact of the former on the latter. Common methods include linear regression12,13, polynomial regression14, bilinear regression15, support vector regression16, random forest models17, Gaussian process regression18, and artificial neural networks19, among others. These methods use environmental variables and the monitored dynamic characteristics as the model’s inputs and outputs, respectively, making it easier to interpret the influence of environmental variables on dynamic characteristics. However, their drawback is that they require comprehensive measurement of environmental variables, which is often difficult to obtain in practice20. Furthermore, once the input-output model is established, the sensors must remain in their original positions, and any sensor failure can lead to false-positive or false-negative outcomes21.
Compared to input-output methods, output-only methods do not directly measure environmental information; instead, they rely solely on the measured dynamic characteristics to learn the implicit relationship between environmental variables and dynamic characteristics22. The fundamental concept behind output-only methods is to map the damage features into a new feature space through a suitable transformation23. This space yields new features that are insensitive to environmental and operational variables but retain sensitivity to damage. Over recent years, output-only methods have gained widespread attention24,25,26. Commonly employed methods in this context include PCA and its variants27,28,29,30,31,32,33, cointegration and its variants34,35,36, factor analysis37, GMM38,39, Mahalanobis squared distance37, singular spectrum analysis24, auto-associative neural networks40,41, and nonlinear narrow dimension techniques42. More recently, Mousavi et al. developed a nonlinear method based on variational mode decomposition, cointegration, and a recurrent neural network; its superiority was demonstrated on an experimental example of the Z24 bridge43. Wah et al.21 developed a multiple-regression-based damage detection method for changing environmental and operational conditions. This method employs natural frequencies as both the independent and dependent variables and uses the difference in fits to identify influential observations and remove outlier measurements. Peng et al.44 proposed a novel structural damage detection method based on manifold learning, using the phase space representation of vibration characteristics to detect structural damage under varying environmental and operational conditions. Sarmadi et al.45 introduced a probabilistic data self-clustering method based on semi-parametric extreme value theory for damage detection. This method effectively handles environmental and operational variability while providing an integrated framework for determining the damage index and decision threshold.
Moreover, some machine learning or deep learning methods are used to address the temperature compensation problem in SHM. Sawant et al.46 proposed an unsupervised, temperature-compensated damage localization method for guided wave-based structural health monitoring (GW-SHM) using transfer learning from a convolutional autoencoder, achieving improved accuracy and robustness to environmental variations compared to supervised approaches, with fewer trainable parameters and without the need for pre-processed signals or material property knowledge. Du et al.47 developed an attention-based multi-task network with a modified U-Net architecture for accurate bolt loosening detection and temperature compensation in multi-bolt connections, validated experimentally on a simulated aircraft structure. Kashyap et al.48 presented a TinyML-based unsupervised learning framework for GW-SHM using a lightweight neural network, enabling real-time damage identification and localization in a honeycomb composite sandwich structure under varying temperature conditions. Sawant et al.49 proposed a CNN-based supervised framework for automated feature extraction, coupled with a Gaussian mixture model for temperature compensation and damage localization in guided wave-based structural health monitoring.
The Gaussian mixture model (GMM) is a highly effective probabilistic statistical tool for characterizing data distributions, and GMM-based methods have gained significant traction for detecting structural damage under changing environmental and operational conditions. For example, Figueiredo et al.38,39 utilized GMM to model the data distribution under varying normal conditions, employing the smallest MSD over all components for damage detection. This approach was successfully demonstrated using daily standard data sets from the Z24 Bridge. Other researchers have also explored the use of GMM in structural health monitoring. Kullaa et al.50 and Zang et al.51 applied GMM to cluster modal frequencies into several local subdomains. They then used minimum mean square error estimation and principal component analysis (PCA) to mitigate the effects of environmental and operational variations (EOV) on the subdomain data. Zhang et al.52 introduced a GMM-based method to quantify fatigue crack size, establishing a comprehensive baseline GMM database under varying temperatures and detecting cracks through the disparity between the baseline and test GMMs. Ren et al.53 combined GMM with a delay-and-sum method to achieve reliable damage monitoring and imaging localization for structures under variable conditions. More recently, Daneshvar et al.54 proposed an unsupervised damage detection method that employs GMM to provide local information for a deep reinforcement learning algorithm. This method reconstructs local training subsets using subdictionaries and uses an anomaly detector based on reconstruction error for damage detection. Beyond bridge structures55,56, GMM has been applied to damage and anomaly detection in various other domains, including gas turbines57, aircraft wings58, and wind turbines59.
Although GMM-based damage detection methods have proven effective, two important issues in mixture modeling need to be addressed: the selection of initial model parameters and the determination of the number of Gaussian components. (1) The first issue concerns the selection of initial model parameters. The common method to fit a GMM is through maximum likelihood estimation using the EM algorithm. However, the EM algorithm is a local optimization method and is highly sensitive to initialization. Due to the non-unimodal nature of the likelihood function in a mixture model, the EM algorithm may struggle to reach the optimal solution if the initial parameters deviate significantly from the actual model parameters. (2) The second issue is selecting the number of Gaussian components. The Bayesian Information Criterion (BIC) is commonly used to estimate this number60. The optimal number of components is indicated by the lowest BIC value, balancing a good fit to the data with the avoidance of overfitting. However, the EM algorithm’s sensitivity to initial model parameters can cause variations in BIC values for the same number of components when different initial values are chosen. This instability makes determining the appropriate number of Gaussian components more complex61,62.
Figueiredo et al.63 also highlight these two challenges in GMMs: sensitivity to initialization and selection of the number of Gaussian components. They proposed an unsupervised algorithm for learning finite mixture models from multivariate data that integrates model estimation and selection into a single process and avoids convergence to singular estimates at the boundary of the parameter space. Qiu et al.58 proposed an Improved Density Peaks Clustering (IDPC)-based EM algorithm for constructing an adaptive GMM. The algorithm improves the initialization of the GMM parameters by adaptively searching for probability density peaks of Guided Wave (GW) Damage Features (DFs), reducing reliance on experience-dependent parameters. Moreover, to ensure the accuracy of the classification results, Wang et al.64 and Yi et al.65 used the K-means clustering algorithm to estimate the initial parameters of the GMM.
Although several methods have been proposed for determining the initial values of a GMM, each has drawbacks. In the K-means clustering-based GMM methods64,65, the number of clusters for K-means is usually chosen with the BIC; however, because the BIC value can vary for the same number of components, the selected number of clusters is unstable. In the density peak clustering-based adaptive GMM method58, the ratio k in the algorithm must be determined empirically, which limits the applicability of the damage detection method. Finally, the unsupervised algorithm for learning finite mixture models63 struggles with mixtures whose components have very different weights, which can lead to the premature elimination of smaller components or improper overlap of heavier components.
In this paper, we propose a method that integrates a sequential iteration operation into the GMM framework to address two critical issues: (1) the automatic determination of the number of Gaussian components, and (2) the enhancement of the reliability of the initial model parameters for GMM. Our method follows a systematic procedure. First, a sequential iteration approach is employed to analyze the modal frequencies under varying environmental conditions. This approach automatically divides the data into multiple clusters that conform to Gaussian distributions. These clusters provide the mean and covariance of each Gaussian component, which are then used as the initial parameters for the GMM, thereby improving their reliability. Next, the EM algorithm is applied iteratively to refine these parameters until an optimal solution is obtained, ensuring effective optimization of the GMM parameters. Finally, the MSD between the damage feature and each Gaussian component is calculated. Using this information, a statistical process control chart is established for damage detection based on the minimum MSD, and the generalized extreme value distribution modeling method is used to determine an appropriate threshold. To validate the effectiveness of the proposed method, we utilize real data from two bridge structures: a wooden truss bridge and the Z24 Bridge. The damage detection results indicate that the proposed method significantly reduces the occurrence of false alarms, demonstrating superior performance compared to conventional methods in real-world scenarios.
Conventional Gaussian mixture model
GMM is a powerful statistical tool that accurately describes the distribution of real data by employing a weighted linear combination of multiple Gaussian probability density functions66. Given a dataset \({\varvec{X}}_{n\times d}={\left\{{\varvec{x}}_{1},\dots,{\varvec{x}}_{n}\right\}}^{\text{T}}\), where n is the number of samples and d is the dimension of the damage features, GMM assumes that the data can be represented as a mixture of K multivariate Gaussian distributions. The probability density function of a sample \(\varvec{x}\) is therefore the weighted sum of the Gaussian probability density functions of the K components58:
\[p\left(\varvec{x}|\varvec{\varTheta}\right)=\sum_{k=1}^{K}{\alpha}_{k}\,\mathrm{N}\left(\varvec{x}|{\varvec{\mu}}_{k},{\varvec{C}}_{k}\right) \tag{1}\]
with parameters
\[\varvec{\varTheta}=\left\{{\alpha}_{1},\dots,{\alpha}_{K},{\varvec{\mu}}_{1},\dots,{\varvec{\mu}}_{K},{\varvec{C}}_{1},\dots,{\varvec{C}}_{K}\right\} \tag{2}\]
where \(K\) is the number of Gaussian components and \({\alpha}_{k}\) is the weight of the k-th Gaussian component, which satisfies the constraints \(\sum_{k=1}^{K}{\alpha}_{k}=1\) and \(0\le{\alpha}_{k}\le 1\). \({\varvec{\mu}}_{k}\in{R}^{d}\) and \({\varvec{C}}_{k}\in{R}^{d\times d}\) are the mean and covariance matrix of the k-th Gaussian component, respectively. \(\mathrm{N}\left(\varvec{x}|{\varvec{\mu}}_{k},{\varvec{C}}_{k}\right)\) is the probability density function of the multivariate normal distribution:
\[\mathrm{N}\left(\varvec{x}|{\varvec{\mu}}_{k},{\varvec{C}}_{k}\right)=\frac{1}{{\left(2\pi\right)}^{d/2}{\left|{\varvec{C}}_{k}\right|}^{1/2}}\exp\left[-\frac{1}{2}{\left(\varvec{x}-{\varvec{\mu}}_{k}\right)}^{\text{T}}{\varvec{C}}_{k}^{-1}\left(\varvec{x}-{\varvec{\mu}}_{k}\right)\right] \tag{3}\]
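To estimate the unknown parameters, maximum likelihood estimation based on the EM algorithm is usually used. The log-likelihood function to be maximized is defined as follows:
\[L\left(\varvec{X}|\varvec{\varTheta}\right)=\sum_{i=1}^{n}\ln\left[\sum_{k=1}^{K}{\alpha}_{k}\,\mathrm{N}\left({\varvec{x}}_{i}|{\varvec{\mu}}_{k},{\varvec{C}}_{k}\right)\right] \tag{4}\]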
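The EM algorithm is an iterative process that comprises two main steps58: (a) the E-step, which estimates the posterior probability that sample \({\varvec{x}}_{i}\) belongs to the k-th component using Eq. (5); and (b) the M-step, which updates the Gaussian component model parameters based on the posterior probability using Eqs. (6)-(8):
\[P\left(k|{\varvec{x}}_{i},\varvec{\varTheta}\right)=\frac{{\alpha}_{k}\,\mathrm{N}\left({\varvec{x}}_{i}|{\varvec{\mu}}_{k},{\varvec{C}}_{k}\right)}{\sum_{j=1}^{K}{\alpha}_{j}\,\mathrm{N}\left({\varvec{x}}_{i}|{\varvec{\mu}}_{j},{\varvec{C}}_{j}\right)} \tag{5}\]
\[{\alpha}_{k}=\frac{1}{n}\sum_{i=1}^{n}P\left(k|{\varvec{x}}_{i},\varvec{\varTheta}\right) \tag{6}\]
\[{\varvec{\mu}}_{k}=\frac{\sum_{i=1}^{n}P\left(k|{\varvec{x}}_{i},\varvec{\varTheta}\right){\varvec{x}}_{i}}{\sum_{i=1}^{n}P\left(k|{\varvec{x}}_{i},\varvec{\varTheta}\right)} \tag{7}\]
\[{\varvec{C}}_{k}=\frac{\sum_{i=1}^{n}P\left(k|{\varvec{x}}_{i},\varvec{\varTheta}\right)\left({\varvec{x}}_{i}-{\varvec{\mu}}_{k}\right){\left({\varvec{x}}_{i}-{\varvec{\mu}}_{k}\right)}^{\text{T}}}{\sum_{i=1}^{n}P\left(k|{\varvec{x}}_{i},\varvec{\varTheta}\right)} \tag{8}\]
Here the posterior probabilities are evaluated with the current estimate of \(\varvec{\varTheta}\), and \({\varvec{\mu}}_{k}\) in Eq. (8) is the updated mean from Eq. (7). Initially, the parameters in \(\varvec{\varTheta}\) are randomly selected, and the E-step and M-step are repeated until the estimated \(\varvec{\varTheta}\) remains almost unchanged. This convergence indicates that \(\varvec{\varTheta}\) has reached a local optimum. Finally, the model parameters \(\varvec{\varTheta}\) of the GMM that maximize the log-likelihood function \(L\left(\varvec{X}|\varvec{\varTheta}\right)\) are obtained.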
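The E-step and M-step described above can be illustrated with a minimal sketch. To stay dependency-free it is restricted to one-dimensional data, which is a simplification of the d-dimensional case in the paper; the function names, the variance floor, and the stopping tolerance are illustrative assumptions rather than the authors' implementation.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Univariate Gaussian density N(x | mu, var)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, alphas, mus, variances, tol=1e-8, max_iter=500):
    """Fit a 1-D GMM by EM, starting from the given initial parameters.

    Returns (alphas, mus, variances, log_likelihood_history).
    """
    n, K = len(data), len(alphas)
    alphas, mus, variances = list(alphas), list(mus), list(variances)
    history = []
    for _ in range(max_iter):
        # E-step: posterior probability of each component for each sample.
        resp = []
        loglik = 0.0
        for x in data:
            weighted = [alphas[k] * normal_pdf(x, mus[k], variances[k]) for k in range(K)]
            total = sum(weighted)
            loglik += math.log(total)
            resp.append([w / total for w in weighted])
        history.append(loglik)
        # Stop when the log-likelihood no longer improves noticeably.
        if len(history) > 1 and abs(history[-1] - history[-2]) < tol:
            break
        # M-step: update weights, means and variances from the posteriors.
        for k in range(K):
            nk = sum(r[k] for r in resp)
            alphas[k] = nk / n
            mus[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # assumed floor against collapsing components
    return alphas, mus, variances, history


# Synthetic demo: two well-separated clusters, initial means at the data extremes.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)] + \
       [random.gauss(10.0, 1.0) for _ in range(200)]
alphas, mus, variances, history = em_gmm_1d(
    data, [0.5, 0.5], [min(data), max(data)], [1.0, 1.0])
```

The recorded log-likelihood history is non-decreasing, which is the defining property of EM, but the value it converges to depends on the starting point passed in `alphas`, `mus`, and `variances`.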
However, because the likelihood surface is non-convex and EM only refines the current estimate iteratively, the EM algorithm is a local optimization method: it may fail to reach the optimal solution when the initial parameters deviate significantly from the actual model parameters63. To illustrate the impact of randomly selecting initial parameters on the conventional GMM method, consider a clustering example using a 2-dimensional simulation dataset. With the number of model components set to 13, the EM algorithm is applied to model the dataset multiple times using different initial values. Figure 1 shows the modeling results obtained from three different sets of initial parameters. The clustering outcomes obtained by the EM algorithm differ across the three sets of initial parameters, and only one of them is correct. This shows that random selection of initial parameters can make the GMM established by the EM algorithm unstable.
In addition to selecting appropriate initial parameters, determining the appropriate number of Gaussian components is also crucial for building a reliable GMM. The BIC is commonly used for this purpose. It involves using the EM algorithm to establish GMMs with varying numbers of components, selecting the number of components corresponding to the minimum BIC value as the optimal choice. However, due to the randomness associated with initial parameter selection, BIC values can fluctuate for the same number of Gaussian components. To illustrate this variability, Fig. 2 shows the changes in BIC values with the number of components for the simulation data. In this illustration, the EM algorithm is executed five times with different initial parameters for each component. It is evident that the calculated BIC value varies significantly for a given number of Gaussian components due to different initial parameters. This instability in the BIC curve results from the EM algorithm’s sensitivity to random initialization, which reduces the reliability of the BIC method for determining the optimal number of components.
Fig. 1. Three clustering results of the conventional GMM method obtained with 3 randomly chosen sets of initial parameters. Table 1 lists the three groups of selected initial sample points. For each set of initial parameters, the initial weight of the k-th Gaussian component is set as \({\alpha}_{k}^{0}=1/13\), the initial covariance matrix of the k-th Gaussian component is set as \({\varvec{C}}_{k}^{0}=\left[\begin{array}{cc}86.57&1.89\\1.89&97.09\end{array}\right]\), and the initial mean of the k-th Gaussian component is listed in Table 1.
Fig. 2. BIC curves under different initial parameters. When the number of Gaussian components is 4, the initial weight of the k-th Gaussian component is set as \({\alpha}_{k}^{0}=1/4\) in each set of initial parameters, the initial covariance matrix of the k-th Gaussian component is \({\varvec{C}}_{k}^{0}=\left[\begin{array}{cc}86.57&1.89\\1.89&97.09\end{array}\right]\), and the initial mean of the k-th Gaussian component is listed in Table 2.
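The BIC instability discussed above follows directly from how the criterion is computed. The sketch below uses the standard definition BIC = -2 ln L + p ln n for a GMM with full covariance matrices; the exact form used in the paper is not shown in the text, and the log-likelihood values in the demo are hypothetical numbers chosen only to illustrate the effect.

```python
import math


def gmm_num_free_params(K, d):
    """Free parameters of a K-component, d-dimensional GMM with full covariances."""
    weights = K - 1                      # the weights sum to one
    means = K * d
    covariances = K * d * (d + 1) // 2   # each covariance matrix is symmetric
    return weights + means + covariances


def bic(log_likelihood, K, d, n):
    """Bayesian Information Criterion; lower values indicate a better model."""
    return -2.0 * log_likelihood + gmm_num_free_params(K, d) * math.log(n)


# Hypothetical log-likelihoods reached by two EM runs with the same K = 4, d = 2,
# n = 500 but different random initializations.
bic_run_a = bic(-1000.0, K=4, d=2, n=500)
bic_run_b = bic(-1030.0, K=4, d=2, n=500)
```

Because the penalty term is identical for both runs, the whole difference in BIC comes from the local optimum that EM happened to reach, so the location of the minimum of the BIC curve can shift from one initialization to the next.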
Improved GMM damage detection method
Improved GMM based on sequential iteration
To address the challenges in conventional GMM, we incorporate a sequential iteration operation to obtain stable and reasonable initial parameters67. This sequential iteration method automatically partitions the data into several clusters that adhere to Gaussian distributions, with each cluster serving as an initial Gaussian component of the GMM. The mean and covariance of each Gaussian component, derived from the sequential iteration method, are then used as the initial parameters for the GMM. This enhancement improves the modeling accuracy of the conventional GMM.
The sequential iteration method can be summarized in the following detailed steps67:
Step 1: Set the Gaussian cluster index k to 1, initialize the covariance parameter \(\:{\sigma\:}^{2}\) of the dataset \(\:\varvec{X}\) using Eq. (9)63 and randomly choose a sample point \(\:{\widehat{\varvec{\mu\:}}}_{k}\) from the training samples as the initial point. Let \(\:{\widehat{\varvec{C}}}_{k}={\sigma\:}^{2}\cdot\:\varvec{I}\), and \(\:\varvec{I}\) represents the d-dimensional identity matrix.
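Step 2: Calculate the Mahalanobis distance between each sample \({\varvec{x}}_{i}\ (i=1,\dots,n)\) and the arbitrarily chosen point \({\widehat{\varvec{\mu}}}_{k}\in\varvec{X}\) using Eq. (10):
\[{d}_{i}=\sqrt{{\left({\varvec{x}}_{i}-{\widehat{\varvec{\mu}}}_{k}\right)}^{\text{T}}{\widehat{\varvec{C}}}_{k}^{-1}\left({\varvec{x}}_{i}-{\widehat{\varvec{\mu}}}_{k}\right)} \tag{10}\]
If the Mahalanobis distance is below 3, assign \({\varvec{x}}_{i}\) to the sample set \({\varvec{X}}_{s}^{k}\).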
Step 3: Calculate the mean \({\stackrel{\sim}{\varvec{\mu}}}_{k}\) and covariance matrix \({\stackrel{\sim}{\varvec{C}}}_{k}\) of the sample set \({\varvec{X}}_{s}^{k}\) using Eqs. (11)-(12),
where \(\:{n}_{k}\) represents the number of samples in \(\:{\varvec{X}}_{s}^{k}\).
Step 4: Let \({\widehat{\varvec{\mu}}}_{k}={\stackrel{\sim}{\varvec{\mu}}}_{k}\) and \({\widehat{\varvec{C}}}_{k}={\stackrel{\sim}{\varvec{C}}}_{k}\), and repeat Steps (2)-(3) until:
where \(\:\epsilon\:\) is a small enough value, \(\:\epsilon\:\) is set to \(\:{10}^{-6}\) in this paper.
Step 5: Determine the sample set \(\:{\varvec{X}}_{s}^{k}\), compute the mean \(\:{\widehat{\varvec{\mu\:}}}_{k}\), covariance matrix \(\:{\widehat{\varvec{C}}}_{k}\), and weight coefficient \(\:{\widehat{\alpha\:}}_{k}=\frac{{n}_{k}}{n}\:\)of \(\:{\varvec{X}}_{s}^{k}\). Then, from the dataset \(\:\varvec{X}\), remove the sample points \(\:{\varvec{x}}_{i}\) that have been allocated to \(\:{\varvec{X}}_{s}^{k}\);
Step 6: Increment \(\:k\) by 1 (\(\:k=k+1\)).
Step 7: Repeat Steps (2)-(6) until all the data is assigned to the corresponding \(\:{\varvec{X}}_{s}^{k}\). Finally, the \(\:{\widehat{\varvec{\mu\:}}}_{k}\), \(\:{\widehat{\varvec{C}}}_{k}\), and \(\:{\widehat{\alpha\:}}_{k}\) obtained from each sample set \(\:{\varvec{X}}_{s}^{k}\) are considered as the initial mean, covariance matrix, and weight in GMM.
To better visualize the stages of the proposed algorithm, the entire procedure for implementing the sequential iteration operation is illustrated in Fig. 3. The improved GMM eliminates the need for manual determination of the number of Gaussian components, as the sequential iteration process determines this value automatically. Moreover, the model parameters obtained by the sequential iteration method are used as the initial model parameters of the EM algorithm, which can improve the computational efficiency and data fitting accuracy of the conventional GMM method.
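Steps 1-7 above can be sketched as follows. This is a simplified one-dimensional illustration, not the authors' implementation: the initial variance is taken as a fraction `init_var_fraction` of the overall variance (an assumed stand-in for Eq. (9)), convergence is declared when the cluster mean changes by less than `eps` (an assumed stand-in for Eq. (13)), and `var_floor` is a hypothetical safeguard for clusters with a single point.

```python
import math
import random
import statistics


def sequential_iteration_1d(data, init_var_fraction=0.1, radius=3.0, eps=1e-6,
                            max_inner=200, var_floor=1e-9, seed=0):
    """Peel off Gaussian-like clusters one at a time (1-D sketch of Steps 1-7).

    Returns a list of (weight, mean, variance) tuples, one per cluster.
    """
    rng = random.Random(seed)
    n = len(data)
    remaining = list(data)
    # Step 1 (assumed form): initial variance as a fraction of the overall variance.
    sigma2 = max(init_var_fraction * statistics.pvariance(data), var_floor)
    clusters = []
    while remaining:
        seed_idx = rng.randrange(len(remaining))
        mu, var = remaining[seed_idx], sigma2
        members = [seed_idx]
        for _ in range(max_inner):
            # Step 2: keep samples whose Mahalanobis distance to mu is below `radius`.
            new_members = [i for i, x in enumerate(remaining)
                           if abs(x - mu) / math.sqrt(var) < radius]
            if not new_members:
                break
            members = new_members
            # Step 3: mean and variance of the selected samples.
            values = [remaining[i] for i in members]
            new_mu = statistics.fmean(values)
            new_var = max(statistics.pvariance(values), var_floor)
            # Step 4 (assumed criterion): stop once the mean has stabilized.
            converged = abs(new_mu - mu) < eps
            mu, var = new_mu, new_var
            if converged:
                break
        # Step 5: store weight, mean and variance, then remove the assigned samples.
        clusters.append((len(members) / n, mu, var))
        assigned = set(members)
        remaining = [x for i, x in enumerate(remaining) if i not in assigned]
        # Steps 6-7: continue with the next cluster until no samples remain.
    return clusters


# Synthetic demo with two well-separated groups of samples.
random.seed(1)
demo = [random.gauss(0.0, 1.0) for _ in range(200)] + \
       [random.gauss(20.0, 1.0) for _ in range(200)]
clusters = sequential_iteration_1d(demo)
```

In this sketch the few samples lying outside the 3-distance radius of a main cluster end up in small extra clusters with very low weight, so the number of clusters returned can exceed the number of visible groups. How such leftover samples are treated in the paper is not stated in the text.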
Implementation process of the proposed GMM damage detection algorithm
After enhancing the conventional GMM with the sequential iteration method, we use the improved GMM approach to detect damage under changing environmental conditions. First, continuous monitoring data are collected in a healthy state under changing environmental conditions. From these data, damage features are extracted to form a baseline training sample set denoted as X. Next, the sequential iteration approach is employed to establish the initial model parameters of the GMM. The EM algorithm is then used to construct a baseline GMM based on the data in X. Furthermore, the MSD between the damage feature \(\varvec{x}\) and each Gaussian component is calculated as follows:
\[{\text{MSD}}_{k}\left(\varvec{x}\right)={\left(\varvec{x}-{\varvec{\mu}}_{k}\right)}^{\text{T}}{\varvec{C}}_{k}^{-1}\left(\varvec{x}-{\varvec{\mu}}_{k}\right),\quad k=1,\dots,K \tag{14}\]
where \({\varvec{\mu}}_{k}\) and \({\varvec{C}}_{k}\) are the mean and covariance matrix of the k-th Gaussian component of the baseline GMM.
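The MSD has been shown to be an effective damage indicator that is insensitive to changing environments37. However, it is only capable of addressing linear EOV and remains sensitive to nonlinear EOV effects. To tackle this issue, Figueiredo et al. proposed the MMSD as a novelty index for damage detection under changing environments38,39. Similar studies using the MMSD for this purpose can also be found in the literature54,68,69. The minimum MSD is calculated as follows:
\[\text{MMSD}\left(\varvec{x}\right)=\underset{1\le k\le K}{\min}\,{\text{MSD}}_{k}\left(\varvec{x}\right) \tag{15}\]
where \({\text{MSD}}_{k}\left(\varvec{x}\right)\) is the MSD between the damage feature \(\varvec{x}\) and the k-th Gaussian component.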
The insensitivity of the MMSD to nonlinear environmental effects stems from the division of nonlinearly related data into several categories using GMM. Compared to the entire dataset, each localized category exhibits a stronger linear correlation after clustering. Therefore, the MSD can effectively eliminate the linear environmental effects in these local datasets. In practice, the MMSD is calculated as the MSD of a sample point within its respective category. By separating the linear EOV effects within each category, the overall nonlinear environmental effects on the entire dataset are mitigated. This makes the MMSD insensitive to nonlinear EOV effects. In this paper, the MMSD is also used as the novelty index (NI) for damage detection under changing environments.
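As a small illustration of this novelty index, the sketch below computes the MSD of a two-dimensional feature to each Gaussian component and takes the minimum. The restriction to two dimensions (so that the covariance matrix can be inverted in closed form) and the component values in the demo are assumptions made purely for illustration.

```python
def msd_2d(x, mu, cov):
    """Mahalanobis squared distance of a 2-D point to one Gaussian component."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Closed-form inverse of the 2x2 covariance matrix applied to (dx, dy).
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def mmsd(x, components):
    """Minimum MSD over all components; returns (value, index of nearest component)."""
    distances = [msd_2d(x, mu, cov) for mu, cov in components]
    k = min(range(len(distances)), key=distances.__getitem__)
    return distances[k], k


# Two hypothetical components, each given as (mean, covariance matrix).
components = [
    ((0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))),
    ((10.0, 10.0), ((4.0, 0.0), (0.0, 4.0))),
]
ni_near_second = mmsd((9.0, 10.0), components)   # closest to the second component
ni_near_first = mmsd((1.0, 1.0), components)     # closest to the first component
```

A sample is thus always judged against the local cluster it most plausibly belongs to, which is what allows each cluster to absorb its own, approximately linear, environmental trend.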
Furthermore, the MMSD values obtained during the training phase are used to compute a threshold for damage detection. When damage occurs in a structure, the feature data associated with the damaged state will deviate from the original distribution characteristics. Consequently, the MMSD will exceed the threshold during the testing phase, clearly indicating the presence of damage. To provide a comprehensive understanding of the algorithm’s stages, the complete implementation procedure for the enhanced GMM method in damage detection is illustrated in Fig. 4.
To quantify the effectiveness of damage detection, this study employs two metrics: the false positive rate (FPR) and the false negative rate (FNR). Both originate from machine learning, where a confusion matrix summarizes the performance of an algorithm in terms of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). In the context of SHM, the terms “positive” and “negative” refer to the damaged and undamaged states, respectively21. A TN means that the structure is undamaged and the method correctly detects the undamaged state (i.e., the novelty index is below the threshold in the undamaged state); an FN means that the structure is damaged but the method fails to raise an alarm (i.e., the novelty index is below the threshold in the damaged state); an FP means that the structure is undamaged but the method raises an alarm (i.e., the novelty index is above the threshold in the undamaged state); a TP means that the structure is damaged and the method correctly raises an alarm (i.e., the novelty index is above the threshold in the damaged state). Table 3 depicts the confusion matrix, and the two metrics, FPR and FNR, are defined as follows:
\[\text{FPR}=\frac{{N}_{FP}}{{N}_{FP}+{N}_{TN}} \tag{16}\]
\[\text{FNR}=\frac{{N}_{FN}}{{N}_{FN}+{N}_{TP}} \tag{17}\]
where \(\:{N}_{TN},\:{N}_{FN},\:{N}_{FP}\:\)and \(\:{N}_{TP}\) represent the number of TN, FN, FP and TP sample points, respectively.
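The two metrics can be computed from a sequence of novelty indexes and known structural states as in the sketch below. The function name and the convention that an alarm is raised when the index strictly exceeds the threshold are assumptions for illustration.

```python
def detection_rates(novelty_index, is_damaged, threshold):
    """Return (FPR, FNR) for novelty indexes with known damaged/undamaged labels."""
    n_tp = n_fp = n_fn = n_tn = 0
    for value, damaged in zip(novelty_index, is_damaged):
        alarm = value > threshold
        if damaged and alarm:
            n_tp += 1      # damage present and detected
        elif damaged:
            n_fn += 1      # damage present but missed
        elif alarm:
            n_fp += 1      # no damage, yet an alarm was raised
        else:
            n_tn += 1      # no damage and no alarm
    fpr = n_fp / (n_fp + n_tn) if (n_fp + n_tn) else 0.0
    fnr = n_fn / (n_fn + n_tp) if (n_fn + n_tp) else 0.0
    return fpr, fnr


# Toy example: three undamaged samples followed by three damaged samples.
fpr, fnr = detection_rates(
    novelty_index=[1.0, 2.0, 6.0, 7.0, 3.0, 8.0],
    is_damaged=[False, False, False, True, True, True],
    threshold=5.0,
)
```

In the toy example one of the three undamaged samples exceeds the threshold and one of the three damaged samples stays below it, so both rates equal one third.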
Threshold limit determination
The estimation of the threshold plays a crucial role in the improved GMM method for early damage detection. This estimate is typically derived from the probabilistic characteristics of the MMSD during the training phase. One commonly used approach is to employ a standard confidence interval based on the central limit theorem. For instance, a 95% confidence interval for the MMSD is often used, assuming that the MMSD during the training stage follows a normal distribution70. Nevertheless, relying solely on the standard confidence interval is inadequate when the MMSD exhibits a non-normal or heavy-tailed distribution. To determine an appropriate threshold, extreme value statistics are used in the improved GMM method. Extreme value statistics is a methodology for analyzing exceptional events and identifying extreme values within a given probability distribution. This approach finds extensive application in diverse fields such as structural engineering, finance, earth sciences, and geological engineering.
Extreme value analysis proves invaluable in establishing control limits that mark a failure point, beyond which a failure or end-of-life event is likely to occur. The generalized extreme value (GEV) distribution comprises three main forms: the Gumbel (Type I), Fréchet (Type II), and Weibull (Type III) distributions. Together, these three types of extreme value distributions can effectively model extreme events across a diverse range of datasets71.
Gumbel distribution:
Fréchet distribution:
Weibull distribution:
where \(\:\mu\:\) is the location parameter, \(\:\sigma\:\) is the scale parameter, and \(\:\alpha\:\) is the shape parameter. From a mathematical standpoint, the three types of extreme value distributions can be unified and expressed in a common form.
where \(\xi\) is the shape parameter of the GEV distribution, whose value determines the type of extreme value distribution71. Threshold determination based on the GEV distribution model comprises three main steps72: selecting one of the extreme value distributions, estimating the unknown parameters of the selected distribution, and determining its extreme quantile. The extreme quantile of the distribution is estimated at a given significance level by inverting Eq. (21), which yields,
In this paper, a confidence level of 95% is chosen, which corresponds to a tail probability of 5% in the cumulative distribution function.
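The quantile-based threshold can be sketched as follows, using the textbook parameterization of the GEV cumulative distribution function, F(x) = exp(-[1 + xi (x - mu) / sigma]^(-1/xi)), with the Gumbel form as the limit xi -> 0. This is an assumed standard form; the notation of the paper's own equations may differ, and the parameter values in the demo are hypothetical (in practice they would be estimated from the training MMSD values, for example by maximum likelihood).

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """CDF of the GEV distribution with location mu, scale sigma and shape xi."""
    if abs(xi) < 1e-12:                      # Gumbel limit
        return math.exp(-math.exp(-(x - mu) / sigma))
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0.0:                             # x lies outside the support
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_quantile(p, mu, sigma, xi):
    """Value x such that gev_cdf(x) = p, obtained by inverting the CDF."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if abs(xi) < 1e-12:                      # Gumbel limit
        return mu - sigma * math.log(-math.log(p))
    return mu + sigma / xi * ((-math.log(p)) ** (-xi) - 1.0)


# Hypothetical fitted parameters; the 95% quantile serves as the control limit.
threshold = gev_quantile(0.95, mu=2.0, sigma=1.5, xi=0.0)
```

A positive `xi` corresponds to the heavy-tailed Fréchet type and a negative `xi` to the bounded Weibull type, so the same two functions cover all three families once the shape parameter has been estimated.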
Contributions
The main contributions and novelty of this research can be summarized as:
(i) Development of an innovative damage detection method. This research introduces a novel method specifically designed for long-term SHM under varying environmental conditions. The key innovation lies in the integration of sequential iteration and GMM. The advantages of this method are: (1) it automatically selects the number of components and does not require careful initialization; (2) unlike the density peak clustering-based GMM58, it does not require empirical determination of model parameters; (3) compared to the BIC-based method for determining the number of components, it is more stable and reliable. Moreover, this approach effectively addresses environmental variability, including nonlinear effects, whereas most traditional normalization techniques (such as MSD, PCA, and factor analysis) can only handle linear effects.
(ii) Determination of the threshold using the GEV distribution method. The traditional threshold determination method employs a standard confidence interval based on the central limit theorem, which assumes that the damage index follows a normal distribution. However, relying solely on the standard confidence interval is inadequate when the MMSD exhibits a non-normal or heavy-tailed distribution. Therefore, the proposed method uses the extreme value distribution to determine an appropriate threshold, which yields a more accurate threshold estimate, reduces the likelihood of false alarms, and improves the reliability of damage detection.
Real-world application 1: a wooden truss bridge
In this section, the experimental data of a wooden truss bridge are used to verify the effectiveness of the proposed GMM method. The wooden bridge is a laboratory truss structure monitored by Prof. Kullaa for several days under changing environmental conditions73. The monitoring system for the structure is shown in Fig. 5: an electrodynamic shaker applies a random white noise excitation to the wooden bridge, and fifteen accelerometers deployed at different locations measure the acceleration responses. Modal frequencies and mode shapes of the structure were identified from the vibration measurements using the stochastic subspace method. Because only the modal frequencies of seven modes were made available to researchers, these seven frequencies are used as the damage-sensitive features for damage detection.
To simulate various damage scenarios, different sizes of masses were attached to the wooden bridge, as detailed in Table 4. The total mass of the structure was 36 kg, with the heaviest mass added to the structure weighing 193.7 g. The monitoring process involved test measurements of both undamaged and damaged conditions of the bridge. The measurements 1-2000 were taken in undamaged state under varying environmental conditions, while the measurements 2001–2019, 2020–2042, 2043–2065, 2066–2091, and 2092–2114 were performed on the damaged state of the structure at different damage levels.
(a) The Wooden Bridge, (b) the locations of the acceleration sensors and the electro-dynamic shaker73.
Figure 6 shows the changes of natural frequencies \(\:{f}_{1}\), \(\:{f}_{2}\), \(\:{f}_{6}\) and \(\:{f}_{7}\) over time, with the vertical dashed line indicating the moment of damage occurrence. It is apparent that environmental variations significantly impact the natural frequencies throughout the monitoring period, making it challenging to distinguish whether the frequency changes stem from structural damage or changing environments.
Comparison of the proposed method with other methods
In the context of mitigating environmental influences on damage detection, cointegration34 and PCA28 are widely used techniques. This paper compares the proposed method with these two damage detection methods. Additionally, since the proposed method improves upon the GMM, we also compare the damage detection results with those of the conventional GMM to highlight the superiority of our approach. Using PCA, cointegration, and both conventional and proposed GMM methods, we implement damage detection for a Wooden Bridge during both training and test phases. During offline learning in the training period, 1800 observations under normal conditions are utilized as the training dataset. For the test phase, the remaining observations from normal conditions, alongside all observations from the damaged state, are utilized as the test dataset.
Regarding the first comparison, it is necessary to calculate a stationary cointegration residual by linearly combining observed frequency data. The cointegration residual can be interpreted as a long-term stable equilibrium relationship between nonstationary frequencies. Once structural damage occurs, this equilibrium relationship will no longer be maintained, and the stationary residual will become nonstationary. Following the damage detection procedure of the cointegration method34 the cointegration residual can be calculated by the Johansen test. Figure 7 displays the variation of cointegration residuals in the training and test phases, where the black scatter points represent the residuals in the training phase, the blue scatter points represent the undamaged residuals in the test phase, and the red points are the damaged residuals in the test phase.
Regarding the second comparison, the fundamental concept of PCA involves projecting the original data onto a vector space formed by the principal components (PCs) and then mapping it back to the original space while retaining a specific number of PCs. The error between the original and remapped data can be utilized to compute a novelty index for damage detection. Therefore, it is initially necessary to determine the number of PCs. This can be achieved by calculating the ratio of the cumulative sum of the eigenvalues of the leading principal components to the total sum of all eigenvalues and identifying the smallest number of components for which this ratio exceeds a predefined threshold. In this case, the threshold is set at 95%. Accordingly, the optimal number of principal components for the Wooden Bridge is determined to be 2, which account for 98.37% of the data variance. Based on the damage detection procedure of the PCA method28, the novelty index is obtained, as shown in Fig. 8.
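The selection rule for the number of PCs can be sketched as follows. The eigenvalues in the example are hypothetical and chosen only to illustrate the rule; they are not the eigenvalues of the wooden bridge data, and the function name is an assumption.

```python
def n_components_for_variance(eigenvalues, threshold=0.95):
    """Smallest number of leading PCs whose cumulative variance ratio
    reaches the threshold; returns (count, achieved ratio)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count, cumulative / total
    return len(ordered), 1.0

# Hypothetical eigenvalues of a 7 x 7 covariance matrix.
example_eigenvalues = [6.0, 0.9, 0.05, 0.03, 0.01, 0.007, 0.003]
n_pcs, explained = n_components_for_variance(example_eigenvalues)
```

With these illustrative eigenvalues the first component alone explains about 86% of the variance, so two components are retained.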
Damage detection in the wooden bridge based on the cointegration method: (a) entire residuals and (b) partial residuals. (The black scatter points represent the residuals in the training phase, the blue scatter points represent the undamaged residuals in the test phase, and the red points are the damaged residuals in the test phase).
Damage detection in the wooden bridge based on the PCA method: (a) entire novelty index and (b) partial novelty index. (The black points represent the novelty indexes in the training phase, the blue scatter points represent the undamaged novelty indexes in the test phase, and the red points are the damaged novelty indexes in the test phase).
From Fig. 7, it can be seen that the cointegration residual is stationary in the undamaged state, and almost all the residuals of the normal condition, for both the training and validation samples, are within the control limit, indicating that environmental effects are effectively removed and no damage is falsely reported in the undamaged state. In contrast, Fig. 8 shows that the PCA novelty indexes are nonstationary, and some of the undamaged novelty indexes are outside the control limit, indicating that some undamaged samples are misclassified as damaged. Furthermore, both the cointegration and PCA methods perform poorly in the damaged state, where most of the residuals and novelty indexes remain within the control limit, implying a large number of missed detections (false negatives). Regardless of the control limit, numerous cointegration residuals and novelty indexes of the damaged state are on the same scale as those of the undamaged state, which implies the low damage detectability of the cointegration and PCA methods.
In the third comparison, the conventional GMM method is employed for structural damage detection, which determines the number of Gaussian components using the BIC and estimates the model parameters using the EM algorithm. Figure 9 illustrates five BIC curves resulting from running the EM algorithm five times with different initial parameters. Similar to the numerical simulation results in Sect. 2, noticeable fluctuations are observed in the BIC curve. Furthermore, it is noted that when the number of Gaussian components is 9, the BIC value is relatively small. Opting for 9 Gaussian components, damage detection based on the conventional GMM method is conducted, as depicted in Fig. 10. In this analysis, the EM algorithm is run three times under varying initial parameters. The results indicate that due to the randomly selected initial parameters, the damage detection outcomes vary. This suggests that damage detection based on the conventional GMM method is prone to instability.
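For reference, the BIC criterion used by the conventional GMM can be sketched with the standard definition BIC = -2 ln L + p ln n, where p is the number of free parameters of a full-covariance mixture. This is the textbook form and is assumed here rather than taken from the formulation in Sect. 2; the function names are illustrative.

```python
import math

def gmm_free_parameters(n_components, dim):
    """Free parameters of a GMM with full covariance matrices."""
    weights = n_components - 1                        # weights sum to one
    means = n_components * dim
    covariances = n_components * dim * (dim + 1) // 2  # symmetric matrices
    return weights + means + covariances

def bic(log_likelihood, n_components, dim, n_samples):
    """Bayesian information criterion; lower values are preferred."""
    penalty = gmm_free_parameters(n_components, dim) * math.log(n_samples)
    return -2.0 * log_likelihood + penalty

# Nine components fitted to seven modal frequencies.
n_params = gmm_free_parameters(9, 7)
```

Because the log-likelihood returned by the EM algorithm depends on the initial parameters, repeated runs yield different BIC curves, which is consistent with the fluctuations seen in Fig. 9.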
Damage detection in the wooden bridge based on the conventional GMM method using 3 different sets of initial parameters. For each set of initial parameters, the initial mean is listed in Tables 5, 6 and 7, the initial weight of each Gaussian component is set as \(\:{\alpha\:}_{k}^{0}=1/9\), the initial covariance matrix of each Gaussian component is set as
The primary step in evaluating the performance of the proposed GMM-based method involves determining the initial GMM parameters through sequential iteration. The sequential iteration begins with the random selection of an initial point \(\:{\widehat{\varvec{\mu\:}}}_{0}\), which is here chosen as \(\:{\widehat{\varvec{\mu\:}}}_{0}={\left[24.80,\:28.46,\:39.01,\:52.57,\:63.79,\:65.33,\:78.64\right]}^{\text{T}}\). Subsequently, the mean, covariance matrix, and weight of each data set can be obtained by executing Steps 1–3 of the sequential iteration in Sect. 3.1. These outputs serve as the initial parameters for establishing the GMM model. The next step involves calculating the MSD between the sampling points and each Gaussian component. The minimum distance is then chosen as the novelty index for damage detection. Figure 11 illustrates the results of early damage detection in the wooden truss bridge using the proposed GMM method. It is evident that the majority of novelty indexes related to the training samples fall below the threshold line, indicating a correct detection of the undamaged state with minimal false alarms. Moreover, the vast majority of novelty indexes for the validation data within samples 1801–2000 are also below the threshold. These observations confirm the efficacy of the proposed method in accurately detecting the undamaged state of the wooden bridge. In the case of novelty indexes related to the damaged state in samples 2001–2114, almost all of them exceed the threshold limit, demonstrating the high damage detectability of the proposed method. Nonetheless, it is worth noting that some observations in damage case 1 (23.5 g) fall below the threshold limit, which aligns with findings in references21,51. This behavior can be attributed to the relatively small mass, accounting for only \(\:6.53\times\:{10}^{-4}\) of the total structure mass (23.5 g out of 36 kg), which causes a negligible deviation in natural frequencies from the undamaged cases.
Regardless of the threshold line, it is observed that there is a clear discrepancy between the novelty indexes of the undamaged and damaged states. This outcome reinforces the robustness of the proposed method in providing discerning novelty indexes and achieving high damage detectability.
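The novelty index described above can be sketched as follows: the MSD of a sample to every Gaussian component is computed, and the minimum is retained. The sketch uses only the standard library (a small linear solver replaces the matrix inverse) so that it stays self-contained; the function names and the two-component example are hypothetical, and a numerical library would normally be used in practice.

```python
def solve(matrix, rhs):
    """Solve matrix * y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    y = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * y[k] for k in range(row + 1, n))
        y[row] = (aug[row][n] - tail) / aug[row][row]
    return y

def msd(x, mean, cov):
    """Mahalanobis squared distance (x - mean)^T cov^{-1} (x - mean)."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    return sum(d * s for d, s in zip(diff, solve(cov, diff)))

def mmsd(x, means, covs):
    """Minimum MSD over all Gaussian components, used as the novelty index."""
    return min(msd(x, m, c) for m, c in zip(means, covs))

# Hypothetical two-component mixture in two dimensions.
means = [[0.0, 0.0], [10.0, 10.0]]
covs = [[[4.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
index = mmsd([2.0, 0.0], means, covs)
```

A sample close to any one component receives a small index, which is why measurements taken under a previously observed environmental regime remain below the threshold.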
Table 8 presents the FPR and FNR calculated using cointegration, PCA, the conventional GMM, and the proposed GMM method. The FPR and FNR values for the conventional GMM are averaged from three results shown in Fig. 10. Regarding the FPR metric, the values for all four methods are relatively small, indicating minimal misclassifications under normal conditions. Concerning the FNR metric, the proposed GMM method significantly outperforms the other methods, suggesting a higher capability for accurate damage detection. The limitations of PCA and cointegration methods in accurate damage detection can be attributed to their linear nature. These methods attempt to derive a damage index by linearly combining different frequencies, assuming a strong linear relationship between natural frequencies. However, the natural frequencies of the wooden bridge exhibit inadequate linear correlation, leading to reduced accuracy in damage detection for PCA and cointegration methods. Additionally, the damage detection results obtained by the conventional GMM are not reliable due to its sensitivity to initial parameters. In contrast, the proposed GMM method divides the data into multiple datasets, treating each Gaussian component of the mixture as a cluster. This approach effectively decomposes nonlinearly related data into multiple locally linearly related datasets, transforming nonlinear problems into linear ones. Consequently, the proposed method achieves more reliable damage detection, particularly when dealing with data exhibiting poor linear correlation, surpassing the capabilities of the cointegration and PCA methods.
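The two error metrics can be computed from a novelty index series as sketched below. The sketch assumes the usual definitions (FPR as the fraction of undamaged samples above the threshold, FNR as the fraction of damaged samples at or below it); the function name and the sample values are hypothetical.

```python
def error_rates(novelty, is_damaged, threshold):
    """Return (FPR, FNR) for a novelty index series and a control limit."""
    false_pos = sum(1 for v, d in zip(novelty, is_damaged)
                    if not d and v > threshold)
    false_neg = sum(1 for v, d in zip(novelty, is_damaged)
                    if d and v <= threshold)
    n_undamaged = sum(1 for d in is_damaged if not d)
    n_damaged = len(is_damaged) - n_undamaged
    return false_pos / n_undamaged, false_neg / n_damaged

# Hypothetical series: three undamaged samples followed by three damaged ones.
novelty = [1.0, 2.0, 9.0, 8.0, 3.0, 10.0]
labels = [False, False, False, True, True, True]
fpr, fnr = error_rates(novelty, labels, threshold=5.0)
```

In this toy series one undamaged sample exceeds the limit and one damaged sample stays below it, so both rates equal one third.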
Effects of measurement noise and initial points
The previous evaluations were conducted under laboratory conditions with minimal measurement noise, whereas field measurements often contain higher levels of noise in the recorded signals. To assess the ability of the proposed GMM method to handle real-world conditions with varying noise levels, Gaussian white noise was introduced into the original frequency data, and damage detection was performed using the noisy data.
where \(\:\varvec{f}\left(t\right)\) is the measured frequency data with noise, \(\:{E}_{p}\) is the noise level,\(\:\:{\mathcal{N}}_{\text{noise}}\) is a Gaussian random vector with zero mean and unit variance, and \(\:\sigma\:\left(\stackrel{\sim}{\varvec{f}}\right)\) is the standard deviation of the noise-free frequency data \(\:\stackrel{\sim}{\varvec{f}}\). Two levels of noise, namely 5% and 10%, were deliberately introduced into the original frequency data. The resulting correlation scatter plot between \(\:{f}_{3}\) and \(\:{f}_{7}\) at different noise levels is depicted in Fig. 12. Notably, as the noise level increases, the correlation between \(\:{f}_{3}\) and \(\:{f}_{7}\) exhibits a clear decrease, and the frequency sample points become noticeably more dispersed. This observation underscores the significant influence of measurement noise on the correlation between frequencies.
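A minimal sketch of the noise injection is given below. It assumes the common additive form in which each noise-free sample is perturbed by the noise level times a standard normal draw times the standard deviation of the noise-free series; this form is inferred from the symbol definitions above and should be treated as an assumption. The function name and the synthetic frequency series are hypothetical.

```python
import random
import statistics

def add_noise(series, noise_level, rng):
    """Add zero-mean Gaussian noise scaled by noise_level * std(series)."""
    sigma = statistics.pstdev(series)     # std of the noise-free series
    return [value + noise_level * rng.gauss(0.0, 1.0) * sigma
            for value in series]

rng = random.Random(0)                    # fixed seed for repeatability
clean = [24.8 + 0.01 * i for i in range(200)]   # synthetic frequency series
noisy_5 = add_noise(clean, 0.05, rng)     # 5% noise level
noisy_10 = add_noise(clean, 0.10, rng)    # 10% noise level
```

Applying the function independently to each frequency series reduces the correlation between pairs of frequencies as the noise level grows, which is the effect visible in Fig. 12.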
Following the procedure of the proposed GMM method in Sect. 3, the novelty indexes of the wooden bridge at different noise levels were obtained. Figures 13 and 14 present the results of damage detection at noise levels of 5% and 10%, respectively. It is evident that most novelty indexes in the undamaged state fall below the threshold limit, while some exceed it, possibly due to high noise levels. As the noise level increases, the misclassification rate also rises. However, even at a 10% noise level, most novelty indexes in the damaged state surpass the threshold value, indicating successful damage detection. Without considering the threshold limit, there is a discernible difference between the novelty indexes of the damaged and undamaged states, validating the high damage detectability of the proposed GMM method.
To demonstrate the stability and reliability of the proposed method, the damage identification process was repeated 20 times at various noise levels, given the random nature of measurement noise. The FNR and FPR metrics were computed for each of these 20 damage detections. Figure 15 displays the values of FNR and FPR for the 20 damage detections under different noise levels. From Fig. 15, several key observations can be made: (1) Similar to Fig. 14, as the noise level increases from 5 to 10%, both the FNR and FPR values also increase, indicating that higher noise levels can impact the accuracy of damage detection. (2) Despite the influence of measurement noise on the damage detection results, the FNR and FPR values remain relatively low, suggesting that the proposed GMM method exhibits excellent damage detectability even in the presence of high noise. (3) The amplitudes of FNR and FPR obtained from the 20 damage detections are remarkably stable, showing minimal fluctuations across different instances of damage detection. This stability underscores the high confidence in utilizing the proposed GMM method for damage detection. In summary, the repeated experiments at various noise levels illustrate the consistent and dependable performance of the proposed GMM method in detecting damage, as evidenced by the relatively low FNR and FPR values and their stable nature throughout the experiments.
Regarding the proposed GMM method, it is initially necessary to specify an initial point needed for the sequential iteration process. As explained in Sect. 3.1, this initial point is randomly selected from the original dataset. To investigate the impact of randomly selected initial points on the method’s performance, the damage detection process was repeated 10 times with 10 different random initial points. The values of these initial points are listed in Table 9, and Fig. 16 illustrates the corresponding FNR and FPR calculated for each damage detection. It is evident from the results that the FNR and FPR values are consistently low across all damage detections, with minimal fluctuations in their amplitudes. This finding indicates that the selection of different initial points has negligible effects on the proposed method. The proposed GMM method demonstrates reliability and robustness for SHM even under different initial points.
Real-world application 2: Z24 Bridge
To further validate the environmental robustness of the proposed GMM method for damage detection, a widely used benchmark structure, namely the Z24 bridge, is used in this section. Located in the province of Bern, Switzerland, the Z24 bridge was a three-span post-tensioned concrete bridge (Fig. 17) that was eventually demolished in 1998 to make way for a larger span railway bridge. Before its demolition, the bridge was equipped with an SHM system, which measured the vibration responses (accelerations) as well as environmental variables such as temperature, humidity, and wind characteristics. During the undamaged period of the Z24 bridge, the SHM system continuously collected data on the bridge’s behavior under normal operating conditions. Towards the end of the health monitoring period, a series of progressive damage scenarios were artificially applied to simulate different damage states. Details of the configuration of the SHM system and the implemented vibration tests can be found in reference12.
An automatic modal analysis based on frequency domain decomposition was developed to extract the natural frequencies of the first four modes under varying environmental conditions. The dataset comprises 5652 observations, with the first 4848 observations representing the healthy state and the subsequent observations from 4849 to 5652 corresponding to the damaged state. In Fig. 18, the evolution of the four natural frequencies over time is presented, with the dashed vertical line indicating the moment of damage. The results reveal that changing environmental conditions have a significant impact on the natural frequencies of the Z24 bridge, and the frequency variations caused by environmental influences outweigh those caused by the damage. Particularly, an evident peak is observed around sample 2000, coinciding with a period of very low temperatures. Moreover, a bilinear relationship between frequency and temperature is observed, which further leads to a nonlinear relationship between the frequencies44.
Damage detection
A comparative analysis was conducted to showcase the superiority of the proposed method over the PCA and cointegration methods. In the comparison, damage detection via different methods is carried out in the training and test phases. In the training phase, 80% of the undamaged observations of the modal frequencies were utilized as the training dataset. Subsequently, the remaining 20% of the undamaged observations, along with all damaged observations, were considered as the test dataset.
In the first comparative study, two popular machine learning methods, Support Vector Regression (SVR) and Gaussian Process Regression (GPR), were employed to detect structural damage under varying environmental and operational conditions (EOCs). These methods first establish a regression relationship between temperature and modal frequency during the training phase. The trained model is then used to predict frequency based on the testing temperature data. Structural damage is detected by analyzing the prediction error. Following this process, the damage detection results based on the SVR and GPR methods are obtained, as shown in Figs. 19 and 20. It can be seen that when using the frequency \(\:{f}_{1}\) as the output, both SVR and GPR methods produced a significant number of missed detections (false negatives) in the damaged stage, indicating that damage was not effectively detected. Although the damage identification results are improved when using \(\:{f}_{2}\) as the output, a large number of undamaged samples are still mistakenly detected as damaged, indicating a large number of false positives. This result shows that these two methods are inaccurate for damage detection of the Z24 bridge.
Damage detection based on the SVR method using (a) temperature and \(\:{f}_{1}\), (b) temperature and \(\:{f}_{2}\) as the input and output, respectively. (The black scatter points represent the prediction errors in the training phase, the blue scatter points represent the undamaged prediction errors in the test phase, and the red points are the damaged prediction errors in the test phase).
Damage detection based on the GPR method using (a) temperature and \(\:{f}_{1}\), (b) temperature and \(\:{f}_{2}\) as the input and output, respectively. (The black scatter points represent the prediction errors in the training phase, the blue scatter points represent the undamaged prediction errors in the test phase, and the red points are the damaged prediction errors in the test phase).
In the second comparative study, by performing the damage detection procedures of the cointegration and PCA methods, the cointegration residuals and PCA novelty index were obtained (see Fig. 21). Observing Fig. 21, it is evident that almost all the residuals and novelty indexes in the undamaged state are below the threshold limit. This indicates that both methods yield only a few false alarms in the undamaged state. However, the cointegration residuals do not show a clear shift when the damage occurs (Fig. 21(a)), resulting in the cointegration method’s failure to detect damage accurately and distinguish the damaged state from the undamaged state. Similarly, in the case of the PCA method (Fig. 21(b)), most of the novelty indexes in the damaged state do not exceed the threshold. This indicates a significant number of false negatives in detecting damage with the PCA method. Consequently, these observations highlight the poor damage detectability of both the cointegration and PCA methods under nonlinear environmental variability. Moreover, Fig. 22 illustrates the damage detection results of the conventional GMM using three sets of randomly selected initial parameters. It can be observed that due to its sensitivity to initial parameters, the detection results of the GMM vary each time, with only one result being reasonable. This highlights the instability of the conventional GMM method in detecting damage in the Z24 bridge.
Damage detection in the Z24 Bridge based on the conventional methods: (a) cointegration, (b) PCA. (The black scatter points represent the residuals in the training phase, the blue scatter points represent the undamaged residuals in the test phase, and the red points are the damaged residuals in the test phase).
Damage detection in the Z24 bridge based on the traditional GMM using 3 different sets of initial parameters. For each set of initial parameters, the three sets of initial mean are listed in Tables 10, 11 and 12, the initial weight of the k-th Gaussian component is set as \(\:{\alpha\:}_{k}^{0}=1/2\), the initial covariance matrix of the k-th Gaussian component is set as,
Figure 23 displays the novelty index obtained using the proposed GMM method. From Fig. 23, it can be seen that in the undamaged state during the test phase, most of the novelty indexes fall below the threshold limit, suggesting that these observations correspond to the same structural condition as the observations in the training dataset (normal condition); a small number of undamaged observations are out of limit and this might be attributed to large measurement noise. In the damaged state, all the novelty indexes surpass the threshold limit, and the value of the novelty index increases with the severity of the damage. This indicates that the proposed method not only detects the occurrence of structural damage but also identifies the progression of damage in the structure accurately. The clear distinction between undamaged and damaged cases is evident regardless of the threshold value, confirming the high damage detectability of the proposed GMM method in accurately assessing the state of the Z24 bridge.
For further evaluation, Table 13 presents the FPR and FNR obtained using the cointegration, PCA, conventional and the proposed GMM methods. Although the cointegration and PCA methods demonstrate reasonable performances in terms of low FPR (minimizing the occurrence of false alarms), they suffer from a significant number of misclassifications with high FNR (failing to detect actual damage). This indicates that these two methods are not adequately suitable for effectively detecting structural damage in the Z24 bridge. On the other hand, the proposed GMM method exhibits the lowest rate of misclassification and triggers only a small number of false alarms. Hence, the comparison of the results of damage detection shown in Table 13 reveals that the proposed GMM method outperforms the other methods in terms of having smaller errors and providing higher damage detectability.
In the following, the performance of the proposed GMM-based sequential iteration (SI) method is compared with clustering techniques based on the K-means and Fuzzy C-means (FCM) algorithms. For this comparison, the SI method is used to initialize the parameters of these algorithms. Damage detection results using the SI-K-means and SI-FCM techniques are presented in Fig. 24(a)–(b). It is evident that both SI-K-means and SI-FCM techniques produce a significant number of false alarms during the undamaged stage, resulting in ineffective damage detection. In contrast, the proposed method shows almost no false alarms, as observed in Fig. 23. This result may be attributed to the fact that K-means and FCM assume clusters to be spherical and of equal size, which is not ideal for data with varying shapes or sizes. On the other hand, GMM can handle elliptical clusters and accommodates different covariance structures, allowing it to better model the Z24 frequency data, which is not spherically distributed. As a result, the GMM-based method outperforms K-means and FCM in terms of clustering accuracy, leading to more reliable damage detection outcomes.
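The difference between the spherical assumption and an elliptical model can be illustrated with a two-dimensional toy case. The centers, covariance matrices and test point below are hypothetical, and the comparison uses the MSD alone rather than the full GMM posterior probabilities, so it is a simplified illustration rather than the clustering procedure used in this study.

```python
def squared_euclidean(x, center):
    """Squared Euclidean distance, as used by K-means assignments."""
    return sum((a - b) ** 2 for a, b in zip(x, center))

def msd_2d(x, center, cov):
    """Mahalanobis squared distance for a 2 x 2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - center[0], x[1] - center[1]
    # diff^T * inverse(cov) * diff, using the closed-form 2 x 2 inverse
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det

centers = {"A": (0.0, 0.0), "B": (4.0, 0.0)}
covs = {"A": [[9.0, 0.0], [0.0, 0.25]],    # elongated along the first axis
        "B": [[0.25, 0.0], [0.0, 0.25]]}   # small spherical cluster
point = (2.5, 0.0)

nearest_euclidean = min(centers, key=lambda k: squared_euclidean(point, centers[k]))
nearest_mahalanobis = min(centers, key=lambda k: msd_2d(point, centers[k], covs[k]))
```

The point lies along the long axis of cluster A: a Euclidean rule assigns it to B because the center of B is closer, whereas the covariance-aware distance assigns it to A.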
Effects of different initial points and training data
To assess the impact of initial points on the damage detection of the proposed method for the Z24 bridge, we conducted 20 repeated damage detection processes using 20 randomly chosen initial points. Figure 25 presents the calculated FNR and FPR for each damage detection, while Fig. 26 illustrates the evolution of the novelty index over time for two of the damage detections. It can be seen that the FPR and FNR obtained in each damage detection are very stable, and almost no misclassifications are observed in either the undamaged or the damaged phase. The results clearly demonstrate the remarkable stability of the proposed GMM method for damage detection, affirming its insensitivity to the initial point selection. Furthermore, when progressive damage scenarios are introduced to the bridge, there is a notable increase in the novelty index values. This observation indicates the effective detection capability of the proposed method in identifying introduced damages, with the novelty index proving to be highly sensitive to structural damage while remaining unaffected by changing environmental conditions.
In all the previous analyses, the process of damage detection involved utilizing 80% of the observations of natural frequencies associated with the normal condition as the training data. To further investigate the impact of the training dataset size on the performance of the proposed GMM damage detection method, new training datasets were created using 40%, 60%, and 100% of the observations of the undamaged natural frequencies. These training sets comprised 1939, 2909, and 4848 learning samples, respectively, covering different ranges of environmental conditions under the normal condition. Following the procedure shown in Fig. 4 for the proposed GMM damage detection method, the novelty indexes are computed in the training and test stages. Figure 27 shows the results of early damage detection in the Z24 Bridge via the proposed GMM method using different numbers of learning samples. Furthermore, Table 14 lists the FPR and FNR obtained under different training samples.
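The training set sizes quoted above follow directly from the 4848 undamaged observations. The sketch below reproduces them and builds the corresponding sample ranges, assuming that the split is chronological (the first part of the undamaged record is used for training) and that sizes are rounded to the nearest integer; both points are assumptions, and the function name is hypothetical.

```python
def chronological_split(n_undamaged, n_total, fraction):
    """Return 1-based sample ranges (train, test) for a given training fraction."""
    n_train = round(n_undamaged * fraction)
    train = range(1, n_train + 1)            # leading undamaged samples
    test = range(n_train + 1, n_total + 1)   # remaining undamaged + damaged
    return train, test

N_UNDAMAGED, N_TOTAL = 4848, 5652            # Z24 counts given in the text
sizes = {fraction: len(chronological_split(N_UNDAMAGED, N_TOTAL, fraction)[0])
         for fraction in (0.4, 0.6, 0.8, 1.0)}
```

Under these assumptions the 40% training set ends at sample 1939, before the cold period around sample 2000.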
From Fig. 27 and Table 14, it is evident that a significant number of misclassifications occur in the undamaged state when using 40% and 60% of the natural frequencies as training data. Particularly, a noticeable peak is observed around sample 2000 when utilizing 40% learning samples. Moreover, as the size of the learning samples increases, the misclassification rate decreases. Accurate and reliable damage detection can be achieved when using 80% and 100% of the observations from the normal condition. The observed phenomenon can be explained using Figs. 28 and 29, which depict the contour plots and probability distribution plots obtained by the proposed method for the frequencies \(\:{f}_{1}\) and \(\:{f}_{2}\) using different amounts of training data. Notably, when only 40% of the undamaged data is used as the training data, the number of Gaussian components and the distribution diagram differ markedly from those obtained with other training data percentages. This discrepancy arises because 40% of training samples fail to encompass frequency data under low-temperature conditions, resulting in an inaccurate probability distribution. Consequently, errors occur in the calculation of the MMSD, leading to inaccurate damage detection. This underscores the critical importance of employing a sufficient number of learning samples that cover a wide range of environmental variations to ensure the accuracy and reliability of the damage detection process.
Real-world application 3: KW51 Bridge
To further assess the effectiveness of the proposed GMM method on real structures, this section examines its application to the KW51 railway bridge. The KW51 bridge, a steel arch structure located on railway line L36N between Leuven and Brussels, Belgium, has been operational since 2003 74. In September 2018, an SHM system was installed to record various operational data, including acceleration, displacement, strain, temperature and relative humidity. From May 15 to September 27, 2019, the bridge underwent a retrofit to correct a construction error identified during inspections. This retrofit involved reinforcing the connections between the diagonals, arches, and bridge deck. Consequently, the collected data spans three distinct periods: pre-retrofit (7.5 months), during the retrofit (4.5 months), and post-retrofit (3.5 months). Figure 30 shows the variation of the modal frequencies and temperature series before and after the retrofit. It can be seen that the fluctuations in natural frequencies are strongly affected by environmental conditions, especially the higher order modal frequencies. Around the 3000th sample, the stiffness of the asphalt layer on the bridge deck increased significantly due to frost caused by the lower temperature, resulting in a sharp increase in the natural frequency.
Changes of (a) the surface temperature below the bridge deck and (b) the 14 natural frequency series over time. The period during which the retrofit is performed is indicated with a gray shading74.
To evaluate the performance of various methods for damage identification, only the first 6 frequencies with low sensitivity to damage are selected for damage identification. Similar to the Z24 bridge analysis, 80% of the undamaged data was allocated for training, while the remaining 20% and all damaged data were used for testing. Figure 31 (a)-(d) presents the damage detection results for PCA, cointegration, conventional GMM, and the proposed GMM methods. A numerical comparison, summarized in Table 15, highlights the false positive rate (FPR) and false negative rate (FNR) for each approach. PCA and cointegration demonstrated the weakest performance, with high false negative rates, indicating their limited effectiveness in identifying damage in the KW51 bridge. The conventional GMM method showed improved results compared to PCA and cointegration; however, fluctuations in the novelty index during the undamaged phase suggest its limited ability to remove environmental influences. The proposed GMM method, by contrast, achieved the best performance, with the lowest error rates. During the undamaged phase, almost no novelty indices exceeded the control threshold, whereas all novelty indices during the damaged phase surpassed it. These findings confirm the proposed method’s effectiveness in eliminating environmental effects and accurately detecting structural damage.
The choice of threshold determination method is crucial for reliable damage detection. Figure 32 compares four threshold methods—Gaussian distribution, chi-square distribution, F distribution, and extreme value distribution—with their respective FPR and FNR summarized in Table 16. The extreme value distribution method demonstrated superior performance, achieving the most accurate results in damage identification. While the other three methods produced slightly lower FNRs, their higher FPRs increased the likelihood of false alarms during the undamaged phase. This limitation arises from the assumption that real-world data strictly follow predefined statistical distributions. The extreme value distribution method, by determining thresholds based on the extreme values of the data, adapts more effectively to the inherent characteristics of the data, reducing false alarms and improving reliability in practical applications.
Limitations and drawbacks
From the above results, the proposed method has the following shortcomings:
(i) Insensitivity to minor damage. For instance, in the example of Fig. 11, the algorithm cannot detect damage case 1, which involves only minor damage (23.5 g). A possible improvement is to adopt a more sensitive damage indicator, such as the weighted Mahalanobis distance or the threshold-normalized Mahalanobis distance51.
(ii) Non-stationarity of the obtained MMSD time series. As shown in Figs. 11 and 23, the minimum Mahalanobis squared distance obtained by the proposed method exhibits non-stationary fluctuations, which indicates that the environmental effects may not be completely removed. A possible solution is to use the switching cointegration method to reduce the non-stationarity22.
(iii) The need for sufficient training data under varying environments. As shown in Fig. 27, damage detection is inaccurate when 40% or 60% of the natural frequencies are used as training data, whereas damage is reliably detected when 80% or 100% are used. This implies that insufficient training data will cause the proposed damage identification method to fail.
Conclusion
In this article, a novel method for detecting structural damage under changing environments is proposed. The method combines sequential iteration and GMM to achieve accurate and efficient results. First, the initial model parameters and the number of Gaussian components in the GMM are determined using the sequential iteration approach, which provides a reliable starting point for the subsequent estimation. Next, the EM algorithm is employed to construct a baseline GMM from the natural frequencies observed under varying environmental conditions. This baseline GMM serves as a reference for identifying deviations caused by structural damage. To quantify the deviation from the baseline, the MSD between each sample point and the center of each Gaussian component in the GMM is calculated. The minimum MSD is then used as the novelty index, enabling structural damage to be detected effectively. To validate the practicality and effectiveness of the proposed GMM damage detection method, the frequency measurements from two real-world structures (i.e., the wooden truss bridge and the Z24 bridge) are employed.
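The novelty index summarized above can be sketched in a few lines. The example assumes that the component means and covariance matrices of the baseline GMM are already available (for instance from an EM fit such as the `means_` and `covariances_` attributes of scikit-learn's `GaussianMixture`); the function name `mmsd` and the numerical values are hypothetical and serve only to illustrate the calculation.

```python
import numpy as np


def mmsd(samples, means, covariances):
    """Minimum Mahalanobis squared distance of each sample over all components."""
    samples = np.atleast_2d(np.asarray(samples, dtype=float))
    msd = np.empty((samples.shape[0], len(means)))
    for k, (mu, cov) in enumerate(zip(means, covariances)):
        diff = samples - np.asarray(mu, dtype=float)
        # Solve cov @ y = diff^T instead of forming the inverse explicitly.
        solved = np.linalg.solve(np.asarray(cov, dtype=float), diff.T).T
        msd[:, k] = np.einsum("ij,ij->i", diff, solved)  # row-wise diff . solved
    return msd.min(axis=1)


# Hypothetical two-component baseline model in two dimensions.
means = [[0.0, 0.0], [10.0, 10.0]]
covariances = [np.eye(2), np.array([[4.0, 0.0], [0.0, 1.0]])]

samples = [[1.0, 0.0], [12.0, 10.0], [5.0, 5.0]]
novelty_index = mmsd(samples, means, covariances)
print(novelty_index)  # [ 1.    1.   31.25]
```

The first two samples lie close to one of the component centers and therefore receive a small index, while the third lies between the two components and receives a large one, which is the behavior the control chart relies on.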
According to the results, the following detailed conclusions can be drawn: (1) The proposed method effectively discriminates between damaged and undamaged states in real-world bridges with severe environmental variability. It exhibits high detectability and few false alarms, even when the observations are affected by high measurement noise. (2) The proposed GMM damage detection method is not sensitive to the initial point selection. It consistently yields stable and reliable detection results, even when different initial points are used in the sequential iteration step. (3) Comparative studies of different methods demonstrate that the proposed approach outperforms conventional cointegration and PCA methods. It shows a smaller FPR and FNR while achieving higher damage detectability. (4) Regarding the issue of learning sample size, the comparative study on the Z24 Bridge reveals that more reliable and appropriate damage detection results are obtained when the training dataset covers a wider range of environmental conditions. Specifically, the proposed method does not yield reasonable damage detection results when the percentage of training samples is 60% or less.
Data availability
All data generated or analysed during this study are included in this published article.
References
Han, S. et al. Non-destructive testing and structural health monitoring technologies for carbon fiber reinforced polymers: a review. Nondestructive Test. Evaluation. 39, 725–761 (2024).
Lai, X. et al. Digital twin-based non-destructive testing for structural health monitoring of bridges. Nondestructive Test. Evaluation. 39, 57–74 (2024).
Pallarés, F. J., Betti, M., Bartoli, G. & Pallarés, L. Structural health monitoring (SHM) and nondestructive testing (NDT) of slender masonry structures: A practical review. Constr. Build. Mater. 297, 123768 (2021).
Hou, R. & Xia, Y. Review on the new development of vibration-based damage identification for civil engineering structures: 2010–2019. J. Sound Vib. 491, 115741 (2021).
Ceravolo, R., Coletta, G., Miraglia, G. & Palma, F. Statistical correlation between environmental time series and data from long-term monitoring of buildings. Mech. Syst. Signal Process. 152, 107460 (2021).
Anastasopoulos, D. et al. Vibration-based monitoring of an FRP footbridge with embedded fiber-Bragg gratings: influence of temperature vs. damage. Compos. Struct. 287, 115295 (2022).
Xia, Y., Hao, H., Zanardo, G. & Deeks, A. Long term vibration monitoring of an RC slab: temperature and humidity effect. Eng. Struct. 28, 441–452 (2006).
Li, H., Li, S., Ou, J. & Li, H. Modal identification of bridges under varying environmental conditions: temperature and wind effects. Struct. Control Health Monit. 17, 495–512 (2010).
Zhang, Y., Kurata, M. & Lynch, J. P. Long-term modal analysis of wireless structural monitoring data from a suspension bridge under varying environmental and operational conditions: system design and automated modal analysis. J. Eng. Mech. 143, 04016124 (2017).
Aravanis, T. C., Sakellariou, J. & Fassois, S. Damage precise localization under varying operating conditions via the vibration-data-based functional model method: formulation and experimental validation. J. Sound Vib. 535, 117078 (2022).
Maes, K., Van Meerbeeck, L., Reynders, E. P. B. & Lombaert, G. Validation of vibration-based structural health monitoring on retrofitted railway bridge KW51. Mech. Syst. Signal Process. 165, 108380 (2022).
Peeters, B. & De Roeck, G. One-year monitoring of the Z24‐Bridge: environmental effects versus damage events. Earthq. Eng. Struct. Dynamics. 30, 149–171 (2001).
Cross, E., Koo, K., Brownjohn, J. & Worden, K. Long-term monitoring and data analysis of the Tamar Bridge. Mech. Syst. Signal Process. 35, 16–34 (2013).
Hu, W. H. et al. Comparison of different statistical approaches for removing environmental/operational effects for massive data continuously collected from footbridges. Struct. Control Health Monit. 24, e1955 (2017).
Moser, P. & Moaveni, B. Environmental effects on the identified natural frequencies of the Dowling hall footbridge. Mech. Syst. Signal Process. 25, 2336–2357 (2011).
Oh, C. K. & Sohn, H. Damage diagnosis under environmental and operational variations using unsupervised support vector machine. J. Sound Vib. 325, 224–239 (2009).
Bayane, I., Leander, J. & Karoumi, R. An unsupervised machine learning approach for real-time damage detection in bridges. Eng. Struct. 308, 117971 (2024).
Han, Q., Ma, Q., Dang, D. & Xu, J. Modal parameters prediction and damage detection of space grid structure under environmental effects using stacked ensemble learning. Struct. Control Health Monit. 5687265 (2023).
Mousavi, M. & Gandomi, A. H. Structural health monitoring under environmental and operational variations using MCD prediction error. J. Sound Vib. 512, 116370 (2021).
Wah, W. S. L., Chen, Y. T. & Owen, J. S. A regression-based damage detection method for structures subjected to changing environmental and operational conditions. Eng. Struct. 228, 111462 (2021).
Soo, L., Wah, W. & Xia, Y. Elimination of outlier measurements for damage detection of structures under changing environmental conditions. Struct. Health Monit. 21, 320–338 (2022).
Huang, J. Z., Li, D. S. & Li, H. N. A new regime-switching cointegration method for structural health monitoring under changing environmental and operational conditions. Measurement 212, 112682 (2023).
Worden, K., Baldacchino, T., Rowson, J. & Cross, E. J. Some recent developments in SHM based on nonstationary time series analysis. Proc. IEEE. 104, 1589–1603 (2016).
Diao, Y., Sui, Z. & Guo, K. Structural damage identification under variable environmental/operational conditions based on singular spectrum analysis and statistical control chart. Struct. Control Health Monit. 28, e2721 (2021).
Wang, X., Li, L., Beck, J. L. & Xia, Y. Sparse bayesian factor analysis for structural damage detection under unknown environmental conditions. Mech. Syst. Signal Process. 154, 107563 (2021).
Huang, J., Yuan, S. J., Li, D. & Li, H. N. A kernel canonical correlation analysis approach for removing environmental and operational variations for structural damage identification. J. Sound Vib. 548, 117516 (2023).
Yan, A. M., Kerschen, G., De Boe, P. & Golinval, J. C. Structural damage diagnosis under varying environmental conditions - part II: local PCA for non-linear cases. Mech. Syst. Signal Process. 19, 865–880 (2005).
Yan, A. M., Kerschen, G., De Boe, P. & Golinval, J. C. Structural damage diagnosis under varying environmental conditions—part I: a linear analysis. Mech. Syst. Signal Process. 19, 847–864 (2005).
Reynders, E., Wursten, G. & De Roeck, G. Output-only structural health monitoring in changing environmental conditions by means of nonlinear system identification. Struct. Health Monit. 13, 82–93 (2014).
Santos, A., Figueiredo, E., Silva, M., Sales, C. & Costa, J. Machine learning algorithms for damage detection: Kernel-based approaches. J. Sound Vib. 363, 584–599 (2016).
Jin, S. S. & Jung, H. J. Vibration-based damage detection using online learning algorithm for output-only structural health monitoring. Struct. Health Monit. 17, 727–746 (2018).
Sen, D., Erazo, K., Zhang, W., Nagarajaiah, S. & Sun, L. On the effectiveness of principal component analysis for decoupling structural damage and environmental effects in bridge structures. J. Sound Vib. 457, 280–298 (2019).
Bisheh, H. B. & Amiri, G. G. Structural damage detection based on variational mode decomposition and kernel PCA-based support vector machine. Eng. Struct. 278, 115565 (2023).
Cross, E., Manson, G., Worden, K. & Pierce, S. Features for damage detection with insensitivity to environmental and operational variations. Proc. Royal Soc. A: Math. Phys. Eng. Sci. 468, 4098–4122 (2012).
Shi, H., Worden, K. & Cross, E. J. A regime-switching cointegration approach for removing environmental and operational variations in structural health monitoring. Mech. Syst. Signal Process. 103, 381–397. https://doi.org/10.1016/j.ymssp.2017.10.013 (2018).
Dao, P. B. & Staszewski, W. J. Cointegration and how it works for structural health monitoring. Measurement 209, 112503 (2023).
Deraemaeker, A. & Worden, K. A comparison of linear approaches to filter out environmental effects in structural health monitoring. Mech. Syst. Signal Process. 105, 1–15 (2018).
Figueiredo, E. & Cross, E. Linear approaches to modeling nonlinearities in long-term monitoring of bridges. J. Civil Struct. Health Monit. 3, 187–194 (2013).
Figueiredo, E., Radu, L., Worden, K. & Farrar, C. R. A bayesian approach based on a Markov-chain Monte Carlo method for damage detection under unknown sources of variability. Eng. Struct. 80, 1–10 (2014).
Hsu, T. Y. & Loh, C. H. Damage detection accommodating nonlinear environmental effects by nonlinear principal component analysis. Struct. Control Health Monit. 17, 338–354 (2010).
Abbassi, A., Römgens, N., Tritschel, F. F., Penner, N. & Rolfes, R. Evaluation of machine learning techniques for structural health monitoring using ultrasonic guided waves under varying temperature conditions. Struct. Health Monit. 22, 1308–1325 (2023).
Yang, C. & Liu, Y. Detecting the damage of bridges under changing environmental conditions using the characteristics of the nonlinear narrow dimension of damage features. Mech. Syst. Signal Process. 159, 107842 (2021).
Mousavi, M. & Gandomi, A. H. Prediction error of Johansen cointegration residuals for structural health monitoring. Mech. Syst. Signal Process. 160, 107847 (2021).
Peng, Z., Li, J. & Hao, H. Structural damage detection via phase space based manifold learning under changing environmental and operational conditions. Eng. Struct. 263, 114420 (2022).
Sarmadi, H., Entezami, A. & De Michele, C. Probabilistic data self-clustering based on semi-parametric extreme value theory for structural health monitoring. Mech. Syst. Signal Process. 187, 109976 (2023).
Sawant, S., Sethi, A., Banerjee, S. & Tallur, S. Unsupervised learning framework for temperature compensated damage identification and localization in ultrasonic guided wave SHM with transfer learning. Ultrasonics 130, 106931 (2023).
Du, F., Wu, S., Xing, S., Xu, C. & Su, Z. Temperature compensation to guided wave-based monitoring of bolt loosening using an attention-based multi-task network. Struct. Health Monit. 22, 1893–1910 (2023).
Kashyap, P. et al. Unsupervised deep learning framework for temperature-compensated damage assessment using ultrasonic guided waves on edge device. Sci. Rep. 14, 3751 (2024).
Sawant, S., Patil, S., Thalapil, J., Banerjee, S. & Tallur, S. Temperature variation compensated damage classification and localisation in ultrasonic guided wave SHM using self-learnt features and Gaussian mixture models. Smart Mater. Struct. 31, 055008 (2022).
Kullaa, J. Structural health monitoring under nonlinear environmental or operational influences. Shock Vib. 2014, 863494 (2014).
Zang, J. G., Huang, H. B., Sun, Z. G. & Wang, D. S. Subdomain principal component analysis for damage detection of structures subjected to changing environments. Eng. Struct. 288, 116265 (2023).
Zhang, X., Yang, J., Wang, T., Xie, J. & Tian, L. Quantitative evaluation of crack damage under a variable temperature environment based on a Gaussian mixture model. Struct. Health Monit. 22, 2064–2081 (2022).
Ren, Y., Qiu, L., Yuan, S. & Fang, F. Gaussian mixture model and delay-and-sum based 4D imaging of damage in aircraft composite structures under time-varying conditions. Mech. Syst. Signal Process. 135, 106390 (2020).
Daneshvar, M. H., Sarmadi, H. & Yuen, K. V. A locally unsupervised hybrid learning method for removing environmental effects under different measurement periods. Measurement 208, 112465 (2023).
Ni, Y. Q. & Chen, R. Strain monitoring based bridge reliability assessment using parametric bayesian mixture model. Eng. Struct. 226, 111406 (2021).
Prawin, J. & Bhaskara, G. V. Outlier analysis combined with Gaussian mixture model for structural damage detection. Materials Today: Proceedings (2023).
Jacobs, W. R., Edwards, H., Li, P., Kadirkamanathan, V. & Mills, A. R. Gas turbine engine condition monitoring using Gaussian mixture and hidden Markov models. Int. J. Prognostics Health Manage. 9, 1–15 (2018).
Qiu, L., Fang, F. & Yuan, S. Improved density peak clustering-based adaptive Gaussian mixture model for damage monitoring in aircraft structures under time-varying conditions. Mech. Syst. Signal Process. 126, 281–304 (2019).
Avendaño-Valencia, L. D. & Fassois, S. D. Damage/fault diagnosis in an operating wind turbine under uncertainty via a vibration response Gaussian mixture random coefficient model based framework. Mech. Syst. Signal Process. 91, 326–353 (2017).
Schwarz, G. Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978).
Qiu, L., Fang, F., Yuan, S., Boller, C. & Ren, Y. An enhanced dynamic Gaussian mixture model–based damage monitoring method of aircraft structures under environmental and operational conditions. Struct. Health Monit. 18, 524–545 (2019).
Kang, Y. Structural Damage Monitoring Method Based on Adaptive Hierarchical Partitioned Gaussian Mixture Model (Nanjing University of Aeronautics and Astronautics, 2021).
Figueiredo, M. A. T. & Jain, A. K. Unsupervised learning of finite mixture models. IEEE Trans. Pattern Anal. Mach. Intell. 24, 381–396 (2002).
Wang, Q., Ma, S. & Yue, D. Identification of damage in composite structures using Gaussian mixture model-processed lamb waves. Smart Mater. Struct. 27, 045007 (2018).
Yi, Z., Pan, N., Liu, Y. & Guo, Y. Study of laser displacement measurement data abnormal correction algorithm. Eng. Comput. 34, 123–133 (2017).
Bishop, C. M. & Nasrabadi, N. M. Pattern Recognition and Machine Learning Vol. 4 (Springer, 2006).
Qiu, T. EM Algorithm and its Application Research Based on Gaussian Mixture Model (University of Electronic Science and Technology of China, 2015).
Miguel, L. P., Teloli, R. O., da Silva, S. & Chevallier, G. Probabilistic machine learning for detection of tightening torque in bolted joints. Struct. Health Monit. 21, 2136–2151 (2022).
Entezami, A., Sarmadi, H. & Behkamal, B. Long-term health monitoring of concrete and steel bridges under large and missing data by unsupervised meta learning. Eng. Struct. 279, 115616 (2023).
Sarmadi, H. & Yuen, K. V. Early damage detection by an innovative unsupervised learning method based on kernel null space and peak-over‐threshold. Computer Aided Civil Infrastructure Eng. 36, 1150–1167 (2021).
Haan, L. & Ferreira, A. Extreme Value Theory: an Introduction Vol. 3 (Springer, 2006).
Sarmadi, H. & Karamodin, A. A novel anomaly detection method based on adaptive Mahalanobis-squared distance and one-class kNN rule for structural health monitoring under environmental effects. Mech. Syst. Signal Process. 140, 106495 (2020).
Kullaa, J. Distinguishing between sensor fault, structural damage, and environmental or operational effects in structural health monitoring. Mech. Syst. Signal Process. 25, 2976–2989 (2011).
Anastasopoulos, D., De Roeck, G. & Reynders, E. P. One-year operational modal analysis of a steel bridge from high-resolution macrostrain monitoring: influence of temperature vs. retrofitting. Mech. Syst. Signal Process. 161, 107951 (2021).
Acknowledgements
This research work is jointly supported by the National Natural Science Foundation of China (Grant No. 52078284, 52308318), Natural Science Foundation of Guangdong Province (Grant No. 2023A1515012230, 2021A1515011770), Guangdong Province Special Fund for Science and Technology (“Big project + task list”) Project (Grant No. STKJ2023043), STU Scientific Research Foundation for Talents (Grant No. NTF18012, NTF21019), Anhui international joint research center of data diagnosis and smart maintenance on bridge structures (Grant No. 2022AHGHYB01), Special Fund Project for Central Government to Guide Local Science and Technology Development (Grant No. STKJ2024058) and Open Fund for State Key Laboratory of Coastal and Offshore Engineering (Grant No. LP2407).
Author information
Contributions
Jie-zhong Huang and Jian Yang wrote the main manuscript text; Dong-sheng Li and Wei-chen Kong reviewed the manuscript; Ya-fei Wang collected the experimental data and performed the data analysis.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Huang, Jz., Yang, J., Li, Ds. et al. A novel damage detection method based on sequential iteration and Gaussian mixture model for structural health monitoring under environmental effects. Sci Rep 15, 23823 (2025). https://doi.org/10.1038/s41598-025-08206-9