Abstract
In the data obtained by laser interferometric gravitational-wave detectors, transient noise with non-stationary and non-Gaussian features occurs at a high rate. It often causes problems such as detector instability and the hiding and/or imitation of gravitational-wave signals. Transient noise exhibits various characteristics in the time–frequency representation, which are considered to be associated with environmental and instrumental origins. Classifying transient noise can offer clues for exploring its origin and for improving the performance of the detector. One approach to this task is supervised learning. However, supervised learning generally requires annotation of the training data, and it is difficult to ensure objectivity in the classification or to identify new classes. By contrast, unsupervised learning reduces the annotation work for the training data, improves the objectivity of the classification, and allows new classes to emerge. In this study, we propose an unsupervised learning architecture for the classification of transient noise that combines a variational autoencoder and invariant information clustering. To evaluate the effectiveness of the proposed architecture, we used the dataset (time–frequency two-dimensional spectrogram images and labels) of the Laser Interferometer Gravitational-wave Observatory (LIGO) first observation run prepared by the Gravity Spy project. The classes provided by our proposed unsupervised learning architecture were consistent with the labels annotated by the Gravity Spy project, and the results suggest the existence of classes not yet revealed.
Introduction
Gravitational waves are distortions of space–time that propagate (with high probability) at the speed of light. They are emitted during events such as the coalescence of compact-star binaries and supernova explosions. The first observation of a gravitational wave, from the coalescence of a black hole binary, was achieved in September 2015 by the Laser Interferometer Gravitational-wave Observatory (LIGO)1, located in Livingston, Louisiana and Hanford, Washington in the USA2. Subsequently, LIGO and Virgo3 in Europe conducted three international joint observation runs and observed as many as 90 gravitational-wave events from the coalescence of compact binaries4,5,6,7. Moreover, GEO6008 in Germany and KAGRA9,10,11,12 in Japan conducted a 2-week observation run (O3GK) in April 202013,14. The fourth observation run (O4) is planned to be conducted jointly by LIGO, Virgo, and KAGRA.
When searching for a gravitational-wave signal in the data from the interferometers, suitable techniques for separating the gravitational waves from instrumental noise are essential because gravitational-wave signals are generally weaker than the detector noise. The gravitational-wave detector is sensitive to environmental and instrumental conditions (such as ground motion, air pressure, optics suspensions, and fluctuations in the laser, vacuum, and mirrors). Consequently, non-stationary and non-Gaussian noise, called “transient noise”, frequently appears in the detector. Transient noise causes instability in the detector and can hide and/or imitate gravitational-wave signals. The LIGO and Virgo collaboration reported that transient noise with a signal-to-noise ratio \(> 6.5\) occurred at LIGO Livingston (LLO) at a rate of 1.10 events per minute in the first half of the third observation run (O3a), between 1 April 2019, 15:00 UTC and 1 October 2019, 15:00 UTC5, and at a rate of 1.17 events per minute in the second half (O3b), between 1 November 2019, 15:00 UTC and 27 March 2020, 17:00 UTC7.
Transient noise has various time–frequency characteristics that are related to its causes in the detector. Classifying transient noise could provide clues to explore its origins and improve the performance of the detector. The Gravity Spy project15,16,17,18 is one such classification effort. The Gravity Spy project used the Omicron software19 to identify transient noise in the time-series data. Thereafter, Omega Scan20 was used to create a time–frequency spectrogram around each identified transient noise as a two-dimensional (2D) image. A subset of these 2D images was annotated with 22 types of labels associated with the characteristics or causes of the transient noise, through crowdsourced analysis by LIGO detector characterisation experts and volunteer citizen scientists; both images and labels were recorded. Finally, the transient noise in the remaining images was classified by supervised learning using the pre-classified images and labels. As this process shows, data annotation for machine learning is highly labour-intensive.
Previous studies21 using unsupervised classification grouped together similar transient noise in the Gravity Spy dataset16. Bahaadini et al. used the DIRECT method22 to analyse the feature embedding learned from the Gravity Spy dataset16 and observed a class of transient noise different from the existing classes. Unsupervised clustering applying transfer learning23 exhibited a new class of transient noise in addition to the 22 classes of the Gravity Spy project. Moreover, supervised classification using the latest O3 observation dataset presented a new class of transient noise17.
As unsupervised learning does not require any pre-assigned labels for the training dataset, this approach is expected to reduce the annotation work for the training data, increase the objectivity of the classification, and even identify new classes of transient noise. Unsupervised learning is also useful in various fields, such as text categorisation, feature representation, and clustering24,25,26,27. In this study, we focus on unsupervised learning using a deep convolutional neural network (CNN) and propose a classification architecture for transient noise. Our proposed architecture consists of two processes: feature learning and classification. In the feature learning process, the features of transient noise are extracted from the time–frequency spectrogram images (2D images) using a variational autoencoder (VAE)28,29. In the classification process, invariant information clustering (IIC)30 is used to classify images of the transient noise using the features extracted by the encoder of the pre-learned VAE. We applied the proposed architecture to the dataset16 created by the Gravity Spy project from the LIGO observation run 1 (O1)4 as our input images, examined the validity of the unsupervised classification results, and analysed their correspondence with the labels of the Gravity Spy project.
Results
This section consists of two subsections: the training process and the evaluation of the unsupervised learning architecture. The Gravity Spy dataset of LIGO O1, developed by the Gravity Spy project and shown in Fig. 1, was used for training our proposed architecture. This dataset contains a total of 8535 transient noise events, each recorded at four time durations: 0.5, 1.0, 2.0, and 4.0 s. Each sample carries one of 22 types of labels related to the origins or characteristics of the transient noise. The labels, annotated by the Gravity Spy project on Zooniverse, the online citizen-science platform, were used only when evaluating the training results of the proposed architecture. The pre-processing of the dataset is described in the “Pre-processing” section.
(a) Example of a 2D time–frequency spectrogram image of transient noise in the Gravity Spy dataset. For each transient noise, four time durations (0.5, 1.0, 2.0, and 4.0 s from the left of the figure) are recorded around the centre time. (b) Table showing all the classes, the number of samples, and their ratio to the total number in the Gravity Spy dataset. There are 22 classes in total; each of 21 classes is named after a cause of occurrence or a characteristic shape on the spectrogram of the transient noise. The remaining class is “None_of_the_Above”, for noise that does not belong to any other class. (c) Example image for each class in the Gravity Spy dataset. The figure shows 12 of the 22 classes of transient noise at the 0.5 s duration.
Training process of our architecture
We investigated the training parameters of the VAE as follows. The dimension of the feature variable \(\varvec{z}\) was one of 64, 128, 256, 512, and 1024; the training size rate was in the range of [0.6, 0.9] in increments of 0.1; the learning rate of the Adam31 optimiser with parameters \(\beta _1 = 0.9\), \(\beta _2 = 0.999\) (coefficients used for computing running averages of the gradient and its square) and \(\epsilon =10^{-8}\) (term added to the denominator to improve numerical stability) was in the range of \([5\times 10^{-7}, 5 \times 10^{-2}]\) in increments of one order of magnitude; the minibatch size was in the range of [32, 128] in increments of 32. Maximising the lower bound (3) was the training objective; equivalently, we minimised \(\delta \equiv -\sum _{i}^ {N}\mathcal {L}(\varvec{x}^{(i)}, \varvec{\theta }, \varvec{\phi })\). The dimension of \(\varvec{z}\) and the training size rate do not have a significant effect on the value of \(\delta\). By contrast, the learning rate and minibatch size are related to the value of \(\delta\) and its stability. Representative training parameters are shown on the left side of Fig. 2a, and the training curves using these parameters are shown in Fig. 2b. In Case 1 (black line in Fig. 2b), the learning rate seems too low and \(\delta\) does not decrease. In Case 2 (grey line), the training is not stable, showing fluctuations in the curve, although \(\delta\) decreases compared with Case 1. In Case 3 (blue line), \(\delta\) decreases in both training and evaluation and appears stable after 100 epochs. Considering these results, the parameters of Case 3 were used in the proposed architecture for the remainder of this study.
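As a concrete illustration, the optimiser configuration described above can be set up in PyTorch as follows. This is a minimal sketch: the model is a stand-in, and the learning rate shown is an illustrative value from the searched range, not the paper's exact Case 3 setting.

```python
import torch

# Stand-in model; only the optimiser configuration below reflects the
# settings described in the text.
model = torch.nn.Linear(512, 512)

# Adam with the coefficients quoted above: beta_1 = 0.9 and beta_2 = 0.999
# (running averages of the gradient and its square) and eps = 1e-8 (added
# to the denominator for numerical stability).
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=5e-4,             # illustrative value from the range [5e-7, 5e-2]
    betas=(0.9, 0.999),
    eps=1e-8,
)
```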
Left (a) Training parameters for the VAE of the proposed architecture. The dimension of \(\varvec{z}\) is the number of outputs of the encoder. The training size rate is the fraction of the total data used as input at training; for the architecture evaluation, the input size is set to \((1 - \text {training size rate})\). The learning rate is the initial learning rate, and the optimiser used is Adam31. Right (a) Training parameters for the IIC of the proposed architecture. The number of output classes is set to the number of classes to be classified. The classifier number refers to the multiple classifiers that are used to improve performance via spectral clustering. (b) Training curves during the training and evaluation of the VAE. The solid and dashed lines show the training objective \(\delta \equiv -\sum _{i}^ {N}\mathcal {L}(\varvec{x}^{(i)}, \varvec{\theta }, \varvec{\phi })\) during training and evaluation, respectively. (c) Reconstructed images generated by the decoder of the VAE at 100 epochs in Case 3.
Examples of the reconstructed images of the transient noise generated by the decoder of the VAE at 100 epochs are shown in Fig. 2c. The characteristics of the reconstructed images seem similar to those of the input images. We confirmed a similar tendency for all the other inputs and reconstructed images. Therefore, the encoder of the VAE at 100 epochs was applied to the IIC for the classification of the transient noise.
Furthermore, the validity of the features learned by the VAE is demonstrated in the Supplemental Material, “Feature Visualization of Transient Noise using t-SNE” section, where the feature variables \(\varvec{z}\) are visualised by projection with t-SNE.
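A minimal sketch of such a visualisation, assuming the encoder outputs a feature matrix of shape (N, 512) (here replaced by random stand-in data), might look as follows:

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Stand-in for the (N, 512) feature means produced by the VAE encoder.
z = np.random.randn(2000, 512).astype(np.float32)

# Project the 512-dimensional features to 2D for visual inspection.
z_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(z)

plt.scatter(z_2d[:, 0], z_2d[:, 1], s=2)
plt.title("t-SNE projection of VAE features")
plt.show()
```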
After training the VAE, the training parameters of the IIC were investigated using the pre-trained encoder. The number of output classes was in the range of [22, 100] in increments of 2; the number of over-clustering classes was in the range of [50, 500] in increments of 50; the classifier number was one of 3, 5, 10, and 20; the learning rate of the Adam optimiser with parameters \(\beta _1 = 0.9\), \(\beta _2 = 0.999\) and \(\epsilon =10^{-8}\) was in the range of \([5\times 10^{-7}, 5 \times 10^{-2}]\) in increments of one order of magnitude; the minibatch size was in the range of [64, 256] in increments of 32. During training, the mutual information (4) was high for 30 to 40 output classes, which is consistent with the presence of implicit subclasses in the dataset. Changing the number of over-clustering classes or the number of classifiers did not noticeably change the mutual information. In this study, the IIC parameters shown on the right side of Fig. 2a were used for the classification. For the spectral clustering over multiple classifiers, the number of classifiers was set to \(K=5\) and the number of classes to \(C=36\); these values give the best classification performance in terms of the accuracy discussed in the “Discussion” section. The training for the VAE and IIC with a 128 minibatch size took approximately 1.0 h/100 epochs and approximately 0.3 h/100 epochs, respectively, using two NVIDIA GeForce RTX 2080 Ti GPUs, an Intel Xeon CPU E5-2637 v4 (8 cores), and 125 GB of main memory.
Evaluation of our architecture
The evaluation results are presented in this section. The proposed architecture described in the “Proposed architecture” section was trained using the pre-processed dataset described in the “Pre-processing” section.
Figure 3 shows a randomly selected image from each class (representative image) together with images that have a high degree of similarity to it within the class. These similar images are obtained from the cosine similarity32 between the representative image and the other images, using the affinity matrix calculated for the spectral clustering.
The representative and similar images in all the classes were classified by unsupervised learning. The representative image, denoted by (i), is randomly selected from class \(i\in c = \{0,\dots , 35\}\), and the images most similar to it are shown to its right. The cosine similarity to the representative image of class (i) is given at the top of each image.
The representative images appear to have distinct characteristics for each class, and the similar images are close to their representative images. Moreover, the images of class (15) in Fig. 3 show that the classifier recognises the same class even when the data are shifted in the time direction. Therefore, the pre-processing of the dataset achieves training that does not depend on perturbations in the time direction.
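For illustration, the ranking of similar images can be sketched as follows; the vectors h are assumed to be the per-image rows of the matrix used for the spectral clustering (random stand-in data here), and the helper function is hypothetical:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Cosine similarity between vector a and each row of matrix B."""
    return (B @ a) / (np.linalg.norm(B, axis=1) * np.linalg.norm(a) + 1e-12)

h = np.random.rand(100, 180)   # stand-in per-image vectors
idx_rep = 0                    # index of the randomly chosen representative
sims = cosine_similarity(h[idx_rep], h)

# Indices of the five most similar images, descending, excluding itself.
ranking = np.argsort(-sims)[1:6]
```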
To investigate the correspondence between the results of supervised and unsupervised learning, a confusion matrix using the Gravity Spy labels is shown in Fig. 4. For several classes (classes (6) (“1080 Lines”), (8) (“Repeating_Blips”), (14) (“Chirp”), (18) (“Helix”), and (24) (“Scratchy”)), the unsupervised learning gathers each Gravity Spy label (noted in parentheses) into a single class. In addition, the output images of the classifier shown in Fig. 3 are similar to those of the Gravity Spy labels.
Confusion matrix of the classification results of the proposed architecture. The vertical axis represents the labels and the number of samples in the Gravity Spy dataset. The lower and upper horizontal axes denote the number of images classified into each unsupervised class and the labels of the unsupervised classes, respectively. Each column is coloured by the ratio of the Gravity Spy-labelled images classified into the unsupervised class (i). The classes that are separated from the Gravity Spy labels on the confusion matrix, such as classes (0), (13), (26), (32), (34), and (35), also show the ratio values in the matrix. The potential number of classes for each Gravity Spy label, as estimated by the unsupervised learning, is shown in the right column of the figure. The notation “1” (in white cells) indicates that the class labelled by Gravity Spy matches a single class in the unsupervised learning, and an inequality sign (in light grey cells) indicates that the class is separated into multiple classes by the unsupervised learning. The notation “0” (in dark grey cells) indicates a class left unclassified for this training and dataset, and the “–” notation indicates data that do not belong to any class of the unsupervised learning.
The “Scattered_Light” class is separated into classes (2), (3), (11), and (16) on the confusion matrix. These are classified as different classes by the unsupervised learning, although their characteristics in Fig. 3 are similar. A previous study17 on supervised learning with the Gravity Spy labels indicated that subclasses might exist within the “Scattered_Light” class. The unsupervised classification yields the same result, indicating the existence of subclasses of the “Scattered_Light” class.
The “Blip” and “Koi_Fish” classes are both separated into multiple classes, as shown in Fig. 4. The representative images and their similar images from the separated classes are shown in Fig. 5, where the similar images are sorted in descending order of cosine similarity to the representative image and sampled randomly. Each separated class groups its images consistently, even for images with low cosine similarity. The images of the classes separated from “Blip” share a common Gravity Spy label. Moreover, although the frequency growth in the spectrogram images of classes (9), (20), and (30) looks roughly similar, the unsupervised classification distinguishes these classes by finer details of their characteristics. Similar results can be observed for “Koi_Fish” (classes (5) and (7)). Therefore, the images of “Blip” and “Koi_Fish” may be classified into more detailed subclasses.
Representative images and similar images from the unsupervised learning. In the figure, classes (9), (22), and (30) are separated from the “Blip” class, and classes (5) and (7) are separated from the “Koi_Fish” class. The representative images in the left column are sampled randomly from the images classified into class (i) by the unsupervised learning. The similar images in the other columns are sorted in descending order of the cosine similarity (the value at the top of each image) to the representative image and sampled randomly.
The “Paired_Doves”, “Wandering_Line”, and “Air_Compressor” classes have only a few samples in the dataset (Fig. 1b). “Air_Compressor” is classified into one class; however, the other two are not classified into any unique class by the unsupervised classification. We assume that “Air_Compressor” is a class that cannot be divided further, and it is therefore classified into one class even with few data. Conversely, “Paired_Doves” and “Wandering_Line” are assumed to contain further subclasses. The reason they are not classified into specific classes may be that only a limited amount of transient noise is labelled “Paired_Doves” or “Wandering_Line”.
The “None_of_the_Above” class of the O1 dataset comprises data that do not belong to any other Gravity Spy label. The unsupervised classification does not gather these data into a unique class; instead, it distributes them over various classes. This result is consistent with a previous study by Bahaadini et al.16. In fact, Soni et al.17, using the O3 dataset5, reported that several of the “None_of_the_Above” data appear in the “Blip” class or in a new population of “Scattered_Light”. A similar classification result is expected when our architecture is applied to the O3 dataset and retrained.
Based on the above results, the Gravity Spy labels whose data are classified into multiple classes by the unsupervised classification are shown in grey in the “Estimated number of class” column of Fig. 4. The data separated from these Gravity Spy labels may imply the existence of subclasses.
Discussion
Let the number of Gravity Spy classes (labels) be \(C^{\prime } = 22\) and the classification result (vector) of the i-th unsupervised class be \(\varvec{v}^{(i)} \in {\mathbb {R}}^{C^{\prime }}\), where \(i \in c = \{0, \dots , 35\}\). Here, \(v^{(i)}_j\), the j-th component of \(\varvec{v}^{(i)}\), is the number of images with the j-th Gravity Spy label that are classified into the i-th unsupervised class. The total number of images classified into the i-th unsupervised class is expressed by the \(L^1\) norm32 of \(\varvec{v}^{(i)}\) (i.e. \(|\varvec{v}^{(i)}|_1 = \sum _{j=1}^{C^{\prime }}|v^{(i)}_j|\)). The j-th component of the normalised vector \(\varvec{v}^{(i)} / |\varvec{v}^{(i)}|_1\) is the ratio of images with the j-th Gravity Spy label in the i-th unsupervised class. Therefore, we define the accuracy of unsupervised learning as the fraction of images belonging to the dominant Gravity Spy label of their unsupervised class:

$$\begin{aligned} \text {Accuracy} = \frac{\sum _{i} \max _{j} v^{(i)}_j}{\sum _{i} |\varvec{v}^{(i)}|_1}. \end{aligned}$$
(1)
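A short sketch of this computation, assuming (1) is the purity-style measure written above and taking the confusion-matrix counts as input:

```python
import numpy as np

def unsupervised_accuracy(v: np.ndarray) -> float:
    """Accuracy (1) from a (C, C') matrix v whose entry [i, j] counts the
    images with the j-th Gravity Spy label in the i-th unsupervised class."""
    # Numerator: per class, the count of the dominant Gravity Spy label
    # (max_j v_j^(i)); denominator: the sum of the L1 norms |v^(i)|_1.
    return v.max(axis=1).sum() / v.sum()

# Toy 3-class example: (9 + 8 + 7) / 30 = 0.8.
toy = np.array([[9, 1, 0],
                [0, 8, 2],
                [1, 2, 7]])
print(unsupervised_accuracy(toy))  # 0.8
```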
It should be noted that the confusion matrix shown in Fig. 4 is not a square matrix and that the indices of its unsupervised labels (columns) depend on the initial values of training. Therefore, it is difficult to define evaluation indicators such as recall, precision, and F-measure. The accuracy of the proposed architecture was 90.9%, with the total number of unsupervised classes set to \(C=36\). For comparison, although (1) differs slightly from the usual definition of accuracy in supervised learning, the supervised learning of the Gravity Spy project15 achieved 97.1% accuracy on the testing data using the same dataset as that used here. Furthermore, we compared our results with those (shown in Table I of reference23) of different CNN models, such as Google Inception33 (versions 2 and 3), Microsoft ResNet34, VGG35 (with 16 and 19 layers), and the retrained CNN model based on the Gravity Spy project15,18. Google Inception, ResNet, and VGG are among the most popular image recognition architectures, all of which were submitted to the ILSVRC competitions36. Note that all models used the same dataset (Gravity Spy dataset of LIGO O1). The accuracy was more than 96% for all models. Although the accuracy of our model is lower than that of the above models, unsupervised learning has the advantage that data annotation is not required, and our model has the potential to suggest the existence of subclasses, as shown in the “Evaluation of our architecture” section.
Let us now examine the classification results in Fig. 4 to identify factors that decrease the accuracy (1) of the unsupervised learning. Representative images of the major characteristics and images of low similarity to them are shown in Fig. 6. For classes (0) and (35), the classifier identifies the global features of the images, because similar images also exist in the data of other Gravity Spy labels. For classes (13) and (34), the classifier cannot recognise the images properly and may be learning background features; this problem could be solved by adjusting the neural-network configuration. Moreover, in class (26), minor images (such as “Power_Line”) are mixed with the major class (“Air_Compressor”); the same can be observed for class (32). Because these images look alike, it is possible that both noises have similar characteristics. Additionally, a comparison of the classification results in Fig. 4 with the feature visualisation using t-SNE is discussed in the Supplemental Material, “Feature Visualization of Transient Noise using t-SNE” section. Based on the above results, we confirm the consistency between the labels annotated by the Gravity Spy project and the classes provided by our proposed unsupervised learning architecture, and we point out the potential existence of unrevealed classes.
Examples of images in the classes with reduced accuracy in the unsupervised learning. The major images in the left column are randomly sampled from class (i). The minor images in the other columns are sorted in ascending order of cosine similarity to the major image; that is, they are sampled from the images with the lowest similarity to the major one. The Gravity Spy label and the value of the cosine similarity are given at the top of each sampled image.
Subsequently, we will build a system for the classification of transient noise in KAGRA using the proposed architecture. In addition, we will extend our architecture with semi-supervised learning37 to enhance the classification accuracy. This algorithm trains on data with trusted labels, known as the golden set15, generates pseudo-labels for the given dataset, and retrains on them. Using the new classes found by unsupervised learning, semi-supervised learning can help reduce the annotation work for training and can address the problem of ensuring objectivity in the classification. We would like to construct a semi-supervised architecture that incorporates the advantages of both Gravity Spy’s supervised learning and our unsupervised learning.
Methods
The proposed unsupervised learning method consists of two architectures: a variational autoencoder (VAE) and invariant information clustering (IIC). The VAE is used to learn the features from the time–frequency spectrogram (2D images) of transient noise, and the IIC classifies the transient noise from the features that are learned by the encoder of the VAE. Before we present the details of the method, we explain the target dataset.
Target dataset
The Gravity Spy dataset16, which is the input dataset, is an image set of transient noise obtained from LIGO O14. The Omicron software19 searches for transient noise in the time-series data, and the Omega Scan20 software generates an image of the time–frequency spectrogram of each transient noise using the Q-transformation20,38. The Q-transformation estimates the frequency components of the time-series data by applying a window function to each time–frequency component, generating a 2D image of the time–frequency spectrogram. The spectrogram image of each transient noise in the Gravity Spy dataset has four time durations (0.5, 1.0, 2.0, and 4.0 s) around the centre, as shown in Fig. 1a. In addition, each transient noise is given one of 22 labels related to its cause, as shown in Fig. 1b. Example images of 12 classes of transient noise are shown in Fig. 1c.
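Although the Gravity Spy pipeline itself uses Omicron and Omega Scan, the same constant-Q spectrogram can be sketched with the public gwpy library; the GPS time, segment length, and Q range below are illustrative choices, not the pipeline's settings:

```python
from gwpy.timeseries import TimeSeries

t0 = 1126259462  # illustrative GPS time around which to look for a transient

# Fetch a short stretch of public LIGO Livingston strain data.
data = TimeSeries.fetch_open_data("L1", t0 - 16, t0 + 16)

# Constant-Q transform: a time-frequency spectrogram like those produced
# by Omega Scan. The 0.5 s output segment mirrors the shortest Gravity Spy
# duration.
qspec = data.q_transform(outseg=(t0 - 0.25, t0 + 0.25), qrange=(4, 64))
qspec.plot().show()
```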
Pre-processing
The pre-processing applied to the Gravity Spy dataset for the training of our proposed architecture is shown in Fig. 7.
1. For each transient noise, stack the images of the time–frequency spectrograms with the four time durations shown in Fig. 7, and use the stack as the input data for this transient noise. The resolution of the transient noise image for each time duration is 224 px \(\times\) 272 px (frequency and time direction, respectively), and the dimensions of the stacked images are 4 \(\times\) 224 \(\times\) 272 px.

2. Convert the stacked data into two types:

- Input image: crop the left and right parts of the image equally such that the resulting image has dimensions of 4 \(\times\) 224 \(\times\) 224 px.

- Perturbed image: crop the left part of the image at a randomly time-shifted position in the range 0–24 px, and crop the right part so that the resulting image has dimensions of 4 \(\times\) 224 \(\times\) 224 px.
Considering the characteristics of the time–frequency spectrogram, a small displacement in the time direction does not change its physical characteristics, because this operation can be interpreted as a change in the event time. Therefore, the time-shifted images can be regarded as new events of transient noise, which lets the architecture classify transient noise independently of small displacements in the time direction. Conversely, even a small displacement of the spectrogram in the frequency direction changes its physical characteristics, and frequency-shifted images would fall into classes different from that of the original image. Thus, perturbations are applied only in the time direction, not in the frequency direction.
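A minimal sketch of this cropping, assuming the stated 0–24 px shift is applied on either side of the centred window (the exact sign convention is not stated in the text):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def crop_stack(stacked: np.ndarray, time_shift: int = 0) -> np.ndarray:
    """Crop a (4, 224, 272) spectrogram stack to (4, 224, 224) in time.

    time_shift = 0 gives the centred input image (24 px trimmed from each
    side); a nonzero shift moves the crop window along the time axis,
    yielding a perturbed image.
    """
    assert stacked.shape == (4, 224, 272)
    left = 24 + time_shift
    return stacked[:, :, left:left + 224]

stacked = rng.random((4, 224, 272), dtype=np.float32)      # stand-in sample
x = crop_stack(stacked)                                    # input image
x_prime = crop_stack(stacked, int(rng.integers(-24, 25)))  # perturbed image
```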
Overview of the input data pre-processing. The original samples are stacked over the four time durations (0.5, 1.0, 2.0, and 4.0 s) to generate the data, where the width and height of each image are 272 px and 224 px, respectively, and the dimensions of the stacked data are (4, 224, 272). In the training process of the proposed architecture, the image is randomly time-shifted in the range 0–24 px and cropped to dimensions (4, 224, 224); the cropped data are used as the training data. The data cropped without a time shift are used for the evaluation of the VAE and as the input image of the IIC.
Proposed architecture for the classification of transient noise. The tables show the details of the neural-network architectures. \(^B\) denotes batch normalisation applied to a layer, and M denotes the minibatch size. Left: Schematic architecture of the VAE for feature learning. The VAE trains the neural networks to maximise the lower bound (3). The input to the VAE is a perturbed image \(\varvec{x}^{\prime }\) of the time–frequency spectrogram of the transient noise; this pre-processing allows the encoder to learn features that do not depend on the perturbation. At the output layer of the encoder, the mean and variance of the feature variable \(\varvec{z}\) are output from the same network and separated into dimensions (M, 512). Subsequently, the feature variables \(\varvec{z}\) are constructed using the reparameterisation trick. The decoder uses \(\varvec{z}\) to generate a reconstructed image that is close to the input image. Right: Schematic architecture of the IIC for classification. The IIC trains the neural networks to maximise the mutual information between the input data and its perturbed data. The inputs to the pre-trained encoders of the VAE are the original and perturbed images, respectively. Both encoders have the same architecture, and the dashed lines indicate the shared weights of the neural networks. The IIC classifies transient noise by applying the SoftMax activation function at the output layer to the features output by the pre-trained encoder. C is the estimated number of classes of the transient noise, and W is the number of classes used in the over-clustering.
In the training process of the proposed architecture, a random time shift of the image in the 0–24 px range is applied to the training data. The data cropped without a time shift were used for the evaluation of the VAE and as the input image of the IIC.
Variational autoencoder
In this study, the features of transient noise are obtained from the time–frequency 2D spectrogram images using a VAE, one approach to feature learning39,40 with convolutional deep learning. Generally, feature learning is a method for acquiring features that are effective for the prediction and classification of data; it also converts high-dimensional data into low-dimensional features.
Let the input dataset be \(\mathcal {D} = \{ \varvec{x}^{(1)}, \dots , \varvec{x}^{(N)} \,|\, \varvec{x}^{(i)} \in {\mathbb {R}}^D, i = 1, \dots , N\}\) and the marginal likelihood for \(\mathcal {D}\) be \(p_{\varvec{\theta }} (\varvec{x}^{(1)}, \dots , \varvec{x}^{(N)})\), where D is the dimension of the data, N is the number of input data, and \(\varvec{\theta }\) are the parameters of the architecture. The objective of the learning is to maximise the marginal likelihood. When the dataset \(\mathcal {D}\) is independent and identically distributed, the log marginal likelihood becomes \(\sum _{i=1}^N\text {ln}\,p_{\varvec{\theta }}({\varvec{x}^{(i)}})\). Consider an inference architecture \(q_{\varvec{\phi }}(\varvec{z}|\varvec{x}^{(i)})\) (also known as the encoder) that approximates \(q_{\varvec{\phi }}(\varvec{z}|\varvec{x}^{(i)})\simeq p_{\varvec{\theta }}(\varvec{z}|\varvec{x}^{(i)})\), where \(\varvec{z} \in {\mathbb {R}}^J\) is a feature variable and \(J < D\). Therefore, the log marginal likelihood \(\text {ln}\,p_{\varvec{\theta }}(\varvec{x}^{(i)})\) can be expressed as

$$\begin{aligned} \text {ln}\,p_{\varvec{\theta }}(\varvec{x}^{(i)}) = \text {ln}\,{\mathbb {E}}_{q_{\varvec{\phi }}(\varvec{z}|\varvec{x}^{(i)})}\left[ \frac{p_{\varvec{\theta }}(\varvec{x}^{(i)}, \varvec{z})}{q_{\varvec{\phi }}(\varvec{z}|\varvec{x}^{(i)})}\right] \ge {\mathbb {E}}_{q_{\varvec{\phi }}(\varvec{z}|\varvec{x}^{(i)})}\left[ \text {ln}\,p_{\varvec{\theta }}(\varvec{x}^{(i)}|\varvec{z})\right] - D_{\text {KL}}\left[ q_{\varvec{\phi }}(\varvec{z}|\varvec{x}^{(i)})\,||\,p_{\varvec{\theta }}(\varvec{z})\right] \equiv \mathcal {L}(\varvec{x}^{(i)}, \varvec{\theta }, \varvec{\phi }). \end{aligned}$$
(2)
The inequality is obtained by Jensen's inequality, and \(\mathcal {L}(\varvec{x}^{(i)}, \varvec{\theta }, \varvec{\phi })\) is an objective function known as the lower bound. Let the prior and posterior distributions of \(\varvec{z}\) be multivariate Gaussian distributions, namely \(p_{\varvec{\theta }}(\varvec{z})= \mathcal {N}(\varvec{z}|\varvec{0}, \varvec{I})\) and \(q_{\varvec{\phi }}(\varvec{z}|\varvec{x}^{(i)})= \mathcal {N}(\varvec{z}|\varvec{\mu _\phi }(\varvec{x}^{(i)}),\varvec{\Sigma _\phi }^{2}(\varvec{x}^{(i)})\varvec{I})\), where \(\varvec{\mu _\phi }(\cdot )\) and \(\varvec{\Sigma _\phi }(\cdot )\) are the outputs of the encoder and \(\varvec{I}\) is the identity matrix of dimension J. Let the posterior distribution of \(\varvec{x}\) be the multivariate Bernoulli distribution, \(p_{\varvec{\theta }}(\varvec{x}^{(i)}|\varvec{z})= \text {bern}({\varvec{x}}^{(i)}|\varvec{g}_{\varvec{\theta }}(\varvec{z}))\), where \(\varvec{g}_{\varvec{\theta }}(\cdot )\) is the output of the decoder. Thus, the lower bound to be maximised is

$$\begin{aligned} \mathcal {L}(\varvec{x}^{(i)}, \varvec{\theta }, \varvec{\phi }) \simeq -D_{\text {KL}}\left[ q_{\varvec{\phi }}(\varvec{z}|\varvec{x}^{(i)})\,||\,p_{\varvec{\theta }}(\varvec{z})\right] + \frac{1}{L}\sum _{l=1}^{L}\text {ln}\,p_{\varvec{\theta }}(\varvec{x}^{(i)}|\varvec{z}^{(i,l)}), \end{aligned}$$
(3)
where \(D_{\text {KL}}[\cdot || \cdot ]\) is the Kullback–Leibler divergence of two distributions and \(\varvec{z}^{(i,l)}\) is obtained using the reparameterisation trick, such that \(\varvec{z}^{(i,l)} = \varvec{g}_{\varvec{\phi }}(\varvec{\epsilon }^{(l)}, \varvec{x}^{(i)}) = \varvec{\mu }_{\varvec{\phi }}(\varvec{x}^{(i)}) + \varvec{\epsilon }^{(l)}\odot \varvec{\Sigma }_{\varvec{\phi }}(\varvec{x}^{(i)})\), where \(\varvec{\epsilon }\sim \mathcal {N}(\varvec{0}, \varvec{I})\) and \(\odot\) signifies the Hadamard product.
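For concreteness, the minibatch estimate of \(-\mathcal {L}\) (the quantity \(\delta\) minimised in training) can be sketched in PyTorch as below; parameterising the encoder variance through its logarithm is an implementation assumption, and a single sample \(L = 1\) is used:

```python
import torch
import torch.nn.functional as F

def negative_lower_bound(x, mu, log_var, decoder):
    """-L(x; theta, phi) for one minibatch, following (3) with L = 1,
    a Gaussian posterior q(z|x) = N(mu, diag(exp(log_var))) and a
    Bernoulli decoder whose outputs are pixel probabilities."""
    # Reparameterisation trick: z = mu + eps * Sigma, eps ~ N(0, I).
    eps = torch.randn_like(mu)
    z = mu + eps * torch.exp(0.5 * log_var)

    # Bernoulli term: -ln p(x|z), summed over pixels and the minibatch.
    recon = F.binary_cross_entropy(decoder(z), x, reduction="sum")

    # Closed-form KL divergence D_KL[q(z|x) || N(0, I)].
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())

    return recon + kl  # minimising this maximises the lower bound (3)
```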
Classification using invariant information clustering
A typical method for clustering is k-means, which uses the Euclidean distances between data. Several variants of k-means have been developed (e.g. k-means++41, fuzzy c-means42, and x-means43). For clustering in a high-dimensional space, however, the variance of the distances between data becomes small owing to the “curse of dimensionality”. Alternatively, IIC30, which is a classification method, is effective because it does not use the distances between data for learning. In this study, transient noise is classified by IIC through maximising the mutual information. Let \(\varvec{x}\in {\mathbb {R}}^D\) be the input data, \(\varvec{x}^{\prime }\) be the perturbed data of \(\varvec{x}\), C be the number of output classes, and \(\varvec{\Phi }(\varvec{x}) \in {\mathbb {R}}^C\) be a classifier whose output layer uses the SoftMax activation function. Consider a pair of cluster assignments for the two inputs \(\varvec{x}\) and \(\varvec{x}^{\prime }\). Their joint distribution is given by the \(C \times C\) matrix \(\varvec{P} = \frac{1}{N}\sum _{i=1}^{N}\varvec{\Phi }(\varvec{x}^{(i)}) \cdot \varvec{\Phi }(\varvec{x}^{(i)\prime })^{\text {T}}\), with elements \(P_{ij}\) and marginals \(P_{i} = \sum _j P_{ij}\), where the superscript \(\text {T}\) denotes the transpose. The objective for the maximisation of the mutual information is expressed as

$$\begin{aligned} I(\varvec{\Phi }(\varvec{x}), \varvec{\Phi }(\varvec{x}^{\prime })) = \sum _{i=1}^{C}\sum _{j=1}^{C} P_{ij}\,\text {ln}\frac{P_{ij}}{P_{i}\,P_{j}}. \end{aligned}$$
(4)
To improve the performance of the classifier, auxiliary over-clustering30 is also used when calculating the mutual information. The over-clustering formula is the same as (4), except that \(\varvec{\Phi }(\varvec{x}) \in {\mathbb {R}}^W\) with \(C < W\).
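A sketch of the objective (4) as a loss function (its negative, to be minimised), following the symmetrisation used in the IIC paper30:

```python
import torch

def iic_loss(phi_x: torch.Tensor, phi_xp: torch.Tensor, eps: float = 1e-10):
    """Negative mutual information (4) between the SoftMax cluster
    assignments phi_x and phi_xp (each of shape (M, C)) of a minibatch
    of inputs and their perturbations."""
    # Joint distribution P (C x C), averaged over the minibatch and
    # symmetrised.
    p = (phi_x.t() @ phi_xp) / phi_x.shape[0]
    p = ((p + p.t()) / 2).clamp(min=eps)

    # Marginals P_i (rows) and P_j (columns).
    p_i = p.sum(dim=1, keepdim=True)
    p_j = p.sum(dim=0, keepdim=True)

    # I = sum_ij P_ij ln( P_ij / (P_i P_j) ); return -I for minimisation.
    return -(p * (p.log() - p_i.log() - p_j.log())).sum()
```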
Proposed architecture
We propose the unsupervised classification architecture shown in Fig. 8. It is a deep learning architecture that trains on time–frequency 2D spectrogram images of transient noise. In the proposed architecture, the feature variables of the input image \(\varvec{x}\) and its perturbed image \(\varvec{x}^\prime = \varvec{\xi }(\varvec{x})\) are extracted by the pre-trained encoder of the VAE. The perturbation \(\varvec{\xi }\) is a transformation that does not change the information required for the classification (see the “Pre-processing” section). Subsequently, the IIC learns to maximise the mutual information \(I(\varvec{\Phi }(\varvec{z}), \varvec{\Phi }(\varvec{z}^{\prime }))\) computed from the pair of feature variables \((\varvec{z} = \varvec{\mu }_{\varvec{\phi }}(\varvec{x}), \varvec{z}^{\prime } = \varvec{\mu }_{\varvec{\phi }}(\varvec{x}^{\prime }))\).
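The wiring of the two stages can be sketched as follows, reusing iic_loss from the sketch above; the frozen stand-in encoder, the layer sizes, and the over-clustering width W are assumptions for illustration:

```python
import torch
import torch.nn as nn

C, W = 36, 300  # output classes; W is an illustrative over-clustering width

# Stand-in for the pre-trained VAE encoder's mean head, frozen here.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(4 * 224 * 224, 512))
for p in encoder.parameters():
    p.requires_grad = False

# SoftMax classification heads: main classes and auxiliary over-clustering.
head = nn.Sequential(nn.Linear(512, C), nn.Softmax(dim=1))
head_over = nn.Sequential(nn.Linear(512, W), nn.Softmax(dim=1))

x = torch.rand(8, 4, 224, 224)        # minibatch of input images
x_prime = torch.rand(8, 4, 224, 224)  # their perturbed counterparts

# The same (shared-weight) encoder embeds both inputs; the IIC loss is
# applied to both heads.
z, z_prime = encoder(x), encoder(x_prime)
loss = iic_loss(head(z), head(z_prime)) \
     + iic_loss(head_over(z), head_over(z_prime))
```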
The clustering of the IIC depends on the initial values of the neural networks, which are provided randomly; thus, the classification results of each classifier vary slightly. In unsupervised learning, it is difficult to resolve this dependency by taking an ensemble average over the classification results, because the class indices are assigned randomly in each run. In this study, spectral clustering44 was applied to compress the results of multiple classifiers into one result. The procedure is as follows, with a sketch given after the list:
1. Let D, K, and C be the number of data, the number of classifiers, and the estimated number of classes, respectively. Create a hypermatrix H of dimension \((D, K \times C)\) from the results of all classifiers.

2. Taking \(h^{(i)}\) as the row vector of H for each data point, calculate an affinity matrix using the Gaussian kernel, given by \(\text {exp}(-\Vert h^{(i)}-h^{(j)}\Vert )\).

3. Apply spectral clustering to the affinity matrix. Consequently, each of the D data points receives one of the C class labels.
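A minimal sketch of this procedure with scikit-learn, using a reduced stand-in dataset size and random one-hot assignments in place of the real classifier outputs:

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import SpectralClustering

D, K, C = 1000, 5, 36  # stand-in data size (8535 in the paper), classifiers, classes
rng = np.random.default_rng(0)

# Step 1: hypermatrix H of shape (D, K*C), concatenating each classifier's
# one-hot class assignments (random stand-ins here).
H = np.concatenate(
    [np.eye(C)[rng.integers(0, C, size=D)] for _ in range(K)], axis=1
)

# Step 2: affinity matrix from the Gaussian kernel exp(-||h_i - h_j||).
affinity = np.exp(-cdist(H, H))

# Step 3: spectral clustering on the precomputed affinity; each of the D
# data points receives one of C labels.
labels = SpectralClustering(
    n_clusters=C, affinity="precomputed", random_state=0
).fit_predict(affinity)
```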
Data availability
All the results are reported for the public data of the Gravity Spy project. Data on the results of unsupervised learning are available upon request from Y. Sakai and HT.
Code availability
All the codes developed in this study are available upon request from Y. Sakai and HT.
References
Aasi, J. et al. Advanced LIGO. Class. Quant. Gravity 32, 074001. https://doi.org/10.1088/0264-9381/32/7/074001 (2015).
Abbott, B. et al. GW150914: The advanced LIGO detectors in the era of first discoveries. Phys. Rev. Lett. 116, 131103. https://doi.org/10.1103/PhysRevLett.116.131103 (2016).
Acernese, F. et al. Advanced Virgo: A second-generation interferometric gravitational wave detector. Class. Quant. Gravity 32, 024001. https://doi.org/10.1088/0264-9381/32/2/024001 (2015).
Abbott, B. et al. GWTC-1: A gravitational-wave transient catalog of compact binary mergers observed by LIGO and Virgo during the first and second observing runs. Phys. Rev. X 9, 031040. https://doi.org/10.1103/PhysRevX.9.031040 (2019).
Abbott, R. et al. GWTC-2: Compact binary coalescences observed by LIGO and Virgo during the first half of the third observing run. Phys. Rev. X 11, 021053. https://doi.org/10.1103/PhysRevX.11.021053 (2021).
Abbott, R. et al. GWTC-2.1: Deep extended catalog of compact binary coalescences observed by LIGO and Virgo during the first half of the third observing run. Preprint at http://arxiv.org/abs/2108.01045 (2021).
Abbott, R. et al. GWTC-3: Compact binary coalescences observed by LIGO and Virgo during the second part of the third observing run. Preprint at http://arxiv.org/abs/2111.03606 (2021).
Grote, H. et al. The status of GEO 600. Class. Quant. Gravity 25, 114043. https://doi.org/10.1088/0264-9381/21/5/006 (2008).
Akutsu, T. et al. KAGRA: 2.5 generation interferometric gravitational wave detector. Nat. Astron. 3, 35. https://doi.org/10.1038/s41550-018-0658-y (2019).
Akutsu, T. et al. Overview of KAGRA: Detector design and construction history. Prog. Theor. Exp. Phys. 2021, 05A101. https://doi.org/10.1093/ptep/ptaa125 (2021).
Akutsu, T. et al. Overview of KAGRA: KAGRA science. Prog. Theor. Exp. Phys. 2021, 05A103. https://doi.org/10.1093/ptep/ptaa120 (2021).
Akutsu, T. et al. Overview of KAGRA: Calibration, detector characterization, physical environmental monitors, and the geophysics interferometer. Prog. Theor. Exp. Phys. 2021, 05A102. https://doi.org/10.1093/ptep/ptab018 (2021).
Abe, H. et al. Performance of the KAGRA detector during the first joint observation with GEO600 (O3GK). Preprint at http://arxiv.org/abs/2203.07011 (2022).
Abbott, R. et al. First joint observation by the underground gravitational-wave detector KAGRA with GEO600. Preprint at http://arxiv.org/abs/2203.01270 (2022).
Zevin, M. et al. Gravity Spy: Integrating advanced LIGO detector characterization, machine learning, and citizen science. Class. Quant. Gravity 34, 064003. https://doi.org/10.1088/1361-6382/aa5cea (2017).
Bahaadini, S. et al. Machine learning for Gravity Spy: Glitch classification and dataset. Inf. Sci. 444, 172–186. https://doi.org/10.1016/j.ins.2018.02.068 (2018).
Soni, S. et al. Discovering features in gravitational-wave data through detector characterization, citizen science and machine learning. Class. Quant. Gravity 38, 195016. https://doi.org/10.1088/1361-6382/ac1ccb (2021).
Bahaadini, S. et al. Deep multi-view models for glitch classification. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2931–2935. https://doi.org/10.1109/ICASSP.2017.7952693 (2017).
Robinet, F. et al. Omicron: A tool to characterize transient noise in gravitational-wave detectors. SoftwareX 12, 100620. https://doi.org/10.1016/j.softx.2020.100620 (2020).
Chatterji, S., Blackburn, L., Martin, G. & Katsavounidis, E. Multiresolution techniques for the detection of gravitational-wave bursts. Class. Quant. Gravity 21, S1809. https://doi.org/10.1088/0264-9381/21/20/024 (2004).
Bini, S. Unsupervised Classification of Short Transient Noise to Improve Gravitational Wave Detection (2020). https://etd.adm.unipi.it/t/etd-08302020-184201/. Accessed 26 Apr 2022.
Bahaadini, S. et al. Direct: Deep discriminative embedding for clustering of LIGO data. In 2018 25th IEEE International Conference on Image Processing (ICIP) 748–752. https://doi.org/10.1109/ICIP.2018.8451708 (2018).
George, D., Shen, H. & Huerta, E. Classification and unsupervised clustering of LIGO data with deep transfer learning. Phys. Rev. D 97, 101501. https://doi.org/10.1103/PhysRevD.97.101501 (2018).
Shiping, W., Jinyu, C., Qihao, L. & Wenzhong, G. An overview of unsupervised deep feature representation for text categorization. IEEE Trans. Comput. Soc. Syst. 6, 504–517. https://doi.org/10.1109/TCSS.2019.2910599 (2019).
Wenzhong, G., Jinyu, C. & Shiping, W. Unsupervised discriminative feature representation via adversarial auto-encoder. Appl. Intell. 50, 1155–1171. https://doi.org/10.1007/s10489-019-01581-7 (2020).
Jinyu, C., Shiping, W. & Wenzhong, G. Unsupervised embedded feature learning for deep clustering with stacked sparse auto-encoder. Expert Syst. with Appl. 186, 115729. https://doi.org/10.1016/j.eswa.2021.115729 (2021).
Jinyu, C., Shiping, W., Chaoyang, X. & Wenzhong, G. Unsupervised deep clustering via contractive feature representation and focal loss. Pattern Recogn. 123, 108386. https://doi.org/10.1016/j.patcog.2021.108386 (2022).
Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. 2nd Int. Conf. on Learn. Represent. (ICLR2014). Preprint at http://arxiv.org/abs/1312.6114 (2013).
Kingma, D. P. & Welling, M. An introduction to variational autoencoders. Found. Trends Mach. Learn. 12, 307–392. https://doi.org/10.1561/2200000056 (2019).
Ji, X., Henriques, J. F. & Vedaldi, A. Invariant information clustering for unsupervised image classification and segmentation. Proc. IEEE Int. Conf. Comput. Vis. 2019, 9865–9874. https://doi.org/10.1109/ICCV.2019.00996 (2019).
Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. 3rd Int. Conf. on Learn. Represent. (ICLR2015). Preprint at http://arxiv.org/abs/1412.6980 (2014).
Murphy, K. P. Machine Learning: A Probabilistic Perspective (MIT Press, 2012).
Christian, S. et al. Going deeper with convolutions. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 1–9. https://doi.org/10.1109/CVPR.2015.7298594 (2015).
Kaiming, H., Xiangyu, Z., Shaoqing, R. & Jian, S. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 770–778. https://doi.org/10.1109/CVPR.2016.90 (2016).
Karen, S. & Andrew, Z. Very deep convolutional networks for large-scale image recognition. 3rd Int. Conf. on Learn. Represent. (ICLR2015). Preprint at http://arxiv.org/abs/1409.1556 (2014).
Jia, D., Wei, D., Richard, S., Li-Jia L., Kai, L. & Li, F. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition 248–255. https://doi.org/10.1109/CVPR.2009.5206848 (2009).
Lee, D.-H. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. Workshop on Challenges in Representation Learning, ICML, Vol. 3, 896 (2013).
Brown, J. C. Calculation of a constant Q spectral transform. J. Acoust. Soc. Am. 89, 425–435. https://doi.org/10.1121/1.400476 (1991).
Zhong, G., Wang, L.-N., Ling, X. & Dong, J. An overview on data representation learning: From traditional feature learning to recent deep learning. J. Financ. Data Sci. 2, 265–278. https://doi.org/10.1016/j.jfds.2017.05.001 (2016).
Bengio, Y., Courville, A. & Vincent, P. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1798–1828. https://doi.org/10.1109/TPAMI.2013.50 (2013).
Arthur, D. & Vassilvitskii, S. \(k\)-means++: The advantages of careful seeding. Proc. Annu. ACM-SIAM Symp. Discr. Algorithms 07, 1027–1035 (2007).
Bezdek, J. C., Ehrlich, R. & Full, W. FCM: The fuzzy \(c\)-means clustering algorithm. Comput. Geosci. 10, 191–203. https://doi.org/10.1016/0098-3004(84)90020-7 (1984).
Radwan, A. et al. \(x\)-means clustering for wireless sensor networks. J. Robot. Netw. Artif. Life 7, 111–115. https://doi.org/10.2991/jrnal.k.200528.008 (2020).
Von Luxburg, U. A tutorial on spectral clustering. Stat. Comput. 17, 395–416. https://doi.org/10.1007/s11222-007-9033-z (2007).
Acknowledgements
We are grateful to the members of the Gravity Spy project for enlightening discussions. This study was supported in part by the Inter-University Research Program of Institute for Cosmic Ray Research, University of Tokyo, Japan. It was also supported in part by the Japan Society for the Promotion of Science (JSPS) Grants-in-Aid for Scientific Research on Innovative Areas, Grant No. 24103005 [JP17H06358, JP17H06361, and JP20H04731], by JSPS Core-to-Core Program A, Advanced Research Networks, and by JSPS KAKENHI [Grant No. 19H0190 (YI and HT) and Nos. 19K14636 and 21H05599 (Y. Shikano)], and by JST, PRESTO [Grant No. JPMJPR20M4 (Y. Shikano) ].
Author information
Authors and Affiliations
Contributions
Y.Sakai, G.U., and H.T. conceptualised the study; Y.Sakai, G.U., Y.Shikano, T.U., and H.T. framed the methodology; Y.Sakai, G.U., and H.T. concentrated on the software development and analysis; Y.I., P.J., K.K., C.K., K.T.N., S.O., Y.Shikano, T.U., T.W., T.Yamamoto, and T.Yokozawa performed the validation and investigation; K.K., C.K., S.O., T.W., T.Yamamoto, and T.Yokozawa made some critical and technical suggestions; Y.Sakai, G.U., and H.T. prepared the initial draft of the manuscript; all the authors, mainly Y.Sakai, Y.I., Y.Shikano, H.T., and T.U., wrote, reviewed, and edited the manuscript; H.T. supervised the study; H.T. and T.U. administered the project. All authors have read and agreed to the published version of the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Sakai, Y., Itoh, Y., Jung, P. et al. Unsupervised learning architecture for classifying the transient noise of interferometric gravitational-wave detectors. Sci Rep 12, 9935 (2022). https://doi.org/10.1038/s41598-022-13329-4