Abstract
In the data obtained by laser interferometric gravitational-wave detectors, transient noise with non-stationary and non-Gaussian features occurs at a high rate. It often causes problems such as detector instability and the hiding and/or imitation of gravitational-wave signals. Transient noise exhibits various characteristics in the time–frequency representation, which are considered to be associated with environmental and instrumental origins. Classifying transient noise can offer clues for exploring its origin and for improving the performance of the detector. One approach to this task is supervised learning. However, supervised learning generally requires annotation of the training data, and it is difficult to ensure objectivity in the classification or to identify new classes. By contrast, unsupervised learning reduces the annotation work for the training data, improves the objectivity of the classification, and allows new classes to emerge. In this study, we propose an unsupervised learning architecture for the classification of transient noise that combines a variational autoencoder and invariant information clustering. To evaluate the effectiveness of the proposed architecture, we used the dataset (time–frequency two-dimensional spectrogram images and labels) of the Laser Interferometer Gravitational-wave Observatory (LIGO) first observation run prepared by the Gravity Spy project. The classes provided by our proposed unsupervised learning architecture were consistent with the labels annotated by the Gravity Spy project, and the results suggest the existence of classes not yet revealed.
Introduction
Gravitational waves are distortions of space–time that propagate (with high probability) at the speed of light. They are emitted during events such as the coalescence of compact-star binaries and supernova explosions. The first observation of a gravitational wave, from the coalescence of a black hole binary, was achieved in September 2015 by the Laser Interferometer Gravitational-wave Observatory (LIGO)1, located in Livingston, Louisiana and Hanford, Washington in the USA2. Subsequently, LIGO and Virgo3 in Europe conducted three international joint observation runs and observed as many as 90 gravitational-wave events from the coalescence of compact binaries4,5,6,7. Moreover, GEO6008 in Germany and KAGRA9,10,11,12 in Japan conducted a 2-week observation run (O3GK) in April 202013,14. The fourth observation run (O4) is planned to be conducted jointly by LIGO, Virgo, and KAGRA.
When searching for a gravitational-wave signal in the data from the interferometers, suitable techniques for separating the gravitational waves from instrumental noise are essential because gravitational-wave signals are generally weaker than the detector noise. The gravitational-wave detector is sensitive to environmental and instrumental conditions (such as ground motion, air pressure, optics suspensions, and fluctuations in the laser, vacuum, and mirrors). Consequently, non-stationary and non-Gaussian noise, called “transient noise”, frequently appears in the detector. Transient noise causes instability in the detector and can hide and/or imitate gravitational-wave signals. The LIGO and Virgo collaboration reported that transient noise with a signal-to-noise ratio \(> 6.5\) occurred at LIGO Livingston (LLO) at a rate of 1.10 events per minute in the first half of the third observation run (O3a), between 1 April 2019, 15:00 UTC and 1 October 2019, 15:00 UTC5, and at a rate of 1.17 events per minute in the second half (O3b), between 1 November 2019, 15:00 UTC and 27 March 2020, 17:00 UTC7.
Transient noise has various time–frequency characteristics that are related to its causes in the detector. Classifying transient noise could provide clues to explore its origins and improve the performance of the detector. The Gravity Spy project15,16,17,18 is one such classification effort. The Gravity Spy project used the Omicron software19 to identify transient noise in the time-series data. Thereafter, Omega Scan20 was used to create a time–frequency spectrogram around each identified transient noise as a two-dimensional (2D) image. A subset of these 2D images was annotated with 22 types of labels associated with the characteristics or causes of the transient noise, through crowdsourced analysis by LIGO detector characterisation experts and volunteer citizen scientists; both images and labels were recorded. Finally, the transient noise in the remaining images was classified by supervised learning using the pre-classified images and labels. As this process shows, data annotation for machine learning is highly labour-intensive.
Previous studies21 using unsupervised classification grouped together similar transient noise in the Gravity Spy dataset16. Bahaadini et al. used the DIRECT method22 to analyse the feature embedding learned from the Gravity Spy dataset16 and observed a class of transient noise different from the existing classes. Unsupervised clustering applying transfer learning23 exhibited a new class of transient noise in addition to the 22 classes of the Gravity Spy project. Moreover, supervised classification using the latest O3 observation dataset presented a new class of transient noise17.
As unsupervised learning does not require any pre-assigned labels for the training dataset, this approach is expected to reduce the annotation work for the training data, increase the objectivity of the classification, and even identify new classes of transient noise. Unsupervised learning is also useful in various fields, such as text categorisation, feature representation, and clustering24,25,26,27. In this study, we focus on unsupervised learning using a deep convolutional neural network (CNN) and propose a classification architecture for transient noise. Our proposed architecture consists of two processes: feature learning and classification. In the feature learning process, the features of transient noise are extracted from the time–frequency spectrogram images (2D images) using a variational autoencoder (VAE)28,29. In the classification process, invariant information clustering (IIC)30 is used to classify images of the transient noise using the features extracted by the encoder of the pre-learned VAE. We applied the proposed architecture to the dataset16 created by the Gravity Spy project from the LIGO observation run 1 (O1)4 as our input images, examined the validity of the unsupervised classification results, and analysed their correspondence with the labels of the Gravity Spy project.
Results
This section consists of two subsections: the training process and the evaluation of the unsupervised learning architecture. The Gravity Spy dataset of LIGO O1, developed by the Gravity Spy project and shown in Fig. 1, was used for training our proposed architecture. This dataset contains a total of 8535 transient noise events, each recorded at four time durations: 0.5, 1.0, 2.0, and 4.0 s. Each sample carries one of 22 types of labels related to the origins or characteristics of the transient noise. The labels, annotated by the Gravity Spy project on Zooniverse, the online citizen-science platform, were used only when evaluating the training results of the proposed architecture. The pre-processing of the dataset is described in the “Pre-processing” section.
(a) Example of a 2D time–frequency spectrogram image of transient noise in the Gravity Spy dataset. For each transient noise, four time durations (0.5, 1.0, 2.0, and 4.0 s from the left of the figure) are recorded around the centre time. (b) Table showing all the classes, the number of samples, and their ratio to the total number in the Gravity Spy dataset. There are 22 classes in total; each of 21 classes is named after a cause of occurrence or a characteristic shape on the spectrogram of the transient noise. The remaining class is “None_of_the_Above”, for noise that does not belong to any other class. (c) Example image for each class in the Gravity Spy dataset. The figure shows 12 of the 22 classes of transient noise at the 0.5 s duration.
Training process of our architecture
We investigated the training parameters of the VAE as follows. The dimension of the feature variable \(\varvec{z}\) was one of 64, 128, 256, 512, and 1024; the training size rate was in the range of [0.6, 0.9] in increments of 0.1; the learning rate of the Adam31 optimiser with parameters \(\beta _1 = 0.9\), \(\beta _2 = 0.999\) (coefficients used for computing running averages of the gradient and its square) and \(\epsilon =10^{-8}\) (term added to the denominator to improve numerical stability) was in the range of \([5\times 10^{-7}, 5 \times 10^{-2}]\) in increments of one order of magnitude; the minibatch size was in the range of [32, 128] in increments of 32. Maximising the lower bound (3) was the training objective; equivalently, we minimised \(\delta \equiv -\sum _{i}^ {N}\mathcal {L}(\varvec{x}^{(i)}, \varvec{\theta }, \varvec{\phi })\). The dimension of \(\varvec{z}\) and the training size rate do not have a significant effect on the value of \(\delta\). By contrast, the learning rate and minibatch size are related to the value of \(\delta\) and its stability. Representative training parameters are shown on the left side of Fig. 2a, and the training curves using these parameters are shown in Fig. 2b. In Case 1 (black line in Fig. 2b), the learning rate seems too low and \(\delta\) does not decrease. In Case 2 (grey line), the training is not stable, showing fluctuations in the curve, although \(\delta\) decreases compared with Case 1. In Case 3 (blue line), \(\delta\) decreases in both training and evaluation and appears stable after 100 epochs. Considering these results, the parameters of Case 3 were used in the proposed architecture for the remainder of this study.
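As a concrete illustration, the optimiser configuration described above can be set up in PyTorch as follows. This is a minimal sketch: the model is a stand-in, and the learning rate shown is an illustrative value from the searched range, not the paper's exact Case 3 setting.

```python
import torch

# Stand-in model; only the optimiser configuration below reflects the
# settings described in the text.
model = torch.nn.Linear(512, 512)

# Adam with the coefficients quoted above: beta_1 = 0.9 and beta_2 = 0.999
# (running averages of the gradient and its square) and eps = 1e-8 (added
# to the denominator for numerical stability).
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=5e-4,             # illustrative value from the range [5e-7, 5e-2]
    betas=(0.9, 0.999),
    eps=1e-8,
)
```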
Left (a) Training parameters for the VAE of the proposed architecture. The dimension of \(\varvec{z}\) is the number of outputs of the encoder. The training size rate is the fraction of the total data used as input at training; for the architecture evaluation, the input size is set to \((1 - \text {training size rate})\). The learning rate is the initial learning rate, and the optimiser used is Adam31. Right (a) Training parameters for the IIC of the proposed architecture. The number of output classes is set to the number of classes to be classified. The classifier number refers to the multiple classifiers that are used to improve performance via spectral clustering. (b) Training curves during the training and evaluation of the VAE. The solid and dashed lines show the training objective \(\delta \equiv -\sum _{i}^ {N}\mathcal {L}(\varvec{x}^{(i)}, \varvec{\theta }, \varvec{\phi })\) during training and evaluation, respectively. (c) Reconstructed images generated by the decoder of the VAE at 100 epochs in Case 3.
Examples of the reconstructed images of the transient noise generated by the decoder of the VAE at 100 epochs are shown in Fig. 2c. The characteristics of the reconstructed images seem similar to those of the input images. We confirmed a similar tendency for all the other inputs and reconstructed images. Therefore, the encoder of the VAE at 100 epochs was applied to the IIC for the classification of the transient noise.
Furthermore, the validity of the features learned by the VAE is demonstrated in the Supplemental Material, “Feature Visualization of Transient Noise using t-SNE” section, where the feature variables \(\varvec{z}\) are visualised by projection with t-SNE.
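A minimal sketch of such a visualisation, assuming the encoder outputs a feature matrix of shape (N, 512) (here replaced by random stand-in data), might look as follows:

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Stand-in for the (N, 512) feature means produced by the VAE encoder.
z = np.random.randn(2000, 512).astype(np.float32)

# Project the 512-dimensional features to 2D for visual inspection.
z_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(z)

plt.scatter(z_2d[:, 0], z_2d[:, 1], s=2)
plt.title("t-SNE projection of VAE features")
plt.show()
```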
After training the VAE, the training parameters of the IIC were investigated using the pre-trained encoder. The number of output classes was in the range of [22, 100] in increments of 2; the number of over-clustering classes was in the range of [50, 500] in increments of 50; the classifier number was one of 3, 5, 10, and 20; the learning rate of the Adam optimiser with parameters \(\beta _1 = 0.9\), \(\beta _2 = 0.999\) and \(\epsilon =10^{-8}\) was in the range of \([5\times 10^{-7}, 5 \times 10^{-2}]\) in increments of one order of magnitude; the minibatch size was in the range of [64, 256] in increments of 32. During training, the mutual information (4) was high for 30 to 40 output classes, which is consistent with the presence of implicit subclasses in the dataset. Changing the number of over-clustering classes or the number of classifiers did not noticeably change the mutual information. In this study, the IIC parameters shown on the right side of Fig. 2a were used for the classification. For the spectral clustering over multiple classifiers, the number of classifiers was set to \(K=5\) and the number of classes to \(C=36\); these values give the best classification performance in terms of the accuracy discussed in the “Discussion” section. The training for the VAE and IIC with a 128 minibatch size took approximately 1.0 h/100 epochs and approximately 0.3 h/100 epochs, respectively, using two NVIDIA GeForce RTX 2080 Ti GPUs, an Intel Xeon CPU E5-2637 v4 (8 cores), and 125 GB of main memory.
Evaluation of our architecture
The evaluation results are presented in this section. The proposed architecture described in the “Proposed architecture” section was trained using the pre-processed dataset described in the “Pre-processing” section.
Figure 3 shows a randomly selected image from each class (representative image) together with images that have a high degree of similarity to it within the class. These similar images are obtained from the cosine similarity32 between the representative image and the other images, using the affinity matrix calculated for the spectral clustering.
The representative and similar images in all the classes were classified by unsupervised learning. The representative image, denoted by (i), is randomly selected from class \(i\in c = \{0,\dots , 35\}\), and the images most similar to it are shown to its right. The cosine similarity to the representative image of class (i) is given at the top of each image.
The representative images appear to have distinct characteristics for each class, and the similar images are close to their representative images. Moreover, the images of class (15) in Fig. 3 show that the classifier recognises the same class even when the data are shifted in the time direction. Therefore, the pre-processing of the dataset achieves training that does not depend on perturbations in the time direction.
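For illustration, the ranking of similar images can be sketched as follows; the vectors h are assumed to be the per-image rows of the matrix used for the spectral clustering (random stand-in data here), and the helper function is hypothetical:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Cosine similarity between vector a and each row of matrix B."""
    return (B @ a) / (np.linalg.norm(B, axis=1) * np.linalg.norm(a) + 1e-12)

h = np.random.rand(100, 180)   # stand-in per-image vectors
idx_rep = 0                    # index of the randomly chosen representative
sims = cosine_similarity(h[idx_rep], h)

# Indices of the five most similar images, descending, excluding itself.
ranking = np.argsort(-sims)[1:6]
```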
To investigate the correspondence between the results of supervised and unsupervised learning, a confusion matrix using the Gravity Spy labels is shown in Fig. 4. For several classes (classes (6) (“1080 Lines”), (8) (“Repeating_Blips”), (14) (“Chirp”), (18) (“Helix”), and (24) (“Scratchy”)), the unsupervised learning gathers each Gravity Spy label (noted in parentheses) into a single class. In addition, the output images of the classifier shown in Fig. 3 are similar to those of the Gravity Spy labels.
Confusion matrix of the classification results of the proposed architecture. The vertical axis represents the labels and the number of samples in the Gravity Spy dataset. The lower and upper horizontal axes denote the number of images classified into each unsupervised class and the labels of the unsupervised classes, respectively. Each column is coloured by the ratio of the Gravity Spy-labelled images classified into the unsupervised class (i). The classes that are separated from the Gravity Spy labels on the confusion matrix, such as classes (0), (13), (26), (32), (34), and (35), also show the ratio values in the matrix. The potential number of classes for each Gravity Spy label, as estimated by the unsupervised learning, is shown in the right column of the figure. The notation “1” (in white cells) indicates that the class labelled by Gravity Spy matches a single class in the unsupervised learning, and an inequality sign (in light grey cells) indicates that the class is separated into multiple classes by the unsupervised learning. The notation “0” (in dark grey cells) indicates a class left unclassified for this training and dataset, and the “–” notation indicates data that do not belong to any class of the unsupervised learning.
The “Scattered_Light” class is separated into classes (2), (3), (11), and (16) on the confusion matrix. These are classified as different classes by the unsupervised learning, although their characteristics in Fig. 3 are similar. A previous study17 on supervised learning with the Gravity Spy labels indicated that subclasses might exist within the “Scattered_Light” class. The unsupervised classification yields the same result, indicating the existence of subclasses of the “Scattered_Light” class.
The “Blip” and “Koi_Fish” classes are both separated into multiple classes, as shown in Fig. 4. The representative images and their similar images from the separated classes are shown in Fig. 5, where the similar images are sorted in descending order of cosine similarity to the representative image and sampled randomly. Each separated class groups its images consistently, even for images with low cosine similarity. The images of the classes separated from “Blip” share a common Gravity Spy label. Moreover, although the frequency growth in the spectrogram images of classes (9), (20), and (30) looks roughly similar, the unsupervised classification distinguishes these classes by finer details of their characteristics. Similar results can be observed for “Koi_Fish” (classes (5) and (7)). Therefore, the images of “Blip” and “Koi_Fish” may be classified into more detailed subclasses.
Representative images and similar images from the unsupervised learning. In the figure, classes (9), (22), and (30) are separated from the “Blip” class, and classes (5) and (7) are separated from the “Koi_Fish” class. The representative images in the left column are sampled randomly from the images classified into class (i) by the unsupervised learning. The similar images in the other columns are sorted in descending order of the cosine similarity (the value at the top of each image) to the representative image and sampled randomly.
The “Paired_Doves”, “Wandering_Line”, and “Air_Compressor” classes have only a few samples in the dataset (Fig. 1b). “Air_Compressor” is classified into one class; however, the other two are not classified into any unique class by the unsupervised classification. We assume that “Air_Compressor” is a class that cannot be divided further, and it is therefore classified into one class even with few data. Conversely, “Paired_Doves” and “Wandering_Line” are assumed to contain further subclasses. The reason they are not classified into specific classes may be that only a limited amount of transient noise is labelled “Paired_Doves” or “Wandering_Line”.
The “None_of_the_Above” class of the O1 dataset comprises data that do not belong to any other Gravity Spy label. The unsupervised classification does not gather these data into a unique class; instead, it distributes them over various classes. This result is consistent with a previous study by Bahaadini et al.16. In fact, Soni et al.17, using the O3 dataset5, reported that several of the “None_of_the_Above” data appear in the “Blip” class or in a new population of “Scattered_Light”. A similar classification result is expected when our architecture is applied to the O3 dataset and retrained.
Based on the above results, the Gravity Spy labels whose data are classified into multiple classes by the unsupervised classification are shown in grey in the “Estimated number of class” column of Fig. 4. The data separated from these Gravity Spy labels may imply the existence of subclasses.
Discussion
Let the number of Gravity Spy classes (labels) be \(C^{\prime } = 22\) and the classification result (vector) of the i-th unsupervised class be \(\varvec{v}^{(i)} \in {\mathbb {R}}^{C^{\prime }}\), where \(i \in c = \{0, \dots , 35\}\). Here, \(v^{(i)}_j\), the j-th component of \(\varvec{v}^{(i)}\), is the number of images with the j-th Gravity Spy label that are classified into the i-th unsupervised class. The total number of images classified into the i-th unsupervised class is expressed by the \(L^1\) norm32 of \(\varvec{v}^{(i)}\) (i.e. \(|\varvec{v}^{(i)}|_1 = \sum _{j=1}^{C^{\prime }}|v^{(i)}_j|\)). The j-th component of the normalised vector \(\varvec{v}^{(i)} / |\varvec{v}^{(i)}|_1\) is the ratio of images with the j-th Gravity Spy label in the i-th unsupervised class. Therefore, we define the accuracy of unsupervised learning as the fraction of images belonging to the dominant Gravity Spy label of their unsupervised class:

$$\begin{aligned} \text {Accuracy} = \frac{\sum _{i} \max _{j} v^{(i)}_j}{\sum _{i} |\varvec{v}^{(i)}|_1}. \end{aligned}$$
(1)
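A short sketch of this computation, assuming (1) is the purity-style measure written above and taking the confusion-matrix counts as input:

```python
import numpy as np

def unsupervised_accuracy(v: np.ndarray) -> float:
    """Accuracy (1) from a (C, C') matrix v whose entry [i, j] counts the
    images with the j-th Gravity Spy label in the i-th unsupervised class."""
    # Numerator: per class, the count of the dominant Gravity Spy label
    # (max_j v_j^(i)); denominator: the sum of the L1 norms |v^(i)|_1.
    return v.max(axis=1).sum() / v.sum()

# Toy 3-class example: (9 + 8 + 7) / 30 = 0.8.
toy = np.array([[9, 1, 0],
                [0, 8, 2],
                [1, 2, 7]])
print(unsupervised_accuracy(toy))  # 0.8
```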
It should be noted that the confusion matrix shown in Fig. 4 is not a square matrix and that the indices of its unsupervised labels (columns) depend on the initial values of training. Therefore, it is difficult to define evaluation indicators such as recall, precision, and F-measure. The accuracy of the proposed architecture was 90.9%, with the total number of unsupervised classes set to \(C=36\). For comparison, although (1) differs slightly from the usual definition of accuracy in supervised learning, the supervised learning of the Gravity Spy project15 achieved 97.1% accuracy on the testing data using the same dataset as that used here. Furthermore, we compared our results with those (shown in Table I of reference23) of different CNN models, such as Google Inception33 (versions 2 and 3), Microsoft ResNet34, VGG35 (with 16 and 19 layers), and the retrained CNN model based on the Gravity Spy project15,18. Google Inception, ResNet, and VGG are among the most popular image recognition architectures, all of which were submitted to the ILSVRC competitions36. Note that all models used the same dataset (Gravity Spy dataset of LIGO O1). The accuracy was more than 96% for all models. Although the accuracy of our model is lower than that of the above models, unsupervised learning has the advantage that data annotation is not required, and our model has the potential to suggest the existence of subclasses, as shown in the “Evaluation of our architecture” section.
Let us now examine the classification results in Fig. 4 to identify factors that decrease the accuracy (1) of the unsupervised learning. Representative images of the major characteristics and images of low similarity to them are shown in Fig. 6. For classes (0) and (35), the classifier identifies the global features of the images, because similar images also exist in the data of other Gravity Spy labels. For classes (13) and (34), the classifier cannot recognise the images properly and may be learning background features; this problem could be solved by adjusting the neural-network configuration. Moreover, in class (26), minor images (such as “Power_Line”) are mixed with the major class (“Air_Compressor”); the same can be observed for class (32). Because these images look alike, it is possible that both noises have similar characteristics. Additionally, a comparison of the classification results in Fig. 4 with the feature visualisation using t-SNE is discussed in the Supplemental Material, “Feature Visualization of Transient Noise using t-SNE” section. Based on the above results, we confirm the consistency between the labels annotated by the Gravity Spy project and the classes provided by our proposed unsupervised learning architecture, and we point out the potential existence of unrevealed classes.
Examples of images in the classes with reduced accuracy in the unsupervised learning. The major images in the left column are randomly sampled from class (i). The minor images in the other columns are sorted in ascending order of cosine similarity to the major image; that is, they are sampled from the images with the lowest similarity to the major one. The Gravity Spy label and the value of the cosine similarity are given at the top of each sampled image.
Subsequently, we will build a system for the classification of transient noise in KAGRA using the proposed architecture. In addition, we will extend our architecture with semi-supervised learning37 to enhance the classification accuracy. This algorithm trains on data with trusted labels, known as the golden set15, generates pseudo-labels for the given dataset, and retrains on them. Using the new classes found by unsupervised learning, semi-supervised learning can help reduce the annotation work for training and can address the problem of ensuring objectivity in the classification. We would like to construct a semi-supervised architecture that incorporates the advantages of both Gravity Spy’s supervised learning and our unsupervised learning.
Methods
The proposed unsupervised learning method consists of two architectures: a variational autoencoder (VAE) and invariant information clustering (IIC). The VAE is used to learn the features from the time–frequency spectrogram (2D images) of transient noise, and the IIC classifies the transient noise from the features that are learned by the encoder of the VAE. Before we present the details of the method, we explain the target dataset.
Target dataset
The Gravity Spy dataset16, which is the input dataset, is an image set of transient noise obtained from LIGO O14. The Omicron software19 searches for transient noise in the time-series data, and the Omega Scan20 software generates an image of the time–frequency spectrogram of each transient noise using the Q-transformation20,38. The Q-transformation estimates the frequency components of the time-series data by applying a window function to each time–frequency component, generating a 2D image of the time–frequency spectrogram. The spectrogram image of each transient noise in the Gravity Spy dataset has four time durations (0.5, 1.0, 2.0, and 4.0 s) around the centre, as shown in Fig. 1a. In addition, each transient noise is given one of 22 labels related to its cause, as shown in Fig. 1b. Example images of 12 classes of transient noise are shown in Fig. 1c.
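Although the Gravity Spy pipeline itself uses Omicron and Omega Scan, the same constant-Q spectrogram can be sketched with the public gwpy library; the GPS time, segment length, and Q range below are illustrative choices, not the pipeline's settings:

```python
from gwpy.timeseries import TimeSeries

t0 = 1126259462  # illustrative GPS time around which to look for a transient

# Fetch a short stretch of public LIGO Livingston strain data.
data = TimeSeries.fetch_open_data("L1", t0 - 16, t0 + 16)

# Constant-Q transform: a time-frequency spectrogram like those produced
# by Omega Scan. The 0.5 s output segment mirrors the shortest Gravity Spy
# duration.
qspec = data.q_transform(outseg=(t0 - 0.25, t0 + 0.25), qrange=(4, 64))
qspec.plot().show()
```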
Pre-processing
The pre-processing applied to the Gravity Spy dataset for the training of our proposed architecture is shown in Fig. 7.
1. For each transient noise, stack the images of the time–frequency spectrograms with the four time durations shown in Fig. 7, and use the stack as the input data for this transient noise. The resolution of the transient noise image for each time duration is 224 px \(\times\) 272 px (frequency and time direction, respectively), and the dimensions of the stacked images are 4 \(\times\) 224 \(\times\) 272 px.

2. Convert the stacked data into two types:

- Input image: crop the left and right parts of the image equally such that the resulting image has dimensions of 4 \(\times\) 224 \(\times\) 224 px.

- Perturbed image: crop the left part of the image at a randomly time-shifted position in the range 0–24 px, and crop the right part so that the resulting image has dimensions of 4 \(\times\) 224 \(\times\) 224 px.
Considering the characteristics of the time–frequency spectrogram, a small displacement in the time direction does not change its physical characteristics, because this operation can be interpreted as a change in the event time. Therefore, the time-shifted images can be regarded as new events of transient noise, which lets the architecture classify transient noise independently of small displacements in the time direction. Conversely, even a small displacement of the spectrogram in the frequency direction changes its physical characteristics, and frequency-shifted images would fall into classes different from that of the original image. Thus, perturbations are applied only in the time direction, not in the frequency direction.
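A minimal sketch of this cropping, assuming the stated 0–24 px shift is applied on either side of the centred window (the exact sign convention is not stated in the text):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def crop_stack(stacked: np.ndarray, time_shift: int = 0) -> np.ndarray:
    """Crop a (4, 224, 272) spectrogram stack to (4, 224, 224) in time.

    time_shift = 0 gives the centred input image (24 px trimmed from each
    side); a nonzero shift moves the crop window along the time axis,
    yielding a perturbed image.
    """
    assert stacked.shape == (4, 224, 272)
    left = 24 + time_shift
    return stacked[:, :, left:left + 224]

stacked = rng.random((4, 224, 272), dtype=np.float32)      # stand-in sample
x = crop_stack(stacked)                                    # input image
x_prime = crop_stack(stacked, int(rng.integers(-24, 25)))  # perturbed image
```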
Overview of the input data pre-processing. The original samples are stacked over the four time durations (0.5, 1.0, 2.0, and 4.0 s) to generate the data, where the width and height of each image are 272 px and 224 px, respectively, and the dimensions of the stacked data are (4, 224, 272). In the training process of the proposed architecture, the image is randomly time-shifted in the range 0–24 px and cropped to dimensions (4, 224, 224); the cropped data are used as the training data. The data cropped without a time shift are used for the evaluation of the VAE and as the input image of the IIC.
Proposed architecture for the classification of transient noise. The tables show the details of the neural-network architectures. \(^B\) denotes batch normalisation applied to a layer, and M denotes the minibatch size. Left: Schematic architecture of the VAE for feature learning. The VAE trains the neural networks to maximise the lower bound (3). The input to the VAE is a perturbed image \(\varvec{x}^{\prime }\) of the time–frequency spectrogram of the transient noise; this pre-processing allows the encoder to learn features that do not depend on the perturbation. At the output layer of the encoder, the mean and variance of the feature variable \(\varvec{z}\) are output from the same network and separated into dimensions (M, 512). Subsequently, the feature variables \(\varvec{z}\) are constructed using the reparameterisation trick. The decoder uses \(\varvec{z}\) to generate a reconstructed image that is close to the input image. Right: Schematic architecture of the IIC for classification. The IIC trains the neural networks to maximise the mutual information between the input data and its perturbed data. The inputs to the pre-trained encoders of the VAE are the original and perturbed images, respectively. Both encoders have the same architecture, and the dashed lines indicate the shared weights of the neural networks. The IIC classifies transient noise by applying the SoftMax activation function at the output layer to the features output by the pre-trained encoder. C is the estimated number of classes of the transient noise, and W is the number of classes used in the over-clustering.
In the training process of the proposed architecture, a random time shift of the image in the 0–24 px range is applied to the training data. The data cropped without a time shift were used for the evaluation of the VAE and as the input image of the IIC.
Variational autoencoder
In this study, the features of transient noise are obtained from the time–frequency 2D spectrogram images using a VAE, one approach to feature learning39,40 with convolutional deep learning. Generally, feature learning is a method for acquiring features that are effective for the prediction and classification of data; it also converts high-dimensional data into low-dimensional features.
Let the input dataset be \(\mathcal {D} = \{ \varvec{x}^{(1)}, \dots , \varvec{x}^{(N)} \,|\, \varvec{x}^{(i)} \in {\mathbb {R}}^D, i = 1, \dots , N\}\) and the marginal likelihood for \(\mathcal {D}\) be \(p_{\varvec{\theta }} (\varvec{x}^{(1)}, \dots , \varvec{x}^{(N)})\), where D is the dimension of the data, N is the number of input data, and \(\varvec{\theta }\) are the parameters of the architecture. The objective of the learning is to maximise the marginal likelihood. When the dataset \(\mathcal {D}\) is independent and identically distributed, the log marginal likelihood becomes \(\sum _{i=1}^N\text {ln}\,p_{\varvec{\theta }}({\varvec{x}^{(i)}})\). Consider an inference architecture \(q_{\varvec{\phi }}(\varvec{z}|\varvec{x}^{(i)})\) (also known as the encoder) that approximates \(q_{\varvec{\phi }}(\varvec{z}|\varvec{x}^{(i)})\simeq p_{\varvec{\theta }}(\varvec{z}|\varvec{x}^{(i)})\), where \(\varvec{z} \in {\mathbb {R}}^J\) is a feature variable and \(J < D\). Therefore, the log marginal likelihood \(\text {ln}\,p_{\varvec{\theta }}(\varvec{x}^{(i)})\) can be expressed as

$$\begin{aligned} \text {ln}\,p_{\varvec{\theta }}(\varvec{x}^{(i)}) = \text {ln}\,{\mathbb {E}}_{q_{\varvec{\phi }}(\varvec{z}|\varvec{x}^{(i)})}\left[ \frac{p_{\varvec{\theta }}(\varvec{x}^{(i)}, \varvec{z})}{q_{\varvec{\phi }}(\varvec{z}|\varvec{x}^{(i)})}\right] \ge {\mathbb {E}}_{q_{\varvec{\phi }}(\varvec{z}|\varvec{x}^{(i)})}\left[ \text {ln}\,p_{\varvec{\theta }}(\varvec{x}^{(i)}|\varvec{z})\right] - D_{\text {KL}}\left[ q_{\varvec{\phi }}(\varvec{z}|\varvec{x}^{(i)})\,||\,p_{\varvec{\theta }}(\varvec{z})\right] \equiv \mathcal {L}(\varvec{x}^{(i)}, \varvec{\theta }, \varvec{\phi }). \end{aligned}$$
(2)
The inequality is obtained by Jensen's inequality, and \(\mathcal {L}(\varvec{x}^{(i)}, \varvec{\theta }, \varvec{\phi })\) is an objective function known as the lower bound. Let the prior and posterior distributions of \(\varvec{z}\) be multivariate Gaussian distributions, namely \(p_{\varvec{\theta }}(\varvec{z})= \mathcal {N}(\varvec{z}|\varvec{0}, \varvec{I})\) and \(q_{\varvec{\phi }}(\varvec{z}|\varvec{x}^{(i)})= \mathcal {N}(\varvec{z}|\varvec{\mu _\phi }(\varvec{x}^{(i)}),\varvec{\Sigma _\phi }^{2}(\varvec{x}^{(i)})\varvec{I})\), where \(\varvec{\mu _\phi }(\cdot )\) and \(\varvec{\Sigma _\phi }(\cdot )\) are the outputs of the encoder and \(\varvec{I}\) is the identity matrix of dimension J. Let the posterior distribution of \(\varvec{x}\) be the multivariate Bernoulli distribution, \(p_{\varvec{\theta }}(\varvec{x}^{(i)}|\varvec{z})= \text {bern}({\varvec{x}}^{(i)}|\varvec{g}_{\varvec{\theta }}(\varvec{z}))\), where \(\varvec{g}_{\varvec{\theta }}(\cdot )\) is the output of the decoder. Thus, the lower bound to be maximised is

$$\begin{aligned} \mathcal {L}(\varvec{x}^{(i)}, \varvec{\theta }, \varvec{\phi }) \simeq -D_{\text {KL}}\left[ q_{\varvec{\phi }}(\varvec{z}|\varvec{x}^{(i)})\,||\,p_{\varvec{\theta }}(\varvec{z})\right] + \frac{1}{L}\sum _{l=1}^{L}\text {ln}\,p_{\varvec{\theta }}(\varvec{x}^{(i)}|\varvec{z}^{(i,l)}), \end{aligned}$$
(3)
where \(D_{\text {KL}}[\cdot || \cdot ]\) is the Kullback–Leibler divergence of two distributions and \(\varvec{z}^{(i,l)}\) is obtained using the reparameterisation trick, such that \(\varvec{z}^{(i,l)} = \varvec{g}_{\varvec{\phi }}(\varvec{\epsilon }^{(l)}, \varvec{x}^{(i)}) = \varvec{\mu }_{\varvec{\phi }}(\varvec{x}^{(i)}) + \varvec{\epsilon }^{(l)}\odot \varvec{\Sigma }_{\varvec{\phi }}(\varvec{x}^{(i)})\), where \(\varvec{\epsilon }\sim \mathcal {N}(\varvec{0}, \varvec{I})\) and \(\odot\) signifies the Hadamard product.
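For concreteness, the minibatch estimate of \(-\mathcal {L}\) (the quantity \(\delta\) minimised in training) can be sketched in PyTorch as below; parameterising the encoder variance through its logarithm is an implementation assumption, and a single sample \(L = 1\) is used:

```python
import torch
import torch.nn.functional as F

def negative_lower_bound(x, mu, log_var, decoder):
    """-L(x; theta, phi) for one minibatch, following (3) with L = 1,
    a Gaussian posterior q(z|x) = N(mu, diag(exp(log_var))) and a
    Bernoulli decoder whose outputs are pixel probabilities."""
    # Reparameterisation trick: z = mu + eps * Sigma, eps ~ N(0, I).
    eps = torch.randn_like(mu)
    z = mu + eps * torch.exp(0.5 * log_var)

    # Bernoulli term: -ln p(x|z), summed over pixels and the minibatch.
    recon = F.binary_cross_entropy(decoder(z), x, reduction="sum")

    # Closed-form KL divergence D_KL[q(z|x) || N(0, I)].
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())

    return recon + kl  # minimising this maximises the lower bound (3)
```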
Classification using invariant information clustering
A typical method for clustering is k-means, which uses the Euclidean distances between data. Several variants of k-means have been developed (e.g. k-means++41, fuzzy c-means42, and x-means43). For clustering in a high-dimensional space, however, the variance of the distances between data becomes small owing to the “curse of dimensionality”. Alternatively, IIC30, which is a classification method, is effective because it does not use the distances between data for learning. In this study, transient noise is classified by IIC through maximising the mutual information. Let \(\varvec{x}\in {\mathbb {R}}^D\) be the input data, \(\varvec{x}^{\prime }\) be the perturbed data of \(\varvec{x}\), C be the number of output classes, and \(\varvec{\Phi }(\varvec{x}) \in {\mathbb {R}}^C\) be a classifier whose output layer uses the SoftMax activation function. Consider a pair of cluster assignments for the two inputs \(\varvec{x}\) and \(\varvec{x}^{\prime }\). Their joint distribution is given by the \(C \times C\) matrix \(\varvec{P} = \frac{1}{N}\sum _{i=1}^{N}\varvec{\Phi }(\varvec{x}^{(i)}) \cdot \varvec{\Phi }(\varvec{x}^{(i)\prime })^{\text {T}}\), with elements \(P_{ij}\) and marginals \(P_{i} = \sum _j P_{ij}\), where the superscript \(\text {T}\) denotes the transpose. The objective for the maximisation of the mutual information is expressed as

$$\begin{aligned} I(\varvec{\Phi }(\varvec{x}), \varvec{\Phi }(\varvec{x}^{\prime })) = \sum _{i=1}^{C}\sum _{j=1}^{C} P_{ij}\,\text {ln}\frac{P_{ij}}{P_{i}\,P_{j}}. \end{aligned}$$
(4)
To improve the performance of the classifier, auxiliary over-clustering30 is also used when calculating the mutual information. The over-clustering formula is the same as (4), except that \(\varvec{\Phi }(\varvec{x}) \in {\mathbb {R}}^W\) with \(C < W\).
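A sketch of the objective (4) as a loss function (its negative, to be minimised), following the symmetrisation used in the IIC paper30:

```python
import torch

def iic_loss(phi_x: torch.Tensor, phi_xp: torch.Tensor, eps: float = 1e-10):
    """Negative mutual information (4) between the SoftMax cluster
    assignments phi_x and phi_xp (each of shape (M, C)) of a minibatch
    of inputs and their perturbations."""
    # Joint distribution P (C x C), averaged over the minibatch and
    # symmetrised.
    p = (phi_x.t() @ phi_xp) / phi_x.shape[0]
    p = ((p + p.t()) / 2).clamp(min=eps)

    # Marginals P_i (rows) and P_j (columns).
    p_i = p.sum(dim=1, keepdim=True)
    p_j = p.sum(dim=0, keepdim=True)

    # I = sum_ij P_ij ln( P_ij / (P_i P_j) ); return -I for minimisation.
    return -(p * (p.log() - p_i.log() - p_j.log())).sum()
```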
Proposed architecture
We propose the unsupervised classification architecture shown in Fig. 8. It is a deep learning architecture that trains on time–frequency 2D spectrogram images of transient noise. In the proposed architecture, the feature variables of the input image \(\varvec{x}\) and its perturbed image \(\varvec{x}^\prime = \varvec{\xi }(\varvec{x})\) are extracted by the pre-trained encoder of the VAE. The perturbation \(\varvec{\xi }\) is a transformation that does not change the information required for the classification (see the “Pre-processing” section). Subsequently, the IIC learns to maximise the mutual information \(I(\varvec{\Phi }(\varvec{z}), \varvec{\Phi }(\varvec{z}^{\prime }))\) computed from the pair of feature variables \((\varvec{z} = \varvec{\mu }_{\varvec{\phi }}(\varvec{x}), \varvec{z}^{\prime } = \varvec{\mu }_{\varvec{\phi }}(\varvec{x}^{\prime }))\).
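The wiring of the two stages can be sketched as follows, reusing iic_loss from the sketch above; the frozen stand-in encoder, the layer sizes, and the over-clustering width W are assumptions for illustration:

```python
import torch
import torch.nn as nn

C, W = 36, 300  # output classes; W is an illustrative over-clustering width

# Stand-in for the pre-trained VAE encoder's mean head, frozen here.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(4 * 224 * 224, 512))
for p in encoder.parameters():
    p.requires_grad = False

# SoftMax classification heads: main classes and auxiliary over-clustering.
head = nn.Sequential(nn.Linear(512, C), nn.Softmax(dim=1))
head_over = nn.Sequential(nn.Linear(512, W), nn.Softmax(dim=1))

x = torch.rand(8, 4, 224, 224)        # minibatch of input images
x_prime = torch.rand(8, 4, 224, 224)  # their perturbed counterparts

# The same (shared-weight) encoder embeds both inputs; the IIC loss is
# applied to both heads.
z, z_prime = encoder(x), encoder(x_prime)
loss = iic_loss(head(z), head(z_prime)) \
     + iic_loss(head_over(z), head_over(z_prime))
```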
The clustering of the IIC depends on the initial values of the neural networks, which are provided randomly; thus, the classification results of each classifier vary slightly. In unsupervised learning, it is difficult to resolve this dependency by taking an ensemble average over the classification results, because the class indices are assigned randomly in each run. In this study, spectral clustering44 was applied to compress the results of multiple classifiers into one result. The procedure is as follows, with a sketch given after the list:
1. Let D, K, and C be the number of data, the number of classifiers, and the estimated number of classes, respectively. Create a hypermatrix H of dimension \((D, K \times C)\) from the results of all classifiers.

2. Taking \(h^{(i)}\) as the row vector of H for each data point, calculate an affinity matrix using the Gaussian kernel, given by \(\text {exp}(-\Vert h^{(i)}-h^{(j)}\Vert )\).

3. Apply spectral clustering to the affinity matrix. Consequently, each of the D data points receives one of the C class labels.
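A minimal sketch of this procedure with scikit-learn, using a reduced stand-in dataset size and random one-hot assignments in place of the real classifier outputs:

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import SpectralClustering

D, K, C = 1000, 5, 36  # stand-in data size (8535 in the paper), classifiers, classes
rng = np.random.default_rng(0)

# Step 1: hypermatrix H of shape (D, K*C), concatenating each classifier's
# one-hot class assignments (random stand-ins here).
H = np.concatenate(
    [np.eye(C)[rng.integers(0, C, size=D)] for _ in range(K)], axis=1
)

# Step 2: affinity matrix from the Gaussian kernel exp(-||h_i - h_j||).
affinity = np.exp(-cdist(H, H))

# Step 3: spectral clustering on the precomputed affinity; each of the D
# data points receives one of C labels.
labels = SpectralClustering(
    n_clusters=C, affinity="precomputed", random_state=0
).fit_predict(affinity)
```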
Data availability
All the results are reported for the public data of the Gravity Spy project. Data on the results of unsupervised learning are available upon request from Y. Sakai and HT.
Code availability
All the codes developed in this study are available upon request from Y. Sakai and HT.
References
Aasi, J. et al. Advanced LIGO. Class. Quant. Gravity 32, 074001. https://doi.org/10.1088/0264-9381/32/7/074001 (2015).
Abbott, B. et al. GW150914: The advanced LIGO detectors in the era of first discoveries. Phys. Rev. Lett. 116, 131103. https://doi.org/10.1103/PhysRevLett.116.131103 (2016).
Acernese, F. et al. Advanced Virgo: A second-generation interferometric gravitational wave detector. Class. Quant. Gravity 32, 024001. https://doi.org/10.1088/0264-9381/32/2/024001 (2015).
Abbott, B. et al. GWTC-1: A gravitational-wave transient catalog of compact binary mergers observed by LIGO and Virgo during the first and second observing runs. Phys. Rev. X 9, 031040. https://doi.org/10.1103/PhysRevX.9.031040 (2019).
Abbott, R. et al. GWTC-2: Compact binary coalescences observed by LIGO and Virgo during the first half of the third observing run. Phys. Rev. X 11, 021053. https://doi.org/10.1103/PhysRevX.11.021053 (2021).
Abbott, R. et al. GWTC-2.1: Deep extended catalog of compact binary coalescences observed by LIGO and Virgo during the first half of the third observing run. Preprint at http://arxiv.org/abs/2108.01045 (2021).
Abbott, R. et al. GWTC-3: Compact binary coalescences observed by LIGO and Virgo during the second part of the third observing run. Preprint at http://arxiv.org/abs/2111.03606 (2021).
Grote, H. et al. The status of GEO 600. Class. Quant. Gravity 25, 114043. https://doi.org/10.1088/0264-9381/21/5/006 (2008).
Akutsu, T. et al. KAGRA: 2.5 generation interferometric gravitational wave detector. Nat. Astron. 3, 35. https://doi.org/10.1038/s41550-018-0658-y (2019).
Akutsu, T. et al. Overview of KAGRA: Detector design and construction history. Prog. Theor. Exp. Phys. 2021, 05A101. https://doi.org/10.1093/ptep/ptaa125 (2021).
Akutsu, T. et al. Overview of KAGRA: KAGRA science. Prog. Theor. Exp. Phys. 2021, 05A103. https://doi.org/10.1093/ptep/ptaa120 (2021).
Akutsu, T. et al. Overview of KAGRA: Calibration, detector characterization, physical environmental monitors, and the geophysics interferometer. Prog. Theor. Exp. Phys. 2021, 05A102. https://doi.org/10.1093/ptep/ptab018 (2021).
Abe, H. et al. Performance of the KAGRA detector during the first joint observation with GEO600 (O3GK). Preprint at http://arxiv.org/abs/2203.07011 (2022).
Abbott, R. et al. First joint observation by the underground gravitational-wave detector KAGRA with GEO600. Preprint at http://arxiv.org/abs/2203.01270 (2022).
Zevin, M. et al. Gravity Spy: Integrating advanced LIGO detector characterization, machine learning, and citizen science. Class. Quant. Gravity 34, 064003. https://doi.org/10.1088/1361-6382/aa5cea (2017).
Bahaadini, S. et al. Machine learning for Gravity Spy: Glitch classification and dataset. Inf. Sci. 444, 172–186. https://doi.org/10.1016/j.ins.2018.02.068 (2018).
Soni, S. et al. Discovering features in gravitational-wave data through detector characterization, citizen science and machine learning. Class. Quant. Gravity 38, 195016. https://doi.org/10.1088/1361-6382/ac1ccb (2021).
Bahaadini, S. et al. Deep multi-view models for glitch classification. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2931–2935. https://doi.org/10.1109/ICASSP.2017.7952693 (2017).
Robinet, F. et al. Omicron: A tool to characterize transient noise in gravitational-wave detectors. SoftwareX 12, 100620. https://doi.org/10.1016/j.softx.2020.100620 (2020).
Chatterji, S., Blackburn, L., Martin, G. & Katsavounidis, E. Multiresolution techniques for the detection of gravitational-wave bursts. Class. Quant. Gravity 21, S1809. https://doi.org/10.1088/0264-9381/21/20/024 (2004).
Bini, S. Unsupervised Classification of Short Transient Noise to Improve Gravitational Wave Detection (2020). https://etd.adm.unipi.it/t/etd-08302020-184201/. Accessed 26 Apr 2022.
Bahaadini, S. et al. Direct: Deep discriminative embedding for clustering of LIGO data. In 2018 25th IEEE International Conference on Image Processing (ICIP) 748–752. https://doi.org/10.1109/ICIP.2018.8451708 (2018).
George, D., Shen, H. & Huerta, E. Classification and unsupervised clustering of LIGO data with deep transfer learning. Phys. Rev. D 97, 101501. https://doi.org/10.1103/PhysRevD.97.101501 (2018).
Shiping, W., Jinyu, C., Qihao, L. & Wenzhong, G. An overview of unsupervised deep feature representation for text categorization. IEEE Trans. Comput. Soc. Syst. 6, 504–517. https://doi.org/10.1109/TCSS.2019.2910599 (2019).
Wenzhong, G., Jinyu, C. & Shiping, W. Unsupervised discriminative feature representation via adversarial auto-encoder. Appl. Intell. 50, 1155–1171. https://doi.org/10.1007/s10489-019-01581-7 (2020).
Jinyu, C., Shiping, W. & Wenzhong, G. Unsupervised embedded feature learning for deep clustering with stacked sparse auto-encoder. Expert Syst. with Appl. 186, 115729. https://doi.org/10.1016/j.eswa.2021.115729 (2021).
Jinyu, C., Shiping, W., Chaoyang, X. & Wenzhong, G. Unsupervised deep clustering via contractive feature representation and focal loss. Pattern Recogn. 123, 108386. https://doi.org/10.1016/j.patcog.2021.108386 (2022).
Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. 2nd Int. Conf. on Learn. Represent. (ICLR2014). Preprint at http://arxiv.org/abs/1312.6114 (2013).
Kingma, D. P. & Welling, M. An introduction to variational autoencoders. Found. Trends Mach. Learn. 12, 307–392. https://doi.org/10.1561/2200000056 (2019).
Ji, X., Henriques, J. F. & Vedaldi, A. Invariant information clustering for unsupervised image classification and segmentation. Proc. IEEE Int. Conf. Comput. Vis. 2019, 9865–9874. https://doi.org/10.1109/ICCV.2019.00996 (2019).
Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. 3rd Int. Conf. on Learn. Represent. (ICLR2015). Preprint at http://arxiv.org/abs/1412.6980 (2014).
Murphy, K. P. Machine Learning: A Probabilistic Perspective (MIT Press, 2012).
Christian, S. et al. Going deeper with convolutions. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 1–9. https://doi.org/10.1109/CVPR.2015.7298594 (2015).
Kaiming, H., Xiangyu, Z., Shaoqing, R. & Jian, S. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 770–778. https://doi.org/10.1109/CVPR.2016.90 (2016).
Karen, S. & Andrew, Z. Very deep convolutional networks for large-scale image recognition. 3rd Int. Conf. on Learn. Represent. (ICLR2015). Preprint at http://arxiv.org/abs/1409.1556 (2014).
Jia, D., Wei, D., Richard, S., Li-Jia L., Kai, L. & Li, F. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition 248–255. https://doi.org/10.1109/CVPR.2009.5206848 (2009).
Lee, D.-H. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. Workshop on Challenges in Representation Learning, ICML, Vol. 3, 896 (2013).
Brown, J. C. Calculation of a constant Q spectral transform. J. Acoust. Soc. Am. 89, 425–435. https://doi.org/10.1121/1.400476 (1991).
Zhong, G., Wang, L.-N., Ling, X. & Dong, J. An overview on data representation learning: From traditional feature learning to recent deep learning. J. Financ. Data Sci. 2, 265–278. https://doi.org/10.1016/j.jfds.2017.05.001 (2016).
Bengio, Y., Courville, A. & Vincent, P. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1798–1828. https://doi.org/10.1109/TPAMI.2013.50 (2013).
Arthur, D. & Vassilvitskii, S. \(k\)-means++: The advantages of careful seeding. Proc. Annu. ACM-SIAM Symp. Discr. Algorithms 07, 1027–1035 (2007).
Bezdek, J. C., Ehrlich, R. & Full, W. FCM: The fuzzy \(c\)-means clustering algorithm. Comput. Geosci. 10, 191–203. https://doi.org/10.1016/0098-3004(84)90020-7 (1984).
Radwan, A. et al. \(x\)-means clustering for wireless sensor networks. J. Robot. Netw. Artif. Life 7, 111–115. https://doi.org/10.2991/jrnal.k.200528.008 (2020).
Von Luxburg, U. A tutorial on spectral clustering. Stat. Comput. 17, 395–416. https://doi.org/10.1007/s11222-007-9033-z (2007).
Acknowledgements
We are grateful to the members of the Gravity Spy project for enlightening discussions. This study was supported in part by the Inter-University Research Program of Institute for Cosmic Ray Research, University of Tokyo, Japan. It was also supported in part by the Japan Society for the Promotion of Science (JSPS) Grants-in-Aid for Scientific Research on Innovative Areas, Grant No. 24103005 [JP17H06358, JP17H06361, and JP20H04731], by JSPS Core-to-Core Program A, Advanced Research Networks, and by JSPS KAKENHI [Grant No. 19H0190 (YI and HT) and Nos. 19K14636 and 21H05599 (Y. Shikano)], and by JST, PRESTO [Grant No. JPMJPR20M4 (Y. Shikano) ].
Author information
Authors and Affiliations
Contributions
Y.Sakai, G.U., and H.T. conceptualised the study; Y.Sakai, G.U., Y.Shikano, T.U., and H.T. framed the methodology; Y.Sakai, G.U., and H.T. concentrated on the software development and analysis; Y.I., P.J., K.K., C.K., K.T.N., S.O., Y.Shikano, T.U., T.W., T.Yamamoto, and T.Yokozawa performed the validation and investigation; K.K., C.K., S.O., T.W., T.Yamamoto, and T.Yokozawa made some critical and technical suggestions; Y.Sakai, G.U., and H.T. prepared the initial draft of the manuscript; all the authors, mainly Y.Sakai, Y.I., Y.Shikano, H.T., and T.U., wrote, reviewed, and edited the manuscript; H.T. supervised the study; H.T. and T.U. administered the project. All authors have read and agreed to the published version of the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Sakai, Y., Itoh, Y., Jung, P. et al. Unsupervised learning architecture for classifying the transient noise of interferometric gravitational-wave detectors. Sci Rep 12, 9935 (2022). https://doi.org/10.1038/s41598-022-13329-4