Interpretable semi-supervised clustering enables universal detection and intensity assessment of diverse aviation hazardous winds

Gao, Hang; Shen, Chun; Wang, Xuesong; Chan, Pak-Wai; Hon, Kai-Kwong; Li, Jianbing

doi:10.1038/s41467-024-51597-y

Article
Open access
Published: 26 August 2024

Nature Communications volume 15, Article number: 7347 (2024) Cite this article

Abstract

The identification of aviation hazardous winds is crucial yet challenging in air traffic management for assuring flight safety, particularly during the take-off and landing phases. Existing criteria are typically tailored to specific wind types, and whether there exists a universal feature that can effectively detect diverse types of hazardous winds from radar/lidar observations remains an open question. Here we propose an interpretable semi-supervised clustering paradigm to solve this problem, in which prior knowledge and probabilistic models of winds are integrated to overcome the bottleneck of scarce labels (pilot reports). Based on this paradigm, a set of high-dimensional hazard features is constructed to effectively identify the occurrence of diverse hazardous winds and assess their intensity. Verification of the paradigm across various scenarios highlights its high adaptability to diverse input data and good generalizability to diverse geographical and climate zones.

Introduction

Complex winds, such as wind shear^1,2,3, turbulence⁴, and aircraft wake⁵, can severely impact aviation safety, especially during the takeoff and landing phases. According to statistics⁶, over 40% of aviation incidents are mainly caused by complex wind conditions. For example, aircraft wake caused 130 accidents between 1983 and 2000⁵, and wind shear-related accidents have even resulted in more than 1400 fatalities since 1943⁷. Therefore, there is an urgent demand to robustly detect hazardous winds in air traffic management.

However, induced by local topography, surface roughness, and thermodynamics⁸, hazardous winds often involve multi-scale and chaotic dynamics. For example, turbulence can cause stochastic fluctuations in wind speed and direction within meters, whereas wind shear may produce sustained changes up to several nautical miles. Due to the chaotic dynamics of turbulence and the complex interaction between wind and aircraft, hazardous winds may show elusive and high-dimensional characteristics, making it challenging for hazardous winds to be universally detected.

During the past decades, some low-dimensional features have been developed in a hazard type-specific manner. For example, the hazard factor⁹ and severity factor^10,11 (broadly known as the F-factor and S-factor, respectively) were proposed for wind shear detection; the eddy dissipation rate¹² and the velocity circulations¹³ were proposed to measure the intensity of turbulence and aircraft wake, respectively. However, since these expertise-reliant¹⁴ features neglect the high-dimensional characteristics of hazardous winds, their robustness is quite limited. By combining low-dimensional hazard features, integrated methods^15,16 have demonstrated improved performance in detecting mid- and upper-level turbulence. Besides these manually crafted features, recent research has indicated that machine learning, especially deep learning¹⁷, can contribute to turbulence forecasting^18,19, wake vortex detection^20,21, and wind shear prediction²² by extracting high-dimensional features from ever-increasing data. Despite their promise, machine learning-based methods still face limitations when applied in air traffic management.

Firstly, compared to the mid-to-upper atmosphere, low-level winds exhibit complex interactions with the underlying surface and local terrain, which can significantly affect aircraft flying at lower airspeed and altitude during the takeoff and landing phases²³. Therefore, it is essential to promptly identify whether a specific wind is hazardous for aviation safety and to quantify its hazard intensity. However, existing approaches are tailored to specific types of hazardous winds (such as C_shear, C_turb, and C_wake in Fig. 1a), and the existence of a universal feature capable of detecting diverse types of hazardous winds (C_universal in Fig. 1a) remains an open question.

**Fig. 1: Illustration of the presented paradigm.**

Secondly, the labels in this task are obtained from pilot reports, which indicate whether a wind record measured by remote sensors^24,25 is hazardous to flight. However, the majority of the wind records remain unlabeled because takeoffs and landings occupy only a small portion of the operating time at airports. In addition, calm (non-hazardous) labels are unavailable, since pilot reports only concern the hazardous winds experienced by aircraft and ignore those with minor or negligible impacts.

Thirdly, current machine learning methods interpret hazardous wind detection as classification¹⁸, regression²⁶, or object detection²¹ tasks, where the models are trained to produce outputs that match the labels. These methods adhere to the conventional machine learning paradigms, and as a result, their interpretability is restricted due to the absence of prior knowledge.

In this work, we validate the hypothesis that different types of hazardous winds and calm winds exhibit distinct intrinsic characteristics through the development of an interpretable semi-supervised clustering paradigm. Our technical contributions are two-fold. Firstly, supposing that various types of hazardous winds and calm winds can be categorized into two distinct classes, we transform the hazardous wind detection challenge into a semi-supervised clustering task. According to the hypothesis, we introduce four pieces of prior knowledge to guide the clustering process in establishing optimal cluster features. The prior knowledge states that: (1) calm winds should cluster together; (2) hazardous winds should also cluster together; (3) there should be a significant separation between calm winds and hazardous winds; (4) the majority of unlabeled wind records should be identified as calm, considering the infrequency of hazardous winds. Secondly, the scaling parameters and loss terms in the objective function are derived from their probabilistic models in the established cluster feature space (as shown in Fig. 1b), rather than being empirically defined as in conventional machine learning methods. The integration of the prior knowledge and the probabilistic models facilitates an interpretable and universal paradigm for hazardous wind detection. The paradigm has been validated on a straightforward network comprising several dense layers (as shown in Fig. 1c), which is designed to map the input wind records to feature points within the established latent cluster feature space. We have demonstrated that the extracted cluster features can universally detect and quantify various hazardous wind types in real-world detection scenarios.

Results

Analysis of the extracted cluster features

In this study, we have gathered real-world data from five airports across China. The wind records collected at HKIA (Hong Kong International Airport, represented by the red dot in Fig. 2a) from January 1, 2017 to December 31, 2020 are used to train and test the proposed model. Given that hazardous winds in various scenarios are expected to exhibit similar features that are hazardous for flights, we further verify the model’s generalization capability using wind measurement data from four airports in mainland China (blue stars in Fig. 2a).

The established cluster feature space is shown in Fig. 2b. Within this space, a criterion termed “hazard factor” is defined to assess the intensities of hazardous winds (as shown in Fig. 1d). It is found that the majority of the unlabeled data in the HKIA training dataset (dots “ ⋅ ” in Fig. 2b) have negative hazard factors (refer to the colorbar) and are identified as calm winds. This is reasonable as calm winds occur more frequently than hazardous winds. In the HKIA test dataset, hazardous winds (circles with black edges “∘” in Fig. 2b) predominantly exhibit hazard factors higher than those of the majority of the unlabeled data, indicating that the extracted features can detect hazardous winds with high confidence.

For the hazardous winds collected from other airports, their extracted cluster features are represented by the stars with black edges “☆” in Fig. 2b. It is found that all the extracted features of these hazardous winds are situated close to those of the hazardous winds in the HKIA dataset. This finding suggests that hazardous winds in different scenarios share similar inherent characteristics, which can be accurately captured by the proposed model. Furthermore, the majority of the hazard features for other airports correspond to high hazard factor values, indicating that they can be properly detected by the model.

Detection of various hazardous wind types

To evaluate the hazard detection and intensity-assessment performance, a test dataset is established as described in the “Methods” section. The proposed method is compared with existing methods, which include the hazard type-specific criteria as well as representative machine learning techniques and operational systems.

The conventional hazard type-specific criteria for hazard detection were designed for specific types of hazards. For wind shear detection, the F-factor⁹ and S-factor¹⁰ are acknowledged as the most promising indicators. For turbulence detection, the eddy dissipation rate (EDR) has gained widespread acceptance. Consequently, the F-factor, S-factor, speed fluctuation¹⁵, and sampling scale-based EDR⁴ are presented as baseline methods. It is widely recognized that the thresholds of traditional criteria are pre-defined and often lack the adaptability for a variety of application scenarios. Consequently, we compare the proposed method (denoted as “ClusterNN”) against the conventional criteria not only upon the pre-defined thresholds, but also upon the optimal thresholds. These optimal thresholds are identified through an analysis of the receiver operating characteristic (ROC) curve derived from the training dataset, and the critical success index (CSI)²⁷ is employed to measure the accuracy of hazard detection. From the comparison in Fig. 3a, it is evident that traditional criteria perform more effectively when utilizing optimal thresholds rather than relying on pre-defined ones. Furthermore, our model demonstrates superior performance when compared to the hazard type-specific criteria.
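The threshold selection and scoring described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the text does not state which ROC-based criterion is used to pick the optimal threshold, so Youden's J statistic (maximizing TPR − FPR) is assumed here, and all function names and toy values are hypothetical.

```python
def confusion_counts(scores, labels, threshold):
    """Count hits, misses, false alarms and correct negatives at a threshold."""
    tp = fn = fp = tn = 0
    for s, y in zip(scores, labels):
        predicted = s > threshold          # "hazardous" if the score exceeds the threshold
        if y and predicted:
            tp += 1
        elif y and not predicted:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def critical_success_index(tp, fn, fp):
    """CSI = hits / (hits + misses + false alarms)."""
    denom = tp + fn + fp
    return tp / denom if denom else 0.0


def youden_threshold(scores, labels):
    """Pick the candidate threshold maximising TPR - FPR (assumed criterion)."""
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tp, fn, fp, tn = confusion_counts(scores, labels, t)
        tpr = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        if tpr - fpr > best_j:
            best_t, best_j = t, tpr - fpr
    return best_t


# Toy scores: higher means "more hazardous"; labels: 1 = hazardous, 0 = calm.
toy_scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
toy_labels = [0, 0, 0, 1, 0, 1]
threshold = youden_threshold(toy_scores, toy_labels)
tp, fn, fp, tn = confusion_counts(toy_scores, toy_labels, threshold)
csi = critical_success_index(tp, fn, fp)
```

In practice the threshold would be selected on the training data and the CSI reported on the held-out test data, mirroring the procedure in the text.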

Among the representative machine learning techniques, the random forest (RF) technique has been broadly used in turbulence detection^26,28. Moreover, considering the similarity between hazardous wind detection and anomaly detection, we compare the detection performance of our model against state-of-the-art anomaly detection methods. According to ref. ²⁹, the IForest³⁰, XGBOD³¹, and CatB³² methods respectively outperform other unsupervised, semi-supervised, and fully-supervised anomaly detection methods when very few (<1%) labels are available. Therefore, we take the IForest, XGBOD, and CatB methods as the comparison models in the evaluation. All models (ClusterNN in this paper, RF, IForest, XGBOD, and CatB) are trained and tested to detect hazardous winds. In this study, the adopted XGBOD method uses kNN (k-nearest neighbor), one-class SVM (support vector machine), and Isolation Forest as base outlier scoring functions. The quantitative comparison of the CSI values is shown in Fig. 3b. The fully-supervised learning method CatB performs worse than the other methods, demonstrating the importance of the information contained in the unlabeled wind records and scarce labels. The comparison further verifies that our model outperforms the state-of-the-art anomaly detection methods in realistic scenarios.

Two operational hazardous wind detection systems, known as the wind shear and turbulence warning system (WTWS) and the anemometer-based wind shear alerting rules-enhanced (AWARE) system, have been used at HKIA for many years. According to historical data, the WTWS and the AWARE system issue alerts with true positive rates of 0.67 and 0.18, respectively. Their performances are represented by the dashed lines in Fig. 3c. The AWARE system shows inferior performance since it only uses in-situ measurements such as those from anemometers and buoys, whereas the WTWS, which assimilates various sources of observations (lidar, terminal Doppler weather radars, anemometers, and buoys), provides state-of-the-art hazardous wind alerts for HKIA³³. The Turbulence Joint Safety Implementation Team, comprising experts from the FAA (US Federal Aviation Administration), NASA, various federal laboratories, and end users, recommended that the true positive rate and false positive rate for turbulence detection should be higher than 0.8 and lower than 0.15, respectively. This remains unattainable for low-dimensional hazard criteria¹⁵. In contrast, the proposed method outperforms the operational hazardous wind detection systems. Specifically, as shown by the ROC curves in Fig. 3c, the ROC curve of the classifier developed in the established cluster feature space (as shown in Fig. 1d), represented by the red line, lies clearly above the recommended performance (green dot) under turbulence conditions. In addition, for all types of hazardous winds, when the true positive rate of our classifier equals that of the WTWS, the expected false positive rate is only 0.011 (blue dot), which is quite satisfactory.
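The comparison at a matched true positive rate can be reproduced with a simple threshold scan. The sketch below is illustrative only: the function name and toy data are hypothetical, and it assumes that a record is flagged when its score is at least the threshold.

```python
def fpr_at_target_tpr(scores, labels, target_tpr):
    """Return (threshold, tpr, fpr) for the strictest threshold with TPR >= target."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    # Scan candidate thresholds from strict (high) to lenient (low).
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= t)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= t)
        tpr = tp / n_pos
        if tpr >= target_tpr:
            return t, tpr, fp / n_neg
    return None  # the target TPR cannot be reached


# Toy hazard scores with labels (1 = hazardous, 0 = calm).
toy_scores = [0.05, 0.1, 0.2, 0.4, 0.5, 0.8, 0.9]
toy_labels = [0, 0, 0, 1, 0, 1, 1]
# 0.67 is the WTWS true positive rate quoted in the text.
operating_point = fpr_at_target_tpr(toy_scores, toy_labels, target_tpr=0.67)
```

Because the scan starts from the strictest threshold, the first operating point that reaches the target also has the lowest false positive rate among those that do.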

Intensity quantification of various hazardous wind types

From Fig. 4a, it is evident that the hazard factor distribution in the training dataset is unimodal and positively skewed, and the hazardous records (represented by the red bars) correspond to the tail of the distribution. According to the mainstream definition of extreme events^34,35,36,37, it can be inferred that the hazardous winds are related to the extreme events defined by the hazard factor. In this way, wind records with higher hazard factor values occur less frequently but are more likely to indicate extremely hazardous situations.

**Fig. 4: Intensity-assessment evaluation.**

The violin plot in Fig. 4b displays the distribution of the hazard factor values with respect to the hazard intensities reported by pilots. The comparison in the subfigure below shows that the hazard factor values increase with the reported hazard intensities, indicating that the hazard factor offers a rough assessment of hazard intensity: the higher the factor, the more severe the hazard. However, the hazard intensities reported by pilots depend on the aircraft responses, which are functions of the dynamic characteristics of the wind, aircraft parameters (such as type, mass, altitude, and airspeed), and subjective factors such as pilots’ experience (see Supplementary Information section 1.1). Therefore, we believe that the relationship between the hazard factor and the reported hazard intensity can be analyzed more accurately once more pilot reports become available.

Discussion

We have demonstrated that different types of hazardous winds and calm winds exhibit distinct intrinsic characteristics, and that they can be universally detected in a type-free manner. Moreover, a comprehensive set of features, which can universally identify and quantify various hazardous wind types, has been effectively extracted by an interpretable semi-supervised clustering network. By integrating prior knowledge and probabilistic models of winds, this work not only delivers an interpretable and transparent paradigm for evaluating aviation hazards, but also provides opportunities for the detection and elucidation of extreme events.

Currently, the implemented network is straightforward but has proven to be effective in illustrating the efficacy of the proposed paradigm in this work. Indeed, this paradigm is a versatile, plug-and-play mechanism, suggesting that improved performance can be anticipated by incorporating a more sophisticated network. It is noted that the model proposed in this work is currently trained on the HKIA lidar observation dataset, where the wind records along the corridors (Supplementary Fig. 2b) are updated approximately every 2 min. In this way, the trained model implicitly assumes that the wind records are quasi-stationary within ~2 min. In fact, the quasi-stationary assumption of natural wind within 2 min is widely accepted in the global civil aviation industry. For instance, the revisit time of the terminal Doppler weather radars (TDWRs) is about 5 min³⁸, and the averaging period for surface wind observations in aviation management is 2 min³⁹. Actually, the 2-min quasi-stationary assumption is also consistent with actual situations, as demonstrated in Supplementary Information section 1.2.

Moreover, although the model proposed in this work is trained using lidar observations from the HKIA dataset, it can be applied to other scenarios as long as fine spatial resolution wind records along the corridor can be obtained, which can be either observed or forecasted. In this way, we have demonstrated that the model can be generalized (1) from one airport to others, (2) from lidar data input to other types of wind observation data, and (3) from hazard assessment to prediction.

Generalization from one airport to another. Wind measurement data and pilot reports from four other airports in mainland China are selected to verify the generalization capability of this model. The results (Fig. 2) indicate that hazardous winds from other airports can be properly detected by the model proposed in this work. Lidar deployment is becoming increasingly common at airports worldwide, which highlights the significant potential for the application of this model.
Generalization from lidar data input to other types of wind observation data. In cases where lidar observation data are not available, we have demonstrated that the hazard information in the wind records generated by a numerical weather prediction method⁴⁰ can be accurately detected by the proposed model. The detailed process is available in Supplementary Information section 3.1.
Generalization from hazard assessment to prediction. Wind field prediction has been well investigated⁴¹, and the authors of this paper have also proposed an accurate wind field prediction method based on the convolutional long short-term memory (ConvLSTM) neural network⁴². Therefore, by applying the hazard assessment method proposed in this work to the predicted wind fields, the hazard intensity of the wind can be predicted. A detailed description is available in Supplementary Information section 3.2.

Methods

Related works

Complex winds, including wind shear, turbulence, and aircraft wake, exhibit elusive and high-dimensional characteristics. Consequently, the existing hazard detection methods are defined for specific types of hazardous winds.

Turbulence detection. Based on the statistical signature of turbulence, features such as the Doppler spectral width⁴³, the variance of Doppler mean velocities⁴⁴, and the turbulence energy dissipation rate^45,46,47 have been widely used to assess the intensity of turbulence. Specifically, for mid- and upper-level turbulence detection, it is impractical to directly and routinely measure or forecast the atmospheric motion at the scale that affects aircraft. Therefore, the defined expertise-reliant features^15,48,49 and machine learning techniques such as random forests^26,28 and neural networks¹⁸ use large-scale atmospheric motion forecasts as inputs. However, this reliance limits their adaptability and update rate.
Wake vortex detection. Based on the unique crossover structure of the Doppler velocity field induced by wake vortices, the velocity range⁵⁰ and other graphical features extracted by deep learning frameworks^20,21 have been used to recognize aircraft wake vortices that evolve perpendicular to the corridor. In contrast, this work focuses on the hazardous winds that occur along the corridor, since these winds can cause significant and persistent changes to aircraft.
Wind shear detection. To detect wind shear in ground proximity, the S-factor¹⁰ and the F-factor⁵¹ were proposed according to the induced wind speed variation, and related empirical thresholds were statistically defined^52,53. These features are easy to use⁵⁴, but the detection performance greatly depends on the pre-defined thresholds^11,53,55. In recent years, machine learning methods have also been used for wind shear detection⁵⁶.

In general, the existing criteria are tailored to a specific type of hazardous wind, and whether the research scope can be broadened from a single type to various types of hazardous winds remains an open question.

Dataset preparation

In this work, each wind record along the corridor has 115 sampling points, covering the range from the touchdown site to 6000 m away. The field campaigns in HKIA and other airports are reported in the Supplementary Information section 2.1. The wind records collected at HKIA from January 1, 2017 to December 31, 2020 are utilized to train and test the model we proposed. The wind records corresponding to pilot reports are labeled as hazardous winds (Supplementary Information section 1.1).

In the model development process, the data from 2017 and 2018 are employed. There are a total of N_train = 515,733 wind records, containing $N_{\rm train}^{\rm hazard}=181$ hazardous labels and $N_{\rm train}^{\rm unlabel}=N_{\rm train}-N_{\rm train}^{\rm hazard}=515627$ unlabeled data. According to the seasonal occurrence probability of the hazardous winds observed at HKIA (Supplementary Information section 2.3), we randomly select $N_{\rm train}^{\rm calm}=N_{\rm train}^{\rm hazard}$ wind records from the unlabeled data in October and November to further guide the training process. For testing purposes, we have established a test dataset consisting of the $N_{\rm test}^{\rm hazard}=178$ hazardous wind reports from 2019 to 2020, as well as $N_{\rm test}^{\rm calm}=N_{\rm test}^{\rm hazard}$ calm wind records. These calm records are randomly selected from the unlabeled data in October and November during the same period.
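The balanced selection of calm records can be sketched as follows. This is a hedged illustration rather than the authors' data pipeline: the record layout (a dictionary with hypothetical `id` and `month` fields), the function name, and the fixed seed are all assumptions made for the example.

```python
import random


def sample_calm_records(unlabeled, n_hazard, months=(10, 11), seed=0):
    """Randomly pick n_hazard unlabeled records observed in the given months."""
    candidates = [rec for rec in unlabeled if rec["month"] in months]
    if len(candidates) < n_hazard:
        raise ValueError("not enough unlabeled records in the selected months")
    rng = random.Random(seed)               # fixed seed keeps the split reproducible
    return rng.sample(candidates, n_hazard)  # sampling without replacement


# Hypothetical toy records; a real record would also carry its 115 wind samples.
unlabeled = [{"id": i, "month": 1 + i % 12} for i in range(240)]
calm = sample_calm_records(unlabeled, n_hazard=5)
```

Matching the number of calm records to the number of hazardous labels keeps the two labeled groups balanced, as in the training and test sets described above.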

Problem definition

From the machine learning perspective, the task of hazardous wind detection and intensity assessment can be regarded as a clustering problem, followed by a classification problem. The basic idea is to map the wind record along the corridor to the latent cluster features z_i by a cluster network parameterized by Θ, and further estimate the hazard factor y_i by a supervised classifier parameterized by Φ. The model can be illustrated as follows

$$\left[{y}_{i},{{{{\bf{z}}}}}_{i}\right]={\arg\max}_{\left[{y}_{i},{{{{\bf{z}}}}}_{i}\right]}\,P\left({y}_{i}| {{{{\bf{z}}}}}_{i},{{{{\bf{x}}}}}_{i};{{{\mathbf{\Phi }}}},{{{\mathbf{\Theta }}}}\right)P\left({{{{\bf{z}}}}}_{i}| {{{{\bf{x}}}}}_{i};{{{\mathbf{\Theta }}}}\right)$$

(1)

where x_i is the wind record observed at time t_i, and y_i > y_T (y_i < y_T) indicates that the wind of concern is hazardous (calm). The amplitude of the hazard factor y_i assesses the intensity of hazardous winds. y_T is the optimal threshold of the classifier, which is obtained from the ROC curve in this work.

The model defined in Eq. (1) consists of two steps. The first step is to learn cluster features for hazardous wind clustering and perform clustering assignments simultaneously by semi-supervised clustering (Fig. 1c). The second step is to train a supervised classifier (Fig. 1d) on the extracted cluster features, where a hazard factor is proposed to assess the intensity of hazardous winds. The hazard factor is defined as the decision function of the classifier and can be denoted as $y_{i}=g_{\mathbf{\Phi}}\left(\mathbf{z}_{i}\right)$, where $g_{\mathbf{\Phi}}\left(\cdot\right)$ represents the transformation of the classifier.
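The second step can be illustrated with a small sketch. The form of the classifier is not specified in this section, so a linear decision function is assumed here purely for illustration; the weights, bias, and threshold are hypothetical values, not fitted parameters.

```python
def hazard_factor(z, weights, bias):
    """Linear decision function standing in for g_Phi(z) (assumed form)."""
    return sum(w * zi for w, zi in zip(weights, z)) + bias


def classify(z, weights, bias, y_threshold):
    """Return the hazard factor y and the label implied by y > y_T."""
    y = hazard_factor(z, weights, bias)
    return y, ("hazardous" if y > y_threshold else "calm")


# Hypothetical classifier parameters in a two-dimensional cluster feature space.
weights, bias, y_T = [1.0, 0.5], -1.0, 0.0
y_far, label_far = classify([2.0, 1.0], weights, bias, y_T)    # far from the calm cluster
y_near, label_near = classify([0.2, 0.2], weights, bias, y_T)  # close to the calm cluster
```

The same scalar y serves both purposes described above: its sign relative to y_T gives the detection decision, and its magnitude gives the intensity assessment.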

Architecture and parameter initialization

The cluster network is defined as:

$${{{{\bf{h}}}}}_{1} ={{{\rm{ReLU}}}}\left({{{{\bf{W}}}}}_{1}^{{{{\rm{T}}}}}{{{\bf{x}}}}+{{{{\bf{b}}}}}_{1}\right)\\ {{{{\bf{h}}}}}_{2} ={{{\rm{ReLU}}}}\left({{{{\bf{W}}}}}_{2}^{{{{\rm{T}}}}}{{{{\bf{h}}}}}_{1}+{{{{\bf{b}}}}}_{2}\right)\\ {{{\bf{z}}}} ={{{{\bf{W}}}}}_{3}^{{{{\rm{T}}}}}{{{{\bf{h}}}}}_{2}+{{{{\bf{b}}}}}_{3}$$

(2)

where x and z are the input and the extracted cluster features, respectively, {h₁, h₂} represent the hidden states, and {W₁, ⋯, W₃, b₁, ⋯, b₃} are the model parameters.
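A forward pass through Eq. (2) can be sketched in plain Python. Only the input length of 115 sampling points comes from the text; the hidden sizes (64, 32), the two-dimensional output, and the random initialization are assumptions made for this example.

```python
import random


def dense(x, W, b):
    """Compute W^T x + b, with W stored as a list of input-indexed rows."""
    return [sum(W[i][j] * x[i] for i in range(len(x))) + b[j]
            for j in range(len(b))]


def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, a) for a in v]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialization)."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    return W, [0.0] * n_out


def cluster_network(x, params):
    """Eq. (2): two ReLU hidden layers followed by a linear output layer."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = relu(dense(x, W1, b1))
    h2 = relu(dense(h1, W2, b2))
    return dense(h2, W3, b3)  # z: no activation on the output layer


rng = random.Random(0)
sizes = [115, 64, 32, 2]  # 115 is from the text; 64, 32 and 2 are assumed
params = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
x = [rng.gauss(0.0, 1.0) for _ in range(115)]  # stand-in for one wind record
z = cluster_network(x, params)
```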

The network is initialized with a stacked autoencoder (SAE)⁵⁷, which has been shown to consistently produce semantically meaningful representations on real-world datasets^58,59,60. Taking the layers in the cluster network as the encoded layers, the decoded layers in the SAE are defined as:

$${{{{\bf{h}}}}}_{4} ={{{\rm{ReLU}}}}\left({{{{\bf{W}}}}}_{4}^{{{{\rm{T}}}}}{{{\bf{z}}}}+{{{{\bf{b}}}}}_{4}\right)\\ {{{{\bf{h}}}}}_{5} ={{{\rm{ReLU}}}}\left({{{{\bf{W}}}}}_{5}^{{{{\rm{T}}}}}{{{{\bf{h}}}}}_{4}+{{{{\bf{b}}}}}_{5}\right)\\ {\hat{{{{\bf{x}}}}}}_{{{{\rm{rec}}}}} ={{{{\bf{W}}}}}_{6}^{{{{\rm{T}}}}}{{{{\bf{h}}}}}_{5}+{{{{\bf{b}}}}}_{6}$$

(3)

where ${\hat{{{{\bf{x}}}}}}_{{{{\rm{rec}}}}}$ represents the reconstructed output, {h₄, h₅} are the hidden states of the decoded layers, and {W₄, ⋯, W₆, b₄, ⋯, b₆} are the parameters. The initialization process is performed by minimizing the least squares loss $\parallel {{{\bf{x}}}}-{\hat{{{{\bf{x}}}}}}_{{{{\rm{rec}}}}}{\parallel }^{2}$.
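The reconstruction objective used for initialization can be written out as a short sketch. The helper names are hypothetical, and the toy identity weights are chosen only so that the result is easy to check by hand; an actual pretraining loop would minimize this loss over the training records with a gradient-based optimizer.

```python
def dense(x, W, b):
    """Compute W^T x + b, with W stored as a list of input-indexed rows."""
    return [sum(W[i][j] * x[i] for i in range(len(x))) + b[j]
            for j in range(len(b))]


def forward(x, layers):
    """Apply dense layers with ReLU on all but the last, as in Eqs. (2) and (3)."""
    h = x
    for k, (W, b) in enumerate(layers):
        h = dense(h, W, b)
        if k < len(layers) - 1:
            h = [max(0.0, a) for a in h]
    return h


def reconstruction_loss(x, x_rec):
    """Least squares loss ||x - x_rec||^2."""
    return sum((a - c) ** 2 for a, c in zip(x, x_rec))


# Toy identity encoder and decoder: the reconstruction is exact.
identity = [[1.0, 0.0], [0.0, 1.0]]
encoder = [(identity, [0.0, 0.0])]
decoder = [(identity, [0.0, 0.0])]
x = [1.0, 2.0]
x_rec = forward(forward(x, encoder), decoder)
loss = reconstruction_loss(x, x_rec)
```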

Prior knowledge and probabilistic models

In this study, we hypothesize that various types of hazardous winds and calm winds can be categorized into two distinct classes. Specifically, four pieces of prior knowledge are incorporated:

Knowledge 1: calm winds should cluster together;
Knowledge 2: hazardous winds should cluster together;
Knowledge 3: there should be a significant separation between calm winds and hazardous winds;
Knowledge 4: the majority of unlabeled wind records should be identified as calm, since hazardous winds rarely occur in reality.

Taking the above prior knowledge into account, the objective function can be expressed as:

$${{{\mathcal{L}}}}={\lambda }_{1}{{{{\mathcal{L}}}}}_{{{{\rm{rec}}}}}+{\lambda }_{2}{{{{\mathcal{L}}}}}_{{{{\rm{unlabel}}}}}+{\lambda }_{3}{{{{\mathcal{L}}}}}_{{{{\rm{calm}}}}}+{\lambda }_{4}{{{{\mathcal{L}}}}}_{{{{\rm{hazard}}}}}+{\lambda }_{5}{{{{\mathcal{L}}}}}_{{{{\rm{calm}}}}-{{{\rm{hazard}}}}}$$

(4)

where the five terms on the right-hand side are the unsupervised reconstruction loss, unlabeled loss (Knowledge 4), calm loss (Knowledge 1), hazard loss (Knowledge 2), and calm-hazard loss (Knowledge 3), respectively, and λ₁, λ₂, λ₃, λ₄, and λ₅ are the corresponding scaling parameters.

Probabilistic models

In traditional deep learning methods, the scaling parameters λ₁, λ₂, λ₃, λ₄, and λ₅ in Eq. (4) are set empirically, which limits interpretability. In this work, the scaling parameters and the loss terms are instead defined according to their probabilistic models.

For a wind record x, its extracted cluster features can be denoted as f_Θ(x), where Θ represents the parameters of the cluster network. Suppose that the deviation between the cluster features of the reconstructed record, $f_{\Theta}(\hat{\mathbf{x}}_{\rm rec})$, and those of the input record, f_Θ(x), obeys a Gaussian distribution in the cluster feature space. Then the probabilistic model of the first term on the right-hand side of Eq. (4) can be denoted as:

$$P\left(\,{f}_{\!\!\Theta }({\hat{{{{\bf{x}}}}}}_{{{{\rm{rec}}}}})-{f}_{\!\!\Theta }({{{\bf{x}}}});\Theta \right) \sim {{{\mathcal{N}}}}\left(0,{\sigma }_{1}^{2}\right)$$

(5)

where $\hat{\mathbf{x}}_{\rm rec}$ and x represent the reconstructed record and the input, respectively, and σ₁ is the standard deviation of $f_{\Theta}(\hat{\mathbf{x}}_{\rm rec})-f_{\Theta}(\mathbf{x})$.

We further assume that, in the established cluster feature space, the data points in each cluster obey a Gaussian distribution. A two-dimensional illustration of the extracted cluster features is shown in Fig. 1b. In this way, the probabilistic models of the last four terms on the right-hand side of Eq. (4) can be denoted as:

$$P\left({f}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{unlabel}}}}})-{\bar{f}}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{calm}}}}});\Theta \right) \sim {{{\mathcal{N}}}}\left(0,{\sigma }_{2}^{2}\right)$$

(6)

$$P\left({f}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{calm}}}}})-{\bar{f}}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{calm}}}}});\Theta \right) \sim {{{\mathcal{N}}}}\left(0,{\sigma }_{3}^{2}\right)$$

(7)

$$P\left({f}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}})-{\bar{f}}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}});\Theta \right) \sim {{{\mathcal{N}}}}\left(0,{\sigma }_{4}^{2}\right)$$

(8)

$$P\left({f}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}})-{f}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{calm}}}}})-\left[{\bar{f}}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}})-{\bar{f}}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{calm}}}}})\right];\Theta \right) \sim {{{\mathcal{N}}}}\left(0,{\sigma }_{3}^{2}+{\sigma }_{4}^{2}\right)$$

(9)

where x_unlabel, x_calm, and x_hazard represent the unlabeled, calm, and hazardous winds, respectively, and σ₂, σ₃, and σ₄ are the corresponding standard deviations. To facilitate training, Eq. (9) can be further simplified by substituting the cluster features with the distances between them, yielding the following form:

$$P\left(\left\Vert \, {f}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}})-{\bar{f}}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{calm}}}}})\right\Vert -d;\Theta \right) \sim {{{\mathcal{N}}}}\left(0,{\sigma }_{3}^{2}+{\sigma }_{4}^{2}\right)$$

(10)

where d indicates the expected distance between the centroids of the hazardous winds and calm winds in the established cluster feature space.
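The link between these Gaussian models and a weighted squared-error loss can be checked numerically. The sketch below evaluates the residual of Eq. (10) and its negative log-density, which equals the squared residual scaled by 1/(2σ²) plus a constant; the helper names and toy feature values are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a list of feature vectors."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def gaussian_nll(residual, sigma):
    """-ln N(residual; 0, sigma^2) = residual^2 / (2 sigma^2) + ln(sigma sqrt(2 pi))."""
    return residual ** 2 / (2 * sigma ** 2) + math.log(sigma * math.sqrt(2 * math.pi))


def separation_residual(z_hazard, calm_features, d):
    """Residual of Eq. (10): ||f(x_hazard) - mean f(x_calm)|| - d."""
    return distance(z_hazard, centroid(calm_features)) - d


# Toy two-dimensional cluster features.
calm_features = [[0.0, 0.0], [2.0, 0.0]]  # centroid at (1, 0)
z_hazard = [4.0, 4.0]                      # 5 units away from the calm centroid
residual = separation_residual(z_hazard, calm_features, d=3.0)
sigma3, sigma4 = 1.0, 1.0
sigma = math.sqrt(sigma3 ** 2 + sigma4 ** 2)  # variance sigma3^2 + sigma4^2, as in Eq. (10)
nll = gaussian_nll(residual, sigma)
```

Only the first term of `gaussian_nll` depends on the network output, so the variance of each Gaussian model plays the role of an interpretable scaling parameter for the corresponding loss term.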

To guide the model in quantifying the intensities of hazardous winds, we have introduced an additional constraint to Eq. (9). This constraint, as shown in Fig. 1b, requires that the distance between strong hazardous winds and calm winds be greater than d, and that the distance between weak hazardous winds and calm winds be less than d. In this way, considering the intensity levels of hazardous winds, the expression in Eq. (9) is divided into the following two components:

$$\left\{\begin{array}{l}P\big(y\big({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{strong}}}}}\big);\Theta \big)=\frac{1}{\sqrt{2\pi }{\sigma }_{5}}\exp \left(-\frac{{\left\Vert y\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{strong}}}}}\right)\right\Vert }^{2}}{2{\sigma }_{5}^{2}}\right)\cdot {{{\rm{sigmoid}}}}\big(y\big({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{strong}}}}}\big)\big)\quad \\ P\left(y\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{weak}}}}}\right);\Theta \right)=\frac{1}{\sqrt{2\pi }{\sigma }_{5}}\exp \left(-\frac{{\left\Vert y\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{weak}}}}}\right)\right\Vert }^{2}}{2{\sigma }_{5}^{2}}\right)\cdot {{{\rm{sigmoid}}}}\left(-y\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{weak}}}}}\right)\right)\quad \end{array}\right.$$

(11)

where ${{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{strong}}}}}$ and ${{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{weak}}}}}$ represent the hazardous winds with strong intensities and weak intensities, respectively, $y({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}})=\left\Vert {f}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}})-{\bar{f}}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{calm}}}}})\right\Vert -d$ and ${\sigma }_{5}^{2}={\sigma }_{3}^{2}+{\sigma }_{4}^{2}$.
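To make the role of the sigmoid factor in Eq. (11) concrete, the per-sample log-probability can be sketched as follows. This is an illustrative sketch under stated assumptions: `y` holds precomputed residuals, `strong` is a boolean mask marking strong hazardous winds, and the helper names are hypothetical rather than taken from the released code.

```python
import numpy as np

def log_sigmoid(z):
    # log(sigmoid(z)) = -log(1 + exp(-z)), evaluated in a numerically stable way.
    return -np.logaddexp(0.0, -z)

def intensity_log_prob(y, strong, sigma5):
    """Log of Eq. (11) for each sample.

    y      : residuals ||f(x_hazard) - calm centroid|| - d
    strong : boolean mask, True for strong hazardous winds
    sigma5 : sqrt(sigma3^2 + sigma4^2)
    """
    log_gauss = -0.5 * y**2 / sigma5**2 - np.log(np.sqrt(2.0 * np.pi) * sigma5)
    sign = np.where(strong, 1.0, -1.0)  # +y for strong winds, -y for weak winds
    return log_gauss + log_sigmoid(sign * y)

# The same positive residual is more probable for a strong sample than a weak one.
lp = intensity_log_prob(np.array([1.0, 1.0]), np.array([True, False]), sigma5=1.0)
```

The Gaussian factor keeps both groups near the expected distance d, while the sigmoid factor pushes strong winds to the far side of d and weak winds to the near side.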

Using the training dataset, which comprises unlabeled data, calm winds, and hazardous winds, the model parameters are obtained by maximizing the log-likelihood:

$$\Theta={{\arg\max}_{\Theta } \ln \left({{{{\mathcal{L}}}}}_{P}\right)}$$

(12)

where ${{{{\mathcal{L}}}}}_{P}$ is the product of the aforementioned probabilistic models:

$${{{{\mathcal{L}}}}}_{P}= P\left({f}_{\!\!\Theta }\left({\hat{{{{\bf{x}}}}}}_{{{{\rm{rec}}}}}\right)-{f}_{\!\!\Theta }({{{\bf{x}}}});\Theta \right)\cdot P\left({f}_{\!\!\Theta }\left({{{{\bf{x}}}}}_{{{{\rm{unlabel}}}}}\right)-{\bar{f}}_{\!\!\Theta }\left({{{{\bf{x}}}}}_{{{{\rm{calm}}}}}\right);\Theta \right)\\ \cdot P\left({f}_{\!\!\Theta }\left({{{{\bf{x}}}}}_{{{{\rm{calm}}}}}\right)-{\bar{f}}_{\!\!\Theta }\left({{{{\bf{x}}}}}_{{{{\rm{calm}}}}}\right);\Theta \right)\cdot P\left({f}_{\!\!\Theta }\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}\right)-{\bar{f}}_{\!\!\Theta }\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}\right);\Theta \right)\\ \cdot P\left(y\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{strong}}}}}\right);\Theta \right)\cdot P\left(y\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{weak}}}}}\right);\Theta \right)$$

(13)

which can be transformed into:

$$-\ln \left({{{{\mathcal{L}}}}}_{P}\right)= \frac{{\sum }_{{N}_{1}}{\left\Vert {f}_{\!\!\Theta }\left({\hat{{{{\bf{x}}}}}}_{{{{\rm{rec}}}}}\right)-{f}_{\!\!\Theta }({{{\bf{x}}}})\right\Vert }^{2}}{2{\sigma }_{1}^{2}}+\frac{{\sum }_{{N}_{2}}{\left\Vert {f}_{\!\!\Theta }\left({{{{\bf{x}}}}}_{{{{\rm{unlabel}}}}}\right)-{\bar{f}}_{\!\!\Theta }\left({{{{\bf{x}}}}}_{{{{\rm{calm}}}}}\right)\right\Vert }^{2}}{2{\sigma }_{2}^{2}}\\ +\frac{{\sum }_{{N}_{3}}{\left\Vert {f}_{\!\!\Theta }\left({{{{\bf{x}}}}}_{{{{\rm{calm}}}}}\right)-{\bar{f}}_{\!\!\Theta }\left({{{{\bf{x}}}}}_{{{{\rm{calm}}}}}\right)\right\Vert }^{2}}{2{\sigma }_{3}^{2}}+\frac{{\sum }_{{N}_{4}}{\left\Vert {f}_{\!\!\Theta }\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}\right)-{\bar{f}}_{\!\!\Theta }\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}\right)\right\Vert }^{2}}{2{\sigma }_{4}^{2}}\\ +\sum\limits_{{N}_{5}}\left\{\frac{{\left\Vert y\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{strong}}}}}\right)\right\Vert }^{2}}{2\left({\sigma }_{3}^{2}+{\sigma }_{4}^{2}\right)}-\ln \left[{{{\rm{sigmoid}}}}\left(y\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{strong}}}}}\right)\right)\right]\right\}\\ +\sum\limits_{{N}_{4}-{N}_{5}}\left\{\frac{{\left\Vert y\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{weak}}}}}\right)\right\Vert }^{2}}{2\left({\sigma }_{3}^{2}+{\sigma }_{4}^{2}\right)}-\ln \left[{{{\rm{sigmoid}}}}\left(-y\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{weak}}}}}\right)\right)\right]\right\}+{{{\rm{Constant}}}}$$

(14)

where N₁, N₂, ⋯, and N₅ denote the numbers of training samples, unlabeled samples, calm winds, hazardous winds, and strong hazardous winds, respectively.
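The structure of Eq. (14) can be sketched as a single loss function. This is a hedged illustration rather than the authors' training code: it assumes that features are precomputed NumPy arrays, that centroids are estimated as batch means, that each sum over strong or weak winds covers both the quadratic term and the log-sigmoid term, and that the additive constant is dropped. The function name and argument layout are hypothetical.

```python
import numpy as np

def neg_log_likelihood(f_rec, f_x, f_unlabel, f_calm, f_hazard, strong, d, sigmas):
    """Evaluate Eq. (14) without the additive constant.

    f_rec, f_x : features of reconstructed and original inputs, shape (N1, k)
    f_unlabel  : features of unlabeled winds, shape (N2, k)
    f_calm     : features of calm winds, shape (N3, k)
    f_hazard   : features of hazardous winds, shape (N4, k)
    strong     : boolean mask of length N4, True for strong hazardous winds
    sigmas     : (sigma1, sigma2, sigma3, sigma4)
    """
    s1, s2, s3, s4 = sigmas
    calm_c = f_calm.mean(axis=0)      # assumed centroid estimate
    hazard_c = f_hazard.mean(axis=0)  # assumed centroid estimate

    def sq(a):
        return np.sum(a**2)

    loss = sq(f_rec - f_x) / (2.0 * s1**2)            # reconstruction term
    loss += sq(f_unlabel - calm_c) / (2.0 * s2**2)    # unlabeled winds around calm centroid
    loss += sq(f_calm - calm_c) / (2.0 * s3**2)       # calm winds around calm centroid
    loss += sq(f_hazard - hazard_c) / (2.0 * s4**2)   # hazardous winds around their centroid

    y = np.linalg.norm(f_hazard - calm_c, axis=1) - d
    sign = np.where(strong, 1.0, -1.0)
    # -ln(sigmoid(z)) equals ln(1 + exp(-z)), computed stably with logaddexp.
    loss += np.sum(y**2 / (2.0 * (s3**2 + s4**2)) + np.logaddexp(0.0, -sign * y))
    return loss

# Toy inputs (illustrative values only).
f_x = np.ones((2, 2))
f_rec = np.ones((2, 2))
f_unlabel = np.zeros((1, 2))
f_calm = np.array([[1.0, 0.0], [-1.0, 0.0]])
f_hazard = np.array([[3.0, 4.0], [-3.0, -4.0]])
strong = np.array([True, False])
loss = neg_log_likelihood(f_rec, f_x, f_unlabel, f_calm, f_hazard, strong,
                          d=5.0, sigmas=(1.0, 1.0, 1.0, 1.0))
```

In a trainable setting the same expression would be written in an automatic-differentiation framework so that gradients with respect to Θ are available; NumPy is used here only to keep the arithmetic visible.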

Each term in Eq. (14) corresponds to a loss term in Eq. (4). By combining Eqs. (4) and (14), the implications of the scaling parameters and the associated loss terms become interpretable: within the established cluster feature space, the loss terms quantify the deviations of feature points from the centroids, while the scaling parameters signify the diversity of these points.

Moreover, given the diversity of wind records in real-world scenarios, we further expect the following constraints: σ₁ should be small, since the reconstruction error must remain very small for the network to be meaningful; σ₃ < σ₂, since the third term corresponds to calm winds, which are a subset of the unlabeled data and should therefore exhibit a lower standard deviation; and σ₃ < σ₄, since the calm winds are expected to be more concentrated than the hazardous winds. Integrating these constraints and the objective function into the model can enhance its interpretability.
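The link between the scaling parameters and the loss weights can be illustrated with a short sketch. It assumes only what Eq. (14) states, namely that each quadratic term is multiplied by 1/(2σ²); the helper names and the threshold `sigma1_max` used to express "small" are hypothetical.

```python
def loss_weights(sigma1, sigma2, sigma3, sigma4):
    """Weight 1 / (2 sigma^2) multiplying each of the first four terms of Eq. (14)."""
    return tuple(1.0 / (2.0 * s**2) for s in (sigma1, sigma2, sigma3, sigma4))

def satisfies_ordering(sigma1, sigma2, sigma3, sigma4, sigma1_max):
    """Check sigma1 <= sigma1_max, sigma3 < sigma2 and sigma3 < sigma4.

    sigma1_max is a hypothetical threshold standing in for 'sigma1 should be small'.
    """
    return sigma1 <= sigma1_max and sigma3 < sigma2 and sigma3 < sigma4

sigmas = (0.5, 2.0, 1.0, 1.5)  # illustrative values only
w1, w2, w3, w4 = loss_weights(*sigmas)
ok = satisfies_ordering(*sigmas, sigma1_max=0.5)
```

A smaller σ gives a larger weight, so the ordering σ₃ < σ₂ and σ₃ < σ₄ means that deviations of calm winds from their centroid are penalised more heavily than deviations of unlabeled or hazardous winds.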

By incorporating prior knowledge and probabilistic models, this work offers an interpretable and transparent paradigm for assessing aviation hazards.

Data availability

Maps in this paper were generated using Cartopy⁶¹ and Folium⁶². The flight statistics and the daily wind speed and direction data in the Supplementary Information are available at: https://www.hongkongairport.com/sc/the-airport/hkia-at-a-glance/fact-figures.page and https://www.weather.gov.hk/tc/index.html. The station observations at airports in the United States were downloaded from: https://www.ncei.noaa.gov/data/automated-surface-observing-system-one-minute-pg1/access/, and the digital elevation model data came from https://cloud.tsinghua.edu.cn/d/695ed43696564904980f/. The wind records used in this study are provided in the Source Data file and ref. ⁶³. Source data are provided with this paper.

Code availability

Our source code is available at ref. ⁶³.

References

Linden, P. F. & Simpson, J. E. Microbursts: a hazard for aircraft. Nature 317, 601–602 (1985).
Hallowell, R. G. & Cho, J. Y. N. Wind shear system cost benefit analysis. Lincoln Lab. J. 18, 47–68 (2010).
O’Connor, A. & Kearney, D. Low level turbulence detection for airports. Int. J. Aviation Aeronaut. Aerospace https://doi.org/10.15394/ijaaa.2019.1302 (2019).
Nijhuis, O. et al. Wind hazard and turbulence monitoring at airports with Lidar, radar, and mode-s downlinks the UFO project. Bull. Am. Meteorol. Soc. 99, 2275–2294 (2018).
Veillette, P. R. Data show that US wake-turbulence accidents are most frequent at low altitude and during approach and landing. Flight Safety Digest 21, 1–57 (2002).
Carbaugh, D., Rockliff, L. & Vandel, R. High altitude operations airplane upset recovery training aid, revision 2. https://www.faa.gov/other_visit/aviation_industry/airline_operators/training/media/AP_UpsetRecovery_Book.pdf (2008).
Keohan, C. Ground-based wind shear detection systems have become vital to safe operations. ICAO J. 62, 16–19 (2007).
Gryning, S. E., Batchvarova, E., Brümmer, B., Jørgensen, H. & Larsen, S. On the extension of the wind profile over homogeneous terrain beyond the surface boundary layer. Bound. Layer Meteorol. 124, 251–268 (2007).
Bowles, R. L. Reducing windshear risk through airborne systems technology. In 17th Congress of the International Council of the Aeronautical Sciences (ICAS Proceedings) 1603–1630 (ICAS, 1990).
Woodfield, A. A. & Woods, J. F. Worldwide Experience of Wind Shear during 1981–1982. In AGARD Flight Mechanics Panel Conf. on Flight Mechanics and System Design Lessons from Operational Experience 28 (Royal Aircraft Establishment, 1983).
Yuan, J. et al. Microburst, windshear, gust front, and vortex detection in mega airport using a single coherent Doppler wind lidar. Remote Sens. 14, 1–14 (2022).
Chan, P. W. Latest aviation applications of LIDAR at the Hong Kong International Airport. In 15th Conference on Aviation, Range, and Aerospace Meteorology. 1–4 (American Meteorological Society, 2011).
Smalikho, I. N. & Banakh, V. A. Estimation of aircraft wake vortex parameters from data measured with a 15-μm coherent Doppler lidar. Opt. Lett. 40, 3408 (2015).
Nieuwpoort, A. M. H., Gooden, J. H. M. & de Prins, J. L. Wind Criteria due to Obstacles at and Around Airports. Report No. NLR-TP-2010-312 (National Aerospace Laboratory, 2010).
Sharman, R., Tebaldi, C., Wiener, G. & Wolff, J. An integrated approach to mid- and upper-level turbulence forecasting. Weather Forecast. 21, 268–287 (2006).
Lee, D. B., Chun, H. Y., Kim, S. H., Sharman, R. D. & Kim, J. H. Development and evaluation of global Korean aviation turbulence forecast systems based on an operational numerical weather prediction model and in situ flight turbulence observation data. Weather Forecast. 37, 371–392 (2022).
Reichstein, M. et al. Deep learning and process understanding for data-driven Earth system science. Nature 566, 195–204 (2019).
Zhang, D. et al. T2-Net: a semi-supervised deep model for turbulence forecasting. Proc. IEEE Int. Conf. Data Mining 2020-Novem, 1388–1393 (2020).
Lee, Y., Ki, S. H. M., Noh, Y. J. & Ki, J. H. M. Deep learning-based summertime turbulence intensity estimation using satellite observations. J. Atmos. Ocean. Technol. 40, 1433–1448 (2023).
Pan, W.-J., Leng, Y.-F., Wu, T.-Y., Xu, Y.-X. & Zhang, X.-L. Conv-wake: a lightweight framework for aircraft wake recognition. J. Sens. 2022, 3050507 (2022).
Shen, C. et al. Aircraft wake recognition and strength classification based on deep learning. IEEE J. Select. Top. Appl. Earth Obser. Remote Sens. 16, 2237–2249 (2023).
Khattak, A., Chan, P.W., Chen, F. & Peng, H. Time-series prediction of intense wind shear using machine learning algorithms: a case study of Hong Kong International Airport. Atmosphere https://doi.org/10.3390/atmos14020268 (2023).
Gonzalo, J., Domínguez, D., López, D. & García-Gutiérrez, A. An analysis and enhanced proposal of atmospheric boundary layer wind modelling techniques for automation of air traffic management. Chin. J. Aeronaut. 34, 129–144 (2021).
Frehlich, R., Meillier, Y., Jensen, M. L., Balsley, B. & Sharman, R. Measurements of boundary layer profiles in an urban environment. J. Appl. Meteorol. Climatol. 45, 821–837 (2006).
Ng, C. W. & Hon, K. K. Fast dual-doppler LiDAR retrieval of boundary layer wind profiles. Weather 77, 134–142 (2022).
Muñoz-Esparza, D., Sharman, R. D. & Deierling, W. Aviation turbulence forecasting at upper levels with machine learning techniques based on regression trees. J. Appl. Meteorol. Climatol. https://doi.org/10.1175/JAMC-D-20-0116.1 (2020).
Joseph, T. S. The critical success index as an indicator of warning skill. Weather Forecast. 5, 570–575 (1990).
Williams, J. K. Using random forests to diagnose aviation turbulence. Mach. Learn. 95, 51–70 (2014).
Han, S., Hu, X., Huang, H., Jiang, M. & Zhao, Y. ADBench: anomaly detection benchmark. SSRN Electronic J. https://doi.org/10.2139/ssrn.4266498, https://arxiv.org/abs/2206.09426 (2022).
Liu, F. T., Ting, K. M. & Zhou, Z.-H. Isolation-based anomaly detection. ACM Trans. Knowl. Discov. Data 6, 1–39 (2012).
Zhao, Y. & Hryniewicki, M. K. XGBOD: improving supervised outlier detection with unsupervised representation learning. In 2018 International Joint Conference on Neural Networks (IJCNN) 1–8 (IEEE, 2018).
Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V. & Gulin, A. Catboost: unbiased boosting with categorical features. Adv. Neural Inform. Process. Syst. 2018, 6638–6648 (2018).
Chan, P. W. & Lee, Y. F. Application of short-range lidar in wind shear alerting. J. Atmos. Ocean. Technol. 29, 207–220 (2012).
Wigley, T. M. L. Climatology: impact of extreme events. Nature 316, 106–107 (1985).
Palmer, T. N. & Räisänen, J. Quantifying the risk of extreme seasonal precipitation events in a changing climate. Nature 415, 512–514 (2002).
Hamada, A., Takayabu, Y. N., Liu, C. & Zipser, E. J. Weak linkage between the heaviest rainfall and tallest storms. Nat. Commun. 6, 1–6 (2015).
Waliser, D. & Guan, B. Extreme winds and precipitation during landfall of atmospheric rivers. Nat. Geosci. 10, 179–183 (2017).
Hynek, D. P. TDWR Scan Strategy Implementation. Report ATC-222 (Lincoln Laboratory, 1994).
ICAO. Meteorological Service for International Air Navigation (Annex 3). Technical Report (International Civil Aviation Organization, 2007).
Chan, P. W., Lai, K. K., Kong, W. & Tse, S. M. Performance of windshear/microburst detection algorithms using numerical weather prediction model data for selected tropical cyclone cases. Atmos. Sci. Lett. 24, 1–15 (2023).
Bi, K. et al. Accurate medium-range global weather forecasting with 3D neural networks. Nature 619, 533–538 (2023).
Gao, H. et al. A deep learning-based wind field nowcasting method with extra attention on highly variable events. IEEE Geosci. Remote Sens. Lett. 19, 1006405 (2022).
Zhang, Q., Xiao, G., Lan, Y.-q & Li, R.-R. Atmospheric turbulence detection by PCA approach. Aerospace Syst. 2, 15–20 (2019).
Oude Nijhuis, A. C. P. et al. Assessment of the rain drop inertia effect for radar-based turbulence intensity retrievals. Int. J. Microwave Wireless Technol. 8, 835–844 (2016).
Chan, P. W. In Aviation Turbulence: Processes, Detection, Prediction (eds Sharman, R. & Lane, T.) Ch. 9 (Springer, 2016).
Jiang, P., Yuan, J., Wu, K., Wang, L. & Xia, H. Turbulence detection in the atmospheric boundary layer using coherent Doppler wind lidar and microwave radiometer. Remote Sens. 14, 2951 (2022).
Banakh, V. A., Smalikho, I. N. & Falits, A. V. Estimation of the turbulence energy dissipation rate in the atmospheric boundary layer from measurements of the radial wind velocity by micropulse coherent Doppler lidar. Opt. Express 25, 22679 (2017).
Frehlich, R. & Sharman, R. Estimates of upper level turbulence based on second order structure functions derived from numerical weather prediction model output. In 11th Conf. on Aviation, Range, and Aerospace Meteorology. 4–13 (American Meteorological Society, 2004).
Storer, L. N., Williams, P. D. & Gill, P. G. Aviation turbulence: dynamics, forecasting, and response to climate change. Pure Appl. Geophys. 176, 2081–2095 (2019).
Gao, H., Li, J., Chan, P. W. & Hon, K. K. Parameter retrieval of aircraft wake vortex based on its max-min distribution of Doppler velocities measured by a Lidar. J. Eng. 2019, 6852–6855 (2019).
Bowles, R. L. Windshear detection and avoidance: airborne systems survey. In Proc. 29th Conference on Decision and Control 708–736 (IEEE, 1990).
EASA. Airborne Wind Shear Warning and Escape Guidance Systems (Reactive Type) for Transport Aeroplanes. Technical Report ETSO-C117b (European Aviation Safety Agency, 2020).
Lee, Y. F. & Chan, P. W. Lidar-based F-factor for wind shear alerting: different smoothing algorithms and application to departing flights. Meteorol. Appl. 21, 86–93 (2014).
Kameyama, S., Furuta, M. & Yoshikawa, E. Performance simulation theory of low-level wind shear detections using an airborne coherent Doppler lidar based on RTCA DO-220. IEEE Trans. Geosci. Remote Sens. 61, 1–12 (2023).
Proctor, H. et al. A windshear hazard index. In 9th Conference on Aviation, Range and Aerospace Meteorology 482–487 (2000).
Ryan, M., Saputro, A. H. & Sopaheluwakan, A. Intelligent low-level wind shear alert prediction system based on anemometer sensor network and temporal convolutional network (TCN). Geogr. Tech. 17, 92–103 (2022).
Hinton, G. E. & Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science 313, 504–507 (2006).
Xie, J., Girshick, R. & Farhadi, A. Unsupervised deep embedding for clustering analysis. Int. Conf. Mach. Learn. 1, 740–749 (2016).
Le, Q. V. Building high-level features using large scale unsupervised learning. In 2013 International Conference on Acoustics, Speech and Signal Processing 8595–8598 (IEEE, 2013).
Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y. & Manzagol, P. A. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010).
Met Office. Cartopy: a cartographic Python library with a Matplotlib interface. Exeter, Devon https://scitools.org.uk/cartopy (2010–2015).
Folium Contributors. Folium: data-driven, interactive maps. (2022).
Gao, H. et al. Interpretable semi-supervised clustering enables universal detection and intensity assessment of diverse aviation hazardous winds. Zenodo https://doi.org/10.5281/zenodo.12806274 (2024).
Zhang, B. et al. Super-resolution reconstruction of a 3 arc-second global dem dataset. Sci. Bull. 67, 2526–2530 (2022).
Van Der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2625 (2008).


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (No. 62231026 (J.L.)).

Author information

Authors and Affiliations

College of Electronic Science and Technology, National University of Defense Technology, Changsha, 410073, Hunan, China
Hang Gao, Chun Shen, Xuesong Wang & Jianbing Li
School of Electronic Information, Central South University, Changsha, 410083, Hunan, China
Hang Gao
State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System, National University of Defense Technology, Changsha, 410073, Hunan, China
Chun Shen, Xuesong Wang & Jianbing Li
Hong Kong Observatory, Nathan Road, Hong Kong, 999077, China
Pak-Wai Chan & Kai-Kwong Hon

Authors

Hang Gao
Chun Shen
Xuesong Wang
Pak-Wai Chan
Kai-Kwong Hon
Jianbing Li

Contributions

H.G. and J.L. conceived the basis of the study and performed all data analyses. H.G. and J.L. wrote the manuscript and Supplementary Information. P.-W.C. and K.-K.H. assembled the experimental setup and conducted data collection. J.L., C.S., and X.W. supervised the study and directed the manuscript. All authors discussed the results.

Corresponding author

Correspondence to Jianbing Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Denghui Zhang, Huibin Zhou and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Gao, H., Shen, C., Wang, X. et al. Interpretable semi-supervised clustering enables universal detection and intensity assessment of diverse aviation hazardous winds. Nat Commun 15, 7347 (2024). https://doi.org/10.1038/s41467-024-51597-y


Received: 06 July 2023
Accepted: 10 August 2024
Published: 26 August 2024
DOI: https://doi.org/10.1038/s41467-024-51597-y