Introduction

Complex winds, such as wind shear1,2,3, turbulence4, and aircraft wake5, can severely impact aviation safety, especially during the takeoff and landing phases. According to statistics6, over 40% of aviation incidents are mainly caused by complex wind conditions. For example, aircraft wake caused 130 accidents between 1983 and 20005, and wind shear-related accidents have even resulted in more than 1400 fatalities since 19437. Therefore, there is an urgent demand to robustly detect hazardous winds in air traffic management.

However, induced by local topography, surface roughness, and thermodynamics8, hazardous winds often involve multi-scale and chaotic dynamics. For example, turbulence can cause stochastic fluctuations in wind speed and direction within meters, whereas wind shear may produce sustained changes up to several nautical miles. Due to the chaotic dynamics of turbulence and the complex interaction between wind and aircraft, hazardous winds may show elusive and high-dimensional characteristics, making it challenging for hazardous winds to be universally detected.

During the past decades, some low-dimensional features have been developed in a hazard type-specific manner. For example, the hazard factor9 and severity factor10,11 (broadly known as F-factor and S-factor, respectively) were proposed for wind shear detection; the eddy dissipation rate12 and the velocity circulations13 were proposed to measure the intensity of turbulence and aircraft wake, respectively. However, since these expertise-reliant14 features neglect the high-dimensional characteristics of hazardous winds, their robustness is quite limited. Through the combination of low-dimensional hazard features, the integrated methods15,16 have demonstrated improved performance in detecting mid- and upper-level turbulence. Besides these manually crafted features, recent research has indicated that machine learning, especially deep learning17, can contribute to turbulence forecasting18,19, wake vortex detection20,21, and wind shear prediction22 by extracting the high-dimensional features from ever-increasing data. Despite their promise, machine learning-based methods still face limitations when applied in air traffic management.

Firstly, compared to the mid-to-upper atmosphere, low-level winds exhibit complex interactions with the underlying surface and local terrain, which can significantly affect aircraft flying at lower airspeed and altitude during the takeoff and landing phases23. Therefore, it is essential to instantly identify whether a specific wind is hazardous for aviation safety and to quantify its hazard intensity. However, existing approaches are tailored to specific types of hazardous winds (such as Cshear, Cturb, and Cwake in Fig. 1a), and the existence of a universal feature capable of detecting diverse types of hazardous winds (Cuniversal in Fig. 1a) remains an open question.

Fig. 1: Illustration of the presented paradigm.

a The hypothesis introduced in this study, where the existing hazardous wind detection criteria are specific to wind type. Cshear represents wind shear detection criteria, such as S-factor, F-factor, wind speed fluctuation, etc.; Cturb represents turbulence detection criteria, such as eddy dissipation rate, turbulence index, structure function, etc.; whereas Cwake represents wake vortex detection criteria, such as circulation and velocity range. Rather than carefully tuning the thresholds for different hazardous wind criteria, this study aims to identify a universal feature (Cuniversal) capable of detecting a variety of hazardous wind types. b Illustration of the probabilistic models in the established cluster feature space. c The straightforward network which maps the input data space to the latent cluster feature space. d The supervised classifier used for hazard intensity assessment, where a comprehensive criterion termed “hazard factor” is defined based on the probabilistic models.

Secondly, the labels in this task are obtained through pilot reports, which serve as indicators of hazardous conditions to determine whether a wind record measured by remote sensors24,25 is hazardous to flight. However, the majority of the wind records remain unlabeled because takeoffs and landings occupy only a small portion of the total time at airports. Also, calm (non-hazardous) labels are unavailable since pilot reports only concern the hazardous winds experienced by aircraft, while ignoring those with minor or negligible impacts.

Thirdly, current machine learning methods treat hazardous wind detection as a classification18, regression26, or object detection21 task, where the models are trained to produce outputs that match the labels. These methods adhere to the conventional machine learning paradigms, and as a result, their interpretability is restricted due to the absence of prior knowledge.

In this work, we validate the hypothesis that different types of hazardous winds and calm winds exhibit distinct intrinsic characteristics through the development of an interpretable semi-supervised clustering paradigm. Our technical contributions are two-fold. Firstly, supposing that various types of hazardous winds and calm winds can be categorized into two distinct classes, we transform the hazardous wind detection challenge into a semi-supervised clustering task. According to the hypothesis, we introduce four pieces of prior knowledge to guide the clustering process in establishing optimal cluster features. The prior knowledge states that: (1) calm winds should cluster together; (2) hazardous winds should also cluster together; (3) there should be a significant separation between calm winds and hazardous winds; (4) the majority of unlabeled wind records should be identified as calm, considering the infrequency of hazardous winds. Secondly, the scaling parameters and loss terms in the objective function are derived from their probabilistic models in the established cluster feature space (as shown in Fig. 1b), rather than being empirically defined as in conventional machine learning methods. The integration of the prior knowledge and the probabilistic models facilitates an interpretable and universal paradigm for hazardous wind detection. The paradigm has been validated on a straightforward network comprising several dense layers (as shown in Fig. 1c), which is designed to map the input wind records to feature points within the established latent cluster feature space. We have demonstrated that the extracted cluster features can universally detect and quantify various hazardous wind types in real-world detection scenarios.

Results

Analysis of the extracted cluster features

In this study, we have gathered real-world scenario data from five airports across China. The wind records collected at HKIA (Hong Kong International Airport, represented by the red dot in Fig. 2a) from January 1, 2017 to December 31, 2020, are utilized to train and test the model we proposed. Given that hazardous winds in various scenarios are expected to exhibit similar features that are hazardous for flights, we further verify the model’s generalization capability using wind measurement data from four airports in mainland China (blue stars in Fig. 2a).

Fig. 2: Cluster feature evaluation.

a Locations of the selected airports, where the red dot and the four blue stars represent the locations of HKIA and the other four airports in mainland China, respectively. The map is created with Cartopy61 and a digital elevation model64; b visualization of the cluster features extracted in this work by the t-SNE65 method. Source data are provided as a Source Data file.

The established cluster feature space is shown in Fig. 2b. Within this space, a criterion termed “hazard factor” is defined to assess the intensities of hazardous winds (as shown in Fig. 1d). The majority of the unlabeled data in the HKIA training dataset (dots in Fig. 2b) have negative hazard factors (refer to the colorbar) and are identified as calm winds. This is reasonable, as calm winds occur more frequently than hazardous winds. In the HKIA test dataset, hazardous winds (circles with black edges in Fig. 2b) predominantly exhibit hazard factors higher than those of the majority of the unlabeled data, indicating that the extracted features can detect hazardous winds with high confidence.

For the hazardous winds collected from other airports, the extracted cluster features are represented by the stars with black edges in Fig. 2b. All of these features are situated in close proximity to the hazardous winds in the HKIA dataset. This finding suggests that hazardous winds in different scenarios share similar inherent characteristics, which can be accurately captured by the proposed model. Furthermore, the majority of the hazard features for other airports correspond to high hazard factor values, indicating that they can be properly detected by the model.

Detection of various hazardous wind types

To evaluate the hazard detection and intensity-assessment performance, a test dataset is established as described in the “Methods” section. The proposed method is compared with existing methods, including the hazard type-specific criteria, representative machine learning techniques, and operational systems.

The conventional hazard type-specific criteria for hazard detection were designed for specific types of hazards. For wind shear detection, the F-factor9 and S-factor10 are acknowledged as the most promising indicators. For turbulence detection, the eddy dissipation rate (EDR) has gained widespread acceptance. Consequently, the F-factor, S-factor, speed fluctuation15, and sampling scale-based EDR4 are presented as baseline methods. It is widely recognized that the thresholds of traditional criteria are pre-defined and often lack the adaptability for a variety of application scenarios. Consequently, we compare the proposed method (denoted as “ClusterNN”) against the conventional criteria not only upon the pre-defined thresholds, but also upon the optimal thresholds. These optimal thresholds are identified through an analysis of the receiver operating characteristic (ROC) curve derived from the training dataset, and the critical success index (CSI)27 is employed to measure the accuracy of hazard detection. From the comparison in Fig. 3a, it is evident that traditional criteria perform more effectively when utilizing optimal thresholds rather than relying on pre-defined ones. Furthermore, our model demonstrates superior performance when compared to the hazard type-specific criteria.
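The threshold selection and scoring described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the paper: the text says only that optimal thresholds come from an ROC analysis, so the specific selection rule here (maximizing the true positive rate minus the false positive rate, i.e., Youden's J) is an assumption, and the function names `critical_success_index` and `roc_optimal_threshold` are hypothetical. The CSI is computed with its standard definition, hits divided by the sum of hits, misses, and false alarms.

```python
import numpy as np

def critical_success_index(y_true, y_pred):
    """CSI = TP / (TP + FP + FN); true negatives do not enter the score."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)    # hits
    fp = np.sum(~y_true & y_pred)   # false alarms
    fn = np.sum(y_true & ~y_pred)   # misses
    denom = tp + fp + fn
    return float(tp / denom) if denom > 0 else 0.0

def roc_optimal_threshold(scores, y_true):
    """Scan candidate thresholds and keep the one maximizing TPR - FPR.

    Assumed selection rule (Youden's J); other ROC-based rules are possible.
    """
    scores = np.asarray(scores, dtype=float)
    y_true = np.asarray(y_true, dtype=bool)
    n_pos = max(int(np.sum(y_true)), 1)
    n_neg = max(int(np.sum(~y_true)), 1)
    best_t, best_j = None, -np.inf
    for t in np.unique(scores):
        pred = scores >= t
        tpr = np.sum(pred & y_true) / n_pos
        fpr = np.sum(pred & ~y_true) / n_neg
        if tpr - fpr > best_j:
            best_t, best_j = float(t), tpr - fpr
    return best_t

# Synthetic criterion values (illustrative only): higher means more hazardous.
train_scores = [0.1, 0.2, 0.8, 0.9]
train_labels = [0, 0, 1, 1]
threshold = roc_optimal_threshold(train_scores, train_labels)
alerts = np.asarray(train_scores) >= threshold
csi = critical_success_index(train_labels, alerts)
```

In this sketch the threshold is fixed on training data and then applied unchanged to a test set, which mirrors the distinction drawn in the text between pre-defined and data-derived thresholds.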

Fig. 3: Quantitative evaluation.

a Comparison with the hazard type-specific criteria; b comparison with the representative machine learning techniques; c comparison with the existing operational systems, where “AUC” represents the area under curve. Source data are provided as a Source Data file.

Among the representative machine learning techniques, the random forest (RF) technique has been broadly used in turbulence detection26,28. Moreover, considering the similarity between hazardous wind detection and anomaly detection, we compare the detection performance of our model against state-of-the-art anomaly detection methods. According to ref. 29, the IForest30, XGBOD31, and CatB32 methods respectively outperform other unsupervised, semi-supervised, and fully-supervised anomaly detection methods when very few (<1%) labels are available. Therefore, we take the IForest, XGBOD, and CatB methods as the comparison models in the evaluation. All models (ClusterNN in this paper, RF, IForest, XGBOD, and CatB) are trained and tested to detect hazardous winds. In this study, the adopted XGBOD method utilizes kNN (k-nearest neighbor), one-class SVM (support vector machine), and Isolation Forest as base outlier scoring functions. The quantitative comparison of the CSI values is shown in Fig. 3b. The fully-supervised learning method CatB performs worse than the other methods, demonstrating the importance of the information contained in the unlabeled wind records and scarce labels. The comparison further verifies that our model outperforms the state-of-the-art anomaly detection methods in realistic scenarios.

Two operational hazardous wind detection systems, known as the wind shear and turbulence warning system (WTWS) and the anemometer-based wind shear alerting rules-enhanced (AWARE) system, have been utilized at HKIA for many years. According to historical data, the WTWS and the AWARE issue alerts with true positive rates of 0.67 and 0.18, respectively. Their performances are represented by the dashed lines in Fig. 3c. The AWARE system shows inferior performance since it only utilizes observations from in-situ measurements such as anemometers and buoys. By contrast, the WTWS, which assimilates various sources of observations (lidar, terminal Doppler weather radars, anemometers, and buoys), provides state-of-the-art hazardous wind alerts for HKIA33. The Turbulence Joint Safety Implementation Team, comprising experts from the FAA (US Federal Aviation Administration), NASA, various federal laboratories, and end users, recommended that the true positive rate and false positive rate for turbulence detection should be higher than 0.8 and lower than 0.15, respectively. This remains unattainable for low-dimensional hazard criteria15. The proposed method, however, outperforms the operational hazardous wind detection systems. Specifically, as shown by the ROC curves in Fig. 3c, the ROC curve of the classifier developed in the established cluster feature space (as shown in Fig. 1d), represented by the red line, lies clearly above the recommended performance (green dot) for turbulence. Moreover, for all types of hazardous winds, when the true positive rate of our classifier is equal to that of the WTWS, the expected false positive rate is only 0.011 (blue dot), well below the 0.15 limit recommended for turbulence detection.
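The matched-operating-point comparison above (reading off the false positive rate of a classifier at the true positive rate of an operational system) can be sketched as follows. This is an illustrative sketch on synthetic scores, not the evaluation code behind Fig. 3c; the function name `fpr_at_tpr` and the input arrays are hypothetical.

```python
import numpy as np

def fpr_at_tpr(scores_hazard, scores_calm, target_tpr):
    """Lower the threshold until at least target_tpr of hazardous records
    are flagged, then report the resulting (threshold, TPR, FPR)."""
    scores_hazard = np.sort(np.asarray(scores_hazard, dtype=float))[::-1]
    scores_calm = np.asarray(scores_calm, dtype=float)
    # Number of hazardous records that must be detected to reach the target.
    k = max(int(np.ceil(target_tpr * scores_hazard.size)), 1)
    threshold = scores_hazard[k - 1]
    tpr = float(np.mean(scores_hazard >= threshold))
    fpr = float(np.mean(scores_calm >= threshold))
    return float(threshold), tpr, fpr

# Synthetic hazard-factor values (illustrative only).
hazard_scores = [3.0, 2.0, 1.0, 0.5]
calm_scores = [0.6, -1.0, -2.0, -3.0]
thr, tpr, fpr = fpr_at_tpr(hazard_scores, calm_scores, target_tpr=0.67)
```

With a finite test set the achieved true positive rate can only take discrete values, so the sketch returns the first operating point at or above the target rather than an interpolated one.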

Intensity quantification of various hazardous wind types

From Fig. 4a, it is evident that the hazard factor distribution in the training dataset displays unimodal and positively skewed characteristics, and the hazardous ones (represented by the red bars) correspond to the tail of the distribution. According to the mainstream definition of extreme events34,35,36,37, it can be inferred that the hazardous winds are related to the extreme events defined by the hazard factor. In this way, wind records with higher hazard factor values occur less frequently but are more likely to be indicated as extremely hazardous situations.

Fig. 4: Intensity-assessment evaluation.

a The distribution of the hazard factor values in the training dataset, the red bars represent the hazard factor values higher than the optimal threshold defined by the training dataset, indicating the presence of hazardous winds, and the blue bars correspond to calm winds; b comparison of the hazard factor values with different hazard intensities, where the violin plot indicates the median (middle line), maximum (upper line) and minimum (lower line) hazard factor values, “ws” refers to the intensity of wind shear, and “turb” stands for the turbulence intensity. The higher values of both “ws” and “turb” indicate stronger intensity. Source data are provided as a Source Data file.

The violin plot in Fig. 4b displays the distribution of the hazard factor values with respect to various hazard intensities reported by pilots. The comparison in the lower subfigure demonstrates that the amplitudes of the hazard factors grow with the reported hazard intensities. Together, the comparisons show that the hazard factor values increase in tandem with hazard intensity, indicating that the hazard factor offers a rough assessment of hazard intensity: the higher the factor, the more severe the hazard. However, the hazard intensities reported by pilots depend on the aircraft responses, which are functions of the dynamic characteristics of the wind, aircraft parameters (such as type, mass, altitude, and airspeed), and subjective factors like pilots’ experience (see Supplementary Information section 1.1). Therefore, we believe that the relationship between the hazard factor and the reported hazard intensity can be analyzed more accurately once more pilot reports become available.

Discussion

We have demonstrated that different types of hazardous winds and calm winds exhibit distinct intrinsic characteristics, and they can be universally detected in a type-free manner. Moreover, a comprehensive set of features, which can universally identify and quantify various hazardous wind types, has been effectively extracted by an interpretable semi-supervised clustering network. By integrating prior knowledge and probabilistic models of winds, this work not only delivers an interpretable and transparent paradigm for evaluating aviation hazards, but also provides opportunities for the detection and elucidation of extreme events.

Currently, the implemented network is straightforward but has proven effective in illustrating the efficacy of the proposed paradigm. Indeed, this paradigm is a versatile, plug-and-play mechanism, suggesting that improved performance can be anticipated by incorporating a more sophisticated network. The model proposed in this work is currently trained on the HKIA lidar observation dataset, where the wind records along the corridors (Supplementary Fig. 2b) are updated approximately every 2 min. The trained model therefore implicitly assumes that the wind records are quasi-stationary within ~2 min. This assumption is widely accepted in the global civil aviation industry. For instance, the revisit time of the terminal Doppler weather radars (TDWRs) is about 5 min38, and the averaging period for surface wind observations in aviation management is 2 min39. The 2-min quasi-stationary assumption is also consistent with actual situations, as demonstrated in Supplementary Information section 1.2.

Moreover, although the model proposed in this work is trained using lidar observations from the HKIA dataset, it can be applied to other scenarios as long as fine spatial resolution wind records along the corridor can be obtained, which can be either observed or forecasted. In this way, we have demonstrated that the model can be generalized (1) from one airport to others, (2) from lidar data input to other types of wind observation data, and (3) from hazard assessment to prediction.

  • Generalization from one airport to another. Wind measurement data and pilot reports from four other airports in mainland China are selected to verify the generalization capability of this model. The results (Fig. 2) indicate that hazardous winds from other airports can be properly detected by the model proposed in this work. The deployment of lidar is becoming increasingly common at airports worldwide, which highlights the significant potential for the application of this model.

  • Generalization from lidar data input to other types of wind observation data. In cases where lidar observation data are not available, we have demonstrated that the hazard information in the wind records generated by a numerical weather prediction method40 can be accurately detected by the proposed model. The detailed process is available in Supplementary Information section 3.1.

  • Generalization from hazard assessment to prediction. Currently, wind field prediction has been well investigated41, and the authors of this paper have also proposed an accurate wind field prediction method based on the convolutional long short-term memory (ConvLSTM) neural network42. Therefore, by applying the hazard assessment method proposed in this work to the predicted wind fields, the prediction of hazard intensity for the wind can then be realized. A detailed description is available in Supplementary Information section 3.2.

Methods

Related works

Complex winds, including wind shear, turbulence, and aircraft wake, exhibit elusive and high-dimensional characteristics. Consequently, the existing hazard detection methods are defined for specific types of hazardous winds.

  • Turbulence detection. Based on the statistical signature of turbulence, features such as the Doppler spectral width43, the variance of Doppler mean velocities44, and the turbulence energy dissipation rate45,46,47 have been widely used to assess the intensity of turbulence. Specifically, for mid- and upper-level turbulence detection, it is impractical to directly and routinely measure or forecast the atmospheric motion at the scale that affects aircraft. Therefore, the defined expertise-reliant features15,48,49 and machine learning techniques, such as random forests26,28 and neural networks18, utilize large-scale atmospheric motion forecasts as inputs. However, this reliance limits their adaptability and update rate.

  • Wake vortex detection. Based on the unique crossover structure of the Doppler velocity field induced by wake vortices, the velocity range50 and other graphical features extracted by deep learning frameworks20,21 have been utilized to recognize aircraft wake vortices that evolve perpendicular to the corridor. By contrast, this work focuses on the hazardous winds that occur along the corridor, since these winds can cause significant and persistent changes to aircraft.

  • Wind shear detection. To detect wind shear in ground proximity, the S-factor10 and the F-factor51 were proposed according to the induced wind speed variation, and related empirical thresholds were statistically defined52,53. These features are easy to use54, but the detection performance greatly depends on the pre-defined thresholds11,53,55. In recent years, machine learning methods have also been used for wind shear detection56.

In general, the existing criteria are tailored for a specific type of hazardous wind, and the potential to broaden the research scope from a single type to various types of hazardous winds remains an open question.

Dataset preparation

In this work, each wind record along the corridor has 115 sampling points, covering the range from the touchdown site to 6000 m away. The field campaigns in HKIA and other airports are reported in the Supplementary Information section 2.1. The wind records collected at HKIA from January 1, 2017 to December 31, 2020 are utilized to train and test the model we proposed. The wind records corresponding to pilot reports are labeled as hazardous winds (Supplementary Information section 1.1).

In the model development process, the data from 2017 and 2018 are employed. There are a total of Ntrain = 51,5733 wind records, containing \({N}_{{{{\rm{train}}}}}^{{{{\rm{hazard}}}}}=181\) hazardous labels and \({N}_{{{{\rm{train}}}}}^{{{{\rm{unlabel}}}}}={N}_{{{{\rm{train}}}}}-{N}_{{{{\rm{train}}}}}^{{{{\rm{hazard}}}}}=515627\) unlabeled data. According to the seasonal occurrence probability of the hazardous winds observed at HKIA (Supplementary Information section 2.3), we randomly select \({N}_{{{{\rm{train}}}}}^{{{{\rm{calm}}}}}={N}_{{{{\rm{train}}}}}^{{{{\rm{hazard}}}}}\) wind records from the unlabeled data in October and November to further guide the training process. For testing purposes, we have established a test dataset consisting of the \({N}_{{{{\rm{test}}}}}^{{{{\rm{hazard}}}}}=178\) hazardous wind reports from 2019 to 2020, as well as \({N}_{{{{\rm{test}}}}}^{{{{\rm{calm}}}}}={N}_{{{{\rm{test}}}}}^{{{{\rm{hazard}}}}}\) calm wind records. These calm records are randomly selected from the unlabeled data in October and November during the same period.
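The balanced selection of calm records described above can be sketched as follows. This is a minimal sketch assuming each record carries a calendar month and a hazard flag; the function name `sample_calm_records`, the array layout, and the fixed random seed are illustrative assumptions rather than details from the paper.

```python
import numpy as np

def sample_calm_records(months, is_hazard, calm_months=(10, 11), seed=0):
    """Draw as many unlabeled records as there are hazardous ones,
    restricted to the given months, without replacement."""
    months = np.asarray(months)
    is_hazard = np.asarray(is_hazard, dtype=bool)
    n_hazard = int(is_hazard.sum())
    # Candidates: unlabeled records observed in the selected months.
    candidates = np.flatnonzero(~is_hazard & np.isin(months, calm_months))
    if candidates.size < n_hazard:
        raise ValueError("not enough unlabeled records in the selected months")
    rng = np.random.default_rng(seed)
    return rng.choice(candidates, size=n_hazard, replace=False)

# Synthetic example: 120 records cycling through the 12 months,
# with the first 5 records flagged as hazardous (illustrative only).
months = np.tile(np.arange(1, 13), 10)
is_hazard = np.zeros(months.size, dtype=bool)
is_hazard[:5] = True
calm_idx = sample_calm_records(months, is_hazard)
```

Sampling without replacement keeps the calm set free of duplicates, so the calm and hazardous subsets have the same number of distinct records.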

Problem definition

From the machine learning perspective, the task of hazardous wind detection and intensity assessment can be regarded as a clustering problem, followed by a classification problem. The basic idea is to map the wind record along the corridor to the latent cluster features zi by a cluster network parameterized by Θ, and further estimate the hazard factor yi by a supervised classifier parameterized by Φ. The model can be illustrated as follows

$$\left[{y}_{i},{{{{\bf{z}}}}}_{i}\right]={\arg\max}_{\left[{y}_{i},{{{{\bf{z}}}}}_{i}\right]}\,P\left({y}_{i}| {{{{\bf{z}}}}}_{i},{{{{\bf{x}}}}}_{i};{{{\mathbf{\Phi }}}},{{{\mathbf{\Theta }}}}\right)P\left({{{{\bf{z}}}}}_{i}| {{{{\bf{x}}}}}_{i};{{{\mathbf{\Theta }}}}\right)$$
(1)

where xi is the wind record observed at time ti, and yi > yT (yi < yT) indicates that the concerned wind is hazardous (calm). The amplitude of the hazard factor yi assesses the intensity of the hazardous wind. yT is the optimal threshold of the classifier, which is obtained from the ROC curve in this work.

The model defined in Eq. (1) consists of two steps. The first step is to learn cluster features for hazardous wind clustering and perform clustering assignments simultaneously by semi-supervised clustering (Fig. 1c). The second step is to train a supervised classifier (Fig. 1d) on the extracted cluster features, where a hazard factor is proposed to assess the intensity of hazardous winds. The hazard factor is defined as the decision function of the classifier and can be denoted as: \({y}_{i}={g}_{{{{\mathbf{\Phi }}}}}\left({{{{\bf{z}}}}}_{i}\right)\), where \({g}_{{{{\mathbf{\Phi }}}}}\left(\cdot \right)\) represents the transformation of the classifier.
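The second step, in which a decision function over cluster features serves as the hazard factor, can be illustrated with a deliberately simple stand-in. The classifier type is not specified in this section, so the linear decision function below (the signed distance to the plane midway between the two cluster centroids) is an assumption made purely for illustration, as are the names `fit_linear_decision` and `hazard_factor`.

```python
import numpy as np

def fit_linear_decision(z_calm, z_hazard):
    """Assumed stand-in for the classifier: a plane halfway between
    the calm and hazardous centroids, with unit normal pointing to hazard."""
    mu_calm = z_calm.mean(axis=0)
    mu_hazard = z_hazard.mean(axis=0)
    w = mu_hazard - mu_calm
    w = w / np.linalg.norm(w)
    b = -(w @ (mu_calm + mu_hazard)) / 2.0
    return w, b

def hazard_factor(z, w, b):
    """Decision function y = g(z): signed distance to the separating plane."""
    return z @ w + b

# Synthetic two-dimensional cluster features (illustrative only).
z_calm = np.array([[-1.0, 0.0], [-1.0, 1.0], [-1.0, -1.0]])
z_hazard = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, -1.0]])
w, b = fit_linear_decision(z_calm, z_hazard)
y_calm = hazard_factor(z_calm, w, b)
y_hazard = hazard_factor(z_hazard, w, b)
y_T = 0.0  # threshold; in the paper it is obtained from the ROC curve
is_hazardous = y_hazard > y_T
```

Whatever classifier is used, the point of the construction is that the same scalar output is thresholded for detection and read as a magnitude for intensity assessment.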

Architecture and parameter initialization

The cluster network is defined as:

$${{{{\bf{h}}}}}_{1} ={{{\rm{ReLU}}}}\left({{{{\bf{W}}}}}_{1}^{{{{\rm{T}}}}}{{{\bf{x}}}}+{{{{\bf{b}}}}}_{1}\right)\\ {{{{\bf{h}}}}}_{2} ={{{\rm{ReLU}}}}\left({{{{\bf{W}}}}}_{2}^{{{{\rm{T}}}}}{{{{\bf{h}}}}}_{1}+{{{{\bf{b}}}}}_{2}\right)\\ {{{\bf{z}}}} ={{{{\bf{W}}}}}_{3}^{{{{\rm{T}}}}}{{{{\bf{h}}}}}_{2}+{{{{\bf{b}}}}}_{3}$$
(2)

where x and z are the input and the extracted cluster features, respectively, {h1, h2} represent the hidden states, and {W1, …, W3, b1, …, b3} are the model parameters.
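The forward pass of Eq. (2) can be sketched in a few lines. The input width of 115 follows the number of sampling points per wind record stated in the dataset description; the hidden widths (64, 32), the two-dimensional feature size, and the random initialization are assumptions for illustration only, and `init_params` and `cluster_network` are hypothetical names.

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

def init_params(sizes, seed=0):
    """Random (W, b) pairs; W has shape (fan_in, fan_out)."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def cluster_network(x, params):
    """Eq. (2) for a batch x of shape (N, 115).

    x @ W is the batched form of W^T x for a single record.
    """
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = relu(x @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return h2 @ W3 + b3  # linear output layer: cluster features z

params = init_params((115, 64, 32, 2))  # hidden and output widths are assumed
x = np.ones((4, 115))                   # four synthetic wind records
z = cluster_network(x, params)
```

The last layer carries no activation, so the cluster features are unbounded and can be placed anywhere in the latent space.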

The network is initialized with a stacked autoencoder (SAE)57, which has been shown to consistently produce semantically meaningful representations on real-world datasets58,59,60. Taking the layers in the cluster network as the encoded layers, the decoded layers in the SAE are defined as:

$${{{{\bf{h}}}}}_{4} ={{{\rm{ReLU}}}}\left({{{{\bf{W}}}}}_{4}^{{{{\rm{T}}}}}{{{\bf{z}}}}+{{{{\bf{b}}}}}_{4}\right)\\ {{{{\bf{h}}}}}_{5} ={{{\rm{ReLU}}}}\left({{{{\bf{W}}}}}_{5}^{{{{\rm{T}}}}}{{{{\bf{h}}}}}_{4}+{{{{\bf{b}}}}}_{5}\right)\\ {\hat{{{{\bf{x}}}}}}_{{{{\rm{rec}}}}} ={{{{\bf{W}}}}}_{6}^{{{{\rm{T}}}}}{{{{\bf{h}}}}}_{5}+{{{{\bf{b}}}}}_{6}$$
(3)

where \({\hat{{{{\bf{x}}}}}}_{{{{\rm{rec}}}}}\) represents the reconstructed output, {h4, h5} are the hidden states of the decoded layers, and {W4, …, W6, b4, …, b6} are the parameters. The initialization process is performed by minimizing the least squares loss \(\parallel {{{\bf{x}}}}-{\hat{{{{\bf{x}}}}}}_{{{{\rm{rec}}}}}{\parallel }^{2}\).
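The autoencoder used for initialization can be sketched by stacking the encoder of Eq. (2) and the decoder of Eq. (3) and evaluating the least squares loss. The layer widths other than the 115-point input, the random weights, and the helper names `make_layers`, `dense_stack`, and `reconstruction_loss` are assumptions for illustration; the optimizer that would minimize the loss is omitted.

```python
import numpy as np

def make_layers(sizes, rng):
    """Random (W, b) pairs with W of shape (fan_in, fan_out)."""
    return [(rng.normal(0.0, 0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def dense_stack(a, layers):
    """Apply dense layers with ReLU on all but the last, as in Eqs. (2)-(3)."""
    for k, (W, b) in enumerate(layers):
        a = a @ W + b
        if k < len(layers) - 1:
            a = np.maximum(a, 0.0)
    return a

def reconstruction_loss(x, x_rec):
    """Mean over the batch of ||x - x_rec||^2."""
    return float(np.mean(np.sum((x - x_rec) ** 2, axis=1)))

rng = np.random.default_rng(0)
encoder = make_layers((115, 64, 32, 2), rng)   # hidden widths are assumed
decoder = make_layers((2, 32, 64, 115), rng)   # mirror of the encoder

x = rng.normal(size=(3, 115))   # three synthetic wind records
z = dense_stack(x, encoder)     # cluster features
x_rec = dense_stack(z, decoder) # reconstruction
loss = reconstruction_loss(x, x_rec)
```

After pretraining, only the encoder would be carried over as the cluster network; the decoder is still needed wherever a reconstruction term appears in the objective.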

Prior knowledge and probabilistic models

In this study, we hypothesize that various types of hazardous winds and calm winds can be categorized into two distinct classes. Specifically, four pieces of prior knowledge are incorporated:

  • Knowledge 1: calm winds should cluster together;

  • Knowledge 2: hazardous winds should cluster together;

  • Knowledge 3: there should be a significant separation between calm winds and hazardous winds;

  • Knowledge 4: the majority of unlabeled wind records should be identified as calm, since hazardous winds rarely occur in reality.

Taking the above prior knowledge into account, the objective function can be expressed as:

$${{{\mathcal{L}}}}={\lambda }_{1}{{{{\mathcal{L}}}}}_{{{{\rm{rec}}}}}+{\lambda }_{2}{{{{\mathcal{L}}}}}_{{{{\rm{unlabel}}}}}+{\lambda }_{3}{{{{\mathcal{L}}}}}_{{{{\rm{calm}}}}}+{\lambda }_{4}{{{{\mathcal{L}}}}}_{{{{\rm{hazard}}}}}+{\lambda }_{5}{{{{\mathcal{L}}}}}_{{{{\rm{calm}}}}-{{{\rm{hazard}}}}}$$
(4)

where the five terms on the right-hand side are the unsupervised reconstruction loss, unlabeled loss (Knowledge 4), calm loss (Knowledge 1), hazard loss (Knowledge 2), and calm-hazard loss (Knowledge 3), respectively, and λ1, λ2, λ3, λ4, and λ5 are their scaling parameters.

Probabilistic models

In traditional deep learning methods, the scaling parameters λ1, λ2, λ3, λ4, and λ5 in Eq. (4) are empirically set, which lacks interpretability. Our work defines the scaling parameters and the loss terms according to their probabilistic models.

For a wind record x, its extracted cluster features can be denoted as fΘ(x), where Θ represents the parameters of the cluster network. Suppose that the deviation between the reconstructed record \({f}_{\!\!\Theta }({\hat{{{{\bf{x}}}}}}_{{{{\rm{rec}}}}})\) and the input record fΘ(x) obeys a Gaussian distribution in the cluster feature space; then the first term on the right-hand side of Eq. (4) can be modeled as:

$$P\left(\,{f}_{\!\!\Theta }({\hat{{{{\bf{x}}}}}}_{{{{\rm{rec}}}}})-{f}_{\!\!\Theta }({{{\bf{x}}}});\Theta \right) \sim {{{\mathcal{N}}}}\left(0,{\sigma }_{1}^{2}\right)$$
(5)

where \({\hat{{{{\bf{x}}}}}}_{{{{\rm{rec}}}}}\) and x represent the reconstructed record and the input, respectively, and σ1 is the standard deviation of \({f}_{\!\!\Theta }({\hat{{{{\bf{x}}}}}}_{{{{\rm{rec}}}}})-{f}_{\!\!\Theta }({{{\bf{x}}}})\).
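One way to see how a Gaussian model can supply a scaling parameter is to write out its negative log-likelihood. The sketch below assumes an isotropic Gaussian over the feature-space residual (our assumption for illustration): the residual-dependent part is then a squared norm multiplied by 1/(2σ²), which plays the role of a weight on a squared-error loss. This is an interpretation of the modeling idea, not the paper's stated expression for λ1; `gaussian_nll` and `implied_weight` are hypothetical names.

```python
import numpy as np

def gaussian_nll(residual, sigma):
    """Negative log-likelihood of residuals under an isotropic N(0, sigma^2 I).

    residual: array of shape (D,) or (N, D).
    """
    r = np.atleast_2d(np.asarray(residual, dtype=float))
    d = r.shape[1]
    sq_norm = np.sum(r ** 2, axis=1)
    return sq_norm / (2.0 * sigma ** 2) + d * np.log(sigma) + 0.5 * d * np.log(2.0 * np.pi)

def implied_weight(sigma):
    """Coefficient multiplying ||residual||^2 in the negative log-likelihood."""
    return 1.0 / (2.0 * sigma ** 2)

r = np.array([0.3, -0.4])  # synthetic feature-space residual, ||r||^2 = 0.25
sigma1 = 0.5
nll = gaussian_nll(r, sigma1)
nll_zero = gaussian_nll(np.zeros(2), sigma1)
```

Under this reading, a tighter distribution (smaller σ) penalizes the same residual more heavily, so the weights follow from the spread of the data rather than from manual tuning.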

We further assume that, in the established cluster feature space, the data points in each cluster obey a Gaussian distribution. A two-dimensional illustration of the extracted cluster features is shown in Fig. 1b. In this way, the probability of the last four terms on the right-hand side of Eq. (4) can be denoted as:

$$P\left({f}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{unlabel}}}}})-{\bar{f}}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{calm}}}}});\Theta \right) \sim {{{\mathcal{N}}}}\left(0,{\sigma }_{2}^{2}\right)$$
(6)
$$P\left({f}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{calm}}}}})-{\bar{f}}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{calm}}}}});\Theta \right) \sim {{{\mathcal{N}}}}\left(0,{\sigma }_{3}^{2}\right)$$
(7)
$$P\left({f}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}})-{\bar{f}}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}});\Theta \right) \sim {{{\mathcal{N}}}}\left(0,{\sigma }_{4}^{2}\right)$$
(8)
$$P\left({f}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}})-{f}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{calm}}}}})-\left[{\bar{f}}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}})-{\bar{f}}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{calm}}}}})\right];\Theta \right) \sim {{{\mathcal{N}}}}\left(0,{\sigma }_{3}^{2}+{\sigma }_{4}^{2}\right)$$
(9)

where xunlabel, xcalm, and xhazard represent the unlabeled, calm, and hazardous winds, respectively, and σ2, σ3, and σ4 are their standard deviations. For convenience in training, Eq. (9) can be further simplified by substituting the cluster features with the distances between them, yielding the following form:

$$P\left(\left\Vert \, {f}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}})-{\bar{f}}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{calm}}}}})\right\Vert -d;\Theta \right) \sim {{{\mathcal{N}}}}\left(0,{\sigma }_{3}^{2}+{\sigma }_{4}^{2}\right)$$
(10)

where d indicates the expected distance between the centroids of the hazardous winds and calm winds in the established cluster feature space.
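The quantities modeled in Eqs. (6)–(8) and (10) are residuals relative to cluster centroids, and they can be computed directly from a batch of cluster features. The sketch below assumes the centroids are batch means of the calm and hazardous features, which is our reading of the barred terms; the function name `cluster_residuals` and the toy features are illustrative.

```python
import numpy as np

def cluster_residuals(z_unlabel, z_calm, z_hazard, d):
    """Residuals whose distributions are modeled in Eqs. (6)-(8) and (10)."""
    c_calm = z_calm.mean(axis=0)      # centroid of calm features
    c_hazard = z_hazard.mean(axis=0)  # centroid of hazardous features
    return {
        "unlabel": z_unlabel - c_calm,   # Eq. (6): unlabeled pulled toward calm
        "calm": z_calm - c_calm,         # Eq. (7): calm winds cluster together
        "hazard": z_hazard - c_hazard,   # Eq. (8): hazardous winds cluster together
        # Eq. (10): distance of each hazardous point from the calm centroid, minus d
        "separation": np.linalg.norm(z_hazard - c_calm, axis=1) - d,
    }

# Synthetic two-dimensional features (illustrative only).
z_calm = np.array([[0.0, 0.0], [2.0, 0.0]])
z_hazard = np.array([[4.0, 0.0], [6.0, 0.0]])
z_unlabel = np.array([[1.0, 1.0]])
res = cluster_residuals(z_unlabel, z_calm, z_hazard, d=4.0)
```

Each residual would then enter its Gaussian model with the corresponding standard deviation, so that all four pieces of prior knowledge are expressed on a common probabilistic footing.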

To guide the model in quantifying the intensities of hazardous winds, we have introduced an additional constraint to Eq. (9). This constraint, as shown in Fig. 1b, defines that the distance between strong hazardous winds and calm winds should be greater than d, and the distance for weak hazardous winds and calm winds should be less than d. In this way, considering the intensity levels of hazardous winds, the expression in Eq. (9) is divided into the following two components:

$$\left\{\begin{array}{l}P\big(y\big({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{strong}}}}}\big);\Theta \big)=\frac{1}{\sqrt{2\pi }{\sigma }_{5}}\exp \left(-\frac{{\left\Vert y\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{strong}}}}}\right)\right\Vert }^{2}}{2{\sigma }_{5}^{2}}\right)\cdot {{{\rm{sigmoid}}}}\big(y\big({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{strong}}}}}\big)\big)\quad \\ P\left(y\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{weak}}}}}\right);\Theta \right)=\frac{1}{\sqrt{2\pi }{\sigma }_{5}}\exp \left(-\frac{{\left\Vert y\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{weak}}}}}\right)\right\Vert }^{2}}{2{\sigma }_{5}^{2}}\right)\cdot {{{\rm{sigmoid}}}}\left(-y\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{weak}}}}}\right)\right)\quad \end{array}\right.$$
(11)

where \({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{strong}}}}}\) and \({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{weak}}}}}\) represent the hazardous winds with strong intensities and weak intensities, respectively, \(y({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}})=\left\Vert {f}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}})-{\bar{f}}_{\!\!\Theta }({{{{\bf{x}}}}}_{{{{\rm{calm}}}}})\right\Vert -d\) and \({\sigma }_{5}^{2}={\sigma }_{3}^{2}+{\sigma }_{4}^{2}\).
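
As an illustration of Eq. (11), the following minimal NumPy sketch evaluates the negative logarithm of either line for a batch of hazardous-wind features, dropping the constant \(\ln (\sqrt{2\pi }{\sigma }_{5})\). The function names `log_sigmoid` and `intensity_nll`, the array layout, and the use of a precomputed calm centroid are assumptions made for this example and are not taken from the original implementation.

```python
import numpy as np


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z)) = -log(1 + exp(-z))."""
    return -np.logaddexp(0.0, -z)


def intensity_nll(feat, calm_centroid, d, sigma5, strong):
    """Negative log of one line of Eq. (11), without the additive constant.

    feat          : (n, k) array of hazardous-wind features (hypothetical layout)
    calm_centroid : (k,) array, centroid of the calm-wind features
    d             : expected distance between hazardous and calm centroids
    sigma5        : sqrt(sigma3**2 + sigma4**2)
    strong        : True for the strong-intensity line, False for the weak one
    """
    # y(x_hazard) = ||f(x_hazard) - calm centroid|| - d
    y = np.linalg.norm(feat - calm_centroid, axis=1) - d
    sign = 1.0 if strong else -1.0
    # Gaussian part plus the one-sided sigmoid penalty
    return y**2 / (2.0 * sigma5**2) - log_sigmoid(sign * y)
```

For a sample with \(y > 0\), the strong-intensity line yields a smaller value than the weak-intensity line by exactly \(y\), because \(\ln {\rm{sigmoid}}(y)-\ln {\rm{sigmoid}}(-y)=y\). The sigmoid factor therefore pushes strong samples beyond d and weak samples inside it.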

By leveraging the training dataset, which includes unlabeled data, calm winds, and hazardous winds, the model parameters can be derived by maximizing the following log-likelihood:

$$\Theta={{\arg\max}_{\Theta } \ln \left({{{{\mathcal{L}}}}}_{P}\right)}$$
(12)

where \({{{{\mathcal{L}}}}}_{P}\) is the product of the aforementioned probabilistic models:

$${{{{\mathcal{L}}}}}_{P}= P\left({f}_{\!\!\Theta }\left({\hat{{{{\bf{x}}}}}}_{{{{\rm{rec}}}}}\right)-{f}_{\!\!\Theta }({{{\bf{x}}}});\Theta \right)\cdot P\left({f}_{\!\!\Theta }\left({{{{\bf{x}}}}}_{{{{\rm{unlabel}}}}}\right)-{\bar{f}}_{\!\!\Theta }\left({{{{\bf{x}}}}}_{{{{\rm{calm}}}}}\right);\Theta \right)\\ \cdot P\left({f}_{\!\!\Theta }\left({{{{\bf{x}}}}}_{{{{\rm{calm}}}}}\right)-{\bar{f}}_{\!\!\Theta }\left({{{{\bf{x}}}}}_{{{{\rm{calm}}}}}\right);\Theta \right)\cdot P\left({f}_{\!\!\Theta }\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}\right)-{\bar{f}}_{\!\!\Theta }\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}\right);\Theta \right)\\ \cdot P\left(y\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{strong}}}}}\right);\Theta \right)\cdot P\left(y\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{weak}}}}}\right);\Theta \right)$$
(13)

which can be transformed into:

$$-\ln \left({{{{\mathcal{L}}}}}_{P}\right)= \frac{{\sum }_{{N}_{1}}{\left\Vert {f}_{\!\!\Theta }\left({\hat{{{{\bf{x}}}}}}_{{{{\rm{rec}}}}}\right)-{f}_{\!\!\Theta }({{{\bf{x}}}})\right\Vert }^{2}}{2{\sigma }_{1}^{2}}+\frac{{\sum }_{{N}_{2}}{\left\Vert {f}_{\!\!\Theta }\left({{{{\bf{x}}}}}_{{{{\rm{unlabel}}}}}\right)-{\bar{f}}_{\!\!\Theta }\left({{{{\bf{x}}}}}_{{{{\rm{calm}}}}}\right)\right\Vert }^{2}}{2{\sigma }_{2}^{2}}\\ +\frac{{\sum }_{{N}_{3}}{\left\Vert {f}_{\!\!\Theta }\left({{{{\bf{x}}}}}_{{{{\rm{calm}}}}}\right)-{\bar{f}}_{\!\!\Theta }\left({{{{\bf{x}}}}}_{{{{\rm{calm}}}}}\right)\right\Vert }^{2}}{2{\sigma }_{3}^{2}}+\frac{{\sum }_{{N}_{4}}{\left\Vert {f}_{\!\!\Theta }\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}\right)-{\bar{f}}_{\!\!\Theta }\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}\right)\right\Vert }^{2}}{2{\sigma }_{4}^{2}}\\ +\sum\limits_{{N}_{5}}\left\{\frac{{\left\Vert y\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{strong}}}}}\right)\right\Vert }^{2}}{2\left({\sigma }_{3}^{2}+{\sigma }_{4}^{2}\right)}-\ln \left[{{{\rm{sigmoid}}}}\left(y\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{strong}}}}}\right)\right)\right]\right\}\\ +\sum\limits_{{N}_{4}-{N}_{5}}\left\{\frac{{\left\Vert y\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{weak}}}}}\right)\right\Vert }^{2}}{2\left({\sigma }_{3}^{2}+{\sigma }_{4}^{2}\right)}-\ln \left[{{{\rm{sigmoid}}}}\left(-y\left({{{{\bf{x}}}}}_{{{{\rm{hazard}}}}}^{{{{\rm{weak}}}}}\right)\right)\right]\right\}+{{{\rm{Constant}}}}$$
(14)

where \({N}_{1}\), \({N}_{2}\), \({N}_{3}\), \({N}_{4}\), and \({N}_{5}\) represent the numbers of training data, unlabeled data, calm winds, hazardous winds, and strong hazardous winds, respectively.
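
To make the structure of Eq. (14) concrete, the sketch below evaluates it (up to the additive constant) on precomputed feature arrays. It is an illustrative reading of the equation, not the original training code: the function name `neg_log_likelihood`, the argument layout, the `sigma` dictionary, and the use of batch means as the centroids \({\bar{f}}_{\Theta }\) are all assumptions made for this example.

```python
import numpy as np


def sum_sq(a):
    """Sum of squared Euclidean norms over the rows of a."""
    return np.sum(a**2)


def neg_log_likelihood(f_rec, f_x, f_unlabel, f_calm, f_hazard,
                       is_strong, d, sigma):
    """Eq. (14) without the additive constant.

    f_rec, f_x : (N1, k) features of reconstructed and original inputs
    f_unlabel  : (N2, k) features of unlabeled winds
    f_calm     : (N3, k) features of calm winds
    f_hazard   : (N4, k) features of hazardous winds
    is_strong  : (N4,) boolean mask, True for the N5 strong hazardous winds
    d          : expected distance between hazardous and calm centroids
    sigma      : dict mapping 1..4 to sigma_1..sigma_4 (hypothetical container)
    """
    # Assumption: centroids are estimated as batch means.
    calm_c = f_calm.mean(axis=0)
    hazard_c = f_hazard.mean(axis=0)

    # y = ||f(x_hazard) - calm centroid|| - d, as defined after Eq. (11)
    y = np.linalg.norm(f_hazard - calm_c, axis=1) - d
    sign = np.where(is_strong, 1.0, -1.0)
    var5 = sigma[3]**2 + sigma[4]**2

    return (
        sum_sq(f_rec - f_x) / (2 * sigma[1]**2)
        + sum_sq(f_unlabel - calm_c) / (2 * sigma[2]**2)
        + sum_sq(f_calm - calm_c) / (2 * sigma[3]**2)
        + sum_sq(f_hazard - hazard_c) / (2 * sigma[4]**2)
        # -ln sigmoid(z) = ln(1 + exp(-z)), evaluated stably
        + np.sum(y**2 / (2 * var5) + np.logaddexp(0.0, -sign * y))
    )
```

The boolean mask lets the strong and weak sums of Eq. (14) share one vectorized expression: strong samples use \(+y\) inside the sigmoid and the remaining \({N}_{4}-{N}_{5}\) samples use \(-y\).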

Each term in Eq. (14) corresponds to a loss term in Eq. (4). By combining Eqs. (4) and (14), the implications of the scaling parameters and the associated loss terms become interpretable: within the established cluster feature space, the loss terms quantify the deviations of feature points from the centroids, while the scaling parameters signify the diversity of these points.

Moreover, given the diversity of wind records in real-world scenarios, we further expect the following constraints: \({\sigma }_{1}\) should be small, since the reconstruction error must be very small for the network to remain meaningful; \({\sigma }_{3} < {\sigma }_{2}\), since the third term corresponds to calm winds, which are a subset of the unlabeled data and should therefore exhibit a lower standard deviation; and \({\sigma }_{3} < {\sigma }_{4}\), since calm winds are expected to be more concentrated than hazardous winds. Integrating these constraints and the objective function into the model can enhance its interpretability.
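
If the scaling parameters were treated as trainable quantities, the two ordering constraints could be satisfied by construction rather than checked after training. The sketch below shows one possible reparameterization using a softplus gap. It is a hypothetical technique chosen for illustration, not a method described in this work, and the names `softplus` and `scaling_parameters` are assumptions. The requirement that σ1 be small is not encoded here.

```python
import numpy as np


def softplus(z):
    """Smooth positive map log(1 + exp(z)), evaluated stably."""
    return np.logaddexp(0.0, z)


def scaling_parameters(raw):
    """Map four unconstrained numbers to (sigma1, sigma2, sigma3, sigma4).

    sigma2 and sigma4 are built as sigma3 plus a positive gap, so that
    sigma3 < sigma2 and sigma3 < sigma4 hold by construction, provided the
    gap does not underflow to zero for extremely negative raw values.
    """
    r1, r2, r3, r4 = raw
    sigma1 = softplus(r1)
    sigma3 = softplus(r3)
    sigma2 = sigma3 + softplus(r2)  # calm winds tighter than unlabeled data
    sigma4 = sigma3 + softplus(r4)  # calm winds tighter than hazardous winds
    return sigma1, sigma2, sigma3, sigma4
```

With this mapping, an optimizer can update the raw values freely while the resulting standard deviations remain positive and correctly ordered.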

By incorporating prior knowledge and probabilistic models, this work offers an interpretable and transparent paradigm for assessing aviation hazards.