Introduction

Sand and dust storms (SDS), characterized by strong winds that lift dust particles from the ground and reduce visibility to less than 1000 m, are a prevalent meteorological hazard in arid and semi-arid regions1. SDS have far-reaching consequences for both human society and ecosystems, primarily through the transport and deposition of various pollutants2. SDS also have adverse socio-economic impacts, such as destroying crops, disrupting transportation3, and harming human health (particularly by increasing the incidence of respiratory and cardiovascular diseases)4,5,6. The negative impacts of dust events are directly correlated with their frequency and intensity7,8,9. For example, severe SDS can exacerbate desertification and land degradation, cause the loss of biodiversity and habitat2, and lead to a rapid deterioration of air quality and an increase in the number of patients seeking medical care10. Therefore, an in-depth study of the changing patterns and driving factors of different types of dust events is of great significance for the effective prevention and control of SDS and for mitigating their adverse impacts on human society and the ecological environment.

The occurrence of SDS is influenced by many factors, such as climate change, surface conditions, and geographic location11,12,13. Strong winds, bare ground, and dry soil are the basic conditions for dust formation14,15, while other factors such as temperature and precipitation, as well as surface conditions like snow cover, indirectly affect dust activity by influencing vegetation growth and soil moisture16,17,18. Although global warming and intensified desertification have provided favorable conditions for the occurrence of dust events19, recent studies have documented that dust activity in Asia has declined significantly over the last two decades20,21,22, possibly due to reduced surface wind speed, increased precipitation and soil moisture, and increased vegetation cover16,22,23. Considering that the increase in vegetation cover since 2000 could be largely attributed to human land-use management (e.g., ecological restoration programs and multiple cropping)24,25,26, human activities could indirectly affect SDS in addition to climate change. However, most studies did not distinguish the effects of these drivers among different types of dust events (light, moderate, and severe). Given that the meteorological conditions (such as wind speed thresholds) required for different types of dust events may vary greatly, it is necessary to analyze the contributions of various factors to each type of dust event.

There are three commonly used approaches to disentangling the impacts of multiple factors on the occurrence of dust events. Traditional statistical analyses, such as correlation analysis and multiple linear regression, struggle to accurately quantify the complex nonlinear relationships between dust activity and its driving factors27. Numerical models are also employed to quantify the contributions of multiple factors to dust activity; they can simulate the processes of dust emission, transport, and wet and dry deposition, as well as aerosol–cloud–radiation feedbacks28,29,30. For example, Wu et al.22 used a numerical dust emission model to quantify the contributions of wind speed, vegetation cover, and soil moisture to dust activity in East Asia over the past 20 years. However, numerical models still have large uncertainties owing to inherent errors in their input parameters31,32,33,34. Data-driven machine-learning models, which are well suited to nonlinear problems and high-dimensional data, have more recently been employed to simulate and predict long-term dust activity35.

In Asia, SDS remain a major concern because Asia is the second-largest source of dust emissions in the world36. This study focuses on the SDS hotspots in Asia. We first used surface observational data to investigate the spatiotemporal variation of three types of dust events (light, moderate, and severe) in this area from 2000 to 2022. Subsequently, we employed a random forest algorithm and Shapley additive explanations to diagnose the effects of 15 driving factors on the occurrence probability of each type of dust event. Finally, using a disentangling method, we quantified the contributions of the major factors, including wind speed, soil moisture, vegetation cover, snow cover, and previous precipitation, to the long-term trends of each type of dust event. Our aims are (1) to reveal the spatiotemporal patterns in the occurrence frequency of various dust events across Asia and (2) to identify the contribution of each factor to the long-term trends of different types of dust events.

Results

Temporal changes in dust events

Within the study area, dust events mainly occurred in northern China, Mongolia, eastern Iran, western Afghanistan, and western Kazakhstan (Fig. S1a). The annual frequency of dust events was 2.8 days averaged across all stations from 2000 to 2022. Based on visibility, dust events were categorized as light dust events (LDE), moderate dust events (MDE), and severe dust events (SDE). The distribution of LDE was similar to that of all dust events, with an average annual frequency of 1.9 days (Fig. S1b). MDE primarily occurred in the Taklamakan and Gobi Deserts in China, Mongolia, Tajikistan, Turkmenistan, and western Kazakhstan, with an average annual frequency of 0.6 days (Fig. S1c). SDE mainly occurred in Mongolia, with an average annual frequency of 0.2 days (Fig. S1d). Dust activity exhibited strong seasonal variation (Fig. 1), with the highest frequency in spring (April and May).

Fig. 1: Frequency of dust events from 2000–2022.

Frequency was calculated as the number of days with dust events averaged over 715 stations within a given year or month. a All dust events; b light dust events; c moderate dust events; d severe dust events. The trend lines were fitted by the Theil–Sen estimator. *p < 0.05; **p < 0.01.

Overall, the annual frequency of all dust events averaged over all stations decreased by 0.03 days/year, but this trend was not significant (Fig. 1a). The trend in LDE was also not statistically significant, whereas the annual frequencies of MDE and SDE decreased significantly, by 0.2 and 0.01 days/year, respectively (p < 0.01). Regarding the spatial pattern, the annual frequency of LDE showed a significant decreasing trend in northern China and Russia, while a significant increasing trend was observed in Mongolia and Central Asia (Fig. 2b). The annual frequencies of MDE and SDE significantly decreased in most parts of northern China and Mongolia (p < 0.05; Fig. 2c, d). In Central Asia, the annual frequency of MDE significantly increased, while that of SDE significantly decreased (p < 0.05; Fig. 2c, d).

Fig. 2: Spatial pattern of trends in frequency of dust events from 2000 to 2022.

‘×’ indicates significant trends (p < 0.05).

Importance and impacts of each factor on the occurrence of dust events

We employed a random forest (RF) algorithm and Shapley additive explanations to diagnose the effects of 15 driving factors on the occurrence probability of dust events. The main factors identified by the RF model (Fig. 3a) were climatic factors (wind speed, soil moisture, and precipitation in the previous 30 days), geographic factors (latitude, elevation, and longitude), and surface conditions (LAI and snow cover).

Fig. 3: Importance and response curves of various factors on the occurrence of dust events.

a Importance ranking of each factor. b–i SHAP values of the occurrence probability of dust events in response to the 8 most important factors. SHAP values were smoothed by the locally weighted regression method. LDE: light dust events; MDE: moderate dust events; SDE: severe dust events. Pre-30: precipitation in the previous 30 days.

Among all factors, maximum wind speed was the most important (Fig. 3a), but its impacts on the occurrence probabilities of dust events were nonlinear. When the maximum wind speed was low, the occurrence probabilities of LDE, MDE, and SDE all rose with increasing maximum wind speed (Fig. 3b). However, once the maximum wind speed reached 6.4 m/s, the probability of LDE began to decrease while that of MDE continued to increase. Thus, 6.4 m/s is the maximum wind speed threshold at which dust events switch from light to moderate. Similarly, 13 m/s is the threshold at which dust events switch from moderate to severe. The impacts of mean wind speed on the occurrence probabilities of different types of dust events were similar to those of maximum wind speed (Fig. 3d).
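The threshold rule used here (the wind speed at which the smoothed LDE response starts to fall while the MDE response is still rising) can be expressed as a small search over a smoothed response curve. The sketch below is an illustration under stated assumptions, not the authors' code: the function name `switch_threshold` is hypothetical, and it assumes the two smoothed curves (e.g., LOWESS-smoothed SHAP values) have already been evaluated on a shared grid of wind speeds.

```python
import numpy as np


def switch_threshold(x, curve_lower, curve_upper):
    """Hypothetical helper: return the first x at which the smoothed response
    curve of the lower-intensity class starts to decline while the curve of
    the higher-intensity class is still rising; None if no such point exists."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x)
    x = x[order]
    d_lower = np.diff(np.asarray(curve_lower, dtype=float)[order])
    d_upper = np.diff(np.asarray(curve_upper, dtype=float)[order])
    # A step from x[i] to x[i+1] qualifies if the lower class falls
    # and the higher class still rises over that step.
    qualifies = (d_lower < 0) & (d_upper > 0)
    if not qualifies.any():
        return None
    # Left edge of the first qualifying step.
    return float(x[np.argmax(qualifies)])


# Toy curves (not real data): an LDE-like curve peaking at 6.4 m/s and an
# MDE-like curve that is still rising.
grid = np.linspace(0, 20, 201)
print(switch_threshold(grid, -(grid - 6.4) ** 2, 0.1 * grid))  # approximately 6.4
```

The same call with the MDE and SDE curves would locate the moderate-to-severe switch; with real SHAP output the result depends on the smoothing bandwidth chosen upstream.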

Higher soil moisture could substantially lower the occurrence probability of LDE and slightly diminish the occurrence probabilities of MDE and SDE (Fig. 3h). An increase in Pre-30 significantly reduced the occurrence probabilities of LDE and SDE but had a weak effect on MDE (Fig. S2k; Fig. S3k; Fig. S4k).

An increase in snow cover resulted in a decreased occurrence probability of LDE and MDE, but an increased occurrence probability of SDE (Fig. 3f). Higher LAI could significantly reduce the occurrence probability of LDE. However, MDE and SDE did not decrease with increasing LAI and even showed a slight increase (Fig. 3g).

The occurrence of dust events was markedly affected by the geographical location of stations. LDE was more prevalent at low-latitude sites (30°N–40°N), while MDE and SDE were more prevalent at higher latitudes (Fig. 3c). LDE and MDE were more likely to occur in low-elevation areas, while SDE was more likely to occur at high elevations (Fig. 3e). SDE occurred more frequently around 90°E (Fig. 3i). The response curves of the other factors are presented in Figs. S2–S4.

Contributions of each factor to trend in dust events

Based on the RF model and the disentangling method, we quantified the contributions of the five most important factors to the temporal changes in different types of dust events. Wind speed exerted the greatest influence on the trends in the occurrence probability of all three types of dust events (Fig. 4). Wind speed (+42.8%), snow cover (+14.3%), soil moisture (+14.3%), and Pre-30 (+14.3%) all contributed positively to the occurrence probability of LDE, whereas LAI (−14.3%) contributed negatively. The combined effects of these factors resulted in no apparent trend in LDE. Apart from a minor positive contribution from snow cover, wind speed (−51.1%), soil moisture (−14.8%), LAI (−19.7%), and Pre-30 (−12.0%) all contributed to the decrease in MDE, which explains the significant decrease in the observed frequency of MDE. All factors contributed to the reduction in the occurrence probability of SDE, with wind speed making the largest contribution (−59.3%).

Fig. 4: Relative contribution of driving factors to occurrence probability of 3 types of dust events.

a–e Change in the occurrence probability of each type of dust event averaged over all stations due to each factor. f The relative contribution of each factor to long-term trends in the occurrence probability of dust events. The trend lines were fitted by the Theil–Sen estimator. *p < 0.05; **p < 0.01. LDE light dust events, MDE moderate dust events, SDE severe dust events. Pre-30 precipitation in the previous 30 days.

Discussion

Our results showed that Central Asia, Mongolia, and Inner Mongolia were hotspots of dust activity in Asia (Fig. S1), which is consistent with Shao and Dong34 and Gholami et al.37. We further found that dust events primarily occurred in spring (Fig. 1), when cold fronts driven by the Siberian–Mongolian High frequently form25,38. Most previous studies of dust activity in Asia ended before 201812,22,39; to reveal recent trends, this study extended the analysis to 2022. Although significant increasing trends were observed in parts of Mongolia and Central Asia, dust activity declined in most parts of the study area from 2000 to 2022 (Fig. 2a). This finding agrees with Lee and Sohn17 and Nobakht et al.40. Climatic variability at the decadal scale may also contribute to changes in dust events. For example, the North Atlantic Oscillation (NAO), which has an 11-year cycle41, could affect dust activity in Asia because negative NAO phases can cause the Siberian High to become established, strengthen, and move southward near the surface, steering cold air from high latitudes southward42. Mongolian cyclones, which exhibit strong decadal variability, are also closely related to spring dust storms in northern China43.

We found that maximum and mean wind speeds were the dominant factors influencing dust activity in Asia (Fig. 3a). This finding is in accordance with existing literature identifying wind as the driving force behind sandstorm formation13,16,22. Soil moisture was the second most important climatic factor. Elevated soil moisture could suppress the occurrence of LDE because it enhances soil aggregate stability15. Compared with soil moisture, Pre-30 showed relatively low importance (Fig. 3a), suggesting that precipitation may have a comparatively minor direct impact on the occurrence of dust events; instead, precipitation may indirectly influence dust activity by modulating soil moisture and vegetation growth44,45,46. Furthermore, consistent with the results of Xu et al.47, we found that air and land surface temperatures were not important for dust activity (Fig. 3a).

Local surface conditions could influence dust activity, particularly LDE (Fig. 3a, g). Higher LAI increases surface roughness and reduces wind erosion, effectively mitigating dust activity48. However, the suppressive effect of LAI on MDE and SDE was limited, potentially because vegetation canopies cannot hinder the long-distance transport of sand and dust during severe SDS49. Snow cover also effectively suppressed the occurrence of LDE and MDE (Fig. 3f) because it protects bare soil from wind erosion50. Similar to LAI, snow cover did not suppress the occurrence of SDE. Surface sand content provides the material basis for dust activity51, and a higher topsoil sand fraction increased the likelihood of SDE occurrence (Fig. S4l).

Our findings revealed a link between geographical location and dust activity in Asia. At higher latitudes and elevations, such as the Mongolian Plateau, the occurrence probability of SDE was significantly higher (Fig. 3c, e), likely because these areas are close to dust sources such as the Gobi and Taklamakan deserts (Fig. S1). In contrast, LDE (dust in suspension and blowing dust) was more prevalent at lower latitudes (30°N–40°N), such as Iran, Pakistan, and eastern coastal China, as well as in lower-elevation areas such as the Tarim Basin (Fig. 3c, e), which may be attributed to the weak prevailing local winds there34,47. The distance to deserts showed low importance (Fig. 3a), possibly because we only considered the distance to the nearest desert and not the actual dust transport pathways52,53.

It is worth noting that the relationships between most of the driving factors and dust activity were nonlinear, with specific thresholds in the impacts of these factors on dust events. For example, the impact of maximum wind speed on MDE was nonlinear: when the maximum wind speed reached 13.0 m/s, the probability of MDE began to decrease while that of SDE continued to increase. This result is in line with expectations because stronger winds tend to increase the intensity of dust events. Similarly, increasing LAI initially decreased the occurrence probability of MDE but then increased it once LAI was high enough. This may be because MDE occurring under high vegetation cover can lift sand and dust above canopy height, in which case vegetation has almost no obstructive effect on dust events. These findings suggest that machine learning models combined with Shapley additive explanations can uncover threshold relationships between driving factors and the occurrence of dust events.

The contributions of various factors to the trends in dust event frequency varied among different types of dust events. Overall, wind speed contributed 51.1% and 59.3% to the decreasing trends of MDE and SDE, respectively (Fig. 4f). Although the maximum wind speed averaged over all stations remained relatively stable over the past two decades (Fig. S6a), the number of strong wind days (maximum wind speed > 6.4 m/s or > 13 m/s) declined significantly (Fig. 5a, b), explaining the observed significant reduction in MDE and SDE. These findings support the conclusion of Wu et al.22 that a reduction in strong wind frequency led to a decrease in dust emission. The observed significant increase in LAI during 2000–2022 (Fig. S6d) contributed −14.3%, −19.7%, and −11.0% to the trends in LDE, MDE, and SDE, respectively (Fig. 4f). Similar effects of LAI were reported by Mao et al.54. The non-significant decreases in soil moisture and precipitation across the study area during 2000–2022 (Fig. S6c, f) limited the decline in LDE (+14.3% each) but exacerbated the declining trends in MDE and SDE (possibly owing to interactions with other factors). As expected, the decrease in snow cover during 2000–2022 (Fig. S6e) mitigated the declining trends in LDE and MDE, but it exacerbated the decreasing trend in SDE, possibly owing to increased soil moisture from snowmelt.

Fig. 5: Number of strong wind days (averaged from all stations) defined by different threshold wind speeds.

a Maximum wind speed > 6.4 m/s (threshold for moderate dust events). b Maximum wind speed > 13.0 m/s (threshold for severe dust events). c Mean wind speed > 4 m/s (threshold for moderate dust events). d Mean wind speed > 9.3 m/s (threshold for severe dust events). The threshold wind speed (mean or maximum) for moderate dust events was determined as the wind speed at which the occurrence probability of light dust events began to decrease while that of moderate dust events still increased (Fig. 3b, d). The threshold wind speed (mean or maximum) for severe dust events was determined as the wind speed at which the occurrence probability of moderate dust events began to decrease while that of severe dust events still increased (Fig. 3b, d). *p < 0.05; **p < 0.01.

This study employed a machine learning model (RF) to identify dominant factors and their nonlinear impacts on dust events. Compared with traditional linear regression models, machine learning models can exploit large, high-dimensional datasets to extract underlying patterns and thus better capture the complex relationships between influencing factors and the occurrence of dust events. Driven by wind speed, soil moisture, and vegetation cover outputs from Earth system models (e.g., CMIP6 simulations), the machine learning model developed in this study could be used to project the occurrence probability of regional dust events under different climate change scenarios.

We used data from 715 meteorological stations, which monitored dust events according to WMO criteria every 3 hours. Because hourly dust data were unavailable, this study investigated the occurrence of dust events on a daily rather than an hourly scale (a day with at least one observed dust event was counted as a dust day). We assumed that, within a day, dust events were most likely to occur during the period with the highest wind speed. The results showed that daily maximum wind speed was the most important factor, which supports this assumption. However, high wind speeds may be short-lived compared with the duration of dust events, and because we did not verify whether the timing of the maximum wind speed coincided with that of the dust events within a day, some uncertainty may have been introduced into our analyses. Considering that the mean wind speed may better reflect the overall wind conditions within a day, we included it in the analysis and found that it was the second most important factor affecting the occurrence probability of dust events. Meanwhile, daily mean and maximum wind speeds were significantly correlated (p < 0.05) at all stations, with a mean correlation coefficient (R) of 0.85 (Fig. S7), suggesting that the short-term maximum wind speed reflects the overall wind speed within a day to a certain degree. Nevertheless, future studies should explore the relationships between wind speed and the duration and intensity of dust events at an hourly scale.

Furthermore, wind speed data from meteorological stations may contain uncertainties due to station relocations, anemometer height changes, instrument malfunctions, instrument model changes, and inconsistent sampling and averaging intervals55. To assess the uncertainties in the wind data used in this study, we compared the daily mean wind speed at the station level with the ERA5 reanalysis data from 2000–2005, because reanalysis data can serve as a proxy for actual atmospheric conditions. Because ERA5 wind speeds may be biased owing to insufficiently resolved topographic features and the parametrization of orographic drag in the model56, some differences in wind speed between ERA5 and the meteorological stations are expected. Even so, the station mean wind speeds were significantly (p < 0.05) correlated with the reanalysis data at 98% of stations (Fig. S8), suggesting that using a different wind speed dataset would probably not substantially affect our results. However, Zhang and Wang57 reported that inhomogeneity in observed surface wind speed over China is detectable at 59% of national weather stations. Thus, future studies could generate a homogenized surface wind speed dataset for Asia using a similar method and then use it to predict the occurrence of dust events.

In summary, this study revealed that the reduced frequency of dust events (particularly MDE and SDE) over the past two decades in Asia was mainly attributable to decreasing wind speeds. Wind speeds may continue to decrease in the future because Arctic amplification is expected to weaken the temperature gradient between middle and high latitudes in the Northern Hemisphere58,59. Thus, dust activity may decrease further in the future, potentially reducing the incidence of respiratory diseases60. Moreover, we found that an increase in LAI also contributed to reduced dust activity. The increase in LAI over the past several decades could be partly attributed to recent ecological restoration programs in China. Future studies are still needed to quantify the contribution of human land-use management to changes in the frequency of dust events.

Methods

Datasets

Dust storm data were derived from the Global Integrated Surface Hourly Dataset (https://www.ncei.noaa.gov/maps/hourly/). This dataset includes records from over 3500 stations worldwide, spanning from 1901 to the present. We included only the 715 stations within the study area at which at least one dust event occurred from 2000 to 2022 (Fig. 6). Dust events in this dataset are recorded as 10 numeric codes according to the WMO definition, with a temporal resolution of 1 to 3 hours13. Based on the visibility description of each code13, dust activities were categorized as light dust events (LDE, codes 07–09), moderate dust events (MDE, codes 30–32 and 98), and severe dust events (SDE, codes 33–35) (Table S1). We analyzed the occurrence of dust events on a daily basis: each date was classified into one of four types (LDE, MDE, SDE, or no dust event) based on the most severe dust event recorded on that date.
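The daily classification rule (the most severe code reported on a date determines that date's category) can be sketched as follows. This is an illustration, not the authors' code: it assumes each station-day's present-weather observations have already been extracted as a list of integer codes, and the names `CODE_TO_CATEGORY` and `daily_dust_category` are hypothetical.

```python
from typing import Iterable

# Code-to-category mapping as described in the text:
# LDE = codes 07-09, MDE = codes 30-32 and 98, SDE = codes 33-35.
# Any other code (including non-dust weather) maps to 0 (no dust event).
CODE_TO_CATEGORY = {
    **{c: 1 for c in (7, 8, 9)},
    **{c: 2 for c in (30, 31, 32, 98)},
    **{c: 3 for c in (33, 34, 35)},
}


def daily_dust_category(codes: Iterable[int]) -> int:
    """Hypothetical helper: return 0 (no dust), 1 (LDE), 2 (MDE) or 3 (SDE)
    for one station-day, using the most severe dust code reported that day."""
    return max((CODE_TO_CATEGORY.get(c, 0) for c in codes), default=0)


# One day with 3-hourly reports: haze (code 5 here is simply a non-dust code),
# dust in suspension (07) and a moderate dust storm (31) -> category 2 (MDE).
print(daily_dust_category([5, 7, 31, 10]))  # 2
```

Taking the maximum over categories, rather than over raw codes, matters here because code 98 ranks as moderate even though it is numerically larger than the severe codes 33–35.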

Fig. 6: Distribution of 715 meteorological stations in Asia.

Stations at which no dust event occurred were removed.

The geographical position (longitude, latitude, and elevation) and wind speed of each station were recorded in the dust storm dataset. Hourly air temperature, land surface temperature, precipitation, snow cover, and soil moisture (0–7 cm) data were obtained from the ERA5-Land hourly dataset, with a spatial resolution of 0.1°. The distribution of deserts (vector data) was derived from the RESOLVE Biodiversity and Wildlife Solutions ecoregion map (https://ecoregions.appspot.com/). LAI was obtained from the GLASS dataset, with a spatial resolution of 0.05° and a temporal resolution of 8 days. Topsoil sand fraction was derived from the Harmonized World Soil Database v1.2, with a spatial resolution of 30 arc-seconds. The detailed spatiotemporal resolutions and download links for these data are shown in Table S2.

Spatiotemporal pattern of dust events

For each month or year, we calculated the total number of days on which LDE, MDE, and SDE occurred across all stations. Because the number of active stations changed over time61, the frequency of LDE, MDE, and SDE within the study area was defined as the total number of dust event days in each month or year divided by the number of active stations. The Theil–Sen estimator was employed to analyze the temporal trends of dust days and potential influencing variables from 2000 to 2022, and the significance of these trends was tested using the Mann–Kendall test62,63.
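The frequency normalization and the Theil–Sen/Mann–Kendall trend analysis can be sketched with NumPy and SciPy as below. This is a minimal sketch, not the authors' code: the helper names are hypothetical, the Theil–Sen slope uses `scipy.stats.theilslopes`, and the Mann–Kendall test is a plain textbook implementation with the normal approximation and no correction for tied values (the text does not say which variant was used).

```python
import numpy as np
from scipy import stats


def dust_frequency(total_dust_days, active_stations):
    """Frequency = total station-days with dust / number of active stations,
    computed element-wise for each month or year."""
    return np.asarray(total_dust_days, float) / np.asarray(active_stations, float)


def theil_sen_trend(years, y):
    """Theil-Sen slope and intercept (median of pairwise slopes)."""
    res = stats.theilslopes(y, years)
    return res[0], res[1]


def mann_kendall(y):
    """Two-sided Mann-Kendall trend test, normal approximation, no tie
    correction (an assumption). Returns (S, p_value)."""
    y = np.asarray(y, float)
    n = y.size
    # S = sum over all pairs i < j of sign(y_j - y_i)
    s = sum(np.sign(y[j] - y[i]) for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / np.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / np.sqrt(var_s)
    else:
        z = 0.0
    p = 2 * (1 - stats.norm.cdf(abs(z)))
    return float(s), float(p)


# Synthetic example (not study data): 23 years of yearly frequencies.
years = np.arange(2000, 2023)
freq = dust_frequency(np.linspace(400, 300, years.size), np.full(years.size, 150))
slope, _ = theil_sen_trend(years, freq)
s_stat, p_value = mann_kendall(freq)
```

Dividing by the number of active stations, rather than by the fixed total of 715, keeps the yearly series comparable when stations enter or leave the record.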

Drivers of dust events

Because wind speed is the primary driver of dust activity22, we introduced two variables, maximum wind speed and mean wind speed, to characterize daily surface wind conditions during dust events. Because precipitation events elevate soil moisture and may be reflected in the soil moisture record for weeks64, we selected accumulated precipitation during the 30 days preceding the current day as a variable. Furthermore, soil moisture, air and land surface temperatures, snow cover, vegetation cover, the distance of the station from the desert, and the topsoil sand fraction have all been reported to affect dust events and were therefore included as influencing variables12,16,17,18.

Based on the above considerations, we selected 15 variables, covering 7 climatic factors (maximum and mean wind speed, accumulated precipitation (Pre-30) and the number of rainless days (Rl-30) in the previous 30 days, soil moisture, air temperature, and land surface temperature), 4 geographical factors (longitude, latitude, elevation, and distance from the desert), 3 surface condition factors (snow cover, topsoil sand fraction, and LAI), and 1 variable representing the season (day of the year, DOY). Among these variables, only the precipitation variables (Pre-30 and Rl-30) used a 30-day time window before the current day. We also tested time windows of 60 and 90 days (Pre-60, Pre-90, Rl-60, and Rl-90). Pre-30 (Rl-30) was significantly correlated with Pre-60 (Rl-60) and Pre-90 (Rl-90), and their impacts on dust events were quite similar (Figs. S9, S10). Thus, we only report the results of Pre-30 in the main text. For the other climatic factors, data from the current day were used as explanatory variables.

Daily air and land surface temperatures were calculated as the mean of hourly data. Daily maximum and mean wind speeds were calculated as the maximum and mean of hourly data, respectively. For each date, precipitation in the previous 30 days (Pre-30) and the number of rainless days (days with <0.1 mm precipitation) in the previous 30 days (Rl-30) were calculated from hourly precipitation data. For each station, the distance from the desert was calculated as the distance between the station and the nearest desert pixel in the ecoregion map using the Haversine formula. The 8-day LAI values were interpolated to daily values using cubic spline interpolation. To reduce data uncertainty, all variables not obtained directly from the stations were averaged over the pixels within a ~30 km (0.3°) window centered on the corresponding meteorological station.
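The derived predictors described above (Pre-30, Rl-30, Haversine distance to the nearest desert pixel, and daily LAI from 8-day composites) can be sketched as follows. The helper names are hypothetical, and the sketch makes two labeled assumptions the text does not specify: the 30-day window excludes the current day, and hourly precipitation has already been summed to daily totals in millimetres.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def pre30_rl30(daily_precip_mm, window=30, rain_threshold=0.1):
    """Hypothetical helper: for each day t, Pre-30 (accumulated precipitation)
    and Rl-30 (count of days with < rain_threshold mm) over days t-window..t-1.
    Days without a full preceding window are NaN."""
    p = np.asarray(daily_precip_mm, float)
    pre = np.full(p.size, np.nan)
    rl = np.full(p.size, np.nan)
    for t in range(window, p.size):
        past = p[t - window:t]  # assumption: the current day is excluded
        pre[t] = past.sum()
        rl[t] = np.count_nonzero(past < rain_threshold)
    return pre, rl


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance (km) by the Haversine formula; inputs in degrees."""
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    dphi = phi2 - phi1
    dlmb = np.radians(lon2) - np.radians(lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlmb / 2) ** 2
    return 2 * radius_km * np.arcsin(np.sqrt(a))


def distance_to_nearest_desert_km(st_lat, st_lon, desert_lats, desert_lons):
    """Minimum Haversine distance from a station to a set of desert pixel centres."""
    d = haversine_km(st_lat, st_lon, np.asarray(desert_lats), np.asarray(desert_lons))
    return float(np.min(d))


def lai_daily(composite_doy, lai_values, target_doy):
    """Interpolate 8-day LAI composites to daily values with a cubic spline."""
    return CubicSpline(composite_doy, lai_values)(target_doy)
```

The 6371 km Earth radius is a conventional mean value chosen for this sketch; the text does not state which radius was used.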

Modeling different types of dust events

We used the random forest (RF) model to explore the effect of each variable on dust events because RF has been shown to be suitable for predicting dust events65,66,67. The dependent variable in the RF model was the category of dust occurrence (0: no dust; 1: LDE; 2: MDE; 3: SDE). The numbers of records for categories 0, 1, 2, and 3 were 3751869, 27461, 8139, and 2694, respectively. The independent variables were the 15 variables on each date at each station.

Before fitting the RF model, we excluded years with fewer than 300 days of records for each station. Because unbalanced sample sizes among categories would affect the performance of RF65,68, we applied a two-step preprocessing to the original data. First, we used random undersampling to reduce the number of category 0 records to match the number of category 1 records. Then, we employed SMOTE oversampling to increase the numbers of category 2 and 3 records to match the number of category 1 records. The balanced dataset was then randomly split (across all stations and years) into 70% training and 30% testing sets. During training, we implemented five-fold cross-validation to optimize the hyperparameters of RF, including the number of estimators, the maximum number of features considered at each split (max_features), and the maximum tree depth (max_depth).
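The balancing, splitting, and tuning steps can be sketched as below. Note the substitution: the study used the imbalanced-learn `RandomUnderSampler` and `SMOTE` classes, whereas this sketch hand-rolls simplified versions of both (random undersampling of class 0, and SMOTE-style interpolation between a minority sample and one of its k nearest same-class neighbours) so that it needs only NumPy and scikit-learn. The helper names and the example hyperparameter grid are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import NearestNeighbors

RNG = np.random.default_rng(0)


def undersample_class0(X, y, rng):
    """Randomly drop class-0 rows so class 0 has as many rows as class 1."""
    idx0 = np.flatnonzero(y == 0)
    keep0 = rng.choice(idx0, size=np.sum(y == 1), replace=False)
    keep = np.concatenate([keep0, np.flatnonzero(y != 0)])
    return X[keep], y[keep]


def smote_like(X_min, n_new, k, rng):
    """Simplified SMOTE: interpolate between a minority sample and one of its
    k nearest minority-class neighbours (needs at least 2 samples)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, nbrs = nn.kneighbors(X_min)  # column 0 is (normally) the sample itself
    base = rng.integers(0, len(X_min), n_new)
    pick = nbrs[base, rng.integers(1, k + 1, n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[pick] - X_min[base])


def balance(X, y, k=5, rng=RNG):
    """Undersample class 0, then oversample classes 2 and 3 up to class 1's size."""
    X, y = undersample_class0(X, y, rng)
    n_target = np.sum(y == 1)
    parts_X, parts_y = [X], [y]
    for c in (2, 3):
        X_c = X[y == c]
        n_new = n_target - len(X_c)
        if n_new > 0:
            parts_X.append(smote_like(X_c, n_new, min(k, len(X_c) - 1), rng))
            parts_y.append(np.full(n_new, c))
    return np.vstack(parts_X), np.concatenate(parts_y)


def fit_rf(X_bal, y_bal, param_grid, seed=0):
    """70/30 random split, then five-fold CV grid search over RF hyperparameters."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_bal, y_bal, test_size=0.3, random_state=seed)
    search = GridSearchCV(RandomForestClassifier(random_state=seed), param_grid, cv=5)
    search.fit(X_tr, y_tr)
    return search, (X_te, y_te)


# Illustrative grid (hypothetical values, not the study's search space):
EXAMPLE_GRID = {"n_estimators": [100, 300], "max_features": ["sqrt", 0.5],
                "max_depth": [None, 20]}
```

As in the text, balancing happens before the train/test split, so synthetic minority samples can appear in the test set; that ordering is reproduced here rather than endorsed.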

To identify the dominant factors affecting dust activity, we calculated the permutation importance of each factor. To further analyze the relationships between the probability of dust events and the driving factors, we used Shapley additive explanations (SHAP), a method for explaining the output of any machine learning model. SHAP values quantify the impact of each factor on the model's prediction for each sample69,70. We applied SHAP to the test set and plotted the SHAP values against each variable. To better visualize the response of dust events to changes in individual variables, we applied a locally weighted regression model to smooth the response curves.
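The permutation-importance ranking can be sketched with scikit-learn's `permutation_importance`, which shuffles one predictor at a time on held-out data and records the drop in the model's score. The helper name `rank_factors`, the number of repeats, and the synthetic data in the usage lines are assumptions for illustration; the SHAP and LOWESS smoothing steps are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance


def rank_factors(model, X_test, y_test, names, n_repeats=10, seed=0):
    """Hypothetical helper: rank predictors by the mean decrease in test-set
    score (accuracy for classifiers) when each column is randomly shuffled."""
    result = permutation_importance(
        model, X_test, y_test, n_repeats=n_repeats, random_state=seed)
    order = np.argsort(result.importances_mean)[::-1]
    return [(names[i], float(result.importances_mean[i])) for i in order]


# Synthetic usage (not study data): only the first column drives the label.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(600, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_demo[:400], y_demo[:400])
ranking = rank_factors(rf, X_demo[400:], y_demo[400:], ["max_wind", "lai", "snow"])
```

Computing importance on the test set, as here, measures how much each predictor matters for generalization rather than for fitting the training data.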

Contribution analysis

To isolate the contributions of the five most important factors (identified in the previous step) to the long-term trends of dust events, we excluded stations with fewer than 10 years of records (retaining 491 stations) and re-trained the RF model using these records. We evaluated the model's performance using precision (Eq. (1)) and recall (Eq. (2)). The precision and recall for all three types of dust events were higher than 0.98 (Table S3), indicating that the model had high accuracy.

$$\Pr =\frac{TP}{TP+FP}$$
(1)
$${\mathrm{Re}}=\frac{TP}{TP+FN}$$
(2)

where Pr is precision and Re is recall; TP is the number of correctly predicted positive samples; FP is the number of samples incorrectly predicted as positive; and FN is the number of positive samples incorrectly predicted as negative.
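For a multi-class model, Eqs. (1) and (2) are evaluated one class at a time, treating that class as positive and all others as negative. The sketch below shows that one-vs-rest computation directly from the TP, FP, and FN counts; the helper name is hypothetical, and the study itself reports using scikit-learn's `classification_report`, which yields the same per-class values.

```python
import numpy as np


def precision_recall_per_class(y_true, y_pred, classes=(1, 2, 3)):
    """One-vs-rest precision (Eq. 1) and recall (Eq. 2) for each dust class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    out = {}
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))  # predicted c, truly c
        fp = np.sum((y_pred == c) & (y_true != c))  # predicted c, truly other
        fn = np.sum((y_pred != c) & (y_true == c))  # truly c, predicted other
        pr = tp / (tp + fp) if tp + fp else float("nan")
        re = tp / (tp + fn) if tp + fn else float("nan")
        out[c] = (float(pr), float(re))
    return out


# Toy labels (0: no dust, 1: LDE, 2: MDE, 3: SDE).
print(precision_recall_per_class([1, 1, 2, 3, 0, 2], [1, 2, 2, 3, 0, 2]))
```

In the toy example, one LDE day is misclassified as MDE, which lowers LDE recall (0.5) and MDE precision (2/3) while leaving SDE unaffected.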

Based on the RF model, we designed a disentangling method. First, we used a baseline scenario (S0) where all factors varied with their observed values. For scenarios S1–S5, the corresponding factor was fixed from year to year (e.g., at each date, the value was set to the multi-year median), while other factors varied with their observed values. We fixed maximum/mean wind speed, soil moisture, LAI, snow cover, and Pre-30 for S1, S2, S3, S4, and S5 scenarios, respectively (Table S4). Subsequently, we used the input data from S0–S5 to predict the occurrence probability of three types of dust events. The difference in the model outputs between the S0 and S1–S5 could reflect the contribution of individual factors to the occurrence probability of dust events (Eq. (3)).

$${C}_{ij}={P}_{{S}_{0}j}-{P}_{{S}_{i}j}$$
(3)

where Cij is the contribution of factor i to the occurrence probability of a specific type of dust event in year j, averaged over all stations, and PS0j and PSij are the occurrence probabilities of that type of dust event in year j, averaged over all stations, predicted by RF under scenarios S0 and Si, respectively.
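The scenario design and Eq. (3) can be sketched as follows: predict with all inputs as observed (S0), then again with one factor's columns replaced by a climatological value (Si), and difference the yearly mean probabilities. This is a sketch under assumptions: "fixed at the multi-year median" is read here as the median for the same station and day of year; the yearly mean is taken over all station-day records in that year; `predict_proba` can be any callable returning class probabilities (e.g., a fitted `RandomForestClassifier.predict_proba`); and all helper names are hypothetical.

```python
import numpy as np


def fix_to_climatology(X, cols, station, doy):
    """Copy of X in which each column in `cols` is replaced by its multi-year
    median for the same station and day of year (one reading of the text)."""
    Xf = X.copy()
    keys = np.column_stack([station, doy])
    _, inv = np.unique(keys, axis=0, return_inverse=True)
    inv = np.ravel(inv)  # guard against NumPy versions returning a 2-D inverse
    for g in np.unique(inv):
        rows = inv == g
        for c in cols:
            Xf[rows, c] = np.median(X[rows, c])
    return Xf


def yearly_mean(values, year):
    """Mean of `values` over all records in each year (sorted years)."""
    years = np.unique(year)
    return years, np.array([values[year == y].mean() for y in years])


def factor_contributions(predict_proba, X, station, year, doy, factor_cols, class_idx):
    """Eq. (3): C_ij = P_{S0,j} - P_{Si,j} for each factor i and year j.
    `factor_cols` maps a factor name to the column(s) fixed in its scenario,
    e.g. both wind speed columns for S1."""
    years, p0 = yearly_mean(predict_proba(X)[:, class_idx], year)
    contrib = {}
    for name, cols in factor_cols.items():
        Xi = fix_to_climatology(X, cols, station, doy)
        _, pi = yearly_mean(predict_proba(Xi)[:, class_idx], year)
        contrib[name] = p0 - pi
    return years, contrib
```

Because each scenario holds only one factor at its climatology, a positive C_ij means the observed departure of that factor from its climatology raised the predicted probability in year j.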

Furthermore, we computed the relative contribution of each factor to long-term trends of dust events (Eq. (4)).

$$R{C}_{i}=\frac{{\mathrm{Trend}}_{i}}{\mathop{\sum }\nolimits_{k=1}^{5}{\mathrm{Trend}}_{k}}$$
(4)

where Trendi is the linear trend (2000–2022) of Cij for a specific type of dust event, and RCi is the relative contribution of factor i to the long-term trend in the probability of dust events.
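Eq. (4) can be computed from the yearly contribution series produced by Eq. (3). In this sketch the trend of each series is taken as its Theil–Sen slope (`scipy.stats.theilslopes`), matching the estimator used for the trend lines in Fig. 4; that choice, the helper name, and the `use_abs` switch are assumptions. With `use_abs=False` the function follows Eq. (4) as written (signed sum in the denominator). Under that definition, when the total trend is negative, factors with negative trends receive positive RC values. The optional `use_abs=True` variant, which is our addition and not described in the text, divides by the sum of absolute trends so that each RC keeps the sign of its own trend.

```python
import numpy as np
from scipy import stats


def relative_contributions(years, contrib, use_abs=False):
    """RC_i = Trend_i / sum_k Trend_k (Eq. 4), where Trend_i is the Theil-Sen
    slope of the yearly contribution series C_ij. With use_abs=True (an
    assumed variant), the denominator is sum_k |Trend_k| instead."""
    trends = {k: stats.theilslopes(v, years)[0] for k, v in contrib.items()}
    vals = np.array(list(trends.values()))
    total = np.abs(vals).sum() if use_abs else vals.sum()
    return {k: float(t / total) for k, t in trends.items()}


# Synthetic contribution series (not study results).
yrs = np.arange(2000, 2023)
rc = relative_contributions(yrs, {"wind": -0.03 * (yrs - 2000),
                                  "lai": -0.01 * (yrs - 2000)})
```

Multiplying the returned fractions by 100 gives percentages in the form reported in Fig. 4f.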

All analyses were performed in Python 3.11. The “SMOTE” and “RandomUnderSampler” functions from the “imblearn” (imbalanced-learn) package were used for oversampling and undersampling, respectively. The random forest model, feature importance, and model performance evaluation were based on the “RandomForestClassifier,” “permutation_importance,” “classification_report,” and “cohen_kappa_score” functions from the “Scikit-learn” package. The “shap” package was used to calculate SHAP values, and the “Statsmodels” package was used for locally weighted regression.