Deep multi-task learning for early warnings of dust events implemented for the Middle East

Sarafian, Ron; Nissenbaum, Dori; Raveh-Rubin, Shira; Agrawal, Vikhyat; Rudich, Yinon

doi:10.1038/s41612-023-00348-9

Download PDF

Article
Open access
Published: 23 March 2023

Deep multi-task learning for early warnings of dust events implemented for the Middle East

npj Climate and Atmospheric Science volume 6, Article number: 23 (2023) Cite this article

4951 Accesses
15 Citations
27 Altmetric
Metrics details

Subjects

Abstract

Events of high dust loading are extreme meteorological phenomena with important climate and health implications. Therefore, early forecasting is critical for mitigating their adverse effects. Dust modeling is a long-standing challenge due to the multiscale nature of the governing meteorological dynamics and the complex coupling between atmospheric particles and the underlying atmospheric flow patterns. While physics-based numerical modeling is commonly being used, we propose a meteorological-based deep multi-task learning approach for forecasting dust events. Our approach consists of forecasting the local PM₁₀ (primary task) measured in situ, and simultaneously to predict the satellite-based regional PM₁₀ (auxiliary task); thus, leveraging valuable information from a correlated task. We use 18 years of regional meteorological data to train a neural forecast model for dust events in Israel. Twenty-four hours before the dust event, the model can detect 76% of the events with even higher predictability of winter and spring events. Further analysis shows that local dynamics drive most misclassified events, meaning that the coherent driving meteorology in the region holds a predictive skill. Further, we use machine-learning interpretability methods to reveal the meteorological patterns the model has learned, thus highlighting the important features that govern dust events in the Middle East, being primarily lower-tropospheric winds, and Aerosol Optical Depth.

Contributions of climatic factors and vegetation cover to the temporal shift in Asian dust events

Article Open access 28 December 2024

Upper troposphere dust belt formation processes vary seasonally and spatially in the Northern Hemisphere

Article Open access 10 February 2022

Historical footprints and future projections of global dust burden from bias-corrected CMIP6 models

Article Open access 02 January 2024

Introduction

High dust events are extreme meteorological phenomena that often initiate in arid and semi-arid regions but can extend to remote areas. Studies have shown that dust events significantly impact both the physical and human environments due to the dust particles’ properties^1,2. The associations between high dust loading and cardiovascular and respiratory diseases have been documented in a large body of literature³. As climate change and droughts have caused this phenomenon to endanger the lives of many people worldwide, studying dust events formation and forecasting is of ever-growing importance.

High dust events are particularly frequent in the Middle East and specifically in Israel, located at the eastern end of the Mediterranean Sea, on the margin of the global dust belt. This region has a diverse geography and is influenced by various synoptic systems associated with dust events^4,5, hence attracting substantial research interest.

Modeling and predicting high dust events is a long-standing challenge due to the multiscale nature of the governing meteorological dynamics and the complex coupling between the atmospheric particles and the underlying wind patterns. The latter necessitates numerical coupling of model dynamics and physics to other processes involving aerosol processes. There are still significant uncertainties about the underlying coupled processes governing dust emission, transport", and deposition, especially given that they occur on scales that current models do not resolve.

Detailed physics-based numerical modeling is typically used for dust forecasting^6,7. These models are based on assumptions regarding the emission and transport of dust storms; these predictions are thus highly model-dependent and often suffer from low skill⁸. Data-driven machine-learning algorithms have become a popular alternative to theory-driven models due to their ability to handle highly complex dynamic systems without being explicitly programmed^9,10. In particular, deep learning, a family of machine-learning methods that have shown phenomenal successes in computer vision, natural language processing, and other fields, is increasingly adopted in the Earth and climate sciences, achieving notable success¹¹.

Recent studies successfully implemented machine learning in dust modeling and forecasting^{12,13,14,15,16,17,18,19,20,21,22}. For example, Shi et al.¹⁵ and Jiang et al.¹⁴ performed pixel-wise dust detection in typical dust regions of Asia using Support Vector Machine (SVM) and Convolutional Neural Network (CNN), respectively. Other studies concentrated on forecasting particulate matter with a diameter equal or smaller than 10 μm (PM₁₀), a metric often used to define hazardous dust events¹³. Kang et al.¹⁶, Kowalski et al.²¹, and Nidzgorska-Lencewicz²² forecasted PM₁₀ one or several hours ahead based on historical measurements from those monitoring stations where forecasting is made, using deep-learning approaches. Chae et al.¹² spatially interpolated similar stationary measurements and then transferred them into a CNN for real-time spatial PM₁₀ prediction. In the Middle East region, Shtein et al.¹⁷ spatially predicted intra-daily PM₁₀ levels in Israel using satellite Aerosol Optical Depth (AOD) and regression models; Harba et al.¹⁸ used CNN to predict the direction of a dust storm using a small set of satellite images from Iraq; Boroughani et al.²⁰ used different machine-learning models and satellite images to identify dust sources in north-eastern Iran.

These models either detect dust using real-time data (nowcasting) or forecast dust with only a few hours of lead time from historical ground stationary measurements. None of these models use large-scale meteorological history as an input. Furthermore, in some studies, dust event definition depends on methods of questionable accuracy²³. In other studies, it is difficult to generalize the results to longer spatial or temporal ranges because the training data contains only a few esoteric cases. To the best of our knowledge, a large-scale meteorology-based deep neural model for forecasting dust levels at lead times longer than 24 h has not been studied. In particular, a model that can be reliably validated using ground in situ observation.

We hypothesize that the lack of such studies mainly stems from three unique challenges that the combination of meteorology, dust, and deep-learning poses: (i) meteorological training samples are highly correlated; (ii) dust events are relatively rare; (iii) meteorological fields are not shift-invariant. More precisely, (i) and (ii) imply redundancy and the low entropy of the already scarce training samples; this appears to attenuate the performance of deep-learning models, which largely depend on the quantity and the diversity of the training data²⁴. Moreover, (iii) invalidates common data augmentation techniques for extracting more information from small datasets. This issue is further discussed in “Discussion”.

In this study, we developed a meteorology-based deep neural model, which forecasts local dust events in Israel from 12 to 72 h in advance. We use in situ PM₁₀ measurements for the years 2003–2020 to define high dust events. We propose a multi-task learning²⁵ model to overcome the abovementioned challenges. This model simultaneously predicts regional PM₁₀ levels (regional task) and PM₁₀ levels in Israel (local task). For the regional task supervision, we use satellite-based PM₁₀ retrievals. As discussed in “Discussion”, the multi-task framework improves local dust event forecasting by utilizing regional-scale information in the regional task’s meteorological signals. The model consists of two neural networks trained simultaneously: An autoencoder network learns to compress multivariate meteorological samples into concise codes that keep both local and regional features; a fully connected (FC) network learns to forecast local PM₁₀ levels using sequences of these codes.

The model can skillfully detect about 76% of dust events 24 h ahead, with even higher predictability of winter and spring events. Our analysis shows that most misclassified events are driven by local dynamics. Furthermore, we reveal the spatiotemporal importance that the neural networks attribute to the meteorological variables and thus conclude the essential meteorological features that govern dust events in the region.

Results

${\Phi}_{loc}$ and ${\Phi}_{reg}$ are deep neural networks that predict the local and the regional PM₁₀ concentration, respectively, as detailed in “Model”.

Multi-task validation

Figure 1a presents selected examples of regional PM₁₀ original maps, and their predictions, obtained by ${\Phi}_{reg}$. The examples were sampled from the validation set. As can be seen, the decoder can reconstruct the regional PM₁₀ fields with high precision from the encoder’s 512-length codes. Although the outputs of ${\Phi}_{reg}$ are usually smoother than the original, the fact that the leading regional dust regions are detected is well noticeable. Our analysis suggests that the accuracy of ${\Phi}_{reg}$ is relatively stable over time (not shown). In addition, we find that ${\Phi}_{reg}$’s errors are systematically smaller in the north of the Sahara and the southwestern Arabian Peninsula in relation to other regions (Supplementary Fig. 1). A potential explanation for this is that the network focuses on predicting PM₁₀ in the regions of the sources and the atmospheric pathways of the local PM₁₀. This finding may indicate that the network has learned to discriminate between regions in order to improve the local prediction, thus reinforcing the efficiency of our solution.

**Fig. 1: Autoencoder’s learned representation.**

We use principal component analysis (PCA) to visually illustrate the ability of the encoder codes to capture local-scale features, in addition to the regional-scale features. The PCA was computed over all validation set’s encoder codes. Note that PCA’s first component is the direction that best explains the variability between codes. Figure 1b presents the distribution of the codes’ first principal component over eight successive time points; we compare sequences that are followed by a dust event (orange) and those that are not (blue). The first principal component can statistically discriminate between the two groups already 108 h before a dust event. These results indicate that the encoder codes preserve local-scale dust-related features. They also demonstrate the high compressibility of the model; i.e., it reduces 194,040 floating-point numbers (pixels in the 40 × 49 × 99 input tensor) to a single value with significant discriminative power.

Dust event forecasting

Figure 2 presents the recall and precision (defined in “Model”) of ${\Phi}_{loc}$ for increasing forecast lead time over the validation set. We also report these metrics for a subset of samples taken in the winter–spring months (December to April), when most of the events occur. For comparison, we report the performance of a Naive classifier, which always returns the last observed state (persistence), and a Shallow classifier (i.e., non-deep machine learning), which is a state-of-the-art gradient boosting algorithm (XGBoost) operated on the entire (flattened) input data.

As expected, ${\Phi}_{loc}$’s performance decreases with forecast lead time. The increase in recall (e.g., from 12 to 24 h) is traded for a sharp precision decline. Twenty-four hours ahead, ${\Phi}_{loc}$ can detect about 76% of the events (recall) or even 83% of winter–spring events, with 67% precision. For the same task, the shallow classifier achieves 67% recall and 47% precision, and the naive classifier achieves both recall and precision of about 51%. The shallow classifier is only slightly better than the naive classifier (for 24-h or longer lead time), indicating the difficulty of performing statistical learning from high-dimensional small sample-sized data. Note that the performance gap between ${\Phi}_{loc}$ and the naive/shallow classifiers increases with forecast lead time, indicating our model’s ability to generalize large-scale processes.

We evaluate the model’s forecasts by analyzing the evolution of regional PM₁₀ fields. Figure 3 shows the average field during and before events while comparing well- and misclassified events at 24-h lead time. Well-classified events are characterized by high PM₁₀ concentration in the northern coastal areas of Africa and the deserts of the Arabian Peninsula, advancing towards Israel, as well as in local areas. These areas show much lower PM₁₀ concentrations in the misclassified events. In misclassified events, there is no significant change in the average regional PM₁₀ between 72 and 24 h before the dust event. A particular local change occurs less than 24 h before, indicating that the events occur on small spatiotemporal scales. In terms of ground measurements, misclassified events also show lower PM₁₀ concentration on average (108.1 μgm⁻³), compared to well-classified events (145.5 μgm⁻³). These events are challenging to forecast more than a day in advance since they are often formed by dust emissions from local sources (e.g., the Negev or Sinai deserts) and have short-range transport; hence, large-scale meteorology holds no skill for their accurate forecasting. The findings indicate that events driven by local dynamics are the primary source of the model’s misclassification.

**Fig. 3: Regional PM₁₀ patterns of correct and incorrect local dust event forecasts.**

Estimating the performance of ${\Phi}_{loc}$ based on a binary PM₁₀ definition of dust event could be misleading. Since ${\Phi}_{loc}$ is trained to classify samples into multiple PM₁₀ levels (see a discussion in “Discussion”), we can estimate its performance more continuously. Figure 4 shows a confusion matrix that represents the number of observed (rows) and predicted (columns) validation samples, and is ordered according to the local PM₁₀ levels for the 24-h forecast.

**Fig. 4: Twenty-four hour forecast confusion matrix.**

As can be seen, most of the samples lie close to the confusion matrix diagonal, indicating errors of small magnitude. This suggests that the network has learned the physical phenomenon quite well. Figure 4 also illustrates the model’s improved ability to forecast extremes compared to more average levels.

Model interpretability

Model interpretability is key to gaining a better understanding of the meteorological and atmospheric conditions that favor dust events. We use the integrated gradients²⁶ approach for attributing the contribution of the model’s input to the prediction. Here, we compute the gradients of the local PM₁₀ level’s probability with respect to the input tensor, which inform the magnitude of the pixels’ effect on the model’s forecast. The integrated gradients of an input sample x (a meteorological tensor; see “Model”) are obtained by integrating over the gradients along a linear path from the sample to a baseline ${x}^{{\prime} }$. We compute the numerical integrated gradient approximation of pixel i ($I{G}_{i}^{approx}$) by accumulating the gradients over a series of 32 tensors created from interpolations between x and ${x}^{{\prime} }$:

$$I{G}_{i}^{approx}(x):= ({x}_{i}-{x}_{i}^{{\prime} })\times \mathop{\sum }\limits_{k=1}^{32}\frac{\partial {{{\Phi }}}_{loc}({x}^{{\prime} }+\frac{k}{32}\times (x-{x}^{{\prime} }))}{\partial {x}_{i}}\times \frac{1}{32},$$

(1)

where x_i indicate the value of pixel i in tensor x. We set the baseline to be a zero tensor, representing an 18-year pixel-wise average. Analyzing the integrated gradients allows for inferring the importance of different explanatory variables to dust prediction. The approach returns attribution output at the input dimension; we can thus study how this importance is distributed over space and time for each input variable.

Figure 5a presents selected variables from the average integrated gradients field of the model 24-h forecast. The composite average is over the historical meteorology sequences of all the independent dust events (defined in “Data”); this ensures that we infer the patterns the network is “looking for” when forecasting dust events before they arrive rather than during their occurrence. The results suggest that the network has learned to scan mostly for meteorological signals that emerge in the northern regions of Africa about 3 days before the event occurs in Israel.

**Fig. 5: Meteorological feature importance for dust event forecasting.**

To obtain a comparable measure for the relative importance of different variables, such as wind speed, sea-level pressure, potential vorticity, and others, we sum the absolute integrated gradients values of independent dust events over time and space; this results in a global, time-invariant variable importance index. Figure 5b compares this importance index across the explanatory variables.

“Discussion” discusses the meteorological and atmospheric insights from our integrated gradients analysis in more detail. For completeness, we implemented other machine-learning interpretability methods, including Gradient SHAP²⁷, DeepLIFT²⁸, and Occlusion techniques²⁹. The results were qualitatively similar (Supplementary Fig. 2).

Discussion

This study proposes a meteorology-based Deep Neural Network model for dust events forecasting in 12–72 h lead time. The model’s multi-task framework is designed to overcome unique statistical learning challenges that the data poses, by predicting regional PM₁₀ fields (auxiliary task), in addition to the Local in situ measured PM₁₀ forecasting (primary task). Twenty-four hours ahead, the model detects 76% of the dust events (83% of winter–spring events) at 67% precision.

Meteorological and atmospheric insights

The Middle East is subjected to dust storms that originate in several sources, with the Sahara desert being the largest. In Israel in particular, North African dust contributes 60–80% of the eolian material³⁰, while arid and semi-arid regions to the east of Israel (e.g., the Syrian and Arabian deserts), and local dust sources (e.g., the Negev desert) contribute the rest. The atmospheric mechanisms occur on different scales; from the large scale (1000s of km) to the mesoscale (down to 1 km) and even smaller. Although a large case-to-case variability exists, certain structures of the subtropical, Mediterranean, and mid-latitude weather systems are typically organized in concert to provide the necessary dust emission and transport environments. We harness the proposed data-driven neural model to relate weather patterns and atmospheric dust levels unbiasedly and quantitatively. Specifically, we search for the meteorological conditions that indicate an increased probability of an approaching dust event.

At 24-h forecast lead time, most of the network’s attention is devoted to identifying events that originate in areas west of Israel, as demonstrated in Fig. 5a. Although the spatial and temporal focus slightly differs across variables, eastward-moving patterns of low-tropospheric cyclonic activity, relatively high temperatures, low humidity, and high AOD concentration activate the network. These findings are consistent with previous studies³¹.

A closer look at Fig. 5a reveals a more complex picture that is masked by averaging different types of events. For example, it can be seen that high levels of 10 m zonal wind (u10) increase dust event probability when they occur in two different regions: (i) large areas in Libya (signal starts about 48 h before the event), and (ii) smaller areas in the eastern Mediterranean, including Israel (signal starts about 24 h before the event); the reason is that the network distinguishes between strong u10 signals in the Libyan deserts that indicate an approaching large-scale dust storm, and around Israel that indicate a local event in time-space scales (sometimes during the event itself). Moreover, the meridional 10-m wind (v10) is highlighted only to the west of Israel, suggesting that both wind components are important for Saharan dust sources, compared to the dominance of the zonal near-surface winds for other sources in providing the forecast skill.

Interestingly, the network associates an increased dust event probability with high SLP levels advancing eastward along the Mediterranean Sea. We would rather expect to see an inverse correlation, as previous studies linked high dust loading in the region with a cyclone situated over the eastern Mediterranean (Cyprus Low), which is expressed in low SLP anomalies³¹. Since the network does not ignore the phenomena but assigns it an opposite effect, and as SLP is strongly correlated with other variables with strong explanatory power for PM₁₀ levels (e.g., u10, v10, Z), we speculate that the network associates eastward approaching low SLP cyclones with a latent feature that reduces dust event probability (e.g., a feature correlated with seasonality, or a particular cluster of dust events). A possible interpretation is that cyclones, in general, are statistically rather correlated with precipitation in Israel^32,33,34. However, a deeper understanding requires further research.

Figure 5a may be somewhat misleading about the network’s attention to events governed by dust transport towards Israel from the east. The network did learn to associate high AOD concentrations in the Red Sea coastal areas of Saudi Arabia with dust events in Israel. It is less noticeable in Fig. 5a for two reasons: first, because events originating east of Israel are relatively rare³⁰; second, compared to the AOD signals of dust events that originated west of Israel, the eastern signals are still quite weak more than 24 h before the events itself, and usually rise later.

Figure 5b summarizes the overall importance of different variables in predicting dust events. For 24-h forecast lead time, lower-tropospheric winds, AOD, and SLP are of relatively high importance, while air temperature and specific humidity are of low importance. In geopotential height, zonal wind, meridional wind, and specific humidity, the importance usually increase in the lower-tropospheric levels. In contrast, in vertical velocity and potential vorticity it seems to peak at the mid-troposphere, suggesting that these features may provide forcing for dust emission and/or transport at lower levels by destabilization and/or downward momentum transfer. Our analysis shows that the relationship between the importance of the different variables that appears in Fig. 5b is quite stable over lead time.

The high importance of AOD and lower-tropospheric winds indicates that the network associates high probability of dust events with dynamical processes. The fact that SLP also shows high importance may suggest that the network also considers the organization of the weather system. Moreover, the low importance of temperature and specific humidity indicates that thermodynamics processes have no significant contribution to the network’s decision regarding dust events.

Implications of the dust event definition

There is no consistent definition for dust events, and studies have shown that different definitions may lead to different conclusions regarding their impact^35,36,37. Numerous studies have reported high PM₁₀ concentrations during dust events¹³, and there is a broad consensus on the use of PM₁₀ concentration to approximate dust mass. A common practice is to categorize time samples based on a single PM₁₀ threshold, sometimes in combination with weather records to define dust events^30,38,39.

We could use such a definition to train the network for a binary dust event classification; instead, we train the network to model PM₁₀ itself, i.e., to discriminate even between PM₁₀ levels that are not considered dust events. In our understanding, learning the phenomenon of dust accumulation, a continuous atmospheric process in nature, is scientifically superior to learning arbitrary definitions. A regression setup is optimally suited for this task; however, its long-tailed distribution makes PM₁₀ numerical value forecasting very challenging. Hence, we propose an ordinal regression setup, where the network is trained to forecast 13 equally distributed levels of PM₁₀.

Once trained, we evaluate the network performance based on the distinct definition detailed in “Data”, which results in 11 dust-free and 2 dust event classes. Further, we discuss the statistical learning challenges that this skewed class proportions pose.

Middle Eastern dust event sequences vary significantly in duration and magnitude³⁹. While in some sequences PM₁₀ concentrations increase and then decrease monotonically, a significant portion is characterized by unstable duration of PM₁₀ records, during which the measured PM levels cross the threshold value several times in a short period (sometimes less than 24 h). Skillful forecasts with 24 h or longer lead times at 3-h resolution are challenging during such events. Some studies define dust event sequences by referring to the dust-free points between dust-flagged sequences as dust events when they are close in time³⁹. Since we do not want the network to depend on a closeness definition, we choose not to train it this way (although we use such as definition for validation only).

Challenges of deep learning with meteorological and dust data

Meteorological fields may be intuitively treated as digital images (with more than three channels) and their sequences as videos. This intuition is only partially true; Racah et al.⁴⁰ mention several unique machine-learning challenges arising from the difference between climate and computer vision data. We identified three main challenges which may explain the relative paucity of research on dust forecasting using regional-scale meteorology and deep learning. We then discuss how multi-task learning might overcome these challenges.

The first challenge is that samples of meteorological fields exhibit high temporal correlations. This is because a 3-hourly sampling rate is quite fast relative to the more slowly-evolving atmospheric dynamics. This implies redundancy in the training data. Even though our dataset is supposedly large (52,560 3-hourly samples), it may not provide enough discriminative information to train an efficient neural network. For comparison, modern computer vision neural networks are trained on visual databases such as ImageNet⁴¹, which contains more than 14 million labeled images.

Second, the size and richness of the training data are effectively even smaller due to the imbalanced proportion of dust and dust-free classes. This is even clearer when considering independent dust events (defined in “Data”), consisting of only 356 autocorrelated sequences during 18 years. The redundancy and the low entropy of scarce training dust events lead to insufficient coverage of the true meteorological-dust mutual distribution. Our analysis suggests that resampling techniques for class imbalances are not helpful.

Third, unlike computer vision images, meteorological fields are geo-located, hence are shift-variant. Put differently, the fields do not retain their context after manipulations. For example, unlike cat images, axis flipping, or color jittering, will completely change the context of the meteorological fields. This characteristic implies that the input data can hardly be augmented; as further discussed, this makes learning very challenging.

The machine-learning literature offers some remedies for the problem of insufficient or small training data. One is transfer learning⁴², specifically, using pre-trained models. A pre-trained network is trained on an extensive visual database in computer vision before fine-tuning a specific dataset. As the statistical properties of climate data and natural images are very different, using available pre-trained networks is usually unsuccessful⁴⁰; our analysis confirms this hypothesis. Another possible remedy is data augmentation⁴³, where input data is slightly modified, or synthetically created from real samples, to increase the training data. Common techniques for computer vision include position augmentation (e.g., cropping, scaling, rotation) and color augmentation (e.g., brightness, contrast, or saturation adjustment). Meteorological fields’ shift variance makes these techniques inefficient.

Multi-task learning

Multi-task learning is a machine-learning paradigm that aims to solve multiple related learning tasks simultaneously. Here, we train a CNN encoder to learn a representation of the meteorological time frame inputs. This representation is then shared between the local task, which ultimately interests us, and the regional tasks, which serves as an auxiliary task⁴⁴.

We suggest a multi-task setting to overcome the scarcity of independent real-world dust events and the difficulty of generating informative meteorological augmentations, as discussed above. To explain why multi-task learning is beneficial, we follow the lines of Ruder⁴⁵, who reviews its underlying mechanisms⁴⁵. First, since local and regional tasks introduce different noise patterns, our setting contains an implicit data augmentation. That is, it causes learning a representation that ignores the data-dependent noise of two tasks rather than one. Second, when training with high-dimensional small sample-sized input, attention focusing is essential; multi-task learning helps the model focus on regional-scale meteorological features that matter. Third, multi-task learning leverages valuable information from correlated tasks to improve generalization capability. Here, the encoder network can be considered as a feature extractor that introduces an inductive bias⁴⁶ toward a regional-scale and more general view; i.e., it learns representations that also consider the regional dynamics of dust. This may help the model generalize dust events with meteorological patterns that are locally rare but globally more common. Fourth, multi-task learning acts as a regularization mechanism that makes the model less prone to overfitting.

Model architecture

Despite the differences pointed out between the model’s input and classical videos, we found that video classification architectures are most suitable for our problem. Therefore, we implemented a typical separable architecture (CNN for spatial frames, FC for time sequence). We found that an FC network, which operates on the entire sequence, outperforms other architectures, including Long short-term memory (LSTM) networks, which exhibit temporal dynamic behavior.

Along with the model architecture, we examined different lengths and frequencies of the time series input. We found that a 96-h history of 12-h time intervals, i.e., eight frames per sample, is optimal in terms of local dust forecasting performance. This structure compromises between long sequences of correlated frames that flood the network, and short or too-sparse sequences that prevent generalization.

The autoencoder network, which is composed of convolutional (encoder) and deconvolutional (decoder) layers, is an instance of a U-Net architecture⁴⁷, initially proposed for image pixel-wise segmentation. While a low compression ratio (i.e., larger code size) improves the regional task performance, the effect on the local task is bidirectional: it enriches the frame’s representation but increases the risk of overfitting. We found that a code size of 512 elements is optimal for the local task performance.

The rationale for adding auxiliary tasks is discussed above. The regional task seems a natural choice since it requires learning relevant regional-scale spatial features. Yet, other tasks may be helpful. For instance, we examined assigning the autoencoder a self-supervision task, where it is requested to reconstruct the original input. In another attempt, we examined auxiliary tasks that process whole sequences to learn inherent spatiotemporal representations from dynamic inputs. For example, predicting the future t + k regional PM₁₀ field or sorting the shuffled input sequences. None of these proved better than the proposed architecture.

Methods

Data

Half-hourly PM₁₀ measurements from 30 ground air quality monitoring stations for 18 years, from January 2003 through 31 December 2020, were retrieved from the Israeli Ministry of Environmental Protection (MOEP) database (https://www.svivaaqm.net). The stations are scattered over an area of about 200 km by 50 km, covering its Mediterranean coastline and central mountains’ western slopes. Stations’ spatial distribution tends towards the populated coastal areas and the central cities (Fig. 6a). PM₁₀ measurements were conducted using Tapered-Element Oscillating Microbalance (TEOM) instruments operated according to the United States Environmental Protection Agency (US-EPA) guidelines⁴⁸. Recent years show a certain decrease in PM₁₀ concentrations, mainly in the frequency of extremes (Fig. 6b), measured primarily during winter and spring (Fig. 6c).

**Fig. 6: Statistics of in situ PM₁₀ measurements.**

In consistency with previous studies⁴⁹, we define a dust event when the local PM₁₀ exceeds 2 standard deviations above the background concentration (average local PM₁₀ at the summer season), which results in a threshold of 65.2 μgm⁻³. “Discussion” discusses the model implications of this definition.

Considering 3-hourly averages, there are 7019 events, roughly 13% of the studied period. Out of these, the average PM₁₀ concentration is 142.2 μgm⁻³, the median is 93.1 μgm⁻³, and the standard deviation is 159.8 μgm⁻³. As can be seen in Fig. 6c, most of the events occur in the winter and spring months (33.7% and 36.5%, respectively), less in the fall (25.5%), and only a few in the summer (4.1%).

For model interpretability (see “Model interpretability”), we define independent dust events as sequences of 3-hourly dust events preceded by a dust-free period of at least 96 h. This definition allows focusing on new, rather than mixed, events by examination of the atmospheric conditions during the 96 h prior to the onset of the events. During the studied period, we observed 356 independent dust events.

Regional atmospheric data are extracted for the region covering the Mediterranean, the Saharan desert, and the Arabian Peninsula (19-43N, 2-51E), essentially encompassing all sources of dust potentially transported to Israel. Variables and pressure ranges selection is based on prior scientific knowledge of the quantities that may be potentially relevant to dust events in the region^4,5. They were retrieved from the European Centre for Medium-Range Weather Forecasts Reanalysis 5th Generation (ERA5) database⁵⁰ and include geopotential height (Z); zonal wind (U); meridional wind (V); vertical velocity (W); potential vorticity (PV); specific humidity (Q); and air temperature (T), all taken at the five pressure levels of 250, 500, 700, 850, and 900 hPa; along with sea-level pressure (SLP). Additional 2-dimensional variables over the same domain were extracted from the Copernicus Atmosphere Monitoring Service (CAMS) reanalysis database⁵¹ and include zonal and meridional wind at 10 m (u10, v10); Total Column Water Vapour (TCWV); Dust Aerosol Optical Depth at 550 nm (AOD); and PM₁₀ (regional PM₁₀).

For the local task, 3-hourly averages from the 30 stations were computed (local PM₁₀). Samples were omitted if 50% of the averaged data (stations or time points) were missing. Overall, 52,245 3-hourly samples remained.

For the regional task, both ERA5 and CAMS regional data were interpolated to 0.5-degree horizontal resolution, resulting in 49 × 99 gridded fields. As CAMS data are taken 6-hourly, we temporally interpolated it linearly to match the 3-hourly resolution of the ERA5 and the local PM₁₀ data. Missing spatial data (less than 0.5% of the pixels), e.g., due to surface intersection, was imputed using cubic splines interpolation.

All gridded fields are standardized (mean removed and divided by standard deviation) at the pixel level before being transferred to the neural model. Means and standard deviations were computed over the entire period; data can therefore be interpreted in terms of anomalies from the 18-year average.

Model

Figure 7 illustrates the network’s architecture. Let ${{{\mathcal{X}}}}$ denote the space of the meteorological input tensors (for readability, we sometimes omit the atmospheric adjective of the input). ${{{\mathcal{X}}}}$ contains tensors of size 40 × 49 × 99, which consist of all gridded fields except regional PM₁₀. Also, let ${{{\mathcal{Z}}}}$ denote the space of regional PM₁₀ fields; ${{{\mathcal{F}}}}$ the set of possible time features (month, day of week, and hour); and ${{{\mathcal{Y}}}}$ the set of local PM₁₀ levels, where $| {{{\mathcal{Y}}}}| =13$. Measurements are taken at 3-hourly timestamp t. We denote the state of t by s_t = (x_t, f_t, y_t), where ${x}_{t}\in {{{\mathcal{X}}}}$, ${f}_{t}\in {{{\mathcal{F}}}}$, and ${y}_{t}\in {{{\mathcal{Y}}}}$.

**Fig. 7: Deep multi-task learning architecture.**

Our interest is in learning a mapping ${{{\Phi }}}_{loc}:{[{{{\mathcal{X}}}}\times {{{\mathcal{F}}}}\times {{{\mathcal{Y}}}}]}^{N}\to {{{\mathcal{Y}}}}$, which given an input sequence of N states outputs the k-hour forecast of local PM₁₀ level, ${\hat{y}}_{t+k}\in {{{\mathcal{Y}}}}$, where k ∈ {12, 24, . . . , 72}. For reasons discussed in “Discussion”, we do not learn ${\Phi}_{loc}$ directly, but simultaneously with ${{{\Phi }}}_{reg}:{{{\mathcal{X}}}}\times {{{\mathcal{F}}}}\times {{{\mathcal{Y}}}}\to {{{\mathcal{Z}}}}$. Namely, ${\Phi}_{reg}$ receives a state s_t, and outputs the predicted corresponding regional PM₁₀ field, ${\hat{z}}_{t}\in {{{\mathcal{Z}}}}$; it is thus invariant to time sequences.

${\Phi}_{reg}$ is constructed by stacking an encoder and a decoder in an autoencoder architecture. The encoder network consists of a CNN with transformer encoder layer⁵², made up of self-attention and feed-forward networks, and residual blocks⁵³. It learns the spatial features of the meteorological input. Additionally, it includes two functions that map the discrete values of f_t and y_t into vectors of continuous numbers, thus embed timestamp and current local PM₁₀ information. These embeddings are concatenated to the CNN output. The encoder receives a state, s_t, and returns its code, namely, a vector of 512 elements we denote by c_t. The decoder network learns to decode c_t into z_t. It is composed of learnable transposed convolutional layers, also known as deconvolutions⁵⁴ which up-sample the code back to a field of similar spatial dimensions as the meteorological tensors.

${\Phi}_{loc}$ consists of two parts: an encoder and a classifier network. The encoder described above is common to ${\Phi}_{reg}$ and ${\Phi}_{loc}$; for each t, it transforms the sequence ${\{{s}_{i}\}}_{i = t-N}^{t}$, which encapsulates t’s N-length meteorological history, into a sequence of codes ${\{{c}_{i}\}}_{i = t-N}^{t}$. For simplicity, we denote these sequences by ${s}_{t}^{N}$ and ${c}_{t}^{N}$, respectively. The classifier network concatenates the code sequence to a single vector, then passes it through two FC layers, followed by ReLU and sigmoid activations, respectively. Eventually, for each t, the classifier network returns a vector of probabilities attributed to each local PM₁₀ level at timestamps t + k and outputs the level with the highest probability.

The above architecture uses the encoder codes ${c}_{t}^{N}$, to forecast the future local PM₁₀ level y_t+k, and to predict the current regional PM₁₀ maps ${z}_{t}^{N}={\{{z}_{i}\}}_{i = t-N}^{t}$. As follows, the encoder learns a meteorological compression that preserves two information types. For the local task, it saves local-scale dynamics features necessary to forecast the future local PM₁₀. For the regional task, it saves regional-scale features necessary for reconstructing the regional PM₁₀.

To take advantage of the ordinal behavior of y_t, ${\Phi}_{loc}$ is trained by minimizing a soft ordinal regression⁵⁵ (SORD) loss function, which penalizes for the distance of the error in terms of ${{{\mathcal{Y}}}}$ levels. It is formulated as a cross-entropy loss, only replacing the one-hot representation with an operator δ(y), that encodes the distance (in terms of probability) of every PM₁₀ level $i\in \{1,...,| {{{\mathcal{Y}}}}| \}$ from the true level y:

$${{{{\mathcal{L}}}}}_{SORD}({s}_{t}^{N},{y}_{t+k})=-\mathop{\sum }\limits_{i=1}^{| {{{\mathcal{Y}}}}| }\delta {({y}_{t+k})}_{i}\log \frac{\exp \{{{{\Phi }}}_{loc}{({s}_{t}^{N})}_{i}\}}{\mathop{\sum }\nolimits_{j = 1}^{| {{{\mathcal{Y}}}}| }\exp \{{{{\Phi }}}_{loc}{({s}_{t}^{N})}_{j}\}},$$

(2)

where ${{{\Phi }}}_{loc}{({s}_{t}^{N})}_{i}$ is the predicted probability for level i, and δ(y)_i is the corresponding element in the distance-based SORD encoding vector⁵⁵.

${\Phi}_{reg}$ is trained by minimizing the pixel-wise mean squared error of the decoder’s prediction from the regional PM₁₀ field:

$${{{{\mathcal{L}}}}}_{MSE}({s}_{t},{z}_{t})=\parallel {{{\Phi }}}_{reg}({s}_{t})-{z}_{t}{\parallel }_{2}^{2}.$$

(3)

Finally, the multi-task model is trained end-to-end by minimizing:

$${{{\mathcal{L}}}}({s}_{t}^{N},{y}_{t+k},{z}_{t}^{N})={{{{\mathcal{L}}}}}_{SORD}({s}_{t}^{N},{y}_{t+k})+\lambda \mathop{\sum }\limits_{i=t-N}^{t}{{{{\mathcal{L}}}}}_{MSE}({s}_{i},{z}_{i}),$$

(4)

where λ is a constant hyperparameter representing the regional task weight.

A training batch consists of 32 random sampled sets of tensors ${s}_{t}^{N},{z}_{t}^{N}$ and scalars y_t+k. For each batch, the encoder computes 32*N codes, regardless of their sequences, and in parallel passes them to the decoder and the classifier networks. The decoder outputs 32*N regional PM₁₀ fields. Thus ${{{{\mathcal{L}}}}}_{MSE}$ is computed 32*N times. The classifier combines the codes back into sequences, and outputs 32 forecasts; thus ${{{{\mathcal{L}}}}}_{SORD}$ is computed 32 times. Optimization is performed using the Adam Optimizer⁵⁶. A grid search provided λ = 0.05.

Local PM₁₀ values vary only in time (not in space) and have short-range temporal dependence. In order to keep the sets independent, performance evaluation was performed based on long non-overlapping series. Specifically, we split the data into training and validation sets by randomly sampling a third of the calendar years for validation, while omitting overlapping timestamps.

Although ${\Phi}_{loc}$ classifies PM₁₀ levels, we are primarily interested in the highest levels. The small proportion of dust events makes accuracy a misleading metric; instead, we measure performance in terms of recall and precision:

$$recall=\frac{TP}{TP+FN}\,,\,\,\,\,\,\,precision=\frac{TP}{TP+FP}\,,$$

(5)

where TP is the number of dust events samples that were correctly classified as events, FN is the number of dust events samples that were mistakenly classified as dust-free, and FP is the number of dust-free samples that were mistakenly classified as dust events. Note that recall and precision have an inverse relationship.

Data availability

All data that support the findings in this study is freely available. CAMS data can be downloaded from Atmosphere Data Store at https://atmosphere.copernicus.eu/data. ERA5 data can be downloaded at https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5. In situ PM₁₀ measurements can be downloaded at https://www.svivaaqm.net.

Code availability

The codes that support the findings of this study are available from the corresponding authors on request.

References

Ginoux, P., Prospero, J. M., Gill, T. E., Hsu, N. C. & Zhao, M. Global-scale attribution of anthropogenic and natural dust sources and their emission rates based on modis deep blue aerosol products. Rev. Geophys. 50 (2012).
Middleton, N. J. Desert dust hazards: a global review. Aeolian Res. 24, 53–63 (2017).
Article Google Scholar
Goudie, A. S. Desert dust and human health disorders. Environ. Int. 63, 101–113 (2014).
Article Google Scholar
Alpert, P., Osetinsky, I., Ziv, B. & Shafir, H. Semi-objective classification for daily synoptic systems: application to the eastern Mediterranean climate change. Int. J. Climatol.: J. Royal Meteorol. Soc. 24, 1001–1011 (2004).
Article Google Scholar
Dayan, U., Ziv, B., Shoob, T. & Enzel, Y. Suspended dust over southeastern Mediterranean and its relation to atmospheric circulations. Int. J. Climatol.: J. Royal Meteorol. Soc. 28, 915–924 (2008).
Article Google Scholar
Basart, S., Pérez, C., Nickovic, S., Cuevas, E. & Baldasano, J. Development and evaluation of the BSC-DREAM8b dust regional model over northern Africa, the Mediterranean and the Middle East. Tellus B: Chem. Phys. Meteorol. 64, 18539 (2012).
Article Google Scholar
Kukkonen, J. et al. A review of operational, regional-scale, chemical weather forecasting models in Europe. Atmos. Chem. Phys. 12, 1–87 (2012).
Article Google Scholar
Knippertz, P. & Todd, M. C. Mineral dust aerosols over the Sahara: meteorological controls on emission and transport and implications for modeling. Rev. Geophys. 50 (2012).
Zhong, S. et al. Machine learning: new ideas and tools in environmental science and engineering. Environ. Sci. Technol. 55, 12741–12754 (2021).
Google Scholar
Rolnick, D. et al. Tackling climate change with machine learning. ACM Comput. Surv. (CSUR) 55, 1–96 (2022).
Article Google Scholar
Reichstein, M. et al. Deep learning and process understanding for data-driven earth system science. Nature 566, 195–204 (2019).
Article Google Scholar
Chae, S. et al. Pm10 and pm2. 5 real-time prediction models using an interpolated convolutional neural network. Sci. Rep. 11, 1–9 (2021).
Article Google Scholar
Krasnov, H., Katra, I., Koutrakis, P. & Friger, M. D. Contribution of dust storms to pm10 levels in an urban arid environment. J. Air Waste Manag. Assoc. 64, 89–94 (2014).
Article Google Scholar
Jiang, H. et al. Dust storm detection of a convolutional neural network and a physical algorithm based on fy-4a satellite data. Adv. Space Res. 69, 4288–4306 (2022).
Article Google Scholar
Shi, L., Zhang, J., Zhang, D., Igbawua, T. & Liu, Y. Developing a dust storm detection method combining support vector machine and satellite data in typical dust regions of Asia. Adv. Space Res. 65, 1263–1278 (2020).
Article Google Scholar
Kang, S., Kim, N. & Lee, B.-D. Fine dust forecast based on recurrent neural networks. in 2019 21st International Conference on Advanced Communication Technology (ICACT), 456–459 (IEEE, 2019).
Shtein, A. et al. Estimating daily and intra-daily pm10 and pm2. 5 in Israel using a spatio-temporal hybrid modeling approach. Atmos. Environ. 191, 142–152 (2018).
Article Google Scholar
Harba, H. S., Harba, E. & Farttoos, M. Prediction of dust storm direction from satellite images by utilized deep learning neural network. in 2020 6th International Engineering Conference “Sustainable Technology and Development"(IEC), 179–184 (IEEE, 2020).
Ebrahimi-Khusfi, Z., Taghizadeh-Mehrjardi, R. & Mirakbari, M. Evaluation of machine learning models for predicting the temporal variations of dust storm index in arid regions of Iran. Atmos. Pollut. Res. 12, 134–147 (2021).
Article Google Scholar
Boroughani, M. et al. Application of remote sensing techniques and machine learning algorithms in dust source detection and dust source susceptibility mapping. Ecol. Inform. 56, 101059 (2020).
Article Google Scholar
Kowalski, P. A., Sapala, K. & Warchalowski, W. Pm10 forecasting through applying convolution neural network techniques. Air Pollut. Stud. 47, 31–43 (2020).
Google Scholar
Nidzgorska-Lencewicz, J. Application of artificial neural networks in the prediction of pm10 levels in the winter months: a case study in the Tricity agglomeration, Poland. Atmosphere 9, 203 (2018).
Article Google Scholar
Lee, J. et al. Machine learning based algorithms for global dust aerosol detection from satellite images: inter-comparisons and evaluation. Remote Sens. 13, 456 (2021).
Article Google Scholar
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. nature 521, 436–444 (2015).
Article Google Scholar
Caruana, R. Multitask learning. Machine learning 28, 41–75 (1997).
Article Google Scholar
Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. in International Conference on Machine Learning, 3319–3328 (PMLR, 2017).
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 30, 4765–4774 (2017).
Google Scholar
Shrikumar, A., Greenside, P. & Kundaje, A. Learning important features through propagating activation differences. in International Conference on Machine Learning, 3145–3153 (PMLR, 2017).
Kumar Singh, K. & Jae Lee, Y. Hide-and-seek: forcing a network to be meticulous for weakly-supervised object and action localization. in Proceedings of the IEEE International Conference on Computer Vision, 3524–3533 (IEEE, 2017).
Krasnov, H., Katra, I. & Friger, M. Increase in dust storm related pm10 concentrations: a time series analysis of 2001–2015. Environ. Pollut. 213, 36–42 (2016).
Article Google Scholar
Kalkstein, A. J., Rudich, Y., Raveh-Rubin, S., Kloog, I. & Novack, V. A closer look at the role of the cyprus low on dust events in the negev desert. Atmosphere 11, 1020 (2020).
Article Google Scholar
Saaroni, H., Halfon, N., Ziv, B., Alpert, P. & Kutiel, H. Links between the rainfall regime in Israel and location and intensity of Cyprus lows. Int. J. Climatol.: J. Royal Meteorol. Soc. 30, 1014–1025 (2010).
Article Google Scholar
Raveh-Rubin, S. & Wernli, H. Large-scale wind and precipitation extremes in the Mediterranean: a climatological analysis for 1979–2012. Q.J.R. Meteorol. Soc. 141, 2404–2417 (2015).
Article Google Scholar
Raveh-Rubin, S. & Wernli, H. Large-scale wind and precipitation extremes in the Mediterranean: dynamical aspects of five selected cyclone events. Q.J.R. Meteorol. Soc. 142, 3097–3114 (2016).
Article Google Scholar
Kushta, J., Pozzer, A. & Lelieveld, J. Uncertainties in estimates of mortality attributable to ambient pm2. 5 in Europe. Environ. Res. Lett. 13, 064029 (2018).
Article Google Scholar
Tong, D. Q. et al. Dust storms, valley fever, and public awareness. GeoHealth 6, e2022GH000642 (2022).
Article Google Scholar
Sarafian, R., Kloog, I. & Rosenblatt, J. D. Optimal-design domain-adaptation for exposure prediction in two-stage epidemiological studies. J. Exposure Sci. Environ. Epidemiol. 1–8 (2022).
Achilleos, S. et al. Spatio-temporal variability of desert dust storms in eastern Mediterranean (Crete, Cyprus, Israel) between 2006 and 2017 using a uniform methodology. Sci. Total Environ. 714, 136693 (2020).
Article Google Scholar
Sorek-Hamer, M., Stupp, A., Alpert, P. & Broday, D. M. et al. Characteristics of the east Mediterranean dust variability on small spatial and temporal scales. Atmos. Environ. 120, 51–60 (2015).
Article Google Scholar
Racah, E. et al. Extremeweather: a large-scale climate dataset for semi-supervised detection, localization, and understanding of extreme weather events. Adv. Neural Inf. Process. Syst. 30, 3402–3413 (2017).
Google Scholar
Russakovsky, O. et al. Imagenet large scale visual recognition challenge. Int. J. Computer Vision 115, 211–252 (2015).
Article Google Scholar
Zhuang, F. et al. A comprehensive survey on transfer learning. Proc. IEEE 109, 43–76 (2020).
Article Google Scholar
Shorten, C. & Khoshgoftaar, T. M. A survey on image data augmentation for deep learning. J. Big Data 6, 1–48 (2019).
Article Google Scholar
Liebel, L. & Körner, M. Auxiliary tasks in multi-task learning. Preprint at https://arxiv.org/abs/1805.06334 (2018).
Ruder, S. An overview of multi-task learning in deep neural networks. Preprint at https://arxiv.org/abs/1706.05098 (2017).
Mitchell, T. M. The Need for Biases in Learning Generalizations (Department of Computer Science, Laboratory for Computer Science Research, Rutgers University, New Brunswick, NJ, 1980).
Ronneberger, O., Fischer, P. & Brox, T. U-net: convolutional networks for biomedical image segmentation. in International Conference on Medical Image Computing and Computer-assisted Intervention, 234–241 (Springer, 2015).
EPA. Quality Assurance Handbook for Air Pollution Measurement Systems: “volume II: Ambient Air Quality Monitoring Program" (United States Environmental Protection Agency (USEPA), RTP, NC, 2017).
Vodonos, A. et al. The impact of desert dust exposures on hospitalizations due to exacerbation of chronic obstructive pulmonary disease. Air Quality Atmos. Health 7, 433–439 (2014).
Article Google Scholar
Hersbach, H. et al. The era5 global reanalysis. Q.J.R. Meteorol. Soc. 146, 1999–2049 (2020).
Article Google Scholar
Inness, A. et al. The cams reanalysis of atmospheric composition. Atmos. Chem. Phys. 19, 3515–3556 (2019).
Article Google Scholar
Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 5998–6008 (2017).
Google Scholar
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778 (IEEE, 2016).
Zeiler, M. D., Krishnan, D., Taylor, G. W. & Fergus, R. Deconvolutional networks. in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2528–2535 (IEEE, 2010).
Roy, S. et al. Deep learning for classification and localization of covid-19 markers in point-of-care lung ultrasound. IEEE Transact. Medical Imaging 39, 2676–2687 (2020).
Article Google Scholar
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).

Download references

Acknowledgements

The authors wish to thank Shai Bagon for his comments and ideas and Vered Silverman for extracting and preparing the reanalysis data. The European Centre for Medium-range Weather Forecasts and Israeli Meteorological Service are acknowledged for providing access to ERA5. The research was partially supported by a research grant from the Maggie Kaplan Research Fund, by the Israeli Council for Higher Education (CHE) via the Weizmann Data Science Research Center, and by the Helen Kimmel Center for Planetary Science at the Weizmann Institute of Science.

Author information

These authors contributed equally: Ron Sarafian, Dori Nissenbaum.

Authors and Affiliations

Department of Earth and Planetary Sciences, Weizmann Institute of Science, Rehovot, 76100, Israel
Ron Sarafian, Dori Nissenbaum, Shira Raveh-Rubin & Yinon Rudich
Department of Physics, Indian Institute of Technology Bombay, Mumbai, Maharashtra, 400076, India
Vikhyat Agrawal

Authors

Ron Sarafian
View author publications
Search author on:PubMed Google Scholar
Dori Nissenbaum
View author publications
Search author on:PubMed Google Scholar
Shira Raveh-Rubin
View author publications
Search author on:PubMed Google Scholar
Vikhyat Agrawal
View author publications
Search author on:PubMed Google Scholar
Yinon Rudich
View author publications
Search author on:PubMed Google Scholar

Contributions

R.S. and D.N. conceived of the presented idea; R.S. developed the theory with support from D.N. and performed the computations; D.N. built and processed the data; S.R.R. and Y.R. verified the analytical methods, the scientific insights, and supervised the findings of this study; R.S. wrote the manuscript with support from D.N., S.R.R., and Y.R.; V.A. programmed the visualization and explanation tools. All authors discussed the results and contributed to the final manuscript.

Corresponding authors

Correspondence to Ron Sarafian, Dori Nissenbaum, Shira Raveh-Rubin or Yinon Rudich.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental Material

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Sarafian, R., Nissenbaum, D., Raveh-Rubin, S. et al. Deep multi-task learning for early warnings of dust events implemented for the Middle East. npj Clim Atmos Sci 6, 23 (2023). https://doi.org/10.1038/s41612-023-00348-9

Download citation

Received: 20 October 2022
Accepted: 09 March 2023
Published: 23 March 2023
Version of record: 23 March 2023
DOI: https://doi.org/10.1038/s41612-023-00348-9

This article is cited by

Deriving ozone and PM pollution vertical profiles using lightweight, cost-effective sensors and deep learning
- D. Nissenbaum
- R. Sarafian
- Y. Rudich
npj Climate and Atmospheric Science (2025)