Fig. 7: Deep multi-task learning architecture.
From: Deep multi-task learning for early warnings of dust events implemented for the Middle East

For every timestamp t, the meteorological input tensor xt, time feature ft, and in situ PM10 level yt are passed to an encoder network, which returns a code ct. The encoder is composed of stacked CNN layers, each consisting of batch normalization (BN) and convolutions (Conv) with ReLU activations; transformer + residual blocks (ResBlock), each consisting of (spatial) positional encoding, a multi-head attention network, and a feed-forward network; and, lastly, fully connected (FC) layers that transform the output into a 512-element vector, ct. The decoder network receives ct and returns a regional PM10 prediction \({\hat{z}}_{t}\). The decoder is composed of stacked CNN layers and deconvolution (deConv) layers. A sequence of codes ct−N, . . . , ct is passed to the classifier network, which returns a single local PM10 forecast \({\hat{y}}_{t+k}\). The classifier is composed of a concatenation (Concat) of the N codes, dropout (Drop) with rate 0.5, BN, an FC layer, and a ReLU activation, followed by an additional FC layer with a softmax activation. The local and regional tasks are solved simultaneously by optimizing a weighted loss.
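The caption's encoder–decoder–classifier layout can be sketched in PyTorch. This is a minimal illustration only: the input channel count, grid size, number of codes N, number of forecast classes, hidden widths, and the loss weight alpha are all assumptions, not the paper's hyperparameters, and the inputs (xt, ft, yt) are assumed to be merged into the channels of one tensor.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

CODE_DIM = 512   # caption: c_t is a 512-element vector
N_CODES = 4      # assumed length of the code sequence c_{t-N}, ..., c_t
N_CLASSES = 2    # assumed number of local PM10 forecast classes

class Encoder(nn.Module):
    """Stacked CNN layers, a transformer + residual block, then FC -> code c_t."""
    def __init__(self, in_ch=8, grid=16):  # in_ch, grid are illustrative
        super().__init__()
        self.cnn = nn.Sequential(
            nn.BatchNorm2d(in_ch),
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # Learned spatial positional encoding over the H*W token grid
        self.pos = nn.Parameter(torch.zeros(1, grid * grid, 64))
        self.attn = nn.MultiheadAttention(64, num_heads=4, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
        self.fc = nn.Linear(64 * grid * grid, CODE_DIM)

    def forward(self, x):
        h = self.cnn(x)                                 # (B, 64, H, W)
        seq = h.flatten(2).transpose(1, 2) + self.pos   # spatial tokens
        a, _ = self.attn(seq, seq, seq)
        seq = seq + a                                   # residual: attention
        seq = seq + self.ff(seq)                        # residual: feed-forward
        return self.fc(seq.flatten(1))                  # code c_t, (B, 512)

class Decoder(nn.Module):
    """Code c_t -> regional PM10 map z_hat via stacked CNN/deConv layers."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(CODE_DIM, 64 * 4 * 4)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 2, stride=2),  # regional prediction z_hat
        )

    def forward(self, c):
        return self.net(self.fc(c).view(-1, 64, 4, 4))

class Classifier(nn.Module):
    """Concat of N codes -> Drop(0.5) -> BN -> FC -> ReLU -> FC -> softmax."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Dropout(0.5),
            nn.BatchNorm1d(N_CODES * CODE_DIM),
            nn.Linear(N_CODES * CODE_DIM, 256), nn.ReLU(),
            nn.Linear(256, N_CLASSES),
            nn.Softmax(dim=1),
        )

    def forward(self, codes):                  # codes: list of (B, 512) tensors
        return self.net(torch.cat(codes, dim=1))

def multitask_loss(z_hat, z, y_hat, y, alpha=0.5):
    """Weighted sum of the regional (regression) and local (classification)
    losses; alpha is an assumed weighting hyperparameter."""
    regional = F.mse_loss(z_hat, z)
    local = F.nll_loss(torch.log(y_hat + 1e-8), y)
    return alpha * regional + (1 - alpha) * local

# Forward pass with toy shapes
enc, dec, clf = Encoder(), Decoder(), Classifier()
x = torch.randn(2, 8, 16, 16)          # batch of 2 input tensors
c = enc(x)                             # (2, 512)
z_hat = dec(c)                         # (2, 1, 16, 16)
y_hat = clf([c] * N_CODES)             # (2, N_CLASSES), rows sum to 1
```

The two heads share the encoder, so gradients from both the regional reconstruction and the local forecast shape the code ct, which is the multi-task mechanism the caption describes.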