Fig. 3: Architecture of MESEL.

The input is a 15 s ECG recording, and the output is 12 prediction vectors, each of length 254. The backbone of the network has two options: a shared backbone and a private backbone. With the shared backbone, only one model is trained; it has 12 projection heads whose 12 outputs correspond to 12 losses, and the total loss is the sum of these 12 losses. With the private backbone, the 12 losses are used to train 12 separate models, one loss per model. The final prediction for the input ECG is the average of the 12 predictions.
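The two aggregation steps named in the caption, summing the 12 losses for the shared backbone and averaging the 12 predictions at inference, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `total_loss` and `ensemble_prediction` are hypothetical, the sum and the mean are assumed to be unweighted, and the network itself is left out, so the per-head losses and prediction vectors are taken as given.

```python
from typing import List

NUM_HEADS = 12      # projection heads (shared) or models (private), per the caption
NUM_CLASSES = 254   # length of each prediction vector, per the caption


def total_loss(head_losses: List[float]) -> float:
    """Shared-backbone training loss: unweighted sum of the per-head losses."""
    if len(head_losses) != NUM_HEADS:
        raise ValueError(f"expected {NUM_HEADS} losses, got {len(head_losses)}")
    return sum(head_losses)


def ensemble_prediction(head_predictions: List[List[float]]) -> List[float]:
    """Final prediction: element-wise mean of the 12 prediction vectors."""
    if len(head_predictions) != NUM_HEADS:
        raise ValueError(f"expected {NUM_HEADS} vectors, got {len(head_predictions)}")
    if any(len(vec) != NUM_CLASSES for vec in head_predictions):
        raise ValueError(f"every prediction vector must have length {NUM_CLASSES}")
    # zip(*...) walks the 12 vectors column by column, i.e. class by class.
    return [sum(column) / NUM_HEADS for column in zip(*head_predictions)]


# Placeholder inputs (assumed values, for illustration only).
example_losses = [0.5] * NUM_HEADS
example_predictions = [[h / NUM_HEADS] * NUM_CLASSES for h in range(NUM_HEADS)]

shared_loss = total_loss(example_losses)
final_prediction = ensemble_prediction(example_predictions)
```

Under the private-backbone option, each of the 12 losses would instead be minimized separately by its own model, so `total_loss` would not be used; the averaging step is the same in this sketch.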