Fig. 10: Architecture of the multi-task decoding and loss coordination module.

Each task input is first processed by a feed-forward network, shown as rectangular boxes, and then aggregated using Set2Set pooling. The pooled representations pass through linear layers and task-specific decoders, which produce a prediction for each property at the output layer. On the right, the loss coordination module combines predictions and targets: it computes the mean squared error for each task, derives statistical information from the targets, and applies weighting factors, denoted as beta and gamma blocks, to balance the tasks. The weighted per-task losses are summed into the coordinated loss that guides training.
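The loss coordination step can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the figure does not specify how beta and gamma are computed, so here beta is assumed to be an inverse-variance weight derived from target statistics, and gamma an optional fixed per-task priority.

```python
import numpy as np

def coordinated_loss(preds, targets, gamma=None):
    """Combine per-task MSE losses into a single coordinated loss.

    preds, targets: dicts mapping task name -> array of predictions/labels.
    gamma: optional dict of fixed per-task priority weights (an assumption;
           defaults to 1 for every task).
    beta is derived from target statistics -- here, inverse variance, so that
    tasks measured on different scales contribute comparably. This specific
    choice is illustrative, not the definition used in the figure.
    """
    total = 0.0
    for task in preds:
        p = np.asarray(preds[task], dtype=float)
        t = np.asarray(targets[task], dtype=float)
        mse = np.mean((p - t) ** 2)            # per-task mean squared error
        beta = 1.0 / (np.var(t) + 1e-8)        # statistics-based scale weight
        g = 1.0 if gamma is None else gamma.get(task, 1.0)
        total += g * beta * mse                # weighted contribution
    return total
```

For example, with perfect predictions the coordinated loss is zero, and a unit MSE on a task whose targets have variance 25 contributes roughly 0.04 under the inverse-variance weighting.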