Fig. 1

An overview of the proposed PC-VAE framework. For simplicity, the processing steps are shown for a single particle; the loss function is computed by averaging over a mini-batch of M sampled particles. In the encoder, each input localization is transformed independently to a higher-dimensional feature space through three parallel MLP blocks; the same transformation is applied to every localization regardless of its position in the sequence. The resulting per-point features are pooled into a global particle representation, making the encoder invariant to the order of the input points. This representation is then passed through two separate MLPs to produce the Gaussian latent variables. In the decoder, samples drawn from the Gaussian distribution are transformed to match the shape of the input localizations. In the figure, Skip denotes a skip connection, and Multi-head Att. denotes the multi-head attention mechanism. The reconstruction loss is likewise insensitive to the order of the reconstructed points, maintaining the framework's robustness to input permutations.
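The two permutation-invariance properties described in the caption — a shared per-point transformation followed by symmetric pooling in the encoder, and an order-independent reconstruction loss — can be sketched as follows. This is a minimal NumPy illustration under assumed dimensions and random weights, with a single linear-ReLU layer standing in for the MLP blocks and a symmetric Chamfer distance standing in for the reconstruction loss; it is not the authors' actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_mlp(points, W, b):
    # The same linear + ReLU transform is applied to every localization
    # independently, regardless of its position in the sequence.
    return np.maximum(points @ W + b, 0.0)

def encode(points, W, b):
    # Per-point features followed by symmetric (max) pooling yield a
    # global particle representation that is order-invariant.
    return shared_mlp(points, W, b).max(axis=0)

def chamfer(a, b):
    # Symmetric Chamfer distance: the ordering of points in either
    # cloud does not affect the value.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

# Toy particle: 16 localizations in 3-D (illustrative sizes).
pts = rng.normal(size=(16, 3))
W, b = rng.normal(size=(3, 32)), rng.normal(size=32)
perm = rng.permutation(16)

# The encoder output is identical under any permutation of the input points.
assert np.allclose(encode(pts, W, b), encode(pts[perm], W, b))
# The Chamfer loss treats a point cloud and its permutation as identical.
assert np.isclose(chamfer(pts, pts[perm]), 0.0)
```

The max-pooling step is what discards point order: any symmetric aggregation (max, mean, sum) would serve the same purpose.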