Fig. 1: Architecture of VaDeSC-EHR.

First, an embedding is computed for a patient’s diagnosis sequence. This embedding is fed through multiple transformer blocks, after which a final pooling layer generates the latent representation Z for the patient. Z is regularized toward a Gaussian mixture distribution by including a variational term in the loss function. Z is then used to predict the time-to-event (ref. 21) and is also passed to the transformer decoder, which reconstructs the patient’s diagnosis sequence. Weights are shared between the encoder and decoder. For more details, please refer to the “Methods” section.
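
To illustrate what regularizing Z toward a Gaussian mixture implies for clustering, the following is a minimal sketch of how a soft cluster assignment p(c | Z) could be read off such a mixture. It assumes diagonal covariances and uses hypothetical mixture parameters and function names (`log_gaussian_diag`, `cluster_responsibilities`); none of these are taken from the model itself, whose exact formulation is given in the “Methods” section.

```python
import math

def log_gaussian_diag(z, mean, var):
    """Log density at z of a Gaussian with diagonal covariance (assumed form)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(z, mean, var)
    )

def cluster_responsibilities(z, weights, means, variances):
    """Posterior p(c | z) under the mixture, normalized with log-sum-exp."""
    log_joint = [
        math.log(w) + log_gaussian_diag(z, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(log_joint)  # subtract the maximum for numerical stability
    log_norm = top + math.log(sum(math.exp(l - top) for l in log_joint))
    return [math.exp(l - log_norm) for l in log_joint]

# Hypothetical 2-D latent space with two equally weighted clusters.
weights = [0.5, 0.5]
means = [[-2.0, 0.0], [2.0, 0.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

z = [1.5, 0.3]  # illustrative latent representation of one patient
resp = cluster_responsibilities(z, weights, means, variances)
```

Here `resp` holds one probability per cluster and sums to one; a latent vector lying closer to the second cluster mean receives most of its mass from that cluster.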