Fig. 1: Illustration of the patient embedding model and downstream applications.

a Diagnosis and procedure codes are encoded into a numerical space using an autoencoder architecture, providing the basic vocabulary for downstream training. b Patients’ visits within a year are represented as sentences, with diagnosis and procedure codes as the vocabulary. c Each visit from b is fed into a Transformer model, and the results are concatenated through the Patient Embedding model to generate the final patient vectors. d Workflow of the downstream applications: disease onset prediction (left), bulk phenotyping (middle), and clustering subgroup analysis (right).