Fig. 1: Model architectures for the early detection of pancreatic cancer (PC) using embeddings from pre-trained models.
From: Enhancing EHR-based pancreatic cancer prediction with LLM-derived embeddings

(A) An approach to integrating embeddings from pre-trained models. (B) A multi-label classification model designed to predict PC risk across five future time intervals (0–3, 3–6, 6–12, 12–36, and 36–60 months before diagnosis). The example patient shown is labeled [0, 0, 0, 1, 1], indicating a PC diagnosis 12–36 months after the last reported medical condition. The Transformer model predicts five binary outcomes, and its predictions for the 12–36-month interval are evaluated using the fourth element of the predicted probability vector. (C) Three binary classification models, trained on augmented datasets, for early prediction at 3–6, 6–12, and 12–36 months before diagnosis. For example, to build the 12–36-month binary prediction model, we took patients whose label vector was [1, 1, 1, 1, 1] (i.e., diagnosed 0–3 months after the last reported condition) and excluded records reported less than 12 months before the cancer diagnosis. Removing these proximal conditions lengthens the time gap between the last remaining diagnosis and the cancer diagnosis, so that the data align with the time interval targeted by each binary prediction model.
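The labeling in panel B and the record truncation in panel C can be illustrated with a small Python sketch. It is not the authors' code and rests on several assumptions. Windows are treated as half-open intervals [lower, upper), so a record exactly 12 months before diagnosis survives the "less than 12 months" exclusion. Patients without PC get all-zero labels. The `Record` class, `encode_labels`, and `augment_for_window` are hypothetical names. Label k is 1 when the gap between the last record and the PC diagnosis is below the upper bound of window k, which reproduces both [0, 0, 0, 1, 1] (12–36 months) and [1, 1, 1, 1, 1] (0–3 months) from the caption. By analogy with the 12–36-month example, the 3–6 and 6–12-month models would use cutoffs of 3 and 6 months; this too is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Upper bounds (months before PC diagnosis) of the windows 0–3, 3–6, 6–12,
# 12–36 and 36–60. Assumption: each window is half-open, [lower, upper).
WINDOW_UPPER_BOUNDS = (3, 6, 12, 36, 60)


def encode_labels(gap_months: Optional[float]) -> List[int]:
    """Cumulative multi-label target: label k is 1 if the PC diagnosis occurs
    less than WINDOW_UPPER_BOUNDS[k] months after the last recorded condition.
    None means the patient has no PC diagnosis (assumed all-zero labels)."""
    if gap_months is None:
        return [0] * len(WINDOW_UPPER_BOUNDS)
    return [1 if gap_months < upper else 0 for upper in WINDOW_UPPER_BOUNDS]


@dataclass
class Record:
    code: str                # hypothetical diagnosis-code field
    months_before_dx: float  # time from this record to the PC diagnosis


def augment_for_window(
    records: Sequence[Record], cutoff_months: float
) -> Optional[Tuple[List[Record], List[int]]]:
    """Drop records reported less than `cutoff_months` before the PC diagnosis,
    then relabel the patient from the new gap. Returns None if nothing remains."""
    kept = [r for r in records if r.months_before_dx >= cutoff_months]
    if not kept:
        return None
    new_gap = min(r.months_before_dx for r in kept)  # most recent surviving record
    return kept, encode_labels(new_gap)


# Usage: a hypothetical PC patient whose last record is 1 month before diagnosis.
patient = [
    Record("A", 40.0),
    Record("B", 20.0),
    Record("C", 5.0),
    Record("D", 1.0),
]
print(encode_labels(1.0))  # [1, 1, 1, 1, 1]
kept, labels = augment_for_window(patient, cutoff_months=12)
print([r.code for r in kept], labels)  # ['A', 'B'] [0, 0, 0, 1, 1]
```

Truncating at 12 months turns a 0–3-month patient into a 12–36-month example, which is the realignment panel C describes. How the binary label for each panel-C model is then derived from this vector is not specified in the caption.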