Fig. 3: Training strategies employed. | npj Digital Medicine

From: Empirical evaluation of artificial intelligence distillation techniques for ascertaining cancer outcomes from electronic health records

This figure illustrates the training, development, and evaluation strategies for different model configurations. The teacher model (green diamond) is trained on a PHI-containing DFCI dataset, producing predictions and outcomes. Three student models (purple diamonds) are trained using different datasets: (1) MIMIC-IV (blue), (2) Wiki-text (pink), and (3) matched synthetic data generated by a GPT-4 model (yellow and purple). The synthetic data model is further divided into a teacher-informed student (trained on labels assigned by the teacher model to the synthetic text data) and a fully synthetic student (trained exclusively on GPT-4-generated synthetic data). All trained models are evaluated using a held-out test set, generating predictions and outcomes. Solid arrows represent direct training and inference, while dashed arrows denote applications of trained models. DFCI Dana-Farber Cancer Institute, MIMIC-IV Medical Information Mart for Intensive Care IV, GPT-4 Generative Pre-trained Transformer 4.
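
The teacher-informed student pathway described in the caption can be sketched in a few lines. This is a minimal toy illustration, not the authors' implementation: the bag-of-words logistic regression, the function names (`train`, `predict_proba`), and every example sentence are assumptions invented for the sketch. It stands in for the general pattern only: a teacher is fit on private labeled text, the teacher assigns soft labels to surrogate text, and a student is fit on those labels alone before both are scored on a held-out set.

```python
import math
from collections import defaultdict


def tokenize(text):
    return text.lower().split()


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(texts, targets, epochs=200, lr=0.5):
    """Bag-of-words logistic regression; targets may be hard (0/1) or soft."""
    weights, bias = defaultdict(float), 0.0
    for _ in range(epochs):
        for text, y in zip(texts, targets):
            tokens = set(tokenize(text))
            p = sigmoid(bias + sum(weights[t] for t in tokens))
            grad = p - y  # gradient of cross-entropy w.r.t. the logit
            bias -= lr * grad
            for t in tokens:
                weights[t] -= lr * grad
    return dict(weights), bias


def predict_proba(model, text):
    weights, bias = model
    tokens = set(tokenize(text))
    return sigmoid(bias + sum(weights.get(t, 0.0) for t in tokens))


# Hypothetical stand-in for the private, labeled teacher training set.
private_texts = [
    "imaging shows tumor progression",
    "new lesions indicate progression",
    "tumor stable on imaging",
    "lesions stable and unchanged",
]
private_labels = [1, 1, 0, 0]

# Hypothetical stand-in for unlabeled surrogate text (public or synthetic).
surrogate_texts = [
    "report describes progression of tumor",
    "progression seen in new lesions",
    "findings remain stable over time",
    "stable and unchanged tumor",
]

# Hypothetical held-out test set.
test_texts = ["clear progression today", "everything stable today"]
test_labels = [1, 0]

# 1. Train the teacher on the private data.
teacher = train(private_texts, private_labels)

# 2. The teacher assigns soft labels to the surrogate text.
soft_labels = [predict_proba(teacher, t) for t in surrogate_texts]

# 3. The student sees only surrogate text and teacher-assigned labels.
student = train(surrogate_texts, soft_labels)

# 4. Evaluate both models on the held-out test set.
for name, model in [("teacher", teacher), ("student", student)]:
    preds = [int(predict_proba(model, t) >= 0.5) for t in test_texts]
    accuracy = sum(p == y for p, y in zip(preds, test_labels)) / len(test_labels)
    print(name, preds, accuracy)
```

In this sketch the student never touches `private_texts` or `private_labels`, which mirrors the motivation for distillation when the teacher's training data contains PHI. A fully synthetic student would instead replace `soft_labels` with labels that come with the generated text itself.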
