Fig. 2: Delphi-2M accurately models the rates of a wide range of diseases.
From: Learning the natural history of human disease with generative transformers

a, Predicted rates for nine exemplary diagnoses and death (y axis) as a function of age (x axis). The points show predictions at each recorded input token. Colours distinguish biological sex; darker colours indicate predictions immediately before the diagnosis in question. The purple and turquoise lines are the disease rates observed for each yearly age bin in the training data. The solid black line connects consecutive predictions across age for one randomly selected case. b, Average age–sex-stratified AUC values (y axis) as a function of the number of training occurrences (x axis). Shown are data for n = 906 diagnoses for male individuals and n = 957 diagnoses for female individuals for which enough events were recorded in the validation data to evaluate AUC values. c, The same as b, but aggregated by ICD-10 chapter. d, The same as b, but aggregated by sex. e, AUC values of all diagnoses in b for different time gaps between prediction and diagnosis (x axis). f, ROC curves for Delphi and other clinical or machine learning methods for three selected end points evaluated on the internal longitudinal testing set. g, AUC values of MILTON (ref. 31), a biomarker-based machine learning model, in prognostic mode (x axis), compared with Delphi-2M AUC values from the UK Biobank validation set (y axis) for n = 410 diagnoses. The box plots in c–e show the median (centre line), the first to third quartiles (box limits) and the 0.025 and 0.975 quantiles (whiskers).
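
As an illustration of what an average age–sex-stratified AUC in b could look like in practice, the following is a minimal sketch and not the authors' implementation. It assumes that an AUC is computed separately within each sex and age bin and that the per-stratum values are then averaged with equal weight. The function names `rank_auc` and `stratified_auc`, the 5-year bin width and the `min_cases` threshold are all hypothetical choices made for this example; the caption does not specify them.

```python
import numpy as np


def rank_auc(scores, labels):
    """AUC from the Mann-Whitney U statistic; tied scores get average ranks."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")  # AUC is undefined without both classes
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(scores.size, dtype=float)
    i = 0
    while i < scores.size:
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < scores.size and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0  # average 1-based rank
        i = j + 1
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def stratified_auc(scores, labels, sex, age, bin_width=5.0, min_cases=1):
    """Unweighted mean of per-stratum AUCs over (sex, age bin) strata.

    bin_width and min_cases are assumptions made for illustration only.
    """
    scores, labels, sex, age = map(np.asarray, (scores, labels, sex, age))
    age_bin = np.floor(age / bin_width).astype(int)
    aucs = []
    for s in np.unique(sex):
        for b in np.unique(age_bin[sex == s]):
            mask = (sex == s) & (age_bin == b)
            if labels[mask].sum() < min_cases:
                continue  # too few events to evaluate this stratum
            auc = rank_auc(scores[mask], labels[mask])
            if not np.isnan(auc):
                aucs.append(auc)
    return float(np.mean(aucs)) if aucs else float("nan")
```

Stratifying in this way keeps age and sex themselves from driving the discrimination score, because cases are only compared with controls of the same sex and similar age.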