Extended Data Fig. 10: Effects of ethnicity and deprivation. | Nature

From: Learning the natural history of human disease with generative transformers

a, Modelled rate per year, separated by sex and ethnic background. b, Modelled rate per year, separated by sex and Townsend deprivation index bins (ordered by increasing deprivation). The boxplots in a and b use the entire validation cohort (n = 100,639 individual trajectories) and feature the median as the center line, the box from the first to the third quartile, whiskers extending to 1.5× IQR, and individual outliers. c-d, Average number of disease tokens per year, shown for different ethnicities (c) and deprivation indices (d). e-f, Age- and sex-stratified AUCs for 10 selected diseases. AUCs are averaged across 5-year age groups ranging from 40 to 80 years of age, and this average is also used as the center of the error bars. AUCs for individual age and sex brackets are shown as grey dots. 95% confidence intervals are calculated using DeLong's method. g-h, Width of DeLong's 95% confidence intervals for AUC versus number of cases, shown for different ethnicities and deprivation strata. For rare diseases, AUC estimates have high variance. i, Standard deviation between AUC estimates for different strata versus the number of cases of that disease in the training dataset. Each dot represents a disease. j, Average validation AUC across 5-year age groups ranging from 40 to 80 years of age, aggregated by the corresponding ICD chapters. Shown is the difference between average AUCs calculated for participants born before 1944 and those born after 1960. The boxplots feature the median as the center line, the box from the first to the third quartile and whiskers extending to 1.5× IQR, clipped at the minimum and maximum data points. Data are shown for n = 906 diagnoses in males and n = 957 diagnoses in females for which enough events were recorded in the validation data to evaluate AUCs.
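Panels e-h rely on DeLong's method for 95% confidence intervals on AUC. The sketch below shows the textbook DeLong variance estimate for a single AUC, assuming one risk score per individual and a binary case/control label. `delong_auc_ci` is a hypothetical helper written for illustration, not the authors' code; clipping the interval to [0, 1] is an assumed choice, and the Gaussian scores in the demo are synthetic, not data from the study. The demo mirrors the trend described for panels g-h: with few cases, the interval is much wider.

```python
import math
import random
from statistics import NormalDist


def _psi(case_score: float, control_score: float) -> float:
    """Mann-Whitney kernel: 1 if the case scores higher, 0.5 for a tie, else 0."""
    if case_score > control_score:
        return 1.0
    if case_score == control_score:
        return 0.5
    return 0.0


def delong_auc_ci(case_scores, control_scores, level=0.95):
    """Return (auc, lower, upper) for one AUC using DeLong's variance estimate.

    Hypothetical helper for illustration. Uses the O(m * n) pairwise form,
    which is fine for small examples.
    """
    m, n = len(case_scores), len(control_scores)
    if m < 2 or n < 2:
        raise ValueError("need at least two cases and two controls")
    # Structural components: one per case (v10) and one per control (v01).
    v10 = [sum(_psi(x, y) for y in control_scores) / n for x in case_scores]
    v01 = [sum(_psi(x, y) for x in case_scores) / m for y in control_scores]
    auc = sum(v10) / m  # the mean of v01 gives the same value
    # Sample variances of the structural components.
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    se = math.sqrt(s10 / m + s01 / n)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Normal approximation, clipped to the valid AUC range [0, 1].
    return auc, max(0.0, auc - z * se), min(1.0, auc + z * se)


# Synthetic scores (not study data): controls ~ N(0, 1), cases ~ N(1, 1).
rng = random.Random(0)
controls = [rng.gauss(0.0, 1.0) for _ in range(1000)]
widths = {}
for num_cases in (10, 500):
    cases = [rng.gauss(1.0, 1.0) for _ in range(num_cases)]
    auc, lower, upper = delong_auc_ci(cases, controls)
    widths[num_cases] = upper - lower
    print(f"{num_cases:4d} cases: AUC = {auc:.3f}, 95% CI width = {upper - lower:.3f}")
```

The pairwise loop keeps the sketch short; for large cohorts, the same structural components can be obtained from midranks after sorting, which avoids the quadratic cost.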