Fig. 2: MILTON time-models and phenome-wide performance across ancestries.

a, Schematic showing how the different time-models are defined, together with the frequency of individuals whose biomarker samples were collected a given number of years before or after the diagnosis date. For each individual, diagnosis dates were taken from UKB fields 41280, 40000 or 40005 (Methods). b, MILTON AUC performance across all ICD10 codes, five ancestries and three time-models. c, Comparison of median AUC and sensitivity of MILTON models across ten replicates, trained on 1,466, 73 and 56 ICD10 codes for EUR, SAS and AFR ancestries, respectively, under the different time-models. Two-sided Mann-Whitney U (MWU) test P values are shown. Each box plot shows the median as the center line, the 25th percentile as the lower box limit and the 75th percentile as the upper box limit; whiskers extend to the 25th percentile − 1.5 × interquartile range at the bottom and the 75th percentile + 1.5 × interquartile range at the top; points denote outliers. d, Distribution of median AUC across ten replicates with increasing number of training cases per ICD10 code, for the different time-models and ancestries. Error bars represent 95% confidence intervals, with the center representing the mean. Pearson correlation coefficients (r) and two-sided P values (P) are provided for each time-model.