Fig. 3: MILTON validation and benchmarks with proteomics data and PRSs. | Nature Genetics

From: Disease prediction with multi-omics and biomarkers empowers case–control genetic discoveries in the UK Biobank

a, Overview of the capped analysis. All individuals diagnosed up to 1 January 2018 were used for model training, and all individuals diagnosed thereafter were used as the test set for predictions. A 2 × 2 contingency table was constructed to capture whether known cases and controls were eventually correctly predicted by MILTON. b, Distribution of odds ratios obtained from Fisher’s exact test (FET) in the capped analysis of 1,748 ICD10 codes across multiple prediction probability thresholds, indicating the power of MILTON to predict known cases hidden from the training set. Results for predicted probability thresholds ≥ 0.6 are filled in orange, and those for a threshold of 0.7 are outlined in black. c, Performance comparison of MILTON time-agnostic models trained on 67 traits versus disease-specific PRSs across 151 ICD10 codes. d, Box plots comparing the performance of MILTON time-agnostic models trained on 67 traits versus all 36 PRSs across 499 ICD10 codes. e, Performance comparison of MILTON time-agnostic models trained on protein expression data + covariates ± 67 traits versus 67 traits alone across 1,574 ICD10 codes (Methods). f, AUC differences when MILTON is trained on different feature set combinations for 1,299 ICD10 codes (time-agnostic model). Left, the x axis represents median AUC(3k proteins + 67 traits) − median AUC(67 traits) for matched ICD10 codes. Right, the x axis represents median AUC(3k proteins + 67 traits) − median AUC(3k proteins) for matched ICD10 codes. In b–f, each box plot shows the median as the center line, the 25th percentile as the lower box limit and the 75th percentile as the upper box limit; whiskers extend to the 25th percentile − 1.5× interquartile range at the bottom and the 75th percentile + 1.5× interquartile range at the top; points denote outliers. Two-sided Mann–Whitney U (MWU) test P values are shown in c–e.
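The capped analysis in a and b can be illustrated with a minimal Python sketch. It thresholds predicted probabilities, counts a 2 × 2 contingency table and computes the sample odds ratio and a two-sided Fisher's exact test P value. This is not the authors' code: the function names, the table layout (rows are predicted/not predicted, columns are later diagnosed/not diagnosed), the toy data and the choice of a two-sided test are all assumptions made for illustration.

```python
from math import comb


def contingency_table(probabilities, later_diagnosed, threshold):
    """Count a 2x2 table (a, b, c, d) at a given probability threshold.

    Assumed layout (hypothetical, not taken from the paper):
      a = predicted case, later diagnosed    b = predicted case, not diagnosed
      c = not predicted,  later diagnosed    d = not predicted,  not diagnosed
    """
    a = b = c = d = 0
    for prob, case in zip(probabilities, later_diagnosed):
        predicted = prob >= threshold
        if predicted and case:
            a += 1
        elif predicted:
            b += 1
        elif case:
            c += 1
        else:
            d += 1
    return a, b, c, d


def odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c); infinite or undefined when b*c == 0."""
    if b * c == 0:
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value with fixed table margins.

    Sums the hypergeometric probabilities of every table that is no more
    likely than the observed one. Integer weights keep the comparison exact.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(k):
        # Unnormalised probability of seeing k in the top-left cell.
        return comb(col1, k) * comb(n - col1, row1 - k)

    lowest = max(0, row1 + col1 - n)
    highest = min(row1, col1)
    observed = weight(a)
    extreme = sum(
        weight(k) for k in range(lowest, highest + 1) if weight(k) <= observed
    )
    return extreme / comb(n, row1)


if __name__ == "__main__":
    # Toy data only: eight held-out individuals with made-up probabilities.
    probs = [0.9, 0.8, 0.75, 0.65, 0.4, 0.3, 0.2, 0.1]
    later = [True, True, True, False, True, False, False, False]
    table = contingency_table(probs, later, threshold=0.6)
    print(table, odds_ratio(*table), fisher_exact_two_sided(*table))
```

Repeating the calculation over a grid of thresholds for each ICD10 code would give one odds ratio per code and threshold, which is the kind of distribution summarised by the box plots in b.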
