Fig. 4: Predictive models can generalize across healthcare systems and populations. | Nature Communications

From: Medical history predicts phenome-wide disease onset and enables the rapid response to emerging health threats

a External validation of the differences in discriminatory performance, quantified by the C-Index, between CPH models trained on age and biological sex alone and CPH models trained on age, biological sex, and the risk state, for 1519 endpoints in the All of Us cohort. We find significant improvements over the baseline model (age and biological sex only) for 1171 (77.1%) of the 1519 investigated endpoints. b Direct comparison of the absolute C-Index in the UK Biobank (x-axis) and the All of Us cohort (y-axis). Significant improvements can be replicated for 1115 (78.9%, green points) of 1414 endpoints in the All of Us cohort. c Comparison of mean delta C-Index per delta percentile (derived in the UK Biobank for the 1519 endpoints available in All of Us). Improvements in the All of Us cohort are consistent with the UK Biobank cohort: small improvements in the UK Biobank tend to be larger in All of Us, while large improvements in the UK Biobank tend to be attenuated in All of Us. d Distribution of C-Indices for the 1519 investigated endpoints stratified by communities historically underrepresented in biomedical research (UPD)73. Dots indicate medians and whiskers extend to the Bonferroni-corrected 95% confidence interval for a distribution bootstrapped over 100 iterations. e For the same groups, confidence intervals for the additive performance as measured by the C-Index compared to the baseline model. Dots indicate medians and whiskers extend to the Bonferroni-corrected 95% confidence interval for a distribution bootstrapped over 100 iterations. f Absolute discriminatory performance in terms of C-Index comparing the baseline (age and biological sex, black points) with the added routine health records risk state (red points) for a selection of 24 endpoints. g The differences in C-Index for the same models. Statistical measures for UKB in (b and c) were derived from 502,489 individuals, and for AoU in (a–g) were derived from 259,234 individuals. Dots indicate medians and whiskers extend to the Bonferroni-corrected 95% confidence interval for a distribution bootstrapped over 100 iterations. Source data are provided.
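
The medians and whiskers described above can be illustrated with a minimal sketch, which is not the authors' pipeline. It assumes a plain Harrell-style C-Index that ignores tied event times, a percentile bootstrap, and a Bonferroni correction that divides alpha by a hypothetical `n_tests`; the function names, the synthetic data, and the choice of `n_tests` are all illustrative assumptions.

```python
import numpy as np

def c_index(time, event, risk):
    """Harrell-style C-Index: share of comparable pairs in which the
    subject with the earlier observed event has the higher risk score.
    Simplification (assumption): pairs with tied times are skipped."""
    time, event, risk = (np.asarray(a) for a in (time, event, risk))
    concordant, comparable = 0.0, 0
    for i in np.flatnonzero(event):
        later = time > time[i]  # subjects still at risk after i's event
        comparable += int(later.sum())
        concordant += (risk[i] > risk[later]).sum()
        concordant += 0.5 * (risk[i] == risk[later]).sum()  # tied scores
    return concordant / comparable if comparable else float("nan")

def bootstrap_delta_c(time, event, risk_base, risk_full,
                      n_boot=100, n_tests=1, alpha=0.05, seed=0):
    """Median and Bonferroni-corrected percentile interval of the
    C-Index difference (full minus baseline) over bootstrap resamples."""
    time, event = np.asarray(time), np.asarray(event)
    risk_base, risk_full = np.asarray(risk_base), np.asarray(risk_full)
    rng = np.random.default_rng(seed)
    n = len(time)
    deltas = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample individuals with replacement
        deltas[b] = (c_index(time[idx], event[idx], risk_full[idx])
                     - c_index(time[idx], event[idx], risk_base[idx]))
    a = alpha / n_tests  # Bonferroni: split alpha across comparisons
    lo, med, hi = np.nanquantile(deltas, [a / 2, 0.5, 1 - a / 2])
    return med, lo, hi

# Synthetic stand-in data (assumption): x1 mimics the baseline score,
# x2 mimics the added risk state.
rng = np.random.default_rng(42)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
t_event = rng.exponential(1.0 / np.exp(0.5 * x1 + 1.5 * x2))
t_cens = rng.exponential(2.0, size=n)
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(int)

med, lo, hi = bootstrap_delta_c(time, event, x1, 0.5 * x1 + 1.5 * x2,
                                n_boot=100, n_tests=3)
```

With only 100 resamples, a heavily corrected interval (for example `n_tests` in the thousands) collapses to roughly the minimum and maximum of the bootstrap distribution, so the number of comparisons entering the correction matters for how wide the whiskers appear.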
