Meta-prediction of coronary artery disease risk

An Author Correction to this article was published on 19 August 2025

This article has been updated

Abstract

Coronary artery disease (CAD) is a leading cause of morbidity and mortality worldwide, and accurately predicting individual risk is critical for prevention. Here we aimed to integrate unmodifiable risk factors, such as age and genetics, with modifiable risk factors, such as clinical and biometric measurements, into a meta-prediction framework that produces actionable and personalized risk estimates. In the initial development of the model, ~2,000 predictive features were considered, including demographic data, lifestyle factors, physical measurements, laboratory tests, medication usage, diagnoses and genetics. To power our meta-prediction approach, we stratified the UK Biobank into two primary cohorts: first, a prevalent CAD cohort used to train predictive models for cross-sectional prediction at baseline and prospective estimation of contributing risk factor levels and diagnoses (baseline models) and, second, an incident CAD cohort using, in part, these baseline models as meta-features to train a final CAD incident risk prediction model. The resultant 10-year incident CAD risk model, composed of 15 derived meta-features with multiple embedded polygenic risk scores, achieves an area under the curve of 0.84. In an independent test cohort from the All of Us research program, this model achieved an area under the curve of 0.81 for predicting 10-year incident CAD risk, outperforming standard clinical scores and previously developed integrative models. Moreover, this framework enables the generation of individualized risk reduction profiles by quantifying the potential impact of standard clinical interventions. Notably, genetic risk influences the extent to which these interventions reduce overall CAD risk, allowing for tailored prevention strategies.
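The two-stage design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the data are simulated, the cohort sizes, effect sizes and variable names are hypothetical, and a one-predictor least-squares fit and a plain logistic regression stand in for whatever learners the study used. The point is only the data flow, in which a baseline model fitted in one cohort supplies a meta-feature to the final incident-risk model fitted in another.

```python
import math
import random

def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def predict_proba(w, x):
    """Logistic prediction; w[0] is the bias, w[1:] the feature weights."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.1, epochs=500):
    """Logistic regression fitted by plain batch gradient descent."""
    w = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = predict_proba(w, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

rng = random.Random(0)

def simulate(n):
    """Synthetic cohort of (prs, age_z, ldl, cad) tuples; all values are made up."""
    people = []
    for _ in range(n):
        prs = rng.gauss(0, 1)                 # hypothetical LDL polygenic score
        age = rng.gauss(0, 1)                 # standardized age
        ldl = 0.6 * prs + rng.gauss(0, 0.8)   # risk factor partly driven by genetics
        risk = 1 / (1 + math.exp(-(-2.0 + 0.8 * age + 0.9 * ldl)))
        cad = 1 if rng.random() < risk else 0
        people.append((prs, age, ldl, cad))
    return people

baseline_cohort = simulate(500)   # stands in for the cohort used to train baseline models
incident_cohort = simulate(500)   # stands in for the incident CAD cohort

# Stage 1: baseline model that estimates a risk factor level from genetics.
intercept, slope = fit_linear([p[0] for p in baseline_cohort],
                              [p[2] for p in baseline_cohort])

# Stage 2: the baseline model's output becomes a meta-feature of the final model.
meta_rows = [[p[1], intercept + slope * p[0]] for p in incident_cohort]
labels = [p[3] for p in incident_cohort]
weights = fit_logistic(meta_rows, labels)
risk_scores = [predict_proba(weights, row) for row in meta_rows]
```

In this toy setup `risk_scores` holds one predicted probability per simulated participant. A real analysis would evaluate the final model on held-out data rather than on the rows used to fit it.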

Fig. 1: Overview of cohort construction, model development and performance assessment for 10-year incident CAD risk meta-prediction in the UKBB.
Fig. 2: Comparative performance of meta-prediction stratified by standard risk factors in the UKBB population.
Fig. 3: SHAP summary plot of features in the meta-prediction framework in the UKBB population.
Fig. 4: Identification of CAD risk subgroups and distinguishing features in the UKBB population.
Fig. 5: Benefit of clinical interventions in genetic risk and risk subgroups in the UKBB population.

Data availability

All data are made available from the UKBB (ref. 55; https://www.ukbiobank.ac.uk/enable-your-research/apply-for-access) and the All of Us research program (ref. 74; https://workbench.researchallofus.org/login) to researchers from universities and other institutions with genuine research inquiries following institutional review board and biobank approval. This research has been conducted using the UKBB resource under application number 41999 and the All of Us v7 Curated Data Repository (R2022Q4R9 and C2022Q4R9 versions).

Code availability

The machine learning code used to generate the meta-predictions is available via GitHub at http://github.com/TorkamaniLab/CAD_meta_prediction.

References

  1. Damask, A. et al. Patients with high genome-wide polygenic risk scores for coronary artery disease may receive greater clinical benefit from alirocumab treatment in the ODYSSEY OUTCOMES trial. Circulation https://doi.org/10.1161/CIRCULATIONAHA.119.044434 (2020).

  2. Marston, N. A. et al. Predicting benefit from evolocumab therapy in patients with atherosclerotic disease using a genetic risk score. Circulation https://doi.org/10.1161/CIRCULATIONAHA.119.043805 (2020).

  3. Mega, J. L. et al. Genetic risk, coronary heart disease events, and the clinical benefit of statin therapy: an analysis of primary and secondary prevention trials. Lancet 385, 2264–2271 (2015).

  4. Natarajan, P. et al. Polygenic risk score identifies subgroup with higher burden of atherosclerosis and greater relative benefit from statin therapy in the primary prevention setting. Circulation 135, 2091–2101 (2017).

  5. Bolli, A., Di Domenico, P., Pastorino, R., Busby, G. B. & Bottà, G. Risk of coronary artery disease conferred by low-density lipoprotein cholesterol depends on polygenic background. Circulation 143, 1452–1454 (2021).

  6. Ye, Y. et al. Interactions between enhanced polygenic risk scores and lifestyle for cardiovascular disease, diabetes, and lipid levels. Circ. Genom. Precis. Med. 14, e003128 (2021).

  7. Muse, E. D. et al. Impact of polygenic risk communication: an observational mobile application-based coronary artery disease study. NPJ Digit. Med. 5, 30 (2022).

  8. Hollands, G. J. et al. The impact of communicating genetic risks of disease on risk-reducing health behaviour: systematic review with meta-analysis. BMJ 352, i1102 (2016).

  9. Widén, E. et al. How communicating polygenic and clinical risk for atherosclerotic cardiovascular disease impacts health behavior: an observational follow-up study. Circ. Genom. Precis. Med. 15, e003459 (2022).

  10. Knowles, J. W. et al. Impact of a genetic risk score for coronary artery disease on reducing cardiovascular risk: a pilot randomized controlled study. Front. Cardiovasc. Med. 4, 53 (2017).

  11. Maamari, D. J. et al. Clinical implementation of combined monogenic and polygenic risk disclosure for coronary artery disease. JACC Adv. 1, 100068 (2022).

  12. Roberts, M. C., Khoury, M. J. & Mensah, G. A. Perspective: The clinical use of polygenic risk scores: race, ethnicity, and health disparities. Ethn. Dis. 29, 513–516 (2019).

  13. Lewis, A. C. F. & Green, R. C. Polygenic risk scores in the clinic: new perspectives needed on familiar ethical issues. Genome Med. 13, 14 (2021).

  14. Martens, F. K., Tonk, E. C. M. & Janssens, A. C. J. W. Evaluation of polygenic risk models using multiple performance measures: a critical assessment of discordant results. Genet. Med. 21, 391–397 (2019).

  15. Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).

  16. Khan, S. S. et al. Coronary artery calcium score and polygenic risk score for the prediction of coronary heart disease events. JAMA 329, 1768–1777 (2023).

  17. Mosley, J. D. et al. Predictive accuracy of a polygenic risk score compared with a clinical risk score for incident coronary heart disease. JAMA 323, 627–635 (2020).

  18. Wünnemann, F. et al. Validation of genome-wide polygenic risk scores for coronary artery disease in French Canadians. Circ. Genom. Precis. Med. 12, e002481 (2019).

  19. Murthy, V. L. et al. Polygenic risk, fitness, and obesity in the Coronary Artery Risk Development in Young Adults (CARDIA) study. JAMA Cardiol. 5, 263–271 (2020).

  20. Wells, Q. S. et al. Polygenic risk score to identify subclinical coronary heart disease risk in young adults. Circ. Genom. Precis. Med. 14, e003341 (2021).

  21. Marston, N. A. et al. Predictive utility of a coronary artery disease polygenic risk score in primary prevention. JAMA Cardiol. 8, 130–137 (2023).

  22. Isgut, M., Sun, J., Quyyumi, A. A. & Gibson, G. Highly elevated polygenic risk scores are better predictors of myocardial infarction risk early in life than later. Genome Med. 13, 13 (2021).

  23. Mars, N. et al. Polygenic and clinical risk scores and their impact on age at onset and prediction of cardiometabolic diseases and common cancers. Nat. Med. 26, 549–557 (2020).

  24. Khera, A. V. et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 50, 1219–1224 (2018).

  25. Elliott, J. et al. Predictive accuracy of a polygenic risk score-enhanced prediction model vs a clinical risk score for coronary artery disease. JAMA 323, 636–645 (2020).

  26. Aragam, K. G. et al. Limitations of contemporary guidelines for managing patients at high genetic risk of coronary artery disease. J. Am. Coll. Cardiol. 75, 2769–2780 (2020).

  27. Aragam, K. G. & Natarajan, P. Polygenic scores to assess atherosclerotic cardiovascular disease risk: clinical perspectives and basic implications. Circ. Res. 126, 1159–1177 (2020).

  28. Sun, L. et al. Polygenic risk scores in cardiovascular risk prediction: a cohort study and modelling analyses. PLoS Med. 18, e1003498 (2021).

  29. Riveros-Mckay, F. et al. Integrated polygenic tool substantially enhances coronary artery disease prediction. Circ. Genom. Precis. Med. 14, e003304 (2021).

  30. Hindy, G. et al. Abstract 16565: Integration of a genome-wide polygenic score with ACC/AHA pooled cohorts equation in prediction of coronary artery disease events in >285,000 participants. Circulation 140, abstr. 16565 (2019).

  31. Inouye, M. et al. Genomic risk prediction of coronary artery disease in 480,000 adults: implications for primary prevention. J. Am. Coll. Cardiol. 72, 1883–1893 (2018).

  32. Ntalla, I. et al. Genetic risk score for coronary disease identifies predispositions to cardiovascular and noncardiovascular diseases. J. Am. Coll. Cardiol. 73, 2932–2942 (2019).

  33. Abraham, G. et al. Genomic risk score offers predictive performance comparable to clinical risk factors for ischaemic stroke. Nat. Commun. 10, 5819 (2019).

  34. Lin, J. et al. Integration of biomarker polygenic risk score improves prediction of coronary heart disease. JACC Basic Transl Sci. 8, 1489–1499 (2023).

  35. Vassy, J. L. et al. Cardiovascular disease risk assessment using traditional risk factors and polygenic risk scores in the Million Veteran Program. JAMA Cardiol. 8, 564–574 (2023).

  36. Agrawal, S. et al. Selection of 51 predictors from 13,782 candidate multimodal features using machine learning improves coronary artery disease prediction. Patterns 2, 100364 (2021).

  37. Patel, A. P. et al. A multi-ancestry polygenic risk score improves risk prediction for coronary artery disease. Nat. Med. https://doi.org/10.1038/s41591-023-02429-x (2023).

  38. Torkamani, A., Andersen, K. G., Steinhubl, S. R. & Topol, E. J. High-definition medicine. Cell 170, 828–843 (2017).

  39. Topol, E. J. High-performance medicine: the convergence of human and artificial intelligence. Nat. Med. 25, 44–56 (2019).

  40. Acosta, J. N., Falcone, G. J., Rajpurkar, P. & Topol, E. J. Multimodal biomedical AI. Nat. Med. 28, 1773–1784 (2022).

  41. Gola, D., Erdmann, J., Müller-Myhsok, B., Schunkert, H. & König, I. R. Polygenic risk scores outperform machine learning methods in predicting coronary artery disease status. Genet. Epidemiol. 44, 125–138 (2020).

  42. Xu, Y. et al. A machine learning model for disease risk prediction by integrating genetic and non-genetic factors. In Proc. 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (eds Adjeroh, D. et al.) 868–871 (IEEE, 2022).

  43. Gibson, G. On the utilization of polygenic risk scores for therapeutic targeting. PLoS Genet. 15, e1008060 (2019).

  44. Goddard, K. A. B., Lee, K., Buchanan, A. H., Powell, B. C. & Hunter, J. E. Establishing the medical actionability of genomic variants. Annu. Rev. Genomics Hum. Genet. 23, 173–192 (2022).

  45. Wang, Y., Tsuo, K., Kanai, M., Neale, B. M. & Martin, A. R. Challenges and opportunities for developing more generalizable polygenic risk scores. Annu. Rev. Biomed. Data Sci. 5, 293–320 (2022).

  46. Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186 (2017).

  47. Wray, N. R., Wijmenga, C., Sullivan, P. F., Yang, J. & Visscher, P. M. Common disease is more complex than implied by the core gene omnigenic model. Cell https://doi.org/10.1016/j.cell.2018.05.051 (2018).

  48. Mathieson, I. The omnigenic model and polygenic prediction of complex traits. Am. J. Hum. Genet. 108, 1558–1563 (2021).

  49. Aragam, K. G. et al. Discovery and systematic characterization of risk variants and genes for coronary artery disease in over a million participants. Nat. Genet. 54, 1803–1815 (2022).

  50. Tcheandjieu, C. et al. Large-scale genome-wide association study of coronary artery disease in genetically diverse populations. Nat. Med. 28, 1679–1692 (2022).

  51. You, J. et al. Development of machine learning-based models to predict 10-year risk of cardiovascular disease: a prospective cohort study. Stroke Vasc. Neurol. 8, 475–485 (2023).

  52. Zeitouni, M. et al. Performance of guideline recommendations for prevention of myocardial infarction in young adults. J. Am. Coll. Cardiol. 76, 653–664 (2020).

  53. De Filippis, A. P. et al. Risk score overestimation: the impact of individual cardiovascular risk factors and preventive therapies on the performance of the American Heart Association–American College of Cardiology–Atherosclerotic Cardiovascular Disease risk score in a modern multi-ethnic cohort. Eur. Heart J. 38, 598–608 (2017).

  54. Livingstone, S. et al. Effect of competing mortality risks on predictive performance of the QRISK3 cardiovascular risk prediction tool in older people and those with comorbidity: external validation population cohort study. Lancet Healthy Longev. 2, e352–e361 (2021).

  55. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).

  56. Patel, A. P., Wang, M., Kartoun, U., Ng, K. & Khera, A. V. Quantifying and understanding the higher risk of atherosclerotic cardiovascular disease among South Asian individuals: results from the UK Biobank prospective cohort study. Circulation 144, 410–422 (2021).

  57. Goff, D. C. et al. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association Task Force on practice guidelines. Circulation 129, 49–73 (2014).

  58. Hippisley-Cox, J., Coupland, C. & Brindle, P. Development and validation of QRISK3 risk prediction algorithms to estimate future risk of cardiovascular disease: prospective cohort study. BMJ 357, j2099 (2017).

  59. CatBoost Encoder Category Encoders 2.6.3 documentation; https://contrib.scikit-learn.org/category_encoders/catboost.html

  60. Wilson, S. miceRanger: multiple imputation by chained equations with random forests. R version 4.0.0 https://cran.r-project.org/package=miceRanger (2021).

  61. Khan, S. S. et al. Novel prediction equations for absolute risk assessment of total cardiovascular disease incorporating cardiovascular–kidney–metabolic health: a scientific statement from the American Heart Association. Circulation https://doi.org/10.1161/CIR.0000000000001191 (2023).

  62. Arnett, D. K. et al. 2019 ACC/AHA guideline on the primary prevention of cardiovascular disease: executive summary: a report of the American College of Cardiology/American Heart Association Task Force on clinical practice guidelines. Circulation 140, e563–e595 (2019).

  63. Chen, S. F. et al. Genotype imputation and variability in polygenic risk score estimation. Genome Med. 12, 100 (2020).

  64. McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 48, 1279–1283 (2016).

  65. Nielsen, J. B. et al. Biobank-driven genomic discovery yields new insight into atrial fibrillation biology. Nat. Genet. https://doi.org/10.1038/s41588-018-0171-3 (2018).

  66. Evangelou, E. et al. Genetic analysis of over 1 million people identifies 535 new loci associated with blood pressure traits. Nat. Genet. 50, 1412–1425 (2018).

  67. Lam, M. et al. Comparative genetic architectures of schizophrenia in East Asian and European populations. Nat. Genet. 51, 1670–1678 (2019).

  68. Jansen, I. E. et al. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nat. Genet. 51, 404–413 (2019).

  69. Conti, D. V. et al. Trans-ancestry genome-wide association meta-analysis of prostate cancer identifies new susceptibility loci and informs genetic risk prediction. Nat. Genet. 53, 65–75 (2021).

  70. Mahajan, A. et al. Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps. Nat. Genet. 50, 1505–1513 (2018).

  71. Lambert, S. A. et al. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nat. Genet. 53, 420–425 (2021).

  72. Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).

  73. Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2, 56–67 (2020).

  74. The All of Us Research Program Genomics Investigators. Genomic data in the All of Us Research Program. Nature 627, 340–346 (2024).

  75. Khattab, A., Chen, S.-F., Wineinger, N. & Torkamani, A. AoUPRS: a cost-effective and versatile PRS calculator for the All of Us program. Preprint at bioRxiv https://doi.org/10.1101/2024.07.11.603165 (2024).

  76. Norgeot, B. et al. Minimum information about clinical artificial intelligence modeling: the MI-CLAIM checklist. Nat. Med. https://doi.org/10.1038/s41591-020-1041-y (2020).

  77. Collins, G. S., Reitsma, J. B., Altman, D. G. & Moons, K. G. M. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Ann. Intern. Med. 162, 55–63 (2015).

Acknowledgements

We thank J. C. Ducom, L. Dong and the Scripps High-Performance Computing service for their support. Thanks to E. Topol for his comments on this paper. This work is supported by R01HG010881 to A.T. as well as grant UM1TR004407. We recognize that some of the factors labeled unmodifiable in this paper may be modifiable in some circumstances.

Author information

Contributions

Concept and design: A.T. Acquisition, analysis or interpretation of data: A.T., S.-F.C., S.E.L., H.J.S., C.H., J.-F.C. and N.E.W. Drafting of the paper: A.T., S.-F.C., J.-B.P. and A.K. Critical revision of the paper for important intellectual content: A.T., J.-B.P. and E.D.M.

Corresponding author

Correspondence to Ali Torkamani.

Ethics declarations

Competing interests

A.T. declares that he is a cofounder and equity shareholder of GeneXwell Inc. A.T. is an advisor to InsideTracker. The other authors declare no competing interests.

Peer review

Peer review information

Nature Medicine thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: Michael Basson, in collaboration with the Nature Medicine team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Feature importance and SHAP summary for 10-year prospective CAD risk prediction in the UK Biobank.

This composite figure provides two views of the importance of the final 50 features used in prospective 10-year CAD meta-prediction for the incident CAD cohort (n = 33,419). The left panel displays a bar plot ranking the features in order of importance as quantified by the mean absolute SHAP value. The length of each bar represents the magnitude of a feature’s importance. The right panel presents a SHAP summary plot with each point representing the feature’s SHAP value at an individual level. Color coding represents the feature value (red for high, blue for low). Positive values correspond to the prediction contribution to the positive class, and negative values correspond to the prediction contribution to the negative class. Abbreviations: Dx: Diagnosis; MUF: modifiable and unmodifiable factors; UF: unmodifiable factors.
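The ranking shown in the left panel reduces to a simple calculation once per-participant SHAP values are available. The sketch below assumes a matrix of SHAP values with one row per participant and one column per feature; the numbers and feature names are hypothetical and are not taken from the study.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value, largest first."""
    n = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)

# Hypothetical SHAP values for three participants and three features.
shap_rows = [
    [0.20, -0.05, 0.01],
    [-0.40, 0.10, -0.02],
    [0.30, -0.15, 0.03],
]
feature_names = ["age", "ldl_meta_feature", "smoking"]

ranking = rank_by_mean_abs_shap(shap_rows, feature_names)
for name, value in ranking:
    print(f"{name}: {value:.3f}")
```

Taking the absolute value before averaging means a feature that pushes risk up for some participants and down for others still ranks as important, which is why the bar plot is paired with the signed summary plot in the right panel.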

Extended Data Fig. 2 Evaluating the calibration and predictive value of feature categories for the meta-prediction model in the UK Biobank.

This figure evaluates the performance of the meta-prediction model compared to existing clinical risk scores within the test set of the incident CAD cohort (n = 33,419), highlighting both the model’s calibration and the predictive contribution of distinct feature categories. a. Calibration plot showing the predicted vs. observed 10-year CAD risk by decile. Brier scores are provided as a summary calibration measure. Conventional risk scores were re-calibrated using the same approach within the UK Biobank training cohort. b. Predictive performance of individual feature categories and their combinations in the meta-prediction model. The left panel illustrates the Area Under the Receiver Operating Characteristic curve (AUROC) for various models. The right panel shows the Area Under the Precision-Recall Curve (AUPRC). The models compared include the full meta-prediction model and partial models restricted to a single feature class or a combination of feature classes, including 15 meta-features; 22 polygenic risk scores (PRSs) and 13 modifiable risk factors (MFs); 13 MFs; sex and age with 12 PRSs; 12 PRSs; and sex and age alone. Abbreviation: MF: modifiable factors.
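The two calibration summaries named in panel a can be sketched as follows. This is an illustrative calculation on made-up numbers, and it assumes equal-count bins ordered by predicted risk; the study's exact binning and re-calibration procedure are not described in this legend.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def calibration_table(probs, outcomes, bins=10):
    """Mean predicted risk and observed event rate per equal-count risk bin."""
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    table = []
    for k in range(bins):
        chunk = pairs[k * n // bins:(k + 1) * n // bins]
        if not chunk:
            continue  # fewer participants than bins
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        table.append((mean_pred, observed))
    return table

# Hypothetical predictions and outcomes for four participants.
probs = [0.05, 0.10, 0.20, 0.80]
outcomes = [0, 0, 1, 1]

score = brier_score(probs, outcomes)
table = calibration_table(probs, outcomes, bins=2)
```

A well-calibrated model yields bins whose mean predicted risk is close to the observed event rate, so the points of the calibration plot fall near the diagonal. With `bins=10` the same function produces the by-decile view described in the legend.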

Extended Data Fig. 3 Comparative performance of CAD risk prediction models in the UK Biobank.

This figure compares the predictive performance of our final prospective 10-year CAD meta-prediction model to established clinical risk scores, including PCE, QRISK3, PREVENT, and previous polygenic score benchmarks including GPSCAD, metaGRSCAD, Aragam2022, Tcheandjieu2022, and GPSMult, as well as ML models including ML4HEN-COX and UKBCRP using the incident CAD cohort (n = 33,419). For each model, the left panel depicts a scatter plot illustrating the incidence of CAD events across percentile bins of predicted risk, showcasing the predictive density of each model at 10 years of follow-up, while the right panel displays the 10-year cumulative risk trajectories for each risk prediction model, highlighting the ability of each model to stratify risk across time. Shaded regions in the right panel represent 95% confidence intervals (CI). In all cases, meta-prediction significantly outperforms prior approaches.

Extended Data Fig. 4 SHAP summary plots for meta-features in the final model in the UK Biobank.

This figure presents a comprehensive collection of SHAP summary plots for each of the 35 meta-features integrated into the meta-prediction for the incident CAD cohort (n = 33,419). Each subplot provides insight into the contribution of individual baseline features towards the prediction of each meta-feature. The color coding indicates each feature value (red = high, blue = low), with the SHAP value on the y-axis reflecting the impact on the model’s output (positive = contribution to the positive class, negative = contribution to the negative class). Abbreviations: Dx: Diagnosis; MUF: modifiable and unmodifiable factors; UF: unmodifiable factors.

Extended Data Fig. 5 External validation of the streamlined meta-prediction.

Comparative test accuracy for the streamlined meta-prediction model in UKBB (AUROC = 0.81) versus AoU external validation, with bootstrapped 95% confidence intervals (CI) shown as shaded regions. a. Tested on the larger AoU validation cohort (AUROC = 0.81), further stratified by self-reported European (EUR), African (AFR) and Hispanic (HIS) groups: AoU-EUR (AUROC = 0.81), AoU-AFR (AUROC = 0.79) and AoU-HIS (AUROC = 0.84), respectively. b. Tested in the AoU sub-cohort with complete phenotypes (AUROC = 0.78). Additional risk scores tested in AoU include PCE (AUC = 0.72), QRISK3 (AUC = 0.73), and PREVENT (AUC = 0.73). Abbreviations: AoU: All of Us research program; AUC: area under curve; UKBB: UK Biobank.
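A bootstrapped confidence interval for AUROC of the kind shaded in this figure can be sketched as below. The percentile bootstrap, the number of resamples and the toy data are assumptions made for illustration; the legend does not state which bootstrap variant the authors used.

```python
import random

def auroc(scores, labels):
    """Probability that a random case scores above a random control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > d) + 0.5 * (c == d) for c in cases for d in controls)
    return wins / (len(cases) * len(controls))

def bootstrap_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC, resampling participants with replacement."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # AUROC needs at least one case and one control
            stats.append(auroc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical risk scores and outcomes for eight participants.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

point = auroc(scores, labels)          # 13 of 16 case-control pairs are ordered correctly
lower, upper = bootstrap_ci(scores, labels)
```

The pairwise AUROC used here is quadratic in cohort size, which is fine for a toy example; a rank-based formulation would be the usual choice for a cohort of the size analysed in the paper.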

Extended Data Fig. 6 SHAP explanation of streamlined meta-prediction in UK Biobank and All of Us research program.

Both panels display all 50 features contributing to the streamlined meta-prediction. The vertical axis orders each feature by its overall importance to risk prediction. Each point represents a participant and is color-coded according to the feature’s direction of contribution to the individual’s risk prediction (red, increased risk; blue, decreased risk). The value associated with each point on the x-axis represents the magnitude of its contribution to the individual’s risk prediction. The left panel presents the results from the test set of UK Biobank (n = 33,419). The right panel presents the results from the external validation set of All of Us research program (n = 198,424). Abbreviations: Dx: Diagnosis; MUF: modifiable and unmodifiable factors; UF: unmodifiable factors.

Extended Data Fig. 7 Overview of generalizable genetic meta-prediction model in the UK Biobank.

Results are shown for the incident CAD cohort (n = 33,419). a. Calibration plot of the generalizable genetic meta-prediction model for predicted vs observed risk by decile within the test set of the incident CAD cohort (Brier score of generalizable genetic meta-prediction: 0.073). b. Cumulative risk curve of CAD (%) development over the 10-year follow-up period stratified by percentile of predicted risk. c. Incidence rates of CAD observed across the test cohort, stratified by percentile of predicted risk. Data are presented as mean values ± SD. d. and e. Comparative test accuracy (n = 33,419) for the generalizable genetic meta-prediction model (AUROC = 0.80; AUPRC = 0.28) versus other standard clinical and research risk scores, including PCE (AUROC = 0.73; AUPRC = 0.20), QRISK3 (AUROC = 0.74; AUPRC = 0.22), PREVENT (AUROC = 0.72; AUPRC = 0.19) and GPSCAD (AUROC = 0.73; AUPRC = 0.20). Abbreviations: AUC: area under curve; CAD: coronary artery disease; EHR: electronic health records; PCE: pooled cohort equations.
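AUPRC values such as those in panels d and e are read against a different baseline than AUROC: a model that ranks participants at random has an expected AUPRC equal to the event prevalence, not 0.5. The sketch below computes average precision, one common estimator of AUPRC, on hypothetical data; the legend does not say which estimator the authors used.

```python
def average_precision(scores, labels):
    """Mean of the precision values at the rank of each true case."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_cases = sum(labels)
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank  # precision among the top `rank` participants
    return precision_sum / total_cases

# Hypothetical risk scores and outcomes for four participants.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]

ap = average_precision(scores, labels)   # (1/1 + 2/3) / 2
prevalence = sum(labels) / len(labels)   # baseline for an uninformative ranking
```

Comparing `ap` with `prevalence` shows how much the ranking concentrates cases among the highest predicted risks, which is the practical question when only a fraction of a population can be targeted for intervention.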

Extended Data Fig. 8 Feature importance and SHAP summary for 10-year prospective CAD risk prediction of generalizable genetic model in the UK Biobank.

This composite figure provides two views of the importance of the final 50 features used in prospective 10-year CAD meta-prediction by the generalizable genetic model for the incident CAD cohort (n = 33,419). The left panel displays a bar plot ranking the features in order of importance as quantified by the mean absolute SHAP value. The length of each bar represents the magnitude of a feature’s importance. The right panel presents a SHAP summary plot with each point representing the feature’s SHAP value at an individual level. Color coding represents the feature value (red for high, blue for low). Positive values correspond to the prediction contribution to the positive class, and negative values correspond to the prediction contribution to the negative class. Abbreviations: Dx: Diagnosis; MUF: modifiable and unmodifiable factors; UF: unmodifiable factors.

Extended Data Fig. 9 Feature distribution pre- and post-imputation in the UK Biobank.

This figure demonstrates the preservation of variable distributions pre- and post-imputation, stratified by sex. For numeric features, density plots are presented, with bolded edges representing the distribution pre-imputation and light edges post-imputation. For categorical features, two sets of stacked histograms are presented side by side for comparison: on the left are the original distributions pre-imputation, and on the right are the distributions post-imputation of missing data.

Extended Data Table 1 Baseline characteristics of the UK Biobank participants in the study (n = 339,667)

Supplementary information

Reporting Summary

Supplementary Tables

Supplementary Tables 1–27.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, SF., Lee, S.E., Sadaei, H.J. et al. Meta-prediction of coronary artery disease risk. Nat Med 31, 2277–2288 (2025). https://doi.org/10.1038/s41591-025-03648-0

  • DOI: https://doi.org/10.1038/s41591-025-03648-0
