Meta-prediction of coronary artery disease risk

Chen, Shang-Fu; Lee, Sang Eun; Sadaei, Hossein Javedani; Park, Jun-Bean; Khattab, Ahmed; Chen, Jie-Fu; Henegar, Corneliu; Wineinger, Nathan E.; Muse, Evan D.; Torkamani, Ali

doi:10.1038/s41591-025-03648-0

Article
Published: 16 April 2025

Nature Medicine volume 31, pages 2277–2288 (2025)

An Author Correction to this article was published on 19 August 2025

This article has been updated

Abstract

Coronary artery disease (CAD) is a leading cause of morbidity and mortality worldwide, and accurately predicting individual risk is critical for prevention. Here we aimed to integrate unmodifiable risk factors, such as age and genetics, with modifiable risk factors, such as clinical and biometric measurements, into a meta-prediction framework that produces actionable and personalized risk estimates. In the initial development of the model, ~2,000 predictive features were considered, including demographic data, lifestyle factors, physical measurements, laboratory tests, medication usage, diagnoses and genetics. To power our meta-prediction approach, we stratified the UK Biobank into two primary cohorts: first, a prevalent CAD cohort used to train predictive models for cross-sectional prediction at baseline and prospective estimation of contributing risk factor levels and diagnoses (baseline models) and, second, an incident CAD cohort using, in part, these baseline models as meta-features to train a final CAD incident risk prediction model. The resultant 10-year incident CAD risk model, composed of 15 derived meta-features with multiple embedded polygenic risk scores, achieves an area under the curve of 0.84. In an independent test cohort from the All of Us research program, this model achieved an area under the curve of 0.81 for predicting 10-year incident CAD risk, outperforming standard clinical scores and previously developed integrative models. Moreover, this framework enables the generation of individualized risk reduction profiles by quantifying the potential impact of standard clinical interventions. Notably, genetic risk influences the extent to which these interventions reduce overall CAD risk, allowing for tailored prevention strategies.
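The two-cohort design described in the abstract can be illustrated with a deliberately small sketch. It is a toy stand-in under stated assumptions, not the authors' pipeline: the data values are made up, the single polygenic score and LDL level are hypothetical placeholders, and a least-squares line plus a one-feature logistic regression stand in for whatever learners the published code uses. It shows only the structure: a baseline model fitted in one cohort produces a meta-feature that a second model consumes in a disjoint cohort.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def fit_logistic_1d(xs, ys, lr=0.1, epochs=2000):
    """One-feature logistic regression by batch gradient descent; returns (w0, w1)."""
    w0 = w1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        errs = [sigmoid(w0 + w1 * x) - y for x, y in zip(xs, ys)]
        w0 -= lr * sum(errs) / n
        w1 -= lr * sum(e * x for e, x in zip(errs, xs)) / n
    return w0, w1

# Stage 1 (hypothetical "prevalent" cohort): a baseline model predicts a
# risk-factor level (here an LDL-like value) from a polygenic score.
prs_prev = [-1.0, -0.5, 0.0, 0.5, 1.0]   # made-up values
ldl_prev = [2.6, 3.0, 3.4, 3.9, 4.1]     # made-up values
a, b = fit_line(prs_prev, ldl_prev)

# Stage 2 (disjoint hypothetical "incident" cohort): the baseline model's
# prediction becomes a meta-feature for the final incident-risk model.
prs_inc = [-1.2, -0.8, -0.3, 0.2, 0.7, 1.1]  # made-up values
cad_inc = [0, 0, 0, 1, 0, 1]                 # made-up outcomes
meta = [a + b * p for p in prs_inc]
center = sum(meta) / len(meta)               # center the meta-feature for stable fitting
w0, w1 = fit_logistic_1d([m - center for m in meta], cad_inc)

def predicted_risk(prs):
    """Final-model risk for a new individual, routed through the baseline model."""
    return sigmoid(w0 + w1 * (a + b * prs - center))
```

Calling `predicted_risk` with a higher polygenic score returns a higher risk in this toy setting, because the fitted meta-feature weight is positive.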

**Fig. 1: Overview of cohort construction, model development and performance assessment for 10-year incident CAD risk meta-prediction in the UKBB.**

**Fig. 2: Comparative performance of meta-prediction stratified by standard risk factors in the UKBB population.**

**Fig. 3: SHAP summary plot of features in the meta-prediction framework in the UKBB population.**

**Fig. 4: Identification of CAD risk subgroups and distinguishing features in the UKBB population.**

**Fig. 5: Benefit of clinical interventions in genetic risk and risk subgroups in the UKBB population.**

Data availability

All data are made available from the UKBB⁵⁵ (https://www.ukbiobank.ac.uk/enable-your-research/apply-for-access) and All of Us research program⁷⁴ (https://workbench.researchallofus.org/login) to researchers from universities and other institutions with genuine research inquiries following institutional review board and biobank approval. This research has been conducted using the UKBB resource under application number 41999 and the All of Us v7 Curated Data Repository (R2022Q4R9 and C2022Q4R9 versions).

Code availability

The machine learning code used to generate the meta-predictions is available via GitHub at http://github.com/TorkamaniLab/CAD_meta_prediction.

Change history

19 August 2025
A Correction to this paper has been published: https://doi.org/10.1038/s41591-025-03925-y

References

Damask, A. et al. Patients with high genome-wide polygenic risk scores for coronary artery disease may receive greater clinical benefit from alirocumab treatment in the ODYSSEY OUTCOMES trial. Circulation https://doi.org/10.1161/CIRCULATIONAHA.119.044434 (2020).
Marston, N. A. et al. Predicting benefit from evolocumab therapy in patients with atherosclerotic disease using a genetic risk score. Circulation https://doi.org/10.1161/CIRCULATIONAHA.119.043805 (2020).
Mega, J. L. et al. Genetic risk, coronary heart disease events, and the clinical benefit of statin therapy: an analysis of primary and secondary prevention trials. Lancet 385, 2264–2271 (2015).
Natarajan, P. et al. Polygenic risk score identifies subgroup with higher burden of atherosclerosis and greater relative benefit from statin therapy in the primary prevention setting. Circulation 135, 2091–2101 (2017).
Bolli, A., Di Domenico, P., Pastorino, R., Busby, G. B. & Bottà, G. Risk of coronary artery disease conferred by low-density lipoprotein cholesterol depends on polygenic background. Circulation 143, 1452–1454 (2021).
Ye, Y. et al. Interactions between enhanced polygenic risk scores and lifestyle for cardiovascular disease, diabetes, and lipid levels. Circ. Genom. Precis. Med. 14, E003128 (2021).
Muse, E. D. et al. Impact of polygenic risk communication: an observational mobile application-based coronary artery disease study. NPJ Digit. Med. 5, 30 (2022).
Hollands, G. J. et al. The impact of communicating genetic risks of disease on risk-reducing health behaviour: systematic review with meta-analysis. BMJ 352, i1102 (2016).
Widén, E. et al. How communicating polygenic and clinical risk for atherosclerotic cardiovascular disease impacts health behavior: an observational follow-up study. Circ. Genom. Precis. Med. 15, E003459 (2022).
Knowles, J. W. et al. Impact of a genetic risk score for coronary artery disease on reducing cardiovascular risk: a pilot randomized controlled study. Front. Cardiovasc. Med. 4, 53 (2017).
Maamari, D. J. et al. Clinical implementation of combined monogenic and polygenic risk disclosure for coronary artery disease. JACC Adv. 1, 100068 (2022).
Roberts, M. C., Khoury, M. J. & Mensah, G. A. Perspective: The clinical use of polygenic risk scores: race, ethnicity, and health disparities. Ethn. Dis. 29, 513–516 (2019).
Lewis, A. C. F. & Green, R. C. Polygenic risk scores in the clinic: new perspectives needed on familiar ethical issues. Genome Med. 13, 14 (2021).
Martens, F. K., Tonk, E. C. M. & Janssens, A. C. J. W. Evaluation of polygenic risk models using multiple performance measures: a critical assessment of discordant results. Genet. Med. 21, 391–397 (2019).
Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).
Khan, S. S. et al. Coronary artery calcium score and polygenic risk score for the prediction of coronary heart disease events. JAMA 329, 1768–1777 (2023).
Mosley, J. D. et al. Predictive accuracy of a polygenic risk score compared with a clinical risk score for incident coronary heart disease. JAMA 323, 627–635 (2020).
Wünnemann, F. et al. Validation of genome-wide polygenic risk scores for coronary artery disease in French Canadians. Circ. Genom. Precis. Med. 12, e002481 (2019).
Murthy, V. L. et al. Polygenic risk, fitness, and obesity in the Coronary Artery Risk Development in Young Adults (CARDIA) study. JAMA Cardiol. 5, 263–271 (2020).
Wells, Q. S. et al. Polygenic risk score to identify subclinical coronary heart disease risk in young adults. Circ. Genom. Precis. Med. 14, e003341 (2021).
Marston, N. A. et al. Predictive utility of a coronary artery disease polygenic risk score in primary prevention. JAMA Cardiol. 8, 130–137 (2023).
Isgut, M., Sun, J., Quyyumi, A. A. & Gibson, G. Highly elevated polygenic risk scores are better predictors of myocardial infarction risk early in life than later. Genome Med. 13, 13 (2021).
Mars, N. et al. Polygenic and clinical risk scores and their impact on age at onset and prediction of cardiometabolic diseases and common cancers. Nat. Med. 26, 549–557 (2020).
Khera, A. V. et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 50, 1219–1224 (2018).
Elliott, J. et al. Predictive accuracy of a polygenic risk score-enhanced prediction model vs a clinical risk score for coronary artery disease. JAMA 323, 636–645 (2020).
Aragam, K. G. et al. Limitations of contemporary guidelines for managing patients at high genetic risk of coronary artery disease. J. Am. Coll. Cardiol. 75, 2769–2780 (2020).
Aragam, K. G. & Natarajan, P. Polygenic scores to assess atherosclerotic cardiovascular disease risk: clinical perspectives and basic implications. Circ. Res. 126, 1159–1177 (2020).
Sun, L. et al. Polygenic risk scores in cardiovascular risk prediction: a cohort study and modelling analyses. PLoS Med. 18, e1003498 (2021).
Riveros-Mckay, F. et al. Integrated polygenic tool substantially enhances coronary artery disease prediction. Circ. Genom. Precis. Med. 14, E003304 (2021).
Hindy, G. et al. Abstract 16565: Integration of a genome-wide polygenic score with ACC/AHA pooled cohorts equation in prediction of coronary artery disease events in >285,000 participants. Circulation 140, abstr. 16565 (2019).
Inouye, M. et al. Genomic risk prediction of coronary artery disease in 480,000 adults: implications for primary prevention. J. Am. Coll. Cardiol. 72, 1883–1893 (2018).
Ntalla, I. et al. Genetic risk score for coronary disease identifies predispositions to cardiovascular and noncardiovascular diseases. J. Am. Coll. Cardiol. 73, 2932–2942 (2019).
Abraham, G. et al. Genomic risk score offers predictive performance comparable to clinical risk factors for ischaemic stroke. Nat. Commun. 10, 5819 (2019).
Lin, J. et al. Integration of biomarker polygenic risk score improves prediction of coronary heart disease. JACC Basic Transl Sci. 8, 1489–1499 (2023).
Vassy, J. L. et al. Cardiovascular disease risk assessment using traditional risk factors and polygenic risk scores in the Million Veteran Program. JAMA Cardiol. 8, 564–574 (2023).
Agrawal, S. et al. Selection of 51 predictors from 13,782 candidate multimodal features using machine learning improves coronary artery disease prediction. Patterns 2, 100364 (2021).
Patel, A. P. et al. A multi-ancestry polygenic risk score improves risk prediction for coronary artery disease. Nat. Med. https://doi.org/10.1038/s41591-023-02429-x (2023).
Torkamani, A., Andersen, K. G., Steinhubl, S. R. & Topol, E. J. High-definition medicine. Cell 170, 828–843 (2017).
Topol, E. J. High-performance medicine: the convergence of human and artificial intelligence. Nat. Med. 25, 44–56 (2019).
Acosta, J. N., Falcone, G. J., Rajpurkar, P. & Topol, E. J. Multimodal biomedical AI. Nat. Med. 28, 1773–1784 (2022).
Gola, D., Erdmann, J., Müller-Myhsok, B., Schunkert, H. & König, I. R. Polygenic risk scores outperform machine learning methods in predicting coronary artery disease status. Genet. Epidemiol. 44, 125–138 (2020).
Xu, Y. et al. A machine learning model for disease risk prediction by integrating genetic and non-genetic factors. In Proc. 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (eds Adjeroh, D. et al.) 868–871 (IEEE, 2022).
Gibson, G. On the utilization of polygenic risk scores for therapeutic targeting. PLoS Genet. 15, e1008060 (2019).
Goddard, K. A. B., Lee, K., Buchanan, A. H., Powell, B. C. & Hunter, J. E. Establishing the medical actionability of genomic variants. Annu. Rev. Genomics Hum. Genet. 23, 173–192 (2022).
Wang, Y., Tsuo, K., Kanai, M., Neale, B. M. & Martin, A. R. Challenges and opportunities for developing more generalizable polygenic risk scores. Annu. Rev. Biomed. Data Sci. 5, 293–320 (2022).
Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186 (2017).
Wray, N. R., Wijmenga, C., Sullivan, P. F., Yang, J. & Visscher, P. M. Common disease is more complex than implied by the core gene omnigenic model. Cell https://doi.org/10.1016/j.cell.2018.05.051 (2018).
Mathieson, I. The omnigenic model and polygenic prediction of complex traits. Am. J. Hum. Genet. 108, 1558–1563 (2021).
Aragam, K. G. et al. Discovery and systematic characterization of risk variants and genes for coronary artery disease in over a million participants. Nat. Genet. 54, 1803–1815 (2022).
Tcheandjieu, C. et al. Large-scale genome-wide association study of coronary artery disease in genetically diverse populations. Nat. Med. 28, 1679–1692 (2022).
You, J. et al. Development of machine learning-based models to predict 10-year risk of cardiovascular disease: a prospective cohort study. Stroke Vasc. Neurol. 8, 475–485 (2023).
Zeitouni, M. et al. Performance of guideline recommendations for prevention of myocardial infarction in young adults. J. Am. Coll. Cardiol. 76, 653–664 (2020).
De Filippis, A. P. et al. Risk score overestimation: the impact of individual cardiovascular risk factors and preventive therapies on the performance of the American Heart Association–American College of Cardiology–Atherosclerotic Cardiovascular Disease risk score in a modern multi-ethnic cohort. Eur. Heart J. 38, 598–608 (2017).
Livingstone, S. et al. Effect of competing mortality risks on predictive performance of the QRISK3 cardiovascular risk prediction tool in older people and those with comorbidity: external validation population cohort study. Lancet Healthy Longev. 2, e352–e361 (2021).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Patel, A. P., Wang, M., Kartoun, U., Ng, K. & Khera, A. V. Quantifying and understanding the higher risk of atherosclerotic cardiovascular disease among South Asian individuals: results from the UK Biobank prospective cohort study. Circulation 144, 410–422 (2021).
Goff, D. C. et al. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association Task Force on practice guidelines. Circulation 129, 49–73 (2014).
Hippisley-Cox, J., Coupland, C. & Brindle, P. Development and validation of QRISK3 risk prediction algorithms to estimate future risk of cardiovascular disease: prospective cohort study. BMJ 357, j2099 (2017).
CatBoost Encoder Category Encoders 2.6.3 documentation; https://contrib.scikit-learn.org/category_encoders/catboost.html
Wilson, S. miceRanger: multiple imputation by chained equations with random forests. R version 4.0.0 https://cran.r-project.org/package=miceRanger (2021).
Khan, S. S. et al. Novel prediction equations for absolute risk assessment of total cardiovascular disease incorporating cardiovascular–kidney–metabolic health: a scientific statement from the American Heart Association. Circulation https://doi.org/10.1161/CIR.0000000000001191 (2023).
Arnett, D. K. et al. 2019 ACC/AHA guideline on the primary prevention of cardiovascular disease: executive summary: a report of the American College of Cardiology/American Heart Association Task Force on clinical practice guidelines. Circulation 140, e563–e595 (2019).
Chen, S. F. et al. Genotype imputation and variability in polygenic risk score estimation. Genome Med. 12, 100 (2020).
McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 48, 1279–1283 (2016).
Nielsen, J. B. et al. Biobank-driven genomic discovery yields new insight into atrial fibrillation biology. Nat. Genet. https://doi.org/10.1038/s41588-018-0171-3 (2018).
Evangelou, E. et al. Genetic analysis of over 1 million people identifies 535 new loci associated with blood pressure traits. Nat. Genet. 50, 1412–1425 (2018).
Lam, M. et al. Comparative genetic architectures of schizophrenia in East Asian and European populations. Nat. Genet. 51, 1670–1678 (2019).
Jansen, I. E. et al. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nat. Genet. 51, 404–413 (2019).
Conti, D. V. et al. Trans-ancestry genome-wide association meta-analysis of prostate cancer identifies new susceptibility loci and informs genetic risk prediction. Nat. Genet. 53, 65–75 (2021).
Mahajan, A. et al. Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps. Nat. Genet. 50, 1505–1513 (2018).
Lambert, S. A. et al. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nat. Genet. 53, 420–425 (2021).
Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2, 56–67 (2020).
The All of Us Research Program Genomics Investigators. Genomic data in the All of Us Research Program. Nature 627, 340–346 (2024).
Khattab, A., Chen, S.-F., Wineinger, N. & Torkamani, A. AoUPRS: a cost-effective and versatile PRS calculator for the All of Us program. Preprint at bioRxiv https://doi.org/10.1101/2024.07.11.603165 (2024).
Norgeot, B. et al. Minimum information about clinical artificial intelligence modeling: the MI-CLAIM checklist. Nat. Med. https://doi.org/10.1038/s41591-020-1041-y (2020).
Collins, G. S., Reitsma, J. B., Altman, D. G. & Moons, K. G. M. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Ann. Intern. Med. 162, 55–63 (2015).

Acknowledgements

We thank J. C. Ducom, L. Dong and the Scripps High-Performance Computing service for their support. Thanks to E. Topol for his comments on this paper. This work is supported by R01HG010881 to A.T. as well as grant UM1TR004407. We recognize that some of the factors labeled unmodifiable in this paper may be modifiable in some circumstances.

Author information

Authors and Affiliations

Scripps Research Translational Institute, La Jolla, CA, USA
Shang-Fu Chen, Hossein Javedani Sadaei, Ahmed Khattab, Corneliu Henegar, Nathan E. Wineinger, Evan D. Muse & Ali Torkamani
Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, CA, USA
Shang-Fu Chen, Hossein Javedani Sadaei, Ahmed Khattab, Corneliu Henegar, Nathan E. Wineinger, Evan D. Muse & Ali Torkamani
Department of Cardiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
Sang Eun Lee
Department of Internal Medicine, Seoul National University Hospital, Seoul, Republic of Korea
Jun-Bean Park
Cardiovascular Center, Seoul National University Hospital, Seoul, Republic of Korea
Jun-Bean Park
Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Jie-Fu Chen
Scripps Clinic, La Jolla, CA, USA
Evan D. Muse

Contributions

Concept and design: A.T. Acquisition, analysis or interpretation of data: A.T., S.-F.C., S.E.L., H.J.S., C.H., J.-F.C. and N.E.W. Drafting of the paper: A.T., S.-F.C., J.-B.P. and A.K. Critical revision of the paper for important intellectual content: A.T., J.-B.P. and E.D.M.

Corresponding author

Correspondence to Ali Torkamani.

Ethics declarations

Competing interests

A.T. declares that he is a cofounder and equity shareholder of GeneXwell Inc. A.T. is an advisor to InsideTracker. The other authors declare no competing interests.

Peer review

Peer review information

Nature Medicine thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: Michael Basson, in collaboration with the Nature Medicine team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Feature importance and SHAP summary for 10-year prospective CAD risk prediction in the UK Biobank.

This composite figure provides two views of the importance of the final 50 features used in prospective 10-year CAD meta-prediction for the incident CAD cohort (n = 33,419). The left panel displays a bar plot ranking the features in order of importance as quantified by the mean absolute SHAP value. The length of each bar represents the magnitude of a feature’s importance. The right panel presents a SHAP summary plot with each point representing the feature’s SHAP value at an individual level. Color coding represents the feature value (red for high, blue for low). Positive values correspond to the prediction contribution to the positive class, and negative values correspond to the prediction contribution to the negative class. Abbreviations: Dx: Diagnosis; MUF: modifiable and unmodifiable factors; UF: unmodifiable factors.
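The ranking used in the left panel, mean absolute SHAP value per feature, can be sketched as follows. This is a minimal illustration that assumes the SHAP values have already been computed by some explainer; the feature names and numbers are hypothetical and are not taken from the paper.

```python
from statistics import fmean

def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Return (feature, mean |SHAP|) pairs sorted from most to least important."""
    importance = {
        name: fmean(abs(row[j]) for row in shap_rows)
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)

# Toy SHAP matrix: 3 participants x 3 hypothetical features (made-up values).
names = ["age", "ldl_prs", "systolic_bp"]
shap_rows = [
    [0.30, -0.10, 0.05],
    [-0.20, 0.20, -0.05],
    [0.40, -0.10, 0.00],
]
ranking = rank_by_mean_abs_shap(shap_rows, names)
```

Taking the absolute value before averaging matters: a feature that pushes risk up for some participants and down for others would otherwise average toward zero and appear unimportant.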

Extended Data Fig. 2 Evaluating the calibration and predictive value of feature categories for the meta-prediction model in the UK Biobank.

This figure evaluates the performance of the meta-prediction model compared to existing clinical risk scores within the test set of the incident CAD cohort (n = 33,419), highlighting both the model’s calibration and the predictive contribution of distinct feature categories. a. Calibration plot showing the predicted vs. observed 10-year CAD risk by decile. Brier’s scores are provided as a summary calibration measure. Conventional risk scores were re-calibrated using the same approach within the UK Biobank training cohort. b. Predictive performance of individual feature categories and their combinations in the meta-prediction model. The left panel illustrates the Area Under the Receiver Operating Characteristic (AUROC) for various models. The right panel shows the Area Under the Precision-Recall Curve (AUPRC). The models compared include the full meta-prediction model, partial models restricted to a single feature class and feature class combination, including 15 meta-features; 22 polygenic risk scores (PRSs) and 13 modifiable risk factors (MFs); 13 MFs; sex and age with 12 PRSs; 12 PRSs; and sex and age alone. Abbreviation: MF: Modifiable factors.
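For readers unfamiliar with the calibration summaries in panel a, the sketch below computes a Brier score (mean squared difference between predicted probability and observed outcome) and a binned predicted-versus-observed table. It is a generic illustration with made-up data; the function names are hypothetical, and the paper's own recalibration procedure is not reproduced here.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def calibration_by_bin(y_true, y_prob, n_bins=10):
    """Sort by predicted risk, split into n_bins near-equal groups, and
    return (mean predicted risk, observed event rate) for each group."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    table = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # fewer samples than bins
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        table.append((mean_pred, observed))
    return table

# Made-up predictions and outcomes for eight participants.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
score = brier_score(y_true, y_prob)
halves = calibration_by_bin(y_true, y_prob, n_bins=2)
```

With `n_bins=10` the same function yields the decile view described in the caption; a well-calibrated model has each pair lying close to the diagonal.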

Extended Data Fig. 3 Comparative performance of CAD risk prediction models in the UK Biobank.

This figure compares the predictive performance of our final prospective 10-year CAD meta-prediction model to established clinical risk scores, including PCE, QRISK3 and PREVENT; previous polygenic score benchmarks, including GPS_CAD, metaGRS_CAD, Aragam₂₀₂₂, Tcheandjieu₂₀₂₂ and GPS_Mult; and ML models, including ML4H_EN-COX and UKBCRP, using the incident CAD cohort (n = 33,419). For each model, the left panel depicts a scatter plot illustrating the incidence of CAD events across percentile bins of predicted risk, showcasing the predictive density of each model at 10 years of follow-up, while the right panel displays the 10-year cumulative risk trajectories for each risk prediction model, highlighting the ability of each model to stratify risk across time. Shaded regions in the right panel represent 95% confidence intervals (CI). In all cases, meta-prediction significantly outperforms prior approaches.

Extended Data Fig. 4 SHAP summary plots for meta-features in the final model in the UK Biobank.

This figure presents a comprehensive collection of SHAP summary plots for each of the 15 meta-features integrated into the meta-prediction for the incident CAD cohort (n = 33,419). Each subplot provides insight into the contribution of individual baseline features towards the prediction of each meta-feature. The color coding indicates each feature value (red = high, blue = low), with the SHAP value on the y-axis reflecting the impact on the model’s output (positive = contribution to the positive class, negative = contribution to the negative class). Abbreviations: Dx: Diagnosis; MUF: modifiable and unmodifiable factors; UF: unmodifiable factors.

Extended Data Fig. 5 External validation of the streamlined meta-prediction.

Comparative test accuracy for the streamlined meta-prediction model in UKBB (AUROC = 0.81) versus AoU external validation, with bootstrapped 95% confidence intervals (CIs) shown as shaded regions. a. Tested on the larger AoU validation cohort (AUROC = 0.81), further stratified by self-reported European (EUR), African (AFR) and Hispanic (HIS) groups: AoU-EUR (AUROC = 0.81), AoU-AFR (AUROC = 0.79) and AoU-HIS (AUROC = 0.84), respectively. b. Tested in the AoU sub-cohort with complete phenotypes (AUROC = 0.78). Additional scores tested in AoU include PCE (AUC = 0.72), QRISK3 (AUC = 0.73) and PREVENT (AUC = 0.73). Abbreviations: AoU: All of Us research program; AUC: area under curve; UKBB: UK Biobank.
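A bootstrapped confidence interval for an AUROC can be obtained roughly as sketched below. This is a generic percentile bootstrap on made-up data, offered as an assumption about the kind of procedure involved rather than a description of the authors' implementation; the function names, resample count and seed are illustrative choices.

```python
import random

def auroc(y_true, y_score):
    """Probability that a random case outscores a random non-case (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auroc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample participants with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auroc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[round((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)]
    return lower, upper

# Made-up outcomes and risk scores for eight participants.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
point = auroc(y_true, y_score)
lower, upper = bootstrap_auroc_ci(y_true, y_score)
```

With so few participants the interval is very wide; cohort sizes like those reported for AoU would give far narrower bands.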

Extended Data Fig. 6 SHAP explanation of streamlined meta-prediction in UK Biobank and All of Us research program.

Both panels display all 50 features contributing to the streamlined meta-prediction. The vertical axis orders each feature by its overall importance to risk prediction. Each point represents a participant and is color-coded according to the feature’s direction of contribution to the individual’s risk prediction (red = increased risk, blue = decreased risk). The value associated with each point on the x-axis represents the magnitude of its contribution to the individual’s risk prediction. The left panel presents the results from the test set of UK Biobank (n = 33,419). The right panel presents the results from the external validation set of All of Us research program (n = 198,424). Abbreviations: Dx: Diagnosis; MUF: modifiable and unmodifiable factors; UF: unmodifiable factors.

Extended Data Fig. 7 Overview of generalizable genetic meta-prediction model in the UK Biobank.

Results are shown for the incident CAD cohort (n = 33,419). a. Calibration plot of the generalizable genetic meta-prediction model for predicted vs observed risk by decile within the test set of the incident CAD cohort (Brier’s score of generalizable genetic meta-prediction: 0.073). b. Cumulative risk curve of CAD (%) development over the 10-year follow-up period stratified by percentile of predicted risk. c. Incidence rates of CAD observed across the test cohort, stratified by percentile of predicted risk. Data are presented as mean values ± SD. d. and e. Comparative test accuracy (n = 33,419) for the generalizable genetic meta-prediction model (AUROC = 0.80; AUPRC = 0.28) versus other standard clinical and research risk scores, including PCE (AUROC = 0.73; AUPRC = 0.20), QRISK3 (AUROC = 0.74; AUPRC = 0.22), PREVENT (AUROC = 0.72; AUPRC = 0.19) and GPS_CAD (AUROC = 0.73; AUPRC = 0.20). Abbreviations: AUC: Area under curve; CAD: coronary artery disease; EHR: electronic health records; PCE: pooled cohort equations.

Extended Data Fig. 8 Feature importance and SHAP summary for 10-year prospective CAD risk prediction of generalizable genetic model in the UK Biobank.

This composite figure provides two views of the importance of the final 50 features used in prospective 10-year CAD meta-prediction of generalizable genetic model for the incident CAD cohort (n = 33,419). The left panel displays a bar plot ranking the features in order of importance as quantified by the mean absolute SHAP value. The length of each bar represents the magnitude of a feature’s importance. The right panel presents a SHAP summary plot with each point representing the feature’s SHAP value at an individual level. Color coding represents the feature value (red for high, blue for low). Positive values correspond to the prediction contribution to the positive class, and negative values correspond to the prediction contribution to the negative class. Abbreviations: Dx: Diagnosis; MUF: modifiable and unmodifiable factors; UF: unmodifiable factors.

Extended Data Fig. 9 Feature distribution pre- and post-imputation in the UK Biobank.

This figure demonstrates the preservation of variable distributions pre- and post- imputation, stratified by sex. For numeric features, density plots are presented, with bolded edges representing the distribution pre-imputation and light edges post-imputation. For categorical features, two sets of stacked histograms are presented side by side for comparison: on the left are the original distributions pre-imputation, and on the right are the distributions post-imputation of missing data.

Extended Data Table 1 Baseline characteristics of the UK Biobank participants in the study (n = 339,667)

Supplementary information

Reporting Summary

Supplementary Tables

Supplementary Tables 1–27.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, SF., Lee, S.E., Sadaei, H.J. et al. Meta-prediction of coronary artery disease risk. Nat Med 31, 2277–2288 (2025). https://doi.org/10.1038/s41591-025-03648-0

Received: 01 December 2023
Accepted: 07 March 2025
Published: 16 April 2025
Version of record: 16 April 2025
Issue date: July 2025
DOI: https://doi.org/10.1038/s41591-025-03648-0
