Abstract
Polygenic risk scores (PRSs) can improve clinical risk tools for coronary artery disease (CAD). This study assessed a risk model integrating PRS across populations, focusing on individuals with borderline/intermediate clinical risk. We developed ancestry-specific ensemble models combining multi-ancestry PRSs for CAD and type 2 diabetes. The cross-ancestry PRS (caPRS) was integrated with the Pooled Cohort Equations (PCE) to derive the cross-ancestry Integrated Risk Score (caIRS), estimating 10-year CAD risk. The caIRS outperformed the PCE across four cohorts, including UK Biobank and Penn Medicine Biobank, with significant improvements for Hispanic and South Asian individuals. For those at borderline/intermediate PCE risk (5–20%), the caIRS reclassified between 7.0% and 10.7% into the high-risk group, which had higher CAD incidence and hazard ratios ranging from 3.20 to 3.84. The CAD caIRS, combining genetic and clinical factors, enhances high-risk CAD identification across diverse populations, potentially improving treatment guidance.
Introduction
Coronary artery disease (CAD) is the leading cause of death in the United States1. An estimated 20.5 million Americans 20 years of age or older have prevalent CAD2. Development of CAD is influenced by several factors, including age, sex, genetics, lifestyle, and comorbidities3. However, CAD is preventable, making it a focus of public health efforts to reduce risk through lifestyle therapies and clinical intervention3.
The American College of Cardiology/American Heart Association (ACC/AHA) Task Force on Clinical Practice Guidelines currently recommends using the Pooled Cohort Equations (PCE) Atherosclerotic Cardiovascular Disease (ASCVD) risk tool to estimate 10-year risk of a first ASCVD event and to guide decisions on preventive interventions in asymptomatic adults 40–75 years of age4,5. The PCE model includes a limited number of established cardiovascular risk factors: age, race, sex, systolic blood pressure, total cholesterol level, high-density lipoprotein cholesterol (HDL-C) level, diabetes status, and smoking status4. However, it does not take into account family history or genetic risk and only has two categories of race, Black/African-American and White4. It is also limited by its focus on 10-year risk and does not include individuals younger than 40, both of which make it less useful in younger adults.
The calibration of the original PCE model has been demonstrated to vary depending on the target cohort and its characteristics6,7. As a consequence, current guidelines recommend considering additional risk-enhancing factors to guide preventive interventions in cases of borderline (5–7.5%) and intermediate (7.5–20%) PCE risk5. Polygenic risk scores (PRSs), which aggregate information about genetic liability of disease across thousands or millions of genetic variants, have the potential to improve the accuracy of clinical risk prediction tools such as the PCE8. Recent studies integrating a CAD PRS with traditional risk factors have shown improved risk prediction for CAD9,10, and PRSs have been recognized as potential risk-enhancing factors for CAD risk prediction in a recent AHA Scientific Statement11. Nevertheless, generalizability across cohorts and populations remains a major challenge for CAD PRS models. A major factor limiting generalizability has been the paucity of large genetic studies among individuals with diverse genetic backgrounds, and most studies to date have validated their models largely in non-US-based populations.
We sought to enhance the performance of CAD risk prediction across diverse populations by constructing a cross-ancestry Integrated Risk Score (caIRS) model. This model combines established clinical risk factors with a cross-ancestry PRS (caPRS) to account for population differences often overlooked in traditional PRS models. The caPRS methodology builds on a previously described model for breast cancer risk prediction12, which leverages continuous ancestry estimates and population-specific effect size estimates to better tailor risk prediction for genetically diverse populations; here, we further improve that approach by replacing single PRS models with optimized PRS ensemble scores. The caIRS provides a unified framework that integrates genetic predisposition and traditional clinical predictors. We evaluated the caPRS and caIRS using 4 independent validation cohorts, including a contemporary US-based cohort from the Penn Medicine Biobank (PMBB). We specifically assessed the utility of the caIRS as a screening tool to identify high-risk individuals who have uncertain risk based on traditional risk assessments for CAD and may benefit from early intervention.
Results
We validated the performance of the CAD caPRS and caIRS in prospectively predicting the 10-year risk of incident CAD using 4 independent, multi-ancestry validation cohorts and compared the predictive performance of the caIRS to that of the PCE model. The corresponding validation cohort characteristics are presented in Table 1.
Predictive power of the caPRS in 10-year incident CAD risk prediction across ethnicities
To evaluate the predictive performance of the caPRS in estimating the 10-year risk of incident CAD, we used a Cox PH model, adjusting for baseline age and sex within each validation cohort and distinct self-reported ethnicity groups, analyzed independently.
The association between caPRS and 10-year CAD incidence was statistically significant at the 0.05 level across all 4 validation cohorts. The overall HR per SD ranged from 1.41 to 1.79 with a corresponding C-index from 0.72 to 0.77 (Fig. 1a and b). The caPRS was significantly associated with incident CAD across all (self-reported) ethnic subgroups. The strongest associations, observed in Hispanic individuals (HR per SD: 1.69; 95% CI, 1.24–2.30), East Asian (Asian American) individuals (HR per SD: 1.77; 95% CI, 1.62–1.93), and South Asian individuals (HR per SD: UKB, 1.82; 95% CI, 1.43–2.32), were comparable in magnitude to the effects observed in the White population (HR per SD ranging from 1.47 to 1.82). However, the association was weaker in Black/African-American individuals, with an HR per SD of 1.35 (95% CI, 1.07–1.80).
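As an illustration of the discrimination metric reported above, the following minimal sketch computes Harrell's C-index for right-censored follow-up data in plain Python. It is not the analysis code used in this study: the function name and the toy inputs are hypothetical, ties in follow-up time are simply skipped, and the risk score passed in stands in for the linear predictor of a fitted Cox PH model.

```python
def harrell_c_index(times, events, scores):
    """Harrell's C for right-censored data.

    times  : follow-up time per individual
    events : 1 if CAD was observed at `times[i]`, 0 if censored
    scores : risk score (higher means higher predicted risk)
    A pair (i, j) is comparable when i had an observed event and j was
    still under follow-up after that event time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # pairs are anchored on an observed event
        for j in range(n):
            if i == j or times[j] <= times[i]:
                continue  # simplification: tied times are not compared
            comparable += 1
            if scores[i] > scores[j]:
                concordant += 1.0
            elif scores[i] == scores[j]:
                concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): earlier events carry higher scores.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.7, 0.4, 0.1]
toy_c = harrell_c_index(toy_times, toy_events, toy_scores)
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect ranking of event times by the score.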
Fig. 1. a caPRS hazard ratio (HR) adjusted for age and sex and b the corresponding C-index of the Cox PH model. Values on the left correspond to the plotted point estimates and their 95% CI. Cohort labels include the count of incident cases and the total. META corresponds to the meta-analysis of underrepresented ethnicities. c, d PRS-stratified cumulative incidence of CAD among the borderline/intermediate (PCE) risk group for the White and African-American populations, respectively.
We additionally assessed PRS-based risk stratification among individuals at uncertain (borderline/intermediate) clinical risk (PCE) (Fig. 1c and d). Within the White population, high caPRS (top 20%) was associated with a 10-year CAD incidence of 27.5% (95% CI, 23.2–31.5), compared with 14.0% (95% CI, 12.0–15.9) and 8.9% (95% CI, 6.5–11.2) for those with average (mid 40–60%) and low (bottom 20%) PRS, respectively. In the Black/African-American population, high PRS was associated with a CAD incidence of 14.6% (95% CI, 9.6–19.4), compared with 8.8% (95% CI, 6.5–11.1) and 5.0% (95% CI, 1.4–8.4) for those with average and low PRS, respectively.
We observed a similar stratification among individuals at <5% and ≥20% clinical risk, with individuals in the top 20% of caPRS having a higher incidence of disease overall in the validation cohorts (Supplementary Figs. 1 and 2).
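The PRS strata used above (bottom 20%, middle 40–60%, top 20%) can be sketched as a simple percentile assignment. This is only an illustration: the function name is hypothetical, and it assumes percentiles are computed within the set of scores passed in, whereas the reference distribution used in the study is not specified in this section.

```python
def prs_strata(scores):
    """Label each score as 'low' (bottom 20%), 'average' (40-60%),
    'high' (top 20%), or 'other', using mid-rank percentiles."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = ["other"] * n
    for rank, i in enumerate(order):
        pct = (rank + 0.5) / n  # mid-rank percentile in (0, 1)
        if pct < 0.20:
            labels[i] = "low"
        elif 0.40 <= pct < 0.60:
            labels[i] = "average"
        elif pct >= 0.80:
            labels[i] = "high"
    return labels


# Toy scores (hypothetical), deliberately unsorted.
toy_prs = [5, 0, 9, 4, 1, 8, 2, 7, 3, 6]
toy_labels = prs_strata(toy_prs)
```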
In head-to-head comparison, the caPRS model demonstrated improved performance compared to the recently developed multiancestry GPSMult9 model (and other models from PGS catalog), specifically within East Asian/Asian American and Black/African-American populations, with equivalent performance to GPSMult in pooled analysis across all populations (Supplementary Fig. 3).
Evaluation of the caIRS in comparison to PCE
We developed the caIRS model by combining the caPRS with the PCE, which serves as the current standard for assessing ASCVD risk within the US. We compared the performance of the caIRS to the PCE alone in the identification of individuals at high risk of developing CAD over a 10-year period.
The caIRS model consistently outperformed the baseline PCE across all validation cohorts, exhibiting improvements in discrimination as reflected by the C-index. The improvements ranged from 1.5 percentage points in MESA to 3 percentage points in UKB (Table 2). When using a fixed classification threshold of 20%, corresponding to the ACC/AHA “high risk” threshold5, we observed a consistent increase in sensitivity, PPV, and NPV across all validation cohorts with a marginal decrease in specificity, except in MESA (Table 2). This improvement was further corroborated by the positive and significant NRI values, which ranged from 5.9% (95% CI, 1.2–11.3) in MESA to 9.8% (95% CI, 8.3–11.5) in UKB (Table 2). The caIRS also outperformed the baseline PCE model when using lower classification thresholds of 7.5% and 10%, with the exception of the MESA cohort, where the overall NRI did not reach statistical significance (Supplementary Table 1). These alternative risk thresholds correspond to the recommendations for statin initiation in the US Preventive Services Task Force Recommendation Statement (7.5%)13 and the European Prevention Guidelines (10%)14.
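To make the threshold-based metrics above concrete, the sketch below computes sensitivity, specificity, PPV, and NPV at a fixed risk cutoff, together with a two-category net reclassification index (NRI). It assumes the standard categorical NRI definition and treats the 10-year outcome as a binary label, ignoring censoring; the exact estimator used for Table 2 may differ. All function names and toy values are hypothetical.

```python
def _ratio(num, den):
    # Avoid ZeroDivisionError when a cell of the 2x2 table is empty.
    return num / den if den else float("nan")


def classification_metrics(risks, outcomes, threshold=0.20):
    """Confusion-matrix metrics for 'high risk' defined as risk >= threshold."""
    tp = fp = tn = fn = 0
    for r, y in zip(risks, outcomes):
        high = r >= threshold
        if high and y:
            tp += 1
        elif high and not y:
            fp += 1
        elif not high and y:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def net_reclassification_index(old_risks, new_risks, outcomes, threshold=0.20):
    """Two-category NRI: net upward moves among events plus
    net downward moves among non-events."""
    up_e = down_e = up_n = down_n = n_e = n_n = 0
    for old, new, y in zip(old_risks, new_risks, outcomes):
        moved_up = new >= threshold and old < threshold
        moved_down = new < threshold and old >= threshold
        if y:
            n_e += 1
            up_e += int(moved_up)
            down_e += int(moved_down)
        else:
            n_n += 1
            up_n += int(moved_up)
            down_n += int(moved_down)
    if n_e == 0 or n_n == 0:
        raise ValueError("need at least one event and one non-event")
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


# Toy example (hypothetical): the new model moves one case up and one non-case down.
old = [0.10, 0.25, 0.10, 0.25]
new = [0.30, 0.25, 0.10, 0.10]
y = [1, 1, 0, 0]
toy_nri = net_reclassification_index(old, new, y)
toy_metrics = classification_metrics(new, y)
```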
When examining self-reported ethnicity subgroups, the caIRS also surpassed the performance of the PCE model, with the most significant improvements seen among Hispanic (PMBB) and South Asian (UKB) individuals, where the NRI reached 16.2% (95% CI, 6.1–25.8) and 15.0% (95% CI, 6.1–24.2), respectively (Supplementary Table 2).
We also assessed the calibration of both the baseline PCE and caIRS models and found that both tended to overestimate the risk, mainly in the UKB and MESA cohorts, where CAD incidence is relatively lower, and, to a lesser extent, in ARIC (Supplementary Fig. 4). Calibration was best in the PMBB cohort, where the average predicted risk deviated only slightly from the observed incidence for both models (PCE: 8.4%; caIRS: 8.8%; actual incidence: 8.2%).
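One simple way to summarize the PMBB figures quoted above is the ratio of mean predicted risk to observed incidence (calibration-in-the-large). The short sketch below applies that ratio to the reported PMBB percentages; the function name is hypothetical, and this summary is offered only as an illustration, not as the calibration analysis shown in Supplementary Fig. 4.

```python
def expected_observed_ratio(mean_predicted_risk, observed_incidence):
    """Ratio > 1 indicates overestimation of risk on average."""
    if observed_incidence <= 0:
        raise ValueError("observed incidence must be positive")
    return mean_predicted_risk / observed_incidence


# PMBB values reported in the text: PCE 8.4%, caIRS 8.8%, observed 8.2%.
pmbb_pce_ratio = expected_observed_ratio(0.084, 0.082)
pmbb_cairs_ratio = expected_observed_ratio(0.088, 0.082)
```

Both ratios are close to 1 (roughly 1.02 for the PCE and 1.07 for the caIRS), consistent with the slight overestimation described above.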
Refining CAD risk stratification in intermediate/borderline risk individuals
Decisions on the initiation and intensity of statin therapy for individuals with a borderline/intermediate PCE risk depend on the presence of one or more risk-enhancing factors, such as family history of ASCVD, metabolic syndrome, and chronic kidney disease3. In its recent scientific statement, the AHA/ACC also recognized PRS as a potential risk-enhancing factor11.
To understand the utility of the caIRS in refining clinical risk estimates, we tested its performance among the subset of individuals classified as borderline/intermediate risk by the standard PCE. We used the recommended PCE risk threshold of 5% to less than 20% to assign subjects into the borderline/intermediate risk group and applied a 20% threshold to identify individuals who would be considered at high risk when PRS is factored into risk estimation. This 20% risk threshold corresponds to a Class I AHA/ACC recommendation for statin initiation5.
Across all validation cohorts, we observed a clear separation in the 10-year cumulative incidence of CAD between the high caIRS (≥20%) group and all others (<20%) (Fig. 2). The largest difference was observed in the PMBB where the 10-year cumulative incidence of CAD in the high caIRS group was 36.8% (95% CI, 31.3–41.8), compared to 11.6% (95% CI, 10.4–12.8) in all others and 14% (95% CI, 12.8–15.2) in the overall borderline/intermediate PCE group. Importantly, the caIRS model identified additional cases which would have been otherwise missed by the PCE alone with sensitivity ranging between 19.0% (MESA) and 26.8% (UKB). In PMBB, where the baseline PCE model demonstrated best calibration, the caIRS was able to reclassify 163 of 622 cases (26%) as high risk.
We observed consistent risk stratification results across all validation cohorts with the HR for 10-year CAD for the high caIRS group compared to all others (classified as borderline/intermediate by the PCE) ranging between 3.20 and 3.84 (Fig. 2).
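The reclassification analysis described in this subsection can be sketched as follows: select individuals with PCE risk of 5% to less than 20%, flag those whose caIRS is at or above 20%, and summarize the two resulting groups. This is a simplified illustration that uses crude binary 10-year outcomes rather than the cumulative incidence and hazard ratio estimates shown in Fig. 2; the function and field names are hypothetical.

```python
def reclassify_borderline_intermediate(pce, cairs, outcomes, low=0.05, high=0.20):
    """Summarize caIRS reclassification within the PCE 5% to <20% group."""
    group = [(c, y) for p, c, y in zip(pce, cairs, outcomes) if low <= p < high]
    if not group:
        raise ValueError("no individuals in the borderline/intermediate group")
    high_grp = [y for c, y in group if c >= high]   # reclassified as high risk
    other_grp = [y for c, y in group if c < high]   # remain below 20%
    n_cases = sum(y for _, y in group)
    nan = float("nan")
    return {
        "n_group": len(group),
        "fraction_reclassified": len(high_grp) / len(group),
        # share of cases in the group that the caIRS flags as high risk
        "sensitivity": sum(high_grp) / n_cases if n_cases else nan,
        "crude_incidence_high": sum(high_grp) / len(high_grp) if high_grp else nan,
        "crude_incidence_other": sum(other_grp) / len(other_grp) if other_grp else nan,
    }


# Toy data (hypothetical): six individuals, four in the 5% to <20% PCE band.
toy_pce = [0.03, 0.06, 0.10, 0.15, 0.19, 0.25]
toy_cairs = [0.02, 0.22, 0.08, 0.30, 0.12, 0.40]
toy_outcomes = [0, 1, 0, 0, 1, 1]
toy_summary = reclassify_borderline_intermediate(toy_pce, toy_cairs, toy_outcomes)
```

With the PMBB counts reported above, the sensitivity entry corresponds to 163 of 622 cases, or about 26%.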
Refining CAD risk stratification among Black/African-American population at intermediate/borderline risk
To assess the reclassification performance across self-reported ethnicity groups, we independently applied the caIRS to Black/African-American and White individuals at borderline/intermediate PCE risk within the PMBB cohort (Fig. 3). Black/African-American individuals reclassified into the high-risk group by the caIRS experienced significantly elevated CAD risk (P < 0.001), with an HR between the high-risk caIRS group and all others of 4.21 (95% CI, 2.12–8.35), comparable to that in the White population (HR, 3.48; 95% CI, 2.86–4.25), and an observed 10-year incidence of CAD consistent with the expected high-risk threshold.
Discussion
We developed and validated an integrated risk score for prediction of CAD in individuals of diverse ancestries by combining a caPRS with the PCE risk estimator, a commonly used tool to predict 10-year risk of a first ASCVD event4,5. Unlike prior studies that predominantly focused on European populations, we utilized genetically diverse development cohorts, ensuring broader applicability. To construct the CAD caPRS, we incorporated ensemble PRS into the previously described caPRS development framework12. The caIRS further integrates caPRS and clinical risk factors into a unified risk prediction and includes a calibration constant to ensure that the average risk based on caIRS aligns with the average risk predicted by PCE alone for unaffected controls within each decile of the PCE score. Our study builds on prior efforts to improve PRS performance across diverse populations9,10 and demonstrates that the caPRS significantly improves risk prediction for CAD in individuals of diverse ancestries beyond traditional risk factors. Through extensive validation using 4 independent cohorts, including the contemporary US-based PMBB, our results add to the growing body of evidence supporting potential utility of PRSs in refining risk estimates for CAD and improving primary prevention efforts.
Current screening paradigms for primary prevention of CAD in the US do not account for genetic risk factors and fail to identify individuals at high polygenic risk15, which represents a missed opportunity for preventive interventions. We designed our study with the goal of building a PRS and integrated model applicable across multiple ancestries, including those traditionally underrepresented in GWAS. By incorporating multiethnic GWAS summary statistics16,17,18,19,20,21 in PRS ensemble construction and including non-European cohorts (Table 3) in the caPRS and caIRS development, we were able to obtain a model with strong discrimination capacity across multiple ancestries. This is exemplified by the robust association of the caPRS (HR per SD) with incident CAD in Asian and Hispanic individuals, comparable in magnitude to that in White individuals. Similarly, we observed notable improvements in classification metrics when comparing the caIRS model with the PCE, with an NRI of 16.2% (95% CI, 6.1–25.8), 15.0% (95% CI, 6.1–24.2), and 9.8% (95% CI, −13.3–34.0) for Hispanic (PMBB), South Asian (UKB), and East Asian (Asian American, MESA) ethnicities, respectively.
Improved risk assessment for CAD has the potential to substantially impact the use of preventive interventions. Our results demonstrate that the caPRS and integrated model can efficiently refine risk estimates for individuals in the borderline and intermediate PCE risk categories, including the Black/African-American population, which is traditionally underrepresented in genetic research. In the case of PMBB, where both models (caIRS and PCE) were well calibrated, the caIRS reclassified 9.7% of the borderline/intermediate risk subjects into the high-risk group, which experienced an almost four-fold increase in CAD incidence. By validating the caIRS across diverse cohorts with varied population structures and case definitions, we demonstrated the robustness of our methodology in real-world applications. For example, if all individuals reclassified as high risk (starting with those in the borderline/intermediate clinical risk category) initiated statin use and we assume that taking statins results in a 25% reduction in CAD events22, then we can extrapolate that 1 additional event over 10 years could be prevented for every 179 individuals screened using the caIRS, based on performance in our three US validation cohorts (PMBB, ARIC, MESA). This estimated reduction in events may be even greater, as a number of studies have reported that patients at the high end of the polygenic risk spectrum experience higher relative and absolute risk reduction from cholesterol-lowering medications, such as statins or PCSK9 inhibitors23,24.
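The structure of such an extrapolation can be sketched as a short calculation: events prevented per person screened equal the fraction of screened individuals reclassified as high risk, multiplied by their 10-year CAD incidence and by the assumed relative risk reduction from statins. The inputs behind the figure of 179 are not given in this section, so the values below are hypothetical placeholders chosen only to show the arithmetic, and the function name is likewise an assumption.

```python
def number_needed_to_screen(fraction_reclassified, incidence_high_risk,
                            relative_risk_reduction=0.25):
    """People screened per additional CAD event prevented over 10 years.

    fraction_reclassified   : share of ALL screened individuals moved to high risk
    incidence_high_risk     : 10-year CAD incidence in the reclassified group
    relative_risk_reduction : assumed statin effect (0.25 as in the text)
    """
    prevented_per_screened = (
        fraction_reclassified * incidence_high_risk * relative_risk_reduction
    )
    if prevented_per_screened <= 0:
        raise ValueError("inputs must yield a positive number of prevented events")
    return 1.0 / prevented_per_screened


# Hypothetical inputs, NOT the study's values: 3% of those screened are
# reclassified and 30% of them would develop CAD within 10 years.
example_nns = number_needed_to_screen(0.03, 0.30)
```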
Despite improvements in reclassification of Black/African-American individuals at borderline/intermediate clinical risk (when using caPRS and caIRS), the overall performance within this population was attenuated compared to other ethnicities. This likely reflects the reduced availability of GWAS summary statistics and individual-level data for PRS development, which is further compounded by the high genetic diversity of the African population25,26. In addition, recent studies demonstrated a continuous decay of PRS performance as the sample’s genetic distance from the training cohort increases27. This may further contribute to attenuated performance, as the PRS developed in this study relied primarily on GWASs derived from European and Asian populations. Significant performance improvements will require access to larger GWASs with better representation of African individuals. Ongoing initiatives to improve diversity in genetics research, such as the All of Us Research Program28 and the Million Veteran Program29, are expected to help reduce the gap in PRS performance, but existing limitations should be carefully considered as PRSs begin to enter clinical practice.
The results presented here underscore the value of PRS as a risk-enhancing factor for CAD and warrant further prospective validation in a real-world setting. The NHGRI-funded eMERGE-IV consortium is currently assessing the clinical impact of incorporating PRS for several conditions, including CAD, into a genome-informed risk assessment and delivering this to the EHR with clinical decision support30. In addition, we are currently embarking on a prospective clinical study to assess the utility of the CAD caIRS in primary prevention (trial registration number: NCT06542432).
Our study has some limitations. First, this study only examined CAD as an outcome, whereas the PCE was developed to predict the risk of ASCVD, which also includes fatal and nonfatal stroke.
Second, in line with previous studies6,7, we observed that the baseline PCE model tended to overestimate the risk to various extents depending on the target cohort. This was especially evident in the case of the UKB cohort, which is generally biased towards healthier individuals and where CAD incidence rates are lower than in US cohorts31. Nevertheless, we consider the original PCE model as the most appropriate baseline for comparison as this is the model which is being used for risk assessment in clinical practice in the US. We observed the best calibration in PMBB, which is a contemporary US-based health system. However, this may not be the case across all US populations and in a real-world setting recalibration of the model to a given health-system population could be considered if necessary6.
Third, this study did not evaluate use of the caIRS in guiding treatment decisions or improving patient outcomes in a clinical setting. Additional prospective, real-world evidence is needed to support the utility of caIRS in this context.
Fourth, the largest development and validation cohort in this study was UKB. This is not ideal given that most individuals are of European ancestry and there is a lower incidence of CAD compared to the US cohorts. Despite these issues, the caIRS model outperformed the PCE across all validation cohorts. Future studies utilizing more diverse biobanks in the development and validation process should be conducted to further improve performance and generalizability of the caIRS model.
Fifth, the study was restricted to individuals aged 40–79, the age range for which the PCE is intended to be used. However, PRS may be most effective in a prevention setting among a younger population for whom current clinical risk factor-based tools are not well developed or recommended. A recent study showed that the incidence of very early-stage atherosclerosis increased with PRS quintile32.
Finally, we excluded individuals on lipid-lowering medication, a potential population that could benefit from earlier screening and intervention using the caIRS. Long-term, prospective studies are needed to determine if identifying individuals at higher genetic risk at an earlier age and before statin initiation has an impact on CAD outcomes.
In summary, our study adds to the growing evidence that genetics can meaningfully improve the accuracy of CAD risk prediction, beyond traditional clinical risk factors, in individuals of diverse ancestries. The CAD caIRS, a tool that combines clinical and genetic risk, has the potential to improve the identification of individuals at high risk for CAD, particularly in populations underrepresented in current risk assessment tools, such as the Black/African-American population. Future research should focus on validating the CAD caIRS in larger diverse populations and assessing its utility in guiding clinical decisions for primary prevention of CAD.
Methods
Study populations
We used genotype and phenotype data from multiple cohorts to develop and validate the CAD caIRS. These cohorts included the UK Biobank (UKB), Multi-Ethnic Study of Atherosclerosis (MESA, dbGAP study phs000209.v13.p3), Atherosclerosis Risk in Communities study (ARIC, dbGAP study phs000280.v8.p2), Hispanic Community Health Study (HCHS, dbGAP study phs000810.v1.p1), Cardiovascular Health Study (CHS, dbGAP study phs000287.v7.p1), Jackson Heart Study (JHS, dbGAP study phs000286.v6.p2), and Penn Medicine BioBank (PMBB)33. Written informed consent was obtained from all participants prior to their inclusion in each cohort study. Individuals in ARIC and CHS were part of the training cohort for the PCE model4. However, given that our validation focused on comparing the performance of the caIRS to that of the PCE (and CHS was not used for validation), we do not expect this overlap to bias our results.
The UK Biobank (UKB) is a large, prospective cohort study of the causes, treatment, and prevention of common complex disease34,35. Between 2006 and 2010, the study enrolled over 500,000 individuals aged 40–69 years from the general population of the United Kingdom. At enrollment, participants completed a detailed questionnaire to self-report sex, ancestry, lifestyle factors, and environmental exposures and underwent extensive physical examination, including cardiac imaging and monitoring, and collection and storage of biological samples (blood, urine, and saliva)36,37. Participants have now been followed up for over a decade, and a wide range of biomarker and genetic data, including whole exome and genome sequences, is available for all 500,000 participants35.
The Multi-Ethnic Study of Atherosclerosis (MESA, dbGAP study phs000209.v13.p3) is a medical research study investigating the prevalence, correlates, and progression of subclinical cardiovascular disease (CVD) in a population-based sample of more than 6000 men and women aged 45–84 years and free of CVD at baseline from 6 communities in the United States (New York, NY; Baltimore, MD; Chicago, IL; Los Angeles, CA; Twin Cities, MN; and Winston Salem, NC). Recruitment took place between 2000 and 200238,39. The cohort is approximately 38% White, 28% African-American, 22% Hispanic, and 12% Asian, predominantly of Chinese descent. Extensive cohort data were collected over 6 exams, with participants contacted every 9 to 12 months during the study to assess clinical morbidity and mortality. The study collected a comprehensive set of data for standard coronary risk factors and various aspects of cardiovascular health as well as sociodemographic factors, lifestyle factors, and psychosocial factors. Selected measures of subclinical disease and risk factors were repeated at follow-up visits through 2018, allowing study of the progression of disease. Blood samples, DNA, and lymphocytes were collected and preserved. Participants are followed for identification and characterization of CVD events, including acute myocardial infarction and other coronary heart disease, stroke, peripheral vascular disease, and congestive heart failure; therapeutic interventions for CVD; and mortality40,41.
The Atherosclerosis Risk in Communities study (ARIC, dbGAP study phs000280.v8.p2) is a prospective epidemiologic study (1987 to present) conducted in 4 US communities (Forsyth County, NC; Jackson, MS; the northwest suburbs of Minneapolis, MN; and Washington County, MD). ARIC is investigating the etiology and natural history of atherosclerosis, the etiology of clinical atherosclerotic diseases, and variation in cardiovascular risk factors, medical care, and disease by race, gender, location, and date. The ARIC study includes 2 components: a cohort and community surveillance. The present study used data from the cohort component. In the cohort component, over 15,000 participants, aged 45–64 years, were recruited between 1987 and 1989 and received an extensive examination, including collection of medical, social, and demographic data. Data collection took place during 7 clinic visits between 1987 and 201942.
The Hispanic Community Health Study / Study of Latinos (HCHS/SOL, dbGAP study phs000810.v1.p1) is a prospective, multi-center, epidemiologic study in Hispanic/Latino populations to determine the prevalence of chronic conditions (e.g., CVD, diabetes, and pulmonary disease), identify risk and protective factors, and quantify all-cause mortality, fatal and non-fatal CVD and pulmonary disease, and pulmonary disease exacerbation over time. Between 2008 and 2011, over 16,000 men and women, aged 18–74 years, of Cuban, Dominican, Mexican, Puerto Rican, Central American, and South American backgrounds were recruited through 4 centers (Miami, Florida; Bronx, New York; Chicago, Illinois; and San Diego, California). Participants underwent an extensive baseline clinical examination, including biological, behavioral, and socio-demographic assessments. To determine a range of health outcomes, participants underwent 2 additional clinic visits (2014–2017 and 2020–2023) and have annual follow-up interviews43,44.
The Cardiovascular Health Study (CHS, dbGAP study phs000287.v7.p1) is a prospective study of risk factors for development and progression of coronary heart disease and stroke in people aged ≥65 years. The study enrolled approximately 6000 participants from 4 US communities (Forsyth County, NC; Sacramento County, CA; Washington County, MD; and Pittsburgh, PA) between 1989 and 1990 and a supplemental cohort of 687 predominantly African-American participants between 1992 and 1993. The study participants have undergone extensive clinic examinations for evaluation of markers of subclinical CVD at study baseline and at annual visits through 1998–1999 and again in 2005–2006. They have also been followed up every 6 months by phone to identify cardiovascular events and to assess physical and cognitive health.
The Jackson Heart Study (JHS, dbGAP study phs000286.v6.p2) is a large, prospective, community-based, observational study investigating environmental and genetic factors associated with CVD among African Americans. Between 2000 and 2004, over 5000 participants, aged 35–84 years, were recruited from urban and rural areas of 3 counties that comprise the Jackson, MS metropolitan statistical area45,46. Participants underwent 3 extensive clinical examinations (Exam 1, 2000–2004; Exam 2, 2005–2008; and Exam 3, 2009–2013) that collected data on traditional and putative cardiovascular disease risk factors and measures of subclinical CVD, including echocardiography, cardiac magnetic resonance imaging, and computed tomography scans, as well as biological samples (i.e., blood, urine, DNA, lymphocytes). Participants receive annual telephone follow-up and have ongoing surveillance of hospitalizations for cardiovascular events and of deaths47.
The Penn Medicine BioBank (PMBB) is an electronic health record (EHR)-linked biobank maintained at the University of Pennsylvania. The PMBB was established in 2013 and includes a large variety of health-related information including diagnosis codes, laboratory measurements, imaging data, and lifestyle information as well as genomic and biomarker data. To date, over 174,000 participants have been enrolled, with approximately 30% of participants being of non-European ancestry. The EHR has a median of 7 years of longitudinal data available on participants33.
We divided the data into 3 development cohorts (Table 3 and Fig. 4) and 4 independent, longitudinal validation cohorts (Table 1 and Fig. 4). The Development Cohorts comprised 26,923 individuals with diagnosed CAD (cases) and 220,909 unaffected individuals (non-cases) from UKB (a subset), HCHS, CHS, and JHS. The Validation Cohorts comprised 11,008 individuals (774 cases) from ARIC, 4162 (240 cases) from MESA, 14,182 (1158 cases) from PMBB, and 120,590 (3050 cases) from UKB who were not included in model development.
Eligibility criteria
Across all cohorts, we excluded individuals with missing genotype data. In cases where related individuals were present, we retained one individual per pair of first- and second-degree relatives. In the UKB cohort, where whole exome sequencing data were available, we additionally excluded individuals with pathogenic or likely pathogenic variants in 3 genes known to cause Familial Hypercholesterolemia (APOB, LDLR, PCSK9). We filtered whole-exome sequencing VCF files to these 3 genes, removed variants in low-complexity regions and segmental duplications, and annotated the remaining variants using SnpEff, ClinVar, and gnomAD. Pathogenic variants were designated in each of the annotations as follows: (1) SnpEff: variants with “HIGH” predicted functional effect impact, (2) ClinVar: variants with “Pathogenic” or “Likely pathogenic” clinical significance, (3) gnomAD: variants with <0.5% MAF. Individuals were identified as having a pathogenic variant if they had one or more variants that matched these filters.
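The variant-level filter described above can be sketched as a small predicate over annotated variants. The sketch below makes an assumption that should be read carefully: it treats a variant as pathogenic only if it satisfies all three annotation criteria at once, which is one plausible reading of the text rather than a confirmed description of the pipeline. The dictionary field names are hypothetical and do not correspond to actual SnpEff, ClinVar, or gnomAD output columns.

```python
FH_GENES = {"APOB", "LDLR", "PCSK9"}
CLINVAR_PATHOGENIC = {"pathogenic", "likely pathogenic"}


def is_pathogenic(variant, max_maf=0.005):
    """Assumption: a variant must pass ALL three annotation filters."""
    return (
        variant["gene"] in FH_GENES
        and variant["snpeff_impact"] == "HIGH"
        and variant["clinvar_significance"].lower() in CLINVAR_PATHOGENIC
        and variant["gnomad_maf"] < max_maf  # <0.5% minor allele frequency
    )


def carries_fh_variant(variants):
    """True if an individual has one or more variants passing the filter."""
    return any(is_pathogenic(v) for v in variants)


# Hypothetical annotated variants for two individuals.
carrier = [{"gene": "LDLR", "snpeff_impact": "HIGH",
            "clinvar_significance": "Likely pathogenic", "gnomad_maf": 0.0001}]
non_carrier = [{"gene": "LDLR", "snpeff_impact": "HIGH",
                "clinvar_significance": "Benign", "gnomad_maf": 0.0001},
               {"gene": "BRCA2", "snpeff_impact": "HIGH",
                "clinvar_significance": "Pathogenic", "gnomad_maf": 0.0001}]
```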
Individuals in Development Cohorts 1 and 2 were eligible for inclusion if they were between 18 and 79 years of age at the time of diagnosis or assessment for cases and those unaffected, respectively. We also excluded unaffected individuals taking cholesterol-lowering medications. In the case of Development Cohort 3 we additionally restricted the eligible age range to 40–79 and removed individuals missing any of the factors required to calculate the PCE score (Fig. 4).
Individuals in the Validation Cohorts were eligible for inclusion if they had no personal history of CAD and were 40–79 years of age at the time of first assessment. We excluded all individuals taking cholesterol-reducing medication at the time of first assessment, individuals who developed CAD within 30 days of assessment, and affected individuals missing age at diagnosis. Because the CAD caIRS was derived from the PCE, we also excluded individuals missing any data for risk factors included in the PCE model (see Supplementary Table 3). In the case of PMBB where the PCE variables were incomplete (mainly lipid values) for a relatively large subset of eligible individuals (7796 out of 14,182), we imputed missing values using the cohort median to avoid loss of a substantial fraction of the cohort from the analysis. Repeating the PMBB analysis upon exclusion of participants with missing PCE variables (instead of imputing) yielded highly consistent results which we included for comparison (see Supplementary Table 4 and Supplementary Fig. 5).
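The cohort-median imputation applied to missing PCE variables in PMBB can be illustrated with a minimal sketch. The record layout and field names below are hypothetical, and the sketch assumes a single median per variable across the whole cohort, as the text describes; it does not attempt any stratified or model-based imputation.

```python
from statistics import median


def impute_with_cohort_median(records, fields):
    """Return copies of `records` with None/missing `fields` set to the
    cohort median of the observed values for that field."""
    medians = {}
    for f in fields:
        observed = [r[f] for r in records if r.get(f) is not None]
        if not observed:
            raise ValueError(f"no observed values for {f!r}")
        medians[f] = median(observed)
    imputed = []
    for r in records:
        filled = dict(r)  # leave the input records untouched
        for f in fields:
            if filled.get(f) is None:
                filled[f] = medians[f]
        imputed.append(filled)
    return imputed


# Hypothetical lipid values (mg/dL) with gaps.
toy_records = [
    {"total_chol": 200, "hdl": 50},
    {"total_chol": None, "hdl": 60},
    {"total_chol": 240, "hdl": None},
]
toy_imputed = impute_with_cohort_median(toy_records, ["total_chol", "hdl"])
```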
Genotype imputation
We combined multiple genotype data sources, which required distinct preprocessing steps depending on the genome build and the availability of externally imputed WGS data (see Supplementary Table 5 for dataset-specific preprocessing/imputation steps). In short, we used the existing imputed array genotyping data for the UKB and dbGaP datasets where available, and used CrossMap (v0.6.1)48 to lift over their coordinates to Genome Reference Consortium Human Build 37 (GRCh37) where necessary. For internal imputation of the array data, we converted coordinates to hg19 using the array annotation (where necessary), then phased genotypes using SHAPEIT4 (ref. 49) and imputed unobserved genotypes with IMPUTE5 (ref. 50) using the UK10K reference panel51. For WGS datasets, we used CrossMap (v0.6.1)48 to lift over coordinates to GRCh37. In the case of PMBB, we used the imputed array data (GRCh38), for which genotyping and imputation details have been described previously33. In short, for PMBB, we performed genotype imputation using Eagle52 and Minimac4 (ref. 53) on the TOPMed Imputation Server54. For sites that could not be successfully imputed, we obtained a population-specific allele frequency from gnomAD v3.1.1 (ref. 55) to estimate the average contribution of the variant when scoring PRSs. For Validation Cohorts, we excluded individuals missing genotypes for more than 5% of PRS sites. In cases where the same participant's data were present in multiple datasets within a cohort, we used scores from the dataset with the smallest number of missing PRS sites.
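The handling of missing PRS sites can be sketched as follows. This is an illustration under stated assumptions, not the study pipeline: we assume that the "average contribution" of a missing site is its weight multiplied by the expected effect-allele dosage, 2 × allele frequency, and the function name and data layout are hypothetical.

```python
def score_prs(weights, dosages, allele_freqs, max_missing_fraction=0.05):
    """Score a PRS as sum(weight * dosage) over sites.

    weights:      site -> effect weight
    dosages:      site -> effect-allele dosage in [0, 2], or None if missing
    allele_freqs: site -> population-specific effect-allele frequency
    Returns (score, missing_fraction); score is None when the individual
    exceeds the allowed fraction of missing sites.
    """
    missing = [s for s in weights if dosages.get(s) is None]
    missing_fraction = len(missing) / len(weights)
    if missing_fraction > max_missing_fraction:
        return None, missing_fraction
    score = 0.0
    for site, weight in weights.items():
        dosage = dosages.get(site)
        if dosage is None:
            # Assumption: substitute the expected dosage 2 * AF.
            dosage = 2.0 * allele_freqs[site]
        score += weight * dosage
    return score, missing_fraction

# Hypothetical example: 20 sites, equal weights, one missing genotype.
sites = [f"site{i}" for i in range(20)]
weights = {s: 0.1 for s in sites}
freqs = {s: 0.25 for s in sites}
dosages = {s: 1.0 for s in sites}
dosages["site0"] = None
score, frac = score_prs(weights, dosages, freqs)

# With two missing sites (10%), the individual is excluded.
dosages_two_missing = dict(dosages, site1=None)
excluded_score, excluded_frac = score_prs(weights, dosages_two_missing, freqs)
```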
Use of population descriptors and inference of genetic ancestry
We partitioned individuals from the Development Cohorts (see Model Development below) among 5 ancestry groups based on genetic similarity to one of five continental reference populations: African (AFR), Hispanic/Admixed American (AMR), East Asian (EAS), European (EUR), and South Asian (SAS). Genetic ancestry was decomposed into these five ancestry groups using AIPS56 with the 1KGP as a reference panel. We used the 80% ancestral fraction as a threshold to classify participants into “pure” ancestry groups for the purpose of estimating ancestry specific model coefficients (see Model Development).
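As a minimal sketch of the 80% rule, the function below assigns an individual to a single ancestry group only when the largest estimated ancestry fraction reaches the threshold. Whether the threshold is inclusive is an assumption, and the function name is hypothetical; the fractions would come from the ancestry decomposition described above.

```python
ANCESTRY_GROUPS = ("AFR", "AMR", "EAS", "EUR", "SAS")

def assign_ancestry_group(fractions, threshold=0.80):
    """Return the group whose ancestral fraction is >= threshold, else None.

    fractions: dict mapping ancestry group -> estimated fraction (sums to ~1).
    """
    group, fraction = max(fractions.items(), key=lambda item: item[1])
    return group if fraction >= threshold else None

mostly_eur = {"AFR": 0.05, "AMR": 0.03, "EAS": 0.00, "EUR": 0.90, "SAS": 0.02}
admixed = {"AFR": 0.55, "AMR": 0.00, "EAS": 0.00, "EUR": 0.45, "SAS": 0.00}
```

Individuals for whom the function returns `None` would not contribute to the ancestry-specific coefficient estimates, although they can still be scored with the caPRS, which uses continuous ancestry fractions.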
In order to assess the performance of the caPRS and caIRS in real-world populations, which represent broader genetic diversity than 5 reference population groups, we partitioned validation cohorts based on self-identified ethnicity, without restricting the analysis to genetically “pure” individuals. The following rules were applied to group and standardize self-identified ethnicity labels:
(1) Individuals who self-identified as Black, Black British, Caribbean, African, African-American, or any other Black background were labeled as Black/African American (or Black/Black British in the case of UKB).
(2) Individuals who self-identified as White, Caucasian, White British, Irish, or other White background were labeled as White/Caucasian.
(3) Individuals who self-reported as Indian, Pakistani, or Bangladeshi were grouped as South Asian (UKB only).
(4) Individuals who self-identified as Chinese, Chinese American, or Asian (U.S. cohorts; MESA, PMBB) were labeled as East Asian / Asian American.
(5) Individuals who self-identified as Hispanic or Latino were labeled as Hispanic.
(6) Finally, individuals with missing ethnicity information, or who self-identified as other or as one of the mixed categories, were labeled as “Other”.
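The grouping rules above can be expressed as a small lookup, sketched here for illustration. The exact label strings recorded by each cohort are not given in the text, so the sets below are hypothetical; we also assume that the "Asian" label is mapped only in the U.S. cohorts and that the South Asian rule applies only to UKB.

```python
BLACK = {"black", "black british", "caribbean", "african",
         "african-american", "any other black background"}
WHITE = {"white", "caucasian", "white british", "irish",
         "other white background"}
SOUTH_ASIAN = {"indian", "pakistani", "bangladeshi"}
EAST_ASIAN = {"chinese", "chinese american"}
HISPANIC = {"hispanic", "latino", "hispanic or latino"}
US_COHORTS = {"MESA", "PMBB"}

def standardize_ethnicity(label, cohort):
    """Map a self-identified ethnicity label to a standardized group."""
    key = (label or "").strip().lower()
    if key in BLACK:
        return "Black/Black British" if cohort == "UKB" else "Black/African American"
    if key in WHITE:
        return "White/Caucasian"
    if key in SOUTH_ASIAN and cohort == "UKB":
        return "South Asian"
    if key in EAST_ASIAN or (key == "asian" and cohort in US_COHORTS):
        return "East Asian / Asian American"
    if key in HISPANIC:
        return "Hispanic"
    # Missing, "other", mixed, or otherwise unmapped labels.
    return "Other"
```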
Phenotype definitions
CAD designation was standardized across cohorts with diverse phenotypic data as individuals with myocardial infarction (MI), coronary revascularization, or fatal coronary heart disease (CHD). Standardized codes from medical records, such as CAD-related International Classification of Diseases, Tenth Revision (ICD-10), or operation codes, were used when available (I21, I22, I23, I24.1, I25.2). Otherwise, doctor or self-reported data adjudicated by individual cohort committees were used.
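For cohorts with coded medical records, the ICD-10 part of this definition can be sketched as a prefix match. This is an illustration only: it assumes that subcodes of the listed codes count as CAD (for example, I21.4 under I21), and it does not cover operation codes, fatal CHD from death records, or adjudicated self-reported events.

```python
# ICD-10 codes listed in the text, stored without the decimal point.
CAD_ICD10_PREFIXES = ("I21", "I22", "I23", "I241", "I252")

def normalize_icd10(code):
    """Upper-case the code and drop the decimal point (I24.1 -> I241)."""
    return code.replace(".", "").strip().upper()

def has_cad_code(codes):
    """True if any recorded code falls under one of the CAD prefixes."""
    return any(normalize_icd10(c).startswith(CAD_ICD10_PREFIXES) for c in codes)
```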
Model development
We undertook a multi-step process to first develop a caPRS for CAD and then incorporate this caPRS into a clinical prediction model (PCE) that also includes traditional cardiovascular risk factors (Fig. 4). Model development comprised multiple stages: training candidate PRS models from GWAS summary statistics, training and selecting ancestry-specific PRS ensemble models, estimating ancestry-specific PRS ensemble coefficients for the caPRS, and estimating caIRS parameters, which include the overall caPRS effect size and calibration coefficients. These steps were followed by validation of the performance of the caPRS and caIRS models in independent longitudinal cohorts.
We constructed multiple internal PRS models using multi-ancestry genome-wide association study (GWAS) summary statistics for CAD and type 2 diabetes (T2D), which is a frequently coexisting condition and a major risk factor for CAD. We used GWAS summary statistics from CARDIoGRAM (EUR cohort)16, Biobank Japan (GWAS Catalog: GCST90018706; 2 Japanese/EAS cohorts)17, the Million Veteran Program (dbGaP study phs001672.v9.p1; EUR, AFR, AMR cohorts)18, DIAGRAM (GWAS Catalog: GCST004773; EUR)19, GERA (GWAS Catalog: GCST90086068)20, and an AFR population meta-analysis (GWAS Catalog: GCST008114)21. PRS models were independently constructed for CAD and T2D using PRS-CSx57 with a range of values for the global shrinkage parameter (see Supplementary Table 6 for the list of GWAS studies and corresponding parameter values used in development).
PRS centering and standardization were performed as previously described12. Briefly, to account for ancestry-specific mean and variance, principal components (PCs) were computed for all individuals by projecting their genotypes onto PCs calculated from the 1KGP dataset using the R package bigsnpr58. Each ancestry-specific PRS was then centered by subtracting the PRS predicted from a linear regression of PRS against the first four PCs in unaffected individuals59. The centered PRS was subsequently divided by the standard deviation (SD) of the corresponding 1KGP population. Scores from PGS Catalog models for CAD and T2D, derived from GWASs independent of the validation cohorts, were linearly combined with scores from the PRS-CSx models developed internally. The optimal mixing weights for the PRSs were learned independently for each (genetically inferred) continental ancestry group (AFR, AMR, EAS, EUR, SAS) using Development Cohort 1 (Table 3) via elastic net regression, using the LogisticRegressionCV function from the sklearn (Python) package with the following hyperparameters: penalty="elasticnet", solver="saga", l1_ratios=[0.0, 0.2, 0.4, 0.6, 0.8, 1], scoring="roc_auc". For ancestry groups originating from multiple cohorts (AFR: UKB + JHS, AMR: UKB + HCHS), an additional term was included in the elastic net model to adjust for the cohort effect. See Supplementary Table 6 for the list of internally and externally developed PRS models whose scores were used for ensemble development and their corresponding weights.
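The centering and standardization step described above can be sketched with ordinary least squares. This is a minimal illustration on simulated data, not the study code: the function name is hypothetical, the reference SD is passed in as a plain number (in the study it is the SD of the corresponding 1KGP population), and the simulated PCs and scores carry no biological meaning.

```python
import numpy as np

def center_and_scale_prs(prs, pcs, unaffected, reference_sd):
    """Residualize a PRS on genetic PCs and scale by a reference SD.

    prs:          raw scores, shape (n,)
    pcs:          projected principal components, shape (n, 4)
    unaffected:   boolean mask, shape (n,); the regression is fitted on these
    reference_sd: SD used for scaling (assumed to come from a reference panel)
    """
    prs = np.asarray(prs, dtype=float)
    design = np.column_stack([np.ones(len(prs)), np.asarray(pcs, dtype=float)])
    mask = np.asarray(unaffected, dtype=bool)
    # Fit PRS ~ 1 + PC1..PC4 in unaffected individuals only.
    coef, *_ = np.linalg.lstsq(design[mask], prs[mask], rcond=None)
    # Subtract the PC-predicted PRS for everyone, then scale.
    return (prs - design @ coef) / reference_sd

# Simulated example (all values hypothetical).
rng = np.random.default_rng(0)
n = 200
pcs = rng.normal(size=(n, 4))
raw = 1.0 + pcs @ np.array([0.5, -0.3, 0.2, 0.1]) + rng.normal(scale=0.1, size=n)
unaffected = np.arange(n) % 4 != 0
z = center_and_scale_prs(raw, pcs, unaffected, reference_sd=0.1)
```

Because the regression includes an intercept and is fitted in unaffected individuals, the centered score has mean zero in that group and is uncorrelated with the four PCs there.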
The performance of the ancestry-specific ensemble models was compared to the ensemble derived from EUR individuals (largest sample) in the Development Cohort 2 (Table 3) across five ancestry groups, using multivariable logistic regression adjusted for age at enrollment, sex, and first-degree family history of CAD/CVD (Supplementary Fig. 6).
Cross ancestry polygenic risk score (caPRS)
We used the best-performing ensemble model for each ancestry to construct the caPRS. The caPRS is defined as a linear combination of the (ensemble) PRS, multiplied by the fractional ancestry estimate and the PRS effect size, which was estimated for each continental ancestry group from Development Cohort 2 using multivariable logistic regression, adjusting for age at enrollment, sex, first-degree family history of CAD (where available), and cohort. This methodology follows the previously described caPRS model for breast cancer risk prediction12 and makes use of continuous genetic ancestry estimates when calculating the PRS, obviating the need for fixed ancestry labels at prediction time and naturally accommodating genetically admixed individuals. We modified our original method by replacing the single best-performing PRS for each population with the best-performing ensemble PRS. More specifically, the caPRS is defined as:
\[{caPRS}=\sum _{i}{\beta }_{i}\times {f}_{i}\times {{PRS\; ensemble}}_{i}\]
where \({\beta }_{i}\), \({f}_{i}\) and \({{PRS\; ensemble}}_{i}\) correspond to PRS effect size, fractional ancestry estimate, and selected PRS ensemble score, respectively, for each continental ancestry group i.
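The ancestry-weighted sum described above can be written in a few lines. The effect sizes, ancestry fractions, and ensemble scores below are hypothetical numbers chosen only to make the arithmetic easy to follow.

```python
def ca_prs(effect_sizes, ancestry_fractions, ensemble_scores):
    """caPRS = sum over ancestry groups i of beta_i * f_i * PRS_ensemble_i."""
    return sum(
        effect_sizes[g] * ancestry_fractions.get(g, 0.0) * ensemble_scores[g]
        for g in effect_sizes
    )

# Hypothetical values for an admixed individual (25% AFR, 75% EUR).
betas = {"AFR": 0.3, "AMR": 0.4, "EAS": 0.4, "EUR": 0.5, "SAS": 0.4}
fractions = {"AFR": 0.25, "AMR": 0.0, "EAS": 0.0, "EUR": 0.75, "SAS": 0.0}
scores = {"AFR": 1.0, "AMR": 0.2, "EAS": -0.1, "EUR": 2.0, "SAS": 0.3}
example = ca_prs(betas, fractions, scores)  # 0.3*0.25*1.0 + 0.5*0.75*2.0
```

For an individual assigned entirely to one ancestry group, the sum reduces to that group's effect size multiplied by its ensemble score.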
Cross ancestry integrated risk score (caIRS)
We estimated the effect size associated with the caPRS in Development Cohort 3 using a multivariable logistic regression including caPRS, age at enrollment, sex, and cohort. We then calculated a calibration constant which depends on the absolute 10-year risk estimate from the PCE model. Finally, we calculated the 10-year CAD risk based on the PCE and caIRS (PCE combined with the caPRS).
The caIRS combines genetic and clinical information and is defined as follows:
where PCE is the 10-year risk calculated using the PCE algorithm, β is the effect size associated with the caPRS, estimated in Development Cohort 3 via a logistic regression model adjusted for age at enrollment, sex, and cohort, and Ck is a calibration constant that depends on the absolute 10-year risk estimate from the PCE model. More specifically, each Ck was calculated using unaffected individuals from Development Cohort 3 within the strata formed by the deciles of the PCE score, such that the average risk of CAD predicted by the caIRS aligns with the average risk predicted by the PCE alone for unaffected controls within each stratum. This means that the average contribution of the PRS to the integrated score within a decile is 1, namely:
for each group k.
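The role of the calibration constants can be illustrated with a short sketch. The functional form used here is an assumption made purely for illustration: we take the PRS contribution to be exp(β × caPRS)/Ck, so that Ck is the mean of exp(β × caPRS) among unaffected individuals in PCE decile k. The value of β, the scores, and the decile labels are hypothetical.

```python
from math import exp
from statistics import mean

def calibration_constants(ca_prs_values, pce_deciles, beta):
    """Assumed form: C_k = mean of exp(beta * caPRS) over the unaffected
    individuals whose PCE score falls in decile k."""
    by_decile = {}
    for score, k in zip(ca_prs_values, pce_deciles):
        by_decile.setdefault(k, []).append(exp(beta * score))
    return {k: mean(values) for k, values in by_decile.items()}

def mean_contribution(ca_prs_values, pce_deciles, beta, constants, k):
    """Average of exp(beta * caPRS) / C_k within decile k."""
    return mean(
        exp(beta * s) / constants[k]
        for s, d in zip(ca_prs_values, pce_deciles)
        if d == k
    )

# Hypothetical unaffected individuals in two PCE deciles.
beta = 0.4
scores = [-1.0, 0.0, 1.0, 0.5, -0.5, 2.0]
deciles = [1, 1, 1, 2, 2, 2]
constants = calibration_constants(scores, deciles, beta)
```

Under this assumed form, the average PRS contribution within each decile equals 1 by construction, which is the property stated in the text.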
The distributions of caPRS, PCE, and caIRS scores are provided in the Supplementary Content (Supplementary Fig. 7).
Model validation
We evaluated the performance of the caPRS and caIRS using validation cohorts that were independent of those used to generate the caPRS and caIRS models. Following expert recommendations on reporting polygenic scores in risk prediction studies60, we evaluated our models by considering measures of model discrimination, calibration, and effect size. Associations of the caPRS with 10-year CAD risk were evaluated in terms of the HR per SD increase in the caPRS, with a 95% confidence interval, from multivariable Cox proportional-hazards (PH) models adjusted for age at enrollment and sex. The test statistic was the change in the likelihood deviance between the full model and the appropriate reduced model. The C-index was used to assess model discrimination. Where specific ancestries lacked sufficient statistical power for reliable analyses, we combined effect sizes across studies using meta-analysis, adjusting for the heterogeneity observed among the studies being combined. This approach allowed us to improve the robustness and generalizability of our findings by drawing on multiple datasets. We used a Mantel-Cox log-rank test, with caIRS coded as a binary variable, to evaluate whether 10-year CAD incidence was significantly lower for patients at borderline/intermediate PCE risk (5–20%) with a caIRS score below vs above the pre-specified risk threshold (20%). Proportional-hazards assumptions were verified for all proposed models using tests and graphs based on the Schoenfeld residuals. We used the Kaplan-Meier method to estimate the cumulative incidence rates of 10-year CAD for subjects above/below the caIRS score threshold. For caIRS and PCE, we assessed net reclassification improvement (NRI), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) using a 20% high-risk threshold4.
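The threshold-based metrics can be sketched for a binary 10-year outcome as follows. This simplified illustration ignores censoring, treats a risk equal to the threshold as high risk (an assumption), and assumes every cell of the confusion table is non-empty; the risks and outcomes are hypothetical.

```python
def classification_metrics(risks, events, threshold=0.20):
    """Sensitivity, specificity, PPV, and NPV for risk >= threshold."""
    tp = fp = tn = fn = 0
    for risk, event in zip(risks, events):
        high = risk >= threshold
        if high and event:
            tp += 1
        elif high:
            fp += 1
        elif event:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

def net_reclassification_improvement(old_risks, new_risks, events, threshold=0.20):
    """Two-category NRI: net upward moves in events plus net downward
    moves in non-events, each as a proportion of its group."""
    up_e = down_e = up_n = down_n = n_e = n_n = 0
    for old, new, event in zip(old_risks, new_risks, events):
        moved_up = old < threshold <= new
        moved_down = new < threshold <= old
        if event:
            n_e += 1
            up_e += moved_up
            down_e += moved_down
        else:
            n_n += 1
            up_n += moved_up
            down_n += moved_down
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n

# Hypothetical 10-year risks and outcomes (1 = CAD event).
pce = [0.06, 0.12, 0.18, 0.08, 0.15, 0.25, 0.10, 0.22]
cairs = [0.05, 0.22, 0.24, 0.07, 0.12, 0.30, 0.21, 0.18]
events = [0, 1, 1, 0, 0, 1, 0, 0]
cairs_metrics = classification_metrics(cairs, events)
nri = net_reclassification_improvement(pce, cairs, events)
```

In this toy example, two of the three events move above the threshold and one non-event moves in each direction, so the NRI is 2/3.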
For additional comparison we also evaluated performance using alternative risk classification thresholds of 7.5% and 10%. We assessed PCE and caIRS model calibration by inspecting the concordance between the observed and predicted risks visually and quantitatively via the estimation of calibration intercepts and slopes. To estimate these parameters, we performed logistic regression with the predicted probabilities \((\underline{Y})\) as the independent variable and the observed outcomes \((Y)\) as the dependent variable. More specifically, the calibration intercept \(({{\rm{\beta }}}_{0})\) was estimated using the following model: \(logit(Y)\,={\beta }_{0}\,+\,logit(\underline{Y})\) and the calibration slope \(({{\rm{\beta }}}_{1})\) from: \(logit(Y)\,={\beta }_{0}\,+{\beta }_{1}\,*\,logit(\underline{Y})\).
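The two calibration models above can be fitted with a few Newton-Raphson iterations, as sketched below. This is a minimal illustration with hypothetical predicted risks and outcomes, not the study code (the analyses were run in R); the intercept model treats the logit of the predicted risk as a fixed offset, and the slope model estimates both parameters.

```python
from math import exp, log

def logit(p):
    return log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + exp(-x))

def calibration_intercept(pred, obs, iters=50):
    """Fit logit(Y) = b0 + logit(pred), with logit(pred) as an offset."""
    offsets = [logit(p) for p in pred]
    b0 = 0.0
    for _ in range(iters):
        mu = [expit(b0 + o) for o in offsets]
        grad = sum(y - m for y, m in zip(obs, mu))
        hess = sum(m * (1.0 - m) for m in mu)
        b0 += grad / hess
    return b0

def calibration_slope(pred, obs, iters=50):
    """Fit logit(Y) = b0 + b1 * logit(pred); returns (b0, b1)."""
    x = [logit(p) for p in pred]
    b0, b1 = 0.0, 1.0
    for _ in range(iters):
        mu = [expit(b0 + b1 * xi) for xi in x]
        g0 = sum(y - m for y, m in zip(obs, mu))
        g1 = sum((y - m) * xi for y, m, xi in zip(obs, mu, x))
        w = [m * (1.0 - m) for m in mu]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical predicted 10-year risks and observed outcomes.
pred = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.10, 0.25, 0.35, 0.05]
obs = [0, 0, 1, 0, 1, 0, 0, 1, 1, 0]
intercept = calibration_intercept(pred, obs)
slope_b0, slope_b1 = calibration_slope(pred, obs)
```

An intercept near 0 and a slope near 1 indicate good calibration; a positive intercept, as in this toy example where events outnumber the summed predicted risks, indicates that risk is underestimated on average.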
All analyses were performed using R Statistical Software (v4.1.0 or higher)61.
Data availability
Datasets used for the analyses described in this manuscript were obtained from UK Biobank Resource under Application Number 48991 and dbGaP at http://www.ncbi.nlm.nih.gov/sites/entrez?db=gap through dbGaP accession study numbers phs000209.v13.p3, phs000280.v8.p2, phs000810.v1.p1, phs000287.v7.p1, and phs000286.v6.p2. The Penn Medicine Biobank genetic data was generated by Regeneron Genetics and made available to study authors for model validation by Penn Medicine Biobank with the permission of Regeneron Genetics.
References
National Heart, Lung, and Blood Institute. What Is Coronary Heart Disease? https://www.nhlbi.nih.gov/health/coronary-heart-disease (2024).
Tsao, C. W. et al. Heart disease and stroke statistics-2023 update: a report from the American Heart Association. Circulation 147, e93–e621 (2023).
Grundy, S. M. et al. 2018 AHA/ACC/AACVPR/AAPA/ABC/ACPM/ADA/AGS/APhA/ASPC/NLA/PCNA guideline on the management of blood cholesterol: a report of the American College of Cardiology/American Heart Association task force on clinical practice guidelines. Circulation 139, e1082–e1143 (2019).
Goff, D. C. Jr et al. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association task force on practice guidelines. J. Am. Coll. Cardiol. 63, 2935–2959 (2014).
Arnett, D. K. et al. 2019 ACC/AHA guideline on the primary prevention of cardiovascular disease: a report of the American College of Cardiology/American Heart Association task force on clinical practice guidelines. J. Am. Coll. Cardiol. 74, e177–e232 (2019).
Sussman, J. B. et al. The Veterans Affairs Cardiac Risk Score: recalibrating the atherosclerotic cardiovascular disease score for applied use. Med. Care 55, 864–870 (2017).
Pennells, L. et al. Equalization of four cardiovascular risk algorithms after systematic recalibration: individual-participant meta-analysis of 86 prospective studies. Eur. Heart J. 40, 621–631 (2019).
Levin, M. G. & Rader, D. J. Polygenic risk scores and coronary artery disease: ready for prime time? Circulation 141, 637–640 (2020).
Patel, A. P. et al. A multi-ancestry polygenic risk score improves risk prediction for coronary artery disease. Nat. Med. 29, 1793–1803 (2023).
Weale, M. E. et al. Validation of an integrated risk tool, including polygenic risk score, for atherosclerotic cardiovascular disease in multiple ethnicities and ancestries. Am. J. Cardiol. 148, 157–164 (2021).
O’Sullivan, J. W. et al. Polygenic risk scores for cardiovascular disease: a scientific statement from the American Heart Association. Circulation 146, e93–e118 (2022).
Tshiaba, P. T. et al. Integration of a cross-ancestry polygenic model with clinical risk factors improves breast cancer risk stratification. JCO Precis. Oncol. 7, e2200447 (2023).
US Preventive Services Task Force. et al. Statin use for the primary prevention of cardiovascular disease in adults: US preventive services task force recommendation statement. JAMA 328, 746–753 (2022).
Mortensen, M. B., Tybjærg-Hansen, A. & Nordestgaard, B. G. Statin eligibility for primary prevention of cardiovascular disease according to 2021 European prevention guidelines compared with other international guidelines. JAMA Cardiol. 7, 836–843 (2022).
Aragam, K. G. et al. Limitations of contemporary guidelines for managing patients at high genetic risk of coronary artery disease. J. Am. Coll. Cardiol. 75, 2769–2780 (2020).
Schunkert, H. et al. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat. Genet. 43, 333–338 (2011).
Sakaue, S. et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat. Genet. 53, 1415–1424 (2021).
Tcheandjieu, C. et al. Large-scale genome-wide association study of coronary artery disease in genetically diverse populations. Nat. Med. 28, 1679–1692 (2022).
Scott, R. A. et al. An expanded genome-wide association study of type 2 diabetes in Europeans. Diabetes 66, 2888–2902 (2017).
Guindo-Martínez, M. et al. The impact of non-additive genetic associations on age-related complex diseases. Nat. Commun. 12, 2436 (2021).
Chen, J. et al. Genome-wide association study of type 2 diabetes in Africa. Diabetologia 62, 1204–1211 (2019).
Li, M. et al. Statins for the primary prevention of coronary heart disease. Biomed. Res. Int. 2019, 4870350 (2019).
Mega, J. L. et al. Genetic risk, coronary heart disease events, and the clinical benefit of statin therapy: an analysis of primary and secondary prevention trials. Lancet 385, 2264–2271 (2015).
Marston, N. A. et al. Predicting benefit from evolocumab therapy in patients with atherosclerotic disease using a genetic risk score: results from the FOURIER trial. Circulation 141, 616–623 (2020).
Sirugo, G., Williams, S. M. & Tishkoff, S. A. The missing diversity in human genetic studies. Cell 177, 26–31 (2019).
Tishkoff, S. A. et al. The genetic structure and history of Africans and African Americans. Science 324, 1035–1044 (2009).
Ding, Y. et al. Polygenic scoring accuracy varies across the genetic ancestry continuum. Nature 618, 774–781 (2023).
All of Us Research Program Investigators. et al. The ‘All of Us’ research program. N. Engl. J. Med. 381, 668–676 (2019).
Gaziano, J. M. et al. Million Veteran Program: a mega-biobank to study genetic influences on health and disease. J. Clin. Epidemiol. 70, 214–223 (2016).
Linder, J. E. et al. Returning integrated genomic risk and clinical recommendations: the eMERGE study. Genet. Med. 25, 100006 (2023).
Fry, A. et al. Comparison of sociodemographic and health-related characteristics of UK biobank participants with those of the general population. Am. J. Epidemiol. 186, 1026–1034 (2017).
Guarischi-Sousa, R. et al. Contemporary polygenic scores of low-density lipoprotein cholesterol and coronary artery disease predict coronary atherosclerosis in adolescents and young adults. Circ. Genom. Precis. Med. 16, e004047 (2023).
Verma, A. et al. The Penn Medicine BioBank: towards a genomics-enabled learning healthcare system to accelerate precision medicine in a diverse population. J. Pers. Med. 12, 1974 (2022).
Sudlow, C. et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
Conroy, M. C. et al. UK Biobank: a globally important resource for cancer research. Br. J. Cancer 128, 519–527 (2023).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
About our data. https://www.ukbiobank.ac.uk/enable-your-research/about-our-data.
Bild, D. E. et al. Multi-ethnic study of atherosclerosis: objectives and design. Am. J. Epidemiol. 156, 871–881 (2002).
Olson, J. L., Bild, D. E., Kronmal, R. A. & Burke, G. L. Legacy of MESA. Glob. Heart 11, 269–274 (2016).
MESA Overview and Protocol. https://www.mesa-nhlbi.org/aboutMESAOverviewProtocol.aspx.
Study Timeline and Procedures. https://www.mesa-nhlbi.org/aboutMESAStudyTime.aspx.
Project Overview. https://aric.cscc.unc.edu/aric9/about/project_overview.
Sorlie, P. D. et al. Design and implementation of the Hispanic Community Health Study/Study of Latinos. Ann. Epidemiol. 20, 629–641 (2010).
Study Overview. https://sites.cscc.unc.edu/hchs/StudyOverview.
Taylor, H. A. et al. Design and methods of the Jackson Heart Study. Ethn. Dis. 15, 4–17 (2005).
Fuqua, S. R. et al. Methods, response rates, and sample description. Ethn. Dis. 15, 18–29 (2005).
About The JHS. https://www.jacksonheartstudy.org/About/About-The-JHS.
Zhao, H. et al. CrossMap: a versatile tool for coordinate conversion between genome assemblies. Bioinformatics 30, 1006–1007 (2014).
Delaneau, O., Zagury, J.-F., Robinson, M. R., Marchini, J. L. & Dermitzakis, E. T. Accurate, scalable and integrative haplotype estimation. Nat. Commun. 10, 5436 (2019).
Rubinacci, S., Delaneau, O. & Marchini, J. Genotype imputation using the positional Burrows–Wheeler transform. PLoS Genet. 16, e1009049 (2020).
UK10K Consortium. et al. The UK10K project identifies rare variants in health and disease. Nature 526, 82–90 (2015).
Loh, P.-R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448 (2016).
Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).
Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021).
Chen, S. et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature 625, 92–100 (2024).
Byun, J. et al. Ancestry inference using principal component analysis and spatial analysis: a distance-based analysis to account for population substructure. BMC Genomics 18, 789 (2017).
Ruan, Y. et al. Improving polygenic prediction in ancestrally diverse populations. Nat. Genet. 54, 573–580 (2022).
Privé, F., Aschard, H., Ziyatdinov, A. & Blum, M. G. B. Efficient analysis of large-scale genome-wide data with two R packages: bigstatsr and bigsnpr. Bioinformatics 34, 2781–2787 (2018).
Khera, A. V. et al. Whole-genome sequencing to characterize monogenic and polygenic contributions in patients hospitalized with early-onset myocardial infarction. Circulation 139, 1593–1602 (2019).
Wand, H. et al. Improving reporting standards for polygenic scores in risk prediction studies. Nature 591, 211–219 (2021).
R Core Team. R: A language and environment for statistical computing. https://www.r-project.org/.
Acknowledgements
The study was funded by MyOme. Medical writing and editorial support was provided by Lauren Dembeck, PhD, funded by MyOme. The authors thank the staff and participants of the Multi-Ethnic Study of Atherosclerosis (MESA) study. MESA and the MESA SHARe project are conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with MESA investigators. Support for MESA is provided by contracts N01-HC95159, N01-HC-95160, N01-HC-95161, N01-HC-95162, N01-HC-95163, N01-HC-95164, N01-HC-95165, N01-HC95166, N01-HC-95167, N01-HC-95168, N01-HC-95169 and CTSA UL1-RR-024156. Funding for SHARe genotyping was provided by NHLBI Contract N02-HL-64278. Genotyping was performed at Affymetrix (Santa Clara, California, USA) and the Broad Institute of Harvard and MIT (Boston, Massachusetts, USA) using the Affymetrix Genome-Wide Human SNP Array 6.0. The Atherosclerosis Risk in Communities study has been funded in whole or in part with Federal funds from the National Heart, Lung, and Blood Institute, National Institutes of Health, Department of Health and Human Services, under contract numbers (HHSN268201700001I, HHSN268201700002I, HHSN268201700003I, HHSN268201700004I, and HHSN268201700005I). The authors thank the staff and participants of the ARIC study for their important contributions. Funding for CARe genotyping was provided by NHLBI Contract N01-HC-65226. The authors thank the staff and participants of the Hispanic Community Health Study / Study of Latinos (HCHS/SOL) study. A complete list of staff and investigators is available on the study website http://www.cscc.unc.edu/hchs/. The Hispanic Community Health Study/Study of Latinos is funded by contracts from the National Heart, Lung, and Blood Institute (NHLBI) to the University of North Carolina (N01-HC65233), University of Miami (N01-HC65234), Albert Einstein College of Medicine (N01-HC65235), Northwestern University (N01-HC65236), and San Diego State University (N01-HC65237).
The following Institutes/Centers/Offices contribute to the HCHS/SOL through a transfer of funds to the NHLBI: National Center on Minority Health and Health Disparities, the National Institute of Deafness and Other Communications Disorders, the National Institute of Dental and Craniofacial Research, the National Institute of Diabetes and Digestive and Kidney Diseases, the National Institute of Neurological Disorders and Stroke, and the Office of Dietary Supplements. The authors thank the staff and participants of the Cardiovascular Health Study (CHS). The CHS is sponsored by the NHLBI. This research was supported by contracts HHSN268201200036C, HHSN268200800007C, HHSN268201800001C, N01HC55222, N01HC85079, N01HC85080, N01HC85081, N01HC85082, N01HC85083, N01HC85086, 75N92021D00006, and grants U01HL080295 and U01HL130114 from the National Heart, Lung, and Blood Institute (NHLBI), with additional contribution from the National Institute of Neurological Disorders and Stroke (NINDS). Additional support was provided by R01AG023629 from the National Institute on Aging (NIA). A full list of principal CHS investigators and institutions can be found at CHS-NHLBI.org. The Jackson Heart Study (JHS) is supported and conducted in collaboration with Jackson State University (HHSN268201800013I), Tougaloo College (HHSN268201800014I), the Mississippi State Department of Health (HHSN268201800015I/HHSN26800001) and the University of Mississippi Medical Center (HHSN268201800010I, HHSN268201800011I and HHSN268201800012I) contracts from the National Heart, Lung, and Blood Institute (NHLBI) and the National Institute for Minority Health and Health Disparities (NIMHD). The authors also wish to thank the staff and participants of the JHS. Funding for CARe genotyping was provided by NHLBI Contract N01-HC-65226. Molecular data for the Trans-Omics in Precision Medicine (TOPMed) program was supported by the National Heart, Lung, and Blood Institute (NHLBI). 
The Jackson Heart Study (phs000964.v1.p1) was performed at the Northwest Genomics Center (HHSN268201100037C). Core support including centralized genomic read mapping and genotype calling, along with variant quality metrics and filtering were provided by the TOPMed Informatics Research Center (3R01HL-117626-02S1; contract HHSN268201800002I). Core support including phenotype harmonization, data management, sample-identity QC, and general program coordination were provided by the TOPMed Data Coordinating Center (R01HL-120393; U01HL-120393; contract HHSN268201800001I). We gratefully acknowledge the studies and participants who provided biological samples and data for TOPMed.
Author information
Contributions
Study conception and design: Dariusz Ratman, Placede Tshiaba, Michael Levin, Daniel J Rader, Premal Shah, Matthew Rabinowitz, Akash Kumar, Kate Im. Collection and assembly of data: Dariusz Ratman, Placede Tshiaba, Jiayi Sun, Michael Levin, Robert Maier, Kate Im. Data analysis and interpretation: Dariusz Ratman, Placede Tshiaba, Michael Levin, Jiayi Sun, Tate Tunstall, Matthew Rabinowitz, Akash Kumar, Kate Im. Manuscript writing: Dariusz Ratman, Placede Tshiaba, Kate Im, Michael Levin. Final approval of manuscript: All authors. Accountable for all aspects of the work: All authors.
Ethics declarations
Competing interests
This study was funded by MyOme, Inc. Dariusz Ratman, Placede Tshiaba, Jiayi Sun, Tate Tunstall, Robert Maier, Premal Shah, Matthew Rabinowitz, Akash Kumar, and Kate Im are either current or previous employees of MyOme.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ratman, D., Tshiaba, P., Levin, M. et al. Polygenic risk scores improve CAD risk prediction in individuals at borderline and intermediate clinical risk. npj Cardiovasc Health 2, 13 (2025). https://doi.org/10.1038/s44325-025-00049-7