Abstract
Type 2 diabetes (T2D) risk prediction remains a challenge, particularly in underrepresented populations, including people living with HIV (PWH) and those of non-European ancestry. We evaluated the performance of two metaPRS (polygenic risk score) models, integrating genetic markers related to inflammation and lipid metabolism, in predicting T2D risk across ancestry groups (African and European), with and without HIV. The metaPRS were generated in a subset from the Reasons for Geographic and Racial Differences in Stroke (REGARDS) study (6,034 Black; 11,972 White) and validated in 7,580 (4,120 Black; 3,460 White) PWH from the Centers for AIDS Research of Integrated Clinical Systems (CNICS), as well as an additional 4,152 (2,586 Black; 1,566 White) seronegative participants from REGARDS. Incorporating the metaPRS into models provided non-significant improvements in T2D risk prediction compared to single-trait T2D PRS and clinical risk factors. Performance was similar in PWH and in people without HIV, suggesting that these general population-derived genetic scores are transferable to PWH. Future studies should focus on refining PRS models in diverse populations and exploring genetic factors specific to PWH regarding T2D risk.
Introduction
Type 2 diabetes (T2D) is a complex and heterogeneous condition characterized by glucose dysregulation and insulin resistance, and is the most common form of diabetes, comprising > 90% of individuals with diabetes in the United States1. Chronic hyperglycemia contributes to long-term complications, including dysfunction and damage to the kidneys, heart, and blood vessels, making diabetes a major risk factor for cardiovascular diseases2. Despite advances in treatment and management, the growing prevalence of T2D3 underscores the need for improved risk prediction and early intervention strategies tailored to individual patients.
T2D has a strong genetic component in the general population, with heritability estimates from twin and family studies ranging from 30% to 70%4,5,6,7 and array-based analyses ranging from 18% to 34%8. Genome-wide association studies (GWAS) have identified over 500 genetic variants associated with T2D9,10,11, highlighting its complex polygenic architecture. Polygenic risk scores (PRS), which aggregate the effects of multiple genetic variants associated with disease risk, have emerged as promising tools for predicting future T2D and improving early diagnosis and prevention efforts12. However, PRS have primarily been developed using populations of European ancestry, which may limit their predictive accuracy and generalizability to other populations13,14 or to those with coexisting chronic conditions.
The precision medicine paradigm seeks to tailor disease prevention, diagnosis, and treatment to individual patients by leveraging genetic, environmental, and clinical data15. In multi-factorial diseases, such as T2D, integrating genetic biomarkers (e.g., PRS) into clinical practice offers the potential to identify high-risk individuals and customize interventions16. Recent advances in genetic risk modeling have enhanced PRS construction by combining multiple PRS or sets of summary statistics into a meta-score (metaPRS)17. MetaPRS approaches have shown improvement in risk prediction over single-trait PRS in other complex diseases, such as coronary artery disease (CAD)18 and ischemic stroke17, by capturing a more comprehensive genetic risk profile. We hypothesize that incorporating genetic markers related to inflammation and lipid metabolism into metaPRS models may improve T2D prediction, particularly in populations with HIV where these inflammatory biological pathways play a critical role.
People living with HIV (PWH) face an elevated risk of T2D compared to the general population19,20, driven by a complex interplay of traditional and HIV-specific risk factors, including disruptions in lipid metabolism20,21, chronic systemic inflammation, and insulin resistance20. Some antiretroviral therapy (ART) agents have been linked to metabolic side effects, including weight gain, dyslipidemia, and insulin resistance22, further compounding the risk of T2D in PWH. Genetic variation in inflammatory and lipid metabolism pathways may contribute to this heightened risk, but our understanding of the role of these genetic markers in T2D prediction among PWH remains limited.
In the current study, we evaluated the utility of metaPRS models for predicting prevalent T2D in populations with and without HIV across two ancestry groups (African and European), utilizing genetic and phenotypic data from 22,158 participants in the Reasons for Geographic and Racial Differences in Stroke (REGARDS) study and 7,580 participants from the Centers for AIDS Research of Integrated Clinical Systems (CNICS) cohort. Specifically, we examined whether a general population T2D score translates to an HIV population and whether the inclusion of genetic markers related to inflammation or lipid metabolism improved the predictive accuracy of T2D in PWH compared to seronegative individuals. This research underscores the potential of genetic risk models, such as metaPRS, as precision medicine tools to enhance risk prediction and inform targeted interventions.
Methods
Study design and participants
Reasons for Geographic and Racial Differences in Stroke (REGARDS) study
The REGARDS study is a national, longitudinal study of incident stroke and associated risk factors, enrolling over 30,000 self-identified Black and White adults aged 45 years or older from all 48 contiguous US states and the District of Columbia23. Participants completed a computer-assisted telephone interview (CATI) and an in-home visit where blood and urine were collected, and a medication inventory was taken. Participants are contacted at six-month intervals to obtain information regarding incident stroke or secondary outcomes. Institutional review boards approved this study at all participating institutions, and all participants provided written informed consent.
Genotyping in REGARDS was conducted in two independent batches. The first batch included 10,788 participants (84% Black), genotyped on the Illumina Infinium AMR/AFR (MEGA) BeadChip array. Quality control procedures on the sample and variant level have been previously described24. Briefly, variants were excluded if they were located on sex chromosomes, had ambiguous strands, were multi-allelic, violated Hardy Weinberg Equilibrium (HWE, p < 1.00e-12), and/or had a missing rate > 10%. Individuals were excluded based on sex discrepancies (genotyped versus self-reported), internal duplicates, or HapMap controls. Imputation was performed using the Trans-omics for Precision Medicine (TOPMed) release 2 (Freeze 8) reference panel25. Principal components (PCs) were generated using EIGENSTRAT26 to account for population substructure and as a measure of genetic similarity.
A second, independent batch included 12,118 participants of European ancestry genotyped on the Illumina Infinium Global Diversity Array-9 (GDA) with custom content targeting neurodegenerative disease-focused variants27. Variants were excluded if they were located on sex chromosomes, had ambiguous strands, were multi-allelic, were indels, violated Hardy Weinberg Equilibrium (HWE, p < 1.00e-05), and/or had a missing rate > 5%. Individuals were excluded based on sex discrepancies (genotyped versus self-reported), internal duplicates, or those with < 98% call rates. Imputation was performed using the Trans-omics for Precision Medicine (TOPMed) release 3 reference panel25. PCs were generated using EIGENSTRAT.
T2D in REGARDS was defined based on fasting glucose ≥ 126 mg/dL (7 mmol/L), non-fasting glucose ≥ 200 mg/dL (11.1 mmol/L), or the use of diabetes medications (e.g., oral hypoglycemic pills or insulin)8. Age, sex, and race were self-reported at baseline. Smoking history was obtained during the CATI, at which participants were asked if they currently smoke cigarettes, if they did in the past, or if they never smoked. For the current study, cigarette smoking was evaluated as current smoking versus past/never smoking. Height and weight were measured during the in-home examination, and body mass index (BMI) was calculated as a continuous variable (kg/m2). Systolic blood pressure (SBP) was measured two times following a standardized protocol using an aneroid sphygmomanometer after participants had rested for five minutes23. The mean of the two measurements was used to define SBP28.
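The REGARDS phenotype rules above can be sketched in a few lines. This is a minimal illustration, not the study's code: the function and parameter names are hypothetical, and treating a missing glucose value as "no evidence of T2D" is an assumption made only for the example.

```python
from typing import Optional


def regards_t2d(glucose_mg_dl: Optional[float],
                fasting: bool,
                on_diabetes_meds: bool) -> bool:
    """Classify T2D using the REGARDS rules quoted in the text.

    Hypothetical helper: fasting glucose >= 126 mg/dL, non-fasting
    glucose >= 200 mg/dL, or use of diabetes medication.
    """
    if on_diabetes_meds:
        return True
    if glucose_mg_dl is None:
        # Assumption for this sketch: no glucose value, no medication -> not T2D.
        return False
    threshold = 126.0 if fasting else 200.0
    return glucose_mg_dl >= threshold


def body_mass_index(weight_kg: float, height_m: float) -> float:
    """BMI as a continuous variable in kg/m^2."""
    return weight_kg / height_m ** 2


def mean_sbp(first_reading: float, second_reading: float) -> float:
    """SBP defined as the mean of the two measurements."""
    return (first_reading + second_reading) / 2.0
```

Note that the same glucose value can classify differently depending on fasting status: 130 mg/dL meets the fasting threshold but not the non-fasting one.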
The final analytical sample consisted of 8,620 Black and 13,538 White REGARDS participants passing quality control parameters, with complete genotyping, outcome, and covariate data. For analysis in Black REGARDS participants, the dataset was randomly divided into training (70%, n = 6,034) and validation (30%, n = 2,586) subsets. In White REGARDS participants, two unique subsets of individuals, based on genotyping platform, were used for training (n = 11,972) and validation (n = 1,566).
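The 70/30 random split of Black REGARDS participants can be illustrated as follows. This is a sketch under stated assumptions: the seed, the function name, and the use of integer participant indices are hypothetical, and the actual randomization procedure is not described in the text.

```python
import random
from typing import List, Sequence, Tuple


def split_train_validation(ids: Sequence[int],
                           train_pct: int = 70,
                           seed: int = 2024) -> Tuple[List[int], List[int]]:
    """Randomly split participant IDs into training and validation subsets.

    Integer arithmetic is used for the training size to avoid
    floating-point rounding surprises.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = len(shuffled) * train_pct // 100
    return shuffled[:n_train], shuffled[n_train:]


# 8,620 participants -> 6,034 training and 2,586 validation, as reported.
train_ids, validation_ids = split_train_validation(range(8620))
```

With 8,620 participants, a 70% training fraction yields the 6,034 and 2,586 subset sizes given above.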
The Centers for AIDS Research of Integrated Clinical Systems (CNICS) cohort
CNICS is a well-characterized longitudinal observational cohort of PWH engaged in clinical care after 1/1/1995 at ten sites across the United States. Full details on CNICS have been previously published29,30. In brief, the CNICS data repository integrates comprehensive clinical data from outpatient and inpatient visits at participating sites, including demographic, clinical, and laboratory data, medications, and clinical history29. In addition, CNICS collects patient-reported outcomes (PROs), initiated as part of care in ~ 2009 (date varies by site), including symptoms and risk behaviors such as smoking. CNICS also collects biospecimens across sites, which enables an ongoing genetics project of PWH with multiple racial/ethnic backgrounds.
In CNICS, the methods for genotyping and imputation have been detailed elsewhere31,32. Briefly, DNA was extracted from peripheral blood mononuclear cells or buffy coats. Genotype data were generated using the Illumina Infinium Multi-Ethnic Global Array, the MEGA expanded array, and the Infinium Multi-Ethnic Global-8 Kit. Quality control (QC) for genotyping was conducted before imputation using PLINK v1.933 and excluded variants with call rates < 95%, MAF < 1%, and deviations from HWE (p < 1E-05). Samples with call rates < 90%, sex discrepancies, and pairwise identity-by-descent (pi-hat > 0.9) were also excluded. Ethnicity was inferred using GRAF-pop software34, as previously described32. Imputation was performed separately for each ancestral group using the TOPMed reference panel, and variants with an imputation quality score were retained. Harmonization across arrays was performed post-imputation. We restricted the analysis to variants common to the platforms and used the McCarthy Group Tools35 to check for strand flips. The top 10 PCs were calculated using EIGENSTRAT, were ancestry-specific, and were used in subsequent analysis, as well as in harmonization. We tested all variants for associations with the platforms (e.g., MEGA = 1, MEGAex = 0) as a dichotomous outcome, adjusting for the 10 PCs. Variants with p < 5E-08 when comparing differences between platforms were removed. This process was repeated to merge all three platforms36.
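The variant-level thresholds described above can be expressed as simple predicates. The sketch below is illustrative only: the study applied these filters with PLINK, whereas the `VariantStats` record, its field names, and both functions are hypothetical stand-ins for per-variant summary statistics.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    """Hypothetical per-variant summary used only for this illustration."""
    variant_id: str
    call_rate: float   # fraction of samples with a genotype call
    maf: float         # minor allele frequency
    hwe_p: float       # Hardy-Weinberg equilibrium test p-value


def passes_pre_imputation_qc(v: VariantStats,
                             min_call_rate: float = 0.95,
                             min_maf: float = 0.01,
                             min_hwe_p: float = 1e-5) -> bool:
    """Keep a variant unless call rate < 95%, MAF < 1%, or HWE p < 1E-05."""
    return (v.call_rate >= min_call_rate
            and v.maf >= min_maf
            and v.hwe_p >= min_hwe_p)


def passes_platform_check(platform_assoc_p: float,
                          threshold: float = 5e-8) -> bool:
    """Drop variants associated with genotyping platform at p < 5E-08."""
    return platform_assoc_p >= threshold


variants = [
    VariantStats("var_a", call_rate=0.99, maf=0.20, hwe_p=0.30),
    VariantStats("var_b", call_rate=0.93, maf=0.20, hwe_p=0.30),  # low call rate
    VariantStats("var_c", call_rate=0.99, maf=0.005, hwe_p=0.30),  # rare
]
kept = [v.variant_id for v in variants if passes_pre_imputation_qc(v)]
```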
PWH from eight CNICS sites with the relevant data were included in the current study if they had genome-wide genotype data and completed one or more PRO assessments prior to July 2023 (data end date). The T2D definition in CNICS differed slightly from that used in REGARDS. T2D in CNICS was defined as (1) hemoglobin A1c ≥ 6.5% or (2) use of a diabetes-specific medication such as insulin or (3) use of a diabetes-related medication frequently but not exclusively used to treat diabetes (e.g., biguanides) and having a diagnosis of diabetes mellitus37. BMI was calculated from height and weight as a continuous variable (kg/m2) at initial visit. SBP measures were averaged across all visits. Tobacco use was captured using the CNICS clinical assessment of patient-reported outcomes. CNICS participants self-reported a history of never, former, or current smoking and answered items regarding duration and quantity. We analyzed the cigarette smoking phenotype as current versus non-current and PWH were classified as ART-naïve or ART-experienced. The final analytical sample consisted of 4,120 Black and 3,460 White CNICS participants passing quality control parameters, with complete genotyping, outcome, and covariate data.
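Because the CNICS definition of T2D has three alternative criteria, a short sketch may help contrast it with the glucose-based REGARDS definition. The function and argument names are hypothetical, and how missing laboratory values were handled in the study is not stated; here a missing hemoglobin A1c simply fails the first criterion.

```python
from typing import Optional


def cnics_t2d(hba1c_percent: Optional[float],
              on_diabetes_specific_med: bool,
              on_diabetes_related_med: bool,
              has_diabetes_diagnosis: bool) -> bool:
    """Classify T2D using the three CNICS criteria quoted in the text."""
    # (1) hemoglobin A1c >= 6.5%
    if hba1c_percent is not None and hba1c_percent >= 6.5:
        return True
    # (2) diabetes-specific medication, such as insulin
    if on_diabetes_specific_med:
        return True
    # (3) diabetes-related medication (e.g., biguanides) plus a diagnosis
    return on_diabetes_related_med and has_diabetes_diagnosis
```

Under criterion (3), a diabetes-related medication alone is not sufficient; it must be accompanied by a diagnosis of diabetes mellitus.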
This secondary data analysis was approved by the University of Alabama at Birmingham Institutional Review Board (IRB-300010485). All methods were performed in accordance with the relevant guidelines and regulations, including those outlined by the UAB IRB and the Declaration of Helsinki. All REGARDS and CNICS participants provided informed consent.
Generation of metaPRS
The REGARDS training set was used to construct the metaPRS and was excluded from any further analysis. We applied nine PRS from the Polygenic Score Catalog38 (Supplemental Table S1) for phenotypes associated with T2D, lipid metabolism, or inflammation: T2D (PGS005033)39, interleukin-1 receptor antagonist protein (IL-1ra, PGS000250)40, interleukin-6 (IL-6, PGS000252)40, interleukin-8 (IL-8, PGS000254)40, tumor necrosis factor receptor 1 (TNF-R1, PGS000287)40, C-reactive protein (CRP, PGS002164)41, low-density lipoprotein (LDL, PGS004981) cholesterol39, high-density lipoprotein (HDL, PGS004086) cholesterol42, and triglycerides (TG, PGS002197)41. To minimize overfitting, we did not include scores that utilized REGARDS or CNICS in their generation or optimization. Using PLINK, each PRS was calculated for each individual in the training set by multiplying the variant-specific weight by the imputed allelic dosage and summing across the variants. Each PRS was standardized (zero mean, unit standard deviation, SD). We employed elastic net logistic regression43 using the R package glmnet44 to model the association between the PRS and T2D, adjusting for age, age-squared, sex, and 10 PCs. A range of models with different penalties was evaluated using ten-fold cross-validation. We selected the best model as the one with the highest cross-validated area under the receiver operating characteristic curve (AUC), incorporating all traits for each metaPRS (inflammation and lipid metabolism). The resulting log odds estimates were held constant as the weights in the metaPRS models. We generated two metaPRS for each ancestry stratum. One consisted of the weighted average of the standardized scores for T2D and inflammatory markers (meta-inflammation):
where \(Z_{i1}, Z_{i2}, Z_{i3}, Z_{i4}, Z_{i5}, Z_{i6}\) are the standardized (zero mean, unit SD) T2D, IL-1ra, IL-6, IL-8, TNF-R1, and CRP risk scores for the ith individual, respectively, and \(\beta_{1}, \beta_{2}, \beta_{3}, \beta_{4}, \beta_{5}, \beta_{6}\) are the coefficients (log odds) for each PRS.
Similarly, the other metaPRS consisted of the weighted average of the standardized scores for lipid metabolism (meta-lipids) plus T2D:
where \(Z_{i1}, Z_{i2}, Z_{i3}, Z_{i4}\) are the standardized (zero mean, unit SD) T2D, LDL-C, HDL-C, and TG risk scores for the ith individual, respectively, and \(\beta_{1}, \beta_{2}, \beta_{3}, \beta_{4}\) are the coefficients (log odds) for each PRS.
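The construction described in this section (per-individual PRS as a weighted sum of dosages, standardization, then a weighted combination of standardized scores) can be sketched as below. This is an illustration under stated assumptions, not the study pipeline: a plain weighted sum, \(\text{metaPRS}_i = \sum_j \beta_j Z_{ij}\), is assumed, and any further normalization in the original equations is not reproduced. All function names, trait keys, and numeric values are hypothetical; in the study the weights came from elastic net logistic regression fitted with glmnet.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def polygenic_score(weights: Sequence[float],
                    dosages: Sequence[float]) -> float:
    """Sum of variant weight x imputed allelic dosage for one individual."""
    return sum(w * d for w, d in zip(weights, dosages))


def standardize(scores: Sequence[float]) -> List[float]:
    """Rescale scores to zero mean and unit (population) SD."""
    mu = mean(scores)
    sd = pstdev(scores)
    return [(s - mu) / sd for s in scores]


def meta_prs(z_by_trait: Dict[str, List[float]],
             beta_by_trait: Dict[str, float]) -> List[float]:
    """Weighted sum of standardized trait scores for each individual.

    Assumption: no extra normalization beyond the log-odds weights.
    """
    n = len(next(iter(z_by_trait.values())))
    return [
        sum(beta_by_trait[trait] * z_by_trait[trait][i] for trait in z_by_trait)
        for i in range(n)
    ]


# Toy example with made-up scores for three individuals and two traits.
z_scores = {
    "T2D": standardize([0.8, 1.1, 1.4]),
    "LDL": standardize([2.0, 1.0, 3.0]),
}
betas = {"T2D": 0.30, "LDL": 0.05}  # hypothetical log-odds weights
meta_lipids_toy = meta_prs(z_scores, betas)
```

Because each input score is standardized before weighting, the log-odds weights are directly comparable across traits, which is what allows a single combined score to be held fixed and carried into the validation cohorts.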
Statistical analysis
Baseline characteristics were presented as means and SDs for continuous variables, and as counts and frequencies for categorical data. Each metaPRS developed using the training set was held fixed and evaluated in CNICS (n = 4,120 Black; n = 3,460 White), as well as the REGARDS validation set (n = 2,586 Black; n = 1,566 White). The models were stratified by genetic ancestry. Each PRS was evaluated per quintile, as well as per 1-SD. The association of each metaPRS with prevalent T2D was evaluated in CNICS using a multivariable logistic regression model accounting for age, age-squared, sex, first 10 PCs, and genotyping array. A fully adjusted model further accounted for medications including ART and statin use, as well as BMI, SBP, and cigarette smoking status. To assess the predictive utility of these scores for T2D, the AUC was calculated for five models: a base model (Model 1) adjusting for age, age-squared, sex, 10 PCs, and genotyping array; a clinical model (Model 2) adjusting for the base model plus ART use, statin use, BMI, SBP, and cigarette smoking; a clinical model plus T2D PRS model (Model 3), which is the clinical model plus the single-trait T2D PRS; a clinical model plus the meta-inflammation PRS (Model 4); and the clinical model plus the meta-lipids PRS (Model 5). DeLong’s test for two correlated receiver operating characteristic curves, implemented in the R package pROC, was used to compare the pairwise performance of the models. The fit of each model was determined by Nagelkerke pseudo-R². The net reclassification improvement (NRI) was used to assess the potential for improved discrimination across the models when adding the different PRS. The R package nricens was applied for NRI analysis. Additional validation was performed in the holdout subset of REGARDS, with similar models, except for genotyping array, ART use, and statin use. All P values are two-sided, with a 0.05 cutoff for significance. All statistical analyses were conducted using R version 4.3.1 (R Foundation).
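Two of the metrics named above, the AUC and the Nagelkerke pseudo-R², can be computed from predicted probabilities alone, as the following sketch shows. It is a plain-Python illustration with hypothetical function names, not the pROC or R code used in the analysis. The AUC is computed as the probability that a randomly chosen case scores higher than a randomly chosen control (ties count one half), and the pseudo-R² is the Cox-Snell value rescaled by its maximum, using an intercept-only null model.

```python
import math
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise case/control comparisons."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases for k in controls
    )
    return wins / (len(cases) * len(controls))


def log_likelihood(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Bernoulli log-likelihood of the observed outcomes."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(labels, probs)
    )


def nagelkerke_r2(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Nagelkerke pseudo-R^2 relative to an intercept-only model."""
    n = len(labels)
    prevalence = sum(labels) / n
    ll_null = log_likelihood(labels, [prevalence] * n)
    ll_model = log_likelihood(labels, probs)
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


toy_labels = [0, 0, 1, 1]
toy_probs = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_probs)          # 3 of 4 case/control pairs ordered correctly
toy_r2 = nagelkerke_r2(toy_labels, toy_probs)
```

The pairwise AUC is quadratic in sample size, which is acceptable for a sketch; rank-based implementations are preferable for cohorts of several thousand participants.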
Results
The characteristics of the REGARDS and CNICS participants are shown in Table 1. In the REGARDS training cohort, Black participants (N = 6,034) were younger on average (mean age 63.6 ± 9.2 years) than White participants (N = 11,972; mean age 64.2 ± 9.1 years). Black participants, compared to White participants, also had a higher average BMI (30.8 kg/m2 versus 28.3 kg/m2), as well as a higher prevalence of T2D (30.2% versus 14.9%). These trends were also observed in the REGARDS validation cohort. In the CNICS validation cohort, Black participants (N = 4,120) were slightly younger (mean age 39.1 ± 10.8 years) and had a higher prevalence of T2D (26.9%) compared to White participants (N = 3,460; mean age 40.5 ± 9.9 years; 16.5% T2D).
Using the CNICS validation set, we evaluated the association between the T2D single-trait PRS and our two metaPRS per one SD increase in the PRS, as well as by quintile (Table 2). For Black participants, the single-trait T2D PRS, meta-inflammation PRS, and meta-lipids PRS demonstrated similar patterns of association with T2D. Individuals in the lowest quintile (Q1) had significantly lower odds of T2D compared to the reference group (Q3), with odds ratios (OR) ranging from 0.66 to 0.72 across the three PRS models. Conversely, individuals in the highest PRS quintiles (Q4 and Q5) exhibited increased odds of T2D, particularly in the highest quintile (Q5), with OR = 1.67 (95% CI: 1.33–2.09) for the T2D PRS, OR = 1.45 (95% CI: 1.17–1.82) for the meta-inflammation PRS, and OR = 1.61 (95% CI: 1.29–2.02) for the meta-lipids PRS, indicating a stronger association with increase in quintile. The per-SD increase in PRS was associated with a similar increase in T2D risk across all three models (T2D PRS OR = 1.35, 95% CI: 1.25–1.45; meta-inflammation PRS OR = 1.33, 95% CI: 1.23–1.44; meta-lipids PRS OR = 1.35, 95% CI: 1.25–1.46).
For White participants, the single-trait T2D PRS and both metaPRS followed a similar pattern to Black individuals, with ORs increasing across quintiles, particularly in Q5, where the highest risk estimates were observed (OR = 1.81, 95% CI: 1.37–2.40 for T2D PRS; OR = 1.81, 95% CI: 1.37–2.40 for meta-inflammation PRS; and OR = 1.96, 95% CI: 1.48–2.60 for meta-lipids PRS). The single-trait T2D PRS and the meta-inflammation PRS showed the strongest per-SD effect (OR = 1.57, 95% CI: 1.42–1.73); however, the meta-lipids PRS showed a similarly strong association (OR = 1.53, 95% CI: 1.40–1.69) (Table 2).
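The quintile comparison against a Q3 reference can be illustrated with an unadjusted calculation. This is a simplified sketch only: the odds ratios in Table 2 come from multivariable logistic regression, whereas the example below uses a 2x2 table with a Wald confidence interval. The function names and the counts are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def assign_quintiles(scores: Sequence[float]) -> List[int]:
    """Assign each individual to a rank-based quintile, 1 (lowest) to 5."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    quintile = [0] * n
    for rank, i in enumerate(order):
        quintile[i] = rank * 5 // n + 1
    return quintile


def odds_ratio_wald(cases_group: int, controls_group: int,
                    cases_ref: int, controls_ref: int,
                    z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio versus a reference group with a 95% Wald CI."""
    odds_ratio = (cases_group * controls_ref) / (controls_group * cases_ref)
    se_log_or = math.sqrt(
        1 / cases_group + 1 / controls_group + 1 / cases_ref + 1 / controls_ref
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Made-up counts: 20 cases / 80 controls in Q5 versus 10 / 90 in Q3.
or_q5, ci_low, ci_high = odds_ratio_wald(20, 80, 10, 90)
toy_quintiles = assign_quintiles([float(x) for x in range(10)])
```

Using the middle quintile as the reference, as in the analysis above, lets both tails be read directly: values below 1 in Q1 and above 1 in Q5 indicate a monotonic gradient in risk.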
Next, we compared the predictive performance of the single-trait T2D PRS and two metaPRS to a base model (Model 1, adjusting for age, age-squared, sex, 10 PCs, and genotyping array). In Black individuals, Model 1 had an AUC of 70.16% (95% CI: 68.36%–71.97%) and a model R² of 14.67%. The addition of clinical risk factors (Model 2, adjusted for Model 1 plus ART use, statin use, BMI, SBP, and cigarette smoking) substantially improved predictive performance compared to Model 1, increasing the AUC to 79.67% (95% CI: 78.16%–81.18%) and the model R² to 30.19%. Furthermore, this model had a net reclassification improvement (NRI) of 16.59% (95% CI: 12.88%–20.80%). Incorporating PRS further improved model performance, with the single-trait T2D PRS (Model 3) yielding the highest AUC (80.41%, 95% CI: 78.91%–81.90%) (Supplemental Figure S1A) and a PRS-specific R² of 1.88%. The meta-lipids PRS (Model 5) also contributed meaningfully, achieving the highest NRI of 18.94% (95% CI: 15.08%–22.77%) and a PRS-specific R² of 1.91%. However, the improvements from the meta-lipids PRS were modest compared to the single-trait PRS (Table 3).
Among White individuals, the overall trends were similar, though AUCs were slightly lower (Table 3). Model 1 had an AUC of 68.70% (95% CI: 66.22%–71.18%) and a model R² of 10.24%, which increased significantly in Model 2 (AUC = 76.51%, 95% CI: 74.34%-78.68%; model R² = 21.31%; NRI = 15.48%, 95% CI: 10.71%-20.36%). The inclusion of PRS provided additional predictive power, with the single-trait T2D PRS (Model 3) showing the strongest improvement (AUC = 78.28%, 95% CI: 76.15%–80.40%; PRS R² = 3.18%) (Supplemental Figure S1B). The meta-inflammation PRS (Model 4) yielded a slightly higher NRI (16.57%, 95% CI: 11.49%–22.17%) compared to the meta-lipid PRS (Model 5). Similar to what we observed in Black CNICS individuals, the most significant improvement in prediction came from traditional clinical risk factors rather than any genetic score alone. To formally evaluate differences in predictive performance, we used DeLong’s test for correlated ROC curves, as summarized in Supplemental Table S2. When incorporating clinical risk factors (Model 2) and genetic information (Models 3–5), predictive performance improved significantly (p < 0.05) compared to the base model (Model 1). The addition of PRS models further enhanced the prediction compared to clinical risk factors alone, both examining the PRS per quintile, as well as per SD; however, comparisons between different PRS models were not significant.
We were also interested in validating the performance in a subset of non-overlapping REGARDS participants. In REGARDS Black participants, those in the highest quintile (Q5) had significantly higher odds of T2D compared to the reference group (Q3) for all PRS models, with a 2-fold increase in the odds using the meta-lipids PRS (OR = 2.06, 95% CI:1.57–2.70). Similarly, all participants in Q1 had a reduction of odds of T2D compared to Q3, ranging from 38% to 45%. A 1-SD increase in PRS was associated with 62–63% higher odds of T2D (Supplemental Table S3). Among White individuals, the effect sizes were more pronounced. The highest quintile (Q5) showed a substantial increase in T2D risk compared to Q3, with odds ratios ranging from 2.28 (95% CI: 1.57–3.32) for the meta-inflammation PRS to 2.75 (95% CI: 1.89–4.04) for the meta-lipids PRS. A 1-SD increase was associated with over a 2-fold increase in odds for all PRS scores (Supplemental Table S3).
Supplemental Table S4 presents prediction results from the REGARDS validation cohort. Among Black participants, the base model (Model 1) had a modest AUC of 57.89% (95% CI: 55.44%–60.34%) and a full model R² of 2.23%. The addition of clinical risk factors in Model 2 increased the AUC to 65.25% (95% CI: 62.98%–67.59%) and the R² to 8.65%, with an NRI of 8.43% (95% CI: 3.29%–13.42%). Including the single-trait T2D PRS (Model 3) further improved prediction (AUC = 69.60%, 95% CI: 67.36%–71.84%; R² = 13.84%; NRI = 17.31%, 95% CI: 11.95%–22.94%) (Supplemental Figure S2A). Similar model enhancements were found in Models 4 and 5, with the highest AUC, model R², and NRI observed for the meta-inflammation PRS (AUC = 69.68%, 95% CI: 67.44%–71.92%; R² = 14.03%; NRI = 18.15%, 95% CI: 12.88%–23.28%). In White participants, the predictive performance was higher, with a Model 1 AUC of 58.52% (54.96%–62.07%) and an R² of 2.23%. The AUC increased to 76.77% (73.74%–79.79%) and the R² to 22.59% in Model 5, with the addition of the meta-lipids PRS (Supplemental Figure S2B). Similar to the CNICS results, the REGARDS validation cohort showed significant improvements in predictive performance when clinical risk factors (Model 2) and genetic information via PRS per SD (Models 3–5) were added to Model 1. The most significant improvements were between Models 1 and 4 for Black participants (p = 5.62E-18) and between Models 1 and 5 for White participants (p = 3.61E-20) (Supplemental Table S2).
Discussion
Transferability across populations, both ancestral and clinical, remains a key priority in the development of PRS. Several PRS have been developed for T2D, and multi-ancestry versions derived from large summary datasets have promise for predicting disease in diverse populations. Further, meta scores, which combine PRS for the trait of interest and its risk factors, have demonstrated improvements over single-trait scores for other chronic diseases, such as stroke17 and CAD18. In an analysis of 12,740 self-reported Black (~ 32% with HIV) and 5,063 self-reported White (~ 68% with HIV) participants from CNICS and the REGARDS study, we evaluated the utility of metaPRS models for predicting T2D in populations with and without HIV across two ancestry groups, using existing genetic and phenotypic data. Our results confirm the transferability of a well-validated single-trait PRS across ancestries and HIV status, but the metaPRS trained on inflammation and lipid metabolism traits showed no meaningful improvement in prediction accuracy, suggesting that combining lipid or inflammation PRS with the T2D PRS adds limited value in these populations.
To our knowledge, few prior studies have applied a meta-scoring approach to T2D risk prediction using genetic data for risk factor traits. One such study integrated variants across T2D-related traits and demonstrated improved discrimination and reclassification over conventional methods45. While that work established the utility of trait-informed meta-genomic risk scores for T2D, our study builds on this by incorporating trait-specific scores for lipids and inflammation and evaluating performance in two demographically distinct cohorts. We found that genetic risk prediction models incorporating the metaPRS did not improve risk discrimination compared to single-trait PRS for T2D according to the AUC and NRI (Models 3–5 in Table 3 and Supplemental Table S4). Consistent with other chronic diseases, we found demographic factors and traditional clinical risk factors were most important for T2D risk prediction in both CNICS and REGARDS. Notably, AUCs were slightly higher in CNICS than in REGARDS, which could be due to the younger age of the CNICS cohort compared to the REGARDS cohort and additional clinical risk factor data in CNICS, including ART. Overall, the added benefit of PRS above clinical risk factors in our risk prediction models was moderate (PRS R² ranging ~ 2%–10%), but on par with other studies14,46,47,48. As PRS methods improve and other biomarker scores become available, we expect metaPRS to improve for T2D in the general population and PWH.
Our findings align with previous studies evaluating the performance of single-trait PRS derived from multi-ancestry data for T2D in both Black and White individuals. In a prior CNICS study, Cheng et al. developed a multiethnic PRS and reported that individuals in the top 5% of the distribution had 2-fold increased odds of T2D in White participants and 59% increased odds of T2D in Black participants32. Similarly, in general population studies, Ge et al. found that individuals had a ~ 1.5 to 2-fold increased odds of T2D per 1 SD of the PRS across Black and White participants, respectively, with liability R2 estimates ranging from 2.8% (Black) to 9.2% (White) in these individuals14.
While PRS derived from the general population retain predictive utility in PWH, their effects are modest relative to clinical factors and may be diminished by HIV-specific pathways (e.g., ART-related metabolic changes). For example, cardiometabolic PRS have been associated with subclinical CAD in PWH, particularly with severe phenotypes such as stenosis > 50% and non-calcified vulnerable plaque, independent of demographic and HIV-related factors49. In a separate HIV cohort, a CAD PRS comprising 23 SNPs identified in the general population was independently associated with CAD events and notably explained more variance than traditional clinical risk factors like T2D or hypertension50. Prior work in CNICS has shown that PRS for lipid traits, T2D, and myocardial infarction, using summary statistics from the general population, are significantly correlated with phenotypes in PWH, suggesting that genetic contributions remain informative in the context of HIV. The same study also reported genome-wide associations for cardiometabolic traits among PWH32. Building on this foundational work, our study tests the transferability of PRS derived from large-scale, general population studies and provides complementary insight into the potential to inform T2D risk prediction in PWH.
Although we observed smaller effect sizes and R2 values in CNICS compared to REGARDS, this may reflect differential ascertainment, clinical context, or interactions with HIV-related exposures51. In the absence of large HIV-specific T2D GWAS datasets, our findings suggest that multi-ancestry single-trait PRS for T2D, developed in the general population, remain informative for both Black and White PWH, although their predictive utility may be reduced due to HIV-related factors. Notably, metaPRS did not outperform single-trait PRS in either cohort, possibly because the lipid and inflammation scores did not provide independent predictive signals beyond those captured by the T2D PRS. Future studies should aim to develop and validate HIV-specific PRS for T2D to improve risk prediction in this population.
The strengths of our study include the assessment of PRS transferability across ancestries and HIV status in the largest cohort of PWH with genetic data to date, which contributes to the growing body of literature on genetic prediction in diverse populations52,53. Our study is unique in evaluating metaPRS performance in a high-risk population of PWH and across ancestries, providing insight into the utility of genetic risk prediction tools in underrepresented groups for cardiometabolic diseases, including T2D.
However, this study is not without limitations. First, PRS used to create the metaPRS were primarily derived from European populations, limiting generalizability to non-European ancestries. Larger studies in diverse populations are needed to enhance PRS accuracy and applicability. Second, the PRS were constructed in HIV-negative cohorts; however, we note that HIV status may have been missed or not recorded in those cohorts. Importantly, as more GWAS data become publicly available, future T2D PRS work in HIV populations should consider PRS trained in HIV-specific GWAS data for comparison to general population scores. Third, the biologically relevant PRS available for metaPRS construction were limited to traits published in the PGS Catalog. Fourth, the inclusion of multiple PRS methodologies may have introduced additional variability, although we standardized scores to mitigate this effect. Fifth, we were unable to fully account for lifestyle and environmental factors, such as diet and physical activity, which may influence the observed associations and contribute to residual confounding. Sixth, HIV-related chronic inflammation, which is distinct from inherited inflammatory pathways, may obscure genetic effects, thereby diminishing the contribution of inflammation-related variants to T2D risk in PWH. Seventh, definitions of T2D differed slightly between REGARDS and CNICS, with REGARDS relying primarily on glucose measures and medication use, while CNICS used hemoglobin A1c, medication data, and diagnostic codes. These differences may have contributed to some degree of misclassification and variability in predictive performance across cohorts. Finally, our analysis was limited to prevalent T2D cases, potentially leading to misclassification and limiting generalizability to incident T2D.
In summary, metaPRS showed no meaningful improvement in T2D prediction, whereas single-trait PRS and traditional clinical risk factors remain critical components of T2D risk prediction. Future studies should focus on refining PRS models using larger, diverse populations and exploring the contributions of HIV-specific genetic factors to T2D risk in PWH. With such refinement, genetic biomarkers such as PRS and metaPRS may serve as tools for individualized prevention and management in precision medicine.
Data availability
The REGARDS (dbGaP accession: phs002719.v1.p1) and CNICS (dbGaP accession: phs001788.v1.p1) phenotypic and genetic data are available through the database of Genotypes and Phenotypes (dbGaP; https://www.ncbi.nlm.nih.gov/gap/) under controlled access. In addition, REGARDS data supported by the National Institute on Aging (NIA) and the National Institute of Neurological Disorders and Stroke (NINDS) through the Center for Alzheimer’s and Related Dementias (CARD) will be available via the AD Knowledge Portal (https://adknowledgeportal.synapse.org).
References
Martin, S. S. et al. Heart Disease and Stroke Statistics: A Report of US and Global Data From the American Heart Association. Circulation 151, e41–e660. https://doi.org/10.1161/CIR.0000000000001303 (2025).
Prasad, R. B. & Groop, L. Genetics of type 2 diabetes-pitfalls and possibilities. Genes (Basel). 6, 87–123. https://doi.org/10.3390/genes6010087 (2015).
Khan, M. A. B. et al. Epidemiology of type 2 Diabetes - Global burden of disease and forecasted trends. J. Epidemiol. Glob Health. 10, 107–111. https://doi.org/10.2991/jegh.k.191028.001 (2020).
Kaprio, J. et al. Concordance for type 1 (insulin-dependent) and type 2 (non-insulin-dependent) diabetes mellitus in a population-based cohort of twins in Finland. Diabetologia 35, 1060–1067. https://doi.org/10.1007/BF02221682 (1992).
Newman, B. et al. Concordance for type 2 (non-insulin-dependent) diabetes mellitus in male twins. Diabetologia 30, 763–768. https://doi.org/10.1007/BF00275741 (1987).
Poulsen, P., Kyvik, K. O., Vaag, A. & Beck-Nielsen, H. Heritability of type II (non-insulin-dependent) diabetes mellitus and abnormal glucose tolerance–a population-based twin study. Diabetologia 42, 139–145. https://doi.org/10.1007/s001250051131 (1999).
Medici, F., Hawa, M., Ianari, A., Pyke, D. A. & Leslie, R. D. Concordance rate for type II diabetes mellitus in monozygotic twins: actuarial analysis. Diabetologia 42, 146–150. https://doi.org/10.1007/s001250051132 (1999).
Armstrong, N. D. et al. Variant level heritability estimates of type 2 diabetes in African Americans. Sci. Rep. 14, 14009. https://doi.org/10.1038/s41598-024-64711-3 (2024).
Mahajan, A. et al. Multi-ancestry genetic study of type 2 diabetes highlights the power of diverse populations for discovery and translation. Nat. Genet. 54, 560–572. https://doi.org/10.1038/s41588-022-01058-3 (2022).
Vujkovic, M. et al. Discovery of 318 new risk loci for type 2 diabetes and related vascular outcomes among 1.4 million participants in a multi-ancestry meta-analysis. Nat. Genet. 52, 680–691. https://doi.org/10.1038/s41588-020-0637-y (2020).
Suzuki, K. et al. Genetic drivers of heterogeneity in type 2 diabetes pathophysiology. Nature 627, 347–357. https://doi.org/10.1038/s41586-024-07019-6 (2024).
Chatterjee, N., Shi, J. & Garcia-Closas, M. Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nat. Rev. Genet. 17, 392–406. https://doi.org/10.1038/nrg.2016.27 (2016).
Kachuri, L. et al. Principles and methods for transferring polygenic risk scores across global populations. Nat. Rev. Genet. 25, 8–25. https://doi.org/10.1038/s41576-023-00637-2 (2024).
Ge, T. et al. Development and validation of a trans-ancestry polygenic risk score for type 2 diabetes in diverse populations. Genome Med. 14, 70. https://doi.org/10.1186/s13073-022-01074-2 (2022).
Akhoon, N. Precision medicine: A new paradigm in therapeutics. Int. J. Prev. Med. 12, 12. https://doi.org/10.4103/ijpvm.IJPVM_375_19 (2021).
Torkamani, A., Wineinger, N. E. & Topol, E. J. The personal and clinical utility of polygenic risk scores. Nat. Rev. Genet. 19, 581–590. https://doi.org/10.1038/s41576-018-0018-x (2018).
Abraham, G. et al. Genomic risk score offers predictive performance comparable to clinical risk factors for ischaemic stroke. Nat. Commun. 10, 5819. https://doi.org/10.1038/s41467-019-13848-1 (2019).
Inouye, M. et al. Genomic risk prediction of coronary artery disease in 480,000 adults: implications for primary prevention. J. Am. Coll. Cardiol. 72, 1883–1893. https://doi.org/10.1016/j.jacc.2018.07.079 (2018).
Hernandez-Romieu, A. C., Garg, S., Rosenberg, E. S., Thompson-Paul, A. M. & Skarbinski, J. Is diabetes prevalence higher among HIV-infected individuals compared with the general population? Evidence from MMP and NHANES 2009–2010. BMJ Open. Diabetes Res. Care. 5, e000304. https://doi.org/10.1136/bmjdrc-2016-000304 (2017).
Sarkar, S. et al. T. in Endotext (eds K. R. Feingold (2000).
Gazzaniga, G. et al. Dyslipidemia and Statin use in people with HIV-1 infection: beyond the lipid-lowering effect. Nutr. Metab. Cardiovasc. Dis. 35, 104110. https://doi.org/10.1016/j.numecd.2025.104110 (2025).
Ergin, H. E., Inga, E. E., Maung, T. Z., Javed, M. & Khan, S. HIV, antiretroviral therapy and metabolic alterations: A review. Cureus 12, e8059. https://doi.org/10.7759/cureus.8059 (2020).
Howard, V. J. et al. The reasons for geographic and racial differences in stroke study: objectives and design. Neuroepidemiology 25, 135–143. https://doi.org/10.1159/000086678 (2005).
Armstrong, N. D. et al. Genetic contributors of incident stroke in 10,700 African Americans with hypertension: A meta-analysis from the genetics of hypertension associated treatments and reasons for geographic and racial differences in stroke studies. Front. Genet. 12, 781451. https://doi.org/10.3389/fgene.2021.781451 (2021).
Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287. https://doi.org/10.1038/ng.3656 (2016).
Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909. https://doi.org/10.1038/ng1847 (2006).
Bandres-Ciga, S. et al. NeuroBooster array: A Genome-Wide genotyping platform to study neurological disorders across diverse populations. Mov. Disord. 39, 2039–2048. https://doi.org/10.1002/mds.29902 (2024).
Akinyelure, O. P. et al. Racial differences in blood pressure control following stroke: the REGARDS study. Stroke 52, 3944–3952. https://doi.org/10.1161/STROKEAHA.120.033108 (2021).
Kitahata, M. M. et al. Cohort profile: the centers for AIDS research network of integrated clinical systems. Int. J. Epidemiol. 37, 948–955 (2008).
Crane, H. M. et al. Routine collection of patient-reported outcomes in an HIV clinic setting: the first 100 patients. Curr. HIV Res. 5, 109–118. https://doi.org/10.2174/157016207779316369 (2007).
Lee, W. J. et al. Polygenic risk scores point toward potential genetic mechanisms of type 2 myocardial infarction in people with HIV. Int. J. Cardiol. 383, 15–23. https://doi.org/10.1016/j.ijcard.2023.04.058 (2023).
Cheng, H. et al. Genetic architecture of cardiometabolic risks in people living with HIV. BMC Med. 18, 288. https://doi.org/10.1186/s12916-020-01762-z (2020).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7. https://doi.org/10.1186/s13742-015-0047-8 (2015).
Jin, Y., Schaffer, A. A., Feolo, M., Holmes, J. B. & Kattman, B. L. GRAF-pop: A fast distance-based method to infer subject ancestry from multiple genotype datasets without principal components analysis. G3 (Bethesda) 9, 2447–2461. https://doi.org/10.1534/g3.118.200925 (2019).
Rayner, N. W. & McCarthy, M. I. Development and use of a pipeline to generate strand and position information for common genotyping chips. (2011).
Haas, C. B. et al. Assessing the associations between known genetic variants and substance use in people with HIV in the United States. PLoS One. 18, e0292068. https://doi.org/10.1371/journal.pone.0292068 (2023).
Crane, H. M., Kadane, J. B., Crane, P. K. & Kitahata, M. M. Diabetes case identification methods applied to electronic medical record systems: their use in HIV-infected patients. Curr. HIV Res. 4, 97–106. https://doi.org/10.2174/157016206775197637 (2006).
Lambert, S. A. et al. The polygenic score catalog as an open database for reproducibility and systematic evaluation. Nat. Genet. 53, 420–425. https://doi.org/10.1038/s41588-021-00783-5 (2021).
Gunn, S. et al. Comparison of methods for building polygenic scores for diverse populations. HGG Adv. 6, 100355. https://doi.org/10.1016/j.xhgg.2024.100355 (2025).
Folkersen, L. et al. Genomic and drug target evaluation of 90 cardiovascular proteins in 30,931 individuals. Nat. Metab. 2, 1135–1148. https://doi.org/10.1038/s42255-020-00287-2 (2020).
Prive, F. et al. Portability of 245 polygenic scores when derived from the UK biobank and applied to 9 ancestry groups from the same cohort. Am. J. Hum. Genet. 109, 12–23. https://doi.org/10.1016/j.ajhg.2021.11.008 (2022).
Monti, R. et al. Evaluation of polygenic scoring methods in five biobanks shows larger variation between biobanks than methods and finds benefits of ensemble learning. Am. J. Hum. Genet. 111, 1431–1447. https://doi.org/10.1016/j.ajhg.2024.06.003 (2024).
Zou, H. & Hastie, T. Regularization and variable selection via the elastic net. J. Roy Stat. Soc. 67, 301–320 (2005).
Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).
Chen, X. et al. Genomic risk score provides predictive performance for type 2 diabetes in the UK biobank. Acta Diabetol. 58, 467–474. https://doi.org/10.1007/s00592-020-01650-1 (2021).
Khera, A. V. et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 50, 1219–1224. https://doi.org/10.1038/s41588-018-0183-z (2018).
Irvin, M. R. et al. Polygenic risk for type 2 diabetes in African Americans. Diabetes 73, 993–1001. https://doi.org/10.2337/db23-0232 (2024).
Parcha, V. et al. Association of a multiancestry Genome-Wide blood pressure polygenic risk score with adverse cardiovascular events. Circ. Genom Precis Med. 15, e003946. https://doi.org/10.1161/CIRCGEN.122.003946 (2022).
Zou, R. S. et al. Polygenic scores and preclinical cardiovascular disease in individuals with HIV: insights from the REPRIEVE trial. J. Am. Heart Assoc. 13, e033413. https://doi.org/10.1161/JAHA.123.033413 (2024).
Rotger, M. et al. Contribution of genetic background, traditional risk factors, and HIV-related factors to coronary artery disease events in HIV-positive persons. Clin. Infect. Dis. 57, 112–121. https://doi.org/10.1093/cid/cit196 (2013).
Jun, H. et al. Fasting blood glucose variability and unfavorable trajectory patterns are associated with the risk of colorectal cancer. Gut Liver. 16, 423–432. https://doi.org/10.5009/gnl210048 (2022).
Gyawali, P. K. et al. Improving genetic risk prediction across diverse population by disentangling ancestry representations. Commun. Biol. 6, 964. https://doi.org/10.1038/s42003-023-05352-6 (2023).
Martin, A. R. et al. Human demographic history impacts genetic risk prediction across diverse populations. Am. J. Hum. Genet. 100, 635–649. https://doi.org/10.1016/j.ajhg.2017.03.004 (2017).
Acknowledgements
The authors thank the study participants from the Center for AIDS Research Network of Integrated Clinical Systems (CNICS). Additionally, the authors thank the other investigators, the staff, and the participants of the REGARDS study for their valuable contributions. A full list of participating REGARDS investigators and institutions can be found at: https://www.uab.edu/soph/regardsstudy/.
Funding
The current CNICS work was supported by the National Heart, Lung, and Blood Institute (NHLBI) R01HL125027 (IP, HMC) and National Human Genome Research Institute R01HG010649 (IP, HMC). CNICS is an NIH-funded program (R24 AI067039) made possible by the National Institute of Allergy and Infectious Diseases (NIAID). The CFAR sites involved in CNICS include University of Alabama at Birmingham (P30 AI027767), University of Washington (P30 AI027757), University of California San Diego (P30 AI036214), University of California San Francisco (P30 AI027763), Case Western Reserve University (P30 AI036219), Johns Hopkins University (P30 AI094189, U01 DA036935), Fenway Health/Harvard (P30 AI060354), University of North Carolina Chapel Hill (P30 AI50410), Vanderbilt University (P30 AI110527), and University of Miami (P30 AI073961). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The REGARDS genetic study (R01HL136666, MRI) and polygenic risk score study (R35HL155466, MRI) were supported by the NHLBI. The parent REGARDS study was supported by cooperative agreement U01 NS041588, co-funded by the National Institute of Neurological Disorders and Stroke (NINDS) and the National Institute on Aging (NIA, ZO1 AG000949). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NINDS or the NIA. This research was supported, in part, by the Intramural Research Program of the National Institutes of Health (NIH). The contributions of the NIH author(s) are considered Works of the United States Government. The findings and conclusions presented in this paper are those of the author(s) and do not necessarily reflect the views of the NIH or the U.S. Department of Health and Human Services. Other funding sources include NHLBI K12HL143958 (NDA), K08HL159290 (RG), and R24AI067039 (JMJ).
Author information
Contributions
NDA, RG, HKT, and MRI contributed to the concept and design of the study. NDA, RN, LP, HI, and VS performed quality control of the genomics data. PWH, HMC, IP, SLH, ERC, and HKT provided insight into methodology. NDA, MRI, HKT, RG, PWH, HMC, IP, SLH, GAB, ERC, JMJ, RDM, and TD provided interpretation of the results. HKT and MRI supervised the study. NDA and MRI drafted the manuscript. NDA, HMC, IP, SLH, HI, LS, and MRI funded the study. All authors provided critical edits on subsequent versions of the manuscript and approved the final version.
Ethics declarations
Competing interests
RG received research funding from Sanofi and served as a Consultant for the Advisory Board for Sanofi (not related to the current study). GAB received research funding from Cepheid and Merck Foundation and served as a consultant to MedIQ within the past 36 months (not related to this study). HI’s participation in this project was part of a competitive contract awarded to DataTecnica LLC by the National Institutes of Health to support open science. All other authors declare that they have no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Armstrong, N.D., Srinivasasainagendra, V., Pilla, L. et al. Genomic risk prediction of type 2 diabetes in people living with and without HIV. Sci Rep 16, 3078 (2026). https://doi.org/10.1038/s41598-025-31471-7
DOI: https://doi.org/10.1038/s41598-025-31471-7


