Introduction

Insulin resistance (IR) is a well-documented pathophysiological condition characterized by the diminished responsiveness of cells to insulin, a hormone essential for the regulation of glucose metabolism1. In individuals with IR, cellular sensitivity to insulin is markedly reduced, resulting in elevated blood glucose levels and compensatory hyperinsulinemia as the pancreas attempts to overcome this resistance2. This metabolic dysregulation is closely associated with an increased risk of several metabolic disorders, including type 2 diabetes (T2D), obesity, dyslipidemia, and cardiovascular diseases3,4. The impaired insulin action in key target tissues, especially skeletal muscle, liver, and adipose tissue, plays a central role in the pathogenesis of these conditions5. This underscores the critical importance of IR in the development and progression of metabolic disorders, necessitating targeted therapeutic strategies to mitigate its impact6.

The euglycemic-hyperinsulinemic clamp7 is considered the most reliable test for evaluating IR, but its complexity limits its use in large-scale studies. Prior research has demonstrated a robust correlation between IR scores derived from the homeostasis model assessment (HOMA) approach and IR assessed by glucose clamp techniques8. The triglyceride to high-density lipoprotein cholesterol (TG:HDL-C) ratio is an alternative IR marker that is more cost-effective and accessible in larger population studies compared to the HOMA test9,10,11,12. A recent genome-wide association study (GWAS) on TG:HDL-C in European populations discovered 114 single-nucleotide polymorphisms (SNPs) associated with IR13. However, that GWAS was limited to populations of European ancestry. Expanding genetic studies to diverse populations presents a valuable opportunity to uncover insights into the complex genetic foundations of IR14. In addition, IR is associated with impaired fatty acid utilization; increased free fatty acid flow from adipose to nonadipose tissue leads to aberrant fat metabolism and worsens IR. Hence, the TyG index, calculated from fasting plasma glucose and triglycerides, has emerged as another simple surrogate marker for IR15,16. The TyG index has high sensitivity for recognizing IR among apparently healthy subjects, compared with the HOMA-IR index16. To our knowledge, no GWAS focusing on TyG has been reported to date.

In this study, we conducted GWASs on IR surrogate markers, including TyG, TG:HDL-C, and the logarithmically transformed TG:HDL-C ratio (log(TG:HDL-C)), in the Taiwanese population using the Taiwan Biobank. We also compared heritability and genetic correlations among these markers. Because these IR surrogate markers are strongly correlated and no GWAS has yet addressed TyG, an emerging and robust marker of IR, our subsequent analyses centered on TyG. We performed fine mapping and polygenic risk score (PRS) analysis for TyG and explored the relationships between disease phenotypes and IR surrogate markers. Furthermore, because sex differences in IR have been suggested17, we undertook sex-stratified and sex-differentiated analyses of TyG in both female and male cohorts, along with a transcriptome-wide association study (TWAS) of TyG in the whole cohort. Finally, pathway analysis was performed on genes associated with TyG to reveal additional clues to the causes and consequences of IR-related phenotypes.

Results

Taiwan Biobank study cohort

Supplementary Table 1 presents baseline characteristics of the Taiwan Biobank participants for the whole, female, and male cohorts. We analyzed a total of 136,735 individuals with measurements for TyG and 138,303 individuals with measurements for TG:HDL-C and log(TG:HDL-C) (Supplementary Table 1). In both samples, ~35.5% of participants were male and ~64.5% were female.

GWASs of IR markers

We conducted GWASs on three IR surrogate markers (TyG, TG:HDL-C, and log(TG:HDL-C)) within the Taiwan Biobank sample. The GWAS test statistics were calibrated at a genome-wide level, ensuring robustness and reliability (Supplementary Fig. 1). Figure 1 presents the Manhattan plots illustrating the distribution of association p values across the genome for SNPs associated with these markers.

Fig. 1: Manhattan plots.

The Manhattan plots for the identified SNPs associated with the three IR surrogate markers: (a) the TyG index, (b) the TG:HDL-C ratio, and (c) the log(TG:HDL-C) ratio. IR insulin resistance, TG:HDL-C the triglyceride to high-density lipoprotein cholesterol ratio, TyG the product of fasting plasma glucose and triglycerides (Ln[fasting triglycerides (mg/dL) × fasting glucose (mg/dL)/2]). This GWAS analysis utilized BOLT-LMM’s mixed linear models with a two-sided chi-square test. The conventional genome-wide significance threshold of P < 5 × 10⁻⁸ was applied. Source data are provided as a Source Data file.

Supplementary Table 2 provides the number of significant SNPs identified at various significance thresholds for each marker, offering an overview of the genetic landscape associated with IR. Furthermore, Supplementary Table 3 presents the GWAS results of the top 20 significant SNPs after clumping for each marker, highlighting the genetic variants with the strongest associations with IR.

Four gene clusters are associated with TyG

Supplementary Tables 4 and 5 provide an overview of all identified variants and the top 20 variants/genes with genome-wide significance for TyG, respectively. These variants primarily clustered within four gene regions, centered on GCKR (chromosome 2), MLXIPL (chromosome 7), APOA5 (chromosome 11), and APOC1 (chromosome 19). Further analysis identified additional significant genes within these clusters: 29 genes in the GCKR cluster, 17 in the MLXIPL cluster, 15 in the APOA5 cluster, and 6 in the APOC1 cluster (Tables 1-4 and Fig. 2).

Table 1 Summary of top variants in the GCKR gene cluster exhibiting genome-wide significance for the TyG index, a marker of insulin resistance, identified within the Taiwan Biobank cohort
Table 2 Summary of top variants in the MLXIPL gene cluster exhibiting genome-wide significance for the TyG index, a marker of insulin resistance, identified within the Taiwan Biobank cohort
Table 3 Summary of top variants in the APOA5 gene cluster exhibiting genome-wide significance for the TyG index, a marker of insulin resistance, identified within the Taiwan Biobank cohort
Table 4 Summary of top variants in the APOC1 gene cluster exhibiting genome-wide significance for the TyG index, a marker of insulin resistance, identified within the Taiwan Biobank cohort
Fig. 2: Locus zoom plots of gene clusters associated with TyG index in the Taiwan Biobank.

The locus zoom plots for the (a) GCKR, (b) MLXIPL, (c) APOA5, and (d) APOC1 gene clusters for the TyG index in the Taiwan Biobank show single nucleotide polymorphisms (SNPs) by chromosomal position and strength of association with TyG (−log10 P). The SNPs are color-coded by their linkage disequilibrium with the top SNPs: (a) rs1260326, (b) rs3812316, (c) rs651821, and (d) rs483082. Estimated recombination rates, derived from Asian subjects in the 1000 Genomes Project, are depicted in cyan. The plots were generated using LocusZoom. This GWAS analysis utilized BOLT-LMM’s mixed linear models with a two-sided chi-square test. The conventional genome-wide significance threshold of P < 5 × 10⁻⁸ was applied. The topologically associating domain (TAD) TAD-1 spans from 25,920,000 to 27,320,000 (GRCh38) in pancreatic tissue. TAD-2 spans from 27,680,000 to 28,520,000 (GRCh38) in pancreatic tissue. TAD-3 spans from 73,160,000 to 74,760,000 (GRCh38) in pancreatic tissue. TAD-4 spans from 116,760,000 to 119,360,000 (GRCh38) in pancreatic tissue. Notably, these TADs cover a significant portion of the GCKR, MLXIPL, and APOA5 gene clusters. Source data are provided as a Source Data file.

We conducted additional GWASs for TG:HDL-C and glucose levels (Supplementary Tables 6-7). Supplementary Tables 8-9 present the top variants exhibiting genome-wide significance for TG:HDL-C and glucose levels, respectively. The top 20 variants for TG:HDL-C are identical to those for TyG, with different rankings (Table 5). GCKR, a major factor in T2D risk18, is the only gene consistently associated with glucose levels, TG:HDL-C, and TyG (Table 5).

Table 5 Comparison summary of top variants exhibiting genome-wide significance for the TyG index, TG:HDL-C index, and glucose levels, identified within the Taiwan Biobank (sorted by gene)

Supplementary Table 10 summarizes variants reported at genome-wide significance for insulin-related traits in previous studies. Our analysis revealed 76 genome-wide significant genes for TyG that had been previously associated with IR in GWASs of European and/or East Asian populations (Supplementary Table 11).

Genes for IR identified in the Taiwan Biobank

Our GWAS on TyG within the Taiwan Biobank unveiled 197 genes not previously documented in the context of IR (Supplementary Table 12). According to the NHGRI-EBI GWAS Catalog19, these genes have been associated with TG (99 genes), HDL-C (58 genes), T2D (29 genes) separately, as well as traits known to be associated with T2D risk including Alzheimer’s Disease (AD; 14 genes), fasting glucose levels (22 genes), and body mass index (49 genes).

The top 20 genes, previously unreported in the literature, exhibiting genome-wide significance for TyG (Table 6) are primarily clustered within four gene regions: GCKR, MLXIPL, APOA5, and APOC1. Supplementary Fig. 2 displays a Manhattan plot of these top 20 TyG genes, emphasizing SNPs neutral in European studies but significant in our cohort, indicating population-specific effects. Analysis of LD between top SNPs and SNPs previously unreported in the literature for TyG (Supplementary Table 13) revealed varying degrees of LD within each gene region, providing insights into potential functional relationships. For example, we observed strong LD between MLXIPL rs3812316 and TBL2 rs13246490 (r² = 0.835), suggesting a possible shared genetic effect. In contrast, APOA1 rs12718464 and APOA5 rs651821 exhibited weak LD (r² = 0.008), indicating potentially independent genetic influences on TyG within this region.
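For readers less familiar with the LD statistic quoted above, the following minimal sketch shows the standard definition of r² between two biallelic SNPs from phased haplotype frequencies. The function name and the example frequencies are hypothetical; the study does not state how its r² values were computed, so this only illustrates the quantity, not the authors' pipeline.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared correlation (r^2) between alleles A and B at two biallelic SNPs.

    p_ab: frequency of the haplotype carrying both A and B
    p_a:  frequency of allele A at the first SNP
    p_b:  frequency of allele B at the second SNP
    """
    # D is the deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


# Hypothetical frequencies: two alleles that always travel together ...
perfect_ld = ld_r_squared(p_ab=0.3, p_a=0.3, p_b=0.3)
# ... and two alleles that are statistically independent.
no_ld = ld_r_squared(p_ab=0.15, p_a=0.3, p_b=0.5)
```

Values near 1 (such as the reported 0.835) mean the two SNPs are largely interchangeable as tags of one signal, whereas values near 0 (such as 0.008) are compatible with independent signals.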

Table 6 Top 20 novel genes exhibiting genome-wide significance for the TyG index, a marker of insulin resistance, identified within the Taiwan Biobank cohort

Conditional analyses on the top 20 genes, previously unreported in the literature, associated with TyG (Supplementary Table 14) helped discern independent genetic effects within closely located loci. For instance, APOA1 rs12718464 maintained a significant association with TyG (P = 1.1E-13) when conditioning on APOA5 rs651821, suggesting an independent effect. Conversely, conditioning on MLXIPL rs3812316 nullified the association of TBL2 rs13246490 (P = 0.32), indicating these variants likely represent the same genetic signal. These findings enhance our understanding of the complex interplay between genetic variants influencing TyG and help identify truly independent genetic associations.

Heritability and genetic correlations for IR surrogate markers

We estimated the heritability20 of TyG, TG:HDL-C, and log(TG:HDL-C) to be 15.5% (standard deviation (stdev) = 2.5%), 13.9% (stdev = 2.3%), and 17.3% (stdev = 2.5%), respectively (Supplementary Table 15). These values indicate the proportion of phenotypic variance attributed to genetic factors for each IR surrogate marker. Intercept values ranging from 1.07 to 1.09 suggest minimal biases due to confounding factors.

We also assessed genetic correlations to determine the extent to which genetic factors influencing TyG also affect TG:HDL-C and log(TG:HDL-C)20. The genetic correlations among TyG, TG:HDL-C, and log(TG:HDL-C) were notably high, ranging from 0.96 to 0.99 (Supplementary Table 15). This strong correlation underscores the close relationship among these three IR surrogate markers, highlighting their interconnectedness in the context of genetic influences on IR.

Fine mapping analysis for TyG

We employed fine mapping analysis to identify candidate causal variants linked to TyG21, consolidating overlapping loci into 11 distinct genomic regions of interest (Supplementary Table 16 and Supplementary Fig. 3). This approach enhanced analytical precision and prioritized areas for further investigation. For instance, within the region spanning base pairs 43,378,777–46,429,300 on chromosome 19, five credible sets were identified with purity values ranging from 0.92 to 1.0 (Supplementary Table 16), indicating that the SNPs within each credible set are highly correlated with one another. Coverage values for these sets ranged from 0.95 to 1.0, indicating a high probability that the true causal variant for IR is represented within the credible sets. Certain SNPs exhibited posterior inclusion probability (PIP) values exceeding 0.9 (Supplementary Fig. 3); under the Bayesian variable selection approach, this indicates a high probability that these specific variants play a role in IR manifestation or susceptibility.

PRS analysis for TyG

We conducted PRS analysis to quantify the cumulative impact of multiple genetic variants on TyG22. The analysis used 854,050 SNPs to calculate the PRS for TyG (Supplementary Table 17 and Supplementary Fig. 4). The PRS model demonstrated substantial explanatory power, with 23.80% of the variance in TyG attributed to genetic factors. The incremental predictive value (R² − R² Null) achieved by integrating genetic information into the PRS model was 10.57% compared to a model without genetic predictors. The R² and R² − R² Null values for TyG surpassed those reported in diverse phenotypes from prior investigations (e.g., triglyceride23 in Supplementary Table 17 and Supplementary Fig. 5), underscoring the robustness and effectiveness of the PRS model in elucidating the genetic underpinnings of TyG in the context of IR.
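As a minimal illustration of what a PRS is, the sketch below computes an individual's score as the weighted sum of effect-allele dosages. The SNP identifiers, weights, and dosages are hypothetical placeholders, and the function name is ours; in the study the per-SNP weights come from PRS-CS, which is not reproduced here.

```python
def polygenic_score(weights: dict, dosages: dict) -> float:
    """Weighted sum of effect-allele dosages over SNPs present in both inputs.

    weights: SNP id -> per-allele effect-size weight
    dosages: SNP id -> effect-allele dosage for one individual (0 to 2)
    """
    shared = weights.keys() & dosages.keys()
    return sum(weights[snp] * dosages[snp] for snp in shared)


# Hypothetical weights and one hypothetical individual's dosages.
example_weights = {"snp_a": 0.05, "snp_b": -0.02, "snp_c": 0.01}
example_dosages = {"snp_a": 2, "snp_b": 1, "snp_d": 2}  # snp_c missing, snp_d unweighted

score = polygenic_score(example_weights, example_dosages)
```

Only SNPs with both a weight and a genotype contribute, so the score here is 0.05 × 2 + (−0.02) × 1.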

Association of disease phenotypes with IR surrogate markers

This study investigated the relationship between 27 disease phenotypes and IR surrogate markers (TyG, TG:HDL) and their PRS within the Taiwan Biobank (Table 7). Statistical significance was determined using the Bonferroni correction threshold of p = 0.00185. TyG exhibited associations with 20 different diseases, including coronary heart disease and mood disorders (P < 0.0001). TG:HDL showed associations with 14 diseases, such as gout and hyperlipidemia (P < 0.0001). PRS of TyG was linked to gout (P = 0.0016) and hyperlipidemia (P < 0.0001). PRS of TG:HDL was also associated with gout (P = 0.001) and hyperlipidemia (P < 0.0001). These findings highlight the relationships between IR surrogate markers, their PRS, and various disease phenotypes within the Taiwan Biobank.
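The Bonferroni threshold quoted above can be reproduced with a one-line calculation. The sketch assumes a family-wise error rate of 0.05 divided across the 27 disease phenotypes; the 0.05 level is our assumption, since the text reports only the resulting threshold.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that controls the family-wise error rate."""
    return alpha / n_tests


def is_significant(p_value: float, threshold: float) -> bool:
    """True when a p value falls below the corrected threshold."""
    return p_value < threshold


threshold = bonferroni_threshold(alpha=0.05, n_tests=27)  # alpha = 0.05 is assumed
print(f"{threshold:.5f}")  # 0.00185

# Reported p value for the association between the PRS of TyG and gout.
gout_prs_tyg = is_significant(0.0016, threshold)
```

Under this threshold, the reported association between the PRS of TyG and gout (P = 0.0016) remains significant, though only narrowly.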

Table 7 The association of disease phenotypes with the TyG index, TG:HDL ratio, PRS of the TyG index, and PRS of the TG:HDL ratio

Sex-stratified and sex-differentiated analyses of TyG

Sex-specific genetic differences have been found previously in the context of IR, and exploring sex-specific genetic effects may reveal a substantial number of important genes, previously unreported in the literature17. We conducted sex-stratified analyses, revealing 5787 significant SNPs in females (Supplementary Table 18) and 2817 SNPs in males (Supplementary Table 19).

Sex-differentiated analyses identified 915 SNPs with statistically heterogeneous effects between sexes (heterogeneity p value < 0.05) among those with genome-wide significance (Supplementary Table 20). For instance, the SNP rs7412 in APOE showed a heterogeneity p value of 2.37E-4 between sexes, with a p value of 2.29E-33 (effect size = 0.0796) for males and a p value of 4.43E-24 (effect size = 0.0494) for females. We identified 42 genes with sex-specific effects in females and 25 in males (Supplementary Table 21). Notably, female-specific genes were found in the MLXIPL cluster (FZD9 and BAZ1B) and APOA5 cluster (APOA4, APOA1, PAFAH1B2, SIDT2, TAGLN, and PCSK7). Male-specific genes such as MRPL33 (GCKR cluster), BACE1, CEP164, DSCAML1 (APOA5 cluster), and APOC1P1 (APOC1 cluster) were identified. The top loci with stronger female-specific effects were mapped to KLF14, APOA4, NID2, and TNFAIP8. Conversely, loci mapping to GMIP, TRIB1, and PBX4 showed stronger male-specific effects. Specifically, the locus rs1364422 (KLF14) demonstrated the strongest female-specific effect (heterogeneity p value = 1.51E-05). The locus rs190712692 (APOC1) showed the strongest male-specific effect (heterogeneity p value = 1.24E-05). However, some APOC1 loci also exhibited female-specific effects.
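To make the heterogeneity test concrete, the sketch below applies one common form of the test, a z-test on the difference between the male and female effect sizes, to the rs7412 figures quoted above. The standard errors are not reported in the text, so they are back-calculated here from the rounded effect sizes and p values; that step, the function names, and the choice of test are our assumptions and may differ in detail from GWAMA's implementation.

```python
import math
from statistics import NormalDist


def se_from_beta_p(beta: float, p_two_sided: float) -> float:
    """Back-calculate a standard error from an effect size and two-sided p value."""
    z = -NormalDist().inv_cdf(p_two_sided / 2.0)  # |z| implied by the p value
    return abs(beta) / z


def sex_heterogeneity_p(beta_m: float, se_m: float, beta_f: float, se_f: float) -> float:
    """Two-sided p value for a difference between male and female effects."""
    z = (beta_m - beta_f) / math.sqrt(se_m ** 2 + se_f ** 2)
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


# Reported values for rs7412 (APOE).
se_male = se_from_beta_p(0.0796, 2.29e-33)
se_female = se_from_beta_p(0.0494, 4.43e-24)
p_het = sex_heterogeneity_p(0.0796, se_male, 0.0494, se_female)
print(f"heterogeneity p = {p_het:.2e}")
```

Because the inputs are rounded, the result should be read as an approximate consistency check against the reported heterogeneity p value of 2.37E-4 rather than an exact reproduction.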

In our parent-offspring cohort, we detected parent-of-origin effects at the KLF14 locus (rs1364422), with 0 paternal and 1173 maternal transmissions of the minor allele in 1173 heterozygous offspring, indicating strong maternal inheritance (p value < 2.2E-16; Supplementary Table 22).
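The size of this transmission imbalance can be illustrated with an exact two-sided binomial test against equal maternal and paternal transmission. The text does not name the test used, so the binomial model and the function name are our assumptions. The p value is far below the smallest representable double, so the sketch works on the log10 scale.

```python
import math


def log10_two_sided_binomial_p(n_maternal: int, n_paternal: int) -> float:
    """log10 of the exact two-sided binomial p value under a 50:50 null.

    With a success probability of 0.5 the distribution is symmetric, so the
    two-sided p value is twice the smaller tail, capped at 1.
    """
    n = n_maternal + n_paternal
    k = min(n_maternal, n_paternal)
    tail = sum(math.comb(n, i) for i in range(k + 1))  # exact integer arithmetic
    doubled = 2 * tail
    if doubled >= 2 ** n:
        return 0.0  # p value of 1
    return math.log10(doubled) - n * math.log10(2.0)


log10_p = log10_two_sided_binomial_p(n_maternal=1173, n_paternal=0)
print(f"log10(p) = {log10_p:.1f}")
```

Under this model the p value is 2 × 0.5^1173, many orders of magnitude below the reported bound of 2.2E-16, which is the usual floor printed by statistical software.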

TWAS of TyG in the whole cohort

We used a TWAS approach24 to identify genes with expression differences associated with TyG. Supplementary Tables 23-27 present genes exhibiting significant associations with TyG in TWAS at genome-wide significance across chromosomes 2, 7, 11, 19, and in aggregate. In the GCKR gene cluster, 23 of 29 genes showed significant associations in both TWAS and GWAS, predominantly in adipose-subcutaneous, muscle-skeletal, and thyroid tissues (Table 1). The MLXIPL gene cluster revealed 9 of 17 genes with significant associations in both analyses, primarily in adipose-subcutaneous tissue (Table 2). In the APOA5 gene cluster, 13 of 15 genes demonstrated significant associations, mainly in adipose-visceral-omentum tissue (Table 3). The APOC1 gene cluster showed 4 of 6 genes with significant associations in both TWAS and GWAS (Table 4).

Pathway analysis of TyG

Pathway analysis was employed to elucidate key pathways influenced by TyG and gene-metabolic interactions25. The analysis included genome-wide significant genes, sex-specific genes, and genes from previous studies (Supplementary Tables 28-31). A summary of gene ontology (GO) analysis results for the four gene groups is presented in Table 8 and Supplementary Table 32.

Table 8 The summary of the pathway analysis results for the whole cohort, sex-specific cohorts, and genes from previous studies

Genome-wide significant genes are involved in AD pathogenesis, glucose homeostasis, insulin resistance and signaling, neurodegenerative pathways, Wnt signaling cascade, and plasma lipoprotein dynamics (Table 8). These enriched GO terms and pathways demonstrate substantial overlap with those associated with genes from previous studies (Table 8), suggesting a consistent functional profile across multiple investigations.

Female-specific genes exhibited similar enrichment to genome-wide significant genes (Table 8). The congruence between female-specific and genome-wide significant genes implies a potentially prominent role for these pathways in female-specific disease mechanisms. Conversely, male-specific genes revealed a distinct functional profile (Table 8) and lacked several pathways enriched in female-specific genes (e.g., estrogen signaling pathway). The male-specific genes showed enrichment in the tight junction pathway, which was absent in the female-specific gene set. This sexual dimorphism in pathway enrichment suggests potential differences in disease mechanisms between sexes.

Discussion

A recent European GWAS pinpointed 114 genetic loci that play a role in TG:HDL-C13. However, despite the large number of identified loci, they do not fully account for the heritability of IR, suggesting the existence of additional undiscovered genetic variants. Because the majority of GWASs on IR have focused predominantly on European populations, genetic variants that contribute to variations in traits across different populations are likely to be discovered by expanding genetic investigations to diverse populations14,26. Indeed, this first GWAS of IR markers in the Taiwan Biobank has provided a wealth of loci, previously unreported in the literature, enhancing our understanding of the genetic underpinnings of IR-related diseases.

In our GWAS of IR markers, we identified genome-wide significant associations with loci in four distinct genomic regions: GCKR, MLXIPL, APOA5, and APOC1. SNPs with minimal impact in European studies are significant here (Supplementary Fig. 2), indicating potential unique genetic or environmental influences in Asians, warranting further investigation. The enrichment of IR-associated loci over such large regions might be attributed to the unique population structure and haplotype patterns present in the Taiwanese population. However, we found that these clusters are included within haploblocks identified in broader population studies14,27,28,29,30,31. For instance, our analysis of the East Asian population from the 1000 Genomes Project revealed peaks in recombination rates at the boundaries of the APOA5 gene cluster (Fig. 2). Furthermore, although they have not been highlighted in previous studies, similar clusters over the same genetic regions have also appeared in studies within the European, African, Korean, South Asian, and diverse populations14,27,28,29,30,31. For example, a recent study identified these four gene clusters in a diverse cohort, associating them with blood glucose, HDL-C, and TG levels (Supplementary Table 33)28. These clustered associations, therefore, may reflect low recombination rates within the regions that lead to a high probability of co-inheritance driven by strong IR-related loci, such as GCKR32,33.

Another explanation for genetic association to extend across such broad genomic regions is speculative but bears some mention here. We noted that the clustered high-association regions were mostly contained within the same or closely adjacent topologically associating domains (TADs), three-dimensional chromatin structures that enclose adjacent genes that are subject to some degree of co-regulation and co-expression in vivo (Tables 1-4 and Fig. 2). Regulatory elements within TAD structures can influence the disease-related expression of multiple neighboring genes34,35, and especially considering the TWAS results indicating the common mis-expression for these clustered genes in adipocytes (Tables 1-4), we speculate that the extended association clusters we identified could reflect the influence of noncoding variants that alter regulatory relationships and extend across the TADs. In contrast to the other three clusters, the APOC1 cluster is not enclosed within an established TAD (Fig. 2); however, the genes in this cluster are known to interact with the same long-distance enhancers36,37, suggesting that enhancer variation could contribute to a cluster-wide association with IR in this region as well.

Our sex-differentiated analyses of TyG identified 40 female-specific and 25 male-specific genes. Notably, KLF14 exhibited the most pronounced sex-specific effect in females, consistent with recent findings in European populations13. KLF14-associated SNPs have consistently shown sex-specific associations with metabolic traits, including T2D, waist-to-hip ratio, TG, HDL-C, and low-density lipoprotein cholesterol38,39,40,41, and corroborating these human studies, KLF14 deficiency in adipocytes leads to increased adiposity in female but not male mice42. The observed stronger associations in females are hypothesized to stem from the modulation of KLF14 expression rather than from hormonal influences43,44. We also identified several female-specific genes located within the APOA5 gene cluster, aligning with previous research45,46,47,48. These findings underscore sex-dependent mechanisms in lipid regulation and metabolism, exemplifying the importance of considering sex-specific effects in genetic studies of metabolic traits.

The imprinting analysis of KLF14 not only corroborates its known role in sex-specific genetic influence but also highlights the utility of large biobank datasets with familial information for dissecting complex genetic mechanisms49. Our findings suggest that the imprinting status of KLF14 could influence its biological roles, potentially affecting metabolic pathways or disease risk in a manner dependent on the sex of the parent from whom the allele is inherited44,50. Future studies could expand this approach to other genes with suspected imprinting effects, furthering our understanding of how genetic and epigenetic factors interplay in human health and disease.

Comparing gene sets from previous studies and the current genome-wide significant genes revealed substantial concordance in enriched pathways, particularly those implicated in insulin resistance/signaling, lipid metabolism, and glucose homeostasis, highlighting the critical role of these pathways in the etiology of IR. Interestingly, GO analysis of the four distinct gene groups (genome-wide significant, female-specific, male-specific, and previously identified genes) also consistently demonstrated significant involvement in neurodegenerative processes; pathways associated with AD, including those related to nervous system development, are prominently represented across all groups. This shared enrichment suggests a potential mechanistic link underlying the high association between IR and neurodegenerative disorders51,52,53.

Our pathway analysis revealed some potential differences between the male and female groups. For example, the GO analysis of female-specific genes revealed a distinct enrichment in pathways associated with estrogen signaling; this pathway’s prominence in female-specific genes is consistent with its proposed role in modulating disease risk or progression in women54. Conversely, the male-specific gene set exhibits a unique enrichment in the tight junction pathway, which is crucial for maintaining cellular barriers55. This pathway is absent in the female-specific gene set, indicating a potential sex-based difference in cellular mechanisms. Furthermore, insulin signaling pathway genes were more highly represented in the female-specific genes, pointing to potentially divergent metabolic regulation between sexes56,57. The identification of distinct molecular pathways in male-specific and female-specific gene sets underscores the potential for developing sex-specific therapeutic approaches58,59. The interplay between sex-specific genetic factors and disease mechanisms is complex17, suggesting that sex-tailored interventions may improve treatment efficacy60. Further research is warranted to elucidate the functional implications of these sex-specific pathway enrichments and their impact on disease risk and treatment strategies61,62.

The absence of medication data limits our understanding of drug effects on insulin resistance markers. Our sensitivity analysis, excluding self-reported diabetes mellitus (DM) cases (Supplementary Fig. 6 and Supplementary Table 34), shows minimal change in genetic associations for the TyG index, suggesting robustness despite potential DM treatment effects. Nonetheless, this does not substitute for medication records, emphasizing the need for future studies to include such data to explore medication’s influence on genetic associations with metabolic traits.

Methods

Taiwan Biobank

The study cohort comprised 147,880 Taiwanese subjects from the Taiwan Biobank63,64,65,66,67,68,69. Detailed inclusion and exclusion criteria26 are presented in Supplementary Methods. Ethical approval for the study was obtained from the Institutional Review Board of Taipei Veterans General Hospital (approval number: 2023-04-007CC#1). All participants provided informed consent in accordance with established guidelines and regulations.

Genotyping and imputation

Stringent quality control measures were applied to ensure reliable SNP analysis66,70. SNPs were excluded if they deviated from Hardy-Weinberg equilibrium (P < 1 × 10⁻⁶), exhibited a genotyping call rate below 95%, or had a minor allele frequency less than 1%. From an initial set of 686,370 directly genotyped and 15,851,039 imputed SNPs, 7,604,854 SNPs remained after quality control and were used for functional prediction.
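The three exclusion criteria above can be sketched as a per-SNP filter over genotype counts. This is an illustration only: the function names are hypothetical, and the Hardy-Weinberg check uses a 1-degree-of-freedom chi-square approximation, whereas tools such as PLINK use an exact test, so borderline SNPs could be classified differently.

```python
import math


def hwe_chi2_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square (1 df) p value for deviation from Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2.0))  # survival function of chi-square, 1 df


def passes_qc(n_aa: int, n_ab: int, n_bb: int, n_missing: int,
              hwe_p_min: float = 1e-6, call_rate_min: float = 0.95,
              maf_min: float = 0.01) -> bool:
    """Apply the call-rate, minor-allele-frequency, and HWE thresholds to one SNP."""
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False
    call_rate = n_called / (n_called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1.0 - freq_a)
    if call_rate < call_rate_min or maf < maf_min:
        return False
    return hwe_chi2_p(n_aa, n_ab, n_bb) >= hwe_p_min


# Hypothetical genotype counts for four SNPs.
kept = passes_qc(360, 480, 160, n_missing=0)          # in HWE, common, fully called
rare = passes_qc(990, 10, 0, n_missing=0)             # MAF = 0.5%
sparse = passes_qc(360, 480, 160, n_missing=100)      # call rate about 91%
no_hets = passes_qc(500, 0, 500, n_missing=0)         # strong HWE deviation
```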

Statistical analysis

We conducted a GWAS to investigate the genetic underpinnings of three IR surrogate markers: TyG16, TG:HDL-C13, and log(TG:HDL-C). The TyG index was derived using the formula Ln[fasting triglycerides (mg/dL) × fasting glucose (mg/dL)/2]. The GWAS analysis employed mixed linear models in BOLT-LMM (version 2.4.1)71, adjusting for sex and the top 10 principal components (PCs) to account for potential confounding factors. Manhattan and quantile-quantile (Q-Q) plots were generated utilizing the R package ‘qqman’. The study used a genome-wide significance threshold of P < 5 × 10⁻⁸.
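The three phenotypes can be computed directly from fasting laboratory values, as in the minimal sketch below. The TyG formula is taken from the text; the function names and example values are hypothetical, and the use of the natural logarithm for log(TG:HDL-C) is our assumption, since the base is not stated.

```python
import math


def tyg_index(triglycerides_mg_dl: float, glucose_mg_dl: float) -> float:
    """TyG = Ln[fasting triglycerides (mg/dL) x fasting glucose (mg/dL) / 2]."""
    if triglycerides_mg_dl <= 0 or glucose_mg_dl <= 0:
        raise ValueError("triglycerides and glucose must be positive")
    return math.log(triglycerides_mg_dl * glucose_mg_dl / 2.0)


def tg_hdl_ratio(triglycerides_mg_dl: float, hdl_c_mg_dl: float) -> float:
    """Ratio of fasting triglycerides to HDL cholesterol (same units)."""
    if hdl_c_mg_dl <= 0:
        raise ValueError("HDL-C must be positive")
    return triglycerides_mg_dl / hdl_c_mg_dl


def log_tg_hdl_ratio(triglycerides_mg_dl: float, hdl_c_mg_dl: float) -> float:
    """Log-transformed TG:HDL-C ratio (natural log assumed here)."""
    return math.log(tg_hdl_ratio(triglycerides_mg_dl, hdl_c_mg_dl))


# Hypothetical participant: TG 150 mg/dL, glucose 100 mg/dL, HDL-C 50 mg/dL.
tyg = tyg_index(150.0, 100.0)          # ln(7500), about 8.92
ratio = tg_hdl_ratio(150.0, 50.0)      # 3.0
log_ratio = log_tg_hdl_ratio(150.0, 50.0)
```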

To identify independent genetic variants, we employed the clumping method in PLINK70, which accounts for linkage disequilibrium (LD) patterns in the genome. We performed fine mapping using susieR (v.0.12.35)21, which reports two metrics: the posterior inclusion probability (PIP) and the PIP obtained from regression with summary statistics (RSS PIP). We estimated heritability and genetic correlations using LD Score Regression (LDSC) v.1.0.120. We performed PRS analysis using PRS-CS (v.1.1.0)22. Detailed methodologies for clumping, fine mapping, LDSC statistics, and PRS models are available in Supplementary Methods.

Sex-stratified and sex-differentiated analyses

Sex-stratified analyses used mixed linear models in BOLT-LMM (version 2.4.1)71, separately for male (n = 48,189) and female (n = 88,564) subjects, adjusting for sex and the top 10 PCs. Sex-differentiated analysis72 utilized GWAMA73 with the “-sex” option to identify sex-specific allelic effects and test for heterogeneity between sexes.

Transcriptome-wide association studies

We conducted TWAS to investigate the relationship between gene expression and TyG. Using FUSION24, we combined gene expression measurements with GWAS summary statistics. We obtained precomputed expression reference weights from GTEx (v7) for 11 human tissues, including various adipose, brain, and organ tissues. FUSION was then used to calculate TWAS P-values, identifying genes whose expression levels were significantly associated with TyG.

Pathway analysis

We constructed protein-protein interaction (PPI) networks utilizing NDEx-The Network Data Exchange74 with significant genes associated with TyG. The Human Integrated Protein–Protein Interaction Reference database75 was employed for PPI analysis within NDEx. We then conducted pathway analysis on the genes within the PPI network using ClueGO25, a Cytoscape76 plugin. ClueGO leverages various ontology source databases to identify gene ontology (GO) terms and pathway networks, elucidating the functional significance of identified genes within specific biological processes and pathways25.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.