A cross-population compendium of gene–environment interactions

Namba, Shinichi; Sonehara, Kyuto; Koyanagi, Yuriko N.; Kikuchi, Takezo; Ojima, Takafumi; Edahiro, Ryuya; Sato, Go; Yamaji, Taiki; Tomofuji, Yoshihiko; Ueda, Hiroyuki; Yamamoto, Kenichi; Ogawa, Yosuke; Suzuki, Ken; Kanai, Akinori; Higashiue, Shinichi; Kobayashi, Shuzo; Yamaguchi, Hiroki; Nagata, Yasunobu; Okazaki, Yasushi; Matsumoto, Naoyuki; Motomura, Kenta; Koga, Hidenobu; Hishida, Asahi; Ikezaki, Hiroaki; Hara, Megumi; Nagayoshi, Mako; Oze, Isao; Nakano, Shiori; Oda, Yoshiya; Suzuki, Yutaka; Iwasaki, Motoki; Sawada, Norie; Matsuo, Keitaro; Morisaki, Takayuki; Yamauchi, Toshimasa; Kadowaki, Takashi; Matsuda, Koichi; Okada, Yukinori

doi:10.1038/s41586-025-10054-6

Article
Open access
Published: 28 January 2026

Nature volume 651, pages 688–697 (2026)

Abstract

Environmental differences in genetic effect sizes, namely, gene–environment interactions, may uncover the genetic encoding of phenotypic plasticity^1,2,3. We provide a cross-population atlas of gene–environment interactions comprising 440,210 individuals from European and Japanese populations, with replication in 539,794 individuals from diverse populations. By decomposing the contributions from age, sex and lifestyles, we delineate the aetiology of these gene–environment interactions, including reverse causality arising from a disease-related dietary change. Genome-wide analyses uncovered missing heritability and trait–trait relationships connected by the synergistic effects of genome and environments, which systematically affected polygenic prediction accuracy and cross-population portability. Single-cell projection revealed an age-related shift in the pathways and cell types responsible for genetic regulation. Omics-level gene–environment analyses identified multiple sex-discordant genetic effects in lipid metabolism, informing clinical trial failures for genetically supported drug development. Our comprehensive gene–environment study decodes the dynamics of genetic associations, offering insights into complex trait biology, personalized medicine and drug development.

Main

Human genetics, particularly genome-wide association studies (GWAS), has had great success in revealing disease pathophysiology and complex trait biology⁴. Genetic association mapping on multi-omics layers has covered proteomics⁵, metabolomics⁶ and single-cell RNA sequencing (scRNA-seq)⁷, providing granular insights into trait-associated genetic loci. However, such efforts focus on fixed genetic effects (more precisely, marginal effects), oversimplifying the intrinsic complexity of trait biology¹. Essentially, human phenotypes show dramatic changes in response to multifactorial environmental exposures, including sex, senescence and lifestyle. Inter-individual heterogeneity in responses to environments has been shaped by genetic adaptation^8,9 and affects present disease risks² and drug efficacy¹⁰. Genetically, this phenotypic plasticity manifests as changes in genetic effect sizes across environmental factors (or, equivalently, changes in environmental effects across genotypes), namely, gene–environment (G×E) interactions³ (Fig. 1a). G×E interactions capture dynamic changes in genetic effects, unveiling the genetic regulation of phenotypic plasticity. In some traits, G×E interaction studies have begun to explain phenotypic variation not captured by marginal effects (that is, missing heritability) and disparities in polygenic risk prediction¹¹. Therefore, identifying G×E interactions may contribute to mitigating health disparities and implementing personalized medicine precisely¹².
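As a minimal sketch of this definition, assuming a binary environment and a noise-free toy phenotype, the interaction term can be read as the difference between the genetic slopes fitted within each environmental stratum. All names and coefficients below are hypothetical and are not taken from the study.

```python
from statistics import mean

def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical coefficients of y = B0 + B_G*G + B_E*E + B_GXE*G*E.
B0, B_G, B_E, B_GXE = 1.0, 0.2, 0.5, 0.3

def phenotype(g, e):
    """Noise-free toy phenotype for genotype dosage g and binary environment e."""
    return B0 + B_G * g + B_E * e + B_GXE * g * e

# Toy cohort: genotype dosages 0/1/2 observed in both environments.
genotypes = [0, 1, 2] * 4
data = [(g, e, phenotype(g, e)) for e in (0, 1) for g in genotypes]

def stratum_slope(e_value):
    """Genetic effect size estimated within one environmental stratum."""
    rows = [(g, y) for g, e, y in data if e == e_value]
    return slope([g for g, _ in rows], [y for _, y in rows])

slope_unexposed = stratum_slope(0)  # genetic effect when E = 0
slope_exposed = stratum_slope(1)    # genetic effect when E = 1
interaction = slope_exposed - slope_unexposed  # recovers B_GXE in this toy case
```

With a binary environment, the difference in stratum-specific slopes equals the G×E coefficient of the joint model, which is why the two descriptions in the text (changing genetic effects across environments, or changing environmental effects across genotypes) are equivalent.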

**Fig. 1: A cross-population atlas of G×E interactions.**

Nevertheless, after decades of effort³, there are few established G×E interactions in humans¹³, and their biological interpretation remains limited¹⁴. Past studies have suffered from low replication rates^15,16 due to low statistical power¹⁷, heavy multiple-testing burdens¹⁸, arbitrary filtering of genetic variants¹⁵ and, in some cases, imprecise statistical testing¹⁹. G×E interactions have been studied at scale only for limited traits and environments, primarily in European populations²⁰. Therefore, a global overview of G×E interactions across phenotypes, environments and populations is still lacking.

Here, using the recent advent of population-scale biobanks²¹ and computationally efficient methods²², we conducted parallel genome-wide G×E interaction studies using UK Biobank (UKB) and Biobank Japan (BBJ) to provide a cross-population atlas of G×E interactions. We validated the identified G×E interactions in four independent cohorts with diverse populations, annotated the environmental contributors and assessed their impacts on heritability, polygenic prediction accuracy and responsible cell types. Multi-omics G×E analyses provided molecular insights into clinical G×E interactions. These multi-resolution analyses demonstrated that G×E interactions have pivotal roles in regulating dynamic phenotypic plasticity, informing personalized phenotype prediction and drug development.

G×E interactions in individual biobanks

To reliably detect G×E interactions, we divided UKB and BBJ into discovery and replication cohorts (UKB1, N_max = 273,453; UKB2, N_max = 38,149; BBJ1, N_max = 166,757; BBJ2, N_max = 65,373) (Fig. 1b and Supplementary Table 1). Targeting 38 biomarkers and 9 diseases, which spanned 10 categories (anthropometric, metabolic, proteins, kidney-related, electrolytes, liver-related, inflammatory, haematological, blood pressure and diseases), G×E interactions were tested for nine environmental factors individually and jointly, with P values aggregated per variant on the Cauchy distribution²³ to assess genome-wide significance (Extended Data Fig. 1 and Supplementary Tables 2 and 3). The environmental factors included age, sex, ever-drinking, ever-smoking and current-smoking, and four clusters for diet and physical activity derived from questionnaire data (Extended Data Fig. 2 and Supplementary Table 4).
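The per-variant aggregation on the Cauchy distribution can be sketched as follows. This is a generic Cauchy combination under assumed equal weights; the weights, the small-P approximation cutoff and the function name are illustrative choices, not details confirmed by the text.

```python
import math

def cauchy_combine(p_values, weights=None):
    """Combine P values with the Cauchy combination rule (assumed equal weights)."""
    if not p_values:
        raise ValueError("at least one P value is required")
    if weights is None:
        weights = [1.0 / len(p_values)] * len(p_values)
    total_weight = sum(weights)
    statistic = 0.0
    for p, w in zip(p_values, weights):
        if not 0.0 < p < 1.0:
            raise ValueError("P values must lie strictly between 0 and 1")
        if p < 1e-15:
            # tan() loses precision this close to pi/2; use the limiting form.
            component = 1.0 / (p * math.pi)
        else:
            component = math.tan((0.5 - p) * math.pi)
        statistic += (w / total_weight) * component
    # Upper tail of the standard Cauchy distribution at `statistic`.
    if statistic > 0:
        return math.atan(1.0 / statistic) / math.pi
    return 0.5 - math.atan(statistic) / math.pi

# Hypothetical example: one strong signal among otherwise null tests.
combined = cauchy_combine([1e-10, 0.5, 0.5])
```

In this toy case the combined P value is roughly three times the smallest input, which illustrates how a single strong environment-specific signal can survive aggregation across tests.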

In UKB1, we identified 64 genome-wide significant G×E interactions at 45 loci spanning all trait categories (P_G×E < 5.0 × 10⁻⁸), with 31 interactions at 23 loci remaining after Bonferroni correction (P_G×E < 5.3 × 10⁻¹⁰), indicating that G×E interactions were widespread in human complex traits (Fig. 1c,d and Supplementary Tables 5 and 6). These included known interactions, G×Current-smoking at the HYKK locus for body mass index (BMI)²⁴, G×Age at the UMOD locus for estimated glomerular filtration rate (eGFR)²⁵, and G×E at the FTO locus for BMI driven by multiple environments such as physical activity, diet, age, drinking and smoking²⁶, empirically validating our results. The remaining interactions—based on our curation of GWAS Catalog²⁷—were not reported at P_G×E < 5.0 × 10⁻⁸ (Supplementary Note 2 and Supplementary Table 7). In total, 16 loci overlapped with recent UKB G×E reports using different protocols^13,28 (Supplementary Note 3 and Supplementary Table 8). We observed pleiotropy at 13 loci (10 intra- and 3 inter-categorical), with 2 inter-categorical loci showing distinct significant variants across trait categories (Extended Data Fig. 3), suggesting trait-category specificity of G×E pleiotropy in contrast to the broader pleiotropy of marginal effects²⁹.
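One reading that reproduces the Bonferroni threshold quoted above is a division of the genome-wide threshold by the number of trait and biobank combinations. The divisor of 94 (47 traits in 2 biobanks) is an assumption made for this sketch; the text does not state how the threshold was derived.

```python
GENOME_WIDE_ALPHA = 5.0e-8

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

n_traits = 38 + 9   # biomarkers plus diseases, as listed in the text
n_biobanks = 2      # assumption: UKB and BBJ are counted as separate tests
threshold = bonferroni_threshold(GENOME_WIDE_ALPHA, n_traits * n_biobanks)
print(f"{threshold:.1e}")  # prints 5.3e-10
```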

In BBJ1, 36 significant G×E interactions were detected across 15 loci (26 across 8 loci after Bonferroni correction) (Fig. 1e,f and Supplementary Table 6). These included the well-established locus in the European population, that is, the FTO locus for BMI, which we confirmed in the East Asian population (driven by G×Age and G×Ever-drinking). Other loci with P_G×E < 5.0 × 10⁻⁸ have not been reported in the GWAS Catalog, emphasizing the importance of studying G×E interactions in non-European populations. Notably, 58% (21 out of 36) of G×E interactions were at the ALDH2 locus, which harbours an East-Asian-specific missense variant (rs671) with a strong dominant effect on alcohol metabolism³⁰, consistent with its high pleiotropy for GWAS²⁹.

Inflation was minimal in both cohorts (Supplementary Table 5), and the results were robust to phenotype normalization (Supplementary Table 9). A stepwise variable selection approach revealed that a mean of 2.4 environments contributed to G×E interactions (range = 1–7; Supplementary Table 6), supporting our approach of testing G×E interactions both individually and jointly across environments. In 91% (10 out of 11) of the intra-categorical pleiotropic loci, at least one shared environment contributed to all traits, suggesting that the combinations of trait categories and environments were major determinants of G×E pleiotropy.

Replication within populations

Of the 64 G×E interactions in UKB1, 23 were nominally replicated in UKB2 (P_G×E < 0.05), and 6 remained significant after Bonferroni correction (Fig. 1c and Supplementary Table 6). In BBJ, 28 out of 36 G×E interactions were nominally replicated, and 19 were significant in BBJ2 (Fig. 1f and Supplementary Table 6). These included a G×Ever-drinking interaction at the ALDH2 locus for haemoglobin in BBJ1 (P_G×E = 2.2 × 10⁻¹⁵; P_marginal = 1.7 × 10⁻³), replicated in BBJ2 (P_G×E = 2.9 × 10⁻⁹; P_marginal = 8.9 × 10⁻³), highlighting context-specific effects that would be missed by marginal genetic tests. The same locus also showed a significantly replicated G×Ever-drinking interaction for type 2 diabetes (P_G×E = 4.8 × 10⁻¹⁶ in BBJ1 and 1.2 × 10⁻⁶ in BBJ2). This interaction remained significant after adjusting for haemoglobin (P_G×E = 8.1 × 10⁻¹⁵ in BBJ1), suggesting minimal mediation by haemoglobin. For replication, we stringently required consistent environments and effect directions; failure in either was deemed non-replicated, regardless of P values. Replication rates were comparable with those in GWAS, supporting the robustness of our findings (Supplementary Note 4).
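The replication rule described above can be expressed as a small decision function. The record layout, field names and example values are hypothetical; "consistent environments" is simplified here to an exact match of the main contributing environment.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InteractionResult:
    """Hypothetical summary of one G×E test at one trait-locus pair."""
    p_gxe: float        # interaction P value
    effect: float       # signed interaction effect estimate
    environment: str    # main contributing environment

def is_replicated(discovery, replication, alpha=0.05):
    """Require matching environment and direction before looking at P."""
    same_environment = discovery.environment == replication.environment
    same_direction = discovery.effect * replication.effect > 0
    if not (same_environment and same_direction):
        return False  # non-replicated regardless of the P value
    return replication.p_gxe < alpha

# Hypothetical example values, not taken from the study.
found = InteractionResult(p_gxe=1e-12, effect=0.08, environment="ever-drinking")
concordant = InteractionResult(p_gxe=1e-4, effect=0.05, environment="ever-drinking")
flipped = InteractionResult(p_gxe=1e-9, effect=-0.05, environment="ever-drinking")
```

Passing a Bonferroni-corrected `alpha` instead of 0.05 gives the stricter of the two replication levels reported in the text.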

We further tested replication in independent cohorts: the European population in All of Us (N_max = 208,700) and the East Asian population in two Japanese cohorts: (1) the Japan Multi-Institutional Collaborative Cohort–Hospital-based Epidemiologic Research Program at Aichi Cancer Center (J-MICC/HERPACC) (N_max = 70,909), and (2) the Japan Public Health Center-Based Prospective Study (JPHC) (N_max = 10,904) (Supplementary Table 10). Despite differences in lifestyle questionnaires, dietary clusters (for example, ‘Japanese cuisine (Washoku)’ in all Japanese cohorts, and ‘meat and cheese’ in UKB1 and JPHC) were consistently recovered across cohorts, supporting the robustness of our clustering approach (Extended Data Fig. 4). Bonferroni replication rates were 27% in All of Us (17 out of 64 trait–locus pairs) and 56% in J-MICC/HERPACC (20 out of 36; Extended Data Fig. 5a,b and Supplementary Table 11). Notably, the pleiotropic G×E interactions at the ALDH2 locus were replicated at 81% (17 out of 21) in J-MICC/HERPACC, with six also replicated in JPHC despite its smaller sample size.

Collectively, our approach using cross-population biobank resources thoroughly detected and validated G×E interactions across diverse trait categories and environments.

Cross-population consistency

Combining UKB and BBJ results yielded 94 trait–locus pairs across 54 loci. Six loci were shared between biobanks (40% of BBJ1 loci; Fig. 1c,f), often involving essential (‘core’) genes for the target phenotypes. For example, ALPL for alkaline phosphatase (ALP) was commonly driven by G×Sex, GGT1 for γ-glutamyl transpeptidase (GGT) by G×Sex, G×Age and G×Ever-smoking, and UMOD for eGFR by G×Age (Extended Data Fig. 5c–f).

To assess cross-population sharing more broadly, we examined replication in the other population’s discovery cohort. After excluding the ALDH2 locus (rs671) to avoid introducing bias due to its well-known East Asian specificity, 22 out of 73 interactions were nominally (and nine significantly) replicated (P_G×E < 6.8 × 10⁻⁴). The conservative signal-sharing estimate (Storey’s π₁; ref. ³¹) was 0.41, indicating moderate consistency of G×E interactions across populations. The cross-population-replicated loci included three that were originally from UKB1: G×Age at the APOE locus for total cholesterol; G×Sex at the ABCG2 locus for urate; and G×Sex and G×Physical-activity at the SURF6 locus for ALP (P_G×E = 5.7 × 10⁻⁴, 3.1 × 10⁻⁵ and 1.5 × 10⁻⁴ in BBJ1, respectively). These shared G×E interactions suggested that cross-population meta-analyses would be beneficial. Indeed, we detected one additional G×E interaction through a meta-analysis across BBJ and UKB (Supplementary Note 5 and Supplementary Table 12).
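For readers unfamiliar with the signal-sharing estimate, the idea behind Storey's π₁ can be sketched with a single fixed tuning value. The choice of λ = 0.5 and the toy P values are assumptions for illustration; the text does not specify how λ was chosen, and common implementations smooth over a grid of λ values instead.

```python
def storey_pi1(p_values, lam=0.5):
    """Estimate the fraction of non-null tests from replication P values.

    Null P values are uniform, so the density of P values above `lam`
    estimates the null fraction pi0; pi1 is its complement.
    """
    if not p_values:
        raise ValueError("at least one P value is required")
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    m = len(p_values)
    n_above = sum(1 for p in p_values if p > lam)
    pi0 = min(1.0, n_above / (m * (1.0 - lam)))
    return 1.0 - pi0

# Hypothetical replication P values: 4 of 10 exceed 0.5, so pi0 = 0.8.
toy_p_values = [0.001, 0.004, 0.01, 0.02, 0.03, 0.2, 0.55, 0.7, 0.8, 0.95]
pi1 = storey_pi1(toy_p_values)
```

Because the estimate uses the whole P-value distribution rather than a significance cutoff, it can indicate sharing even when few individual interactions pass the replication threshold.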

Minor allele frequencies of the lead variants tended to be higher in the population where the G×E interactions were detected (Extended Data Fig. 5g). Population-specific environments (for example, dietary environments) and differing distributions of environments probably also contributed to the population-specific G×E detection (Extended Data Fig. 5h).

We further evaluated replication in the African and American populations in All of Us (N_max = 70,558 and 66,556, respectively) and the Israeli population in the Human Phenotype Project (HPP; N_max = 8,645). Three G×E interactions were significantly replicated in the African population—two of which overlapped with UKB1–BBJ1 shared loci (Extended Data Fig. 5a and Supplementary Table 13). In the American population, two G×E interactions for pulse pressure—primarily driven by G×Age—were significantly replicated. Although no G×E interaction reached significance in the Israeli population, possibly due to its small sample size, the top signal aligned with a UKB1–BBJ1 shared interaction. These findings demonstrated both shared and population-specific G×E interactions. Subsampling analyses revealed that detecting G×E interactions required biobank-scale sample sizes, and detected loci were not saturated (Extended Data Fig. 5i), encouraging future global collaboration to thoroughly capture worldwide G×E interactions.

Environments contributing to G×E

Gene–environment interactions can enhance locus interpretation by revealing context-specific genetic associations. In UKB1, diet-related environments contributed to five G×E interactions—all of which are at least nominally replicated. These included the ABCG2 locus, where association with eGFR was specific to non-consumers of ‘meat and cheese’ (P_G×E = 1.5 × 10⁻¹⁴; Fig. 2a,b). Raw questionnaire data confirmed that low meat consumption unmasked the genetic effect (Extended Data Fig. 6). As ABCG2 encodes a primarily intestinal urate exporter³² and urate is the end-product of purine metabolism, high purine intake from meat may obscure the genetic effect.

**Fig. 2: Representative loci with G×E interactions.**

In BBJ1, although ALDH2 primarily affects alcohol metabolism, multiple environments contributed to the pleiotropic G×E interactions at the ALDH2 locus after adjusting for G×Drinking interactions (Supplementary Note 6 and Supplementary Table 14). Stratified analysis in ever- and never-drinkers revealed that 12 of 19 biomarkers showed strong non-additive effects in ever-drinkers (P_non-additive < 5.0 × 10⁻⁸; Fig. 2c and Supplementary Table 15), consistent with the dominant deleterious effect on alcohol metabolism of the lead variant (rs671). Four haematopoietic traits showed purely additive effects in never-drinkers (red blood cells, haemoglobin, haematocrit and white blood cells; P_additive < 5.0 × 10⁻⁸ and P_non-additive > 0.05; Fig. 2d), opposite to the functional role and inheritance pattern of rs671. These effects were all replicated in BBJ2 (P_additive < 0.05/19 = 2.6 × 10⁻³ and P_non-additive > 0.05). Given the long-range linkage disequilibrium (~2.44 Mb) with signs of recent selection³³ at this locus, other causal variants or genes may underlie these haematopoietic associations. The region harbours haematopoiesis-related genes (for example, SH2B3 and PTPN11), whose roles in common-variant genetics warrant further investigation. To facilitate future research, we applied a deep learning model to prioritize variant–gene pairs with potential regulatory effects (Supplementary Note 7 and Supplementary Table 16). These loci together demonstrated the utility of G×E interactions for gaining biological insights into genetic loci.

A reverse-causal G×E interaction

In BBJ1, the PITX2 locus for arrhythmia showed a G×E interaction primarily driven by natto (fermented soybean) intake (P_G×E = 2.8 × 10⁻¹²; P_G×E = 2.1 × 10⁻¹⁰ when testing G×Natto alone; Extended Data Fig. 7). The lead variant, rs72900155, has been reported to be associated with atrial fibrillation²⁹, a subgroup of arrhythmia. Clinically, warfarin, a long-standing anticoagulant, may link natto and atrial fibrillation. As vitamin K in natto reduces the anticoagulation effect of warfarin, patients on warfarin are advised to avoid it (Fig. 2e). In BBJ1, the arrhythmia prevalence was markedly high in homozygous carriers among natto non-consumers (Fig. 2f), and this pattern was primarily driven by atrial fibrillation or atrial flutter, for which warfarin was the sole first-line anticoagulant before the launch of direct oral anticoagulants (DOACs) (Fig. 2g). In this subgroup, natto intake declined markedly after warfarin initiation in the same individuals (Fig. 2h), suggesting that this G×E interaction was driven by reverse causality from the disease to the environment.

The marginal effect size of rs72900155 was substantially larger for atrial fibrillation or flutter than for the other subgroups (0.49 (95% CI, 0.46–0.52) versus 0.13 (0.10–0.15)). Owing to this effect size heterogeneity, the overall effect size for arrhythmia would vary with the proportion of atrial fibrillation or flutter patients across natto intake strata, explaining the link between the reverse causality and the G×Natto interaction. Consistently, the G×Natto interaction was not significant in either subgroup when evaluated separately (P_G×E = 0.84 for atrial fibrillation or flutter; 2.6 × 10⁻⁴ for the other subgroups).
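The mixing argument above can be illustrated numerically. The sketch treats the overall effect as a linear blend of the two reported subgroup effect sizes, which is a simplification, and the case-mix fractions are hypothetical values chosen only to show the direction of the pattern.

```python
EFFECT_AF = 0.49     # reported marginal effect, atrial fibrillation or flutter
EFFECT_OTHER = 0.13  # reported marginal effect, other arrhythmia subgroups

def blended_effect(af_fraction):
    """Approximate overall arrhythmia effect for a given case mix.

    Assumption: the overall effect is a weighted average of subgroup effects.
    """
    if not 0.0 <= af_fraction <= 1.0:
        raise ValueError("af_fraction must be between 0 and 1")
    return af_fraction * EFFECT_AF + (1.0 - af_fraction) * EFFECT_OTHER

# Hypothetical case mixes: warfarin-treated patients avoid natto, so atrial
# fibrillation or flutter cases are enriched among natto non-consumers.
effect_consumers = blended_effect(0.2)
effect_non_consumers = blended_effect(0.6)
apparent_interaction = effect_non_consumers - effect_consumers
```

Under these assumed fractions the genetic effect appears larger in non-consumers even though the variant has the same effect within each subgroup, which mirrors the absence of a significant interaction in the subgroup-specific tests.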

This G×E interaction was not replicated in BBJ2 (P_G×E = 0.40; Fig. 2i). Notably, this replication failure might be reasonable as BBJ1 participants were recruited from 2003 to 2008, whereas most BBJ2 participants (84.9%) were recruited from 2013 to 2017, and between these periods, DOACs replaced warfarin in more than half of atrial fibrillation patients in Japan³⁴. As DOACs do not require natto restriction (Fig. 2e), the atrial fibrillation or flutter patients taking DOACs in BBJ2 did not show increased natto avoidance (Fig. 2j).

In summary, we identified a G×E interaction driven by reverse causality. Although machine-learning-based locus interpretation is increasingly investigated³⁵, these results indicate that this technology is not readily applicable to G×E interactions, and careful interpretation by specialists is necessary to disentangle their causal mechanisms.

To evaluate causal directions at other loci, we leveraged repeat biomarker measurements to sort out temporal ordering from environments to phenotypes, and performed time-to-event Cox analyses for disease onset and overall survival (Supplementary Note 8 and Supplementary Tables 17 and 18). We identified a significant G×E interaction for overall survival at the ALDH2 locus driven by interactions with sex, ever-drinking and age (P = 1.7 × 10⁻¹¹), suggesting potential G×E effects on human lifespan.

Pleiotropic G×E effects on diseases

We conducted phenome-wide G×E interaction analyses to assess pleiotropy on diseases (Supplementary Table 19). In UKB1, we detected one additional G×E interaction at the APOE locus for dyslipidaemia primarily driven by G×Sex (P_G×E = 3.8 × 10⁻⁷; false discovery rate (FDR) < 0.05), consistent with the G×E interactions in the main analyses for cholesterol biomarkers (total cholesterol, triglycerides and low-density lipoprotein cholesterol (LDL-C); Extended Data Fig. 8 and Supplementary Table 20). In BBJ1, 11 G×E interactions were also significant. The ALDH2 locus exhibited widespread G×E pleiotropy across diseases, including the established G×Drinking interaction for oesophageal cancer³⁶. We also observed a G×Age interaction at the HLA-DQB1 locus for rheumatoid arthritis (P_G×E = 1.2 × 10⁻⁶), originally detected for asthma in BBJ1 and immune cells in UKB1 (lymphocytes, eosinophils and white blood cells), suggesting shared G×E effects across immune phenotypes. These results showed that the G×E interactions for clinical biomarkers also affected disease status through pleiotropy.

Genome-wide heritability

We estimated G×E heritability to evaluate consistency across populations at the genome-wide level³⁷. Evaluating individual environments, 14 and 12 trait–environment pairs were significant in UKB1 and BBJ1, respectively (FDR < 0.05; Supplementary Table 21), although statistical power was limited by multiple testing burden. When aggregating across environments, G×E heritability was significantly positive for seven traits in UKB1 and 11 traits in BBJ1, including four overlapping traits: height, BMI, high-density lipoprotein cholesterol (HDL-C) and diastolic blood pressure (DBP) (Fig. 3a–c). G×E-to-marginal heritability ratio was much larger for BMI than height (0.100 (95% CI, 0.057–0.142) versus 0.028 (0.007–0.048) in UKB1; 0.245 (0.130–0.360) versus 0.062 (0.009–0.115) in BBJ1), replicating a previous report in the European population³⁸ and suggesting a shared G×E architecture across populations for the anthropometric traits. Other traits with significant G×E heritability showed ratios ranging from 0.03 to 0.52, indicating heterogeneity in G×E contributions across traits (Supplementary Table 21). Notably, G×E heritability across quantitative traits was significantly correlated between biobanks (Spearman’s ρ = 0.41, P = 0.011; Fig. 3d), suggesting moderately concordant G×E contributions across populations.

**Fig. 3: Genome-wide consistency of G×E interactions across populations.**

We next estimated the cross-trait correlation of G×E interactions. Significant correlations were observed for 20 and 29 trait–environment pairs in UKB1 and BBJ1, respectively (FDR < 0.05; Supplementary Table 22). Although marginal genetic correlations formed a single cluster, G×E correlations were clustered by trait categories (Fig. 3e,f). Notably, the same environmental factors mediated G×E correlations across populations: current smoking in liver-related traits; sex and dietary consumption in blood pressure traits; and age and sex in renal-related traits. These trait–environment relationships were consistent with known epidemiology and recovered without previous clinical input, suggesting that the trait–environment relationships were embedded in the genome-wide G×E architecture.

Unfiltered approach for G×E detection

Past studies have often limited G×E analyses to prefiltered variants to reduce multiple testing burden^15,18. A common approach is variance quantitative trait loci (vQTL) analysis, which tests associations between genotypes and phenotypic variance without requiring environmental measurements¹³. In Supplementary Note 9, we examined overlaps between G×E interactions and vQTL. G×E loci were 14.6-fold enriched for vQTL compared with GWAS loci, supporting vQTL as an effective prefiltering strategy¹³. However, vQTL missed most G×E interactions (54.8% in UKB1 and 80.6% in BBJ1; Supplementary Table 23) and their detection was sensitive to phenotype normalization. Moreover, vQTL heritability did not correlate with G×E heritability across traits. These results underscore the necessity of using environmental data explicitly for comprehensive G×E detection.
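Why a G×E interaction can surface as a vQTL is easiest to see in a toy setting. The sketch below assumes a phenotype driven only by the interaction term and a balanced, mean-zero environment; the coefficient and sample layout are hypothetical.

```python
from statistics import mean, pvariance

B_GXE = 0.5  # hypothetical interaction coefficient

# Balanced environment with mean zero, so the marginal genetic effect is null.
environments = [-1.0, 1.0] * 50

def phenotypes_for(genotype):
    """Toy phenotypes y = B_GXE * G * E for one genotype group."""
    return [B_GXE * genotype * e for e in environments]

mean_by_genotype = {g: mean(phenotypes_for(g)) for g in (0, 1, 2)}
variance_by_genotype = {g: pvariance(phenotypes_for(g)) for g in (0, 1, 2)}
# Means stay at zero while variances grow as (B_GXE * G)^2: 0, 0.25, 1.0.
```

The genotype changes the spread of the phenotype without changing its mean, so a variance test can flag the locus without any environmental measurement. The same construction also hints at the limitations noted above: the variance signal scales with the square of the interaction effect, and it can be distorted by transformations of the phenotype.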

Influence on polygenic prediction

Polygenic score (PGS)-based disease risk prediction is actively explored, but environmental differences within and across populations can reduce its prediction accuracy for specific traits^11,39, potentially exacerbating health disparities. This might affect other traits in general, considering the G×E heritability for broad trait categories. To systematically assess environmental effects on polygenic prediction, we stratified the discovery and replication cohorts by environments into two groups (for example, ever-smokers and never-smokers; younger and older halves of the group). We performed GWAS and constructed PGS within individual strata of the discovery cohorts, and evaluated their prediction accuracy in the strata of the replication cohorts (Fig. 4a).

**Fig. 4: Genome-wide properties of G×E interactions.**

Among the 26 trait–environment–biobank triplets with significantly positive G×E heritability, 20 exhibited significant intra-population differences in prediction accuracy in at least one stratum (FDR < 0.05; Fig. 4b,c, Extended Data Fig. 9a and Supplementary Table 24). Prediction accuracy was generally the highest when the PGS was applied to the same environmental group as the one used for its construction. Excluding one related to a UKB1-specific environment cluster (BMI–fish-and-vegetable), 11 out of 25 triplets also showed significant differences in cross-population portability (Extended Data Fig. 9b–d), although the prediction accuracy was generally attenuated.

Polygenic scores constructed from G×E interactions (G×E-PGS)⁴⁰ significantly explained phenotypic variance in independent cohorts (11 out of 25 (9 out of 25) trait–environment pairs within (across) populations; Supplementary Table 25). Notably, a G×Sex-based PGS constructed in BBJ1 successfully stratified BMI in opposite directions between sexes in J-MICC/HERPACC, capturing the polygenic architecture of sex differences (Fig. 4d). This stratification was apparent even for each sex and among individuals with similar marginal PGS, suggesting that PGS extended to two dimensions could enhance phenotype prediction. Indeed, a model incorporating G×E-PGS improved BMI prediction accuracy by 16% over a model without G×E-PGS (R² = 0.128 versus 0.110). By contrast, gains in prediction accuracy were modest for other trait–environment pairs (Supplementary Table 25). Larger sample sizes and methods tailored for G×E-PGS construction are warranted to fully realize the potential of G×E interactions in precision medicine.
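The reported improvement for BMI follows directly from the two R² values as a relative gain. The helper name below is hypothetical.

```python
def relative_gain(r2_with, r2_without):
    """Relative improvement in explained variance from adding a predictor."""
    if r2_without <= 0:
        raise ValueError("baseline R² must be positive")
    return (r2_with - r2_without) / r2_without

gain = relative_gain(0.128, 0.110)
print(f"{gain:.0%}")  # prints 16%
```

The absolute gain is 0.018 in R², so the 16% figure describes improvement relative to the marginal-PGS model rather than additional variance explained.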

Collectively, these observations demonstrated that environmental factors systematically impacted intra- and cross-population PGS prediction accuracy, and incorporating G×E interactions could enhance genetic risk prediction of human complex traits.

Aging shift of pulse pressure genetics

As genome-wide G×E architecture recapitulated epidemiologically plausible trait–environment relationships, we reasoned that G×E interactions could uncover biological mechanisms underlying the genetic dynamics of complex traits. We focused on G×Age interactions for pulse pressure, given their strong signals: all four G×E loci in UKB1 were driven by G×Age, G×Age heritability was significantly positive and PGS prediction accuracy varied across age groups (Supplementary Tables 6 and 21, and Fig. 4b).

We divided BBJ1 and UKB1 into two equal-sized age groups and conducted GWAS for pulse pressure within each group. Cross-population meta-analyses of MAGMA gene-set analyses⁴¹ revealed that vascular smooth muscle contraction was enriched in younger individuals and cellular senescence enriched in older individuals (Extended Data Fig. 9e). When projecting polygenic effects onto tissue-wide scRNA-seq data from Tabula Sapiens^7,42, blood vessel cell types were significantly associated with pulse pressure in both age groups, whereas their relative strength of associations differed by age (Fig. 4e–g and Supplementary Table 26). To examine this closely, we repeated the analysis using a scRNA-seq dataset of monkey arteries⁴³ (Fig. 4h). In younger individuals, genetic effects were associated with smooth muscle cells (P = 8.3 × 10⁻³ and 9.4 × 10⁻³ for the two subtypes), whereas in older individuals they were associated with a subgroup of coronary endothelial cells (P = 2.3 × 10⁻³) (Fig. 4i and Extended Data Fig. 9f,g). As endothelial cells are central to vascular senescence and atherosclerosis⁴⁴, these results support the findings of the gene set enrichment analysis.

As pulse pressure was defined as the difference between systolic and diastolic blood pressure (SBP and DBP, respectively), we estimated age-stratified genetic correlations among these traits. Although the genetic correlation (R_g) with pulse pressure remained stable for SBP, that for DBP declined in older individuals (from R_g of 0.46 (95% CI, 0.40–0.53) to 0.27 (0.19–0.35); Extended Data Fig. 9h), indicating relatively increased SBP influence with aging. Cross-age genetic correlation for pulse pressure was modest as expected (R_g = 0.42 (0.35–0.49); Extended Data Fig. 9i).

Collectively, these results suggested that the age-related changes in pulse pressure genetics are driven by increasing SBP influence over DBP, reflecting a shift from smooth muscle-mediated regulation in youth to an endothelial-driven SBP increase with vascular aging. Our results demonstrated that G×E interactions can reveal dynamic trait biology missed by typical GWAS.

Sex-discordant regulation in metabolites

In addition to single-cell analysis, molecular QTL mapping can offer granular biological insights into genetic loci. We analysed 2,924 proteins (N_max = 28,561 in UKB1 and 2,153 in BBJ1; Supplementary Table 27) and 325 metabolites (N_max = 153,410 in UKB1 and 89,040 in BBJ1; Supplementary Table 28) for 57 lead variants. We identified 13 significant protein–locus pairs in meta-analyses across biobanks (8 loci; FDR_G×E < 0.05; Supplementary Table 29), with the ALDH2 and SURF6 loci reaching genome-wide significance. For metabolites, 2,326 significant metabolite–locus pairs (38 loci) were detected, and 650 (15 loci) passed genome-wide significance, aided by their large sample sizes (Supplementary Table 30). These omics-level G×E interactions covered 70% (40 out of 57) of loci, with the same environments driving both omics and clinical G×E interactions at all loci, indicating that most clinical G×E interactions were detectable at the molecular level.

The relationship between G×E and marginal P values exhibited five distinct patterns across the 15 loci with genome-wide significant metabolite G×E interactions (Fig. 5a and Extended Data Fig. 10): (1) high P_G×E–P_marginal correlation; (2) G×E-specific signals for 1–2 lipid metabolites; (3) a bimodal distribution at the TNFAIP8 locus; (4) significant P_G×E for 1–2 non-lipid metabolites; and (5) much smaller P_marginal for most metabolites. The first three patterns were related to lipid metabolites. In the first pattern, metabolites highly correlated with clinical lipid biomarkers formed distinct clusters with a broad P-value spectrum, suggesting that clinical lipid biomarkers may adequately capture genetic and G×E structure at these loci.

**Fig. 5: Sexual dimorphism in lipid metabolites.**

In the second pattern, the nearest genes were involved in lipid metabolism, and strong G×E-specific signals were detected for lipid metabolites not highly correlated with clinical biomarkers, suggesting that direct metabolite measurement was necessary to detect G×E at these genes. For these loci, sex was the top contributor in 86% of G×E metabolite QTL (568 out of 664 metabolite–locus pairs). Among them, cholesteryl ester transfer protein (CETP) is an intriguing target of genetics-driven drug discovery. Although CETP inhibitors showed promise in GWAS for raising HDL-C and lowering LDL-C to reduce coronary artery disease risk⁴⁵, several were discontinued in phase 3 trials. The CETP locus showed a G×E-specific signal for the percentage of triglycerides in LDLs (LDL_TG_pct; P_G×E = 1.8 × 10⁻¹², P_marginal = 0.98). Given the role of CETP in exchanging triglycerides from LDLs and other lipoproteins with cholesteryl esters from HDL, LDL_TG_pct might represent a key metabolic process for this protein. Notably, effect directions differed by sex (Fig. 5b and Supplementary Table 31), and LDL_TG_pct predicted all-cause and coronary artery disease mortality in both sexes (hazard ratio per unit of s.d. in UKB1: 1.23 (95% confidence interval, 1.21–1.25) and 1.25 (1.18–1.32), respectively; Fig. 5c and Supplementary Table 32). As the known causal variant (rs1801706) showed increasing effects on CETP expression^46,47, these results suggested that CETP inhibition might decrease LDL_TG_pct in men but increase it in women, potentially leading to an increased female mortality risk. We also confirmed a previously reported female-specific effect on clinical LDL-C using the same UKB data¹⁰ (Extended Data Fig. 11), although LDL-C hazard ratios were not significantly positive, probably due to statin use. These effects might together help explain the clinical trial failure. We also examined the other loci with the second pattern. All G×E-specific signals showed opposite-effect directions between sexes (Extended Data Fig. 12a), suggesting that sex-discordant genetic regulation may be common in lipid metabolism. Some effect sizes also varied by age, possibly reflecting age-dependent declines in sex hormones, which warrants further investigation.

The TNFAIP8 locus showed a bimodal P value distribution (third pattern). Marginal effects acted mainly through the subfraction of triglycerides, whereas multiple HDL metabolites (especially very large HDL, XL_HDL) exhibited sex-discordant effects (Extended Data Fig. 12b–d). The nearest gene, TNFAIP8, is implicated in oncogenesis and inflammation, and has recently been shown to bind lipid messengers⁴⁸. The adjacent gene, HSD17B4, encodes 17-β-hydroxysteroid dehydrogenase 4, a multi-functional enzyme involved in fatty acid and sex steroid metabolism⁴⁹. This multi-functionality might underlie the bimodal pattern, though distinct genetic influences on HDL and triglyceride subfractions are also possible. As fine-mapping and co-localization analyses could not pinpoint causal variants (Supplementary Note 10), further experiments are warranted to characterize this locus.

Motivated by these G×E metabolite QTLs, we expanded the metabolome-wide G×E analysis to genome-wide variants, identifying 30 (11) genome-wide significant loci in the UKB1 (BBJ1), yielding 736 (228) metabolite–locus pairs. Among these, 16 (4) loci in the UKB1 (BBJ1) passed the Bonferroni threshold. Notably, G×Sex interactions at the ALDH1A2 and ZNF259 loci were observed in both cohorts (Supplementary Table 33), and several loci showed sex-discordant effects (Extended Data Fig. 13 and Supplementary Table 34), again underscoring the sex-specific genetic architecture of metabolome regulation. Although not Bonferroni-significant, we also detected current-smoker-specific, never-drinker-specific and G×E-only associations (Extended Data Fig. 13), which warrant further validation and functional studies.

Collectively, our analysis demonstrated that omics G×E studies could provide granular insights into the molecular basis of genetic effect plasticity, particularly for the prominent sexual dimorphism in lipid metabolism.

Discussion

We provided a G×E atlas across the genome, phenomes and environments in two populations and tested replication in diverse populations, substantially expanding the catalogue of human G×E interactions. Leveraging this atlas, we demonstrated that G×E interactions yielded both granular and holistic insights into biological dynamics at the locus, genome-wide, single-cell and molecular levels. At the locus level, G×E analyses revealed underlying biological mechanisms and highlighted the need for careful interpretation by specialists. At the genome-wide level, G×E interactions affected trait heritability, PGS prediction accuracy and cross-population portability, emphasizing the value of incorporating environmental context into genetic prediction. Single-cell- and omics-level analyses uncovered age-, sex- and other environment-specific effects, underscoring the dynamic and molecular nature of G×E interactions. Although moderate G×E sharing was observed across populations, population-specific signals and limited data from underrepresented groups highlight the need for more diverse cohorts with detailed environmental and omics data (Supplementary Note 11). In conclusion, we provided a rich resource for future genetic studies, establishing the importance of G×E interactions in decoding the dynamics of complex trait biology, refining personalized medicine and informing drug development.

Methods

Biobank Japan

The BBJ is a prospective hospital-based biobank with 267,289 participants, all of whom were diagnosed with at least one of the target diseases of BBJ by physicians at the cooperating hospitals^50,51,52. All of the participants provided written informed consent approved by the ethics committees of the Institute of Medical Sciences, the University of Tokyo and RIKEN Center for Integrative Medical Sciences. The BBJ comprises two cohorts, which were genotyped separately: the first (BBJ1, N = 182,536) and second (BBJ2, N = 68,534) cohorts. The participants in BBJ1 were genotyped with the Illumina HumanOmniExpressExome BeadChip or a combination of the Illumina HumanOmniExpress and HumanExome BeadChip, whereas the participants in BBJ2 were genotyped with the Illumina Asian Screening Array. All BBJ1 participants and 17% of the BBJ2 participants (N = 11,716) were recruited from 2003 to 2008. The remaining BBJ2 participants (N = 56,818) were recruited from 2013 to 2017.

Definition of the discovery and replication cohorts

We used BBJ1 (BBJ2) as the discovery (replication) cohort.

Quality control of genotype data

We conducted a quality control of the participants and the genotypes, and excluded sample relatedness in BBJ1 via the same approach described previously⁵³. The genotype data were imputed with 1000 Genomes Project Phase 3 (N = 2,504) and Japanese whole-genome sequencing data (N = 1,037) using Minimac3 software⁵⁴. We excluded variants with an imputation quality of R_sq < 0.7 or a minor allele frequency (MAF) of less than 0.01, resulting in 7,444,735 autosomal variants analysed in total. We analysed 166,757 participants of the Japanese population as estimated by the visual inspection of principal component analysis (PCA).

In BBJ2, we excluded participants with a low call rate (<0.98) and outliers from the Japanese Hondo (that is, the main islands) cluster estimated on the basis of PCA. We excluded the variants meeting the following criteria: (1) with a low call rate (<0.99); (2) with low minor allele counts (<5); and (3) with a Hardy–Weinberg equilibrium test P value of <1.0 × 10⁻¹⁰. We performed statistical phasing of the genotype data using Shapeit4 (ref. ⁵⁵) and imputation using Minimac4 (ref. ⁵⁶) with the same reference panel as used in the discovery cohort. After imputation, we excluded variants with an imputation quality of <0.7 or a MAF less than 0.01. We used King⁵⁷ to exclude relatives within second degrees, resulting in 65,373 participants being analysed.

UK Biobank

The UKB is a population-based biobank with approximately 500,000 participants recruited between 2006 and 2010, aged 40–69 years⁵⁸. Participants were genotyped using either the UK BiLEVE Axiom Array or UK Biobank Axiom Array. The genotypes were then imputed by IMPUTE4 software using a combination reference panel of the Haplotype Reference Consortium, UK10K and 1000 Genomes Project Phase 3. We accessed the UKB data under the project number 47821.

Definition of the discovery and replication cohorts

We included British European participants in the discovery cohort (UKB1), defined as the intersection of the self-reported British participants and the genetically ‘Caucasian’ participants (UK Biobank Data Fields 21000 and 22006), to strictly reduce the inflation of test statistics due to population stratification. We excluded one participant from every related pair within the third degree. For the replication cohort (UKB2), we included all other genetically ‘Caucasian’ participants and excluded one participant from every related pair within the second degree. We also excluded the participants related to any participants in UKB1 within the second degree.

Quality control of genotype data

We analysed 9,813,264 autosomal variants with an imputation quality of >0.7 and MAF of >0.01, which was equivalent to the threshold used in BBJ. We excluded participants with (1) sex chromosome aneuploidy; (2) a mismatch between genetic and self-reported sex; or (3) outlying heterozygosity or missing rate.

Descriptions of the independent replication cohorts, J-MICC/HERPACC (refs. ^59,60,61), JPHC (ref. ⁶²), All of Us (ref. ⁶³) and HPP (ref. ⁶⁴) are available in Supplementary Methods.

Quality control of phenotypes and environments

Clinical traits

The definition and quality control of clinical traits were summarized in Supplementary Tables 2 and 3. In brief, we obtained biomarker phenotypes from the initial assessment data for UKB and the medical records for BBJ. For BBJ, we used the biomarker phenotypes measured at the nearest dates to the baseline assessment for the main analyses, whereas we used those measured most recently for the temporal order analyses. For the UKB, we used the baseline assessment data for the main analyses, whereas we used the most recent data from the second to fourth revisit assessments for the temporal order analyses, namely, the first repeat assessment visit, the imaging visit, and the first repeat imaging visit. We applied the same quality controls for both biobanks, including (1) excluding participants with age <18 or age >85; (2) excluding participants with particular disease status that might have affected the phenotype values; (3) correction of phenotype values for participants taking anti-hypertensive medications or statins; (4) excluding outliers whose measured values were outside of three times the interquartile range (upper or lower quartile), or outside three standard deviations from the mean; and (5) applying natural log transformation for the phenotypes with right-skewed distributions. For the temporal order analyses, we restricted the data to those measured at least half a year after the baseline survey.
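The outlier rule in step (4) can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes that a value is excluded when it falls outside either fence (the quartile-based one or the mean-based one), and the function name `flag_outliers` and its parameters are hypothetical.

```python
import statistics


def flag_outliers(values, iqr_multiplier=3.0, sd_multiplier=3.0):
    """Return True for each value lying outside the IQR fence or the SD fence.

    Assumption: the two criteria are combined as a union (either one excludes).
    """
    # Lower and upper quartiles of the observed distribution.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)

    low_iqr, high_iqr = q1 - iqr_multiplier * iqr, q3 + iqr_multiplier * iqr
    low_sd, high_sd = mean - sd_multiplier * sd, mean + sd_multiplier * sd

    return [
        v < low_iqr or v > high_iqr or v < low_sd or v > high_sd
        for v in values
    ]
```

Participants flagged `True` would be dropped for that trait only. Whether the filter is applied before or after the log transformation in step (5) is not specified here, so the sketch leaves that choice to the caller.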

For disease statuses, we combined diagnoses data (ICD-10), operation data (OPCS-4) and self-reported illness and operation data for UKB. We determined the subtypes of diabetes mellitus based on ICD-10 codes and an established algorithm developed for UKB self-reported data⁶⁵. We excluded participants with diabetes mellitus other than T2D, and patients with T2D who were also inferred to have type 1 diabetes from both cases and controls. Following a previous report⁶⁶, we included ischaemic stroke patients as cases only if they had any evidence of stroke other than self-reports and excluded the participants with self-reported stroke from controls. For BBJ, we defined disease statuses based on the union of diagnoses by doctors at the cooperating hospitals and past medical history retrieved from electronic medical records. When testing T2D, we excluded the participants with diabetes mellitus other than T2D from both cases and controls. We defined clinical traits and performed quality control in the replication cohorts in the same manner, as detailed in Supplementary Methods.

Environmental factors

As the resolution of individual questionnaire items was limited due to their few discrete response options, we employed a clustering-based approach to summarize correlated items into high-resolution latent environments⁶⁷ as summarized in Supplementary Table 4. The questionnaires for dietary consumption and physical activity were available for both UKB and BBJ, with low missing rates (0.01–0.27% in UKB1 and 1.4–13.3% in BBJ1), and were used in this manuscript. We converted the categorical responses into the continuous scale following ref. ⁶⁸. For example, the responses to dietary consumption in BBJ were ‘Almost every day’, ‘3–4 days a week’, ‘1–2 days a week’ and ‘Rarely’, and we converted them into 7, 3.5, 1.5 and 0, respectively. We treated the responses ‘Do not know’ and ‘Prefer not to answer’ as missing values. For clustering analyses, we first regressed out age, sex, age², age × sex, and age² × sex from the environmental factors derived from questionnaires about dietary consumption and physical activities. We then performed consensus clustering analyses using the ConsensusClusterPlus R package⁶⁹ for the environmental factors. We used (1 − Pearson’s correlation) as the distance between environmental factors and employed the hierarchical clustering algorithm with Ward’s method. We note that we multiplied the values of coffee consumption in UKB1 by −1 as we observed its strong negative correlation with tea consumption. We varied the number of consensus clusters from two to ten and defined the number of clusters based on the stability of the cumulative distribution function curve and the item tracking. We named these clusters based on the questionnaire items included in the clusters and used the first principal component as their environmental values. We matched the direction of the first principal component to that of the raw questionnaire data included in individual clusters. The cluster scores were standardized to a mean of zero and a standard deviation of one. Finally, we excluded the outliers whose first principal component values were outside of three standard deviations from the mean. The clustering of questionnaires for environmental factors in the replication cohorts is described in the Supplementary Methods and Supplementary Table 4.
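The first two steps of this procedure, converting categorical responses to days per week and computing the (1 − Pearson's correlation) distance between items, can be sketched as below. The mapping values come from the BBJ example in the text; the function names are hypothetical, and the pairwise-complete handling of missing responses is an assumption. The residualization on age and sex, the consensus clustering and the principal component step are not shown.

```python
import math

# Response-to-frequency mapping taken from the BBJ dietary example in the text.
FREQUENCY_TO_DAYS = {
    "Almost every day": 7.0,
    "3–4 days a week": 3.5,
    "1–2 days a week": 1.5,
    "Rarely": 0.0,
}
MISSING_RESPONSES = {"Do not know", "Prefer not to answer"}


def to_days_per_week(response):
    """Convert one categorical response to days per week (None if missing)."""
    if response in MISSING_RESPONSES:
        return None
    return FREQUENCY_TO_DAYS[response]


def pearson(x, y):
    """Pearson correlation over pairs where both items were answered."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


def correlation_distance(x, y):
    """Distance between two questionnaire items: 1 - Pearson's correlation."""
    return 1.0 - pearson(x, y)
```

With this distance, strongly negatively correlated items sit far apart (distance close to 2), which is consistent with the sign flip applied to coffee consumption in UKB1 so that it could cluster with tea consumption.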

Disease statuses used for the phenome-wide association study

In addition to the nine disease statuses used in the main analyses, we defined disease statuses based on ICD-10 in UKB1 and past medical history in BBJ1, as summarized in Supplementary Table 19. For the complications of diabetes mellitus (diabetic nephropathy and retinopathy), we restricted cases and controls to the T2D patients. We tested G×E interactions for the diseases with more than 1,000 cases in individual cohorts. Consequently, we tested 35 diseases for 49 lead variants in UKB1 and 52 diseases for 38 lead variants in BBJ1.

Metabolites measurement

We started from the internally quality controlled Nightingale NMR metabolome measurement data and removed its technical variation using the ukbnmr R package⁷⁰. Briefly, this package removed the technical variation derived from: (1) the elapsed time from sample preparation to measurement, (2) the position (row and column) of the 96-well plate and (3) a series of measurement dates for each spectrometer. This package also calculated 76 useful biomarkers based on the ratio between directly measured ones, in addition to 249 biomarkers provided by Nightingale NMR (325 biomarkers in total; Supplementary Table 28). We then removed participants aged <18 or >85, or whose measured values were outside of three times the interquartile range (upper or lower quartile) or outside of three standard deviations from the mean. We also removed the following participants for particular biomarkers: participants with renal insufficiency (eGFR < 15 ml min^–1 per 1.73 m²) for creatinine; participants with liver diseases, haematological malignancies, nephrotic syndrome or autoimmune diseases for albumin; or participants with diabetes mellitus for glucose and participants with autoimmune diseases for glycoprotein acetyls. We applied natural log transformation to the measurement values and standardized them to mean of 0 and s.d. of 1.

Protein expression measurement

We used the normalized Olink protein expression data (Olink Explore 3072). As implemented by Olink, these data were already bridge-normalized across measurement batches into the unit of normalized protein expression. As we did for other quantitative phenotypes, we removed participants aged <18 or >85, or whose measured values were outside of three times the interquartile range (upper or lower quartile), or outside of three standard deviations from the mean. We used 2,923 proteins with valid measurement data for both BBJ1 and UKB1 (Supplementary Table 27).

G×E interaction testing

All statistical tests were two-sided unless otherwise noted. As phenotypes, we targeted quantitative traits (clinical biomarkers, metabolites, and protein expression measurements), dichotomous traits (disease statuses), and time-to-event traits (disease onset for incidental cases and overall survival). For quantitative and dichotomous traits, we tested G×E interactions using GEM, a fast and scalable implementation of linear and logistic regressions with G×E interaction terms²² (Extended Data Fig. 1). We estimated the model-robust ‘sandwich’ standard errors to suppress the inflation of statistics²². As the case–control imbalance can cause the inflation of test statistics in the logistic regression, we re-estimated the effect sizes and P values using Firth logistic regression for the variants with P values <5.0 × 10⁻⁸ for dichotomous traits. For the time-to-event traits, we employed the Cox proportional hazard model implemented in the ‘survival’ R package. Individuals who had already developed the target disease at baseline or developed it within six months from baseline were excluded from the incidental case analyses.
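To illustrate the kind of test described for quantitative traits, the sketch below fits a single-variant, single-environment linear model with an interaction term and computes a model-robust sandwich standard error. It is a simplified stand-in written for this illustration, not GEM's implementation: covariates are omitted, the HC0 form of the sandwich estimator and the normal approximation for the P value are assumptions, and all function names are hypothetical.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gxe_interaction_test(y, g, e):
    """Fit y ~ 1 + G + E + G*E; return (beta_GxE, robust SE, two-sided P)."""
    x = [[1.0, gi, ei, gi * ei] for gi, ei in zip(g, e)]
    xt = [list(col) for col in zip(*x)]
    bread = invert(matmul(xt, x))                       # (X'X)^-1
    xty = [[sum(c * yi for c, yi in zip(col, y))] for col in xt]
    beta = [row[0] for row in matmul(bread, xty)]
    resid = [yi - sum(b * v for b, v in zip(beta, row))
             for yi, row in zip(y, x)]
    # "Meat" of the sandwich: X' diag(resid^2) X (HC0, an assumption here).
    meat = matmul(xt, [[r * r * v for v in row] for r, row in zip(resid, x)])
    cov = matmul(matmul(bread, meat), bread)
    se = math.sqrt(cov[3][3])
    z = beta[3] / se
    p = math.erfc(abs(z) / math.sqrt(2.0))              # two-sided, normal approx.
    return beta[3], se, p
```

The robust variance does not rely on homoscedastic residuals, which is why it is less prone to inflated interaction statistics when the residual variance differs across environments.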

We tested G×E interactions for different sets of environmental factors, as shown in Extended Data Fig. 1, and then aggregated P values across environmental sets on the Cauchy distribution²³ to obtain per-variant P values. When P values were 0 (meaning that P values were <1.0 × 10⁻³⁰⁰), we estimated the precise P values using the multiple-precision floating-point arithmetic with the maximal precision of 10,000 bits, implemented in the Rmpfr R package. Regional G×E interactions were plotted using LocusZoom⁷¹. We defined a G×E interaction as significant at the variant level if the aggregated P value was below the genome-wide threshold of 5.0 × 10⁻⁸. We also reported the study-wide Bonferroni threshold for full transparency. We did not require that any individual environment’s interaction term be significant on its own, considering that G×E interactions can be driven by multiple environments. Nevertheless, we note that at least one raw P value was smaller than the Cauchy-combined P value in principle, as the Cauchy combination method is not a meta-analysis but rather returns a P value within the range of raw P values.
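The aggregation step can be illustrated with a short sketch of the Cauchy combination. Equal weights across environmental sets are an assumption, the function name is hypothetical, and the small-P and large-statistic approximations are included only to keep double precision stable; P values of exactly 0 are not handled here (the text describes multiple-precision arithmetic for that case).

```python
import math


def cauchy_combine(p_values, weights=None):
    """Combine P values through the Cauchy combination statistic."""
    if weights is None:
        # Assumption: every environmental set gets the same weight.
        weights = [1.0 / len(p_values)] * len(p_values)
    statistic = 0.0
    for w, p in zip(weights, p_values):
        if p < 1e-15:
            # tan((0.5 - p) * pi) is approximately 1 / (p * pi) for tiny p.
            statistic += w / (p * math.pi)
        else:
            statistic += w * math.tan((0.5 - p) * math.pi)
    if statistic > 1e15:
        # Tail approximation of the standard Cauchy survival function.
        return 1.0 / (statistic * math.pi)
    return 0.5 - math.atan(statistic) / math.pi
```

When all inputs are equal, the combined value equals that input, and a single very small P value dominates the statistic without the combined value dropping below it. This matches the remark that the method returns a value within the range of the raw P values rather than acting as a meta-analysis.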

After evaluating the significance of G×E interactions at the variant level, we determined the order of importance of G×E interaction terms by backward elimination from the regression model with all G×E interaction terms. Specifically, we repeatedly removed the G×E interaction term with the least improvement in the likelihood of the linear regression model or the largest Wald P value of the Firth logistic regression model. After removing all G×E interaction terms, we brought back the G×E interaction terms one by one in the order of their importance if the model fit was improved with a P value less than 0.05 (likelihood ratio test). We considered that the environmental factors contributed to G×E interactions if the corresponding interaction terms were included in the final model. In principle, this approach selects more parsimonious sets of environments than the Akaike information criterion and can determine the top environment contributing to individual G×E interactions.
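The ranking and re-addition procedure can be sketched for the linear regression case as follows. This is a minimal illustration under several assumptions: each interaction term has one degree of freedom, the likelihood ratio statistic is computed as n·ln(RSS_reduced/RSS_full), a term that fails the test is skipped while later terms are still tried, and all names are hypothetical. The Firth logistic branch is not shown.

```python
import math


def rss(columns, y):
    """Residual sum of squares of an OLS fit of y on the given columns."""
    k = len(columns)
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    a = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns]
         + [sum(u * v for u, v in zip(ci, y))] for ci in columns]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    beta = [0.0] * k
    for r in reversed(range(k)):
        tail = sum(a[r][c] * beta[c] for c in range(r + 1, k))
        beta[r] = (a[r][k] - tail) / a[r][r]
    fitted = [sum(b * col[i] for b, col in zip(beta, columns))
              for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def lrt_p(rss_reduced, rss_full, n):
    """Likelihood ratio test P value for one added term (1 d.f. chi-square)."""
    stat = n * math.log(rss_reduced / rss_full)
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def select_environments(y, base, interactions, alpha=0.05):
    """Rank interaction terms by backward elimination, then re-add them."""
    remaining = dict(interactions)
    removal_order = []
    while remaining:
        # Drop the term whose removal costs the least fit (smallest RSS after removal).
        least = min(
            remaining,
            key=lambda name: rss(
                base + [col for key, col in remaining.items() if key != name], y
            ),
        )
        removal_order.append(least)
        del remaining[least]
    importance = removal_order[::-1]          # last removed = most important
    selected, columns = [], list(base)
    for name in importance:
        candidate = columns + [interactions[name]]
        if lrt_p(rss(columns, y), rss(candidate, y), len(y)) < alpha:
            selected.append(name)
            columns = candidate
    return importance, selected
```

Here `base` holds the intercept, genotype and environment main-effect columns, and `interactions` maps a label such as `"GxSex"` to its product column. The first entry of `importance` plays the role of the top contributing environment.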

We used the first 20 genetic principal components, genotype array, age², age × sex and age² × sex as covariates for the UKB. For the other cohorts, we used the first ten principal components, age², age × sex and age² × sex as covariates. Age and sex, as well as other environmental factors, were included as the interaction terms with genotypes, and the variables used for interaction terms were automatically included as covariates. We also included the status of the target diseases in BBJ as covariates for quantitative traits and overall survival. We included the status of statin medication as an additional covariate for metabolome measurements. For sex-specific diseases in the phenome-wide association study, we excluded sex from the environmental factors and age × sex and age² × sex from covariates.

Following a previous cross-population GWAS that used a distance-based locus definition²¹, we assigned genome-wide significant variants to the same locus if they were less than 500 kbp apart. For the genome-wide G×E interaction tests on the proteome and metabolome, we first targeted 56 lead variants of clinical G×E interactions, including those identified by meta-analyses. We then extended the targets on the metabolome to (1) genotyped variants in each cohort, (2) HapMap3 variants and (3) the variants targeted by the meta-analysis across BBJ and UKB, to reduce the computational cost. After defining the metabolome G×E loci based on their distance as above, we merged the metabolome G×E loci if they overlapped with the same locus defined for the clinical traits. See Supplementary Methods for further methodological details.
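A distance-based locus definition of this kind can be sketched as below. The sketch assumes that loci are formed by chaining: significant variants sorted by position on a chromosome share a locus as long as each adjacent gap is below 500 kbp. Whether the original analysis chained adjacent variants or used windows around lead variants is not stated here, and the function name and tuple layout are hypothetical.

```python
from collections import defaultdict


def define_loci(variants, max_gap_bp=500_000):
    """Group (chromosome, position) pairs into loci.

    Returns a list of (chromosome, start, end) tuples. Adjacent significant
    variants closer than max_gap_bp are placed in the same locus.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in variants:
        by_chrom[chrom].append(pos)

    loci = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        start = prev = positions[0]
        for pos in positions[1:]:
            if pos - prev >= max_gap_bp:
                # Gap is too large: close the current locus and open a new one.
                loci.append((chrom, start, prev))
                start = pos
            prev = pos
        loci.append((chrom, start, prev))
    return loci
```

For example, significant variants at 100 kbp and 400 kbp on the same chromosome would form one locus, whereas a variant at 1,200 kbp would start another.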

Ethics statement

This study was approved by the ethics committee of the University of Osaka (approval no. 734-18) and the ethics committee of the Graduate School of Medicine, the University of Tokyo (2023405G-(4)).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The genome-wide summary statistics of G×E interactions for both clinical phenotypes and the metabolome are publicly available at the NBDC Human Database (https://humandbs.dbcls.jp/en) with the accession ID hum0197.v26.374-traits.v1 and at the NHGRI-EBI GWAS Catalogue (https://www.ebi.ac.uk/gwas) with the accession IDs GCST90681837–GCST90690020. The UKB analysis was conducted under application no. 47821 (https://www.ukbiobank.ac.uk/). The BBJ data are available at the NBDC Human Database (https://humandbs.biosciencedbc.jp/en/) via accession IDs JGAS000114 and JGAS000412 (genotype), JGAS000561 (metabolome) and JGAS000785 (proteome). Human reference genome GRCh38, http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/; Japanese human reference genome JG2.1.0, https://jmorp.megabank.tohoku.ac.jp/downloads/tommo-jg2.1.0-20211208; the enformer model v.1, https://www.kaggle.com/models/deepmind/enformer/tensorFlow2/enformer/1; East Asian LD block data, https://github.com/jmacdon/LDblocks_GRCh38/blob/master/data/pyrho_EAS_LD_blocks.bed.

Code availability

References

Li, J., Li, X., Zhang, S. & Snyder, M. Gene-environment interaction in the era of precision medicine. Cell 177, 38–44 (2019).
Virolainen, S. J., VonHandorf, A., Viel, K. C. M. F., Weirauch, M. T. & Kottyan, L. C. Gene–environment interactions and their impact on human health. Genes Immun. 24, 1–11 (2022).
Hunter, D. J. Gene–environment interactions in human diseases. Nat. Rev. Genet. 6, 287–298 (2005).
Uffelmann, E. et al. Genome-wide association studies. Nat. Rev. Methods Prim. 1, 59 (2021).
Sun, B. B. et al. Plasma proteomic associations with genetics and health in the UK Biobank. Nature 622, 329–338 (2023).
Karjalainen, M. K. et al. Genome-wide characterization of circulating metabolic biomarkers. Nature 628, 130–138 (2024).
Zhang, M. J. et al. Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data. Nat. Genet. 54, 1572–1580 (2022).
Fan, S., Hansen, M. E. B., Lo, Y. & Tishkoff, S. A. Going global by adapting local: a review of recent human adaptation. Science 354, 54–59 (2016).
Rees, J. S., Castellano, S. & Andrés, A. M. The genomics of human local adaptation. Trends Genet. 36, 415–428 (2020).
Legault, M. et al. Study of effect modifiers of genetically predicted CETP reduction. Genet. Epidemiol. 47, 198–212 (2023).
Kamiza, A. B. et al. Transferability of genetic risk scores in African populations. Nat. Med. 28, 1163–1166 (2022).
Sørensen, T. I. A., Metz, S. & Kilpeläinen, T. O. Do gene–environment interactions have implications for the precision prevention of type 2 diabetes? Diabetologia 65, 1804–1813 (2022).
Westerman, K. E. et al. Variance-quantitative trait loci enable systematic discovery of gene-environment interactions for cardiometabolic serum biomarkers. Nat. Commun. 13, 3993 (2022).
Westerman, K. E. & Sofer, T. Many roads to a gene-environment interaction. Am. J. Hum. Genet. 111, 626–635 (2024).
Dick, D. M. et al. Candidate gene–environment interaction research. Perspect. Psychol. Sci. 10, 37–59 (2015).
Joseph, P. G., Pare, G. & Anand, S. S. Exploring gene–environment relationships in cardiovascular disease. Can. J. Cardiol. 29, 37–45 (2013).
Aschard, H. A perspective on interaction effects in genetic association studies. Genet. Epidemiol. 40, 678–688 (2016).
Aschard, H. et al. Challenges and opportunities in genome-wide environmental interaction (GWEI) studies. Hum. Genet. 131, 1591–1613 (2012).
Boye, C., Nirmalan, S., Ranjbaran, A. & Luca, F. Genotype × environment interactions in gene regulation and complex traits. Nat. Genet. 56, 1057–1068 (2024).
Herrera-Luis, E., Benke, K., Volk, H., Ladd-Acosta, C. & Wojcik, G. L. Gene–environment interactions in human health. Nat. Rev. Genet. 25, 768–784 (2024).
Zhou, W. et al. Global biobank meta-analysis initiative: powering genetic discovery across human disease. Cell Genomics 2, 100192 (2022).
Westerman, K. E. et al. GEM: scalable and flexible gene–environment interaction analysis in millions of samples. Bioinformatics 37, 3514–3520 (2021).
Liu, Y. et al. ACAT: A fast and powerful P value combination method for rare-variant analysis in sequencing studies. Am. J. Hum. Genet. 104, 410–421 (2019).
Justice, A. E. et al. Genome-wide meta-analysis of 241,258 adults accounting for smoking behaviour identifies novel loci for obesity traits. Nat. Commun. 8, 14977 (2017).
Pattaro, C. et al. Genome-wide association and functional follow-up reveals new loci for kidney function. PLoS Genet. 8, e1002584 (2012).
Moore, R. et al. A linear mixed-model approach to study multivariate gene–environment interactions. Nat. Genet. 51, 180–186 (2019).
Cerezo, M. et al. The NHGRI-EBI GWAS Catalog: standards for reusability, sustainability and diversity. Nucleic Acids Res. 53, D998–D1005 (2025).
Bernabeu, E. et al. Sex differences in genetic architecture in the UK Biobank. Nat. Genet. 53, 1283–1289 (2021).
Sakaue, S. et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat. Genet. 53, 1415–1424 (2021).
Koyanagi, Y. N. et al. Genetic architecture of alcohol consumption identified by a genotype-stratified GWAS and impact on esophageal cancer risk in Japanese people. Sci. Adv. 10, ade2780 (2024).
Storey, J. D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA 100, 9440–9445 (2003).
Eckenstaler, R. & Benndorf, R. A. The role of ABCG2 in the pathogenesis of primary hyperuricemia and gout—an update. Int. J. Mol. Sci. 22, 6678 (2021).
Okada, Y. et al. Deep whole-genome sequencing reveals recent selection signatures linked to evolution and disease risk of Japanese. Nat. Commun. 9, 1631 (2018).
Okumura, Y. et al. Current use of direct oral anticoagulants for atrial fibrillation in Japan: findings from the SAKURA AF Registry. J. Arrhythm. 33, 289–296 (2017).
Nicholls, H. L. et al. Reaching the end-game for GWAS: machine learning approaches for the prioritization of complex disease loci. Front. Genet. 11, 350 (2020).
Matsuo, K. et al. Gene-environment interaction between an aldehyde dehydrogenase-2 (ALDH2) polymorphism and alcohol consumption for the risk of esophageal cancer. Carcinogenesis 22, 913–916 (2001).
Shin, J. & Lee, S. H. GxEsum: a novel approach to estimate the phenotypic variance explained by genome-wide GxE interaction based on GWAS summary statistics for biobank-scale data. Genome Biol. 22, 183 (2021).
Robinson, M. R. et al. Genotype–covariate interaction effects and the heritability of adult body mass index. Nat. Genet. 49, 1174–1181 (2017).
Ojima, T. et al. Body mass index stratification optimizes polygenic prediction of type 2 diabetes in cross-biobank analyses. Nat. Genet. 56, 1100–1109 (2024).
Jayasinghe, D. et al. Mitigating type 1 error inflation and power loss in GxE PRS: Genotype–environment interaction in polygenic risk score models. Genet. Epidemiol. 48, 85–100 (2024).
de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).
Tabula Sapiens Consortium et al. The Tabula Sapiens: a multiple-organ, single-cell transcriptomic atlas of humans. Science 376, eabl4896 (2022).
Zhang, W. et al. A single-cell transcriptomic landscape of primate arterial aging. Nat. Commun. 11, 2202 (2020).
Jia, G., Aroor, A. R., Jia, C. & Sowers, J. R. Endothelial cell senescence in aging-related vascular dysfunction. Biochim. Biophys. Acta 1865, 1802–1809 (2019).
Schmidt, A. F. et al. Cholesteryl ester transfer protein (CETP) as a drug target for cardiovascular disease. Nat. Commun. 12, 5640 (2021).
Kanai, M. et al. Insights from complex trait fine-mapping across diverse populations. Preprint at medRxiv https://doi.org/10.1101/2021.09.03.21262975 (2021).
Ganesan, M. et al. c.*84 G > A mutation in CETP is associated with coronary artery disease in South Indians. PLoS ONE 11, e0164151 (2016).
Niture, S., Moore, J. & Kumar, D. TNFAIP8: inflammation, immunity and human diseases. J. Cell. Immunol. 1, 29–34 (2019).
Huyghe, S., Mannaerts, G. P., Baes, M. & Van Veldhoven, P. P. Peroxisomal multifunctional protein-2: the enzyme, the patients and the knockout mouse model. Biochim. Biophys. Acta 1761, 973–994 (2006).
Nagai, A. et al. Overview of the BioBank Japan Project: study design and profile. J. Epidemiol. 27, S2–S8 (2017).
Hirata, M. et al. Overview of BioBank Japan follow-up data in 32 diseases. J. Epidemiol. 27, S22–S28 (2017).
Hirata, M. et al. Cross-sectional analysis of BioBank Japan clinical data: a large cohort of 200,000 patients with 47 common diseases. J. Epidemiol. 27, S9–S21 (2017).
Kanai, M. et al. Genetic analysis of quantitative traits in the Japanese population links cell types to complex human diseases. Nat. Genet. 50, 390–400 (2018).
Akiyama, M. et al. Characterizing rare and low-frequency height-associated variants in the Japanese population. Nat. Commun. 10, 4393 (2019).
Delaneau, O., Zagury, J.-F., Robinson, M. R., Marchini, J. L. & Dermitzakis, E. T. Accurate, scalable and integrative haplotype estimation. Nat. Commun. 10, 5436 (2019).
Fuchsberger, C., Abecasis, G. R. & Hinds, D. A. minimac2: Faster genotype imputation. Bioinformatics 31, 782–784 (2015).
Manichaikul, A. et al. Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867–2873 (2010).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Takeuchi, K. et al. Study profile of the Japan Multi-institutional Collaborative Cohort (J-MICC) study. J. Epidemiol. 31, JE20200147 (2021).
Hamajima, N. et al. Gene–environment interactions and polymorphism studies of cancer risk in the Hospital-based Epidemiologic Research Program at Aichi Cancer Center II (HERPACC-II). Asian Pac. J. Cancer Prev. 2, 99–107 (2001).
Koyanagi, Y. N. et al. Development of a prediction model and estimation of cumulative risk for upper aerodigestive tract cancer on the basis of the aldehyde dehydrogenase 2 genotype and alcohol consumption in a Japanese population. Eur. J. Cancer Prev. 26, 38–47 (2017).
Tsugane, S. & Sawada, N. The JPHC study: design and some findings on the typical Japanese diet. Jpn. J. Clin. Oncol. 44, 777–782 (2014).
The “All of Us” Research Program. N. Engl. J. Med. 381, 668–676 (2019).
Shilo, S. et al. 10 K: a large-scale prospective longitudinal study in Israel. Eur. J. Epidemiol. 36, 1187–1194 (2021).
Eastwood, S. V. et al. Algorithms for the capture and adjudication of prevalent and incident diabetes in UK Biobank. PLoS ONE 11, e0162388 (2016).
Woodfield, R., UK Biobank Stroke Outcomes Group, UK Biobank Follow-up and Outcomes Working Group & Sudlow, C. L. M. Accuracy of patient self-report of stroke: a systematic review from the UK Biobank Stroke Outcomes Group. PLoS ONE 10, e0137538 (2015).
Westerman, K. E. et al. Genome-wide gene–diet interaction analysis in the UK Biobank identifies novel effects on hemoglobin A1c. Hum. Mol. Genet. 30, 1773–1783 (2021).
Yamamoto, K. et al. Genetic footprints of assortative mating in the Japanese population. Nat. Hum. Behav. 7, 65–73 (2022).
Wilkerson, M. D. & Hayes, D. N. ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics 26, 1572–1573 (2010).
Ritchie, S. C. et al. Quality control and removal of technical variation of NMR metabolic biomarker data in ~120,000 UK Biobank participants. Sci. Data 10, 64 (2023).
Pruim, R. J. et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 26, 2336 (2010).

Acknowledgements

We gratefully acknowledge the participants and investigators of the BBJ, UKB, J-MICC, HERPACC, JPHC, the National Institutes of Health’s All of Us Research Program and HPP. Part of the super-computing resource was provided by the Human Genome Center, the Institute of Medical Science, The University of Tokyo. The J-MICC Study was supported by Grants-in-Aid for Scientific Research for Priority Areas of Cancer (grant 17015018) and Innovative Areas (grant 221S0001), and by Japan Society for the Promotion of Science (JSPS) KAKENHI grants (16H06277 and 22H04923 (CoBiA)) from the Japanese Ministry of Education, Culture, Sports, Science and Technology. This work was also supported in part by funding for the BBJ from the Japan Agency for Medical Research and Development (from April 2015 until now), as well as the Ministry of Education, Culture, Sports, Science and Technology (from April 2003 to March 2015). The HERPACC Study was supported by Grants-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan for Priority Areas of Cancer (grant 17015018) and Innovative Areas (grant 221S0001), by JSPS KAKENHI grants (JP16H06277 and 22H04923 (CoBiA), JP26253041, JP20K10463, JP23K16316 and 24K02697) and by a Grant-in-Aid for the Third Term Comprehensive ten-year Strategy for Cancer Control from the Ministry of Health, Labour and Welfare of Japan. The JPHC Study was supported by the National Cancer Center Research and Development Fund (grants 23-A-31 (toku), 26-A-2, 29-A-4, 2020-J-4 and 2023-J-4; from 2011 until now), and a Grant-in-Aid for Cancer Research from the Ministry of Health, Labour and Welfare of Japan (from 1989 to 2010). S. Namba was supported by AMED (grants JP24tm0424228, JP24tm0524009, JP25kk0305032 and JP256f0137004), the Takeda Science Foundation and the Japan Foundation for Applied Enzymology.
Y. Okada was supported by JSPS KAKENHI (grant 25H01057); AMED (grants JP24km0405217, JP24ek0109594, JP24ek0410113, JP24kk0305022, JP223fa627001, JP223fa627002, JP223fa627010, JP24zf0127008, JP24tm0524002, JP24wm0625504 and JP24gm1810011); JST Moonshot R&D (grants JPMJMS2021 and JPMJMS2024); Takeda Science Foundation; Ono Pharmaceutical Foundation for Oncology, Immunology and Neurology; Bioinformatics Initiative of Graduate School of Medicine, Institute for Open and Transdisciplinary Research Initiatives, Center for Infectious Disease Education and Research (CiDER) and Center for Advanced Modality and DDS (CAMaD) at The University of Osaka; and the RIKEN TRIP initiative (AGIS).

Author information

A full list of members and their affiliations appears in the Supplementary Information

Authors and Affiliations

Department of Genome Informatics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
Shinichi Namba, Kyuto Sonehara, Takafumi Ojima, Go Sato, Yoshihiko Tomofuji, Yosuke Ogawa & Yukinori Okada
Department of Statistical Genetics, Graduate School of Medicine, The University of Osaka, Suita, Japan
Shinichi Namba, Kyuto Sonehara, Takafumi Ojima, Ryuya Edahiro, Go Sato, Yoshihiko Tomofuji, Hiroyuki Ueda, Kenichi Yamamoto, Ken Suzuki & Yukinori Okada
Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
Shinichi Namba, Kyuto Sonehara, Takafumi Ojima, Ryuya Edahiro, Go Sato, Yoshihiko Tomofuji & Yukinori Okada
Division of Cancer Epidemiology and Prevention, Aichi Cancer Center, Nagoya, Japan
Yuriko N. Koyanagi & Keitaro Matsuo
Faculty of Medicine, The University of Tokyo, Tokyo, Japan
Takezo Kikuchi
Graduate School of Medicine, Tohoku University, Sendai, Japan
Takafumi Ojima
Department of Respiratory Medicine and Clinical Immunology, Graduate School of Medicine, The University of Osaka, Suita, Japan
Ryuya Edahiro
Department of Gastroenterological Surgery, Graduate School of Medicine, The University of Osaka, Suita, Japan
Go Sato
Division of Epidemiology, National Cancer Center Institute for Cancer Control, Tokyo, Japan
Taiki Yamaji, Shiori Nakano & Motoki Iwasaki
Department of Metabolic Medicine, Graduate School of Medicine, The University of Osaka, Suita, Japan
Hiroyuki Ueda
Laboratory of Children’s Health and Genetics, Division of Health Sciences, Graduate School of Medicine, The University of Osaka, Suita, Japan
Kenichi Yamamoto
Department of Pediatrics, Graduate School of Medicine, The University of Osaka, Suita, Japan
Kenichi Yamamoto
Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), The University of Osaka, Suita, Japan
Kenichi Yamamoto & Yukinori Okada
Department of Pediatrics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
Yosuke Ogawa
Department of Diabetes and Metabolic Diseases, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
Ken Suzuki & Toshimasa Yamauchi
Life Science Data Research Center, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Japan
Akinori Kanai & Yutaka Suzuki
Tokushukai Group, Tokyo, Japan
Shinichi Higashiue & Shuzo Kobayashi
Department of Hematology, Nippon Medical School, Tokyo, Japan
Hiroki Yamaguchi & Yasunobu Nagata
Diagnostics and Therapeutics of Intractable Diseases, Intractable Disease Research Center, Graduate School of Medicine, Juntendo University, Tokyo, Japan
Yasushi Okazaki & Naoyuki Matsumoto
Iizuka Hospital, Fukuoka, Japan
Kenta Motomura & Hidenobu Koga
Department of Public Health, Aichi Medical University School of Medicine, Nagakute, Japan
Asahi Hishida
Department of General Internal Medicine, Kyushu University Hospital, Fukuoka, Japan
Hiroaki Ikezaki
Department of Preventive Medicine, Faculty of Medicine, Saga University, Saga, Japan
Megumi Hara
Department of Preventive Medicine, Nagoya University Graduate School of Medicine, Nagoya, Japan
Mako Nagayoshi
Division of Cancer Information and Control, Aichi Cancer Center, Nagoya, Japan
Isao Oze
Department of Lipidomics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
Yoshiya Oda
Division of Cohort Research, National Cancer Center Institute for Cancer Control, Tokyo, Japan
Motoki Iwasaki & Norie Sawada
Division of Cancer Epidemiology, Nagoya University Graduate School of Medicine, Nagoya, Japan
Keitaro Matsuo
Division of Molecular Pathology, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
Takayuki Morisaki
Laboratory of Clinical Genome Sequencing, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
Takayuki Morisaki & Koichi Matsuda
Toranomon Hospital, Tokyo, Japan
Takashi Kadowaki
Laboratory of Genome Technology, Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
Koichi Matsuda
Premium Research Institute for Human Metaverse Medicine (WPI-PRIMe), The University of Osaka, Suita, Japan
Yukinori Okada

Consortia

the BioBank Japan Project

Koichi Matsuda, Takayuki Morisaki, Yukinori Okada, Shinichi Higashiue, Shuzo Kobayashi, Hiroki Yamaguchi, Yasunobu Nagata, Yasushi Okazaki, Naoyuki Matsumoto, Kenta Motomura & Hidenobu Koga

Contributions

S. Namba and Y. Okada conceptualized the work, administered the project and acquired funding. S. Namba designed the methodology and performed data validation and visualization. K. Sonehara, Y.N.K. and T. Yamaji curated the data. T. Kikuchi and S. Namba performed the formal analysis. T.O., R.E., G.S., Y.T., H.U., K.Y., Y. Ogawa, S. Namba, K. Sonehara and K. Suzuki conducted the investigations. A.K., S.H., S.K., H.Y., Y.N., Y. Okazaki, N.M., K. Motomura, H.K., A.H., H.I., M.H., M.N., I.O., S. Nakano, the BBJ, Y. Oda, Y.S., Y. Okada, M.I., N.S., K. Matsuo, T. Yamauchi, T. Kadowaki, K. Matsuda, Y.N.K., T. Yamaji and T.M. provided resources. M.I., N.S., K. Matsuo, T. Yamauchi, T. Kadowaki, Y. Okada and K. Matsuda supervised the project. S. Namba wrote the original draft; S. Namba and Y. Okada reviewed and edited the manuscript.

Corresponding authors

Correspondence to Shinichi Namba or Yukinori Okada.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Schematic overview of the G×E interaction analysis workflow.

a, Environment sets used to test G×E interactions. P-values were calculated for individual environmental sets and integrated on the Cauchy distribution to obtain final P-values. b, Workflow of G×E interaction tests. c, Workflow of defining the environments contributing to G×E interactions and their orders.
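
The integration step in panel a can be illustrated with a minimal sketch. It assumes the standard Cauchy combination test with equal weights; the caption does not state the weighting, and the function name `cauchy_combine` and the example P-values are hypothetical, not taken from the study.

```python
import math


def cauchy_combine(p_values, weights=None):
    """Combine P-values with the Cauchy combination test.

    Assumption: equal weights unless given. Each P-value is mapped to a
    standard Cauchy variate, the variates are averaged with the weights,
    and the average is mapped back to a P-value through the Cauchy tail.
    """
    if not p_values:
        raise ValueError("at least one P-value is required")
    if any(p < 0.0 or p > 1.0 for p in p_values):
        raise ValueError("P-values must lie in [0, 1]")
    if weights is None:
        weights = [1.0] * len(p_values)
    total = sum(weights)
    # Weighted mean of tan((0.5 - p) * pi), the Cauchy-transformed P-values.
    statistic = sum(
        (w / total) * math.tan((0.5 - p) * math.pi)
        for w, p in zip(weights, p_values)
    )
    # Upper tail probability of the standard Cauchy distribution.
    return 0.5 - math.atan(statistic) / math.pi


# Hypothetical P-values for three environment sets at one variant.
combined = cauchy_combine([1e-8, 0.5, 0.5])
print(f"combined P = {combined:.2e}")
```

With equal weights, one very small P-value dominates the combined result, which is the behaviour that makes this kind of combination attractive when only one environment set carries signal.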

Extended Data Fig. 2 Consensus clustering of environmental questionnaires.

a, Consensus clustering of questionnaires about dietary consumption and physical activity in UKB1. b, Delta area plot for area under the cumulative distribution function (CDF) curve when changing the number of clusters from two to ten. c and d, Same as a and b, respectively, for BBJ1.

Extended Data Fig. 3 LocusZoom plots of inter-categorical pleiotropic loci in UKB1.

a–j, LocusZoom plots of the significant G×E interactions at the ALPL (a and b), HK1 (c–f), and APOE (g–j) loci. k–m, Hg19 coordinates of protein-coding genes in individual loci. Recombination rates were calculated using the European in-sample linkage disequilibrium information from UKB1. P-values were estimated by two-sided linear regression.

Extended Data Fig. 4 Consensus clustering of dietary questionnaires in independent replication cohorts.

Consensus clustering of food frequency questionnaires in J-MICC/HERPACC (a), JPHC (b), and HPP (c). Delta area plots show the area under the cumulative distribution function (CDF) curve when changing the number of clusters from two to ten.

Extended Data Fig. 5 Replication within and across populations.

a–b, Replication statuses of G×E loci other than the ALDH2 locus (a) and the ALDH2 locus (b). Only interactions with at least one nominally significant replication (P_G×E < 0.05) are shown. EUR, the European population; EAS, the East Asian population; AFR, the African population; AMR, the American population. c–f, The eGFR distributions across rs77924615 genotypes (the UMOD locus) and age in UKB1 (c), UKB2 (d), BBJ1 (e), and BBJ2 (f), as a representative example of cross-population shared G×E interaction. Dots, 5,000 randomly sampled individuals per genotype; lines, regression lines using all participants. g and h, Minor allele frequencies (MAF) of the lead variants (g) and environmental distributions (h) in UKB1 and BBJ1. In g, cross-population shared loci are marked with crosses and labelled by nearest genes. Solid gray lines represent x = 0, y = 0, and y = x. i, Number of G×E loci detected with P_G×E less than 5.0×10⁻⁸ in randomly subsampled participants in UKB1 and BBJ1. Subsample sizes: 10, 50, 100, 150, and 200 thousand in UKB1; 10, 50, and 80 thousand in BBJ1. Full-cohort results are also shown (N_mean = 253,773 in UKB1 and 133,117 in BBJ1).

Extended Data Fig. 6 Urate measurements stratified by rs4148155 and questionnaires about “meat and cheese” consumption.

Distributions of urate measurements across the rs4148155 genotypes (the ABCG2 locus) and the questionnaires related to the “meat and cheese” consumption in UKB1. Dots represent 5,000 randomly sampled individuals per genotype; lines represent regression lines using all participants.

Extended Data Fig. 7 Environmental contributions at the PITX2 locus.

Log-likelihood improvement for G×E models of rs72900155 (the PITX2 locus) for arrhythmia in BBJ1. Environmental factors were added stepwise to the null model (Methods).

Extended Data Fig. 8 Phenome-wide G×E interactions for disease statuses.

P-values of G×E interactions for disease statuses in UKB1 (a) and BBJ1 (b), estimated by two-sided Firth logistic regression. Top G×E-contributing environments are shown for interactions with FDR less than 0.05. Horizontal line represents FDR = 0.05 in each cohort. COPD, chronic obstructive pulmonary disease; AKD, acute kidney disease; CKD, chronic kidney disease.

Extended Data Fig. 9 Additional genome-wide properties of G×E interactions.

a, PGS prediction accuracy in the same population. X-axis: log₂ ratio of prediction accuracy (R²) between “same-stratum” and “opposite-stratum” PGS (based on matching vs. non-matching environmental strata). Y-axis: P-values of R² differences, estimated by two-sided Hotelling’s t test. Dashed line: FDR threshold of 0.05. b, Same as a, for cross-population portability. c and d, Examples of PGS prediction accuracy within (c) and across (d) populations, for hematocrit (Ht) stratified by sex. e, MAGMA one-sided gene-set enrichment analyses of age-stratified genetic effects on pulse pressure (PP), using the gene ontologies for biological processes. Dashed lines: FDR = 0.05 (vertical/horizontal); y = x (diagonal). f and g, Single-cell associations in the monkey artery scRNA-seq data for the younger (f) and older (g) groups. Significant cell types marked with ellipses (false discovery rate <0.05; one-sided Monte Carlo sampling implemented in scDRS). h, Genetic correlations across PP, systolic blood pressure (SBP), and diastolic blood pressure (DBP), estimated separately for the younger and older groups. i, Genetic correlations for the same trait between age groups. For h and i, shown are the inverse-variance weighted meta-analyses of UKB1 and BBJ1 (N = 127,933, 128,369, and 128,530 in UKB1 and N = 73,480, 73,815, and 73,750 in BBJ1 for PP, SBP, and DBP, respectively, for each age stratum); data are presented as estimated values with 95% CI.
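
The inverse-variance weighted meta-analysis used for panels h and i can be sketched as follows. This is a fixed-effect version under the assumption of independent cohort estimates with known standard errors; the function name `ivw_meta` and the input values are hypothetical and do not reproduce any estimate from the study.

```python
import math


def ivw_meta(estimates, standard_errors):
    """Fixed-effect inverse-variance weighted meta-analysis.

    Returns the pooled estimate, its standard error and a 95% CI
    (normal approximation, 1.96 standard errors either side).
    """
    if len(estimates) != len(standard_errors) or not estimates:
        raise ValueError("need matching, non-empty estimates and standard errors")
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci


# Hypothetical genetic correlation estimates from two cohorts.
pooled, pooled_se, ci = ivw_meta([0.2, 0.4], [0.1, 0.1])
print(f"pooled = {pooled:.3f}, SE = {pooled_se:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Because the weights are the inverse variances, a cohort with a smaller standard error pulls the pooled value towards its own estimate.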

Extended Data Fig. 10 Metabolome-wide G×E interactions at lead variants of clinical G×E loci.

P-values of marginal effects and G×E interactions for metabolites in all 15 loci with genome-wide significant G×E interactions, estimated by two-sided linear regression and categorized into five patterns. Triangles denote P-values less than 1.0 × 10⁻⁵⁰.

Extended Data Fig. 11 Sexual dimorphic effects at the CETP locus on clinical lipid biomarkers.

a, Effect sizes of rs12720908 (the CETP locus) on clinical lipid biomarkers in UKB1, stratified by sex and age. b and c, Hazard ratios of clinical lipid biomarkers for all-cause (b) and CAD-caused mortality (c) in UKB1, stratified by sex and statin medication at baseline. Data are presented as estimated values with 95% CI; see Supplementary Tables 31, 32 for the sample sizes.

Extended Data Fig. 12 Sexual dimorphic effects for lipid metabolites at multiple loci.

a, Effect sizes of rs1065853 (the APOE locus), rs6065904 (the PLTP locus), and rs58542926 (the TM6SF2 locus) on lipid metabolites in UKB1, stratified by sex and age. L_LDL_PL_pct, phospholipids to total lipids in large LDL percentage; L_VLDL_FC_pct, free cholesterol to total lipids in large VLDL percentage; LDL_PL_pct, phospholipids to total lipids in LDL percentage; S_HDL_FC_pct, free cholesterol to total lipids in small HDL percentage; S_HDL_PL_pct, phospholipids to total lipids in small HDL percentage. b, Heatmap of marginal effects and G×E interactions of rs67328001 (the TNFAIP8 locus) on lipid metabolites, using P-values estimated by two-sided linear regression. The metabolite with the smallest P_marginal (“triglycerides in small VLDL” [S_VLDL_TG]) and the one with the smallest P_G×E (“concentration of very large HDL particles” [XL_HDL_P]) are shown in red. c and d, Effect sizes of rs67328001 (the TNFAIP8 locus) on the five metabolites with the smallest P_marginal (c) and P_G×E (d), stratified by sex and age. M_LDL_TG, triglycerides in medium LDL; HDL_TG_pct, triglycerides to total lipids in HDL percentage; S_LDL_TG, triglycerides in small LDL; XS_VLDL_TG, triglycerides in very small VLDL; XL_HDL_FC, free cholesterol in very large HDL; XL_HDL_L, total lipids in very large HDL; XL_HDL_C, cholesterol in very large HDL; HDL_size, average diameter for HDL particles. For a,c, and d, data are presented as estimated values with 95% CI; see Supplementary Table 31 for the sample sizes.

Extended Data Fig. 13 Genome-wide G×E interactions for metabolites.

a, Sexual dimorphic effects at four loci in UKB1. XXL_VLDL_TG_pct, the triglycerides to total lipids ratio in chylomicrons and extremely large VLDL; S_VLDL_CE_pct, the cholesteryl esters to total lipids ratio in small VLDL. b and c, Environment-specific effects in UKB1 for current-smoking at the LRRC4C locus (b) and never-drinking at the KLHL32 locus (c). L_VLDL_PL_pct, the phospholipids to total lipids ratio in large VLDL; S_VLDL_PL, phospholipids in small VLDL. d, Distribution of P_marginal and P_G×E at the SLC7A2 locus in UKB1, estimated by two-sided linear regression. e, Environment-stratified effect sizes at the SLC7A2 locus in UKB1. For the “tea and coffee” and “meat and cheese” environment clusters, participants were stratified into two equal-sized groups. Note that coffee consumption was multiplied by −1 to account for its negative correlation with tea consumption in UKB1; therefore, higher cluster values indicate more frequent tea consumption, whereas lower values indicate more frequent coffee consumption. For a–c and e, data are presented as estimated values with 95% CI; see Supplementary Table 34 for the sample sizes.

Supplementary information

Supplementary Information

Supplementary Notes 1–11, Supplementary Figs. 1–3, Supplementary Methods and Supplementary References.

Reporting Summary

Supplementary Tables

Supplementary Tables 1–34

Peer Review file

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Namba, S., Sonehara, K., Koyanagi, Y.N. et al. A cross-population compendium of gene–environment interactions. Nature 651, 688–697 (2026). https://doi.org/10.1038/s41586-025-10054-6

Received: 17 September 2024
Accepted: 15 December 2025
Published: 28 January 2026
Version of record: 28 January 2026
Issue date: 19 March 2026
DOI: https://doi.org/10.1038/s41586-025-10054-6
