Introduction

The GBA1 gene is located on chromosome 1 (1q21) and encodes the lysosomal enzyme β-glucocerebrosidase (GCase), which hydrolyzes glucose from glucosylceramide (GlcCer) and glucosylsphingosine (GlcSph)1,2. Pathogenic biallelic GBA1 variants induce structural modifications in GCase that reduce its activity and stability, driving intracellular GlcCer accumulation and causing Gaucher disease (GD), a recessively inherited lysosomal storage disorder3,4. GD is classified into three subtypes based on clinical progression and the presence or absence of neurological impairment. Type 1 is the most common, non-neuronopathic form, while types 2 and 3 involve progressive neurological damage and greater severity5,6,7,8. Approximately 300 GBA1 variants have been associated with GD9, and many are also observed in Parkinson’s disease (PD), including familial cases with autosomal dominant inheritance and sporadic cases10. The connection with PD was first noticed in large GD clinics11, with p.N409S and p.L483P being the most common PD-associated variants worldwide12. GBA1 variants have subsequently also been detected in PD patients with REM sleep behavior disorder (RBD) and dementia with Lewy bodies (DLB)13,14,15. Taken together, previous reports suggest that several GBA1 variants may give rise to neurological phenotypes.

In the past decades, the ubiquity of pleiotropy has been further substantiated, emphasizing that a single variant or gene can affect multiple categories of traits16,17. While previous studies on GBA1 have mostly focused on one neurological disease or on similar neurological phenotypes18,19,20,21, recent research reveals that GBA1-associated traits extend beyond neurological conditions and include blood urea nitrogen (BUN), uric acid, hemoglobin (Hb), and hematocrit (HCT)22. Nevertheless, investigations into the non-neurological effects of GBA1 remain limited, leaving substantial gaps in our understanding of the broader spectrum of GBA1-associated phenotypes. This highlights the need for systematic evaluations of gene-phenotype associations across a diverse phenotypic landscape. Elucidating these associations is essential for advancing genomic medicine, as it facilitates the development of novel diagnostic and therapeutic approaches23,24.

A phenome-wide association study (PheWAS) begins with a specific genetic variant and then analyzes across a curated collection of human phenotypes25, including both neurological and non-neurological phenotypes. This approach is particularly effective in exploring pleiotropy, revealing how a single gene can influence multiple traits. Here, we leveraged deep phenotyping data and whole-exome sequencing (WES) from the UKB26, using PheWAS to comprehensively characterize the phenotype spectrum of the GBA1 variants (Fig. 1). Initially, we performed variant-level analyses to assess the contribution of common variants to complex health-related continuous traits and diseases beyond neurological phenotypes. Subsequently, gene-based burden tests were used to evaluate the collective impact of rare variants on a broad range of phenotypes.

Fig. 1: Flow diagram outlining the process for analysis of GBA1 variants with multiple phenotypes.

UKB UK Biobank, WES whole-exome sequencing, MAF minor allele frequency, LoF loss of function variants, Mis missense variants, LoF+Mis a combination of loss of function and missense variants, Syn synonymous variants.

Results

Overview of the frequency of GBA1 variants and phenotypic data

We processed phenotype data from 502,357 samples in the UK Biobank. After rigorous quality control, we retained 324,542 unrelated Europeans. A total of 366 GBA1 variants were included in the classification (Supplementary Table 1). We found 181 individuals (0.06%) carrying at least one of 26 rare (MAF < 0.1%) LoF variants, 2081 individuals (0.64%) carrying at least one of 226 rare missense variants, 2262 individuals (0.70%) carrying at least one of 252 rare LoF+missense variants, and 1558 individuals (0.48%) carrying at least one of 101 rare synonymous variants. In addition, 263,717 individuals (81.26%) carried at least one of 13 common (MAF > 0.1%) variants in the GBA1 gene. Among the common variants, three were protein-coding variants (Supplementary Table 4), namely p.N409S (c.1226A > G, MAF = 1.55 × 10−3), p.T408M (c.1223C > T, MAF = 7.15 × 10−3), and p.E365K (c.1093G > A, MAF = 1.42 × 10−2), all of which were previously reported. The remaining 10 non-coding variants, comprising eight in intronic regions and two in UTR regions (Supplementary Fig. 2), were located within the regions targeted by whole-exome sequencing and passed stringent quality control.
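The carrier percentages above follow directly from the reported counts and the cohort size. The short Python sketch below simply redoes that arithmetic as a consistency check; the variable and function names are illustrative and are not part of the original analysis pipeline.

```python
# Cohort size and carrier counts as reported in the paragraph above.
N_PARTICIPANTS = 324_542

carrier_counts = {
    "rare LoF": 181,
    "rare missense": 2081,
    "rare LoF+missense": 2262,
    "rare synonymous": 1558,
    "common": 263_717,
}


def carrier_percentage(carriers: int, total: int = N_PARTICIPANTS) -> float:
    """Return the share of the cohort carrying at least one variant, in percent."""
    return 100.0 * carriers / total


# Percentages rounded to two decimals, as in the text.
percentages = {
    name: round(carrier_percentage(count), 2)
    for name, count in carrier_counts.items()
}

for name, pct in percentages.items():
    print(f"{name}: {pct:.2f}%")

# The variant categories also add up to the 366 variants in the classification:
# 26 LoF + 226 missense + 101 synonymous + 13 common.
total_variants = 26 + 226 + 101 + 13
```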

To investigate the pleiotropy of GBA1 variants on multiple systems of the human body and maximize diagnostic utility, 2150 binary traits (Supplementary Table 2), and 3314 continuous traits were classified into 21 ICD-10 chapters (Supplementary Table 3). The “Nervous system” category exhibited the highest percentage (70%) among continuous phenotypes, whereas the category of “Neoplasms” exhibited the highest percentage (16%) among binary phenotypes (Fig. 2a, b). Out of all binary traits, 732 were obtained through classification by PEACOK, while an additional 1420 PheCodes were generated as disease phenotypes through ICD-10 mapping.

Fig. 2: Phenotypic diversity of the sequenced UK Biobank cohort.

a The percentage and number of binary traits analyzed in the UKB cohort per ICD-10 disease chapter. Each marked number represents the number of phenotypes in the category. b The percentage and number of continuous traits analyzed in the UKB cohort per chapter. Each marked number represents the number of phenotypes in the category.

Variant-level associations with continuous traits

We identified common variants associated with 33 continuous traits (P < 1.16 × 10−6), spanning the genitourinary, hematological, health status factors, neurological, psychiatric, endocrine and metabolic, ophthalmic, and gastrointestinal categories (Fig. 3b). A substantial number (72%) of the significant correlations were mainly attributed to two non-coding variants: rs9628662 (AAF = 0.31), and rs3115534 (AAF = 0.87) (Fig. 3d, Supplementary Table 5). Among the identified continuous phenotypes, 32 had not previously been associated with the specific variants analyzed in this study, based on cross-referencing with public databases Open Targets and the GWAS Catalogue. According to records from the Open Targets database, the evidence suggested that all the observed associations were independent of known linkage disequilibrium (LD) effects, reinforcing the novelty of these findings. Notably, 17 were brain MRI-based phenotypes, followed by eight hematological biomarkers.

Fig. 3: Summary of variant-level PheWAS results.

a Associations between single variants and binary traits. For all associations that appear in the analysis, we mark the significant associations and the suggestive associations at the sub-threshold. The solid line represents the significant P value threshold (1.79 × 10−6). The dashed line represents the suggestive P value threshold (9.15 × 10−6). The y-axis is capped at −log10(P) = 16 and the associations with P < 10−16 were plotted on the y = 16 line. (n = 1). b Associations between single variants and continuous traits. For all associations that appear in the analysis, we mark only the significant association with the smallest P value per category. The solid line represents the significant P value threshold (1.16 × 10−6). The dashed line represents the suggestive P value threshold (9.15 × 10−6). The y-axis is capped at −log10(P) = 16 and the associations with P < 10−16 were plotted on the y = 16 line. (n = 4). c Illustration of all significant associations with binary traits at a variant level. d Effect sizes for significant associations with continuous traits at a variant level. PD Parkinson’s disease, HbA1C Glycated hemoglobin, GGT Gamma glutamyltransferase, GWC Gray-white contrast, LH left hemisphere, RH right hemisphere.

The variant rs9628662 demonstrated significant associations with 15 brain MRI-based phenotypes. The scanner transverse (Y) brain position showed the highest statistical significance (β = 0.09, P = 2.23 ×10−16). Furthermore, significant correlations were observed between this variant and decreased gray-white matter contrast (GWC) measures across 13 brain regions, such as gray-white contrast in the caudal middle frontal (right hemisphere) (β = −0.06, P = 1.81 ×10−8).

In addition, GBA1 variants showed noteworthy associations with various biomarkers. The rs3115534 variant demonstrated significant associations with eight distinct biomarkers, including four of the “hematological and immune” category (such as red blood cell (erythrocyte) count, β = 0.01, P = 9.79 ×10−15), two of the genitourinary category (such as urea, β = −0.03, P = 6.59 ×10−61), one of the “endocrine and metabolic” category (calcium, β = −0.01, P = 2.59 ×10−10) and one of the gastrointestinal category (gamma-glutamyltransferase, β = −0.01, P = 5.92 ×10−8). The association with gamma-glutamyltransferase was previously reported (Supplementary Table 9)27. The rs2075569 variant similarly showed associations with seven different biomarkers, including four of the “hematological and immune” category (such as reticulocyte count, β = 0.01, P = 5.60 ×10−7), two of the “endocrine and metabolic” category (such as HbA1c, β = −0.01, P = 1.16 ×10−10) and one of the genitourinary category (urea, β = −0.01, P = 1.53 ×10−7). The p.T408M variant showed associations with three different biomarkers of the “hematological and immune” category (such as hematocrit percentage, β = −0.07, P = 2.66 ×10−9), highlighting its potential pleiotropic effects (Supplementary Table 10). Furthermore, the p.E365K variant showed a suggestive association with hematocrit percentage at the sub-threshold (β = 0.04, P = 4.96 ×10−6).

Among the significant associations, both rs2075569 and rs3115534 were associated with lower calcium and urea levels, whereas the rs140335079 variant was associated with higher urea levels. The non-coding variant rs3115534 and the coding variant p.T408M had opposing directions of effect on hematocrit percentage, hemoglobin concentration, and red blood cell (erythrocyte) count. In summary, despite relatively modest effect sizes and differing directions of effect on a given phenotype, common variants within the GBA1 gene demonstrate a remarkable potential to affect multiple phenotypes, especially traits derived from brain MRI and blood biochemistry.

Variant-level associations with binary traits

We identified previously unreported associations between specific GBA1 variants and five binary traits (P < 1.79 ×10−6), established by comparison with Open Targets and the GWAS Catalogue; these spanned categories including neoplasms and ophthalmology (Fig. 3a). The rs9628662 variant was simultaneously associated with four distinct ophthalmic phenotypes (Fig. 3c, Supplementary Table 6), all of which were reported reasons for wearing glasses or contact lenses, such as myopia (OR = 0.91, P = 3.89 ×10−15) and hypermetropia (OR = 0.91, P = 1.87 ×10−8). Furthermore, we highlighted the association between the rs3115534 variant and a higher risk of benign neoplasm of other parts of the digestive system (OR = 1.03, P = 1.24 ×10−13). This finding warrants further investigation to elucidate the potential role of rs3115534 in the early detection of benign tumors, which may cause significant morbidity by compressing or obstructing digestive structures or by undergoing malignant transformation.

We specifically focused on the diseases previously associated with GBA1 variants, including PD, GD, DLB, and RBD. Because of the limited scope of diseases covered by PheCodes, GD, DLB, and RBD could not be included in our analysis. Instead, we expanded our examination to include other sleep disorders, such as parasomnias, and additional forms of dementia, such as dementia with cerebral degeneration. The strongest association with dementia with cerebral degenerations (p.N409S, P = 0.02) failed to reach statistical significance. Similarly, no significant associations were found between sleep disorders and common variants (e.g., rs9628662, OR = 1.14, P = 0.02). Although we did not detect any study-wide significant associations with PD, a suggestive association (P < 9.15 ×10−6) indicated that p.E365K was associated with higher odds of PD (OR = 1.57, P = 6.08 ×10−6). This was consistent with previous studies showing that p.E365K was associated with the risk of PD in total populations28,29. Additionally, another common variant, p.T408M, had weak evidence for association with PD (OR = 1.69, P = 1.21 × 10−4), suggesting a potential PD risk among carriers. These findings partially replicated known GBA1-related phenotypes and supported the reliability of our approach.

Gene burden analyses of rare variants

Carriers with qualifying rare variants generally exhibited a low prevalence of phenotypes across all associations in this study. Given the limited statistical power for detecting individual rare variant associations owing to their infrequency, we performed gene-level analysis to investigate the impact of aggregated rare variants on complex phenotypes. We utilized four categories of rare variants to compute a cumulative burden (all rare LoF variants, all rare missense variants, all rare LoF+missense variants, and all rare synonymous variants) and conducted the analysis.

PheWAS of GBA1 variants showed significant associations (Table 1, Supplementary Fig. 3a, Supplementary Table 7) with three neurological phenotypes (P < 5.81 ×10−6), including dementia with cerebral degenerations (burden LoF+Missense variants: OR = 6.16, P = 1.32 ×10−7), PD (burden LoF+Missense variants: OR = 2.38, P = 4.29 ×10−7) and “Siblings have suffered from PD” (burden LoF+Missense variants: OR = 3.05, P = 9.99 ×10−7), which means participants’ brothers or sisters have suffered from PD. LoF+Missense variants were strongly associated with all of the three phenotypes, whereas LoF variants were only associated with “Siblings have suffered from PD” (OR = 10.66, P = 8.57 ×10−7). We successfully replicated the known association with PD. In addition, these findings also suggested that the effects of LoF variants tend to be more severe than those of combined LoF+Missense variants for the same phenotype.

In contrast to the combination of two predictably deleterious GBA1 variants above, missense variants showed no significant associations. However, relationships between missense variants and two phenotypes reached suggestive significance at sub-threshold (P < 9.15 ×10−6), including Weighted-mean FA in tract medial lemniscus (right) (β = −0.34, P = 4.04 ×10−6) (Supplementary Fig. 3b, Supplementary Table 8), as well as PD (OR = 2.25, P = 8.92 ×10−6). As expected, synonymous variants showed no impact on disease risk, even at the sub-threshold, indicating that the confounding effects of population sub-structure were successfully minimized.

Table 1 Significant associations of rare variant burden with binary traits

Discussion

This study represents the largest phenome-wide analysis to date aimed at comprehensively assessing the contribution of the GBA1 gene to complex health-related continuous traits and human diseases within a prospective European cohort. We leveraged exome sequences from 324,542 participants of the UKB and analyzed records of 5464 phenotypes, expanding beyond previous research, which was limited to neurological diseases. Our analysis identified 41 phenotypes associated with GBA1 variants at the variant level and the gene level. While several of the identified phenotypes were previously reported, we found a substantial number of previously unreported neurological and non-neurological phenotypes.

This study revealed novel associations with MRI-derived neurological phenotypes, particularly GWC measures across a wide range of brain regions. GWC quantifies the degree of blurring observed at the interface between gray matter and white matter compartments in the brain and serves as an indicator of localized tissue integrity variations, myelin loss, augmented water presence in the white matter, or accumulation of iron30. Previous studies showed that lower GWC scores were associated with diminished cognitive abilities and an increased progression from mild cognitive impairment to dementia30,31. This observation provided a potential indicator for early detection and differentiation of neurodegenerative diseases and new avenues for research into the mechanisms by which changes in gray and white matter integrity affect cognitive function.

This study revealed new non-neurological associations, including biomarkers spanning multiple categories, that remained unrecognized in studies primarily focused on neurological diseases. We found GBA1 variants strongly associated with lower urate levels, which were consistent with previous findings that showed lower levels of uric acid in the serum could serve as a biomarker for the progression of GBA1-PD32. Previous research reported that the rs11264345 variant in the GBA1 gene exhibited a significant association with blood urea nitrogen (BUN), uric acid, hemoglobin (Hb), and hematocrit (HCT)22. In addition, we identified an association between the GBA1 variants and elevated HbA1c levels, which had been implicated in heightened cognitive impairment and neuroaxonal damage in PD patients33,34. Previous studies suggested that GBA1 may drive these changes through interactions with CDC12335, a protein associated with shared genetic susceptibility to PD and Type 2 diabetes (T2D)36, potentially affecting pathways associated with HbA1c levels, a clinical marker for chronic hyperglycemia and glycemic control in T2D. Overall, this cross-systemic exploration of biomarker correlations offered a perspective that could significantly improve clinical management and therapeutic outcomes for patients by intervening in the progression of GBA1-related diseases through atypical biomarkers.

This study revealed that non-coding variants may variably influence the direction of phenotypic effects, emphasizing the crucial role of non-coding variants in shaping individual health outcomes and enhancing the comprehension of disease mechanisms. This aligned with prior findings that disease-related non-coding variants impact gene expression in diverse ways37. We demonstrated the relationships between two variants (rs2075569 and rs3115534) and lower calcium levels. In contrast, it was previously reported that GBA1-PD neurons exhibited increased calcium levels at normal conditions and amplified calcium release from the endoplasmic reticulum stores38. Therefore, the negative correlation between non-coding variants and calcium levels showed that non-coding variants may reflect the complexity of calcium regulatory mechanisms, indicating a previously unrecognized regulatory mechanism. In addition, we found the two non-coding variants associated with higher hematological parameters. Previous studies demonstrated that GCase deficiency leads to GlcCer accumulation in macrophages, particularly in the bone marrow, where Gaucher cell infiltration contributes to cytopenia and other clinical features39. The identification of the rs2075569 variant as a cis-expression Quantitative Trait Locus (cis-eQTL) for Thrombospondin (THBS3)40,41, an adhesive glycoprotein involved in cell-cell and cell-matrix interactions42, underscored the potential role of non-coding variants in regulating key hematological traits. Furthermore, the rs3115534 acting as a cis-eQTL influenced expression of the GBA1 gene in small intestine terminal ileum tissue from GTEx eQTL data43, potentially influencing local metabolism and lysosomal function, which may contribute to the development of benign digestive tumors. The association was further supported by evidence from GD patients, where GBA1 dysfunction was associated with an increased risk of gastrointestinal cancers, including colon cancer44. 
UKBEC data suggested that the rs9628662 variant is a cis-eQTL of nearby genes (including YY1AP1) in the medulla, which contains both white matter tracts and clusters of gray matter45,46,47. Altered gene expression in this tissue may disrupt the structure and function of gray and white matter, potentially contributing to the GWC changes. In summary, future studies that include additional types of non-coding variants can help expand the understanding of mechanisms of biochemistry and pathophysiology underlying GBA1-related diseases.

The pioneering use of PheWAS to analyze GBA1 variants was a major strength48. This PheWAS efficiently provided a valuable resource of robust associations, revealing novel disease mechanisms and expanding the phenotypic spectrum of a known gene. Given the rarity of the variants when analyzing thousands of traits in a biobank-scale population, we applied Firth’s logistic regression to control false-positive associations49. However, our analysis had several limitations. First, the number of European-ancestry participants carrying rare damaging variants was limited, which left the analysis underpowered to detect significant effects of missense variants, although we could still identify significant relationships involving both LoF and LoF+Missense variants. Second, our findings primarily apply to individuals of European ancestry, and their generalizability to other populations requires further investigation. Third, although a single diagnosis (ICD-10 codes) per individual may not fully reflect diagnostic variability, the accuracy of neurodegenerative disease diagnoses (such as all-cause dementia and PD) based on inpatient and death registry data in the UKB has been validated50. Future studies with more hospital diagnoses and follow-up encounters could improve diagnostic accuracy. Moreover, experimental validation is needed to confirm the discovered associations.

In conclusion, we finally identified 41 neurological and non-neurological phenotypes associated with GBA1 variants, 39 of which have not been previously reported. These findings extend the impact of GBA1 beyond neurological diseases, reveal the significance of non-coding variants, and propose therapeutic targets through biomarker discovery. Large cohorts of diverse populations will help refine the clinical spectrum of GBA1-related phenotypes and advance precision medicine for disease prevention and management.

Methods

UKB resource

The UK Biobank is a population-based cohort study conducted in the United Kingdom involving approximately 500,000 individuals aged 40-70 years at recruitment, who are followed up continuously. Ethical approval for the UK Biobank was granted by the North West Multi-centre Research Ethics Committee (MREC) as a Research Tissue Bank (RTB), with renewal in 2021 (reference 21/NW/0157). All studied participants provided informed consent. A curated subset of genetically unrelated White British participants (UKB Field ID 22006) was directly selected, comprising more than 80% of the UKB cohort26. After excluding samples with inconsistencies in genetically determined and self-reported sex, with abnormal sex chromosome aneuploidy, and those lacking whole-exome sequencing data at the time of this study, a total of 324,542 unrelated samples were finally included (Supplementary Fig. 1).

Variant selection

The Regeneron Genetic Center conducted exome sequencing on DNA samples collected from the UK Biobank. Multiplexed samples were sequenced with paired-end 75-bp reads on the Illumina NovaSeq 6000 platform, and all raw sequencing data were subsequently aligned to the full GRCh38 reference using the OQFE protocol. Further details on the sequencing and variant calling methods are available at https://biobank.ctsu.ox.ac.uk/showcase/label.cgi?ID=170.

We included variants that had variant missingness < 10%, quality score (QUAL) ≥ 30, genotype quality score (GQ) ≥ 20, and total depth (DP) ≥ 10. These criteria were applied using VCFtools51 and custom-designed software, “FilterVar”, each tailored to specific aspects of the analysis as required. For our analyses, variants were annotated using “ExtremeVar”, with the genome build hg38 and GBA1 transcript NM_000157 (chr1: 155234452-155244670). After quality control and annotation, GBA1 variants were categorized into common and rare variants based on a minor allele frequency (MAF) threshold of 0.1%. Thirteen common variants (MAF > 0.1%) annotated as exonic, intronic, UTR3, or UTR5 were included. Furthermore, rare variants (MAF < 0.1%) were classified into four categories: (1) Loss-of-function variants (LoF): variants predicted to result in a stop-gain, frameshift deletion, or splicing site alteration, which were directly classified as deleterious by “ExtremeVar”. (2) Missense variants. (3) LoF+Missense: a combination of loss-of-function and missense variants. (4) Synonymous variants.
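The filtering and classification rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Variant` record, its field names, and the consequence labels are hypothetical, GQ and DP are treated as per-variant summaries for simplicity, and the code does not reproduce the VCFtools, “FilterVar”, or “ExtremeVar” implementations. The text does not say how a MAF of exactly 0.1% is handled, so the sketch leaves that case unassigned.

```python
from dataclasses import dataclass
from typing import List, Optional

# Thresholds taken from the text above.
MAX_MISSINGNESS = 0.10  # variant missingness must be below 10%
MIN_QUAL = 30.0
MIN_GQ = 20.0
MIN_DP = 10.0
MAF_THRESHOLD = 0.001  # 0.1%

# Hypothetical consequence labels standing in for the annotation output.
LOF_CONSEQUENCES = {"stopgain", "frameshift_deletion", "splicing"}


@dataclass
class Variant:
    """Hypothetical per-variant summary record (not a real file format)."""
    variant_id: str
    missingness: float
    qual: float
    gq: float
    dp: float
    maf: float
    consequence: str


def passes_qc(v: Variant) -> bool:
    """Apply the four quality-control criteria listed in the text."""
    return (
        v.missingness < MAX_MISSINGNESS
        and v.qual >= MIN_QUAL
        and v.gq >= MIN_GQ
        and v.dp >= MIN_DP
    )


def frequency_class(v: Variant) -> Optional[str]:
    """Return 'common' or 'rare'; None when MAF equals the threshold exactly."""
    if v.maf > MAF_THRESHOLD:
        return "common"
    if v.maf < MAF_THRESHOLD:
        return "rare"
    return None


def rare_categories(v: Variant) -> List[str]:
    """Return the burden categories a rare variant contributes to."""
    if frequency_class(v) != "rare":
        return []
    if v.consequence in LOF_CONSEQUENCES:
        return ["LoF", "LoF+Missense"]
    if v.consequence == "missense":
        return ["Missense", "LoF+Missense"]
    if v.consequence == "synonymous":
        return ["Synonymous"]
    return []


# p.E365K uses the MAF reported in the Results; the other values are made up.
e365k = Variant("p.E365K", 0.01, 60.0, 40.0, 30.0, 1.42e-2, "missense")
rare_stop = Variant("hypothetical_stopgain", 0.02, 50.0, 35.0, 25.0, 2.0e-5, "stopgain")
low_depth = Variant("hypothetical_low_depth", 0.02, 50.0, 35.0, 5.0, 2.0e-5, "missense")
```

With these example records, `e365k` is classed as common, `rare_stop` contributes to both the LoF and LoF+Missense categories, and `low_depth` is removed by the depth criterion.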

Phenotypes

The main phenotypic categories included in this study were binary traits and continuous traits. These phenotypic data were sourced from the April 2023 data release as part of the UKB application 99093. All phenotypic data were extracted from phenotypes available through the UK Biobank Data Showcase, and then except for ICD-10 data, processed using the Quanli Wang lab modified version of PHESANT package52, which can be viewed at https://github.com/astrazeneca-cgr-publications/PEACOK, to parse the phenotypes. The package excluded continuous phenotypes with less than 500 participants by default. The remaining phenotypes were adjusted to follow a normal distribution through the process of inverse normal rank transformation. Moreover, a minimum threshold of 500 participants was enforced for each binary trait by default. Based on the Field ID paths, we excluded certain binary traits (1) sociodemographics, such as education and employment; (2) dietary information, such as 24-h recall and food preferences; (3) hospital inpatient administrative data, such as admission and discharge dates, and other related aspects; (4) blood count processing details, such as reticulocyte count acquisition methods; (5) ICD-9 diagnosis codes; (6) OPCS traits and other binary traits related to treatment methods. Moreover, in addition to the 732 binary traits generated by the PEACOK R package, we extracted ICD-10 diagnosis codes (Field ID 41270) from various health-related records within the UKB. All ICD-10 codes were systematically mapped to PheCodes53. Participants were considered eligible cases for specific PheCodes if they had at least one recorded diagnosis of the ICD-10 code, while controls consisted of participants who never had a positive diagnosis for any PheCodes within the corresponding ICD-10 root chapter. We restricted the analysis to PheCodes with at least 20 cases, resulting in a final consideration of 1420 PheCodes. 
In summary, a total of 2150 binary phenotypes and 3314 continuous phenotypes were included in the study.
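For readers unfamiliar with the inverse normal rank transformation applied to the continuous traits, the following Python sketch shows one common form of it. It is not the code used by the PEACOK/PHESANT package: the rank offset of 3/8 (the Blom variant) and the averaging of tied ranks are assumptions made for illustration, and missing values are assumed to have been removed beforehand.

```python
from statistics import NormalDist
from typing import List, Sequence


def inverse_normal_rank_transform(
    values: Sequence[float], offset: float = 0.375
) -> List[float]:
    """Map values to standard-normal quantiles of their ranks.

    `offset` is the rank offset c in (rank - c) / (n - 2c + 1); 3/8 is an
    assumed default and may differ from the package used in the study.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n

    # Assign 1-based ranks, giving tied values the average of their ranks.
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    standard_normal = NormalDist()
    return [
        standard_normal.inv_cdf((rank - offset) / (n - 2 * offset + 1))
        for rank in ranks
    ]


# Skewed toy measurements (made-up values, not UK Biobank data).
raw = [0.2, 0.5, 0.9, 1.4, 12.0]
transformed = inverse_normal_rank_transform(raw)
```

The transformed values keep the original ordering but are symmetric around zero, so the extreme value 12.0 no longer dominates a linear regression.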

Phenome‑wide association study (PheWAS)

We performed variant-level PheWAS analyses between all 5464 phenotypes and 13 GBA1 common variants. For binary traits, each variant-level analysis was carried out independently by Firth’s logistic regression, correcting for age, age squared, sex, and the first 10 principal components. For continuous traits, we used linear regression correcting for the same covariates. We used two Bonferroni-adjusted thresholds to define significant variant-level associations separately for continuous traits (P = 0.05/13/3314 ≈ 1.16 ×10−6, Bonferroni correction for 13 variants in 3314 continuous traits) and binary traits (P = 0.05/13/2150 ≈ 1.79 ×10−6, Bonferroni correction for 13 variants in 2150 binary traits). In addition, we used a sub-threshold to define suggestive associations (Bonferroni-adjusted significance threshold of P = 0.05/5464 ≈ 9.15 ×10−6).
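The variant-level thresholds are plain Bonferroni divisions and can be recomputed as a check. The Python sketch below does this; the helper name `bonferroni_threshold` is illustrative and not part of the PheWAS R package.

```python
ALPHA = 0.05
N_COMMON_VARIANTS = 13
N_CONTINUOUS_TRAITS = 3314
N_BINARY_TRAITS = 2150


def bonferroni_threshold(alpha: float, *test_counts: int) -> float:
    """Divide alpha by the product of the given numbers of tests."""
    n_tests = 1
    for count in test_counts:
        n_tests *= count
    return alpha / n_tests


variant_continuous = bonferroni_threshold(ALPHA, N_COMMON_VARIANTS, N_CONTINUOUS_TRAITS)
variant_binary = bonferroni_threshold(ALPHA, N_COMMON_VARIANTS, N_BINARY_TRAITS)
suggestive = bonferroni_threshold(ALPHA, N_CONTINUOUS_TRAITS + N_BINARY_TRAITS)

print(f"continuous traits: {variant_continuous:.2e}")  # 1.16e-06
print(f"binary traits:     {variant_binary:.2e}")      # 1.79e-06
print(f"suggestive:        {suggestive:.2e}")          # 9.15e-06
```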

Rare variants were mainly tested in aggregate. We performed gene-level PheWAS analyses between 5464 phenotypes and burden scores in the GBA1 gene. For the analysis, we considered four categories of qualifying variants, encompassing all rare variants (MAF < 0.1%). First, burden scores were calculated for each participant based on the count of qualifying variants to represent the cumulative effects and enhance the statistical power. For binary traits, Firth’s logistic regression was employed, while linear regression was utilized for continuous traits. Covariates included age, age squared, sex, and the first 10 principal components. Throughout all analyses conducted using the PheWAS R package54, additive genotype models were employed. We used two Bonferroni-adjusted thresholds to define significant gene-based associations separately for continuous traits (P = 0.05/4/3314 ≈ 3.77 ×10−6, Bonferroni correction for four categories of rare variants in 3314 continuous traits) and binary traits (P = 0.05/4/2150 ≈ 5.81 ×10−6, Bonferroni correction for four categories of rare variants in 2150 binary traits). In addition, we used a sub-threshold to define suggestive associations (P = 0.05/5464 ≈ 9.15 ×10−6).
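As an illustration of the burden-score step, the Python sketch below counts qualifying alleles per participant and recomputes the two gene-level thresholds. It rests on assumptions: the nested-dictionary genotype layout and all identifiers are hypothetical, and the score is taken here as the sum of alternate-allele counts across qualifying variants, which is one reading of "count of qualifying variants" under an additive model. The regression itself (Firth's logistic or linear) is not shown.

```python
from typing import Dict, Iterable


def burden_scores(
    genotypes: Dict[str, Dict[str, int]],
    qualifying_variants: Iterable[str],
) -> Dict[str, int]:
    """Sum alternate-allele counts (0, 1 or 2) over qualifying variants.

    `genotypes` maps participant ID -> {variant ID -> alternate-allele count}.
    Variants absent from a participant's mapping are treated as count 0.
    """
    qualifying = set(qualifying_variants)
    return {
        participant: sum(
            count for variant, count in calls.items() if variant in qualifying
        )
        for participant, calls in genotypes.items()
    }


# Toy genotypes with made-up identifiers (not UK Biobank data).
example_genotypes = {
    "p1": {"v1": 1, "v2": 0, "v3": 1},
    "p2": {"v1": 0},
    "p3": {"v2": 2, "v4": 1},
}
example_scores = burden_scores(example_genotypes, ["v1", "v2"])

# Gene-level Bonferroni thresholds: four variant categories per trait type.
gene_continuous = 0.05 / (4 * 3314)
gene_binary = 0.05 / (4 * 2150)
print(example_scores)
print(f"{gene_continuous:.2e}, {gene_binary:.2e}")  # 3.77e-06, 5.81e-06
```

In this toy example only `v1` and `v2` qualify, so the carrier of `v3` and `v4` alleles gains no score from them.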