Introduction

Females are at a significantly higher risk of depression with higher chronicity1 and late-onset Alzheimer’s disease with faster progression2, relative to males. Estradiol, the predominant estrogen in the female body, is a modulator of brain structure and function, and has repeatedly been implicated in the development of depression and Alzheimer’s disease in females3.

The estrogen hypothesis postulates that estrogens exert a neuroprotective effect on the brain and cognitive functions4. Aligning with this hypothesis, a range of factors indicative of higher lifetime estradiol exposure, such as an older age at menopause and a longer reproductive span (age at menopause minus age at menarche), have been linked to lower risk for depression5 and dementia6. Further, these factors have been associated with markers of brain health using data from the UK Biobank (UKB)7,8,9, a prospective population-based cohort from the United Kingdom (see Sudlow et al.,10 for details on the UKB). For instance, a lower brain age gap has been linked to a higher number of childbirths7, an older age at natural menopause8, and a longer reproductive span9. Brain age gap is defined as the difference between predicted brain age derived from magnetic resonance imaging (MRI) measures and chronological age11. A higher brain age gap has been linked to a range of negative health outcomes, including Alzheimer’s disease and depression12,13. However, some observational studies contradict the estrogen hypothesis, linking a longer reproductive span, an earlier age at menarche, and a later age at menopause with smaller total brain volume14. Inconclusive findings on the relationship between estradiol and brain and mental health might partly be traced back to differences in the data used, the type of hormone exposure (endogenous vs. exogenous) assessed, and measurements or proxies used to ascertain estradiol exposure.
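The two derived quantities defined above reduce to simple differences. The following is a minimal sketch of those definitions only; the function names and the example values are hypothetical and are not taken from the study data.

```python
def brain_age_gap(predicted_brain_age: float, chronological_age: float) -> float:
    """Brain age gap as defined in the text: MRI-predicted brain age minus chronological age."""
    return predicted_brain_age - chronological_age


def reproductive_span(age_at_menopause: float, age_at_menarche: float) -> float:
    """Reproductive span as defined in the text: age at menopause minus age at menarche."""
    return age_at_menopause - age_at_menarche


# Hypothetical participant: a positive gap means the brain appears older than expected.
gap = brain_age_gap(predicted_brain_age=62.5, chronological_age=60.0)
span = reproductive_span(age_at_menopause=51, age_at_menarche=13)
```

Under these definitions, a lower (more negative) brain age gap corresponds to a brain that appears younger than the chronological age.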

Further, the predominant use of observational study designs entails challenges, such as confounding and reverse causation, that might contribute to inconsistent findings across studies15. To overcome some of these limitations, recent studies have implemented Mendelian randomization, which uses genetic variants to test for causal exposure-outcome relationships15. The idea behind Mendelian randomization mirrors that of a randomised controlled trial; however, instead of randomising individuals into experimental groups, the randomization occurs through the assignment of genetic variants at conception15. Two-sample Mendelian randomization, where variant-exposure and variant-outcome associations are taken from different samples derived from the same underlying population, has become increasingly popular, as it makes use of summary statistics from genome-wide association studies (GWAS) and increases statistical power15.

Using this approach, previous studies have examined relationships between variables related to estradiol as exposures, including age at menarche, age at menopause, and estradiol levels, and outcomes related to brain and mental health, including Alzheimer’s disease and depression risk, largely reporting no causal associations or small effect sizes16,17,18,19,20,21,22. However, many previous studies have used small or mixed-sex samples, likely as a consequence of the lack of available large-scale, sex-stratified GWAS. As previous studies have highlighted different genetic effects regarding estradiol levels between females and males23, sex-specific effects might be overlooked and obtained estimates may be biased when using mixed-sex samples. Furthermore, many other factors related to lifetime estradiol exposure commonly used in observational studies, such as reproductive span and number of childbirths, as well as other indicators of brain health, including brain age gap, have not yet been examined using Mendelian randomization. Thus, a comprehensive approach investigating these factors in female-only samples has so far been lacking. Such an approach can improve our mechanistic understanding of sex differences in mental disorders and neurodegenerative diseases. Based on previous observational studies, we hypothesised that higher estradiol levels and factors related to higher lifetime estradiol exposure are associated with a lower brain age gap (i.e., younger brain age relative to chronological age), a lower risk for Alzheimer’s disease, and a lower risk for depression, thus showing protective effects of estradiol on brain health and mental health in females.

In this preregistered study, we conduct two-sample Mendelian randomization analyses in female-only samples and find no causal effects of lifetime estradiol exposure on the included proxies of brain and mental health. In addition to openly available sex-stratified GWAS, we use data from the UKB to run and annotate a female-specific GWAS on brain age gap as well as a female-specific GWAS on estradiol levels using a continuous and a binary approach (i.e., above or below detection limit). As exposure variables, we use genetically-predicted estradiol levels as well as factors related to lifetime estradiol exposure (i.e., reproductive span, age at menarche, age at menopause, and number of childbirths) and examine each relationship with genetically-predicted brain age gap, risk for Alzheimer’s disease, and risk for depression as outcome variables. Due to methodological challenges of measuring estradiol levels in pre- and postmenopausal females (e.g., effects of menstrual cycle and menopausal status), likely impacting GWAS results24, we conduct our analyses on continuous estradiol levels in combined as well as stratified samples of pre- and postmenopausal females from the UKB and use the binary approach for sensitivity analyses. To further ensure robustness of our results across samples, we replicate our findings in an independent sample of postmenopausal females and, as estradiol levels are known to fluctuate less in males, in a male-only sample. To this end, we run a male-specific GWAS on brain age gap using data from the UKB and a GWAS meta-analysis including data from the UKB, the Norwegian Mother and Child Cohort Study (MoBa; a population-based pregnancy cohort study conducted by the Norwegian Institute of Public Health; see Magnus et al.25 for details), and a previously conducted GWAS.
We run supplementary analyses using factors related to exogenous hormone use and health-related procedures that likely impact estradiol levels (i.e., oral contraceptive use, hormone replacement therapy (HRT) use, history of hysterectomy, and history of oophorectomy) as exposures. As further sensitivity analyses, we use risk for recurrent depression as an outcome to assess potential influences of disease burden, as well as a subsample of the depression GWAS to avoid sample overlap. We conduct multivariable Mendelian randomization, including body mass index (BMI) as an exposure variable, to account for potential pleiotropic effects, as BMI is known to be associated with estradiol exposure and brain and mental health16,19,23. For the Mendelian randomization analyses, we apply the inverse-variance weighted (IVW) method and robust estimation methods, including MR-Egger, weighted median, simple mode, and weighted mode, as well as MRlap for analyses with sample overlap.
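As an illustration of the main estimator named above, the following is a minimal sketch of a fixed-effect inverse-variance weighted (IVW) estimate computed from GWAS summary statistics. It assumes the common first-order weighting by the outcome standard errors; the function name and the toy values are hypothetical, and the sketch is not the analysis code of this study.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate: weighted regression of variant-outcome effects
    on variant-exposure effects through the origin, with weights 1 / se_outcome^2."""
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    beta = numerator / denominator
    se = math.sqrt(1.0 / denominator)
    return beta, se


# Toy summary statistics (hypothetical): each variant implies a ratio estimate of 0.5.
beta_ivw, se_ivw = ivw_estimate(
    beta_exposure=[0.1, 0.2, 0.3],
    beta_outcome=[0.05, 0.10, 0.15],
    se_outcome=[0.01, 0.01, 0.01],
)
```

Because every variant in the toy data has the same outcome-to-exposure ratio, the pooled estimate equals that ratio; with real instruments, the per-variant ratios differ and the weights determine how much each variant contributes.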

Results

GWAS of brain age gap in females

We identified two independent genomic loci, on chromosomes 5 and 17, that were significantly associated with brain age gap in females (see Fig. 1 for Manhattan plot and Supplementary Fig. 4 for QQ-plot). The loci included 3 lead SNPs and 4 independent SNPs and mapped onto 14 genes (Supplementary Table 5 and Supplementary Fig. 9). Of the 14 genes, 5 (MAPT, NSF, ARHGAP27, KANSL1, and PLEKHM1) showed higher average expression across brain tissue than the other genes. The other 9 genes largely showed lower average expression levels across tissue types (Supplementary Fig. 12). The gene-set enrichment analysis identified 1 positional gene set and 28 associated phenotypes (Supplementary Fig. 15). We identified no significant genomic loci in the GWAS conducted on brain age gap in the male-only sample (see Supplementary Note 4).

Fig. 1: Manhattan plot of brain age gap in females.

The blue/lower and red/upper lines indicate the suggestive (p < 5 × 10⁻⁵) and the genome-wide significance (p < 5 × 10⁻⁸; i.e., accounting for multiple comparisons) thresholds, respectively. The genome-wide association study was run using the standard additive model of linear associations. Source data are provided as a Source Data file.

GWAS of estradiol levels in females

For the GWAS conducted on continuous estradiol levels in a combined sample of pre- and postmenopausal females, we identified two independent genomic loci, on chromosomes 17 and 19, that were significantly associated with estradiol levels (see Fig. 2 for the Manhattan plot and Supplementary Fig. 5 for the QQ-plot). The loci included 3 lead SNPs and 4 independent SNPs and mapped onto 21 genes (Supplementary Table 6 and Supplementary Fig. 10). A cluster of genes (SHBG and SLC35G6) showed lower average expression across tissue types than the other genes (Supplementary Fig. 13). The gene-set enrichment analysis identified 1 positional gene set and 10 associated phenotypes (Supplementary Fig. 16). We identified nine independent genomic loci in the GWAS conducted on binary estradiol levels (see Supplementary Note 6 for details).

Fig. 2: Manhattan plot of estradiol levels (continuous approach).

The blue/lower and red/upper lines indicate the suggestive (p < 5 × 10⁻⁵) and the genome-wide significance (p < 5 × 10⁻⁸; i.e., accounting for multiple comparisons) thresholds, respectively. The genome-wide association study was run using the standard additive model of linear associations. Source data are provided as a Source Data file.

Mendelian randomization analyses

No significant causal relationships were found between estradiol levels, using the continuous measures as the exposure, and brain age gap, Alzheimer’s disease, and depression as outcomes (Table 1). The results were consistently non-significant across the combined, premenopausal-only, and postmenopausal-only samples from the UKB. Similarly, the replication analyses in the independent postmenopausal sample from the LIFE studies were non-significant (Table 1). Across the samples used for continuous estradiol levels, few instrumental variables were identified at the genome-wide significance threshold; therefore, a lower threshold (p < 5 × 10⁻⁶) was applied for selecting instrumental variables (see Supplementary Note 3 and Supplementary Table 4 for details). All results remained non-significant across the estimation methods, except for the analysis with estradiol levels in the combined pre- and postmenopausal sample as an exposure and Alzheimer’s disease as an outcome, which was significant using the MR-Egger method (b = 1.27, se = 0.48, p = .02; Supplementary Table 9).
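To illustrate how MR-Egger differs from the IVW method, the following is a minimal sketch of MR-Egger point estimates: a weighted regression of variant-outcome effects on variant-exposure effects that, unlike IVW, includes an intercept, which captures average directional pleiotropy. The sketch assumes weights of 1 / se_outcome² and orients each variant to a positive exposure effect; standard errors are omitted, and the function name and toy values are hypothetical rather than taken from the study.

```python
def mr_egger(beta_exposure, beta_outcome, se_outcome):
    """Return (intercept, slope) of a weighted regression with intercept.
    The slope is the causal estimate; the intercept reflects directional pleiotropy."""
    xs, ys = [], []
    for bx, by in zip(beta_exposure, beta_outcome):
        # Orient each variant so that its effect on the exposure is positive.
        sign = -1.0 if bx < 0 else 1.0
        xs.append(sign * bx)
        ys.append(sign * by)
    ws = [1.0 / se ** 2 for se in se_outcome]
    w_sum = sum(ws)
    x_mean = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_mean = sum(w * y for w, y in zip(ws, ys)) / w_sum
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Toy data (hypothetical): outcome effect = 0.02 + 0.5 * exposure effect after orientation.
egger_intercept, egger_slope = mr_egger(
    beta_exposure=[0.1, -0.2, 0.3, 0.4],
    beta_outcome=[0.07, -0.12, 0.17, 0.22],
    se_outcome=[0.01, 0.01, 0.01, 0.01],
)
```

Estimating the additional intercept costs precision, which is one reason MR-Egger results based on few instruments tend to be less stable than the corresponding IVW estimates.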

Table 1 Results of Univariable Mendelian Randomization Analyses with Continuous Estradiol Levels

The sensitivity analyses using binary estradiol levels for the combined pre- and postmenopausal, premenopausal-only, and postmenopausal-only females from the UKB were not significant for any of the outcomes (Supplementary Table 10). For the male samples, no significant associations were found between estradiol levels and brain age gap, Alzheimer’s disease, and depression (Supplementary Table 10). The results remained non-significant across the estimation methods.

No significant associations were found between the exposures reproductive span, age at menopause, and number of childbirths and the outcomes brain age gap, Alzheimer’s disease, and depression (Table 2). Consistently, no significant effects were found in the supplementary analyses using oral contraceptive use, HRT use, history of hysterectomy, and history of oophorectomy as exposure variables (Supplementary Note 7 and Supplementary Table 8). All results remained robust across the estimation methods. Similarly, age at natural (non-surgical) menopause as an exposure was not significant in the sensitivity analysis for Alzheimer’s disease as an outcome (Supplementary Table 11). A significant association was found for age at menarche with depression as an outcome (b = −0.09, se = 0.04, p = .04); however, this result did not remain significant after adjusting for multiple comparisons (pFDR = .76) and was not significant when using any of the other estimation methods (Fig. 3). For age at menarche as an exposure, no significant relationships were found with brain age gap or Alzheimer’s disease as outcomes (Table 2). Results were robust across the estimation methods for Alzheimer’s disease and largely for brain age gap, except for the weighted median, which was significant (b = −0.44, se = 0.21, p = .04; Supplementary Table 9).
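The following is a minimal sketch of how a nominally significant p-value can lose significance after false discovery rate (FDR) adjustment. It assumes the Benjamini-Hochberg procedure, which is an assumption here because this section does not state which FDR procedure was applied; the p-values are hypothetical and do not reproduce the reported pFDR values, which depend on the full set of tests in the study.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical family of four tests.
p_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

In this toy family, the raw p-value of 0.04 is adjusted to roughly 0.053 and no longer passes a 0.05 threshold; the larger the family of tests, the stronger this effect becomes.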

Fig. 3: Effect of instrumental variables on age at menarche (exposure) and depression (outcome).

Each point represents one single nucleotide polymorphism (SNP) included as an instrumental variable. Data are presented as the beta value and the confidence interval (+/− standard error) of the effect of a SNP on the exposure on the x-axis and the effect of a SNP on the outcome on the y-axis. The inverse-variance weighted (IVW), weighted median, and simple mode methods are plotted together for visualisation purposes, due to their overlapping regression lines (see Table 2 and Supplementary Table 9 for details). The IVW estimate was used as the main estimation method (one-sided test) and was significant only before adjusting for multiple comparisons. Age at menarche50: N = 182,416 females. Depression51: N = 329,476 females (44,610 cases with diagnosed depression and 284,866 healthy controls). See Table 3 for details on the included samples. MR Mendelian randomization. Source data are provided as a Source Data file.

Table 2 Results of Univariable Mendelian Randomization Analyses with Factors Related to Lifetime Estradiol Exposure

When using MRlap to correct for potential biases resulting from sample overlap, the results of the analyses with overlapping exposure and outcome samples (where both samples included data from the UKB) remained robust, with no significant differences between the IVW estimates and the corrected estimates (Supplementary Table 9). Further, the sensitivity analyses using the depression subsample excluding UKB for analyses with sample overlap replicated the findings of the main analyses, with no significant associations across analyses (Supplementary Table 12). Similarly, the sensitivity analyses using recurrent depression as an outcome replicated the main analyses. Only the association with age at menarche as an exposure was significant; however, this association did not remain significant after adjusting for multiple comparisons (b = −0.14, se = 0.05, p = .01, pFDR = .30) and was not robust across the estimation methods.

There was significant heterogeneity in the analyses of age at menarche as an exposure and depression as an outcome (p < .001); number of childbirths as an exposure and depression (p = .02) and brain age gap (p < .001) as outcomes; continuous estradiol levels measured in the postmenopausal sample from the UKB as an exposure and Alzheimer’s disease as an outcome (p = .02); and reproductive span as an exposure and brain age gap as an outcome (p = .04), pointing to potential pleiotropic effects. Further, there was significant heterogeneity in the analyses of binary estradiol levels in the combined pre- and postmenopausal sample (p = .04), reproductive span (p < .01), and number of childbirths (p = .02) as exposures with recurrent depression as an outcome, and of estradiol levels in males as an exposure with depression as an outcome (p < .01; Supplementary Table 14).
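Heterogeneity across instruments is commonly quantified with Cochran's Q statistic. The following is a minimal sketch, assuming Q is computed around the fixed-effect IVW estimate with first-order weights; this section does not specify the exact heterogeneity test that was used, and the function name and toy values are hypothetical. The p-value, which would be obtained from a chi-square distribution with the returned degrees of freedom, is omitted.

```python
def cochran_q(beta_exposure, beta_outcome, se_outcome):
    """Return (Q, degrees of freedom, I^2) for per-variant ratio estimates
    around the fixed-effect IVW estimate."""
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    # First-order weights: inverse variance of each ratio estimate.
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    beta_ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = sum(w * (r - beta_ivw) ** 2 for w, r in zip(weights, ratios))
    df = len(ratios) - 1
    # I^2: share of variability beyond chance, truncated at zero.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i_squared


# Two hypothetical variants with discordant ratio estimates (0 and 1).
q_stat, q_df, i2 = cochran_q(
    beta_exposure=[0.1, 0.1],
    beta_outcome=[0.0, 0.1],
    se_outcome=[0.01, 0.01],
)
```

A Q value far above its degrees of freedom, as in this toy case, indicates that the variants do not agree on a single causal estimate, which is compatible with pleiotropy but does not prove it.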

In multivariable analyses, genetically-predicted BMI and continuous estradiol levels in the combined pre- and postmenopausal sample were not significantly associated with brain age gap or Alzheimer’s disease as outcomes (Supplementary Table 13). Results were consistent when using binary estradiol levels as an exposure and when using multivariable MR-Egger. For depression as an outcome, continuous estradiol levels in the combined pre- and postmenopausal sample were not significant, but BMI was significant (b = 0.14, se = 0.05, p = .01, pFDR = .30) in multivariable analyses. This was not robust when using multivariable MR-Egger; however, the result was consistent when using binary estradiol levels as an exposure (BMI: b = 0.15, se = 0.05, p = .003, pFDR = .27). Furthermore, when including BMI, the association between age at menarche and depression, as well as the association between age at menarche and recurrent depression, were not significant. This remained consistent when using multivariable MR-Egger. No significant heterogeneity was found in any of the analyses, except for the analysis using age at menarche and BMI as exposures and depression as an outcome (Supplementary Table 14).
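The following is a minimal sketch of multivariable IVW with two exposures: the variant-outcome effects are regressed jointly on the variant effects for both exposures, without an intercept and with weights of 1 / se_outcome², so that each coefficient is a direct effect conditional on the other exposure. The function name, the labelling of the exposures, and the toy values are hypothetical, and standard errors are omitted.

```python
def multivariable_ivw(beta_x1, beta_x2, beta_outcome, se_outcome):
    """Return (b1, b2): direct-effect estimates for two exposures, obtained by
    solving the 2x2 weighted normal equations of a regression through the origin."""
    ws = [1.0 / se ** 2 for se in se_outcome]
    s11 = sum(w * a * a for w, a in zip(ws, beta_x1))
    s12 = sum(w * a * b for w, a, b in zip(ws, beta_x1, beta_x2))
    s22 = sum(w * b * b for w, b in zip(ws, beta_x2))
    s1y = sum(w * a * y for w, a, y in zip(ws, beta_x1, beta_outcome))
    s2y = sum(w * b * y for w, b, y in zip(ws, beta_x2, beta_outcome))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("Exposure effects are collinear; direct effects are not identifiable.")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Toy data (hypothetical): outcome effect = 0.3 * x1 + 0.1 * x2, with x1 standing in
# for estradiol levels and x2 for BMI.
b_estradiol, b_bmi = multivariable_ivw(
    beta_x1=[0.1, 0.2, 0.3, 0.1],
    beta_x2=[0.2, 0.1, 0.0, 0.3],
    beta_outcome=[0.05, 0.07, 0.09, 0.06],
    se_outcome=[0.01, 0.01, 0.01, 0.01],
)
```

If an exposure's coefficient shrinks towards zero once the second exposure is added, this suggests that part of its univariable association operated through, or was shared with, the second exposure.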

Discussion

We took a comprehensive approach to investigating potential causal associations between genetically predicted estradiol levels as well as factors related to lifetime estradiol exposure and genetically predicted brain age gap, Alzheimer’s disease, and depression in females, by using Mendelian randomization. To this end, we conducted sex-stratified GWAS of brain age gap and female-specific GWAS of estradiol levels. The two-sample Mendelian randomization analyses revealed no significant causal associations between estradiol levels, reproductive span, age at menopause, and number of childbirths as exposures with brain age gap, Alzheimer’s disease, and depression as outcomes. The results on estradiol levels were consistent across samples when using binary estradiol levels and across robust estimation methods. Using age at menarche as an exposure, a significant causal association with depression as an outcome was found, linking a younger age at menarche with a higher risk for depression. However, this result was not robust when adjusting for multiple comparisons, across estimation methods, or when including BMI in multivariable Mendelian randomization analyses.

In this study, we ran and annotated a sex-stratified GWAS on brain age gap, using grey matter and white matter MRI features from the UKB. We identified 2 significant loci on chromosomes 5 and 17 in the female-only sample. The significant locus on chromosome 17 has been reported by previous studies that conducted GWAS on brain age gap in mixed-sex samples26,27,28,29,30. This locus includes the MAPT gene, which encodes the tau protein involved in several neurodegenerative diseases, including Alzheimer’s disease31. The significant locus on chromosome 5 has not been reported in previous GWAS on brain age gap in mixed-sex samples26,27,28,29,30. This region includes the TENM2 gene, which has been linked to various traits, including smoking initiation, BMI, educational attainment, depression, age at menarche, and Alzheimer’s disease in females (see NHGRI-EBI GWAS Catalogue32). Thus, this locus might be female-specific in its association with brain age gap. The absence of significant loci in the male-only sample suggests potential statistical power issues. Our findings emphasise the importance of conducting further, large-scale studies investigating sex-specific effects or gene-sex interactions in the genetic determinants of brain age gap.

Further, we ran and annotated a GWAS on estradiol levels in females from the UKB to avoid sample overlap. We identified two significant genomic loci, of which one (on chromosome 17) has been previously reported by a study on estradiol levels in females in the UKB24. This locus is near the sex-hormone binding globulin (SHBG) gene, which is closely linked to concentrations of SHBG, a glycoprotein that binds to estradiol and testosterone23,24,32. The independent significant SNPs identified in our GWAS have been linked to various related traits, including SHBG levels and testosterone levels (see Supplementary Table 15). In addition, the present study identified a significant locus on chromosome 19 that has been associated with white blood cell count and height, but not estradiol levels (see NHGRI-EBI GWAS Catalogue32). Differences between the present study and previous studies may partly be traced back to differences in covariates and sample selection. The GWAS of the present study might be limited by the included covariates. For instance, while we controlled for menopausal status as well as history of oral contraceptive use, HRT use, hysterectomy, and oophorectomy, menstrual cycle phase was not considered due to the inclusion of pre- and postmenopausal females, potentially influencing the findings.

As a sensitivity analysis, we conducted a GWAS on binary estradiol levels (below or above detection limit), which had a substantially larger sample size. Similar to Haas et al.24, we identified different significant loci in the binary estradiol GWAS, suggesting that different traits might be measured using these different approaches. Issues concerning estradiol measurements in the UKB have been previously discussed, including a potential bias towards the detection of loci associated with menopause due to the age of the participants and the substantial number of measurements below the detection limit23,24,33. This may influence the findings of the present study, as the detected loci of the GWAS, as well as the selected instruments for estradiol levels, may be associated with related traits. This raises the possibility of confounding from horizontal pleiotropy15, where the genetic instruments influence the outcome through pathways other than the exposure (estradiol), such as via menopausal status or SHBG levels. Many large-scale databases, including the UKB, are based on aging cohorts and are affected by survivor and healthy volunteer biases34. Therefore, we conducted sensitivity analyses across different samples and approaches. Nevertheless, the present study is limited by the available data on estradiol levels and further highlights the need for large-scale, precise measurements conducted in diverse samples and age groups under consideration of female-specific variables – a data gap which has been repeatedly identified35.

We did not find support for our hypotheses, which were largely based on findings from observational studies5,6,7,9. Some similar patterns have been found in previous Mendelian randomization studies, for example, showing no causal associations between estradiol levels and depression20, age at menopause and depression22, and age at menopause and age at menarche and risk for Alzheimer’s disease18. Taken together with the results of these previous Mendelian randomization studies, our findings suggest that estradiol exposure may not directly influence brain and mental health, thus highlighting the need for careful consideration of possible confounding factors and reverse causation when interpreting observational studies. However, it must be mentioned that some of the additionally analysed exposure variables, such as HRT use and oral contraceptive use, might not have a substantial genetic component, potentially leading to null findings in the present study. Furthermore, non-linear and age-dependent effects, as have been suggested for associations of the number of childbirths and HRT use with brain health7,36, should be investigated. For instance, previous studies highlight the importance of timing, duration, and formulation of HRT use36. These aspects were not captured by the variable used for HRT use in the present study, possibly resulting in null findings. Further, the GWAS of the history of hysterectomy did not take important influencing factors into account, such as the medical reason for the procedure, the timing of the procedure (i.e., before or after menopause), and whether an oophorectomy also took place. These limitations can be traced back to the use of GWAS from consortia, which have advanced research through publicly available, large-scale datasets but are often designed in a standardised manner across sexes and may not consider potentially relevant covariates for specific variables or samples.
Future research is needed to disentangle genetic and environmental contributions to these variables and further assess their causal associations with brain and mental health.

Moreover, the generalisability of the present findings is limited due to the restriction of the analyses to White European samples, which is a result of the limited availability of diverse large-scale datasets. Females from different ethnic groups have been suggested to differ in their hormonal profiles37, thus the present findings should be replicated across diverse samples. Furthermore, it is important to note that while we set a strict threshold for selecting instrumental variables in the Mendelian randomization analyses to satisfy the relevance assumption, the exchangeability and exclusion restriction assumptions cannot be verified and may limit the validity of the results15. Although we conducted a variety of sensitivity analyses and reduced sample overlap, pleiotropic effects and confounding of the genetic variant-outcome association remain possible, for example, arising from population stratification or assortative mating15. This is especially important to note for the analyses that exhibited significant heterogeneity, such as the analysis with age at menarche as an exposure and depression as an outcome, or weak instrument bias, such as estradiol levels in the multivariable analyses including BMI as an additional exposure. Further, power issues might be present in the analyses with few instrumental variables, especially in the sensitivity analyses using methods with lower power, such as MR-Egger, and in the presence of weak instrument bias38.

Previous Mendelian randomization studies have reported a causal link between a younger age at menarche and an increased risk for depression in adolescents and adults16,17,19,21,22. In the multivariable Mendelian randomization analyses of the present study, the attenuation of the significant effect when including BMI suggests potential pleiotropy, whereby the genetic variants included in the analysis with age at menarche influence depression through BMI. BMI is known to be linked to an earlier onset of puberty and, in line with the present study, has previously been highlighted as a relevant confounder in the causal relationship between a younger age at menarche and depression16,19. Previous studies have also discussed possible age-dependent effects of age at menarche on depression16,19, warranting exploration to explain inconsistent findings.

The female lifespan is marked by fluctuations in estradiol levels throughout the menstrual cycle, the peripartum period, and the perimenopausal period, and these fluctuations have repeatedly been linked to brain and mental health, largely in observational studies3. For instance, brain age gap has been found to vary across the menstrual cycle, with higher estradiol levels linked to a lower brain age gap39, and estradiol fluctuations have been linked with changes in brain regions that are implicated in depression and Alzheimer’s disease40. The emergence of female-specific subtypes of depression, including premenstrual dysphoric disorder, peripartum depression, and perimenopausal depression, follows a similar trajectory to hormonal changes3. For instance, increased brain plasticity during the peripartum period has been linked with changes in estradiol levels and an increased vulnerability to mental disorders41. Furthermore, progesterone, as another prevalent sex steroid in females, has been found to have varying effects on mood, with suggested protective effects in the peripartum period and possible symptom-provoking effects in premenstrual dysphoric disorder42. Future studies should use the present framework to further explore the role of other sex steroids on brain and mental health in females. Together, these studies suggest that sex steroids differently modulate females’ brain and mental health across the lifespan, and that hormonal fluctuations rather than absolute hormone levels might be pivotal in this dynamic.

We did not find significant effects of estradiol levels on brain and mental health in males, who are known to experience substantially lower estradiol fluctuations, further supporting the absence of a constant effect of estradiol. If the effect of estradiol on brain health and mental health is phasic and not constant, this might have influenced the current results and contributed to the non-significant findings. Due to the methods used, the present study was unable to examine the potential effects of estradiol fluctuations or susceptible periods to estradiol exposure throughout the female lifespan. Therefore, we used the phrase lifetime estradiol exposure. However, the sample used for our main analysis on estradiol levels consisted predominantly of postmenopausal females, thus likely not capturing effects across the full lifespan. Ideally, to capture time-varying effects, multivariable Mendelian randomization should be applied using GWAS summary statistics across multiple time points43. For instance, estradiol levels should be assessed repeatedly across the female lifespan. However, this approach is difficult to implement due to the limited availability of the required data. Nevertheless, a previous study has shown age-specific effects of genetic associations of age at menopause and revealed different magnitudes of causal effects dependent on the age group44, emphasising that the genetic underpinnings of female reproductive factors vary across the lifespan. Thus, future studies are needed that assess estradiol levels in females across multiple timepoints – across the menstrual cycle, across pregnancy, or across the lifespan – and that examine these effects using combined sex-specific and causal approaches.

While the non-significant associations between estradiol levels and brain age gap, depression, and Alzheimer’s disease found in the present study suggest that estradiol might not have a simple direct influence on brain health and mental health in females, associations are complex and require investigation using a more parcellated lifespan approach. Future research should consider the causal effects of hormonal fluctuations and periods of increased susceptibility to sex steroids. Further, sex-specific genetic data is needed to disentangle the complex relationships underlying sex differences in brain health and mental health and to ultimately aid the advancement of individualised healthcare.

Methods

Samples and data sources

Available summary statistics

Summary statistics from GWAS, including only White European female samples, were used for all variables, except for sensitivity analyses conducted in White European males. Samples were selected to minimise overlap between the exposure and outcome samples, while allowing for the largest possible sample sizes. No statistical method was used to predetermine sample size. Main exposure variables of interest included estradiol levels and factors related to lifetime estradiol exposure (i.e., reproductive span, age at menarche, age at menopause, and number of childbirths). The main outcome variables of interest were brain age gap, Alzheimer’s disease, and depression.

To ensure robustness to effects of menopausal status previously shown to impact GWAS results24, several datasets were included for estradiol levels. The following samples from the UKB were included: a combined sample of pre- and postmenopausal females, a sample of only premenopausal females, and a sample of only postmenopausal females24. Main analyses in each sample were performed using GWAS of continuous estradiol levels. Sensitivity analyses were conducted in each sample using a binary approach, examining estradiol levels above and below the detection limit24 to increase sample sizes. Further, analyses on continuous estradiol levels were replicated in an independent sample of postmenopausal females from the LIFE-Adult and LIFE-Heart studies45 and a male-only sample from the UKB46.

For Alzheimer’s disease47 as an outcome, a larger GWAS for age at menopause was used that was not limited to natural menopause48 and a sensitivity analysis was conducted using age at natural menopause49. To avoid sample overlap, analyses using age at menarche50 and age at natural menopause49 as exposures and brain age gap and depression51 as outcomes were conducted using GWAS summary statistics from a smaller, independent sample. Supplementary analyses were conducted using factors related to exogenous hormone use and health-related procedures likely impacting estradiol levels, including oral contraceptive use, HRT use, history of hysterectomy, and history of oophorectomy52 as exposure variables. As the GWAS on depression51 included both single episode and recurrent depression, sensitivity analyses were conducted using recurrent depression53 as an outcome variable to check whether disease burden influenced the results found. Further, as some of the analyses using depression51 as an outcome had sample overlap, these analyses were replicated using a subsample of the depression GWAS that did not include the UKB sample53. BMI54 was included in multivariable sensitivity analyses, to account for potential pleiotropic effects16,17,21.

For estradiol levels in the independent replication sample45, estradiol levels in the male-only sample46, age at menarche48,50, age at (natural) menopause48,49, number of childbirths, oral contraceptive use, HRT use, history of hysterectomy, history of oophorectomy52, and BMI54 summary statistics were publicly available in data repositories or obtained from the IEU OpenGWAS project52. For estradiol levels in the premenopausal sample and in the postmenopausal sample24, reproductive span9, Alzheimer’s disease47 in the female-only and male-only samples, depression51, the depression subsample excluding UKB53 in the female-only and male-only samples, and recurrent depression53 summary statistics were requested and received from the respective authors. For depression in the male-only sample, a GWAS meta-analysis was run in line with the female-only GWAS51 including data from the UKB10, MoBa25, and a previously conducted GWAS, received from the authors upon request53 (see Supplementary Notes 1, 2, and 5 for details). See Table 3 and Supplementary Table 1 for a complete overview of the samples and datasets included in the present study.

Table 3 Included Datasets and Samples

UKB samples

As, to the best of our knowledge, no previously conducted sex-stratified GWAS on brain age gap existed, we ran separate GWAS on female- and male-only samples using data from the UKB. Further, to avoid sample overlap in the analyses using estradiol levels in the combined sample of pre- and postmenopausal females, we ran a female-specific GWAS using data from the UKB. The UKB complies with the Helsinki Declaration, with informed consent obtained from all participants. For the male-only GWAS on brain age gap, only genetic males (XY chromosomes) with White British ancestry were included. For the female-only GWAS on brain age gap and estradiol levels, only genetic females (XX chromosomes) with White British ancestry were included. For both GWAS on brain age gap, only participants with both T1-weighted and diffusion-weighted MRI data were included, and participants with ICD-10 diagnoses known to impact brain health were excluded (see Supplementary Note 1 for details on exclusions). For the GWAS on estradiol levels, participants with serum blood sample data from their first visit to the UKB assessment centre (2006 – 2010) were included. To avoid sample overlap, participants were excluded from the estradiol levels sample if they were included in the brain age gap sample. Further, participants who answered “prefer not to answer” or “do not know/not sure” on any of the covariates included in the GWAS on continuous estradiol levels were excluded (Supplementary Note 1).

Sample overlap

Sample overlap was avoided for all analyses using Alzheimer’s disease and recurrent depression as outcomes, as well as the sensitivity analyses in the depression53 subsample, excluding UKB. Sample overlap with brain age gap and depression as outcomes was avoided for age at menarche and age at menopause as exposures by using datasets that did not include the UKB sample. Further, sample overlap with brain age gap as an outcome was avoided for estradiol levels (combined pre- and postmenopausal samples) and reproductive span as exposures by excluding the MRI sample from the UKB. However, sample overlap could not be avoided for the analyses using continuous estradiol levels in the combined pre- and postmenopausal sample and reproductive span as exposures and depression as an outcome (maximum sample overlap: 10.53% for estradiol levels and 36.91% for reproductive span). Further, there was sample overlap for continuous estradiol levels in the premenopausal sample and in the postmenopausal sample with brain age gap and depression (maximum sample overlap: 43.25% for premenopausal females with brain age gap as an outcome; 10.03% for premenopausal females with depression as an outcome; 26.31% for postmenopausal females with brain age gap as an outcome; 1.14% for postmenopausal females with depression as an outcome). For the number of childbirths, there was a maximum sample overlap of 5.70% with brain age gap as an outcome and 82.19% with depression as an outcome. Further, sample overlap could not be avoided for the supplementary analyses using factors related to exogenous hormone use and health-related procedures and the sensitivity analyses using estradiol levels in males with brain age gap and depression as outcomes, as well as using binary estradiol levels with depression as an outcome (see Supplementary Table 2).

Brain age gap prediction

Grey matter and white matter MRI features were derived from T1-weighted and diffusion-weighted images, respectively, and used for multimodal brain age prediction (see Alfaro-Almagro et al.55 and Miller et al.56 for details on MRI data acquisition and protocols in the UKB). Automated surface-based morphometry and subcortical segmentation pipelines in FreeSurfer v5.357 were used to process raw T1-weighted images. The standard set of FreeSurfer-derived subcortical and cortical summary statistics57 was further supplemented by a fine-grained cortical parcellation scheme58 to extract cortical thickness, area, and volume for 180 regions of interest per hemisphere. Outliers based on Euler numbers more than 4 SD above the mean were excluded, to avoid poor-quality data likely arising from movement59. The FreeSurfer-derived data were controlled for scanner site and intracranial volume using linear models. Diffusion-weighted data were processed using an optimised diffusion pipeline60,61,62 and similarly controlled for scanner site using linear models. The diffusion-weighted data passed TBSS post-processing quality control through the YTTRIUM algorithm60,61,62. In total, 1,118 T1-weighted features and 912 diffusion-weighted features were included in the brain age prediction model.

Python (v3.7.4) was used for the brain age prediction. In line with previous studies on brain age gap in females7,8,9, brain age was computed using the XGBoost (eXtreme Gradient Boosting) regression model, based on a decision-tree ensemble algorithm63. Hyperparameters were tuned using a nested cross-validation with 5 inner folds for randomised search and 10 outer folds for validation of the model (see general model setup; see Supplementary Table 3 for model performance metrics). Brain age was estimated for each individual, and brain age gap was calculated by subtracting chronological age from the brain age prediction11. A final sample of N = 14,287 females was included in the brain age gap calculations and the subsequently conducted GWAS (age range: 45.13 – 81.83 years; mean = 63.56, SD = 7.27; see Supplementary Note 1 for details on the male brain age gap sample).

Estradiol levels

In the UKB, estradiol levels were measured using two-step competitive analysis on a Beckman Coulter UniCel DxI 800 with a minimum detection limit of 175 pmol/L (for details see UKB Data Field 30800). A majority of the participants in the UKB have estradiol levels below this limit, likely due to the advanced age of the cohort. The GWAS on estradiol levels using the continuous approach included females with measurements above the detection limit (N = 34,697; age range: 40.16 – 70.17 years; mean = 48.30, SD = 5.96). As the estradiol levels were positively skewed, we performed an inverse rank normalisation on the data. Following previous studies23,24, a second GWAS was conducted using a binary approach which classified all females (N = 207,119) as either below (0) or above (1) the detection limit to increase the sample size (Supplementary Note 1).
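The two phenotype codings described above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes a Blom-type offset (c = 3/8) for the rank-based inverse normal transform, assumes that measurements below the detection limit are stored as None, and treats a value exactly at the limit as "above"; none of these choices is stated in the text, and the function names are hypothetical.

```python
from statistics import NormalDist


def inverse_rank_normalise(values, c=3.0 / 8.0):
    """Rank-based inverse normal transform; tied values share their average rank.

    The offset c = 3/8 (Blom) is an assumption, not taken from the paper.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied values that starts at sorted position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    # Map each rank to a quantile strictly inside (0, 1), then to a z-score.
    return [std_normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]


def binarise_at_detection_limit(values, limit=175.0):
    """Code each measurement as 1 (at or above the limit) or 0 (below or missing).

    Representing below-limit measurements as None is an assumption.
    """
    return [1 if (v is not None and v >= limit) else 0 for v in values]
```

The transform preserves the ordering of the measurements and maps them onto standard normal quantiles, which removes the positive skew before the linear GWAS model is fitted.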

GWAS procedure

The UKB v3 imputed genetic data was used, which has been genotyped, extensively quality controlled, and imputed by the UKB genetics team (see Bycroft et al.64 for details). PLINK 2.065 was used to run the GWAS. We performed standard quality check procedures by excluding individuals with more than 10% missingness and setting the minor allele frequency (MAF) threshold at .005. We filtered out single nucleotide polymorphisms (SNPs) with more than 5% missingness as well as those deviating from Hardy-Weinberg equilibrium (HWE) at p < 1 × 10−9. GWAS were run using the standard additive model of linear associations for the continuous variables (i.e., brain age gap and continuous estradiol levels) and logistic associations for the binary variable (i.e., binary estradiol levels). All analyses were controlled for the age of the participants at the time of measurement and the first 20 principal components to account for population structure. The GWAS on continuous estradiol levels was further controlled for menopausal status (premenopausal/postmenopausal), history of oral contraceptive use (yes/no), history of HRT use (yes/no), history of bilateral oophorectomy (yes/no), and history of hysterectomy (yes/no), as these variables are known to influence estradiol levels.

Post-GWAS annotations

For annotations of the GWAS results, Functional Mapping and Annotation of Genome-Wide Association Studies (FUMA) was used, applying the SNP2GENE function66. GWAS results were clumped using the 1000 Genomes Project phase 3 European dataset (1KG/Phase3) linkage disequilibrium (LD) structure with a maximum p-value of lead SNPs < 5 × 10−8, a maximum p-value cutoff < 0.05, an R2 threshold ≥ .6 to define independent significant SNPs, an R2 threshold ≥ .1 to define lead SNPs, and 250 kilo-bases (kb) as the maximum distance to merge LD blocks into a locus66. Positional mapping was performed using a maximum distance of 10 kb. Independent significant SNPs for each genomic region were identified, along with lead SNPs defined as SNPs with the smallest p-value for a genomic region. For estradiol levels, independent significant SNPs were linked to traits as reported in the GWAS Catalogue32 to facilitate investigation of potential pleiotropy of these variables used as exposures in the Mendelian randomization analyses. Using the GENE2FUNC function implemented by FUMA, the expression levels of the mapped genes were investigated for each of the 54 GTEx v8 tissue types67 and hypergeometric tests were performed for gene-set enrichment analyses, testing whether genes are overrepresented in any of the pre-defined gene sets and their associations with various phenotypes66.

Two-sample Mendelian randomization analyses

Two-sample Mendelian randomization analyses were performed using R (v4.3.1). The main steps of the analyses are visualised in Fig. 4. All p-values for the IVW estimates were adjusted for multiple comparisons across all analyses (91 comparisons), including the supplementary, sensitivity, and follow-up analyses, using false discovery rate (FDR; 5%) correction68. The present study was preregistered on Open Science Framework (OSF; see Supplementary Note 8 for deviations from the preregistration).
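The multiple-comparison adjustment can be sketched as follows. The sketch assumes that the FDR correction refers to the Benjamini-Hochberg step-up procedure, which the text does not name explicitly. It is written in Python for illustration only (the analyses themselves were run in R), and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Under this assumption, an IVW estimate would be considered significant at an FDR of 5% if its adjusted p-value falls below 0.05, with all 91 p-values passed to the function at once.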

Fig. 4: Analysis steps and conducted Mendelian randomization analyses.

a Analysis steps. 1: Example Manhattan plot of a genome-wide association study (GWAS) for visualisation purposes. The blue/lower and red/upper lines indicate the suggestive (p < 5 × 10−5) and the genome-wide significance (p < 5 × 10−8; i.e., accounting for multiple comparisons) thresholds, respectively. The instrumental variables included in a Mendelian randomization analysis are assumed to be associated with the respective exposure variable (relevance assumption), not be associated with the outcome variable via confounding pathways (exchangeability assumption), and not affect the outcome variable in any way other than through the exposure variable (exclusion restriction assumption)15. *Requested from respective authors or openly available. 2: To satisfy the relevance assumption, the genome-wide significance threshold (p-value cut-off) was chosen for the selection of instrumental variables, when possible. **Relaxed in the case of too few available instrumental variables (see Supplementary Note 3 and Supplementary Table 4 for details). 3: Example data of exposure and outcome GWAS summary statistics for visualisation purposes. b refers to the beta-estimate of the GWAS for each single-nucleotide polymorphism (SNP). 4: Example plot of two-sample Mendelian randomization analyses for visualisation purposes. Each point represents one SNP included as an instrumental variable. Data are presented as the beta value and the confidence interval (+/− standard error) of the effect of a SNP on the exposure on the x-axis and the effect of a SNP on the outcome on the y-axis. Analyses were conducted using the inverse variance weighted method as a main estimation method (one-sided test). As the exchangeability and exclusion restriction assumptions are not verifiable15, sensitivity analyses were conducted to assess biases arising from potential violations. b Conducted analyses. ***Follow-up analysis with depression as an outcome. LD Linkage disequilibrium. HRT Hormone replacement therapy. BMI Body mass index. UKB UK Biobank.

Selection of instrumental variables

The instrumental variables included in a Mendelian randomization analysis are assumed to be associated with the respective exposure variable (relevance assumption), not be associated with the outcome variable via confounding pathways (exchangeability assumption), and not affect the outcome variable in any way other than through the exposure variable (exclusion restriction assumption)15. The relevance assumption can be verified, whereas the exchangeability and exclusion restriction assumptions cannot15. Therefore, a set of sensitivity analyses, including the use of robust methods and multivariable Mendelian randomization, was conducted to assess biases arising from potential violations.

To satisfy the relevance assumption, instrumental variables were selected by applying a genome-wide significance threshold of p < 5 × 10−8 (p < 5 × 10−7 or p < 5 × 10−6 in the case of few available SNPs; see Supplementary Note 3 for details on the selection of instrumental variables for each analysis) to identify SNPs associated with the respective exposure variables. Using the TwoSampleMR package (v0.5.11)69, which makes use of data from the 1000 Genomes project, significant SNPs in LD were pruned to include only the most significant SNP, and the exposure and outcome datasets were harmonised.
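The stepwise relaxation of the p-value threshold can be illustrated with a short Python sketch. The minimum number of instruments that triggers a relaxation (min_snps) is a hypothetical parameter, since the criterion for "few available SNPs" is given only in Supplementary Note 3. LD pruning and harmonisation, which were done with the TwoSampleMR package, are not reproduced here.

```python
def select_instruments(snps, thresholds=(5e-8, 5e-7, 5e-6), min_snps=3):
    """Pick SNPs at the strictest threshold that yields at least min_snps hits.

    snps: list of dicts with at least a "p" key (exposure GWAS p-value).
    min_snps is a hypothetical cut-off, not a value taken from the paper.
    Returns (threshold_used, selected_snps); if no threshold yields enough
    SNPs, the result for the most lenient threshold is returned.
    """
    selected = []
    threshold = thresholds[0]
    for threshold in thresholds:
        selected = [snp for snp in snps if snp["p"] < threshold]
        if len(selected) >= min_snps:
            break
    return threshold, selected
```

Returning the threshold alongside the SNPs keeps a record of which analyses relied on a relaxed cut-off, which is relevant when interpreting instrument strength.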

First-stage F-statistics were approximated for all instrumental variables as a measure of instrument strength (F < 10 indicating potential weak instrument bias), using the MendelianRandomization package (v0.9.0)70. The instrumental variables did not indicate weak instrument bias for any of the main analyses. For the sensitivity analyses, weak instrument bias was only indicated for the instruments used for the number of childbirths as an exposure with the depression53 subsample excluding UKB as an outcome (F = 0.80) and with recurrent depression as an outcome (F = 0.70; Supplementary Table 4). Furthermore, Cochran’s Q was computed as a measure of heterogeneity, which assesses whether the instrumental variables estimate the same causal parameter and can thereby be used as an indicator of potential pleiotropy (i.e., instrumental variables affect the outcome through other biological pathways) when significant15.
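Both diagnostics can be computed from GWAS summary statistics alone, as the following Python sketch shows. It assumes the commonly used per-SNP approximation F ≈ (beta / SE)² and first-order inverse-variance weights for Cochran's Q; the MendelianRandomization package may compute these quantities differently, and the function names are hypothetical.

```python
def approx_f_statistics(beta_exposure, se_exposure):
    """Per-SNP F-statistic approximated as the squared z-score of the
    SNP-exposure association (an assumed, commonly used approximation)."""
    return [(b / s) ** 2 for b, s in zip(beta_exposure, se_exposure)]


def cochran_q(beta_exposure, beta_outcome, se_outcome):
    """Cochran's Q around the inverse-variance weighted estimate.

    Returns (Q, degrees_of_freedom). Under homogeneity, Q approximately
    follows a chi-squared distribution with (number of SNPs - 1) df.
    """
    # Wald ratio per SNP: SNP-outcome effect divided by SNP-exposure effect.
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    # First-order inverse-variance weight of each ratio.
    weights = [(bx / sy) ** 2 for bx, sy in zip(beta_exposure, se_outcome)]
    ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = sum(w * (r - ivw) ** 2 for w, r in zip(weights, ratios))
    return q, len(ratios) - 1
```

A Q value that is large relative to its degrees of freedom indicates that the SNP-specific ratio estimates disagree more than sampling error would predict.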

Univariable analyses

Univariable two-sample Mendelian randomization analyses were conducted using the GWAS summary statistics. Univariable analyses were performed between all separate exposure variables and all separate outcome variables (Fig. 4). Odds ratios were log-transformed before conducting analyses. Analyses were performed using the TwoSampleMR package (v0.5.11)69. For the main analyses, the IVW method was applied, as it maximises statistical power15. Results were compared to more robust Mendelian randomization approaches, including MR-Egger, which allows for directional pleiotropy, as well as the weighted median, simple mode, and weighted mode estimation methods, which are more robust to outliers and make varying assumptions regarding the validity of the instrumental variables (see Burgess and Thompson15 for details). Using the MRlap package (v0.0.3.2)71, MRlap was performed for analyses with sample overlap. MRlap corrects for potential biases arising from sample overlap, weak instruments, and winner’s curse (i.e., the overestimation of effects in discovery GWAS)71.
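The logic of the IVW and MR-Egger estimators can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the TwoSampleMR implementation: the IVW standard error shown is the fixed-effect version, the MR-Egger fit returns point estimates only, and all function names are hypothetical.

```python
import math


def log_odds(odds_ratio):
    """Convert an odds ratio from a binary-outcome GWAS to the log-odds scale."""
    return math.log(odds_ratio)


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse variance weighted estimate and its standard error."""
    weights = [(bx / sy) ** 2 for bx, sy in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    return estimate, math.sqrt(1.0 / total)


def mr_egger(beta_exposure, beta_outcome, se_outcome):
    """Weighted regression of SNP-outcome on SNP-exposure effects with an intercept.

    Returns (intercept, slope). A non-zero intercept suggests directional
    pleiotropy; the slope is the pleiotropy-adjusted causal estimate.
    """
    xs, ys = [], []
    for bx, by in zip(beta_exposure, beta_outcome):
        sign = -1.0 if bx < 0 else 1.0  # orient every SNP to a positive exposure effect
        xs.append(sign * bx)
        ys.append(sign * by)
    ws = [1.0 / sy ** 2 for sy in se_outcome]
    w_sum = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope
```

IVW forces the regression line through the origin, which is why it is the most powerful estimator when all instruments are valid, whereas MR-Egger spends one degree of freedom on the intercept to absorb directional pleiotropy.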

Multivariable analyses

Multivariable Mendelian randomization analyses were conducted, including continuous estradiol levels in the combined pre- and postmenopausal sample and BMI as exposure variables with brain age gap, Alzheimer’s disease, and depression as outcomes. Sensitivity analyses were conducted using the binary estradiol approach and, to avoid sample overlap, using the depression subsample excluding UKB. Furthermore, based on our univariable findings and following previous studies16,17,21, follow-up multivariable analyses were conducted, including age at menarche and BMI as exposures with depression, the depression subsample excluding UKB, and recurrent depression as outcomes. Using the TwoSampleMR package (v0.5.11)69, instrumental variables were extracted from the exposure datasets, the exposures and the outcome were harmonised, and multivariable IVW analyses were run. Conditional F-statistics for instrument strength were computed, and the multivariable MR-Egger method was applied to check the robustness of the multivariable Mendelian randomization results using the MVMR package (v0.1)72. The instrumental variables for the multivariable analyses with estradiol levels and BMI as exposures indicated no weak instrument bias for BMI but possible weak instrument bias for estradiol levels (brain age gap and Alzheimer’s disease as outcomes: conditional F = 1.92 for the continuous approach and 8.87 for the binary approach in the combined pre- and postmenopausal sample from the UKB; depression as an outcome: conditional F = 1.93 for the continuous approach and 9.16 for the binary approach in the combined pre- and postmenopausal sample from the UKB; depression subsample excluding UKB as an outcome: conditional F = 0.99 for the continuous approach and 8.49 for the binary approach in the combined pre- and postmenopausal sample from the UKB; see also Supplementary Note 3 and Supplementary Table 4). The instrumental variables included in the multivariable analyses of age at menarche and BMI as exposures did not exhibit weak instrument bias, except for possible weak instrument bias for BMI in the analysis with the depression subsample excluding UKB as an outcome (conditional F = 9.97).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.