Introduction

Understanding the causal mechanisms behind diseases is a core challenge in clinical research. While randomized controlled trials are considered the gold standard for establishing causality, they can be impractical in certain situations. As a result, Mendelian Randomization (MR) has become increasingly popular as an alternative. MR leverages genetic variations as a form of natural experiment, effectively reducing the influence of unmeasured environmental confounders in epidemiological studies1,2.

Traditional MR analyses often rely on cross-sectional designs, treating risk factors as constant over time. However, many heritable risk factors vary across the lifespan and may have time-specific effects on disease outcomes3,4. Ignoring this temporal aspect can lead to simplistic and misleading conclusions5. For example, while ecological studies in epidemiology show that vitamin D levels during childhood are associated with multiple sclerosis risk, standard MR approaches, which focus on adult vitamin D levels, have instead pointed to adult vitamin D as the key factor in disease etiology6.

In response to these limitations, life-course MR has emerged as a framework that considers how risk factors measured across an individual’s lifetime influence later-life outcomes7. Though the idea of incorporating life-long information is appealing, performing life-course MR can be challenging due to the small cohort size in Genome-wide Association Studies (GWAS) of earlier life traits and high auto-correlations of risk factors over time. One commonly used approach is multivariable MR (MVMR), such as MVMR-IVW8, which estimates the direct causal effects of the risk factor at each time point. However, MVMR will have limited efficiency and reliability on noisy GWAS data with a small cohort size. Additionally, current MVMR techniques struggle to evaluate indirect causal effects of early-life risk factors that influence outcomes through intermediate factors. Other methods, such as g-estimations of structural mean models4,9 or functional principal component analysis that aggregates effects across time points10, offer some alternative solutions but require access to individual-level GWAS data that are often not available on a large scale.

To address these challenges, we propose FLOW-MR (Framework for Life-cOurse pathWay analysis using Mendelian Randomization), a computational method that estimates a full system of structural equations for temporally ordered heritable traits using only GWAS summary statistics. FLOW-MR enables comprehensive causal mediation analysis, allowing for the decomposition of total causal effects into direct, indirect, and path-specific components. As a longitudinal extension of GRAPPLE11, FLOW-MR introduces a novel sufficient condition for causal identifiability that generalizes the well-known InSIDE assumption12 to longitudinal settings, thereby accommodating both pervasive horizontal pleiotropy and unmeasured mediator-outcome confounding. FLOW-MR can be viewed as a Bayesian analogue of the full-information maximum likelihood (FIML) estimator commonly used in structural equation modeling in economics and social sciences13, with the advantage of requiring only summary data. Its hierarchical Bayesian framework allows for flexible modeling of weak and invalid instruments and captures complex interactions between genetic variants and environmental factors.

When benchmarked against existing MVMR methods, FLOW-MR substantially improves the efficiency and robustness of the statistical inference, especially when the GWAS data are noisy and from smaller cohorts. By incorporating a spike-and-slab prior for genetic effects, FLOW-MR can tackle the different polygenic patterns of complex traits and mitigate weak instrument bias. Applying FLOW-MR to publicly available GWAS summary statistics, we identify a protective effect of Body Mass Index (BMI) during childhood on breast cancer risk, confined to a specific developmental period. Our analysis further reveals a complex interplay between BMI, systolic blood pressure (SBP), and low-density cholesterol levels across different life stages and their effects on stroke risk. Our findings suggest that adulthood SBP may be the only factor with a direct causal effect on stroke, while the previously identified causal effect of BMI on SBP14,15 in adulthood may be attributable to confounding by childhood SBP.

Results

Method overview

Consider a sequence of risk factors (X1X2, …, XK−1) in temporal order, so that Xi precedes Xj when i < j. Our goal is to understand the causal relationship between these traits and their effects on a later outcome Y (also denoted as XK for convenience), typically some disease status measured in adulthood. Causal effects among the traits must adhere to the temporal sequence (i.e., only an earlier trait can causally influence a later trait) (Fig. 1a). We allow the genetic variants Z and the unmeasured environmental confounders U to have direct causal effects on any trait at any time point. Associated with this directed acyclic graph (DAG), we assume a series of structural equations for each individual (see Methods).

Fig. 1: Model overview.
Fig. 1: Model overview.
Full size image

a X1, …, XK−1 are exposure traits in increasing temporal order and Y is the outcome trait. The red arrows indicate the causal effects of the genetic variants Z on the traits. The blue arrows represent the effects of unmeasured non-heritable confounders U. b Illustration of causal relationships across traits, allowing for multiple traits at each time point.

FLOW-MR requires only GWAS summary statistics for each trait, achieved by projecting the individual-level structural equations onto each single SNP Zj. Specifically, let γkj be the marginal associations between a SNP j and trait Xk, we then have the following linear models:

$${\gamma }_{kj}={\sum }_{l=1}^{k-1}{\beta }_{kl}{\gamma }_{lj}+{\alpha }_{kj},\,\,k=1,\ldots,K.$$
(1)

Here, βkl is our parameter of interest, representing the direct causal effect of an earlier trait Xl on a later trait Xk. The term αkj can be viewed as the direct genetic effect of SNP j on trait k. If SNP Zj is used as an instrument for trait Xk, then αkj corresponds to the SNP-trait link used for instrumental variable analysis. In contrast, for \({k}^{{\prime} }\ne k\), \({\alpha }_{{k}^{{\prime} }j}\) represents pleiotropic effects of SNP Zj on trait \({X}_{{k}^{{\prime} }}\).

GWAS summary statistics provide estimates of the marginal associations γkj with their standard errors. They can be obtained either from linear regressions or, for binary traits, from logistic regressions, following16 under the liability threshold model (Supplementary Fig. S1, SI text). For each SNP Zj and trait Xk, we assume \({\hat{\gamma }}_{kj} \sim {{{\mathcal{N}}}}({\gamma }_{kj},{\delta }_{kj}^{2})\), where \({\hat{\gamma }}_{kj}\) is the observed estimate and \({\delta }_{kj}^{2}\) is the estimated variance. Our approach accommodates sample overlap across traits, thus allowing for correlations among \({\hat{\gamma }}_{kj}\) across different traits k. Following previous MR methods11,17, we can adaptively estimate this correlation matrix from the GWAS summary data.

Notice that, unlike marginal associations γkj, the direct genetic effects αkj are not observed from the data. Existing literature shows that most SNPs may have non-zero but weak effects on complex traits, while a small subset may be responsible for the core biological process and strongly impact the traits11. Therefore, FLOW-MR assumes the following spike-and-slab distributional assumptions on αkj:

$${{{{\mathbf{\alpha }}}}}_{kj}| \,{p}_{k},{\sigma }_{k0}^{2},{\sigma }_{k1}^{2}\quad { \sim }^{{{{\rm{ind}}}}}\quad (1-{p}_{k}){{{\mathcal{N}}}}\left(0,{\sigma }_{k0}^{2}\right)+{p}_{k}{{{\mathcal{N}}}}\left(0,{\sigma }_{k1}^{2}\right),\quad k=1,2,\ldots,K,$$

which allows for trait-specific effect sizes. To estimate the model, we use a hierarchical Bayesian framework and Gibbs sampler. The Gibbs sampler enables the convenient use of posterior samples to construct credible intervals for any function of the direct effects \({\{{\beta }_{kl}\}}_{k\ge l}\), including both direct and indirect effects among the traits, as well as the proportions of these effects relative to the total effects.

To address known confounders, we further extend our model to accommodate multiple traits at any given time point (Fig. 1b). Specifically, at each time point k, we assume that there are nk ≥ 1 exposures, denoted as \({X}_{k1},\ldots,{X}_{k{n}_{k}}\). To facilitate the MR analysis, we assume that no causal relationship exists between traits measured at the same time point. If a known causal direction exists between two traits measured at the same time point k, our method can be still be applied by introducing a pseudo-time point (or stage) k + 1 following k, where the outcome trait of the two traits at time k is moved to stage k + 1. More details on the model, estimation, inference procedures, and key assumptions are provided in Methods and SI Text.

Identifiability of direct and indirect causal effects

Mediation analysis, as conducted in this study, is inherently vulnerable to unmeasured mediator-outcome confounding even when the treatment is fully randomized18. As illustrated in Fig. 2, with two exposures and an outcome (K = 3), if we rely solely on a set of valid genetic instruments for the first risk factor X1 (Fig. 2a), it is not possible to separate the direct and indirect effects of X1 due to unmeasured mediator-outcome confounding between X2 and Y (see SI Text for a counter-example). In contrast, if each risk factor has its own set of valid instruments (Fig. 2b), it becomes feasible to identify all direct and indirect causal effects. However, since the sequence of risk factors often represents the same factor measured at different time points, finding SNPs that exert their effects exclusively at a specific time point can be challenging.

Fig. 2: Identifiability of the direct and indirect causal effects between traits under three scenarios.
Fig. 2: Identifiability of the direct and indirect causal effects between traits under three scenarios.
Full size image

a Only X1 has valid instruments and the direct and indirect effects are not identifiable. b All exposures have valid instruments, and both the direct and indirect effects are identifiable. c None of the exposures have valid instruments, but the effects remain identifiable under an independence assumption.

FLOW-MR requires a more realistic assumption to account for pleiotropic effects of the SNPs (Fig. 2c). Instead of assigning each risk factor its own set of valid instruments, we allow SNPs to have nonzero direct effects on all risk factors–thereby permitting any given SNP to be an invalid instrument for a particular trait. However, we require that the direct genetic effects \({\alpha }_{1}={({\alpha }_{1j})}_{j=1}^{P},\ldots,{\alpha }_{K}={({\alpha }_{1j})}_{j=1}^{P}\) be orthogonal across traits (see Methods). More generally, if the number of SNPs P is large, it suffices for these direct effects to be independent rather than exactly orthogonal, extending the well-known InSIDE assumption12 to multiple traits. Notice that we only assume independence among αkj, allowing the marginal associations γkj to remain possibly correlated across traits. This assumption enables FLOW-MR to address both pleiotropic effects and unmeasured mediator-outcome confounding. A formal mathematical statement of the identification conditions and its proof are provided in the SI Text.

Systematic benchmarking with synthetic GWAS summary statistics

We compare FLOW-MR with alternative MVMR approaches to perform life-course MR. Specifically, we benchmark our method against six methods, including MVMR-IVW8, MVMR-Egger19, MVMR-Robust20, GRAPPLE11, MVMR-Horse21 and MVMR-cML22. Among these methods, MVMR-IVW is the most widely used in practice, and other methods are designed to deal with pleiotropic effects under different assumptions and weak instruments. To apply an MVMR method, we use a multiple-step approach where at each step k, we estimate the direct causal effects of X1,  , Xk−1 on Xk. One limitation of this approach is the challenge of providing reliable statistical inference on indirect and pathwise effects, as estimates across different steps are correlated.

To assess performance, we generate synthetic GWAS summary datasets by soft-thresholding real GWAS summary statistics, allowing the true direct genetic effects to be exactly 0 for some SNPs. Specifically, if we observe \({\hat{\gamma }}_{kj}^{{{{\rm{real}}}}}\) with variance \({\hat{\delta }}_{kj}^{2}\) for a particular SNP j in a real GWAS dataset, we set \({\alpha }_{kj}=\,{\mbox{sign}}\,({\hat{\gamma }}_{kj}^{{{{\rm{real}}}}}){(| {\hat{\gamma }}_{kj}^{{{{\rm{real}}}}}| -0.01)}_{+}\) and randomly shuffle αkj across j within each trait k to ensure independence across traits.

To specify the causal relationship across the temporally ordered traits, we simulate three scenarios: K = 3, K = 4, and a multivariate case with multiple exposures at each time point (Fig. 3a). We generate synthetic data for earlier traits using childhood GWAS data to mimic real MR mediation analysis, where earlier childhood traits typically have smaller sample sizes than adulthood traits. Specifically, we use childhood BMI data from the Norwegian Mother, Father, and Child Cohort Study (MoBa)23 and childhood lipid traits from the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort24. Adult GWAS datasets are used to generate summary statistics for the remaining traits. Further details of the simulation design, along with additional sensitivity analyses and evaluation of computation costs, can be found in the SI Text. Supplementary Tables S1, S2 present the number of SNPs and conditional F-statistics25 computed at each p-value threshold as evaluations of instrument strength.

Fig. 3: Systematic benchmarking results.
Fig. 3: Systematic benchmarking results.
Full size image

a Three simulation scenarios for K = 3, K = 4 and the multivariate case. b Empirical coverage of 95% credible intervals on direct effects of X1 and X2 on Y and indirect effects of X1 over 100 repeated simulations when K = 3. The error bars are the 95% confidence intervals of the true coverage of the estimation from each method, computed from two-sided binomial test. The center points represent the empirical coverage. Number of SNPs for each p-value threshold: 10−8 (121 SNPs), 10−6 (144 SNPs), 10−4 (178 SNPs), and 0.01 (238 SNPs). c Boxplots of lengths of credible intervals over repeated 100 simulations when K = 3. Each box represents the lower and upper quartiles, with the line inside indicating the median. The whiskers extend to 1.5 times the interquartile range beyond the box. Outliers have been removed for clarity. Source data are provided with this paper.

We evaluate the coverage and average lengths of the confidence and credible intervals of the direct and indirect causal effects across different methods (Fig. 3b, c, Supplementary Figs. S2, S3). For the direct causal effects, most MVMR methods suffer from severe under-coverage, largely because of limited capabilities to handle pervasive pleiotropic effects and weak-instrument bias. Specifically, MVMR-IVW does not allow any pleiotropic effects, whereas MVMR-Robust and MVMR-cML assume a plurality of valid genetic instruments for each step–a condition that is violated in our simulations and is also unlikely to hold in real life-course MR analyses. Although MVMR-Horse and MVMR-Egger can accommodate independent and pervasive pleiotropy, they remain under-covered in most scenarios, likely due to biases introduced by (conditionally) weak instruments (Supplementary Table S2) and correlation of SNP-trait associations across risk factors. By contrast, both FLOW-MR and GRAPPLE demonstrate good coverage regardless of SNP strength. Compared to GRAPPLE, FLOW-MR has shorter intervals and offers more efficient analyses. Furthermore, for the indirect causal effects of X1, only FLOW-MR can provide reliable inference, with credible intervals showing good coverage and reasonable power.

Effect of early life body size on breast cancer

Recent studies suggest that early-life body size may protect against breast cancer26,27,28. Using MVMR-IVW,3 found that this protective effect is not mediated by adult body size, which itself does not causally influence breast cancer. However, their study relied on a recalled early-life body score from the UK Biobank as a proxy for early-life body size. As noted by the original authors, such coarse measurements can raise concerns about the accuracy of their findings.

To address this, we revisited the analysis using FLOW-MR and childhood BMI GWAS summary statistics. First, we replicate the original study3 using the same dataset: adult BMI from the UK Biobank for adult body size, recalled early-life body score from UK Biobank for early-life body size, and breast cancer data from Michailidou et al.29 for the outcome. Genetic instruments were selected at prespecified p-value thresholds from the GIANT adult BMI dataset and EGG childhood BMI dataset. FLOW-MR, alongside all MVMR methods, successfully replicates the original findings (Fig. 4a, Supplementary Fig. S4a), confirming no causal effect of adult BMI and a direct protective effect of childhood body size on breast cancer.

Fig. 4: Evaluation of the effect of body size on breast cancer at different ages.
Fig. 4: Evaluation of the effect of body size on breast cancer at different ages.
Full size image

a Two-sided 95% confidence/credible intervals of childhood body size (from UK Biobank) and adult body mass index (BMI) on breast cancer risk estimated by MVMR-IVW, GRAPPLE, and FLOW-MR (without multiple testing correction). Error bar centers indicate point estimates (same for panels (b, d, e). Number of SNPs for each p-value threshold: 10−8 (56 SNPs), 10−6 (119 SNPs), 10−4 (305 SNPs), and 0.001 (570 SNPs). b Two-sided 95% confidence/credible intervals of childhood BMI and adult BMI on breast cancer risk estimated by these three methods (without multiple testing correction). Number of SNPs for each p-value threshold: 10−8 (56 SNPs), 10−6 (118 SNPs), 10−4 (304 SNPs), and 0.001 (571 SNPs). c Estimated causal DAG using FLOW-MR at p-value threshold 0.001. The red (blue) arrows indicate significant positive (negative) direct effects at significance level α = 0.05 (two-sided, without multiple testing correction). 95% Two-sided credible intervals estimated by FLOW-MR are also provided (without multiple testing correction) for (d) all the direct effects and e indirect effects of 1-year-old BMI. SNP counts for each threshold: 10−8 (55 SNPs), 10−6 (117 SNPs), 10−4 (303 SNPs), and 0.001 (570 SNPs). Source data are provided with this paper.

However, when we replaced the early-life body size trait with GWAS data on 8-year-old BMI from MoBa23, a direct measurement of childhood body size, MVMR-IVW and MVMR-Egger unexpectedly suggested a protective effect of adult BMI on breast cancer, while childhood BMI showed no direct effect (Fig. 4b, Supplementary Fig. S4b). In contrast, GRAPPLE, MVMR-Robust and MVMR-Horse lost the power to detect any causal effects. Only FLOW-MR replicates the original conclusion that childhood BMI has a protective effect. Although 8-year-old BMI provides a more precise measurement, its use in MR is challenging due to its small sample size (3K samples in the MoBa 2019 cohort compared to half a million in the UK Biobank). This is reflected in the substantial decrease in the conditional F-statistics in the childhood dataset (Supplementary Table S2), indicating a loss of instrument strength due to the limited cohort size of the MoBa study. FLOW-MR demonstrates both efficiency and robustness in this scenario.

We further explored the impact of including 1-year-old BMI from MoBa in our analysis. Given that the GWAS data of 1-year-old BMI and 8-year-old BMI traits are from the same cohort, we incorporate the estimated noise correlation matrix (Supplementary Fig. S4c). Despite significant genetic correlations between the 1-year-old and 8-year-old BMI traits (Supplementary Fig. S4d), FLOW-MR still detected the protective effect of 8-year-old BMI on breast cancer at the p-value threshold 0.001 (Fig. 4c, d). As the direct effect of 1-year-old BMI on breast cancer is insignificant, the analysis revealed that the protective effect of body size on preventing breast cancer is likely confined to a specific period in childhood.

Additionally, we found that 1-year-old BMI directly influences 8-year-old BMI but does not have any additional direct effect on adult BMI. Interestingly, the estimated causal effects of 1-year-old BMI on 8-year-old BMI and 8-year-old BMI on adult BMI appear to be of similar magnitude (Fig. 4d, Supplementary Fig. S4e). Furthermore, FLOW-MR has the power to detect path-wise indirect causal effects, indicating a significant protective effect of 1-year-old BMI against breast cancer, as well as a positive effect on adult BMI, both of which are mediated through 8-year-old BMI (Fig. 4e).

Temporal causal relationships between high blood pressure, BMI and stroke

High blood pressure is a well-established modifiable risk factor for stroke30. Recent univariate MR studies have also suggested that adult BMI may be a potential causal risk factor for stroke and has a significant positive causal effect on high blood pressure in adulthood14,15,31. Our study aims to disentangle the complex relationships between high blood pressure, BMI, and stroke, focusing on how these causal connections evolve throughout life. We also consider potential confounders, such as lipid traits.

We analyze the causal effects of systolic blood pressure (SBP), BMI, and low-density lipoprotein cholesterol (LDL-C) in both childhood and adulthood on stroke using FLOW-MR. To capture potential causal effects of BMI on SBP, even at the same time point, we introduce additional pseudo-time points (stages) for SBP traits both in childhood and adulthood (Fig. 5a). We utilize GWAS summary statistics from the ALSPAC cohort24 for childhood traits, UK Biobank for adult BMI and SBP, and the Global Lipids Genetics Consortium for adult LDL-C32. Stroke data comes from Malik et al.33, and SNP selection is based on the GERA datasets34,35, the GIANT adult BMI dataset, and EGG childhood BMI dataset.

Fig. 5: Multivariate mediation analysis on low-density lipoprotein cholesterol (LDL-C), body mass index (BMI), systemic blood pressure (SBP) and stroke.
Fig. 5: Multivariate mediation analysis on low-density lipoprotein cholesterol (LDL-C), body mass index (BMI), systemic blood pressure (SBP) and stroke.
Full size image

a Design of the stages. Childhood traits are marked with label (C) and adulthood traits are marked with label (A). b Two-sided 95% credible intervals estimated by FLOW-MR for direct effects on adult traits LDL-C, BMI, SBP and Stroke (without multiple testing correction). The error bar centers indicate the point estimates. The p-value threshold is set to be 10−4 with 341 SNPs selected. Source data are provided with this paper.

Our analysis reveals a clear positive causal effect of adulthood SBP on stroke, with no significant direct effects from other traits, including childhood traits and adult BMI (Fig. 5b, Supplementary Fig. S5a). Interestingly, our findings challenge previous univariate MR results (refs. 14,15, Supplementary Fig. S5b), showing no evidence of a positive causal effect of adult BMI on adult SBP after accounting for childhood SBP. Further MR analyses using only these three continuous traits also suggest that the previously observed association between adult BMI and SBP may be confounded by childhood SBP (Supplementary Fig. S5c). Additionally, we observe a stronger genetic correlation between childhood SBP and adult BMI than between adult SBP and BMI (Supplementary Fig. S5d). These results imply that the impact of adulthood BMI on SBP in earlier studies may have been overstated due to unaccounted confounding factors.

Discussion

We propose FLOW-MR using GWAS summary data for life-course Mendelian Randomization, based on a full structural model for all traits. This method allows users to assess time-varying causal effects of heritable risk factors and distinguish between direct and indirect causal effects. By addressing key challenges in life-course MR—the high genetic correlation of the same trait at different ages and the limited cohort size of age-specific GWAS, FLOW-MR demonstrates superior performance in simulation studies. Unlike other methods, FLOW-MR is not prone to biases arising from using weakly associated SNPs as instruments and can efficiently integrate information across all traits.

For the identification of causal effects in life-course MR, a similar argument has been discussed in ref. 7. They show that to identify the causal direct effects of traits, genetic variants must exert different effects on each exposure in the model, and these effects must be linearly independent, meaning the true SNP-trait association matrix must have full rank. This condition is necessary but not sufficient for identifying all direct and indirect effects. For instance, in univariable MR, pleiotropic effects must adhere to specific assumptions (such as the InSIDE assumption) to ensure the identification of a risk factor’s casual effect. Merely assuming that the pleiotropic effects are not perfectly correlated with the marginal SNP-exposure associations is insufficient for identifying the causal effects. While we allow pervasive pleiotropy in our work, we assume that the direct genetic effects of SNPs are independent across traits, so there is no correlated pleiotropy between the traits. Our sensitivity analysis indicates that, although FLOW-MR models the life-course traits as a whole system, the statistical inference for a particular outcome demonstrates robustness to correlated pleiotropy between other traits (Supplementary Fig. S6). In practice, if confounding pathways are known to exist between traits with correlated pleiotropy, we recommend collecting additional GWAS data for these confounding factors and explicitly adjusting for them using FLOW-MR.

To assess the stability of estimates across different instrument selection criteria, in the paper we employed a range of p-value thresholds for selecting genetic instruments in both simulations and case studies, following the recommendations of Wang et al.11. In our simulations, the results remain broadly consistent across the various p-value thresholds. In contrast, larger discrepancies are observed in the real data case studies. This difference may be partly due to the stronger causal signals and the enforced independence of direct genetic effects across traits in the simulation design. Additionally, the number of selected SNPs varies much more substantially in the real data (Supplementary Table S1), which contributes to differences in statistical power across thresholds.

Finally, as noted by previous studies6,36, life-course MR for mediation analysis also faces several potential limitations. One such limitation is the absence of measurements at critical time points when the exposure exerts a substantial causal effect. However, if a missing time point Xk is not a hidden confounder for any pair of subsequent traits \({X}_{{k}_{1}}\) and \({X}_{{k}_{2}}\), our assumption of independent direct genetic associations remains valid, and the omission of that time point does not bias the estimation of causal effects at observed time points. Another concern is the possibility of reverse causation, where the outcome may causally influence the exposure measured at the latest time point. Although life-course MR typically benefits from the temporal ordering of traits, a challenge arises when the final exposure and outcome are measured concurrently–such as when both are assessed in adulthood. In such cases, if the selected SNPs influence the outcome primarily through their effects on the exposures, MR estimates are generally robust to reverse causality6,11,37. Our proposed framework inherits this desirable property of MR.

Methods

Structural equations on individual-level data

Denote Z as the vector containing all available genetic variants. Based on the causal DAG (Fig. 1a), we assume the following additive structural equations for each individual:

$${X}_{K}:=Y= {\sum }_{l=1}^{K-1}{\beta }_{Kl}{X}_{l}+{f}_{K}({{{\bf{U}}}},{{{\bf{Z}}}},{{{{\bf{E}}}}}_{Y}),\\ {X}_{k}= {\sum }_{l=1}^{k-1}{\beta }_{kl}{X}_{l}+{f}_{k}({{{\bf{U}}}},{{{\bf{Z}}}},{{{{\bf{E}}}}}_{{X}_{k}}),\,\,k=2,\ldots,K-1\\ {X}_1= {f}_{1}({{{\bf{U}}}},{{{\bf{Z}}}},{{{{\bf{E}}}}}_{{X}_{1}}).$$
(2)

In these equations, the functions f1( ),  , fK( ) capture the direct genetic and environmental effects on the traits, which may be nonlinear and involve arbitrary interactions. Our primary assumption in (2) is that the causal effect βkl for any earlier trait Xl(1 ≤l≤ K − 1) on any later trait Xk (l + 1 ≤ k ≤ K) is linear and homogeneous. When the outcome or risk factor traits are binary, we adopt a liability threshold model16, treating each binary trait Xk as a discretized version of an underlying continuous liability \({X}_{k}^{\star }\), that follows the additive structural equations above (Supplementary Fig. S1). In this context, our method estimates the causal relationships among the latent liabilities. We also further generalize (2) to allow for nonlinear causal effects and interactions among earlier traits, so that our linear model for summary statistics can accommodate more complex causal relationships measured at different time points. Additional details are provided in the SI Text.

Given these structural equations representing the causal relationships among the traits, each βkl represents the direct causal effect of Xl on Xk whenever l < k. A path-wise causal effect from Xl to Xk along a path \({{{\mathcal{L}}}}=({X}_{{r}_{0}},{X}_{{r}_{1}},\cdots \,,{X}_{{r}_{t-1}},{X}_{{r}_{t}})\) (where \({X}_{{r}_{0}}={X}_{l}\) and \({X}_{{r}_{t}}={X}_{k}\)) of length t is defined as

$${\beta }_{kl}^{{{{\mathcal{L}}}}}={\prod }_{i=1}^{t}{\beta }_{{r}_{i}{r}_{i-1}}.$$

The total effect sums up the effect from Xl to Xk through all directed pathways, and the indirect effect is the difference between the total effect and the direct effects38,39. In other words, the indirect effect \({\beta }_{kl}^{{{{\rm{ind}}}}}\) sums all path-wise causal effects of length t ≥ 2:

$${\beta }_{kl}^{{{{\rm{ind}}}}}={\sum }_{t=2}^{k-l}{\sum}_{{{{\mathcal{L}}}}=\{{X}_{{r}_{0}}={X}_{l},\cdots \,,{X}_{{r}_{t}}={X}_{k}\}}{\beta }_{kl}^{{{{\mathcal{L}}}}}.$$

Model on summary statistics

Denote γkj ≡ argminγVar[Xk − γZj] as the marginal association between continous trait Xk and Zj, and let \({\alpha }_{kj}\equiv {{\mbox{argmin}}}_{\alpha }{{{\rm{Var}}}}[{f}_{k}(U,{{{\bf{Z}}}},{{{{\bf{E}}}}}_{{X}_{k}})-\alpha {Z}_{j}]\) as the projection of the genetic and environmental direct effects of trait Xk onto SNP Zj, which can be heuristically understood as the direct genetic effect of SNP j on Xk. Then by projecting the structural Eq. (2) onto a single SNP Zj, we can obtain a linear model (1) with measurement error on the GWAS summary statistics (for mathematical details, see SI Text). These equations can be more conveniently expressed in matrix form as:

$${{{\mathbf{\Gamma }}}}={{{\bf{B}}}}\cdot {{{\mathbf{\Gamma }}}}+{{{\bf{A}}}},$$
(3)

where P is the number of SNPs and the matrices are defined as

$${{{\mathbf{\Gamma }}}}\equiv \left[\begin{array}{cccc}{\gamma }_{11}&{\gamma }_{12}&\ldots \,&{\gamma }_{1P}\\ {\gamma }_{21}&{\gamma }_{22}&\ldots \,&{\gamma }_{2P}\\ \vdots &\vdots &\ldots \,&\vdots \\ {\gamma }_{K1}&{\gamma }_{K2}&\ldots \,&{\gamma }_{KP}\end{array}\right],\,{{{\bf{A}}}}\equiv \left[\begin{array}{cccc}{\alpha }_{11}&{\alpha }_{12}&\ldots \,&{\alpha }_{1P}\\ {\alpha }_{21}&{\alpha }_{22}&\ldots \,&{\alpha }_{2P}\\ \vdots &\vdots &\ldots \,&\vdots \\ {\alpha }_{K1}&{\alpha }_{K2}&\ldots \,&{\alpha }_{KP}\end{array}\right],\,\\ {{{\bf{B}}}}\equiv \left[\begin{array}{ccccc}0&0&\ldots \,&0&0\\ {\beta }_{21}&0&\ldots \,&0&0\\ \vdots &\vdots &\ldots \,&\vdots &\vdots \\ {\beta }_{K1}&{\beta }_{K2}&\ldots \,&{\beta }_{K(K-1)}&0\end{array}\right].$$

Here, the matrix Γ represents the marginal SNP-trait associations, A denotes the direct effects of the SNPs on each trait and B is the matrix of direct causal relationships of earlier traits on later traits that we would like to infer. Equation (3) can be alternatively represented as

$${{{\mathbf{\Gamma }}}}=({{{\bf{I}}}}+\tilde{{{{\bf{B}}}}}){{{\bf{A}}}},\quad \tilde{{{{\bf{B}}}}}\equiv \left[\begin{array}{ccccc}0&0&\ldots \,&0&0\\ {\tilde{\beta }}_{21}&0&\ldots \,&0&0\\ \vdots &\vdots &\ldots \,&\vdots &\vdots \\ {\tilde{\beta }}_{K1}&{\tilde{\beta }}_{K2}&\ldots \,&{\tilde{\beta }}_{K(K-1)}&0\end{array}\right]$$
(4)

where I is the K × K identity matrix, and \(\tilde{{{{\bf{B}}}}}\) is the lower-triangular matrix that solves \({({{{\bf{I}}}}-{{{\bf{B}}}})}^{-1}={{{\bf{I}}}}+\tilde{{{{\bf{B}}}}}.\) Compared to (3), the parametrization in (4) avoids matrix inversion, making it more amenable for statistical estimation. It is easy to show that \(\tilde{{{{\bf{B}}}}}\) can be written as a Neumann series: \(\tilde{{{{\bf{B}}}}}={{{\bf{B}}}}+{{{{\bf{B}}}}}^{2}+\ldots \,\). Intuitively, the (kl) entry of the matrix \(\tilde{{{{\bf{B}}}}}\) denotes the total causal effects of Xl on Xk through all directed pathways40.

Since the GWAS cohort size is typically large, for each SNP Zj, the summary statistics \({\hat{{{{\mathbf{\Gamma }}}}}}_{j}\) approximately follow a normal distribution \({\hat{{{{\mathbf{\Gamma }}}}}}_{j} \sim {{{\mathcal{N}}}}({{{{\mathbf{\Gamma }}}}}_{j},{{{{\mathbf{\Sigma }}}}}_{j})\), where Γj: = (γ1jγ2j, …, γKj) is a vector of marginal associations and Σj is a covariance matrix obtained from the GWAS standard errors and a correlation matrix that depends on the extent of sample overlap and the correlation between the traits. This correlation matrix is shared across the SNPs and can be estimated from the non-statistically significant GWAS summary statistics using the method described in ref. 11 (SI Text).

Identification conditions for direct and indirect effects

We provide identification conditions for the matrix B of all direct causal effects from earlier traits to later ones. Once these direct effects are identified, the indirect effects–arising from sums and products of direct effects under our additive structural Eq. (2) become identifiable as well.

Under the relationship \(\tilde{{{{\bf{B}}}}}={\left({{{\bf{I}}}}-{{{\bf{B}}}}\right)}^{-1}-{{{\bf{I}}}}\), it suffices to identify the lower-triangular matrix \(\tilde{{{{\bf{B}}}}}\). Assuming an infinitely large cohort, the marginal association matrix Γ can be consistently estimated from GWAS data. We then provide conditions on the direct genetic effects A that make \(\tilde{{{{\bf{B}}}}}\) uniquely solvable in Eq. (4).

Specifically, we prove that \(\tilde{{{{\bf{B}}}}}\) is unique if matrix A lies in the set

$${{{\mathcal{A}}}}:=\{{{{\bf{A}}}}\in {{\mathbb{R}}}^{K\times P}:{{{\bf{A}}}}{{{{\bf{A}}}}}^{\top }={{{\rm{diag}}}}({d}_{1},\ldots,{d}_{K})\,{{{\rm{with}}}}\,\min \{{d}_{1},\ldots,{d}_{K-1}\} > 0\}.$$

Intuitively, rather than assuming that each trait k has its own set of valid SNPs (i.e., there is at least a SNP j where αjk ≠ 0 and \({\alpha }_{j{k}^{{\prime} }}=0\) for any \({k}^{{\prime} }\ne k\)), we require only that the direct genetic effects across traits are mutually orthogonal. Mathematically, this is because \(({{{\bf{I}}}}+\tilde{{{{\bf{B}}}}})({{{\bf{A}}}}{{{{\bf{A}}}}}^{\top }){({{{\bf{I}}}}+\tilde{{{{\bf{B}}}}})}^{\top }\) provides a unique LDL decomposition of the identifiable matrix ΓΓ. To make this assumption more realistic, we further can show that when the number of SNPs P → , it is sufficient for these direct effects to be merely independent across traits rather than strictly orthogonal–an extension of the well-known InSIDE assumption to multiple traits.

Selection of genetic instruments

We assume that we can use separate GWAS datasets to obtain p-values for SNP selection, thereby mitigating potential selection biases11,41. Upon selecting the SNPs, another set of GWAS datasets is used to obtain the summary statistics for the exposure and outcome traits. The selection p-value for each SNP is defined as the Bonferroni-corrected p-value, computed from K − 1 GWAS summary datasets corresponding to traits X1 to XK−1. SNPs are subsequently chosen based on the selection p-values using LD clumping42 (with r2 cutoff at 0.001), ensuring that selected SNPs are approximately independent. Compared to a stringent p-value threshold (such as 10−8) for SNP selection, our method allows for a higher threshold (such as 10−4 or 0.01) and uses SNPs that are weakly associated with the exposure traits. This can generally increase the power of MR analysis.

Model estimation and inference

We use a hierarchical Bayesian framework and Gibbs sampler to infer the direct and indirect causal effects, which can be expressed as functions of the matrix \(\tilde{{{{\bf{B}}}}}\). In addition to the spike-and-slab prior on the direct genetic effects, we put Gaussian priors on elements of the matrix \(\tilde{{{{\bf{B}}}}}\) and conjugate priors on the hyperparameters:

$${\tilde{\beta }}_{kl}| {\sigma }^{2} { \sim }^{{{{\rm{i}}}}.{{{\rm{i}}}}.{{{\rm{d}}}}.}{{{\mathcal{N}}}}(0,{\sigma }^{2}),\quad 1\le l < k\le K\\ {\sigma }^{2} \sim {{{\rm{InvGamma}}}}(\alpha,\beta )\\ {p}_{k} { \sim }^{{{{\rm{ind}}}}}{{{\rm{Beta}}}}({a}_{k},{b}_{k}),\quad k=1,2,\ldots,K\\ {\sigma }_{k0}^{2} { \sim }^{{{{\rm{ind}}}}}\,{\mbox{InvGamma}}\,({\alpha }_{k}^{0},{\beta }_{k}^{0})\quad k=1,2,\ldots,K\\ {\sigma }_{k1}^{2} { \sim }^{{{{\rm{ind}}}}}\,{\mbox{InvGamma}}\,({\alpha }_{k}^{1},{\beta }_{k}^{1})\quad k=1,2,\ldots,K$$

To estimate the above Bayesian hierarchical model, we employ a Gibbs sampler algorithm which can efficiently generate posterior samples when K is small. Posterior samples of \(\tilde{{{{\bf{B}}}}}\) directly give us posterior samples of B, which can then be used to construct credible intervals for any direct and indirect causal effects, or their functions. Additionally, we use a simplified empirical Bayes approach to choose the hyper-parameters \(({\alpha }_{k}^{0},{\beta }_{k}^{0}),({\alpha }_{k}^{1},{\beta }_{k}^{1})\) and (akbk), recognizing their significant impact on the posterior distributions of \(\tilde{{{{\bf{B}}}}}\). For further computational and mathematical details, see SI Text.

Extension to multiple traits at each time point

We also extend our model to accommodate multiple traits at any given time point, incorporating two key assumptions to facilitate the MR analysis. First, we assume that there is no causal effect between traits at the same time point. This assumption avoids the need to identify causal directions between any two traits that coexist temporally. If two sets of traits {Xk1,  , Xkm} and {Xk(m+1),  , XkM} at the same time point k potentially causally related–with the first set influencing the second–our method can still be applied by introducing an additional pseudo-time point (or stage) k + 1 immediately following k. The second set of traits is then moved to this added stage, and all subsequent stages are shifted accordingly.

The second assumption is a generalization of the independence assumption on direct genetic effects. Specifically, we require that the direct genetic effects αkj of any SNP j must be independent across all traits, including those measured at the same and at different time points. With these assumptions, we can adapt our model and Gibbs sampler to infer the direct and indirect causal effects.

Benchmarking methods

We benchmarked the competing methods using publicly available software packages. Specifically, GRAPPLE, MVMR-IVW (MVMR; https://github.com/WSpiller/MVMR), MVMR-Egger (MendelianRandomization; https://CRAN.R-project.org/package=MendelianRandomization), MVMR-Robust (https://github.com/aj-grant/robust-mvmr), MVMR-cML (https://github.com/ZhaotongL/MVMR-cML), and MVMR-Horse (https://github.com/aj-grant/mrhorse) were all implemented with their standard procedures recommended in the respective software documentation. We used the same set of selected SNPs for each method.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.