BEAST X for Bayesian phylogenetic, phylogeographic and phylodynamic inference

Baele, Guy; Ji, Xiang; Hassler, Gabriel W.; McCrone, John T.; Shao, Yucai; Zhang, Zhenyu; Holbrook, Andrew J.; Lemey, Philippe; Drummond, Alexei J.; Rambaut, Andrew; Suchard, Marc A.

doi:10.1038/s41592-025-02751-x

Download PDF

Brief Communication
Open access
Published: 07 July 2025

BEAST X for Bayesian phylogenetic, phylogeographic and phylodynamic inference

Nature Methods volume 22, pages 1653–1656 (2025)Cite this article

19k Accesses
7 Citations
57 Altmetric
Metrics details

Subjects

Abstract

Here we present the open-source and cross-platform BEAST X software that combines molecular phylogenetic reconstruction with complex trait evolution, divergence-time dating and coalescent demographics in an efficient statistical inference engine. BEAST X significantly advances the flexibility and scalability of evolutionary models supported. Novel clock and substitution models leverage a large variety of evolutionary processes; discrete, continuous and mixed traits with missingness and measurement errors; and fast, gradient-informed integration techniques that rapidly traverse high-dimensional parameter spaces.

Quantitatively defining species boundaries with more efficiency and more biological realism

Article Open access 28 July 2022

A species-level timeline of mammal evolution integrating phylogenomic data

Article 22 December 2021

Deep learning from phylogenies to uncover the epidemiological dynamics of outbreaks

Article Open access 06 July 2022

Main

The Bayesian evolutionary analysis sampling trees (BEAST) platform stands as one of the leading inference tools across a range of biological fields from systematic biology to molecular epidemiology of infectious diseases. BEAST’s success arises from its focus on sequence, phenotypic and epidemiological data integration along time-scaled phylogenetic trees. Motivation for BEAST development builds from the rapid growth of pathogen genome sequencing to deliver real-time inference for the emergence and spread of rapidly evolving pathogens to better understand their epidemiology and evolutionary dynamics. Recent scientific successes using the BEAST platform uncover the origins, spread and persistence of multiple Ebola virus outbreaks¹, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants² and mpox virus lineages³.

BEAST X introduces salient advances over previous software versions⁴ by providing a substantially more flexible and scalable platform for evolutionary analysis with a strong focus on pathogen genomics. Two thematic thrusts describe these advances: state-of-science, high-dimensional models span multiple biological and public health domains including sequence evolution, phylodynamics and phylogeography, while new computational algorithms and emerging statistical sampling techniques notably accelerate inference across this collection of complex, highly structured models.

BEAST X incorporates new extensions to existing substitution processes to model additional features affecting sequence changes. These include a covarion-like Markov-modulated extension that incorporates site- and branch-specific heterogeneity by integrating over candidate substitution processes to capture different selective pressures over site and time⁵. Random-effects substitution models extend common continuous-time Markov chain (CTMC) models into a richer class of processes capable of capturing a wider variety of substitution dynamics, enabling a more appropriate characterization of underlying substitution processes⁶. To enable scaling of sampling-based inference under such models for large trees and state spaces, BEAST X now includes fast approximate likelihood gradients for all unknown substitution model parameters⁷. We refer to Methods for additional details regarding these substitution models.

BEAST X complements flexible sequence substitution models with advanced extensions to nonparametric tree-generative coalescent models that correct for preferential sequence sampling as a function of time⁸ and high-dimensional episodic birth–death sampling models⁹. Across tasks, BEAST X enables flexible trait evolution modeling for larger numbers of complex traits. Popular relaxed clock models capture various sources of rate heterogeneity on the phylogenetic tree, but their large numbers of model parameters can make inference difficult. BEAST X improves the classic uncorrelated relaxed clock model with a time-dependent evolutionary rate extension that accommodates rate variations through time¹⁰, a newly developed, continuous random-effects clock model¹¹ and a more general mixed-effects relaxed clock model¹². BEAST X enhances the previously computationally infeasible classic random local clock (RLC) model with a tractable and interpretable shrinkage-based local clock model¹³. We refer to Methods for additional details regarding these molecular clock models.

These advances underpin fast, flexible phylogeographic modeling in BEAST X. Discrete-trait phylogeography through CTMC modeling¹⁴ remains an attractive and widely used inference methodology. Geographic sampling bias sensitivity of the CTMC model is a common concern in phylogeographic analyses¹⁵. Although helpful, structured coalescent models fail to completely account for such bias¹⁶. BEAST X solves this problem with novel modeling¹⁷ and computational inference strategies: when parameterizing between-location transition rates as log-linear functions of environmental or epidemiological predictors¹⁸, missing predictor values often arise for one or more location pairs. BEAST X integrates out missing data within the Bayesian inference procedure by using a new Hamiltonian Monte Carlo (HMC) approach to jointly sample all missing predictor values from their full conditional distribution². Figure 1 illustrates discrete-trait phylogeographic and phylodynamic analyses of SARS-CoV-2 enabled by these BEAST X advances, focusing on the Omicron BA.1 invasion in England².

**Fig. 1: Phylodynamic analysis of the SARS-CoV-2 Omicron BA.1 invasion in England.**

Continuous-trait phylogeography using relaxed random walk (RRW) models¹⁹ requires precise spatial location data for sampled sequences. Low-precision geographic location data represent a major barrier for spatially explicit phylogeographic inference of fast-evolving pathogens. A new approach²⁰ incorporates heterogeneous prior sampling probabilities—informed by external data such as outbreak locations—over a collection of subpolygons that make up a geographic area. BEAST X now also defines homogeneous and heterogeneous prior ranges of sampling coordinates²¹. At the same time, BEAST X responds to computational challenges associated with learning branch-specific rate multipliers for large datasets by incorporating a scalable method to efficiently fit RRWs and infer their branch-specific parameters in a Bayesian framework through HMC sampling²².

Additional significant modeling enhancements include the scalable incorporation of general Gaussian (for example, Ornstein–Uhlenbeck) trait-evolution models²³, missing-trait models²⁴, phylogenetic factor analysis²⁵ and phylogenetic multivariate probit²⁶ within BEAST X. In particular, these methods successfully model dependencies between high-dimensional trait data with dozens or even thousands of observations per taxon with the help of novel computational inference techniques.

Newly introduced preorder tree traversal algorithms in BEAST X enable many of the advances we describe above. Let N denote the number of taxa, or leaves, on a tree. Preorder tree traversal algorithms complement their postorder counterparts and calculate vectors of partial likelihoods (for discrete traits) and sufficient statistics (for continuous traits) for each branch. With the pre- and postorder vectors together, one calculates derivatives to give rise to linear-in-N evaluations of high-dimensional gradients for branch-specific parameters of interest (for example, evolutionary rates of discrete/continuous traits and divergence times)^{11,13,22,23,24,25,26,27}. These scalable, high-dimensional gradients enable much higher performance Markov-chain Monte-Carlo transition kernels to efficiently simulate phylogenetic, phylogeographic and phylodynamic posterior distributions.

Linear-in-N gradient algorithms enable high-performance HMC transition kernels to sample from high-dimensional spaces of parameters that were previously computationally burdensome to learn. BEAST X implements linear gradients with HMC for a broad collection of gold-standard models: the nonparametric coalescent-based skygrid model^28,29 now scalably infers past population dynamics without strong assumptions regarding population size trends; mixed-effects and shrinkage-based clock models improve classic uncorrelated relaxed and random local clock models by incorporating biologically rich features to capture rate heterogeneities¹¹; a variety of new continuous-trait evolution models learn branch-specific rate multipliers^22,24,26; and novel divergence-time models efficiently overcome complex node-height restrictions by operating in a transformed space²⁷. Despite the increased computational cost of gradient evaluations, Table 1 shows that applications of these linear-time HMC samplers achieve substantial increases in effective sample size (ESS) per unit time compared with the conventional Metropolis–Hastings samplers that previous versions of BEAST provide^7,9,11,24,27. Note that these speedups are indicative and can be sensitive to the size (for example, number of taxa and number of sites; Table 1) and nature of the dataset, and to the tuning of the HMC operations. While many of the models in BEAST X already use HMC transition kernels (Table 1), ongoing developments target further extending the list of models supported by HMC. In addition, development and integration of novel and existing evolutionary, clock and coalescent models warrant continued efforts into designing and fine-turning accompanying HMC transition kernels. Specific to the data analysis presented here will be an increased focus on designing phylogeographic model formulations that are better suited to accommodating sampling bias and that offer more flexibility in their generalized linear model (GLM) extension.

Table 1 Performance benchmarks of HMC transition kernels

Full size table

Methods

Substitution models

Among the newly developed and incorporated substitution models in BEAST X are Markov-modulated models (MMMs), which constitute a class of mixture models that allow the substitution process to change across each branch and for each site independently within an alignment⁵. To this end, MMMs are made up of a number of K substitution models (for example, nucleotide, amino acid or codon models) of dimension S to construct the KS × KS instantaneous rate matrix used in calculating the observed sequence data likelihood. This augmented dimensionality of MMMs leads to increased computational demands that we mitigate through recent developments in BEAGLE, a high-performance computational library for phylogenetic inference³¹. MMMs have been shown to readily integrate with Bayesian model selection through (log) marginal likelihood estimation, to substantially improve model fit compared with standard CTMC substitution models and to impact phylogenetic tree estimation in examples from bacterial, viral and plastid genome evolution⁵.

Random-effects substitution models form another extension of standard CTMC models that incorporate additional rate variation by representing the original (base) model as fixed-effect model parameters and allow the additional random effects to capture deviations from the simpler process, thereby enabling a more appropriate characterization of underlying substitution processes while retaining the basic structure of the base model that may be biologically or epidemiologically motivated⁶. Given that random-effects substitution models are in general overparameterized and, as such, not identifiable by the observed sequence data likelihood alone, one may use shrinkage priors to regularize the random effects, pulling them (often strongly) to be near or equal to 0 when the data provide little or no information to the contrary and, otherwise, attempting to impart little bias into the posterior. Further, shrinkage priors also aid in the performance of model selection, determining whether a particular random effect should be excluded from the model. One may use these models to study the strongly increased rate of C → T substitutions over the reverse T → C substitutions in SARS-CoV-2, a phenomenon that is a violation of the common phylogenetic assumption of reversibility that the majority of standard CTMC substitution models make. Applied to a dataset of 583 SARS-CoV-2 sequences, an HKY model with random effects exhibits strong signals of nonreversibility in the substitution process, and posterior predictive model checks clearly show it to be a more adequate model than a reversible model⁷.

Molecular clock models

Recently developed molecular clock models include a time-dependent evolutionary rate model that accommodates evolutionary rate variations through time¹⁰. Such a phenomenon is now widely recognized in various organisms, with particular prevalence in rapidly evolving viruses that have a relatively long-term transmission history in animal and human populations³². This novel molecular clock model builds upon phylogenetic epoch modeling³³ to specify a sequence of unique substitution processes throughout evolutionary history, one for each of M discretized time intervals in the epoch structure determined by boundaries at times T₀ < T₁ < … < T_M−1 < T_M, where T_M = ∞. In this structure, the boundaries T₁ to T_M−1 determine a shift in evolutionary rate that simultaneously applies to all lineages in the tree at that point in time. This model uncovers a strong time-dependent effect that implies rate variation over four orders of magnitude in both foamy virus co-speciation and lentivirus evolutionary histories. The model improves node height (that is, time to most recent common ancestor) estimation and readily integrates with Bayesian model selection through (log) marginal likelihood estimation, where the inclusioin of time dependence yields a better fit to the data compared with other molecular clock models¹⁰.

A novel continuous random-effects relaxed clock model offers an alternative parameterization to the standard uncorrelated relaxed clock model¹¹. In this model, the evolutionary rate r_i on branch i follows

$$\log {r}_{i}={\beta }_{0}+{\epsilon }_{i},$$

where β₀ is an unknown grand mean representing the background rate in log-space and ϵ_i are independent and normally distributed random variables with mean 0 and estimable variance. A standard approach in BEAST X across molecular clock models is to make use of a conditional reference prior³⁴ on the global tree-wise mean parameter $\exp ({\beta }_{0})$. This clock model leads to higher-dimensional parameter spaces than a simple strict clock model, but challenges in likelihood-based inference from these high-dimensional models have been addressed through applications in gradient-based optimization methods and HMC sampling (see below).

A more general mixed-effects relaxed clock model¹² that combines both fixed and random effects in the evolutionary rate is also available, with the evolutionary rate parameter r_i on branch i now expanding to

$$\log {r}_{i}={\beta }_{0}+\mathop{\sum }\limits_{j=1}^{p}{X}_{ij}{\beta }_{j}+{\epsilon }_{i},$$

where β_j is the estimated effect size of the jth covariate X_ij (out of p covariates). For example, modeling a clade-specific rate effect with coefficient β_j, one would set X_ij = 1 for all branches encompassed by the clade and X_ij = 0 for all other branches. This mixed-effects model has been used to confirm considerable rate variation among HIV-1 group M subtypes that cannot be adequately modeled by uncorrelated relaxed clock models, also yielding a time to the most recent common ancestor of HIV-1 group M that is earlier than the uncorrelated relaxed clock estimate for the same dataset.

Finally, the original RLC model has been reparameterized to tackle convergence and statistical mixing issues, and to achieve scalability to phylogenies with large numbers of taxa¹³. To this end, the novel shrinkage-based RLC assumes that clock rates are autocorrelated and that the incremental differences between each clock rate and its parental clock rate are equipped with a flexible, heavy-tailed, Bayesian bridge before shrink increments of change between branch-specific clocks, thereby enabling the use of a computationally efficient sampling approach to perform inference³⁵. HMC sampling is used to generate proposals in increment space, using preconditioning to improve HMC performance by rescaling increment proposals, allowing larger steps to be taken in dimensions with larger variance. This novel shrinkage-based RLC has been successfully used in problems that once appeared computationally impractical, such as the study of a heritable clock structure of various surface glycoproteins of influenza A virus in the absence of prior knowledge about clock placement¹³.

HMC sampling

HMC constitutes a gradient-based alternative to random-walk MCMC for efficient parameter inference, yielding markedly improved parameter estimation efficiency. HMC transition kernels leverage gradients to produce distant proposals with relatively high acceptance rates for the Metropolis–Hastings–Green algorithm by exploiting numerical solutions to Hamiltonian dynamics. For observed sequence data Y and estimable model parameters ${\mathbf{\uptheta}} = ( \theta_1, \ldots, \theta_k )$, this requires computing a number of derivatives of the observed sequence data likelihood ${\mathbb{P}}\left({\bf{Y}}| {\mathbf{\uptheta }}\right)$ on top of already calculating ${\mathbb{P}}\left({\bf{Y}}| {\mathbf{\uptheta }}\right)$, which can be computationally demanding by itself. The gradient $\nabla {\mathbb{P}}\left({\bf{Y}}| {\mathbf{\uptheta }}\right)$ is the collection of derivatives with respect to all estimable model parameters

$$\nabla {\mathbb{P}}\left({\bf{Y}}| {\mathbf{\uptheta }}\right)={\left(\frac{\partial }{\partial {\theta }_{1}}{\mathbb{P}}\left({\bf{Y}}| {\mathbf{\uptheta }}\right),\ldots ,\frac{\partial }{\partial {\theta }_{k}}{\mathbb{P}}\left({\bf{Y}}| {\mathbf{\uptheta }}\right)\right)}^{{\prime} },$$

where the prime symbol denotes the transpose operator. As with computing ${\mathbb{P}}\left({\bf{Y}}| {\mathbf{\uptheta }}\right)$, a pruning algorithm can be used to simplify calculating a single entry in $\nabla {\mathbb{P}}\left({\bf{Y}}| {\mathbf{\uptheta }}\right)$ through postorder traversal, but the ${\mathcal{O}}(NK)$ computational demands across all entries for HMC remain much higher than for standard transition kernels when K → N as for many clock models. That said, a linear-time algorithm for ${\mathcal{O}}(N)$-dimensional gradient evaluation by complementing the postorder traversal with its corresponding preorder traversal renders these computations feasible on central processing units (CPUs)¹¹. Additional development of novel massively parallel algorithms enables taking advantage of graphics processing units (GPUs) to obtain further speedups over the CPU implementation³⁶.

We have developed and implemented into BEAST X a wide range of HMC transition kernels that have led to drastic improvements in parameter estimation efficiency (see the main text for results)^{7,9,11,13,24,26,27}. Given the increased amount of sequence data and associated metadata to be analyzed in Bayesian phylodynamic inference, HMC transition kernels are essential building blocks that enable complex Bayesian phylodynamic analyses in a reasonable amount of time. This is illustrated in the following section’s practical example, where the combination of a large number (11,351) of taxa from 256 discrete geographic locations (the upper limit in the current computational architecture and implementation in BEAGLE³¹) would be completely infeasible to analyze without the help of HMC.

Application to SARS-CoV-2

Figure 1 illustrates a variety of advances in BEAST X modeling and inference strategies as applied to the phylogeographic and phylodynamic analysis of SARS-CoV-2. Specifically, it focuses on phylodynamic analyses of the Omicron BA.1 invasion in England². Figure 1a reports effect size estimates for covariates in a GLM extension of discrete phylogeographic inference for the largest BA.1 transmission lineage identified by Tsui et al.² (11,351 genomes). The phylogeographic inference involved two epochs³³ to estimate separate covariate effect sizes in the expansion and postexpansion phase, showing, for example, differences in dispersal out of Greater London and in contributions of mobility. The epoch-GLM discrete diffusion model was fit to a set of trees estimated using the Thorney BEAST approach³⁷, inferring dispersal among 256 lower-tier local authorities (LTLAs) in England. HMC inference was required to fit this high-dimensional diffusion model while integrating over some degree of missing data in the covariates, and the computation benefits tremendously from using GPUs^31,36. In addition to effect size estimates illustrated here, Tsui et al.² introduce a new estimate of relative predictor importance-based deviance measures. Figure 1b summarizes Markov jump estimates³⁸ between the LTLAs during the expansion phase, showing dispersal from Greater London LTLAs (in blue) as well as from other LTLAs (in yellow). Figure 1c,d illustrates spatiotemporal projections of a maximum clade credibility summary of a continuous phylogeographic inference of the same large BA.1 transmission lineage, including a mapping of the first half (Fig. 1c) and the complete expansion phase (Fig. 1d). This visualization was achieved using spread.gl³⁰, a high-performance browser application that uses the kepler.gl framework to accommodate large-scale data. Fitting the RRW model in this analysis required HMC inference to efficiently integrate over the branch-specific rates of diffusion²². Using standard Metropolis–Hastings kernels on the RRW branch rates does not complete MCMC burn-in in the same compute-time it takes HMC to deliver ESSs >100 across all 22,700 rate parameters from the converged posterior distribution. Figure 1e,f represents estimates of transmission dynamics using several tree generative priors. We compare doubling-time estimates for a subset of genomes representative for the expansion phase of the largest BA.1 transmission lineage (n = 1,000) to a genomic dataset representative for expansion of the Alpha variant in England (n = 976)³⁹. This highlights an Omicron BA.1 doubling time that is about 3.5 times smaller compared with Alpha. We include estimates of effective population size (Ne) through time using a nonparametric coalescent model inferred from the set of empirical trees with 11,351 tips, showing a roughly linear increase in log Ne in the expansion phase as opposed to roughly contant log Ne in the postexpansion phase. Finally, we provide comparative estimates of the effective reproduction number (Re) through time using an episodic birth–death sampling model fit to the same set of empirical trees⁹, showing a considerable decrease in Re after implementation of measures against the spread of Omicron in the UK.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All data required to perform the analyses in Fig. 1 are available as XML input files for BEAST X via GitHub at https://github.com/beast-dev/beast-mcmc/tree/master/examples/BEASTXRelease.

Code availability

By naming this major release BEAST X v10.5.0, we delineate our platform from the related but independent BEAST 2 project⁴⁰. The jump from BEAST v1.10.5 to BEAST X v10.5.0 facilitates best-practice semantic versioning and clarifies continued active development. BEAST X is open-source under the GNU lesser general public license and available via GitHub at https://beast-dev.github.io/beast-mcmc for cross-platform compiled programs and https://github.com/beast-dev/beast-mcmc for software development and source code. It requires Java version 1.8 or greater and the platform also depends on the BEAGLE high-performance phylogenetic likelihood library version ≥4.0. BEAST X interfaces with recent releases of the BEAGLE high-performance phylogenetic likelihood library³¹ to off-load computationally expensive calculations to multicore CPUs and GPUs on laptop, desktop and cluster devices. These calculations include for the first time the preorder traversals³⁶ necessary for linear-time evaluation of high-dimensional gradients of the sequence and trait data likelihoods. Documentation, tutorials and help are available at http://beast.community, and many users actively discuss BEAST usage and development in the ‘beast-users’ GoogleGroup discussion group (http://groups.google.com/group/beast-users).

References

Dudas, G. et al. Virus genomes reveal factors that spread and sustained the Ebola epidemic. Nature 544, 309–315 (2017).
Article CAS PubMed PubMed Central Google Scholar
Tsui, J. L.-H. et al. Genomic assessment of invasion dynamics of SARS-CoV-2 Omicron BA.1. Science 381, 336–343 (2023).
Article CAS PubMed PubMed Central Google Scholar
O’Toole, Á. et al. APOBEC3 deaminase editing in mpox virus as evidence for sustained human transmission since at least 2016. Science 382, 595–600 (2023).
Article PubMed PubMed Central Google Scholar
Suchard, M. A. et al. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 4, vey016 (2018).
Article PubMed PubMed Central Google Scholar
Baele, G., Gill, M. S., Bastide, P., Lemey, P. & Suchard, M. A. Markov-modulated continuous-time Markov chains to identify site-and branch-specific evolutionary variation in BEAST. Syst. Biol. 70, 181—189 (2021).
Article PubMed Google Scholar
Pekar, J. E. et al. The molecular epidemiology of multiple zoonotic origins of SARS-CoV-2. Science 377, 960–966 (2022).
Article CAS PubMed Google Scholar
Magee, A. F. et al. Random-effects substitution models for phylogenetics via scalable gradient approximations. Syst. Biol. 73, 562–578 (2024).
Article PubMed PubMed Central Google Scholar
Karcher, M. D., Carvalho, L. M., Suchard, M. A., Dudas, G. & Minin, V. N. Estimating effective population size changes from preferentially sampled genetic sequences. PLOS Comput. Biol. 16, 1–22 (2020).
Article Google Scholar
Shao, Y., Magee, A. F., Vasylyeva, T. I. & Suchard, M. A. Scalable gradients enable Hamiltonian Monte Carlo sampling for phylodynamic inference under episodic birth–death-sampling models. PLOS Comput. Biol. 20, 1–23 (2024).
Article Google Scholar
Membrebe, J. V., Suchard, M. A., Rambaut, A., Baele, G. & Lemey, P. Bayesian inference of evolutionary histories under time-dependent substitution rates. Mol. Biol. Evol. 36, 1793–1803 (2019).
Article CAS PubMed PubMed Central Google Scholar
Ji, X. et al. Gradients do grow on trees: a linear-time O(N)-dimensional gradient for statistical phylogenetics. Mol. Biol. Evol. 37, 3047—3060 (2020).
Article PubMed PubMed Central Google Scholar
Bletsa, M. et al. Divergence dating using mixed effects clock modelling: an application to HIV-1. Virus Evol. 5, vez036 (2019).
Article PubMed PubMed Central Google Scholar
Fisher, A. A. et al. Shrinkage-based random local clocks with scalable inference. Mol. Biol. Evol. 40, msad242 (2023).
Article CAS PubMed PubMed Central Google Scholar
Lemey, P., Rambaut, A., Drummond, A. J. & Suchard, M. A. Bayesian phylogeography finds its roots. PLoS Comp. Biol. 5, e1000520 (2009).
Article Google Scholar
De Maio, N., Wu, C.-H., O’Reilly, K. M. & Wilson, D. New routes to phylogeography: a Bayesian structured coalescent approximation. PLoS Genet. 11, 1–22 (2015).
Google Scholar
Layan, M. et al. Impact and mitigation of sampling bias to determine viral spread: evaluating discrete phylogeography through CTMC modeling and structured coalescent model approximations. Virus Evol. 9, vead010 (2023).
Article PubMed PubMed Central Google Scholar
Lemey, P. et al. Accommodating individual travel history and unsampled diversity in Bayesian phylogeographic inference of SARS-CoV-2. Nat. Commun. 11, 5110 (2020).
Lemey, P. et al. Unifying viral genetics and human transportation data to predict the global transmission dynamics of human influenza H3N2. PLoS Pathog. 10, e1003932 (2014).
Article PubMed PubMed Central Google Scholar
Lemey, P., Rambaut, A., Welch, J. & Suchard, M. Phylogeography takes a relaxed random walk in continuous space and time. Mol. Biol. Evol. 27, 1877–1885 (2010).
Article CAS PubMed PubMed Central Google Scholar
Dellicour, S. et al. Incorporating heterogeneous sampling probabilities in continuous phylogeographic inference—application to H5N1 spread in the Mekong region. Bioinformatics 36, 2098–2104 (2019).
Article PubMed Central Google Scholar
Dellicour, S., Lemey, P., Suchard, M. A., Gilbert, M. & Baele, G. Accommodating sampling location uncertainty in continuous phylogeography. Virus Evol. 8, veac041 (2022).
Article PubMed PubMed Central Google Scholar
Fisher, A. A., Ji, X., Zhang, Z., Lemey, P. & Suchard, M. A. Relaxed random walks at scale. Syst. Biol. 70, 258–267 (2021).
Article PubMed Google Scholar
Bastide, P., Ho, L. S. T., Baele, G., Lemey, P. & Suchard, M. A. Efficient Bayesian inference of general Gaussian models on large phylogenetic trees. Ann. Appl. Stat. 15, 971 – 997 (2021).
Article Google Scholar
Hassler, G. W. et al. Inferring phenotypic trait evolution on large trees with many incomplete measurements. J. Am. Stat. Assoc. 117, 678–692 (2022).
Article CAS PubMed Google Scholar
Hassler, G. W. et al. Principled, practical, flexible, fast: a new approach to phylogenetic factor analysis. Methods Ecol. Evol. 13, 2181–2197 (2022).
Article PubMed PubMed Central Google Scholar
Zhang, Z. et al. Accelerating Bayesian inference of dependency between mixed-type biological traits. PLOS Comput. Biol. 19, 1–22 (2023).
Article CAS Google Scholar
Ji, X. et al. Scalable Bayesian divergence time estimation with ratio transformations. Syst. Biol. 72, 1136–1153 (2023).
Article PubMed PubMed Central Google Scholar
Gill, M. S. et al. Improving Bayesian population dynamics inference: a coalescent-based model for multiple loci. Mol. Biol. Evol. 30, 713–724 (2013).
Article CAS PubMed Google Scholar
Baele, G., Gill, M. S., Lemey, P. & Suchard, M. A. Hamiltonian Monte Carlo sampling to estimate past population dynamics using the skygrid coalescent model in a Bayesian phylogenetics framework. Wellcome Open Res. 53, 53 (2020).
Article Google Scholar
Li, Y. et al. spread.gl—visualising pathogen dispersal in a high-performance browser application. Bioinformatics 40, btae721 (2024).
Article CAS PubMed PubMed Central Google Scholar
Ayres, D. L. et al. BEAGLE 3.0: improved performance, scaling, and usability for a high-performance computing library for statistical phylogenetics. Syst. Biol. 68, 1052–1061 (2019).
Article PubMed PubMed Central Google Scholar
Duchêne, S., Holmes, E. C. & Ho, S. Y. Analyses of evolutionary dynamics in viruses are hindered by a time-dependent bias in rate estimates. Proc. R. Soc. B 281, 20140732 (2014).
Article PubMed PubMed Central Google Scholar
Bielejec, F., Lemey, P., Baele, G., Rambaut, A. & Suchard, M. A. Inferring heterogeneous evolutionary processes through time: from sequence substitution to phylogeography. Syst. Biol. 63, 493–504 (2014).
Article PubMed PubMed Central Google Scholar
Ferreira, M. A. R. & Suchard, M. A. Bayesian analysis of elapsed times in continuous-time Markov chains. Can. J. Stat. 36, 355–368 (2008).
Article Google Scholar
Polson, N. G., Scott, J. G. & Windle, J. The Bayesian bridge. J. R. Stat. Soc. Ser. B 76, 713–733 (2014).
Article Google Scholar
Gangavarapu, K. et al. Many-core algorithms for high-dimensional gradients on phylogenetic trees. Bioinformatics 40, btae030 (2024).
Article CAS PubMed PubMed Central Google Scholar
McCrone, J. et al. Context-specific emergence and growth of the SARS-CoV-2 Delta variant. Nature 610, 154–160 (2022).
Article CAS PubMed PubMed Central Google Scholar
Minin, V. N. & Suchard, M. A. Fast, accurate and simulation-free stochastic mapping. Philos. Trans. R. Soc. B 363, 3985–3995 (2008).
Article Google Scholar
Hill, V. et al. The origins and molecular evolution of SARS-CoV-2 lineage B.1.1.7 in the UK. Virus Evol. 8, veac080 (2022).
Article PubMed PubMed Central Google Scholar
Bouckaert, R. et al. BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS Comp. Biol. 10, e1003537 (2014).
Article Google Scholar

Download references

Acknowledgements

We thank the many developers and contributors to BEAST X, including A. Alekseyenko, D. Ayres, T. Bedford, F. Bielejec, E. Bloomquist, M. Karcher, G. Cybis, R. Forsberg, M. Gill, M. Hall, J. Heled, S. Hoehna, D. Kuehnert, W. Lok Sibon Li, G. Lunter, A. Magee, S. Markowitz, V. Minin, Á. O’Toole, J. Palacios, M. Defoin Platel, O. Pybus, B. Shapiro, K. Strimmer, M. Tolkoff, C.-H. Wu and W. Xie. We thank J. Tsui and M. Kraemer for sharing SARS-CoV-2 genomic data and metadata. We thank Y. Li for proving the spread.gl visualization for the SARS-CoV-2 Omicron BA.1 invasion in England. This work was supported in part by the European Union Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 725422-ReservoirDOCS and from the European Union’s Horizon 2020 project MOOD (grant agreement no. 874850). This work was supported in part by the Wellcome Trust through project 206298/Z/17/Z (ARTIC network). X.J. acknowledges support through Louisiana Board of Regents Research Competitiveness Subprogram, NSF grant DEB1754142 and NIH grants R01 GM072562 and R01 AI153044. A.J.H. acknowledges support through NIH grant K25 AI153816 and NSF grants DMS 2152774 and DMS 2236854. M.A.S. acknowledges support through NIH grants R01 HG006139, U19 AI135995, R01 AI153044 and R01 AI162611. P.L. acknowledges support by the Research Foundation – Flanders (‘Fonds voor Wetenschappelijk Onderzoek – Vlaanderen’, G005323N, G0D5117N and G051322N). A.R. acknowledges support from the Bill & Melinda Gates Foundation through grant OPP1175094, Pangea-II. G.B. acknowledges support from the Research Foundation – Flanders (‘Fonds voor Wetenschappelijk Onderzoek – Vlaanderen’, G098321N and G0E1420N), from the European Union Horizon 2023 RIA project LEAPS (grant agreement no. 101094685) and from the DURABLE EU4Health project 02/2023-01/2027, which is co-funded by the European Union (call EU4H-2021-PJ4) under grant agreement no. 101102733. G.W.H. acknowledges support through NIH grants T32 HG002536 and F31 AI154824. We gratefully acknowledge support from NVIDIA Corporation and Advanced Micro Devices, Inc., with the donation of parallel computing resources used for this research.

Author information

Authors and Affiliations

Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
Guy Baele & Philippe Lemey
Department of Mathematics, School of Science and Engineering, Tulane University, New Orleans, LA, USA
Xiang Ji
Statistics Group, RAND Corporation, Santa Monica, CA, USA
Gabriel W. Hassler
Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA, USA
John T. McCrone
Department of Biostatistics, Fielding School of Public Health, University of California, Los Angeles, Los Angeles, CA, USA
Yucai Shao, Zhenyu Zhang, Andrew J. Holbrook & Marc A. Suchard
School of Biological Sciences, University of Auckland, Auckland, New Zealand
Alexei J. Drummond
Centre for Computational Evolution, University of Auckland, Auckland, New Zealand
Alexei J. Drummond
Institute of Ecology and Evolution, University of Edinburgh, Edinburgh, UK
Andrew Rambaut
Department of Biomathematics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
Marc A. Suchard
Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
Marc A. Suchard

Authors

Guy Baele
View author publications
Search author on:PubMed Google Scholar
Xiang Ji
View author publications
Search author on:PubMed Google Scholar
Gabriel W. Hassler
View author publications
Search author on:PubMed Google Scholar
John T. McCrone
View author publications
Search author on:PubMed Google Scholar
Yucai Shao
View author publications
Search author on:PubMed Google Scholar
Zhenyu Zhang
View author publications
Search author on:PubMed Google Scholar
Andrew J. Holbrook
View author publications
Search author on:PubMed Google Scholar
Philippe Lemey
View author publications
Search author on:PubMed Google Scholar
Alexei J. Drummond
View author publications
Search author on:PubMed Google Scholar
Andrew Rambaut
View author publications
Search author on:PubMed Google Scholar
Marc A. Suchard
View author publications
Search author on:PubMed Google Scholar

Contributions

All authors contributed to the methodological developments described. P.L. and Y.S. performed the analysis of the SARS-CoV-2 dataset. G.B., X.J., A.J.H., P.L., G.W.H., A.R. and M.A.S. wrote the first draft of the paper. J.T.M., Z.Z. and A.J.D. provided edits to the paper. All authors approved of the final version.

Corresponding authors

Correspondence to Andrew Rambaut or Marc A. Suchard.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Liang Liu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Lin Tang, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Baele, G., Ji, X., Hassler, G.W. et al. BEAST X for Bayesian phylogenetic, phylogeographic and phylodynamic inference. Nat Methods 22, 1653–1656 (2025). https://doi.org/10.1038/s41592-025-02751-x

Download citation

Received: 23 July 2024
Accepted: 13 June 2025
Published: 07 July 2025
Issue date: August 2025
DOI: https://doi.org/10.1038/s41592-025-02751-x