Nature Communications
Scaling up Bayesian population phylogenomics through virtual dimension reduction
  • Article
  • Open access
  • Published: 08 April 2026

  • Tomáš Flouri (affiliation 1; contributed equally), ORCID: orcid.org/0000-0002-8474-9507,
  • Xiyun Jiao (affiliation 2; contributed equally), ORCID: orcid.org/0009-0006-3924-985X,
  • Jun Huang (affiliation 3), ORCID: orcid.org/0000-0002-4196-9729,
  • Bruce Rannala (affiliation 4), ORCID: orcid.org/0000-0002-8355-9955 &
  • Ziheng Yang (affiliation 1), ORCID: orcid.org/0000-0003-3351-7981

Nature Communications (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Computational models
  • Phylogenetics

Abstract

Population phylogenomics uses sampled genomes to jointly infer population genetic processes (ancestral and contemporary population sizes, historical gene flow) and a phylogenetic tree relating species or populations, including species split times. This challenging problem has been tackled most successfully in the Bayesian framework under the multispecies coalescent (MSC) model via Markov chain Monte Carlo (MCMC) computational algorithms. However, MCMC methods suffer from two serious problems: (i) mixing difficulties due to the high-dimensional state space with complex constraints, and (ii) the intrinsically serial nature of MCMC algorithms that defies parallelisation. To deal with both issues, we develop a new method, called Virtual Dimension Reduction allowing Parallelisation (VDRoP), that achieves the same MCMC mixing efficiency as dimension reduction through analytical integration of parameters, but without sacrificing parallel computation and without the restriction to conjugate priors. We implement the new method in the Bayesian program BPP and apply it to genomic datasets from Adansonia baobab trees, Anopheles mosquitoes, and Heliconius butterflies. The new algorithms reduce the run-time of MCMC analyses by 3- to 8-fold and improve the mixing efficiency by up to 50-fold for representative empirical datasets.
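To make "dimension reduction through analytical integration of parameters" concrete, the following is a minimal sketch of the conventional conjugate-prior approach that the abstract contrasts with VDRoP. It is not the VDRoP algorithm, whose details are not given in this section, and it is not BPP code. It assumes a single population in which each pair of lineages coalesces at rate 2/θ, so that a gene tree with k coalescent events has density (2/θ)^k exp(-2T/θ), where T is the sum over intervals of C(j,2) times the waiting time with j lineages. It further assumes an inverse-gamma prior θ ~ InvGamma(alpha, beta). All function names and numerical values are hypothetical and chosen only for illustration.

```python
import math

def log_marginal_analytic(k, T, alpha, beta):
    """log of the integral of f(G|theta) p(theta) over theta, in closed form.

    Assumes f(G|theta) = (2/theta)^k * exp(-2T/theta) and
    theta ~ InvGamma(alpha, beta); theta is integrated out analytically.
    """
    return (k * math.log(2.0)
            + alpha * math.log(beta) - math.lgamma(alpha)
            + math.lgamma(alpha + k)
            - (alpha + k) * math.log(beta + 2.0 * T))

def log_joint(theta, k, T, alpha, beta):
    """log f(G|theta) + log p(theta) for a given value of theta."""
    log_lik = k * math.log(2.0 / theta) - 2.0 * T / theta
    log_prior = (alpha * math.log(beta) - math.lgamma(alpha)
                 - (alpha + 1.0) * math.log(theta) - beta / theta)
    return log_lik + log_prior

def log_marginal_numeric(k, T, alpha, beta, lo=1e-5, hi=10.0, n=20000):
    """Numerical check: trapezoid rule in u = log(theta)."""
    a, b = math.log(lo), math.log(hi)
    h = (b - a) / n
    total = 0.0
    for i in range(n + 1):
        u = a + i * h
        weight = 0.5 if i in (0, n) else 1.0
        # The extra "+ u" is the Jacobian of the change of variable theta = exp(u).
        total += weight * math.exp(log_joint(math.exp(u), k, T, alpha, beta) + u)
    return math.log(total * h)

# Hypothetical values: 3 coalescent events, T = 0.01, prior InvGamma(3, 0.004).
analytic = log_marginal_analytic(3, 0.01, 3.0, 0.004)
numeric = log_marginal_numeric(3, 0.01, 3.0, 0.004)
print(analytic, numeric)
```

With θ integrated out, an MCMC sampler under this assumed model no longer needs to propose θ at all, which is the sense in which the state space is reduced. The closed form depends on the prior being conjugate, which is the restriction the abstract says VDRoP avoids.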

Data availability

The three empirical datasets analysed in this study and BPP scripts are available at https://figshare.com/articles/dataset/bppMigration-algorithms-data_tgz/30032800. Accession codes for sequences in the baobabs dataset are in table S6 (ref. 40). For the Anopheles dataset, 12 genomes (two per species) were from the assemblies of refs. 57 and 47; GenBank accession numbers and NCBI BioSample numbers are in table S7 (refs. 7, 48). The BioSample accession numbers for the three Heliconius genomes are SAMN11398304, SAMN11398291, and SAMN11398301 (ref. 49; see table S1 in ref. 50).

Code availability

The VDRoP algorithms are implemented in the BPP software, available at https://github.com/bpp/bpp.

References

  1. Rannala, B. & Yang, Z. Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics 164, 1645–1656 (2003).

  2. Jiao, X., Flouri, T. & Yang, Z. Multispecies coalescent and its applications to infer species phylogenies and cross-species gene flow. Nat. Sci. Rev. 8, nwab127 (2021).

  3. Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A. & Teller, E. Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 1087–1092 (1953).

  4. Hastings, W. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109 (1970).

  5. Hey, J. et al. Phylogeny estimation by integration over isolation with migration models. Mol. Biol. Evol. 35, 2805–2818 (2018).

  6. Douglas, J., Jimenez-Silva, C. L. & Bouckaert, R. StarBeast3: Adaptive parallelised Bayesian inference under the multispecies coalescent. Syst. Biol. (2022).

  7. Flouri, T., Jiao, X., Huang, J., Rannala, B. & Yang, Z. Efficient Bayesian inference under the multispecies coalescent with migration. Proc. Natl. Acad. Sci. USA. 120, e2310708120 (2023).

  8. Huang, J., Flouri, T. & Yang, Z. A simulation study to examine the information content in phylogenomic datasets under the multispecies coalescent model. Mol. Biol. Evol. 37, 3211–3224 (2020).

  9. Thawornwattana, Y., Flouris, T., Mallet, J. & Yang, Z. Inference of gene flow between species from genomic data when the mode, direction and lineages are misspecified. Mol. Biol. Evol. 42, 1–18 (2025).

  10. Thawornwattana, Y., Rannala, B. & Yang, Z. On the robustness of Bayesian inference of gene flow to intragenic recombination and natural selection. Mol. Biol. Evol. 43, 1–22 (2026).

  11. Hey, J. & Nielsen, R. Integration within the Felsenstein equation for improved Markov chain Monte Carlo methods in population genetics. Proc. Natl. Acad. Sci. USA. 104, 2785–2790 (2007).

  12. Jones, G. R. Divergence estimation in the presence of incomplete lineage sorting and migration. Syst. Biol. 68, 19–31 (2019).

  13. Wen, D. & Nakhleh, L. Coestimating reticulate phylogenies and gene trees from multilocus sequence data. Syst. Biol. 67, 439–457 (2018).

  14. Zhang, C., Ogilvie, H. A., Drummond, A. J. & Stadler, T. Bayesian inference of species networks from multilocus sequence data. Mol. Biol. Evol. 35, 504–517 (2018).

  15. Flouri, T., Jiao, X., Rannala, B. & Yang, Z. A Bayesian implementation of the multispecies coalescent model with introgression for phylogenomic analysis. Mol. Biol. Evol. 37, 1211–1223 (2020).

  16. Gronau, I., Hubisz, M. J., Gulko, B., Danko, C. G. & Siepel, A. Bayesian inference of ancient human demography from individual genome sequences. Nature Genet. 43, 1031–1034 (2011).

  17. Thawornwattana, Y., Seixas, F. A., Mallet, J. & Yang, Z. Full-likelihood genomic analysis clarifies a complex history of species divergence and introgression: the example of the erato-sara group of Heliconius butterflies. Syst. Biol. 71, 1159–1177 (2022).

  18. Thawornwattana, Y., Seixas, F. A., Yang, Z. & Mallet, J. Major patterns in the introgression history of Heliconius butterflies. eLife 12, RP90656 (2023).

  19. Santos, S. H. D. et al. Massive inter-species introgression overwhelms phylogenomic relationships among jaguar, lion, and leopard. Syst. Biol. 74, 583–599 (2025).

  20. Flouri, T., Jiao, X., Rannala, B. & Yang, Z. Species tree inference with BPP using genomic sequences and the multispecies coalescent. Mol. Biol. Evol. 35, 2585–2593 (2018).

  21. Notohara, M. The coalescent and the genealogical process in geographically structured populations. J. Math. Biol. 29, 59–75 (1990).

  22. Beerli, P. & Felsenstein, J. Maximum-likelihood estimation of migration rates and effective population numbers in two populations using a coalescent approach. Genetics 152, 763–773 (1999).

  23. Beerli, P. & Felsenstein, J. Maximum likelihood estimation of a migration matrix and effective population sizes in n subpopulations by using a coalescent approach. Proc. Natl. Acad. Sci. USA. 98, 4563–4568 (2001).

  24. Nielsen, R. & Wakeley, J. Distinguishing migration from isolation: a Markov chain Monte Carlo approach. Genetics 158, 885–896 (2001).

  25. Zhu, T., Flouri, T. & Yang, Z. A simulation study to examine the impact of recombination on phylogenomic inferences under the multispecies coalescent model. Mol. Ecol. 31, 2814–2829 (2022).

  26. Yan, Z., Ogilvie, H. A. & Nakhleh, L. Comparing inference under the multispecies coalescent with and without recombination. Mol. Phylogenet. Evol. 181, 107724 (2023).

  27. Felsenstein, J. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17, 368–376 (1981).

  28. Yang, Z. Molecular Evolution: A Statistical Approach (Oxford University Press, Oxford, England, 2014).

  29. Jones, G. Algorithmic improvements to species delimitation and phylogeny estimation under the multispecies coalescent. J. Math. Biol. 74, 447–467 (2017).

  30. Ripley, B. Stochastic Simulation (Wiley, New York, 1987).

  31. Rannala, B. & Yang, Z. Improved reversible jump algorithms for Bayesian species delimitation. Genetics 194, 245–253 (2013).

  32. Peskun, P. Optimum Monte-Carlo sampling using Markov chains. Biometrika 60, 607–612 (1973).

  33. Green, P. J. & Han, X. L. in Metropolis methods, Gaussian proposals and antithetic variables (eds Barone, P., Frigessi, A. & Piccioni, M.) Stochastic Models, Statistical Methods and Algorithms in Image Analysis 142–164 (Springer, New York, 1992).

  34. Gelman, A., Roberts, G. & Gilks, W. in Efficient Metropolis jumping rules (eds Bernardo, J., Berger, J., Dawid, A. & Smith, A.) Bayesian Statistics 5, Vol. 5 599–607 (Oxford University Press, Oxford, 1996).

  35. Yang, Z. & Rodríguez, C. E. Searching for efficient Markov chain Monte Carlo proposal kernels. Proc. Natl. Acad. Sci. USA. 110, 19307–19312 (2013).

  36. Robert, C. P. & Roberts, G. Rao-blackwellisation in the Markov chain Monte Carlo era. Int. Statist. Rev. 89, 237–249 (2021).

  37. Liu, J. S., Wong, W. H. & Kong, A. Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and augmentation schemes. Biometrika 81, 27–40 (1994).

  38. Geyer, C. J. Conditioning in Markov chain Monte Carlo. J. Comput. Graph. Statist. 4, 148–154 (1995).

  39. Thawornwattana, Y., Dalquen, D. & Yang, Z. Designing simple and efficient Markov chain Monte Carlo proposal kernels. Bayesian Analysis 13, 1033–1059 (2018).

  40. Karimi, N. et al. Reticulate evolution helps explain apparent homoplasy in floral biology and pollination in Baobabs (Adansonia; Bombacoideae; Malvaceae). Syst. Biol. 69, 462–478 (2020).

  41. Wan, J. N. et al. The rise of baobab trees in Madagascar. Nature 629, 1091–1099 (2024).

  42. Yang, Z. & Rannala, B. Unguided species delimitation using DNA sequence data from multiple loci. Mol. Biol. Evol. 31, 3125–3135 (2014).

  43. Rannala, B. & Yang, Z. Efficient Bayesian species tree inference under the multispecies coalescent. Syst. Biol. 66, 823–842 (2017).

  44. Baum, D. A. The comparative pollination and floral biology of baobabs (Adansonia-Bombacaceae). Ann. Missouri Bot. Gard. 82, 322–348 (1995).

  45. Patterson, N. et al. Ancient admixture in human history. Genetics 192, 1065–1093 (2012).

  46. Solis-Lemus, C. & Ane, C. Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting. PLoS Genet. 12, e1005896 (2016).

  47. Fontaine, M. C. et al. Extensive introgression in a malaria vector species complex revealed by phylogenomics. Science 347, 1258524 (2015).

  48. Thawornwattana, Y., Dalquen, D. & Yang, Z. Coalescent analysis of phylogenomic data confidently resolves the species relationships in the Anopheles gambiae species complex. Mol. Biol. Evol. 35, 2512–2527 (2018).

  49. Edelman, N. B. et al. Genomic architecture and introgression shape a butterfly radiation. Science 366, 594–599 (2019).

  50. Thawornwattana, Y., Huang, J., Flouris, T., Mallet, J. & Yang, Z. Inferring the direction of introgression using genomic sequence data. Mol. Biol. Evol. 40, msad178 (2023).

  51. Beerli, P. Comparison of Bayesian and maximum-likelihood inference of population genetic parameters. Bioinformatics 22, 341–345 (2006).

  52. Ji, J., Jackson, D. J., Leache, A. D. & Yang, Z. Power of Bayesian and heuristic tests to detect cross-species introgression with reference to gene flow in the Tamias quadrivittatus group of North American chipmunks. Syst. Biol. 72, 446–465 (2023).

  53. Cheng, S., Flouris, T., Zhu, T. & Yang, Z. The impact of taxon sampling on inference of gene flow by summary and Bayesian methods using genomic sequence data. Syst. Biol. 75, https://doi.org/10.1093/sysbio/syag023 (2026).

  54. Carvalho-Sobrinho, J. G. et al. Revisiting the phylogeny of Bombacoideae (Malvaceae): Novel relationships, morphologically cohesive clades, and a new tribal classification based on multilocus phylogenetic analyses. Mol. Phylogenet. Evol. 101, 56–74 (2016).

  55. Yang, Z. The BPP program for species tree estimation and species delimitation. Curr. Zool. 61, 854–865 (2015).

  56. Huang, J., Bennett, J., Flouri, T. & Yang, Z. Phase resolution of heterozygous sites in diploid genomes is important to phylogenomic analysis under the multispecies coalescent model. Syst. Biol. 71, 334–352 (2022).

  57. Neafsey, D. E. et al. Mosquito genomics. Highly evolvable malaria vectors: the genomes of 16 Anopheles mosquitoes. Science 347, 1258522 (2015).

  58. Yang, Z. & Rannala, B. Bayesian species delimitation using multilocus sequence data. Proc. Natl. Acad. Sci. USA. 107, 9264–9269 (2010).

  59. Bull, V. et al. Polyphyly and gene flow between non-sibling Heliconius species. BMC Biol. 4, 11 (2006).

Acknowledgements

We are grateful to Dr Nisa Karimi for preparing the HapHunt baobab dataset to include multiple samples per species. This study has been supported by Biotechnology and Biological Sciences Research Council grants (BB/T003502/1 and BB/X007553/1), a Natural Environment Research Council grant (NSFDEB-NERC NE/X002071/1) to Z.Y., a Natural Science Foundation of China (NSFC) grant (12101295), a Guangdong Natural Science Foundation grant (2022A1515011767), and a Shenzhen Training Project of Excellent Scientific & Technological Talents grant (RCYX20221008093033012) to X.J., an NSFC grant (32200490) and a grant from Fundamental Research Funds for Beijing Municipal Universities (XJJS202523) to J.H., and an NIH Grant (GM123306) to B.R. The study has also been supported by a Swiss National Science Foundation scientific visit grant (IZSEZ0_232434/1) to Z.Y. and Prof. Maria Anisimova.

Author information

Author notes
  1. These authors contributed equally: Tomáš Flouri, Xiyun Jiao.

Authors and Affiliations

  1. Department of Genetics, Evolution, and Environment, University College London, Gower Street, London, UK

    Tomáš Flouri & Ziheng Yang

  2. Department of Statistics and Data Science, China Southern University of Science and Technology, Shenzhen, Guangdong, China

    Xiyun Jiao

  3. School of Biomedical Engineering, Capital Medical University, Beijing, China

    Jun Huang

  4. Department of Evolution and Ecology, University of California, Davis, CA, USA

    Bruce Rannala

Contributions

T.F., X.J., B.R., and Z.Y. designed and tested the BPP algorithms and prepared the documentation. J.H. analysed the empirical datasets. Z.Y. and B.R. supervised the research. All authors interpreted data and edited the manuscript.

Corresponding authors

Correspondence to Tomáš Flouri, Xiyun Jiao, Bruce Rannala or Ziheng Yang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Guy Baele, Fredrik Ronquist and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Transparent Peer Review file

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Flouri, T., Jiao, X., Huang, J. et al. Scaling up Bayesian population phylogenomics through virtual dimension reduction. Nat Commun (2026). https://doi.org/10.1038/s41467-026-71057-z

  • Received: 30 July 2025

  • Accepted: 10 March 2026

  • Published: 08 April 2026

  • DOI: https://doi.org/10.1038/s41467-026-71057-z
