Abstract
Population phylogenomics uses sampled genomes to jointly infer population genetic processes (ancestral and contemporary population sizes, historical gene flow) and the phylogenetic tree relating species or populations, including species split times. This challenging problem has been tackled most successfully in the Bayesian framework under the multispecies coalescent (MSC) model, using Markov chain Monte Carlo (MCMC) algorithms. However, MCMC methods suffer from two serious problems: (i) mixing difficulties caused by the high-dimensional state space with complex constraints, and (ii) the intrinsically serial nature of MCMC algorithms, which defies parallelisation. To deal with both issues, we develop a new method, called Virtual Dimension Reduction allowing Parallelisation (VDRoP), that achieves the same MCMC mixing efficiency as dimension reduction through analytical integration of parameters, but without sacrificing parallel computation and without being restricted to conjugate priors. We implement the new method in the Bayesian program BPP and apply it to genomic datasets from Adansonia baobab trees, Anopheles mosquitoes, and Heliconius butterflies. For representative empirical datasets, the new algorithms reduce the run-time of MCMC analyses 3- to 8-fold and improve mixing efficiency up to 50-fold.
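The dimension reduction that VDRoP is designed to match can be illustrated with the classical conjugate case. The Python sketch below is not the VDRoP algorithm, which is described here only at a high level. It assumes a simplified single-population setting without migration, in which the population size parameter θ has an inverse-gamma IG(a, b) prior and each pair of lineages coalesces at rate 2/θ. Under these assumptions θ can be integrated out analytically, so the MCMC no longer needs to sample it. All function names and toy values are hypothetical, and the numerical quadrature is included only to check the closed form.

```python
import math


def log_coalescent_density(theta, m, T):
    """Log density of coalescent times in one population given theta.

    Assumption: each pair of lineages coalesces at rate 2/theta, so
    f(times | theta) = (2/theta)^m * exp(-2*T/theta), where m is the number
    of coalescent events and T = sum over intervals of C(j, 2) * t_j.
    """
    return m * math.log(2.0 / theta) - 2.0 * T / theta


def log_invgamma_pdf(theta, a, b):
    """Log density of the inverse-gamma prior IG(a, b) (shape a, scale b)."""
    return a * math.log(b) - math.lgamma(a) - (a + 1) * math.log(theta) - b / theta


def log_marginal_integrated(m, T, a, b):
    """Closed-form log of the integral over theta of prior * coalescent density.

    Conjugacy gives b^a / Gamma(a) * 2^m * Gamma(a + m) / (b + 2T)^(a + m).
    """
    return (a * math.log(b) - math.lgamma(a) + m * math.log(2.0)
            + math.lgamma(a + m) - (a + m) * math.log(b + 2.0 * T))


def posterior_theta_params(m, T, a, b):
    """The conditional posterior of theta is again inverse-gamma: IG(a + m, b + 2T)."""
    return a + m, b + 2.0 * T


def log_marginal_numeric(m, T, a, b, n=20000):
    """Trapezoid-rule check of the same integral, computed on u = log(theta)."""
    lo, hi = math.log(1e-8), math.log(10.0)
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        theta = math.exp(lo + i * h)
        weight = 0.5 if i in (0, n) else 1.0
        # d(theta) = theta * du, hence the extra factor of theta
        total += weight * theta * math.exp(
            log_invgamma_pdf(theta, a, b) + log_coalescent_density(theta, m, T))
    return math.log(total * h)


if __name__ == "__main__":
    m, T, a, b = 5, 0.01, 3.0, 0.004  # toy values, not from any real dataset
    print(log_marginal_integrated(m, T, a, b))
    print(log_marginal_numeric(m, T, a, b))
    print(posterior_theta_params(m, T, a, b))
```

Because the conditional posterior of θ stays inverse-gamma, this analytical integration works only with a conjugate prior, which is the restriction that VDRoP is said to remove.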
Data availability
The three empirical datasets analysed in this study and the BPP scripts are available at https://figshare.com/articles/dataset/bppMigration-algorithms-data_tgz/30032800. Accession codes for sequences in the baobab dataset are listed in table S6 (ref. 40). For the Anopheles dataset, 12 genomes (two per species) were taken from the assemblies of refs. 57 and 47; GenBank accession numbers and NCBI BioSample numbers are listed in table S7 (refs. 7,48). The BioSample accession numbers for the three Heliconius genomes are SAMN11398304, SAMN11398291, and SAMN11398301 (ref. 49; see table S1 in ref. 50).
Code availability
The VDRoP algorithms are implemented in the BPP software, available at https://github.com/bpp/bpp.
References
Rannala, B. & Yang, Z. Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics 164, 1645–1656 (2003).
Jiao, X., Flouri, T. & Yang, Z. Multispecies coalescent and its applications to infer species phylogenies and cross-species gene flow. Nat. Sci. Rev. 8, nwab127 (2021).
Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A. & Teller, E. Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 1087–1092 (1953).
Hastings, W. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109 (1970).
Hey, J. et al. Phylogeny estimation by integration over isolation with migration models. Mol. Biol. Evol. 35, 2805–2818 (2018).
Douglas, J., Jimenez-Silva, C. L. & Bouckaert, R. StarBeast3: Adaptive parallelised Bayesian inference under the multispecies coalescent. Syst. Biol. (2022).
Flouri, T., Jiao, X., Huang, J., Rannala, B. & Yang, Z. Efficient Bayesian inference under the multispecies coalescent with migration. Proc. Natl. Acad. Sci. USA. 120, e2310708120 (2023).
Huang, J., Flouri, T. & Yang, Z. A simulation study to examine the information content in phylogenomic datasets under the multispecies coalescent model. Mol. Biol. Evol. 37, 3211–3224 (2020).
Thawornwattana, Y., Flouri, T., Mallet, J. & Yang, Z. Inference of gene flow between species from genomic data when the mode, direction and lineages are misspecified. Mol. Biol. Evol. 42, 1–18 (2025).
Thawornwattana, Y., Rannala, B. & Yang, Z. On the robustness of Bayesian inference of gene flow to intragenic recombination and natural selection. Mol. Biol. Evol. 43, 1–22 (2026).
Hey, J. & Nielsen, R. Integration within the Felsenstein equation for improved Markov chain Monte Carlo methods in population genetics. Proc. Natl. Acad. Sci. USA. 104, 2785–2790 (2007).
Jones, G. R. Divergence estimation in the presence of incomplete lineage sorting and migration. Syst. Biol. 68, 19–31 (2019).
Wen, D. & Nakhleh, L. Coestimating reticulate phylogenies and gene trees from multilocus sequence data. Syst. Biol. 67, 439–457 (2018).
Zhang, C., Ogilvie, H. A., Drummond, A. J. & Stadler, T. Bayesian inference of species networks from multilocus sequence data. Mol. Biol. Evol. 35, 504–517 (2018).
Flouri, T., Jiao, X., Rannala, B. & Yang, Z. A Bayesian implementation of the multispecies coalescent model with introgression for phylogenomic analysis. Mol. Biol. Evol. 37, 1211–1223 (2020).
Gronau, I., Hubisz, M. J., Gulko, B., Danko, C. G. & Siepel, A. Bayesian inference of ancient human demography from individual genome sequences. Nat. Genet. 43, 1031–1034 (2011).
Thawornwattana, Y., Seixas, F. A., Mallet, J. & Yang, Z. Full-likelihood genomic analysis clarifies a complex history of species divergence and introgression: the example of the erato-sara group of Heliconius butterflies. Syst. Biol. 71, 1159–1177 (2022).
Thawornwattana, Y., Seixas, F. A., Yang, Z. & Mallet, J. Major patterns in the introgression history of Heliconius butterflies. eLife 12, RP90656 (2023).
Santos, S. H. D. et al. Massive inter-species introgression overwhelms phylogenomic relationships among jaguar, lion, and leopard. Syst. Biol. 74, 583–599 (2025).
Flouri, T., Jiao, X., Rannala, B. & Yang, Z. Species tree inference with BPP using genomic sequences and the multispecies coalescent. Mol. Biol. Evol. 35, 2585–2593 (2018).
Notohara, M. The coalescent and the genealogical process in geographically structured populations. J. Math. Biol. 29, 59–75 (1990).
Beerli, P. & Felsenstein, J. Maximum-likelihood estimation of migration rates and effective population numbers in two populations using a coalescent approach. Genetics 152, 763–773 (1999).
Beerli, P. & Felsenstein, J. Maximum likelihood estimation of a migration matrix and effective population sizes in n subpopulations by using a coalescent approach. Proc. Natl. Acad. Sci. USA. 98, 4563–4568 (2001).
Nielsen, R. & Wakeley, J. Distinguishing migration from isolation: a Markov chain Monte Carlo approach. Genetics 158, 885–896 (2001).
Zhu, T., Flouri, T. & Yang, Z. A simulation study to examine the impact of recombination on phylogenomic inferences under the multispecies coalescent model. Mol. Ecol. 31, 2814–2829 (2022).
Yan, Z., Ogilvie, H. A. & Nakhleh, L. Comparing inference under the multispecies coalescent with and without recombination. Mol. Phylogenet. Evol. 181, 107724 (2023).
Felsenstein, J. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17, 368–376 (1981).
Yang, Z. Molecular Evolution: A Statistical Approach (Oxford University Press, Oxford, England, 2014).
Jones, G. Algorithmic improvements to species delimitation and phylogeny estimation under the multispecies coalescent. J. Math. Biol. 74, 447–467 (2017).
Ripley, B. Stochastic Simulation (Wiley, New York, 1987).
Rannala, B. & Yang, Z. Improved reversible jump algorithms for Bayesian species delimitation. Genetics 194, 245–253 (2013).
Peskun, P. Optimum Monte-Carlo sampling using Markov chains. Biometrika 60, 607–612 (1973).
Green, P. J. & Han, X. L. Metropolis methods, Gaussian proposals and antithetic variables. in Stochastic Models, Statistical Methods and Algorithms in Image Analysis (eds Barone, P., Frigessi, A. & Piccioni, M.) 142–164 (Springer, New York, 1992).
Gelman, A., Roberts, G. & Gilks, W. Efficient Metropolis jumping rules. in Bayesian Statistics 5 (eds Bernardo, J., Berger, J., Dawid, A. & Smith, A.) 599–607 (Oxford University Press, Oxford, 1996).
Yang, Z. & Rodríguez, C. E. Searching for efficient Markov chain Monte Carlo proposal kernels. Proc. Natl. Acad. Sci. USA. 110, 19307–19312 (2013).
Robert, C. P. & Roberts, G. Rao-Blackwellisation in the Markov chain Monte Carlo era. Int. Statist. Rev. 89, 237–249 (2021).
Liu, J. S., Wong, W. H. & Kong, A. Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and augmentation schemes. Biometrika 81, 27–40 (1994).
Geyer, C. J. Conditioning in Markov chain Monte Carlo. J. Comput. Graph. Statist. 4, 148–154 (1995).
Thawornwattana, Y., Dalquen, D. & Yang, Z. Designing simple and efficient Markov chain Monte Carlo proposal kernels. Bayesian Analysis 13, 1033–1059 (2018).
Karimi, N. et al. Reticulate evolution helps explain apparent homoplasy in floral biology and pollination in Baobabs (Adansonia; Bombacoideae; Malvaceae). Syst. Biol. 69, 462–478 (2020).
Wan, J. N. et al. The rise of baobab trees in Madagascar. Nature 629, 1091–1099 (2024).
Yang, Z. & Rannala, B. Unguided species delimitation using DNA sequence data from multiple loci. Mol. Biol. Evol. 31, 3125–3135 (2014).
Rannala, B. & Yang, Z. Efficient Bayesian species tree inference under the multispecies coalescent. Syst. Biol. 66, 823–842 (2017).
Baum, D. A. The comparative pollination and floral biology of baobabs (Adansonia-Bombacaceae). Ann. Missouri Bot. Gard. 82, 322–348 (1995).
Patterson, N. et al. Ancient admixture in human history. Genetics 192, 1065–1093 (2012).
Solís-Lemus, C. & Ané, C. Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting. PLoS Genet. 12, e1005896 (2016).
Fontaine, M. C. et al. Extensive introgression in a malaria vector species complex revealed by phylogenomics. Science 347, 1258524 (2015).
Thawornwattana, Y., Dalquen, D. & Yang, Z. Coalescent analysis of phylogenomic data confidently resolves the species relationships in the Anopheles gambiae species complex. Mol. Biol. Evol. 35, 2512–2527 (2018).
Edelman, N. B. et al. Genomic architecture and introgression shape a butterfly radiation. Science 366, 594–599 (2019).
Thawornwattana, Y., Huang, J., Flouri, T., Mallet, J. & Yang, Z. Inferring the direction of introgression using genomic sequence data. Mol. Biol. Evol. 40, msad178 (2023).
Beerli, P. Comparison of Bayesian and maximum-likelihood inference of population genetic parameters. Bioinformatics 22, 341–345 (2006).
Ji, J., Jackson, D. J., Leache, A. D. & Yang, Z. Power of Bayesian and heuristic tests to detect cross-species introgression with reference to gene flow in the Tamias quadrivittatus group of North American chipmunks. Syst. Biol. 72, 446–465 (2023).
Cheng, S., Flouri, T., Zhu, T. & Yang, Z. The impact of taxon sampling on inference of gene flow by summary and Bayesian methods using genomic sequence data. Syst. Biol. 75, https://doi.org/10.1093/sysbio/syag023 (2026).
Carvalho-Sobrinho, J. G. et al. Revisiting the phylogeny of Bombacoideae (Malvaceae): Novel relationships, morphologically cohesive clades, and a new tribal classification based on multilocus phylogenetic analyses. Mol. Phylogenet. Evol. 101, 56–74 (2016).
Yang, Z. The BPP program for species tree estimation and species delimitation. Curr. Zool. 61, 854–865 (2015).
Huang, J., Bennett, J., Flouri, T. & Yang, Z. Phase resolution of heterozygous sites in diploid genomes is important to phylogenomic analysis under the multispecies coalescent model. Syst. Biol. 71, 334–352 (2022).
Neafsey, D. E. et al. Mosquito genomics. Highly evolvable malaria vectors: the genomes of 16 Anopheles mosquitoes. Science 347, 1258522 (2015).
Yang, Z. & Rannala, B. Bayesian species delimitation using multilocus sequence data. Proc. Natl. Acad. Sci. USA. 107, 9264–9269 (2010).
Bull, V. et al. Polyphyly and gene flow between non-sibling Heliconius species. BMC Biol. 4, 11 (2006).
Acknowledgements
We are grateful to Dr Nisa Karimi for preparing the HapHunt baobab dataset to include multiple samples per species. This study has been supported by Biotechnology and Biological Sciences Research Council grants (BB/T003502/1 and BB/X007553/1), a Natural Environment Research Council grant (NSFDEB-NERC NE/X002071/1) to Z.Y., a Natural Science Foundation of China (NSFC) grant (12101295), a Guangdong Natural Science Foundation grant (2022A1515011767), and a Shenzhen Training Project of Excellent Scientific & Technological Talents grant (RCYX20221008093033012) to X.J., an NSFC grant (32200490) and a grant from Fundamental Research Funds for Beijing Municipal Universities (XJJS202523) to J.H., and an NIH Grant (GM123306) to B.R. The study has also been supported by a Swiss National Science Foundation scientific visit grant (IZSEZ0_232434/1) to Z.Y. and Prof. Maria Anisimova.
Author information
Contributions
T.F., X.J., B.R., and Z.Y. designed and tested the BPP algorithms and prepared the documentation. J.H. analysed the empirical datasets. Z.Y. and B.R. supervised the research. All authors interpreted data and edited the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Guy Baele, Fredrik Ronquist and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Flouri, T., Jiao, X., Huang, J. et al. Scaling up Bayesian population phylogenomics through virtual dimension reduction. Nat Commun (2026). https://doi.org/10.1038/s41467-026-71057-z
DOI: https://doi.org/10.1038/s41467-026-71057-z