Abstract
Pangenome graphs can represent all variation between multiple reference genomes, but current approaches to build them exclude complex sequences or are based upon a single reference. In response, we developed the PanGenome Graph Builder, a pipeline for constructing pangenome graphs without bias or exclusion. The PanGenome Graph Builder uses all-to-all alignments to build a variation graph in which we can identify variation, measure conservation, detect recombination events and infer phylogenetic relationships.
Data availability
Pangenomes are available at Zenodo at https://doi.org/10.5281/zenodo.7658895 (ref. 37). Scripts and source data URLs for downloading the pangenomes are available at https://github.com/pangenome/pggb-paper/blob/main/workflows/0.Preparation.md. Accession codes for all pangenomes are listed in Supplementary File 1.
Code availability
PGGB is available at https://github.com/pangenome/pggb. Code used for experiments can be accessed at https://github.com/pangenome/pggb-paper.
References
Garrison, E. et al. Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat. Biotechnol. 36, 875–879 (2018).
Paten, B., Novak, A. M., Eizenga, J. M. & Garrison, E. Genome graphs and the evolution of genome inference. Genome Res. 27, 665–676 (2017).
Armstrong, J. et al. Progressive cactus is a multiple-genome aligner for the thousand-genome era. Nature 587, 246–251 (2020).
Guarracino, A., Heumos, S., Nahnsen, S., Prins, P. & Garrison, E. ODGI: understanding pangenome graphs. Bioinformatics 38, 3319–3326 (2022).
Li, H., Feng, X. & Chu, C. The design and construction of reference pangenome graphs with minigraph. Genome Biol. https://doi.org/10.1186/s13059-020-02168-z (2020).
Hickey, G. et al. Pangenome graph construction from genome alignments with minigraph-cactus. Nat. Biotechnol. https://doi.org/10.1038/s41587-023-01793-w (2023).
Noll, N., Molari, M., Shaw, L. P. & Neher, R. A. PanGraph: scalable bacterial pan-genome graph construction. Preprint at bioRxiv https://doi.org/10.1099/mgen.0.001034 (2022).
Garrison, E. & Guarracino, A. Unbiased pangenome graphs. Bioinformatics https://doi.org/10.1093/bioinformatics/btac743 (2022).
Minkin, I., Pham, S. & Medvedev, P. TwoPaCo: an efficient algorithm to build the compacted de Bruijn graph from many complete genomes. Bioinformatics 33, 4024–4032 (2016).
Chin, C.-S. et al. Multiscale analysis of pangenomes enables improved representation of genomic diversity for repetitive and clinically relevant genes. Nat. Methods 20, 1213–1221 (2023).
Sullivan, P. F. et al. Leveraging base-pair mammalian constraint to understand genetic variation and human disease. Science https://doi.org/10.1126/science.abn2937 (2023).
Guarracino, A., Mwaniki, N., Marco-Sola, S. & Garrison, E. wfmash: whole-chromosome pairwise alignment using the hierarchical wavefront algorithm. GitHub https://github.com/waveygang/wfmash (2021).
Lee, C., Grasso, C. & Sharlow, M. F. Multiple sequence alignment using partial order graphs. Bioinformatics 18, 452–464 (2002).
Vaser, R., Sović, I., Nagarajan, N. & Šikić, M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 27, 737–746 (2017).
Gao, Y. et al. abPOA: an SIMD-based C library for fast partial order alignment using adaptive band. Bioinformatics 37, 2209–2211 (2020).
Liao, W.-W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).
Bolognini, D. et al. Global diversity, recurrent evolution, and recent selection on amylase structural haplotypes in humans. Preprint at bioRxiv https://doi.org/10.1101/2024.02.07.579378 (2024).
Heumos, S. et al. Pangenome graph layout by path-guided stochastic gradient descent. Bioinformatics 40, 363 (2024).
Doerr, D., Marijon, P. & Marschall, T. GFAffix identifies walk-preserving shared affixes in variation graphs and collapses them into a non-redundant graph structure. GitHub https://github.com/marschall-lab/GFAffix (2023).
Ewels, P., Magnusson, M., Lundin, S. & Käller, M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047 (2016).
Marco-Sola, S. et al. Optimal gap-affine alignment in O(s) space. Bioinformatics https://doi.org/10.1093/bioinformatics/btad074 (2023).
Marçais, G. et al. MUMmer4: A fast and versatile genome alignment system. PLoS Comput. Biol. 14, 1005944 (2018).
Cagan, A. et al. Natural selection in the great apes. Mol. Biol. Evol. 33, 3268–3283 (2016).
Guarracino, A. et al. Recombination between heterologous human acrocentric chromosomes. Nature 617, 335–343 (2023).
Crysnanto, D., Leonard, A. & Pausch, H. Comparison of methods for building pangenome graphs. In Proc. 12th World Congress on Genetics Applied to Livestock Production (WCGALP) Technical and Species Orientated Innovations in Animal Breeding, and Contribution of Genetics to Solving Societal Challenges, 1066–1069 (Wageningen Academic Publishers, 2022).
Leonard, A. S., Crysnanto, D., Mapel, X. M., Bhati, M. & Pausch, H. Graph construction method impacts variation representation and analyses in a bovine super-pangenome. Preprint at bioRxiv https://doi.org/10.1186/s13059-023-02969-y (2022).
Zhou, Y. et al. Graph pangenome captures missing heritability and empowers tomato breeding. Nature 606, 527–534 (2022).
Heumos, S. et al. nf-core/pangenome. Zenodo https://doi.org/10.5281/zenodo.8202637 (2024).
Fischer, C. & Garrison, E. chfi/gfaestus: a pangenome graph browser. Zenodo https://doi.org/10.5281/zenodo.6954035 (2022).
Logsdon, G. A. et al. The structure, function and evolution of a complete human chromosome 8. Nature 593, 101–107 (2021).
Jain, C., Koren, S., Dilthey, A., Phillippy, A. M. & Aluru, S. A fast adaptive algorithm for computing whole-genome homology maps. Bioinformatics 34, i748–i756 (2018).
Llamas, B. et al. A strategy for building and using a human reference pangenome. F1000Res 8, 1751 (2021).
Vollger, M. R. et al. Increased mutation and gene conversion within human segmental duplications. Nature 617, 325–334 (2023).
Traag, V. A., Waltman, L. & van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 1–22 (2019).
Kosugi, S. et al. Comprehensive evaluation of structural variation detection algorithms for whole genome sequencing. Genome Biol. 20, 1–18 (2019).
Poplin, R. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 36, 983–987 (2018).
Guarracino, A. Pangenomes of multiple species for the “Building pangenome graphs” publication. Zenodo https://doi.org/10.5281/zenodo.7658895 (2023).
Acknowledgements
The authors gratefully acknowledge support from National Institutes of Health (NIH)/NIDA U01DA047638 (E.G.), NIH/NIGMS R01GM123489 (E.G. and P.P.), NIH/NIGMS R35GM142916 (P.H.S.) and NSF PPoSS Award 2118709 (E.G. and P.P.) and the Center for Integrative and Translational Genomics (E.G.). S.H. acknowledges funding from the Central Innovation Program (ZIM) for SMEs of the Federal Ministry for Economic Affairs and Energy of Germany. This work was supported by the BMBF-funded de.NBI Cloud within the German Network for Bioinformatics Infrastructure (de.NBI) (031A532B, 031A533A, 031A533B, 031A534A, 031A535A, 031A537A, 031A537B, 031A537C, 031A537D and 031A538A). A.A.G. acknowledges the Alexander von Humboldt Foundation in the framework of Sofja Kovalevskaja Award and Deutsche Forschungsgemeinschaft (German Research Foundation) project no. 497667402. S.N. acknowledges support from iFIT funded by the Deutsche Forschungsgemeinschaft under Germany’s Excellence Strategy—EXC 2180—390900677 and CMFI under EXC 2124-390838134. The authors also acknowledge funding from the Max Planck Society (Z.B., S.V., C.K. and D.W.). Co-financed by the Connecting Europe Facility of the European Union. M.N.M. is fully funded by the EU H2020 ALPACA ITN under the Marie Skłodowska-Curie grant agreement no. 956229. The authors thank members of the HPRC Pangenome Working Group for their insightful discussion and feedback and members of the HPRC production teams for their development of resources used in our exposition.
Author information
Contributions
E.G. conceived the project. E.G., S.N., N.S., V.C., R.W.W. and P.P. provided project guidance. E.G., A.G., Simon H. and S.M.S. developed the software. E.G., A.G., Simon H., V.C., R.W.W. and P.P. edited the paper. E.G. designed the experiments. E.G., A.G. and L.T. evaluated quality. E.G., A.G., Simon H., F.V., Z.B., L.T., J.H., S.V., C.K., K.T., R.L.R.P., A.A.G., S.N., Z.Y., M.N.M., F.L.N., H.C., J.d.L. and P.H.S. conducted testing. A.G. executed the experiments. A.G. and Simon H. provided documentation. F.V., D.G.A., H.C. and V.C. worked on Mus musculus and Rattus norvegicus. Z.B. and Sanwen H. contributed to the tomato pangenome. L.T. and G.L. worked on Saccharomyces cerevisiae and S. paradoxus. J.H. contributed to the soy pangenome. S.V., C.K., Z.B. and D.W. worked on A. thaliana. S.V. determined parameter settings. K.T. and E.R. worked on Helicobacter pylori. A.A.G. contributed to Vicia faba. Z.Y. and J.d.L. worked on Neisseria meningitidis. F.L.N. and Y.W. worked on Escherichia coli and coliphages. P.H.S. contributed to the primate pangenome. P.P. managed high-performance computing.
Ethics declarations
Competing interests
J.H. is employed by Computomics. D.W. holds equity in Computomics and consults for KWS SE. All other authors declare no competing interests.
Peer review
Peer review information
Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: Lin Tang, in collaboration with the Nature Methods team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 SMOOTHXG.
Overview of the algorithmic steps in SMOOTHXG.
Extended Data Fig. 2 Impact of graph normalization on the human chromosome 6 pangenome graph.
Comparison of the raw SEQWISH graph and its normalized version after SMOOTHXG. The normalization process locally compresses and simplifies the graph structure, resulting in lower node degree and graph depth. This effect is particularly pronounced in repetitive regions such as centromeres and satellite sequences. Node degree and graph depth are visualized for both the original SEQWISH graph and the normalized graph.
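The two statistics named in this caption can be illustrated with a minimal Python sketch. It assumes a GFA v1 graph with `S` (segment), `L` (link) and `P` (path) records, and it assumes simple working definitions: node degree is the number of link ends touching a node, and node depth is the number of path steps that visit it. The function name and the toy graph are hypothetical and are not taken from PGGB, SEQWISH or SMOOTHXG, whose own tools may define these measures differently.

```python
from collections import Counter


def node_degree_and_depth(gfa_text):
    """Return (degree, depth) Counters keyed by segment name for GFA v1 text.

    Assumed definitions (for illustration only):
      degree = number of link ends attached to the node
      depth  = number of path steps that traverse the node
    """
    degree = Counter()
    depth = Counter()
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            # Register the node so that unvisited nodes report zero.
            degree[fields[1]] += 0
            depth[fields[1]] += 0
        elif fields[0] == "L":
            # L <from> <from_orient> <to> <to_orient> <overlap>
            degree[fields[1]] += 1
            degree[fields[3]] += 1
        elif fields[0] == "P":
            # P <name> <seg+,seg-,...> <overlaps>; strip the orientation sign.
            for step in fields[2].split(","):
                depth[step[:-1]] += 1
    return degree, depth


# Toy graph: a single bubble (node 2 versus node 3) shared by two haplotypes.
records = [
    ("H", "VN:Z:1.0"),
    ("S", "1", "ACG"), ("S", "2", "T"), ("S", "3", "G"), ("S", "4", "TTA"),
    ("L", "1", "+", "2", "+", "0M"), ("L", "1", "+", "3", "+", "0M"),
    ("L", "2", "+", "4", "+", "0M"), ("L", "3", "+", "4", "+", "0M"),
    ("P", "hapA", "1+,2+,4+", "*"),
    ("P", "hapB", "1+,3+,4+", "*"),
]
example_gfa = "\n".join("\t".join(r) for r in records)

degree, depth = node_degree_and_depth(example_gfa)
print(dict(degree))
print(dict(depth))
```

In this toy graph the flanking nodes are visited by both haplotypes while each bubble allele is visited by one, which is the kind of per-node signal that the caption describes being visualized before and after normalization.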
Extended Data Fig. 3 Impact of graph normalization on complement component 4 (C4) pangenome graph.
Each bar represents a haplotype and black lines on the bottom represent the graph topology. Paths are colored by using the Spectra color palette with four levels of node depths: white indicates no depth, while gray, red, and yellow indicate depths 1, 2, and greater than or equal to 3, respectively. a) C4 subgraph extracted from the chromosome 6 graph built without SMOOTHXG. b) C4 subgraph extracted from the chromosome 6 graph built without SMOOTHXG and sorted. c) C4 subgraph extracted from the chromosome 6 graph built with SMOOTHXG. The two references present two different allele copies of the C4 genes (red = 2X coverage), both of them including the HERV sequence. The entirely gray paths have one copy of these genes (gray = 1X coverage). HG01071#2 presents three copies of the locus (orange = 3X coverage), of which one contains the HERV sequence (gray in the middle of the orange).
Extended Data Fig. 4 Human chromosome 6 variant calling performance.
Precision, recall, and F1-score of small variants in the H. sapiens chromosome 6 pangenome graph relative to HiFi–DeepVariant calls. Comparisons are made whole-chromosome and then stratified by the GIAB (v.3.0) genomic context. The 44 samples evaluated are colored by superpopulation. AFR = African, AMR = Admixed American, EAS = East Asian, SAS = South Asian.
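For readers unfamiliar with the three metrics, the following minimal Python sketch shows how precision, recall and F1-score follow from a truth set and a call set. It assumes variants are compared by exact match on a (chromosome, position, ref, alt) tuple, which is a simplification: the evaluation reported here may use a different, representation-aware comparison. The function name and the variants are hypothetical.

```python
def precision_recall_f1(truth, calls):
    """Compare two sets of variants by exact match and return (precision, recall, F1).

    Exact tuple matching is an assumption made for illustration; it ignores
    differences in how equivalent variants can be represented.
    """
    tp = len(truth & calls)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # in the truth set but not called
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical variants: (chromosome, position, ref, alt).
truth_set = {
    ("chr6", 100, "A", "G"),
    ("chr6", 250, "C", "T"),
    ("chr6", 400, "G", "GA"),
    ("chr6", 900, "T", "C"),
}
call_set = {
    ("chr6", 100, "A", "G"),
    ("chr6", 250, "C", "T"),
    ("chr6", 400, "G", "GA"),
    ("chr6", 777, "A", "T"),
}

precision, recall, f1 = precision_recall_f1(truth_set, call_set)
print(precision, recall, f1)
```

F1 is the harmonic mean of precision and recall, so it is high only when both are high.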
Extended Data Fig. 5 A. thaliana variant calling performance.
Precision, recall, and F1-score of small variants in the A. thaliana pangenome graph relative to HiFi–DeepVariant calls. Comparisons are made whole-genome and then stratified by genomic context. Easy and Hard regions exclude and include, respectively, rDNA, centromeres, and transposable elements. The 64 samples evaluated are colored by population. The low precision for Lor-16 and Met-6 is due to the high heterozygosity of these two samples.
Extended Data Fig. 6 Tomato variant calling performance.
Precision, recall, and F1-score of small variants in the tomato pangenome graph relative to HiFi–DeepVariant calls. Comparisons are made whole-genome and then stratified by genomic context. Easy and Hard regions exclude and include, respectively, transposable elements. The 5 samples evaluated are colored by group. BIG = S. lycopersicum, big-fruited tomato; CER = S. lycopersicum var. cerasiforme, cherry tomato; PIM = S. pimpinellifolium, the progenitor of cultivated tomatoes.
Supplementary information
Supplementary Information
Supplementary discussion and Figs. 1–9.
Supplementary Table
Tables with list of all accessions for all pangenomes.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Garrison, E., Guarracino, A., Heumos, S. et al. Building pangenome graphs. Nat Methods 21, 2008–2012 (2024). https://doi.org/10.1038/s41592-024-02430-3
DOI: https://doi.org/10.1038/s41592-024-02430-3