Population-level structural variant characterization using pangenome graphs

Abstract

Population-level structural variant (SV) profiling is crucial in the era of pangenomes. However, identifying SVs from genome assemblies and pangenome graphs remains a substantial challenge. Here we present Swave, a sequence-to-image, deep learning-based method that accurately resolves both simple and complex SVs, along with their population characteristics, from assembly-derived pangenome graphs. Swave introduces ‘projection waves’ to summarize the dotplot images that capture mapping patterns between reference and SV-indicating alleles in the pangenome. Then, a recurrent neural network distinguishes true SV signals from background noise introduced by genomic repeats. Swave demonstrates superior performance in both SV-type classification and genotyping compared with existing methods. When applied to healthy cohorts and rare-disease cohorts, Swave reveals complex and polymorphic SV patterns across human populations and identifies potentially pathogenic SVs. These advancements will facilitate the creation of comprehensive population-level SV catalogs, deepening our understanding of SVs in genetic diversity and disease associations.

Fig. 1: Schematic overview of Swave.
Fig. 2: Benchmarking the performance of Swave and comparative methods.
Fig. 3: Characterization and benchmarking of inversions.
Fig. 4: Population-scale discovery of SSVs and CSVs using Swave.
Fig. 5: Discovery of potentially pathogenic SVs in the GA4K rare-disease cohort.

Data availability

All the published reference genomes, sample assemblies and SV callsets are presented in Supplementary Table 17. The callsets on healthy and disease cohorts produced by Swave are available via Zenodo at https://doi.org/10.5281/zenodo.18229680 and https://doi.org/10.5281/zenodo.18425621 (refs. 54,55).

Code availability

Swave is available via GitHub at https://github.com/songbowang125/Swave.git (ref. 56). The custom scripts and scripts for reproducing the results in this paper are available via GitHub at https://github.com/songbowang125/Swave-Utils.git (ref. 57).

References

  1. Ahsan, M. U., Liu, Q., Perdomo, J. E., Fang, L. & Wang, K. A survey of algorithms for the detection of genomic structural variants from long-read sequencing data. Nat. Methods 20, 1143–1158 (2023).

  2. Wang, S. et al. De novo and somatic structural variant discovery with SVision-pro. Nat. Biotechnol. 43, 181–185 (2025).

  3. Ding, W. et al. Adaptive functions of structural variants in human brain development. Sci. Adv. 10, eadl4600 (2024).

  4. Collins, R. L. & Talkowski, M. E. Diversity and consequences of structural variation in the human genome. Nat. Rev. Genet. 26, 443–462 (2025).

  5. Li, Y. et al. Patterns of somatic structural variation in human cancer genomes. Nature 578, 112–121 (2020).

  6. Mahmoud, M. et al. Structural variant calling: the long and the short of it. Genome Biol. 20, 246 (2019).

  7. Lin, J. et al. SVision: a deep learning approach to resolve complex structural variants. Nat. Methods 19, 1230–1233 (2022).

  8. Chen, Y. et al. Deciphering the exact breakpoints of structural variations using long sequencing reads with DeBreak. Nat. Commun. 14, 283 (2023).

  9. Popic, V. et al. Cue: a deep-learning framework for structural variant discovery and genotyping. Nat. Methods 20, 559–568 (2023).

  10. Smolka, M. et al. Detection of mosaic and population-level structural variants with Sniffles2. Nat. Biotechnol. 42, 1571–1580 (2024).

  11. Denti, L., Khorsand, P., Bonizzoni, P., Hormozdiari, F. & Chikhi, R. SVDSS: structural variation discovery in hard-to-call genomic regions using sample-specific strings from accurate long reads. Nat. Methods 20, 550–558 (2023).

  12. Wang, S. & Ye, K. Deep-learning based representation and recognition for genome variants-from SNVs to structural variants. Natl Sci. Rev. 11, nwae335 (2024).

  13. Olson, N. D. et al. Variant calling and benchmarking in an era of complete human genome sequences. Nat. Rev. Genet. 24, 464–483 (2023).

  14. Liu, Y. H., Luo, C., Golding, S. G., Ioffe, J. B. & Zhou, X. M. Tradeoffs in alignment and assembly-based methods for structural variant detection with long-read sequencing data. Nat. Commun. 15, 2447 (2024).

  15. Ebert, P. et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 372, abf7117 (2021).

  16. Heller, D. & Vingron, M. SVIM-asm: structural variant detection from haploid and diploid genome assemblies. Bioinformatics 36, 5519–5521 (2021).

  17. Liao, W. W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).

  18. Gao, Y. et al. A pangenome reference of 36 Chinese populations. Nature 619, 112–121 (2023).

  19. Logsdon, G. A. et al. Complex genetic variation in nearly complete human genomes. Nature 644, 430–441 (2025).

  20. Collins, R. L. et al. A structural variation reference for medical and population genetics. Nature 581, 444–451 (2020).

  21. Groza, C. et al. Pangenome graphs improve the analysis of structural variants in rare genetic diseases. Nat. Commun. 15, 657 (2024).

  22. Yilmaz, F. et al. Reconstruction of the human amylase locus reveals ancient duplications seeding modern-day variation. Science 386, eadn0609 (2024).

  23. Bolognini, D. et al. Recurrent evolution and selection shape structural diversity at the amylase locus. Nature 634, 617–625 (2024).

  24. Plender, E. G. et al. Structural and genetic diversity in the secreted mucins MUC5AC and MUC5B. Am. J. Hum. Genet. 111, 1700–1716 (2024).

  25. Jeffares, D. C. et al. Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast. Nat. Commun. 8, 14061 (2017).

  26. Kirsche, M. et al. Jasmine and Iris: population-scale structural variant comparison and analysis. Nat. Methods 20, 408–417 (2023).

  27. English, A. C., Menon, V. K., Gibbs, R. A., Metcalf, G. A. & Sedlazeck, F. J. Truvari: refined structural variant comparison preserves allelic diversity. Genome Biol. 23, 271 (2022).

  28. Zheng, Z. Y. et al. A sequence-aware merger of genomic structural variations at population scale. Nat. Commun. 15, 960 (2024).

  29. Jayakodi, M. et al. Structural variation in the pangenome of wild and domesticated barley. Nature 636, 654–662 (2024).

  30. Bian, P. et al. A graph-based goat pangenome reveals structural variations involved in domestication and adaptation. Mol. Biol. Evol. 41, msae251 (2024).

  31. Li, H., Feng, X. & Chu, C. The design and construction of reference pangenome graphs with minigraph. Genome Biol. 21, 265 (2020).

  32. Hickey, G. et al. Pangenome graph construction from genome alignments with Minigraph-Cactus. Nat. Biotechnol. 42, 663–673 (2024).

  33. Garrison, E. et al. Building pangenome graphs. Nat. Methods 21, 2008–2012 (2024).

  34. Cui, Y., Peng, C., Xia, Z., Yang, C. & Guo, Y. A survey of sequence-to-graph mapping algorithms in the pangenome era. Genome Biol. 26, 138 (2025).

  35. Garrison, E. et al. Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat. Biotechnol. 36, 875–879 (2018).

  36. Andreace, F., Lechat, P., Dufresne, Y. & Chikhi, R. Comparing methods for constructing and representing human pangenome graphs. Genome Biol. 24, 274 (2023).

  37. Garrison, E., Kronenberg, Z. N., Dawson, E. T., Pedersen, B. S. & Prins, P. A spectrum of free software tools for processing the VCF variant call format: vcflib, bio-vcf, cyvcf2, hts-nim and slivar. PLoS Comput. Biol. 18, e1009123 (2022).

  38. Jiang, T. et al. Long-read-based human genomic structural variation detection with cuteSV. Genome Biol. 21, 189 (2020).

  39. Zook, J. M. et al. A robust benchmark for detection of germline large deletions and insertions. Nat. Biotechnol. 38, 1347–1355 (2020).

  40. Porubsky, D. et al. Recurrent inversion polymorphisms in humans associate with genetic instability and genomic disorders. Cell 185, 1986–2005 (2022).

  41. Vollger, M. R. et al. Segmental duplications and their variation in a complete human genome. Science 376, abj6965 (2022).

  42. Yang, J. & Chaisson, M. J. P. TT-Mars: structural variants assessment based on haplotype-resolved assemblies. Genome Biol. 23, 110 (2022).

  43. Zhao, X., Weber, A. M. & Mills, R. E. A recurrence-based approach for validating structural variation using long-read sequencing technology. Gigascience 6, 1–9 (2017).

  44. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

  45. Pajic, P., Lin, Y. L., Xu, D. & Gokcumen, O. The psoriasis-associated deletion of late cornified envelope genes LCE3B and LCE3C has been maintained under balancing selection since human Denisovan divergence. BMC Evol. Biol. 16, 265 (2016).

  46. Ago, Y., Asano, S., Hashimoto, H. & Waschek, J. A. A. Probing the VIPR2 microduplication linkage to schizophrenia in animal and cellular models. Front. Neurosci. 15, 717490 (2021).

  47. Chen, C. H. et al. Identification of rare mutations of the vasoactive intestinal peptide receptor 2 gene in schizophrenia. Psychiatric Genet. 32, 125–130 (2022).

  48. Pitera, J. E., Scambler, P. J. & Woolf, A. S. Fras1, a basement membrane-associated protein mutated in Fraser syndrome, mediates both the initiation of the mammalian kidney and the integrity of renal glomeruli. Hum. Mol. Genet. 17, 3953–3964 (2008).

  49. Slavotinek, A., Li, C., Sherr, E. H. & Chudley, A. E. Mutation analysis of the FRAS1 gene demonstrates new mutations in a propositus with Fraser syndrome. Am. J. Med. Genet. A 140a, 1909–1914 (2006).

  50. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

  51. Vollger, M. R. et al. Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads. Ann. Hum. Genet. 84, 125–140 (2020).

  52. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).

  53. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).

  54. Wang, S. Swave call on healthy cohort. Zenodo https://doi.org/10.5281/zenodo.18229680 (2026).

  55. Wang, S. Swave call on disease cohort. Zenodo https://doi.org/10.5281/zenodo.18425621 (2026).

  56. Wang, S. Swave code. Zenodo https://doi.org/10.5281/zenodo.18229263 (2026).

  57. Wang, S. Swave Utils code. Zenodo https://doi.org/10.5281/zenodo.18229275 (2026).

Acknowledgements

K.Y. is supported by the National Key R&D Program of China (grant no. 2022YFC3400300) and the National Science Foundation of China (grant nos. 32125009 and 32430017). S.W. is supported by the National Science Foundation of China (grant no. 323B2015).

Author information

Contributions

K.Y. designed and supervised the research. S.W. developed the algorithm and performed the performance evaluation and downstream analysis. T.X. and P.Z. analyzed the impact of SVs.

Corresponding author

Correspondence to Kai Ye.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Overview of pangenome construction and allele extraction in Swave.

a, Construction of the pangenome graph using Minigraph with both reference and sample assemblies. The resulting graph is saved in GFA format, which encodes node sequences and directed edges between nodes. b, Assembly paths are recovered using the --call function in Minigraph. Regions where paths diverge (snarls) are identified as candidate structural variant loci. Allele sequences for each snarl are reconstructed by extracting the corresponding node sequences from the GFA. c, Based on the Minigraph --call outputs, Swave determines the carrier assemblies for each allele and then generates dotplots for each reference-alternative pair in the next processing module. d, Swave’s handling of phasing information and multi-allelic loci. Sample genotypes are obtained by joining all haplotype genotypes.
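The allele reconstruction described in panel b can be illustrated with a minimal Python sketch. It is not Swave's code: the toy graph, the node names and the helper functions (`parse_gfa_segments`, `allele_sequence`, `join_haplotypes`) are hypothetical, and walks are assumed to be written as oriented node strings such as `>s1>s2`, where `<` marks a node traversed in reverse complement.

```python
import re
from typing import Dict, List

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string (used for '<' oriented nodes)."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_gfa_segments(gfa_text: str) -> Dict[str, str]:
    """Map node name -> sequence from GFA 'S' lines; 'L' (edge) lines are skipped."""
    segments: Dict[str, str] = {}
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            segments[fields[1]] = fields[2]
    return segments


def allele_sequence(walk: str, segments: Dict[str, str]) -> str:
    """Concatenate node sequences along an oriented walk such as '>s1<s2>s3'."""
    parts: List[str] = []
    for orientation, node in re.findall(r"([<>])([^<>]+)", walk):
        seq = segments[node]
        parts.append(seq if orientation == ">" else reverse_complement(seq))
    return "".join(parts)


def join_haplotypes(haplotype_genotypes: List[str]) -> str:
    """Join per-haplotype allele indices into a phased sample genotype."""
    return "|".join(haplotype_genotypes)


# Toy graph (hypothetical): node s2 is optional between s1 and s3.
TOY_GFA = "\n".join([
    "S\ts1\tACGT",
    "S\ts2\tGG",
    "S\ts3\tTTA",
    "L\ts1\t+\ts2\t+\t0M",
    "L\ts2\t+\ts3\t+\t0M",
    "L\ts1\t+\ts3\t+\t0M",
])

segments = parse_gfa_segments(TOY_GFA)
ref_allele = allele_sequence(">s1>s2>s3", segments)  # reference path
del_allele = allele_sequence(">s1>s3", segments)     # path skipping s2
inv_allele = allele_sequence(">s1<s2>s3", segments)  # s2 traversed in reverse
sample_gt = join_haplotypes(["0", "1"])              # one allele index per haplotype
```

In this toy case the walk that skips `s2` yields a shorter allele, which is the kind of reference-alternative pair that would then be passed to dotplot generation.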

Extended Data Fig. 2 Dotplot generation and projection.

a, Base-level refinement of k-mer-based dotplots. Initial alignment introduces gaps of (k-1) bases near SV breakpoints. Swave performs base-level remapping at the boundaries where k-mer matching stops to improve breakpoint resolution for downstream SV classification. b, Influence of genomic repeats on wave patterns. Dense, repetitive regions generate abundant spurious matches in dotplots, resulting in fluctuating wave signals upon projection.
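The idea of a k-mer dotplot and its projection can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not Swave's implementation: the projection here simply counts, for each reference k-mer start position, how many matches fall on it, and the function names (`dotplot`, `project`) and toy sequences are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_index(seq: str, k: int) -> Dict[str, List[int]]:
    """Map each k-mer to the list of its start positions in seq."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def dotplot(ref: str, alt: str, k: int) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Return forward and reverse-strand k-mer matches as (ref_pos, alt_pos) pairs."""
    index = kmer_index(ref, k)
    forward, reverse = [], []
    for j in range(len(alt) - k + 1):
        kmer = alt[j:j + k]
        forward.extend((i, j) for i in index.get(kmer, ()))
        reverse.extend((i, j) for i in index.get(reverse_complement(kmer), ()))
    return forward, reverse


def project(matches: List[Tuple[int, int]], ref: str, k: int) -> List[int]:
    """Project dots onto the reference axis: number of matches per ref k-mer start."""
    wave = [0] * (len(ref) - k + 1)
    for ref_pos, _ in matches:
        wave[ref_pos] += 1
    return wave


# Toy example (hypothetical): the alternative allele duplicates the leading 'ACG'.
ref, alt, k = "ACGATC", "ACGACGATC", 3
forward, reverse = dotplot(ref, alt, k)
forward_wave = project(forward, ref, k)
reverse_wave = project(reverse, ref, k)
print(forward_wave, reverse_wave)
```

The forward wave rises over the duplicated segment, while the few reverse-strand hits come from short k-mers that happen to be reverse complements of each other, a small-scale analogue of the repeat-induced background described in panel b.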

Extended Data Fig. 3 Recurrent Neural Network for SV classification in Swave.

a, Projected wave signals are encoded as four-element tuples per genomic segment, comprising span length, background average wave value, and the differences between SV-implying and background waves for both forward and reverse matches. These tuples serve as the input for the RNN classification model. b, A one-layer Bi-LSTM with 64 hidden units forms the core of the RNN, enabling context-aware classification of SV components across the sequence. c, Time and memory consumption were measured on three datasets: HGSVC3 (130 haplotypes), the healthy cohort (HGSVC3 + HPRC + CPC, 334 haplotypes) and the disease cohort (GA4K, 574 haplotypes). On a personal computer (CPU: Intel Core i9-13900K, maximum memory: 32 GB), Swave was run with 8 threads. On a computing cluster node (CPU: Intel Xeon Gold 6240R, maximum memory: 376 GB), Swave could be accelerated by using 24 threads.

Extended Data Fig. 4 Performance evaluation results.

a, F1-score comparison for simple structural variant (SSV) detection. b, F1-score comparison for complex structural variant (CSV) detection. c, Mendelian consistency across three trio datasets. Average consistencies are noted on the plot. d, Genotyping (GT) missing rate across two population datasets. Average missing rates on the two datasets are noted on the plot. e, Improvements in genotyping performance following PanPop refinement. We applied SVIM-asm followed by each of three merging tools to each of three trios, giving n = 9. The boxplot shows the median (Q2, 50th percentile), first quartile (Q1, 25th percentile) and third quartile (Q3, 75th percentile). The box spans the interquartile range (IQR), from Q1 to Q3. The minimum and maximum are defined as Q1 - 1.5*IQR and Q3 + 1.5*IQR, respectively. The whiskers cover values between the minimum and Q1 and between Q3 and the maximum. Values falling outside the Q1 – Q3 range are plotted as outliers of the data.
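Mendelian consistency, as reported in panel c, can be computed along the following lines. The legend does not state the exact procedure, so this sketch assumes the conventional definition for diploid genotypes: a child genotype is consistent when one allele can be inherited from the father and the other from the mother, and trios with any missing genotype are skipped. Function names and the example trios are hypothetical.

```python
from typing import List, Optional, Tuple


def alleles(genotype: str) -> List[str]:
    """Split a diploid genotype such as '0/1' or '0|1' into its two alleles."""
    return genotype.replace("|", "/").split("/")


def is_consistent(child: str, father: str, mother: str) -> Optional[bool]:
    """True/False for Mendelian consistency; None if any genotype is missing."""
    c, f, m = alleles(child), alleles(father), alleles(mother)
    if "." in c + f + m:
        return None
    a, b = c  # assumes a diploid child genotype
    return (a in f and b in m) or (b in f and a in m)


def consistency_rate(trios: List[Tuple[str, str, str]]) -> float:
    """Fraction of fully genotyped (child, father, mother) sites that are consistent."""
    results = [is_consistent(*trio) for trio in trios]
    checked = [r for r in results if r is not None]
    return sum(checked) / len(checked) if checked else float("nan")


# Hypothetical genotypes at four SV sites, ordered (child, father, mother).
example_trios = [
    ("0/1", "0/0", "0/1"),  # consistent: 0 from father, 1 from mother
    ("1/1", "0/0", "0/1"),  # inconsistent: father cannot transmit allele 1
    ("0/1", "1/1", "0/0"),  # consistent: 1 from father, 0 from mother
    ("./.", "0/1", "0/1"),  # missing child genotype, excluded from the rate
]
rate = consistency_rate(example_trios)
```

Excluding missing genotypes from the denominator keeps this metric separate from the missing rate shown in panel d.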

Extended Data Fig. 5 Validation and illustration of inversions.

a,b, Validation results for all detected balanced and complex inversions. Three orthogonal metrics were applied: mapping integrity, TT-Mars and VaPoR. c, Comparison of breakpoint accuracy between Swave and PAV for the 52 overlapping inversions. The boxplot shows the median (Q2, 50th percentile), first quartile (Q1, 25th percentile) and third quartile (Q3, 75th percentile). The box spans the interquartile range (IQR), from Q1 to Q3. The minimum and maximum are defined as Q1 - 1.5*IQR and Q3 + 1.5*IQR, respectively. The whiskers cover values between the minimum and Q1 and between Q3 and the maximum. Values falling outside the Q1 – Q3 range are plotted as outliers of the data. d, Illustration of breakpoint distortion caused by inverted segmental duplications (SDs). While PAV’s breakpoints are frequently shifted due to alignment ambiguity, Swave maintains accurate breakpoint placement within repetitive regions.
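The boxplot quantities defined in this legend (Q1, Q2, Q3, IQR and the Q1 - 1.5*IQR and Q3 + 1.5*IQR bounds) can be reproduced with the Python standard library. The quartile interpolation method is an assumption, since the legend does not specify one, and the input values are made up for illustration.

```python
import statistics
from typing import Dict, Sequence


def boxplot_summary(values: Sequence[float]) -> Dict[str, float]:
    """Quartiles, IQR and the 1.5*IQR bounds used in the figure legends."""
    # 'inclusive' treats the data as the full population; this choice is an assumption.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "Q1": q1,
        "Q2": q2,
        "Q3": q3,
        "IQR": iqr,
        "minimum": q1 - 1.5 * iqr,
        "maximum": q3 + 1.5 * iqr,
    }


# Hypothetical measurements (n = 9).
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```

Plotting libraries differ in how they interpolate quartiles, so small discrepancies from a published figure are expected when a different method is used.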

Extended Data Fig. 6 Characterization of polymorphic scarred inversions.

a, Example of a polymorphic scarred inversion snarl containing five distinct alleles (1), generated by combinatorial arrangements of five unique internal scars across four genomic regions. The most complex variant includes four separate scars (2). b, Length distribution of all detected scars (n = 81), ranging from 61 bp to 18,451 bp. c, Repeat annotation of all scars (n = 81). d, Example of polymorphic scarred inversions driven by repetitive elements, where two repeat expansions give rise to insertion scars of different lengths.

Extended Data Fig. 7 Rare and complex structural variants revealed by Swave.

a, Pangenome graph structure of snarl ‘>s21910 > s21914’. A rare CSV allele introduced a novel traversal path not observed among the reported alleles. b, IGV snapshot of snarl ‘>s21910 > s21914’, illustrating the co-occurrence of two distinct SSVs and one CSV. The rare SSV (67 kb deletion) extended the common 32 kb deletion, whereas the rare CSV (a duplication flanked by a deletion) added a 58 kb duplication at the right breakpoint of the common 32 kb simple deletion. c, Pangenome graph of snarl ‘>s10752 > s10754’, showing a novel CSV locus that is structurally distinct from previously reported variants. d, Illustration of a rare scarred inversion that partially disrupts the coding structure of VIPR2, a gene associated with neuropsychiatric disorders.

Extended Data Fig. 8 How potentially pathogenic structural variants affect genes.

a, Mapping of residue-level disruptions caused by ClinVar pathogenic variants and the CSV detected by Swave. b, Structural annotation of the CYP17A1 protein highlights two functional binding sites, as sourced from UniProt. c, Mapping of residue-level disruptions caused by ClinVar pathogenic variants and the CSV detected by Swave. d, Schematic of a simple structural variant, a 411 bp inversion, disrupting the second exon of HYLS1, a gene implicated in hydrolethalus syndrome. e, Representation of a 43 kb deletion spanning introns 6 to 14 of FRAS1, a gene associated with Fraser syndrome.

Extended Data Fig. 9 Genotyping incompleteness associated with unresolved pangenome graph regions.

a, Mapping results of a carrier assembly exhibiting missing genotypes across four snarls. b, Genome-wide distribution of snarls with missing genotypes across HGSVC samples. The Y-axis indicates the number of assemblies lacking mappable sequence at each snarl.

Extended Data Fig. 10 Misclassification of dispersed duplications as insertions.

Dispersed duplications, in which the source sequence originates from a distant locus on the same chromosome (a) or from a different chromosome (b), pose challenges for Swave. When generating dotplots, Swave extends the reference region by twice the length of the alternative sequence on both sides. Consequently, if the duplicated source sequence lies outside this extended window, Swave fails to capture it and instead reports the variant as an insertion. c, Using the 65 samples from HGSVC, we compared Swave’s outputs with dispersed duplications reported by SVision-pro. We found that Swave misclassified 0–4 duplications with distant same-chromosome sources and 10–51 duplications with cross-chromosome sources as insertions. The boxplot shows the median (Q2, 50th percentile), first quartile (Q1, 25th percentile) and third quartile (Q3, 75th percentile). The box spans the interquartile range (IQR), from Q1 to Q3. The minimum and maximum are defined as Q1 - 1.5*IQR and Q3 + 1.5*IQR, respectively. The whiskers cover values between the minimum and Q1 and between Q3 and the maximum. Values falling outside the Q1 – Q3 range are plotted as outliers of the data.
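The window rule described above (extending the reference region by twice the alternative-sequence length on each side) explains why distant sources are missed. The following Python sketch shows the arithmetic only. It assumes 0-based, half-open coordinates, and the function names, chromosome names and positions are hypothetical rather than taken from Swave.

```python
from typing import Optional, Tuple


def dotplot_window(ref_start: int, ref_end: int, alt_len: int,
                   chrom_len: Optional[int] = None) -> Tuple[int, int]:
    """Extend the reference region by 2 * alt_len on both sides, clipped to the chromosome."""
    flank = 2 * alt_len
    start = max(0, ref_start - flank)
    end = ref_end + flank
    if chrom_len is not None:
        end = min(chrom_len, end)
    return start, end


def source_in_window(window: Tuple[int, int], window_chrom: str,
                     src_chrom: str, src_start: int, src_end: int) -> bool:
    """True if the duplicated source lies entirely inside the dotplot window."""
    start, end = window
    return src_chrom == window_chrom and start <= src_start and src_end <= end


# Hypothetical 500 bp alternative allele inserted at chr1:100000.
window = dotplot_window(100_000, 100_000, alt_len=500)

nearby = source_in_window(window, "chr1", "chr1", 100_400, 100_900)    # inside window
distant = source_in_window(window, "chr1", "chr1", 250_000, 250_500)   # same chromosome, too far
cross = source_in_window(window, "chr1", "chr5", 100_400, 100_900)     # different chromosome
```

Only the first case would leave a duplication signal in the dotplot. In the other two the inserted sequence has no match inside the window, which is consistent with the variant being reported as an insertion.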

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, S., Xu, T., Zhang, P. et al. Population-level structural variant characterization using pangenome graphs. Nat Genet 58, 664–672 (2026). https://doi.org/10.1038/s41588-026-02538-6

  • DOI: https://doi.org/10.1038/s41588-026-02538-6
