Population-level structural variant characterization using pangenome graphs

Wang, Songbo; Xu, Tun; Zhang, Pengyu; Ye, Kai

doi:10.1038/s41588-026-02538-6

Article
Published: 10 March 2026


Nature Genetics volume 58, pages 664–672 (2026)

Abstract

Population-level structural variant (SV) profiling is crucial in the era of pangenomes. However, identifying SVs from genome assemblies and pangenome graphs remains a substantial challenge. Here we present Swave, a sequence-to-image, deep learning-based method that accurately resolves both simple and complex SVs, along with their population characteristics, from assembly-derived pangenome graphs. Swave introduces ‘projection waves’ to summarize the dotplot images that capture mapping patterns between reference and SV-indicating alleles in the pangenome. Then, a recurrent neural network distinguishes true SV signals from background noise introduced by genomic repeats. Swave demonstrates superior performance in both SV-type classification and genotyping compared with existing methods. When applied to healthy cohorts and rare-disease cohorts, Swave reveals complex and polymorphic SV patterns across human populations and identifies potentially pathogenic SVs. These advancements will facilitate the creation of comprehensive population-level SV catalogs, deepening our understanding of SVs in genetic diversity and disease associations.
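The idea of summarizing a dotplot as “projection waves” can be illustrated with a minimal sketch. It is not Swave’s implementation. It assumes that the dotplot is built from exact k-mer matches, that reverse matches are k-mers whose reverse complement occurs in the reference, and that the dotplot is projected onto the alternative-allele axis by counting matches at each k-mer start. The names `kmer_index` and `projection_waves` are hypothetical.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_index(seq: str, k: int) -> dict:
    """Map every k-mer of seq to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def projection_waves(ref: str, alt: str, k: int = 4):
    """Project a k-mer dotplot of alt versus ref onto the alt axis (assumed axis).

    fwd[j] counts reference positions whose k-mer equals alt[j:j+k] (forward dots);
    rev[j] counts reference positions matching its reverse complement (reverse dots).
    """
    ref_index = kmer_index(ref, k)
    n = max(len(alt) - k + 1, 0)
    fwd, rev = [0] * n, [0] * n
    for j in range(n):
        kmer = alt[j:j + k]
        fwd[j] = len(ref_index.get(kmer, []))
        rev[j] = len(ref_index.get(revcomp(kmer), []))
    return fwd, rev
```

In this toy projection, a run of zeros in both waves marks sequence that is absent from the reference (insertion-like), and a reverse-only signal marks an inverted segment. Repeats inflate the counts at many positions, which is the kind of background noise the recurrent network is described as separating from true SV signals.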

**Fig. 1: Schematic overview of Swave.**

**Fig. 2: Benchmarking the performance of Swave and comparative methods.**

**Fig. 3: Characterization and benchmarking of inversions.**

**Fig. 4: Population-scale discovery of SSVs and CSVs using Swave.**

**Fig. 5: Discovery of potentially pathogenic SVs in the GA4K rare-disease cohort.**

Data availability

All published reference genomes, sample assemblies and SV callsets are listed in Supplementary Table 17. The Swave callsets for the healthy and disease cohorts are available via Zenodo at https://doi.org/10.5281/zenodo.18229680 and https://doi.org/10.5281/zenodo.18425621 (refs. 54,55).

Code availability

Swave is available via GitHub at https://github.com/songbowang125/Swave.git (ref. 56). Custom scripts, including those for reproducing the results in this paper, are available via GitHub at https://github.com/songbowang125/Swave-Utils.git (ref. 57).

References

1. Ahsan, M. U., Liu, Q., Perdomo, J. E., Fang, L. & Wang, K. A survey of algorithms for the detection of genomic structural variants from long-read sequencing data. Nat. Methods 20, 1143–1158 (2023).
2. Wang, S. et al. De novo and somatic structural variant discovery with SVision-pro. Nat. Biotechnol. 43, 181–185 (2025).
3. Ding, W. et al. Adaptive functions of structural variants in human brain development. Sci. Adv. 10, eadl4600 (2024).
4. Collins, R. L. & Talkowski, M. E. Diversity and consequences of structural variation in the human genome. Nat. Rev. Genet. 26, 443–462 (2025).
5. Li, Y. et al. Patterns of somatic structural variation in human cancer genomes. Nature 578, 112–121 (2020).
6. Mahmoud, M. et al. Structural variant calling: the long and the short of it. Genome Biol. 20, 246 (2019).
7. Lin, J. et al. SVision: a deep learning approach to resolve complex structural variants. Nat. Methods 19, 1230–1233 (2022).
8. Chen, Y. et al. Deciphering the exact breakpoints of structural variations using long sequencing reads with DeBreak. Nat. Commun. 14, 283 (2023).
9. Popic, V. et al. Cue: a deep-learning framework for structural variant discovery and genotyping. Nat. Methods 20, 559–568 (2023).
10. Smolka, M. et al. Detection of mosaic and population-level structural variants with Sniffles2. Nat. Biotechnol. 42, 1571–1580 (2024).
11. Denti, L., Khorsand, P., Bonizzoni, P., Hormozdiari, F. & Chikhi, R. SVDSS: structural variation discovery in hard-to-call genomic regions using sample-specific strings from accurate long reads. Nat. Methods 20, 550–558 (2023).
12. Wang, S. & Ye, K. Deep-learning based representation and recognition for genome variants-from SNVs to structural variants. Natl Sci. Rev. 11, nwae335 (2024).
13. Olson, N. D. et al. Variant calling and benchmarking in an era of complete human genome sequences. Nat. Rev. Genet. 24, 464–483 (2023).
14. Liu, Y. H., Luo, C., Golding, S. G., Ioffe, J. B. & Zhou, X. M. Tradeoffs in alignment and assembly-based methods for structural variant detection with long-read sequencing data. Nat. Commun. 15, 2447 (2024).
15. Ebert, P. et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 372, abf7117 (2021).
16. Heller, D. & Vingron, M. SVIM-asm: structural variant detection from haploid and diploid genome assemblies. Bioinformatics 36, 5519–5521 (2021).
17. Liao, W. W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).
18. Gao, Y. et al. A pangenome reference of 36 Chinese populations. Nature 619, 112–121 (2023).
19. Logsdon, G. A. et al. Complex genetic variation in nearly complete human genomes. Nature 644, 430–441 (2025).
20. Collins, R. L. et al. A structural variation reference for medical and population genetics. Nature 581, 444–451 (2020).
21. Groza, C. et al. Pangenome graphs improve the analysis of structural variants in rare genetic diseases. Nat. Commun. 15, 657 (2024).
22. Yilmaz, F. et al. Reconstruction of the human amylase locus reveals ancient duplications seeding modern-day variation. Science 386, eadn0609 (2024).
23. Bolognini, D. et al. Recurrent evolution and selection shape structural diversity at the amylase locus. Nature 634, 617–625 (2024).
24. Plender, E. G. et al. Structural and genetic diversity in the secreted mucins MUC5AC and MUC5B. Am. J. Hum. Genet. 111, 1700–1716 (2024).
25. Jeffares, D. C. et al. Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast. Nat. Commun. 8, 14061 (2017).
26. Kirsche, M. et al. Jasmine and Iris: population-scale structural variant comparison and analysis. Nat. Methods 20, 408–417 (2023).
27. English, A. C., Menon, V. K., Gibbs, R. A., Metcalf, G. A. & Sedlazeck, F. J. Truvari: refined structural variant comparison preserves allelic diversity. Genome Biol. 23, 271 (2022).
28. Zheng, Z. Y. et al. A sequence-aware merger of genomic structural variations at population scale. Nat. Commun. 15, 960 (2024).
29. Jayakodi, M. et al. Structural variation in the pangenome of wild and domesticated barley. Nature 636, 654–662 (2024).
30. Bian, P. et al. A graph-based goat pangenome reveals structural variations involved in domestication and adaptation. Mol. Biol. Evol. 41, msae251 (2024).
31. Li, H., Feng, X. & Chu, C. The design and construction of reference pangenome graphs with minigraph. Genome Biol. 21, 265 (2020).
32. Hickey, G. et al. Pangenome graph construction from genome alignments with Minigraph-Cactus. Nat. Biotechnol. 42, 663–673 (2024).
33. Garrison, E. et al. Building pangenome graphs. Nat. Methods 21, 2008–2012 (2024).
34. Cui, Y., Peng, C., Xia, Z., Yang, C. & Guo, Y. A survey of sequence-to-graph mapping algorithms in the pangenome era. Genome Biol. 26, 138 (2025).
35. Garrison, E. et al. Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat. Biotechnol. 36, 875–879 (2018).
36. Andreace, F., Lechat, P., Dufresne, Y. & Chikhi, R. Comparing methods for constructing and representing human pangenome graphs. Genome Biol. 24, 274 (2023).
37. Garrison, E., Kronenberg, Z. N., Dawson, E. T., Pedersen, B. S. & Prins, P. A spectrum of free software tools for processing the VCF variant call format: vcflib, bio-vcf, cyvcf2, hts-nim and slivar. PLoS Comput. Biol. 18, e1009123 (2022).
38. Jiang, T. et al. Long-read-based human genomic structural variation detection with cuteSV. Genome Biol. 21, 189 (2020).
39. Zook, J. M. et al. A robust benchmark for detection of germline large deletions and insertions. Nat. Biotechnol. 38, 1347–1355 (2020).
40. Porubsky, D. et al. Recurrent inversion polymorphisms in humans associate with genetic instability and genomic disorders. Cell 185, 1986–2005 (2022).
41. Vollger, M. R. et al. Segmental duplications and their variation in a complete human genome. Science 376, abj6965 (2022).
42. Yang, J. & Chaisson, M. J. P. TT-Mars: structural variants assessment based on haplotype-resolved assemblies. Genome Biol. 23, 110 (2022).
43. Zhao, X., Weber, A. M. & Mills, R. E. A recurrence-based approach for validating structural variation using long-read sequencing technology. Gigascience 6, 1–9 (2017).
44. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
45. Pajic, P., Lin, Y. L., Xu, D. & Gokcumen, O. The psoriasis-associated deletion of late cornified envelope genes LCE3B and LCE3C has been maintained under balancing selection since human Denisovan divergence. BMC Evol. Biol. 16, 265 (2016).
46. Ago, Y., Asano, S., Hashimoto, H. & Waschek, J. A. A. Probing the VIPR2 microduplication linkage to schizophrenia in animal and cellular models. Front. Neurosci. 15, 717490 (2021).
47. Chen, C. H. et al. Identification of rare mutations of the vasoactive intestinal peptide receptor 2 gene in schizophrenia. Psychiatric Genet. 32, 125–130 (2022).
48. Pitera, J. E., Scambler, P. J. & Woolf, A. S. Fras1, a basement membrane-associated protein mutated in Fraser syndrome, mediates both the initiation of the mammalian kidney and the integrity of renal glomeruli. Hum. Mol. Genet. 17, 3953–3964 (2008).
49. Slavotinek, A., Li, C., Sherr, E. H. & Chudley, A. E. Mutation analysis of the FRAS1 gene demonstrates new mutations in a propositus with Fraser syndrome. Am. J. Med. Genet. A 140A, 1909–1914 (2006).
50. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
51. Vollger, M. R. et al. Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads. Ann. Hum. Genet. 84, 125–140 (2020).
52. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
53. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
54. Wang, S. Swave call on healthy cohort. Zenodo https://doi.org/10.5281/zenodo.18229680 (2026).
55. Wang, S. Swave call on disease cohort. Zenodo https://doi.org/10.5281/zenodo.18425621 (2026).
56. Wang, S. Swave code. Zenodo https://doi.org/10.5281/zenodo.18229263 (2026).
57. Wang, S. Swave Utils code. Zenodo https://doi.org/10.5281/zenodo.18229275 (2026).

Acknowledgements

K.Y. is supported by the National Key R&D Program of China (grant no. 2022YFC3400300) and National Science Foundation of China (grant nos. 32125009 and 32430017). S.W. is supported by the National Science Foundation of China (grant no. 323B2015).

Author information

Authors and Affiliations

School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
Songbo Wang, Tun Xu, Pengyu Zhang & Kai Ye
MOE Key Laboratory for Intelligent Networks and Networks Security, Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
Songbo Wang, Tun Xu, Pengyu Zhang & Kai Ye
Center for Mathematical Medical, The First Affiliated Hospital of Xi’an Jiaotong University, Xi’an, China
Kai Ye
Genome Institute, The First Affiliated Hospital of Xi’an Jiaotong University, Xi’an, China
Kai Ye
Faculty of Science, Leiden University, Leiden, the Netherlands
Kai Ye

Contributions

K.Y. designed and supervised the research. S.W. developed the algorithm and performed the performance evaluation and downstream analysis. T.X. and P.Z. analyzed the impact of SVs.

Corresponding author

Correspondence to Kai Ye.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Overview of pangenome construction and allele extraction in Swave.

a, Construction of the pangenome graph with Minigraph from both reference and sample assemblies. The resulting graph is saved in GFA format, which encodes node sequences and directed edges between nodes. b, Assembly paths are recovered using the --call function in Minigraph. Regions where paths diverge (snarls) are identified as candidate structural variant loci. Allele sequences for each snarl are reconstructed by extracting the corresponding node sequences from the GFA. c, Based on the Minigraph --call outputs, Swave determines the carrier assemblies for each allele and generates dotplots for each reference-alternative pair in the next processing module. d, Swave’s handling of phasing information and multi-allelic loci. Sample genotypes are obtained by joining the genotypes of all haplotypes.
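As an illustration of panels b and d, the sketch below reads segment sequences from GFA `S` lines, rebuilds an allele sequence from a Minigraph-style walk such as `>s1>s2<s3` (where `<` denotes the reverse complement of a node), and joins per-haplotype allele indices into a sample genotype. The helper names, the phased `|` separator and the use of `.` for a haplotype without a mappable allele are assumptions made for illustration. This is not Swave’s code.

```python
import re

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_gfa_segments(lines):
    """Map segment name -> sequence from GFA 'S' records (S <name> <sequence> ...)."""
    segments = {}
    for line in lines:
        if line.startswith("S\t"):
            fields = line.rstrip("\n").split("\t")
            segments[fields[1]] = fields[2]
    return segments


def walk_sequence(walk, segments):
    """Concatenate node sequences along a walk such as '>s1>s2<s3'.

    '>' keeps the node sequence as stored; '<' takes its reverse complement.
    """
    parts = []
    for orient, name in re.findall(r"([<>])([^<>]+)", walk):
        seq = segments[name]
        parts.append(seq if orient == ">" else seq.translate(_COMPLEMENT)[::-1])
    return "".join(parts)


def sample_genotype(haplotype_alleles):
    """Join per-haplotype allele indices (None = no mappable allele) as 'a|b'."""
    return "|".join("." if a is None else str(a) for a in haplotype_alleles)
```

Multi-allelic loci need no special handling in this sketch, because the allele index can exceed 1 (for example `0|2`).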

Extended Data Fig. 2 Dotplot generation and projection.

a, Base-level refinement of k-mer-based dotplots. Initial k-mer alignment introduces (k − 1)-base gaps near SV breakpoints. Swave performs base-level remapping at the positions where k-mer matching stops to improve breakpoint resolution for downstream SV classification. b, Influence of genomic repeats on wave patterns. Dense, repetitive regions generate abundant spurious matches in dotplots, resulting in fluctuating wave signals upon projection.
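Below is a minimal sketch of why k-mer dotplots blur breakpoints and how base-level extension recovers them. It assumes an exact-match k-mer dotplot whose diagonal runs are recorded by k-mer start position. `last_diagonal_kmer` and `refine_right_breakpoint` are hypothetical helpers, and only the right-hand boundary of a match run is shown.

```python
def last_diagonal_kmer(ref: str, alt: str, k: int, i: int = 0, j: int = 0):
    """Follow one dotplot diagonal from (i, j) while k-mers match.

    Returns the start coordinates of the last matching k-mer, or None.
    """
    last = None
    while i + k <= len(ref) and j + k <= len(alt) and ref[i:i + k] == alt[j:j + k]:
        last = (i, j)
        i += 1
        j += 1
    return last


def refine_right_breakpoint(ref: str, alt: str, i: int, j: int, k: int):
    """Extend base by base past the last matching k-mer starting at (i, j).

    The k-mer start lies k - 1 bases before the last matched base, so the coarse
    dotplot under-reports the match run; comparing single bases from the end of
    that k-mer onward returns the exclusive end of the run on both sequences.
    """
    t = k  # bases i..i+k-1 (and j..j+k-1) are already known to match
    while i + t < len(ref) and j + t < len(alt) and ref[i + t] == alt[j + t]:
        t += 1
    return i + t, j + t
```

With contiguous k-mers this extension stops at i + k, so it only removes the (k − 1)-base offset. With sparser seeds (for example sampled k-mers), the same loop would also recover bases between the last seed and the breakpoint.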

Extended Data Fig. 3 Recurrent Neural Network for SV classification in Swave.

a, Projected wave signals are encoded as four-element tuples per genomic segment, comprising the span length, the background average wave value, and the differences between the SV-implying and background waves for forward and reverse matches. These tuples serve as the input to the RNN classification model. b, A one-layer Bi-LSTM with 64 hidden units forms the core of the RNN, enabling context-aware classification of SV components across the sequence. c, Time and memory consumption measured on three datasets: HGSVC3 (130 haplotypes), the healthy cohort (HGSVC3 + HPRC + CPC, 334 haplotypes) and the disease cohort (GA4K, 574 haplotypes). On a personal computer (CPU: Intel Core i9-13900K; maximum memory: 32 GB), Swave was run with 8 threads. On a computing cluster node (CPU: Intel Xeon Gold 6240R; maximum memory: 376 GB), Swave was accelerated by using 24 threads.
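The tuple encoding in panel a can be sketched as follows. The segmentation rule (maximal runs of constant SV wave values) and the exact definitions of the background average and of the two differences are assumptions made for illustration; the legend specifies only which four quantities each tuple contains. The resulting list, one tuple per segment, is the kind of ordered input that a one-layer Bi-LSTM such as the one in panel b could consume step by step (the network itself is not implemented here). `segment_runs` and `encode_segments` are hypothetical names.

```python
from statistics import fmean


def segment_runs(sv_fwd, sv_rev):
    """Split positions into maximal runs where the (forward, reverse) SV wave is constant."""
    segments, start = [], 0
    for p in range(1, len(sv_fwd) + 1):
        if p == len(sv_fwd) or (sv_fwd[p], sv_rev[p]) != (sv_fwd[start], sv_rev[start]):
            segments.append((start, p))
            start = p
    return segments


def encode_segments(sv_fwd, sv_rev, bg_fwd, bg_rev):
    """Return one (span, background mean, forward diff, reverse diff) tuple per segment.

    Assumed definitions: background mean = mean of the forward + reverse background
    waves over the segment; each diff = mean SV wave minus mean background wave.
    """
    tuples = []
    for s, e in segment_runs(sv_fwd, sv_rev):
        bg_mean = fmean(f + r for f, r in zip(bg_fwd[s:e], bg_rev[s:e]))
        d_fwd = fmean(sv_fwd[s:e]) - fmean(bg_fwd[s:e])
        d_rev = fmean(sv_rev[s:e]) - fmean(bg_rev[s:e])
        tuples.append((e - s, bg_mean, d_fwd, d_rev))
    return tuples
```

Summarizing each segment as a short, fixed-width tuple keeps the sequence length proportional to the number of segments rather than to the number of bases.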

Extended Data Fig. 4 Performance evaluation results.

a, F1-score comparison for simple structural variant (SSV) detection. b, F1-score comparison for complex structural variant (CSV) detection. c, Mendelian consistency across three trio datasets. Average consistency values are noted on the plot. d, Genotype (GT) missing rate across two population datasets. Average missing rates for the two datasets are noted on the plot. e, Improvements in genotyping performance following PanPop refinement. SVIM-asm followed by each of three merging tools was applied to each of three trios (n = 9). The box plot shows the median (Q2, 50th percentile), first quartile (Q1, 25th percentile) and third quartile (Q3, 75th percentile); the box bounds span the interquartile range (IQR) from Q1 to Q3. The minima and maxima are defined as Q1 − 1.5 × IQR and Q3 + 1.5 × IQR, respectively, and the whiskers extend from the minima to Q1 and from Q3 to the maxima. Values falling outside the minima–maxima range are plotted as outliers.
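Panels c and d report two common genotype-quality metrics. The sketch below uses standard definitions that may differ in detail from the paper’s (for example, in how missing genotypes or all-reference sites are handled). A trio site counts as Mendelian-consistent if the child carries one allele that can come from the mother and the other from the father. The missing rate is the fraction of genotype calls with any missing allele. Genotypes are tuples of allele indices, with `None` marking a missing allele, and all function names are hypothetical.

```python
def is_mendelian_consistent(child, mother, father):
    """True/False for a diploid trio site, or None if any allele is missing."""
    if None in child or None in mother or None in father:
        return None
    c1, c2 = child
    return (c1 in mother and c2 in father) or (c2 in mother and c1 in father)


def trio_consistency(sites):
    """Fraction of evaluable (child, mother, father) sites that are consistent."""
    results = [is_mendelian_consistent(*site) for site in sites]
    evaluable = [r for r in results if r is not None]
    return sum(evaluable) / len(evaluable) if evaluable else float("nan")


def missing_rate(genotypes):
    """Fraction of genotype calls that are missing (None) or contain a missing allele."""
    if not genotypes:
        return float("nan")
    missing = sum(1 for g in genotypes if g is None or None in g)
    return missing / len(genotypes)
```

Excluding sites with missing alleles from the consistency denominator keeps the two metrics separate, so a caller cannot raise its consistency simply by leaving difficult genotypes missing.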

Extended Data Fig. 5 Validation and illustration of inversions.

a and b, Validation results for all detected balanced and complex inversions, using three orthogonal metrics: mapping integrity, TT-Mars and VaPoR. c, Comparison of breakpoint accuracy for the 52 overlapping inversions between Swave and PAV. The box plot shows the median (Q2, 50th percentile), first quartile (Q1, 25th percentile) and third quartile (Q3, 75th percentile); the box bounds span the interquartile range (IQR) from Q1 to Q3. The minima and maxima are defined as Q1 − 1.5 × IQR and Q3 + 1.5 × IQR, respectively, and the whiskers extend from the minima to Q1 and from Q3 to the maxima. Values falling outside the minima–maxima range are plotted as outliers. d, Illustration of breakpoint distortion caused by inverted segmental duplications (SDs). Whereas PAV’s breakpoints are frequently shifted owing to alignment ambiguity, Swave maintains accurate breakpoint placement within repetitive regions.
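The box-plot convention stated in these legends can be made concrete with a short sketch, for example applied to per-inversion breakpoint deviations. It assumes linear-interpolation quartiles (`statistics.quantiles` with `method="inclusive"`, which matches NumPy’s default) and draws each whisker to the most extreme data point inside the Q1 − 1.5 × IQR and Q3 + 1.5 × IQR fences, as common plotting libraries do; the plotting software used for the figures is not stated. `box_stats` is a hypothetical helper.

```python
from statistics import quantiles


def box_stats(values):
    """Summarize values with the Q1/Q2/Q3, IQR and 1.5 x IQR fence convention."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        # whiskers stop at the most extreme observations within the fences
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }
```

Outliers are the points beyond the fences, not every point outside the Q1–Q3 box.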

Extended Data Fig. 6 Characterization of polymorphic scarred inversions.

a, Example of a polymorphic scarred inversion snarl containing five distinct alleles (1), generated by combinatorial arrangements of five unique internal scars across four genomic regions. The most complex variant includes four separate scars (2). b, Length distribution of all detected scars (n = 81), ranging from 61 bp to 18,451 bp. c, Repeat annotation of all scars (n = 81). d, Example of polymorphic scarred inversions driven by repetitive elements, where two repeat expansions give rise to insertion scars of different lengths.

Extended Data Fig. 7 Rare and complex structural variants revealed by Swave.

a, Pangenome graph structure of snarl ‘>s21910 > s21914’. A rare CSV allele introduced a novel traversal path not observed among the reported alleles. b, IGV snapshot of snarl ‘>s21910 > s21914’, illustrating the co-occurrence of two distinct SSVs and one CSV. The rare SSV (a 67 kb deletion) extended the common 32 kb deletion, whereas the rare CSV (a duplication flanked by a deletion) added a 58 kb duplication at the right breakpoint of the same frequent 32 kb simple deletion. c, Pangenome graph of snarl ‘>s10752 > s10754’, showing a novel CSV locus that is structurally distinct from previously reported variants. d, Illustration of a rare scarred inversion that partially disrupts the coding structure of VIPR2, a gene associated with neuropsychiatric disorders.

Extended Data Fig. 8 How potentially pathogenic structural variants affect genes.

a, Mapping of residue-level disruptions caused by ClinVar pathogenic variants and by the CSV detected by Swave. b, Structural annotation of the CYP17A1 protein, highlighting two functional binding sites sourced from UniProt. c, Mapping of residue-level disruptions caused by ClinVar pathogenic variants and by the CSV detected by Swave. d, Schematic of a simple structural variant, a 411 bp inversion, disrupting the second exon of HYLS1, a gene implicated in hydrolethalus syndrome. e, Representation of a 43 kb deletion spanning introns 6 to 14 of FRAS1, a gene associated with Fraser syndrome.

Extended Data Fig. 9 Genotyping incompleteness associated with unresolved pangenome graph regions.

a, Mapping results of a carrier assembly exhibiting missing genotypes across four snarls. b, Genome-wide distribution of snarls with missing genotypes across HGSVC samples. The Y-axis indicates the number of assemblies lacking mappable sequence at each snarl.

Extended Data Fig. 10 Misclassification of dispersed duplications as insertions.

Dispersed duplications, in which the source sequence originates from a distant locus on the same chromosome (a) or from a different chromosome (b), pose challenges for Swave. When generating dotplots, Swave extends the reference region by twice the length of the alternative sequence on both sides. Consequently, if the duplicated source sequence lies outside this extended window, Swave fails to capture it and reports the event as an insertion instead. c, Using the 65 samples from HGSVC, we compared Swave’s outputs with the dispersed duplications reported by SVision-pro and found that Swave misclassified 0–4 duplications with distant same-chromosome sources and 10–51 duplications with cross-chromosome sources as insertions. The box plot shows the median (Q2, 50th percentile), first quartile (Q1, 25th percentile) and third quartile (Q3, 75th percentile); the box bounds span the interquartile range (IQR) from Q1 to Q3. The minima and maxima are defined as Q1 − 1.5 × IQR and Q3 + 1.5 × IQR, respectively, and the whiskers extend from the minima to Q1 and from Q3 to the maxima. Values falling outside the minima–maxima range are plotted as outliers.
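The window rule described in this legend can be sketched as follows. The sketch assumes 0-based half-open coordinates, clipping at chromosome ends, and that the true source locus of the duplicated sequence is already known (for example, from an orthogonal caller). `extended_window` and `label_duplication` are hypothetical helpers meant only to show why sources outside the window end up labeled as insertions.

```python
def extended_window(ref_start: int, ref_end: int, alt_len: int, chrom_len: int):
    """Pad the reference region by 2 x alt_len on both sides, clipped to the chromosome."""
    pad = 2 * alt_len
    return max(0, ref_start - pad), min(chrom_len, ref_end + pad)


def label_duplication(sv_chrom, ref_start, ref_end, alt_len, chrom_len,
                      src_chrom, src_start, src_end):
    """Return 'DUP' if the duplicated source lies inside the dotplot window, else 'INS'."""
    win_start, win_end = extended_window(ref_start, ref_end, alt_len, chrom_len)
    if src_chrom == sv_chrom and win_start <= src_start and src_end <= win_end:
        return "DUP"
    return "INS"
```

Because the window never extends to another chromosome, every cross-chromosome source falls outside it in this sketch, consistent with the larger counts reported for that class.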

Supplementary information

Reporting Summary (PDF)

Peer Review File (PDF)

Supplementary Tables (XLSX)

Supplementary Tables 1–17.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, S., Xu, T., Zhang, P. et al. Population-level structural variant characterization using pangenome graphs. Nat Genet 58, 664–672 (2026). https://doi.org/10.1038/s41588-026-02538-6

Received: 17 June 2025
Accepted: 10 February 2026
Published: 10 March 2026
Version of record: 10 March 2026
Issue date: March 2026
DOI: https://doi.org/10.1038/s41588-026-02538-6