Abstract
Structural variants (SVs) represent an important yet underexplored component of plant genome diversity. Here we present a graph-based cucumber pangenome constructed from 39 reference-quality genomes, including 27 newly assembled and 12 previously published. The pangenome captures 171,892 high-confidence SVs, which were genotyped across 447 wild and cultivated accessions. Our analyses reveal that, during cucumber domestication, a substantial portion of mildly deleterious SNPs were retained, whereas SVs were consistently purged, highlighting their highly deleterious nature. During geographical expansion, a reduced SV burden and a younger age of SVs compared to SNPs were observed, suggesting stronger purifying selection acting on SVs. Introgressions from wild populations increased SV burden, potentially due to hitchhiking. Notably, incorporating SV burden into genomic prediction models improved prediction accuracy for several agronomically important traits. This study illuminates SV dynamics during cucumber domestication and range expansion and underscores the implications of SVs for future cucumber breeding.
Data availability
Raw genome resequencing reads have been deposited in the National Center for Biotechnology Information (NCBI) BioProject database under the accession no. PRJNA1192329. Raw HiFi reads and genome assemblies have been deposited in the NCBI BioProject database under the accession no. PRJNA844366. Genome assemblies and annotations, SNPs, small indels and SVs in VCF format are available at CuGenDBv2 (http://cucurbitgenomics.org/v2/ftp/pan-genome/cucumber/).
Code availability
All pipelines and customized scripts used in this study are available via GitHub at https://github.com/xuebozhao16/CucurbitGenomics and via Zenodo at https://doi.org/10.5281/zenodo.17872506 (ref. 93).
References
Ho, S. S., Urban, A. E. & Mills, R. E. Structural variation in the sequencing era. Nat. Rev. Genet. 21, 171–189 (2020).
Alkan, C., Coe, B. P. & Eichler, E. E. Genome structural variation discovery and genotyping. Nat. Rev. Genet. 12, 363–376 (2011).
Sherman, R. M. & Salzberg, S. L. Pan-genomics in the human genome era. Nat. Rev. Genet. 21, 243–254 (2020).
Zhou, Y. et al. Graph pangenome captures missing heritability and empowers tomato breeding. Nature 606, 527–534 (2022).
Liao, W. W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).
Alonge, M. et al. Major impacts of widespread structural variation on gene expression and crop improvement in tomato. Cell 182, 145–161 (2020).
Li, H. & Durbin, R. Genome assembly in the telomere-to-telomere era. Nat. Rev. Genet. 25, 658–670 (2024).
Schreiber, M., Jayakodi, M., Stein, N. & Mascher, M. Plant pangenomes for crop improvement, biodiversity and evolution. Nat. Rev. Genet. 25, 563–577 (2024).
Huang, S. et al. The genome of the cucumber, Cucumis sativus L. Nat. Genet. 41, 1275–1281 (2009).
Qi, J. et al. A genomic variation map provides insights into the genetic basis of cucumber domestication and diversity. Nat. Genet. 45, 1510–1515 (2013).
Zhang, Z. et al. Genome-wide mapping of structural variations reveals a copy number variant that determines reproductive morphology in cucumber. Plant Cell 27, 1595–1604 (2015).
Li, Q. et al. A chromosome-scale genome assembly of cucumber (Cucumis sativus L.). Gigascience 8, giz072 (2019).
Li, H. et al. Graph-based pan-genome reveals structural and sequence variations related to agronomic traits and domestication in cucumber. Nat. Commun. 13, 682 (2022).
Guan, J. et al. A near-complete cucumber reference genome assembly and Cucumber-DB, a multi-omics database. Mol. Plant 17, 1178–1182 (2024).
Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
Ou, S., Chen, J. & Jiang, N. Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. 46, e126 (2018).
Shumate, A. & Salzberg, S. L. Liftoff: accurate mapping of gene annotations. Bioinformatics 37, 1639–1643 (2021).
Gao, L. et al. The tomato pan-genome uncovers new genes and a rare allele regulating fruit flavor. Nat. Genet. 51, 1044–1051 (2019).
Bayer, P. E. et al. Sequencing the USDA core soybean collection reveals gene loss during domestication and breeding. Plant Genome 15, e20109 (2022).
Sun, X. et al. Phased diploid genome assemblies and pan-genomes provide insights into the genetic history of apple domestication. Nat. Genet. 52, 1423–1432 (2020).
Marcussen, T. et al. Ancient hybridizations among the ancestral genomes of bread wheat. Science 345, 1250092 (2014).
Ebler, J. et al. Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes. Nat. Genet. 54, 518–525 (2022).
Shang, Y. et al. Biosynthesis, regulation, and domestication of bitterness in cucumber. Science 346, 1084–1088 (2014).
Lun, Y. et al. A CsYcf54 variant conferring light green coloration in cucumber. Euphytica 208, 509–517 (2016).
Wang, X. et al. The USDA cucumber (Cucumis sativus L.) collection: genetic diversity, population structure, genome-wide association studies, and core collection development. Hortic. Res. 5, 64 (2018).
Weng, Y. Cucumis sativus chromosome evolution, domestication, and genetic diversity: implications for cucumber breeding. Plant Breed. Rev. 44, 79–111 (2020).
Lu, J. et al. The accumulation of deleterious mutations in rice genomes: a hypothesis on the cost of domestication. Trends Genet. 22, 126–131 (2006).
Lozano, R. et al. Comparative evolutionary genetics of deleterious load in sorghum and maize. Nat. Plants 7, 17–24 (2021).
Zhou, Y. et al. The population genetics of structural variants in grapevine domestication. Nat. Plants 5, 965–979 (2019).
Casillas, S. & Barbadilla, A. Molecular population genetics. Genetics 205, 1003–1035 (2017).
Keightley, P. D. & Eyre-Walker, A. Joint inference of the distribution of fitness effects of deleterious mutations and population demography based on nucleotide polymorphism frequencies. Genetics 177, 2251–2261 (2007).
Peischl, S., Dupanloup, I., Kirkpatrick, M. & Excoffier, L. On the accumulation of deleterious mutations during range expansions. Mol. Ecol. 22, 5972–5982 (2013).
Lohmueller, K. E. et al. Proportionally more deleterious genetic variation in European than in African populations. Nature 451, 994–997 (2008).
Peischl, S. & Excoffier, L. Expansion load: Recessive mutations and the role of standing genetic variation. Mol. Ecol. 24, 2084–2094 (2015).
Bertorelle, G. et al. Genetic load: genomic estimates and applications in non-model animals. Nat. Rev. Genet. 23, 492–503 (2022).
Frankham, R. Relationship of genetic variation to population size in wildlife. Conserv. Biol. 10, 1500–1508 (1996).
Ohta, T. Slightly deleterious mutant substitutions in evolution. Nature 246, 96–98 (1973).
Harrison, R. G. & Larson, E. L. Hybridization, introgression, and the nature of species boundaries. J. Hered. 105, 795–809 (2014).
Rotival, M. & Quintana-Murci, L. Functional consequences of archaic introgression and their impact on fitness. Genome Biol. 21, 19–22 (2020).
Janzen, G. M., Wang, L. & Hufford, M. B. The extent of adaptive wild introgression in crops. New Phytol. 221, 1279–1288 (2018).
Pickrell, J. K. & Pritchard, J. K. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 8, e1002967 (2012).
Martin, S. H., Davey, J. W. & Jiggins, C. D. Evaluating the use of ABBA-BABA statistics to locate introgressed loci. Mol. Biol. Evol. 32, 244–257 (2015).
Kaya, C., Uğurlar, F. & Adamakis, I. D. S. Molecular mechanisms of CBL-CIPK signaling pathway in plant abiotic stress tolerance and hormone crosstalk. Int. J. Mol. Sci. 25, 5043 (2024).
Daetwyler, H. D., Calus, M. P. L., Pong-Wong, R., de los Campos, G. & Hickey, J. M. Genomic prediction in animals and plants: simulation of data, validation, reporting, and benchmarking. Genetics 193, 347–365 (2013).
Yang, J. et al. Incomplete dominance of deleterious alleles contributes substantially to trait variation and heterosis in maize. PLoS Genet. 13, e1007019 (2017).
Ramstein, G. P. & Buckler, E. S. Prediction of evolutionary constraint by genomic annotations improves functional prioritization of genomic variants in maize. Genome Biol. 23, 183 (2022).
Wu, Y. et al. Phylogenomic discovery of deleterious mutations facilitates hybrid potato breeding. Cell 186, 2313–2328 (2023).
Lin, Y.-C., Weng, Y., Fei, Z. & Grumet, R. Mining the cucumber core collection: phenotypic and genetic characterization of morphological diversity for fruit quality characteristics. Hortic. Res. 12, uhae340 (2024).
Guo, D. et al. A pangenome reference of wild and cultivated rice. Nature 642, 662–671 (2025).
Liu, Z. et al. Grapevine pangenome facilitates trait genetics and genomic breeding. Nat. Genet. 56, 2804–2814 (2024).
Chen, J. et al. Pangenome analysis reveals genomic variations associated with domestication traits in broomcorn millet. Nat. Genet. 55, 2243–2254 (2023).
Hufford, M. B. et al. The genomic signature of crop-wild introgression in maize. PLoS Genet. 9, e1003477 (2013).
He, F. et al. Exome sequencing highlights the role of wild-relative introgression in shaping the adaptive landscape of the wheat genome. Nat. Genet. 51, 896–904 (2019).
Calfee, E. et al. Selective sorting of ancestral introgression in maize and teosinte along an elevational cline. PLoS Genet. 17, e1009810 (2021).
Zhao, X. et al. Population genomics unravels the Holocene history of bread wheat and its relatives. Nat. Plants 9, 403–419 (2023).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
Alonge, M. et al. Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing. Genome Biol. 23, 258 (2022).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl Acad. Sci. USA 117, 9451–9457 (2020).
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics Chapter 4, 4.10.1–4.10.14 (2009).
Campbell, M. S., Holt, C., Moore, B. & Yandell, M. Genome annotation and curation using MAKER and MAKER-P. Curr. Protoc. Bioinformatics 12, 11–39 (2014).
Keller, O., Kollmar, M., Stanke, M. & Waack, S. A novel hybrid gene prediction method employing protein multiple sequence alignments. Bioinformatics 27, 757–763 (2011).
Korf, I. Gene finding in novel genomes. BMC Bioinform. 5, 59 (2004).
Li, Z. et al. RNA-Seq improves annotation of protein-coding genes in the cucumber genome. BMC Genom. 12, 540 (2011).
Castanera, R., Ruggieri, V., Pujol, M., Garcia-Mas, J. & Casacuberta, J. M. An improved melon reference genome with single-molecule sequencing uncovers a recent burst of transposable elements with potential impact on genes. Front. Plant Sci. 10, 1815 (2020).
Qin, X. et al. Chromosome-scale genome assembly of Cucumis hystrix—a wild species interspecifically cross-compatible with cultivated cucumber. Hortic. Res. 8, 40 (2021).
Iwata, H. & Gotoh, O. Benchmarking spliced alignment programs including Spaln2, an extended version of Spaln that incorporates additional species-specific features. Nucleic Acids Res. 40, e161 (2012).
Stiehler, F. et al. Helixer: Cross-species gene annotation of large eukaryotic genomes using deep learning. Bioinformatics 36, 5291–5298 (2020).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
Katoh, K., Misawa, K., Kuma, K. I. & Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066 (2002).
Wang, D., Zhang, Y., Zhang, Z., Zhu, J. & Yu, J. KaKs_Calculator 2.0: A toolkit incorporating gamma-series methods and sliding window strategies. Genomics Proteomics Bioinformatics 8, 77–80 (2010).
Drummond, A. J., Suchard, M. A., Xie, D. & Rambaut, A. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol. Biol. Evol. 29, 1969–1973 (2012).
Yu, G., Smith, D. K., Zhu, H., Guan, Y. & Lam, T. T. Y. GGTREE: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 8, 28–36 (2017).
Hickey, G. et al. Pangenome graph construction from genome alignments with Minigraph-Cactus. Nat. Biotechnol. 42, 663–673 (2024).
Garrison, E. et al. Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat. Biotechnol. 36, 875–881 (2018).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).
McKenna, A. et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly 6, 80–92 (2012).
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
Letunic, I. & Bork, P. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res. 44, W242–W245 (2016).
Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
Caye, K., Deist, T. M., Martins, H., Michel, O. & François, O. TESS3: Fast inference of spatial population structure and genome scans for selection. Mol. Ecol. Resour. 16, 540–548 (2016).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
Terhorst, J., Kamm, J. A. & Song, Y. S. Robust and scalable inference of population history from hundreds of unphased whole genomes. Nat. Genet. 49, 303–309 (2017).
Ossowski, S. et al. The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 327, 92–94 (2010).
Albers, P. K. & McVean, G. Dating genomic variants and shared ancestry in population-scale sequencing data. PLoS Biol. 18, e3000586 (2020).
Tataru, P. & Bataillon, T. PolyDFEv2.0: testing for invariance of the distribution of fitness effects within and across species. Bioinformatics 35, 2868–2869 (2019).
Endelman, J. B. Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome 4, 250–255 (2011).
Speed, D., Holmes, J. & Balding, D. J. Evaluating and improving heritability models using summary statistics. Nat. Genet. 52, 458–462 (2020).
Zhao, X. CucurbitGenomics: pipelines and scripts for Cucurbitaceae genomics and evolution analyses. Zenodo https://doi.org/10.5281/zenodo.17872506 (2025).
Acknowledgements
We thank S. Beyer (US Department of Agriculture, Agricultural Research Service (USDA-ARS)) for technical help in developing the core collection. This research was supported by grants from USDA National Institute of Food and Agriculture Specialty Crop Research Initiative (nos. 2015-51181-24285 and 2020-51181-32139).
Author information
Contributions
Z.F. and Y.X. conceived the project. Z.F. designed and supervised the study. X.Z., J.Y., H.S. and S.W. contributed to genome assembly and annotation, pangenome construction and SV genotyping. X.Z. performed population genetic analyses. X.Z., J. Zhao and Y.Z. contributed to genomic prediction analysis. R.G., S.A.H. and Y.-C.L. contributed to sample collection, DNA extraction and phenotyping. R.T.D. and F.C. helped develop the population for sequencing. J. Zhang, Y.X., Y.W. and Z.F. coordinated genome sequencing. X.Z. wrote the paper. Z.F., Z.Z., S.H., Y.W., R.G. and Y.X. revised the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Total length of SVs across different cucumber populations.
For each boxplot, the lower and upper bounds indicate the first and third quartiles, respectively, the center line indicates the median, and the whiskers extend to 1.5× the interquartile range. XSBN, Xishuangbanna; AF, Africa; WA, Central/West Asia; EU, Europe; EA, East Asia; AM, America.
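The boxplot convention described in this legend can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `boxplot_summary` is hypothetical, the "inclusive" quartile method is an assumption (plotting libraries differ in how they interpolate quartiles), and the whiskers are drawn in the usual Tukey sense, ending at the most extreme observation that lies within 1.5× the interquartile range of the box.

```python
from statistics import quantiles


def boxplot_summary(values):
    """Return the quantities drawn in a Tukey-style boxplot (illustrative helper)."""
    data = sorted(values)
    # Quartiles. The "inclusive" method is an assumption, not taken from the paper.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Fences sit 1.5 x IQR beyond the box; whiskers end at the most extreme
    # observations that still fall inside the fences.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": min(v for v in data if v >= lower_fence),
        "upper_whisker": max(v for v in data if v <= upper_fence),
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


if __name__ == "__main__":
    # Made-up numbers, not data from the study.
    print(boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

Values beyond the whiskers are reported separately as outliers, which is how most plotting libraries render them.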
Extended Data Fig. 2 SVs under selection during cucumber domestication and improvement.
a, Comparison of SV occurrence frequencies between wild and landrace populations (domestication). b, Comparison of SV occurrence frequencies between landrace and cultivar populations (improvement). SVs associated with known genes regulating key agronomic traits, including Psm (Paternal sorting of mitochondria), lgp (light green peel), bt (bitter fruit), up (upward-pedicel), ten (tendril-less), and lgf (light green fruit), are shown.
Extended Data Fig. 3 Phylogeny of 447 cucumber accessions based on SVs.
African accessions are marked with red arrows.
Extended Data Fig. 4 Population structure and principal component analysis of cucumber accessions.
a, Population structure of cucumber accessions based on SNPs, with the number of clusters (K) ranging from 2 to 6. b, Cross-validation (CV) error plotted against K for inference of population structure. c, Principal component analysis (PCA) of cucumber accessions based on SVs. d, PCA of cucumber accessions based on SNPs. The right panel shows an enlarged view of the cluster indicated by the dotted box in the left panel.
Extended Data Fig. 5 Site frequency spectrum (SFS) of sSNPs, nSNPs, and SVs in introgressed and non-introgressed regions.
a, b, SFS of sSNPs, nSNPs, insertions, and deletions in regions with (a) and without (b) introgressions from wild to the European population.
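As a rough illustration of what a site frequency spectrum tabulates, the sketch below counts, for each variant class, how many sites carry k copies of the derived allele. It rests on assumptions not stated in this legend: diploid genotypes coded as 0, 1 or 2 derived copies per accession, missing calls coded as `None`, sites with any missing call skipped, and an unfolded spectrum. The function names are hypothetical, and this is not the authors' pipeline.

```python
def unfolded_sfs(sites, n_accessions):
    """Count sites by derived-allele copy number (illustrative, assumes diploids).

    sites: one list per site holding the derived-allele count (0, 1, 2) of each
           accession, or None where the genotype is missing.
    Returns a list where index k-1 holds the number of sites with k derived copies.
    """
    n_chrom = 2 * n_accessions
    sfs = [0] * (n_chrom - 1)
    for calls in sites:
        # Assumption: drop sites with missing data rather than projecting down.
        if len(calls) != n_accessions or any(c is None for c in calls):
            continue
        k = sum(calls)
        if 0 < k < n_chrom:  # keep only segregating sites
            sfs[k - 1] += 1
    return sfs


def proportions(sfs):
    """Convert counts to proportions so variant classes of different size compare."""
    total = sum(sfs)
    return [c / total for c in sfs] if total else [0.0] * len(sfs)


if __name__ == "__main__":
    # Toy genotypes for three accessions, not data from the study.
    toy_sites = [[0, 1, 0], [2, 2, 1], [0, 0, 0], [1, None, 0], [1, 0, 0]]
    print(proportions(unfolded_sfs(toy_sites, n_accessions=3)))
```

Running the same tabulation separately for sSNPs, nSNPs, insertions and deletions, and separately for introgressed and non-introgressed regions, would give spectra comparable in form to those in panels a and b; a class whose spectrum is shifted toward rare alleles is generally read as being under stronger purifying selection.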
Extended Data Fig. 6 Genomic prediction accuracies for four traits significantly correlated with SV burden.
a–d, Genomic prediction accuracies for young fruit shape (a), mature fruit shape (b), fruit curvature (c), and fruit hollowness (d). Values are from 50 independent cross-validation replicates. For each boxplot, the lower and upper bounds indicate the first and third quartiles, respectively, the center line indicates the median, and the whiskers extend to 1.5× the interquartile range.
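The replicated cross-validation scheme behind these boxplots can be sketched as follows. Only the figure of 50 replicates comes from the legend; everything else is an assumption made for illustration: five folds per replicate, accuracy measured as the Pearson correlation between out-of-fold predictions and observed values, and a single-covariate least-squares line standing in for the genomic prediction model, which this section does not describe. All function names are hypothetical.

```python
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (stand-in model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def replicated_cv_accuracy(xs, ys, n_replicates=50, n_folds=5, seed=0):
    """Return one accuracy per replicate; n_folds=5 is an assumed setting."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_replicates):
        order = list(range(len(xs)))
        rng.shuffle(order)  # a fresh random partition for every replicate
        predicted = [0.0] * len(xs)
        for fold in range(n_folds):
            test = order[fold::n_folds]
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            slope, intercept = fit_line([xs[i] for i in train],
                                        [ys[i] for i in train])
            for i in test:
                predicted[i] = intercept + slope * xs[i]
        # Accuracy: correlation of out-of-fold predictions with observations.
        accuracies.append(pearson(predicted, ys))
    return accuracies


if __name__ == "__main__":
    # Simulated covariate and trait values, not data from the study.
    rng = random.Random(1)
    burden = [rng.gauss(0, 1) for _ in range(100)]
    trait = [0.5 * b + rng.gauss(0, 1) for b in burden]
    print(len(replicated_cv_accuracy(burden, trait)))
```

The list returned by `replicated_cv_accuracy` holds one accuracy per replicate, which is the kind of distribution a boxplot such as those in panels a–d summarizes.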
Extended Data Fig. 7 Genomic prediction accuracies for traits not correlated with SV burden.
Values are from 50 independent cross-validation replicates. For each boxplot, the lower and upper bounds indicate the first and third quartiles, respectively, the center line indicates the median, and the whiskers extend to 1.5× the interquartile range.
Supplementary information
Supplementary Information
Supplementary Figs. 1–9.
Supplementary Tables
Supplementary Tables 1–11.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhao, X., Yu, J., Zhang, J. et al. Graph-based pangenome reveals structural variation dynamics during cucumber breeding. Nat Genet 58, 643–654 (2026). https://doi.org/10.1038/s41588-026-02506-0
DOI: https://doi.org/10.1038/s41588-026-02506-0


