Graph-based pangenome reveals structural variation dynamics during cucumber breeding

Abstract

Structural variants (SVs) represent an important yet underexplored component of plant genome diversity. Here we present a graph-based cucumber pangenome constructed from 39 reference-quality genomes, including 27 newly assembled and 12 previously published. The pangenome captures 171,892 high-confidence SVs, which were genotyped across 447 wild and cultivated accessions. Our analyses reveal that, during cucumber domestication, a substantial portion of mildly deleterious SNPs were retained, whereas SVs were consistently purged, highlighting their highly deleterious nature. During geographical expansion, a reduced SV burden and a younger age of SVs compared to SNPs were observed, suggesting stronger purifying selection acting on SVs. Introgressions from wild populations increased SV burden, potentially due to hitchhiking. Notably, incorporating SV burden into genomic prediction models improved prediction accuracy for several agronomically important traits. This study illuminates SV dynamics during cucumber domestication and range expansion and underscores the implications of SVs for future cucumber breeding.

Fig. 1: Pangenome of wild and cultivated cucumbers.
Fig. 2: SV detection and genotyping in cucumber.
Fig. 3: SV dynamics during cucumber domestication and improvement.
Fig. 4: Spatiotemporal profiles of SV burden across different cucumber populations.
Fig. 5: Adaptive introgressions in cucumber.
Fig. 6: Genomic prediction incorporating SV burden information.

Data availability

Raw genome resequencing reads have been deposited in the National Center for Biotechnology Information (NCBI) BioProject database under accession no. PRJNA1192329. Raw HiFi reads and genome assemblies have been deposited in the NCBI BioProject database under accession no. PRJNA844366. Genome assemblies and annotations, together with SNPs, small indels and SVs in VCF format, are available at CuGenDBv2 (http://cucurbitgenomics.org/v2/ftp/pan-genome/cucumber/).

Code availability

All pipelines and customized scripts used in this study are available via GitHub at https://github.com/xuebozhao16/CucurbitGenomics and via Zenodo at https://doi.org/10.5281/zenodo.17872506 (ref. 93).

References

  1. Ho, S. S., Urban, A. E. & Mills, R. E. Structural variation in the sequencing era. Nat. Rev. Genet. 21, 171–189 (2020).

  2. Alkan, C., Coe, B. P. & Eichler, E. E. Genome structural variation discovery and genotyping. Nat. Rev. Genet. 12, 363–376 (2011).

  3. Sherman, R. M. & Salzberg, S. L. Pan-genomics in the human genome era. Nat. Rev. Genet. 21, 243–254 (2020).

  4. Zhou, Y. et al. Graph pangenome captures missing heritability and empowers tomato breeding. Nature 606, 527–534 (2022).

  5. Liao, W. W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).

  6. Alonge, M. et al. Major impacts of widespread structural variation on gene expression and crop improvement in tomato. Cell 182, 145–161 (2020).

  7. Li, H. & Durbin, R. Genome assembly in the telomere-to-telomere era. Nat. Rev. Genet. 25, 658–670 (2024).

  8. Schreiber, M., Jayakodi, M., Stein, N. & Mascher, M. Plant pangenomes for crop improvement, biodiversity and evolution. Nat. Rev. Genet. 25, 563–577 (2024).

  9. Huang, S. et al. The genome of the cucumber, Cucumis sativus L. Nat. Genet. 41, 1275–1281 (2009).

  10. Qi, J. et al. A genomic variation map provides insights into the genetic basis of cucumber domestication and diversity. Nat. Genet. 45, 1510–1515 (2013).

  11. Zhang, Z. et al. Genome-wide mapping of structural variations reveals a copy number variant that determines reproductive morphology in cucumber. Plant Cell 27, 1595–1604 (2015).

  12. Li, Q. et al. A chromosome-scale genome assembly of cucumber (Cucumis sativus L.). Gigascience 8, giz072 (2019).

  13. Li, H. et al. Graph-based pan-genome reveals structural and sequence variations related to agronomic traits and domestication in cucumber. Nat. Commun. 13, 682 (2022).

  14. Guan, J. et al. A near-complete cucumber reference genome assembly and Cucumber-DB, a multi-omics database. Mol. Plant 17, 1178–1182 (2024).

  15. Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).

  16. Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).

  17. Ou, S., Chen, J. & Jiang, N. Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. 46, e126 (2018).

  18. Shumate, A. & Salzberg, S. L. Liftoff: accurate mapping of gene annotations. Bioinformatics 37, 1639–1643 (2021).

  19. Gao, L. et al. The tomato pan-genome uncovers new genes and a rare allele regulating fruit flavor. Nat. Genet. 51, 1044–1051 (2019).

  20. Bayer, P. E. et al. Sequencing the USDA core soybean collection reveals gene loss during domestication and breeding. Plant Genome 15, e20109 (2022).

  21. Sun, X. et al. Phased diploid genome assemblies and pan-genomes provide insights into the genetic history of apple domestication. Nat. Genet. 52, 1423–1432 (2020).

  22. Marcussen, T. et al. Ancient hybridizations among the ancestral genomes of bread wheat. Science 345, 1250092 (2014).

  23. Ebler, J. et al. Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes. Nat. Genet. 54, 518–525 (2022).

  24. Shang, Y. et al. Biosynthesis, regulation, and domestication of bitterness in cucumber. Science 346, 1084–1088 (2014).

  25. Lun, Y. et al. A CsYcf54 variant conferring light green coloration in cucumber. Euphytica 208, 509–517 (2016).

  26. Wang, X. et al. The USDA cucumber (Cucumis sativus L.) collection: genetic diversity, population structure, genome-wide association studies, and core collection development. Hortic. Res. 5, 64 (2018).

  27. Weng, Y. Cucumis sativus chromosome evolution, domestication, and genetic diversity: implications for cucumber breeding. Plant Breed. Rev. 44, 79–111 (2020).

  28. Lu, J. et al. The accumulation of deleterious mutations in rice genomes: a hypothesis on the cost of domestication. Trends Genet. 22, 126–131 (2006).

  29. Lozano, R. et al. Comparative evolutionary genetics of deleterious load in sorghum and maize. Nat. Plants 7, 17–24 (2021).

  30. Zhou, Y. et al. The population genetics of structural variants in grapevine domestication. Nat. Plants 5, 965–979 (2019).

  31. Casillas, S. & Barbadilla, A. Molecular population genetics. Genetics 205, 1003–1035 (2017).

  32. Keightley, P. D. & Eyre-Walker, A. Joint inference of the distribution of fitness effects of deleterious mutations and population demography based on nucleotide polymorphism frequencies. Genetics 177, 2251–2261 (2007).

  33. Peischl, S., Dupanloup, I., Kirkpatrick, M. & Excoffier, L. On the accumulation of deleterious mutations during range expansions. Mol. Ecol. 22, 5972–5982 (2013).

  34. Lohmueller, K. E. et al. Proportionally more deleterious genetic variation in European than in African populations. Nature 451, 994–997 (2008).

  35. Peischl, S. & Excoffier, L. Expansion load: Recessive mutations and the role of standing genetic variation. Mol. Ecol. 24, 2084–2094 (2015).

  36. Bertorelle, G. et al. Genetic load: genomic estimates and applications in non-model animals. Nat. Rev. Genet. 23, 492–503 (2022).

  37. Frankham, R. Relationship of genetic variation to population size in wildlife. Conserv. Biol. 10, 1500–1508 (1996).

  38. Ohta, T. Slightly deleterious mutant substitutions in evolution. Nature 246, 96–98 (1973).

  39. Harrison, R. G. & Larson, E. L. Hybridization, introgression, and the nature of species boundaries. J. Hered. 105, 795–809 (2014).

  40. Rotival, M. & Quintana-Murci, L. Functional consequences of archaic introgression and their impact on fitness. Genome Biol. 21, 19–22 (2020).

  41. Janzen, G. M., Wang, L. & Hufford, M. B. The extent of adaptive wild introgression in crops. New Phytol. 221, 1279–1288 (2018).

  42. Pickrell, J. K. & Pritchard, J. K. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 8, e1002967 (2012).

  43. Martin, S. H., Davey, J. W. & Jiggins, C. D. Evaluating the use of ABBA-BABA statistics to locate introgressed loci. Mol. Biol. Evol. 32, 244–257 (2015).

  44. Kaya, C., Uğurlar, F. & Adamakis, I. D. S. Molecular mechanisms of CBL-CIPK signaling pathway in plant abiotic stress tolerance and hormone crosstalk. Int. J. Mol. Sci. 25, 5043 (2024).

  45. Daetwyler, H. D., Calus, M. P. L., Pong-Wong, R., de los Campos, G. & Hickey, J. M. Genomic prediction in animals and plants: simulation of data, validation, reporting, and benchmarking. Genetics 193, 347–365 (2013).

  46. Yang, J. et al. Incomplete dominance of deleterious alleles contributes substantially to trait variation and heterosis in maize. PLoS Genet. 13, e1007019 (2017).

  47. Ramstein, G. P. & Buckler, E. S. Prediction of evolutionary constraint by genomic annotations improves functional prioritization of genomic variants in maize. Genome Biol. 23, 183 (2022).

  48. Wu, Y. et al. Phylogenomic discovery of deleterious mutations facilitates hybrid potato breeding. Cell 186, 2313–2328 (2023).

  49. Lin, Y.-C., Weng, Y., Fei, Z. & Grumet, R. Mining the cucumber core collection: phenotypic and genetic characterization of morphological diversity for fruit quality characteristics. Hortic. Res. 12, uhae340 (2024).

  50. Guo, D. et al. A pangenome reference of wild and cultivated rice. Nature 642, 662–671 (2025).

  51. Liu, Z. et al. Grapevine pangenome facilitates trait genetics and genomic breeding. Nat. Genet. 56, 2804–2814 (2024).

  52. Chen, J. et al. Pangenome analysis reveals genomic variations associated with domestication traits in broomcorn millet. Nat. Genet. 55, 2243–2254 (2023).

  53. Hufford, M. B. et al. The genomic signature of crop-wild introgression in maize. PLoS Genet. 9, e1003477 (2013).

  54. He, F. et al. Exome sequencing highlights the role of wild-relative introgression in shaping the adaptive landscape of the wheat genome. Nat. Genet. 51, 896–904 (2019).

  55. Calfee, E. et al. Selective sorting of ancestral introgression in maize and teosinte along an elevational cline. PLoS Genet. 17, e1009810 (2021).

  56. Zhao, X. et al. Population genomics unravels the Holocene history of bread wheat and its relatives. Nat. Plants 9, 403–419 (2023).

  57. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).

  58. Alonge, M. et al. Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing. Genome Biol. 23, 258 (2022).

  59. Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl Acad. Sci. USA 117, 9451–9457 (2020).

  60. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics Chapter 4, 4.10.1–4.10.14 (2009).

  61. Campbell, M. S., Holt, C., Moore, B. & Yandell, M. Genome annotation and curation using MAKER and MAKER-P. Curr. Protoc. Bioinformatics 12, 11–39 (2014).

  62. Keller, O., Kollmar, M., Stanke, M. & Waack, S. A novel hybrid gene prediction method employing protein multiple sequence alignments. Bioinformatics 27, 757–763 (2011).

  63. Korf, I. Gene finding in novel genomes. BMC Bioinform. 5, 59 (2004).

  64. Li, Z. et al. RNA-Seq improves annotation of protein-coding genes in the cucumber genome. BMC Genom. 12, 540 (2011).

  65. Castanera, R., Ruggieri, V., Pujol, M., Garcia-Mas, J. & Casacuberta, J. M. An improved melon reference genome with single-molecule sequencing uncovers a recent burst of transposable elements with potential impact on genes. Front. Plant Sci. 10, 1815 (2020).

  66. Qin, X. et al. Chromosome-scale genome assembly of Cucumis hystrix—a wild species interspecifically cross-compatible with cultivated cucumber. Hortic. Res. 8, 40 (2021).

  67. Iwata, H. & Gotoh, O. Benchmarking spliced alignment programs including Spaln2, an extended version of Spaln that incorporates additional species-specific features. Nucleic Acids Res. 40, e161 (2012).

  68. Stiehler, F. et al. Helixer: Cross-species gene annotation of large eukaryotic genomes using deep learning. Bioinformatics 36, 5291–5298 (2020).

  69. Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).

  70. Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).

  71. Katoh, K., Misawa, K., Kuma, K. I. & Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066 (2002).

  72. Wang, D., Zhang, Y., Zhang, Z., Zhu, J. & Yu, J. KaKs_Calculator 2.0: A toolkit incorporating gamma-series methods and sliding window strategies. Genomics Proteomics Bioinformatics 8, 77–80 (2010).

  73. Drummond, A. J., Suchard, M. A., Xie, D. & Rambaut, A. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol. Biol. Evol. 29, 1969–1973 (2012).

  74. Yu, G., Smith, D. K., Zhu, H., Guan, Y. & Lam, T. T. Y. GGTREE: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 8, 28–36 (2017).

  75. Hickey, G. et al. Pangenome graph construction from genome alignments with Minigraph-Cactus. Nat. Biotechnol. 42, 663–673 (2024).

  76. Garrison, E. et al. Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat. Biotechnol. 36, 875–881 (2018).

  77. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).

  78. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).

  79. McKenna, A. et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).

  80. Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly 6, 80–92 (2012).

  81. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).

  82. Letunic, I. & Bork, P. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res. 44, 242–245 (2016).

  83. Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).

  84. Caye, K., Deist, T. M., Martins, H., Michel, O. & François, O. TESS3: Fast inference of spatial population structure and genome scans for selection. Mol. Ecol. Resour. 16, 540–548 (2016).

  85. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).

  86. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).

  87. Terhorst, J., Kamm, J. A. & Song, Y. S. Robust and scalable inference of population history from hundreds of unphased whole genomes. Nat. Genet. 49, 303–309 (2017).

  88. Ossowski, S. et al. The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 327, 92–94 (2010).

  89. Albers, P. K. & McVean, G. Dating genomic variants and shared ancestry in population-scale sequencing data. PLoS Biol. 18, e3000586 (2020).

  90. Tataru, P. & Bataillon, T. PolyDFEv2.0: testing for invariance of the distribution of fitness effects within and across species. Bioinformatics 35, 2868–2869 (2019).

  91. Endelman, J. B. Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome 4, 250–255 (2011).

  92. Speed, D., Holmes, J. & Balding, D. J. Evaluating and improving heritability models using summary statistics. Nat. Genet. 52, 458–462 (2020).

  93. Zhao, X. CucurbitGenomics: pipelines and scripts for Cucurbitaceae genomics and evolution analyses. Zenodo https://doi.org/10.5281/zenodo.17872506 (2025).

Acknowledgements

We thank S. Beyer (US Department of Agriculture, Agricultural Research Service (USDA-ARS)) for technical help in developing the core collection. This research was supported by grants from USDA National Institute of Food and Agriculture Specialty Crop Research Initiative (nos. 2015-51181-24285 and 2020-51181-32139).

Author information

Contributions

Z.F. and Y.X. conceived the project. Z.F. designed and supervised the study. X.Z., J.Y., H.S. and S.W. contributed to genome assembly and annotation, pangenome construction and SV genotyping. X.Z. performed population genetic analyses. X.Z., J. Zhao and Y.Z. contributed to genomic prediction analysis. R.G., S.A.H. and Y.-C.L. contributed to sample collection, DNA extraction and phenotyping. R.T.D. and F.C. helped develop the population for sequencing. J. Zhang, Y.X., Y.W. and Z.F. coordinated genome sequencing. X.Z. wrote the paper. Z.F., Z.Z., S.H., Y.W., R.G. and Y.X. revised the paper.

Corresponding authors

Correspondence to Yong Xu or Zhangjun Fei.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Total length of SVs across different cucumber populations.

For each boxplot, the lower and upper bounds indicate the first and third quartiles, respectively, the center line indicates the median, and the whiskers extend to 1.5× the interquartile range. XSBN, Xishuangbanna; AF, Africa; WA, Central/West Asia; EU, Europe; EA, East Asia; AM, America.
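
The boxplot convention described in this legend can be made concrete with a short sketch. It assumes quartiles are obtained by linear interpolation and that whiskers stop at the most extreme observation lying within 1.5× the interquartile range of the box (the usual Tukey convention); the legend does not state which quartile method was used, and the input values below are hypothetical, not data from the study.

```python
from statistics import quantiles

def boxplot_stats(values):
    """Return the summary statistics drawn in a Tukey-style boxplot."""
    data = sorted(values)
    # Quartiles by linear interpolation between order statistics.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations inside the fences.
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < lo_fence or v > hi_fence],
    }

# Hypothetical total SV lengths (Mb) for eight accessions of one population.
example = [10.2, 11.0, 11.4, 11.9, 12.3, 12.8, 13.1, 19.5]
stats = boxplot_stats(example)
print(stats)
```

In this toy input the largest value falls beyond the upper fence, so it is reported as an outlier and the upper whisker ends at the next largest observation.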

Extended Data Fig. 2 SVs under selection during cucumber domestication and improvement.

a, Comparison of SV occurrence frequencies between wild and landrace populations (domestication). b, Comparison of SV occurrence frequencies between landrace and cultivar populations (improvement). SVs associated with known genes regulating key agronomic traits, including Psm (Paternal sorting of mitochondria), lgp (light green peel), bt (bitter fruit), up (upward-pedicel), ten (tendril-less), and lgf (light green fruit), are shown.

Extended Data Fig. 3 Phylogeny of 447 cucumber accessions based on SVs.

African accessions are marked with red arrows.

Extended Data Fig. 4 Population structure and principal component analysis of cucumber accessions.

a, Population structure of cucumber accessions based on SNPs, with the number of clusters (K) ranging from 2 to 6. b, Cross-validation (CV) error plotted against K for inference of population structure. c, Principal component analysis (PCA) of cucumber accessions based on SVs. d, PCA of cucumber accessions based on SNPs. The right panel shows an enlarged view of the cluster indicated by the dotted box in the left panel.

Extended Data Fig. 5 Site frequency spectrum (SFS) of sSNPs, nSNPs, and SVs in introgressed and non-introgressed regions.

a, b, SFS of sSNPs, nSNPs, insertions, and deletions in regions with (a) and without (b) introgressions from wild to the European population.
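
As an illustration of what a site frequency spectrum summarizes, the following minimal sketch counts, for each variant class, how many sites carry the derived allele on exactly k sampled chromosomes. It assumes alleles have already been polarized into ancestral (0) and derived (1) states and that genotypes are given per haploid chromosome; the function names and the toy genotypes are hypothetical and do not reproduce the authors' pipeline.

```python
from collections import Counter

def site_frequency_spectrum(genotypes, n_chromosomes):
    """Unfolded SFS: entry k-1 is the number of sites with k derived copies."""
    counts = Counter(sum(site) for site in genotypes)
    # Only segregating classes (1 .. n-1 derived copies) are reported.
    return [counts.get(k, 0) for k in range(1, n_chromosomes)]

def to_proportions(sfs):
    """Rescale an SFS so that its entries sum to 1 (all zeros if empty)."""
    total = sum(sfs)
    return [c / total for c in sfs] if total else [0.0] * len(sfs)

# Toy data: six sites scored on four chromosomes (1 = derived allele).
toy_sites = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 1],  # fixed derived allele, excluded from the spectrum
]
sfs = site_frequency_spectrum(toy_sites, n_chromosomes=4)
props = to_proportions(sfs)
print(sfs, props)
```

Computing one spectrum per variant class (for example sSNPs, nSNPs, insertions and deletions) and comparing the share of low-frequency sites is a common way to contrast the strength of purifying selection between classes.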

Extended Data Fig. 6 Genomic prediction accuracies for four traits significantly correlated with SV burden.

a–d, Genomic prediction accuracies for young fruit shape (a), mature fruit shape (b), fruit curvature (c), and fruit hollowness (d). Values are from 50 independent cross-validation replicates. For each boxplot, the lower and upper bounds indicate the first and third quartiles, respectively, the center line indicates the median, and the whiskers extend to 1.5× the interquartile range.
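
The replicated cross-validation behind such accuracy boxplots can be sketched as follows. This is a simplified stand-in, not the authors' model: it assumes ridge regression on marker genotypes, treats SV burden as one additional penalized covariate, and measures accuracy as the Pearson correlation between predicted and observed values in a held-out set. All function names, parameters and the simulated data are hypothetical.

```python
import random
from statistics import mean

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ridge_fit(X, y, lam):
    """Ridge coefficients from (X'X + lam*I) beta = X'y, with y centred."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den if den else 0.0

def cv_accuracies(X, y, lam=1.0, replicates=50, test_fraction=0.2, seed=0):
    """Prediction accuracy for each of `replicates` random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    n_test = max(2, int(len(y) * test_fraction))
    accuracies = []
    for _ in range(replicates):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        mu = mean(y[i] for i in train)
        beta = ridge_fit([X[i] for i in train], [y[i] - mu for i in train], lam)
        pred = [mu + sum(b * x for b, x in zip(beta, X[i])) for i in test]
        accuracies.append(pearson(pred, [y[i] for i in test]))
    return accuracies

# Simulated data (hypothetical): 60 accessions, 8 markers coded -1/+1,
# a standardized SV burden, and a trait lowered by higher burden.
rng = random.Random(1)
markers = [[rng.choice((-1.0, 1.0)) for _ in range(8)] for _ in range(60)]
burden = [rng.gauss(0.0, 1.0) for _ in range(60)]
trait = [m[0] - 0.8 * b + rng.gauss(0.0, 0.5) for m, b in zip(markers, burden)]

acc_markers = cv_accuracies(markers, trait)
acc_with_burden = cv_accuracies([m + [b] for m, b in zip(markers, burden)], trait)
print(round(mean(acc_markers), 3), round(mean(acc_with_burden), 3))
```

Because both calls use the same seed, the two models are evaluated on identical splits, so the 50 accuracy values per model can be compared replicate by replicate or summarized as boxplots.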

Extended Data Fig. 7 Genomic prediction accuracies for traits not correlated with SV burden.

Values are from 50 independent cross-validation replicates. For each boxplot, the lower and upper bounds indicate the first and third quartiles, respectively, the center line indicates the median, and the whiskers extend to 1.5× the interquartile range.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhao, X., Yu, J., Zhang, J. et al. Graph-based pangenome reveals structural variation dynamics during cucumber breeding. Nat Genet 58, 643–654 (2026). https://doi.org/10.1038/s41588-026-02506-0

  • DOI: https://doi.org/10.1038/s41588-026-02506-0
