Pan-genome bridges wheat structural variations with habitat and breeding

This article has been updated

Abstract

Wheat is the second largest food crop, with a very good breeding system and pedigree record in China. Investigating the genomic footprints of wheat cultivars will unveil potential avenues for future breeding efforts1,2. Here we report chromosome-level genome assemblies of 17 wheat cultivars that chronicle the breeding history of China. Comparative genomic analysis uncovered a wealth of structural rearrangements, identifying 249,976 structural variations, of which 49.03% (122,567) are longer than 5 kb. Cultivars developed in the 1980s displayed significant accumulations of structural variations, a pattern linked to the extensive incorporation of European and American varieties into breeding programmes of that era. We further found that structural variations in the centromere-proximal regions are associated with a reduction of crossover events. We showed that common wheat evolved from spring to winter types via mutations and duplications of the VRN-A1 gene as an adaptation strategy to a changing environment. We confirmed shifts in wheat cultivars linked to dietary preferences, migration and cultural integration in Northwest China. We identified large presence or absence variations of pSc200 tandem repeats on the 1RS terminal, suggesting its own rapid evolution in the wheat genome. The high-quality genome assemblies of the 17 representative cultivars and their good complementarity to the 10+ pan-genomes offer a robust platform for future genomics-assisted breeding in wheat.

Fig. 1: Heads and seeds of the 17 wheat cultivars, their assembled genomes and representativeness for local and global wheat diversity.
Fig. 2: PAVs in the centromere-proximal region halted crossover recombination.
Fig. 3: Duplication at VRN-A1 is associated with the winter–spring differentiation of wheat.
Fig. 4: Allelic comparison of the Pina and Pinb genes and their geographical distribution in landraces indicates different priorities for grain hardness in the food cultures of North and South China.
Fig. 5: Rapid reorganization of 1RS translocated onto wheat chromosomes in the past half century.

Data availability

All data are available in this paper, the Supplementary Information or at publicly accessible repositories. The data in the public repositories include all raw reads and assembled sequence data (Supplementary Table 29) for wheat pan-genomics in the BIG Data Center under BioProject ID PRJCA021345. All materials are available from X.Z. on request. Although DNA samples of the 17 assembled genotypes are freely available, the seeds of these genotypes can be obtained in accordance with China's legislation on crop seeds and a Material Transfer Agreement. Researchers in China can access these seeds without difficulty.

Code availability

The source code and scripts used in the paper have been deposited in GitHub (https://github.com/Xiaoming8102/WheatPangenome).

Change history

  • 20 December 2024

    In the version of this article initially published, there was a typo in the National Key Research and Development Program of China grant number 2023YFF1000400 (originally reading 2023YFD...) which is now amended in the HTML and PDF versions of the article.

References

  1. International Wheat Genome Sequencing Consortium (IWGSC). Shifting the limits in wheat research and breeding using a fully annotated reference genome. Science 361, eaar7191 (2018).

  2. Walkowiak, S. et al. Multiple wheat genomes reveal global variation in modern breeding. Nature 588, 277–283 (2020).

  3. Salamini, F., Özkan, H., Brandolini, A., Schäfer-Pregl, R. & Martin, W. Genetics and geography of wild cereal domestication in the Near East. Nat. Rev. Genet. 3, 429–441 (2002).

  4. The International Wheat Genome Sequencing Consortium (IWGSC). A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science 345, 1251788 (2014).

  5. Feldman, M. & Levy, A. A. Genome evolution due to allopolyploidization in wheat. Genetics 192, 763–774 (2012).

  6. Biehl, P. F. et al. Ancient DNA from 8400 year-old Çatalhöyük wheat: implications for the origin of Neolithic agriculture. PLoS ONE 11, e0151974 (2016).

  7. Zhao, X. B. et al. Population genomics unravels the Holocene history of bread wheat and its relatives. Nat. Plants 9, 403–419 (2023).

  8. Scott, M. F. et al. A 3,000-year-old Egyptian emmer wheat genome reveals dispersal and domestication history. Nat. Plants 5, 1120–1128 (2019).

  9. McClatchie, M. et al. Neolithic farming in north-western Europe: archaeobotanical evidence from Ireland. J. Archaeol. Sci. 51, 206–215 (2014).

  10. Liu, X. et al. From ecological opportunism to multi-cropping: mapping food globalisation in prehistory. Quat. Sci. Rev. 206, 21–28 (2019).

  11. Hao, C. et al. Resequencing of 145 landmark cultivars reveals asymmetric sub-genome selection and strong founder genotype effects on wheat breeding in China. Mol. Plant 13, 1733–1751 (2020).

  12. Zhuang, Q. S. Chinese Wheat Improvement and Pedigree Analysis [Chinese] (Agricultural Press, 2003).

  13. Jayakodi, M., Schreiber, M., Stein, N. & Mascher, M. Building pan-genome infrastructures for crop plants and their use in association genetics. DNA Res. 28, dsaa030 (2021).

  14. Lei, L., Goltsman, E., Goodstein, D., Wu, G. A. & Vogel, J. P. Plant pan-genomics comes of age. Annu. Rev. Plant Biol. 72, 411–435 (2021).

  15. Schreiber, M., Jayakodi, M., Stein, N. & Mascher, M. Plant pangenomes for crop improvement, biodiversity and evolution. Nat. Rev. Genet. https://doi.org/10.1038/s41576-024-00691-4 (2024).

  16. Zhang, X. Y. & Appels, R. in The Wheat Genome (eds Appels, R. et al.) 93–111 (Springer, 2023).

  17. Castillo, F. A. The Oxford Handbook of the Archaeology of Diet (Oxford Univ. Press, 2015).

  18. Krattinger, S. G. et al. A putative ABC transporter confers durable resistance to multiple fungal pathogens in wheat. Science 323, 1360–1363 (2009).

  19. Fu, D. et al. A kinase-START gene confers temperature-dependent resistance to wheat stripe rust. Science 323, 1357–1360 (2009).

  20. Wang, B. et al. De novo genome assembly and analyses of 12 founder inbred lines provide insights into maize heterosis. Nat. Genet. 55, 312–323 (2023).

  21. Qin, P. et al. Pan-genome analysis of 33 genetically diverse rice accessions reveals hidden genomic variations. Cell 184, 3542–3558.e16 (2021).

  22. Song, L. et al. Reducing brassinosteroid signalling enhances grain yield in semi-dwarf wheat. Nature 617, 118–124 (2023).

  23. Németh, A. & Längst, G. Genome organization in and around the nucleolus. Trends Genet. 27, 149–156 (2011).

  24. Kishii, M. & Mao, L. Synthetic hexaploid wheat: yesterday, today, and tomorrow. Engineering 4, 552–558 (2018).

  25. Guo, W. et al. Origin and adaptation to high altitude of Tibetan semi-wild wheat. Nat. Commun. 11, 5085 (2020).

  26. Zhou, Y. et al. Triticum population sequencing provides insights into wheat adaptation. Nat. Genet. 52, 1412–1422 (2020).

  27. Monat, C., Padmarasu, S., Lux, T., Wicker, T. & Mascher, M. TRITEX: chromosome-scale sequence assembly of Triticeae genomes with open-source tools. Genome Biol. 20, 284 (2019).

  28. Athiyannan, N. et al. Long-read genome sequencing of bread wheat facilitates disease resistance gene cloning. Nat. Genet. 54, 227–231 (2022).

  29. Kale, S. M. et al. A catalogue of resistance gene homologs and a chromosome-scale reference sequence support resistance gene mapping in winter wheat. Plant Biotechnol. J. 20, 1730–1742 (2022).

  30. Li, B. et al. Wheat centromeric retrotransposons: the new ones take a major role in centromeric structure. Plant J. 73, 952–965 (2013).

  31. Ahmed, H. I. et al. Einkorn genomics sheds light on history of the oldest domesticated wheat. Nature 620, 830–838 (2023).

  32. Wang, Z. et al. Dispersed emergence and protracted domestication of polyploid wheat uncovered by mosaic ancestral haploblock inference. Nat. Commun. 13, 3891 (2022).

  33. Cheng, H., Liu, J., Wen, J., Nie, X. & Jiang, Y. Frequent intra- and inter-species introgression shapes the landscape of genetic variation in bread wheat. Genome Biol. 20, 136 (2019).

  34. Oliver, S. N., Finnegan, E. J., Dennis, E. S., Peacock, W. J. & Trevaskis, B. Vernalization-induced flowering in cereals is associated with changes in histone methylation at the VERNALIZATION1 gene. Proc. Natl Acad. Sci. USA 106, 8386–8391 (2009).

  35. Alonge, M. et al. Major impacts of widespread structural variation on gene expression and crop improvement in tomato. Cell 182, 145–161.e23 (2020).

  36. Li, G. et al. A high-quality genome assembly highlights rye genomic characteristics and agronomically important genes. Nat. Genet. 53, 574–584 (2021).

  37. Rabanus-Wallace, M. T. et al. Chromosome-scale genome assembly provides insights into rye biology, evolution and agronomic potential. Nat. Genet. 53, 564–573 (2021).

  38. Gabay, G., Zhang, J., Burguener, G. F., Howell, T. & Dubcovsky, J. Structural rearrangements in wheat (1BS)–rye (1RS) recombinant chromosomes affect gene dosage and root length. Plant Genome 14, e20079 (2021).

  39. Zhou, Y. et al. Introgressing the Aegilops tauschii genome into wheat as a basis for cereal improvement. Nat. Plants 7, 774–786 (2021).

  40. Song, J. M. et al. Eight high-quality genomes reveal pan-genome architecture and ecotype differentiation of Brassica napus. Nat. Plants 6, 34–45 (2020).

  41. Saayman, X., Graham, E., Nathan, W. J., Nussenzweig, A. & Esashi, F. Centromeres as universal hotspots of DNA breakage, driving RAD51-mediated recombination during quiescence. Mol. Cell 83, 523–538.e7 (2023).

  42. Nambiar, M. & Smith, G. R. Pericentromere-specific cohesin complex prevents meiotic pericentric DNA double-strand breaks and lethal crossovers. Mol. Cell 71, 540–553.e4 (2018).

  43. He, F. et al. Exome sequencing highlights the role of wild-relative introgression in shaping the adaptive landscape of the wheat genome. Nat. Genet. https://doi.org/10.1038/s41588-019-0382-2 (2019).

  44. Zhao, J. et al. Centromere repositioning and shifts in wheat evolution. Plant Commun. 4, 100556 (2023).

  45. Boden, S. A. et al. Ppd-1 is a key regulator of inflorescence architecture and paired spikelet development in wheat. Nat. Plants 1, 14016 (2015).

  46. Yan, L. L. et al. The wheat VRN2 gene is a flowering repressor down-regulated by vernalization. Science 303, 1640–1644 (2004).

  47. Yan, L. et al. Positional cloning of the wheat vernalization gene VRN1. Proc. Natl Acad. Sci. USA 100, 6263–6268 (2003).

  48. Hazen, S. P. et al. Copy number variation affecting the Photoperiod-B1 and Vernalization-A1 genes is associated with altered flowering time in wheat (Triticum aestivum). PLoS ONE https://doi.org/10.1371/journal.pone.0033234 (2012).

  49. Würschum, T., Boeven, P. H. G., Langer, S. M., Longin, C. F. H. & Leiser, W. L. Multiply to conquer: copy number variations at Ppd-B1 and Vrn-A1 facilitate global adaptation in wheat. BMC Genet. 16, 96 (2015).

  50. Giroux, M. J. & Morris, C. F. Wheat grain hardness results from highly conserved mutations in the friabilin components puroindoline a and b. Proc. Natl Acad. Sci. USA 95, 6262–6266 (1998).

  51. Xie, T. et al. De novo plant genome assembly based on chromatin interactions: a case study of Arabidopsis thaliana. Mol. Plant 8, 489–492 (2015).

  52. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).

  53. Wingett, S. et al. HiCUP: pipeline for mapping and processing Hi-C data. F1000Res. 4, 1310 (2015).

  54. Zhang, J. et al. Allele-defined genome of the autopolyploid sugarcane Saccharum spontaneum L. Nat. Genet. 50, 1565–1573 (2018).

  55. Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 (2013).

  56. Simao, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).

  57. Ou, S., Chen, J. & Jiang, N. Assessing genome assembly quality using the LTR assembly index (LAI). Nucleic Acids Res. 46, e126 (2018).

  58. Steuernagel, B. et al. The NLR-Annotator tool enables annotation of the intracellular immune receptor repertoire. Plant Physiol. 183, 468–482 (2020).

  59. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics https://doi.org/10.1002/0471250953.bi0410s05 (2009).

  60. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).

  61. Yu, X. J., Zheng, H. K., Wang, J., Wang, W. & Su, B. Detecting lineage-specific adaptive evolution of brain-expressed genes in human using rhesus macaque as outgroup. Genomics 88, 745–751 (2006).

  62. Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. Genome Res. 14, 988–995 (2004).

  63. Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).

  64. Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–W439 (2006).

  65. Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).

  66. Guigo, R. Assembling genes from predicted exons in linear time with dynamic programming. J. Comput. Biol. 5, 681–702 (1998).

  67. Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004).

  68. Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004).

  69. Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013).

  70. Ghosh, S. & Chan, C. K. Analysis of RNA-seq data using TopHat and Cufflinks. Methods Mol. Biol. 1374, 339–361 (2016).

  71. Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).

  72. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).

  73. Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI Reference Sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35, D61–D65 (2007).

  74. Hunter, S. et al. InterPro: the integrative protein signature database. Nucleic Acids Res. 37, D211–D215 (2009).

  75. Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000).

  76. Boeckmann, B. et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 31, 365–370 (2003).

  77. Quevillon, E. et al. InterProScan: protein domains identifier. Nucleic Acids Res. 33, W116–W120 (2005).

  78. Tang, H. et al. Synteny and collinearity in plant genomes. Science 320, 486–488 (2008).

  79. Wu, T. D. & Watanabe, C. K. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21, 1859–1875 (2005).

  80. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  81. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  82. Weber, J. A., Aldana, R., Gallagher, B. D. & Edwards, J. S. Sentieon DNA pipeline for variant detection-Software-only solution, over 20× faster than GATK 3.3 with identical results. PeerJ PrePrints 4, e1672v1672 (2016).

  83. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).

  84. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).

  85. Marcais, G. et al. MUMmer4: a fast and versatile genome alignment system. PLoS Comput. Biol. 14, e1005944 (2018).

  86. Goel, M., Sun, H., Jiao, W. B. & Schneeberger, K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 20, 277 (2019).

  87. Jeffares, D. C. et al. Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast. Nat. Commun. 8, 14061 (2017).

  88. Chiang, C. et al. SpeedSeq: ultra-fast personal genome analysis and interpretation. Nat. Methods 12, 966–968 (2015).

  89. Huang, W., Li, L., Myers, J. R. & Marth, G. T. ART: a next-generation sequencing read simulator. Bioinformatics 28, 593–594 (2012).

  90. van der Maaten, L. Accelerating t-SNE using tree-based algorithms. J. Mach. Learn. Res. 15, 3221–3245 (2014).

  91. Yang, Z. et al. ggComp enables dissection of germplasm resources and construction of a multiscale germplasm network in wheat. Plant Physiol. 188, 1950–1965 (2022).

  92. Gao, F., Ming, C., Hu, W. & Li, H. New software for the fast estimation of population recombination rates (FastEPRR) in the genomic era. G3 6, 1563–1571 (2016).

  93. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).

  94. Kang, H. M. et al. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42, 348–354 (2010).

  95. Katoh, K., Asimenos, G. & Toh, H. Multiple alignment of DNA sequences with MAFFT. Methods Mol. Biol. 537, 39–64 (2009).

  96. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol. Biol. Evol. 26, 1641–1650 (2009).

  97. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

  98. Scrucca, L., Fop, M., Murphy, T. B. & Raftery, A. E. mclust 5: Clustering, classification and density estimation using Gaussian finite mixture models. R J. 8, 289–317 (2016).

  99. Chen, Y. et al. A collinearity-incorporating homology inference strategy for connecting emerging assemblies in the Triticeae tribe as a pilot practice in the plant pangenomic era. Mol. Plant 13, 1694–1708 (2020).

  100. Ma, S. et al. WheatOmics: a platform combining multiple omics data to accelerate functional genomics studies in wheat. Mol. Plant 14, 1965–1968 (2021).

  101. He, W. et al. NGenomeSyn: an easy-to-use and flexible tool for publication-ready visualization of syntenic relationships across multiple genomes. Bioinformatics 39, btad121 (2023).

  102. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

  103. Han, F., Lamb, J. C. & Birchler, J. A. High frequency of centromere inactivation resulting in stable dicentric chromosomes of maize. Proc. Natl Acad. Sci. USA 103, 3238–3243 (2006).

  104. Fu, S., Chen, L., Wang, Y., Li, M. & Tang, Z. Oligonucleotide probes for ND-FISH analysis to identify rye and wheat chromosomes. Sci. Rep. 5, 10552 (2015).

  105. Tang, Z., Yang, Z. & Fu, S. Oligonucleotides replacing the roles of repetitive sequences pAs1, pSc119.2, pTa-535, pTa71, CCS1, and pAWRC.1 for FISH analysis. J. Appl. Genet. 55, 313–318 (2014).

Acknowledgements

We appreciate Z. F. Lu for discussion on the VRN-A1 expression work and W. X. Wang for help on bioinformatics analysis. This project was funded by the National Key Research and Development Program of China (2023YFF1000400 and 2022YFD1201503). This project was also funded by the National Natural Science Foundation of China (grant no. 32322059) and the Innovation Program of Chinese Academy of Agricultural Sciences (CAAS-CSCB-202401). R.K.V. thanks Food Futures Institute, Murdoch University and Grains Research & Development Corporation (project nos. UMU2404-003RTX and WSU2303-001RTX) for supporting this work in part.

Author information

Authors and Affiliations

Authors

Contributions

X.Z. together with W.G. designed the project. X.Z., W.G. and R.K.V. supervised the execution and completion of the project. C.J., X.X. and L.C. performed the bioinformatics analysis. C.H. managed the fieldwork and prepared the samples. L.Z. conducted the cytogenetic experiments. V.G., Z.W., Y.Z., T.L., J.F., A.C., J.H., H.L., G.D., X.L., J.J., L.M. and X.W. contributed to the experiments, data analysis and interpretation for various sections of the paper. X.Z., W.G., Y.X., V.G. and R.A. wrote the paper, with input from all authors. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Rajeev K. Varshney, Weilong Guo or Xueyong Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks Erik Garrison, Sean Walkowiak and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 The genome diversity and representation of 17 assembled genomes for the 145 resequenced landmark cultivars (Hao et al.11).

a, Phylogenetic tree of all accessions inferred from whole-genome SNPs. The red lines indicate the cultivars used for de novo genome assembly. b, The t-distributed stochastic neighbor embedding (t-SNE) analysis based on the SNPs revealed the genetic relationships between the 17 de novo assemblies and the 145 landmark cultivars. c, The genome-wide GGNet of the 145 landmark cultivars. The 17 de novo assembled cultivars are marked in green for the 1950s, blue for the 1980s–1990s and red for post-2000s. The edge colors indicate the ranges of the gIBD ratio (genome similarity) for accession pairs. Only the edges with a gIBD ratio ≥ 30% are shown. Grey edges, 40% > gIBD ratio ≥ 30%; green edges, 50% > gIBD ratio ≥ 40%; red edges, gIBD ratio ≥ 50%.
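The edge-colouring rule in panel c can be written out as a small lookup. This is only an illustrative sketch of the thresholds stated in the caption, not the authors' plotting code; the function name `edge_color` and the choice to express the gIBD ratio as a fraction between 0 and 1 are assumptions made here.

```python
from typing import Optional


def edge_color(gibd_ratio: float) -> Optional[str]:
    """Return the edge colour for a gIBD ratio given as a fraction (0.0-1.0).

    Thresholds follow the caption; the function name and the fractional
    scale are hypothetical choices for this sketch.
    """
    if gibd_ratio >= 0.50:
        return "red"
    if gibd_ratio >= 0.40:
        return "green"
    if gibd_ratio >= 0.30:
        return "grey"
    return None  # edges below 30% are not drawn


if __name__ == "__main__":
    for ratio in (0.25, 0.30, 0.45, 0.62):
        print(ratio, edge_color(ratio))
```

Returning `None` for pairs below 30% mirrors the statement that such edges are not shown at all.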

Extended Data Fig. 2 An overview of the 21 wheat genomes.

Growth habit of the 20 cultivars with de novo genome assemblies. From left to right, the features are: spring or winter growth habit, contig N50 and total genome length, LAI value, and BUSCO evaluation results.

Extended Data Fig. 3 Wheat pangenome of the 21 cultivars and structural variations referenced to CS.

a, Core and pan gene clusters of 20 wheat genomes. The UpSet plot illustrates the core gene clusters (present in all genomes), soft-core gene clusters (present in 18-20 genomes) and dispensable gene clusters (present in 2-17 genomes). b, Pie chart showing the proportion of the gene families in each category. c, Density distribution of PAVs in representative cultivars from three breeding eras across chromosomes 3A, 3B, 3D, 5A, 5B and 5D. The horizontal axis of each box represents the chromosomal position, and the vertical axis indicates the density of PAVs within 1 Mb windows. d, Scatter plots showing PAV occurrence frequencies in 50s&60s versus 80s&90s cultivars, and in 80s&90s versus post-00s cultivars. Red points indicate the significantly selected PAVs with adjusted P values (FDR) greater than 0.01, green points indicate P values smaller than 0.01 and greater than 1e−8, and blue points indicate P values smaller than 1e−8. PAVs with a frequency of more than 0.5 in 80s&90s but less than 0.5 in 50s&60s are defined as specific high-frequency PAVs in 80s&90s. e, Source of specific high-frequency PAVs in cultivars released at each of the three breeding stages. The main source of specific high-frequency PAVs was European cultivars.
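The two rules stated for panel d (the colour bins for adjusted P values and the definition of a specific high-frequency PAV) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the caption does not say how values exactly equal to 0.01 or 1e−8 are handled, so the boundary treatment below is an arbitrary choice.

```python
def p_value_colour(adjusted_p: float) -> str:
    """Colour bin for an FDR-adjusted P value, following the caption.

    Boundary handling (exactly 0.01 or exactly 1e-8) is an assumption.
    """
    if adjusted_p > 0.01:
        return "red"
    if adjusted_p > 1e-8:
        return "green"
    return "blue"


def is_specific_high_frequency(freq_later: float, freq_earlier: float) -> bool:
    """True if a PAV exceeds 0.5 in the later era but is below 0.5 earlier.

    For the caption's example, freq_later is the 80s&90s frequency and
    freq_earlier is the 50s&60s frequency.
    """
    return freq_later > 0.5 and freq_earlier < 0.5


if __name__ == "__main__":
    # Toy values, not data from the paper.
    print(p_value_colour(0.2), p_value_colour(1e-4), p_value_colour(1e-12))
    print(is_specific_high_frequency(0.7, 0.2))
```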

Extended Data Fig. 4 Features of structural variations (SVs) in the pangenome comprising 17 wheat cultivars.

a, Pie chart showing the proportion of the different SV types in the wheat pan-genome. b, Percentage of PAVs overlapping with different genomic features. c, Distribution of PAVs by allele frequency in wheat accessions, indicating that most PAVs are present in one or only a few accessions. d, Distribution of PAV hotspot regions on the 21 chromosomes. Red represents the top 10 PAV hotspot regions, and orange represents the next 10-20 regions with a high density of PAV hotspots. e, The number of PAVs decreases sharply with PAV length, indicating that longer PAVs are relatively rare in the genome, whereas shorter PAVs are more common. f, Characteristic distribution of structural variation lengths in the 17 wheat genomes.

Extended Data Fig. 5 Distribution of SVs on each chromosome under different length intervals.

The B subgenome shows significantly higher density of SVs in the 50 bp–500 kb range compared to the A and D subgenomes, except for the 6A and 4B regions. The D subgenome has a lower density of SVs in this range. No significant difference in the number of SVs was observed among chromosomes for length ranges above 500 kb. a, 50 bp–1 kb; b, 1 kb–5 kb; c, 5 kb–10 kb; d, 10 kb–100 kb; e, 100 kb–500 kb; f, >500 kb.
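The six length intervals of panels a to f amount to a simple binning of SV lengths. The sketch below shows one way to assign a length to its panel. It is an illustration only: the names `length_panel` and `count_by_panel` are hypothetical, and because the caption does not state which side of each boundary is inclusive, the code assumes upper bounds are inclusive (so exactly 500 kb falls in panel e and only longer SVs fall in panel f).

```python
from bisect import bisect_left
from collections import Counter
from typing import Iterable, Optional

# Upper bounds (bp) of panels a-e; anything longer goes to panel f.
UPPER_EDGES_BP = [1_000, 5_000, 10_000, 100_000, 500_000]
PANELS = [
    "a: 50 bp-1 kb",
    "b: 1 kb-5 kb",
    "c: 5 kb-10 kb",
    "d: 10 kb-100 kb",
    "e: 100 kb-500 kb",
    "f: >500 kb",
]


def length_panel(sv_length_bp: int) -> Optional[str]:
    """Return the panel label for an SV length, or None if shorter than 50 bp."""
    if sv_length_bp < 50:
        return None
    # bisect_left makes each upper bound inclusive (an assumption of this sketch).
    return PANELS[bisect_left(UPPER_EDGES_BP, sv_length_bp)]


def count_by_panel(lengths_bp: Iterable[int]) -> Counter:
    """Count SVs per panel, skipping lengths below the 50 bp minimum."""
    labels = (length_panel(length) for length in lengths_bp)
    return Counter(label for label in labels if label is not None)


if __name__ == "__main__":
    # Toy lengths, not data from the paper.
    print(count_by_panel([30, 120, 4_800, 7_500, 250_000, 2_000_000]))
```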

Extended Data Fig. 6 Crossover recombination at regions proximal to centromeres was strongly negatively affected by PAVs.

a, b and c depict the breeding-history crossover recombination number (CRN) and PAV number within 100 Mb upstream and downstream of centromeres in the A, B and D sub-genomes, respectively. The grey-shaded regions represent the centromere regions identified by peaks of CRWs and Quintas. The blue line represents CRN per Mb window estimated from re-sequencing data of the 145 cultivars in China, and the red line indicates the number of PAVs per Mb window in the 17 de novo assembled genomes, representing the wheat breeding history in China. R, Pearson’s correlation coefficient between CRN frequencies and PAV counts. Red stars represent significant negative correlations between CRNs and the numbers of PAVs (P < 0.05).
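The R value reported in each panel is a Pearson correlation between two per-window series: CRN and PAV count. A minimal sketch of that calculation is given below, using only the standard library. The function name `pearson_r` and the toy numbers are assumptions for illustration; the sketch does not compute the P value behind the red stars and is not the authors' pipeline.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy per-window values, not data from the paper.
    crn_per_window = [9, 7, 4, 2, 1]
    pav_per_window = [1, 2, 5, 8, 9]
    print(round(pearson_r(crn_per_window, pav_per_window), 3))
```

A negative value of R for a chromosome arm would correspond to the pattern described in the caption, where windows with more PAVs show fewer crossovers.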

Extended Data Fig. 7 Partition and distribution of SVs at regions proximal to centromeres on chromosomes in A- and B-subgenomes.

Counts of different SVs between assembly pairs for intra- (purple) and inter-centAHG groups (blue) on each chromosome for A- and B-subgenomes. Red bar, regions proximal to centromeres, from 100 Mb upstream to 100 Mb downstream of the centromere on each chromosome. Green bar, centAHG block previously identified32.

Extended Data Fig. 8 CNVs in the resequencing database of wheat and its tetraploid ancestors indicate that the VRN-A1 gene experienced strong selection during wheat origin, spread and breeding in response to the environment.

a, b, Hapmap of VRN-A1 copy number in landraces and cultivars. Blue represents VRN-A1 gene loss (0 copies); green, orange and red represent 1, 2 and 3 copies of VRN-A1, respectively. During the spread of wheat to China, the copy number increased (the proportion of triple-copy types increased), and the share of triple-copy types of VRN-A1 increased significantly in the colder north-western region. However, in modern cultivars, the frequency of cultivars with one or two copies increased in north China, probably because of global warming. The sizes of the pies are proportional to the number of samples at each location. c, Mean January temperature from 1961 to 2020 in Henan, the largest wheat-producing province in China. The average January temperature from 1981 to 2020 is marked with an orange line.

Extended Data Fig. 9 Gene structure of Pina mutations and the geographic distribution of Pina and Pinb.

a, The Pina-D1c allele found in the resequencing data exists only in the Central Asia region. b, c, The Pina-D1b and Pinb-D1u alleles spread mainly to the east. d, IGV plot of resequencing data of wheat varieties. Deletion of the fragment disrupts the Pina gene structure, followed by gene deletion.

Extended Data Fig. 10 Large PAVs of sub-telomere repeats on 1RS translocations among modern cultivars.

a, Collinearity at the end of the short arm of 1RS in cultivars with 1RS·1BL and 1RS·7DL translocations. Each block represents 500 kb of sequence; to highlight the position of the extended sequences, the collinear relationship between them is shown in light red. b, Distribution of different elements in the extended intervals. Purple, green and light red rectangles denote the telomere-associated sequences (TASs), sub-telomeric location sequences and tandem repeat intervals, respectively; the positions are staggered up and down to distinguish the different element intervals. c, The number and length of sub-telomere sequences in each genome. The yellow bars represent the number and length of sub-telomere sequences at the end of the short arm of chromosome 1B in 1RS·1BL translocation lines. The blue and black bars represent the number and length of sequences in the entire genome of non-1RS·1BL translocation hexaploid wheat and rye, respectively.

Extended Data Fig. 11 Genome composition of CM42, the first cultivar derived from a cross between common wheat and the CIMMYT synthetic.

Introgressed fragments from Aegilops tauschii are marked in green. Centromeres are indicated by black triangles.

Extended Data Table 1 Statistics of the assembly and annotation of 21 wheat genomes

Supplementary information

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jiao, C., Xie, X., Hao, C. et al. Pan-genome bridges wheat structural variations with habitat and breeding. Nature 637, 384–393 (2025). https://doi.org/10.1038/s41586-024-08277-0
