Pan-genome bridges wheat structural variations with habitat and breeding

Jiao, Chengzhi; Xie, Xiaoming; Hao, Chenyang; Chen, Liyang; Xie, Yuxin; Garg, Vanika; Zhao, Li; Wang, Zihao; Zhang, Yuqi; Li, Tian; Fu, Junjie; Chitikineni, Annapurna; Hou, Jian; Liu, Hongxia; Dwivedi, Girish; Liu, Xu; Jia, Jizeng; Mao, Long; Wang, Xiue; Appels, Rudi; Varshney, Rajeev K.; Guo, Weilong; Zhang, Xueyong

doi:10.1038/s41586-024-08277-0

Article
Published: 27 November 2024

Nature volume 637, pages 384–393 (2025)


This article has been updated

Abstract

Wheat is the second largest food crop with a very good breeding system and pedigree record in China. Investigating the genomic footprints of wheat cultivars will unveil potential avenues for future breeding efforts^1,2. Here we report chromosome-level genome assemblies of 17 wheat cultivars that chronicle the breeding history of China. Comparative genomic analysis uncovered a wealth of structural rearrangements, identifying 249,976 structural variations with 49.03% (122,567) longer than 5 kb. Cultivars developed in the 1980s displayed significant accumulations of structural variations, a pattern linked to the extensive incorporation of European and American varieties into breeding programmes of that era. We further proved that structural variations in the centromere-proximal regions are associated with a reduction of crossover events. We showed that common wheat evolved from spring to winter types via mutations and duplications of the VRN-A1 gene as an adaptation strategy to a changing environment. We confirmed shifts in wheat cultivars linked to dietary preferences, migration and cultural integration in Northwest China. We identified large presence or absence variations of pSc200 tandem repeats on the 1RS terminal, suggesting its own rapid evolution in the wheat genome. The high-quality genome assemblies of the 17 representative cultivars and their good complementarity to the 10+ pan-genomes offer a robust platform for future genomics-assisted breeding in wheat.


**Fig. 1: Heads and seeds of the 17 wheat cultivars, their assembled genomes and representativeness for local and global wheat diversity.**

**Fig. 2: PAVs in the centromere-proximal region halted crossover recombination.**

**Fig. 3: Duplication at *VRN-A1* is associated with the winter–spring differentiation of wheat.**

**Fig. 4: Allelic comparison of the *Pina* and *Pinb* genes and their geographical distribution in landraces indicate different priorities for grain hardness in the food cultures of North and South China.**

**Fig. 5: Rapid reorganization of 1RS translocated onto wheat chromosomes in the past half century.**

Data availability

All data are available in this paper, the Supplementary Information or at publicly accessible repositories. The data in the public repositories include all raw reads and assembled sequence data (Supplementary Table 29) for wheat pan-genomics in the BIG Data Center under BioProject ID PRJCA021345. All materials are available from X.Z. on request. Although DNA samples of the 17 assembled genotypes are freely available, the seeds of these genotypes can be obtained following China Legislation on Crop Seeds and Material Transfer Agreement. There is no concern for researchers in China to access these seeds.

Code availability

The source code and scripts used in the paper have been deposited in GitHub (https://github.com/Xiaoming8102/WheatPangenome).

Change history

20 December 2024
In the version of this article initially published, there was a typo in the National Key Research and Development Program of China grant number 2023YFF1000400 (originally reading 2023YFD...) which is now amended in the HTML and PDF versions of the article.

References

International Wheat Genome Sequencing Consortium (IWGSC). Shifting the limits in wheat research and breeding using a fully annotated reference genome. Science 361, eaar7191 (2018).
Walkowiak, S. et al. Multiple wheat genomes reveal global variation in modern breeding. Nature 588, 277–283 (2020).
Salamini, F., Özkan, H., Brandolini, A., Schäfer-Pregl, R. & Martin, W. Genetics and geography of wild cereal domestication in the Near East. Nat. Rev. Genet. 3, 429–441 (2002).
The International Wheat Genome Sequencing Consortium (IWGSC). A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science 345, 1251788 (2014).
Feldman, M. & Levy, A. A. Genome evolution due to allopolyploidization in wheat. Genetics 192, 763–774 (2012).
Biehl, P. F. et al. Ancient DNA from 8400 year-old Çatalhöyük wheat: implications for the origin of Neolithic agriculture. PLoS ONE 11, e0151974 (2016).
Zhao, X. B. et al. Population genomics unravels the Holocene history of bread wheat and its relatives. Nat. Plants 9, 403–419 (2023).
Scott, M. F. et al. A 3,000-year-old Egyptian emmer wheat genome reveals dispersal and domestication history. Nat. Plants 5, 1120–1128 (2019).
McClatchie, M. et al. Neolithic farming in north-western Europe: archaeobotanical evidence from Ireland. J. Archaeol. Sci. 51, 206–215 (2014).
Liu, X. et al. From ecological opportunism to multi-cropping: mapping food globalisation in prehistory. Quat. Sci. Rev. 206, 21–28 (2019).
Hao, C. et al. Resequencing of 145 landmark cultivars reveals asymmetric sub-genome selection and strong founder genotype effects on wheat breeding in China. Mol. Plant 13, 1733–1751 (2020).
Zhuang, Q. S. Chinese Wheat Improvement and Pedigree Analysis [Chinese] (Agricultural Press, 2003).
Jayakodi, M., Schreiber, M., Stein, N. & Mascher, M. Building pan-genome infrastructures for crop plants and their use in association genetics. DNA Res. 28, dsaa030 (2021).
Lei, L., Goltsman, E., Goodstein, D., Wu, G. A. & Vogel, J. P. Plant pan-genomics comes of age. Annu. Rev. Plant Biol. 72, 411–435 (2021).
Schreiber, M., Jayakodi, M., Stein, N. & Mascher, M. Plant pangenomes for crop improvement, biodiversity and evolution. Nat. Rev. Genet. https://doi.org/10.1038/s41576-024-00691-4 (2024).
Zhang, X. Y. & Appels, R. in The Wheat Genome (eds Appels, R. et al.) 93–111 (Springer, 2023).
Castillo, F. A. The Oxford Handbook of the Archaeology of Diet (Oxford Univ. Press, 2015).
Krattinger, S. G. et al. A putative ABC transporter confers durable resistance to multiple fungal pathogens in wheat. Science 323, 1360–1363 (2009).
Fu, D. et al. A kinase-START gene confers temperature-dependent resistance to wheat stripe rust. Science 323, 1357–1360 (2009).
Wang, B. et al. De novo genome assembly and analyses of 12 founder inbred lines provide insights into maize heterosis. Nat. Genet. 55, 312–323 (2023).
Qin, P. et al. Pan-genome analysis of 33 genetically diverse rice accessions reveals hidden genomic variations. Cell 184, 3542–3558.e16 (2021).
Song, L. et al. Reducing brassinosteroid signalling enhances grain yield in semi-dwarf wheat. Nature 617, 118–124 (2023).
Németh, A. & Längst, G. Genome organization in and around the nucleolus. Trends Genet. 27, 149–156 (2011).
Kishii, M. & Mao, L. Synthetic hexaploid wheat: yesterday, today, and tomorrow. Engineering 4, 552–558 (2018).
Guo, W. et al. Origin and adaptation to high altitude of Tibetan semi-wild wheat. Nat. Commun. 11, 5085 (2020).
Zhou, Y. et al. Triticum population sequencing provides insights into wheat adaptation. Nat. Genet. 52, 1412–1422 (2020).
Monat, C., Padmarasu, S., Lux, T., Wicker, T. & Mascher, M. TRITEX: chromosome-scale sequence assembly of Triticeae genomes with open-source tools. Genome Biol. 20, 284 (2019).
Athiyannan, N. et al. Long-read genome sequencing of bread wheat facilitates disease resistance gene cloning. Nat. Genet. 54, 227–231 (2022).
Kale, S. M. et al. A catalogue of resistance gene homologs and a chromosome-scale reference sequence support resistance gene mapping in winter wheat. Plant Biotechnol. J. 20, 1730–1742 (2022).
Li, B. et al. Wheat centromeric retrotransposons: the new ones take a major role in centromeric structure. Plant J. 73, 952–965 (2013).
Ahmed, H. I. et al. Einkorn genomics sheds light on history of the oldest domesticated wheat. Nature 620, 830–838 (2023).
Wang, Z. et al. Dispersed emergence and protracted domestication of polyploid wheat uncovered by mosaic ancestral haploblock inference. Nat. Commun. 13, 3891 (2022).
Cheng, H., Liu, J., Wen, J., Nie, X. & Jiang, Y. Frequent intra- and inter-species introgression shapes the landscape of genetic variation in bread wheat. Genome Biol. 20, 136 (2019).
Oliver, S. N., Finnegan, E. J., Dennis, E. S., Peacock, W. J. & Trevaskis, B. Vernalization-induced flowering in cereals is associated with changes in histone methylation at the VERNALIZATION1 gene. Proc. Natl Acad. Sci. USA 106, 8386–8391 (2009).
Alonge, M. et al. Major impacts of widespread structural variation on gene expression and crop improvement in tomato. Cell 182, 145–161.e23 (2020).
Li, G. et al. A high-quality genome assembly highlights rye genomic characteristics and agronomically important genes. Nat. Genet. 53, 574–584 (2021).
Rabanus-Wallace, M. T. et al. Chromosome-scale genome assembly provides insights into rye biology, evolution and agronomic potential. Nat. Genet. 53, 564–573 (2021).
Gabay, G., Zhang, J., Burguener, G. F., Howell, T. & Dubcovsky, J. Structural rearrangements in wheat (1BS)–rye (1RS) recombinant chromosomes affect gene dosage and root length. Plant Genome 14, e20079 (2021).
Zhou, Y. et al. Introgressing the Aegilops tauschii genome into wheat as a basis for cereal improvement. Nat. Plants 7, 774–786 (2021).
Song, J. M. et al. Eight high-quality genomes reveal pan-genome architecture and ecotype differentiation of Brassica napus. Nat. Plants 6, 34–45 (2020).
Saayman, X., Graham, E., Nathan, W. J., Nussenzweig, A. & Esashi, F. Centromeres as universal hotspots of DNA breakage, driving RAD51-mediated recombination during quiescence. Mol. Cell 83, 523–538.e7 (2023).
Nambiar, M. & Smith, G. R. Pericentromere-specific cohesin complex prevents meiotic pericentric DNA double-strand breaks and lethal crossovers. Mol. Cell 71, 540–553.e4 (2018).
He, F. et al. Exome sequencing highlights the role of wild-relative introgression in shaping the adaptive landscape of the wheat genome. Nat. Genet. https://doi.org/10.1038/s41588-019-0382-2 (2019).
Zhao, J. et al. Centromere repositioning and shifts in wheat evolution. Plant Commun. 4, 100556 (2023).
Boden, S. A. et al. Ppd-1 is a key regulator of inflorescence architecture and paired spikelet development in wheat. Nat. Plants 1, 14016 (2015).
Yan, L. L. et al. The wheat VRN2 gene is a flowering repressor down-regulated by vernalization. Science 303, 1640–1644 (2004).
Yan, L. et al. Positional cloning of the wheat vernalization gene VRN1. Proc. Natl Acad. Sci. USA 100, 6263–6268 (2003).
Hazen, S. P. et al. Copy number variation affecting the Photoperiod-B1 and Vernalization-A1 genes is associated with altered flowering time in wheat (Triticum aestivum). PLoS ONE https://doi.org/10.1371/journal.pone.0033234 (2012).
Würschum, T., Boeven, P. H. G., Langer, S. M., Longin, C. F. H. & Leiser, W. L. Multiply to conquer: copy number variations at Ppd-B1 and Vrn-A1 facilitate global adaptation in wheat. BMC Genet. 16, 96 (2015).
Giroux, M. J. & Morris, C. F. Wheat grain hardness results from highly conserved mutations in the friabilin components puroindoline a and b. Proc. Natl Acad. Sci. USA 95, 6262–6266 (1998).
Xie, T. et al. De novo plant genome assembly based on chromatin interactions: a case study of Arabidopsis thaliana. Mol. Plant 8, 489–492 (2015).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
Wingett, S. et al. HiCUP: pipeline for mapping and processing Hi-C data. F1000Res. 4, 1310 (2015).
Zhang, J. et al. Allele-defined genome of the autopolyploid sugarcane Saccharum spontaneum L. Nat. Genet. 50, 1565–1573 (2018).
Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 (2013).
Simao, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Ou, S., Chen, J. & Jiang, N. Assessing genome assembly quality using the LTR assembly index (LAI). Nucleic Acids Res. 46, e126 (2018).
Steuernagel, B. et al. The NLR-Annotator tool enables annotation of the intracellular immune receptor repertoire. Plant Physiol. 183, 468–482 (2020).
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics https://doi.org/10.1002/0471250953.bi0410s05 (2009).
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
Yu, X. J., Zheng, H. K., Wang, J., Wang, W. & Su, B. Detecting lineage-specific adaptive evolution of brain-expressed genes in human using rhesus macaque as outgroup. Genomics 88, 745–751 (2006).
Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. Genome Res. 14, 988–995 (2004).
Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).
Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–W439 (2006).
Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).
Guigo, R. Assembling genes from predicted exons in linear time with dynamic programming. J. Comput. Biol. 5, 681–702 (1998).
Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004).
Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004).
Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013).
Ghosh, S. & Chan, C. K. Analysis of RNA-seq data using TopHat and Cufflinks. Methods Mol. Biol. 1374, 339–361 (2016).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI Reference Sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35, D61–D65 (2007).
Hunter, S. et al. InterPro: the integrative protein signature database. Nucleic Acids Res. 37, D211–D215 (2009).
Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000).
Boeckmann, B. et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 31, 365–370 (2003).
Quevillon, E. et al. InterProScan: protein domains identifier. Nucleic Acids Res. 33, W116–W120 (2005).
Tang, H. et al. Synteny and collinearity in plant genomes. Science 320, 486–488 (2008).
Wu, T. D. & Watanabe, C. K. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21, 1859–1875 (2005).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Weber, J. A., Aldana, R., Gallagher, B. D. & Edwards, J. S. Sentieon DNA pipeline for variant detection-Software-only solution, over 20× faster than GATK 3.3 with identical results. PeerJ PrePrints 4, e1672v1672 (2016).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
Marcais, G. et al. MUMmer4: a fast and versatile genome alignment system. PLoS Comput. Biol. 14, e1005944 (2018).
Goel, M., Sun, H., Jiao, W. B. & Schneeberger, K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 20, 277 (2019).
Jeffares, D. C. et al. Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast. Nat. Commun. 8, 14061 (2017).
Chiang, C. et al. SpeedSeq: ultra-fast personal genome analysis and interpretation. Nat. Methods 12, 966–968 (2015).
Huang, W., Li, L., Myers, J. R. & Marth, G. T. ART: a next-generation sequencing read simulator. Bioinformatics 28, 593–594 (2012).
van der Maaten, L. Accelerating t-SNE using tree-based algorithms. J. Mach. Learn. Res. 15, 3221–3245 (2014).
Yang, Z. et al. ggComp enables dissection of germplasm resources and construction of a multiscale germplasm network in wheat. Plant Physiol. 188, 1950–1965 (2022).
Gao, F., Ming, C., Hu, W. & Li, H. New software for the fast estimation of population recombination rates (FastEPRR) in the genomic era. G3 6, 1563–1571 (2016).
Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
Kang, H. M. et al. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42, 348–354 (2010).
Katoh, K., Asimenos, G. & Toh, H. Multiple alignment of DNA sequences with MAFFT. Methods Mol. Biol. 537, 39–64 (2009).
Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol. Biol. Evol. 26, 1641–1650 (2009).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Scrucca, L., Fop, M., Murphy, T. B. & Raftery, A. E. mclust 5: Clustering, classification and density estimation using Gaussian finite mixture models. R J. 8, 289–317 (2016).
Chen, Y. et al. A collinearity-incorporating homology inference strategy for connecting emerging assemblies in the Triticeae tribe as a pilot practice in the plant pangenomic era. Mol. Plant 13, 1694–1708 (2020).
Ma, S. et al. WheatOmics: a platform combining multiple omics data to accelerate functional genomics studies in wheat. Mol. Plant 14, 1965–1968 (2021).
He, W. et al. NGenomeSyn: an easy-to-use and flexible tool for publication-ready visualization of syntenic relationships across multiple genomes. Bioinformatics 39, btad121 (2023).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Han, F., Lamb, J. C. & Birchler, J. A. High frequency of centromere inactivation resulting in stable dicentric chromosomes of maize. Proc. Natl Acad. Sci. USA 103, 3238–3243 (2006).
Fu, S., Chen, L., Wang, Y., Li, M. & Tang, Z. Oligonucleotide probes for ND-FISH analysis to identify rye and wheat chromosomes. Sci. Rep. 5, 10552 (2015).
Tang, Z., Yang, Z. & Fu, S. Oligonucleotides replacing the roles of repetitive sequences pAs1, pSc119.2, pTa-535, pTa71, CCS1, and pAWRC.1 for FISH analysis. J. Appl. Genet. 55, 313–318 (2014).


Acknowledgements

We appreciate Z. F. Lu for discussion on the VRN-A1 expression work and W. X. Wang for help on bioinformatics analysis. This project was funded by the National Key Research and Development Program of China (2023YFF1000400 and 2022YFD1201503). This project was also funded by the National Natural Science Foundation of China (grant no. 32322059) and the Innovation Program of Chinese Academy of Agricultural Sciences (CAAS-CSCB-202401). R.K.V. thanks Food Futures Institute, Murdoch University and Grains Research & Development Corporation (project nos. UMU2404-003RTX and WSU2303-001RTX) for supporting this work in part.

Author information

These authors contributed equally: Chengzhi Jiao, Xiaoming Xie, Chenyang Hao, Liyang Chen

Authors and Affiliations

State Key Laboratory of Crop Gene Resources and Breeding/Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
Chengzhi Jiao, Chenyang Hao, Yuxin Xie, Li Zhao, Tian Li, Junjie Fu, Jian Hou, Hongxia Liu, Xu Liu, Jizeng Jia, Long Mao & Xueyong Zhang
National Key Laboratory of Crop Genetics and Germplasm Enhancement and Utilization, Nanjing Agricultural University, Nanjing, China
Chengzhi Jiao & Xiue Wang
Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing, China
Xiaoming Xie, Zihao Wang, Yuqi Zhang & Weilong Guo
Smartgenomics Technology Institute, Tianjin, China
Liyang Chen
Centre for Crop and Food Innovation, WA State Agricultural Biotechnology Centre, Food Futures Institute, Murdoch University, Murdoch, Western Australia, Australia
Vanika Garg, Annapurna Chitikineni, Rudi Appels & Rajeev K. Varshney
Harry Perkins Institute of Medical Research, the University of Western Australia, Murdoch, Western Australia, Australia
Girish Dwivedi
Department of Cardiology, Fiona Stanley Hospital, Murdoch, Western Australia, Australia
Girish Dwivedi
AgriBio, Centre for AgriBioscience, Department of Economic Development, Jobs, Transport, and Resources, La Trobe University, Bundoora, Victoria, Australia
Rudi Appels


Contributions

X.Z. together with W.G. designed the project. X.Z., W.G. and R.K.V. supervised the execution and completion of the project. C.J., X.X. and L.C. performed the bioinformatics analysis. C.H. managed the fieldwork and prepared the samples. L.Z. conducted the cytogenetic experiments. V.G., Z.W., Y.Z., T.L., J.F., A.C., J.H., H.L., G.D., X.L., J.J., L.M. and X.W. contributed to the experiments, data analysis and interpretation for various sections of the paper. X.Z., W.G., Y.X., V.G. and R.A. wrote the paper, with input from all authors. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Rajeev K. Varshney, Weilong Guo or Xueyong Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks Erik Garrison, Sean Walkowiak and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 The genome diversity and representation of 17 assembled genomes for the 145 resequenced landmark cultivars (Hao et al.^11).

a, Phylogenetic tree of all accessions inferred from whole-genome SNPs. The red lines indicate the cultivars used for de novo genome assembly. b, t-distributed stochastic neighbor embedding (t-SNE) analysis based on the SNPs, revealing the genetic relationships between the 17 de novo assemblies and the 145 landmark cultivars. c, The genome-wide GGNet of the 145 landmark cultivars. The 17 de novo assembled cultivars are marked in green for the 1950s, blue for the 1980s–1990s and red for post-2000s. The edge colors indicate the ranges of the gIBD ratio (genome similarity) for accession pairs. Only edges with a gIBD ratio ≥ 30% are shown. Grey edges, 40% > gIBD ratio ≥ 30%; green edges, 50% > gIBD ratio ≥ 40%; red edges, gIBD ratio ≥ 50%.
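The edge-colour thresholds in panel c can be written out as a small lookup. The following is only an illustrative sketch of the binning described in the caption, not the authors' plotting code; the function name `edge_colour` and the use of fractions (0.30 rather than 30%) are assumptions made here.

```python
def edge_colour(gibd_ratio):
    """Map a pairwise gIBD ratio (as a fraction, e.g. 0.42) to the edge colour
    described in the caption. Returns None when the edge would not be drawn.
    Hypothetical helper for illustration only."""
    if gibd_ratio >= 0.50:
        return "red"      # gIBD ratio >= 50%
    if gibd_ratio >= 0.40:
        return "green"    # 50% > gIBD ratio >= 40%
    if gibd_ratio >= 0.30:
        return "grey"     # 40% > gIBD ratio >= 30%
    return None           # below 30%: edge not shown


# Invented example values, one per bin
colours = [edge_colour(r) for r in (0.55, 0.45, 0.35, 0.10)]
```

Each bin is closed on its lower bound, matching the "≥" in the caption, so a pair at exactly 40% is drawn in green rather than grey.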

Extended Data Fig. 2 An overview of the 21 wheat genomes.

Growth habit of the 20 cultivars with de novo genome assemblies. From left to right, the features are: spring–winter growth habit, contig N50 and total genome length, LAI value, and BUSCO evaluation results.

Extended Data Fig. 3 Wheat pangenome of the 21 cultivars and structural variations referenced to CS.

a, Core and pan gene clusters of 20 wheat genomes. The UpSet plot illustrates the core gene clusters (present in all genomes), soft-core gene clusters (present in 18-20 genomes) and dispensable gene clusters (present in 2-17 genomes). b, Pie chart showing the proportion of gene families in each category. c, Density distribution of PAVs in representative cultivars from three breeding periods across chromosomes 3A, 3B, 3D, 5A, 5B and 5D. The horizontal axis of each box represents the chromosomal position and the vertical axis indicates the density of PAVs within 1-Mb windows. d, Scatter plots showing PAV occurrence frequencies in 50s&60s versus 80s&90s cultivars and in 80s&90s versus post-00s cultivars. Red points indicate the significantly selected PAVs with adjusted P values (FDR) greater than 0.01, green points indicate P values smaller than 0.01 and greater than 1e−8, and blue points indicate P values smaller than 1e−8. PAVs with a frequency of more than 0.5 in 80s&90s but less than 0.5 in 50s&60s are defined as specific high-frequency PAVs in 80s&90s. e, Sources of specific high-frequency PAVs in cultivars released at the three breeding stages. The main source of specific high-frequency PAVs was European cultivars.
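The category definitions in panels a and d reduce to simple threshold rules, sketched below. This is a hedged illustration rather than the authors' pipeline: the function names are hypothetical, and because the caption's ranges overlap at 20 genomes, the sketch assumes that a cluster present in all 20 genomes counts as core and that soft-core covers 18 to 19. Clusters found in a single genome are not named in the caption and are returned as "unclassified" here.

```python
def classify_cluster(n_present, n_genomes=20):
    """Assign a gene cluster to a pan-genome category from the number of
    genomes in which it is present (hypothetical helper)."""
    if n_present == n_genomes:
        return "core"            # present in all genomes
    if 18 <= n_present < n_genomes:
        return "soft-core"       # assumption: 18-19, since 20 is treated as core
    if 2 <= n_present <= 17:
        return "dispensable"
    return "unclassified"        # e.g. present in one genome; not named in the caption


def is_specific_high_frequency(freq_50s60s, freq_80s90s):
    """Panel d rule: frequency above 0.5 in 80s&90s cultivars but below 0.5
    in 50s&60s cultivars."""
    return freq_80s90s > 0.5 and freq_50s60s < 0.5


# Invented example values
categories = [classify_cluster(n) for n in (20, 19, 10, 1)]
flag = is_specific_high_frequency(freq_50s60s=0.2, freq_80s90s=0.8)
```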

Extended Data Fig. 4 Features of structural variations (SVs) in the pangenome comprising 17 wheat cultivars.

a, Pie chart showing the proportion of the different SV types in the wheat pan-genome. b, Percentage of PAVs overlapping with different genomic features. c, Distribution of PAVs by allele frequency in wheat accessions, indicating that most PAVs are present in one or only a few accessions. d, Distribution of PAV hotspot regions on the 21 chromosomes. Red represents the top 10 PAV hotspot regions, and orange represents the next 10–20 regions with a high density of PAV hotspots. e, The number of PAVs decreases sharply with PAV length, indicating that longer PAVs are relatively rare in the genome whereas shorter PAVs are more common. f, Characteristic distribution of structural variation lengths in the 17 wheat genomes.

Extended Data Fig. 5 Distribution of SVs on each chromosome under different length intervals.

The B subgenome shows a significantly higher density of SVs in the 50 bp–500 kb range than the A and D subgenomes, except for the 6A and 4B regions. The D subgenome has a lower density of SVs in this range. No significant difference in the number of SVs was observed among chromosomes for lengths above 500 kb. a, 50 bp–1 kb; b, 1 kb–5 kb; c, 5 kb–10 kb; d, 10 kb–100 kb; e, 100 kb–500 kb; f, >500 kb.
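As an illustration of the length intervals used for panels a–f, the following Python sketch assigns an SV length to its panel. It is a hypothetical helper rather than the authors' code. Treating each interval as upper-inclusive (so that exactly 500 kb falls in panel e and only longer SVs fall in panel f) is an assumption, because the legend does not say how boundary values were handled.

```python
from bisect import bisect_left
from collections import Counter

MIN_SV_BP = 50  # lower bound of panel a
# Upper edges (bp) of panels a-e; anything longer belongs to panel f.
UPPER_EDGES_BP = [1_000, 5_000, 10_000, 100_000, 500_000]
PANELS = ["a", "b", "c", "d", "e", "f"]


def length_bin(length_bp):
    """Return the panel letter (a-f) for an SV of the given length.

    Assumption: intervals are upper-inclusive, e.g. 1,000 bp -> panel a.
    """
    if length_bp < MIN_SV_BP:
        raise ValueError("SVs shorter than 50 bp are outside the plotted range")
    return PANELS[bisect_left(UPPER_EDGES_BP, length_bp)]


def tally_bins(lengths_bp):
    """Count SVs per panel for a list of SV lengths in bp."""
    return dict(Counter(length_bin(length) for length in lengths_bp))
```

Applied per chromosome, `tally_bins` gives the counts that a density plot of this kind would be built from, once divided by chromosome length.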

Extended Data Fig. 6 Crossover recombination at regions proximal to centromeres was strongly negatively affected by PAVs.

a–c, Breeding-history crossover recombination number (CRN) and PAV number within 100 Mb upstream and downstream of centromeres in the A (a), B (b) and D (c) subgenomes. The grey-shaded regions represent the centromeric regions identified by peaks of CRWs and Quintas. The blue line represents CRN per Mb window, estimated from re-sequencing data of the 145 cultivars in China, and the red line indicates the number of PAVs per Mb window in the 17 de novo assembled genomes, representing the wheat breeding history in China. R, Pearson’s correlation coefficient between CRN frequencies and PAV counts. Red stars represent a significant negative correlation between CRNs and the numbers of PAVs (P < 0.05).
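The R value reported in each panel is a Pearson correlation between two per-window series. A minimal Python sketch of that calculation is given below. The function name and input format are assumptions, the example values are invented, and the significance test behind the red stars is not reproduced here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length series,
    e.g. CRN per Mb window and PAV count per Mb window (hypothetical inputs)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


# Invented illustration: crossovers fall as PAV counts rise across windows.
crn_per_window = [5, 4, 3, 2, 1]
pav_per_window = [1, 2, 3, 4, 5]
r = pearson_r(crn_per_window, pav_per_window)  # strongly negative
```

A negative `r` corresponds to the pattern described in the legend, in which windows with more PAVs show fewer crossovers.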

Extended Data Fig. 7 Partition and distribution of SVs at regions proximal to centromeres on chromosomes in the A and B subgenomes.

Counts of different SVs between assembly pairs within (purple) and between (blue) centAHG groups on each chromosome of the A and B subgenomes. Red bar, regions proximal to centromeres, from 100 Mb upstream to 100 Mb downstream of the centromere on each chromosome. Green bar, centAHG block previously identified³².

Extended Data Fig. 8 CNVs in the re-sequenced database of wheat and its tetraploid ancestors indicate that the VRN-A1 gene experienced strong selection during wheat origin, spread and breeding in response to the environment.

a-b, Hapmap of VRN-A1 copy number in landraces and cultivars. Blue represents VRN-A1 gene loss (0 copies); green, orange and red represent 1, 2 and 3 copies of VRN-A1, respectively. During the spread of wheat to China, the copy number increased (the proportion with three copies increased), and the share of triple-copy types of VRN-A1 increased significantly in the colder north-western region. However, in modern cultivars the frequency of cultivars with one or two copies increased in north China, probably owing to global warming. The sizes of the pies are proportional to the number of samples at each location. c, Mean temperature in January from 1961 to 2020 in Henan, the largest wheat-producing province in China. The average January temperature from 1981 to 2020 is marked with an orange line.

Extended Data Fig. 9 Gene structure of Pina mutations and the geographic distribution of Pina and Pinb.

a, The Pina-D1c allele found in the resequencing data is present only in the Central Asia region. b-c, The Pina-D1b and Pinb-D1u alleles spread mainly to the east. d, IGV plot of resequencing data of wheat varieties. Deletion of the fragment disrupts the Pina gene structure, followed by gene deletion.

Extended Data Fig. 10 Large PAVs of sub-telomere repeat on 1RS translocations among modern cultivars.

a, Collinearity at the end of the short arm of 1RS in cultivars with 1RS·1BL and 1RS·7DL translocations. Each block represents 500 kb of sequence; to highlight the position of the extended sequences, the collinear relationship between them is shown in light red. b, Distribution of different elements in the extended intervals. Purple, green and light red rectangles denote the telomere-associated sequences (TASs), sub-telomeric location sequences and tandem repeat intervals, respectively; positions are staggered vertically to distinguish the intervals of different elements. c, The number and length of sub-telomere sequences in each genome. The yellow bars represent the number and length of sub-telomere sequences at the end of the short arm of chromosome 1B in 1RS·1BL translocation lines. The blue and black bars represent the number and length of sequences in the entire genome of non-1RS·1BL translocation hexaploid wheat and rye, respectively.

Extended Data Fig. 11 Genome composition of CM42, the first cultivar derived from a cross between common wheat and the CIMMYT synthetic.

Introgressed fragments from Aegilops tauschii are marked in green. Centromeres are indicated by black triangles.

Extended Data Table 1 Statistics of the assembly and annotation of 21 wheat genomes

Supplementary information

Supplementary Figures

Supplementary Figs. 1–16

Reporting Summary

Supplementary Tables

Supplementary Tables 1–29

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jiao, C., Xie, X., Hao, C. et al. Pan-genome bridges wheat structural variations with habitat and breeding. Nature 637, 384–393 (2025). https://doi.org/10.1038/s41586-024-08277-0

Received: 16 January 2024
Accepted: 23 October 2024
Published: 27 November 2024
Version of record: 27 November 2024
Issue date: 09 January 2025
DOI: https://doi.org/10.1038/s41586-024-08277-0
