A pangenome reference and population studies link structural variants with breeding traits in Gossypium hirsutum

Zhang, Yan; Sun, Zhengwen; Tian, Shilin; Wu, Liqiang; Gu, Qishen; Ke, Huifeng; Zhang, Guiyin; Chen, Bin; Wang, Zhicheng; Zhang, Jin; Zhang, Xinyu; Li, Ziming; Yang, Jun; Li, Xiangkong; Jiang, Yafei; Zhang, Kaijian; Wu, Jinhua; Wang, Guoning; Zhang, Dongmei; Wang, Xingyi; Meng, Chengsheng; Li, Yanbin; Zhang, Zixu; Chen, Weiyi; Jiao, Mengjia; Jia, Hao; Li, Jing; Zuo, Haonan; Wang, Yan; Gu, Man; Xie, Meixia; Wu, Lizhu; Li, Zhikun; Yan, Yuanyuan; Cui, Yanru; Liu, Jie; Wang, Xingfen; Ma, Zhiying

doi:10.1038/s41588-026-02523-z

Article
Published: 20 March 2026

Yan Zhang ORCID: orcid.org/0000-0002-1596-8060^1,2,3^na1,
Zhengwen Sun ORCID: orcid.org/0009-0000-4007-1792^1,2^na1,
Shilin Tian ORCID: orcid.org/0000-0001-8958-1806^1,4^na1,
Liqiang Wu ORCID: orcid.org/0000-0001-6754-8175^3,5^na1,
Qishen Gu ORCID: orcid.org/0000-0001-9666-5020^1,3^na1,
Huifeng Ke ORCID: orcid.org/0000-0002-3779-0814^1,5^na1,
Guiyin Zhang ORCID: orcid.org/0000-0003-3851-1263^5^na1,
Bin Chen^3,5,
Zhicheng Wang^1,3,
Jin Zhang^1,3,
Xinyu Zhang^1,3,
Ziming Li^1,3,
Jun Yang ORCID: orcid.org/0000-0002-1537-8159^1,5,
Xiangkong Li^4,
Yafei Jiang^4,
Kaijian Zhang^4,
Jinhua Wu^1,3,
Guoning Wang ORCID: orcid.org/0000-0002-8914-5235^1,5,
Dongmei Zhang ORCID: orcid.org/0000-0003-4560-9058^1,3,
Xingyi Wang^1,3,
Chengsheng Meng^1,5,
Yanbin Li^1,3,
Zixu Zhang^1,3,
Weiyi Chen^1,3,
Mengjia Jiao^1,3,
Hao Jia^1,3,
Jing Li^1,3,
Haonan Zuo^1,3,
Yan Wang^1,3,
Man Gu^1,3,
Meixia Xie^1,3,
Lizhu Wu^3,5,
Zhikun Li^1,5,
Yuanyuan Yan ORCID: orcid.org/0000-0002-9608-2324^1,2,
Yanru Cui^1,5,
Jie Liu^1,3,
Xingfen Wang ORCID: orcid.org/0000-0002-8576-4565^1,2,5 &
Zhiying Ma ORCID: orcid.org/0000-0002-0298-757X^1,2,3

Nature Genetics volume 58, pages 928–939 (2026)

Abstract

Limited pangenome resources and ambiguous genomic architecture constrain comprehensive discovery of genetic variation and cotton improvement. Here we assembled a telomere-to-telomere (T2T) genome for the elite cultivar NDM13 and near-T2T genomes for 27 additional representatives of Gossypium hirsutum from the past century, with transcriptomic profiling of 15 distinct tissues from each. We uncovered 51,551 one-to-one conserved orthologs across all genomes and characterized the landscapes of telomeres, centromeres, 45S rDNA, segmental duplications and copy number variants. We revealed hotspots of structural variation (SV) and the impacts of SVs, segmental duplications and copy number variants on gene expression or gene content, as well as on stress resistance. We identified thousands of divergent SVs and genes implicated in modern breeding evolution. Combining T2T-reference-based pangenome construction and 761,536 SVs identified across 1,671 worldwide accessions with phenotypic data from 22 environments, we captured a number of hidden SVs that potentially influence critical breeding traits. These resources will support genetic studies and biotechnological improvement of the crop.

**Fig. 1: Characterization of T2T genome of NDM13.**

**Fig. 2: Gene-based pangenome analysis of 28 cottons.**

**Fig. 3: Landscape and diversification of complex regions in cotton.**

**Fig. 4: Genome-wide patterns of SDs and CNVs.**

**Fig. 5: Inferences from SVs from 27 genomes with reference to NDM13 and breeding history.**

**Fig. 6: Identification of important associated SVs underlying FL and FS.**

Data availability

The raw sequencing and transcriptome data for 28 cottons have been deposited in the National Genomics Data Center (NGDC) under the BioProject accession PRJCA023347 and in the NCBI Sequence Read Archive (SRA) under the BioProject accession PRJNA1132390. The genome assemblies of 28 cottons and the CENH3 ChIP–seq data for NDM13 have been deposited in the NGDC under the BioProject accession PRJCA023347. The resequencing data for 1,671 accessions are available in the NCBI SRA under the BioProject accession PRJNA680449 (1,081 cotton accessions) and PRJNA1132397 (590 cotton accessions). Source data are provided with this paper.
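The accessions above can be kept in a small lookup table so that download scripts and sanity checks share one source. The following is a minimal sketch: the accession numbers and accession counts are copied from the statement above, while the dictionary names and the helper function are illustrative assumptions, not part of the authors' pipeline.

```python
# Accession numbers and counts are taken from the Data availability statement.
# The variable names and structure below are hypothetical, for illustration only.
RESEQUENCING_PROJECTS = {
    "PRJNA680449": 1081,   # cotton accessions in NCBI SRA
    "PRJNA1132397": 590,   # cotton accessions in NCBI SRA
}

ASSEMBLY_PROJECTS = {
    # raw reads, transcriptomes, 28 assemblies, CENH3 ChIP-seq for NDM13
    "NGDC": "PRJCA023347",
    # raw sequencing and transcriptome data for the 28 cottons
    "NCBI SRA": "PRJNA1132390",
}


def total_accessions(projects):
    """Sum the number of resequenced accessions over all BioProjects."""
    return sum(projects.values())


if __name__ == "__main__":
    # The two resequencing projects together should cover the full panel.
    print(total_accessions(RESEQUENCING_PROJECTS))  # 1671
```

The total matches the 1,671 accessions reported in the abstract.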

Code availability

The scripts and software used in this study are all publicly available online, as described in Methods and the Reporting Summary. All custom scripts and code associated with this project are available via Zenodo at https://doi.org/10.5281/zenodo.18357054 (ref. 121) and GitHub at https://github.com/SLBio/Analysis_pipeleine-NG-A66010.

References

Beckert, S. Empire of Cotton: A Global History (Alfred A. Knopf Press, 2014).
Fang, L. et al. Genomic analyses in cotton identify signatures of selection and loci associated with fiber quality and yield traits. Nat. Genet. 49, 1089–1098 (2017).
The International Wheat Genome Sequencing Consortium et al. Shifting the limits in wheat research and breeding using a fully annotated reference genome. Science 361, eaar7191 (2018).
Walkowiak, S. et al. Multiple wheat genomes reveal global variation in modern breeding. Nature 588, 277–283 (2020).
Goff, S. A. et al. A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296, 92–100 (2002).
Yu, J. et al. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296, 79–92 (2002).
Schnable, P. S. et al. The B73 maize genome: complexity, diversity, and dynamics. Science 326, 1112–1115 (2009).
Jiao, Y. et al. Improved maize reference genome with single-molecule technologies. Nature 546, 524–527 (2017).
Hufford, M. B. et al. De novo assembly, annotation, and comparative analysis of 26 diverse maize genomes. Science 373, 655–662 (2021).
Schmutz, J. et al. Genome sequence of the palaeopolyploid soybean. Nature 463, 178–183 (2010).
Paterson, A. H. et al. Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres. Nature 492, 423–427 (2012).
Wang, K. et al. The draft genome of a diploid cotton Gossypium raimondii. Nat. Genet. 44, 1098–1103 (2012).
Li, F. et al. Genome sequence of the cultivated cotton Gossypium arboreum. Nat. Genet. 46, 567–572 (2014).
Li, F. et al. Genome sequence of cultivated Upland cotton (Gossypium hirsutum TM-1) provides insights into genome evolution. Nat. Biotechnol. 33, 524–530 (2015).
Zhang, T. et al. Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement. Nat. Biotechnol. 33, 531–537 (2015).
Ma, Z. et al. Resequencing a core collection of upland cotton identifies genomic variation and loci influencing fiber quality and yield. Nat. Genet. 50, 803–813 (2018).
Hu, Y. et al. Gossypium barbadense and Gossypium hirsutum genomes provide insights into the origin and evolution of allotetraploid cotton. Nat. Genet. 51, 739–748 (2019).
Wang, M. et al. Reference genome sequences of two cultivated allotetraploid cottons, Gossypium hirsutum and Gossypium barbadense. Nat. Genet. 51, 224–229 (2019).
Chen, Z. J. et al. Genomic diversifications of five Gossypium allopolyploid species and their impact on cotton improvement. Nat. Genet. 52, 525–533 (2020).
Huang, G. et al. Genome sequence of Gossypium herbaceum and genome updates of Gossypium arboreum and Gossypium hirsutum provide insights into cotton A-genome evolution. Nat. Genet. 52, 516–524 (2020).
Ma, Z. et al. High-quality genome assembly and resequencing of modern cotton cultivars provide resources for crop improvement. Nat. Genet. 53, 1385–1391 (2021).
Wang, M. et al. Genomic innovation and regulatory rewiring during evolution of the cotton genus Gossypium. Nat. Genet. 54, 1959–1971 (2022).
Sreedasyam, A. et al. Genome resources for three modern cotton lines guide future breeding efforts. Nat. Plants 10, 1039–1051 (2024).
Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).
Chen, J. et al. A complete telomere-to-telomere assembly of the maize genome. Nat. Genet. 55, 1221–1231 (2023).
Naish, M. et al. The genetic and epigenetic landscape of the Arabidopsis centromeres. Science 374, eabi7489 (2021).
Shang, L. et al. A complete assembly of the rice Nipponbare reference genome. Mol. Plant 16, 1232–1236 (2023).
Hu, Y. et al. Post-polyploidization centromere evolution in cotton. Nat. Genet. 57, 1021–1030 (2025).
Hu, G. et al. A telomere-to-telomere genome assembly of cotton provides insights into centromere evolution and short-season adaptation. Nat. Genet. 57, 1031–1043 (2025).
Liu, Y. et al. Pan-genome of wild and cultivated soybeans. Cell 182, 162–176 (2020).
Alonge, M. et al. Major impacts of widespread structural variation on gene expression and crop improvement in tomato. Cell 182, 145–161 (2020).
Jayakodi, M. et al. The barley pan-genome reveals the hidden legacy of mutation breeding. Nature 588, 284–289 (2020).
Qin, P. et al. Pan-genome analysis of 33 genetically diverse rice accessions reveals hidden genomic variations. Cell 184, 3542–3558 (2021).
Shi, J., Tian, Z., Lai, J. & Huang, X. Plant pan-genomics and its applications. Mol. Plant 16, 168–186 (2023).
Tang, D. et al. Genome evolution and diversity of wild and cultivated potatoes. Nature 606, 535–541 (2022).
Zhou, Y. et al. Graph pangenome captures missing heritability and empowers tomato breeding. Nature 606, 527–534 (2022).
Li, N. et al. Super-pangenome analyses highlight genomic diversity and structural variation across wild and cultivated tomato species. Nat. Genet. 55, 852–860 (2023).
He, Q. et al. A graph-based genome and pan-genome variation of the model plant Setaria. Nat. Genet. 55, 1232–1242 (2023).
Yang, Z. et al. Graph pan-genome illuminates evolutionary trajectories and agronomic trait architecture in allotetraploid cotton. Nat. Genet. 58, 218–229 (2026).
Gu, Q. et al. A high-density genetic map and multiple environmental tests reveal novel quantitative trait loci and candidate genes for fibre quality and yield in cotton. Theor. Appl. Genet. 133, 3395–3408 (2020).
Gu, Q. et al. A stable QTL qSalt-A04-1 contributes to salt tolerance in the cotton seed germination stage. Theor. Appl. Genet. 134, 2399–2410 (2021).
Zhang, X. et al. Breeding of high-quality cotton in Hebei province during the past 70 years. China Cotton 47, 1–6 (2020).
Liao, W. W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).
Yang, Z. et al. Recent progression and future perspectives in cotton genomic breeding. J. Integr. Plant Biol. 65, 548–569 (2023).
Zhang, C. Y. et al. High-quality genome of a modern soybean cultivar and resequencing of 547 accessions provide insights into the role of structural variation. Nat. Genet. 56, 2247–2258 (2024).
Yang, Z. et al. Multi-omics provides new insights into the domestication and improvement of dark jute (Corchorus olitorius). Plant J. 112, 812–829 (2022).
Zhang, Y. et al. The telomere-to-telomere gap-free genome of four rice parents reveals SV and PAV patterns in hybrid rice breeding. Plant Biotechnol. J. 20, 1642–1644 (2022).
Aganezov, S. et al. A complete reference genome improves analysis of human genetic variation. Science 376, eabl3533 (2022).
Vollger, M. R. et al. Segmental duplications and their variation in a complete human genome. Science 376, eabj6965 (2022).
Bretani, G. et al. Segmental duplications are hot spots of copy number variants affecting barley gene content. Plant J. 103, 1073–1088 (2020).
Emanuel, B. S. & Shaikh, T. H. Segmental duplications: an ‘expanding’ role in genomic instability and disease. Nat. Rev. Genet. 2, 791–800 (2001).
Hosmani, P. S. et al. Dirigent domain-containing protein is part of the machinery required for formation of the lignin-based Casparian strip in the root. Proc. Natl Acad. Sci. USA 110, 14498–14503 (2013).
Paniagua, C. et al. Dirigent proteins in plants: modulating cell wall metabolism during abiotic and biotic stress exposure. J. Exp. Bot. 68, 3287–3301 (2017).
Wang, Y. et al. A dirigent family protein confers variation of Casparian strip thickness and salt tolerance in maize. Nat. Commun. 13, 2222 (2022).
Yang, X. et al. A loss-of-function of the dirigent gene TaDIR-B1 improves resistance to Fusarium crown rot in wheat. Plant Biotechnol. J. 19, 866–868 (2021).
Deng, J. et al. Dirigent gene family is involved in the molecular interaction between Panax notoginseng and root rot pathogen Fusarium solani. Ind. Crop. Prod. 178, 114544 (2022).
Lin, J. L. et al. Dirigent gene editing of gossypol enantiomers for toxicity-depleted cotton seeds. Nat. Plants 9, 605–615 (2023).
Li, S. et al. Genome-edited powdery mildew resistance in wheat without growth penalties. Nature 602, 455–460 (2022).
Li, Y. B. et al. The thioredoxin GbNRX1 plays a crucial role in homeostasis of apoplastic reactive oxygen species in response to Verticillium dahliae infection in cotton. Plant Physiol. 170, 2392–2406 (2016).
Chen, J. et al. NLR surveillance of pathogen interference with hormone receptors induces immunity. Nature 613, 145–152 (2023).
Wang, N. et al. An F-box protein attenuates fungal xylanase-triggered immunity by destabilizing LRR-RLP NbEIX2 in a SOBIR1-dependent manner. New Phytol. 236, 2202–2215 (2022).
Bian, Y. et al. Cancer SLC43A2 alters T cell methionine metabolism and histone methylation. Nature 585, 277–282 (2020).
Zhai, K. et al. NLRs guard metabolism to coordinate pattern- and effector-triggered immunity. Nature 601, 245–251 (2022).
Porubsky, D. et al. Recurrent inversion polymorphisms in humans associate with genetic instability and genomic disorders. Cell 185, 1986–2005 (2022).
Garrison, E. et al. Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat. Biotechnol. 36, 875–879 (2018).
Jamshed, M. et al. Identification of stable quantitative trait loci (QTLs) for fiber quality traits across multiple environments in Gossypium hirsutum recombinant inbred line population. BMC Genomics 17, 197 (2016).
Rico, M. & Egelhoff, T. T. Myosin heavy chain kinase B participates in the regulation of myosin assembly into the cytoskeleton. J. Cell Biochem. 88, 521–532 (2003).
Song, X. et al. Genome-wide association analysis reveals loci and candidate genes involved in fiber quality traits under multiple field environments in cotton (Gossypium hirsutum). Front. Plant Sci. 12, 695503 (2021).
Shao, Q. et al. Identifying QTL for fiber quality traits with three upland cotton (Gossypium hirsutum L.) populations. Euphytica 198, 43–58 (2014).
Zhang, Z. et al. Genome-wide quantitative trait loci reveal the genetic basis of cotton fibre quality and yield-related traits in a Gossypium hirsutum recombinant inbred line population. Plant Biotechnol. J. 18, 239–253 (2020).
Ling, J. Karyotype Analysis by Telomere-FISH and Primary Development of High-Resolution Cytological Map in Cotton. PhD thesis, Chinese Academy of Agricultural Sciences (2008).
Dvořáčková, M., Fojtová, M. & Fajkus, J. Chromatin dynamics of plant telomeres and ribosomal genes. Plant J. 83, 18–37 (2015).
Sykorova, E. et al. The absence of Arabidopsis-type telomeres in Cestrum and closely related genera Vestia and Sessea (Solanaceae): first evidence from eudicots. Plant J. 34, 283–291 (2003).
Sykorová, E. et al. Minisatellite telomeres occur in the family Alliaceae but are lost in Allium. Am. J. Bot. 93, 814–823 (2006).
He, S. et al. The genomic basis of geographic differentiation and fiber improvement in cultivated cotton. Nat. Genet. 53, 916–924 (2021).
Yang, Z. et al. Extensive intraspecific gene order and gene structural variations in upland cotton cultivars. Nat. Commun. 10, 2989 (2019).
Harringmeyer, O. & Hoekstra, H. Chromosomal inversion polymorphisms shape the genomic landscape of deer mice. Nat. Ecol. Evol. 6, 1965–1979 (2022).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Chang, X. et al. High-quality Gossypium hirsutum and Gossypium barbadense genome assemblies reveal the landscape and evolution of centromeres. Plant Commun. 5, 100722 (2023).
Marçais, G. et al. MUMmer4: a fast and versatile genome alignment system. PLoS Comput. Biol. 14, e1005944 (2018).
Mount, D. W. Using the basic local alignment search tool (BLAST). CSH Protoc. 2007, pdb.top17 (2007).
Smit, A., Hubley, R. & Green, P. RepeatMasker Open-4.0. Institute for Systems Biology http://www.repeatmasker.org (2013–2015).
Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA 6, 11 (2015).
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, 265–268 (2007).
Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21, i351–i358 (2005).
Edgar, R. C. & Myers, E. W. PILER: identification and classification of genomic repeats. Bioinformatics 21, i152–i158 (2005).
Smit, A. & Hubley, R. RepeatModeler Open-1.0. Institute for Systems Biology http://www.repeatmasker.org (2008–2015).
Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. Genome Res. 14, 988–995 (2004).
Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).
Stanke, M. & Waack, S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19, ii215 (2003).
Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004).
Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004).
Guigó, R. Assembling genes from predicted exons in linear time with dynamic programming. J. Comput. Biol. 5, 681–702 (1998).
Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).
Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013).
Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 7, 562–578 (2012).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).
The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 46, 2699 (2018).
Finn, R. D. et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 44, 279–285 (2016).
The Gene Ontology Consortium. Expansion of the Gene Ontology knowledgebase and resources. Nucleic Acids Res. 45, 331–338 (2017).
Kanehisa, M. et al. Data, information, knowledge and principle: back to metabolism in KEGG. Nucleic Acids Res. 42, 199–205 (2014).
Išerić, H., Alkan, C., Hach, F. & Numanagić, I. Fast characterization of segmental duplication structure in multiple genome assemblies. Algorithms Mol. Biol. 17, 4 (2022).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
Chakraborty, M., Emerson, J. J., Macdonald, S. J. & Long, A. D. Structural variants exhibit widespread allelic heterogeneity and shape variation in complex traits. Nat. Commun. 10, 4872 (2019).
Goel, M., Sun, H., Jiao, W. B. & Schneeberger, K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 20, 277 (2019).
Poplin, R. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 36, 983–987 (2018).
Jeffares, D. C. et al. Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast. Nat. Commun. 8, 14061 (2017).
Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).
Putri, G. H., Anders, S., Pyl, P. T., Pimanda, J. E. & Zanini, F. Analysing high-throughput sequencing data in Python with HTSeq 2.0. Bioinformatics 38, 2943–2945 (2022).
Zhou, X. & Stephens, M. Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 44, 821–824 (2012).
Kang, H. M. et al. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42, 348–354 (2010).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Ge, X. et al. Efficient genotype-independent cotton genetic transformation and genome editing. J. Integr. Plant Biol. 65, 907–917 (2023).
Sun, Z. Scripts and code used in ‘A pangenome reference and population studies link structural variants with breeding traits in Gossypium hirsutum’. Zenodo https://doi.org/10.5281/zenodo.18357054 (2026).

Acknowledgements

This work was supported by the National Key Research and Development Program of China (2022YFF1001403) to Y.Z., Z.M. and Xingfen Wang; the Science Research Project of Hebei Education Department (PTZX2026014) to Xingfen Wang, Y.Z., Z.S. and Z.M.; the Natural Science Foundation (C2022204205) to Xingfen Wang, Y.Z. and Z.S.; the Key Research and Development Program (21326314D) to Z.M., Xingfen Wang and Y.Z.; the Top Talent Project (031601801) of Hebei Province to Z.M.; the National Key Project of Bio-breeding of China (2023ZD04039) to Z.M., Xingfen Wang, Y.Z. and Z.S.; the China Agricultural Research System (CARS-15-03) to L.W., Xingfen Wang, Y.Z. and Z.M. and the Project for National Top Talent (0602019) and Shennong Plan of China to Y.Z.

Author information

These authors contributed equally: Yan Zhang, Zhengwen Sun, Shilin Tian, Liqiang Wu, Qishen Gu, Huifeng Ke, Guiyin Zhang.

Authors and Affiliations

1. North China Key Laboratory for Crop Germplasm Resources of Education Ministry, Hebei Agricultural University, Baoding, China
Yan Zhang, Zhengwen Sun, Shilin Tian, Qishen Gu, Huifeng Ke, Zhicheng Wang, Jin Zhang, Xinyu Zhang, Ziming Li, Jun Yang, Jinhua Wu, Guoning Wang, Dongmei Zhang, Xingyi Wang, Chengsheng Meng, Yanbin Li, Zixu Zhang, Weiyi Chen, Mengjia Jiao, Hao Jia, Jing Li, Haonan Zuo, Yan Wang, Man Gu, Meixia Xie, Zhikun Li, Yuanyuan Yan, Yanru Cui, Jie Liu, Xingfen Wang & Zhiying Ma
2. State Key Laboratory of North China Crop Improvement and Regulation, Hebei Agricultural University, Baoding, China
Yan Zhang, Zhengwen Sun, Yuanyuan Yan, Xingfen Wang & Zhiying Ma
3. Key Laboratory for Crop Germplasm Resources of Hebei Province, Hebei Agricultural University, Baoding, China
Yan Zhang, Liqiang Wu, Qishen Gu, Bin Chen, Zhicheng Wang, Jin Zhang, Xinyu Zhang, Ziming Li, Jinhua Wu, Dongmei Zhang, Xingyi Wang, Yanbin Li, Zixu Zhang, Weiyi Chen, Mengjia Jiao, Hao Jia, Jing Li, Haonan Zuo, Yan Wang, Man Gu, Meixia Xie, Lizhu Wu, Jie Liu & Zhiying Ma
4. Novogene Bioinformatics Institute, Beijing, China
Shilin Tian, Xiangkong Li, Yafei Jiang & Kaijian Zhang
5. Collaborative Innovation Center for Cotton Industry of Hebei Province, Hebei Agricultural University, Baoding, China
Liqiang Wu, Huifeng Ke, Guiyin Zhang, Bin Chen, Jun Yang, Guoning Wang, Chengsheng Meng, Lizhu Wu, Zhikun Li, Yanru Cui & Xingfen Wang

Contributions

Y.Z., Z.S., Xingfen Wang and Z.M. performed most of the experiments and analyzed the data. Y.Z., Z.S., Xingfen Wang, L.W., Q.G., H.K., G.Z., B.C., Z.W., J.Z., X.Z., Z.L., J.Y., J.W., G.W., D.Z., Xingyi Wang, C.M., Y.L., Z.Z., W.C., M.J., H.J., J. Li, H.Z., Y.W., M.G., M.X., L.W., Z.L., Y.Y., Y.C. and J. Liu performed field trials, trait determination and sample preparation. S.T., X.L., Y.J., K.Z., Z.S., Y.Z. and Xingfen Wang performed the genome assembly and genomic analyses. Y.Z., Xingfen Wang, Z.S., S.T., X.L. and Z.M. identified genomic variations and constructed tables and figures. Y.Z., Z.S., Q.G. and Xingfen Wang conducted the genetic analyses of breeding traits. Y.Z., Xingfen Wang and Z.M. wrote the paper. Z.M. and Xingfen Wang conceived and supervised the project.

Corresponding authors

Correspondence to Xingfen Wang or Zhiying Ma.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Phylogenetic tree of 1,671 accessions and highly diverse agronomic phenotypes across 28 accessions.

a, Phylogenetic tree built from genome-wide SNP data, with G. barbadense cv. Pima90 as the outgroup. Branch lengths are annotated and given in substitutions per site. b–e, Diverse agronomic phenotypes among the 28 accessions, including seed size and color (b), length of boll handle (c), leaf size and shape (d) and boll size and shape (e). f, Fiber length. g, Fiber strength. h, Lint percentage. All scale bars represent 1 cm.

Extended Data Fig. 2 Density of SD blocks identified in the NDM13 and NDM8 genomes.

The blue lines indicate the synteny between NDM13 and NDM8 in each chromosome.

Extended Data Fig. 3 Density of gene models, Copia and Gypsy elements in the 28 genomes across 1,000 windows.

The vertical dashed lines mark the leftmost and rightmost 10% of windows.
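The windowed density described in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: it assumes each chromosome is split into 1,000 equal-width windows, that a feature (a gene model or a Copia or Gypsy element) is assigned to a window by its 0-based start coordinate, and that the dashed lines sit at the 10% and 90% window boundaries. The function names are hypothetical.

```python
from typing import List, Tuple


def window_counts(chrom_length: int, starts: List[int], n_windows: int = 1000) -> List[int]:
    """Count features per window after splitting a chromosome into equal-width windows.

    `starts` holds 0-based feature start coordinates on one chromosome.
    """
    if chrom_length <= 0 or n_windows <= 0:
        raise ValueError("chrom_length and n_windows must be positive")
    counts = [0] * n_windows
    for pos in starts:
        if not 0 <= pos < chrom_length:
            raise ValueError(f"position {pos} lies outside the chromosome")
        # Integer arithmetic keeps the index in [0, n_windows - 1].
        counts[pos * n_windows // chrom_length] += 1
    return counts


def terminal_windows(n_windows: int = 1000, fraction: float = 0.10) -> Tuple[range, range]:
    """Return the window indices covering the left and right chromosome ends."""
    k = round(n_windows * fraction)
    return range(0, k), range(n_windows - k, n_windows)


if __name__ == "__main__":
    # Toy chromosome of 1,000 bp split into 10 windows of 100 bp each.
    print(window_counts(1000, [0, 5, 999], n_windows=10))
    left, right = terminal_windows()
    print(len(left), right[0])
```

Dividing each count by the window length in base pairs would turn the counts into a density per base pair, if that normalization is wanted.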

Extended Data Fig. 4 Expression comparison among core, dispensable and private genes based on the averaged FPKM of 15 tissues in each cotton.

In the box plots, the center line denotes the median; box limits are the upper and lower quartiles; whiskers mark the range of the data. Statistical significance was determined using a two-sided Wilcoxon test.
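The box-plot elements and the test named in this caption can be illustrated with a short sketch. It assumes that the test is the two-sample Wilcoxon rank-sum (Mann-Whitney) test and uses a normal approximation with tie correction and no continuity correction, so p-values may differ slightly from those of a statistics package that uses exact or continuity-corrected calculations. The function names and toy values are hypothetical.

```python
import math
import statistics
from typing import Dict, Sequence


def box_summary(values: Sequence[float]) -> Dict[str, float]:
    """Median, quartiles and full data range, matching the box-plot elements."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


def rank_sum_two_sided(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided rank-sum p-value via the normal approximation (tie-corrected)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 1.0  # all values identical, no evidence of a shift
    z = (u1 - mean_u) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


if __name__ == "__main__":
    core = [6.0, 7.0, 8.0, 9.0, 10.0]     # toy averaged FPKM values
    private = [1.0, 2.0, 3.0, 4.0, 5.0]   # toy averaged FPKM values
    print(box_summary(core))
    print(rank_sum_two_sided(core, private) < 0.05)
```

For real expression data, an established statistics library is preferable; the sketch only shows what the caption's quantities mean.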

Extended Data Fig. 5 An example of SV hotspots located on chromosome Dt01.

a, SV hotspots in the 60–61 Mb region of chromosome Dt01 in each accession. b, Disease resistance-related genes located in the hotspot.

Supplementary information

Supplementary Information

Supplementary Notes 1 and 2, and Figs. 1–15.

Reporting Summary

Supplementary Tables

Supplementary Tables 1–52.

Source data

Source Data Fig. 6

Unprocessed gel.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, Y., Sun, Z., Tian, S. et al. A pangenome reference and population studies link structural variants with breeding traits in Gossypium hirsutum. Nat Genet 58, 928–939 (2026). https://doi.org/10.1038/s41588-026-02523-z

Received: 11 July 2024
Accepted: 27 January 2026
Published: 20 March 2026
Version of record: 20 March 2026
Issue date: April 2026
DOI: https://doi.org/10.1038/s41588-026-02523-z