Genome assembly of two allotetraploid cotton germplasms reveals mechanisms of somatic embryogenesis and enables precise genome editing

Xu, Zhongping; Wang, Guanying; Zhu, Xiangqian; Wang, Ruipeng; Zhu, Longfu; Tu, Lili; Liu, Yuling; Peng, Renhai; Lindsey, Keith; Wang, Maojun; Zhang, Xianlong; Jin, Shuangxia

doi:10.1038/s41588-025-02258-3

Article
Published: 22 July 2025

Nature Genetics volume 57, pages 2028–2039 (2025)

Abstract

Somatic embryogenesis is crucial for plant genetic engineering, yet the underlying mechanisms in cotton remain poorly understood. Here we present a telomere-to-telomere assembly of Jin668 and a high-quality assembly of YZ1, two highly regenerative allotetraploid cotton germplasms. The completion of the Jin668 genome enables characterization of ~30.1 Mb of centromeric regions invaded by centromeric retrotransposon of maize and Tekay retrotransposons, an ~8.1 Mb 5S rDNA array containing 25,190 copies and a ~75.1 Mb major 45S rDNA array with 8,131 copies. Comparative analyses of regenerative and recalcitrant genotypes reveal dynamic transcriptional patterns and chromatin accessibility during the initial regeneration process. A hierarchical gene regulatory network identifies AGL15 as a contributor to regeneration. Additionally, we demonstrate that genetic variation affects sgRNA target sites, while the Jin668 genome assembly reduces the risk of off-target effects in CRISPR-based genome editing. Together, the complete Jin668 genome reveals the complexity of genomic regions and cotton regeneration, and improves the precision of genome editing.
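
As a quick sanity check on the rDNA figures above, dividing each array's approximate span by its copy number gives the implied average repeat-unit length. This is only back-of-envelope arithmetic on the rounded values quoted in the abstract; the function name is illustrative and does not come from the paper.

```python
def mean_unit_length_bp(array_span_mb: float, copies: int) -> float:
    """Average repeat-unit length (bp) implied by an array span (Mb) and a copy number."""
    return array_span_mb * 1_000_000 / copies

# Rounded values quoted in the abstract.
unit_5s = mean_unit_length_bp(8.1, 25_190)    # 5S rDNA array
unit_45s = mean_unit_length_bp(75.1, 8_131)   # major 45S rDNA array

print(f"5S unit:  ~{unit_5s:.0f} bp")
print(f"45S unit: ~{unit_45s / 1000:.1f} kb")
```

The implied units are roughly 322 bp for the 5S array and 9.2 kb for the 45S array.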

**Fig. 1: A comprehensive overview of the genetic transformation system of cotton and the complete genome features of Jin668 and YZ1.**

**Fig. 2: Completeness and accuracy validation of both chromosome arms and centromeres in Jin668 genome.**

**Fig. 3: Centromeric characteristics of Jin668.**

**Fig. 4: In-depth comparative analysis of the Jin668 versus TM-1, YZ1 and ZM24.**

**Fig. 5: Overview of chromatin accessibility and transcriptome dynamics in Jin668 and TM-1 during SE.**

**Fig. 6: Functional verification of regeneration-related genes.**

**Fig. 7: Cotton genetic variation substantially impacts the efficacy of CRISPR-based genome editing.**

Data availability

The T2T-Jin668 and YZ1 genome assemblies and annotation data are available at NCBI (PRJNA874817 and PRJNA960814) and T2TCotton-Hub (http://jinlab.hzau.edu.cn/T2TCottonHub/ or http://cotton.hzau.edu.cn/T2TCottonHub/). The raw sequencing data used for de novo whole-genome assembly of Jin668 and YZ1 are available in NCBI under BioProjects PRJNA874817 and PRJNA960814, respectively. RNA-seq data for Jin668 and YZ1 are available in NCBI under BioProjects PRJNA874819 and PRJNA960820, respectively. ATAC–seq data for Jin668 are available in NCBI under BioProject PRJNA960832. ATAC–seq and RNA-seq data for TM-1 during SE are available in NCBI under BioProjects PRJNA960828 and PRJNA960825, respectively. ATAC–seq and RNA-seq data for YZ1 during SE are available in NCBI under BioProjects PRJNA1059614 and PRJNA1059613, respectively. ATAC–seq and RNA-seq data for ZB1092 during SE are available in NCBI under BioProjects PRJNA1059611 and PRJNA1059609, respectively. The ChIP–seq data for Jin668 and YZ1 have been deposited under BioProjects PRJNA1079680 and PRJNA1079682, respectively. Seeds of Jin668 and YZ1 used in this study can be obtained from the corresponding author upon request. The reference genome assembly and annotation files of TM-1 (v3.1) used in this study were downloaded from https://phytozome-next.jgi.doe.gov/ and are also accessible from T2TCottonHub. Additionally, all available SE-related motifs were obtained from the Plant Transcription Factor Database (v5.0; https://planttfdb.gao-lab.org/). Further details on data accessibility are outlined in the Supplementary Methods and Methods. Source data are provided with this paper.

Code availability

All original code used in this article is available via Zenodo at https://doi.org/10.5281/zenodo.15035095 (ref. ⁹⁶) and GitHub (https://github.com/tiramisutes/T2T-Cotton-Genomes).

References

Bhatia, S., Sharma, K., Dahiya, R. & Bera, T. (eds). Modern Applications of Plant Biotechnology in Pharmaceutical Sciences, pp. 209–230 (Academic Press, 2015).
Zheng, Q. & Perry, S. E. Alterations in the transcriptome of Soybean in response to enhanced somatic embryogenesis promoted by orthologs of AGAMOUS-like15 and AGAMOUS-like18. Plant Physiol. 164, 1365–1377 (2014).
Horstman, A., Bemer, M. & Boutilier, K. A transcriptional view on somatic embryogenesis. Regeneration (Oxf.) 4, 201–216 (2017).
Wang, K. et al. The gene TaWOX5 overcomes genotype dependency in wheat genetic transformation. Nat. Plants 8, 110–117 (2022).
Chen, Z., Debernardi, J. M., Dubcovsky, J. & Gallavotti, A. Recent advances in crop transformation technologies. Nat. Plants 8, 1343–1351 (2022).
Li, J. et al. Multi-omics analyses reveal epigenomics basis for cotton somatic embryogenesis through successive regeneration acclimation process. Plant Biotechnol. J. 17, 435–450 (2019).
Iwase, A. et al. WIND1-based acquisition of regeneration competency in Arabidopsis and rapeseed. J. Plant Res. 128, 389–397 (2015).
Lowe, K. et al. Morphogenic regulators Baby boom and Wuschel improve monocot transformation. Plant Cell 28, 1998 (2016).
Debernardi, J. M. et al. A GRF–GIF chimeric protein improves the regeneration efficiency of transgenic plants. Nat. Biotechnol. 38, 1274–1279 (2020).
Wang, M. et al. Reference genome sequences of two cultivated allotetraploid cottons, Gossypium hirsutum and Gossypium barbadense. Nat. Genet. 51, 224–229 (2019).
Yang, Z. et al. Extensive intraspecific gene order and gene structural variations in upland cotton cultivars. Nat. Commun. 10, 2989 (2019).
Conover, J. L. & Wendel, J. F. Deleterious mutations accumulate faster in allopolyploid than diploid cotton (Gossypium) and unequally between subgenomes. Mol. Biol. Evol. 39, msac024 (2022).
Sun, C. et al. Precise integration of large DNA sequences in plant genomes using PrimeRoot editors. Nat. Biotechnol. 42, 316–327 (2024).
Chen, K., Wang, Y., Zhang, R., Zhang, H. & Gao, C. CRISPR/Cas genome editing and precision plant breeding in agriculture. Annu. Rev. Plant Biol. 70, 667–697 (2019).
Wang, G. et al. Precise fine-turning of GhTFL1 by base editing tools defines ideal cotton plant architecture. Genome Biol. 25, 59 (2024).
Jiang, T., Zhang, X.-O., Weng, Z. & Xue, W. Deletion and replacement of long genomic sequences using prime editing. Nat. Biotechnol. 40, 227–234 (2022).
Fernie, A. R. & Yan, J. De novo domestication: an alternative route toward new crops for the future. Mol. Plant 12, 615–631 (2019).
Shoemaker, R., Couche, L. & Galbraith, D. Characterization of somatic embryogenesis and plant regeneration in cotton (Gossypium hirsutum L.). Plant Cell Rep. 5, 178–181 (1986).
Jin, S. et al. Identification of a novel elite genotype for in vitro culture and genetic transformation of cotton. Biol. Plant. 50, 519–524 (2006).
Wang, L. et al. The GhmiR157a–GhSPL10 regulatory module controls initial cellular dedifferentiation and callus proliferation in cotton by modulating ethylene-mediated flavonoid biosynthesis. J. Exp. Bot. 69, 1081–1093 (2017).
Xu, J. GhL1L1 affects cell fate specification by regulating GhPIN1-mediated auxin distribution. Plant Biotechnol. J. 17, 63–74 (2019).
Deng, J. et al. GhTCE1–GhTCEE1 dimers regulate transcriptional reprogramming during wound-induced callus formation in cotton. Plant Cell 34, 4554–4568 (2022).
Yuan, J. et al. GhRCD1 regulates cotton somatic embryogenesis by modulating the GhMYC3–GhMYB44–GhLBD18 transcriptional cascade. New Phytol. 240, 207–223 (2023).
Guo, H. et al. Somatic embryogenesis critical initiation stage-specific mCHH hypomethylation reveals epigenetic basis underlying embryogenic redifferentiation in cotton. Plant Biotechnol. J. 18, 1648–1650 (2020).
Ge, X. et al. Efficient genotype-independent cotton genetic transformation and genome editing. J. Integr. Plant Biol. 65, 907–917 (2023).
Liu, Y. et al. Cloning and preliminary verification of telomere-associated sequences in upland cotton. Comp. Cytogenet. 14, 183–195 (2020).
Wang, P. & Wang, F. A proposed metric set for evaluation of genome assembly quality. Trends Genet. 39, 175–186 (2023).
Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).
Vollger, M. R. et al. Long-read sequence and assembly of segmental duplications. Nat. Methods 16, 88–94 (2019).
Luo, S. et al. The cotton centromere contains a Ty3-Gypsy-like LTR retroelement. PLoS ONE 7, e35261 (2012).
Gorinšek, B., Gubenšek, F. & Kordiš, D. A. Evolutionary genomics of chromoviruses in eukaryotes. Mol. Biol. Evol. 21, 781–798 (2004).
Naish, M. et al. The genetic and epigenetic landscape of the Arabidopsis centromeres. Science 374, eabi7489 (2021).
Schmitz, R. J., Grotewold, E. & Stam, M. Cis-regulatory sequences in plants: their importance, discovery, and future challenges. Plant Cell 34, 718–741 (2021).
Zhu, X. et al. Single-cell resolution analysis reveals the preparation for reprogramming the fate of stem cell niche in cotton lateral meristem. Genome Biol. 24, 194 (2023).
Braybrook, S. A. & Harada, J. J. LECs go crazy in embryo development. Trends Plant Sci. 13, 624–630 (2008).
Ji, J. et al. WOX4 promotes procambial development. Plant Physiol. 152, 1346–1356 (2009).
Wang, F. et al. Chromatin accessibility dynamics and a hierarchical transcriptional regulatory network structure for plant somatic embryogenesis. Dev. Cell 54, 742–757 (2020).
Izhaki, A. & Bowman, J. L. KANADI and Class III HD-Zip gene families regulate embryo patterning and modulate auxin flow during embryogenesis in Arabidopsis. Plant Cell 19, 495–508 (2007).
Wang, G. et al. Development of an efficient and precise adenine base editor (ABE) with expanded target range in allotetraploid cotton (Gossypium hirsutum). BMC Biol. 20, 45 (2022).
Li, C. et al. Targeted, random mutagenesis of plant genes with dual cytosine and adenine base editors. Nat. Biotechnol. 38, 875–882 (2020).
Xue, C. et al. Tuning plant phenotypes by precise, graded downregulation of gene expression. Nat. Biotechnol. 41, 1758–1764 (2023).
Xu, M., Du, Q., Tian, C., Wang, Y. & Jiao, Y. Stochastic gene expression drives mesophyll protoplast regeneration. Sci. Adv. 7, eabg8466 (2021).
Zhang, L. et al. A high-quality apple genome assembly reveals the association of a retrotransposon and red fruit colour. Nat. Commun. 10, 1494 (2019).
Qin, L. et al. High-efficient and precise base editing of C·G to T·A in the allotetraploid cotton (Gossypium hirsutum) genome using a modified CRISPR/Cas9 system. Plant Biotechnol. J. 18, 45–56 (2020).
Jin, S. et al. Cytosine, but not adenine, base editors induce genome-wide off-target mutations in rice. Science 364, 292–295 (2019).
Hirano, H. et al. Structure and engineering of Francisella novicida Cas9. Cell 164, 950–961 (2016).
Huang, G. et al. Genome sequence of Gossypium herbaceum and genome updates of Gossypium arboreum and Gossypium hirsutum provide insights into cotton A-genome evolution. Nat. Genet. 52, 516–524 (2020).
Chen, Z. J. et al. Genomic diversifications of five Gossypium allopolyploid species and their impact on cotton improvement. Nat. Genet. 52, 525–533 (2020).
Sreedasyam, A. et al. Genome resources for three modern cotton lines guide future breeding efforts. Nat. Plants 10, 1039–1051 (2024).
Han, J. et al. Rapid proliferation and nucleolar organizer targeting centromeric retrotransposons in cotton. Plant J. 88, 992–1005 (2016).
Wick, R. R., Judd, L. M. & Holt, K. E. Performance of neural network basecalling tools for Oxford nanopore sequencing. Genome Biol. 20, 129 (2019).
Hu, J. et al. NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads. Genome Biol. 25, 107 (2024).
Hu, J., Fan, J., Sun, Z. & Liu, S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics 36, 2253–2255 (2019).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
Kirov, I., Gilyok, M., Knyazev, A. & Fesenko, I. Pilot satellitome analysis of the model plant, Physcomitrella patens, revealed a transcribed and high-copy IGS related tandem repeat. Comp. Cytogenet. 12, 493–513 (2018).
Stovner, E. B. & Sætrom, P. epic2 efficiently finds diffuse domains in ChIP–seq data. Bioinformatics 35, 4392–4393 (2019).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764 (2011).
Liu, J. et al. Gapless assembly of maize chromosomes using long-read technologies. Genome Biol. 21, 121 (2020).
Ramírez, F., Dündar, F., Diehl, S., Grüning, B. A. & Manke, T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 42, W187–W191 (2014).
Quinlan, A. R. BEDTools: the Swiss‐army tool for genome feature analysis. Curr. Protoc. Bioinformatics 47, 11.12.1–11.12.34 (2014).
Mikheenko, A., Bzikadze, A. V., Gurevich, A., Miga, K. H. & Pevzner, P. A. TandemTools: mapping long reads and assessing/improving assembly quality in extra-long tandem repeats. Bioinformatics 36, i75–i83 (2020).
Ou, S. et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 20, 275–292 (2019).
Novák, P., Neumann, P., Pech, J., Steinhaisl, J. & Macas, J. RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads. Bioinformatics 29, 792–793 (2013).
Vollger, M. R., Kerpedjiev, P., Phillippy, A. M. & Eichler, E. E. StainedGlass: interactive visualization of massive tandem repeat structures with identity heatmaps. Bioinformatics 38, 2049–2051 (2022).
Peng, R. et al. Evolutionary divergence of duplicated genomes in newly described allotetraploid cottons. Proc. Natl Acad. Sci. USA 119, e2208496119 (2022).
Gel, B. & Serra, E. karyoploteR: an R/Bioconductor package to plot customizable genomes displaying arbitrary data. Bioinformatics 33, 3088–3090 (2017).
Hu, L. et al. The chromosome-scale reference genome of black pepper provides insight into piperine biosynthesis. Nat. Commun. 10, 4702 (2019).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl Acad. Sci. USA 117, 9451–9457 (2020).
Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA 6, 11 (2015).
Paterson, A. H. et al. Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres. Nature 492, 423–427 (2012).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Goel, M., Sun, H., Jiao, W.-B. & Schneeberger, K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 20, 277 (2019).
Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w¹¹¹⁸; iso-2; iso-3. Fly (Austin) 6, 80–92 (2012).
Giordano, F., Stammnitz, M. R., Murchison, E. P. & Ning, Z. scanPAV: a pipeline for extracting presence–absence variations in genome pairs. Bioinformatics 34, 3022–3024 (2018).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Birney, E., Clamp, M. & Durbin, R. GeneWise and genomewise. Genome Res. 14, 988–995 (2004).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008).
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Bu, D. et al. KOBAS-i: intelligent prioritization and exploratory visualization of biological functions for gene enrichment analysis. Nucleic Acids Res. 49, W317–W325 (2021).
Grandi, F. C., Modi, H., Kampman, L. & Corces, M. R. Chromatin accessibility profiling by ATAC–seq. Nat. Protoc. 17, 1518–1552 (2022).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Faust, G. G. & Hall, I. M. SAMBLASTER: fast duplicate marking and structural variant read extraction. Bioinformatics 30, 2503–2505 (2014).
Zhang, Y. et al. Model-based analysis of ChIP–seq (MACS). Genome Biol. 9, R137 (2008).
Li, Q., Brown, J. B., Huang, H. & Bickel, P. J. Measuring reproducibility of high-throughput experiments. Preprint at https://arxiv.org/abs/1110.4705 (2011).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
Machlab, D. et al. monaLisa: an R/Bioconductor package for identifying regulatory motifs. Bioinformatics 38, 2624–2625 (2022).
Castro-Mondragon, J. A. et al. JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 50, D165–D173 (2021).
Tan, G. & Lenhard, B. TFBSTools: an R/bioconductor package for transcription factor binding site analysis. Bioinformatics 32, 1555–1556 (2016).
Concordet, J.-P. & Haeussler, M. CRISPOR: intuitive guide selection for CRISPR/Cas9 genome editing experiments and screens. Nucleic Acids Res. 46, W242–W245 (2018).
Xu, Z. Scripts used in ‘Genome assembly of two allotetraploid cotton germplasms reveals mechanisms of somatic embryogenesis and enables precise genome editing’. Zenodo https://doi.org/10.5281/zenodo.15035095 (2025).

Acknowledgements

The study was supported by grants from the National Science Fund for Distinguished Young Scholars (32325039 to S.J.) and the Young Scientists Fund under the National Natural Science Foundation of China (32201856 to Z.X.). The computations in this paper were run on the bioinformatics computing platform of the National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University.

Author information

These authors contributed equally: Zhongping Xu, Guanying Wang.

Authors and Affiliations

National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, China
Zhongping Xu, Guanying Wang, Xiangqian Zhu, Ruipeng Wang, Longfu Zhu, Lili Tu, Maojun Wang, Xianlong Zhang & Shuangxia Jin
Research Base, Anyang Institute of Technology, State Key Laboratory of Cotton Biology, Anyang, China
Yuling Liu & Renhai Peng
Department of Biosciences, Durham University, Durham, UK
Keith Lindsey

Contributions

S.J. and X. Zhang designed and supervised the research. Z.X. and G.W. collected materials for genome and transcriptome sequencing. Z.X. and G.W. collected materials for ATAC–seq. Z.X. and R.W. performed bioinformatics analysis. Y.L. and R.P. performed the FISH experiment. G.W. and X.Z. performed the gene editing experiments. M.W., L.T. and L.Z. contributed to the project discussion. Z.X. and G.W. wrote the manuscript with input from all other authors. S.J., K.L., X. Zhang and M.W. edited the manuscript. All authors have read and approved the manuscript.

Corresponding authors

Correspondence to Maojun Wang, Xianlong Zhang or Shuangxia Jin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Amanda Hulse-Kemp, Jinsheng Lai and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Genomic characterization of Jin668 and YZ1.

a,b, The k-mer (31-mer) frequency distributions estimated by GenomeScope2.0 for the Jin668 (a) and YZ1 (b) genomes. c, The distribution of telomeres, centromeres and gap regions in the genomes of Jin668 and YZ1.
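
A k-mer frequency distribution of the kind summarized in panels a and b is a histogram of how many distinct k-mers occur a given number of times. The sketch below computes one in pure Python on toy sequences with a short k; it is an illustration only, since a real analysis would run a dedicated k-mer counter on the sequencing reads with k = 31. The function names and toy reads are assumptions, not part of the study's pipeline.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)

def kmer_spectrum(reads, k=31):
    """Map multiplicity -> number of distinct canonical k-mers seen that many times."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):   # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return Counter(counts.values())

toy_reads = ["ACGTACGTAC", "CGTACGTACG"]   # toy data, not from the study
spectrum = kmer_spectrum(toy_reads, k=4)
for multiplicity in sorted(spectrum):
    print(multiplicity, spectrum[multiplicity])
```

Counting canonical k-mers merges each k-mer with its reverse complement, so both strands of a read contribute to the same bin.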

Extended Data Fig. 2 Completeness and accuracy validation of chromosome arms and centromeres in the Jin668 genome.

a, Assessment of the Jin668 genome assembly using the LTR Assembly Index (LAI). The x-axis lists chromosomes and the y-axis represents LAI values. The red dashed line represents the mean value. b,c, Hi-C map of the Jin668 genome showing genome- and chromosome-wide all-by-all interactions. d, Bionano de novo assembly contigs mapped to the Jin668 reference assembly. e, The distribution of QV values calculated from Merqury and Merfin on different chromosomes. f, The distribution of different types of unique k-mers along the centromeric regions in chromosomes of Jin668. Each bar shows the number of different types of k-mers in a 20-kb bin. The blue bars represent single-clump k-mers, which suggest good base-level quality, whereas the orange (multiple-clumps) and green (no-clumps) bars suggest low base-level quality in the region. The x-axis unit is megabases (Mb), and the y-axis represents values multiplied by 10³. g, NucFreq plot of centromeric regions in chromosomes of Jin668, showing HiFi coverage depth (black) along with secondary allele frequency (red) for all centromeres and surrounding regions. The x-axis represents chromosome positions in megabases (Mb), and the y-axis shows HiFi depth, ranging from 0 to 100.
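
For readers unfamiliar with k-mer-based QV values such as those in panel e, the following is a minimal sketch of the commonly described Merqury-style estimate: k-mers found in the assembly but not in the reads are treated as evidence of base errors, and the resulting per-base error rate is Phred-scaled. The counts and the choice of k below are hypothetical, and the function is an illustration rather than the tools' actual code.

```python
import math

def kmer_qv(asm_only_kmers: int, total_asm_kmers: int, k: int) -> float:
    """Phred-scaled consensus quality from assembly-only k-mers.

    Assumes a k-mer absent from the read set marks at least one base error.
    """
    if asm_only_kmers == 0:
        return math.inf                     # no unsupported k-mers observed
    p_base_correct = (1 - asm_only_kmers / total_asm_kmers) ** (1 / k)
    error_rate = 1 - p_base_correct
    return -10 * math.log10(error_rate)

# Hypothetical counts for illustration only (not values from the paper).
print(round(kmer_qv(asm_only_kmers=2_000, total_asm_kmers=2_000_000_000, k=21), 1))
```

On this scale, each additional 10 QV units corresponds to a tenfold lower estimated error rate.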

Extended Data Fig. 3 Segmental duplication (SD) content of the Jin668 genome.

a,b, Intrachromosomal SDs with lengths greater than or less than 5 kb. c, Comparison of SD length and identity in different regions of the genome: the identity (top) and length (bottom) of SDs across commonly delineated regions of the genome (colors).

Extended Data Fig. 4 The percentage of AT/GC base composition, TEs, 5mC DNA methylation, histone modification and 3D genome architecture within the centromeres.

Quantification of genomic features plotted along chromosome arms that were proportionally scaled between telomeres (TEL) and centromere midpoints (CEN).
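
One way to read "proportionally scaled" is that every position on a chromosome arm is mapped to a relative coordinate running from 0 at the telomere to 1 at the centromere midpoint, so that arms of different lengths can be averaged together. The sketch below implements that reading; the function names, binning scheme and toy inputs are assumptions and may differ from the authors' scripts.

```python
def scaled_arm_position(pos: int, cen_mid: int, chrom_len: int) -> float:
    """Map a coordinate to [0, 1]: 0 at the telomere of its arm, 1 at the centromere midpoint."""
    if not 0 < cen_mid < chrom_len:
        raise ValueError("centromere midpoint must lie inside the chromosome")
    if not 0 <= pos <= chrom_len:
        raise ValueError("position outside chromosome")
    if pos <= cen_mid:                                   # left arm: telomere at coordinate 0
        return pos / cen_mid
    return (chrom_len - pos) / (chrom_len - cen_mid)     # right arm: telomere at chrom_len

def bin_means(features, cen_mid, chrom_len, n_bins=10):
    """Average (position, value) pairs in equal-width bins along the TEL-to-CEN axis."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for pos, value in features:
        idx = min(int(scaled_arm_position(pos, cen_mid, chrom_len) * n_bins), n_bins - 1)
        sums[idx] += value
        counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

# Toy chromosome of length 100 with its centromere midpoint at 40 (hypothetical).
print(bin_means([(10, 2.0), (90, 4.0), (45, 6.0)], cen_mid=40, chrom_len=100, n_bins=2))
```

Both arms fold onto the same axis, so a feature enriched near centromeres shows up in the highest bins regardless of which arm it sits on.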

Extended Data Fig. 5 A comparative genome analysis of the Jin668 versus TM-1, YZ1, and ZM24.

a, Genome-wide syntenic relationships among At and Dt subgenomes in four Upland cotton accessions relative to the A-genome-like Ga (A₂ genome) and D-genome-like Gr (D₅ genome). b, The distribution of variation density between the Jin668 genome and the TM-1, YZ1 and ZM24 genomes. c, The distribution of SE-related genes and gene density, as well as genomic variations from Fig. 1, on the chromosomes of the Jin668 genome. The circos plot from outside to inside shows: chromosome, SE-related genes, gene density in Jin668, and the density of SNPs, InDels, PAVs, inversions and translocations between Jin668 and TM-1. d, The correlation of SNPs, InDels, PAVs, inversions and translocations with SE-related genes. The significance was tested using the overlapPermTest function in regioneR (https://www.bioconductor.org/packages/devel/bioc/html/regioneR.html), which calls permTest with the appropriate parameters to perform the permutation test. SE, somatic embryogenesis.
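
The idea behind the permutation test in panel d can be sketched in a few lines: count how many variant intervals overlap SE-related genes, then repeatedly re-place the variants at random and ask how often the random overlap count is at least as large as the observed one. This Python sketch is a simplified stand-in on a single toy chromosome, not the regioneR implementation; the coordinates, function names and randomization scheme are assumptions.

```python
import random

def count_overlaps(a, b):
    """Number of intervals in `a` overlapping at least one interval in `b` (half-open)."""
    return sum(any(s < be and bs < e for bs, be in b) for s, e in a)

def randomize(a, chrom_len, rng):
    """Place each interval of `a` at a uniformly random position, keeping its length."""
    out = []
    for s, e in a:
        length = e - s
        start = rng.randrange(0, chrom_len - length + 1)
        out.append((start, start + length))
    return out

def overlap_perm_test(a, b, chrom_len, n_perm=1000, seed=0):
    """One-sided permutation p-value for 'a overlaps b more often than expected'."""
    rng = random.Random(seed)
    observed = count_overlaps(a, b)
    hits = sum(
        count_overlaps(randomize(a, chrom_len, rng), b) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)

# Toy single-chromosome data (hypothetical coordinates).
variants = [(100, 110), (205, 215), (300, 310)]
se_genes = [(95, 120), (200, 220), (295, 320)]
obs, p = overlap_perm_test(variants, se_genes, chrom_len=10_000)
print(obs, p)
```

Adding one to both the numerator and the denominator keeps the estimated p-value from ever being exactly zero with a finite number of permutations.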

Extended Data Fig. 6 Sample preparation, sequencing, and quality evaluation of ATAC library.

a, Schematic outline of genome-wide ATAC–seq and RNA-seq assays and time points of sample collection with two independent biological replicates, as well as the roadmap of an integration analysis. HAI, hours after inoculation. b, The principal component plots of ATAC–seq data sequenced from Jin668, YZ1, TM-1 and ZB1092, respectively. Color code is shown. Each dot represents one sample. c, The genome-wide distribution of ATAC–seq peaks for Jin668, YZ1, TM-1 and ZB1092. Window size: TSS ± 3.0 kb. d, The distribution of SPOT values for all samples from Jin668, YZ1, TM-1 and ZB1092, respectively. e, The distribution of FRiP values for all samples from Jin668, YZ1, TM-1 and ZB1092, respectively. f, ATAC–seq profiling in Jin668, YZ1, TM-1 and ZB1092. Left, the number of chromatin accessibility peaks on each chromosome. Middle, bar chart showing the total number of peaks detected at the corresponding induction time point. Right, pie chart illustrating the percentage distribution of peaks across various genomic regions.
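
FRiP (fraction of reads in peaks), reported in panel e, is the share of mapped reads that overlap a called peak. The sketch below shows the calculation on toy intervals; it assumes half-open coordinates and peaks that do not overlap each other, and the chromosome names and values are hypothetical rather than taken from the study.

```python
from bisect import bisect_right

def frip(reads, peaks):
    """Fraction of reads overlapping any peak.

    `reads` and `peaks` are (chrom, start, end) tuples with half-open coordinates;
    peaks are assumed non-overlapping within a chromosome.
    """
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()

    in_peaks = 0
    for chrom, start, end in reads:
        intervals = by_chrom.get(chrom, [])
        # With non-overlapping peaks, the last peak that starts before the
        # read ends is the only one that can overlap the read.
        i = bisect_right(intervals, (end - 1, float("inf"))) - 1
        if i >= 0 and intervals[i][1] > start:
            in_peaks += 1
    return in_peaks / len(reads) if reads else 0.0

toy_peaks = [("A01", 100, 200), ("A01", 500, 600)]
toy_reads = [("A01", 90, 140), ("A01", 200, 250), ("A01", 550, 560), ("D01", 100, 150)]
print(frip(toy_reads, toy_peaks))
```

Two of the four toy reads fall in peaks, so the function returns 0.5.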

Extended Data Fig. 7 Identification of SE-related genes.

Identification of SE-related genes based on genome information, chromatin accessibility and gene expression change trend at different time points of SE stage, and other external data (published literature and data generated in our laboratory earlier, etc.).

Extended Data Fig. 8 AGL promotes cotton regeneration.

a, Transcription and epigenetic tracks for AGL. Transcription data shown as mean ± s.d. of 3 biological replicates. b, The CPR of CRISPR edited, overexpression and control lines at 60 days post-induction. Data are represented as mean ± s.d. Differences between groups were evaluated by two-sided Student’s t-test, *p < 0.05, **p < 0.01, ***p < 0.001. n = 3 independent biological replicates. The P7N was used as control. c, The paraffin sections of CRISPR edited, overexpression and control lines for Jin668 and TM-1 at different days post-induction. Scale bars represent 100 μm.

Extended Data Fig. 9 Genetic transformation experiments demonstrate that the Jin668 genome enhances the accuracy of sgRNA design for CRISPR gene editing.

a, The sequence homology of a putative sgRNA (present in Jin668 but absent in TM-1) between Jin668 and TM-1. b,c, The mutational landscape of the genomic regions corresponding to the sgRNA targets in callus of Jin668 (b) and TM-1 (c) after editing with CRISPR/Cas9. The percentage is the ratio of reads carrying a given mutation type to the total read count.
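
To make the two quantities in this legend concrete, the sketch below compares one 20-nt guide against the corresponding site in two genomes by counting mismatches, checks for an NGG PAM, and computes a mutation percentage from read counts. The sequences, variable names and read counts are made up for illustration; an NGG PAM is assumed, as for SpCas9.

```python
def mismatches(guide: str, site: str) -> int:
    """Hamming distance between a protospacer and an equally long genomic site."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have equal length")
    return sum(g != s for g, s in zip(guide.upper(), site.upper()))

def has_ngg_pam(pam: str) -> bool:
    """True for a 3-nt NGG PAM (any base followed by GG)."""
    return len(pam) == 3 and pam.upper()[1:] == "GG"

def mutation_ratio(reads_with_mutation: int, total_reads: int) -> float:
    """Percentage of reads carrying a given mutation type."""
    return 100.0 * reads_with_mutation / total_reads

# Hypothetical sequences: the same 20-nt guide compared with the site in two genomes.
guide        = "GATTACAGATTACAGATTAC"
site_genome1 = "GATTACAGATTACAGATTAC"   # perfect match
site_genome2 = "GATTACAGATCACAGATTAT"   # two substitutions relative to the guide

print(mismatches(guide, site_genome1), mismatches(guide, site_genome2))
print(has_ngg_pam("TGG"), mutation_ratio(25, 100))
```

A guide designed against one reference can therefore carry mismatches, or lose its PAM, in the genotype actually being transformed, which is the situation this figure examines.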

Extended Data Fig. 10 Gene expression dynamics across 15 tissues in Jin668 and YZ1.

a, The profile of genes expressed in all 15 tissues of Jin668 (left) and YZ1 (right). b, Statistics on the number of genes co-expressed in multiple tissues of Jin668 (left) and YZ1 (right). c, Statistics on the number of genes specifically expressed in each tissue of Jin668 (left) and YZ1 (right). Each tissue is represented by the number corresponding to that shown in the graphic illustration in a.
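
The counts in panels a and c amount to classifying each gene by the set of tissues in which it is expressed. A minimal sketch of that bookkeeping follows; the expression cutoff, abundance units, gene identifiers and tissue names are assumptions for illustration, since the criteria used in the study are not stated in this legend.

```python
def classify_expression(expr, threshold=1.0):
    """Split genes into 'expressed in all tissues' and 'specific to one tissue'.

    `expr` maps gene -> {tissue: abundance}; `threshold` is an assumed cutoff.
    """
    ubiquitous, specific = [], {}
    for gene, by_tissue in expr.items():
        on = [t for t, v in by_tissue.items() if v >= threshold]
        if len(on) == len(by_tissue):
            ubiquitous.append(gene)                      # expressed everywhere
        elif len(on) == 1:
            specific.setdefault(on[0], []).append(gene)  # expressed in exactly one tissue
    return ubiquitous, specific

# Toy expression table with hypothetical gene IDs and three tissues.
toy_expr = {
    "geneA": {"root": 5.0, "leaf": 3.2, "fiber": 8.1},
    "geneB": {"root": 0.0, "leaf": 0.2, "fiber": 12.4},
    "geneC": {"root": 2.0, "leaf": 1.5, "fiber": 0.1},
}
everywhere, tissue_specific = classify_expression(toy_expr)
print(everywhere, tissue_specific)
```

Genes expressed in some but not all tissues, such as geneC here, fall into neither class and would be counted in the co-expression summary of panel b instead.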

Supplementary information

Supplementary Information

Supplementary Figs. 1–31, Supplementary Notes 1–13 and Supplementary Methods.

Reporting Summary

Supplementary Tables 1–35

Supplementary Tables 1–35.

Source data

Source Data Fig. 3

Statistical source data.

Source Data Fig. 6

Statistical source data.

Source Data Fig. 7

Statistical source data.

Source Data Extended Data Fig. 1

Statistical source data.

Source Data Extended Data Fig. 2

Statistical source data.

Source Data Extended Data Fig. 3

Statistical source data.

Source Data Extended Data Fig. 4

Statistical source data.

Source Data Extended Data Fig. 5

Statistical source data.

Source Data Extended Data Fig. 6

Statistical source data.

Source Data Extended Data Fig. 8

Statistical source data.

Source Data Extended Data Fig. 10

Statistical source data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xu, Z., Wang, G., Zhu, X. et al. Genome assembly of two allotetraploid cotton germplasms reveals mechanisms of somatic embryogenesis and enables precise genome editing. Nat Genet 57, 2028–2039 (2025). https://doi.org/10.1038/s41588-025-02258-3

Received: 21 August 2023
Accepted: 05 June 2025
Published: 22 July 2025
Issue date: August 2025
DOI: https://doi.org/10.1038/s41588-025-02258-3
