  • Article
Genome assembly of two allotetraploid cotton germplasms reveals mechanisms of somatic embryogenesis and enables precise genome editing

Abstract

Somatic embryogenesis is crucial for plant genetic engineering, yet the underlying mechanisms in cotton remain poorly understood. Here we present a telomere-to-telomere assembly of Jin668 and a high-quality assembly of YZ1, two highly regenerative allotetraploid cotton germplasms. The completion of the Jin668 genome enables characterization of ~30.1 Mb of centromeric regions invaded by centromeric retrotransposon of maize and Tekay retrotransposons, an ~8.1 Mb 5S rDNA array containing 25,190 copies and a ~75.1 Mb major 45S rDNA array with 8,131 copies. Comparative analyses of regenerative and recalcitrant genotypes reveal dynamic transcriptional patterns and chromatin accessibility during the initial regeneration process. A hierarchical gene regulatory network identifies AGL15 as a contributor to regeneration. Additionally, we demonstrate that genetic variation affects sgRNA target sites, while the Jin668 genome assembly reduces the risk of off-target effects in CRISPR-based genome editing. Together, the complete Jin668 genome reveals the complexity of genomic regions and cotton regeneration, and improves the precision of genome editing.
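As a rough consistency check on the array sizes quoted above, dividing each array length by its copy number gives an implied average repeat-unit length. The sketch below uses only the rounded figures from the abstract; the function name is illustrative, and the results are approximations rather than values reported by the authors.

```python
def mean_unit_length_bp(array_length_mb: float, copies: int) -> float:
    """Average repeat-unit length (bp) implied by an array size in Mb and a copy number."""
    return array_length_mb * 1_000_000 / copies


# Rounded figures taken from the abstract.
rdna_5s = mean_unit_length_bp(8.1, 25_190)
rdna_45s = mean_unit_length_bp(75.1, 8_131)

print(f"5S unit: ~{rdna_5s:.0f} bp; 45S unit: ~{rdna_45s:.0f} bp")
```

With these rounded inputs, the implied unit lengths are roughly 320 bp for the 5S array and roughly 9.2 kb for the major 45S array.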

Fig. 1: A comprehensive overview of the genetic transformation system of cotton and the complete genome features of Jin668 and YZ1.
Fig. 2: Completeness and accuracy validation of both chromosome arms and centromeres in the Jin668 genome.
Fig. 3: Centromeric characteristics of Jin668.
Fig. 4: In-depth comparative analysis of Jin668 versus TM-1, YZ1 and ZM24.
Fig. 5: Overview of chromatin accessibility and transcriptome dynamics in Jin668 and TM-1 during SE.
Fig. 6: Functional verification of regeneration-related genes.
Fig. 7: Cotton genetic variation substantially impacts the efficacy of CRISPR-based genome editing.

Data availability

The T2T-Jin668 and YZ1 genome assemblies and annotation data are available at NCBI (PRJNA874817 and PRJNA960814) and T2TCotton-Hub (http://jinlab.hzau.edu.cn/T2TCottonHub/ or http://cotton.hzau.edu.cn/T2TCottonHub/). The raw sequencing data used for de novo whole-genome assembly of Jin668 and YZ1 are available in NCBI under BioProjects PRJNA874817 and PRJNA960814, respectively. RNA-seq data for Jin668 and YZ1 are available in NCBI under BioProjects PRJNA874819 and PRJNA960820, respectively. ATAC–seq data for Jin668 are available in NCBI under BioProject PRJNA960832. ATAC–seq and RNA-seq data for TM-1 during SE are available in NCBI under BioProjects PRJNA960828 and PRJNA960825, respectively. ATAC–seq and RNA-seq data for YZ1 during SE are available in NCBI under BioProjects PRJNA1059614 and PRJNA1059613, respectively. ATAC–seq and RNA-seq data for ZB1092 during SE are available in NCBI under BioProjects PRJNA1059611 and PRJNA1059609, respectively. The ChIP–seq data for Jin668 and YZ1 have been deposited under BioProjects PRJNA1079680 and PRJNA1079682, respectively. Seeds of Jin668 and YZ1 used in this study can be obtained from the corresponding author upon request. The reference genome assembly and annotation files of TM-1 (v3.1) used in this study were downloaded from https://phytozome-next.jgi.doe.gov/ and are also accessible from T2TCottonHub. Additionally, all available SE-related motifs were obtained from the Plant Transcription Factor Database (v5.0; https://planttfdb.gao-lab.org/). Further details on data accessibility are outlined in the Supplementary Methods and Methods. Source data are provided with this paper.

Code availability

All original code used in this article is available via Zenodo at https://doi.org/10.5281/zenodo.15035095 (ref. 96) and GitHub (https://github.com/tiramisutes/T2T-Cotton-Genomes).

References

  1. Bhatia, S., Sharma, K., Dahiya, R. & Bera, T. (eds). Modern Applications of Plant Biotechnology in Pharmaceutical Sciences, pp. 209–230 (Academic Press, 2015).

  2. Zheng, Q. & Perry, S. E. Alterations in the transcriptome of Soybean in response to enhanced somatic embryogenesis promoted by orthologs of AGAMOUS-like15 and AGAMOUS-like18. Plant Physiol. 164, 1365–1377 (2014).


  3. Horstman, A., Bemer, M. & Boutilier, K. A transcriptional view on somatic embryogenesis. Regeneration (Oxf.) 4, 201–216 (2017).


  4. Wang, K. et al. The gene TaWOX5 overcomes genotype dependency in wheat genetic transformation. Nat. Plants 8, 110–117 (2022).


  5. Chen, Z., Debernardi, J. M., Dubcovsky, J. & Gallavotti, A. Recent advances in crop transformation technologies. Nat. Plants 8, 1343–1351 (2022).


  6. Li, J. et al. Multi-omics analyses reveal epigenomics basis for cotton somatic embryogenesis through successive regeneration acclimation process. Plant Biotechnol. J. 17, 435–450 (2019).


  7. Iwase, A. et al. WIND1-based acquisition of regeneration competency in Arabidopsis and rapeseed. J. Plant Res. 128, 389–397 (2015).


  8. Lowe, K. et al. Morphogenic regulators Baby boom and Wuschel improve monocot transformation. Plant Cell 28, 1998 (2016).


  9. Debernardi, J. M. et al. A GRF–GIF chimeric protein improves the regeneration efficiency of transgenic plants. Nat. Biotechnol. 38, 1274–1279 (2020).


  10. Wang, M. et al. Reference genome sequences of two cultivated allotetraploid cottons, Gossypium hirsutum and Gossypium barbadense. Nat. Genet. 51, 224–229 (2019).


  11. Yang, Z. et al. Extensive intraspecific gene order and gene structural variations in upland cotton cultivars. Nat. Commun. 10, 2989 (2019).


  12. Conover, J. L. & Wendel, J. F. Deleterious mutations accumulate faster in allopolyploid than diploid cotton (Gossypium) and unequally between subgenomes. Mol. Biol. Evol. 39, msac024 (2022).


  13. Sun, C. et al. Precise integration of large DNA sequences in plant genomes using PrimeRoot editors. Nat. Biotechnol. 42, 316–327 (2024).


  14. Chen, K., Wang, Y., Zhang, R., Zhang, H. & Gao, C. CRISPR/Cas genome editing and precision plant breeding in agriculture. Annu. Rev. Plant Biol. 70, 667–697 (2019).


  15. Wang, G. et al. Precise fine-turning of GhTFL1 by base editing tools defines ideal cotton plant architecture. Genome Biol. 25, 59 (2024).


  16. Jiang, T., Zhang, X.-O., Weng, Z. & Xue, W. Deletion and replacement of long genomic sequences using prime editing. Nat. Biotechnol. 40, 227–234 (2022).


  17. Fernie, A. R. & Yan, J. De novo domestication: an alternative route toward new crops for the future. Mol. Plant 12, 615–631 (2019).


  18. Shoemaker, R., Couche, L. & Galbraith, D. Characterization of somatic embryogenesis and plant regeneration in cotton (Gossypium hirsutum L.). Plant Cell Rep. 5, 178–181 (1986).


  19. Jin, S. et al. Identification of a novel elite genotype for in vitro culture and genetic transformation of cotton. Biol. Plant. 50, 519–524 (2006).


  20. Wang, L. et al. The GhmiR157a–GhSPL10 regulatory module controls initial cellular dedifferentiation and callus proliferation in cotton by modulating ethylene-mediated flavonoid biosynthesis. J. Exp. Bot. 69, 1081–1093 (2017).

  21. Xu, J. GhL1L1 affects cell fate specification by regulating GhPIN1-mediated auxin distribution. Plant Biotechnol. J. 17, 63–74 (2019).


  22. Deng, J. et al. GhTCE1–GhTCEE1 dimers regulate transcriptional reprogramming during wound-induced callus formation in cotton. Plant Cell 34, 4554–4568 (2022).


  23. Yuan, J. et al. GhRCD1 regulates cotton somatic embryogenesis by modulating the GhMYC3–GhMYB44–GhLBD18 transcriptional cascade. New Phytol. 240, 207–223 (2023).


  24. Guo, H. et al. Somatic embryogenesis critical initiation stage-specific mCHH hypomethylation reveals epigenetic basis underlying embryogenic redifferentiation in cotton. Plant Biotechnol. J. 18, 1648–1650 (2020).


  25. Ge, X. et al. Efficient genotype-independent cotton genetic transformation and genome editing. J. Integr. Plant Biol. 65, 907–917 (2023).


  26. Liu, Y. et al. Cloning and preliminary verification of telomere-associated sequences in upland cotton. Comp. Cytogenet. 14, 183–195 (2020).


  27. Wang, P. & Wang, F. A proposed metric set for evaluation of genome assembly quality. Trends Genet. 39, 175–186 (2023).


  28. Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).


  29. Vollger, M. R. et al. Long-read sequence and assembly of segmental duplications. Nat. Methods 16, 88–94 (2019).


  30. Luo, S. et al. The cotton centromere contains a Ty3-Gypsy-like LTR retroelement. PLoS ONE 7, e35261 (2012).


  31. Gorinšek, B., Gubenšek, F. & Kordiš, D. A. Evolutionary genomics of chromoviruses in eukaryotes. Mol. Biol. Evol. 21, 781–798 (2004).


  32. Naish, M. et al. The genetic and epigenetic landscape of the Arabidopsis centromeres. Science 374, eabi7489 (2021).


  33. Schmitz, R. J., Grotewold, E. & Stam, M. Cis-regulatory sequences in plants: their importance, discovery, and future challenges. Plant Cell 34, 718–741 (2021).


  34. Zhu, X. et al. Single-cell resolution analysis reveals the preparation for reprogramming the fate of stem cell niche in cotton lateral meristem. Genome Biol. 24, 194 (2023).


  35. Braybrook, S. A. & Harada, J. J. LECs go crazy in embryo development. Trends Plant Sci. 13, 624–630 (2008).


  36. Ji, J. et al. WOX4 promotes procambial development. Plant Physiol. 152, 1346–1356 (2009).


  37. Wang, F. et al. Chromatin accessibility dynamics and a hierarchical transcriptional regulatory network structure for plant somatic embryogenesis. Dev. Cell 54, 742–757 (2020).


  38. Izhaki, A. & Bowman, J. L. KANADI and Class III HD-Zip gene families regulate embryo patterning and modulate auxin flow during embryogenesis in Arabidopsis. Plant Cell 19, 495–508 (2007).


  39. Wang, G. et al. Development of an efficient and precise adenine base editor (ABE) with expanded target range in allotetraploid cotton (Gossypium hirsutum). BMC Biol. 20, 45 (2022).


  40. Li, C. et al. Targeted, random mutagenesis of plant genes with dual cytosine and adenine base editors. Nat. Biotechnol. 38, 875–882 (2020).


  41. Xue, C. et al. Tuning plant phenotypes by precise, graded downregulation of gene expression. Nat. Biotechnol. 41, 1758–1764 (2023).


  42. Xu, M., Du, Q., Tian, C., Wang, Y. & Jiao, Y. Stochastic gene expression drives mesophyll protoplast regeneration. Sci. Adv. 7, eabg8466 (2021).


  43. Zhang, L. et al. A high-quality apple genome assembly reveals the association of a retrotransposon and red fruit colour. Nat. Commun. 10, 1494 (2019).


  44. Qin, L. et al. High-efficient and precise base editing of C·G to T·A in the allotetraploid cotton (Gossypium hirsutum) genome using a modified CRISPR/Cas9 system. Plant Biotechnol. J. 18, 45–56 (2020).


  45. Jin, S. et al. Cytosine, but not adenine, base editors induce genome-wide off-target mutations in rice. Science 364, 292–295 (2019).


  46. Hirano, H. et al. Structure and engineering of Francisella novicida Cas9. Cell 164, 950–961 (2016).


  47. Huang, G. et al. Genome sequence of Gossypium herbaceum and genome updates of Gossypium arboreum and Gossypium hirsutum provide insights into cotton A-genome evolution. Nat. Genet. 52, 516–524 (2020).


  48. Chen, Z. J. et al. Genomic diversifications of five Gossypium allopolyploid species and their impact on cotton improvement. Nat. Genet. 52, 525–533 (2020).


  49. Sreedasyam, A. et al. Genome resources for three modern cotton lines guide future breeding efforts. Nat. Plants 10, 1039–1051 (2024).


  50. Han, J. et al. Rapid proliferation and nucleolar organizer targeting centromeric retrotransposons in cotton. Plant J. 88, 992–1005 (2016).


  51. Wick, R. R., Judd, L. M. & Holt, K. E. Performance of neural network basecalling tools for Oxford nanopore sequencing. Genome Biol. 20, 129 (2019).


  52. Hu, J. et al. NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads. Genome Biol. 25, 107 (2024).


  53. Hu, J., Fan, J., Sun, Z. & Liu, S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics 36, 2253–2255 (2019).


  54. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).


  55. Kirov, I., Gilyok, M., Knyazev, A. & Fesenko, I. Pilot satellitome analysis of the model plant, Physcomitrella patens, revealed a transcribed and high-copy IGS related tandem repeat. Comp. Cytogenet. 12, 493–513 (2018).


  56. Stovner, E. B. & Sætrom, P. epic2 efficiently finds diffuse domains in ChIP–seq data. Bioinformatics 35, 4392–4393 (2019).


  57. Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764 (2011).


  58. Liu, J. et al. Gapless assembly of maize chromosomes using long-read technologies. Genome Biol. 21, 121 (2020).


  59. Ramírez, F., Dündar, F., Diehl, S., Grüning, B. A. & Manke, T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 42, W187–W191 (2014).


  60. Quinlan, A. R. BEDTools: the Swiss‐army tool for genome feature analysis. Curr. Protoc. Bioinformatics 47, 11.12.1–11.12.34 (2014).


  61. Mikheenko, A., Bzikadze, A. V., Gurevich, A., Miga, K. H. & Pevzner, P. A. TandemTools: mapping long reads and assessing/improving assembly quality in extra-long tandem repeats. Bioinformatics 36, i75–i83 (2020).


  62. Ou, S. et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 20, 275–292 (2019).


  63. Novák, P., Neumann, P., Pech, J., Steinhaisl, J. & Macas, J. RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads. Bioinformatics 29, 792–793 (2013).


  64. Vollger, M. R., Kerpedjiev, P., Phillippy, A. M. & Eichler, E. E. StainedGlass: interactive visualization of massive tandem repeat structures with identity heatmaps. Bioinformatics 38, 2049–2051 (2022).


  65. Peng, R. et al. Evolutionary divergence of duplicated genomes in newly described allotetraploid cottons. Proc. Natl Acad. Sci. USA 119, e2208496119 (2022).


  66. Gel, B. & Serra, E. karyoploteR: an R/Bioconductor package to plot customizable genomes displaying arbitrary data. Bioinformatics 33, 3088–3090 (2017).


  67. Hu, L. et al. The chromosome-scale reference genome of black pepper provides insight into piperine biosynthesis. Nat. Commun. 10, 4702 (2019).


  68. Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl Acad. Sci. USA 117, 9451–9457 (2020).

  69. Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA 6, 11 (2015).


  70. Paterson, A. H. et al. Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres. Nature 492, 423–427 (2012).


  71. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).


  72. Goel, M., Sun, H., Jiao, W.-B. & Schneeberger, K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 20, 277 (2019).


  73. Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6, 80–92 (2012).

  74. Giordano, F., Stammnitz, M. R., Murchison, E. P. & Ning, Z. scanPAV: a pipeline for extracting presence–absence variations in genome pairs. Bioinformatics 34, 3022–3024 (2018).


  75. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).

  76. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).


  77. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).


  78. Birney, E., Clamp, M. & Durbin, R. GeneWise and genomewise. Genome Res. 14, 988–995 (2004).


  79. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).


  80. Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).


  81. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).


  82. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).


  83. Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008).


  84. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).


  85. Bu, D. et al. KOBAS-i: intelligent prioritization and exploratory visualization of biological functions for gene enrichment analysis. Nucleic Acids Res. 49, W317–W325 (2021).


  86. Grandi, F. C., Modi, H., Kampman, L. & Corces, M. R. Chromatin accessibility profiling by ATAC–seq. Nat. Protoc. 17, 1518–1552 (2022).


  87. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).


  88. Faust, G. G. & Hall, I. M. SAMBLASTER: fast duplicate marking and structural variant read extraction. Bioinformatics 30, 2503–2505 (2014).


  89. Zhang, Y. et al. Model-based analysis of ChIP–seq (MACS). Genome Biol. 9, R137 (2008).


  90. Li, Q., Brown, J. B., Huang, H. & Bickel, P. J. Measuring reproducibility of high-throughput experiments. Preprint at https://arxiv.org/abs/1110.4705 (2011).

  91. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).


  92. Machlab, D. et al. monaLisa: an R/Bioconductor package for identifying regulatory motifs. Bioinformatics 38, 2624–2625 (2022).


  93. Castro-Mondragon, J. A. et al. JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 50, D165–D173 (2021).


  94. Tan, G. & Lenhard, B. TFBSTools: an R/bioconductor package for transcription factor binding site analysis. Bioinformatics 32, 1555–1556 (2016).


  95. Concordet, J.-P. & Haeussler, M. CRISPOR: intuitive guide selection for CRISPR/Cas9 genome editing experiments and screens. Nucleic Acids Res. 46, W242–W245 (2018).


  96. Xu, Z. Scripts used in ‘Genome assembly of two allotetraploid cotton germplasms reveals mechanisms of somatic embryogenesis and enables precise genome editing’. Zenodo https://doi.org/10.5281/zenodo.15035095 (2025).

Acknowledgements

The study was supported by grants from the National Science Fund for Distinguished Young Scholars (32325039 to S.J.) and the Young Scientists Fund under the National Natural Science Foundation of China (32201856 to Z.X.). The computations in this paper were run on the bioinformatics computing platform of the National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University.

Author information

Contributions

S.J. and X. Zhang designed and supervised the research. Z.X. and G.W. collected materials for genome and transcriptome sequencing. Z.X. and G.W. collected materials for ATAC–seq. Z.X. and R.W. performed bioinformatics analysis. Y.L. and R.P. performed the FISH experiment. G.W. and X.Z. performed the gene editing experiments. M.W., L.T. and L.Z. contributed to the project discussion. Z.X. and G.W. wrote the manuscript with input from all other authors. S.J., K.L., X. Zhang and M.W. edited the manuscript. All authors have read and approved the manuscript.

Corresponding authors

Correspondence to Maojun Wang, Xianlong Zhang or Shuangxia Jin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Amanda Hulse-Kemp, Jinsheng Lai and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Genomic characterization of Jin668 and YZ1.

a,b, The k-mer (31-mer) frequency distribution estimated by GenomeScope2.0 for the Jin668 (a) and YZ1 (b) genomes. c, The distribution of telomeres, centromeres and gap regions in the genomes of Jin668 and YZ1.
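For readers unfamiliar with k-mer frequency distributions, the following minimal sketch shows how such a histogram can be built from reads. It is not the authors' pipeline: counting canonical k-mers (a k-mer and its reverse complement treated as one) and skipping k-mers with ambiguous bases are assumptions made here, and the function names are illustrative.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_spectrum(reads, k=31):
    """Map each multiplicity to the number of distinct canonical k-mers seen that often."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Assumption: k-mers containing N or other ambiguity codes are skipped.
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


# Toy input with k=3 for readability; the figure uses 31-mers.
toy_spectrum = kmer_spectrum(["ACGTAC", "ACGTAC"], k=3)
```

A histogram of this kind (multiplicity against number of distinct k-mers) is the type of input that genome-profiling tools fit a model to.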


Extended Data Fig. 2 Completeness and accuracy validation of chromosome arms and centromeres in the Jin668 genome.

a, Assessment of the Jin668 genome assembly using the LTR Assembly Index (LAI). The x-axis lists chromosomes and the y-axis represents LAI values. The red dashed line represents the mean value. b,c, Hi-C map of the Jin668 genome showing genome- and chromosome-wide all-by-all interactions. d, Bionano de novo assembly contigs mapped to the Jin668 reference assembly. e, The distribution of QV values calculated with Merqury and Merfin on different chromosomes. f, The distribution of different types of unique k-mers along the centromeric regions in chromosomes of Jin668. Each bar shows the number of different types of k-mers in a 20-kb bin. The blue bars represent single-clump k-mers, which suggest good base-level quality, whereas the orange (multiple-clumps) and green (no-clumps) bars suggest low base-level quality in the region. The x-axis unit is megabases (Mb), and the y-axis represents values multiplied by 10³. g, NucFreq plot of centromeric regions in chromosomes of Jin668, showing HiFi coverage depth (black) along with secondary allele frequency (red) for all centromeres and surrounding regions. The x-axis represents chromosome positions in megabases (Mb), and the y-axis shows HiFi depth, ranging from 0 to 100.
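The QV values in panel e are Phred-scaled consensus quality scores. As a hedged illustration of the scale, the sketch below converts a per-base error rate to QV and estimates that error rate from k-mers present in the assembly but absent from the reads, following the k-mer survival model commonly described for Merqury-style QV. The function names are illustrative, and the k used in this study is not stated here, so `k` is left to the caller.

```python
import math


def phred_qv(error_rate: float) -> float:
    """Phred-scaled quality value for a per-base error rate."""
    if error_rate <= 0:
        return math.inf
    return -10 * math.log10(error_rate)


def kmer_based_qv(asm_only_kmers: int, total_asm_kmers: int, k: int) -> float:
    """QV estimated from assembly k-mers that are not supported by the reads.

    Assumed model: a k-mer is error-free only if all k of its bases are correct,
    so P(k-mer correct) = P(base correct) ** k.
    """
    p_base_correct = (1 - asm_only_kmers / total_asm_kmers) ** (1 / k)
    return phred_qv(1 - p_base_correct)
```

On this scale, QV 40 corresponds to about one error per 10,000 bases, and each additional 10 units is a further tenfold reduction in error rate.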


Extended Data Fig. 3 Segmental duplication (SD) content of the Jin668 genome.

a,b, Intrachromosomal SDs with a length greater than or less than 5 kb. c, Comparison of SD length and identity in different regions of the genome. The identity (top) and length (bottom) of SDs across commonly delineated regions of the genome (colors).


Extended Data Fig. 4 The percentage of AT/GC base composition, TEs, 5mC DNA methylation, histone modification and 3D genome architecture within the centromeres.

Quantification of genomic features plotted along chromosome arms that were proportionally scaled between telomeres (TEL) and centromere midpoints (CEN).


Extended Data Fig. 5 A comparative genome analysis of Jin668 versus TM-1, YZ1 and ZM24.

a, Genome-wide syntenic relationships among At and Dt subgenomes in four Upland cotton accessions relative to the A-genome-like Ga (A2 genome) and D-genome-like Gr (D5 genome). b, The distribution of variation density between Jin668 and the TM-1, YZ1 and ZM24 genomes. c, The distribution of SE-related genes and gene density, as well as genomic variations from Fig. 1, on the chromosomes of the Jin668 genome. The circos plot from outside to inside shows: chromosome, SE-related genes, gene density in Jin668, and the density of SNPs, InDels, PAVs, inversions and translocations between Jin668 and TM-1. d, The correlation of SNPs, InDels, PAVs, inversions and translocations with SE-related genes. The significance was tested using the overlapPermTest function in regioneR (https://www.bioconductor.org/packages/devel/bioc/html/regioneR.html), which calls permTest with the appropriate parameters to perform the permutation test. SE, somatic embryogenesis.
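The permutation test in panel d asks whether two sets of genomic regions overlap more often than expected by chance. The following is a simplified Python sketch of that idea, not the regioneR implementation: it handles a single chromosome, relocates each region uniformly at random while keeping its length, and uses the (hits + 1) / (permutations + 1) p-value convention. All names and the toy coordinates are hypothetical.

```python
import random


def count_overlaps(a, b):
    """Number of intervals in a that overlap at least one interval in b (half-open)."""
    return sum(any(s < b_end and b_start < e for b_start, b_end in b) for s, e in a)


def overlap_perm_test(a, b, chrom_len, n_perm=1000, seed=0):
    """One-sided permutation test for more overlaps between a and b than expected."""
    rng = random.Random(seed)
    observed = count_overlaps(a, b)
    hits = 0
    for _ in range(n_perm):
        shuffled = []
        for s, e in a:
            length = e - s
            # Assumption: uniform placement anywhere the interval fits on the chromosome.
            start = rng.randint(0, chrom_len - length)
            shuffled.append((start, start + length))
        if count_overlaps(shuffled, b) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: two variant intervals that both sit on hypothetical gene intervals.
variants = [(100, 200), (300, 400)]
genes = [(150, 160), (350, 360)]
observed, p_value = overlap_perm_test(variants, genes, chrom_len=100_000)
```

A real analysis would randomize across all chromosomes and could mask unassembled or excluded regions, which this sketch does not attempt.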


Extended Data Fig. 6 Sample preparation, sequencing, and quality evaluation of ATAC library.

a, Schematic outline of genome-wide ATAC–seq and RNA-seq assays and time points of sample collection with two independent biological replicates, as well as the roadmap of an integration analysis. HAI, hours after inoculation. b, The principal component plots of ATAC–seq data sequenced from Jin668, YZ1, TM-1 and ZB1092, respectively. Color code is shown. Each dot represents one sample. c, The genome-wide distribution of ATAC–seq peaks for Jin668, YZ1, TM-1 and ZB1092. Window size: TSS ± 3.0 kb. d, The distribution of SPOT values for all samples from Jin668, YZ1, TM-1 and ZB1092, respectively. e, The distribution of FRiP values for all samples from Jin668, YZ1, TM-1 and ZB1092, respectively. f, ATAC–seq profiling in Jin668, YZ1, TM-1 and ZB1092. Left, the number of chromatin accessibility peaks on each chromosome. Middle, the bar chart shows the total number of peaks detected at the corresponding induction time point. Right, the pie chart illustrates the percentage distribution of peaks across various genomic regions.
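FRiP (fraction of reads in peaks) in panel e is a signal-to-noise measure: the share of mapped reads that fall inside called peaks. A minimal sketch follows, assuming each read is reduced to a single position on one chromosome and peaks are half-open intervals; real pipelines work from alignment files and per-chromosome interval indexes, and the function name is illustrative.

```python
def frip(read_positions, peaks):
    """Fraction of reads whose position falls inside any peak (half-open intervals)."""
    if not read_positions:
        raise ValueError("no reads supplied")
    in_peaks = sum(
        any(start <= pos < end for start, end in peaks) for pos in read_positions
    )
    return in_peaks / len(read_positions)


# Toy example: two of the four reads land inside a peak.
toy_frip = frip([5, 15, 25, 35], [(10, 20), (30, 40)])
```

Higher values indicate that more of the sequencing signal is concentrated in accessible regions rather than background.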


Extended Data Fig. 7 Identification of SE-related genes.

Identification of SE-related genes based on genome information, chromatin accessibility and gene expression change trend at different time points of SE stage, and other external data (published literature and data generated in our laboratory earlier, etc.).

Extended Data Fig. 8 AGL promotes cotton regeneration.

a, Transcription and epigenetic tracks for AGL. Transcription data shown as mean ± s.d. of 3 biological replicates. b, The CPR of CRISPR edited, overexpression and control lines at 60 days post-induction. Data are represented as mean ± s.d. Differences between groups were evaluated by two-sided Student’s t-test, *p < 0.05, **p < 0.01, ***p < 0.001. n = 3 independent biological replicates. The P7N was used as control. c, The paraffin sections of CRISPR edited, overexpression and control lines for Jin668 and TM-1 at different days post-induction. Scale bars represent 100 μm.


Extended Data Fig. 9 Genetic transformation experiments demonstrate that the Jin668 genome enhances the accuracy of sgRNA design for CRISPR gene editing.

a, The sequence homology of a putative sgRNA (present in Jin668 but absent in TM-1) between Jin668 and TM-1. b,c, The mutational landscape of the genomic regions corresponding to the sgRNA targets in callus of Jin668 (b) and TM-1 (c) after editing with CRISPR/Cas9. The percentage indicates the ratio of reads carrying each mutation type to the total read count.
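To illustrate why a genotype-matched reference matters for sgRNA design, the sketch below checks whether a protospacer followed by an NGG PAM is present in two sequences. It is a simplification under stated assumptions: matches must be exact (guide-design tools such as CRISPOR also score mismatched off-target sites), only an SpCas9-style NGG PAM is considered, and the sequences and names are hypothetical toy data.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_target_sites(genome: str, protospacer: str):
    """Exact protospacer matches followed by an NGG PAM, on both strands.

    Returns (position, strand) tuples; minus-strand positions are given in
    reverse-complement coordinates.
    """
    genome = genome.upper()
    # Lookahead so that overlapping sites are also reported.
    pattern = re.compile(f"(?={re.escape(protospacer.upper())}[ACGT]GG)")
    hits = []
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        hits.extend((m.start(), strand) for m in pattern.finditer(seq))
    return hits


# Hypothetical 20-nt protospacer and two toy "genotypes" differing by one SNP.
protospacer = "GATTACAGATTACAGATTAC"
genotype_a = "CC" + protospacer + "AGG" + "TT"
genotype_b = "CC" + "GATTACAGATAACAGATTAC" + "AGG" + "TT"

sites_a = find_target_sites(genotype_a, protospacer)
sites_b = find_target_sites(genotype_b, protospacer)
```

In this toy case the guide has a perfect target in one genotype and none in the other, which is the kind of discrepancy that designing guides against a different cultivar's reference could miss.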

Extended Data Fig. 10 Gene expression dynamics across 15 tissues in Jin668 and YZ1.

a, The profile of genes expressed in all 15 tissues of Jin668 (left) and YZ1 (right). b, Statistics on the number of genes co-expressed in multiple tissues of Jin668 (left) and YZ1 (right). c, Statistics on the number of genes specifically expressed in each tissue of Jin668 (left) and YZ1 (right). Each tissue is represented by the number corresponding to that shown in the graphic illustration in a.


Supplementary information

Supplementary Information

Supplementary Figs. 1–31, Supplementary Notes 1–13 and Supplementary Methods.

Reporting Summary

Supplementary Tables 1–35

Supplementary Tables 1–35.

Source data

Source Data Fig. 3

Statistical source data.

Source Data Fig. 6

Statistical source data.

Source Data Fig. 7

Statistical source data.

Source Data Extended Data Fig. 1

Statistical source data.

Source Data Extended Data Fig. 2

Statistical source data.

Source Data Extended Data Fig. 3

Statistical source data.

Source Data Extended Data Fig. 4

Statistical source data.

Source Data Extended Data Fig. 5

Statistical source data.

Source Data Extended Data Fig. 6

Statistical source data.

Source Data Extended Data Fig. 8

Statistical source data.

Source Data Extended Data Fig. 10

Statistical source data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xu, Z., Wang, G., Zhu, X. et al. Genome assembly of two allotetraploid cotton germplasms reveals mechanisms of somatic embryogenesis and enables precise genome editing. Nat Genet 57, 2028–2039 (2025). https://doi.org/10.1038/s41588-025-02258-3

  • DOI: https://doi.org/10.1038/s41588-025-02258-3
