Abstract
Somatic embryogenesis is crucial for plant genetic engineering, yet the underlying mechanisms in cotton remain poorly understood. Here we present a telomere-to-telomere assembly of Jin668 and a high-quality assembly of YZ1, two highly regenerative allotetraploid cotton germplasms. The completion of the Jin668 genome enables characterization of ~30.1 Mb of centromeric regions invaded by centromeric retrotransposon of maize and Tekay retrotransposons, an ~8.1 Mb 5S rDNA array containing 25,190 copies and a ~75.1 Mb major 45S rDNA array with 8,131 copies. Comparative analyses of regenerative and recalcitrant genotypes reveal dynamic transcriptional patterns and chromatin accessibility during the initial regeneration process. A hierarchical gene regulatory network identifies AGL15 as a contributor to regeneration. Additionally, we demonstrate that genetic variation affects sgRNA target sites, while the Jin668 genome assembly reduces the risk of off-target effects in CRISPR-based genome editing. Together, the complete Jin668 genome reveals the complexity of genomic regions and cotton regeneration, and improves the precision of genome editing.
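As a rough sanity check on the rDNA figures quoted above, dividing each array length by its copy number gives the implied average repeat-unit length. The sketch below assumes that each array consists only of tandem units with no interspersed sequence, which the abstract does not state; the function name is illustrative and is not taken from the authors' code.

```python
def mean_unit_length_bp(array_mb: float, copies: int) -> float:
    """Average repeat-unit length in bp, assuming a purely tandem array."""
    return array_mb * 1_000_000 / copies


# Approximate values quoted in the abstract (both are given with "~").
unit_5s = mean_unit_length_bp(8.1, 25_190)    # 5S rDNA array
unit_45s = mean_unit_length_bp(75.1, 8_131)   # major 45S rDNA array

print(f"5S unit:  ~{unit_5s:.0f} bp")
print(f"45S unit: ~{unit_45s / 1000:.1f} kb")
```

Under that assumption, the implied units are roughly 322 bp for the 5S array and roughly 9.2 kb for the 45S array.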
Data availability
The T2T-Jin668 and YZ1 genome assemblies and annotation data are available at NCBI (PRJNA874817 and PRJNA960814) and T2TCotton-Hub (http://jinlab.hzau.edu.cn/T2TCottonHub/ or http://cotton.hzau.edu.cn/T2TCottonHub/). The raw sequencing data used for de novo whole-genome assembly of Jin668 and YZ1 are available in NCBI under BioProjects PRJNA874817 and PRJNA960814, respectively. RNA-seq data for Jin668 and YZ1 are available in NCBI under BioProjects PRJNA874819 and PRJNA960820, respectively. ATAC–seq data for Jin668 are available in NCBI under BioProject PRJNA960832. ATAC–seq and RNA-seq data for TM-1 during SE are available in NCBI under BioProjects PRJNA960828 and PRJNA960825, respectively. ATAC–seq and RNA-seq data for YZ1 during SE are available in NCBI under BioProjects PRJNA1059614 and PRJNA1059613, respectively. ATAC–seq and RNA-seq data for ZB1092 during SE are available in NCBI under BioProjects PRJNA1059611 and PRJNA1059609, respectively. The ChIP–seq data for Jin668 and YZ1 have been deposited in NCBI under BioProjects PRJNA1079680 and PRJNA1079682, respectively. Seeds of Jin668 and YZ1 used in this study can be obtained from the corresponding author upon request. The reference genome assembly and annotation files of TM-1 (v3.1) used in this study were downloaded from https://phytozome-next.jgi.doe.gov/ and are also accessible from T2TCottonHub. Additionally, all available SE-related motifs were obtained from the Plant Transcription Factor Database (v5.0; https://planttfdb.gao-lab.org/). Further details on data accessibility are given in the Methods and Supplementary Methods. Source data are provided with this paper.
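For convenience, the BioProject accessions listed above can be collected into a small lookup table. The sketch below is illustrative only: the key labels, the helper name `bioproject_url` and the NCBI URL pattern are assumptions of this example and are not part of the statement itself; the accessions are copied from the text.

```python
# BioProject accessions copied from the Data availability statement.
# Keys are (genotype, data type); "(SE)" marks data collected during
# somatic embryogenesis, as described in the statement.
BIOPROJECTS = {
    ("Jin668", "genome"): "PRJNA874817",
    ("YZ1", "genome"): "PRJNA960814",
    ("Jin668", "RNA-seq"): "PRJNA874819",
    ("YZ1", "RNA-seq"): "PRJNA960820",
    ("Jin668", "ATAC-seq"): "PRJNA960832",
    ("TM-1", "ATAC-seq (SE)"): "PRJNA960828",
    ("TM-1", "RNA-seq (SE)"): "PRJNA960825",
    ("YZ1", "ATAC-seq (SE)"): "PRJNA1059614",
    ("YZ1", "RNA-seq (SE)"): "PRJNA1059613",
    ("ZB1092", "ATAC-seq (SE)"): "PRJNA1059611",
    ("ZB1092", "RNA-seq (SE)"): "PRJNA1059609",
    ("Jin668", "ChIP-seq"): "PRJNA1079680",
    ("YZ1", "ChIP-seq"): "PRJNA1079682",
}


def bioproject_url(genotype: str, data_type: str) -> str:
    """Build an NCBI BioProject page URL (URL pattern assumed, not from the paper)."""
    accession = BIOPROJECTS[(genotype, data_type)]
    return f"https://www.ncbi.nlm.nih.gov/bioproject/{accession}"


print(bioproject_url("Jin668", "ATAC-seq"))
```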
Code availability
All original code used in this article is available via Zenodo at https://doi.org/10.5281/zenodo.15035095 (ref. 96) and GitHub (https://github.com/tiramisutes/T2T-Cotton-Genomes).
References
Bhatia, S., Sharma, K., Dahiya, R. & Bera, T. (eds). Modern Applications of Plant Biotechnology in Pharmaceutical Sciences, pp. 209–230 (Academic Press, 2015).
Zheng, Q. & Perry, S. E. Alterations in the transcriptome of Soybean in response to enhanced somatic embryogenesis promoted by orthologs of AGAMOUS-like15 and AGAMOUS-like18. Plant Physiol. 164, 1365–1377 (2014).
Horstman, A., Bemer, M. & Boutilier, K. A transcriptional view on somatic embryogenesis. Regeneration (Oxf.) 4, 201–216 (2017).
Wang, K. et al. The gene TaWOX5 overcomes genotype dependency in wheat genetic transformation. Nat. Plants 8, 110–117 (2022).
Chen, Z., Debernardi, J. M., Dubcovsky, J. & Gallavotti, A. Recent advances in crop transformation technologies. Nat. Plants 8, 1343–1351 (2022).
Li, J. et al. Multi-omics analyses reveal epigenomics basis for cotton somatic embryogenesis through successive regeneration acclimation process. Plant Biotechnol. J. 17, 435–450 (2019).
Iwase, A. et al. WIND1-based acquisition of regeneration competency in Arabidopsis and rapeseed. J. Plant Res. 128, 389–397 (2015).
Lowe, K. et al. Morphogenic regulators Baby boom and Wuschel improve monocot transformation. Plant Cell 28, 1998 (2016).
Debernardi, J. M. et al. A GRF–GIF chimeric protein improves the regeneration efficiency of transgenic plants. Nat. Biotechnol. 38, 1274–1279 (2020).
Wang, M. et al. Reference genome sequences of two cultivated allotetraploid cottons, Gossypium hirsutum and Gossypium barbadense. Nat. Genet. 51, 224–229 (2019).
Yang, Z. et al. Extensive intraspecific gene order and gene structural variations in upland cotton cultivars. Nat. Commun. 10, 2989 (2019).
Conover, J. L. & Wendel, J. F. Deleterious mutations accumulate faster in allopolyploid than diploid cotton (Gossypium) and unequally between subgenomes. Mol. Biol. Evol. 39, msac024 (2022).
Sun, C. et al. Precise integration of large DNA sequences in plant genomes using PrimeRoot editors. Nat. Biotechnol. 42, 316–327 (2024).
Chen, K., Wang, Y., Zhang, R., Zhang, H. & Gao, C. CRISPR/Cas genome editing and precision plant breeding in agriculture. Annu. Rev. Plant Biol. 70, 667–697 (2019).
Wang, G. et al. Precise fine-turning of GhTFL1 by base editing tools defines ideal cotton plant architecture. Genome Biol. 25, 59 (2024).
Jiang, T., Zhang, X.-O., Weng, Z. & Xue, W. Deletion and replacement of long genomic sequences using prime editing. Nat. Biotechnol. 40, 227–234 (2022).
Fernie, A. R. & Yan, J. De novo domestication: an alternative route toward new crops for the future. Mol. Plant 12, 615–631 (2019).
Shoemaker, R., Couche, L. & Galbraith, D. Characterization of somatic embryogenesis and plant regeneration in cotton (Gossypium hirsutum L.). Plant Cell Rep. 5, 178–181 (1986).
Jin, S. et al. Identification of a novel elite genotype for in vitro culture and genetic transformation of cotton. Biol. Plant. 50, 519–524 (2006).
Wang, L. et al. The GhmiR157a–GhSPL10 regulatory module controls initial cellular dedifferentiation and callus proliferation in cotton by modulating ethylene-mediated flavonoid biosynthesis. J. Exp. Bot. 69, 1081–1093 (2017).
Xu, J. GhL1L1 affects cell fate specification by regulating GhPIN1-mediated auxin distribution. Plant Biotechnol. J. 17, 63–74 (2019).
Deng, J. et al. GhTCE1–GhTCEE1 dimers regulate transcriptional reprogramming during wound-induced callus formation in cotton. Plant Cell 34, 4554–4568 (2022).
Yuan, J. et al. GhRCD1 regulates cotton somatic embryogenesis by modulating the GhMYC3–GhMYB44–GhLBD18 transcriptional cascade. New Phytol. 240, 207–223 (2023).
Guo, H. et al. Somatic embryogenesis critical initiation stage-specific mCHH hypomethylation reveals epigenetic basis underlying embryogenic redifferentiation in cotton. Plant Biotechnol. J. 18, 1648–1650 (2020).
Ge, X. et al. Efficient genotype-independent cotton genetic transformation and genome editing. J. Integr. Plant Biol. 65, 907–917 (2023).
Liu, Y. et al. Cloning and preliminary verification of telomere-associated sequences in upland cotton. Comp. Cytogenet. 14, 183–195 (2020).
Wang, P. & Wang, F. A proposed metric set for evaluation of genome assembly quality. Trends Genet. 39, 175–186 (2023).
Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).
Vollger, M. R. et al. Long-read sequence and assembly of segmental duplications. Nat. Methods 16, 88–94 (2019).
Luo, S. et al. The cotton centromere contains a Ty3-Gypsy-like LTR retroelement. PLoS ONE 7, e35261 (2012).
Gorinšek, B., Gubenšek, F. & Kordiš, D. A. Evolutionary genomics of chromoviruses in eukaryotes. Mol. Biol. Evol. 21, 781–798 (2004).
Naish, M. et al. The genetic and epigenetic landscape of the Arabidopsis centromeres. Science 374, eabi7489 (2021).
Schmitz, R. J., Grotewold, E. & Stam, M. Cis-regulatory sequences in plants: their importance, discovery, and future challenges. Plant Cell 34, 718–741 (2021).
Zhu, X. et al. Single-cell resolution analysis reveals the preparation for reprogramming the fate of stem cell niche in cotton lateral meristem. Genome Biol. 24, 194 (2023).
Braybrook, S. A. & Harada, J. J. LECs go crazy in embryo development. Trends Plant Sci. 13, 624–630 (2008).
Ji, J. et al. WOX4 promotes procambial development. Plant Physiol. 152, 1346–1356 (2009).
Wang, F. et al. Chromatin accessibility dynamics and a hierarchical transcriptional regulatory network structure for plant somatic embryogenesis. Dev. Cell 54, 742–757 (2020).
Izhaki, A. & Bowman, J. L. KANADI and Class III HD-Zip gene families regulate embryo patterning and modulate auxin flow during embryogenesis in Arabidopsis. Plant Cell 19, 495–508 (2007).
Wang, G. et al. Development of an efficient and precise adenine base editor (ABE) with expanded target range in allotetraploid cotton (Gossypium hirsutum). BMC Biol. 20, 45 (2022).
Li, C. et al. Targeted, random mutagenesis of plant genes with dual cytosine and adenine base editors. Nat. Biotechnol. 38, 875–882 (2020).
Xue, C. et al. Tuning plant phenotypes by precise, graded downregulation of gene expression. Nat. Biotechnol. 41, 1758–1764 (2023).
Xu, M., Du, Q., Tian, C., Wang, Y. & Jiao, Y. Stochastic gene expression drives mesophyll protoplast regeneration. Sci. Adv. 7, eabg8466 (2021).
Zhang, L. et al. A high-quality apple genome assembly reveals the association of a retrotransposon and red fruit colour. Nat. Commun. 10, 1494 (2019).
Qin, L. et al. High-efficient and precise base editing of C·G to T·A in the allotetraploid cotton (Gossypium hirsutum) genome using a modified CRISPR/Cas9 system. Plant Biotechnol. J. 18, 45–56 (2020).
Jin, S. et al. Cytosine, but not adenine, base editors induce genome-wide off-target mutations in rice. Science 364, 292–295 (2019).
Hirano, H. et al. Structure and engineering of Francisella novicida Cas9. Cell 164, 950–961 (2016).
Huang, G. et al. Genome sequence of Gossypium herbaceum and genome updates of Gossypium arboreum and Gossypium hirsutum provide insights into cotton A-genome evolution. Nat. Genet. 52, 516–524 (2020).
Chen, Z. J. et al. Genomic diversifications of five Gossypium allopolyploid species and their impact on cotton improvement. Nat. Genet. 52, 525–533 (2020).
Sreedasyam, A. et al. Genome resources for three modern cotton lines guide future breeding efforts. Nat. Plants 10, 1039–1051 (2024).
Han, J. et al. Rapid proliferation and nucleolar organizer targeting centromeric retrotransposons in cotton. Plant J. 88, 992–1005 (2016).
Wick, R. R., Judd, L. M. & Holt, K. E. Performance of neural network basecalling tools for Oxford nanopore sequencing. Genome Biol. 20, 129 (2019).
Hu, J. et al. NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads. Genome Biol. 25, 107 (2024).
Hu, J., Fan, J., Sun, Z. & Liu, S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics 36, 2253–2255 (2019).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
Kirov, I., Gilyok, M., Knyazev, A. & Fesenko, I. Pilot satellitome analysis of the model plant, Physcomitrella patens, revealed a transcribed and high-copy IGS related tandem repeat. Comp. Cytogenet. 12, 493–513 (2018).
Stovner, E. B. & Sætrom, P. epic2 efficiently finds diffuse domains in ChIP–seq data. Bioinformatics 35, 4392–4393 (2019).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764 (2011).
Liu, J. et al. Gapless assembly of maize chromosomes using long-read technologies. Genome Biol. 21, 121 (2020).
Ramírez, F., Dündar, F., Diehl, S., Grüning, B. A. & Manke, T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 42, W187–W191 (2014).
Quinlan, A. R. BEDTools: the Swiss-army tool for genome feature analysis. Curr. Protoc. Bioinformatics 47, 11.12.1–11.12.34 (2014).
Mikheenko, A., Bzikadze, A. V., Gurevich, A., Miga, K. H. & Pevzner, P. A. TandemTools: mapping long reads and assessing/improving assembly quality in extra-long tandem repeats. Bioinformatics 36, i75–i83 (2020).
Ou, S. et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 20, 275–292 (2019).
Novák, P., Neumann, P., Pech, J., Steinhaisl, J. & Macas, J. RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads. Bioinformatics 29, 792–793 (2013).
Vollger, M. R., Kerpedjiev, P., Phillippy, A. M. & Eichler, E. E. StainedGlass: interactive visualization of massive tandem repeat structures with identity heatmaps. Bioinformatics 38, 2049–2051 (2022).
Peng, R. et al. Evolutionary divergence of duplicated genomes in newly described allotetraploid cottons. Proc. Natl Acad. Sci. USA 119, e2208496119 (2022).
Gel, B. & Serra, E. karyoploteR: an R/Bioconductor package to plot customizable genomes displaying arbitrary data. Bioinformatics 33, 3088–3090 (2017).
Hu, L. et al. The chromosome-scale reference genome of black pepper provides insight into piperine biosynthesis. Nat. Commun. 10, 4702 (2019).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl Acad. Sci. USA 117, 9451–9457 (2020).
Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA 6, 11 (2015).
Paterson, A. H. et al. Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres. Nature 492, 423–427 (2012).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Goel, M., Sun, H., Jiao, W.-B. & Schneeberger, K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 20, 277 (2019).
Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6, 80–92 (2012).
Giordano, F., Stammnitz, M. R., Murchison, E. P. & Ning, Z. scanPAV: a pipeline for extracting presence–absence variations in genome pairs. Bioinformatics 34, 3022–3024 (2018).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Birney, E., Clamp, M. & Durbin, R. GeneWise and genomewise. Genome Res. 14, 988–995 (2004).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008).
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Bu, D. et al. KOBAS-i: intelligent prioritization and exploratory visualization of biological functions for gene enrichment analysis. Nucleic Acids Res. 49, W317–W325 (2021).
Grandi, F. C., Modi, H., Kampman, L. & Corces, M. R. Chromatin accessibility profiling by ATAC–seq. Nat. Protoc. 17, 1518–1552 (2022).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Faust, G. G. & Hall, I. M. SAMBLASTER: fast duplicate marking and structural variant read extraction. Bioinformatics 30, 2503–2505 (2014).
Zhang, Y. et al. Model-based analysis of ChIP–seq (MACS). Genome Biol. 9, R137 (2008).
Li, Q., Brown, J. B., Huang, H. & Bickel, P. J. Measuring reproducibility of high-throughput experiments. Preprint at https://arxiv.org/abs/1110.4705 (2011).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
Machlab, D. et al. monaLisa: an R/Bioconductor package for identifying regulatory motifs. Bioinformatics 38, 2624–2625 (2022).
Castro-Mondragon, J. A. et al. JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 50, D165–D173 (2021).
Tan, G. & Lenhard, B. TFBSTools: an R/bioconductor package for transcription factor binding site analysis. Bioinformatics 32, 1555–1556 (2016).
Concordet, J.-P. & Haeussler, M. CRISPOR: intuitive guide selection for CRISPR/Cas9 genome editing experiments and screens. Nucleic Acids Res. 46, W242–W245 (2018).
Xu, Z. Scripts used in ‘Genome assembly of two allotetraploid cotton germplasms reveals mechanisms of somatic embryogenesis and enables precise genome editing’. Zenodo https://doi.org/10.5281/zenodo.15035095 (2025).
Acknowledgements
The study was supported by grants from the National Science Fund for Distinguished Young Scholars (32325039 to S.J.) and the Young Scientists Fund under the National Natural Science Foundation of China (32201856 to Z.X.). The computations in this paper were run on the bioinformatics computing platform of the National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University.
Author information
Contributions
S.J. and X. Zhang designed and supervised the research. Z.X. and G.W. collected materials for genome and transcriptome sequencing and for ATAC–seq. Z.X. and R.W. performed bioinformatics analysis. Y.L. and R.P. performed the FISH experiment. G.W. and X.Z. performed the gene editing experiments. M.W., L.T. and L.Z. contributed to the project discussion. Z.X. and G.W. wrote the manuscript with input from all other authors. S.J., K.L., X. Zhang and M.W. edited the manuscript. All authors have read and approved the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks Amanda Hulse-Kemp, Jinsheng Lai and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Genomic characterization of Jin668 and YZ1.
a,b, The k-mer (31-mer) frequency distribution estimated by GenomeScope2.0 analysis of the Jin668 (a) and YZ1 (b) genomes. c, The distribution of telomeres, centromeres and gap regions in the genomes of Jin668 and YZ1.
Extended Data Fig. 2 Completeness and accuracy validation of chromosome arms and centromeres in the Jin668 genome.
a, Assessment of the Jin668 genome assembly using the LTR Assembly Index (LAI). The x-axis lists chromosomes and the y-axis represents LAI values. The red dashed line represents the mean value. b,c, Hi-C map of the Jin668 genome showing genome- and chromosome-wide all-by-all interactions. d, Bionano de novo assembly contigs mapped to the Jin668 reference assembly. e, The distribution of QV values calculated by Merqury and Merfin for different chromosomes. f, The distribution of different types of unique k-mers along the centromeric regions of the Jin668 chromosomes. Each bar shows the number of k-mers of each type in a 20-kb bin. The blue bars represent single-clump k-mers, which suggest good base-level quality, whereas the orange (multiple-clumps) and green (no-clumps) bars suggest low base-level quality in the region. The x-axis unit is megabases (Mb), and the y-axis represents values multiplied by 10³. g, NucFreq plot of the centromeric regions of the Jin668 chromosomes, showing HiFi coverage depth (black) along with secondary allele frequency (red) for all centromeres and surrounding regions. The x-axis represents chromosome positions in megabases (Mb), and the y-axis shows HiFi depth, ranging from 0 to 100.
Extended Data Fig. 3 Segmental duplication (SD) content of the Jin668 genome.
a,b, The intrachromosome SDs with a length greater or less than 5 kb. c, Comparison of SD length and identity in different regions of the genome. The identity (top) and length (bottom) of SDs across commonly delineated regions of the genome (colors).
Extended Data Fig. 4 The percentage of AT/GC base composition, TEs, 5mC DNA methylation, histone modification and 3D genome architecture within the centromeres.
Quantification of genomic features plotted along chromosome arms that were proportionally scaled between telomeres (TEL) and centromere midpoints (CEN).
Extended Data Fig. 5 A comparative genome analysis of the Jin668 versus TM-1, YZ1, and ZM24.
a, Genome-wide syntenic relationships among the At and Dt subgenomes of four Upland cotton accessions relative to the A-genome-like Ga (A2 genome) and D-genome-like Gr (D5 genome). b, The distribution of variation density between the Jin668 genome and the TM-1, YZ1 and ZM24 genomes. c, The distribution of SE-related genes and gene density, as well as the genomic variations from Fig. 1, on the chromosomes of the Jin668 genome. The circos plot shows, from outside to inside: chromosomes, SE-related genes, gene density in Jin668, and the density of SNPs, InDels, PAVs, inversions and translocations between Jin668 and TM-1. d, The correlation of SNPs, InDels, PAVs, inversions and translocations with SE-related genes. Significance was tested using the overlapPermTest function in regioneR (https://www.bioconductor.org/packages/devel/bioc/html/regioneR.html), which calls permTest with the appropriate parameters to perform the permutation test. SE, somatic embryogenesis.
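The permutation test named in panel d can be illustrated with a minimal sketch. This is not regioneR's implementation and not the authors' analysis: it assumes a single linear chromosome, places intervals uniformly at random, and uses toy coordinates; all function names and values are hypothetical.

```python
import random

Interval = tuple[int, int]  # half-open [start, end) on one toy chromosome


def count_overlaps(a: list[Interval], b: list[Interval]) -> int:
    """Number of intervals in `a` that overlap at least one interval in `b`."""
    return sum(any(s < be and bs < e for bs, be in b) for s, e in a)


def shuffle_intervals(a: list[Interval], genome_len: int,
                      rng: random.Random) -> list[Interval]:
    """Move each interval to a uniformly random position, keeping its length."""
    shuffled = []
    for s, e in a:
        length = e - s
        start = rng.randrange(0, genome_len - length + 1)
        shuffled.append((start, start + length))
    return shuffled


def overlap_perm_test(a: list[Interval], b: list[Interval], genome_len: int,
                      n_perm: int = 1000, seed: int = 0) -> tuple[int, float]:
    """Return the observed overlap count and a one-sided empirical p-value."""
    rng = random.Random(seed)
    observed = count_overlaps(a, b)
    as_extreme = sum(
        count_overlaps(shuffle_intervals(a, genome_len, rng), b) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (as_extreme + 1) / (n_perm + 1)


# Toy data: two "variants" that both fall inside "genes".
genes = [(100_000, 110_000), (500_000, 510_000)]
variants = [(104_000, 105_000), (505_000, 506_000)]
observed, p_value = overlap_perm_test(variants, genes, genome_len=1_000_000)
print(observed, p_value)
```

A small p-value here only says that the toy variants overlap the toy genes more often than randomly placed intervals of the same lengths would.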
Extended Data Fig. 6 Sample preparation, sequencing, and quality evaluation of ATAC library.
a, Schematic outline of genome-wide ATAC–seq and RNA-seq assays and time points of sample collection with two independent biological replicates, as well as the roadmap of the integration analysis. HAI, hours after inoculation. b, The principal component plots of ATAC–seq data from Jin668, YZ1, TM-1 and ZB1092, respectively. Color code is shown. Each dot represents one sample. c, The genome-wide distribution of ATAC–seq peaks for Jin668, YZ1, TM-1 and ZB1092. Window size: TSS ± 3.0 kb. d, The distribution of SPOT values for all samples from Jin668, YZ1, TM-1 and ZB1092, respectively. e, The distribution of FRiP values for all samples from Jin668, YZ1, TM-1 and ZB1092, respectively. f, ATAC–seq profiling in Jin668, YZ1, TM-1 and ZB1092. Left, the number of chromatin accessibility peaks on each chromosome. Middle, the bar chart shows the total number of peaks detected at the corresponding induction time point. Right, the pie chart illustrates the percentage distribution of peaks across various genomic regions.
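The FRiP metric in panel e is commonly defined as the fraction of mapped reads that overlap called peaks. The following is a minimal sketch of that definition on toy intervals; the legend does not describe how the authors computed it, and the function name and coordinates are hypothetical.

```python
Region = tuple[str, int, int]  # (chromosome, start, end), half-open


def frip(reads: list[Region], peaks: list[Region]) -> float:
    """Fraction of reads that overlap at least one peak (0.0 if no reads)."""
    if not reads:
        return 0.0
    in_peaks = sum(
        any(chrom == p_chrom and start < p_end and p_start < end
            for p_chrom, p_start, p_end in peaks)
        for chrom, start, end in reads
    )
    return in_peaks / len(reads)


# Toy example: three of the four reads fall inside a peak.
peaks = [("A01", 1_000, 2_000), ("D05", 500, 900)]
reads = [
    ("A01", 1_100, 1_150),
    ("A01", 1_950, 2_050),
    ("D05", 600, 650),
    ("D05", 5_000, 5_050),
]
print(frip(reads, peaks))
```

This quadratic scan is fine for illustration; real ATAC–seq data would need an indexed interval lookup.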
Extended Data Fig. 7 Identification of SE-related genes.
Identification of SE-related genes based on genome information, chromatin accessibility, gene expression trends across time points of the SE stage and other external data (published literature and data generated earlier in our laboratory).
Extended Data Fig. 8 AGL promotes cotton regeneration.
a, Transcription and epigenetic tracks for AGL. Transcription data shown as mean ± s.d. of 3 biological replicates. b, The CPR of CRISPR edited, overexpression and control lines at 60 days post-induction. Data are represented as mean ± s.d. Differences between groups were evaluated by two-sided Student’s t-test, *p < 0.05, **p < 0.01, ***p < 0.001. n = 3 independent biological replicates. The P7N was used as control. c, The paraffin sections of CRISPR edited, overexpression and control lines for Jin668 and TM-1 at different days post-induction. Scale bars represent 100 μm.
Extended Data Fig. 9 Genetic transformation experiments demonstrate that the Jin668 genome enhances the accuracy of sgRNA design for CRISPR gene editing.
a, The sequence homology of a putative sgRNA (present in Jin668 but absent in TM-1) between Jin668 and TM-1. b,c, The mutational landscape of the genomic regions corresponding to the sgRNA targets in the callus of Jin668 (b) and TM-1 (c) after editing with CRISPR/Cas9. The percentage indicates the ratio of reads carrying a given mutation type to the total read count.
Extended Data Fig. 10 Gene expression dynamics across 15 tissues in Jin668 and YZ1.
a, The profile of genes expressed in all 15 tissues of Jin668 (left) and YZ1 (right). b, Statistics on the number of genes co-expressed in multiple tissues of Jin668 (left) and YZ1 (right). c, Statistics on the number of genes specifically expressed in each tissue of Jin668 (left) and YZ1 (right). Each tissue is represented by the number corresponding to that shown in the graphic illustration in a.
Supplementary information
Supplementary Information
Supplementary Figs. 1–31, Supplementary Notes 1–13 and Supplementary Methods.
Supplementary Tables 1–35
Supplementary Tables 1–35.
Source data
Source Data Fig. 3
Statistical source data.
Source Data Fig. 6
Statistical source data.
Source Data Fig. 7
Statistical source data.
Source Data Extended Data Fig. 1
Statistical source data.
Source Data Extended Data Fig. 2
Statistical source data.
Source Data Extended Data Fig. 3
Statistical source data.
Source Data Extended Data Fig. 4
Statistical source data.
Source Data Extended Data Fig. 5
Statistical source data.
Source Data Extended Data Fig. 6
Statistical source data.
Source Data Extended Data Fig. 8
Statistical source data.
Source Data Extended Data Fig. 10
Statistical source data.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xu, Z., Wang, G., Zhu, X. et al. Genome assembly of two allotetraploid cotton germplasms reveals mechanisms of somatic embryogenesis and enables precise genome editing. Nat Genet 57, 2028–2039 (2025). https://doi.org/10.1038/s41588-025-02258-3
DOI: https://doi.org/10.1038/s41588-025-02258-3