Abstract
The hexaploid sweetpotato (Ipomoea batatas [L.] Lam.) is a globally important staple crop that plays a key role in biofortification. Its high resilience and adaptability provide distinct advantages in addressing food security and climate challenges. Here we report a haplotype-resolved chromosome-level genome assembly of an African cultivar, ‘Tanzania’, revealing mosaic genomic origins along haplotype-phased chromosomes. The wild tetraploid I. aequatoriensis, currently found in coastal Ecuador, contributes a substantial fraction of the sweetpotato genome. Another large proportion of the genome shows a closer genetic relationship to the wild tetraploid I. batatas 4×, distributed in Central America. The sequences contributed by different wild species are not distributed in typical subgenomes but are intertwined along chromosomes, possibly owing to the known non-preferential recombination among sweetpotato haplotypes. This study improves our understanding of sweetpotato origin and genome architecture and provides valuable genomic resources to accelerate sweetpotato breeding.
Data availability
Raw genome sequencing reads have been deposited in the National Center for Biotechnology Information BioProject database under the accession no. PRJNA1138727. The ‘Tanzania’ phased and consensus genome assemblies and annotated genes are also available via the Sweetpotato Genomics Resource at http://sweetpotato.uga.edu/.
Code availability
The source code for the haplotype phasing is available via GitHub at https://github.com/wu728/tanzania_genome/blob/main/hapByHiC.py.
Acknowledgements
We thank S. Williamson from NC State University for her assistance in sample collection, T. Ranney and N. Lynch for assistance in flow cytometry analysis, and B. Yada from the National Crops Resources Research Institute, Uganda, for providing the ‘Tanzania’ images. This research was supported by grants from the Bill & Melinda Gates Foundation through the SweetGAINS (OPP1213329) and RTB Breeding (CGIAR Investment ID1523-BMGF) projects under a subcontract with the International Potato Center, Lima, Peru, and from the USDA National Institute of Food and Agriculture (2022-67013-36269).
Author information
Contributions
Z.F., C.R.B. and G.C.Y. designed and managed the project. S.W. and Z.F. coordinated the genome sequencing. S.W. and H.S. performed the genome assembly and evaluation. J.P.H. and M.K. performed ONT transcriptome sequencing and genome annotation. S.W., H.S., X.Z., M.Y., H.W. and J.Y. contributed to genomic analyses. M.M. and G.D.S.G. constructed the phased genetic map. S.W. wrote the manuscript. Z.F. revised the manuscript. All authors read and approved the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Plants thanks Qinghe Cao, Adam Session and the other, anonymous reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Hi-C contact heatmaps for the 90 chromosomes in the ‘Tanzania’ assembly.
The Hi-C contact heatmap was generated using all mapped reads. Each of the 15 homoeologous chromosome groups forms a distinct block.
Extended Data Fig. 2 Haplotype phasing assessment of the ‘Tanzania’ assembly using the phased genetic map.
The density of reference alleles of SNPs within each haplotype across 2-Mb non-overlapping windows throughout the ‘Tanzania’ genome assembly is shown. For each chromosome and haplotype combination used to obtain the SNPs, the majority of reference alleles in these SNPs (98.6%) were phased within a specific haplotype, as indicated by the green boxes.
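The windowed summary described in this caption can be sketched as follows. This is a minimal illustration only, not the authors' code: the input format (pairs of a 0-based SNP position and the haplotype label the genetic map assigned to its reference allele) and the function names are assumptions made for the example.

```python
from collections import Counter

WINDOW_BP = 2_000_000  # 2-Mb non-overlapping windows, as stated in the caption


def window_counts(snps, window_bp=WINDOW_BP):
    """Count reference-allele SNPs per (window index, haplotype).

    `snps` is a hypothetical iterable of (position_bp, haplotype) pairs,
    with 0-based positions; this layout is an assumption of the sketch.
    """
    counts = Counter()
    for position_bp, haplotype in snps:
        counts[(position_bp // window_bp, haplotype)] += 1
    return counts


def phased_fraction(counts, expected_haplotype):
    """Fraction of counted reference alleles assigned to one haplotype."""
    total = sum(counts.values())
    if total == 0:
        return float("nan")
    matched = sum(n for (_, hap), n in counts.items() if hap == expected_haplotype)
    return matched / total
```

Under these assumptions, a value of `phased_fraction` close to 1 for the haplotype used to obtain the SNPs would correspond to the consistent phasing highlighted by the green boxes.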
Extended Data Fig. 3 Large inversions among the six ‘Tanzania’ haplotypes supported by Hi-C contact signals and HiFi read alignments.
The example shown here is the 5.68-Mb inversion on ‘Tanzania’ chromosome 12F. a, Heatmap of the Hi-C contact signals among the six haplotypes of ‘Tanzania’, supporting the inversion on chromosome 12F. b, HiFi read alignments using TznChr12F as the reference. Reads spanning the inversion breakpoints at TznChr12F 11.8 Mb and 17.5 Mb, which originated from haplotype TznChr12F, are highlighted by the green boxes. Reads that are split-aligned, which originated from the other five haplotypes, are highlighted by orange boxes.
Extended Data Fig. 4 Dotplot showing the fold change of gene family sizes between the I. trifida NCNSP0306 and phased I. batatas ‘Tanzania’ genome assemblies.
An enlarged view of gene families with sizes smaller than 200 is shown on the right.
Extended Data Fig. 5 Distribution of TIR-NBS-LRR genes across the phased ‘Tanzania’ genome.
Extended Data Fig. 6 Distribution of TIR-NBS-LRR genes across the six haplotypes of ‘Tanzania’ chromosome 7.
Genes within the same syntenic paralogous group are labeled with the same color.
Extended Data Fig. 7 Distribution of sequence identity differences between the two wild tetraploid species and hexaploid sweetpotato.
For each analyzed phased genomic window, the sequence identity difference was calculated as: the mean sequence identity between ten I. aequatoriensis accessions and the hexaploid sweetpotato ‘Tanzania’ minus the mean sequence identity between eight I. batatas 4× accessions and the hexaploid sweetpotato ‘Tanzania’. The two peaks of the identity difference distribution are labeled by the orange and green lines in the top panel. Normal distributions were fitted to the patterns displayed by ‘Tanzania’ genomic windows with higher sequence identity to I. batatas 4× (middle) and those with higher identity to I. aequatoriensis (bottom). The 95% confidence intervals are indicated by the blue dotted lines.
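The per-window statistic defined in this caption is a difference of two means, which can be written out as a short sketch. The function and argument names are hypothetical, and the identities are assumed to be supplied as one value per accession for a single genomic window.

```python
from statistics import mean


def identity_difference(aequatoriensis_identities, batatas4x_identities):
    """Mean identity to I. aequatoriensis minus mean identity to I. batatas 4x.

    Each argument is a hypothetical list holding one sequence identity per
    accession (ten and eight accessions, respectively, in the caption).
    A positive value means the window is, on average, more similar to
    I. aequatoriensis; a negative value means more similar to I. batatas 4x.
    """
    return mean(aequatoriensis_identities) - mean(batatas4x_identities)
```

The sketch returns only the raw difference; the caption does not state numeric cutoffs for assigning origin, so none are assumed here.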
Extended Data Fig. 8 Proportion and length of ‘Tanzania’ genomic sequences inferred to have different origins.
a, Proportion calculated across the entire genome. b, Proportions calculated for each homoeologous chromosome group. c, Lengths of ‘Tanzania’ sequences with different origins across the 90 chromosomes. d, Proportion of ‘Tanzania’ sequences with different origins in each of the 90 chromosomes.
Extended Data Fig. 9 Distribution of sequence identities between ‘Tanzania’ and wild I. aequatoriensis, I. batatas 4×, and I. trifida accessions in ‘Tanzania’ genomic windows with inferred ancestry based on genetic distance and sequence identity.
Genomic windows with consistent and contradictory origin inference between the two methods are shown on the left and in the middle, respectively. Windows whose origin could not be determined based on sequence identity are shown on the right. KNN refers to genomic origin inference based on genetic distance using a k-nearest neighbor algorithm.
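As a rough illustration of what a k-nearest neighbor origin call involves, the sketch below takes the genetic distances from one ‘Tanzania’ window to a set of labeled wild accessions and returns the majority label among the k closest. The value of k, the tie handling and the input format are all assumptions of this example; the caption does not specify them, and this is not the authors' implementation.

```python
from collections import Counter


def knn_origin(distances, k=3):
    """Majority-vote origin among the k accessions nearest to a window.

    `distances` is a hypothetical list of (genetic_distance, species_label)
    pairs for one genomic window. Ties and empty input are reported as
    "undetermined", which is a choice made for this sketch only.
    """
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    if not nearest:
        return "undetermined"
    votes = Counter(label for _, label in nearest).most_common()
    if len(votes) > 1 and votes[0][1] == votes[1][1]:
        return "undetermined"
    return votes[0][0]
```

For example, with distances to two I. aequatoriensis accessions smaller than those to any other accession and k set to 3, the window would be called as I. aequatoriensis under this scheme.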
Extended Data Fig. 10 IbT-DNA sequences in the hexaploid sweetpotato ‘Tanzania’ genome.
a, Positions of IbT-DNA1 and IbT-DNA2 sequences on the ‘Tanzania’ chromosomes. b, Alignments between the IbT-DNAs and the ‘Tanzania’ genomic regions containing these IbT-DNA insertions.
Supplementary information
Supplementary Information
Supplementary Figs. 1–13.
Supplementary Tables 1–12.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wu, S., Sun, H., Zhao, X. et al. Phased chromosome-level assembly provides insight into the genome architecture of hexaploid sweetpotato. Nat. Plants 11, 1951–1959 (2025). https://doi.org/10.1038/s41477-025-02079-6