  • Resource

Phased chromosome-level assembly provides insight into the genome architecture of hexaploid sweetpotato

Abstract

The hexaploid sweetpotato (Ipomoea batatas [L.] Lam.) is a globally important staple crop that plays a key role in biofortification. Its high resilience and adaptability provide distinct advantages in addressing food security and climate challenges. Here we report a haplotype-resolved chromosome-level genome assembly of an African cultivar, ‘Tanzania’, revealing mosaic genomic origins along haplotype-phased chromosomes. The wild tetraploid I. aequatoriensis, currently found in coastal Ecuador, contributes a substantial fraction of the sweetpotato genome. Another large proportion of the genome shows a closer genetic relationship to the wild tetraploid I. batatas 4×, distributed in Central America. The sequences contributed by different wild species are not distributed in typical subgenomes but are intertwined along chromosomes, possibly owing to the known non-preferential recombination among sweetpotato haplotypes. This study improves our understanding of sweetpotato origin and genome architecture and provides valuable genomic resources to accelerate sweetpotato breeding.

Fig. 1: Phased genome assembly of the hexaploid sweetpotato cultivar ‘Tanzania’.
Fig. 2: Architecture of the hexaploid ‘Tanzania’ genome.

Data availability

Raw genome sequencing reads have been deposited in the National Center for Biotechnology Information BioProject database under accession no. PRJNA1138727. The ‘Tanzania’ phased and consensus genome assemblies and annotated genes are also available via the Sweetpotato Genomics Resource at http://sweetpotato.uga.edu/.

Code availability

The source code for the haplotype phasing is available via GitHub at https://github.com/wu728/tanzania_genome/blob/main/hapByHiC.py.

Acknowledgements

We thank S. Williamson from NC State University for her assistance in sample collection, T. Ranney and N. Lynch for assistance in flow cytometry analysis, and B. Yada from the National Crops Resources Research Institute, Uganda, for providing the ‘Tanzania’ images. This research was supported by grants from the Bill and Melinda Gates Foundation through the SweetGAINS (OPP1213329) and RTB Breeding (CGIAR Investment ID1523-BMGF) projects under a subcontract with the International Potato Center, Lima, Peru, and by the USDA National Institute of Food and Agriculture (2022-67013-36269).

Author information

Contributions

Z.F., C.R.B. and G.C.Y. designed and managed the project. S.W. and Z.F. coordinated the genome sequencing. S.W. and H.S. performed the genome assembly and evaluation. J.P.H. and M.K. performed ONT transcriptome sequencing and genome annotation. S.W., H.S., X.Z., M.Y., H.W. and J.Y. contributed to genomic analyses. M.M. and G.D.S.G. constructed the phased genetic map. S.W. wrote the manuscript. Z.F. revised the manuscript. All authors read and approved the manuscript.

Corresponding author

Correspondence to Zhangjun Fei.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Plants thanks Qinghe Cao, Adam Session and the other, anonymous reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Hi-C contact heatmaps for the 90 chromosomes in the ‘Tanzania’ assembly.

The Hi-C contact heatmap was generated using all mapped reads. Each of the 15 homoeologous chromosome groups forms a distinct block.

Extended Data Fig. 2 Haplotype phasing assessment of the ‘Tanzania’ assembly using the phased genetic map.

The density of reference alleles of SNPs within each haplotype across 2-Mb non-overlapping windows throughout the ‘Tanzania’ genome assembly is shown. For each chromosome and haplotype combination used to obtain the SNPs, the majority of reference alleles in these SNPs (98.6%) were phased within a specific haplotype, as indicated by the green boxes.
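The windowing described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the 1-based coordinate convention and the toy inputs are assumptions, and only the 2-Mb non-overlapping window size comes from the caption.

```python
from collections import Counter

WINDOW_SIZE = 2_000_000  # 2-Mb non-overlapping windows, as stated in the caption


def window_counts(positions, window_size=WINDOW_SIZE):
    """Count SNP positions (assumed 1-based bp coordinates) per window.

    Window 0 covers 1..window_size, window 1 covers the next window_size
    bases, and so on. Returns {window_index: snp_count}.
    """
    return dict(Counter((pos - 1) // window_size for pos in positions))


def phased_fraction(counts_by_haplotype, target):
    """Fraction of all counted reference alleles that sit on one haplotype.

    counts_by_haplotype maps a haplotype label to the dict returned by
    window_counts for that haplotype (hypothetical layout).
    """
    total = sum(sum(c.values()) for c in counts_by_haplotype.values())
    on_target = sum(counts_by_haplotype[target].values())
    return on_target / total if total else float("nan")
```

With per-haplotype counts in hand, a value of `phased_fraction` close to 1 for the expected haplotype would correspond to the kind of concordance the caption reports.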

Extended Data Fig. 3 Large inversions among the six ‘Tanzania’ haplotypes supported by Hi-C contact signals and HiFi read alignments.

The example shown here is the 5.68-Mb inversion on ‘Tanzania’ chromosome 12F. a, Heatmap of the Hi-C contact signals among the six haplotypes of ‘Tanzania’, supporting the inversion on chromosome 12F. b, HiFi read alignments using TznChr12F as the reference. Reads spanning the inversion breakpoints at TznChr12F 11.8 Mb and 17.5 Mb, which originated from haplotype TznChr12F, are highlighted by the green boxes. Reads that are split-aligned, which originated from the other five haplotypes, are highlighted by orange boxes.

Extended Data Fig. 4 Dotplot showing the fold change of gene family sizes between the I. trifida NCNSP0306 and phased I. batatas ‘Tanzania’ genome assemblies.

An enlarged view of gene families with sizes smaller than 200 is shown on the right.

Extended Data Fig. 5 Distribution of TIR-NBS-LRR genes across the phased ‘Tanzania’ genome.

Extended Data Fig. 6 Distribution of TIR-NBS-LRR genes across the six haplotypes of ‘Tanzania’ chromosome 7.

Genes within the same syntenic paralogous group are labeled with the same color.

Extended Data Fig. 7 Distribution of sequence identity differences between the two wild tetraploid species and hexaploid sweetpotato.

For each analyzed phased genomic window, the sequence identity difference was calculated as: the mean sequence identity between ten I. aequatoriensis accessions and the hexaploid sweetpotato ‘Tanzania’ minus the mean sequence identity between eight I. batatas 4× accessions and the hexaploid sweetpotato ‘Tanzania’. The two peaks of the identity difference distribution are labeled by the orange and green lines in the top panel. Normal distributions were fitted to the patterns displayed by ‘Tanzania’ genomic windows with higher sequence identity to I. batatas 4× (middle) and those with higher identity to I. aequatoriensis (bottom). The 95% confidence intervals are indicated by the blue dotted lines.
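The per-window statistic defined in this caption can be written out as a short sketch. The function names and toy identity values are hypothetical, and the sign-based labelling is an illustrative assumption only; the source separates the two groups using fitted normal distributions rather than a simple sign test.

```python
from statistics import mean


def identity_difference(aeq_identities, bat4x_identities):
    """Sequence identity difference for one phased genomic window.

    aeq_identities: identities between 'Tanzania' and each I. aequatoriensis
    accession (ten in the caption); bat4x_identities: identities between
    'Tanzania' and each I. batatas 4x accession (eight in the caption).
    """
    return mean(aeq_identities) - mean(bat4x_identities)


def closer_relative(diff):
    """Illustrative labelling by sign only (an assumption, not the paper's rule)."""
    if diff > 0:
        return "I. aequatoriensis"
    if diff < 0:
        return "I. batatas 4x"
    return "undetermined"
```

A positive difference means the window is on average more similar to I. aequatoriensis, and a negative one means it is more similar to I. batatas 4×.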

Extended Data Fig. 8 Proportion and length of ‘Tanzania’ genomic sequences inferred to have different origins.

a, Proportion calculated across the entire genome. b, Proportions calculated for each homoeologous chromosome group. c, Lengths of ‘Tanzania’ sequences with different origins across the 90 chromosomes. d, Proportion of ‘Tanzania’ sequences with different origins in each of the 90 chromosomes.

Extended Data Fig. 9 Distribution of sequence identities between ‘Tanzania’ and wild I. aequatoriensis, I. batatas 4×, and I. trifida accessions in ‘Tanzania’ genomic windows with inferred ancestry based on genetic distance and sequence identity.

Genomic windows with consistent and contradictory origin inference between the two methods are shown on the left and in the middle, respectively. Windows whose origin could not be determined based on sequence identity are shown on the right. KNN refers to genomic origin inference based on genetic distance using a k-nearest neighbor algorithm.
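For readers unfamiliar with the approach, a k-nearest neighbor assignment from genetic distances might look like the sketch below. The value of k, the input layout, the species labels and the tie handling are all assumptions made for illustration; the caption does not specify them.

```python
from collections import Counter


def knn_origin(distances, k=3):
    """Assign an origin label to one genomic window by majority vote.

    distances: list of (genetic_distance, species_label) pairs, one per
    wild accession (hypothetical layout). The k accessions with the
    smallest distance vote; a tied vote returns "undetermined".
    """
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    ranked = Counter(label for _, label in nearest).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "undetermined"
    return ranked[0][0]
```

For example, a window whose three closest accessions are two I. aequatoriensis and one I. batatas 4× would be labelled I. aequatoriensis under this scheme.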

Extended Data Fig. 10 IbT-DNA sequences in the hexaploid sweetpotato ‘Tanzania’ genome.

a, Positions of IbT-DNA1 and IbT-DNA2 sequences on the ‘Tanzania’ chromosomes. b, Alignments between the IbT-DNAs and the ‘Tanzania’ genomic regions containing these IbT-DNA insertions.

Supplementary information

Supplementary Information

Supplementary Figs. 1–13.

Reporting Summary

Supplementary Tables 1–12

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wu, S., Sun, H., Zhao, X. et al. Phased chromosome-level assembly provides insight into the genome architecture of hexaploid sweetpotato. Nat. Plants 11, 1951–1959 (2025). https://doi.org/10.1038/s41477-025-02079-6

  • DOI: https://doi.org/10.1038/s41477-025-02079-6
