Chromosome-level genome assemblies of Thai cassava ecotypes (Manihot esculenta & Manihot glaziovii)

Beier, Sebastian; Bolger, Marie Elizabeth; Chanvivattana, Yindee; Bolger, Anthony Michael; Schmidt, Maximilian Heinrich-Wilhelm; Leelapon, Oranuch; Rakmit, Rungroj; Changmai, Kulnida; Ruayreun, Ratiporn; Srithundon, Sukanya; Totaiya, Duangjit; Sitchanukrit, Boriphat; Usadel, Björn; Wongtiem, Prapit; Amawan, Suwaluk; Wojciechowski, Tobias

doi:10.1038/s41597-025-05998-3

Download PDF

Data Descriptor
Open access
Published: 30 September 2025

Chromosome-level genome assemblies of Thai cassava ecotypes (Manihot esculenta & Manihot glaziovii)

Sebastian Beier ORCID: orcid.org/0000-0002-2177-8781¹,
Marie Elizabeth Bolger ORCID: orcid.org/0000-0001-6335-1578¹,
Yindee Chanvivattana²,
Anthony Michael Bolger¹,
Maximilian Heinrich-Wilhelm Schmidt^1,3,
Oranuch Leelapon²,
Rungroj Rakmit²,
Kulnida Changmai²,
Ratiporn Ruayreun²,
Sukanya Srithundon²,
Duangjit Totaiya²,
Boriphat Sitchanukrit²,
Björn Usadel ORCID: orcid.org/0000-0003-0921-8041^1,4,
Prapit Wongtiem⁵,
Suwaluk Amawan⁵ &
…
Tobias Wojciechowski⁶

Scientific Data volume 12, Article number: 1591 (2025) Cite this article

2032 Accesses
Metrics details

Subjects

Abstract

Cassava is a vital staple crop, yet genomic resources for diverse ecotypes, particularly from key regions, remain limited. To address this, we generated high-quality genome assemblies for nine Thai M. esculenta cultivars and one wild relative, Manihot glaziovii. The sequencing strategy combined Oxford Nanopore long reads for initial assembly with Illumina short reads for polishing and quality assessment. For five of the genotypes, extensive RNA-Seq data from various tissues and developmental stages were also produced to guide gene annotation. We provide detailed technical validation of the ten genome assemblies, reporting on key metrics of contiguity (N50s from 28.9 to 35.2 Mb), completeness (Complete BUSCO scores from 95.69% to 99.21%), and base-level accuracy (k-mer QV scores from 33.47 to 37.67). The final annotated assemblies and all raw sequencing data have been deposited in public archives and are readily accessible. These datasets represent a significant expansion of the genomic toolkit for Asian cassava, providing a foundational resource for future genetic discovery, comparative genomics, and advanced breeding applications.

Chromosome-level genome assembly of the casuarina moth, Lymantria xylina Swinhoe (1903)

Article Open access 04 February 2026

Haplotype-resolved genome of heterozygous African cassava cultivar TMEB117 (Manihot esculenta)

Article Open access 09 December 2023

De novo assembly and characterization of the draft genome of the cashew (Anacardium occidentale L.)

Article Open access 28 October 2022

Background & Summary

Cassava (Manihot esculenta Crantz) is a crop of great importance in global food security, economic development, and industrial applications. This starchy root vegetable serves as a staple food for over 800 million people worldwide, particularly in tropical and subtropical regions¹. Production reached 315 million tonnes in 2021, marking a 9% increase from 2017 worldwide². Nigeria, the leading producer, accounted for approximately 63 million tonnes, representing 31% of African production and 20% of global production. Notably, the top three producers - Nigeria, the Democratic Republic of Congo, and Thailand - contribute a combined 44% share of global cassava production². The resilience of cassava to challenging growing conditions, including drought and marginal soil, makes it an ideal crop for regions prone to climate variability³. This adaptability ensures a stable food supply, contributing significantly to food security in developing countries⁴. Additionally, its industrial applications are diverse, ranging from starch production to biofuels, sweeteners, glues and animal feed, with the global cassava starch market expected to reach USD 99.91 billion by 2032. Cassava’s role in local economies is equally significant, supporting rural development and poverty alleviation by providing income opportunities for smallholder farmers, processors, and traders.

Originating from the southern Amazon basin, where it was likely domesticated thousands of years ago, cassava’s cultivation spread throughout pre-Columbian South America^5,6. Following European contact, Portuguese traders introduced cassava to Africa, likely starting in the Congo Basin region around the 16th century. Its remarkable adaptability to diverse climates and soils fueled its widespread adoption across the African continent, where it became a fundamental staple crop. Introduction into Asia occurred later, possibly around the 18th century via trade routes, establishing cassava as a vital agricultural commodity in countries like Thailand, Indonesia, and Vietnam. This historical global spread has shaped distinct regional germplasm pools and adaptation strategies.

To fully leverage cassava’s potential and accelerate breeding efforts, comprehensive genomic resources are indispensable. Initial efforts resulted in a draft genome sequence from the Latin American-derived cultivar AM560-2 (originating from Colombia/CIAT breeding programs)⁷, providing a foundational resource for the research community. Subsequent advancements, incorporating long-read sequencing technologies (like PacBio and ONT) and scaffolding methods such as chromatin conformation capture (Hi-C), have significantly enhanced the quality and contiguity of reference genomes⁸. This includes improved chromosome-level assemblies for AM560-2 (e.g., versions v6, v7, v8)⁹ and the African landrace TMEB117 (widely used in IITA breeding programs)¹⁰.

Manihot esculenta is a diploid species (2n = 36) with an estimated haploid genome size of approximately 750 Mbp¹¹. A key complicating factor is the genome’s high degree of heterozygosity, often reported to be in the range of 1.0-1.5%¹². This high level of sequence divergence between homologous chromosomes, a consequence of cassava’s typically outcrossing reproductive system and widespread clonal propagation, poses significant hurdles for genome assembly algorithms. Standard approaches often struggle to differentiate allelic sequences, leading to the artificial merging (collapse) of haplotypes into a single chimeric sequence or the fragmentation of the assembly where divergent alleles are represented as separate contigs. Accurately resolving these haplotypes is critical for understanding allele-specific expression, identifying causal variants for traits, and developing precise breeding strategies.

While reference genomes exist for American and African cassava germplasm, high-quality genomic resources for Asian ecotypes have been lacking. A draft assembly for the Thai cultivar ‘Kasetsart 50’ (KU50)¹³ was an important first step, but a broader, high-quality representation of the diversity within Thailand was needed to empower regional breeding and research efforts. This study addresses this gap by generating and validating ten chromosome-level genome assemblies from a panel of diverse Thai Manihot esculenta cultivars and a wild M. glaziovii relative. Here, we describe the methods used for plant selection, sequencing, genome assembly, and annotation. We then present a detailed technical validation of the assemblies and gene models, confirming their quality and completeness. The resulting datasets provide a foundational genomic resource for the cassava research community, enabling future studies into the genetic diversity and improvement of this crop.

Methods

Plant material

Ten distinct Manihot genotypes were selected for genome sequencing to capture genetic diversity related to various traits within cultivated cassava (Manihot esculenta) and its wild relative M. glaziovii. The panel included the established low-yield, sweet cultivar ‘Hanatee’. Four commercially important Thai cultivars were also sequenced: ‘Kasetsart50’, a widely recommended variety; ‘Rayong9’, noted for high ethanol yield potential; ‘Rayong72’, characterized by high yield, high dry matter, and adaptation to Northeast Thailand; and ‘Rayong90’, known for high root dry matter content.

Additionally, five accessions were collected during a field visit in Northern Thailand (Nan province). These included ‘HighlandRough’ and ‘HighlandSmooth’, which are putative ‘Hanatee’ variants selected specifically for their contrasting rough and smooth bark phenotypes. Two landraces distinguished by their storage root flesh color, ‘WhiteRoot’ and ‘YellowRoot’, were also collected in the tropical rainforests of Narathiwat province in the south of Thailand. Finally, an accession of the wild relative Manihot glaziovii (designated ‘M. glaziovii WildType’), historically significant in breeding programs (e.g., for rubber traits), was provided by the Rayong Field Crop Research Center. This diverse germplasm set provides a foundation for comparative genomics studies within the Manihot genus and especially the cassava ecotypes of Thailand.

Sample preparation and sequencing

Young leaves from mature cassava plants were collected and flash-frozen. High molecular weight genomic DNA (HMW gDNA) used for Illumina and Oxford Nanopore Technologies (ONT) sequencing was extracted from the leaf tissues using a protocol provided by ONT (https://nanoporetech.com/document/extraction-method/fever-tree-gdna). The concentration and quality of the extracted DNA were assessed using a NanoDrop spectrophotometer and Qubit. Short strands of DNA were removed from the samples using circulomic SRE XL.

ONT reads

The HMW gDNA was used for ONT DNA library prep using the SQK-LSK109 kit and sequenced either on a MinION using the FLO-MIN106 flow cell (21 libraries), or on a PromethION using the FLO-PR002 flow cell (19 libraries). Reads were basecalled using Dorado (v0.5.2) with the model r941_prom_sup_g507 which generated 793.4 Gbp in total^{14,15,16,17,18,19,20,21,22,23}.

Illumina short reads

Illumina short-read library was constructed from the HMW gDNA and sequenced on Illumina NextSeq 2000 to generate 150 bp paired-end reads. The short-read sequencing generated approximately 138 Gbp of raw data, consisting of 460.1 million paired-end (2 × 150 bp) reads^{14,15,16,17,18,19,20,21,22,23}.

RNA-seq reads

RNA used for gene prediction was obtained from a time-course experiment on cassava (Manihot esculenta and Manihot glaziovii) tubers. Total RNA was extracted from tubers at multiple tuber developmental stages and time points. RNA sequencing was performed by an external service provider using Illumina technology. The Manihot esculenta cultivar Hanatee was sequenced using 75 bp single-end reads, whereas all other samples (Manihot esculenta Kasetsart50, Rayong9, Rayong72, and Manihot glaziovii WildType) were sequenced using 150 bp paired-end reads (2 × 150 bp). In total approximately 1898 Gbp of raw data was generated^{24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111}.

Genome size and heterozygosity estimation

The genome characteristics of the ten Manihot species, including genome size and heterozygosity were estimated using Illumina short read data and a k-mer based approach. A 21-mer frequency distribution was generated with Jellyfish (v2.3.1)¹¹², and the genome’s key features were inferred using GenomeScope2 (v2.0)¹¹³. The haploid genome size of the nine Manihot esculenta genotypes was estimated between 556 Mbp and 676 Mbp, with a heterozygosity rate estimated between 1.30% and 1.79%, while the genome size of Manihot glaziovii was estimated at 659 Mbp, with a heterozygosity rate at 4.62%.

De novo genome assembly, Ragtag scaffolding and quality assessment

The assembly of cassava genomes was performed using a combination of long-read sequencing data and multiple assembly refinement steps. Oxford Nanopore Technologies (ONT) reads were assembled using Flye (v.2.9.3)¹¹⁴ with parameters --read-error 0.03, -m 10000, and NextDenovo (v.2.5.0)¹¹⁵ to generate two independent draft assemblies. The completeness and quality of these assemblies were then assessed using Merqury (v.1.3)¹¹⁶. To improve the base accuracy, the assemblies were then polished using Medaka (https://github.com/nanoporetech/medaka, v.1.12.0) with ONT read data. After that, Purge_Dups (v.1.2.6)¹¹⁷ was applied to both assemblies to reduce redundancy caused by haplotigs. The two purged assemblies for each genome were then merged using QuickMerge (v.0.3)¹¹⁸ to generate a consensus genome. To further refine the assembly structure, RagTag (v.2.1.0)¹¹⁹ was employed, with the correct submodule first applied using the published South American reference genome of Manihot esculenta AM560-2 (GCA_001659605.2), followed by the scaffold submodule to enhance contiguity.

Finally, the NextPolish programme (v.1.4.1)¹²⁰ was used for two rounds of polishing with ONT read data to fill gaps and improve sequence accuracy. Following this, a contaminant screening step was performed. All unplaced contigs were subjected to a blastn (v2.16)¹²¹ search against the ‘core_nt’ database using the parameters: -max_target_seqs. 1 -evalue 1e-10 -culling_limit 5 -outfmt “6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore staxids sscinames”. Any contigs whose best hit was not to a plant species were removed. The resulting assemblies provided high-quality genome sequences for all 10 cassava accessions (Table 1)^{122,123,124,125,126,127,128,129,130,131}.

Table 1 Summary of the ten Manihot genome assemblies.

Full size table

Genome annotation

Repeat elements within each of the ten cassava genome assemblies were identified and masked using the Extensive de novo TE Annotator (EDTA, v2.2.1)¹³². The identified repetitive sequences constitute a significant portion of each genome, ranging from 300.5 to 445.6 Mbp per assembly, which represents over 50% of the total genome size (Table 2). In addition to transposable element annotation by EDTA, SSRs were identified using MISA (v2.1)¹³³. The continuity of the repeat regions was estimated by calculating the adjusted LAI¹³⁴. Furthermore, to identify potential telomeric repeat sequences and assess chromosome completeness, the ten Manihot genome assemblies were analyzed using the tidk (telomere identification toolkit, v0.2.41)¹³⁵ with the motif ‘CCCTAAA’ and a window size of 10 kbp.

Table 2 Summary of repeat annotations for ten Manihot assemblies.

Full size table

The strategy for predicting protein-coding genes varied depending on the availability of transcriptomic data for each genotype. For five genotypes (Manihot Esculenta Hanatee, Kasetsart50, Rayong9, Rayong72, and Manihot Glaziovii WildType), available RNA-Seq data were utilized. Briefly, individual RNA-Seq library reads were aligned to their respective genome assemblies using HISAT2 (v2.2.1)¹³⁶ with ‘--dta’ parameter. Using Samtools (v1.20)¹³⁷ libraries were combined and coordinate-sorted. Transcripts were then assembled using StringTie2 (v2.2.3)¹³⁸. Separately, deep learning gene predictions were generated using Helixer (v0.3.3)¹³⁹ using the ‘Land plant’ lineage model via its web interface (https://www.plabipd.de/helixer_main.html). The transcript evidence from StringTie2 and the ab initio predictions from Helixer were then integrated using Mikado (v2.2.3)¹⁴⁰ to produce a consolidated, non-redundant set of gene models for these five genotypes. For the remaining five cassava genotypes, where corresponding RNA-Seq data were not generated in this study, protein-coding genes were predicted solely using the deep-learning-based approach implemented in Helixer (v0.3.3), accessed via its web interface.

Functional annotations for the predicted proteomes derived from all ten genotypes were obtained using Mercator4 (v7)¹⁴¹ through the Plabipd web platform (https://www.plabipd.de/mercator_main.html). This process incorporated information from ProtScriber (v0.1.6, https://github.com/usadellab/prot-scriber) and Swiss-Prot¹⁴² to assign functional categories.

The completeness of the predicted protein-coding gene sets for each of the ten genotypes, in terms of expected gene content, was assessed using BUSCO (v5.8.3)¹⁴³ against the eudicotyledons_odb12 lineage dataset. Furthermore, the quality and consistency were evaluated using Mercator4 (v7) and OMArk (v0.3.0, OMAmer v2.0.5)^144,145 and PSAURON (v1.0.6)¹⁴⁶. Results of the quality assessments are summarized in Table 3.

Table 3 Summary of gene annotations for ten Manihot assemblies.

Full size table

Data Records

The raw sequencing data, including genomic DNA Illumina and ONT reads and RNA-Seq reads, have been deposited at the EMBL-EBI European Nucleotide Archive (ENA) under BioProject number PRJEB89494 (ERP172520)¹⁴⁷. The genome assemblies of the ten genotypes have been submitted to ENA under the accessions GCA_965363265¹²², GCA_965363285¹²⁴, GCA_965363275¹²³, GCA_965363475¹²⁵, GCA_965365905¹³¹, GCA_965364345¹²⁸, GCA_965364185¹²⁶, GCA_965364695¹³⁰, GCA_965364235¹²⁷, GCA_965364665¹²⁹. The assembled genome, including annotations, is accessible via an interactive Jbrowse2¹⁴⁸ instance at https://www.plabipd.de/ceplas/?config=cassavastore.json.

Technical Validation

Assembly and annotation quality assessment

We assessed the quality and completeness of the ten Manihot genome assemblies using DNA sequencing read mapping and Merqury k-mer based evaluation. Illumina paired-end reads were mapped using bwa-mem2 (v2.2.1)¹⁴⁹, while ONT reads were aligned using minimap2 (v2.28)¹⁵⁰. Across the ten assemblies, mapping rates ranged from 96.11% to 97.11% for Illumina reads and from 92.82% to 98.12% for ONT reads, indicating successful alignment of the majority of sequencing data to each assembly.

Assembly quality was further evaluated using CRAQ (v1.0.9)¹⁵¹ based on read mappings. The regional AQI (R-AQI) scores ranged from 80.89% to 93.33%, and the structural AQI (S-AQI) scores ranged from 56.38% to 71.64% across the assemblies. Assembly completeness was assessed with compleasm (v0.2.7)¹⁵² using the eudicotyledons_odb12 lineage database. The analysis identified between 95.69% and 99.21% of the expected BUSCO orthologous groups as complete within the assemblies (Table 1). A detailed summary of these genomic features is visualized for the ‘Hanatee’ assembly in Fig. 1.

To evaluate assembly continuity specifically in repetitive regions, we calculated the LTR Assembly Index (LAI). Across the ten assemblies, the adjusted LAI scores ranged from 18.78 to 30.61, indicating assembly qualities spanning high-quality draft (LAI > 10) to reference standard (LAI > 20), with the upper range approaching gold standard quality in terms of intact LTR retrotransposon representation (Table 2).

Finally, we performed Merqury (v1.3) analysis, using a Meryl (v1.3) database constructed from Illumina reads for each assembly, estimated k-mer based genome completeness ranging from 71.97% to 89.76% (Table 4). This k-mer completeness range is inherently influenced by the nature of these purged, single pseudo-haplotype assemblies. K-mers unique to the excluded alternative haplotype are intentionally absent from the reference pseudo-haplotype, thus preventing a 100% representation of all k-mers derived from the diploid sequencing reads.

Table 4 K-mer completeness analysis for pseudo-haplotype Manihot assemblies (k = 21).

Full size table

The theoretical maximum k-mer completeness for an ideal pseudo-haplotype, when measured against the total unique k-mers from diploid reads, can be derived from established k-mer distribution models in heterozygous genomes¹⁵³. This maximum is given by the formula:

$$Maximum\,pseudo-haplotype\,completeness=1/(2-{(1-r)}^{k})$$

where r is the organism’s heterozygosity rate and k is the k-mer size used in the analysis. For instance, using a k-mer size of 21 and representative organismal heterozygosity rates in the range of 0.5%-2.0%, this theoretical maximum k-mer completeness would typically fall between 74% (for r = 2.0%) and 91% (for r = 0.5%).

It is important to consider that the initial assemblies, prior to purging to create the pseudo-haplotypes, may not have captured the entirety of k-mers present in the Illumina reads. This can be due to factors such as incomplete genome coverage by the assembly, sequencing or assembly errors, or challenges in accurately resolving and representing complex heterozygous regions in the diploid state. To account for this, we use the observed k-mer completeness of the assembly prior to purging (denoted as scaling factor s) as the effective starting fraction of captured diploid k-mers. The expected pseudo-haplotype completeness, scaled by this initial capture rate, is then calculated as:

$${Expected\; pseudo}-{haplotype\; completeness}=s\ast {maximum\; pseudo}-{haplotype\; completeness}$$

The observed k-mer completeness values for the pseudo-haplotype assemblies generally align well with the expected completeness calculated from the respective pre-purged assembly completeness and heterozygosity. This correspondence suggests that the haplotype purging process was largely effective across these genomes, yielding results consistent with theoretical expectations for single haplotype representation, albeit with minor variations likely reflecting small inefficiencies or specific choices made during the purging process itself. Notably, the Manihot glaziovii sample exhibits a significant deviation from this trend. Its observed pseudo-haplotype k-mer completeness (89.76%) is substantially higher than both the theoretical maximum for a pseudo-haplotype given its heterozygosity (61.65%) and the scaled expectation based on its pre-purged assembly’s completeness (59.42%). This marked discrepancy strongly suggests that the haplotype purging process was incomplete for this particular, highly heterozygous (4.56%) genome. The retention of a significant portion of both haplotypes is further evidenced by the final ‘purged’ assembly size of 1299 Mbp, which is nearly twice the estimated haploid genome size of 659 Mbp. From this large assembly, only 727 Mbp could be assigned to the 18 chromosomes, indicating substantial unplaced or redundant sequence. Such an inflated assembly size relative to the haploid estimate, coupled with the exceptionally high k-mer completeness, indicates that the assembly for Manihot glaziovii more closely represents a partially diploid or largely unpurged state rather than a true pseudo-haplotype. The exceptionally high heterozygosity level in M. glaziovii WildType is a plausible factor that likely complicated the accurate differentiation and removal of the second haplotype during the purging stage.

The practical implications for users of this specific assembly are significant. The partially diploid nature leads to several unavoidable artifacts: an inflated total genome size (1299 Mbp vs. an estimated 659 Mbp) and gene count (51,770); a high proportion of duplicated gene models, as evidenced by the 70.38% duplicated HOGs in the OMArk analysis (Table 3); and a large amount of sequence (~572 Mbp) that could not be confidently placed onto chromosomes. Researchers should be aware that this redundancy can create challenges for read mapping, variant calling, and comparative genomic analyses, and the data for this specific genome should be interpreted with these limitations in mind.

The corresponding estimated Phred-scaled quality values from the Merqury analysis (QV) ranged from 33.47 to 37.67 across the ten genomes. These QV scores translate directly to high base-level accuracy, indicating estimated consensus error rates between approximately 1 error in 2,220 bases (QV = 33.47) and 1 error in 5,890 bases (QV = 37.67).

Completeness of the gene annotation for each assembly was assessed using OMArk (v0.3.0, OMAmer v2.0.2), PSAURON (v1.0.6), and Mercator4 (v7). OMArk analysis demonstrated that the annotations captured a high proportion of Hierarchical Orthologous Groups (HOGs), with missing HOGs ranging from only 1.69% to 4.18%. However, a substantial proportion of these captured HOGs were identified as duplicates, with duplication rates ranging from 39.85% to 70.38%, while single-copy HOGs ranged from 27.93% to 55.54% across the annotations (Table 3). Complementary analysis with PSAURON indicated high annotation completeness, yielding scores between 97.0 and 97.3 (Table 3). Protein classification via Mercator4 showed that 95.87% to 96.52% of proteins were annotated, with 62.18% to 63.57% being successfully classified into functional bins. Across the assemblies, the annotations covered 93.79% to 96.43% of the Mercator4 BINs (Table 3).

Limitations of Ab Initio Gene Annotation

It is important for users of this dataset to note a key difference in the gene prediction methodologies used. For five of the ten genotypes — Hanatee, Kasetsart50, Rayong9, Rayong72, and WildType — gene prediction was supported by organism-specific RNA-Seq data, which improves the accuracy of gene models. The remaining five genotypes — HighlandSmooth, HighlandRough, Rayong90, WhiteRoot, and YellowRoot — were annotated using only the deep-learning-based ab initio tool Helixer. While modern gene predictors like Helixer are powerful, annotations generated without direct transcriptomic evidence are more likely to contain errors, such as incorrect exon boundaries, missed exons, or falsely merged or split genes. We therefore advise that researchers exercise particular caution when analyzing genes from these five genomes, especially in studies focused on rapidly evolving gene families or novel genes, where ab initio models may be less reliable.

Data availability

The raw genomic DNA (Illumina and ONT) and RNA-Seq sequencing data generated for this study have been deposited in the EMBL-EBI European Nucleotide Archive (ENA) under BioProject accession number PRJEB89494. The final genome assemblies for the ten genotypes are available in the ENA under the individual accession numbers GCA_965363265, GCA_965363285, GCA_965363275, GCA_965363475, GCA_965365905, GCA_965364345, GCA_965364185, GCA_965364695, GCA_965364235, and GCA_965364665. An interactive browser for the assembled genomes and their annotations is accessible at https://www.plabipd.de/ceplas/?config=cassavastore.json. All code used in this project and the final data are available as an ARC repository at PLANTdataHUB¹⁵⁴ via https://git.nfdi4plants.org/usadellab/Cassava_genome_sequencing_2017.

Code availability

All code and final data is also available as ARC deposited at PLANTdataHUB¹⁵⁴ https://git.nfdi4plants.org/usadellab/Cassava_genome_sequencing_2017.

References

Declaration, F. R. World food summit plan of action. FAO Rome Italy (1996).
Fao, F. and agriculture organization of the U. N. FAOSTAT Statistical Database. Rome URL Httpfaostat Fao OrgendataQCL 403 (2023).
EL-Sharkawy, M. A. Cassava biology and physiology. Plant Mol. Biol. 53, 621–641 (2003).
Article Google Scholar
Nweke, F. I., Spencer, D. S. C. & Lynam, J. K. The Cassava Transformation. (Michigan State University Press, 2002).
Olsen, K. M. & Schaal, B. A. Evidence on the origin of cassava: Phylogeography of Manihot esculenta. Proc. Natl. Acad. Sci. 96, 5586–5591 (1999).
Article ADS PubMed PubMed Central CAS Google Scholar
Olsen, K. M. & Schaal, B. A. Microsatellite variation in cassava (Manihot esculenta, Euphorbiaceae) and its wild relatives: further evidence for a southern Amazonian origin of domestication. Am. J. Bot. 88, 131–142 (2001).
Article PubMed CAS Google Scholar
Prochnik, S. et al. The Cassava Genome: Current Progress, Future Directions. Trop. Plant Biol. 5, 88–94 (2012).
Article PubMed PubMed Central CAS Google Scholar
Bredeson, J. V. et al. Sequencing wild and cultivated cassava and related species reveals extensive interspecific hybridization and genetic diversity. Nat. Biotechnol. 34, 562–570 (2016).
Article PubMed CAS Google Scholar
Lyons, J. B. et al. Current status and impending progress for cassava structural genomics. Plant Mol. Biol. 109, 177–191 (2022).
Article PubMed CAS Google Scholar
Landi, M. et al. Haplotype-resolved genome of heterozygous African cassava cultivar TMEB117 (Manihot esculenta). Sci. Data 10, 887 (2023).
Article PubMed PubMed Central CAS Google Scholar
Awoleye, F., van Duren, M., Dolezel, J. & Novak, F. J. Nuclear DNA content and in vitro induced somatic polyploidization cassava (Manihot esculenta Crantz) breeding. Euphytica 76, 195–202 (1994).
Article CAS Google Scholar
Halsey, M. E., Olsen, K. M., Taylor, N. J. & Chavarriaga-Aguirre, P. Reproductive Biology of Cassava (Manihot esculenta Crantz) and Isolation of Experimental Field Trials. Crop Sci. 48, 49–58 (2008).
Article Google Scholar
Wang, W. et al. Cassava genome from a wild ancestor to cultivated varieties. Nat. Commun. 5, 5110 (2014).
Article ADS PubMed CAS Google Scholar
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509704 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509705 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509706 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509707 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509708 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509709 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509710 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509711 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509712 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509713 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509714 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509715 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509716 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509717 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509718 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509719 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509720 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509721 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509722 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509723 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509724 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509725 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509726 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509727 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509728 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509729 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509730 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509731 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509732 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509733 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509734 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509735 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509736 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509737 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509738 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509739 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509740 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509741 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509742 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509743 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509744 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509745 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509746 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509747 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509748 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509749 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509750 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509751 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509752 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509753 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509754 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509755 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509756 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509757 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509758 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509759 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509760 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509761 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509762 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509763 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509764 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509765 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509766 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509767 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509768 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509769 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509770 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509771 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509772 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509773 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509774 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509775 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509776 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509777 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509778 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509779 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509780 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509781 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509782 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509783 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509784 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509785 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509786 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509787 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509788 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509789 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509790 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509791 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509792 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509793 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509794 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509795 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509796 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509797 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509798 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509799 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509800 (2025).
NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:ERS24509801 (2025).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
Article PubMed PubMed Central Google Scholar
Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun. 11, 1432 (2020).
Article ADS PubMed PubMed Central CAS Google Scholar
Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).
Article PubMed CAS Google Scholar
Hu, J. et al. NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads. Genome Biol. 25, 107 (2024).
Article PubMed PubMed Central Google Scholar
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
Article PubMed PubMed Central CAS Google Scholar
Guan, D. et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics 36, 2896–2898 (2020).
Article PubMed PubMed Central CAS Google Scholar
Chakraborty, M., Baldwin-Brown, J. G., Long, A. D. & Emerson, J. J. Contiguous and accurate de novo assembly of metazoan genomes with modest long read coverage. Nucleic Acids Res. 44, e147–e147 (2016).
PubMed PubMed Central Google Scholar
Alonge, M. et al. Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing. Genome Biol. 23, 258 (2022).
Article PubMed PubMed Central CAS Google Scholar
Hu, J., Fan, J., Sun, Z. & Liu, S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics 36, 2253–2255 (2020).
Article PubMed CAS Google Scholar
Camacho, C. et al. BLAST + : architecture and applications. BMC Bioinformatics 10, 421 (2009).
Article PubMed PubMed Central Google Scholar
NCBI GenBank. http://identifiers.org/ncbi/insdc.gca:GCA_965363265 (2025).
NCBI GenBank. http://identifiers.org/ncbi/insdc.gca:GCA_965363275 (2025).
NCBI GenBank. http://identifiers.org/ncbi/insdc.gca:GCA_965363285 (2025).
NCBI GenBank. http://identifiers.org/ncbi/insdc.gca:GCA_965363475 (2025).
NCBI GenBank. http://identifiers.org/ncbi/insdc.gca:GCA_965364185 (2025).
NCBI GenBank. http://identifiers.org/ncbi/insdc.gca:GCA_965364235 (2025).
NCBI GenBank. http://identifiers.org/ncbi/insdc.gca:GCA_965364345 (2025).
NCBI GenBank. http://identifiers.org/ncbi/insdc.gca:GCA_965364665 (2025).
NCBI GenBank. http://identifiers.org/ncbi/insdc.gca:GCA_965364695 (2025).
NCBI GenBank. http://identifiers.org/ncbi/insdc.gca:GCA_965365905 (2025).
Ou, S. et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 20, 275 (2019).
Article PubMed PubMed Central CAS Google Scholar
Beier, S., Thiel, T., Münch, T., Scholz, U. & Mascher, M. MISA-web: a web server for microsatellite prediction. Bioinformatics 33, 2583–2585 (2017).
Article PubMed PubMed Central CAS Google Scholar
Ou, S., Chen, J. & Jiang, N. Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. 46, e126–e126 (2018).
PubMed PubMed Central Google Scholar
Brown, M. R., Manuel Gonzalez de La Rosa, P. & Blaxter, M. tidk: a toolkit to rapidly identify telomeric repeats from genomic datasets. Bioinformatics 41, btaf049 (2025).
Article PubMed PubMed Central CAS Google Scholar
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Article PubMed PubMed Central CAS Google Scholar
Danecek, P. et al. Twelve years of SAMtools and BCFtools. GigaScience 10, giab008 (2021).
Article PubMed PubMed Central Google Scholar
Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 20, 278 (2019).
Article PubMed PubMed Central CAS Google Scholar
Holst, F. et al. Helixer–de novo Prediction of Primary Eukaryotic Gene Models Combining Deep Learning and a Hidden Markov Model. bioRxiv 2023.02.06.527280 https://doi.org/10.1101/2023.02.06.527280 (2023).
Venturini, L., Caim, S., Kaithakottil, G. G., Mapleson, D. L. & Swarbreck, D. Leveraging multiple transcriptome assembly methods for improved gene structure annotation. GigaScience 7, giy093 (2018).
Article PubMed PubMed Central Google Scholar
Schwacke, R. et al. MapMan4: A Refined Protein Classification and Annotation Framework Applicable to Multi-Omics Data Analysis. Plant Syst. Biol. 12, 879–892 (2019).
CAS Google Scholar
The UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2025. Nucleic Acids Res. 53, D609–D617 (2025).
Article Google Scholar
Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).
Article PubMed PubMed Central CAS Google Scholar
Nevers, Y. et al. Quality assessment of gene repertoire annotations with OMArk. Nat. Biotechnol. 43, 124–133 (2025).
Article PubMed CAS Google Scholar
Rossier, V., Warwick Vesztrocy, A., Robinson-Rechavi, M. & Dessimoz, C. OMAmer: tree-driven and alignment-free protein assignment to subfamilies outperforms closest sequence approaches. Bioinformatics 37, 2866–2873 (2021).
Article PubMed PubMed Central CAS Google Scholar
Sommer, M. J., Zimin, A. V. & Salzberg, S. L. PSAURON: a tool for assessing protein annotation across a broad range of species. NAR Genomics Bioinforma. 7, lqae189 (2025).
Article CAS Google Scholar
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:ERP172520 (2025).
Diesh, C. et al. JBrowse 2: a modular genome browser with views of synteny and structural variation. Genome Biol. 24, 74 (2023).
Article PubMed PubMed Central Google Scholar
Vasimuddin, M., Misra, S., Li, H. & Aluru, S. Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. in 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 314–324. https://doi.org/10.1109/IPDPS.2019.00041 (2019).
Li, H. New strategies to improve minimap2 alignment accuracy. Bioinformatics 37, 4572–4574 (2021).
Article PubMed PubMed Central CAS Google Scholar
Li, K., Xu, P., Wang, J., Yi, X. & Jiao, Y. Identification of errors in draft genome assemblies at single-nucleotide resolution for quality assessment and improvement. Nat. Commun. 14, 6556 (2023).
Article ADS PubMed PubMed Central CAS Google Scholar
Huang, N. & Li, H. compleasm: a faster and more accurate reimplementation of BUSCO. Bioinformatics 39, btad595 (2023).
Article PubMed PubMed Central CAS Google Scholar
Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204 (2017).
Article PubMed PubMed Central CAS Google Scholar
Weil, H. L. et al. PLANTdataHUB: a collaborative platform for continuous FAIR data sharing in plant research. Plant J. 116, 974–988 (2023).
Article ADS PubMed CAS Google Scholar

Download references

Acknowledgements

This work benefits from resources and services provided by ELIXIR, a distributed infrastructure for life science data, funded by national governments and the European Commission, particularly de.NBI / ELIXIR-DE. SB, MB and BU are supported by the German Federal Ministry of Research, Technology and Space (BMFTR) in the frame of the German Network for Bioinformatics Infrastructure (de.NBI). The Bioeconomy International 2015 program (CASSAVASTORE: 031B0070) of the Federal Ministry of Research, Technology and Space (BMFTR), Germany and the National Science and Technology Development Agency (NSTDA), Thailand, funded the work of TW, MB, AB, MHWS, BU, YC, OL, RR, KC, RR, SS, DT, BS, PW, SA, and UC. AI tools (Google Gemini Advanced, 2.5 Pro) assisted in drafting this manuscript, with human oversight ensuring scientific accuracy.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Institute of Bio- and Geosciences (IBG-4 Bioinformatics), CEPLAS, BIOSC, Forschungszentrum Jülich GmbH, Wilhelm Johnen Straße, Jülich, Germany
Sebastian Beier, Marie Elizabeth Bolger, Anthony Michael Bolger, Maximilian Heinrich-Wilhelm Schmidt & Björn Usadel
National Biobank of Thailand (NBT), National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
Yindee Chanvivattana, Oranuch Leelapon, Rungroj Rakmit, Kulnida Changmai, Ratiporn Ruayreun, Sukanya Srithundon, Duangjit Totaiya & Boriphat Sitchanukrit
Department of Grapevine Breeding, Hochschule Geisenheim University, Geisenheim, Germany
Maximilian Heinrich-Wilhelm Schmidt
Heinrich-Heine University Düsseldorf, Faculty of Mathematics and Natural Sciences, Institute for Biological Data Science, CEPLAS, Düsseldorf, Germany
Björn Usadel
Rayong Field Crops Research Center, Department of Agriculture, 320 Huaypong, Muang, Rayong, 21150, Thailand
Prapit Wongtiem & Suwaluk Amawan
Institute of Bio- and Geosciences (IBG-2 Plant Sciences), Jülich Plant Phenotyping Center, Forschungszentrum Jülich GmbH, Wilhelm Johnen Straße, Jülich, Germany
Tobias Wojciechowski

Authors

Sebastian Beier
View author publications
Search author on:PubMed Google Scholar
Marie Elizabeth Bolger
View author publications
Search author on:PubMed Google Scholar
Yindee Chanvivattana
View author publications
Search author on:PubMed Google Scholar
Anthony Michael Bolger
View author publications
Search author on:PubMed Google Scholar
Maximilian Heinrich-Wilhelm Schmidt
View author publications
Search author on:PubMed Google Scholar
Oranuch Leelapon
View author publications
Search author on:PubMed Google Scholar
Rungroj Rakmit
View author publications
Search author on:PubMed Google Scholar
Kulnida Changmai
View author publications
Search author on:PubMed Google Scholar
Ratiporn Ruayreun
View author publications
Search author on:PubMed Google Scholar
Sukanya Srithundon
View author publications
Search author on:PubMed Google Scholar
Duangjit Totaiya
View author publications
Search author on:PubMed Google Scholar
Boriphat Sitchanukrit
View author publications
Search author on:PubMed Google Scholar
Björn Usadel
View author publications
Search author on:PubMed Google Scholar
Prapit Wongtiem
View author publications
Search author on:PubMed Google Scholar
Suwaluk Amawan
View author publications
Search author on:PubMed Google Scholar
Tobias Wojciechowski
View author publications
Search author on:PubMed Google Scholar

Contributions

S.B.: Methodology, Data curation, Investigation, Visualization, Writing - Original Draft. M.B.: Investigation, Writing - Review & Editing. B.U.: Writing - Review & Editing. Y.C.: Provision & Selection of plant material & cassava RNA samples, Discussion. O.L., R.R., K.C., R.R., S.S., D.T. and B.S.: Provision of cassava RNA samples. P.W. and S.A.: Provision & Selection of plant material, Discussion T.W.: Conceptualization, Validation, Resources, Supervision, Project administration, Writing - Review & Editing.

Corresponding author

Correspondence to Sebastian Beier.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Beier, S., Bolger, M.E., Chanvivattana, Y. et al. Chromosome-level genome assemblies of Thai cassava ecotypes (Manihot esculenta & Manihot glaziovii). Sci Data 12, 1591 (2025). https://doi.org/10.1038/s41597-025-05998-3

Download citation

Received: 20 June 2025
Accepted: 17 September 2025
Published: 30 September 2025
Version of record: 30 September 2025
DOI: https://doi.org/10.1038/s41597-025-05998-3