Abstract
The cultivated Zizania latifolia, an aquatic vegetable prevalent in the Yangtze River Basin, represents a unique plant-fungus complex whose domestication is associated with host-parasite co-evolution. In this study, we present a high-quality, chromosome-scale genome assembly of cultivated Z. latifolia. Using PacBio long-read sequencing and Hi-C technology, we generated a ~578.42 Mb genome assembly with a contig N50 of ~33.75 Mb, of which 47.59% consists of repeat sequences. The contigs were clustered into 17 chromosome-sized scaffolds with a GC content of 43.26%, and the assembly showed 98.39% completeness in BUSCO analysis. In total, we predicted 39,934 protein-coding genes, 88.79% of which could be functionally annotated. This genome assembly provides a valuable resource for unraveling the domestication process of Z. latifolia and advances our understanding of its evolutionary history and agricultural potential.
Background & Summary
The genus Zizania, belonging to the rice tribe (Oryzeae) of the grass family Poaceae, is closely related to the genus Oryza, along with Leersia [1,2,3,4]. Of the four species within Zizania, three are native to North America, including the annual Zizania palustris, commonly known as wild rice, which has recently been domesticated as a grain crop [3,5,6]. The only East Asian species, the perennial Zizania latifolia, is prevalent in freshwater wetlands of eastern China and plays a crucial role in emergent plant communities [7,8,9]. Interestingly, the young stems of Z. latifolia can be infected by the smut fungus Ustilago esculenta, resulting in the formation of fleshy, edible galls. This phenomenon, observed and documented in China over 2,000 years ago, led to the domestication of the Z. latifolia-U. esculenta complex as an aquatic vegetable called “Jiaobai” [8,10,11,12].
The domestication of cultivated Z. latifolia has unique characteristics compared with other cultivated plants. According to the historical literature, Z. latifolia was domesticated as a vegetable crop in the late Tang Dynasty (more than 1,000 years ago). Infection by U. esculenta abolished the plant’s ability to reproduce sexually, forcing cultivated Z. latifolia to rely on asexual tillers for reproduction [11,13]. This reproductive constraint led to extremely low genetic diversity among cultivated varieties [7,8]. Notably, the domestication of Z. latifolia deviates from the ordinary binary co-evolutionary relationship between a single domesticated crop and humans [14]. Instead, Z. latifolia was domesticated as a plant-fungus complex, in which two closely associated species are simultaneously subjected to human selective pressure [8,10,15]. This unique domestication process makes cultivated Z. latifolia a potentially novel model for studying host-parasite co-evolution and the response of symbiotic systems to artificial selection [8,13,16]. Additionally, as a close relative of Z. palustris, Z. latifolia has historical significance as a former grain crop. This ancestral usage, combined with its perennial nature, suggests that Z. latifolia has the potential to be domesticated de novo as a new perennial grain crop [12,17,18,19,20].
Significant progress has been made in understanding the genomic structure of Z. latifolia. A draft genome and a chromosome-level genome of wild Z. latifolia have been sequenced successively, providing valuable resources for exploring the origin of cultivated Z. latifolia and dissecting potential agronomic traits in wild Z. latifolia germplasm [12,13,20]. We have also previously made preliminary inferences on the possible domestication scenarios of cultivated Z. latifolia using molecular markers [8]. Despite these advances, a high-quality genome of cultivated Z. latifolia remains indispensable for resolving its origin and for inferring the genetic basis of domestication traits.
This study presents the first near-complete chromosomal-scale genome assembly of cultivated Z. latifolia using long-read sequencing data and Hi-C sequencing technologies. The assembly yielded a 578.42 Mb genome with a contig N50 of 33.75 Mb, and the contigs were successfully clustered into 17 chromosomal-sized scaffolds with only one gap. The assembly’s quality was validated through Benchmarking Universal Single-Copy Ortholog (BUSCO) analysis, which revealed 98.39% completeness. Furthermore, 39,934 protein-coding genes were predicted, with 88.79% of these genes being functionally annotated.
This genome assembly and annotation provide a reference and a milestone for comparative genomics in the genus Zizania. They enable researchers to investigate the domestication of cultivated Z. latifolia and serve as an important resource for future conservation and breeding efforts. These genomic insights pave the way for a deeper understanding of the evolutionary history of Z. latifolia and its potential for agricultural improvement.
Methods
Sampling and genomic sequencing
In 2022, a landrace of cultivated Z. latifolia was collected from the rural area near Tonglu city (29.78°N, 119.57°E) of Zhejiang province in China. The collected sample was transplanted to the Z. latifolia germplasm in Lushan Botanical Garden, and young leaves were then harvested for DNA extraction and genome sequencing. Genomic DNA was extracted following the CTAB method. DNA quality and concentration were examined using a NanoDrop ND2000 spectrophotometer (Thermo Fisher Scientific, USA) and a Qubit 3.0 Fluorometer (Thermo Fisher Scientific, USA).
For the genome survey, a paired-end (PE 150 bp) library was generated using the DNBSEQ-T7RS High-throughput Sequencing FCL PE150 Kit (MGI Tech, China), and the library was sequenced on a DNBSEQ-T7 platform (MGI Tech, China) following the manufacturer’s instructions. This yielded ~60.86 Gb of paired-end reads, covering about 110.7× of the estimated Z. latifolia genome (Supplementary Table S1). PacBio HiFi sequencing was then performed on the PacBio Revio platform (Pacific Biosciences, USA), according to the manufacturer’s instructions. It produced ~127.37 Gb of HiFi reads, equivalent to about 231.6× coverage of the Z. latifolia genome (Supplementary Table S1). To prepare the library for high-throughput chromosome conformation capture (Hi-C) sequencing, fresh leaves were crosslinked with formaldehyde. Subsequently, the Hi-C library was constructed based on the instructions and sequenced on the DNBSEQ-T7 platform, generating ~111.89 Gb of raw reads (Supplementary Table S1). For RNA-seq, diverse tissues, including stems, leaves, inflorescences and roots, were collected and immediately frozen in liquid nitrogen, with three biological replicates. Total RNA was extracted and purified from each sample. The integrity of the RNA was assessed on an Agilent 2100 Bioanalyzer (Agilent, USA). After DNase treatment, RNA-seq libraries were constructed and sequenced on the DNBSEQ-T7 platform with 150 bp paired-end reads according to the manufacturer’s recommended protocol. A total of ~21.65 Gb of RNA-seq reads were obtained to assist the subsequent analysis (Supplementary Table S1).
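The depth figures quoted for the short-read and HiFi data follow from dividing the total sequenced bases by the genome size. The sketch below reproduces that arithmetic in Python. It assumes a genome size of 550 Mb (the survey estimate rounded down, which is an assumption made here and not a value stated by the authors), and the function name is purely illustrative.

```python
def sequencing_depth(total_bases_gb: float, genome_size_mb: float) -> float:
    """Return fold coverage: total sequenced bases divided by genome size."""
    return (total_bases_gb * 1000.0) / genome_size_mb


# Assumed genome size of 550 Mb (rounded survey estimate; an assumption).
GENOME_MB = 550.0

short_read_depth = sequencing_depth(60.86, GENOME_MB)   # MGI short reads
hifi_depth = sequencing_depth(127.37, GENOME_MB)        # PacBio HiFi reads

print(f"short reads: {short_read_depth:.1f}x, HiFi: {hifi_depth:.1f}x")
```

With the rounded genome size the two values agree with the reported ~110.7× and ~231.6× to one decimal place.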
Genome estimation and chromosome-level assembly
Prior to genome assembly, a genome survey was conducted using the filtered MGI short reads to assess the main features of the Z. latifolia genome, including genome size, heterozygosity, and repetitive sequence content. The k-mer analyses (17–31 k-mer) were conducted using Jellyfish v2.1.4 [21]. Genome characteristics were then evaluated from the k-mer frequency distribution at k = 23 using GenomeScope [22]. The survey estimated the genome size as ~550.84 Mb, with a heterozygosity of 0.39% and a repeat rate of 43.59%.
The PacBio HiFi reads were used for de novo genome assembly with hifiasm v0.19.6 [23] under default parameters. This initial assembly had a total size of ~583.74 Mb and comprised 41 contigs with a contig N50 of ~33.75 Mb. Hi-C sequencing data were then used to anchor the assembled contigs into pseudochromosomes. The filtered Hi-C data were first mapped to the polished genome assembly with Juicer v1.6 [24]. The uniquely mapped reads were then taken as input for the 3D-DNA pipeline v180922 [25] with the parameter “-r 0”. Afterward, visible errors in the contact map were carefully inspected and corrected manually using Juicebox v1.11.08 [26]. As a result, seventeen pseudochromosomes were identified by distinct interaction signals in the Hi-C interaction heatmap (Supplementary Fig. S1).
We finally obtained a chromosome-level genome of ~578.42 Mb, close to the genome size of ~550.84 Mb estimated in the initial survey (Fig. 1, Table 1). This assembly incorporated 99.44% of the assembled contigs, resulting in a scaffold N50 length of ~34.71 Mb. The GC content of the cultivated Z. latifolia genome was 43.26%. Benchmarking Universal Single-Copy Ortholog (BUSCO) v5.4.3 [27] was employed to assess the integrity, purity and completeness of the genome using the embryophyta gene set (odb10). Out of the 1614 BUSCOs, 1588 (98.39%) were identified as complete, comprising 1261 (78.07%) single-copy and 327 duplicated BUSCOs; 8 were fragmented and 18 were missing (Table 1).
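The idea behind a k-mer based size estimate can be illustrated with a deliberately simplified calculation: the total number of counted k-mers divided by the depth of the main peak. The Python sketch below is not the model GenomeScope fits (which also accounts for heterozygosity and repeats). The function name, the error cutoff `min_depth`, and the toy histogram are all assumptions made for illustration, not data from this study.

```python
from __future__ import annotations


def estimate_genome_size(histogram: dict[int, int], min_depth: int = 5) -> tuple[int, float]:
    """Estimate genome size from a k-mer depth histogram.

    histogram maps k-mer depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing errors and ignored.
    Returns (peak depth, estimated genome size in k-mers).
    """
    solid = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    # Assume the tallest remaining bin is the homozygous peak.
    peak_depth = max(solid, key=solid.get)
    # Total k-mer occurrences = sum over bins of depth * distinct k-mers.
    total_kmers = sum(depth * n for depth, n in solid.items())
    return peak_depth, total_kmers / peak_depth


# Toy histogram with made-up numbers (not the study's data).
toy = {1: 900_000, 2: 50_000, 18: 1_000, 19: 4_000,
       20: 10_000, 21: 4_000, 22: 1_000, 40: 500}
peak, size = estimate_genome_size(toy)
print(peak, round(size))
```

In practice the histogram would come from a k-mer counter such as Jellyfish; parsing that output is left out of the sketch.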
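Contig and scaffold N50 values such as those reported here are computed from the sorted sequence lengths: N50 is the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the assembly size. A minimal Python sketch, using made-up toy lengths and an illustrative function name:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy contig lengths (arbitrary units, not the study's data).
toy_contigs = [50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

For a real assembly the lengths would be read from the FASTA index or computed per record.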
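The BUSCO completeness figure can be re-derived from the category counts given in the text. The short Python sketch below checks that the four categories add up to the 1614 BUSCOs of the lineage set and recomputes the complete fraction; the function name and dictionary keys are illustrative and are not BUSCO output fields.

```python
def busco_completeness(single: int, duplicated: int, fragmented: int, missing: int) -> dict:
    """Summarize BUSCO counts: total searched, complete count, complete percentage."""
    total = single + duplicated + fragmented + missing
    complete = single + duplicated  # complete = single-copy + duplicated
    return {
        "total": total,
        "complete": complete,
        "complete_pct": round(100.0 * complete / total, 2),
    }


# Counts as reported for the genome assembly.
summary = busco_completeness(single=1261, duplicated=327, fragmented=8, missing=18)
print(summary)
```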
Fig. 1 A circular visualization of chromosomes in the Z. latifolia genome. The outermost track represents ideograms of the 17 chromosomes (scale mark = 5 Mb). From the second outermost track inward, the concentric circles denote GC content, density of protein-coding genes, repeat sequence density, Gypsy-like element distribution and Copia-like element distribution. The innermost track indicates genomic synteny among the chromosomes.
Repeat elements prediction
Repeat elements in the assembled genome were identified by combining de novo and homology-based methods. Tandem repeat sequences were annotated using Tandem Repeat Finder (TRF v4.09) [28] with default parameters. For de novo searches, RepeatModeler v1.0.11 [29] and LTR_FINDER v1.07 [30] were used to construct de novo repeat libraries with default parameters. Subsequently, RepeatMasker v4.0.9 [31] was applied to detect repeat sequences based on these libraries. For homology-based searches, RepeatMasker v4.0.9 was run against the known repeat library Repbase v23.08 [32].
In total, we identified ~276.82 Mb of repeat sequences, representing 47.59% of the genome. Most of these repeats were long terminal repeat (LTR) retrotransposons, which accounted for 35.08% of the genome. DNA transposons, long interspersed nuclear elements (LINEs), and short interspersed nuclear elements (SINEs) accounted for 10.17%, 0.92%, and 0.01% of the genome, respectively (Supplementary Table S2).
Gene prediction and functional annotation
To annotate protein-coding genes in the cultivated Z. latifolia genome, a combined strategy was applied that integrated ab initio prediction, homology-based prediction and transcriptome-based prediction.
The assembled genome was masked with RepeatMasker v4.0.9 [31] to prevent repetitive sequences from interfering with gene prediction. Ab initio gene prediction was performed with AUGUSTUS v3.2.2 [32] and GlimmerHMM v3.02 [33] under default settings, based on statistical models of gene structure. For homology-based gene annotation, Exonerate v2.2.0 [34] was used to search against protein sequences from wild Z. latifolia (NGDC Genome Warehouse, GWHBFHI00000000) [12,35], Z. palustris (NCBI database, GCA_019279435.1) [5], Oryza sativa (MSU 7.0) [36] and Aegilops tauschii (NCBI database, GCF_002575655.2) [37]. For transcriptome-based prediction, quality-controlled RNA-seq reads were mapped to the wild Z. latifolia genome with HiSat2 v2.1.0 [38], and StringTie v1.3.5 [39] was used to generate transcripts by reference-guided assembly. In addition, Trinity v2.15.1 [40] was used for de novo transcript assembly from the RNA-seq data. The resulting transcripts were consolidated, and redundant sequences were removed using CD-HIT v4.8.1 [41]. TransDecoder v5.5.0 (https://github.com/TransDecoder/TransDecoder) was then used to predict open reading frames (ORFs) from the assembled transcripts.
Maker2 v2.31.10 [42] was used with default parameters to integrate the three sets of gene predictions into a consensus gene set. The integration yielded 39,934 protein-coding genes distributed across the genome, with a mean gene length of 5,087.29 bp. Functional annotation was carried out by aligning the predicted protein sequences against public functional databases using BLAST v2.11.0 [43] (e-value < 10^-5), including TrEMBL [44], NCBI-nr [45], KEGG [46], InterPro [47], KOG [48] and SwissProt [49]. In total, 35,458 genes were functionally annotated, representing 88.79% of the protein-coding genes (Supplementary Table S3). Gene Ontology (GO) annotation was performed using InterProScan v5.55-88.0 [50] (Supplementary Fig. S2).
To provide a comprehensive visual representation of the cultivated Z. latifolia genome, we used Circos v0.69-9 [51] to create a circular genome map. This visualization depicts the distribution of several key genomic features across the 17 chromosomes, including GC content, density of protein-coding genes, repeat sequence density, Gypsy-like elements, Copia-like elements and intra-genomic synteny (Fig. 1). In addition to protein-coding genes, we also annotated various non-coding RNA elements in the genome. tRNAscan-SE v1.3.1 [52] was used to predict tRNAs. The rRNAs, miRNAs, and snRNAs were predicted using INFERNAL v1.1.2 [53] through searches against the Rfam database v14.8 [54]. The non-coding RNA annotation yielded 228 miRNAs, 2,805 rRNAs, 659 tRNAs, and 756 snRNAs in the cultivated Z. latifolia genome (Supplementary Table S4).
Data Records
The sequencing data and genome assembly were deposited in the National Genomics Data Center (NGDC), Beijing Institute of Genomics, Chinese Academy of Sciences/China National Center for Bioinformation, under BioProject accession number PRJCA020786 [55]. The MGI short reads, PacBio HiFi long reads, RNA-seq data and Hi-C reads were deposited in the Genome Sequence Archive (GSA) of NGDC under accession numbers CRA013186 [56], CRA017988 [57], CRA018091 [58] and CRA017987 [59], respectively. The genome assembly was deposited in GenBank under accession number GCA_043380935.1 [60] and in the Genome Warehouse (GWH) of NGDC under accession number GWHFFOM00000000 [61]. Furthermore, the assembled genome and annotation data were deposited in the figshare database for broader accessibility [62].
Technical Validation
Genome assembly assessment
Two approaches were used to evaluate the robustness and completeness of the assembled genome. First, the conserved protein models from the lineage database embryophyta_odb10 were searched against the genome using Benchmarking Universal Single-Copy Orthologs (BUSCO) v5.4.3. In total, 98.39% of these genes were present in the assembled genome, indicating that the great majority of essential, conserved genes were captured. Second, the MGI paired-end short reads generated in the genome survey were mapped to the final genome using BWA v0.7.12 [63] with default settings. Approximately 99.59% of the short reads aligned to the genome, covering 98.50% of the assembly.
In addition, the plant-specific telomeric repeats (T3AG3, i.e. TTTAGGG) were identified in all seventeen chromosome sequences. Thirteen chromosomes carried telomeric repeats at both ends, and the remaining four carried them at one end (Supplementary Table S5), underlining the near-complete assembly of chromosome ends.
We further compared the assembly parameters of the newly assembled cultivated Z. latifolia genome with those of two published wild Z. latifolia genomes [12,13] and found that it has better assembly integrity and contiguity (Table 2). We also investigated the syntenic relationships between the cultivated Z. latifolia genome and the other two published chromosome-level Zizania genomes [5,12] using JCVI v1.2.7 [64]. The results indicate that our genome assembly of cultivated Z. latifolia shows superior sequence continuity and genome correctness (Supplementary Fig. S3).
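A check of this kind can be sketched by scanning the terminal window of each chromosome for tandem runs of the plant-type unit TTTAGGG or its reverse complement CCCTAAA. The Python example below is only an illustration: the study does not report its detection criteria, so the window size, the minimum copy number and the function names are all assumptions, and the test sequence is synthetic.

```python
from __future__ import annotations

import re

TELOMERE_FWD = "TTTAGGG"  # plant-type repeat unit (T3AG3)
TELOMERE_REV = "CCCTAAA"  # reverse complement of the unit


def has_telomere(end_seq: str, min_copies: int = 3) -> bool:
    """True if end_seq holds a tandem run of >= min_copies units on either strand."""
    pattern = (
        rf"(?:{TELOMERE_FWD}){{{min_copies},}}"
        rf"|(?:{TELOMERE_REV}){{{min_copies},}}"
    )
    return re.search(pattern, end_seq.upper()) is not None


def telomere_ends(chrom_seq: str, window: int = 1000) -> tuple[bool, bool]:
    """Report telomeric repeats in the first and last `window` bases."""
    return has_telomere(chrom_seq[:window]), has_telomere(chrom_seq[-window:])


# Synthetic sequence: telomeric run at the start only (not real data).
toy_chrom = "CCCTAAA" * 5 + "ACGT" * 100 + "GATTACA" * 3
print(telomere_ends(toy_chrom, window=50))
```

Applied to every chromosome, the pair of booleans gives a per-chromosome count of ends with detectable telomeric repeats, comparable in form to the summary reported in Supplementary Table S5.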
Assessment of the gene annotation
The annotated and integrated proteins were also evaluated using BUSCO v5.4.3 with the lineage dataset embryophyta_odb10. Briefly, the proportion of complete core genes was 98.10% (1218 single-copy genes and 365 duplicated genes), and there were only a few fragmented (1.40%) and missing (2.40%) genes, indicating high-quality annotation of the predicted gene models.
Code availability
No custom code was used in this study. All bioinformatics tools and software applications were executed in accordance with their respective manuals and protocols. The specific software versions and the parameters used are detailed in the Methods section.
References
Kellogg, E. A. The evolutionary history of Ehrhartoideae, Oryzeae, and Oryza. Rice. 2, 1–14 (2009).
Xu, X. et al. Phylogeny and biogeography of the eastern Asian-North American disjunct wild-rice genus (Zizania L., Poaceae). Mol. Phylogenet. Evol. 55, 1008–1017 (2010).
Porter, R. in North American crop wild relatives: important species. Vol.2 (eds. Greene, S. L., Williams, K. A., Khoury, C. K., Kantar, M. B., Marek, L. F.) Ch.3 (Springer International Publishing 2019).
Zhang, T. et al. Phylogenomic profiles of whole-genome duplications in Poaceae and landscape of differential duplicate retention and losses among major Poaceae lineages. Nat. Commun. 15, 3305 (2024).
Haas, M. et al. Whole-genome assembly and annotation of northern wild rice, Zizania palustris L., supports a whole-genome duplication in the Zizania genus. Plant J. 107, 1802–1818 (2021).
McGilp, L., Castell-Miller, C., Haas, M., Millas, R. & Kimball, J. Northern Wild Rice (Zizania palustris L.) breeding, genetics, and conservation. Crop Sci. 63, 1904–1933 (2023).
Xu, X., Ke, W., Yu, X., Wen, J. & Ge, S. A preliminary study on population genetic structure and phylogeography of the wild and cultivated Zizania latifolia (Poaceae) based on Adh1a sequences. Theor. Appl. Genet. 116, 835–843 (2008).
Zhao, Y. et al. Inferring the origin of cultivated Zizania latifolia, an aquatic vegetable of a plant-fungus complex in the Yangtze River Basin. Front. Plant Sci. 10, 1406 (2019).
Wagutu, G. K. Genetic structure of wild rice Zizania latifolia in an expansive heterogeneous landscape along a latitudinal gradient. Front. Ecol. Evol. 10, 929944 (2022).
Chan, Y. S. & Thrower, L. The host-parasite relationship between Zizania caduciflora Turcz. and Ustilago esculenta P. Henn. I. structure and development of the host and host-parasite combination. New Phytol. 85, 201–207 (1980).
Guo, H. B., Li, S. M., Peng, J. & Ke, W. D. Zizania latifolia Turcz. Cultivated in China. Genet. Resour. Crop Evol. 54, 1211–1217 (2007).
Yan, N. et al. Chromosome-level genome assembly of Zizania latifolia provides insights into its seed shattering and phytocassane biosynthesis. Commun. Biol. 5, 36 (2022).
Guo, L. B. et al. A host plant genome (Zizania latifolia) after a century-long endophyte infection. Plant J. 83, 600–609 (2015).
Purugganan, M. D. & Fuller, D. Q. The nature of selection during plant domestication. Nature. 457, 843–848 (2009).
Zhang, J. Z., Chu, F. Q., Guo, D. P., Hyde, K. D. & Xie, G. L. Cytology and ultrastructure of interactions between Ustilago esculenta and Zizania latifolia. Mycol. Prog. 11, 499–508 (2012).
Guttman, D. S., McHardy, A. C. & Schulze-Lefert, P. Microbial genome-enabled insights into plant-microorganism interactions. Nat. Rev. Genet. 15, 797–813 (2014).
Zhai, C. K., Jiang, X. L., Xu, Y. S. & Lorenz, K. J. Protein and amino acid composition of Chinese and North American wild rice. J. Food Compos. Anal. 14, 371–382 (1994).
Zhao, Y. et al. Seed characteristic variations and genetic structure of wild Zizania latifolia along a latitudinal gradient in China: implications for neo-domestication as a grain crop. AoB. PLANTS. 10, ply072 (2018).
Yan, N. et al. A comparative UHPLC-QqQ-MS-based metabolomics approach for evaluating Chinese and North American wild rice. Food Chem. 275, 618–627 (2019).
Xie, Y. N. et al. Domestication, breeding, omics research, and important genes of Zizania latifolia and Zizania palustris. Front. Plant Sci. 14, 1183739 (2023).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 27, 764–770 (2011).
Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics. 33, 2202–2204 (2017).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with Hifiasm. Nat. Methods. 18, 170–175 (2021).
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
Dudchenko, O. et al. De novo assembly of the Aedes Aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 356, 92–95 (2017).
Dudchenko, O. et al. The Juicebox assembly tools module facilitates de novo assembly of mammalian genomes with chromosome-length scaffolds for under $1000. Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 2018.
Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO Update: Novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).
Benson, G. Tandem Repeats Finder: a program to analyze DNA sequences. Nucleic. Acids. Res. 27, 573–580 (1999).
Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics. 21(Suppl 1), i351–i358 (2005).
Xu, Z. & Wang, H. LTR_Finder: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic. Acids. Res. 35, W265–W268 (2007).
Tempel, S. Using and understanding Repeatmasker. Totowa, NJ: Humana Press, 29–51 (2012).
Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA. 6, 11 (2015).
Stanke, M. et al. Augustus: ab initio prediction of alternative transcripts. Nucleic. Acids. Res. 34, W435–W439 (2006).
Majoros, W. H., Pertea, M. & Salzberg, S. L. Tigrscan and Glimmerhmm: two open source ab initio eukaryotic gene-finders. Bioinformatics. 20, 2878–2879 (2004).
Slater, G. S. C. & Birney, E. Automated Generation of Heuristics for Biological sequence comparison. BMC Bioinform. 6, 31 (2005).
Ouyang, S. et al. The TIGR Rice Genome Annotation Resource: improvements and new features. Nucleic. Acids. Res. 35, D883–D887 (2007).
Wang, L. et al. Aegilops tauschii genome assembly Aet v5.0 features greater sequence contiguity and improved annotation. G3-Genes. Genom. Genet. 11, jkab325 (2021).
Kim, D., Langmead, B. & Salzberg, S. L. Hisat: a fast spliced aligner with low memory requirements. Nat. Methods. 12, 357–360 (2015).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
Grabherr, M. G. et al. Trinity: reconstructing a full-length transcriptome without a genome from RNA-seq data. Nat. Biotechnol. 29, 644–652 (2011).
Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 28, 3150–3152 (2012).
Holt, C. & Yandell, M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinform. 12, 491 (2011).
Boratyn, G. M. et al. Blast: a more efficient report with usability improvements. Nucleic. Acids. Res. 41, W29–W33 (2013).
Coudert, E. et al. Annotation of biologically relevant ligands in UniProtKB using ChEBI. Bioinformatics. 39, btac793 (2023).
Coordinators, N. R. Database resources of the national center for biotechnology information. Nucleic. Acids. Res. 44, D7–D19 (2016).
Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic. Acids. Res. 28, 27–30 (2000).
Blum, M. et al. The Interpro protein families and domains database: 20 years on. Nucleic. Acids. Res. 49, D344–D354 (2021).
Tatusov, R. L., Koonin, E. V. & Lipman, D. J. A genomic perspective on protein families. Science. 278, 631–637 (1997).
Bateman, A. et al. Uniprot: the universal protein knowledgebase in 2021. Nucleic. Acids. Res. 49, D480–D489 (2021).
Jones, P. et al. Interproscan 5: Genome-scale protein function classification. Bioinformatics. 30, 1236–1240 (2014).
Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome. Res. 19, 1639–1645 (2009).
Lowe, T. M. & Eddy, S. R. TRNAscan-Se: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic. Acids. Res. 25, 955–964 (1997).
Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 29, 2933–2935 (2013).
Griffiths-Jones, S. Rfam: annotating non-coding RNAs in complete genomes. Nucleic. Acids. Res. 33, D121–D124 (2004).
National Genomics Data Center (NGDC) BioProject https://ngdc.cncb.ac.cn/bioproject/browse/PRJCA020786 (2023).
National Genomics Data Center (NGDC) Genome Sequence Archive https://ngdc.cncb.ac.cn/search/all?&q=CRA013186 (2024).
National Genomics Data Center (NGDC) Genome Sequence Archive https://ngdc.cncb.ac.cn/search/all?&q=CRA017988 (2024).
National Genomics Data Center (NGDC) Genome Sequence Archive https://ngdc.cncb.ac.cn/search/all?&q=CRA018091 (2024).
National Genomics Data Center (NGDC) Genome Sequence Archive https://ngdc.cncb.ac.cn/search/all?&q=CRA017987 (2024).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_043380935.1 (2024).
NGDC Genome Warehouse https://ngdc.cncb.ac.cn/search/all?q=GWHFFOM00000000 (2024).
Zhao, Y. The de novo assembled chromosome-scale genome of cultivated Zizania latifolia. figshare. Dataset. https://doi.org/10.6084/m9.figshare.26384776.v5 (2024).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 25, 1754–1760 (2009).
Tang, H. et al. Synteny and collinearity in plant genomes. Science. 320, 486–488 (2008).
Acknowledgements
The research work received financial support from the National Natural Science Foundation of China (Grant No. 31600293, 32260091) and the Natural Science Foundation of Jiangxi province (Grant No. 20212BAB205029).
Author information
Contributions
Y.Z., C.Z. and J.R. conceived and led the research. L.L., Z.Z., L.Z., Z.X., Z.S., N.Y., J.Z. and A.Z. were involved in sample collection, preparation and genome assembly. C.Z. and Y.Z. contributed to gene prediction and annotation, data visualization and other bioinformatics analysis. Y.Z. and C.Z. wrote the manuscript, and all authors read, revised and approved the final version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhao, Y., Liao, Lb., Zhu, Zw. et al. De novo assembly of a near-complete genome of aquatic vegetable Zizania latifolia in the Yangtze River Basin. Sci Data 11, 1341 (2024). https://doi.org/10.1038/s41597-024-04220-0
DOI: https://doi.org/10.1038/s41597-024-04220-0



