Scientific Data
A chromosome-level reference genome of an endangered plant Craigia yunnanensis
  • Data Descriptor
  • Open access
  • Published: 02 March 2026

  • Zhuo Cheng 1,2 (equal contribution; ORCID: orcid.org/0000-0001-7807-2571),
  • Yuanyuan Xing 3 (equal contribution),
  • Yiming Pan 4,5,
  • Jue Wang 4,5,
  • Xinxin Wu 1,
  • Jiahua Li 6,
  • Congli Xu 7,
  • Ren-ai Xu 8,
  • Fangfang Xia 8,
  • Zhong Liu 4,5 &
  • Chunlin Long 1,9

Scientific Data (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Forestry
  • Genome

Abstract

Craigia yunnanensis, endemic to East Asia, is an endangered species of considerable economic and scientific value. However, the absence of a reference genome has hindered studies of its genetic variation and conservation management. To address this gap, we present a high-quality chromosome-level genome assembly of C. yunnanensis generated using PacBio HiFi sequencing and Hi-C scaffolding. The genome has a total length of 1,618.96 Mb, with a scaffold N50 of 39.39 Mb and 98.00% of the sequence assigned to 41 chromosomes. BUSCO assessment yielded a completeness score of 99.40%. Furthermore, we predicted 58,969 protein-coding genes, 94.09% of which were functionally annotated. The C. yunnanensis genome assembly facilitates a deeper understanding of adaptive evolution in Craigia, knowledge that is fundamental to the conservation and evidence-based management of this endangered plant.
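
For readers less familiar with the scaffold N50 statistic quoted in the abstract, the following is a minimal sketch of how N50 and the chromosome-anchored fraction are conventionally computed. The function names and the toy lengths are illustrative assumptions and are not taken from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


def anchored_fraction(chromosome_lengths, all_lengths):
    """Fraction of the assembly placed on chromosome-level scaffolds."""
    return sum(chromosome_lengths) / sum(all_lengths)


if __name__ == "__main__":
    # Hypothetical scaffold lengths (arbitrary units), not real data.
    toy_scaffolds = [50, 40, 30, 20, 10]
    toy_chromosomes = [50, 40, 30]
    print("N50:", n50(toy_scaffolds))
    print("Anchored fraction:", anchored_fraction(toy_chromosomes, toy_scaffolds))
```

Applied to the real assembly, the inputs would be the scaffold lengths read from the deposited FASTA file; the sketch above only illustrates the definitions.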

Data availability

The raw sequencing data of C. yunnanensis have been deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive under BioProject accession number PRJNA1327616 (run accessions SRR35386024, SRR36216218, SRR35371308 and SRR36369733; refs. 34–37). The genome assembly has been submitted to GenBank under accession number GCA_054051545.1 (ref. 38). The genome assembly data have also been archived in Figshare and are accessible via the persistent link https://doi.org/10.6084/m9.figshare.30075727 (ref. 39).
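
As a convenience, the accessions can be turned into resolvable links using the identifiers.org URL patterns that appear in references 34–38 of this article. The sketch below is illustrative only: the helper names are hypothetical, and the accession format check is a simplifying assumption for this dataset rather than a full INSDC validator.

```python
import re

# Accessions as listed in references 34-38 of this article.
SRA_RUNS = ["SRR35386024", "SRR36216218", "SRR35371308", "SRR36369733"]
ASSEMBLY = "GCA_054051545.1"

# Assumed pattern: the runs in this dataset are "SRR" followed by digits.
RUN_PATTERN = re.compile(r"^SRR\d+$")


def sra_url(run_accession):
    """Build the identifiers.org link for an SRA run accession."""
    if not RUN_PATTERN.match(run_accession):
        raise ValueError(f"unexpected run accession: {run_accession!r}")
    return f"https://identifiers.org/ncbi/insdc.sra:{run_accession}"


def assembly_url(assembly_accession):
    """Build the identifiers.org link for a GenBank assembly accession."""
    return f"https://identifiers.org/ncbi/insdc.gca:{assembly_accession}"


if __name__ == "__main__":
    for run in SRA_RUNS:
        print(sra_url(run))
    print(assembly_url(ASSEMBLY))
```

The format check is deliberately strict so that an accession with extra trailing digits (for example, one fused with a citation number during copy-and-paste) is at least inspected by hand before use; it cannot detect every such error, since extra digits still match the pattern.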

Code availability

This study does not involve custom scripts or code. The software and code used are publicly accessible.

References

  1. Wang, B. et al. A new occurrence of Craigia (Malvaceae) from the Miocene of Yunnan and its biogeographic significance. Historical Biology 33, 3402–3412, https://doi.org/10.1080/08912963.2020.1867980 (2021).

  2. Gao, Z., Zhang, C. & Milne, R. I. Size-class structure and variation in seed and seedling traits in relation to population size of an endangered species Craigia yunnanensis (Tiliaceae). Australian Journal of Botany 58, 214–223 (2010).

  3. de Kok, R. Craigia yunnanensis. The IUCN Red List of Threatened Species 2024, e.T32335A2815412 (2024).

  4. Frankham, R. Challenges and opportunities of genetic approaches to biological conservation. Biological Conservation 143, 1919–1927, https://doi.org/10.1016/j.biocon.2010.05.011 (2010).

  5. Yang, J., Gao, Z., Sun, W. & Zhang, C. High regional genetic differentiation of an endangered relict plant Craigia yunnanensis and implications for its conservation. Plant Diversity 38, 221–226, https://doi.org/10.1016/j.pld.2016.07.002 (2016).

  6. Chen, Y. L., Yang, J. & Sun, W. B. Development of 14 microsatellite markers in the endangered relict plant Craigia yunnanensis (Tiliaceae). Russian Journal of Genetics 56, 123–127, https://doi.org/10.1134/S1022795420010032 (2020).

  7. Wariss, H. M., Yaling, C. & Yang, J. The complete chloroplast genome of Craigia yunnanensis, an endangered plant species with extremely small populations (PSESP) from South China. Mitochondrial DNA Part B 4, 2740–2741, https://doi.org/10.1080/23802359.2019.1644228 (2019).

  8. Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770, https://doi.org/10.1093/bioinformatics/btr011 (2011).

  9. Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204, https://doi.org/10.1093/bioinformatics/btx153 (2017).

  10. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods 18, 170–175, https://doi.org/10.1038/s41592-020-01056-5 (2021).

  11. Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Systems 3, 95–98, https://doi.org/10.1016/j.cels.2016.07.002 (2016).

  12. Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95, https://doi.org/10.1126/science.aal3327 (2017).

  13. Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Systems 3, 99–101, https://doi.org/10.1016/j.cels.2015.07.012 (2016).

  14. Wolff, J. et al. Galaxy HiCExplorer 3: a web server for reproducible Hi-C, capture Hi-C and single-cell Hi-C data analysis, quality control and visualization. Nucleic Acids Research 48, W177–W184, https://doi.org/10.1093/nar/gkaa220 (2020).

  15. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Current Protocols in Bioinformatics Chapter 4, 4.10.1–4.10.14, https://doi.org/10.1002/0471250953.bi0410s25 (2009).

  16. Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mobile DNA 6, 11, https://doi.org/10.1186/s13100-015-0041-9 (2015).

  17. Wheeler, T. J. et al. Dfam: a database of repetitive DNA based on profile hidden Markov models. Nucleic Acids Research 41, D70–82, https://doi.org/10.1093/nar/gks1265 (2013).

  18. Keilwagen, J., Hartung, F. & Grau, J. GeMoMa: Homology-based gene prediction utilizing intron position conservation and RNA-seq data. Methods in Molecular Biology 1962, 161–177, https://doi.org/10.1007/978-1-4939-9173-0_9 (2019).

  19. Shao, L. et al. High-quality genomes of Bombax ceiba and Ceiba pentandra provide insights into the evolution of Malvaceae species and differences in their natural fiber development. Plant Communications 5, 100832, https://doi.org/10.1016/j.xplc.2024.100832 (2024).

  20. Argout, X. et al. The genome of Theobroma cacao. Nature Genetics 43, 101–108, https://doi.org/10.1038/ng.736 (2011).

  21. Li, W., Chen, X., Yu, J. & Zhu, Y. Upgraded durian genome reveals the role of chromosome reshuffling during ancestral karyotype evolution, lignin biosynthesis regulation, and stress tolerance. Science China Life Sciences 67, 1266–1279, https://doi.org/10.1007/s11427-024-2580-3 (2024).

  22. Stanke, M., Steinkamp, R., Waack, S. & Morgenstern, B. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Research 32, W309–312, https://doi.org/10.1093/nar/gkh379 (2004).

  23. Borodovsky, M. & Lomsadze, A. Eukaryotic gene prediction using GeneMark.hmm-E and GeneMark-ES. Current Protocols in Bioinformatics Chapter 4, 4.6.1–4.6.10, https://doi.org/10.1002/0471250953.bi0406s35 (2011).

  24. Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature Biotechnology 37, 907–915, https://doi.org/10.1038/s41587-019-0201-4 (2019).

  25. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology 33, 290–295, https://doi.org/10.1038/nbt.3122 (2015).

  26. Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biology 9, R7, https://doi.org/10.1186/gb-2008-9-1-r7 (2008).

  27. The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Research 45, D158–169, https://doi.org/10.1093/nar/gkw1099 (2017).

  28. Punta, M. et al. The Pfam protein families database. Nucleic Acids Research 40, D290–301, https://doi.org/10.1093/nar/gkr1065 (2012).

  29. Tatusov, R. L. et al. The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4, 41, https://doi.org/10.1186/1471-2105-4-41 (2003).

  30. Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics 25, 25–29, https://doi.org/10.1038/75556 (2000).

  31. Tanabe, M. & Kanehisa, M. Using the KEGG database resource. Current Protocols in Bioinformatics Chapter 1, 1.12.1–1.12.43, https://doi.org/10.1002/0471250953.bi0112s38 (2012).

  32. Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935, https://doi.org/10.1093/bioinformatics/btt509 (2013).

  33. Chan, P. P., Lin, B. Y., Mak, A. J. & Lowe, T. M. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Research 49, 9077–9096, https://doi.org/10.1093/nar/gkab688 (2021).

  34. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR35386024 (2025).

  35. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR36216218 (2025).

  36. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR35371308 (2025).

  37. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR36369733 (2025).

  38. NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_054051545.1 (2025).

  39. Cheng, Z. & Xing, Y. Y. A chromosome-level reference genome of an endangered plant Craigia yunnanensis. Figshare https://doi.org/10.6084/m9.figshare.30075727 (2025).

  40. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100, https://doi.org/10.1093/bioinformatics/bty191 (2018).

  41. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760, https://doi.org/10.1093/bioinformatics/btp324 (2009).

  42. Danecek, P. et al. Twelve years of SAMtools and BCFtools. GigaScience 10, https://doi.org/10.1093/gigascience/giab008 (2021).

  43. Seppey, M., Manni, M. & Zdobnov, E. M. BUSCO: Assessing genome assembly and annotation completeness. Methods in Molecular Biology 1962, 227–245, https://doi.org/10.1007/978-1-4939-9173-0_14 (2019).

  44. Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biology 21, 245, https://doi.org/10.1186/s13059-020-02134-9 (2020).

  45. Zhang, R. G. et al. Reticulate allopolyploidy and subsequent dysploidy drive evolution and diversification in the cotton family. Nature Communications 16, 7480, https://doi.org/10.1038/s41467-025-62644-7 (2025).

  46. Al-Fatlawi, A., Menzel, M. & Schroeder, M. Is Protein BLAST a thing of the past? Nature Communications 14, 8195, https://doi.org/10.1038/s41467-023-44082-5 (2023).

  47. Tang, H. et al. JCVI: A versatile toolkit for comparative genomics analysis. iMeta 3, e211 (2024).

  48. Sun, P. et al. WGDI: A user-friendly toolkit for evolutionary analyses of whole-genome duplications and ancestral karyotypes. Molecular Plant 15, 1841–1851, https://doi.org/10.1016/j.molp.2022.10.018 (2022).

Acknowledgements

This research was supported by grants from the Yunnan Provincial Baoshan Administration of Gaoligongshan National Nature Reserve (202305AF150121 & GBP-2022-01) and the National Natural Science Foundation of China (32370407, 31761143001 & 31870316).

Author information

Author notes
  1. These authors contributed equally: Zhuo Cheng, Yuanyuan Xing.

Authors and Affiliations

  1. College of Life and Environmental Sciences, Minzu University of China, Beijing, 100081, China

    Zhuo Cheng, Xinxin Wu & Chunlin Long

  2. College of Horticulture and Gardening, Yangtze University, Hubei, 434023, China

    Zhuo Cheng

  3. Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100193, China

    Yuanyuan Xing

  4. Institute of Blood Transfusion, Chinese Academy of Medical Sciences & Peking Union Medical College, Chengdu, 610052, China

    Yiming Pan, Jue Wang & Zhong Liu

  5. China Key Laboratory of Transfusion Adverse Reactions, Chinese Academy of Medical Sciences, Chengdu, 610052, China

    Yiming Pan, Jue Wang & Zhong Liu

  6. Gaoligongshan National Nature Reserve (Longyang Branch), Yunnan, 678000, China

    Jiahua Li

  7. Gaoligongshan National Nature Reserve (Baoshan Bureau), Yunnan, 678000, China

    Congli Xu

  8. The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China

    Ren-ai Xu & Fangfang Xia

  9. Institute of National Security Studies, Minzu University of China, Beijing, 100081, China

    Chunlin Long

Contributions

C.L.L., Z.L. and F.F.X. conceived the project. J.H.L. and C.L.X. collected the samples and coordinated the sequencing. Z.C., Y.Y.X., Y.M.P., J.W., X.X.W. and R.A.X. carried out the analysis. Z.C. and Y.Y.X. wrote and reviewed the manuscript.

Corresponding authors

Correspondence to Fangfang Xia, Zhong Liu or Chunlin Long.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Cheng, Z., Xing, Y., Pan, Y. et al. A chromosome-level reference genome of an endangered plant Craigia yunnanensis. Sci Data (2026). https://doi.org/10.1038/s41597-026-06746-x

  • Received: 12 May 2025

  • Accepted: 27 January 2026

  • Published: 02 March 2026

  • DOI: https://doi.org/10.1038/s41597-026-06746-x
