Abstract
Craigia yunnanensis, endemic to East Asia, is an endangered species of considerable economic and scientific research value. However, the absence of a reference genome has hindered studies of its genetic variation and conservation management. To address this gap, we present a high-quality chromosome-level genome assembly of C. yunnanensis generated with PacBio HiFi sequencing and Hi-C scaffolding. The assembly has a total length of 1,618.96 Mb and a scaffold N50 of 39.39 Mb, with 98.00% of the sequence assigned to 41 chromosomes. BUSCO assessment yielded a completeness score of 99.40%. We predicted 58,969 protein-coding genes, 94.09% of which were functionally annotated. This assembly provides a basis for a deeper understanding of adaptive evolution in Craigia, knowledge that is fundamental to the conservation and evidence-based management of this endangered plant.
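Scaffold N50 and the proportion of sequence assigned to chromosomes are standard contiguity metrics for a genome assembly. The sketch below illustrates how both could be computed from a list of sequence lengths; it is not the authors' pipeline (which this section does not describe), and the function names and toy lengths are hypothetical.

```python
from typing import Iterable, Sequence


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    sorted_lengths = sorted(lengths, reverse=True)  # longest first
    total = sum(sorted_lengths)
    if total == 0:
        raise ValueError("N50 is undefined for an empty assembly")
    running = 0
    for length in sorted_lengths:
        running += length
        # Compare the doubled running sum with the total to avoid float division.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: loop always returns when total > 0")


def anchored_fraction(chromosome_lengths: Sequence[int],
                      all_lengths: Sequence[int]) -> float:
    """Fraction of total assembly length placed on chromosome-scale scaffolds."""
    return sum(chromosome_lengths) / sum(all_lengths)


# Hypothetical toy scaffold lengths in Mb (not the C. yunnanensis data).
toy_all = [50, 40, 30, 20, 10]
toy_chromosomes = [50, 40, 30]
print(n50(toy_all))                                 # 40
print(anchored_fraction(toy_chromosomes, toy_all))  # 0.8
```

In the toy example the total is 150 Mb; the two longest scaffolds (50 + 40 = 90 Mb) are the first to reach half of that total, so the N50 is 40 Mb.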
Data availability
The raw sequencing data of C. yunnanensis have been deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive under BioProject accession number PRJNA1327616 (SRR35386024 [34], SRR36216218 [35], SRR35371308 [36], SRR36369733 [37]). The genome assembly has been submitted to GenBank under accession number GCA_054051545.1 [38]. The genome assembly data are also archived in Figshare and are accessible via the persistent link https://doi.org/10.6084/m9.figshare.30075727 [39].
Code availability
No custom scripts or code were developed in this study. All software used is publicly available.
References
Wang, B. et al. A new occurrence of Craigia (Malvaceae) from the Miocene of Yunnan and its biogeographic significance. Historical Biology 33, 3402–3412, https://doi.org/10.1080/08912963.2020.1867980 (2021).
Gao, Z., Zhang, C. & Milne, R. I. Size-class structure and variation in seed and seedling traits in relation to population size of an endangered species Craigia yunnanensis (Tiliaceae). Australian Journal of Botany 58, 214–223 (2010).
de Kok, R. Craigia yunnanensis. The IUCN Red List of Threatened Species 2024, e.T32335A2815412 (2024).
Frankham, R. Challenges and opportunities of genetic approaches to biological conservation. Biological Conservation 143, 1919–1927, https://doi.org/10.1016/j.biocon.2010.05.011 (2010).
Yang, J., Gao, Z., Sun, W. & Zhang, C. High regional genetic differentiation of an endangered relict plant Craigia yunnanensis and implications for its conservation. Plant Diversity 38, 221–226, https://doi.org/10.1016/j.pld.2016.07.002 (2016).
Chen, Y. L., Yang, J. & Sun, W. B. Development of 14 microsatellite markers in the endangered relict plant Craigia yunnanensis (Tiliaceae). Russian Journal of Genetics 56, 123–127, https://doi.org/10.1134/S1022795420010032 (2020).
Wariss, H. M., Yaling, C. & Yang, J. The complete chloroplast genome of Craigia yunnanensis, an endangered plant species with extremely small populations (PSESP) from South China. Mitochondrial DNA Part B 4, 2740–2741, https://doi.org/10.1080/23802359.2019.1644228 (2019).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770, https://doi.org/10.1093/bioinformatics/btr011 (2011).
Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204, https://doi.org/10.1093/bioinformatics/btx153 (2017).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods 18, 170–175, https://doi.org/10.1038/s41592-020-01056-5 (2021).
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Systems 3, 95–98, https://doi.org/10.1016/j.cels.2016.07.002 (2016).
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95, https://doi.org/10.1126/science.aal3327 (2017).
Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Systems 3, 99–101, https://doi.org/10.1016/j.cels.2015.07.012 (2016).
Wolff, J. et al. Galaxy HiCExplorer 3: a web server for reproducible Hi-C, capture Hi-C and single-cell Hi-C data analysis, quality control and visualization. Nucleic Acids Research 48, W177–W184, https://doi.org/10.1093/nar/gkaa220 (2020).
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Current Protocols in Bioinformatics Chapter 4, 4.10.1–4.10.14, https://doi.org/10.1002/0471250953.bi0410s25 (2009).
Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mobile DNA 6, 11, https://doi.org/10.1186/s13100-015-0041-9 (2015).
Wheeler, T. J. et al. Dfam: a database of repetitive DNA based on profile hidden Markov models. Nucleic Acids Research 41, D70–82, https://doi.org/10.1093/nar/gks1265 (2013).
Keilwagen, J., Hartung, F. & Grau, J. GeMoMa: Homology-based gene prediction utilizing intron position conservation and RNA-seq data. Methods in Molecular Biology 1962, 161–177, https://doi.org/10.1007/978-1-4939-9173-0_9 (2019).
Shao, L. et al. High-quality genomes of Bombax ceiba and Ceiba pentandra provide insights into the evolution of Malvaceae species and differences in their natural fiber development. Plant Communications 5, 100832, https://doi.org/10.1016/j.xplc.2024.100832 (2024).
Argout, X. et al. The genome of Theobroma cacao. Nature Genetics 43, 101–108, https://doi.org/10.1038/ng.736 (2011).
Li, W., Chen, X., Yu, J. & Zhu, Y. Upgraded durian genome reveals the role of chromosome reshuffling during ancestral karyotype evolution, lignin biosynthesis regulation, and stress tolerance. Science China Life Sciences 67, 1266–1279, https://doi.org/10.1007/s11427-024-2580-3 (2024).
Stanke, M., Steinkamp, R., Waack, S. & Morgenstern, B. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Research 32, W309–312, https://doi.org/10.1093/nar/gkh379 (2004).
Borodovsky, M. & Lomsadze, A. Eukaryotic gene prediction using GeneMark.hmm-E and GeneMark-ES. Current Protocols in Bioinformatics Chapter 4, 4.6.1–4.6.10, https://doi.org/10.1002/0471250953.bi0406s35 (2011).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature Biotechnology 37, 907–915, https://doi.org/10.1038/s41587-019-0201-4 (2019).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology 33, 290–295, https://doi.org/10.1038/nbt.3122 (2015).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biology 9, R7, https://doi.org/10.1186/gb-2008-9-1-r7 (2008).
The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Research 45, D158–169, https://doi.org/10.1093/nar/gkw1099 (2017).
Punta, M. et al. The Pfam protein families database. Nucleic Acids Research 40, D290–301, https://doi.org/10.1093/nar/gkr1065 (2012).
Tatusov, R. L. et al. The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4, 41, https://doi.org/10.1186/1471-2105-4-41 (2003).
Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics 25, 25–29, https://doi.org/10.1038/75556 (2000).
Tanabe, M. & Kanehisa, M. Using the KEGG database resource. Current Protocols in Bioinformatics Chapter 1, 1.12.1–1.12.43, https://doi.org/10.1002/0471250953.bi0112s38 (2012).
Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935, https://doi.org/10.1093/bioinformatics/btt509 (2013).
Chan, P. P., Lin, B. Y., Mak, A. J. & Lowe, T. M. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Research 49, 9077–9096, https://doi.org/10.1093/nar/gkab688 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR35386024 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR36216218 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR35371308 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR36369733 (2025).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_054051545.1 (2025).
Cheng, Z. & Xing, Y. Y. A chromosome-level reference genome of an endangered plant Craigia yunnanensis. Figshare https://doi.org/10.6084/m9.figshare.30075727 (2025).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100, https://doi.org/10.1093/bioinformatics/bty191 (2018).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760, https://doi.org/10.1093/bioinformatics/btp324 (2009).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. GigaScience 10, https://doi.org/10.1093/gigascience/giab008 (2021).
Seppey, M., Manni, M. & Zdobnov, E. M. BUSCO: Assessing genome assembly and annotation completeness. Methods in Molecular Biology 1962, 227–245, https://doi.org/10.1007/978-1-4939-9173-0_14 (2019).
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biology 21, 245, https://doi.org/10.1186/s13059-020-02134-9 (2020).
Zhang, R. G. et al. Reticulate allopolyploidy and subsequent dysploidy drive evolution and diversification in the cotton family. Nature Communications 16, 7480, https://doi.org/10.1038/s41467-025-62644-7 (2025).
Al-Fatlawi, A., Menzel, M. & Schroeder, M. Is Protein BLAST a thing of the past? Nature Communications 14, 8195, https://doi.org/10.1038/s41467-023-44082-5 (2023).
Tang, H. et al. JCVI: A versatile toolkit for comparative genomics analysis. iMeta 3, e211 (2024).
Sun, P. et al. WGDI: A user-friendly toolkit for evolutionary analyses of whole-genome duplications and ancestral karyotypes. Molecular Plant 15, 1841–1851, https://doi.org/10.1016/j.molp.2022.10.018 (2022).
Acknowledgements
This research was supported by grants from the Yunnan Provincial Baoshan Administration of Gaoligongshan National Nature Reserve (202305AF150121 & GBP-2022-01) and the National Natural Science Foundation of China (32370407, 31761143001 & 31870316).
Author information
Contributions
C.L.L., Z.L. and F.F.X. conceived the project. J.H.L. and C.L.X. collected the samples and coordinated the sequencing. Z.C., Y.Y.X., Y.M.P., J.W., X.X.W. and R.A.X. carried out the analysis. Z.C. and Y.Y.X. wrote and reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Cheng, Z., Xing, Y., Pan, Y. et al. A chromosome-level reference genome of an endangered plant Craigia yunnanensis. Sci Data (2026). https://doi.org/10.1038/s41597-026-06746-x
DOI: https://doi.org/10.1038/s41597-026-06746-x


