Scientific Data
The chromosome-scale genome assembly, annotation of Bischofia polycarpa (H. Lév.) Airy Shaw, Phyllanthaceae
  • Data Descriptor
  • Open access
  • Published: 02 March 2026

  • Guiliang Xin1,
  • Gang Wang1,
  • Bobin Liu1,
  • Daizhen Zhang1,
  • Boping Tang1,
  • Chuanyuan Deng2 &
  • Lie Wang3 

Scientific Data, Article number: (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Forest ecology
  • Plant ecology

Abstract

Bischofia polycarpa (2n = 68), a member of the Phyllanthaceae family, is a native deciduous tree whose natural distribution ranges from the southern Qinling Mountains and the Huaihe River basin to the northern regions of Fujian and Guangdong, China. It holds significant horticultural, ornamental, and medicinal value and serves as a crucial winter food resource for wild birds. Herein, we report a de novo genome assembly for B. polycarpa, generated from a combination of PacBio HiFi reads and Hi-C data. The assembled genome is 585.68 Mb in size with a contig N50 of 12.62 Mb, and 99.06% (580.18 Mb) of the assembly was successfully anchored onto 34 chromosomes. The genome comprises approximately 62.77% repetitive sequences and 32,554 protein-coding genes, of which 96.15% could be functionally annotated. BUSCO analysis indicates a genome completeness of 95.42% (n = 1,540), including 1,499 (92.87%) single-copy BUSCOs and 41 (2.54%) duplicated BUSCOs. This high-quality Phyllanthaceae genome enriches our understanding of the genetic underpinnings of plant reproductive ecology.
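As a rough illustration of how the summary statistics in the abstract relate to one another, the following Python sketch computes an N50 from a list of contig lengths and re-derives the anchoring rate, the complete BUSCO count, and the haploid chromosome number from the reported figures. The contig lengths are made-up toy values and the function names are hypothetical; this is a minimal sketch, not the authors' pipeline.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly size."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def anchoring_rate(anchored_mb, assembly_mb):
    """Percentage of the assembly placed on chromosomes."""
    return 100.0 * anchored_mb / assembly_mb


# Toy contig lengths (hypothetical, not taken from the article).
toy_contigs = [8, 5, 4, 2, 1]
toy_n50 = n50(toy_contigs)

# Figures reported in the abstract.
ASSEMBLY_MB = 585.68
ANCHORED_MB = 580.18
SINGLE_COPY_BUSCOS = 1499
DUPLICATED_BUSCOS = 41
DIPLOID_CHROMOSOMES = 68  # 2n = 68

reported_rate = round(anchoring_rate(ANCHORED_MB, ASSEMBLY_MB), 2)
complete_buscos = SINGLE_COPY_BUSCOS + DUPLICATED_BUSCOS
haploid_chromosomes = DIPLOID_CHROMOSOMES // 2

print(f"toy N50: {toy_n50}")
print(f"anchoring rate: {reported_rate}%")
print(f"complete BUSCOs: {complete_buscos}")
print(f"haploid chromosome number: {haploid_chromosomes}")
```

The checks show that the reported figures are internally consistent: 580.18 Mb of 585.68 Mb rounds to 99.06%, the single-copy and duplicated BUSCO counts sum to the stated n = 1,540, and 2n = 68 corresponds to the 34 anchored chromosomes.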

Data availability

The finalized chromosome assembly was deposited in NCBI GenBank under BioProject PRJNA1267844 with accession number GCA_053574235.1. RNA-seq data from various tissues are accessible under BioProject PRJNA1365770 with accession number SRR36186603. The genome annotation files (GFF3, GTF, FASTA) are available in the Figshare database (https://doi.org/10.6084/m9.figshare.27458694). All datasets are publicly available without restriction.
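For readers who want to locate these records programmatically, the sketch below turns the two accessions quoted above into resolver links. It assumes the identifiers.org URL form https://identifiers.org/ncbi/NAMESPACE:ACCESSION with the namespaces insdc.gca and insdc.sra; the validation patterns are simplified assumptions, and the helper name resolver_url is hypothetical. No network request is made.

```python
import re

# Simplified accession patterns (assumptions for illustration only).
PATTERNS = {
    "insdc.gca": re.compile(r"^GCA_\d{9}\.\d+$"),  # GenBank assembly
    "insdc.sra": re.compile(r"^[SED]RR\d+$"),      # sequencing run
}


def resolver_url(namespace, accession):
    """Build an identifiers.org link after a basic format check."""
    if namespace not in PATTERNS:
        raise KeyError(f"unsupported namespace: {namespace}")
    if not PATTERNS[namespace].match(accession):
        raise ValueError(f"{accession!r} does not look like {namespace}")
    return f"https://identifiers.org/ncbi/{namespace}:{accession}"


# Accessions quoted in the Data availability statement.
assembly_url = resolver_url("insdc.gca", "GCA_053574235.1")
run_url = resolver_url("insdc.sra", "SRR36186603")

print(assembly_url)
print(run_url)
```

Checking the accession format before building the link catches copy-and-paste slips such as swapping a BioProject ID for an assembly accession.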

Code availability

All software and pipelines were executed in strict accordance with the manuals and protocols provided by the published bioinformatics tools. No custom programming or coding was used.

Acknowledgements

This work was supported by the Program for Young Talents of Science and Technology in Universities of Yancheng Teachers University (grant numbers: 206670157 and 204670012); the Hunan Provincial Natural Science Foundation of China (grant number: 2024JJ5295); the General Project of Philosophy and Social Sciences in Hunan Province (grant number: 22YBA306); and the Key Scientific Research Projects of Hunan Provincial Education Department (grant number: 24A0751). We thank Professor Chenglang Pan from Minjiang University for providing the photographs of Bischofia.

Author information

Authors and Affiliations

  1. Jiangsu Key Laboratory for Bioresources of Saline Soils, Yancheng Teachers University, Yancheng, 224007, China

    Guiliang Xin, Gang Wang, Bobin Liu, Daizhen Zhang & Boping Tang

  2. College of Landscape Architecture and Art, Fujian Agriculture and Forestry University, Fuzhou, 350002, Fujian, China

    Chuanyuan Deng

  3. Art School, Hunan University of Information Technology, Changsha, 410151, Hunan, China

    Lie Wang

Contributions

L. Wang and B.B. conceived and designed the study, and C.Y. revised the manuscript. G.L. prepared the materials. G.L. and C.Y. analyzed the data and wrote the manuscript. G. Wang, D.Z., B.P., and B.B. edited and improved the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Lie Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Xin, G., Wang, G., Liu, B. et al. The chromosome-scale genome assembly, annotation of Bischofia polycarpa (H. Lév.) Airy Shaw, Phyllanthaceae. Sci Data (2026). https://doi.org/10.1038/s41597-026-06554-3

  • Received: 21 November 2024

  • Accepted: 29 December 2025

  • Published: 02 March 2026

  • DOI: https://doi.org/10.1038/s41597-026-06554-3

Scientific Data (Sci Data)

ISSN 2052-4463 (online)

© 2026 Springer Nature Limited
