The chromosome-scale genome assembly, annotation of Bischofia polycarpa (H. Lév.) Airy Shaw, Phyllanthaceae

Xin, Guiliang; Wang, Gang; Liu, Bobin; Zhang, Daizhen; Tang, Boping; Deng, Chuanyuan; Wang, Lie

doi:10.1038/s41597-026-06554-3

Data Descriptor
Open access
Published: 02 March 2026

Scientific Data (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

Bischofia polycarpa (2n = 68), a member of the family Phyllanthaceae, is a native deciduous tree whose natural distribution ranges from the southern Qinling Mountains and the Huaihe River basin to the northern regions of Fujian and Guangdong, China. It has significant horticultural, ornamental, and medicinal value and serves as a crucial winter food resource for wild birds. Here, we report a de novo genome assembly for B. polycarpa generated from a combination of PacBio HiFi reads and Hi-C data. The assembly totals 585.68 Mb with a contig N50 of 12.62 Mb, and 99.06% (580.18 Mb) of it was anchored to 34 chromosomes. The genome comprises approximately 62.77% repetitive sequences and 32,554 protein-coding genes, of which 96.15% could be functionally annotated. BUSCO analysis indicates a genome completeness of 95.42% (1,540 complete BUSCOs), comprising 1,499 (92.87%) single-copy and 41 (2.54%) duplicated BUSCOs. This high-quality Phyllanthaceae genome enriches our understanding of the genetic underpinnings of plant reproductive ecology.
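The summary statistics in the abstract can be cross-checked with simple arithmetic. The following is a minimal Python sketch, offered as an illustration and not as part of the authors' workflow. The total number of BUSCO groups is not stated in the abstract; it is inferred here from the reported percentage, so `implied_total` should be read as an assumption. The `n50` helper and its toy contig lengths are hypothetical and only show how a contig N50 is defined.

```python
# Figures quoted in the abstract.
ASSEMBLY_MB = 585.68      # total assembly size
ANCHORED_MB = 580.18      # sequence anchored to chromosomes
DIPLOID_NUMBER = 68       # 2n
SINGLE_COPY = 1499        # complete single-copy BUSCOs
DUPLICATED = 41           # complete duplicated BUSCOs
COMPLETE_PCT = 95.42      # reported BUSCO completeness


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembled length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must not be empty")


# Anchoring rate: anchored sequence as a share of the whole assembly.
anchored_pct = pct(ANCHORED_MB, ASSEMBLY_MB)

# Haploid chromosome number expected from 2n = 68.
haploid = DIPLOID_NUMBER // 2

# Complete BUSCOs are the sum of single-copy and duplicated ones.
complete = SINGLE_COPY + DUPLICATED

# ASSUMPTION: the size of the BUSCO lineage set is not given in the
# abstract; it is back-calculated from the reported completeness.
implied_total = round(complete / (COMPLETE_PCT / 100.0))

print("anchored %:", anchored_pct)
print("haploid chromosome number:", haploid)
print("complete BUSCOs:", complete)
print("implied BUSCO set size (assumption):", implied_total)
print("single-copy %:", pct(SINGLE_COPY, implied_total))
print("duplicated %:", pct(DUPLICATED, implied_total))

# Toy contig lengths (hypothetical, arbitrary units) to illustrate N50.
print("toy N50:", n50([10, 8, 6, 4, 2]))
```

Under these assumptions the recomputed anchoring rate and BUSCO percentages agree with the values reported in the abstract, and the haploid number derived from 2n = 68 matches the 34 anchored chromosomes.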

Data availability

The final chromosome-level assembly was deposited in NCBI GenBank under BioProject PRJNA1267844 with accession number GCA_053574235.1. RNA-seq data from various tissues are accessible under BioProject PRJNA1365770 with accession number SRR36186603. The genome annotation files (GFF3, GTF, FASTA) are available in the Figshare database (https://doi.org/10.6084/m9.figshare.27458694). All datasets are publicly available without restriction.
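Readers who download the GFF3 annotation may want to confirm the feature counts before further analysis. The following is a minimal Python sketch for tallying feature types in a GFF3 file, given as an illustration and not as the authors' procedure. The sample records, sequence names, and gene IDs below are made up, and it is an assumption that protein-coding genes in the deposited file are recorded with the feature type `gene`.

```python
import io
from collections import Counter


def count_feature_types(handle):
    """Count GFF3 records by feature type (column 3).

    Comment and directive lines are skipped, and parsing stops at an
    embedded ##FASTA section if one is present.
    """
    counts = Counter()
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            # Not a well-formed GFF3 record; ignore it.
            continue
        counts[fields[2]] += 1
    return counts


# Hypothetical records for demonstration only; they are not taken from
# the deposited annotation.
SAMPLE_GFF3 = "\n".join([
    "##gff-version 3",
    "Chr01\texample\tgene\t1000\t5000\t.\t+\t.\tID=gene0001",
    "Chr01\texample\tmRNA\t1000\t5000\t.\t+\t.\tID=gene0001.t1;Parent=gene0001",
    "Chr01\texample\tCDS\t1200\t4800\t.\t+\t0\tID=cds0001;Parent=gene0001.t1",
    "Chr02\texample\tgene\t300\t900\t.\t-\t.\tID=gene0002",
    "Chr02\texample\tmRNA\t300\t900\t.\t-\t.\tID=gene0002.t1;Parent=gene0002",
]) + "\n"

feature_counts = count_feature_types(io.StringIO(SAMPLE_GFF3))
print(dict(feature_counts))
```

To apply the same function to a downloaded file, open it with `open(path, encoding="utf-8")` and pass the handle in place of the `io.StringIO` object. If the assumption about the `gene` feature type holds, the resulting `gene` count would be expected to be close to the 32,554 protein-coding genes reported in the abstract.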

Code availability

All software and pipelines were executed in strict accordance with the manuals and protocols provided by the published bioinformatics tools. No custom programming or coding was used.

Acknowledgements

This work was supported by the Program for Young Talents of Science and Technology in Universities of Yancheng Teachers University (grant numbers 206670157 and 204670012), the Hunan Provincial Natural Science Foundation of China (grant number 2024JJ5295), the General Project of Philosophy and Social Sciences in Hunan Province (grant number 22YBA306), and the Key Scientific Research Projects of Hunan Provincial Education Department (grant number 24A0751). We thank Professor Chenglang Pan from Minjiang University for providing the photographs of Bischofia.

Author information

Authors and Affiliations

Jiangsu Key Laboratory for Bioresources of Saline Soils, Yancheng Teachers University, Yancheng, 224007, China
Guiliang Xin, Gang Wang, Bobin Liu, Daizhen Zhang & Boping Tang
College of Landscape Architecture and Art, Fujian Agriculture and Forestry University, Fuzhou, 350002, Fujian, China
Chuanyuan Deng
Art School, Hunan University of Information Technology, Changsha, 410151, Hunan, China
Lie Wang

Contributions

L. Wang and B.B. conceived and designed the study. C.Y. revised the manuscript. G.L. prepared the materials. G.L. and C.Y. analyzed the data and wrote the manuscript. G. Wang, D.Z., B.P., and B.B. edited and improved the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Lie Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Xin, G., Wang, G., Liu, B. et al. The chromosome-scale genome assembly, annotation of Bischofia polycarpa (H. Lév.) Airy Shaw, Phyllanthaceae. Sci Data (2026). https://doi.org/10.1038/s41597-026-06554-3

Received: 21 November 2024
Accepted: 29 December 2025
Published: 02 March 2026
DOI: https://doi.org/10.1038/s41597-026-06554-3