Scientific Data
Chromosome-Level Genome Assembly and Annotation of Piptanthus nepalensis (Hook.) Sweet
  • Data Descriptor
  • Open access
  • Published: 31 March 2026

  • Jiayin Zhang1 na1,
  • Zhefei Zeng2,3 na1,
  • Ngawang Bonjor2,3,
  • Ngawang Norbu2,3,
  • Norzin Tso2,3,
  • Xin Tan2,3,
  • Yuguo Wang1,
  • Junwei Wang2,3 &
  • La Qiong2,3 

Scientific Data (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

Piptanthus nepalensis (Hook.) Sweet is a leguminous shrub native to the Himalayan region and is valued for its medicinal uses and ornamental yellow flowers. Here, we report a chromosome-scale genome assembly of P. nepalensis generated using PacBio HiFi long reads, Illumina short reads, and Hi-C chromatin interaction data. The assembled genome spans approximately 1.04 Gb, with a contig N50 of 39.5 Mb and a scaffold N50 of 111.5 Mb. Benchmarking universal single-copy orthologs (BUSCO) analysis indicated high completeness for both the genome (99.3%) and predicted protein set (99.2%). Approximately 99.0% of the assembled sequences were anchored onto nine pseudo-chromosomes. We annotated 26,035 protein-coding genes, and repetitive elements were estimated to comprise ~75.2% of the genome. This high-quality genome assembly provides a foundational genomic resource for P. nepalensis and supports future studies on its biology and utilization.
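The contig and scaffold N50 values quoted above follow the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The sketch below illustrates that definition only. It is not the authors' pipeline, the function name `n50` is hypothetical, and the input lengths are toy values rather than data from this assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths (illustrative helper)."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    total = sum(ordered)
    running = 0
    # Walk from the longest sequence down until half of the total is covered.
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the loop always returns")


# Toy lengths, not values from the P. nepalensis assembly.
toy_contigs = [2, 3, 4, 5, 6]
print(n50(toy_contigs))  # 5, because 6 + 5 = 11 covers at least half of 20
```

In practice the lengths would be read from the assembly FASTA index or a similar summary file, and the same function applies to contigs and scaffolds alike.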

Data availability

The raw Illumina, PacBio, Hi-C, and RNA-seq sequencing data are available in the NCBI Sequence Read Archive (SRA) under accession number SRP679460. The final chromosome-level genome assembly is available in NCBI GenBank under accession number JBVQTV000000000. The genome annotation files are available in Figshare (https://doi.org/10.6084/m9.figshare.30416158.v1).

Code availability

All software and pipelines were implemented in full compliance with the manuals and protocols specified by the respective published bioinformatics tools. No custom programming or coding was employed.

Acknowledgements

The computations in this research were performed using the CFFF platform of Fudan University. This work was supported by the Science and Technology Projects of Xizang Autonomous Region, China (Project Nos. XZ202402ZD0005, XZ202402ZY0023, XZ202402JX0003, XZ202401ZR0028, and XZ202303ZY0002G), and by the Open Project of the Key Laboratory of Biodiversity and Environment on the Qinghai-Tibet Plateau, Ministry of Education (Project No. KLBE2025003).

Author information

Author notes
  1. These authors contributed equally: Jiayin Zhang, Zhefei Zeng.

Authors and Affiliations

  1. Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, Institute of Biodiversity Science, School of Life Sciences, Fudan University, Shanghai, 200438, China

    Jiayin Zhang & Yuguo Wang

  2. Key Laboratory of Biodiversity and Environment on the Qinghai-Tibetan Plateau, Ministry of Education, School of Ecology and Environment, Xizang University, Lhasa, China

    Zhefei Zeng, Ngawang Bonjor, Ngawang Norbu, Norzin Tso, Xin Tan, Junwei Wang & La Qiong

  3. Yani Observation and Research Station for Wetland Ecosystem of the Tibet (Xizang) Autonomous Region, Xizang University, Linzhi, China

    Zhefei Zeng, Ngawang Bonjor, Ngawang Norbu, Norzin Tso, Xin Tan, Junwei Wang & La Qiong

Contributions

Y.W., J.W., and L.Q. conceived and designed the research framework. J.Z. and Z.Z. collected the samples, prepared experimental materials, and conducted laboratory work. N.B., N.N., N.T., and X.T. contributed to data acquisition, curation, and preliminary analyses. Z.Z. and J.Z. performed the formal data analysis and drafted the initial manuscript, including figures and tables. Y.W., J.W., and L.Q. critically revised the manuscript for intellectual content and provided guidance throughout the study. All authors contributed to improving the manuscript and approved the final version for publication.

Corresponding authors

Correspondence to Yuguo Wang, Junwei Wang or La Qiong.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Zhang, J., Zeng, Z., Bonjor, N. et al. Chromosome-Level Genome Assembly and Annotation of Piptanthus nepalensis (Hook.) Sweet. Sci Data (2026). https://doi.org/10.1038/s41597-026-07134-1

  • Received: 25 October 2025

  • Accepted: 25 March 2026

  • Published: 31 March 2026

  • DOI: https://doi.org/10.1038/s41597-026-07134-1

Scientific Data (Sci Data)

ISSN 2052-4463 (online)

© 2026 Springer Nature Limited