Chromosome-level genome assembly of Hippophae neurocarpa

Tian, Ziyue; Yang, Xingyu; Yang, Danni; Huang, Jian; Yang, Shihai; Yang, Yunqiang

doi:10.1038/s41597-025-06022-4

Data Descriptor
Open access
Published: 04 November 2025

Ziyue Tian^1,2,3, Xingyu Yang^1,2,3, Danni Yang^1,2, Jian Huang^4, Shihai Yang^1,5 & Yunqiang Yang^1,2

Scientific Data volume 12, Article number: 1743 (2025)

Abstract

Hippophae neurocarpa (H. neurocarpa) is a deciduous shrub in the family Elaeagnaceae that is widely distributed on the Qinghai-Tibet Plateau. It has significant ecological and economic value, and its fruits are rich in nutrients and offer various health benefits. In this study, we assembled a chromosome-level genome of H. neurocarpa using Illumina sequencing, Nanopore sequencing, and chromosome conformation capture (Hi-C) techniques. The total genome size is 682.80 Mb, with an N50 length of 62.17 Mb. We anchored 669.20 Mb to 12 chromosomes, an anchoring rate of 98.0%. Using RepeatMasker, we annotated repetitive sequences, which make up 56.27% of the genome; long terminal repeats (LTRs) account for 36.26% of the total genome. We annotated gene structures using de novo, homology-based, and transcriptome-based strategies, predicting 36,844 protein-coding genes, of which 32,952 were functionally annotated. Quality evaluation gave a Benchmarking Universal Single-Copy Orthologs (BUSCO) completeness score of 97.6% and a Long Terminal Repeat (LTR) Retrotransposon Assembly Index (LAI) of 11.61; together with the anchoring rate, these values indicate high assembly quality. This study provides valuable genomic data for future research on H. neurocarpa and the broader sea buckthorn genus.

Background & Summary

Hippophae neurocarpa (H. neurocarpa) is a member of the sea buckthorn genus (Hippophae) within the family Elaeagnaceae¹. All sea buckthorn species are diploid, with a chromosome count of 2n = 24^2,3,4, and are distributed across temperate regions of Eurasia⁴. These plants are drought tolerant and resilient, and their root systems form nitrogen-fixing nodules that improve the soil⁵. Hippophae species also show robust ecological adaptability and stress resistance, and play a vital role in ecological preservation^6,7. H. neurocarpa is one of the most recently diverged and evolutionarily advanced species within this group^3,8. As a result of Quaternary climatic fluctuations, natural populations of H. neurocarpa are predominantly found in high-altitude regions of the Qinghai-Tibet Plateau⁹. This shrub, endemic to the eastern Qinghai-Tibet Plateau, favors moist environments and is sensitive to variations in precipitation¹⁰. Its cylindrical fruits are rich in bioactive compounds, including vitamins A, C, E, K, and P, making them a valuable resource for green foods and traditional medicine, with considerable research potential^11,12.

Previous studies have extensively investigated the morphological characteristics, ecological traits, and origins of H. neurocarpa. The chloroplast genome of H. neurocarpa has been sequenced, and comparative chloroplast genomics and phylogenetic analyses have been conducted with other species in the Hippophae genus^12,13. However, its genetic and evolutionary relationships remain unclear due to the absence of a complete genome sequence. In this study, we employed multiple sequencing technologies to achieve a high-quality genome assembly of H. neurocarpa. We anticipate that this research will not only elucidate the basic characteristics of the H. neurocarpa genome but also enhance the understanding of its adaptability and medicinal properties at the molecular level. Furthermore, these data will provide a robust foundation for future molecular, ecological, and economic research on the entire Hippophae genus.

Methods

Plant materials and genome sequencing

To examine the genomic features of H. neurocarpa, we collected samples from the Tibet Autonomous Region (N 31°14′10″, E 96°36′14″) and extracted high-quality genomic DNA from the leaves using the cetyl trimethylammonium bromide (CTAB) method¹⁴. DNA libraries were constructed following the protocols of the MGIEasy Universal DNA Library Preparation Kit and submitted to GrandOmics Bioscience Co., Ltd. for high-throughput short-read sequencing on the DNBSEQ-T7RS platform. To ensure data quality, we filtered the raw short-read data with Fastp v0.23.2¹⁵ (fastp -w 8 -n 0 -l 140), obtaining 66.79 Gb of high-quality clean next-generation sequencing (NGS) reads (Table 1). For Oxford Nanopore Technologies (ONT) sequencing, we followed the manufacturer’s guidelines: DNA was extracted with a GrandOmics kit, and long DNA fragments were size-selected using the Blue Pippin and Pippin HT systems. Following damage repair, end repair, and A-tailing, sequencing adapters were ligated, and the library concentration was measured using a Qubit® 3.0 fluorometer. Finally, sequencing on the Nanopore PromethION platform yielded 94.90 Gb of Nanopore long reads (Table 2).
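The two explicit fastp thresholds quoted above can be illustrated with a minimal Python sketch. In fastp, -n sets the maximum number of N bases allowed per read and -l sets the minimum read length, so -n 0 -l 140 discards any read that contains an N or is shorter than 140 bases. The function names and the (name, sequence, quality) tuple layout are hypothetical, and the sketch leaves out everything else fastp does by default (adapter trimming, quality filtering, paired-end handling).

```python
from typing import Iterable, Iterator, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality string); layout is an assumption


def passes_filter(seq: str, max_n: int = 0, min_len: int = 140) -> bool:
    """Mimic only the -n (N base limit) and -l (length required) checks."""
    return len(seq) >= min_len and seq.upper().count("N") <= max_n


def filter_reads(reads: Iterable[Read], max_n: int = 0, min_len: int = 140) -> Iterator[Read]:
    """Yield the reads that survive both thresholds, in their original order."""
    for name, seq, qual in reads:
        if passes_filter(seq, max_n, min_len):
            yield name, seq, qual


if __name__ == "__main__":
    toy = [
        ("ok", "A" * 150, "I" * 150),
        ("has_n", "A" * 149 + "N", "I" * 150),
        ("short", "A" * 139, "I" * 139),
    ]
    print([name for name, _, _ in filter_reads(toy)])
```

Only the first toy read is kept: the second contains an N and the third is one base under the length threshold.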

Table 1 Characteristics of NGS data for genome assembly.

Table 2 Characteristics of ONT raw data for genome assembly.

Hi-C Sequencing

Hi-C is a high-throughput chromosome conformation capture technique that enables genome-wide analysis of the three-dimensional structure of chromosomes¹⁶. Following established protocols, Hi-C samples were prepared from young leaves of H. neurocarpa through formaldehyde cross-linking, cell lysis, chromatin digestion, and biotin labeling. DNA quality was assessed, and only high-quality DNA was carried forward to standard library construction. PCR conditions were optimized before amplification, and a “Hi-C fragment ligation quality control test” was performed on the amplified products before the sequencing library was constructed. Sequencing on the DNBSEQ-T7RS platform generated 73.40 Gb of high-quality Hi-C data (Table 3).

Table 3 Characteristics of Hi-C raw data for genome assembly.

Genome survey and evaluation

We conducted k-mer analysis (k = 21) on the clean NGS data using Jellyfish v2.3.0¹⁷ (jellyfish count -t 22 -C -m 19 -s 1000 -o). We then used GenomeScope v2.0¹⁸ (genomescope.R -i k21.histo -o k21.gs -k 21) for model fitting and visualization, estimating the genome size and heterozygosity of H. neurocarpa at 684 Mb and 1.04%, respectively (Fig. 1).
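The idea behind a k-mer based genome size estimate can be sketched in a few lines of Python: divide the total number of non-error k-mers by the depth of the main coverage peak. This is a deliberately simplified stand-in, not the mixture model that GenomeScope fits, and it does not estimate heterozygosity. The function names, the error cutoff of 5, and the toy histogram are all assumptions made for illustration; the input format (one "depth count" pair per line) follows the output of jellyfish histo.

```python
from typing import Dict, Tuple


def parse_histo(text: str) -> Dict[int, int]:
    """Parse 'depth count' lines, the layout written by `jellyfish histo`."""
    histogram = {}
    for line in text.splitlines():
        if line.strip():
            depth, count = line.split()
            histogram[int(depth)] = int(count)
    return histogram


def estimate_genome_size(histogram: Dict[int, int], min_depth: int = 5) -> Tuple[float, int]:
    """Return (estimated size in bp, peak depth).

    k-mers seen fewer than min_depth times are treated as sequencing errors
    (an assumed cutoff). The peak is the most frequent remaining depth.
    """
    solid = {d: c for d, c in histogram.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(solid, key=lambda d: solid[d])
    total_kmers = sum(d * c for d, c in solid.items())
    return total_kmers / peak_depth, peak_depth


if __name__ == "__main__":
    toy = "1 1000000\n2 50000\n48 100\n49 200\n50 1000\n51 200\n52 100\n"
    print(estimate_genome_size(parse_histo(toy)))
```

In a heterozygous genome the histogram has a second peak at roughly half the main depth, which is one reason a model-based tool is preferable to this shortcut for real data.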

De Novo genome assembly

We integrated Nanopore long reads, MGI short reads, and Hi-C reads to perform a de novo assembly of the H. neurocarpa genome. Initially, NextDenovo (v2.5.0) (https://github.com/Nextomics/NextDenovo) was used to construct an initial genome framework from the Nanopore data. The preliminary assembly was then polished with the clean short-read data using NextPolish (v1.4.1)¹⁹ (nextpolish2 -t 5), which improved assembly accuracy. To eliminate potentially redundant sequences, Purge-dups (v1.2.5)²⁰ (purge_dups -2 -T -c) was employed, yielding a more refined and accurate assembly. Following this, the Hi-C data were validated with HiC-Pro (v3.1.0)²¹ (hicpro -i reads -o out -c), and detailed manual inspection and fine-tuning were conducted using Juicebox assembly tools²², with the high-quality Hippophae salicifolia genome as a reference²³. This process included cluster optimization, sequence reordering, and orientation correction.

Building on this foundation, we used 3D-DNA²⁴ (run-3ddna-pipeline.sh -r -c) to anchor the contigs to chromosomes, resulting in the final chromosome-level assembly. The total genome size reached 682.80 Mb, comprising 38 scaffolds with an N50 length of 62.17 Mb. The 12 main chromosomes encompassed over 98.0% of the entire genome (Table 4). We used BUSCO (v5.4.5)²⁵ (busco -i -o -c 70 -m geno -l) to assess genome quality; the resulting completeness of 97.6% confirms that the assembly is of high quality. In addition, PlantLAI²⁶-based (https://bioinformatics.um6p.ma/PlantLAI/) evaluation of long terminal repeat (LTR) retrotransposons yielded an LTR assembly index (LAI) of 11.61, indicating that the genome assembly is of reference quality. To show the distribution of genome elements, we used ShinyCircos v2.0²⁷ (https://venyao.xyz/shinyCircos/) to generate a Circos plot illustrating the structural features of the genome (Fig. 3). Finally, a whole-genome Hi-C heatmap generated using HiCExplorer²⁸ (hicPlotMatrix --title --matrix --dpi --colorMap --log1p --fontsize 8 --rotationX 30 --outFileName) shows the chromosomal interaction patterns and supports the chromosome-level assembly (Fig. 2).
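The summary statistics quoted above can be reproduced or sanity-checked with a short Python sketch. It shows how an N50 is computed from a list of scaffold lengths, how the anchoring rate follows from the 669.20 Mb anchored out of 682.80 Mb reported in this article, and how an LAI value maps onto the commonly used quality bands (below 10 draft, 10 to 20 reference, 20 and above gold). The function names and the toy scaffold lengths are hypothetical; the real scaffold lengths are in Table 4 and are not reproduced here.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Length of the scaffold at which the cumulative sum, taken from the
    longest scaffold downwards, first reaches half of the total assembly size."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no scaffold lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable for non-empty input")


def anchoring_rate(anchored_mb: float, total_mb: float) -> float:
    """Percentage of the assembly placed on chromosomes, to one decimal."""
    return round(100 * anchored_mb / total_mb, 1)


def lai_class(lai: float) -> str:
    """Quality band for an LTR Assembly Index value."""
    if lai < 10:
        return "draft"
    if lai < 20:
        return "reference"
    return "gold"


if __name__ == "__main__":
    print(n50([10, 8, 6, 4, 2]))          # toy lengths, not the real scaffolds
    print(anchoring_rate(669.20, 682.80))  # figures reported in this article
    print(lai_class(11.61))
```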

Table 4 Characteristics of the Hippophae neurocarpa genome at scaffold level.

Annotation of repetitive sequences

To annotate the repetitive sequences of the H. neurocarpa genome, we used the Extensive de novo TE Annotator (EDTA, v2.1.2)²⁹ (edta --genome --sensitive 1 --threads), which integrates results from multiple prediction tools. The transposable element (TE) library obtained from EDTA was further classified using TEsorter (v1.33)³⁰ (TEsorter -db rexdb-plant -p 56 -pre), specifically to reclassify elements in the “LTR-unknown” category. We then classified these reclassified elements in more depth using DeepTE³¹ (deepTE_domain.py -d -s -i -d). Finally, we merged the results from the three independent TE databases and ran RepeatMasker (v4.1.2)³² (repeatmasker -pa 14 -s -xsmall -lib), which defaults to the RepBase RepeatMasker Edition (RMB, 20181026) version of the repeat sequence database, to identify homologous sequences in the total TE database.

The analysis revealed that repetitive elements comprise 56.27% of the H. neurocarpa genome, with LTRs accounting for 36.26% and terminal inverted repeats (TIRs) accounting for 12.21% (Table 5, Fig. 3). Furthermore, we used MegaLTR³³ (MegaLTR.sh -A 3 -F -G -T -P results -l 100 -L 7000 -d 1000 -D 15000 -S 85 -M 20 -B rexdb -C 20 -V 0.001 -Q 80-80-80 -E rexdb -R 0.000000015 -U 5000 -X 5000 -W 1000000 -N 12 -t 104) to classify LTRs more finely into superfamilies and lineages. The LTR retrotransposons in H. neurocarpa fall into four superfamilies: Ty1-copia, Ty3-gypsy, BARE-2, and TR-GAG. We also identified 13 distinct lineages, including Ale, Tork, Athila, Ivana, Galadriel, Ikeros, SIRE, TAR, CRM, and Reina (Tables 6, 7).

Table 5 Summary of transposable elements in Hippophae neurocarpa genome.

Table 6 Classification of LTR-RT in Hippophae neurocarpa genome.

Table 7 Classification of LTR-RT Clade in Hippophae neurocarpa genome.

In addition, we used MISA (v2.1)³⁴ (misa.pl genome.fa) to identify simple sequence repeat (SSR) loci across the entire genome of Hippophae neurocarpa. The analysis identified 205,053 SSR loci across 12 sequences. All examined sequences contained multiple microsatellite loci, with mononucleotide and dinucleotide repeats predominating at 49.1% and 37.1% of the total SSR loci, respectively. The frequency of SSRs decreased markedly with increasing repeat unit length (3–6 nucleotides). Notably, 20,473 compound microsatellites were detected, representing 10% of the total SSRs, which indicates that a considerable proportion of microsatellites lie close to one another (Table 8).
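How a microsatellite scan of this kind works can be sketched with regular expressions in Python. The sketch below is not MISA and does not reproduce its output; it only finds perfect repeats of 1 to 6 bp units. The minimum repeat counts (10 for mononucleotides, 6 for dinucleotides, 5 for longer units) are assumed here and may differ from the settings used in this study, and the function names are hypothetical. Compound microsatellites are not detected.

```python
import re
from typing import Dict, List, Tuple

# Unit length -> minimum number of repeats (assumed thresholds, see lead-in).
MIN_REPEATS: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(unit: str) -> bool:
    """True if the unit is not itself a repetition of a shorter motif
    (so 'AA' or 'ATAT' are rejected, 'AT' is accepted)."""
    n = len(unit)
    return not any(n % k == 0 and unit == unit[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq: str, min_repeats: Dict[int, int] = MIN_REPEATS) -> List[Tuple[int, int, str, int]]:
    """Return sorted (start, end, unit, repeat_count) for perfect SSRs,
    with 0-based, end-exclusive coordinates."""
    seq = seq.upper()
    hits = []
    for k, min_rep in min_repeats.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        pos = 0
        while True:
            match = pattern.search(seq, pos)
            if match is None:
                break
            unit = match.group(1)
            if is_primitive(unit):
                hits.append((match.start(), match.end(), unit, (match.end() - match.start()) // k))
                pos = match.end()
            else:
                # Step forward by one base so that a real SSR overlapping
                # this rejected match is not skipped.
                pos = match.start() + 1
    return sorted(hits)


if __name__ == "__main__":
    toy = "C" + "A" * 12 + "G" + "CT" * 6 + "T" + "ACG" * 5 + "T"
    for hit in find_ssrs(toy):
        print(hit)
```

The primitive-unit check is what keeps a run of twelve A bases from being reported a second time as a dinucleotide repeat of "AA".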

Table 8 Summary of SSR in Hippophae neurocarpa genome.

Annotation of protein-coding gene structure

For the structural annotation of protein-coding genes, we combined three prediction methods: de novo prediction, homology-based protein alignment, and RNA-seq evidence. We first soft-masked repetitive sequences in the genome using RepeatMasker. We then annotated the masked genome using BRAKER (v3.0.8)³⁵ (braker.pl --genome --bam --prot_seq --species --threads 56 --workingdir --PROTHINT_PATH --TSEBRA_PATH). The prediction results were merged using TSEBRA³⁶ (tsebra.py --cfg --gtf --keep_gtf --hintfiles --out). Finally, we integrated the annotation files using MAKER (v3.01.04)³⁷ (mpirun -n -R --ignore_nfs_tmp -TMP) and EVidenceModeler (v1.1.1)³⁸ to obtain non-redundant gene models, resulting in a GFF3 file that records the locations of genes, coding sequences, proteins, and mRNAs. In total, we predicted 36,844 protein-coding genes, with gene lengths ranging from 155 to 185,811 bp.
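Gene counts and length ranges like those above can be read off a GFF3 file with a few lines of Python. The sketch assumes the standard nine tab-separated GFF3 columns with 1-based, inclusive coordinates, so a gene's length is end − start + 1. The function name and the two toy records (including their IDs and coordinates) are hypothetical and are not taken from the deposited annotation.

```python
from typing import List


def gene_lengths(gff3_text: str) -> List[int]:
    """Collect the lengths of all 'gene' features in GFF3 text."""
    lengths = []
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # keep only well-formed gene rows
        start, end = int(cols[3]), int(cols[4])
        lengths.append(end - start + 1)  # GFF3 coordinates are inclusive
    return lengths


if __name__ == "__main__":
    rows = [
        ["chr1", "toy", "gene", "100", "254", ".", "+", ".", "ID=geneA"],
        ["chr1", "toy", "mRNA", "100", "254", ".", "+", ".", "ID=geneA.t1;Parent=geneA"],
        ["chr2", "toy", "gene", "1000", "2999", ".", "-", ".", "ID=geneB"],
    ]
    text = "##gff-version 3\n" + "\n".join("\t".join(r) for r in rows) + "\n"
    lengths = gene_lengths(text)
    print(len(lengths), min(lengths), max(lengths))
```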

Non-coding region annotation

Non-coding RNAs in H. neurocarpa were identified using Infernal (v1.1.4)³⁹ (infernal-tblout2gff.pl --cmscan --fmt2 --desc) with the Rfam⁴⁰ database and default parameters. This analysis revealed 5,917 non-coding RNAs spanning 817,623 bp, comprising 724 transfer RNAs (53,476 bp), 4,194 small nucleolar RNAs (440,956 bp), 607 ribosomal RNAs (270,544 bp), 84 spliceosomal small nuclear RNAs (11,934 bp), 182 microRNAs (23,063 bp), and 126 other RNA types (17,650 bp). snoRNAs dominate with 4,194 loci (70.9%) and are primarily involved in the modification and processing of rRNAs and tRNAs. The 724 tRNAs and 607 rRNAs support the protein synthesis system, while the 182 miRNAs fine-tune gene expression through post-transcriptional regulation. The 84 spliceosomal RNAs are responsible for precise pre-mRNA splicing, and the functions of the remaining 126 ncRNAs have yet to be elucidated. Together, these ncRNAs play crucial roles in key biological processes including transcriptional regulation, protein synthesis, and RNA processing (Table 9).
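The per-class counts above can be cross-checked against the stated totals with a short Python tally. All numbers are copied from the paragraph; only the dictionary layout and variable names are introduced here for illustration.

```python
# class -> (number of loci, total length in bp), as reported in the text
NCRNA = {
    "tRNA": (724, 53476),
    "snoRNA": (4194, 440956),
    "rRNA": (607, 270544),
    "snRNA": (84, 11934),
    "miRNA": (182, 23063),
    "other": (126, 17650),
}

total_loci = sum(loci for loci, _ in NCRNA.values())
total_bp = sum(bp for _, bp in NCRNA.values())
snorna_share = round(100 * NCRNA["snoRNA"][0] / total_loci, 1)

if __name__ == "__main__":
    print(total_loci, total_bp, snorna_share)
```

The class values sum to the reported 5,917 loci and 817,623 bp, and the snoRNA share comes out at the reported 70.9%.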

Table 9 Classification of non-coding RNA in the Hippophae neurocarpa genome.

Functional annotation

To functionally annotate the predicted genes, we conducted a homology search using BLASTP⁴¹ against multiple public databases accessible through the Baimaike website (https://international.biocloud.net/zh/user/login), with an e-value threshold of 1e-10. The databases included the non-redundant database (NR), Swissprot⁴², TrEMBL⁴³, KOG, Gene Ontology (GO)⁴⁴, KEGG⁴⁵, and COG⁴⁶. The analysis resulted in functional annotations for 89.44% of the genes. The annotation percentages for individual databases were as follows: NR (89.22%), TrEMBL (92.01%), Swissprot (61.98%), KOG (50.90%), KEGG (15.22%), GO (5.43%), and COG (30.96%) (Table 10). Finally, we assessed the annotation using the OMArk⁴⁷ website (https://omark.omabrowser.org/). Among the 10,551 conserved orthologous gene groups (HOGs), 96.21% were present and only 3.79% were missing, indicating a highly complete and reliable annotation (Table 11).

Table 10 Statistical analysis of the functional gene annotations of the Hippophae neurocarpa genome.

Table 11 Annotation assessment of Hippophae neurocarpa.

Furthermore, we conducted systematic GO functional classification, KEGG pathway enrichment analysis, and gene family classification of the annotated genes. In total, 2,001 genes were annotated by GO and 5,606 by KEGG (Table 10). GO terms spanned the cellular component, molecular function, and biological process categories, with biological process terms most prominent (Fig. 4). KEGG analysis showed significant enrichment in Metabolism and Genetic Information Processing pathways (Fig. 5). All functionally annotated genes were classified into 2,071 gene families, with predominant enrichment in families such as RVT_2, UBN2, NAM, and Lipase_GDSL, suggesting active transposable element activity, important intracellular protein functions, and complex metabolic pathways in H. neurocarpa.
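The annotation percentages can be checked against the gene counts with a small Python calculation. The total of 36,844 predicted genes and the 32,952 functionally annotated genes come from the abstract. Reading the counts 2001 and 5606 quoted in this section as the GO and KEGG gene counts is an assumption, although it reproduces the 5.43% and 15.22% listed for those databases. The variable and function names are hypothetical.

```python
TOTAL_GENES = 36844  # predicted protein-coding genes

# Counts quoted in the article; the GO/KEGG assignment is an assumption.
COUNTS = {"any database": 32952, "KEGG": 5606, "GO": 2001}


def percent(count: int, total: int = TOTAL_GENES) -> float:
    """Share of genes as a percentage, rounded to two decimals."""
    return round(100 * count / total, 2)


RATES = {name: percent(count) for name, count in COUNTS.items()}

if __name__ == "__main__":
    for name, rate in RATES.items():
        print(f"{name}: {rate}%")
```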

Data Records

The raw sequencing data are publicly available in the Genome Sequence Archive (GSA) at the National Genomics Data Center (https://ngdc.cncb.ac.cn/gsa) under accession number CRA020687⁴⁸. The genome assembly sequences and annotation files, including Gene Ontology (GO) annotation statistics, KEGG pathway analysis results, and gene family classification statistics, have been deposited in Figshare⁴⁹ and the NCBI GenBank database⁵⁰.

Technical Validation

The completeness of the non-redundant draft genome was evaluated using Benchmarking Universal Single-Copy Orthologs (BUSCO)²⁵ with the embryophyta_odb10 dataset, which consists of 1,614 single-copy genes, under default parameters. Of these genes, 97.6% were recovered as complete, and 89.0% were complete and single-copy (Table 12).

Table 12 Statistics for genome assessment using BUSCO.

Data availability

The raw sequencing data are publicly available in the Genome Sequence Archive (GSA) at the National Genomics Data Center under accession number CRA020687⁴⁸ (https://ngdc.cncb.ac.cn/gsa/browse/CRA020687). The genome assembly sequences and annotation files, including Gene Ontology (GO) annotation statistics, KEGG pathway analysis results, and gene family classification statistics, have been deposited in Figshare⁴⁹ (https://doi.org/10.6084/m9.figshare.27890145.v6) and the NCBI GenBank database⁵⁰ (https://identifiers.org/ncbi/insdc:JBPJAO000000000).

Code availability

No specific code was developed in this work.

References

1. He, S. W. L. T. N. Hippophaë neurocarpa. Acta Phytotax 16(2), 107 (1978).
2. Corriveau, J. L. & Coleman, A. W. Rapid screening method to detect potential biparental inheritance of plastid DNA and results for over 200 angiosperm species. Am. J. Bot. 75, 1443–1458 (1988).
3. Zhao, J., Zhang, Z., Zhou, H., Bai, Z. & Sun, K. The Study on Sea Buckthorn (Genus Hippophae L.) Fruit Reveals Cell Division and Cell Expansion to Promote Morphogenesis. Plants 12, 1005 (2023).
4. Yang, X. et al. Chromosome-level genome assembly of Hippophae rhamnoides variety. Sci. Data 11, 776, https://doi.org/10.1038/s41597-024-03549-w (2024).
5. Zhou, X. et al. Rhizospheric fungi and their link with the nitrogen-fixing Frankia harbored in host plant Hippophae rhamnoides L. J. Basic Microbiol. 57, 1055–1064, https://doi.org/10.1002/jobm.201700312 (2017).
6. Chen, M. et al. Chromosome-level genome assembly of Hippophae gyantsensis. Sci. Data 11, 126, https://doi.org/10.1038/s41597-024-02909-w (2024).
7. Kortesniemi, M., Sinkkonen, J., Yang, B. & Kallio, H. NMR metabolomics demonstrates phenotypic plasticity of sea buckthorn (Hippophaë rhamnoides) berries with respect to growth conditions in Finland and Canada. Food Chem. 219, 139–147, https://doi.org/10.1016/j.foodchem.2016.09.125 (2017).
8. Li, N. et al. Genetic diversity and genetic structure of the northern margin populations of Hippophae neurocarpa. Guihaia 36, 557–563, https://doi.org/10.11931/guihaia.gxzw201511006 (2016).
9. Meng, L.-H., Yang, H., Wu, G. & Wang, Y.-J. Phylogeography of Hippophae neurocarpa (Elaeagnaceae) inferred from the chloroplast DNA trnL-F sequence variation. J. Syst. Evol. 46, 32–40 (2007).
10. Kou, Y.-X., Wu, Y.-X., Jia, D.-R., Li, Z.-H. & Wang, Y.-J. Range expansion, genetic differentiation, and phenotypic adaption of Hippophaë neurocarpa (Elaeagnaceae) on the Qinghai–Tibet Plateau. J. Syst. Evol. 52, 303–312, https://doi.org/10.1111/jse.12063 (2014).
11. Meng, D. et al. An insight into the association between volatile and other active components of sea buckthorn berries in barren soils of western China. Food and Humanity 2, 100293, https://doi.org/10.1016/j.foohum.2024.100293 (2024).
12. Zhou, W., Hu, N., Dong, Q., Wang, H. & Wang, Y. Complete chloroplast genome sequences of Hippophae neurocarpa. Mitochondrial DNA Part B-Resources 4, 2048–2049, https://doi.org/10.1080/23802359.2019.1617077 (2019).
13. Li, X., Liu, L. E., Li, H., Li, J. & Zhou, W. Chloroplast Genomes and Phylogenetic Analysis of Hippophae tibetana and Hippophae neurocarpa. Genomics and Applied Biology 43, 767–787 (2024).
14. Murray, M. G. & Thompson, W. F. Rapid isolation of high molecular weight plant DNA. Nucleic Acids Res. 8, 4321–4325, https://doi.org/10.1093/nar/8.19.4321 (1980).
15. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890, https://doi.org/10.1093/bioinformatics/bty560 (2018).
16. Belton, J. M. et al. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods 58, 268–276, https://doi.org/10.1016/j.ymeth.2012.05.001 (2012).
17. Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770, https://doi.org/10.1093/bioinformatics/btr011 (2011).
18. Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204, https://doi.org/10.1093/bioinformatics/btx153 (2017).
19. Hu, J., Fan, J., Sun, Z. & Liu, S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics 36, 2253–2255, https://doi.org/10.1093/bioinformatics/btz891 (2019).
20. Guan, D. et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics 36, 2896–2898, https://doi.org/10.1093/bioinformatics/btaa025 (2020).
21. Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259, https://doi.org/10.1186/s13059-015-0831-x (2015).
22. Durand, N. C. et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst. 3, 95–98, https://doi.org/10.1016/j.cels.2016.07.002 (2016).
23. Chen, M. et al. The genome of Hippophae salicifolia provides new insights into the sexual differentiation of sea buckthorn. GigaScience 14, https://doi.org/10.1093/gigascience/giaf046 (2025).
24. Zhang, Y., Xiong, Y. & Xiao, Y. 3dDNA: A Computational Method of Building DNA 3D Structures. Molecules 27, https://doi.org/10.3390/molecules27185936 (2022).
25. Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212, https://doi.org/10.1093/bioinformatics/btv351 (2015).
26. Mokhtar, M. M., Abd-Elhalim, H. M. & El Allali, A. A large-scale assessment of the quality of plant genome assemblies using the LTR assembly index. AoB PLANTS 15, https://doi.org/10.1093/aobpla/plad015 (2023).
27. Wang, Y. et al. shinyCircos-V2.0: Leveraging the creation of Circos plot with enhanced usability and advanced features. iMeta 2, https://doi.org/10.1002/imt2.109 (2023).
28. Wolff, J. et al. Galaxy HiCExplorer 3: a web server for reproducible Hi-C, capture Hi-C and single-cell Hi-C data analysis, quality control and visualization. Nucleic Acids Res. 48, W177–W184, https://doi.org/10.1093/nar/gkaa220 (2020).
29. Ou, S. et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 20, 275, https://doi.org/10.1186/s13059-019-1905-y (2019).
30. Zhang, R.-G. et al. TEsorter: An accurate and fast method to classify LTR-retrotransposons in plant genomes. Hortic. Res. 9, https://doi.org/10.1093/hr/uhac017 (2022).
31. Yan, H., Bombarely, A. & Li, S. DeepTE: a computational method for de novo classification of transposons with convolutional neural network. Bioinformatics 36, 4269–4275, https://doi.org/10.1093/bioinformatics/btaa519 (2020).
32. Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21, i351–i358, https://doi.org/10.1093/bioinformatics/bti10 (2005).
33. Mokhtar, M. M. & El Allali, A. MegaLTR: a web server and standalone pipeline for detecting and annotating LTR-retrotransposons in plant genomes. Front. Plant Sci. 14, 1237426, https://doi.org/10.3389/fpls.2023.1237426 (2023).
34. Beier, S., Thiel, T., Münch, T., Scholz, U. & Mascher, M. MISA-web: a web server for microsatellite prediction. Bioinformatics 33, 2583–2585, https://doi.org/10.1093/bioinformatics/btx198 (2017).
35. Brůna, T., Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom. Bioinform. 3, https://doi.org/10.1093/nargab/lqaa108 (2021).
36. Gabriel, L., Hoff, K. J., Brůna, T., Borodovsky, M. & Stanke, M. TSEBRA: transcript selector for BRAKER. BMC Bioinform. 22, 566, https://doi.org/10.1186/s12859-021-04482-0 (2021).
37. Campbell, M. S., Holt, C., Moore, B. & Yandell, M. Genome Annotation and Curation Using MAKER and MAKER-P. Curr. Protoc. Bioinformatics 48, 4.11.11–14.11.39, https://doi.org/10.1002/0471250953.bi0411s48 (2014).
38. Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7, https://doi.org/10.1186/gb-2008-9-1-r7 (2008).
39. Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935, https://doi.org/10.1093/bioinformatics/btt509 (2013).
40. Griffiths-Jones, S. et al. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 33, D121–D124, https://doi.org/10.1093/nar/gki081 (2005).
41. Jacob, A., Lancaster, J., Buhler, J., Harris, B. & Chamberlain, R. D. Mercury BLASTP: Accelerating Protein Sequence Alignment. TRETS 1, 9, https://doi.org/10.1145/1371579.1371581 (2008).
42. Boeckmann, B. et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 31, 365–370, https://doi.org/10.1093/nar/gkg095 (2003).
43. Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28, 45–48, https://doi.org/10.1093/nar/28.1.45 (2000).
44. Ashburner, M. et al. Gene Ontology: tool for the unification of biology. Nat. Genet. 25, 25–29, https://doi.org/10.1038/75556 (2000).
45. Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462, https://doi.org/10.1093/nar/gkv1070 (2015).
46. Tatusov, R. L., Galperin, M. Y., Natale, D. A. & Koonin, E. V. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 28, 33–36, https://doi.org/10.1093/nar/28.1.33 (2000).
47. Nevers, Y. et al. Quality assessment of gene repertoire annotations with OMArk. Nat. Biotechnol. 43, 124–133 (2025).
48. CNCB-NGDC Genome Sequence Archive. https://ngdc.cncb.ac.cn/gsa/browse/CRA020687 (2025).
49. Tian, Z. data for Hippophae neurocarpa genome data. figshare. Dataset. https://doi.org/10.6084/m9.figshare.27890145.v6 (2024).
50. NCBI GenBank https://identifiers.org/ncbi/insdc:JBPJAO000000000 (2025).

Acknowledgements

This research was supported by the Second Tibetan Plateau Scientific Expedition and Research (STEP) program (2019QZKK0502), Key Research and Development Program for Bureau of Science and Technology of Xizang Autonomous Region (XZ202401ZY0006), Key Research, Development and Transformation Program for Shigatse City Bureau of Science and Technology (RKZ2023ZY-03).

Author information

Authors and Affiliations

Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Mengla, Yunnan, 666303, China
Ziyue Tian, Xingyu Yang, Danni Yang, Shihai Yang & Yunqiang Yang
Yunnan International Joint Laboratory for the Conservation and Utilization of Tropical Timber Tree Species, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Mengla, Yunnan, 666303, China
Ziyue Tian, Xingyu Yang, Danni Yang & Yunqiang Yang
University of Chinese Academy of Sciences, Beijing, 100049, China
Ziyue Tian & Xingyu Yang
Institute of Tropical and Subtropical Economic Crops, Yunnan Academy of Agricultural Sciences, Kunming, Yunnan, 678000, China
Jian Huang
Xizang Ecological Harmony Seed Industry Co, Ltd, Shigatse, Xizang, 857000, China
Shihai Yang

Contributions

Yunqiang Yang conceived the study and supervised the project. Ziyue Tian wrote the manuscript and participated in the data analysis. Xingyu Yang, Danni Yang, Jian Huang, Shihai Yang, and Yunqiang Yang collected the samples, prepared the figures, and uploaded the data. All authors have read, revised, and approved the final manuscript for submission.

Corresponding authors

Correspondence to Jian Huang, Shihai Yang or Yunqiang Yang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Tian, Z., Yang, X., Yang, D. et al. Chromosome-level genome assembly of Hippophae neurocarpa. Sci Data 12, 1743 (2025). https://doi.org/10.1038/s41597-025-06022-4

Received: 01 January 2025
Accepted: 23 September 2025
Published: 04 November 2025
Version of record: 04 November 2025
DOI: https://doi.org/10.1038/s41597-025-06022-4