Chromosome-level genome assembly of Hippophae salicifolia

Zeng, Zhefei; Mao, Chunmin; Zhang, Jiayin; Tian, Zhongqiong; Xu, Min; Li, Wei; Wang, Junwei; Zhang, Wenju; Qiong, La

doi:10.1038/s41597-025-05844-6

Download PDF

Data Descriptor
Open access
Published: 28 August 2025

Chromosome-level genome assembly of Hippophae salicifolia

Zhefei Zeng^1,2,
Chunmin Mao³,
Jiayin Zhang⁴,
Zhongqiong Tian^1,2,
Min Xu⁵,
Wei Li ORCID: orcid.org/0000-0003-4310-2544^1,2,
Junwei Wang^1,2,
Wenju Zhang⁴ &
…
La Qiong^1,2

Scientific Data volume 12, Article number: 1503 (2025) Cite this article

2962 Accesses
Metrics details

Subjects

Abstract

Hippophae salicifolia, a dioecious small tree species endemic to the Himalayan region, holds great potential in both ecological conservation and industrial applications. In this study, we employed PacBio HiFi long reads, Illumina short reads, and Hi-C technology to construct a high-quality, chromosome-level reference genome. The assembled genome is approximately 1.11 Gb in size, with a scaffold N50 of 95.29 Mb, and 99.94% of the sequences were successfully anchored to 12 pseudo-chromosomes. A total of 42,547 protein-coding genes were predicted, and approximately 85% of these genes obtained functional annotations. Repetitive elements constituted about 45.25% of the genome, with Long Terminal Repeat (LTR) being the most abundant (32.54%). BUSCO analysis indicated that both the assembly and annotation are highly complete. This high-quality genomic resource provides a valuable foundation for investigating sex determination mechanisms, adaptive evolution, and genomic diversity in H. salicifolia and related species, as well as for advancing genetic improvement, resource conservation, and utilization efforts.

Chromosome-level genome assembly of Hippophae gyantsensis

Article Open access 25 January 2024

Chromosome-level genome assembly and annotation of Zicaitai (Brassica rapa var. purpuraria)

Article Open access 03 November 2023

Chromosome-level genome assembly of the threatened ornamental plant Hibiscus yunnanensis

Article Open access 25 March 2025

Background & Summary

The genus Hippophae (family Elaeagnaceae) is broadly distributed across the Qinghai–Tibet Plateau and its adjacent regions. Members of this genus exhibit remarkable adaptability to harsh environments, including drought, cold, salinity, and nutrient-poor soils^1,2. Furthermore, Hippophae species can form nitrogen-fixing root nodules with Frankia bacteria^2,3, thereby improving soil conditions. Through clonal propagation via root suckers, they can establish stable vegetation communities and effectively mitigate soil erosion. Moreover, their fruits provide a vital food source for various wild animals, thereby contributing to the maintenance of ecosystem diversity. Beyond their ecological significance, these species hold substantial potential for applications in the food, pharmaceutical, and cosmetic industries. Their fruits and leaves are particularly rich in vitamin C, flavonoids, essential fatty acids, and various secondary metabolites, which exhibit potential antioxidant, anti-inflammatory, antibacterial, and cardiovascular protective activities^1,2,4.

Hippophae salicifolia D. Don, a dioecious deciduous tree named for its willow-like leaves⁵, is primarily distributed along riverbanks, slopes, and shrublands on the southern slopes of the Himalayas, including southeastern Tibet in China, as well as regions in Nepal, Bhutan, and northern India⁶. Previous studies have indicated that Hippophae species possess an XY sex determination system (2n = 24)^7,8,9,10. Elucidating the mechanisms underlying sex determination and dioecy in Hippophae is crucial for gaining deeper insight into its adaptive evolution in the unique environment of the Qinghai–Tibet Plateau. However, research on sex-related differences in dioecious Hippophae species remains notably limited. Most previous studies have focused on chromosome-level genomes^{11,12,13,14,15,16}, chloroplast genomes^17,18,19, mitochondrial genomes^20,21, sex-specific molecular markers^7,9,10,22,23, fruit nutrient content^24,25, and transcriptomes^26,27. With advancements in third-generation sequencing technologies (e.g., PacBio HiFi), which offer higher accuracy and throughput at reduced costs^28,29, it is now feasible to generate high-quality reference genomes for non-model species such as H. salicifolia, thereby facilitating comprehensive investigations into sex determination and adaptive evolution.

To date, several Hippophae species have been successfully assembled at the genome level, including H. rhamnoides^11,12,13, H. tibetana^15,16, and H. gyantsensis¹⁴. These genomic resources have substantially advanced our understanding of gene diversity, phylogenetics, adaptive evolution, and sex determination mechanisms within the genus. Nonetheless, genomic research on H. salicifolia remains scarce, hindering systematic insights into its genetic background, evolutionary status, and sex determination processes.

In the present study, we employed PacBio HiFi long-read sequencing, Illumina short-read sequencing, and Hi-C data to generate a high-quality, chromosome-level reference genome of H. salicifolia. The assembled genome is approximately 1.11 Gb in size, with a scaffold N50 of 95.29 Mb (Table 1), and 99.4% of the sequences were successfully anchored to 12 putative chromosomes. A total of 42,547 genes were predicted, with repetitive elements accounting for approximately 45.25% of the genome. This is the first high-quality reference genome for H. salicifolia, thereby providing crucial data for elucidating sex determination mechanisms, genome evolution, and adaptive evolution within the genus. Moreover, it lays a robust scientific foundation for future research in genetic improvement, resource conservation, and sustainable utilization of Hippophae species.

Table 1 Statistics of H. salicifolia genome assembly and annotation.

Full size table

Methods

Sample collection and sequencing

In June 2024, samples of a female individual (XX) of H. salicifolia were identified and collected by La Qiong and Junwei Wang in the Lebu Valley Scenic Area of Cuona County, Shannan City, Tibet (27°55′25.98″ N, 91°48′49.62″ E) (Fig. 1a). A voucher specimen (No. LQ20240672) was deposited in the Herbarium of the College of Ecology and Environment, Xizang University. To ensure the acquisition of high-quality DNA and RNA, fresh young leaves and bark tissues were collected and promptly flash-frozen in liquid nitrogen on-site, followed by storage at –80 °C in the laboratory. Genomic DNA was extracted using a modified CTAB method³⁰, and its purity and concentration were measured using a NanoDrop 2000 (Thermo Fisher Scientific, USA) and Qubit 2.0 (Invitrogen, USA). Integrity was assessed by 1% agarose gel electrophoresis. Only DNA samples that satisfied the quality criteria were selected for subsequent library construction and sequencing.

High-quality genomic DNA was sheared into fragments of approximately 350 bp to construct Illumina sequencing libraries used for an initial genome survey. Library quality was verified using an Agilent 2100 Bioanalyzer and quantitative PCR. Paired-end sequencing (150 bp) was conducted on the Illumina NovaSeq. 6000 platform, yielding approximately 62.78 Gb of high-quality short-read data. These short reads were primarily utilized to estimate genome size, GC content, and heterozygosity (Table 2). To obtain a high-accuracy de novo genome assembly, PacBio Sequel II was employed for HiFi sequencing. DNA was sheared into large fragments of 15–20 kb, and the SMRTbell Express Template Prep Kit 2.0 was applied to construct HiFi libraries. Approximately 31.80 Gb of HiFi data were obtained, with an average read length of 15.94 kb and a sequencing depth of about 28.69× (Table 2).

Table 2 DNA sequencing statistics.

Full size table

To achieve a chromosome-level assembly, high-throughput chromosome conformation capture (Hi-C) technology was employed. Fresh leaf tissues were fixed in formaldehyde and digested with the restriction enzyme MboI, then subjected to ligation and purification to construct Hi-C libraries. Libraries that passed quality checks underwent sequencing on the Illumina NovaSeq. 6000 platform (150 bp paired-end), generating approximately 102.43 Gb of Hi-C data (Table 2).

In addition, to support gene structure annotation and functional analysis, total RNA was extracted from leaves and bark of H. salicifolia to construct transcriptome libraries. The resulting libraries were sequenced on the Illumina NovaSeq. 6000 platform using 150 bp paired-end reads, generating approximately 8 Gb of transcriptome data per sample (Table 3). These transcriptomic resources provide valuable data for subsequent gene prediction and functional annotation.

Table 3 RNA sequencing statistics.

Full size table

Genome size estimation and survey analysis

Using high-quality Illumina short-read data, a k-mer analysis was performed to estimate the genome size, heterozygosity, and proportion of repetitive sequences in H. salicifolia. First, raw reads were processed with fastp v0.20.0³¹ to remove adapter contamination and low-quality reads. Next, Jellyfish v2.2.10³² was utilized to count the distribution of 21-mer frequencies, and GenomeScope v2.0³³ was employed to estimate genome characteristics. These analyses indicated that the H. salicifolia genome size is approximately 1.05 Gb, with a heterozygosity of about 0.49% (Fig. 1b).

Genome assembly

The H. salicifolia genome was de novo assembled using high-quality HiFi long-read data with hifiasm v0.19.9³⁴. Hifiasm was executed with default parameters to fully exploit the high accuracy and long fragment lengths of the HiFi data, resulting in a highly continuous initial assembly. Subsequently, Illumina short-read data were integrated for assembly polishing. Specifically, BWA-MEM v2.21³⁵ was applied to map quality-controlled Illumina reads back to the initial assembly, followed by two rounds of error correction with Pilon v1.24³⁶ (“–fix all”) to address potential base errors and fill small gaps, thereby enhancing the assembly’s accuracy and completeness.

To achieve a chromosome-level genome assembly, Hi-C data were integrated to further optimize the polished assembly. First, Juicer v2.0³⁷ was employed to preprocess and map the Hi-C reads to the corrected genome. Then, 3D-DNA³⁸ was executed with default parameters to scaffold the assembly, followed by manual inspection and adjustment in Juicebox v2.15.07³⁹ to ensure accuracy and integrity at the chromosome scale. Based on previously published H. rhamnoides genome information¹⁰, the chromosomes were designated as Chr01 through Chr12. The final chromosome-level assembly has a total length of 1.11 Gb, with a scaffold N50 of 95.29 Mb, a contig N50 of 38.57 Mb, and an L50 of 5, containing a total of 210 gaps and exhibiting a GC content of 29.76% (Table 1). Approximately 99.94% of the assembled sequences were successfully anchored onto 12 putative chromosomes (Table 4). The Hi-C interaction matrix revealed clear intra-chromosomal signals along the diagonal (Fig. 2), indicating a high level of continuity and accuracy, thus providing a robust foundation for subsequent gene annotation and functional analyses.

Table 4 Summary of the 12 pseudochromosomes.

Full size table

Repeat annotation

To comprehensively characterize the repetitive elements in the H. salicifolia genome, de novo repeat prediction was conducted using RepeatModeler v2.0.1⁴⁰ (http://www.repeatmasker.org/RepeatModeler/) to establish a species-specific repeat library. This customized library was subsequently merged with the RepBase database⁴¹ (v20181026, http://www.girinst.org/repbase). The combined library served as input to RepeatMasker v4.1.0⁴² (http://www.repeatmasker.org) for the identification and masking of repetitive sequences. RepeatMasker identifies and annotates repetitive elements by comparing sequences to a curated transposable element (TE) library, which defines the classification based on sequence similarity and structural features, including major types such as LTR, LINE, SINE, and DNA transposons.

The annotation indicated that the total length of repetitive sequences was approximately 501.59 Mb, representing 45.25% of the entire genome (Table 5). Among these, LTR retrotransposons comprised the highest proportion (32.54%, ~360.72 Mb), while DNA transposons (4.01%, ~44.50 Mb) and LINE elements (0.45%, ~4.97 Mb) were also present (Table 5). This comprehensive repetitive element profile provides a valuable foundation for investigating genome evolution, structural variation, and gene regulatory mechanisms.

Table 5 Summary of the repetitive sequences in H. salicifolia genome assembly.

Full size table

Protein-coding gene prediction and functional annotation

To obtain a high-quality set of protein-coding genes, three complementary strategies were integrated: homology-based prediction, transcriptome-based prediction, and ab initio prediction. First, for homology-based prediction, protein sequences from sequenced Hippophae species (e.g., H. rhamnoides, H. tibetana, and H. gyantsensis) were aligned to the H. salicifolia genome using GeMoMa v1.9⁴³ to identify potential orthologous genes. Second, for transcriptome-based prediction, RNA-seq data from leaf and bark tissues were assembled into transcripts using stringtie v2.1.3⁴⁴, and coding regions were predicted using TransDecoder v5.1.0 (https://github.com/TransDecoder/TransDecoder). Lastly, ab initio predictions were performed with AUGUSTUS v3.3.3⁴⁵ (https://github.com/Gaius-Augustus/Augustus), GlimmerHMM v3.0.4⁴⁶, and GeneMark-ES v4.38⁴⁷. Species-specific parameters were applied to improve prediction accuracy. All these predictions were integrated using EVidenceModeler (EVM) v1.1.1⁴⁸ to generate a high-confidence gene set, and PASA v2.5.2⁴⁹ (https://github.com/PASApipeline/PASApipeline) was subsequently employed for further refinement, adding UTR information and identifying novel transcripts. In total, 42,547 protein-coding genes were predicted, with an average gene length of 3,663 bp and an average of 4.72 exons per gene (Table 6). TBtools v2.126⁵⁰ was utilized to visualize gene density, GC content, Gypsy and Copia element densities, and chromosomal synteny of the 12 chromosomes (Fig. 3).

Table 6 Summary of predicted protein-coding genes in H. salicifolia genome assembly.

Full size table

To gain comprehensive insights into gene functions, functional annotation was performed by conducting BLASTp⁵¹ searches (E-value ≤ 1e–5) against multiple databases, including the Kyoto Encyclopedia of Genes and Genomes (KEGG)⁵², euKaryotic Orthologous Groups (KOG), the National Center for Biotechnology Information Non-Redundant database (NCBI-NR, https://www.ncbi.nlm.nih.gov/), Gene Ontology (GO)⁵³, Clusters of Orthologous Groups (COG)⁵⁴, and SwissProt⁵⁵. Additionally, HMMER v3.2.1 was employed, along with the Pfam database, to predict protein domains. Approximately 85.06% of the genes (36,189 genes) were functionally annotated in at least one database, thus providing a solid basis for subsequent functional genomics research (Table 7).

Table 7 Annotation results of functional genes in H. salicifolia.

Full size table

Non-Coding RNA annotation

To identify non-coding RNAs (ncRNAs) in the H. salicifolia genome, multiple tools and databases were integrated. A total of 748 tRNA genes were detected using tRNAscan-SE v2.0.0 (http://lowelab.ucsc.edu/tRNAscan-SE/)⁵⁶ (Table 8). Using the Rfam v14.2 database⁵⁷ and Infernal v1.1.3⁵⁸, 5,924 rRNA genes, 196 miRNA genes, and 5,950 snRNA genes were identified. These ncRNA annotations will facilitate studies on transcriptional regulation and the functional mechanisms underlying the H. salicifolia genome.

Table 8 Annotation of Non-Coding RNAs in H. salicifolia.

Full size table

Genome-wide synteny analysis

The Python version of MCScan implemented in JCVI v1.2.7.52⁵⁹ (default parameters) was employed to examine genomic synteny between H. salicifolia and its close relatives. The resulting synteny maps support the assessment of structural accuracy and completeness of the assembled genome through comparative analysis (Fig. 4).

Data Records

The raw sequencing data have been deposited in the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI) under accession number SRP548553⁶⁰, including PacBio HiFi reads, Illumina PE150 reads, Hi-C reads, and RNA-seq data from various tissues. The final chromosome-scale assembled genome has been deposited in the NCBI GenBank under accession number JBJWFA000000000⁶¹. In addition, the genome assembly and annotation files have been stored in the Figshare database⁶².

Technical Validation

Several strategies were employed to assess the quality of the genome. The completeness of the non-redundant draft genome was evaluated using Benchmarking Universal Single-Copy Orthologs (BUSCO v5.4.5)⁶³ with the embryophyta odb10 dataset (1,614 single-copy genes) under default parameters. At the assembly level, BUSCO analysis showed that 98.8% of BUSCO genes were complete (89.8% single-copy, 9.0% duplicated), with only 0.4% fragmented and 0.8% missing (Table 9). At the annotation level, 99.3% of BUSCO genes in the predicted protein-coding gene set were complete (70.6% single-copy, 28.7% duplicated), with only 0.1% fragmented and 0.6% missing. These findings indicate that both the assembly and annotation are highly complete, meeting the standards of a high-quality reference genome.

Table 9 BUSCO assessment results for H. salicifolia genome assembly and annotation.

Full size table

To validate the accuracy and completeness of the assembly, both Illumina short reads and PacBio HiFi long reads were mapped back to the assembled reference genome. Using BWA-MEM v2.21³², 97.82% of Illumina reads aligned successfully. With Minimap2 v2.17⁶⁴, the mapping rate of PacBio HiFi reads reached 98.53%. These high mapping rates reinforce the accuracy and completeness of the assembled genome.

We further assessed the base-level consensus accuracy of the assembled genome using Merqury v1.3⁶⁵, based on 21-mers derived from Illumina short reads. Merqury calculated the consensus quality value (QV) for each chromosome individually. The QV scores ranged from 36.22 to 38.80 across all 12 chromosomes, corresponding to base-level error rates between 1.32 × 10⁻⁴ and 2.39 × 10⁻⁴, or approximately one error per 750 to 1,500 bases (Table 10). These results demonstrate high base-level accuracy in the genome assembly.

Table 10 Chromosome-level QV and estimated error rates of the H. salicifolia genome assembly.

Full size table

In addition, a k-mer spectra-cn plot was generated (Fig. 5) to visualize the multiplicity distribution of 21-mers from Illumina reads compared to their presence in the assembly. The plot exhibited a dominant peak centered around multiplicity 50, with very few low-frequency or missing k-mers. This indicates that the vast majority of high-quality k-mers from the raw reads were accurately incorporated into the assembly, further supporting its high completeness and low redundancy.

Code availability

No custom scripts were utilized in this study. All data processing commands and pipelines were executed in strict accordance with the manuals and protocols provided by each bioinformatics software tool.

References

Ciesarová, Z. et al. Why is sea buckthorn (Hippophae rhamnoides L.) so exceptional? A review. Food Res. Int. 133, 109170 (2020).
Article PubMed Google Scholar
Nybom, H., Ruan, C. & Rumpunen, K. The Systematics, Reproductive Biology, Biochemistry, and Breeding of Sea Buckthorn—A Review. Genes 14, 2120 (2023).
Article CAS PubMed PubMed Central Google Scholar
Huss‐Danell, K., Roelofsen, W., Akkermans, A. D. & Meijer, P. Carbon metabolism of Frankia spp. in root nodules of Alnus glutinosa and Hippophaë rhamnoides. Physiol. Plant. 54, 461–466 (1982).
Article Google Scholar
Suryakumar, G. & Gupta, A. Medicinal and therapeutic potential of Sea buckthorn (Hippophae rhamnoides L.). J. Ethnopharmacol. 138, 268–278 (2011).
Article PubMed Google Scholar
Pant, M., Lal, A. & Rani, A. Hippophae salicifolia D. Don–A plant with multifarious benefits. Int. J. Pharm. Pharm. Sci. 6, 37–40 (2014).
Google Scholar
Jia, D. R. & Bartish, I. V. Climatic changes and orogeneses in the Late Miocene of Eurasia: The main triggers of an expansion at a continental scale? Front. Plant Sci. 9, 1400 (2018).
Article PubMed PubMed Central Google Scholar
Zeng, Z. et al. Development and validation of sex-linked molecular markers for rapid and accurate identification of male and female Hippophae tibetana plants. Sci. Rep. 14, 19243 (2024).
Article CAS PubMed PubMed Central Google Scholar
Puterova, J. et al. Satellite DNA and transposable elements in seabuckthorn (Hippophae rhamnoides)., a dioecious plant with small Y and large X chromosomes. Genome Biol. Evol. 9, 197–212 (2017).
CAS PubMed PubMed Central Google Scholar
Zeng, Z. et al. Development of sex-specific molecular markers for early sex identification in Hippophae gyantsensis based on whole-genome resequencing. BMC Plant Biol. 24, 1187 (2024).
Article CAS PubMed PubMed Central Google Scholar
Zeng, Z. et al. Development and application of sex-specific indel markers for Hippophae salicifolia based on third-generation sequencing. BMC Plant Biol. 25, 692 (2025).
Article CAS PubMed PubMed Central Google Scholar
Yu, L. et al. Genome sequence and population genomics provide insights into chromosomal evolution and phytochemical innovation of Hippophae rhamnoides. Plant Biotechnol. J. 20, 1257–1273 (2022).
Article CAS PubMed PubMed Central Google Scholar
Wu, Z. et al. Genome of Hippophae rhamnoides provides insights into a conserved molecular mechanism in actinorhizal and rhizobial symbioses. New Phytol. 235, 276–291 (2022).
Article CAS PubMed Google Scholar
Yang, X. et al. Chromosome-level genome assembly of Hippophae rhamnoides variety. Sci. Data. 11, 776 (2024).
Article CAS PubMed PubMed Central Google Scholar
Chen, M. et al. Chromosome-level genome assembly of Hippophae gyantsensis. Sci. Data. 11, 126 (2024).
Article CAS PubMed PubMed Central Google Scholar
Wang, R. et al. How to survive in the world’s third poplar: Insights from the genome of the highest altitude woody plant, Hippophae tibetana (Elaeagnaceae). Front. Plant Sci. 13, 1051587 (2022).
Article PubMed PubMed Central Google Scholar
Zhang, G. et al. Chromosome-level genome assembly of Hippophae tibetana provides insights into high-altitude adaptation and flavonoid biosynthesis. BMC biol. 22, 82 (2024).
Article PubMed PubMed Central Google Scholar
Zhou, W., Dong, Q., Wang, H. & Hu, N. Characterization of the complete chloroplast genome of Hippophae tibetana. Mitochondrial DNA B 5, 593–594 (2020).
Article Google Scholar
Wang, L., Wang, J., He, C., Zhang, J. & Zeng, Y. Characterization and comparison of chloroplast genomes from two sympatric Hippophae species (Elaeagnaceae). J. For. Res. 32, 307–318 (2021).
Article CAS Google Scholar
Zhang, Q., Li, J., Yu, Y. & Xu, H. A new map of the chloroplast genome of Hippophae based on inter-and intraspecific variation analyses of 13 accessions belonging to eight Hippophae species. Braz. J. Bot. 46, 367–382 (2023).
Article Google Scholar
Zeng, Z. et al. Complete mitochondrial genome of Hippophae tibetana: insights into adaptation to high-altitude environments. Front. Plant Sci. 15, 1449606 (2024).
Article PubMed PubMed Central Google Scholar
Zeng, Z. et al. Assembly and Comparative Analysis of the Complete Mitochondrial Genome of Hippophae salicifolia. Biology 14, 448 (2025).
Article CAS PubMed PubMed Central Google Scholar
Sharma, A., Zinta, G., Rana, S. & Shirko, P. Molecular identification of sex in Hippophae rhamnoides L. using isozyme and RAPD markers. Forestry Studies in China 12, 62–66 (2010).
Article CAS Google Scholar
Zhou, W. et al. Molecular sex identification in dioecious Hippophae rhamnoides L. via RAPD and SCAR markers. Molecules 23, 1048 (2018).
Article PubMed PubMed Central Google Scholar
Wang, Z. et al. Phytochemistry, health benefits, and food applications of sea buckthorn (Hippophae rhamnoides L.): A comprehensive review. Front. Nutr. 9, 1036295 (2022).
Article PubMed PubMed Central Google Scholar
Ma, Q. G. et al. Hippophae rhamnoides L.: A comprehensive review on the botany, traditional uses, phytonutrients, health benefits, quality markers, and applications. J. Agric. Food. Chem. 71, 4769–4788 (2023).
Article CAS PubMed Google Scholar
Liang, J., Zhang, G., Song, Y., He, C. & Zhang, J. Targeted metabolome and transcriptome analyses reveal the pigmentation mechanism of Hippophae (sea buckthorn). fruit. Foods 11, 3278 (2022).
Article CAS PubMed PubMed Central Google Scholar
Cui, M., Hu, P., Wang, T., Tao, J. & Zong, S. Differential transcriptome analysis reveals genes related to cold tolerance in seabuckthorn carpenter moth, Eogystia hippophaecolus. PLoS One 12, e0187105 (2017).
Article PubMed PubMed Central Google Scholar
Metzker, M. L. Sequencing technologies—the next generation. Nat. Rev. Genet. 11, 31–46 (2014).
Article Google Scholar
Li, H. & Durbin, R. Genome assembly in the telomere-to-telomere era. Nat. Rev. Genet. 25, 658–670 (2024).
Article CAS PubMed Google Scholar
Sahu, S. K., Thangaraj, M. & Kathiresan, K. DNA extraction protocol for plants with high levels of secondary metabolites and polysaccharides without using liquid nitrogen and phenol. Int. Sch. Res. Not. 2012, 1–6 (2012).
Google Scholar
Chen, S. Ultrafast one‐pass FASTQ data preprocessing, quality control, and deduplication using fastp. Imeta 2, e107 (2023).
Article CAS PubMed PubMed Central Google Scholar
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
Article PubMed PubMed Central Google Scholar
Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204 (2017).
Article CAS PubMed PubMed Central Google Scholar
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
Article CAS PubMed PubMed Central Google Scholar
Jung, Y. & Han, D. BWA-MEME: BWA-MEM emulated with a machine learning approach. Bioinformatics 38, 2404–2413 (2022).
Article CAS PubMed Google Scholar
Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PloS One 9, e112963 (2014).
Article ADS PubMed PubMed Central Google Scholar
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
Article CAS PubMed PubMed Central Google Scholar
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
Article ADS CAS PubMed PubMed Central Google Scholar
Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 3, 99–101 (2016).
Article CAS PubMed PubMed Central Google Scholar
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. PNAS 117, 9451–9457 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mobile DNA 6, 1–6 (2015).
Article Google Scholar
Tarailo‐Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Current protocols in bioinformatics 25, 4–10 (2009).
Article Google Scholar
Keilwagen, J., Hartung, F., Paulini, M., Twardziok, S. O. & Grau, J. Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi. BMC Bioinf. 19, 1–12 (2018).
Article Google Scholar
Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 20, 1–13 (2019).
Article Google Scholar
Stanke, M., Schöffmann, O., Morgenstern, B. & Waack, S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinf. 7, 1–11 (2006).
Article Google Scholar
Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004).
Article CAS PubMed Google Scholar
Lomsadze, A., Ter-Hovhannisyan, V., Chernoff, Y. O. & Borodovsky, M. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res. 33, 6494–6506 (2005).
Article CAS PubMed PubMed Central Google Scholar
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, 1–22 (2008).
Article Google Scholar
Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).
Article CAS PubMed PubMed Central Google Scholar
Chen, C. et al. TBtools-II: A “one for all, all for one” bioinformatics platform for biological big-data mining. Mol Plant. 16, 1733–1742 (2023).
Article CAS PubMed Google Scholar
Chen, Y., Ye, W., Zhang, Y. & Xu, Y. High speed BLASTN: an accelerated MegaBLAST search tool. Nucleic Acids Res. 43, 7762–7768 (2015).
Article CAS PubMed PubMed Central Google Scholar
Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
Article CAS PubMed PubMed Central Google Scholar
Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
Article CAS PubMed PubMed Central Google Scholar
Tatusov, R. L., Koonin, E. V. & Lipman, D. J. A genomic perspective on protein families. Science 278, 631–637 (2000).
Article ADS Google Scholar
Boeckmann, B. et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 31, 365–370 (2003).
Article CAS PubMed PubMed Central Google Scholar
Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
Article CAS PubMed PubMed Central Google Scholar
Griffiths-Jones, S. et al. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 33, D121–D124 (2005).
Article CAS PubMed Google Scholar
Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935 (2013).
Article CAS PubMed PubMed Central Google Scholar
Tang, H. et al. Synteny and collinearity in plant genomes. Science 320, 486–488 (2008).
Article ADS CAS PubMed Google Scholar
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP548553 (2024).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_046201785.1 (2024).
Zeng, Z. et al. Chromosome-level genome assembly of Hippophae salicifolia. figshare. https://doi.org/10.6084/m9.figshare.27987659.v3 (2024).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Article PubMed Google Scholar
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Article CAS PubMed PubMed Central Google Scholar
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

This work was supported by the Science and Technology Projects of Xizang Autonomous Region, China (Project Nos. XZ202402ZD0005, XZ202402ZY0023, and XZ202402JX0003).

Author information

Authors and Affiliations

Key Laboratory of Biodiversity and Environment on the Qinghai-Tibetan Plateau, Ministry of Education, School of Ecology and Environment, Xizang University, Lasa, China
Zhefei Zeng, Zhongqiong Tian, Wei Li, Junwei Wang & La Qiong
Yani Observation and Research Station for Wetland Ecosystem of the Tibet (Xizang) Autonomous Region, Xizang University, Linzhi, China
Zhefei Zeng, Zhongqiong Tian, Wei Li, Junwei Wang & La Qiong
Yunnan Key Laboratory for Integrative Conservation of Plant Species With Extremely Small Populations, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
Chunmin Mao
Ministry of Education Key Laboratory of Biodiversity Sciences and Ecological Engineering and State Key Laboratory of Genetic Engineering, Institute of Biodiversity Sciences and Institute of Plant Biology, School of Life Sciences, Fudan University, Shanghai, China
Jiayin Zhang & Wenju Zhang
Forestry Survey and Planning Research Institute of Tibet Autonomous Region, Lasa, China
Min Xu

Authors

Zhefei Zeng
View author publications
Search author on:PubMed Google Scholar
Chunmin Mao
View author publications
Search author on:PubMed Google Scholar
Jiayin Zhang
View author publications
Search author on:PubMed Google Scholar
Zhongqiong Tian
View author publications
Search author on:PubMed Google Scholar
Min Xu
View author publications
Search author on:PubMed Google Scholar
Wei Li
View author publications
Search author on:PubMed Google Scholar
Junwei Wang
View author publications
Search author on:PubMed Google Scholar
Wenju Zhang
View author publications
Search author on:PubMed Google Scholar
La Qiong
View author publications
Search author on:PubMed Google Scholar

Contributions

L.Q., W.L., W.Z., J.W. and Z.Z. conceived and designed the study. C.M., J.Z., M.X., and Z.T. prepared the materials, conducted the experiments, and contributed to the data collection. Z.Z. performed the data analysis, prepared the initial draft of the manuscript, and managed the figures and tables. J.W., W.Z., and L.Q. reviewed and revised the manuscript for intellectual content. All authors participated in improving the manuscript and unanimously approved the final version for publication.

Corresponding authors

Correspondence to Junwei Wang, Wenju Zhang or La Qiong.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Zeng, Z., Mao, C., Zhang, J. et al. Chromosome-level genome assembly of Hippophae salicifolia. Sci Data 12, 1503 (2025). https://doi.org/10.1038/s41597-025-05844-6

Download citation

Received: 25 December 2024
Accepted: 18 August 2025
Published: 28 August 2025
Version of record: 28 August 2025
DOI: https://doi.org/10.1038/s41597-025-05844-6