Chromosome-level genome assembly of tetraploid Chinese cherry (Prunus pseudocerasus)

Zhang, Wei; Wang, Jing; Wang, Xuncheng; Duan, Xuwei; Peng, Junbo; Zhang, Xiaoming; Xing, Qikai; Zhang, Kaichun; Yan, Jiye

doi:10.1038/s41597-025-04462-6

Data Descriptor
Open access
Published: 22 January 2025

Wei Zhang ORCID: orcid.org/0000-0002-6814-5412¹,
Jing Wang²,
Xuncheng Wang¹,
Xuwei Duan²,
Junbo Peng¹,
Xiaoming Zhang²,
Qikai Xing¹,
Kaichun Zhang² &
…
Jiye Yan¹

Scientific Data volume 12, Article number: 136 (2025)

Abstract

Chinese cherry belongs to the family Rosaceae, genus Prunus, and has high nutritional and economic value. ‘Duiying’ is a Chinese cherry variety local to Beijing that outperforms sweet cherry in disease resistance. However, the disease resistance resources of ‘Duiying’ have not been fully exploited, partly because a high-quality genome has been lacking. In this study, we report a high-quality chromosome-scale genome assembly for Chinese cherry ‘Duiying’, produced by combining PacBio HiFi, Bionano and Hi-C sequencing data. The assembled genome has a size of 1035.19 Mb, with a scaffold N50 of 28.99 Mb, and 978.61 Mb (94.54%) assembled into 32 pseudochromosomes. A total of 547.16 Mb (52.86%) of the sequence was identified as repetitive, and 114,451 protein-coding genes were annotated. Moreover, a total of 1635 microRNA (miRNA), 6637 transfer RNA (tRNA), 38,258 ribosomal RNA (rRNA), and 169 small nuclear RNA (snRNA) genes were identified. The genome assembly presented here provides a valuable genomic resource for understanding the genetic and molecular basis of Chinese cherry.

Background & Summary

Chinese cherry (Prunus pseudocerasus (Lindl.)) belongs to the family Rosaceae, genus Prunus, and subgenus Cerasus. It originates from Southwest China and is distributed in the temperate zone of the Northern Hemisphere¹. Chinese cherry has been cultivated for more than 3000 years². Most Chinese cherries are tetraploid, with a main karyotype formula of 2n = 4x = 32 = 28 m + 4sm³. Karyotype analysis and rDNA distribution have shown that the Chinese cherry is more likely an autopolyploid than an allopolyploid⁴, a conclusion further supported by phylogenetic and comparative genomic analyses⁵.

Chinese cherry fruit contains rich nutritional ingredients and trace elements, such as proteins, carotene, Vitamin C, saccharides, iron, and phosphorus¹. Among 60 representative accessions, the soluble solids content ranged from 10.97% to 34.00%; about 70% of these accessions had a high yield ability³. In addition, the flowers, leaves, roots, bark, and core of Chinese cherry are of high medicinal value. Chinese cherries have a good affinity, developed roots, and soil salinity tolerance; thus, they have also been used as the root stock for sweet cherry⁶.

‘Duiying’, a Chinese cherry variety local to Beijing, is distributed in valleys and on slopes. It performs better than sweet cherry because of its resistance to leaf spot and crown gall diseases and its adaptability to the Chinese soil and climate⁷. By crossing ‘Duiying’ with sweet cherry and sour cherry, a series of sweet cherry rootstocks have been released that are resistant to crown gall and leaf spot diseases⁸. ‘Duiying’ therefore has great potential as a source of resistance genes for sweet or sour cherry. However, the genomic features that underlie these important biological characteristics remain unclear. Several draft or high-quality genomes have been assembled and released for sweet cherry varieties (2n = 2x = 16)^{9,10,11,12,13,14}, while no high-quality reference genome is available for Chinese cherry ‘Duiying’ to date.

To understand the genetic and molecular basis of Chinese cherry and to promote genomics-assisted breeding studies in cherry and Prunus crops, we present a high-quality chromosome-level genome assembly for Chinese cherry ‘Duiying’. The high-quality genome of ‘Duiying’ was obtained using Illumina, Pacific Biosciences (PacBio) high-fidelity (HiFi), and Bionano sequencing combined with 10 × genomic and high-throughput/resolution chromosome conformation capture (Hi-C) technologies. The genome sequence of P. pseudocerasus ‘Duiying’ reported here will be a valuable resource for genetic studies and breeding programs on cherry plants, supporting both studies of genome evolution and functional genomics in Rosaceae/Prunus and the use of its genes for valuable traits.

Methods

Sampling and whole genome sequencing

Leaf samples of ‘Duiying’ were collected from the cherry orchard of the Institute of Pomology and Forestry, Beijing Academy of Agriculture and Forestry Sciences, in Tongzhou District, Beijing. Genomic DNA of ‘Duiying’ was extracted from leaf samples using a plant genomic DNA extraction kit (TIANGEN, Beijing, China). The quality and quantity of the extracted DNA were assessed using NanoDrop 2000 (Thermo Fisher Scientific, Boston, MA, USA).

For Illumina paired-end sequencing, 1.5 μg of genomic DNA was used to construct a 350-bp DNA library using an Illumina TruSeq® Nano DNA library preparation kit (Illumina, San Diego, CA, USA). The library was subsequently sequenced on the Illumina NovaSeq 6000 platform (Illumina, San Diego, CA, USA), generating 42.76 Gb of raw sequences. Fastp software (v0.23.4)¹⁵ was employed to filter out low-quality paired reads. The remaining 42.68 Gb (99.81%) of high-quality data, with 97.44% and 93.88% of the bases having a quality score of ≥Q20 and ≥Q30, respectively, was used for genome survey and assessment.
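The Q20 and Q30 percentages quoted for the filtered reads are the share of bases whose Phred quality score reaches the threshold. The following is a minimal Python sketch of that calculation, assuming Phred+33-encoded quality strings (the usual encoding for current Illumina FASTQ files). The function name and the toy input are illustrative only and are not part of the study's pipeline, which relied on fastp.

```python
def quality_fractions(quality_strings, thresholds=(20, 30), offset=33):
    """Return {threshold: fraction of bases with Phred score >= threshold}."""
    total = 0
    passed = {t: 0 for t in thresholds}
    for qual in quality_strings:
        for ch in qual:
            score = ord(ch) - offset  # Phred+33: 'I' -> 40, '5' -> 20, '+' -> 10
            total += 1
            for t in thresholds:
                if score >= t:
                    passed[t] += 1
    if total == 0:
        return {t: 0.0 for t in thresholds}
    return {t: passed[t] / total for t in thresholds}


# Toy example (hypothetical data): four bases at Q40, Q30, Q20 and Q10.
fractions = quality_fractions(["I?5+"])
```

In the toy input, three of the four bases reach Q20 and two reach Q30.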

For long-read sequencing, a 40-kb SMRTbell library was constructed following the PacBio protocol. PacBio polymerase reads were obtained using the PacBio Sequel II System (PacBio, Menlo Park, CA, USA) in circular consensus sequencing (CCS) mode. After the adapter sequences were removed from the raw polymerase reads, we derived subreads, with the parameter set to ‘Filtering subreads by minimum length = 50’. We then used ccs software (https://github.com/PacificBiosciences/ccs) to generate HiFi reads, using the parameters ‘min-passes = 3 and min-rq = 0.99’. This process yielded 39.21 Gb of HiFi data, with a read N50 of 15,530 bp, which was then used for genome assembly (Table 1).
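The N50 statistic reported for the HiFi data, and later for contigs and scaffolds, is the length L such that sequences of length L or longer together contain at least half of all bases. A short Python sketch of this standard definition follows; the function name and the toy lengths are illustrative.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least 50% of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


# Toy example: total = 20 bases, and 6 + 5 = 11 >= 10, so the N50 is 5.
example = n50([2, 3, 4, 5, 6])
```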

Table 1 Summary of sequencing data for Chinese cherry ‘Duiying’ (Prunus pseudocerasus).

To generate Bionano optical mapping data, the Bionano official extraction kit¹⁶ was initially used to isolate long fragment molecules exceeding 150 Kb in length from high-quality DNA. Then, a single-enzyme cutting technique was applied with the DLE-1 (CTTAAG) endonuclease for digestion. Following standard Bionano protocols, the DNA molecules were labeled and subsequently imaged using the Bionano Irys system (Bionano Genomics, San Diego, CA, USA). The raw imaging data were transformed into BNX files, with the basic labeling and DNA length information converted via AutoDetect in the Bionano Solve package (v3.5.1) (https://bionanogenomics.com/support/software-downloads/). Following filtration based on molecule length and label density, we successfully produced optical mapping data for ‘Duiying’. We generated 584.546 Gb of data, with an average label density of 22.64 per 100 Kb and an N50 value of 366.4 Kb (Table 1).

Hi-C libraries were constructed using leaf cells from ‘Duiying’. The process started with cell fixation using formaldehyde, followed by cell lysis. The cross-linked DNA was then digested with the DpnII enzyme. The resulting sticky ends were biotinylated and proximity ligated to form chimeric junctions. We then enriched DNA fragments of 300–500 bp using a physical shearing process. These chimeric fragments, which reflect the original long-distance physical interactions within the cross-linked DNA, were converted into paired-end sequencing libraries. The libraries were then sequenced on the Illumina NovaSeq platform (Illumina, San Diego, CA, USA), resulting in 145 million read pairs. To ensure data quality, we employed fastp software¹⁵ to filter out low-quality reads from the raw sequencing data. After removing duplicate reads, we obtained 127 million read pairs for assembling the chromosome-level genome.

Transcriptome sequencing and analysis

Total RNA was extracted from three tissues (leaf, stem, and root) using an RNA extraction kit (QIAGEN China (Shanghai) Co., Ltd., Shanghai, China). High-quality cDNA libraries were prepared using the TruSeq Stranded mRNA Sample Preparation Kit and sequenced on the NovaSeq 6000 platform by Novogene (Beijing, China). Quality control was performed using fastp software¹⁵. An average of 6.94 Gb of high-quality RNA-seq data per tissue was used as transcript evidence for gene structure annotation of the ‘Duiying’ genome (Table 1).

Genome survey

Before genome assembly, we conducted a genome survey using k-mer spectrum analysis. Specifically, we used Jellyfish (v2.3.0)¹⁸ to count k-mer frequencies in the high-quality paired-end reads, with k set to 17. We removed low-frequency k-mers (frequency below 3), which mostly arise from sequencing errors. The genome size was calculated by dividing the total number of retained k-mers by the k-mer coverage depth, and the shape of the k-mer frequency distribution was used to assess the ploidy of the genome.

The k-mer frequency distribution displayed three distinct peaks (Fig. 1), suggesting that the ‘Duiying’ genome is an autotetraploid. Our analysis identified 30.08 billion k-mers, of which 30.02 billion (98.05%) were categorized as high frequency (≥3). The primary peak in the k-mer frequency distribution was observed at a depth of 27×. As a result, the genome size was estimated to be approximately 1118.42 Mb (Table 2).
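The size estimate rests on a simple relation: genome size ≈ (number of retained k-mers) / (k-mer depth at the main peak). The sketch below illustrates this in Python under stated assumptions: k-mers are counted on the forward strand only (Jellyfish can also count canonical k-mers), the tallest peak of the filtered histogram is taken as the primary peak (which may not hold for every polyploid profile), and all function names are illustrative rather than taken from the study.

```python
from collections import Counter


def count_kmers(reads, k=17):
    """Count every k-mer (forward strand only) in an iterable of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def estimate_genome_size(kmer_counts, min_freq=3):
    """Return (estimated size in bp, peak depth) from a k-mer count table."""
    histogram = Counter(kmer_counts.values())  # depth -> number of distinct k-mers
    kept = {depth: n for depth, n in histogram.items() if depth >= min_freq}
    if not kept:
        raise ValueError("no k-mers at or above min_freq")
    total_kmers = sum(depth * n for depth, n in kept.items())
    peak_depth = max(kept, key=kept.get)  # assumption: tallest peak = primary peak
    return total_kmers / peak_depth, peak_depth


# Rounded figures from the text: 30.02 billion retained k-mers, main peak at 27x.
approx_size_mb = 30.02e9 / 27 / 1e6
```

With these rounded inputs the quotient is about 1112 Mb, close to the reported estimate of 1118.42 Mb; the small gap presumably reflects rounding of the published depth and k-mer totals.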

Table 2 Genome survey of Chinese cherry ‘Duiying’.

In addition, we aligned the high-quality paired-end reads of ‘Duiying’ to the genome sequence of a closely related diploid species, Prunus avium ‘Tieton’ (GCA_014155035.1), using the BWA-MEM algorithm (v0.7.17-r1188)¹⁹. Of the ‘Duiying’ reads, 82.15% mapped, covering 95.78% of the P. avium genome (Table 2), which supports the conclusion that the ‘Duiying’ genome is an autotetraploid.

Genome assembly of Chinese cherry ‘Duiying’

PacBio HiFi reads were assembled into initial contigs using hifiasm (v0.19.5-r587)²⁰ with default parameters. This process yielded a 1013.46 Mb assembly for Chinese cherry ‘Duiying’, with a contig N50 value of 4.18 Mb (Table 3).

We then conducted hybrid scaffolding with the Bionano optical maps. The Bionano data were mapped to the initial contigs using RefAligner in the Bionano Solve software package (v3.5.1), and the alignment results were visualized using IrysView within the same package. We combined the genome maps with the initial contigs to generate hybrid scaffold genome maps using Bionano Solve (v3.5.1), with the parameters set to ‘-B 2 -N 2’. We obtained a scaffold-level assembly with a genome size of 1023.26 Mb and a scaffold N50 value of 11.68 Mb (Table 3).

Pseudochromosome construction was then performed to obtain the final ‘Duiying’ assembly. The single-end mode of Bowtie2 (v2.4.1)²¹ was used to map the Hi-C reads onto the scaffold-level assembly. After discarding invalid self-ligated and unligated fragments among the uniquely mapped pairs using the HiCUP pipeline (version 0.8.0)²², 91,274,501 interaction pairs were used to calculate the linkage frequency among all scaffolds via the agglomerative hierarchical clustering algorithm implemented in ALLHiC (v0.9.8)²³ (Table 1). We manually corrected placement and orientation errors that were apparent from the chromatin interaction patterns. As a result, we produced a final assembly for ‘Duiying’ with a genome size of 1035.19 Mb and a scaffold N50 value of 28.99 Mb. A total of 978.61 Mb (94.54%) of the assembled sequence was anchored onto 32 pseudochromosomes (Tables 3, 4; Fig. 2). The pseudochromosomes fell into eight groups based on their sequence similarity, indicating that our assembly effectively distinguished the sequences of the four haplotypes in the ‘Duiying’ genome (Fig. 2).

Synteny analysis showed that the four haplotype sequences were highly syntenic, with a synteny rate exceeding 85%, which is considerably higher than the synteny between the ‘Duiying’ genome and its closely related species, P. avium ‘Tieton’ (68.15%) (Fig. 3).
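To illustrate the idea behind grouping scaffolds by Hi-C linkage frequency, the following Python sketch performs a plain average-linkage agglomerative clustering on a table of pairwise link counts. It is a toy illustration only and is not ALLHiC's implementation: the input format, the function name, and the absence of any normalization by scaffold length or restriction-site count are all simplifying assumptions.

```python
def cluster_scaffolds(names, links, n_groups):
    """Average-linkage agglomerative clustering of scaffolds.

    links maps frozenset({a, b}) -> number of Hi-C pairs joining a and b
    (hypothetical input format). Clusters are merged until n_groups remain.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Mean link count over all scaffold pairs spanning the two clusters.
        total = sum(links.get(frozenset((a, b)), 0) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_groups:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = linkage(clusters[i], clusters[j])
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the most strongly linked pair
        del clusters[j]
    return [sorted(c) for c in clusters]


# Toy example (hypothetical counts): s1/s2 and s3/s4 are strongly linked.
toy_links = {
    frozenset(("s1", "s2")): 90,
    frozenset(("s3", "s4")): 80,
    frozenset(("s1", "s3")): 5,
    frozenset(("s2", "s4")): 3,
}
groups = cluster_scaffolds(["s1", "s2", "s3", "s4"], toy_links, n_groups=2)
```

In the toy data the two strongly linked pairs end up in separate groups, mirroring how scaffolds from the same chromosome share far more Hi-C contacts than scaffolds from different chromosomes.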

Table 3 Assembly summary of Chinese cherry ‘Duiying’ in different assembly steps.

Table 4 Chromosome length of the assembled genome of Chinese cherry ‘Duiying’.

Genome assessment

We evaluated the genome assembly quality from two perspectives: completeness and accuracy. For assembly completeness, Benchmarking Universal Single-Copy Orthologs (BUSCO; version 5.4.2)²⁴ was used to search the assembled genome for the 1614 conserved genes in embryophyta_odb10, and the mapping ratio and coverage depth were calculated after realigning the Illumina read pairs to the assembled genome using BWA software¹⁹. For assembly accuracy, we called homozygous SNPs from the realignment results; because the reads and the assembly derive from the same cultivar, such sites represent single-base errors in the assembly.

Genome structure annotation for Chinese cherry ‘Duiying’

Repetitive sequences

We used both homology searching and ab initio prediction to annotate repetitive sequences within the ‘Duiying’ genome. For ab initio prediction, we ran four transposable element (TE) prediction programs (LTR_FINDER v1.0.7²⁵, PILER v3.3.0¹⁷, RepeatScout v1.0.5²⁶, and RepeatModeler v1.0.8²⁷) to build a candidate de novo repeat library for the ‘Duiying’ genome. All programs were run with default parameters. The de novo libraries and the Repbase database were then used to annotate repetitive sequences in the ‘Duiying’ assembly with RepeatMasker (v4.0.5)³³. For homology searching, we used RepeatProteinMask (v4.0.5) with default parameters to predict TEs. We then merged these results, identifying 547.16 Mb (52.86%) of the ‘Duiying’ assembly as repetitive sequences (Table 5). Among these, long terminal repeat (LTR) sequences were the most abundant, accounting for 46.46% of the whole genome sequence.
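When annotations from several repeat finders are combined, overlapping calls must be counted once to obtain a non-redundant repeat length. The Python sketch below shows one common way to do this, by taking the union of intervals on a sequence; it is a generic illustration under the assumption of half-open (start, end) coordinates and is not the merging procedure documented by the authors. The final line checks the reported percentage against the reported totals.

```python
def merged_length(intervals):
    """Total bases covered by (start, end) half-open intervals on one sequence."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the previous merged block.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered


# Toy example: two overlapping calls (0-10, 5-15) and one separate call (20-30).
toy_bases = merged_length([(0, 10), (5, 15), (20, 30)])

# Reported totals: 547.16 Mb of repeats in a 1035.19 Mb assembly.
repeat_percent = round(547.16 / 1035.19 * 100, 2)
```

The toy intervals cover 25 bases rather than the 30 obtained by naive summation, and the reported totals reproduce the stated 52.86%.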

Table 5 Summary of the repetitive sequences in the ‘Duiying’ genome.

Protein-coding genes

We used homology-based, de novo, and transcriptome-based approaches to predict protein-coding genes within the ‘Duiying’ genome.

For homology-based gene prediction, the protein sequences from eight Prunus genomes, namely P. avium ‘Bigstar’ (GCA_013416215.1)¹⁰, P. avium ‘Tieton’¹¹, P. persica²⁸, P. mume²⁹, P. yedoensis³⁰, P. armeniaca³¹, P. salicina³², and P. armeniaca³³, were aligned against the ‘Duiying’ genome using TBLASTN (version 2.2.29+) with an e-value cut-off of 1e−5³⁴. All remaining BLAST hits were concatenated using Solar software (version 0.9.6). We extracted the corresponding genomic regions, including 1000 bp upstream and downstream of each candidate gene, and predicted the precise gene structure using wise2 (v2.4.1)³⁵. The resulting predictions were designated the ‘Homology set’.

For transcriptome-based prediction, RNA-seq data were assembled into transcript sequences using Trinity (v2.1.1)³⁶. We aligned the transcript sequences against the ‘Duiying’ genome using the Program to Assemble Spliced Alignment (PASA)³⁷, in which effective alignments were clustered based on their genome mapping location and assembled into gene structures. The gene models created by PASA were labeled the ‘PASA Trinity set’. RNA-seq reads were also mapped directly to the ‘Duiying’ genome using TopHat (v2.0.13)³⁸, and the mapped reads were assembled into gene models (Cufflinks-set) using Cufflinks (v2.1.1)³⁹.

For de novo gene prediction, we employed Augustus (v2.5.5)⁴⁰, GeneID (v1.4)⁴¹, GENSCAN (v1.0)⁴², GlimmerHMM (v3.0.1)⁴³, and SNAP (version 2013-11-29)⁴⁴ to predict genes in the repeat-masked genome. The parameters of Augustus, SNAP, and GlimmerHMM were trained with gene models from the PASA Trinity set.

All gene models from these sets were integrated using EVidenceModeler (v1.1.1), with evidence weights ranked as follows: PASA-T-set > Homology-set = Cufflinks-set > Augustus > GeneID = SNAP = GlimmerHMM = GENSCAN. In addition, we filtered out genes that were less than 50 amino acids in length, supported only by ab initio evidence, and with an expression value of less than 1. As a result, 114,451 protein-coding genes were obtained in the ‘Duiying’ genome (Table 6). The length distribution of each element type in the gene structures annotated for ‘Duiying’ was similar to that in other species of the genus Prunus (Fig. 4), supporting the accuracy of the ‘Duiying’ gene structure annotation.
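The final filtering step can be sketched as a simple predicate over gene models. The Python example below is a hedged illustration: the `GeneModel` class, its field names, and the evidence labels are hypothetical, the expression unit is not stated in the text, and the three criteria are read here as jointly required (a gene is dropped only if it is short, has only ab initio support, and is weakly expressed). The text could also be read as three independent filters, in which case the `and` operators would become `or`.

```python
from dataclasses import dataclass

# Labels for the ab initio predictors named in the text (illustrative spelling).
AB_INITIO = frozenset({"augustus", "geneid", "genscan", "glimmerhmm", "snap"})


@dataclass
class GeneModel:
    gene_id: str
    protein_length: int   # amino acids
    evidence: frozenset   # e.g. frozenset({"augustus", "homology", "pasa"})
    expression: float     # expression value; unit not specified in the text


def is_low_confidence(gene, min_aa=50, min_expr=1.0):
    """True if the gene meets all three removal criteria (assumed conjunction)."""
    only_ab_initio = bool(gene.evidence) and gene.evidence <= AB_INITIO
    return (gene.protein_length < min_aa
            and only_ab_initio
            and gene.expression < min_expr)


def filter_genes(genes):
    """Keep every gene model that is not flagged as low confidence."""
    return [g for g in genes if not is_low_confidence(g)]


# Toy example (hypothetical gene models).
toy_genes = [
    GeneModel("g1", 40, frozenset({"augustus"}), 0.2),              # dropped
    GeneModel("g2", 40, frozenset({"augustus", "homology"}), 0.2),  # kept
    GeneModel("g3", 300, frozenset({"snap"}), 0.0),                 # kept
]
kept_ids = [g.gene_id for g in filter_genes(toy_genes)]
```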

Table 6 Summary of gene structure in the ‘Duiying’ genome.

We annotated the functions of protein-coding genes in the ‘Duiying’ genome by homology searches against the SwissProt⁴⁵, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway⁴⁶, Non-Redundant Protein Sequence (NR, from NCBI), and InterPro databases. Pfam domain and Gene Ontology (GO) information was obtained from the InterPro database using the InterProScan tool⁴⁷, which predicts function from conserved protein domains and functional sites. For the other databases, we used BLASTP with an e-value cut-off of 1e−4³⁴. Consequently, 99.24% of the protein-coding genes were supported by functional databases (Table 7).
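Applying an e-value cut-off and keeping the best hit per gene can be sketched as follows. This Python example assumes BLAST's 12-column tabular output (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the text does not state which output format was used, and the function name and toy records are illustrative.

```python
def best_hits(lines, max_evalue=1e-4):
    """Return {query: (subject, evalue, bitscore)} for the top-scoring hit per query.

    Hits with an e-value above max_evalue are ignored.
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Toy example (hypothetical hits): g1 has two significant hits, g2 has none.
toy_hits = [
    "g1\tP1\t90.0\t100\t5\t0\t1\t100\t1\t100\t1e-30\t200",
    "g1\tP2\t80.0\t100\t5\t0\t1\t100\t1\t100\t1e-10\t90",
    "g2\tP3\t30.0\t50\t5\t0\t1\t50\t1\t50\t0.5\t20",
]
annotated = best_hits(toy_hits)
annotated_fraction = len(annotated) / 2  # two genes in the toy set
```

The share of genes with at least one retained hit corresponds to the kind of annotation rate summarized for each database.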

Table 7 Gene function annotation of the Chinese cherry ‘Duiying’.

Noncoding RNA genes

We predicted the gene structures of noncoding RNAs in the ‘Duiying’ genome. tRNAs were predicted using tRNAscan-SE (v1.3.1)⁴⁸. Ribosomal RNA (rRNA) sequences were predicted by searching against the invertebrate rRNA database using BLAST, with an E-value cut-off of 1e−10⁴⁹. Small nuclear and nucleolar RNAs, as well as miRNAs, were annotated using Infernal (v1.1rc4) based on the Rfam database⁸. As a result, we identified 1635 microRNA (miRNA), 6637 transfer RNA (tRNA), 38,258 ribosomal RNA (rRNA), and 169 small nuclear RNA (snRNA) genes (Table 8).

Table 8 Summary of noncoding RNA genes.

Data Records

The raw data (Illumina reads, PacBio HiFi reads, and Hi-C sequencing reads) used for genome assembly were deposited in the Sequence Read Archive (SRA) at the National Center for Biotechnology Information (NCBI)⁵⁰. The RNA-seq data were deposited in the SRA at NCBI under accession numbers SRR29660545⁵¹ and SRR29660546⁵². The assembled genome was deposited in the DDBJ/ENA/GenBank databases under accession number JBFBPF000000000⁵³, and the genome annotation files are available in the figshare repository⁵⁴.

Technical Validation

Assembly assessment of Chinese cherry ‘Duiying’

The genome analyses indicated that Chinese cherry is an autotetraploid (Figs. 1, 2), consistent with previous karyotype research on Chinese cherry chromosomes⁴. The assembled ‘Duiying’ genome showed high completeness: 98.52% of the Illumina paired-end reads mapped to the assembly, covering 99.82% of the genome. In addition, it recovered 99.4% of the 1614 conserved Embryophyta genes in the embryophyta_odb10 BUSCO database²⁴ (Table 9). The assembly also showed high accuracy, with a single-base error rate of 9.08 × 10⁻⁸, corresponding to only about 9 erroneous sites per 100 Mb of sequence.
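The conversion from the per-base error rate to errors per 100 Mb is a single multiplication, shown below as a small Python check. The genome-wide figure on the last line is a derived illustration that assumes the rate applies uniformly across the whole 1035.19 Mb assembly; it is not a number reported in the text.

```python
def errors_per_window(error_rate, window_bp=100_000_000):
    """Expected number of erroneous bases in a window of window_bp bases."""
    return error_rate * window_bp


# Reported single-base error rate.
per_100mb = errors_per_window(9.08e-8)

# Assumption: the same rate holds over the full 1035.19 Mb assembly.
genome_wide = errors_per_window(9.08e-8, window_bp=1035.19e6)
```

The first value is about 9.08 sites per 100 Mb, matching the figure in the text; under the stated assumption the second comes to roughly 94 sites across the assembly.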

Table 9 Assembly assessment for the genome of Chinese cherry ‘Duiying’.

Code availability

No custom scripts or code were used in this study. Software versions and parameters are given in the Methods section.

References

1. Yu, D. & Li, C. in Flora of China Vol. 38 (Science Press (Beijing), 1986).
2. Luo, G. Approach upon history of cultivation of apricot and Chinese cherry. Ancient and modern agriculture 2, 38–46 (2013).
3. Wang, Y. et al. Ploidy level of Chinese cherry (Cerasus pseudocerasus Lindl.) and comparative study on karyotypes with four Cerasus species. Sci Hortic-Amsterdam 232, 46–51, https://doi.org/10.1016/j.scienta.2017.12.065 (2018).
4. Li, X. Study on chromosomal homology of polyploid Chinese cherry (Cerasus pseudocerasus). (Sichuan Agricultural University, 2019).
5. Jiu, S. et al. Haplotype-resolved genome assembly for tetraploid Chinese cherry (Prunus pseudocerasus) offers insights into fruit firmness. Hortic Res 11, uhae142, https://doi.org/10.1093/hr/uhae142 (2024).
6. Zhang, Q. & Gu, D. Genetic relationships among 10 Prunus rootstock species from China, based on simple sequence repeat markers. J Amer Soc Hort Sci 141, 520–526 (2016).
7. Zhang, X. et al. Identification of cherry crown gall by hydroponics. China Fruits 51–52 (2005).
8. Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935, https://doi.org/10.1093/bioinformatics/btt509 (2013).
9. Shirasawa, K. et al. The genome sequence of sweet cherry (Prunus avium) for use in genomics-assisted breeding. DNA Res 24, 499–508, https://doi.org/10.1093/dnares/dsx020 (2017).
10. Pinosio, S. et al. A draft genome of sweet cherry (Prunus avium L.) reveals genome‐wide and local effects of domestication. Plant J 103, 1420–1432, https://doi.org/10.1111/tpj.14809 (2020).
11. Wang, J. et al. Chromosome-scale genome assembly of sweet cherry (Prunus avium L.) cv. Tieton obtained using long-read and Hi-C sequencing. Hortic Res 7, 122, https://doi.org/10.1038/s41438-020-00343-8 (2020).
12. Xanthopoulou, A. et al. Whole genome re-sequencing of sweet cherry (Prunus avium L.) yields insights into genomic diversity of a fruit species. Hortic Res 7, 60, https://doi.org/10.1038/s41438-020-0281-9 (2020).
13. Wang, J. et al. A de novo assembly of the sweet cherry (Prunus avium cv. Tieton) genome using linked-read sequencing technology. PeerJ 8, e9114, https://doi.org/10.7717/peerj.9114 (2020).
14. Sharpe, R. M. et al. Draft genome data of Prunus avium cv ‘Stella’. Data Brief 45, 108611, https://doi.org/10.1016/j.dib.2022.108611 (2022).
15. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890, https://doi.org/10.1093/bioinformatics/bty560 (2018).
16. Lam, E. T. et al. Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly. Nat Biotechnol 30, 771–776, https://doi.org/10.1038/nbt.2303 (2012).
17. Edgar, R. C. & Myers, E. W. PILER: identification and classification of genomic repeats. Bioinformatics 21(Suppl 1), i152–i158, https://doi.org/10.1093/bioinformatics/bti1003 (2005).
18. Marcais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770, https://doi.org/10.1093/bioinformatics/btr011 (2011).
19. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595, https://doi.org/10.1093/bioinformatics/btp698 (2010).
20. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods 18, 170–175, https://doi.org/10.1038/s41592-020-01056-5 (2021).
21. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357–359, https://doi.org/10.1038/nmeth.1923 (2012).
22. Wingett, S. et al. HiCUP: pipeline for mapping and processing Hi-C data. F1000Res 4, 1310, https://doi.org/10.12688/f1000research.7334.1 (2015).
23. Zhang, X., Zhang, S., Zhao, Q., Ming, R. & Tang, H. Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data. Nat Plants 5, 833–845, https://doi.org/10.1038/s41477-019-0487-8 (2019).
24. Simao, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212, https://doi.org/10.1093/bioinformatics/btv351 (2015).
25. Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res 35, W265–W268, https://doi.org/10.1093/nar/gkm286 (2007).
26. Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21(Suppl 1), i351–i358, https://doi.org/10.1093/bioinformatics/bti1018 (2005).
27. Smit, A. & Hubley, R. RepeatModeler Open-1.0. Available from http://www.repeatmasker.org (2008).
28. Tan, Q. et al. Chromosome-level genome assemblies of five Prunus species and genome-wide association studies for key agronomic traits in peach. Hortic Res 8, 213, https://doi.org/10.1038/s41438-021-00648-2 (2021).
29. Zhang, Q. et al. The genome of Prunus mume. Nat Commun 3, https://doi.org/10.1038/ncomms2290 (2012).
30. Baek, S. et al. Draft genome sequence of wild Prunus yedoensis reveals massive inter-specific hybridization between sympatric flowering cherries. Genome Biol 19, https://doi.org/10.1186/s13059-018-1497-y (2018).
31. Jiang, F. The apricot (Prunus armeniaca L.) genome elucidates Rosaceae evolution and beta-carotenoid synthesis. Hortic Res 6, 128, https://doi.org/10.1038/s41438-019-0215-6 (2019).
32. Huang, Z. et al. Chromosome-scale genome assembly and population genomics provide insights into the adaptation, domestication, and flavonoid metabolism of Chinese plum. Plant J 108, 1174–1192, https://doi.org/10.1111/tpj.15482 (2021).
33. Smit, A., Hubley, R. & Green, P. RepeatMasker Open-4.0. 2013–2015 (2015).
34. Mount, D. W. Using the Basic Local Alignment Search Tool (BLAST). CSH Protoc 2007, pdb.top17, https://doi.org/10.1101/pdb.top17 (2007).
35. Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. Genome Res 14, 988–995, https://doi.org/10.1101/gr.1865504 (2004).
36. Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29, 644–652, https://doi.org/10.1038/nbt.1883 (2011).
37. Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res 31, 5654–5666, https://doi.org/10.1093/nar/gkg770 (2003).
38. Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14, R36, https://doi.org/10.1186/gb-2013-14-4-r36 (2013).
39. Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7, 562–578, https://doi.org/10.1038/nprot.2012.016 (2012).
40. Stanke, M. & Waack, S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19(Suppl 2), ii215–ii225, https://doi.org/10.1093/bioinformatics/btg1080 (2003).
41. Guigo, R. Assembling genes from predicted exons in linear time with dynamic programming. J Comput Biol 5, 681–702, https://doi.org/10.1089/cmb.1998.5.681 (1998).
42. Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J Mol Biol 268, 78–94, https://doi.org/10.1006/jmbi.1997.0951 (1997).
43. Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879, https://doi.org/10.1093/bioinformatics/bth315 (2004).
44. Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, https://doi.org/10.1186/1471-2105-5-59 (2004).
45. The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res 46, https://doi.org/10.1093/nar/gky092 (2018).
46. Kanehisa, M. et al. Data, information, knowledge and principle: back to metabolism in KEGG. Nucleic Acids Res 42, D199–205, https://doi.org/10.1093/nar/gkt1076 (2014).
47. Quevillon, E. et al. InterProScan: protein domains identifier. Nucleic Acids Res 33, W116–W120, https://doi.org/10.1093/nar/gki442 (2005).
48. Schattner, P., Brooks, A. N. & Lowe, T. M. The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res 33, W686–W689, https://doi.org/10.1093/nar/gki366 (2005).
49. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J Mol Biol 215, 403–410, https://doi.org/10.1016/S0022-2836(05)80360-2 (1990).
50. NCBI Bioproject https://identifiers.org/ncbi/bioproject:PRJNA1125168 (2024).
51. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29660545 (2024).
52. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29660546 (2024).
53. Yan, J. et al. Prunus pseudocerasus cultivar Duiying, whole genome shotgun sequencing project. GenBank https://identifiers.org/ncbi/insdc:JBFBPF000000000 (2024).
54. Zhang, W. Chromosome-level genome assembly of tetraploid Chinese cherry (Prunus pseudocerasus). figshare https://doi.org/10.6084/m9.figshare.26170204 (2024).

Acknowledgements

This work was supported by Beijing Academy of Agriculture and Forestry Sciences (KJCX20210403, KJCX20240326 and KJCX20240403).

Author information

Authors and Affiliations

Beijing Key Laboratory of Environment Friendly Management on Fruit Diseases and Pests in North China, Key Laboratory of Environment Friendly Management on Fruit and Vegetable Pests in North China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Institute of Plant Protection, Beijing Academy of Agriculture and Forestry Sciences, Beijing, 100097, China
Wei Zhang, Xuncheng Wang, Junbo Peng, Qikai Xing & Jiye Yan
Cherry Engineering and Technical Research Center of the State Forestry and Grassland Administration, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (North China), Ministry of Agriculture and Rural Affairs, Beijing Engineering Research Center for Deciduous Fruit Trees, Institute of Forestry and Pomology, Beijing Academy of Agriculture and Forestry Sciences, Beijing, 100093, China
Jing Wang, Xuwei Duan, Xiaoming Zhang & Kaichun Zhang

Contributions

Jiye Yan, Kaichun Zhang and Wei Zhang conceived and designed this study. Jing Wang, Xuwei Duan and Xiaoming Zhang collected and prepared the sequencing samples. Xuncheng Wang, Junbo Peng and Qikai Xing performed bioinformatic analyses. Wei Zhang wrote the manuscript. Jiye Yan and Jing Wang revised it. All authors reviewed and approved the final manuscript.

Corresponding authors

Correspondence to Kaichun Zhang or Jiye Yan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Zhang, W., Wang, J., Wang, X. et al. Chromosome-level genome assembly of tetraploid Chinese cherry (Prunus pseudocerasus). Sci Data 12, 136 (2025). https://doi.org/10.1038/s41597-025-04462-6

Received: 15 August 2024
Accepted: 10 January 2025
Published: 22 January 2025
DOI: https://doi.org/10.1038/s41597-025-04462-6