Background & Summary

With approximately 3,000 species in 88–100 genera, Rosaceae is one of the most diverse angiosperm families worldwide1,2. The family produces a wide variety of fruit types2,3, and its fruit crops include strawberry, raspberry, apple, pear, peach, apricot, plum, and cherry. Because of the economic importance of these fruits, their production has increased rapidly in the past decade; for example, plum production increased from 5.52 million tons in 2010 to 6.63 million tons in 2021. Meanwhile, many researchers have investigated the evolutionary history of this family4,5. However, because of the large number of species in Rosaceae and frequent hybridization events, the evolutionary history of the family remains unclear.

Prunus is a genus of shrubs and trees in Rosaceae, mainly distributed in the north temperate zone, with about 30 species6. In addition to their economic importance, some Prunus species also have high ornamental value, such as Prunus mira, Prunus persica, Prunus mume, and Prunus yedoensis. To date, many Prunus genomes have been released, including those of P. persica, P. mira, Prunus dulcis, Prunus ferganensis, Prunus davidiana, Prunus mume ‘Xizang’, Prunus armeniaca ‘Xizang’, Prunus salicina, Prunus humilis, Prunus domestica, P. yedoensis, and Prunus avium7,8,9,10,11,12,13,14,15,16,17,18. However, the genome of the Xizang cherry has not yet been reported.

The Tibetan Plateau has an average elevation of more than 4,000 m and an extremely harsh environment characterized by high UV-B radiation, low temperatures, low oxygen content, and low barometric pressure. The plateau also harbors many wild Prunus resources, which have been shaped not only by human selection but also by the environment. To date, many studies have examined the mechanisms of high-altitude adaptation. For example, a genomic selective sweep analysis of high- and low-altitude subgroups of P. mira showed that the selected genes were functionally enriched in response to UV-B radiation16, and a comparative population analysis indicated that a CBF gene might be a key factor in the adaptation of P. mira to low temperatures at high altitudes19. Even so, the effects of a high-altitude environment on genomic variation remain poorly understood. Moreover, although wild cherry resources are abundant on the Qinghai-Tibet Plateau, how they adapt to high altitudes has not been reported.

P. avium is an agronomically and economically important fruit crop in the Rosaceae family. It usually grows in temperate climates, which provide the chilling required for flower induction20,21. P. avium probably originated between the Black Sea and the Caspian Sea and then spread to temperate regions of Europe22. To date, more than thirty cherry species have been identified, with diverse phenotypic variation in fruit size, color, and other important agronomic traits23,24. In addition, previous genetic analyses demonstrated that modern cultivars passed through a narrow genetic bottleneck23,25. However, little is known about the phenotypic and genetic variation of Xizang cherry resources.

In this study, we assembled a high-quality chromosome-level P. serrula genome using Oxford Nanopore ultra-long reads and chromosome conformation capture sequencing (Hi-C). This genome provides a valuable genetic resource for understanding the high-altitude adaptation of Prunus fruit trees.

Methods

Materials collection and sequencing

Fresh young leaves used for genome sequencing were collected from a P. serrula plant growing in the wild in Xizang, China. Total genomic DNA of each accession was extracted from leaves using the CTAB method26. DNA sequencing was performed on the Illumina NovaSeq 6000 platform.

Transcriptome sequencing was performed on a mixed sample of three tissues (fruit, leaves, and branches) for genome annotation. Total RNA was extracted using the RNAprep Pure Plant Kit (DP441, TIANGEN Biotech). RNA-seq was conducted on the Illumina NovaSeq 6000 platform, and 150-bp paired-end reads were generated. Hi-C libraries were quality-checked and sequenced on an Illumina NovaSeq platform with 150-bp paired-end reads.

De novo assembly and annotation of three Prunus genomes

The RepeatModeler software27 was used to build a mixed de novo TE library based on the genomes of diploid Xizang cherry, tetraploid Xizang cherry, and hexaploid Xizang plum. This TE library and the Repbase database (https://www.girinst.org/repbase/) were used to annotate repeat sequences using RepeatMasker28.

Gene models were annotated based on ab initio gene predictions, protein homology searches, and RNA-seq-based transcript assemblies. For ab initio gene prediction, AUGUSTUS29, GlimmerHMM30, and SNAP31 were employed with default parameters. The protein database was constructed by integrating amino acid sequences from the Rosaceae databases, and homology searching was then conducted using GenomeThreader32. In addition, RNA-seq reads were generated from a mixture of tissues, and the Trinity software33 was used to perform genome-guided and de novo transcript assembly. The PASA software34 was used to update the protein-coding gene annotations. All predicted gene structures were combined using the EVM software35.

We assembled a high-quality genome with an N50 value of 9.5 Mb and a longest contig of 19.5 Mb. The contigs were error-corrected with three iterations of Racon36 using Nanopore reads, followed by two rounds of polishing with NextPolish37 using Illumina short reads. Using the Hi-C library, the error-corrected contigs were anchored to eight superscaffolds with 3D-DNA38 and Juicer39. Benchmarking Universal Single-Copy Orthologs (BUSCO)40 analysis revealed a completeness score of 98.5% (Table 1).
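For readers unfamiliar with the N50 statistic reported above, the following is a minimal sketch of how it is commonly computed from a list of contig lengths: the contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the total assembly length. The function name `n50` and the toy lengths are illustrative assumptions; this is not the authors' script, and the values are not taken from the P. serrula assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (same unit as input)."""
    ordered = sorted(lengths, reverse=True)  # longest contig first
    half_total = sum(ordered) / 2            # half of the total assembly length
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to at least half
        # of the assembly length defines N50.
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical toy assembly (lengths in Mb), not real data.
toy_lengths = [19, 10, 9, 5, 4, 3]
toy_n50 = n50(toy_lengths)       # total = 50, half = 25, 19 + 10 = 29 >= 25
toy_longest = max(toy_lengths)   # longest contig in the toy assembly
```

In this toy case the two longest contigs together cover more than half of the assembly, so N50 equals the length of the second contig.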

Table 1 Genome assembly statistics.

We annotated 35,151 protein-coding genes and 47,340 transcripts by combining ab initio prediction, RNA-seq read mapping, and homologous protein alignments. To characterize the P. serrula genome, we identified presence/absence variations (PAVs) between the P. serrula genome and the cultivated cherry genome41. PAVs are genomic regions that are present in one genome but absent in another; they represent structural variations that may contribute to phenotypic differences between species. We also show the GC content, gene density, and TE density of the P. serrula genome (Fig. 1B).

Fig. 1

De novo genome assembly and genome features of P. serrula. (A) Phenotypic characteristics of P. serrula fruit. (B) Genome features of the Xizang cherry genome and the landscape of presence/absence variation (PAV) between the Xizang cherry genome and the cultivated cherry genome. The lines in the center of the circle indicate pairs of homologous genes on different chromosomes of P. serrula.
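A GC-content track such as the one in panel B is typically derived by splitting each chromosome into fixed-size windows and computing the fraction of G and C bases in each. The sketch below illustrates that idea under stated assumptions: the window size used for the figure is not given in the text, so `window_size` is a free parameter, and the function name `gc_content_windows` and the toy sequence are hypothetical.

```python
def gc_content_windows(sequence, window_size):
    """Return (start, end, gc_fraction) for non-overlapping windows.

    Ambiguous bases (e.g. N) are excluded from the denominator; a window
    with no called bases gets a GC fraction of None.
    """
    seq = sequence.upper()
    results = []
    for start in range(0, len(seq), window_size):
        window = seq[start:start + window_size]
        gc = window.count("G") + window.count("C")
        called = gc + window.count("A") + window.count("T")
        fraction = gc / called if called else None
        results.append((start, start + len(window), fraction))
    return results


# Hypothetical toy sequence, not real data: two windows of 4 bp each.
toy_track = gc_content_windows("GGCCAATT", 4)
```

Gene density and TE density tracks can be built the same way by counting annotated features, rather than bases, per window.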

Data Records

The whole-genome sequencing data (Table 2) were deposited in the NCBI Sequence Read Archive under accession number SRP454159 (ref. 42). The genome assembly was submitted to GenBank under accession number JBJZPD000000000 (ref. 43). The genome and genome annotation files of P. serrula and two other polyploid Xizang Prunus were also deposited in the Figshare database44,45.

Table 2 Sequencing data accessions for genome assembly.

Technical Validation

High completeness of genome assembly

In total, 99.4% of short reads and 99.5% of Nanopore ultra-long reads mapped back to the assembled P. serrula genome. We also compiled BUSCO statistics for 44 Prunus genomes. Together, these results demonstrated that our assembled genome was highly complete. Furthermore, the Hi-C contact map supported this result (Fig. 2). These evaluations indicate that the genome assembly is of high quality and suitable for use as a reference genome.

Fig. 2

Hi-C contact matrix heatmap of the P. serrula genome.
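To make clear what a Hi-C contact matrix represents, the following minimal sketch bins the mapped positions of read pairs along a single sequence and counts contacts between bins. It is an illustration only: the heatmap in this study was produced with the tools cited above, and the function name `contact_matrix`, the bin size, and the toy read pairs are assumptions introduced here.

```python
def contact_matrix(pairs, genome_length, bin_size):
    """Count Hi-C contacts between fixed-size bins of one sequence.

    pairs: iterable of (pos1, pos2) 0-based mapped positions of read pairs.
    Returns a symmetric list-of-lists matrix of contact counts.
    """
    n_bins = -(-genome_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the matrix symmetric
    return matrix


# Hypothetical toy read pairs on a 30-bp sequence with 10-bp bins.
toy_matrix = contact_matrix([(0, 5), (12, 27), (3, 25)], 30, 10)
```

In a well-ordered assembly, counts are highest near the diagonal and decay with distance, and each chromosome appears as a distinct block in the heatmap.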