Chromosome-level genome and annotation of the tetraploid Rhodiola kirilowii

Zhang, Jianbo; Zhao, Wenji; Tang, Xiaohui; Jia, Guofu; Li, Daxu; Xia, Chongjing; Bai, Shiqie

doi:10.1038/s41597-025-04962-5

Data Descriptor
Open access
Published: 14 April 2025

Jianbo Zhang¹^na1,
Wenji Zhao¹^na1,
Xiaohui Tang¹,
Guofu Jia¹,
Daxu Li¹,
Chongjing Xia ORCID: orcid.org/0000-0002-4865-7920² &
…
Shiqie Bai^1,2

Scientific Data volume 12, Article number: 620 (2025)

Abstract

Rhodiola kirilowii, a perennial medicinal herb native to China, is highly valued for its detoxification and anti-swelling properties and for its role as an adaptogen, which makes it an intriguing subject for studies of medicinal potential and molecular biochemistry. In this study, we generated a high-quality chromosome-level reference genome of R. kirilowii using a combination of Illumina short-read sequencing, PacBio long-read sequencing, and Hi-C sequencing. The final assembly spans 1.92 Gb, including 40 homoeologous chromosomes and one sex chromosome, with a scaffold NG50 of 46.03 Mb and a BUSCO completeness of 98.9%. Additionally, we annotated a total of 1.23 Gb of repetitive sequences, encompassing 63.88% of the entire genome, and identified 122,035 protein-coding genes. Each sub-genome achieved similar completeness and continuity. This high-quality reference genome provides critical insights into the genetic underpinnings of R. kirilowii’s pharmacological properties, facilitating comparative genomics and the enhancement of its medicinal applications.

Background & Summary

Rhodiola kirilowii (Regel) Maxim. is a perennial herbaceous plant belonging to the Crassulaceae family. Its roots and rhizomes have been used in Tibetan medicine for centuries because of their reputed medicinal properties. The plant is native to the Qinghai-Tibet Plateau and is commonly found in alpine regions of China, including Tibet, Qinghai, Sichuan, Gansu, Yunnan, Xinjiang, Shaanxi, Shanxi, and Hebei, thriving at elevations ranging from 2000 m to 5600 m on rocky grasslands and slopes¹.

Historically, R. kirilowii has been documented in classical texts such as the “Four Medical Tantras” for its benefits in balancing lung heat and preventing epidemics. The “Chinese Tibetan Materia Medica” describes its capabilities in detoxification and reducing swelling, indicating its traditional use in treating epidemic diseases, lung heat, intoxication, and limb swelling. Modern pharmacological studies have identified a range of active compounds in R. kirilowii, including salidroside, tyrosol, daucosterol, cyanogenic glycosides, bergenin, lotaustralin, and flavonoids^2,3. These compounds contribute to the herb’s anti-hypoxic, anti-fatigue, anti-aging, and blood-activating effects, making it a valuable component in adaptogenic and anti-altitude sickness formulations.

Given its traditional and contemporary significance, studying the genome of R. kirilowii is crucial for several reasons. First, genome assembly and annotation can provide insights into the biosynthetic pathways responsible for its therapeutic compounds, potentially leading to enhanced cultivation practices and quality control in herbal medicine production. Second, understanding its genetic makeup can facilitate the development of more effective plant breeding strategies, aiming to increase the yield and potency of its active ingredients. Lastly, genomic research can uncover novel genes and pathways that may contribute to the plant’s adaptability to high-altitude environments, offering broader implications for plant biology and ecology. However, genomic resources for R. kirilowii are limited⁴, which restricts its utilization in traditional and modern medicine.

In this study, we assembled and annotated the genome of R. kirilowii at the chromosome level using MGI short-read sequencing, PacBio Revio long-read sequencing, Hi-C sequencing, and RNA sequencing (RNA-seq). We estimated genome size and heterozygosity from clean short reads, performed long-read sequencing using the PacBio Revio System, and combined the long reads with Hi-C reads to achieve a chromosome-level assembly. Furthermore, homoeologous chromosomes were identified in this tetraploid R. kirilowii. Genome annotation was conducted using a combination of methods, including RNA-seq reads, published genomes of closely related species, and de novo prediction. Additionally, we assessed the quality of the genome assembly using various metrics. Our efforts culminated in the first high-quality reference genome of the genus Rhodiola with 40 homoeologous chromosomes and one sex chromosome, providing essential genetic data for studying adaptive evolution, genetic diversity, and the genetics of biochemistry across the broader genus Rhodiola.

Methods

Sample collection

Rhodiola kirilowii (Regel) Maxim. (xh-4) was obtained from the Hongyuan Plateau Medicinal Plant Breeding Base of the Sichuan Grassland Science Research Institute (coordinates: 102.5442°, 32.7752°, elevation: 3495 m). Fresh leaves were collected and rinsed thoroughly with sterile water, and surface moisture was removed. The leaves were immediately preserved and transported in liquid nitrogen.

Library construction and sequencing

High-quality genomic DNA (gDNA) was extracted from the collected leaves following the manufacturer’s instructions. The integrity and purity of the gDNA samples were assessed using agarose gel electrophoresis. The high-quality gDNA was sent to Wuhan Frasergen Bioinformatics Co., Ltd. (Wuhan, China) for DNA extraction, library construction, and genomic sequencing. Libraries were prepared using the TruSeq DNA PCR-Free Library Prep Kit (Illumina, San Diego, CA, USA) and SMRTbell (Sage Science, MA, USA) following the manufacturers’ recommendations, with a 200 bp insert size for Illumina HiSeq 4000 sequencing and 20 kb fragments selected for PacBio Sequel II sequencing. In addition, an in situ Hi-C experiment was performed, with chromatin digested using the restriction enzyme MboI, and the Hi-C library was sequenced on the Illumina HiSeq 4000 PE 150 bp platform (Table 1).

Table 1 Library sequencing data and methods used in this study to assemble the R. kirilowii genome.

To improve the precision of genome annotation, RNA sequencing was conducted on two tissues: leaves and roots (three locations, and three replicates). For each sample, RNA was extracted using TRIzol reagent (Invitrogen, USA), RNA purity and concentration were assessed using Nanodrop and Qubit, RNA-seq libraries were constructed with the MGIEasy RNA Sample Prep Kit (UW Genetics), and sequencing was performed on the Illumina HiSeq 4000 PE 150 bp platform. In total, 446,936,037 pairs of raw reads were generated and filtered for quality with Trimmomatic v0.38⁵ following a previously described procedure⁶. These high-quality transcriptome reads were used for genome annotation.

Genome size and heterozygosity estimation

The genome size and heterozygosity of R. kirilowii were estimated from the 17-mer distribution using GenomeScope v2.0. The prediction indicated a genome size of 2.26 Gb, a heterozygosity of 0.39%, and a repeat content of 92.49% (Fig. 1a). Interestingly, the peaks of the 21-mer distribution at depths of around 140, 70 and 35 clearly corresponded to homozygous AAAA alleles, heterozygous AABB alleles, and heterozygous ABCD alleles, respectively, suggesting that R. kirilowii is tetraploid. Moreover, fluorescence microscopy was used to count the chromosomes within a nucleus of R. kirilowii. A total of 41 (4n = 40 + 1) chromosomes were observed (Fig. 1b).
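The reasoning behind this peak interpretation can be illustrated numerically: in a tetraploid, a k-mer shared by all four haplotypes is expected at roughly four times the depth of a k-mer found on a single haplotype. The following minimal sketch uses the approximate peak depths quoted above; the function name `copies_per_peak` is hypothetical and is not part of GenomeScope.

```python
def copies_per_peak(peak_depths):
    """Express each k-mer peak depth as a multiple of the lowest peak."""
    base = min(peak_depths)
    return {depth: round(depth / base) for depth in peak_depths}


# Approximate peak depths taken from the k-mer spectrum described in the text.
ratios = copies_per_peak([140, 70, 35])
print(ratios)  # {140: 4, 70: 2, 35: 1}
```

The 4:2:1 ratio matches k-mers present on four, two, and one of the haplotypes, which is the pattern expected for a tetraploid genome.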

De novo assembly of R. kirilowii genome

The Flye mode of MaSuRCA v4.0.7⁷, a hybrid approach that combines PacBio and Illumina reads, was used for the initial assembly with default parameters, except that the estimated genome size was set accordingly. POLCA, within the MaSuRCA package, was then used to polish the assembly. At this step, the total length of the draft genome was 1,926,367,165 bp, comprising 9,015 contigs with a contig N50 of 474,563 bp and a scaffold N50 of 44,362,222 bp.
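For readers unfamiliar with the N50 statistic reported here, the sketch below shows how it is conventionally computed from a list of sequence lengths. The contig lengths are toy values chosen for illustration, not data from this assembly.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembled length."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


toy_contigs = [80, 70, 50, 40, 30, 20, 10]  # hypothetical lengths, total 300
print(n50(toy_contigs))  # 70
```

The same function applies to contigs or scaffolds; only the input list of lengths changes.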

The Hi-C reads, after quality control and trimming, were then used to anchor the initially assembled contigs onto chromosomes through sorting, orientation, and ordering, following the 3D-DNA pipeline (v180922⁸). Multiple iterations of manual refinement of chromosome boundaries were performed, which allowed us to detect and correct apparent haplotype switches and ensured precise haplotype assignment. Detailed procedures for Hi-C data processing and scaffolding have been described previously⁶ and at https://github.com/theaidenlab/Genome-Assembly-Cookbook. Juicebox Assembly Tools⁹ (v1.11.08) was used to visualize contact frequencies and to manually redefine chromosome boundaries. At this step, a total of 41 chromosomes were obtained (Fig. 2a), and the Hi-C interaction heatmap revealed a clear diagonal pattern, indicative of strong intra-chromosomal interactions across all chromosomes. Interestingly, sets of four homoeologous chromosomes could be clearly visualized in the contact map. To assign homoeologous chromosomes to haplotypes, fastANI (v1.34¹⁰) was used to estimate average nucleotide identity (ANI), as a measure of genomic distance, against the publicly available Rhodiola kirilowii chromosomes⁴. Briefly, each set of four homoeologous chromosomes was defined based on high ANI, and haplotypes 1 to 4 were then assigned according to the ANI values (Fig. 2b). We note that ChrIX had only one haplotype and was therefore denoted as the sex chromosome. Finally, a chromosome-level, haplotype-resolved genome was generated for our tetraploid R. kirilowii, with a genome size of 1.922 Gb, which is more than three times larger than the previously assembled diploid R. kirilowii genome⁴.
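The haplotype assignment step can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: it assumes the standard tab-separated fastANI output (query, reference, ANI, mapped fragments, total fragments), and the choice to rank by descending ANI, the function name `rank_by_ani`, and all file names are hypothetical.

```python
from collections import defaultdict


def rank_by_ani(lines):
    """Group query chromosomes by reference chromosome and rank them by ANI.

    Each line is assumed to hold: query, reference, ANI, mapped fragments,
    total fragments (tab-separated). Ranking by descending ANI is an
    assumption made for illustration only.
    """
    by_ref = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        query, ref, ani = line.rstrip("\n").split("\t")[:3]
        by_ref[ref].append((float(ani), query))

    assignment = {}
    for ref, hits in by_ref.items():
        hits.sort(reverse=True)  # highest ANI first
        for rank, (ani, query) in enumerate(hits, start=1):
            assignment[query] = (ref, f"hap{rank}", ani)
    return assignment


# Hypothetical fastANI rows for one homoeologous group.
toy_rows = [
    "chrA.fa\tref_chr1.fa\t96.1\t900\t1000",
    "chrB.fa\tref_chr1.fa\t98.4\t950\t1000",
    "chrC.fa\tref_chr1.fa\t95.2\t880\t1000",
    "chrD.fa\tref_chr1.fa\t97.0\t930\t1000",
]
for query, (ref, hap, ani) in sorted(rank_by_ani(toy_rows).items()):
    print(query, ref, hap, ani)
```

A reference chromosome that attracts only a single query, as reported for ChrIX, would receive a lone `hap1` entry in this scheme.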

Genome annotation

Transposable elements (TEs) in our assembled R. kirilowii genome were masked with RepeatMasker (v4.0.6¹¹) using both the Repbase library and a de novo repeat library generated by RepeatModeler (v2.0.5¹²). This repeat-masking step was performed for the whole genome and for each sub-genome. Overall, 63.88% of the R. kirilowii genome was identified as repeats (Table 2). The TE-masked genome was used for gene model prediction.

Gene structure prediction combined three approaches (homology-based prediction, transcriptome-based prediction, and de novo prediction), and the results were integrated into the final gene structure annotation using BRAKER3 (v3.0.3¹³). For homology-based prediction, comparisons were made with the genomes of Vitis vinifera (GCA_030704545.1), Prunus persica L. (GCA_000346475.2), and Kalanchoe fedtschenkoi (GCA_002312865.1). Transcriptome-based prediction involved mapping quality-controlled RNA-seq reads (Table 1) to our assembled R. kirilowii genome using HISAT2 (v2.2.1¹⁴). For de novo prediction, Augustus (v3.5.0¹⁵) was used to predict gene structures based on hidden Markov models. In total, 122,035 protein-coding genes were predicted in our assembled R. kirilowii genome, with each sub-genome encoding 25,446 to 28,034 proteins (Table 2).

Table 2 Metrics of the R. kirilowii xh-4 genome assembly.

For gene function annotation, we ran InterProScan (v5.53-87.0¹⁶) with default parameters to search the Gene Ontology (GO) and Pfam databases. Non-coding RNAs, including tRNA, rRNA, snRNA, and miRNA, were annotated using the Rfam database and the cmscan program within Infernal (v1.1.4¹⁷). The annotated genome was visualized using a Circos plot (Fig. 3).

Data Records

Our assembled R. kirilowii xh-4 genome and annotation were deposited in the EBI-European Nucleotide Archive, under accession number GCA_965206585¹⁸, and in the Genome Warehouse in National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences / China National Center for Bioinformation, under accession number GWHFGNY00000000.1 within Project PRJCA031461 that is publicly accessible at https://ngdc.cncb.ac.cn/gwh¹⁹. Raw Illumina short read, PacBio long read, and Hi-C sequencing data for generating genome assembly and RNA-seq data for annotating the xh-4 assembly are available at NCBI SRA under accession number PRJNA1200924²⁰.

Technical Validation

Three methods were used to validate the quality of the assembled genome. First, to assess the accuracy and completeness of our assembled R. kirilowii xh-4 genome, we ran BUSCO (v5.4.6²¹) with the eudicots_odb10 lineage (2326 single-copy genes) on both the genome and the annotated proteins. For the assembled genome, no conserved single-copy genes were missing from the xh-4 genome as a whole, and only 4.7% to 10.2% were missing from each sub-genome. Similar results were obtained for the annotated proteins (Table 2). Second, merqury (v1.3²²), a k-mer-based assembly evaluator, was run, and a recovery rate of 92.56% with a low error rate was obtained, indicating high completeness of our assembly. Third, we aligned the high-quality Illumina short reads back to our assembled R. kirilowii xh-4 genome and to each sub-genome using BWA-MEM2 (v2.0pre2²³). The analysis revealed that 94.93% to 95.75% of reads could be mapped back to each assembled sub-genome, and 99.15% could be mapped to all chromosomes (Table 2).
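BUSCO reports completeness as a compact one-line summary, and in a polyploid assembly many conserved genes are expected to be counted as duplicated (D) rather than single-copy (S). The sketch below parses such a summary line. The example string contains toy values chosen for illustration and is not a result from this study; the function name `parse_busco_summary` is hypothetical.

```python
import re

# Matches BUSCO's one-line summary, e.g. C:..%[S:..%,D:..%],F:..%,M:..%,n:..
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Return the BUSCO percentages as floats and the gene count n as an int."""
    match = BUSCO_PATTERN.search(line)
    if match is None:
        raise ValueError("not a BUSCO summary line")
    result = {key: float(value) for key, value in match.groupdict().items()}
    result["n"] = int(result["n"])
    return result


# Toy summary line, not taken from the paper.
toy_line = "C:90.0%[S:30.0%,D:60.0%],F:4.0%,M:6.0%,n:100"
summary = parse_busco_summary(toy_line)
print(summary["C"], summary["D"], summary["M"], summary["n"])
```

Comparing the M (missing) value for the whole assembly with that of each sub-genome is one way to read the figures summarized in Table 2.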

Code availability

No specific code was used in this study. All analytical processes were executed according to the program tutorials or manuals. The key software parameters used in this study are as follows:

Trimmomatic: PE -threads 16 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:20 MINLEN:90

GenomeScope v2.0: default

MaSuRCA: JF_SIZE = 40000000000 FLYE_ASSEMBLY = 1, and other default

POLCA: default

3D-DNA: run-asm-pipeline.sh --mode diploid --input 10000 --editor-coarse-resolution 2500000 --editor-coarse-region 7500000 --editor-repeat-coverage 4 --polisher-input-size 10000 --polisher-coarse-resolution 100000 haploid.fasta merged_nodups.txt

fastANI: --threads 8 --refList 0.ref.list --queryList 0.qury.list --matrix --output 4.fastANI.out

BRAKER3: braker.pl --species=arabidopsis --rnaseq_sets_ids=RNAseq_file --rnaseq_sets_dir=RNAseq_dir --prot_seq=Proteins.faa --gff3 --busco_lineage=eudicots_odb10 --rounds=5

InterProScan: interproscan.sh -i protein.aa -f tsv -appl Pfam,SignalP_EUK,TMHMM --goterms -pa --iprlookup --cpu 32

cmscan: --rfam --cut_ga --nohmmonly --tblout 2.HJT.Chr.tblout --fmt 2 --clanin Rfam.clanin --cpu 16 Rfam.cm genome.fasta

Infernal: default

BUSCO: -i genome.fasta --lineage eudicots_odb10 --augustus --augustus_species arabidopsis -o output -m genome --cpu 16

merqury: default

Meryl: k=20 count output read1.meryl read_1.fastq.gz

bwa-mem2: mem -t 32 -M -a -o out.sam -R '@RG\tID:${mysample}\tSM:${mysample}\tPL:illumina' ref.fasta Read1.fastq Read2.fastq

qualimap: bamqc -bam sorted.bam --java-mem-size=32G -outdir ./qualimap/ -outformat pdf
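As an illustration of how one of the parameter sets above might be scripted, the sketch below assembles the bwa-mem2 argument list in Python so that the sample name is substituted into the read-group string before the command is run. The flags are those listed above; the function name `build_bwa_mem2_cmd` and the use of "xh-4" as the sample name are assumptions for this example, and the command is only printed, not executed.

```python
import shlex


def build_bwa_mem2_cmd(sample, ref, read1, read2, threads=32, out="out.sam"):
    """Build the bwa-mem2 argument list with the flags listed above.

    The read group keeps a literal backslash-t sequence, which the aligner
    itself converts to tab characters in the SAM header.
    """
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:illumina"
    return [
        "bwa-mem2", "mem",
        "-t", str(threads),
        "-M", "-a",
        "-o", out,
        "-R", read_group,
        ref, read1, read2,
    ]


# Hypothetical invocation; pass the list to subprocess.run to execute it.
cmd = build_bwa_mem2_cmd("xh-4", "ref.fasta", "Read1.fastq", "Read2.fastq")
print(shlex.join(cmd))
```

Building the command as a list avoids shell quoting issues: inside single quotes a shell would not expand `${mysample}`, whereas here the substitution happens in Python.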

References

1. Ohba, H. Rhodiola. In: Eggli, U. (Ed.) Illustrated handbook of succulent plants: Crassulaceae. Berlin, Heidelberg, Germany: Springer-Verlag, pp. 210–227 (2003).
2. Mattioli, L. et al. Effects of a Rhodiola rosea L. extract on the acquisition, expression, extinction, and reinstatement of morphine-induced conditioned place preference in mice. Psychopharmacology 221, 183–193, https://doi.org/10.1007/s00213-012-2686-0 (2012).
3. Zhu, J. et al. Quantitative analysis of active components in Rhodiola species based on disease module-guided network pharmacology. Arabian Journal of Chemistry 17, 105570, https://doi.org/10.1016/j.arabjc.2023.105570 (2024).
4. Zhang, D. Q. et al. Two chromosome-level genome assemblies of Rhodiola shed new light on genome evolution in rapid radiation and evolution of the biosynthetic pathway of salidroside. Plant J. 117, 464–482, https://doi.org/10.1111/tpj.16501 (2024).
5. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120, https://doi.org/10.1093/bioinformatics/btu170 (2014).
6. Xia, C. J. et al. Folding Features and Dynamics of 3D Genome Architecture in Plant Fungal Pathogens. Microbiol Spectr 10, e0260822, https://doi.org/10.1128/spectrum.02608-22 (2022).
7. Zimin, A. V. et al. Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the mega-reads algorithm. Genome Research 27, 787–792, https://doi.org/10.1101/gr.213405.116 (2017).
8. Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95, https://doi.org/10.1126/science.aal3327 (2017).
9. Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98, https://doi.org/10.1016/j.cels.2016.07.002 (2016).
10. Jain, C. et al. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun 9, 5114, https://doi.org/10.1038/s41467-018-07641-9 (2018).
11. Chen, N. Using Repeat Masker to identify repetitive elements in genomic sequences. Current Protocols in Bioinformatics 5, 4.10.1–4.10.14, https://doi.org/10.1002/0471250953.bi0410s05 (2004).
12. Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proceedings of the National Academy of Sciences 117, 9451–9457, https://doi.org/10.1073/pnas.1921046117 (2020).
13. Gabriel, L. et al. BRAKER3: Fully automated genome annotation using RNA-seq and protein evidence with GeneMark-ETP, AUGUSTUS, and TSEBRA. Genome Res. 34, 769–777, https://doi.org/10.1101/gr.278090.123 (2024).
14. Kim, D. et al. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature Biotechnology 37, 907–915, https://doi.org/10.1038/s41587-019-0201-4 (2019).
15. Stanke, M. et al. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Research 32, 309–312, https://doi.org/10.1093/nar/gkh379 (2004).
16. Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236, https://doi.org/10.1093/bioinformatics/btu031 (2014).
17. Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935, https://doi.org/10.1093/bioinformatics/btt509 (2013).
18. EBI-European Nucleotide Archive. https://identifiers.org/ncbi/insdc.gca:GCA_965206585.1 (2025).
19. Genome Warehouse in National Genomics Data Center Genome Assembly. https://ngdc.cncb.ac.cn/gwh/Assembly/86222/show (2024).
20. NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRP552882 (2024).
21. Simao, F. A. et al. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210, https://doi.org/10.1093/bioinformatics/btv351 (2015).
22. Rhie, A. et al. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245, https://doi.org/10.1186/s13059-020-02134-9 (2020).
23. Vasimuddin, M. et al. Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. IEEE Parallel and Distributed Processing Symposium (2019).

Acknowledgements

This work was supported by Sichuan Forestry and Grassland Science and Technology Innovation Team (CXTD2024005) and National Forage Industry Technology System Aba Comprehensive Experimental Station of China (CARS-34). No conflict of interest is declared.

Author information

These authors contributed equally: Jianbo Zhang, Wenji Zhao.

Authors and Affiliations

Sichuan Academy of Grassland Sciences, Chengdu, China
Jianbo Zhang, Wenji Zhao, Xiaohui Tang, Guofu Jia, Daxu Li & Shiqie Bai
School of Life Science and Engineering, Southwest University of Science and Technology, Mianyang, China
Chongjing Xia & Shiqie Bai

Contributions

S.B. conceived the project. D.L. provided the financial support and participated in the supervision of the project. W.Z., X.T. and G.J. contributed to plant sample collection, DNA/RNA preparation, library construction, and sequencing. C.X. assisted with data analysis. C.X. performed genome assembly and annotation and comparative genomic analyses. J.Z. performed transcriptome analysis and analysis of the GST gene family. C.X. and J.Z. wrote and revised the manuscript.

Corresponding authors

Correspondence to Daxu Li, Chongjing Xia or Shiqie Bai.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Zhang, J., Zhao, W., Tang, X. et al. Chromosome-level genome and annotation of the tetraploid Rhodiola kirilowii. Sci Data 12, 620 (2025). https://doi.org/10.1038/s41597-025-04962-5

Received: 04 November 2024
Accepted: 07 April 2025
Published: 14 April 2025
Version of record: 14 April 2025
DOI: https://doi.org/10.1038/s41597-025-04962-5