Chromosome-level genome assembly of Malus niedzwetzkyana, the mother of Rosybloom crabapple

Wang, Ruizhen; Quan, Jian; Liu, Boyang; Wei, Yu; Liu, Hengxing; He, Ran; Guo, Ling; Dong, Leiming

doi:10.1038/s41597-024-04221-z

Data Descriptor
Open access
Published: 05 February 2025

Ruizhen Wang¹, Jian Quan¹, Boyang Liu¹, Yu Wei¹, Hengxing Liu¹, Ran He¹, Ling Guo¹ & Leiming Dong¹ (ORCID: orcid.org/0000-0003-0111-3720)

Scientific Data volume 12, Article number: 211 (2025)

Abstract

Malus niedzwetzkyana (MN) is the mother of Rosybloom hybrids, ornamental crabapples with red and purple flowers. We present a high-quality chromosome-scale genome for MN with a size of 672.64 Mb, anchored to 17 chromosomes, with a BUSCO completeness score of 98.6% and an LTR assembly index that reaches the ‘gold standard’ level. Moreover, our assembly has captured 28 telomeres. A total of 43,813 protein-coding genes were annotated in the MN genome. This high-quality assembly provides a valuable opportunity to enhance our understanding of the genetic basis of flower colour and other ornamental traits in crabapples, thereby advancing the field of genetics and breeding.

Background & Summary

The ornamental crabapples (Malus spp., Rosaceae) are woody plants with fruit diameters of less than 5 cm, whose flowers, leaves, fruits and other traits have significant ornamental value¹. Interspecific hybridization plays a pivotal role in the breeding of new varieties of ornamental crabapple. For a considerable period, the predominant colours observed in crabapple flowers were white and pink. Following the discovery of Malus niedzwetzkyana (MN), there was a notable increase in the intensity of the colours red and purple, which significantly enhanced the ornamental value of crabapples. The Rosybloom hybrids, which are descendants of MN, exhibit flowers that are dark, deep rose, red, and purple in colour¹. They have been widely used in garden landscapes around the world. Previous studies have demonstrated that the colour of flowers, as well as foliage and fruit is primarily influenced by anthocyanins, a category of flavonoid compounds^2,3,4. The genetic analysis of MN at the molecular level provides a valuable opportunity to enhance our understanding of the genetic basis of flower colour and other ornamental traits in crabapples, thereby advancing the field of genetics and breeding.

De novo genome assembly is a foundational tool in molecular genetics research, and the development of third-generation PacBio HiFi sequencing and Hi-C techniques has markedly improved the completeness of genome assemblies. Several chromosome-level genomes of crabapples are now accessible in public databases, including the European crabapple (M. sylvestris)⁵, the Pacific crabapple (M. fusca)⁶, M. prunifolia⁷, and two ornamental crabapple cultivars ‘Royalty’ and ‘Flame’⁴. Chromosome-level genome sequences allow researchers to investigate the functional, regulatory, and evolutionary aspects of the genome in the Malus genus, and to gain a more nuanced understanding of its significant characteristics.

This study presents a high-quality genome for MN, generated through an integrated approach combining Illumina short-read, PacBio high-fidelity (HiFi) long-read, and high-throughput chromosome conformation capture (Hi-C) sequencing data. In total, we generated 40.02 Gb (~59× coverage) of clean short reads, 29.42 Gb (~43× coverage) of PacBio HiFi CCS reads with an N50 of 17.97 kb, and 63.95 Gb (~94× coverage) of clean Hi-C data (Table 1). Based on 17-mer analysis, the estimated genome size, heterozygosity, and repetitive content were 678.26 Mb, 0.57% and 33.74%, respectively. The final assembled genome size was 672.64 Mb with a contig N50 of 36.45 Mb (Table 2). The assembled contigs were then anchored onto 17 pseudochromosomes, with an anchoring rate of 98.38% (Fig. 1). The quality and completeness of the assembly were validated using four distinct approaches. First, the clean short reads and the PacBio HiFi CCS reads were mapped to the assembly, yielding mapping ratios of 99.32% and 99.95%, respectively (Table 3). Second, telomeres were identified at both ends of twelve chromosomes and at one end of four chromosomes (chr1, chr5, chr13, and chr16), giving 28 telomeres in total; only chromosome 15 had no telomere identified (Fig. 2). Third, the BUSCO analysis identified 1590 (98.6%) complete orthologues from the lineage database, of which 1058 (65.6%) were single-copy and 532 (33.0%) were duplicated (Table 4). Finally, the long terminal repeat (LTR) assembly index (LAI) score of the MN genome assembly was 21.99, reaching the ‘gold standard’ level (LAI > 20).
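The coverage figures and the telomere tally quoted above follow from simple arithmetic. The sketch below is an illustrative check only, assuming that depth is computed as data volume divided by the 17-mer genome size estimate; the function and variable names are hypothetical and not part of the study's pipeline.

```python
# Illustrative check of the depth and telomere figures quoted in the text.
# Assumption: depth = sequenced bases / estimated genome size (678.26 Mb).

GENOME_SIZE_MB = 678.26  # 17-mer estimate reported in the text


def depth(data_gb: float, genome_mb: float = GENOME_SIZE_MB) -> float:
    """Return fold coverage for a data volume given in Gb."""
    return data_gb * 1000 / genome_mb


datasets_gb = {"short reads": 40.02, "HiFi": 29.42, "Hi-C": 63.95}
depths = {name: round(depth(gb)) for name, gb in datasets_gb.items()}

# Telomere tally: 12 chromosomes with two telomeres, 4 with one, 1 with none.
chromosomes = 12 + 4 + 1
telomeres = 12 * 2 + 4 * 1 + 1 * 0

print(depths)
print(chromosomes, telomeres)
```

Under this assumption the rounded depths agree with the approximate coverages given in the text, and the tally accounts for all 17 chromosomes.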

Table 1 Summary of MN sequencing data.

Table 2 Summary of MN genome assembly data.

Table 3 Mapping ratio of short reads and PacBio HiFi reads on the MN genome assembly.

Table 4 BUSCO values of the assembly and annotation genes.

Repeat sequences accounted for 476.89 Mb, representing 70.90% of the assembly (Table 5). LTR retrotransposons were the most abundant class of repetitive elements, accounting for 52.82% (Table 6). Gene annotation identified a total of 43,813 protein-coding genes (Table 2). The predicted proteins attained a complete BUSCO score of approximately 98.30%, which indicates a high quality of annotation (Table 4). A total of 42,972 (98.08%) protein-coding genes were successfully annotated in various databases, including InterPro, Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), Swiss-Prot, TrEMBL (Translated EMBL), and the NCBI non-redundant database (NR) (Table 7). A total of 4,351 non-coding RNA genes (152 microRNAs [miRNAs], 770 transfer RNAs [tRNAs], 3,074 ribosomal RNAs [rRNAs], and 355 small nuclear RNAs [snRNAs]) were identified in the genome (Table 8).
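The summary percentages and totals in this paragraph can be recomputed from the reported counts. The following is a small consistency check, not part of the authors' workflow; all variable names are illustrative.

```python
# Illustrative consistency check of the annotation summary figures.

assembly_mb = 672.64
repeat_mb = 476.89
repeat_pct = round(100 * repeat_mb / assembly_mb, 2)

genes_total = 43813
genes_annotated = 42972
annotated_pct = round(100 * genes_annotated / genes_total, 2)

# Non-coding RNA classes reported in the text.
ncrna = {"miRNA": 152, "tRNA": 770, "rRNA": 3074, "snRNA": 355}
ncrna_total = sum(ncrna.values())

print(repeat_pct, annotated_pct, ncrna_total)
```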

Table 5 General statistics of repeats in the MN genome assembly.

Table 6 Transposable elements (TEs) in the MN genome assembly.

Table 7 Number of functional annotations for predicted genes in the MN genome assembly.

Table 8 Number of noncoding RNA genes in the MN genome assembly.

Methods

Sample collection, DNA and RNA extractions

We collected fresh leaves, flowers, bark from tender branches and young fruits from a mature MN tree growing in the China National Botanical Garden (Beijing, China). DNA was isolated from the leaf samples via a modified cetyltrimethylammonium bromide (CTAB) method, and RNA was isolated from all collected samples using Trizol reagent (Invitrogen, CA, USA). The quality of the DNA and RNA was examined using a NanoDrop 2000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA) and agarose gel electrophoresis. DNA and RNA quantities were determined using the Qubit dsDNA HS Assay Kit on a Qubit 3.0 Fluorometer (Life Technologies, Carlsbad, CA, USA) and a Bioanalyzer 2100 system (Agilent Technologies, CA, USA), respectively.

Preparation and sequencing of short insert libraries

For the DNA short insert libraries, 1 μg of DNA was used as starting material and sequencing libraries were constructed using the VAHTS Universal DNA Library Prep Kit for MGI (Vazyme, Nanjing, China). For the RNA-seq libraries, 1 μg of RNA per sample served as the input material, with mRNA isolated from the total RNA using poly-T oligo-attached magnetic beads. Sequencing libraries were created using the VAHTS Universal V6 RNA-seq Library Kit for MGI (Vazyme, Nanjing, China). Unique index codes were incorporated to differentiate sequences among samples. The quantity and size distribution of these libraries were assessed with the Qubit 3.0 Fluorometer (Life Technologies, Carlsbad, CA, USA) and the Bioanalyzer 2100 system (Agilent Technologies, CA, USA). Finally, both DNA and RNA libraries were sequenced on an MGISEQ-2000 platform to generate 150-bp paired-end reads.

Preparation and sequencing of long insert libraries

For the DNA PacBio long insert sequencing, a SMRTbell library was constructed following the manufacturer’s protocol using the SMRTbell Express Template Prep kit 2.0 (Pacific Biosciences). Initially, 15 μg of genomic DNA was processed in the first enzymatic step to eliminate single-stranded overhangs, followed by application of repair enzymes to mend any damage along the DNA backbone. The ends of the double-stranded DNA fragments were smoothed and then extended to create an A-overhang. The ligation with T-overhang SMRTbell adapters was conducted at 20 °C for 60 minutes. The SMRTbell library was cleaned with 1 × AMPure PB beads. The library’s size distribution and concentration were evaluated with the FEMTO Pulse automated pulsed-field capillary electrophoresis device (Agilent Technologies, Wilmington, DE) and the Qubit 3.0 Fluorometer (Life Technologies, Carlsbad, CA, USA). Following library characterization, 3 μg DNA underwent size selection with the BluePippin system (Sage Science, Beverly, MA), to eliminate SMRTbells shorter than 15 kb. The library was again purified with 1 × AMPure PB beads. The library’s size and quantity were reassessed using the FEMTO Pulse and the Qubit dsDNA HS reagents Assay kit. The sequencing primer and Sequel II DNA Polymerase were annealed and bound to the final SMRTbell library, respectively. The library was loaded at an on-plate concentration of 120 pM via diffusion loading. SMRT sequencing was executed using a single 8 M SMRT Cell on the Sequel II System, using the Sequel II Sequencing Kit and 1800-minute movies.

Hi-C Library construction and sequencing process

The Hi-C library was prepared according to a previous report⁸. In summary, samples were fixed by vacuum infiltration with 3% formaldehyde for 30 min at 4 °C and then treated with 0.375 M glycine solution for 5 min to quench the cross-linking reaction. The fixed samples were lysed, and endogenous nucleases were inactivated with 0.3% SDS. Chromatin DNA was digested using 100 U of MboI (NEB), labeled with biotin-14-dCTP (Invitrogen) and subsequently ligated with 50 U of T4 DNA ligase (NEB). After reversing the cross-links, the ligated DNA was purified using the QIAamp DNA Mini Kit (Qiagen) as per the manufacturer’s guidelines. The extracted DNA was fragmented to 300–500 bp pieces, subjected to blunt-end repair, A-tailing and adaptor ligation, followed by purification via biotin-streptavidin pull-down and PCR amplification. The Hi-C libraries were then quantified and sequenced using the Illumina NovaSeq platform (San Diego, CA, USA).

Genome survey and analysis

After quality assessment and filtering, 40.02 Gb of clean DNA short reads were obtained for the genome survey. The genome size, heterozygosity rate, and percentage of repetitive sequence were estimated from the clean short reads using GCE (v1.0.2)⁹ with a k-mer size of 17.
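The idea behind a k-mer based genome survey can be illustrated with a deliberately simplified sketch. This is not the GCE algorithm, which fits a statistical model to the k-mer frequency distribution and also estimates heterozygosity and repeat content. The sketch assumes error-free forward-strand reads, ignores reverse complements, and uses the naive estimate of total k-mers divided by the peak k-mer depth; the function names are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k=17):
    """Count every k-mer (forward strand only) across an iterable of reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def estimate_genome_size(counts, min_depth=2):
    """Estimate genome size as (total k-mers) / (peak k-mer depth).

    k-mers seen fewer than min_depth times are treated as sequencing
    errors and ignored (a simplifying assumption).
    """
    histogram = Counter(counts.values())  # depth -> number of distinct k-mers
    solid = {d: n for d, n in histogram.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(solid, key=solid.get)
    total_kmers = sum(d * n for d, n in solid.items())
    return total_kmers / peak_depth


# Toy example: a 20-bp "genome" sequenced 10 times without errors.
toy_genome = "ACGTACGGTCAATGCCTAGA"
toy_counts = count_kmers([toy_genome] * 10, k=5)
print(estimate_genome_size(toy_counts))
```

In the toy example every 5-mer is unique and appears ten times, so the estimate equals the number of distinct k-mers. Real data need a dedicated k-mer counter and a model that accounts for errors, heterozygosity and repeats.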

Genome assembly

The resulting 29.42 Gb of PacBio high-fidelity (HiFi) long reads were assembled into a draft genome using hifiasm (v0.16.1)¹⁰ with default parameters. The Hi-C reads were used to scaffold the contigs into 17 chromosomes using Juicer tools (v1.6)¹¹ and 3D-DNA¹² based on Hi-C interaction data (70.74 Gb) (Fig. 1). Finally, Juicebox (v1.11.08)¹³ was used to manually adjust the assembled genome. The BUSCO (Benchmarking Universal Single-Copy Orthologs) pipeline (v5.2.2)¹⁴ was used with the embryophyta_odb10 dataset, which contains 1614 BUSCO genes, to assess the coverage of highly conserved genes and thereby validate the completeness of the genome assembly.
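Contiguity statistics such as the contig N50 reported for this assembly are computed from the list of sequence lengths. As a minimal sketch (the function name is illustrative, and assembly tools report this value directly), N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total length:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example with five contigs (arbitrary units).
print(n50([2, 3, 4, 5, 6]))
```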

Repetitive sequences annotation and telomeres identification

We combined de novo and homology-based prediction to identify the repeat content of the MN genome. For the homology-based analysis, we used RepeatMasker (v4.1.2; http://repeatmasker.org) with the Repbase TE library¹⁵ to detect known transposable elements (TEs) within the genome. Additionally, RepeatProteinMask (v4.1.2) was employed to search against the TE protein database. For de novo prediction, we generated a custom repeat library for the genome using RepeatModeler (v2.0.2a; http://www.repeatmasker.org/RepeatModeler/), which automates the execution of RECON (v1.08)¹⁶ and RepeatScout (v1.0.5)¹⁷, two key tools for identifying, refining, and classifying potential interspersed repeats. Furthermore, LTR_FINDER (v1.0.7)¹⁸ was used for a de novo search of long terminal repeat (LTR) retrotransposons within the genome sequences. Tandem repeats were identified with Tandem Repeats Finder (v4.10.0)¹⁹, and non-interspersed repeat sequences, such as low-complexity repeats, satellites and simple repeats, were detected using RepeatMasker. The repeat libraries from both methods were then integrated to determine the repeat content. Telomeric sequences in the MN genome assembly were identified using quarTeT (v1.0.3)²⁰ with the “-c plant” option, revealing the telomere repeat monomer to be TTTAGGG.
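To illustrate what telomere detection at chromosome ends involves, the sketch below counts copies of the TTTAGGG monomer, or its reverse complement CCCTAAA, in a window at each end of a sequence. It is a simplified stand-in and not quarTeT's method; the window size, copy threshold and function names are assumptions chosen for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_monomer(window, monomer):
    """Count non-overlapping copies of the monomer on either strand."""
    return window.count(monomer) + window.count(reverse_complement(monomer))


def telomere_ends(seq, monomer="TTTAGGG", window=100, min_copies=5):
    """Return (left, right) flags for telomere-like repeats at each end."""
    seq = seq.upper()
    left = count_monomer(seq[:window], monomer) >= min_copies
    right = count_monomer(seq[-window:], monomer) >= min_copies
    return left, right


# Toy sequences: repeats at both ends, at one end, and at neither end.
body = "ACGT" * 50
print(telomere_ends("CCCTAAA" * 10 + body + "TTTAGGG" * 10))
print(telomere_ends(body + "TTTAGGG" * 10))
print(telomere_ends(body))
```

A chromosome flagged at both ends corresponds to the double-ended case described in the text, one flag to a single-ended case, and no flags to a chromosome without an identified telomere.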

Prediction and functional annotation of protein-coding genes

We used three strategies to predict genes, namely ab initio prediction, homology-based prediction, and transcriptome-based prediction, all applied to the repeat-masked, chromosome-scale genome. First, ab initio prediction was performed with Augustus (v3.3.1)²¹ and GENSCAN (v1.0)²², using models trained on a curated set of high-quality proteins derived from RNA-Seq data. Second, the genome sequences were aligned with the protein sequences of seven plants, M. domestica, M. sieversii, M. sylvestris, M. prunifolia, Pyrus pyrifolia, Crataegus pinnatifida, and Prunus persica, and gene structures were predicted with Exonerate (v2.2.0)²³ with default parameters. Third, for transcriptome-based gene prediction, the gene structure was built using PASA (v2.4.1)²⁴. For Iso-Seq-based gene prediction, the Iso-Seq reads were aligned to scaffolds using GMAP (v2017-11-15)²⁵. The transcripts were used to predict open reading frames (ORFs) using PASA, and full-length cDNA was screened as a training set. Finally, Maker (v3.00)²⁶ integrated the predictions from the three approaches to generate a coherent, non-overlapping set of gene models. Gene functions were assigned based on the highest-scoring matches of the alignments to the National Center for Biotechnology Information (NCBI) Non-Redundant (NR), TrEMBL²⁷, InterPro²⁸ and Swiss-Prot²⁷ protein databases using BLASTP (v2.6.0+)²⁹ and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database³⁰, all with an E-value threshold of 1E-5. Protein domains were annotated with InterProScan (v5.3574.0)³¹ based on InterPro protein databases. The Pfam database³² was used to identify the motifs and domains within gene models. Gene Ontology (GO)³³ IDs were assigned to each gene by Blast2GO³⁴.
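The best-hit assignment with an E-value threshold described above can be sketched as follows. This is an illustrative example rather than the authors' script: it assumes BLAST tabular output in the standard 12-column layout (`-outfmt 6`, where column 11 is the E-value and column 12 the bit score), and the gene and protein identifiers are hypothetical.

```python
import csv
import io


def best_hits(tabular_text, max_evalue=1e-5):
    """Return {query: (subject, evalue, bitscore)} for the top-scoring hit
    of each query, keeping only hits with evalue <= max_evalue."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Toy input: gene1 has two significant hits, gene2 only a weak one.
rows = [
    ["gene1", "protA", "90.0", "100", "10", "0", "1", "100", "1", "100", "1e-30", "200"],
    ["gene1", "protB", "95.0", "100", "5", "0", "1", "100", "1", "100", "1e-50", "300"],
    ["gene2", "protC", "40.0", "50", "30", "0", "1", "50", "1", "50", "2e-4", "40"],
]
example = "\n".join("\t".join(r) for r in rows)
print(best_hits(example))
```

Here gene1 is assigned its higher-scoring hit and gene2 is left without an assignment because its only hit exceeds the threshold.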

Non-coding RNAs, including miRNAs, tRNAs, rRNAs, and snRNAs, were annotated using the following methods: tRNAs were predicted using tRNAscan-SE (v1.3.1)³⁵ with default parameters; rRNAs were identified by mapping Arabidopsis thaliana rRNA sequences to the MN genome using BLASTN-short (v2.2.28); miRNAs and snRNAs were identified using INFERNAL (v1.1.3)³⁶ against the Rfam database with default parameters.

Data Records

The DNA sequence reads of MN (PacBio HiFi library: SRR30127381³⁷; Hi-C library: SRR30140708³⁸) have been deposited in the Sequence Read Archive (SRA) under BioProject accession PRJNA1143952. The genome assembly has been deposited in the GenBank database under the accession number JBHDYX000000000³⁹. The gene structure annotation, repeat prediction and gene functional annotation files have been deposited in the Figshare database (https://doi.org/10.6084/m9.figshare.26962936)⁴⁰.

Technical Validation

We used several methods to assess the accuracy and completeness of the MN genome assembly. First, the Hi-C heatmap demonstrated the accuracy of the genome assembly, with distinct Hi-C signals between the 17 pseudo-chromosomes, indicating their relative independence (Fig. 1). Second, the completeness and accuracy of the assembled genome were further substantiated by the benchmarking universal single-copy orthologues (BUSCO) analysis, which identified 1,590 complete plant orthologues (98.6%) (Table 4). Third, the long terminal repeat (LTR) assembly index (LAI) score of the present assembly was 21.99, reaching the ‘gold standard’ level (LAI > 20). Fourth, the accuracy was corroborated by the very high mapping rates of two types of sequences to the MN assembly, with 99.32% of short insert reads and 99.95% of HiFi reads mapping to the assembly (Table 3). Finally, the chromosome telomere location map showed that the assembled genome extended to the telomeres, with the exception of chromosome 15, and that the majority of the chromosomes were assembled with telomeres at both ends (Fig. 2).

Code availability

No custom code was used in this study. All software and pipelines were run according to the manuals and protocols of the published bioinformatics tools. Software versions and parameters are described in the Methods.

References

Fiala, J. Flowering crabapples: the genus Malus. (Timber Press, Incorporated, 2003).
Lawrence, W. J. C. & Price, J. R. The genetics and chemistry of flower colour variation. Biol. Rev. 15, 35–57 (1940).
Glover, B. J. & Martin, C. Anthocyanins. Curr. Biol. 22, R147–R150 (2012).
Li, H. et al. Chromosomal level genome assemblies of two Malus crabapple cultivars Flame and Royalty. Sci. Data 11, 201 (2024).
Sun, X. et al. Phased diploid genome assemblies and pan-genomes provide insights into the genetic history of apple domestication. Nat. Genet. 52, 1423–1432 (2020).
Mansfeld, B. N. et al. A haplotype resolved chromosome‐scale assembly of North American wild apple Malus fusca and comparative genomics of the fire blight Mfu10 locus. Plant J. 116, 989–1002 (2023).
Li, Z. et al. Chromosome-scale reference genome provides insights into the genetic origin and grafting-mediated stress tolerance of Malus prunifolia. Plant Biotechnol. J. 20, 1015–1017 (2022).
Padmarasu, S., Himmelbach, A., Mascher, M. & Stein, N. In situ Hi-C for plants: an improved method to detect long-range chromatin interactions. in Plant long non-coding RNAs (eds. Chekanova, J. A. & Wang, H.-L. V.) 441–472 (Springer New York, New York, NY, 2019).
Liu, B. et al. Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects. (2013).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
Robinson, J. T. et al. Juicebox.js provides a cloud-based visualization system for Hi-C data. Cell Syst. 6, 256–258.e1 (2018).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA 6 (2015).
Bao, Z. & Eddy, S. R. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 13, 1269–1276 (2003).
Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21, i351–i358 (2005).
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007).
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
Lin, Y. et al. quarTeT: a telomere-to-telomere toolkit for gap-free genome assembly and centromeric repeat identification. Hortic. Res. 26–28 (2023).
Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–W439 (2006).
Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).
Slater, G. S. C. & Birney, E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6 (2005).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9 (2008).
Wu, T. D. & Watanabe, C. K. GMAP: A genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21, 1859–1875 (2005).
Cantarel, B. L. et al. MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18, 188–196 (2008).
Boeckmann, B. et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 31, 365–370 (2003).
Mitchell, A. et al. The InterPro protein families database: The classification resource after 15 years. Nucleic Acids Res. 43, D213–D221 (2015).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).
Kanehisa, M., Goto, S., Sato, Y., Furumichi, M. & Tanabe, M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 40 (2012).
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
El-Gebali, S. et al. The Pfam protein families database in 2019. Nucleic Acids Res. 47, D427–D432 (2019).
Dwight, J. T. et al. Gene Ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
Conesa, A. & Gotz, S. Blast2GO: a comprehensive suite for functional analysis in plant genomics. Int. J. Plant Genomics 2008, 1–12 (2008).
Chan, P. P. & Lowe, T. M. tRNAscan-SE: Searching for tRNA genes in genomic sequences. Methods in Mol. Biol. 1962, 1–14 (2019).
Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935 (2013).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR30127381 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR30140708 (2024).
Wang, R. et al. Malus niedzwetzkyana isolate LD-2024a, whole genome shotgun sequencing project. GenBank https://identifiers.org/ncbi/insdc:JBHDYX000000000 (2024).
Wang, R. Chromosome-level genome assembly of Malus niedzwetzkyana, the mother of Rosybloom crabapple. figshare https://doi.org/10.6084/m9.figshare.26962936 (2024).

Acknowledgements

This work was supported by the Beijing Natural Science Foundation (6224061), Beijing Municipal Administration Center of Parks (kjxx2021004, kjxx2023006, zx2022010) and the Open Fund of State Key Laboratory of Tree Genetics and Breeding (Chinese Academy of Forestry) (TGB-KF-2024-001).

Author information

These authors contributed equally: Ruizhen Wang, Jian Quan, Boyang Liu, Yu Wei.

Authors and Affiliations

Key Laboratory of National Forestry and Grassland Administration on Plant Ex situ Conservation, Beijing Floriculture Engineering Technology Research Centre, Beijing Botanical Garden, Beijing, 100093, China
Ruizhen Wang, Jian Quan, Boyang Liu, Yu Wei, Hengxing Liu, Ran He, Ling Guo & Leiming Dong

Contributions

Ran He, Ling Guo, and Leiming Dong supervised the project. Hengxing Liu prepared the plant material. Ruizhen Wang, Jian Quan and Boyang Liu drafted the manuscript. Yu Wei prepared all figures and revised the manuscript. Ran He, Ling Guo, and Leiming Dong contributed to the discussions and improvement of the manuscript. All authors read and approved the final version of the manuscript.

Corresponding authors

Correspondence to Ran He, Ling Guo or Leiming Dong.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Wang, R., Quan, J., Liu, B. et al. Chromosome-level genome assembly of Malus niedzwetzkyana, the mother of Rosybloom crabapple. Sci Data 12, 211 (2025). https://doi.org/10.1038/s41597-024-04221-z

Received: 12 September 2024
Accepted: 02 December 2024
Published: 05 February 2025
Version of record: 05 February 2025
DOI: https://doi.org/10.1038/s41597-024-04221-z
