Abstract
Astragalus membranaceus (Fisch.) Bge (AM) is a medicinal herb belonging to the Leguminosae family. In this study, we present a chromosome-scale genome assembly of AM, aiming to enhance the molecular biology and functional studies of Astragali Radix. The genome size of AM is about 1.43 Gb, with a contig N50 value of 1.67 Mb. A total of 98.16% of the assembly was anchored to 9 pseudochromosomes using Hi-C technology. The assembly completeness was estimated to be 97.27% using BUSCO, with a long terminal repeat assembly index (LAI) of 16.22 and a quality value (QV) of 48.58. Additionally, the genome contained 67.98% repetitive sequences. Genome annotation predicted 29,914 protein-coding genes, including 73 genes involved in the flavonoid biosynthetic pathway and 2,048 transcription factors. The high-quality genome assembly and gene annotation resources will greatly facilitate future functional genomic studies in Leguminosae species.
Background & Summary
Astragalus membranaceus (Fisch.) Bge (AM) is a medicinal plant widely used worldwide1. Its dried roots, known as Astragali Radix, possess hepatoprotective, diuretic, tonic and expectorant activities and, in Chinese medicine, play roles in anti-aging, anti-tumor and anti-neurodegeneration effects and in regulating blood glucose and immunity2. Flavonoids are one of the main groups of active compounds in AM. They have diverse biological activities and play numerous roles in the interaction between plants and the environment, such as resisting diseases and insect pests, preventing ultraviolet damage, and attracting insect pollinators3. Recently, the genome of Astragalus mongholicus (AMM), another authorized plant source of Astragali Radix, has been reported4,5. It is widely believed that AM and AMM differ markedly in morphology and function, and that AMM is the more heterozygous of the two. In a metabolomics study6, a total of 53 chemical markers were identified for the discrimination of AMM and AM. Among them, the contents of 36 components, including 14 flavonoids, were significantly higher in AM than in AMM. AM may therefore possess stronger pharmacological activities than AMM.
To further understand the underlying molecular mechanism of flavonoid biosynthesis, we performed chromosome-level genome sequencing of AM (2n = 18) using a combination of PacBio long reads and Hi-C scaffolding (Fig. 1). The assembled AM genome had a total length of 1.43 Gb, with a contig N50 of 1.67 Mb and a complete BUSCO score of 97.27%. A total of 1.40 Gb (98.16%) of the sequences were anchored to the 9 pseudochromosomes (Fig. 2). Genome annotation predicted 29,914 protein-coding genes and 972.44 Mb (67.98%) of repetitive sequences. Moreover, 73 genes associated with the flavonoid biosynthetic pathway (Fig. 3) and 2,048 transcription factors (TFs) were identified. The chromosome-scale genome of AM provides a genetic basis for exploring key genes and molecular regulatory mechanisms involved in the biosynthesis of important compounds, while also serving as a valuable resource for comparative genomic analysis between AM and AMM.
Circos plot illustrating the AM genome. The plot includes the following components, arranged from inside to outside: (I) Collinear regions within the AM assembly; (II) GC content in non-overlapping 1-Mb windows; (III) Percentage of repeats in 1-Mb sliding windows; (IV) Gene density in 1-Mb sliding windows; (V) Length of each pseudochromosome in megabases (Mb).
Comparative genomic analysis between AM and AMM. (a) The syntenic regions. The analysis reveals intricate relationships between AM and AMM in their genomes. (b) AM protein length plotted against the orthologous protein length for AMM. (c) The density plot of SNPs between AM assembly and AMM assembly. (d) The density plot of Indels between AM assembly and AMM assembly.
The genes involved in the biosynthesis of flavonoids and the TFs in the AM genome. (a) The phylogenetic tree of genes involved in the flavonoid biosynthetic pathway. Genes with IDs highlighted in gold originate from AM, those highlighted in blue from AMM, and those in red from M. truncatula. (b) The distribution of TF families in the AM genome. Only TF families containing 10 or more genes are shown.
Methods
Plant materials and sequencing
The plant material used for de novo genome assembly was a seven-year-old AM plant grown in Jinzhong, China. Vigorously growing leaves were collected, immediately snap-frozen in liquid nitrogen, and stored at −80 °C in the laboratory until DNA extraction. Genomic DNA was extracted using the DNeasy Plant Maxi kit (Qiagen, Germany). A short-fragment library with an insert size of 350 bp was prepared and sequenced on the BGISEQ platform, yielding 150 bp paired-end reads. Two libraries with an insert size of approximately 20 kb were prepared following the manufacturer’s instructions from Pacific Biosciences and sequenced on PacBio Sequel platforms to generate continuous long reads. For chromosomal conformation capture (Hi-C) sequencing, libraries generated using the DpnII restriction enzyme were prepared according to previously described methods7 and subsequently sequenced on the BGISEQ platform. RNA-seq libraries from root, leaf, and stem tissues during the fruit growth period were constructed using the NEBNext® Ultra™ RNA Library Prep Kit for Illumina® (NEB, Ipswich, MA, USA) following the manufacturer’s protocol8. The cDNA libraries were then sequenced on a BGISEQ instrument, yielding 150 bp paired-end reads.
In summary, 156.2 Gb of paired-end next-generation sequencing reads (~109.2X), 196.4 Gb of PacBio subreads (~196.4X; the N50 length of subreads was larger than 22 kb), and 285.6 Gb of Hi-C data (~199.7X) were obtained (Table 1).
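The fold-coverage values quoted above follow from dividing the sequencing yield by the genome size. The snippet below is a minimal sketch of that arithmetic, assuming a genome size of 1.43 Gb as the denominator and using the short-read and Hi-C yields as examples; the function name is illustrative and not part of any pipeline used in this work.

```python
def sequencing_depth(yield_gb: float, genome_size_gb: float) -> float:
    """Return fold coverage: total sequenced bases divided by genome size."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return yield_gb / genome_size_gb


# Assumption: the 1.43 Gb genome size reported in the text is the denominator.
GENOME_SIZE_GB = 1.43

ngs_depth = sequencing_depth(156.2, GENOME_SIZE_GB)   # short paired-end reads
hic_depth = sequencing_depth(285.6, GENOME_SIZE_GB)   # Hi-C reads

print(f"NGS: ~{ngs_depth:.1f}X, Hi-C: ~{hic_depth:.1f}X")
```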
Genome survey
K-mer frequency distribution analysis is a prevalent genome survey technique. A K-mer is a sequence of K nucleotides extracted from sequencing data; a read of length L yields L − K + 1 K-mers. The 17-mer is a common choice for genome size estimation because it covers a large number of possible sequences (4^17) and has been applied to species with a wide range of genome sizes, such as willow (338.93 Mb)9, Dalbergia odorifera (653.45 Mb)10, camel (2.01–2.05 Gb)11, and gecko (2.55 Gb)12. Here, we counted 17-bp K-mers using Jellyfish (v 2.2.10)13 and estimated genome characteristics with GenomeScope (v2.0)14. The estimated genome size was 1.43 Gb with a heterozygosity rate of 1.01% (Table 2). This estimate is close to the result obtained by flow cytometry, which indicated a genome size of 1.52 Gb15.
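The idea behind a K-mer survey can be sketched in a few lines of Python. This is a simplified illustration, not the Jellyfish/GenomeScope procedure: it counts K-mers on one strand only, and it estimates genome size as the total number of K-mers divided by the depth of the main histogram peak, whereas GenomeScope fits a statistical model to the histogram. The function names and the `min_depth` cutoff for discarding likely error K-mers are assumptions made for this example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every K-mer in the reads; a read of length L yields L - k + 1 K-mers."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_histogram(counts):
    """Map each depth (number of occurrences) to the number of distinct K-mers."""
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Peak-based estimate: total K-mers / peak depth, ignoring low-depth K-mers."""
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no K-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth


# A 5-bp read contains 5 - 3 + 1 = 3 distinct 3-mers here.
toy_counts = count_kmers(["ACGTA"], k=3)

# Synthetic histogram: depth-1 K-mers are treated as sequencing errors.
toy_histogram = {1: 50, 9: 10, 10: 100, 11: 10}
toy_size = estimate_genome_size(toy_histogram)
print(sum(toy_counts.values()), toy_size)
```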
Genome assembly
Canu16, FALCON17, and MECAT218 are widely used assemblers for PacBio CLR data. Nie et al.19 demonstrated the high accuracy of these software packages in genome assembly. Notably, FALCON, endorsed by PacBio, has played a pivotal role in numerous high-quality plant genome projects. For instance, FALCON was used in the barley genome20, the maize Mo17 project21, Asian rice genome research22, and the coffee genome study23, showcasing its effectiveness in genome assembly. Here, the contigs of the AM genome were assembled using the FALCON (v2.0.5) assembler with the following parameters: -v -B48 -D250 -M24 -h600 -e.75 -l3000 -s1000 -k18 -w6 -T8 --output_multi --min_idt 0.75 --min_cov 4 --max_n_read 200 --n_core 8. After the FALCON assembly, the genome was polished with the command-line SMRT Link (v4.0.0) tools following the Reference Guide (https://programs.pacificbiosciences.com/l/1652/2017-02-01/3rzxn6/184345/SMRT_Tools_Reference_Guide__v4.0.0_.pdf). To enhance the contiguity of the genome and reduce errors, the assembly was further polished with NGS short reads using Pilon (v1.22)24. Finally, TrimDup, a component of the Rabbit Genome Assembler (https://github.com/gigascience/rabbit-genome-assembler), was applied to eliminate redundant sequences using a percentage of 0.3.
To anchor contigs onto pseudochromosomes, we used BWA (v 0.7.12)25 to align the Hi-C clean data to the assembled contigs. Low-quality reads were filtered out using the HiC-Pro pipeline26 with default parameters. The remaining valid reads were used to anchor contigs to chromosomes with the Juicer27 and 3D-DNA28 pipelines. Finally, the chromosome assemblies were divided into equal-length bins of 500 kb, and the interaction signals generated by the valid mapped read pairs between each pair of bins were visualized in a heat map.
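The binning step behind the Hi-C heat map can be sketched as follows. This is an illustrative example for a single chromosome, assuming 0-based mapping positions for the two ends of each valid read pair; the function names and the toy input are hypothetical, and the actual maps in this work were produced with the pipelines named above.

```python
BIN_SIZE = 500_000  # 500 kb bins, as described in the text


def bin_index(position, bin_size=BIN_SIZE):
    """Return the index of the bin containing a 0-based position."""
    return position // bin_size


def contact_matrix(read_pairs, chrom_length, bin_size=BIN_SIZE):
    """Build a symmetric bin-by-bin contact count matrix for one chromosome."""
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in read_pairs:
        i, j = bin_index(pos1, bin_size), bin_index(pos2, bin_size)
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the matrix symmetric
    return matrix


# Toy input: three read pairs on a hypothetical 1.5 Mb chromosome.
toy_pairs = [(100, 200), (100, 600_000), (1_200_000, 1_300_000)]
toy_matrix = contact_matrix(toy_pairs, chrom_length=1_500_000)
for row in toy_matrix:
    print(row)
```

Counts concentrated on the diagonal of such a matrix correspond to the strong intrachromosomal signal expected from a correctly ordered assembly.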
A genome assembly spanning 1.43 Gb was generated (Fig. 1; Table 3), which was close to the genome size of AMM (1.43 Gb vs 1.47 Gb) and the estimated genome size. The contig N50 value of AM genome was 1.67 Mb, which is comparable to the recently published genome of the closely related legume Astragalus sinicus29 (1.67 Mb vs 1.5 Mb). Approximately 1.40 Gb (98.16%) of the sequences were successfully anchored to the 9 pseudochromosomes (Table 3).
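For readers unfamiliar with the contig N50 statistic quoted above, the following sketch shows the standard definition: the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. The function name and toy lengths are illustrative only.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example: total length 54, so the cumulative sum must reach 27.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```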
Annotation of repetitive sequences
Tandem repeats and interspersed repeats were identified using the method described in Qu et al.30. Approximately 67.98% of the assembled genome was classified as repetitive sequences, with interspersed repeats making up 65.99% of them (Table 4). Among the repetitive sequences, the most prevalent elements were long terminal repeats (LTRs), which accounted for 60.66% of the genome size.
Protein-coding genes prediction and functional annotation
Protein-coding genes were annotated using a method similar to that described in Fang et al.31. To facilitate genome annotation of the AM assembly, RNA sequencing of root, stem, and leaf samples was conducted and yielded a total of 72.18 Gb of clean reads (Table 5). For transcriptome-based prediction, RNA-seq clean reads were assembled using Trinity (v 2.15.1)32 with the following parameters: ‘--max_memory 200G --CPU 40 --min_contig_length 200 --genome_guided_bam merged_sorted.bam --full_cleanup --min_kmer_cov 3 --min_glue 3 --bfly_opts '-V 5 --edge-thr=0.1 --stderr' --genome_guided_max_intron 10000 --genome_guided_min_coverage 2’. This generated 245,216 transcripts with an N50 of 1,997 bp. The assembled transcripts were aligned to the AM assembly using the Program to Assemble Spliced Alignments (PASA) (v 2.4.1)33, and gene structures were generated from valid transcript alignments. Additionally, RNA-seq clean reads were mapped to the AM assembly using HISAT2 (v 2.0.1)34. StringTie (v 1.2.2)35 and TransDecoder (v 5.7.1) (https://github.com/TransDecoder/TransDecoder) were employed to assemble the transcripts and identify candidate coding regions for gene models. For homology-based prediction, homologous genomes and gene sets, including A. membranaceus var. mongholicus (AMM)5, Cicer arietinum (GenBank accession: GCA_026016865.1)36, Medicago truncatula (GenBank accession: GCA_000219495.2)37, Trifolium pratense (GenBank accession: GCA_949352195.3)38, Glycine max (ZH13-T2T)39, and Arabidopsis thaliana (Col-PEK1.5)40, were downloaded and used as queries to search against the AM assembly with GeMoMa (v 1.9)41. Genes with a coding sequence (CDS) length of less than 150 bp were filtered out, along with single-exon genes lacking annotated protein domains. Additionally, genes that were not anchored to chromosome sequences and lacked annotated protein domains were excluded.
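The three filtering rules stated at the end of the paragraph above can be expressed as a simple predicate. The sketch below is illustrative only: the `GeneModel` record, its field names, and the toy gene IDs are hypothetical and do not reflect the file formats or scripts actually used.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    gene_id: str          # hypothetical identifier
    cds_length: int       # total CDS length in bp
    exon_count: int
    has_domain: bool      # True if a protein domain was annotated
    on_chromosome: bool   # True if anchored to a pseudochromosome


def keep_gene(gene, min_cds=150):
    """Apply the three exclusion rules described in the text."""
    if gene.cds_length < min_cds:
        return False                       # rule 1: CDS shorter than 150 bp
    if gene.exon_count == 1 and not gene.has_domain:
        return False                       # rule 2: single exon, no domain
    if not gene.on_chromosome and not gene.has_domain:
        return False                       # rule 3: unanchored, no domain
    return True


toy_models = [
    GeneModel("g1", 900, 4, True, True),     # kept
    GeneModel("g2", 120, 2, True, True),     # removed: CDS too short
    GeneModel("g3", 600, 1, False, True),    # removed: single exon, no domain
    GeneModel("g4", 600, 3, False, False),   # removed: unanchored, no domain
    GeneModel("g5", 600, 1, True, False),    # kept: domain annotation present
]
kept_ids = [g.gene_id for g in toy_models if keep_gene(g)]
print(kept_ids)
```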
Finally, the generated gene models were refined with PASA (v 2.4.1) to obtain untranslated regions and information on alternative splicing variation by using Trinity assembled transcripts and isoforms from full-length transcriptomes of leaf and root tissues42. Following the method described in Bi et al.43, the integrated gene set was translated into amino-acid sequences and annotated. As a result, 29,828 genes (99.71% of the total) were successfully annotated.
Overall, we predicted 29,914 protein-coding genes, with average lengths of 4,752 bp for genes, 622 bp for introns, and 1,306 bp for coding sequences. We downloaded the genes related to the flavonoid biosynthetic pathway in the AMM genome and identified their counterparts in the AM genome using OrthoFinder44, a tool for inferring orthologous genes among species. As a result, 73 genes associated with the flavonoid biosynthetic pathway were obtained in the AM genome. Homologous sequences were aligned with MAFFT (v 7.505)45, and the alignment was then processed with trimAl (v 1.4.1)46 to remove poorly aligned positions. Subsequently, the phylogenetic tree was generated using IQ-TREE 2 (v 2.0.6)47 with the parameter “-b 1000” and visualized using Evolview48 (Fig. 3a). Using the method described in Li et al.49, a total of 2,048 transcription factors (TFs) were identified (Fig. 3b). In brief, the plant TF domain profiles (https://planttfdb.gao-lab.org/)50 were searched against the AM protein data using the hmmsearch tool implemented in HMMER (v 3.1b2) (http://hmmer.org/). Proteins with a TF domain match at an E-value of 1E-5 or lower were retained.
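The E-value cutoff applied to the hmmsearch results can be illustrated with a small parser for HMMER's tabular per-target output (`--tblout`), in which comment lines begin with `#`, the first column is the target sequence, the third is the query profile, and the fifth is the full-sequence E-value. This is a sketch under assumptions: the text does not state whether the full-sequence or the best-domain E-value was used, so the full-sequence value is assumed here, and the protein and profile names in the example lines are hypothetical.

```python
def filter_tblout(lines, max_evalue=1e-5):
    """Keep the best-scoring profile per protein with E-value <= max_evalue."""
    hits = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split()
        protein, profile = fields[0], fields[2]
        evalue = float(fields[4])  # assumption: full-sequence E-value column
        if evalue > max_evalue:
            continue
        best = hits.get(protein)
        if best is None or evalue < best[1]:
            hits[protein] = (profile, evalue)
    return hits


# Hypothetical example lines in --tblout column order.
example_lines = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "AM_g0001 - MYB PF00249.34 1.2e-20 70.1 0.1 2.0e-10 37.5 0.0 2.1 2 0 0 2 2 2 2 -",
    "AM_g0001 - bHLH PF00010.29 3.0e-06 25.0 0.0 4.0e-06 24.6 0.0 1.1 1 0 0 1 1 1 1 -",
    "AM_g0002 - WRKY PF03106.18 3.0e-04 18.2 0.0 5.0e-04 17.5 0.0 1.0 1 0 0 1 1 1 0 -",
    "AM_g0003 - NAM PF02365.18 1e-05 22.3 0.0 2.0e-05 21.4 0.0 1.0 1 0 0 1 1 1 1 -",
]
tf_hits = filter_tblout(example_lines)
for protein, (profile, evalue) in sorted(tf_hits.items()):
    print(protein, profile, evalue)
```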
Genomic variations between AM and AMM
Using MCScan (Python version)51, we identified syntenic regions between the AM and AMM genomes, requiring at least ten genes per region. A total of 22,160 orthologous gene pairs were shared between the two genomes, involving 21,727 AM genes (72.63% of the AM gene set) and 21,474 AMM genes (77.06% of the AMM gene set) (Fig. 2a). The amino acid sequence lengths of the orthologous gene pairs within these collinear regions showed a significant positive correlation (Fig. 2b), supporting their homology. Additionally, we observed two potential chromosomal rearrangement events. First, AMM Chr7 corresponds to a fusion of two AM chromosomes, Chr7 and Chr8. Second, AM Chr9 corresponds to a fusion of two AMM chromosomes, Chr8 and Chr9. The causes and timing of these fusion events, and their effects on the traits of the organism, require further investigation.
Single nucleotide polymorphisms (SNPs) and small insertions/deletions (InDels) were identified using a method similar to that previously reported52. Briefly, genome alignment between the AM and AMM assemblies was performed with the NUCmer program of MUMmer4 (v4.0.0)53 using the parameter settings “--mum -g 1000 -c 90 -l 40”. The delta-filter program was used to obtain alignment blocks with the parameter setting “-1 -l 5000”. The show-snps program was used to detect SNPs and InDels with the settings “-Clr -x 1 -T”. In total, 4,902,056 SNPs and 903,918 InDels were identified (Fig. 2c,d). These variations serve as resources for further research.
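Density plots such as those in Fig. 2c,d are typically built by counting variants in fixed windows along each chromosome. The sketch below illustrates that step under stated assumptions: the input is a list of (chromosome, 1-based position, reference base, query base) tuples rather than raw show-snps output, a `.` in either base column marks an InDel position as in show-snps reports, and the 1-Mb window size is hypothetical because the window used for the figure is not stated in the text.

```python
from collections import Counter


def classify(ref_base, qry_base):
    """Label a variant position as 'indel' if either allele is '.', else 'snp'."""
    return "indel" if "." in (ref_base, qry_base) else "snp"


def window_density(variants, window=1_000_000):
    """Count variant positions per (chromosome, window index, variant type)."""
    density = Counter()
    for chrom, pos, ref_base, qry_base in variants:
        window_index = (pos - 1) // window  # positions are 1-based
        density[(chrom, window_index, classify(ref_base, qry_base))] += 1
    return density


# Toy variant records with hypothetical coordinates.
toy_variants = [
    ("Chr1", 10, "A", "G"),
    ("Chr1", 999_999, "C", "."),
    ("Chr1", 1_000_001, "T", "C"),
    ("Chr2", 5, ".", "A"),
]
toy_density = window_density(toy_variants)
for key in sorted(toy_density):
    print(key, toy_density[key])
```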
Data Records
The DNA and RNA sequence reads of AM have been deposited in the Sequence Read Archive (SRA) with accession number SRP48693054 under project number PRJNA1067739. The genome assembly has been deposited at GenBank under the WGS accession GCA_039519185.155. Additionally, the genome assembly, along with files for gene structure annotation, repeat predictions, gene functional annotation, and variation information (SNPs and InDels between the AM and AMM genomes), has been deposited in Figshare56.
Technical Validation
Genome assembly and gene prediction quality assessment
The quality and accuracy of the AM assembly were assessed through the following analyses. Firstly, the Hi-C interaction map showed a strong intrachromosomal interaction signal along the diagonal (Fig. 4). Secondly, the GC-depth distribution indicated that there was no apparent contamination in the assembled sequences (Fig. 5). Thirdly, the AM assembly presented an LTR assembly index of 16.22 and a BUSCO score of 97.27%, indicating its high completeness (Table 3). In addition, evaluation using Merqury showed a QV of 48.58, suggesting high accuracy at the base-pair level. Lastly, 99.56% of the DNA next-generation sequencing reads and 99.25% of the error-corrected PacBio data were mapped to the AM genome assembly. Notably, the genome coverage achieved by the error-corrected PacBio data reached 99.40%, and the depth of each window remained consistent without significant fluctuations (Fig. 6).
We compared the gene length distribution of AM with those of AMM5, C. arietinum36, and G. max39, and found similar patterns (Fig. 7). Meanwhile, 85.39% of the RNA-seq data were aligned to the predicted exons and only 2.5% were located in intergenic regions (Fig. 8). The BUSCO analysis showed that 96.59% (single-copy: 88.97%, duplicated: 7.62%) of the 1,614 embryophyta single-copy orthologs were identified as complete, while 1.12% were fragmented and 2.29% were missing in the assembly (Table 6). A total of 29,828 gene models (99.71%) were successfully annotated in diverse databases, such as NR, SwissProt, KEGG, KOG, TrEMBL and InterPro (Table 7). Taken together, these results provide strong evidence that a high-quality AM genome has been obtained.
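The QV reported by Merqury is on the Phred scale, QV = −10·log10(E), where E is the estimated per-base error probability. The short sketch below converts the reported QV into an error rate to make the base-level accuracy concrete; the function names are illustrative.

```python
import math


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value into a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Inverse conversion: per-base error probability to Phred-scaled QV."""
    return -10 * math.log10(error_rate)


am_error_rate = qv_to_error_rate(48.58)          # QV reported for the AM assembly
bases_per_error = 1 / am_error_rate

print(f"error rate ~ {am_error_rate:.3e}, "
      f"about one error per {bases_per_error:,.0f} bases")
```

Under this conversion, a QV of 48.58 corresponds to roughly 1.4 errors per 100,000 bases.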
Code availability
No specific code or script was used in this work. Commands used for data processing were all executed according to the manuals and protocols of the corresponding software.
References
Fu, J. et al. Review of the botanical characteristics, phytochemistry, and pharmacology of Astragalus membranaceus (Huangqi). Phytotherapy research: PTR 28, 1275–1283 (2014).
Zheng, Y. et al. A Review of the Pharmacological Action of Astragalus Polysaccharide. Frontiers in pharmacology 11, 349 (2020).
Chen, J. et al. Global transcriptome analysis profiles metabolic pathways in traditional herb Astragalus membranaceus Bge. var. mongolicus (Bge.) Hsiao. BMC genomics 16, 1–20 (2015).
Chen, Y. et al. A reference-grade genome assembly for Astragalus mongholicus and insights into the biosynthesis and high accumulation of triterpenoids and flavonoids in its roots. Plant Communications 4 (2022).
Global Pharmacopoeia Genome Database http://www.gpgenome.com/species/109 (2022).
Wang, Y. et al. Chemical Discrimination of Astragalus mongholicus and Astragalus membranaceus Based on Metabolomics Using UHPLC-ESI-Q-TOF-MS/MS Approach. Molecules (Basel, Switzerland) 24, E4064 (2019).
Belton, J.-M. et al. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods 58(3), 268–76 (2012).
Bian, X. et al. Regulatory role of non-coding RNA in ginseng rusty root symptom tissue. Scientific reports 11, 9211 (2021).
He, X. et al. The whole-genome assembly of an endangered Salicaceae species: Chosenia arbutifolia (Pall.) A. Skv. GigaScience 11 (2022).
Hong, Z. et al. The chromosome-level draft genome of Dalbergia odorifera. Gigascience 9.8 (2020).
Wu, H. et al. Camelid genomes reveal evolution and adaptation to desert environments. Nature communications 5.1 (2014).
Liu, Y. et al. Gekko japonicus genome reveals evolution of adhesive toe pads and tail regeneration. Nature communications 6.1 (2015).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nature communications 11, 1432 (2020).
Fan, H. J. et al. Study of Genome Size of Medicinal Plant Astragali Radix. Chinese Journal of Basic Medicine In Traditional, 25(09), 1299–1302. (in Chinese with English abstract) (2019).
Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome research 27.5, 722–736 (2017).
Pendleton, M. et al. Assembly and diploid architecture of an individual human genome via single-molecule technologies. Nature Methods 12, 780–786 (2015).
Xiao, C. L. et al. MECAT: fast mapping, error correction, and de novo assembly for single-molecule sequencing reads. Nature Methods 14.11, 1072–1074 (2017).
Nie, F. et al. De novo diploid genome assembly using long noisy reads. Nature Communications 15(1), 2964 (2024).
Zeng, X. et al. An improved high-quality genome assembly and annotation of Tibetan hulless barley. Scientific Data 7(1), 139 (2020).
Wang, B. et al. De novo genome assembly and analyses of 12 founder inbred lines provide insights into maize heterosis. Nature Genetics 55.2, 312–323 (2023).
Zhou, Y. et al. Pan-genome inversion index reveals evolutionary insights into the subpopulation structure of Asian rice. Nat Commun 14, 1567 (2023).
Salojärvi, J. et al. The genome and population genomics of allopolyploid Coffea arabica reveal the diversification history of modern coffee cultivars. Nat Genet 56, 721–731 (2024).
Walker, B. J., Abeel, T., Shea, T., Priest, M. & Earl, A. M. Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement. PLoS ONE 9, e112963 (2014).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Servant, N. et al. HiC-Pro: An optimized and flexible pipeline for Hi-C data processing. Genome Biology 16 (2015).
Durand, N. et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Systems 3, 95–98 (2016).
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, eaal3327 (2017).
Chang, D. et al. The chromosome-level genome assembly of Astragalus sinicus and comparative genomic analyses provide new resources and insights for understanding legume-rhizobial interactions. Plant communications 3, 100263 (2022).
Qu, C. et al. Comparative genomic analyses reveal the genetic basis of the yellow-seed trait in Brassica napus. Nature Communications 14, 5194 (2023).
Fang, X. et al. The sequence and analysis of a Chinese pig genome. GigaScience 1, 16–16 (2012).
Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature biotechnology 29(7), 644–52 (2011).
Haas, B. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Research 31, 5654–5666 (2003).
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nature Methods 12, 357–360 (2015).
Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biology 20, 278 (2019).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_026016865.1 (2022).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_000219495.2 (2014).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_949352195.3 (2023).
National Genomics Data Center https://ngdc.cncb.ac.cn/gwh/Assembly/66216/show (2023).
AraShare https://www.arashare.cn//static/uploads/Col-PEK1.5_assembly_and_annotation.tar.gz (2023).
Keilwagen, J., Hartung, F., Grau, J. GeMoMa: Homology-Based Gene Prediction Utilizing Intron Position Conservation and RNA-seq Data. In: Kollmar, M. (eds) Gene Prediction. Methods in Molecular Biology, vol 1962 (2019).
Li, J. et al. Long read reference genome-free reconstruction of a full-length transcriptome from Astragalus membranaceus reveals transcript variants involved in bioactive compound biosynthesis. Cell Discovery 3 (2017).
Bi, Q. et al. The phased chromosome-scale genome of yellowhorn sheds light on the mechanism of petal color change. Horticultural Plant Journal (2023).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome biology 20, 238 (2019).
Nakamura, T., Yamada, K. D., Tomii, K. & Katoh, K. Parallelization of MAFFT for large-scale multiple sequence alignments. Bioinformatics (Oxford, England) 34, 2490–2492 (2018).
Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics (Oxford, England) 25, 1972–1973 (2009).
Minh, B. Q. et al. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Molecular Biology and Evolution 37, 1530–1534 (2019).
Subramanian, B., Gao, S., Lercher, M. J., Hu, S. & Chen, W.-H. Evolview v3: a webserver for visualization, annotation, and management of phylogenetic trees. Nucleic acids research 47, W270–W275 (2019).
Li, D. et al. A high-quality genome assembly of the eggplant provides insights into the molecular basis of disease resistance and chlorogenic acid synthesis. Molecular ecology resources 21, 1274–1286 (2021).
Jin, J., Zhang, H., Kong, L., Gao, G. & Luo, J. PlantTFDB 3.0: a portal for the functional and evolutionary study of plant transcription factors. Nucleic acids research 42, D1182–7 (2014).
Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic acids research 40, e49 (2012).
Li, T. et al. Genome assembly of KA105, a new resource for maize molecular breeding and genomic research. The Crop Journal (2023).
Marçais, G. et al. MUMmer4: A fast and versatile genome alignment system. PLoS Computational Biology 14 (2018).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP486930 (2024).
Fan, H. Astragalus membranaceus isolate JZ-2020, whole genome shotgun sequencing project. GenBank https://identifiers.org/ncbi/insdc.gca:GCA_039519185.1 (2024).
Fan, H. Genome Assembly and Annotation of Astragalus membranaceus (Fisch.) Bge (AM). figshare. Dataset. https://doi.org/10.6084/m9.figshare.25100393.v3 (2024).
Acknowledgements
This work was supported by Project of the “Modernization Research of Traditional Chinese Medicine” Key Research and Development Program of the Ministry of Science and Technology (No. 2019YFC1710800), Project of the Shanxi Collaborative Innovation Center of Astragali Radix Resource Industrialization and Industrial Internationalization (No. HQXTCXZX2016-005 and No. HQXTCXZX2016-016) and Key Research and Development (R&D) project of Shanxi Province (No.201603D3111001).
Author information
Contributions
H.J.F., Z.C., R.Z., C.G.M. and Q.S.L. conceived the study. H.J.F. collected and prepared the samples. Z.Y.W. and X.K.Y. performed bioinformatics analysis. Z.C. and H.J.F. wrote the manuscript with significant contributions from X.K.Y., A.K.L. and H.F.S. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Fan, H., Chai, Z., Yang, X. et al. Chromosome-scale genome assembly of Astragalus membranaceus using PacBio and Hi-C technologies. Sci Data 11, 1071 (2024). https://doi.org/10.1038/s41597-024-03852-6
DOI: https://doi.org/10.1038/s41597-024-03852-6