Abstract
The genus Acer (maples) is widely distributed, holds significant ecological and economic value, and comprises over 200 species. Among them, Acer truncatum is notable for its ornamental features, tolerance of environmental stress, and seeds rich in unsaturated fatty acids and nervonic acid. Despite the genus’s importance, a highly complete genome assembly has been lacking. In this study, we present a near-complete telomere-to-telomere (T2T) genome assembly of A. truncatum generated from PacBio HiFi long reads, ONT ultra-long reads, and Hi-C chromatin conformation capture data. The resulting 1.2 Gb haplotype-resolved genome contains 58,569 protein-coding genes and has a scaffold N50 of 44 Mb, a 4.8-fold improvement in contiguity over the previous A. truncatum assembly. This high-quality genome provides a foundational resource for investigating the evolutionary history, functional genomics, and molecular breeding of A. truncatum.
Background & Summary
The advent of telomere-to-telomere (T2T) genome assemblies represents a milestone in genomics, enabling near-complete chromosomal reconstructions and significantly improving resolution in previously intractable regions such as centromeres, telomeres, and rDNA clusters1,2. Traditional assemblies often leave these repetitive regions unresolved. Recent progress in sequencing technologies—including PacBio HiFi, Oxford Nanopore ultra-long reads, and Hi-C chromatin interaction maps—coupled with advanced assembly tools such as hifiasm3 and quarTeT4, has enabled the generation of highly contiguous and accurate plant genomes. T2T assemblies have now been achieved in multiple crops, including soybean5, rice6, maize7, and cucumber8, as well as woody species like Chinese cork oak9, Chinese bayberry10, and Populus alba × P. tremula (84 K poplar)11, offering unprecedented insights into genome structure and evolution.
The genus Acer (maples), belonging to the family Sapindaceae, comprises over 200 species distributed across Eurasia and the Americas, with China being the modern diversity center, harboring approximately 140 species12. A. truncatum, widely distributed between 28°N–46°N and 102°E–143°E13, is a species of ecological and economic significance. It is noted for its tolerance to drought and cold14,15, striking autumn foliage16,17, and seed oil rich in unsaturated fatty acids, especially nervonic acid18,19,20. These compounds are beneficial to cardiovascular and neurological health, with nervonic acid playing roles in neural repair and possibly mitigating neurodegenerative diseases such as Alzheimer’s and multiple sclerosis21.
To date, several Acer genomes have been assembled, including A. catalpifolium22, A. negundo23, A. palmatum24, A. pseudosieboldianum25, A. rubrum26, A. saccharum23, A. truncatum17,18, and A. yangbiense27, providing valuable resources for studies of phylogeny, stress tolerance, and leaf coloration. However, no highly complete, gap-free genome has been reported for the genus. Earlier versions of the A. truncatum genome included a 653 Mb assembly built from PacBio Sequel I, 10x Genomics, and Illumina data (1,453 contigs; contig N50 = 773 kb)20, and a hybrid HiFi/ONT/Hi-C assembly with 181 scaffolds (scaffold N50 = 9.14 Mb)17.
In this study, we reanalyzed the dataset from Zhang et al.17 to generate a near-complete, haplotype-resolved genome of A. truncatum. The final assembly spans 1.2 Gb, with a contig N50 of 42 Mb and scaffold N50 of 44 Mb, and contains only seven gaps. Telomeric and centromeric sequences were accurately resolved across most chromosomes. This represents the first T2T-level genome assembly in the Acer genus. The resulting high-contiguity genome provides a valuable and comprehensive reference for comparative and functional genomic studies in maples and related taxa.
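N50 is the contiguity metric quoted above: the length L such that sequences of length L or longer contain at least half of all assembled bases. The 4.8-fold gain cited in the Abstract is the ratio of the 44 Mb scaffold N50 to the 9.14 Mb of the earlier hybrid assembly. The minimal Python sketch below illustrates both calculations; the toy lengths are hypothetical, and the functions are illustrative rather than the tooling used in this study.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() needs at least one sequence length")


def fold_change(new_value, old_value):
    """Ratio of a new metric to an old one, e.g. scaffold N50."""
    return new_value / old_value


# Hypothetical scaffold lengths in Mb (not the A. truncatum scaffolds).
toy_lengths = [50, 44, 40, 30, 10, 5]
print(n50(toy_lengths))                  # 44
# Scaffold N50 values from the text: 44 Mb (this study) vs. 9.14 Mb (earlier assembly).
print(round(fold_change(44, 9.14), 1))   # 4.8
```

Sorting longest-first and stopping once half of the total is reached is the standard definition; contig N50 is computed the same way on contig lengths.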
Methods
Data acquisition
All raw sequencing data for A. truncatum ‘Lihong’, including PacBio subreads, ONT reads, Hi-C reads, and Illumina paired-end reads, were obtained from the National Genomics Data Center (NGDC) Sequence Read Archive under accession number PRJCA014724 (https://ngdc.cncb.ac.cn/bioproject/browse/PRJCA014724). The ccs tool (v6.4.0) from Pacific Biosciences (https://github.com/PacificBiosciences/ccs) was used with the --all option to regenerate PacBio HiFi reads from the provided subreads. This process generated 38 Gb (~62× coverage) of HiFi data, which was subsequently used in the genome assembly alongside the ONT, Hi-C, and Illumina data (Supplementary Tables 1 and 2).
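The coverage figure follows from simple arithmetic: total sequenced bases divided by genome size. In the sketch below we assume that coverage is expressed relative to the haploid genome size, approximated as half of the ~1.2 Gb assembly that contains both haplotypes. Because both inputs are rounded, the result (about 63×) only approximately matches the reported ~62×; measured against the full two-haplotype assembly, the same data would correspond to roughly 32×.

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


# Rounded values from the text; the haploid size is an assumption
# (half of the ~1.2 Gb assembly that contains both haplotypes).
HIFI_BASES = 38e9
ASSEMBLY_SIZE = 1.2e9
HAPLOID_SIZE = ASSEMBLY_SIZE / 2

print(f"vs. haploid size:  {fold_coverage(HIFI_BASES, HAPLOID_SIZE):.1f}x")   # 63.3x
print(f"vs. full assembly: {fold_coverage(HIFI_BASES, ASSEMBLY_SIZE):.1f}x")  # 31.7x
```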
Genome assembly and polishing
We reassembled the ‘Lihong’ genome using hifiasm (v0.19.8-r602)3 to generate haplotype-resolved contigs by integrating PacBio HiFi reads, Oxford Nanopore long reads, and Hi-C sequencing data. Hi-C reads were aligned to the assembled contigs using Juicer (v1.6)28, and an initial chromosome-scale assembly was generated using 3D-DNA (v201008)29. This draft assembly was manually curated using Juicebox (v2.20.00)30 to refine chromosome boundaries, correct misassemblies, and resolve haplotype switch errors based on Hi-C contact maps30.
This process yielded a chromosome-scale assembly of 1.2 Gb comprising two complete haplotypes (Fig. 1). Attempts to phase the two haplotypes using SubPhaser (v1.2)31 were unsuccessful owing to the absence of discriminative k-mers, so the haplotypes were arbitrarily labeled “a” and “b”.
Genomic landscape of the haplotype-resolved A. truncatum genome. (a) Chromosome length in Mb; (b) Density of Class I transposable elements (TEs); (c) Density of Class II TEs; (d) Gene density across the genome; (e) Proportion of tandem repeats; (f) GC content; (g) Collinear blocks larger than 100 kb. (b–f) represent statistical analyses using a window size of 500 kb.
Gap filling
Gaps within the scaffolded assembly were filled with quarTeT (v1.2.5)4 using the highly accurate PacBio HiFi reads, which markedly improved assembly contiguity.
Telomere extension
To improve the completeness of telomeric regions and extend assemblies to chromosome termini, HiFi reads were mapped to the draft genome using minimap2 (v2.29)32. Reads mapping to chromosomal termini were extracted and reassembled into contigs using hifiasm; these contigs were then aligned back to the assemblies to extend the telomeric regions.
Telomeres were detected at all expected positions except one end of chromosomes 4a, 5a, 6b, and 7b, and both ends of chromosome 7a (Fig. 2a). Conserved plant rDNA repeats33,34 were localized: 18S–5.8S–28S arrays were enriched at the termini of chromosomes 1a, 6a, 7a, and 7b (Fig. 2b), while 5S arrays were primarily distributed on chromosomes 3a, 3b, and 8b (Fig. 2c).
Organelle genome assembly
Separate assemblies for the chloroplast and mitochondrial genomes were performed using GetOrganelle (v1.7.7.1)35, based on PacBio HiFi reads.
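The telomere detection described under Telomere extension can be approximated by scanning each chromosome end for tandem copies of the telomeric repeat. The Python sketch below assumes the Arabidopsis-type plant repeat (TTTAGGG)n, which the text does not state explicitly, and uses hypothetical window and density thresholds; dedicated telomere-identification tools apply their own criteria.

```python
import re

# Assumption: Arabidopsis-type plant telomeric repeat (TTTAGGG)n. On a
# chromosome in standard orientation the array reads CCCTAAA at the 5' end
# and TTTAGGG at the 3' end.
LEFT_MOTIF = "CCCTAAA"
RIGHT_MOTIF = "TTTAGGG"


def repeat_fraction(seq, motif):
    """Fraction of seq covered by non-overlapping exact copies of motif."""
    if not seq:
        return 0.0
    return len(re.findall(motif, seq.upper())) * len(motif) / len(seq)


def telomere_ends(chrom_seq, window=10_000, min_fraction=0.5):
    """Return (left_has_telomere, right_has_telomere) for one chromosome.

    window and min_fraction are illustrative (hypothetical) thresholds,
    not values used in this study.
    """
    left = chrom_seq[:window]
    right = chrom_seq[-window:]
    return (repeat_fraction(left, LEFT_MOTIF) >= min_fraction,
            repeat_fraction(right, RIGHT_MOTIF) >= min_fraction)


# Hypothetical chromosomes: one with both telomeres, one missing the 3' array.
complete = "CCCTAAA" * 200 + "ACGT" * 1000 + "TTTAGGG" * 200
one_sided = "CCCTAAA" * 200 + "ACGT" * 1000
print(telomere_ends(complete, window=1_400))   # (True, True)
print(telomere_ends(one_sided, window=1_400))  # (True, False)
```

A chromosome end reported as False here corresponds to the kind of missing telomere noted for chromosomes 4a, 5a, 6b, 7a, and 7b.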
Redundancy and contamination removal
Redundans (v2.0.1)36 was used to align all contig and scaffold sequences to the chromosomal and organellar genome assemblies. This analysis identified low-coverage fragments, potential haplotigs, rDNA fragments, and other redundant or contaminating sequences among the unplaced sequences. These fragments were manually curated and removed, yielding a final assembly with 7 gaps, each represented by a fixed-length 100 bp placeholder.
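Residual gaps in a FASTA assembly are conventionally encoded as runs of N. The sketch below lists them and summarizes their lengths; it assumes that the 100 bp gap placeholders mentioned above are N runs, and the toy assembly is hypothetical.

```python
import re

GAP_RE = re.compile(r"[Nn]+")


def read_fasta(lines):
    """Parse FASTA text lines into a {name: sequence} dict."""
    records, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def find_gaps(seq):
    """0-based, half-open (start, end) coordinates of every N run."""
    return [(m.start(), m.end()) for m in GAP_RE.finditer(seq)]


def summarize_gaps(records):
    """Return the total number of gaps and the set of distinct gap lengths."""
    lengths = [end - start for seq in records.values() for start, end in find_gaps(seq)]
    return len(lengths), set(lengths)


# Toy assembly: two 100 bp gaps on chrA (one spans a line break), none on chrB.
TOY = [">chrA", "ACGT" + "N" * 100 + "ACGT", "N" * 100 + "A", ">chrB", "ACGTACGT"]
records = read_fasta(TOY)
print(find_gaps(records["chrA"]))   # [(4, 104), (108, 208)]
print(summarize_gaps(records))      # (2, {100})
```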
Repeat sequence identification
De novo identification and annotation of transposable elements (TEs) were performed to build a custom TE library for A. truncatum using EDTA (v1.9.9)37 with the parameters --sensitive 1 --anno 1. The genome assembly was then masked for repetitive sequences using this library with RepeatMasker (v4.1.8) (https://www.repeatmasker.org/RepeatMasker/). This analysis identified 2,128,936 repetitive sequences covering 774,332,459 bp, or 63.98% of the entire assembly (Table 1, Table 2). Among the repeat classes, long terminal repeat (LTR) retrotransposons were the most abundant, comprising 738,696 elements that span 515,715,855 bp (42.61% of the genome). The LTR superfamilies Copia and Gypsy accounted for 18.36% and 12.20% of the genome, respectively. Terminal inverted repeat (TIR) elements made up 11.58% of the genome sequence.
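Because repeat annotations can overlap (for example, nested TEs), the base pairs covered by repeats are usually computed on merged intervals rather than by summing element lengths. The sketch below shows that merging step and the conversion to a genome percentage. It expects 0-based, half-open coordinates, so 1-based inclusive RepeatMasker coordinates would first need converting to (begin - 1, end); the intervals are hypothetical. The last line only back-calculates the assembly size implied by the reported totals.

```python
def merged_coverage(intervals):
    """Total bp covered by possibly overlapping 0-based, half-open intervals."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current block: close it and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered


def percent_of_genome(covered_bp, genome_size):
    """Covered base pairs as a percentage of the genome size."""
    return 100 * covered_bp / genome_size


# Hypothetical repeat hits on a 1,000 bp sequence; the first two overlap.
hits = [(0, 100), (50, 150), (300, 400)]
print(merged_coverage(hits))                            # 250
print(percent_of_genome(merged_coverage(hits), 1_000))  # 25.0
# Assembly size implied by the reported totals (774,332,459 bp = 63.98%).
print(f"{774_332_459 / 0.6398 / 1e9:.2f} Gb")           # 1.21 Gb
```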
Gene prediction and annotation
Protein-coding gene annotation combined multiple evidence sources and computational tools. Homology-based evidence included 334,064 non-redundant protein sequences from 14 plant species (Supplementary Table 3). Transcript evidence consisted of the merged set of de novo (Trinity) and genome-guided (HISAT2 and StringTie) transcript assemblies described under “Evaluation of the assembled transcriptome”. This transcript evidence, together with the homology evidence, was used in the PASA (v2.4.1) pipeline38 to annotate gene structures and identify full-length transcripts. These full-length transcripts were used to train the ab initio gene predictors AUGUSTUS (v3.5.0)39 and SNAP40, with AUGUSTUS undergoing five rounds of iterative optimization.
The MAKER2 (v3.01.03) pipeline41 was then used to integrate evidence from ab initio predictions (AUGUSTUS and SNAP), transcript alignments (BLASTN and TBLASTX42), and homologous protein alignments (BLASTX). Exonerate (v2.2.0)43 was used to refine the alignment evidence, excluding repetitive regions masked by RepeatMasker. EvidenceModeler (EVM) (v2.1.0)44 was employed to integrate the MAKER output with the gene models generated by PASA, producing a more consistent gene set. To minimize the inclusion of transposable element genes, TEsorter (v1.4.7)45 was used to identify TE protein domains within the predicted genes, and these domains were masked during the EVM integration step. The final gene models were refined using PASA to add untranslated regions (UTRs) and model alternative splicing isoforms. Finally, gene models were filtered to remove those with internal stop codons, ambiguous bases, missing start or stop codons, or encoded proteins shorter than 50 amino acids.
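The final structural filters can be expressed as checks on each coding sequence (CDS). The Python sketch below applies them to a CDS string on the coding strand. It is an illustration, not the authors' script: it assumes ATG as the only start codon, and the extra requirement that the length be a multiple of three is our own assumption implied by the codon-based checks. The toy sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def passes_filters(cds, min_aa=50):
    """Return True if a CDS (coding strand, 5'->3') passes all structural filters."""
    cds = cds.upper()
    if set(cds) - set("ACGT"):
        return False  # ambiguous bases such as N
    if len(cds) % 3:
        return False  # assumption: an incomplete final codon is rejected
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if not codons or codons[0] != "ATG":
        return False  # missing start codon (ATG assumed)
    if codons[-1] not in STOP_CODONS:
        return False  # missing stop codon
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        return False  # internal (premature) stop codon
    return len(codons) - 1 >= min_aa  # protein length excludes the stop codon


# Hypothetical CDSs built from a repeated alanine codon (GCT).
examples = {
    "ok_50aa": "ATG" + "GCT" * 49 + "TAA",
    "too_short": "ATG" + "GCT" * 48 + "TAA",
    "internal_stop": "ATG" + "GCT" * 20 + "TGA" + "GCT" * 28 + "TAA",
    "no_start": "CTG" + "GCT" * 49 + "TAA",
    "ambiguous": "ATG" + "GCN" + "GCT" * 48 + "TAA",
}
for name, cds in examples.items():
    print(name, passes_filters(cds))
```

Counting codons instead of translating keeps the sketch dependency-free: the protein length is simply the number of codons minus the terminal stop.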
This comprehensive annotation pipeline resulted in a final set of 58,569 protein-coding genes, comprising 81,299 transcripts (Table 1, Supplementary Table 4). Subgenome “a” contains 29,648 genes, with an average gene length of 4,481 bp, an average of 5.9 exons per gene, and an average exon length of 303 bp (Supplementary Table 5). Subgenome “b” contains 28,921 genes, with an average gene length of 4,464 bp, an average of 5.7 exons per gene, and an average exon length of 290 bp.
ncRNA annotation
Transfer RNA (tRNA) genes were annotated using tRNAscan-SE (v2.0.12)46. Ribosomal RNA (rRNA) genes were annotated with barrnap (v0.9) (https://github.com/tseemann/barrnap), with partial gene predictions filtered out. Other types of non-coding RNAs (ncRNAs), including microRNAs (miRNAs), small nuclear RNAs (snRNAs), and others, were identified by aligning to the Rfam database using RfamScan (v15.0) (https://rfam.org/). In total, we annotated 1,259 rRNA sequences, 1,560 tRNA sequences, and 338 small ncRNAs (Supplementary Table 6).
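Filtering out partial rRNA predictions amounts to discarding flagged features from the barrnap GFF3 output and tallying the rest. The sketch below assumes that partial hits carry the string "(partial)" in the ninth (attributes) column and that each feature has a Name attribute; the example lines are hypothetical, so the flagging convention should be checked against the barrnap version in use.

```python
from collections import Counter


def parse_attributes(column):
    """Split a GFF3 attributes column (key=value;key=value) into a dict."""
    return dict(item.split("=", 1) for item in column.split(";") if "=" in item)


def complete_rrna_counts(gff_lines):
    """Count non-partial rRNA features by their Name attribute."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or "(partial)" in fields[8]:
            continue  # malformed line or partial prediction (assumed flag)
        counts[parse_attributes(fields[8]).get("Name", "unknown")] += 1
    return counts


EXAMPLE = [
    "##gff-version 3",
    "chr1\tbarrnap:0.9\trRNA\t100\t1900\t0\t+\t.\tName=18S_rRNA;product=18S ribosomal RNA",
    "chr1\tbarrnap:0.9\trRNA\t5000\t5119\t0\t+\t.\tName=5S_rRNA;product=5S ribosomal RNA",
    "chr2\tbarrnap:0.9\trRNA\t10\t800\t0\t-\t.\tName=28S_rRNA;product=28S ribosomal RNA (partial)",
]
print(complete_rrna_counts(EXAMPLE))  # Counter({'18S_rRNA': 1, '5S_rRNA': 1})
```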
Functional annotation
Functional annotation of the protein-coding genes was performed using three complementary strategies. First, Gene Ontology (GO) terms and KEGG pathway annotations were assigned using eggNOG-mapper (v2)47 against the eggNOG homology database48. Second, sequence similarity searches were conducted with DIAMOND (v2.0.15)49 against multiple protein databases: Swiss-Prot, TrEMBL, NR, and the Arabidopsis thaliana proteome. Only the best hit per gene was retained, requiring an alignment identity greater than 30% and an E-value less than 1e-5. Third, conserved protein domains, motifs, and functional sites were identified using InterProScan (v5.74-105.0)48, which queries databases such as PRINTS, Pfam, SMART, PANTHER, and CDD.
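The best-hit rule can be sketched on DIAMOND tabular output. The Python example below assumes the default 12-column tabular (format 6) layout shared by BLAST and DIAMOND and applies the identity and E-value cutoffs before choosing the highest-bitscore hit per gene; the text does not specify the order of these two steps, so this ordering is an assumption. All hits are hypothetical.

```python
import csv
import io

# Default tabular (format 6) column order used by BLAST and DIAMOND.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tsv_text, min_identity=30.0, max_evalue=1e-5):
    """Return {query: subject} for the top-bitscore hit passing both cutoffs."""
    best = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue
        hit = dict(zip(COLUMNS, row))
        pident, evalue = float(hit["pident"]), float(hit["evalue"])
        bitscore = float(hit["bitscore"])
        # Cutoffs from the text: identity > 30% and E-value < 1e-5.
        if pident <= min_identity or evalue >= max_evalue:
            continue
        query = hit["qseqid"]
        if query not in best or bitscore > best[query][1]:
            best[query] = (hit["sseqid"], bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def row(q, s, pident, evalue, bitscore):
    """Build one hypothetical format-6 line (alignment coordinates are dummies)."""
    return "\t".join([q, s, str(pident), "100", "0", "0", "1", "100", "1", "100",
                      str(evalue), str(bitscore)])


EXAMPLE = "\n".join([
    row("gene1", "P1", 45.0, 1e-30, 200),
    row("gene1", "P2", 50.0, 1e-40, 250),  # better hit for gene1
    row("gene2", "P3", 25.0, 1e-50, 300),  # fails the identity cutoff
    row("gene3", "P4", 60.0, 1e-3, 90),    # fails the E-value cutoff
])
print(best_hits(EXAMPLE))  # {'gene1': 'P2'}
```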
Overall, 96.3% of the protein-coding genes received at least one type of functional annotation. Specifically, GO terms were assigned to 42.63% of the genes, and KEGG pathway annotations were assigned to 40.57% (Table 1, Supplementary Table 7).
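The 96.3% figure counts genes annotated by at least one source, so it is a set union rather than a sum of per-source rates, which is why it can exceed both the GO and KEGG rates. A small sketch with hypothetical gene identifiers and source names:

```python
def annotation_rates(all_genes, annotated_by_source):
    """Per-source and combined (union) annotation rates as percentages."""
    genes = set(all_genes)
    total = len(genes)
    rates = {source: 100 * len(set(hits) & genes) / total
             for source, hits in annotated_by_source.items()}
    union = set().union(*annotated_by_source.values()) & genes
    rates["any"] = 100 * len(union) / total
    return rates


# Hypothetical genes and per-source annotation sets.
GENES = ["g1", "g2", "g3", "g4"]
SOURCES = {"GO": {"g1", "g2"}, "KEGG": {"g2"}, "InterPro": {"g3"}}
print(annotation_rates(GENES, SOURCES))
# {'GO': 50.0, 'KEGG': 25.0, 'InterPro': 25.0, 'any': 75.0}
```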
Data Records
All data generated or analyzed during this study are publicly available. The haplotype-resolved genome assembly of Acer truncatum is available at the European Nucleotide Archive (ENA) under accession GCA_976991395 (https://identifiers.org/insdc.gca:GCA_976991395)50. In addition, the genome assembly and annotation files are also available on Figshare (https://doi.org/10.6084/m9.figshare.27020836.v1)51. Supplementary material associated with this study can be accessed online via Figshare (https://doi.org/10.6084/m9.figshare.30294964.v1)52.
Technical Validation
Evaluation of the assembled and the annotated genome
To evaluate the accuracy and completeness of the genome assembly, we performed multiple assessments based on read mapping, genome coverage, sequence consistency, gene completeness, and chromosomal structure.
Illumina short reads were mapped to the final assembly using BWA (v0.7.17)53, achieving a mapping rate of 99.69%. PacBio HiFi reads were mapped using minimap2, with a 99.55% mapping rate after removing non-primary alignments (Table 3). Depth analysis revealed that 99.81% of the genome was covered at least 10× by Illumina reads and 99.58% by HiFi reads, indicating near-complete coverage.
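The breadth-of-coverage values (share of the genome covered at least 10×) can be derived from per-base depth. The sketch below assumes the three-column text produced by `samtools depth -a` (sequence name, 1-based position, depth), where `-a` includes zero-depth positions so that the denominator is the whole genome; the example lines are hypothetical.

```python
def fraction_at_least(depth_lines, min_depth=10):
    """Fraction of positions whose read depth is >= min_depth."""
    total = covered = 0
    for line in depth_lines:
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip malformed lines
        total += 1
        if int(fields[2]) >= min_depth:
            covered += 1
    return covered / total if total else 0.0


# Hypothetical per-base depth lines: chrom, position, depth.
EXAMPLE = ["chr1\t1\t5", "chr1\t2\t10", "chr1\t3\t25", "chr1\t4\t0"]
print(f"{100 * fraction_at_least(EXAMPLE):.2f}% of positions at >=10x")  # 50.00%
```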
To detect potential redundancy or collapsed regions, sequencing depth was analyzed across all genomic loci. Both single-copy and multi-copy BUSCO regions showed unimodal depth distributions approximating a Poisson model, without secondary peaks (for example, at half or double the modal depth) that would indicate redundant or collapsed sequence (Fig. 3a–d). GC-depth correlation analysis showed no significant GC bias in either sequencing dataset (Fig. 3e,f).
Assessment of genome assembly quality through read coverage depth. (a) Genome coverage depth by Illumina reads; (b) Coverage depth of BUSCO core genes by Illumina reads; (c) Genome coverage depth by HiFi reads; (d) Coverage depth of BUSCO core genes by HiFi reads; (e) Illumina read coverage depth across different GC contents; (f) HiFi read coverage depth across different GC contents.
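The GC-depth analysis (Fig. 3e,f) pairs each genomic window's GC fraction with its mean read depth. The sketch below builds that table and summarizes it with a Pearson correlation; the correlation summary, the window size, and the toy data are our assumptions rather than the authors' procedure. The toy data are deliberately GC-biased, so r is 1; an unbiased dataset would give r near 0.

```python
from math import sqrt
from statistics import mean


def gc_depth_windows(seq, depths, window):
    """Return (GC fraction, mean depth) for consecutive non-overlapping windows.

    seq and depths must have the same length; windows without any A/C/G/T
    bases (e.g. pure gaps) are skipped.
    """
    rows = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        acgt = sum(chunk.count(base) for base in "ACGT")
        if acgt == 0:
            continue
        gc = (chunk.count("G") + chunk.count("C")) / acgt
        rows.append((gc, sum(depths[start:start + window]) / window))
    return rows


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical 30 bp sequence: three 10 bp windows at 100%, 0% and 50% GC.
seq = "G" * 10 + "A" * 10 + "GGGGGAAAAA"
depths = [30] * 10 + [10] * 10 + [20] * 10
rows = gc_depth_windows(seq, depths, window=10)
print(rows)  # [(1.0, 30.0), (0.0, 10.0), (0.5, 20.0)]
gc_values, depth_values = zip(*rows)
print(round(pearson(gc_values, depth_values), 3))  # 1.0
```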
Assembly completeness was further assessed using BUSCO (v5.8.2)54 with the embryophyta_odb10 dataset. The complete diploid assembly (haplotypes a and b) contained 98.9% complete BUSCOs, and subgenomes “a” and “b” showed 98.7% and 98.9% completeness, respectively. Evaluation of the annotated protein-coding gene set yielded a higher completeness of 99.3% (Table 1, Supplementary Table 8), with 98.2% in subgenome “a” and 98.0% in subgenome “b”.
To validate haplotype phasing, we used KAT (v2.4.1)55 to generate k-mer spectra from the raw Illumina reads. The spectra showed clear separation of haplotype-specific k-mers between subgenomes “a” and “b” (Fig. 4), supporting the accuracy of phasing.
Quality assessment of the genome assembly. K-mer analysis was performed using KAT to compare HiFi reads to the complete A. truncatum genome (a), subgenome a (b), and subgenome b (c). The plots are color-coded by the number of times each read k-mer occurs in the assembly: k-mers absent from the assembly are shown in black, and those present are shown in red (once), purple (twice), green (three times), blue (four times), yellow (five times), and orange (six or more times).
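KAT comparison plots such as Fig. 4 are built from a two-way table: for each k-mer, how often it occurs in the reads versus how many times it occurs in the assembly. The pure-Python sketch below reproduces that idea on toy data with canonical k-mers, capping the assembly copy number at six to mirror the "six or more" class in the legend. It is not KAT's implementation, is far too slow for real data, and ignores k-mers found only in the assembly; the reads and assembly are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def count_kmers(seqs, k):
    """Count canonical k-mers, skipping any that contain non-ACGT bases."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= {"A", "C", "G", "T"}:
                counts[canonical(kmer)] += 1
    return counts


def spectra_cn(reads, assembly_seqs, k=21, max_cn=6):
    """Map (read multiplicity, assembly copy number) -> number of distinct k-mers.

    Copy numbers above max_cn are pooled into max_cn ("six or more");
    copy number 0 corresponds to k-mers absent from the assembly.
    """
    read_counts = count_kmers(reads, k)
    asm_counts = count_kmers(assembly_seqs, k)
    spectrum = Counter()
    for kmer, multiplicity in read_counts.items():
        copy_number = min(asm_counts.get(kmer, 0), max_cn)
        spectrum[(multiplicity, copy_number)] += 1
    return spectrum


# Toy data: one k-mer present once in the assembly, one absent from it.
reads = ["ACGTTGCA"] * 4 + ["AAAAAAAA"] * 2
assembly = ["ACGTTGCA"]
print(spectra_cn(reads, assembly, k=8))
```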
Hi-C contact matrices were generated by aligning Hi-C reads to the final assembly using Juicer and visualized using Juicebox. The matrices showed strong intra-chromosomal contacts and minimal off-diagonal noise, indicating high-quality scaffolding (Fig. 5a,b). A large-scale inversion polymorphism between chromosomes 6a and 6b was identified and confirmed by synteny analysis using minimap2 (Fig. 5c,d), indicating a genuine structural variant rather than a misassembly.
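In minimap2's PAF output, an inversion between homologous chromosomes appears as alignments on the strand opposite to the dominant orientation of that chromosome pair. The sketch below flags such blocks. It assumes standard PAF columns (query coordinates in columns 3 and 4, strand in column 5, target coordinates in columns 8 and 9), uses a hypothetical 1 Mb minimum block size, and does not merge fragmented alignments as a full synteny tool would; the example records are invented.

```python
from collections import Counter, defaultdict


def parse_paf(lines):
    """Yield (query, qstart, qend, strand, target, tstart, tend) from PAF lines."""
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue
        yield f[0], int(f[2]), int(f[3]), f[4], f[5], int(f[7]), int(f[8])


def candidate_inversions(lines, min_len=1_000_000):
    """Alignments whose strand opposes the dominant orientation of their pair."""
    records = list(parse_paf(lines))
    strand_bp = defaultdict(Counter)
    for query, qs, qe, strand, target, _, _ in records:
        strand_bp[(query, target)][strand] += qe - qs
    dominant = {pair: c.most_common(1)[0][0] for pair, c in strand_bp.items()}
    return [r for r in records
            if r[3] != dominant[(r[0], r[4])] and r[2] - r[1] >= min_len]


def paf(qs, qe, strand, ts, te):
    """Hypothetical chr6a-vs-chr6b PAF record (lengths and scores are dummies)."""
    return "\t".join(["chr6a", "30000000", str(qs), str(qe), strand,
                      "chr6b", "30000000", str(ts), str(te), "0", str(qe - qs), "60"])


EXAMPLE = [
    paf(0, 10_000_000, "+", 0, 10_000_000),
    paf(10_000_000, 14_000_000, "-", 10_000_000, 14_000_000),  # inverted block
    paf(14_000_000, 29_500_000, "+", 14_000_000, 29_500_000),
    paf(29_500_000, 30_000_000, "-", 29_500_000, 30_000_000),  # below min_len
]
for rec in candidate_inversions(EXAMPLE):
    print(rec)  # ('chr6a', 10000000, 14000000, '-', 'chr6b', 10000000, 14000000)
```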
Evaluation of the assembled transcriptome
Transcriptome assemblies were validated using BUSCO against the embryophyta_odb10 dataset. The de novo assembled transcript set from Trinity (v2.15.1)38 yielded 91.0% completeness. The genome-guided assembly, generated by mapping reads with HISAT2 (v2.2.1)56 and assembling transcripts with StringTie (v3.0.0)57, achieved 98.0% completeness. The merged transcript set contained 126,155 sequences and showed a BUSCO completeness of 99.0%, supporting its use as transcript evidence for gene annotation.
Data availability
All data generated or analyzed during this study are publicly available. The haplotype-resolved genome assembly of Acer truncatum is available at the European Nucleotide Archive (ENA) under accession GCA_976991395 (https://identifiers.org/insdc.gca:GCA_976991395)50. In addition, the genome assembly and annotation files are available on Figshare51, providing a complete downloadable package. Supplementary material associated with this study can be accessed online via Figshare52.
Code availability
All genome assembly, quality control, and annotation analyses were performed using publicly available software tools following recommended usage guidelines as outlined in their official documentation. Detailed software versions and execution parameters are provided in the Methods section. No custom code was generated or applied in this study.
References
Li, H. & Durbin, R. Genome assembly in the telomere-to-telomere era. Nat. Rev. Genet. 1, 1–13, https://doi.org/10.1038/s41576-024-00718-w (2024).
Garg, V. et al. Unlocking plant genetics with telomere-to-telomere genome assemblies. Nat. Genet. 56, 1788–1799, https://doi.org/10.1038/s41588-024-01830-7 (2024).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175, https://doi.org/10.1038/s41592-020-01056-5 (2021).
Lin, Y. et al. quarTeT: a telomere-to-telomere toolkit for gap-free genome assembly and centromeric repeat identification. Hortic. Res. 10, uhad127, https://doi.org/10.1093/hr/uhad127 (2023).
Jia, K. H. et al. Telomere-to-telomere genome assemblies of cultivated and wild soybean provide insights into evolution and domestication under structural variation. Plant Commun. 10, 100919, https://doi.org/10.1016/j.xplc.2024.100919 (2024).
Shang, L. et al. A complete assembly of the rice Nipponbare reference genome. Mol. Plant 16, 1232–1236, https://doi.org/10.1016/j.molp.2023.08.003 (2023).
Chen, J. et al. A complete telomere-to-telomere assembly of the maize genome. Nat. Genet. 55, 1221–1231, https://doi.org/10.1038/s41588-023-01419-6 (2023).
Guan, J. et al. A near-complete cucumber reference genome assembly and Cucumber-DB, a multi-omics database. Mol. Plant 17, 1178–1182, https://doi.org/10.1016/j.molp.2024.06.012 (2024).
Wang, L. et al. Telomere-to-telomere and haplotype-resolved genome assembly of the Chinese cork oak (Quercus variabilis). Front. Plant Sci. 14, 1290913, https://doi.org/10.3389/fpls.2023.1290913 (2023).
Zhang, S. et al. T2T reference genome assembly and genome-wide association study reveal the genetic basis of Chinese bayberry fruit quality. Hortic. Res. 11, uhae033, https://doi.org/10.1093/hr/uhae033 (2024).
Shi, T.-L. et al. High-quality genome assembly enables prediction of allele-specific gene expression in hybrid poplar. Plant Physiol. 195, 652–670, https://doi.org/10.1093/plphys/kiae078 (2024).
Gao, J. et al. Historical biogeography of Acer L. (Sapindaceae): genetic evidence for Out-of-Asia hypothesis with multiple dispersals to North America and Europe. Sci. Rep. 10, 21178, https://doi.org/10.1038/s41598-020-78145-0 (2020).
Wu, Y. et al. Potential suitable habitat of two economically important forest trees (Acer truncatum and Xanthoceras sorbifolium) in East Asia under current and future climate scenarios. Forests 12, 1263, https://doi.org/10.3390/f12091263 (2021).
Dirr, M. A. & Lindstrom, O. M. Jr. Cold hardiness estimates of Acer L. taxa. J. Environ. Hortic. 11, 203–205, https://doi.org/10.24266/0738-2898-11.4.203 (1993).
Wang, R. et al. Genome survey sequencing of Acer truncatum Bunge to identify genomic information, simple sequence repeat (SSR) markers and complete chloroplast genome. Forests 10, 87, https://doi.org/10.3390/f10020087 (2019).
Qiao, Q. et al. Transcriptome sequencing and flavonoid metabolism analysis in the leaves of three different cultivars of Acer truncatum. Plant Physiol. Biochem. 171, 1–13, https://doi.org/10.1016/j.plaphy.2021.12.027 (2022).
Zhang, H. et al. High-quality maple genome reveals duplication-facilitated leaf color diversity. Plant Sci. 338, 111917, https://doi.org/10.1016/j.plantsci.2023.111917 (2024).
Ma, Q. et al. The Acer truncatum genome provides insights into nervonic acid biosynthesis. Plant J. 104, 662–678, https://doi.org/10.1111/tpj.14954 (2020).
Song, W. et al. Cognitive improvement effect of nervonic acid and essential fatty acids on rats ingesting Acer truncatum Bunge seed oil revealed by lipidomics approach. Food Funct. 13, 2475–2490, https://doi.org/10.1039/d1fo03671h (2022).
Ma, Q. et al. Transcriptome analysis of Acer truncatum seeds reveals candidate genes related to oil biosynthesis and fatty acid metabolism. J. Am. Soc. Hortic. Sci. 146, 99–108, https://doi.org/10.21273/jashs05013-20 (2021).
Namiecinska, M., Piatek, P. & Lewkowicz, P. Nervonic acid synthesis substrates as essential components in profiled lipid supplementation for more effective central nervous system regeneration. Int. J. Mol. Sci. 25, 3792, https://doi.org/10.3390/ijms25073792 (2024).
Yu, T. et al. Whole-genome sequencing of Acer catalpifolium reveals evolutionary history of endangered species. Genome Biol. Evol. 13, evab271, https://doi.org/10.1093/gbe/evab271 (2021).
McEvoy, S. L. et al. Strategies of tolerance reflected in two North American maple genomes. Plant J. 109, 1591–1613, https://doi.org/10.1111/tpj.15657 (2022).
Chen, Z. et al. Chromosomal-level genome and multi-omics dataset provides new insights into leaf pigmentation in Acer palmatum. Int. J. Biol. Macromol. 227, 93–104, https://doi.org/10.1016/j.ijbiomac.2022.11.303 (2023).
Li, X. et al. Chromosome-level genome assembly for Acer pseudosieboldianum and highlights to mechanisms for leaf color and shape change. Front. Plant Sci. 13, 850054, https://doi.org/10.3389/fpls.2022.850054 (2022).
Lu, X. et al. The chromosome-scale genome provides insights into pigmentation in Acer rubrum. Plant Physiol. Biochem. 186, 322–333, https://doi.org/10.1016/j.plaphy.2022.07.007 (2022).
Yang, J. et al. De novo genome assembly of the endangered Acer yangbiense, a plant species with extremely small populations endemic to Yunnan Province, China. GigaScience 8, giz085, https://doi.org/10.1093/gigascience/giz085 (2019).
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98, https://doi.org/10.1016/j.cels.2016.07.002 (2016).
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95, https://doi.org/10.1126/science.aal3327 (2017).
Robinson, J. T. et al. Juicebox.js provides a cloud-based visualization system for Hi-C data. Cell Syst. 6, 256–258.e1, https://doi.org/10.1016/j.cels.2018.01.001 (2018).
Jia, K. H. et al. SubPhaser: a robust allopolyploid subgenome phasing method based on subgenome‐specific k‐mers. New Phytol. 235, 801–809, https://doi.org/10.1111/nph.18173 (2022).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100, https://doi.org/10.1093/bioinformatics/bty191 (2018).
Roa, F. & Guerra, M. Distribution of 45S rDNA sites in chromosomes of plants: Structural and evolutionary implications. BMC Evol. Biol. 12, 225, https://doi.org/10.1186/1471-2148-12-225 (2012).
Liu, Z. L., Zhang, D., Hong, D. Y. & Wang, X. R. Chromosomal localization of 5S and 18S–5.8S–25S ribosomal DNA sites in five Asian pines using fluorescence in situ hybridization. Theor. Appl. Genet. 106, 198–204, https://doi.org/10.1007/s00122-002-1024-z (2003).
Jin, J. J. et al. GetOrganelle: a fast and versatile toolkit for accurate de novo assembly of organelle genomes. Genome Biol. 21, 241, https://doi.org/10.1186/s13059-020-02154-5 (2020).
Pryszcz, L. P. & Gabaldon, T. Redundans: an assembly pipeline for highly heterozygous genomes. Nucleic Acids Res. 44, e113, https://doi.org/10.1093/nar/gkw294 (2016).
Ou, S. et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 20, 275, https://doi.org/10.1186/s13059-019-1905-y (2019).
Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512, https://doi.org/10.1038/nprot.2013.084 (2013).
Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–439, https://doi.org/10.1093/nar/gkl200 (2006).
Korf, I. Gene finding in novel genomes. BMC Bioinf. 5, 59, https://doi.org/10.1186/1471-2105-5-59 (2004).
Cantarel, B. L. et al. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18, 188–196, https://doi.org/10.1101/gr.6743907 (2008).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410, https://doi.org/10.1016/s0022-2836(05)80360-2 (1990).
Slater, G. S. & Birney, E. Automated generation of heuristics for biological sequence comparison. BMC Bioinf. 6, 31, https://doi.org/10.1186/1471-2105-6-31 (2005).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 9, R7, https://doi.org/10.1186/gb-2008-9-1-r7 (2008).
Zhang, R. G. et al. TEsorter: an accurate and fast method to classify LTR-retrotransposons in plant genomes. Hortic. Res. 9, uhac017, https://doi.org/10.1093/hr/uhac017 (2022).
Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964, https://doi.org/10.1093/nar/25.5.955 (1997).
Cantalapiedra, C. P., Hernández-Plaza, A., Letunic, I., Bork, P. & Huerta-Cepas, J. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol. Biol. Evol. 38, 5825–5829, https://doi.org/10.1093/molbev/msab293 (2021).
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240, https://doi.org/10.1093/bioinformatics/btu031 (2014).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60, https://doi.org/10.1038/nmeth.3176 (2015).
EMBL-EBI ENA https://identifiers.org/insdc.gca:GCA_976991395 (2025).
Jia, K.-H. Acer truncatum Genome Assembly and Annotation. https://doi.org/10.6084/m9.figshare.27020836.v1 (2024).
Wang, Z.-W. A near-complete reassembled haplotype-resolved reference genome of Acer truncatum (Supplementary Tables). figshare https://doi.org/10.6084/m9.figshare.30294964.v1 (2025).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997, https://doi.org/10.48550/arXiv.1303.3997 (2013).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212, https://doi.org/10.1093/bioinformatics/btv351 (2015).
Mapleson, D., Garcia Accinelli, G., Kettleborough, G., Wright, J. & Clavijo, B. J. KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies. Bioinformatics 33, 574–576, https://doi.org/10.1093/bioinformatics/btw663 (2017).
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360, https://doi.org/10.1038/nmeth.3317 (2015).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295, https://doi.org/10.1038/nbt.3122 (2015).
Acknowledgements
This research was supported by the Central Finance Forestry Reform and Development Fund Projects: “Technological Demonstration and Promotion of Ancient and Notable Trees Protection” (LU[2024]TG08), “Demonstration of Improved Variety Breeding and High Yield Cultivation Techniques of Acer truncatum” (LU[2020]TG01), and “Demonstration Project for Breeding and Cultivation Techniques Promotion of Major Acer Species” (LU[2023]TG018).
Author information
Contributions
L.W. and D.L. conceived and designed the study; Z.W.W., H.Y.M. and Z.M.S. collected materials and analyzed the data; R.G.Z., X.Q. and K.H.J. prepared figures and tables; Z.W.W., K.H.J. and D.L. wrote and revised the manuscript; all authors approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, L., Wang, ZW., Ma, HY. et al. A near-complete reassembled haplotype-resolved reference genome of Acer truncatum. Sci Data 12, 1990 (2025). https://doi.org/10.1038/s41597-025-06281-1
DOI: https://doi.org/10.1038/s41597-025-06281-1







