Background & Summary

Ficus is a large plant genus with over 800 species and a largely tropical and subtropical distribution1,2,3. This woody genus displays a wide range of growth-forms, including shrubs, trees, hemiepiphytes, and lianas, thriving in various climatic and geographic conditions and playing crucial roles in tropical and subtropical ecosystems3,4,5. Because they can fruit year-round, fig trees play a vital role in sustaining a broad range of frugivorous animal communities6,7,8. The obligate mutualism between figs and their pollinating wasps also serves as an excellent model system for studying coevolutionary relationships9,10,11,12,13,14,15,16,17,18,19.

Because of their vigorous growth and great plasticity, some strangler figs of Ficus subg. Spherosuke (= subg. Urostigma) have high ornamental value. Among these, Ficus benjamina is the second most widely distributed and cultivated species. This fig tree is notable for its diverse morphology, weeping branches and foliage, and indistinct lateral veins20. Among the many cultivars of F. benjamina, some have variegated leaves, while others have contorted, wavy branches or curled leaves21.

There are currently eight published Ficus genomes, ranging in size from 297.27 Mb to 426.56 Mb22,23,24,25,26,27. Recently, researchers have used whole-genome sequencing data to explore various topics, including sex-determining genes, the development of aerial roots, the mechanisms underlying plant longevity22,23,26,28, and the obligate mutualistic relationships between figs and fig wasps24,29. Comparative genomic analyses involving multiple Ficus species have improved our understanding of their evolutionary history. However, the genomic basis of many horticultural traits that are important in ornamental fig trees remains unclear.

Here, we aimed to produce a high-quality, chromosome-scale, de novo genome assembly of Ficus benjamina using Illumina, PacBio, and chromosome conformation capture (Hi-C) sequencing technologies. This high-quality F. benjamina genome will help to elucidate the mechanisms underlying the ecological and horticultural traits of fig trees.

Methods

Sample collection, library construction, and sequencing

All samples in this study were taken from a single living individual of Ficus benjamina cultivated at the South China Botanical Garden, Guangzhou, China (23°10′43.4″N 113°21′07.6″E). Fresh, healthy young leaves were collected for genome sequencing. Leaves, stems, and inflorescences were sampled for transcriptome sequencing. All materials were promptly frozen in liquid nitrogen and stored at −80 °C until nucleic acid isolation. High-quality genomic DNA was extracted from the sampled leaves using the conventional cetyltrimethylammonium bromide (CTAB) method30. Short-read libraries were constructed using the Rapid Plus DNA Lib Prep Kit for Illumina (ABclonal, Cat. RK20208).

Paired-end reads of 150 bp were generated on an Illumina NovaSeq X Plus platform. For de novo genome assembly, high-molecular-weight DNA was used to construct a 15–20-kb SMRTbell library (SMRTbell Express Template Prep Kit 2.0, Pacific Biosciences). The library was sequenced on the PacBio Sequel II platform in circular consensus sequencing (CCS) mode with a minimum read quality of Q20 (≥99% accuracy). HiFi reads were generated using the CCS algorithm with ≥3 full passes per molecule. A Hi-C library was constructed using DpnII following the standard protocol described previously, with modifications for plant samples31. The library was sequenced on an Illumina NovaSeq X Plus platform, generating 150 bp paired-end reads32. Total RNA was isolated using the RNAprep Pure Plant Kit (Tiangen, China), and mRNA was purified from total RNA using poly-T oligo-attached magnetic beads. RNA-seq libraries were prepared using the Fast RNA-seq Lib Prep Kit V2 (ABclonal, Cat. RK20306) and sequenced on an Illumina NovaSeq X Plus platform with 150 bp paired-end reads. All Illumina sequencing data were filtered using fastp v0.23.1 software33 with default parameters. In total, we generated: (1) 30.46 Gb of high-quality Illumina short reads (97.80% Q20, 84.14 × coverage) for the genome survey; (2) 32.70 Gb of PacBio HiFi reads (90.33 × coverage) for assembly; (3) 61.27 Gb of Hi-C data (97.60% Q20, 169.25 × coverage) for scaffolding; and (4) 25.21 Gb of RNA-seq data (98.95% Q20, 69.64 × coverage) for annotation (Table 1).
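The coverage figures above follow from dividing the total sequenced bases by the genome size. The following is a minimal sketch of that arithmetic, not the authors' script. The denominator of 362.0 Mb is an assumption: it is close to the final assembly size and approximately reproduces the reported depths, but the exact value used in the paper is not stated.

```python
def coverage_depth(total_bases_gb: float, genome_size_mb: float) -> float:
    """Return sequencing depth (x) as total sequenced bases / genome size."""
    if genome_size_mb <= 0:
        raise ValueError("genome size must be positive")
    return total_bases_gb * 1000.0 / genome_size_mb


# Assumed denominator (Mb); not stated explicitly in the text.
ASSUMED_GENOME_MB = 362.0

# Data volumes (Gb) as reported in the paragraph above.
datasets_gb = {
    "Illumina": 30.46,
    "PacBio HiFi": 32.70,
    "Hi-C": 61.27,
    "RNA-seq": 25.21,
}

if __name__ == "__main__":
    for name, gigabases in datasets_gb.items():
        depth = coverage_depth(gigabases, ASSUMED_GENOME_MB)
        print(f"{name}: {depth:.2f}x")
```

Note that for RNA-seq, depth relative to the genome is only a nominal figure, since transcripts cover a fraction of the genome unevenly.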

Table 1 Library sequencing data statistics.

Genome survey

The genome features of Ficus benjamina were surveyed using the k-mer method based on Illumina short reads. The k-mer count histogram was generated using Jellyfish v2.2.734 with the following parameters: ‘count -G 2 -m 17 -C -o kmercount’. The 17-mer analysis estimated the genome size of F. benjamina to be approximately 419.6 Mb, with a repeat content of approximately 52.4% and a heterozygosity of 1.57% (Fig. 1a).
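The principle behind a k-mer genome size estimate can be sketched as follows: the total number of k-mers divided by the depth of the main peak approximates the genome size. This is a simplified illustration, not the model the authors fitted. The function name, the `min_depth` error cutoff, and the toy histogram are hypothetical. For a highly heterozygous genome the histogram has two peaks, and taking the tallest bin may select the heterozygous peak rather than the homozygous one, so dedicated tools are preferable in practice.

```python
def estimate_genome_size(histogram: dict[int, int], min_depth: int = 5) -> tuple[int, float]:
    """Estimate genome size from a k-mer depth histogram.

    histogram maps k-mer depth -> number of distinct k-mers at that depth
    (the two columns of a typical k-mer histogram file).
    Depths below min_depth are treated as sequencing errors and ignored.
    Returns (peak_depth, estimated_genome_size_in_bases).
    """
    kept = {d: c for d, c in histogram.items() if d >= min_depth}
    if not kept:
        raise ValueError("no histogram bins at or above min_depth")
    # Main peak: the depth with the largest number of distinct k-mers.
    peak_depth = max(kept, key=kept.get)
    # Total k-mer occurrences = sum over bins of depth * count.
    total_kmers = sum(d * c for d, c in kept.items())
    return peak_depth, total_kmers / peak_depth


if __name__ == "__main__":
    # Toy histogram (hypothetical numbers, for illustration only).
    toy = {1: 1000, 2: 100, 10: 5, 19: 40, 20: 50, 21: 40}
    peak, size = estimate_genome_size(toy)
    print(f"peak depth = {peak}, estimated size = {size:.1f} bases")
```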

Fig. 1

Overview of the Ficus benjamina genome assembly: (a) genome survey based on k-mer distribution analysis (k = 17), (b) Hi-C interaction heat map of the assembled genome.

Genome assembly

High-quality PacBio HiFi long reads were assembled into contigs using hifiasm v0.15.435 with default parameters, yielding a preliminary assembly of 409.26 Mb. Given the high heterozygosity, we removed haplotypic redundancy using purge_dups v1.2.536 and then polished the assembly with NextPolish237. To anchor the contigs into pseudochromosomes, Hi-C data were aligned to the final assembled contigs using the Juicer pipeline v1.638 to obtain an interaction matrix. The contigs were then ordered and anchored using the Hi-C scaffolding tool YaHS v1.239. The diploid chromosome number of F. benjamina (2n = 26) was confirmed using the Chromosome Counts Database (CCDB; https://taux.evolseq.net/CCDB_web) and guided the pseudochromosome construction. The Hi-C contact maps of the final assembly were examined manually with Juicebox v2.2040. The Hi-C interaction heat map showed a strong intrachromosomal interaction signal along the diagonal (Fig. 1b). The final result was a gap-free Ficus benjamina genome of 362.73 Mb with a contig N50 of 25.76 Mb (Table 2), in which 13 large contigs represent the 13 pseudochromosomes (Fig. 2a).
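The contig N50 reported above is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. A minimal sketch of that definition is given below; the function name and the toy contig lengths are illustrative and are not taken from the assembly.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig downwards until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise RuntimeError("unreachable for non-empty input")


if __name__ == "__main__":
    toy_contigs = [2, 3, 4, 5, 6]  # hypothetical lengths
    print("N50 =", n50(toy_contigs))
```

In practice the lengths would be read from the assembly FASTA or its index file rather than typed in by hand.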

Table 2 Statistics of the Ficus benjamina genome assembly and annotation.
Fig. 2

Genomic features of Ficus benjamina shown as concentric circles from outside to inside: (a) chromosomes, (b) gene density, (c) repeat density, (d) Copia density, (e) Gypsy density, (f) GC density, (g) syntenic gene blocks.

Transposable elements and non-coding RNA annotation

Transposable elements (TEs) were identified and classified using Extensive de-novo TE Annotator (EDTA) v2.1.041. To predict non-coding RNAs, tRNA genes were identified with tRNAscan-SE v2.0.642. Other non-coding RNA genes, including miRNA, rRNA and snRNA genes, were detected by comparison with the Rfam database43 using CMsearch v1.1.344 under default parameters. The TEs included 24.20% long terminal repeat (LTR) elements, 8.49% terminal inverted repeat (TIR) elements, and 4.04% Helitrons (Table 3). Among the classified retroelements, the Copia and Gypsy superfamilies accounted for 4.36% and 19.52% of the assembly, respectively (Fig. 2c–e; Table 3). The most abundant DNA transposon superfamily was Mutator, comprising 4.86% of the assembly (Table 3). Genome-wide screening for non-coding RNAs revealed 526 tRNAs, 125 miRNAs, 3,514 rRNAs, and 523 snRNAs (Table 4). In addition, most intact LTRs were inserted recently and within a short time span, with a peak at 0.15 million years ago (Ma), suggesting a recent expansion event (Fig. 3).
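The text does not describe how insertion times were computed. A commonly used approach estimates the time from the divergence K between the two LTRs of an intact element as T = K / (2μ), where μ is the substitution rate per site per year. The sketch below illustrates that approach under stated assumptions: the Jukes-Cantor correction, the function names, and the rate of 1e-8 are illustrative choices, not values taken from this study.

```python
import math


def jukes_cantor_distance(identity: float) -> float:
    """Convert LTR pair identity (0-1) to a Jukes-Cantor corrected distance K."""
    p = 1.0 - identity  # observed proportion of differing sites
    if not 0.0 <= p < 0.75:
        raise ValueError("identity must be in (0.25, 1.0]")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def insertion_time_ma(identity: float, mu: float) -> float:
    """Estimate insertion time in million years: T = K / (2 * mu) / 1e6.

    mu is the substitution rate per site per year (an assumed input).
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    return jukes_cantor_distance(identity) / (2.0 * mu) / 1e6


if __name__ == "__main__":
    ASSUMED_MU = 1e-8  # hypothetical rate, for illustration only
    for identity in (0.999, 0.99, 0.95):
        t = insertion_time_ma(identity, ASSUMED_MU)
        print(f"identity {identity:.3f} -> {t:.3f} Ma")
```

Because T scales inversely with μ, the absolute position of the peak depends directly on the rate chosen, whereas the shape of the distribution does not.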

Table 3 Statistics of repeat sequences in the Ficus benjamina genome.
Table 4 Summary of non-coding RNA genes annotated in the Ficus benjamina genome.
Fig. 3

The distribution of insertion time (Ma, million years ago) of intact LTRs in Ficus benjamina.

Gene prediction and functional annotation

For protein-coding gene prediction, we used the MAKER v3.01.0245 pipeline, which combines homology-based, transcriptome-based, and ab initio prediction methods. First, we used homologous proteins from related species as protein-based evidence for gene set prediction using GeneWise v2.4.146. The related species were Ficus carica, F. hispida, F. microcarpa, Morus notabilis, Vitis vinifera, and Arabidopsis thaliana. Transcriptome data, comprising leaf, stem, and inflorescence RNA-seq reads, were mapped using HISAT2 v2.1.047. Ab initio gene prediction was carried out using AUGUSTUS v3.4.048, trained on the transcriptome data. To functionally annotate the predicted gene models, several databases were searched using BLASTP53, including NCBI nr49, Swiss-Prot50, eggNOG51, and Pfam52. Finally, we annotated 28,840 protein-coding genes with an average exon length of 337.6 bp and an average intron length of 445 bp (Table 2, Fig. 2b). In total, 26,892 (96.22%) genes were assigned specific functions (Table 5).

Table 5 Gene functional annotation in the Ficus benjamina genome.

Genome synteny analysis

To reveal the syntenic relationships between the protein-coding genes of Ficus benjamina and those of four other representative fig species, collinear blocks were identified from protein sequences using MCScan as implemented in jcvi v1.2.754. The syntenic gene blocks and syntenic depth showed 1:1 syntenic patterns between F. benjamina and the four other figs (Fig. 4), indicating a conserved genome structure across the genus.

Fig. 4

Syntenic blocks and syntenic depth between the Ficus benjamina assembly and four other figs.

Data Records

The raw sequencing data have been deposited in the Genome Sequence Archive (GSA) at the National Genomics Data Center (NGDC; https://ngdc.cncb.ac.cn/) under the accession number CRA01800655. The final chromosome assembly was deposited in NCBI GenBank under the accession number JBFTXC00000000056. The draft genome assembly and genome annotation were deposited in the Figshare database (https://doi.org/10.6084/m9.figshare.27980945)57.

Technical Validation

The quality of the Ficus benjamina genome assembly was evaluated using four approaches. First, the completeness of the genome assembly was assessed using BUSCO v5.4.558 against the embryophyta_odb10 database (containing 1614 orthologs). The results showed 98.10% completeness (1584 complete BUSCOs), comprising 96.30% single-copy (1555) and 1.80% duplicated (29) orthologs (Table 6). Second, the assembly continuity was assessed using the LTR Assembly Index (LAI)59, which had a value of 21.14 (Table 2). Third, to assess the correctness of the assembly, we re-aligned the Illumina DNA sequencing data and PacBio HiFi long reads against the genome using BWA v0.7.1560 and minimap2 v2.24-r11226261, respectively. The results indicated high mapping rates for both Illumina short reads (98.05%) and HiFi long reads (99.86%). Finally, the quality value (QV) was estimated using Merqury v1.36562, resulting in a value of 73.33 (Table 2). All these results indicate that the F. benjamina genome assembly presented here is of high quality.
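Two of the metrics above are simple transformations that are easy to check by hand. The sketch below, with illustrative function names, recomputes the BUSCO percentages from the reported counts and converts a Phred-scaled consensus QV into a per-base error probability, assuming the usual definition QV = -10 log10(error rate).

```python
def busco_percent(count: int, total: int) -> float:
    """Return the share of BUSCO orthologs in a category, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10.0 ** (-qv / 10.0)


if __name__ == "__main__":
    TOTAL_BUSCOS = 1614  # size of embryophyta_odb10, as stated in the text
    counts = {"complete": 1584, "single-copy": 1555, "duplicated": 29}
    for category, count in counts.items():
        print(f"{category}: {busco_percent(count, TOTAL_BUSCOS):.1f}%")

    # Reported consensus QV from the text.
    print(f"QV 73.33 -> error rate {qv_to_error_rate(73.33):.2e} per base")
```

Under this definition, a QV above 70 corresponds to fewer than one expected consensus error per ten million bases.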

Table 6 Result of the BUSCO assessment of the Ficus benjamina genome.