Background & Summary

Kiwifruit (Actinidia Lindl.) is known as the “king of fruits” owing to its exceptional phytochemical profile, characterized by high ascorbate concentrations, abundant potassium, and high dietary fiber content. This woody vine, native to the mountains of China, adapts to a wide range of habitats, including forest edges, roadside thickets, drainage ditches, and shrubby undergrowth. The genus comprises 54 taxonomically validated species that exhibit extensive cytogenetic variation, with ploidy levels ranging from diploid (2n = 2x) to octoploid (2n = 8x), while maintaining a conserved base chromosome number (x = 29)1.

Despite this genomic wealth, global commercial production remains limited to four domesticated taxa: A. chinensis var. chinensis, A. chinensis var. deliciosa, A. eriantha, and A. arguta. According to 2024 FAO statistical data, global kiwifruit cultivation covers 286,100 ha and yields 4.29 million metric tons annually2. This agricultural success is threatened by pandemic outbreaks of bacterial canker caused by Pseudomonas syringae pv. actinidiae (Psa), a devastating disease that occurs in kiwifruit-growing areas worldwide3. The infection spreads rapidly and is difficult to contain. It primarily colonizes the plant’s meristematic tissues, including emerging shoots, foliar structures, and cambial regions. This colonization manifests as necrotic branch cankers, chlorotic leaf lesions, and eventual systemic plant decline, severely compromising both fruit production and phytosanitary quality. Current protection strategies depend predominantly on copper compounds and bacteriostatic agents; however, their prolonged field application raises substantial ecological concerns regarding microbial resistance development and environmental persistence4,5.

Since the initial release of the kiwifruit draft reference genome (A. chinensis, 2n = 58) in 2013, genome assemblies for further Actinidia species have been published, substantially advancing kiwifruit functional genomics research6. However, these assemblies contain substantial gaps and unanchored sequences. The advent of long-read sequencing technologies, including Pacific Biosciences (PacBio) HiFi and Oxford Nanopore Technologies (ONT) ultra-long platforms, has enabled precise resolution of complex genomic architectures and accurate assembly of highly repetitive chromosomal regions7. To date, long-read sequencing has enabled high-quality genome assemblies for multiple kiwifruit species, including A. chinensis, A. deliciosa, A. eriantha, A. arguta, A. latifolia, A. rufa, A. polygama, A. hemsleyana and A. zhejiangensis8,9,10,11,12,13,14,15,16,17,18,19,20. However, the lack of high-quality genome assemblies for canker-resistant kiwifruit cultivars, particularly the Psa-resistant A. chinensis var. chinensis ‘Guimi No.2’ (Accession No. 13-6-11, 2n = 58) discovered in Guizhou karst ecosystems, continues to impede comprehensive studies of disease resistance mechanisms and the functional characterization of critical resistance (R) genes.

In this study, we used short-read sequencing, PacBio HiFi long-read sequencing, and Hi-C technology to generate a high-quality, chromosome-level assembly of the A. chinensis var. chinensis ‘Guimi No.2’ genome. This genome provides a valuable genetic resource for elucidating the disease resistance mechanisms of kiwifruit.

Methods

Materials collection and sequencing

Fresh young leaves used for genome sequencing were collected from an A. chinensis var. chinensis ‘Guimi No.2’ plant grown in the kiwifruit germplasm resource garden of Guizhou, China. A modified cetyltrimethylammonium bromide (CTAB) method was used for DNA extraction21. A second-generation sequencing library was constructed using X paired-end DNBSEQ-T7 libraries (MGI, Shenzhen, China), generating 80.60 Gb of raw sequencing data (PE150 reads). After high-quality, purified genomic DNA was obtained, a SMRT sequencing library with sheared fragments of about 15–20 kb was constructed and sequenced on the PacBio Sequel II platform, yielding 52.78 Gb of raw data. A Hi-C library was constructed and sequenced on the Illumina NovaSeq 6000 platform (Illumina, San Diego, CA, USA). A total of 72.27 Gb of Hi-C sequencing data was obtained after filtering. Total RNA was isolated from four tissues (root, stem, leaf, and fruit) using the TRIzol reagent (Vazyme, China) for transcriptome sequencing. A cDNA library was constructed using the MGI Easy RNA Library Prep Kit (MGI, Shenzhen, China) with 350 bp insert sizes, and paired-end sequencing was performed on the DNBSEQ-T7 platform (MGI, Shenzhen, China), generating 25.96 Gb of raw sequencing data (PE150 reads) (Table 1).

Table 1 Statistics of the sequencing data.

Genome survey

To estimate the genome size, the MGI sequencing reads were filtered with Fastp22 (v0.23.4), and the resulting clean reads were used for k-mer analysis. Jellyfish23 (v2.3.0) was used to count k-mer frequencies, and GenomeScope24 (v2.037) was used to estimate the genome size from the resulting k-mer spectrum. At k = 19, the estimated genome size was about 634.80 Mb and the heterozygosity was 0.91% (Fig. 1).
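As a rough illustration of the k-mer principle behind this estimate, the following Python sketch divides the total number of k-mers by the depth of the main peak. It is a simplified approximation and not the mixture model that GenomeScope fits. The function names, the `min_depth` error cutoff, and the toy histogram are assumptions introduced for illustration, and the two-column input is assumed to follow the layout written by `jellyfish histo` (multiplicity, number of distinct k-mers).

```python
def parse_histo(text):
    """Parse two-column 'multiplicity count' lines into a dict."""
    histogram = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        histogram[int(fields[0])] = int(fields[1])
    return histogram


def estimate_genome_size(histogram, min_depth=5):
    """Return (estimated genome size, peak depth) from a k-mer histogram.

    min_depth is a hypothetical cutoff that discards low-multiplicity
    k-mers, which mostly arise from sequencing errors.
    """
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Main peak: the depth shared by the largest number of distinct k-mers.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences in the retained part of the spectrum.
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram (made-up values, not data from this study).
toy = parse_histo("1 500\n2 100\n19 10\n20 80\n21 10\n")
size, peak = estimate_genome_size(toy)
print(size, peak)
```

In a heterozygous genome the spectrum has a second peak at roughly half the homozygous depth, so a peak-based estimate of this kind should only be read as a sanity check on the model-based value.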

Fig. 1 The characteristics of Guimi No.2 genome.

Chromosome-level genome assembly

The genome was assembled de novo using Hifiasm25 (v0.24.0-r703) in Hi-C integrated mode, combining PacBio HiFi long reads with Hi-C chromatin interaction data. The primary assembly, generated with default parameters, was then processed with purge_dups26 (v1.2.5) to remove redundant sequences. Chromosome-level scaffolding was performed using the HapHiC27 (v1.0.6) pipeline with the chromosome number set to 29. After manual inspection and correction in Juicebox28 (v2.1543), a chromosome-level assembly was obtained. Following the adjustment of chromosome coordinates, the Hi-C contact matrix heatmap was visualized with HiC-Pro29 and EndHiC30 (Fig. 2).

Fig. 2 Hi-C contact matrix heatmap of Guimi No.2 genome.

The assembly pipeline yielded a genome of 608.43 Mb with a contig N50 of 20.70 Mb. The completeness of the assembled genome was evaluated using BUSCO31 (v5.8.2) with the eudicots_odb10 dataset (2,326 single-copy orthologs). The analysis showed that 98.40% (2,289/2,326) of conserved genes were complete (C), including 68.83% (1,601/2,326) single-copy (S) and 29.57% (688/2,326) duplicated (D), with 57 complete BUSCOs containing internal stop codons. A further 1.20% (28/2,326) of genes were fragmented (F), 0.38% (9/2,326) were missing (M), and 2.50% were excluded due to alignment issues (E).
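For readers unfamiliar with the contig N50 statistic reported above, the following Python sketch shows the standard definition: the length of the contig at which the cumulative length, counted from the longest contig downward, first reaches half of the total assembly length. The function name and the toy contig lengths are assumptions for illustration and are not taken from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the running sum to half the
        # assembly length defines the N50.
        if running >= half_total:
            return length


# Toy contig lengths (made-up values).
print(n50([80, 70, 50, 40, 30, 20]))
```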

Genome annotation

The EDTA32 (v2.2.2) pipeline was used to annotate repeat elements in the ‘Guimi No.2’ genome. A total of 251.15 Mb of repeat sequences were identified, accounting for 41% of the genome. Gene structure annotation was performed using the BRAKER3 (v3.0.8)33 pipeline, which automates genome annotation by integrating transcriptome-based prediction (RNA-Seq data from root, stem, leaf, and fruit tissues generated in this study), homology-based prediction (A. chinensis, A. eriantha, A. arguta, A. zhejiangensis and A. melanandra), and ab initio prediction implemented through GeneMark-ETP34 (v1.02) and AUGUSTUS35 (v3.5.0). The pipeline used TSEBRA36 to merge evidence from RNA-Seq alignments (HISAT2 v2.2.1)37, protein homology, and ab initio predictions. Gene models were filtered to retain only those containing both start and stop codons and exceeding 100 nucleotides in length, yielding a final consensus set of 45,986 genes. Functional annotation was performed using emapper38 (v2.0.1) based on eggNOG orthology data.
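The gene-model filter described above can be sketched in Python as follows. This is a minimal illustration rather than the script used in this study: it assumes the criteria are applied to the coding sequence, that the start codon is ATG, and that the stop codons are TAA, TAG, and TGA. The function name and the `min_length` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def keep_gene_model(cds, min_length=100):
    """Return True if a coding sequence passes the completeness filter.

    A model is kept when it is longer than min_length nucleotides,
    begins with a start codon, and ends with a stop codon.
    """
    seq = cds.upper()
    return (
        len(seq) > min_length
        and seq.startswith("ATG")
        and seq[-3:] in STOP_CODONS
    )


# Toy sequences (made-up values).
complete = "ATG" + "GCT" * 33 + "TAA"   # 105 nt, start and stop present
too_short = "ATGGCTTAA"                 # start and stop present, but 9 nt
no_stop = "ATG" + "GCT" * 34            # 105 nt, no terminal stop codon
print([keep_gene_model(s) for s in (complete, too_short, no_stop)])
```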

The completeness of the protein-coding gene set was evaluated using BUSCO (v5.8.2) with the eudicots_odb10 dataset (2,326 single-copy orthologs). The analysis showed that 96.56% of conserved genes were complete (C), including 51.54% (1,199/2,326) single-copy (S) and 45.01% (1,047/2,326) duplicated (D). Only 0.94% (22/2,326) of genes were fragmented (F), while 2.49% (58/2,326) were missing (M). To summarize the characteristics of the Guimi No.2 genome, we plotted gene density, GC content, and repeat density along the chromosomes (Fig. 3).

Fig. 3 Circos plot of Guimi No.2 genome. (A) Gene density, (B) GC content, (C) Repeat density, (D) Chromosome sizes.

Resistance gene analysis

The Psa-susceptible kiwifruit cultivar Hongyang (A. chinensis var. chinensis, 2n = 58)10 and the Psa-resistant cultivar Guimi No.2 were selected for comparative analysis of structural variations (SVs) using MUMmer39 (v4.0.1, nucmer --mum). The SVs identified comprised: 52 BRK (an insertion in the reference of unknown origin, indicating that no query sequence aligns to the sequence bounded by gap-start and gap-end; total length 1,580,120 bp, 0.25% of the genome); 16,156 GAP (a gap between two mutually consistent ordered and oriented alignments; total length 62,619,327 bp, 10.29% of the genome); 82 INV (the same as a relocation event, except that both the ordering and the orientation of the alignments are disrupted; total length 2,187,148 bp, 0.35% of the genome); 217 JMP (a relocation event, in which the consistent ordering of alignments is disrupted; total length 5,104,265 bp, 0.83% of the genome); and 197 SEQ (a translocation event that requires jumping to a new query sequence in order to continue aligning to the reference; total length 2,520,277 bp, 0.41% of the genome). Genome-wide identification of nucleotide-binding leucine-rich repeat (NLR) genes was performed using NLR-Annotator40 (v2.1b) with default parameters, which revealed 252 and 282 putative NLR genes in Hongyang and Guimi No.2, respectively (Fig. 4). Analysis of the positional distribution of these SVs showed that 205 SVs are located within the gene bodies or the 2-kb upstream and downstream flanking regions of 205 NLRs identified in Guimi No.2 (Supplementary Table 1).
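The positional analysis above amounts to an interval-overlap test between each SV and each NLR gene body extended by 2 kb on both sides. The Python sketch below illustrates that test under stated assumptions: coordinates are 1-based and inclusive, and the `Interval` type, the function names, and the toy coordinates are hypothetical rather than taken from the analysis scripts of this study.

```python
from collections import namedtuple

# Hypothetical record type: chromosome name plus 1-based inclusive coordinates.
Interval = namedtuple("Interval", "chrom start end")


def overlaps_with_flank(sv, gene, flank=2000):
    """Return True if sv overlaps the gene body or its flanking regions."""
    if sv.chrom != gene.chrom:
        return False
    window_start = max(1, gene.start - flank)
    window_end = gene.end + flank
    # Two closed intervals overlap when each starts before the other ends.
    return sv.start <= window_end and sv.end >= window_start


def svs_near_genes(svs, genes, flank=2000):
    """Return the SVs that overlap at least one flank-extended gene."""
    return [
        sv for sv in svs
        if any(overlaps_with_flank(sv, gene, flank) for gene in genes)
    ]


# Toy coordinates (made-up values).
nlr_genes = [Interval("chr1", 10000, 12000)]
candidate_svs = [
    Interval("chr1", 7500, 7900),    # ends just outside the upstream flank
    Interval("chr1", 7900, 8000),    # touches the upstream flank boundary
    Interval("chr1", 14000, 15000),  # touches the downstream flank boundary
    Interval("chr2", 10000, 11000),  # different chromosome
]
print(svs_near_genes(candidate_svs, nlr_genes))
```

For genome-scale inputs, a sorted sweep or an interval tree would avoid the all-against-all comparison used here for clarity.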

Fig. 4 Distribution of NLR genes along chromosomes in Guimi No.2.

Data Records

The raw sequencing data were deposited in the NCBI Sequence Read Archive under accession numbers SRR32799285 (NGS-LEAF)41, SRR32799286 (HIFI-LEAF)42, SRR32799287 (HIC-LEAF)43, SRR32799288 (RNA-LEAF)44, SRR32799289 (RNA-FRUIT)45, SRR32799290 (RNA-STEM)46, and SRR32799291 (RNA-ROOT)47. The assembled genome sequences have been deposited in NCBI GenBank48 under the accession number JBMUMY000000000.1. The genome and annotation files were also deposited in the Figshare database49.

Technical Validation

The quality of the genome assembly was assessed in the following respects (Table 2): (1) BUSCO scores of 98.40% and 96.56% indicated a high degree of completeness for the assembled genome and its annotation, respectively. (2) 99.92% of the PacBio HiFi reads mapped back to the assembled genome. (3) The quality value (QV) of the assembled genome was estimated using Merqury50 (v1.3), yielding a value of 72.23 (long reads). (4) The LTR Assembly Index (LAI) of the assembled genome was estimated using LTR_FINDER_parallel51 (v1.07) and LTR_retriever52 (v2.9.5), yielding a value of 10.10, which falls in the ‘Reference’ category. Overall, these metrics indicate that the genome assembly of ‘Guimi No.2’ is of high quality and well annotated.
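To put the consensus QV in context, the following Python sketch converts between a Phred-scaled quality value and a per-base error probability using the standard relation QV = -10 log10(p). The function names are assumptions for illustration, and the expected-error figure is a back-of-the-envelope reading of the QV that assumes errors are independent and uniformly distributed along the assembly.

```python
import math


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Convert a per-base error probability to a Phred-scaled quality value."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


def expected_errors(qv, genome_size_bp):
    """Expected number of erroneous bases in an assembly of the given size."""
    return qv_to_error_rate(qv) * genome_size_bp


# QV and assembly size as reported in the text (608.43 Mb).
print(qv_to_error_rate(72.23))
print(expected_errors(72.23, 608_430_000))
```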

Table 2 Genome assembly statistics.