Background & Summary

European rabbits (Oryctolagus cuniculus, 2n = 44), as an important branch of mammals, have unique physiological characteristics, ecological habits, and wide applications in agriculture, medicine, and biological research, making them a hot topic in genetics and evolutionary biology1. The Fujian Yellow Rabbit (O. cuniculus-FJY), as an endemic population in Fujian, China, exhibits distinct morphological traits, has a unique geographical environment and is relatively isolated. Over a long period of time, it has faced complex but relatively stable ecological conditions and human activities, which may have led to significant genetic drift and local adaptation within the population. Ren et al. compared the genetic diversity and population structure of four Chinese rabbit breeds, revealing that the O. cuniculus-FJY breed exhibits the highest levels of genetic diversity2. Meanwhile, The O. cuniculus-FJY distinguishes itself from other rabbit breeds due to its yellow coat, slow growth rate, small body size, high protein, and low-fat muscle content, showcasing unique traits of domestication and adaptive evolution3,4. Therefore, resolving the genomic architecture of O. cuniculus-FJY could inform conservation strategies and selective breeding programs, ensuring the preservation and sustainable utilization of its valuable genetic resources.

To date, several O. cuniculus genome assemblies have been released in the database, such as NCBI (https://www.ncbi.nlm.nih.gov/)1,5. The most complete among them is the assembly mOryCun1.1 (GeneBank: GCF_964237555.1), which delivers nine T2T chromosomes and nine chromosomes with a single telomere. However, as this assembly was generated from a female individual, it lacks Y-linked sequences and still contains 195 unplaced scaffolds. This incomplete genome information severely limits our understanding of rabbit genetic diversity and evolutionary history, as well as hindering the elucidation of the genetic mechanisms underlying the unique traits of the O. cuniculus-FJY. Unlike the rabbit breeds with published reference genomes, the O. cuniculus-FJY is a local breed with distinctive characteristics and considerable breeding value. It differs markedly in appearance from commercial breeds such as the New Zealand White Rabbit (O. cuniculus-NZW), exhibiting, for example, a yellow coat color and black eyes. Consequently, its genome is distinct from that of the O. cuniculus-NZW. This, in turn, affects breeding and improvement of production performance. Therefore, generating a high-quality genome for the O. cuniculus-FJY not only helps us understand the molecular basis of its origin and differentiation, but also provides a scientific basis for building regional genetic resources, protecting rare populations, and improving breeding strategies.

In this study, we integrated PacBio HiFi, ONT ultra-long reads, and Hi-C sequencing data to generate a near-T2T assembly of the O. cuniculus genome, which includes the first complete assembly of its Y chromosome. This assembly achieved nearly full T2T genome with only 30 unplaced scaffolds remaining, surpassing all previously published O. cuniculus genomes. This study not only enhances our comprehension of rabbit biology but also serves as a valuable resource for future studies aimed at conservation and genetic improvement, ultimately benefiting both the species and the agricultural practices associated with it.

Methods

Ethics statement

All animal experiments were performed in strict compliance with the guidelines established by the Committee on Animal Research and Ethics of Fujian University of Traditional Chinese Medicine. (approval number: FJTCM IACUC 2024330).

Sample collection and sequencing

A four-month-old male O. cuniculus-FJY was obtained from the National Fujian Yellow Rabbit Conservation Farm (Fuzhou, Fujian, China). Genomic DNA and transcriptome sequencing libraries were prepared from blood samples. Sequencing data were generated using multiple platforms, including PacBio high-fidelity (HiFi) reads, Oxford Nanopore (ONT) ultra-long reads, paired-end reads, and Hi-C reads (Table 1). For PacBio HiFi sequencing, a standard SMRTbell library was constructed using the SMRTbell Express Template Prep Kit 2.0 (Pacific Biosciences, USA) following the manufacturer’s protocol. Sequencing was performed on the PacBio Revio platform in circular consensus sequencing (CCS) mode. Two SMRT cells were sequenced, yielding approximately 11 million high-quality CCS reads (184.14 Gb, ~64 × coverage). The average CCS read length was 16.96 Kb, with an N50 of 16.82 Kb. For ONT sequencing, a PromethION library was prepared following the Oxford Nanopore SQK-LSK109 kit protocol (Oxford Nanopore Technologies). Four flow cells were sequenced, and raw data were processed using the Dorado server (v7.2.13) (https://github.com/nanoporetech/dorado/) with adapter trimming and quality filtering (parameter: ‘–min-qscore 7’). This generated 568,258 high-quality ONT reads (56.90 Gb), with the longest and average read lengths being 813.30 Kb and 100.13 Kb, respectively (Fig. 1). The Hi-C library was constructed from cross-linked genomic DNA and sequenced on the DNBSEQ-T7 platform (MGI) using 2 × 150 bp paired-end reads, producing 166.79 Gb of clean data (~58 × coverage) with fastp (v0.26.0; parameter: ‘fastp -Q -L–adapter_sequence AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA–adapter_sequence_r2 AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG–out1 [output read1]–out2 [output read2]–in1 [input read1]–in2 [input read2]’)6. Additionally, next-generation DNA sequencing and RNA-seq were performed on the DNBSEQ-T7 platform, yielding 216.05 Gb and 10.99 Gb of clean data, respectively (Table 1).

Table 1 The summary of the sequencing data for O. cuniculus-FJY genome assembly and annotation.
Fig. 1
Fig. 1
Full size image

Read length distribution and GC content-coverage depth analysis in CCS and ONT sequencing. (a,b) Read length distribution based on CCS and ONT sequencing. The x-axis represents read length (Kb) and the y-axis represents the number of reads. (c,d) Statistical graphs showing the association between GC content and coverage depth in short reads and long reads. The x-axis represents GC content and the y-axis represents coverage depth. The distribution of contig coverage depth is shown on the right side, and the distribution of GC content is shown on the top. The scatter plot in the center is based on the GC distribution and coverage depth information of contigs, with color intensity indicating point density.

Genome assembly and evaluation

Initial chromosome-level genome assembly

Initial assembly of contig sequences was performed with Hifiasm (v0.19.6; parameters: ‘–h1 hic.R1.fq.gz–h2 hic.R2.fq.gz–ul ont.fa.gz HiFi-reads.fa.gz’)7, incorporating HiFi reads, ONT reads, and clean Hi-C data. To eliminate contaminated sequences and mitochondrial sequences, the initially assembled genome was aligned against the NT database (https://ftp.ncbi.nlm.nih.gov/blast/db/) using BLASTN (v2.11.0 + ; parameters: ‘-evalue 0.00001, -num_alignments 5, and -max_hsps 1’)8. Contigs with more than 50% of their sequences identified as contamination or originating from organelles were completely removed from the assembly. For chromosomal scaffolding, low-quality Hi-C reads were removed using HiCUP (v0.7.2; parameter: ‘–NM 3’)9, followed by alignment of the filtered Hi-C data to the contig assembly via BWA (v0.7.12; default parameters)10. Subsequently, valid Hi-C reads were analyzed using Juicer (v1.6; default parameters)11 and 3D-DNA (parameter: ‘-r 0’)12 to scaffold contigs into chromosomes. Contig ordering and assembly accuracy were verified and manually corrected in JuiceBox (v1.11.08; default parameters)13. Following refinement and genome adjustment, the final assembly was anchored to 23 chromosomes (comprising 21 autosomes, X, and Y) (Table 2).

Table 2 Genome assembly statistics of O. cuniculus-FJY.

T2T genome assembly

To achieve a T2T assembly, telomeric regions were extended by aligning ONT ultra-long reads to the scaffolded genome using Minimap2 (v2.24; default parameters)14, followed by refinement with medaka consensus (v1.7.2; https://github.com/nanoporetech/medaka) and BLASTN (v2.11.0+; default parameters)8. Contig gaps were resolved using TGS-GapCloser (v1.2.0; parameters: ‘–min_nread 10’)15, leveraging ONT ultra-long reads and coverage-based relationships among contigs. The extended and gap-filled assembly was subsequently polished with 216.05 Gb of clean short-read data (~75 × coverage) using Pilon (v1.23; default parameters)16. The final near T2T assembly spanned 23 chromosomes, totaling 2.88 Gb (30 unmapped contigs remained, with a mounting rate of 99.80%), with a scaffold N50 of 148.90 Mb and a GC content of 44.0%. Notably, this represents the first assembled rabbit Y chromosome, measuring 34.43 Mb in length (Fig. 2a–c, Tables 2, 3).

Fig. 2
Fig. 2
Full size image

Landscape of genome assembly and annotation results. (a) Genomic characteristics within non-overlapping 400 Kb windows of the O. cuniculus-FJY genome. Circos diagram arranged from the outer to the inner layers: (A) GC content, (B) gene density, (C) repeat density, (D) LTR density, (E) LINE density, (F) DNA transposable element density. (b) Whole-genome Hi-C interaction map using 1 Mb bin size. The x and y axes represent chromosomes, and the darker the color, the stronger the interaction intensity. (c) Maps of contig distribution on chromosomes with annotated telomere positions in O. cuniculus-FJY. (d) Divergence distribution of four types of TE sequences annotated by RepeatMasker. The x-axis represents the divergence between the annotated TE sequences and the corresponding repetitive sequences in the de novo library; the y-axis represents the percentage of TE sequences with that divergence in the entire genome; different TEs are indicated by different colors. (e) The Venn diagram of gene function annotation based on the five databases.

Table 3 Summary of chromosomal and telomere information for the O. cuniculus-FJY genome.

Telomere identification

Telomeric regions were identified by leveraging the conserved sequence signature of telomeres (CCCTAA/TTAGGG). A genome-wide scanning was performed to detect characteristic sequences that contain at least four repeat units and are situated within 50 Kb of either end of each chromosome using the quarTeT (v1.1.36; parameter: ‘-m 1’) software17,18. Telomeres were annotated across 23 chromosomes, with each chromosome showing detectable telomeric sequences (Fig. 2c, Table 3).

Repeat sequence annotation

The repetitive sequences in the O. cuniculus-FJY genome were annotated using the following three methods: (1) Based on the RepBase database (https://www.girinst.org/repbase/), homology prediction was conducted using RepeatMasker (v4.0.9; default parameters) (http://www.repeatmasker.org) to identify DNA-level repetitive sequence features, resulting in 1.23 Gb of repetitive sequences, accounting for approximately 42.90% of the genome (Fig. 2d); (2) Using the RepeatProteinMask tool integrated in RepeatMasker (v4.0.9; default parameters), homology prediction was performed to identify protein-level repetitive sequence features, yielding 210.71 Mb of repetitive sequences, which represent about 7.32% of the genome; (3) A self-built repetitive sequence feature library was constructed using RepeatModeler (v1.0.11; default parameters)19 and LTR-FINDER_parallel (v1.0.7; default parameters)20, and de novo prediction was carried out with RepeatMasker, resulting in 924.13 Mb of repetitive sequences, accounting for approximately 32.11% of the genome. Finally, a redundancy-removed statistical calculation was performed and a total of approximately 1.36 Gb of repetitive sequences were obtained, representing 47.09% of the genome. The repetitive sequences were categorized by type, with the highest proportions being short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs), long terminal repeats (LTRs), and DNA transposons (Table 4).

Table 4 Statistical results of annotated repetitive sequences in O. cuniculus-FJY genome.

Protein coding gene annotation

Gene structure annotation

Gene structure annotation was performed using a combination of methods, including: (1) De novo prediction: We conducted de novo gene prediction using the Augustus (v3.3.2)21 and Genscan22 software with default parameters, resulting in the identification of 31,072 and 52,508 genes, respectively. (2) Homolog prediction: We downloaded homologous protein sequences from related rabbit species, including Ochotona princeps-mOchPri1 (GenBank: GCF_030435755.1), O. princeps-OchPri4 (GenBank: GCF_014633375.1), O. cuniculus-OryCun2 (GenBank: GCF_000003625.3), and O. cuniculus-NZW (GenBank: GCF_009806435). Gene structure annotation was performed based on homologous species proteins using the software miniprot (v0.11; parameters: ‘–gff-only -O 11 -E 1 -F 23 -C 1 -B 5 -G 200000 -j 1’)23 and Liftoff (v1.6.3; parameters: ‘-a 0.5 -s 0.5’)24. (3) RNA-seq data-based prediction: Using Stringtie (v1.3.5; default parameters)25 and TransDecoder (v5.5.0; default parameters)26, we predicted a total of 11,285 coding genes based on RNA-seq data. (4) Protein library comparison prediction: We obtained 14,108 genes through protein library comparison prediction using the Benchmarking Universal Single-Copy Orthologs (BUSCO; v5.5.0)27 software. Finally, a non-redundant gene set was generated by integrating prior evidence with MAKER (v2.31.10; default parameters)28, followed by refinement using HiFAP (v2.4.1; Wuhan OneMore Tech Co., Ltd., https://www.onemore-tech.com/) to obtain a final high-quality set of 22,674 genes. The average gene length was 42,977 bp, with an average of approximately nine exons per gene (Table 5).

Table 5 Summary of gene structure prediction results in O. cuniculus-FJY genome.

Gene function annotation

In this study, gene function annotation was primarily conducted using two major approaches: sequence similarity-based and domain/motif similarity-based methods. (1) Sequence similarity-based annotation: Diamond (v2.0.14; parameters: ‘–evalue 1e-05’)29 was used to align annotated protein sequences against several databases, including NR (version 2023-04-01), SwissProt (version 2023-03-01), TrEMBL (version 2023-03-01), AnimalTFDB (version 4.0), KOG (version 2023-03-01), and KEGG (version 2023-01-01). Additionally, KOBAS (v3.0; parameters: ‘-t blastout:tab -s ko’)30 was used to link annotated KEGG information with KEGG ortholog and pathway information. (2) Domain/Motif similarity-based annotation: i) We employed InterProScan (v5.61-93.0; parameters: ‘–seqtype p–formats TSV–goterms–pathways -dp’)31 to compare against a series of sub-databases, obtaining information on conserved sequences, motifs, and domains of proteins. ii) We used HMMER3 (v3.3.1; parameters: ‘hmmsearch -E 1e-05–domE 1e-05’)32 to annotate conserved sequences, including transcription factors and Pfam domains/motifs, based on multiple sequence alignment and hidden Markov models. Out of 22,674 genes, 22,616 genes (99.74%) were successfully annotated with functional information (Table 6, Fig. 2e). For Gene Ontology (GO) and KEGG functional annotation analyses, the GO results showed that the top three pathways in biological process were cellular process, metabolic process, and biological process, while the top three pathways in molecular function were binding, catalytic activity, and molecular transducer activity (Fig. 3a). The KEGG analysis revealed that the most represented pathways in cellular process, metabolism, and organismal systems were cell growth and death, lipid metabolism, and immune system, respectively (Fig. 3b).

Table 6 Summary of gene functional annotation results in O. cuniculus-FJY genome.
Fig. 3
Fig. 3
Full size image

GO (a) and KEGG (b) analysis of protein-coding genes in O. cuniculus-FJY genome. The x-axis represents the number of genes, and the y-axis represents the annotated GO or KEGG pathways.

Non-coding RNA annotation

Non-coding RNA (ncRNA) annotations were performed using different methods based on the characteristics of each RNA type: (1) tRNA: Based on the structural features of tRNA, we used the tRNAscan-SE (v1.3.1)33 software for annotation, identifying a total of 1,341 tRNAs with an average length of 78 bp. (2) rRNA: Given the high conservation of rRNA sequences, we used rRNA sequences from closely related species as reference sequences. Annotation was performed via BLASTN (v2.11.0 + ; parameters: ‘-evalue 0.01’)8, resulting in the identification of 922 rRNAs with an average length of 186 bp. (3) miRNA and snRNA: Using covariance models from the Rfam family, we annotated miRNAs and snRNAs with the INFERNAL software included in Rfam (v14.8)34. This process identified 635 miRNAs and 2,421 snRNAs, with average lengths of 87 bp and 117 bp, respectively (Table 7).

Table 7 Summary of non-coding RNA annotations in O. cuniculus-FJY genome.

Whole-genome collinearity analysis

For comparative genomic analysis, two additional O. cuniculus genome assemblies with annotations—mOryCun1.1 (GenBank: GCA_030258805.1) and NZW (GenBank: GCA_030258775.1) were retrieved from NCBI. Collinearity relationships among the three genomes were constructed and visualized using the JCVI software35. The analysis was performed with the following parameters: for ortholog identification, ‘-m jcvi.compara.catalog ortholog -no_strip_names–cscore = .99’; for synteny screening, ‘-m jcvi.compara.synteny screen–minspan = 30’. The results showed that the O. cuniculus-FJY genome was significantly better than the O. cuniculus-NZW genome, and was similar to the O. cuniculus-mOryCun1.1 genome (Fig. 4a).

Fig. 4
Fig. 4
Full size image

Collinearity analysis and phylogenetic tree construction of O. cuniculus-FJY and related species. (a) Genomic collinearity analysis among three O. cuniculus genomes, including O. cuniculus-FJY, O. cuniculus-mOryCun1.1, and O. cuniculus-NZW. (b) Statistics of homologous gene numbers across 7 species, including O. cuniculus-FJY, O. cuniculus-mOryCun1.1, O. cuniculus-NZW, Lepus europaeus, L. oiostolus, Ochotona curzoniae, O. princeps. (c) Maximum likelihood phylogenetic tree constructed based on single-copy gene families, including O. cuniculus-FJY and 6 other species. Bootstrap values adjacent to each node indicate the support rate.

Gene family clustering and analysis

To ascertain gene families, we conducted comparative analyses of protein sequences from O. cuniculus-FJY against those from six related species: O. cuniculus-mOryCun1.1, O. cuniculus-NZW, Lepus europaeus (GenBank: GCF_033115175.1), L. oiostolus (GenBank: GCA_036325905.1), Ochotona curzoniae (GenBank: GCF_017591425.1), and O. princeps (GenBank: GCF_030435755.1). Prior to analysis, gene sets of all involved species were filtered to ensure quality: for genes with multiple transcripts (resulting from alternative splicing), only the longest coding transcript was retained; additionally, genes encoding proteins shorter than 30 amino acids or containing internal stop codons were excluded. Subsequently, an all-vs-all BLASTp8 search (v2.11.0+; parameter: ‘-evalue 1e-5’) was performed to determine similarity relationships among protein sequences across all species. Finally, clustering of the BLASTp results was carried out using OrthoFinder (v2.5.5; parameter: ‘-S blast -I 1.5’)36. Among the 22,674 protein-coding genes in O. cuniculus-FJY, 22,480 (approximately 99.14%) were grouped into 18,478 orthologous clusters (Fig. 4b).

Data Records

The genome assembly of O. cuniculus-FJY has been deposited at NCBI GenBank under accession number GCA_051225735.137. Raw sequencing data were available in the Sequence Read Archive (SRA) under accession number CRA02312038, and the annotation files have been archived in figshare database39.

Technical Validation

Evaluation of genome assembly quality

The accuracy and completeness of the near-T2T assembly (O. cuniculus-FJY) were rigorously validated through multiple approaches. Hi-C contact maps revealed uniform chromosomal interactions with strong intra-chromosomal signal enrichment, confirming proper contig scaffolding and orientation (Fig. 2b). Furthermore, short reads and long reads were realigned to the assembly using BWA (v0.7.12; default parameters)10 and Minimap2 (v2.24; default parameters)14, achieving alignment rates of over 99.95%. Genome-wide integrity was further evaluated with BUSCO (v5.5.0)27 base on glires (odb10) lineage set, recovering 98.3% (13,563/13,800) of conserved eukaryotic orthologs as complete (Table 8).

Table 8 BUSCO assessment results of O. cuniculus-FJY genome and annotation.

Validation of genome annotation

The final annotation comprised 22,674 protein-coding genes, of which 99.7% (22,616) were functionally annotated across multiple reference databases. Notably, 84.8% (19,235 genes) received consistent support from all five databases, indicating robust cross-validation (Fig. 2e, Table 6). Protein-level assessment using BUSCO (v5.5.0)27 further confirmed annotation quality, with 97.80% of expected orthologs classified as complete. In addition, we compared the length distributions of genes, coding sequences (CDS), exons, and introns in O. cuniculus-FJY with those in four related genome assemblies: O. princeps-mOchPri1, Ochotona princeps-OchPri4, O. cuniculus-OryCun2, and O. cuniculus-NZW. Comparative analysis revealed that the structural characteristics of genes in O. cuniculus-FJY are highly consistent with the other related species (Fig. 5). Collectively, these results demonstrate high confidence in both structural and functional annotation accuracy.

Fig. 5
Fig. 5
Full size image

Comparative analysis of gene structural features across five closely related species. Panels (ad) illustrate the variations in (a) gene length, (b) CDS length, (c) exon length, and (d) intron length among O. cuniculus-FJY, Ochotona princeps-mOchPri1, O. princeps-OchPri4, O. cuniculus-OryCun2, and O. cuniculus-NZW.

Phylogenetic analysis

Genes from single-copy gene families were selected, and their individual alignment results were concatenated to construct a super-alignment matrix. Based on this matrix, relevant site data were extracted, including all unprocessed sites, phase 1 sites, and 4D sites. A phylogenetic tree was then constructed using RAxML (v8.2.12; parameters: ‘-f a -N 100 -m GTRGAMMA’)40 with 1000 bootstrap replications. The phylogenetic analysis highlights the unique evolutionary position of O. cuniculus-FJY and the distinct features of its genome. These findings, which are closely associated with the its unique biological characteristics, further underscore the importance of decoding its genome (Fig. 4c).