Abstract
Oryctolagus cuniculus (Fujian Yellow Rabbit; FJY), the only indigenous rabbit breed in China, is known for its yellow coat, adaptability, and superior meat quality. However, comprehensive genomic datasets for O. cuniculus-FJY are lacking. In this study, we assembled a near telomere-to-telomere (T2T) genome of O. cuniculus-FJY (68% T2T) using PacBio HiFi, ONT ultra-long, and Hi-C data. The 2.88 Gb assembly was anchored onto 23 chromosomes (21 autosomes, X and Y), with 99.80% of the sequence assigned to chromosomal positions. The scaffold N50 reached 148.90 Mb, and BUSCO completeness against the glires lineage set was 98.3%. Repeats account for 47.09% of the genome. We identified 22,674 genes, 99.74% of which were functionally annotated. Additionally, we annotated 1,341 tRNAs, 922 rRNAs, 635 miRNAs, and 2,421 snRNAs. This assembly is the rabbit genome closest to T2T completeness to date and includes the first successful assembly of the Y chromosome in O. cuniculus. It not only enhances our understanding of the genetic diversity and evolutionary history of rabbit domestication, but also provides crucial data support for future breeding research on O. cuniculus-FJY.
Background & Summary
European rabbits (Oryctolagus cuniculus, 2n = 44) have distinctive physiological characteristics and ecological habits and are widely used in agriculture, medicine, and biological research, making them an important subject in genetics and evolutionary biology1. The Fujian Yellow Rabbit (O. cuniculus-FJY), a population endemic to Fujian, China, exhibits distinct morphological traits and lives in a geographically distinctive, relatively isolated environment. Over a long period, it has experienced complex but relatively stable ecological conditions and human activities, which may have led to significant genetic drift and local adaptation within the population. Ren et al. compared the genetic diversity and population structure of four Chinese rabbit breeds and found that O. cuniculus-FJY exhibits the highest level of genetic diversity2. O. cuniculus-FJY also differs from other rabbit breeds in its yellow coat, slow growth rate, small body size, and high-protein, low-fat muscle, traits that reflect its domestication and adaptive evolution3,4. Therefore, resolving the genomic architecture of O. cuniculus-FJY could inform conservation strategies and selective breeding programs, ensuring the preservation and sustainable utilization of its valuable genetic resources.
To date, several O. cuniculus genome assemblies have been released in public databases such as NCBI (https://www.ncbi.nlm.nih.gov/)1,5. The most complete among them is mOryCun1.1 (GenBank: GCF_964237555.1), which contains nine T2T chromosomes and nine chromosomes with a single telomere. However, because this assembly was generated from a female individual, it lacks Y-linked sequences, and it still contains 195 unplaced scaffolds. This incomplete genomic information limits our understanding of rabbit genetic diversity and evolutionary history and hinders the elucidation of the genetic mechanisms underlying the unique traits of O. cuniculus-FJY. Unlike the rabbit breeds with published reference genomes, O. cuniculus-FJY is a local breed with distinctive characteristics and considerable breeding value. It differs markedly in appearance from commercial breeds such as the New Zealand White Rabbit (O. cuniculus-NZW), exhibiting, for example, a yellow coat and black eyes. Its genome is therefore expected to differ from that of O. cuniculus-NZW, and these differences in turn bear on breeding and the improvement of production performance. Generating a high-quality genome for O. cuniculus-FJY thus not only helps us understand the molecular basis of its origin and differentiation, but also provides a scientific basis for building regional genetic resources, protecting rare populations, and improving breeding strategies.
In this study, we integrated PacBio HiFi reads, ONT ultra-long reads, and Hi-C sequencing data to generate a near-T2T assembly of the O. cuniculus-FJY genome, which includes the first complete assembly of its Y chromosome. With only 30 unplaced scaffolds remaining, this assembly surpasses all previously published O. cuniculus genomes in completeness. This study not only enhances our understanding of rabbit biology but also serves as a valuable resource for future studies aimed at conservation and genetic improvement, ultimately benefiting both the species and the agricultural practices associated with it.
Methods
Ethics statement
All animal experiments were performed in strict compliance with the guidelines established by the Committee on Animal Research and Ethics of Fujian University of Traditional Chinese Medicine (approval number: FJTCM IACUC 2024330).
Sample collection and sequencing
A four-month-old male O. cuniculus-FJY was obtained from the National Fujian Yellow Rabbit Conservation Farm (Fuzhou, Fujian, China). Genomic DNA and transcriptome sequencing libraries were prepared from blood samples. Sequencing data were generated using multiple platforms, including PacBio high-fidelity (HiFi) reads, Oxford Nanopore (ONT) ultra-long reads, paired-end reads, and Hi-C reads (Table 1). For PacBio HiFi sequencing, a standard SMRTbell library was constructed using the SMRTbell Express Template Prep Kit 2.0 (Pacific Biosciences, USA) following the manufacturer’s protocol. Sequencing was performed on the PacBio Revio platform in circular consensus sequencing (CCS) mode. Two SMRT cells were sequenced, yielding approximately 11 million high-quality CCS reads (184.14 Gb, ~64× coverage). The average CCS read length was 16.96 Kb, with an N50 of 16.82 Kb. For ONT sequencing, a PromethION library was prepared following the Oxford Nanopore SQK-LSK109 kit protocol (Oxford Nanopore Technologies). Four flow cells were sequenced, and raw data were processed using the Dorado server (v7.2.13) (https://github.com/nanoporetech/dorado/) with adapter trimming and quality filtering (parameter: ‘--min-qscore 7’). This generated 568,258 high-quality ONT reads (56.90 Gb), with the longest and average read lengths being 813.30 Kb and 100.13 Kb, respectively (Fig. 1). The Hi-C library was constructed from cross-linked genomic DNA and sequenced on the DNBSEQ-T7 platform (MGI) using 2 × 150 bp paired-end reads, producing 166.79 Gb of clean data (~58× coverage) after filtering with fastp (v0.26.0; parameters: ‘fastp -Q -L --adapter_sequence AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA --adapter_sequence_r2 AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG --out1 [output read1] --out2 [output read2] --in1 [input read1] --in2 [input read2]’)6. Additionally, next-generation DNA sequencing and RNA-seq were performed on the DNBSEQ-T7 platform, yielding 216.05 Gb and 10.99 Gb of clean data, respectively (Table 1).
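The approximate coverage figures quoted above follow from dividing each data yield by the genome size. The short sketch below reproduces that arithmetic, assuming the 2.88 Gb final assembly size is used as the denominator (the text does not state which genome size was used); the function and variable names are illustrative only.

```python
# Assumption: depth is computed against the 2.88 Gb final assembly size.
GENOME_SIZE_GB = 2.88


def fold_coverage(yield_gb: float, genome_gb: float = GENOME_SIZE_GB) -> float:
    """Return sequencing depth as total sequenced bases divided by genome size."""
    return yield_gb / genome_gb


# Clean data yields (Gb) as reported in the text.
yields_gb = {
    "PacBio HiFi": 184.14,
    "ONT ultra-long": 56.90,
    "Hi-C": 166.79,
    "Short reads": 216.05,
}

coverage = {name: fold_coverage(gb) for name, gb in yields_gb.items()}

for name, depth in coverage.items():
    print(f"{name}: {depth:.1f}x")
```

Rounded to whole numbers, the HiFi, Hi-C, and short-read depths agree with the ~64×, ~58×, and ~75× values given in the Methods.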
Fig. 1. Read length distribution and GC content-coverage depth analysis in CCS and ONT sequencing. (a,b) Read length distribution based on CCS and ONT sequencing. The x-axis represents read length (Kb) and the y-axis represents the number of reads. (c,d) Statistical graphs showing the association between GC content and coverage depth in short reads and long reads. The x-axis represents GC content and the y-axis represents coverage depth. The distribution of contig coverage depth is shown on the right side, and the distribution of GC content is shown on the top. The scatter plot in the center is based on the GC distribution and coverage depth information of contigs, with color intensity indicating point density.
Genome assembly and evaluation
Initial chromosome-level genome assembly
Initial assembly of contig sequences was performed with Hifiasm (v0.19.6; parameters: ‘--h1 hic.R1.fq.gz --h2 hic.R2.fq.gz --ul ont.fa.gz HiFi-reads.fa.gz’)7, incorporating HiFi reads, ONT reads, and clean Hi-C data. To eliminate contaminant and mitochondrial sequences, the initially assembled genome was aligned against the NT database (https://ftp.ncbi.nlm.nih.gov/blast/db/) using BLASTN (v2.11.0+; parameters: ‘-evalue 0.00001, -num_alignments 5, and -max_hsps 1’)8. Contigs with more than 50% of their sequence identified as contamination or as originating from organelles were removed from the assembly entirely. For chromosomal scaffolding, low-quality Hi-C reads were removed using HiCUP (v0.7.2; parameter: ‘–NM 3’)9, followed by alignment of the filtered Hi-C data to the contig assembly via BWA (v0.7.12; default parameters)10. Subsequently, valid Hi-C reads were analyzed using Juicer (v1.6; default parameters)11 and 3D-DNA (parameter: ‘-r 0’)12 to scaffold contigs into chromosomes. Contig ordering and assembly accuracy were verified and manually corrected in JuiceBox (v1.11.08; default parameters)13. Following refinement and genome adjustment, the final assembly was anchored to 23 chromosomes (comprising 21 autosomes, X, and Y) (Table 2).
T2T genome assembly
To achieve a T2T assembly, telomeric regions were extended by aligning ONT ultra-long reads to the scaffolded genome using Minimap2 (v2.24; default parameters)14, followed by refinement with medaka consensus (v1.7.2; https://github.com/nanoporetech/medaka) and BLASTN (v2.11.0+; default parameters)8. Gaps between contigs were closed using TGS-GapCloser (v1.2.0; parameter: ‘--min_nread 10’)15, leveraging ONT ultra-long reads and coverage-based relationships among contigs. The extended and gap-filled assembly was subsequently polished with 216.05 Gb of clean short-read data (~75× coverage) using Pilon (v1.23; default parameters)16. The final near-T2T assembly comprised 23 chromosomes and totaled 2.88 Gb, with 99.80% of the sequence anchored to chromosomes (30 unplaced contigs remained), a scaffold N50 of 148.90 Mb, and a GC content of 44.0%. Notably, it includes the first assembled rabbit Y chromosome, which measures 34.43 Mb in length (Fig. 2a–c, Tables 2, 3).
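For readers less familiar with the contiguity statistics reported here, the following minimal sketch shows how a scaffold N50 and an anchoring rate can be computed from a list of sequence lengths. It is an illustration on toy numbers, not the pipeline used in this study; the function names and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("cannot compute N50 of an empty list")


def anchoring_rate(anchored_lengths, unplaced_lengths):
    """Fraction of assembled bases that sit in chromosome-anchored sequences."""
    anchored = sum(anchored_lengths)
    return anchored / (anchored + sum(unplaced_lengths))


# Toy example (arbitrary units): five "chromosomes" and two unplaced contigs.
toy_chromosomes = [80, 70, 50, 40, 30]
toy_unplaced = [20, 10]

toy_n50 = n50(toy_chromosomes + toy_unplaced)
toy_rate = anchoring_rate(toy_chromosomes, toy_unplaced)
print(toy_n50, f"{toy_rate:.2%}")
```

In the toy data the two longest sequences already cover half of the 300 total units, so the N50 is the length of the second one.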
Fig. 2. Landscape of genome assembly and annotation results. (a) Genomic characteristics within non-overlapping 400 Kb windows of the O. cuniculus-FJY genome. Circos diagram arranged from the outer to the inner layers: (A) GC content, (B) gene density, (C) repeat density, (D) LTR density, (E) LINE density, (F) DNA transposable element density. (b) Whole-genome Hi-C interaction map using 1 Mb bin size. The x and y axes represent chromosomes, and the darker the color, the stronger the interaction intensity. (c) Maps of contig distribution on chromosomes with annotated telomere positions in O. cuniculus-FJY. (d) Divergence distribution of four types of TE sequences annotated by RepeatMasker. The x-axis represents the divergence between the annotated TE sequences and the corresponding repetitive sequences in the de novo library; the y-axis represents the percentage of TE sequences with that divergence in the entire genome; different TEs are indicated by different colors. (e) The Venn diagram of gene function annotation based on the five databases.
Telomere identification
Telomeric regions were identified from the conserved telomere repeat unit (CCCTAA/TTAGGG). A genome-wide scan was performed with quarTeT (v1.1.36; parameter: ‘-m 1’)17,18 to detect arrays containing at least four repeat units located within 50 Kb of either end of each chromosome. Telomeres were annotated across the 23 chromosomes, with each chromosome showing detectable telomeric sequences (Fig. 2c, Table 3).
Repeat sequence annotation
The repetitive sequences in the O. cuniculus-FJY genome were annotated using three methods: (1) Based on the RepBase database (https://www.girinst.org/repbase/), homology prediction was conducted using RepeatMasker (v4.0.9; default parameters) (http://www.repeatmasker.org) to identify DNA-level repetitive sequence features, resulting in 1.23 Gb of repetitive sequences, accounting for approximately 42.90% of the genome (Fig. 2d); (2) Using the RepeatProteinMask tool integrated in RepeatMasker (v4.0.9; default parameters), homology prediction was performed to identify protein-level repetitive sequence features, yielding 210.71 Mb of repetitive sequences, which represent about 7.32% of the genome; (3) A custom repeat library was constructed using RepeatModeler (v1.0.11; default parameters)19 and LTR_FINDER_parallel (v1.0.7; default parameters)20, and de novo prediction was carried out with RepeatMasker, resulting in 924.13 Mb of repetitive sequences, accounting for approximately 32.11% of the genome. Finally, the three result sets were merged with overlaps counted only once, giving a total of approximately 1.36 Gb of repetitive sequences, or 47.09% of the genome. The repetitive sequences were categorized by type, with the highest proportions being short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs), long terminal repeats (LTRs), and DNA transposons (Table 4).
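Because the three annotation methods overlap, the combined repeat content is smaller than the sum of the three individual totals. A minimal sketch of such a non-redundant calculation is the union length of annotated intervals, shown below on toy coordinates for a single sequence. The interval values, the 1,000 bp sequence length, and the function name are illustrative assumptions, not data from this study.

```python
def union_length(intervals):
    """Total bases covered by half-open (start, end) intervals, counting overlaps once."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged block before starting a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered


# Toy annotations on one hypothetical 1,000 bp sequence.
repbase_hits = [(0, 100), (200, 300)]
protein_hits = [(50, 150)]
de_novo_hits = [(250, 400), (900, 950)]

all_hits = repbase_hits + protein_hits + de_novo_hits
naive_total = sum(end - start for start, end in all_hits)
merged_total = union_length(all_hits)
repeat_fraction = merged_total / 1_000

print(naive_total, merged_total, f"{repeat_fraction:.1%}")
```

On real data the same union would be taken per chromosome and summed before dividing by the genome size.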
Protein coding gene annotation
Gene structure annotation
Gene structure annotation combined several lines of evidence: (1) De novo prediction: We conducted de novo gene prediction using Augustus (v3.3.2)21 and Genscan22 with default parameters, identifying 31,072 and 52,508 genes, respectively. (2) Homology-based prediction: We downloaded protein sequences from related lagomorph assemblies, including Ochotona princeps-mOchPri1 (GenBank: GCF_030435755.1), O. princeps-OchPri4 (GenBank: GCF_014633375.1), O. cuniculus-OryCun2 (GenBank: GCF_000003625.3), and O. cuniculus-NZW (GenBank: GCF_009806435). Gene structures were annotated from these homologous proteins using miniprot (v0.11; parameters: ‘--gff-only -O 11 -E 1 -F 23 -C 1 -B 5 -G 200000 -j 1’)23 and Liftoff (v1.6.3; parameters: ‘-a 0.5 -s 0.5’)24. (3) RNA-seq-based prediction: Using StringTie (v1.3.5; default parameters)25 and TransDecoder (v5.5.0; default parameters)26, we predicted a total of 11,285 coding genes from the RNA-seq data. (4) Protein library comparison: We obtained 14,108 genes through comparison against the protein library of the Benchmarking Universal Single-Copy Orthologs (BUSCO; v5.5.0)27 software. Finally, a non-redundant gene set was generated by integrating the above evidence with MAKER (v2.31.10; default parameters)28, followed by refinement using HiFAP (v2.4.1; Wuhan OneMore Tech Co., Ltd., https://www.onemore-tech.com/) to obtain a final high-quality set of 22,674 genes. The average gene length was 42,977 bp, with an average of approximately nine exons per gene (Table 5).
Gene function annotation
In this study, gene function annotation was primarily conducted using two major approaches: sequence similarity-based and domain/motif similarity-based methods. (1) Sequence similarity-based annotation: Diamond (v2.0.14; parameters: ‘--evalue 1e-05’)29 was used to align annotated protein sequences against several databases, including NR (version 2023-04-01), SwissProt (version 2023-03-01), TrEMBL (version 2023-03-01), AnimalTFDB (version 4.0), KOG (version 2023-03-01), and KEGG (version 2023-01-01). Additionally, KOBAS (v3.0; parameters: ‘-t blastout:tab -s ko’)30 was used to link annotated KEGG information with KEGG ortholog and pathway information. (2) Domain/motif similarity-based annotation: i) We employed InterProScan (v5.61-93.0; parameters: ‘--seqtype p --formats TSV --goterms --pathways -dp’)31 to compare against a series of sub-databases, obtaining information on conserved sequences, motifs, and domains of proteins. ii) We used HMMER3 (v3.3.1; parameters: ‘hmmsearch -E 1e-05 --domE 1e-05’)32 to annotate conserved sequences, including transcription factors and Pfam domains/motifs, based on multiple sequence alignment and hidden Markov models. Out of 22,674 genes, 22,616 (99.74%) were successfully annotated with functional information (Table 6, Fig. 2e). In the Gene Ontology (GO) analysis, the top three terms in the biological process category were cellular process, metabolic process, and biological process, while the top three terms in the molecular function category were binding, catalytic activity, and molecular transducer activity (Fig. 3a). The KEGG analysis revealed that the most represented pathways in cellular process, metabolism, and organismal systems were cell growth and death, lipid metabolism, and immune system, respectively (Fig. 3b).
Fig. 3. GO (a) and KEGG (b) analysis of protein-coding genes in O. cuniculus-FJY genome. The x-axis represents the number of genes, and the y-axis represents the annotated GO or KEGG pathways.
Non-coding RNA annotation
Non-coding RNA (ncRNA) annotations were performed using different methods based on the characteristics of each RNA type: (1) tRNA: Based on the structural features of tRNA, we used tRNAscan-SE (v1.3.1)33 for annotation, identifying a total of 1,341 tRNAs with an average length of 78 bp. (2) rRNA: Given the high conservation of rRNA sequences, we used rRNA sequences from closely related species as references. Annotation was performed via BLASTN (v2.11.0+; parameters: ‘-evalue 0.01’)8, resulting in the identification of 922 rRNAs with an average length of 186 bp. (3) miRNA and snRNA: Using covariance models from the Rfam database (v14.8)34, we annotated miRNAs and snRNAs with the INFERNAL software. This process identified 635 miRNAs and 2,421 snRNAs, with average lengths of 87 bp and 117 bp, respectively (Table 7).
Whole-genome collinearity analysis
For comparative genomic analysis, two additional annotated O. cuniculus genome assemblies, mOryCun1.1 (GenBank: GCA_030258805.1) and NZW (GenBank: GCA_030258775.1), were retrieved from NCBI. Collinearity relationships among the three genomes were constructed and visualized using the JCVI software35. The analysis was performed with the following parameters: for ortholog identification, ‘-m jcvi.compara.catalog ortholog --no_strip_names --cscore=.99’; for synteny screening, ‘-m jcvi.compara.synteny screen --minspan=30’. The results indicated that the O. cuniculus-FJY assembly was markedly better than the O. cuniculus-NZW assembly and comparable to the O. cuniculus-mOryCun1.1 assembly (Fig. 4a).
Fig. 4. Collinearity analysis and phylogenetic tree construction of O. cuniculus-FJY and related species. (a) Genomic collinearity analysis among three O. cuniculus genomes, including O. cuniculus-FJY, O. cuniculus-mOryCun1.1, and O. cuniculus-NZW. (b) Statistics of homologous gene numbers across 7 species, including O. cuniculus-FJY, O. cuniculus-mOryCun1.1, O. cuniculus-NZW, Lepus europaeus, L. oiostolus, Ochotona curzoniae, O. princeps. (c) Maximum likelihood phylogenetic tree constructed based on single-copy gene families, including O. cuniculus-FJY and 6 other species. Bootstrap values adjacent to each node indicate the support rate.
Gene family clustering and analysis
To ascertain gene families, we conducted comparative analyses of protein sequences from O. cuniculus-FJY against those from six related species: O. cuniculus-mOryCun1.1, O. cuniculus-NZW, Lepus europaeus (GenBank: GCF_033115175.1), L. oiostolus (GenBank: GCA_036325905.1), Ochotona curzoniae (GenBank: GCF_017591425.1), and O. princeps (GenBank: GCF_030435755.1). Prior to analysis, gene sets of all involved species were filtered to ensure quality: for genes with multiple transcripts (resulting from alternative splicing), only the longest coding transcript was retained; additionally, genes encoding proteins shorter than 30 amino acids or containing internal stop codons were excluded. Subsequently, an all-vs-all BLASTp8 search (v2.11.0+; parameter: ‘-evalue 1e-5’) was performed to determine similarity relationships among protein sequences across all species. Finally, clustering of the BLASTp results was carried out using OrthoFinder (v2.5.5; parameter: ‘-S blast -I 1.5’)36. Among the 22,674 protein-coding genes in O. cuniculus-FJY, 22,480 (approximately 99.14%) were grouped into 18,478 orthologous clusters (Fig. 4b).
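The gene set filtering described above (longest coding transcript per gene, at least 30 amino acids, no internal stop codons) can be sketched as follows. This is an illustrative sketch under stated assumptions: the order of operations (select the longest transcript first, then apply the quality filters) is assumed, a terminal `*` is treated as an allowed stop, and the `Transcript` class, function name, and toy records are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    gene_id: str
    transcript_id: str
    protein: str  # translated CDS; '*' marks a stop codon


def filter_gene_set(transcripts, min_len=30):
    """Keep one representative protein per gene that passes the quality filters."""
    # Step 1: retain only the longest protein per gene (trailing stops ignored).
    longest = {}
    for t in transcripts:
        best = longest.get(t.gene_id)
        if best is None or len(t.protein.rstrip("*")) > len(best.protein.rstrip("*")):
            longest[t.gene_id] = t
    # Step 2: drop proteins that are too short or contain an internal stop.
    kept = {}
    for gene_id, t in longest.items():
        body = t.protein.rstrip("*")
        if len(body) >= min_len and "*" not in body:
            kept[gene_id] = t
    return kept


toy = [
    Transcript("geneA", "A.1", "M" + "A" * 39 + "*"),               # 40 aa isoform
    Transcript("geneA", "A.2", "M" + "A" * 59 + "*"),               # 60 aa isoform, longest
    Transcript("geneB", "B.1", "M" + "K" * 9 + "*"),                # too short
    Transcript("geneC", "C.1", "M" + "A" * 20 + "*" + "A" * 20),    # internal stop
]

kept = filter_gene_set(toy)
print(sorted((g, t.transcript_id) for g, t in kept.items()))
```

The filtered proteins would then serve as input to the all-vs-all BLASTp search and OrthoFinder clustering.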
Data Records
Technical Validation
Evaluation of genome assembly quality
The accuracy and completeness of the near-T2T assembly (O. cuniculus-FJY) were validated through multiple approaches. Hi-C contact maps revealed uniform chromosomal interactions with strong intra-chromosomal signal enrichment, confirming proper contig scaffolding and orientation (Fig. 2b). Furthermore, short reads and long reads were realigned to the assembly using BWA (v0.7.12; default parameters)10 and Minimap2 (v2.24; default parameters)14, achieving alignment rates of over 99.95%. Genome-wide completeness was further evaluated with BUSCO (v5.5.0)27 based on the glires (odb10) lineage set, recovering 98.3% (13,563/13,800) of conserved orthologs as complete (Table 8).
Validation of genome annotation
The final annotation comprised 22,674 protein-coding genes, of which 99.7% (22,616) were functionally annotated across multiple reference databases. Notably, 84.8% (19,235 genes) received consistent support from all five databases, indicating robust cross-validation (Fig. 2e, Table 6). Protein-level assessment using BUSCO (v5.5.0)27 further confirmed annotation quality, with 97.80% of expected orthologs classified as complete. In addition, we compared the length distributions of genes, coding sequences (CDS), exons, and introns in O. cuniculus-FJY with those in four related genome assemblies: Ochotona princeps-mOchPri1, O. princeps-OchPri4, O. cuniculus-OryCun2, and O. cuniculus-NZW. This comparison showed that the structural characteristics of genes in O. cuniculus-FJY are highly consistent with those of the related assemblies (Fig. 5). Collectively, these results demonstrate high confidence in both structural and functional annotation accuracy.
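As a quick consistency check, the percentages quoted in the two validation subsections above can be recomputed from the reported counts. The sketch below uses only counts stated in the text; the dictionary keys and function name are descriptive labels chosen for illustration.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100 * part / whole


# (count, total) pairs as reported in the text.
reported_counts = {
    "functionally annotated genes": (22_616, 22_674),
    "genes supported by all five databases": (19_235, 22_674),
    "complete genome BUSCOs (glires odb10)": (13_563, 13_800),
}

recomputed = {label: percent(*pair) for label, pair in reported_counts.items()}

for label, value in recomputed.items():
    print(f"{label}: {value:.2f}%")
```

Rounded to one decimal place, the recomputed values match the 99.7%, 84.8%, and 98.3% figures given in the text.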
Fig. 5. Comparative analysis of gene structural features across five closely related species. Panels (a–d) illustrate the variations in (a) gene length, (b) CDS length, (c) exon length, and (d) intron length among O. cuniculus-FJY, Ochotona princeps-mOchPri1, O. princeps-OchPri4, O. cuniculus-OryCun2, and O. cuniculus-NZW.
Phylogenetic analysis
Genes from single-copy gene families were selected, and their individual alignments were concatenated to construct a super-alignment matrix. From this matrix, site subsets were extracted, including all unprocessed sites, phase 1 sites, and 4D sites. A phylogenetic tree was then constructed using RAxML (v8.2.12; parameters: ‘-f a -N 100 -m GTRGAMMA’)40 with 1000 bootstrap replications. The phylogenetic analysis highlights the unique evolutionary position of O. cuniculus-FJY and the distinct features of its genome. These findings, which are closely associated with its unique biological characteristics, further underscore the importance of decoding its genome (Fig. 4c).
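The construction of a super-alignment matrix from per-family alignments can be illustrated with a short sketch. It assumes each single-copy family has already been aligned and is represented as a mapping from species label to aligned sequence; the function name, species labels, and toy sequences are hypothetical, and the gap-filling fallback for a missing species is a defensive assumption that should not trigger for true single-copy families.

```python
def concatenate_alignments(alignments, species):
    """Join per-family alignments column-wise into one super-alignment per species."""
    parts = {sp: [] for sp in species}
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("all sequences within one alignment must have equal length")
        width = widths.pop()
        for sp in species:
            # Pad with gaps if a species is absent from this family.
            parts[sp].append(aln.get(sp, "-" * width))
    return {sp: "".join(chunks) for sp, chunks in parts.items()}


species = ["FJY", "NZW", "Lepus"]
family_1 = {"FJY": "ATG-CC", "NZW": "ATGACC", "Lepus": "ATGTCC"}
family_2 = {"FJY": "GGA", "NZW": "GGA", "Lepus": "GGC"}

supermatrix = concatenate_alignments([family_1, family_2], species)
for sp, seq in supermatrix.items():
    print(sp, seq)
```

Site subsets such as codon positions or 4D sites would be selected from the columns of this matrix before tree inference.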
Data availability
The data that support the findings of this study are openly available in figshare at https://doi.org/10.6084/m9.figshare.29413133 (ref. 39).
Code availability
In this study, all software and commands were used in line with the published tools’ guidelines. Any elements not specified in the Methods were run with default settings, and no custom scripts were used. The versions of all software and databases are detailed in the Methods section.
References
Carneiro, M. et al. Rabbit genome analysis reveals a polygenic basis for phenotypic change during domestication. Science 345, 1074–1079, https://doi.org/10.1126/science.1253714 (2014).
Ren, A. et al. Genetic diversity and population structure of four Chinese rabbit breeds. PLoS One 14, e0222503, https://doi.org/10.1371/journal.pone.0222503 (2019).
Zhang, J. et al. Integrated metabolome and transcriptome provide insights into meat quality differences between Fujian yellow and New Zealand white rabbits. Food Bioscience 67, 105707, https://doi.org/10.1016/j.fbio.2024.105707 (2025).
X, X. The characteristics and development strategies of meat rabbit industry in Fujian Province. Chin J Rabbit Farm 9, 33–36, https://doi.org/10.3969/j.issn.1005-6327.2009.09.010 (2009).
Bai, Y. et al. Improving the genome assembly of rabbits with long-read sequencing. Genomics 113, 3216–3223, https://doi.org/10.1016/j.ygeno.2021.05.031 (2021).
Chen, S. Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. Imeta 2, e107, https://doi.org/10.1002/imt2.107 (2023).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods 18, 170–175, https://doi.org/10.1038/s41592-020-01056-5 (2021).
Mount, D. W. Using the Basic Local Alignment Search Tool (BLAST). CSH Protoc 2007, pdb top17, https://doi.org/10.1101/pdb.top17 (2007).
Wingett, S. et al. HiCUP: pipeline for mapping and processing Hi-C data. F1000Res 4, 1310, https://doi.org/10.12688/f1000research.7334.1 (2015).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760, https://doi.org/10.1093/bioinformatics/btp324 (2009).
Durand, N. C. et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst 3, 95–98, https://doi.org/10.1016/j.cels.2016.07.002 (2016).
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95, https://doi.org/10.1126/science.aal3327 (2017).
Durand, N. C. et al. Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell Syst 3, 99–101, https://doi.org/10.1016/j.cels.2015.07.012 (2016).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100, https://doi.org/10.1093/bioinformatics/bty191 (2018).
Xu, M. Y. et al. TGS-GapCloser: A fast and accurate gap closer for large genomes with low coverage of error-prone long reads. Gigascience 9, giaa094, https://doi.org/10.1093/gigascience/giaa094 (2020).
Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9, e112963, https://doi.org/10.1371/journal.pone.0112963 (2014).
Lin, Y. et al. quarTeT: a telomere-to-telomere toolkit for gap-free genome assembly and centromeric repeat identification. Hortic Res 10, uhad127, https://doi.org/10.1093/hr/uhad127 (2023).
Jiang, C., Du, Y., Lou, Z., Zhang, Y. & Wang, T. Telomere-to-telomere reference genome of Rhinogobio nasutus, an endangered endemic fish from the Yellow River. Sci Data 12, 462, https://doi.org/10.1038/s41597-025-04793-4 (2025).
Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21(Suppl 1), i351–358, https://doi.org/10.1093/bioinformatics/bti1018 (2005).
Ou, S. & Jiang, N. LTR_FINDER_parallel: parallelization of LTR_FINDER enabling rapid identification of long terminal repeat retrotransposons. Mob DNA 10, 48, https://doi.org/10.1186/s13100-019-0193-0 (2019).
Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res 34, W435–439, https://doi.org/10.1093/nar/gkl200 (2006).
Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J Mol Biol 268, 78–94, https://doi.org/10.1006/jmbi.1997.0951 (1997).
Slater, G. S. & Birney, E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 31, https://doi.org/10.1186/1471-2105-6-31 (2005).
Shumate, A. & Salzberg, S. L. Liftoff: accurate mapping of gene annotations. Bioinformatics 37, 1639–1643, https://doi.org/10.1093/bioinformatics/btaa1016 (2021).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol 33, 290–295, https://doi.org/10.1038/nbt.3122 (2015).
Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc 8, 1494–1512, https://doi.org/10.1038/nprot.2013.084 (2013).
Simao, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212, https://doi.org/10.1093/bioinformatics/btv351 (2015).
Holt, C. & Yandell, M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 12, 491, https://doi.org/10.1186/1471-2105-12-491 (2011).
Buchfink, B., Reuter, K. & Drost, H. G. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat Methods 18, 366–368, https://doi.org/10.1038/s41592-021-01101-x (2021).
Bu, D. et al. KOBAS-i: intelligent prioritization and exploratory visualization of biological functions for gene enrichment analysis. Nucleic Acids Res 49, W317–W325, https://doi.org/10.1093/nar/gkab447 (2021).
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240, https://doi.org/10.1093/bioinformatics/btu031 (2014).
Mistry, J., Finn, R. D., Eddy, S. R., Bateman, A. & Punta, M. Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions. Nucleic Acids Res 41, e121, https://doi.org/10.1093/nar/gkt263 (2013).
Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25, 955–964, https://doi.org/10.1093/nar/25.5.955 (1997).
Daub, J., Eberhardt, R. Y., Tate, J. G. & Burge, S. W. Rfam: annotating families of non-coding RNA sequences. Methods Mol Biol 1269, 349–363, https://doi.org/10.1007/978-1-4939-2291-8_22 (2015).
Tang, H. et al. JCVI: A versatile toolkit for comparative genomics analysis. Imeta 3, e211, https://doi.org/10.1002/imt2.211 (2024).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol 20, 238, https://doi.org/10.1186/s13059-019-1832-y (2019).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_051225735.1 (2025).
National Genomics Data Center (NGDC) Genome Sequence Archive https://bigd.big.ac.cn/gsa/browse/CRA023120 (2025).
Genome annotation of Chinese yellow rabbit (Oryctolagus cuniculus). Figshare, https://doi.org/10.6084/m9.figshare.29413133 (2025).
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313, https://doi.org/10.1093/bioinformatics/btu033 (2014).
Acknowledgements
This study was supported by the Youth Science and Technology Innovation Talent Cultivation Program of FJTCM (no. XQC2023007 and XQC2024003); Rehabilitation technology innovation center by joint collaboration of ministry of education and Fujian province, FJTCM (X2022009-Collaboration); Fujian Province ‘Eaglet Plan’ Young Top Talents Project; the Natural Science Foundation of Fujian Province (2024J01743 and 2024J01121); China Agriculture Research System (CARS-43-5). We thank Wuhan Onemore-tech Co., Ltd. for their assistance with genome sequencing and analysis.
Author information
Contributions
Z.H. conceived and designed the project. X.C., C.Y., D.L., Q.Z., J.Z., W.L., Y.Y. and Y.L. contributed to the data analysis and visualization. X.C. and C.Y. wrote the manuscript. X.C. and Z.H. provided edits to the manuscript. J.X., X.W., S.W., L.Q. and X.X. contributed to the collection of rabbit blood samples.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Chen, X., Yu, C., Liu, D. et al. Near telomere to telomere genome assembly of Chinese yellow rabbit (Oryctolagus cuniculus). Sci Data 12, 1786 (2025). https://doi.org/10.1038/s41597-025-06058-6