Abstract
The Yarkand hare (Lepus yarkandensis) is endemic to the Tarim Basin in Xinjiang, China, where it is a key species and a critical component of the local ecosystems. However, the lack of a reference genome has hindered evolutionary and genetic studies of this species. Here, we assembled a telomere-to-telomere (T2T) genome of the Yarkand hare (LepYark_1.0) using PacBio HiFi, Nanopore, and Hi-C sequencing. The assembled genome is approximately 2.70 Gb in size, with a scaffold N50 of 126.86 Mb. About 94.88% of the assembled sequence was anchored to 24 pseudo-chromosomes, and BUSCO assessment indicated a completeness of 99.0%. Repetitive sequences comprise 46.38% of the genome, with short interspersed nuclear elements (SINEs) accounting for the largest proportion. We also identified 24 centromeres and 46 telomeres. A total of 32,298 protein-coding genes were annotated using de novo prediction and transcriptome data, and 85% of them were functionally annotated. This genome assembly provides a genomic resource for studies of conservation and adaptive evolution, and for exploring the genetic basis of important traits in the Yarkand hare.
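For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal sketch of how N50 is conventionally computed from a list of sequence lengths: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The function name `n50` and the toy lengths are illustrative assumptions; they are not taken from the LepYark_1.0 assembly or from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)   # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in Mb (not real data from this study).
toy_lengths = [150, 130, 120, 60, 40]
print(n50(toy_lengths))  # 130: 150 + 130 = 280 covers half of the 500 total
```

In practice, toolkits such as SeqKit (cited in the reference list) report this statistic directly from a FASTA file; the sketch only illustrates the definition.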
Data availability
The assembled genome data that support the findings of this study are openly available in NCBI at https://identifiers.org/ncbi/insdc.gca:GCA_047496845.1 (ref. 53).
The annotation data that support the findings of this study are openly available in figshare at https://doi.org/10.6084/m9.figshare.29369999.v1 (ref. 54).
The Illumina sequencing data that support the findings of this study are openly available in NCBI at https://identifiers.org/ncbi/insdc.sra:SRR36906164 (ref. 55).
The PacBio sequencing data that support the findings of this study are openly available in NCBI at https://identifiers.org/ncbi/insdc.sra:SRR36906168 (ref. 55).
The Nanopore sequencing data that support the findings of this study are openly available in NCBI at https://identifiers.org/ncbi/insdc.sra:SRR36906166 and https://identifiers.org/ncbi/insdc.sra:SRR36906167 (ref. 55).
The Hi-C sequencing data that support the findings of this study are openly available in NCBI at https://identifiers.org/ncbi/insdc.sra:SRR36906169 (ref. 55).
The RNA sequencing data that support the findings of this study are openly available in NCBI at https://identifiers.org/ncbi/insdc.sra:SRR36906165 (ref. 55).
Code availability
All software used in this work is in the public domain, and the parameters are described in Methods. Where no parameters are specified for a program, the defaults recommended by its developers were used.
References
Sweet-Jones et al. Genotyping and Whole-Genome Resequencing of Welsh Sheep Breeds Reveal Candidate Genes and Variants for Adaptation to Local Environment and Socioeconomic Traits. Frontiers in genetics 12, 612492 (2021).
Tian, H. et al. Population genetic diversity and environmental adaptation of Tamarix hispida in the Tarim Basin, arid Northwestern China. Heredity 133, 298–307 (2024).
Shan, W. J. et al. Genetic consequences of postglacial colonization by the endemic Yarkand hare (Lepus yarkandensis) of the arid Tarim Basin. Chinese Science Bulletin 56, 1370–1382 (2011).
Weiss, B. et al. Unraveling a Lignocellulose-Decomposing Bacterial Consortium from Soil Associated with Dry Sugarcane Straw by Genomic-Centered Metagenomics. Microorganisms 9, 5995 (2021).
Wang, J. et al. Corrigendum: Genetic diversity, population structure, and selective signature of sheep in the northeastern Tarim Basin. Frontiers in genetics 14, 1336294 (2023).
Hui, X. H. & Zhao, M. F. Analysis of characteristics of Lepus yarkandensis adapting to desert ecology. Contemporary Animal Husbandry 15, 43–44 (2013).
Zhang, J. et al. Higher expression levels of aquaporin (AQP)1 and AQP5 in the lungs of arid-desert living Lepus yarkandensis. J Anim Physiol Anim Nutr (Berl) 104, 1186–1195 (2020).
Luo, S. et al. Expression Regulation of Water Reabsorption Genes and Transcription Factors in the Kidneys of Lepus yarkandensis. Front Physiol 13, 856427 (2022).
Li, Z. et al. Selective sweep analysis of the adaptability of the Yarkand hare (Lepus yarkandensis) to hot arid environments using SLAF-seq. Animal genetics 55, 681–686 (2024).
Wang, R., Tursun, M. & Shan, W. Complete Mitogenomes of Xinjiang Hares and Their Selective Pressure Considerations. Int J Mol Sci 25, 11925 (2024).
Michell, C. et al. High quality genome assembly of the brown hare (Lepus europaeus) with chromosome-level scaffolding. Peer Community Journal 4, e26 (2024).
Marques, J. P. et al. An Annotated Draft Genome of the Mountain Hare (Lepus timidus). Genome Biol Evol 12, 3656–3662 (2020).
Feng, S. et al. Chromosome-scale genome assembly of Lepus oiostolus (Lepus, Leporidae). Scientific data 11, 183 (2024).
Dong, X. et al. A chromosome-level genome assembly of Cape hare (Lepus capensis). Scientific data 11, 1081 (2024).
Nurk, S. et al. The complete sequence of a human genome. Science (New York, N.Y.) 376, 44–53 (2022).
Zhao, H. et al. Telomere-to-telomere genome assembly of the goose Anser cygnoides. Scientific data 11, 741 (2024).
Luo, L. Y. et al. Telomere-to-telomere sheep genome assembly identifies variants associated with wool fineness. Nature genetics 57, 218–230 (2025).
Liu, J. et al. The complete telomere-to-telomere sequence of a mouse genome. Science (New York, N.Y.) 386, 1141–1146 (2024).
Pavlova, A. S. et al. Genomic DNA extraction protocol using DNeasy Blood & Tissue Kit (QIAGEN) optimized for Gram-Negative bacteria, https://doi.org/10.17504/protocols.io.paadiae (2018).
Modi, A. et al. The Illumina Sequencing Protocol and the NovaSeq 6000 System. Methods Mol Biol 2242, 15–42 (2021).
Duckworth, A. T. et al. Profiling DNA Ligase Substrate Specificity with a Pacific Biosciences Single-Molecule Real-Time Sequencing Assay. Curr Protoc 3(3), e690 (2023).
Lu, H., Giordano, F. & Ning, Z. Oxford Nanopore MinION Sequencing and Genome Assembly. Genomics Proteomics Bioinformatics 14, 265–279 (2016).
Tang, H. et al. JCVI: A versatile toolkit for comparative genomics analysis. iMeta 3, e211 (2024).
Cheng, H. Y. et al. Haplotype-resolved assembly of diploid genomes without parental data. Nature Biotechnology 40, 1332–1335 (2022).
Zhang, X. T. et al. Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data. Nature Plants 5, 833–845 (2019).
Shen, W. et al. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PloS one 11, e0163962 (2016).
Lin, Y. et al. quarTeT: a telomere-to-telomere toolkit for gap-free genome assembly and centromeric repeat identification. Hortic Res 10, 127 (2023).
Formenti, G. et al. Merfin: improved variant filtering, assembly evaluation and polishing via k-mer validation. Nature Methods 19, 696–704 (2022).
Rhie, A. et al. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biology 21, 245 (2020).
Huang, N. & Li, H. Compleasm: a faster and more accurate reimplementation of BUSCO. Bioinformatics (Oxford, England) 39, 10 (2023).
Tegenfeldt, F. et al. OrthoDB and BUSCO update: annotation of orthologs with wider sampling of genomes. Nucleic acids research 53, D516–D522 (2025).
Stanke, M. et al. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic acids research 32, 309–312 (2004).
Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. Genome research 14, 988–995 (2004).
Kim, D. & Salzberg, S. L. TopHat-Fusion: an algorithm for discovery of novel fusion transcripts. Genome biology 12, R72 (2011).
Cabau, C. et al. Compacting and correcting Trinity and Oases RNA-Seq de novo assemblies. PeerJ 5, e2988 (2017).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome biology 9, R7 (2008).
Simão, F. A. et al. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics (Oxford, England) 31, 3210–3212 (2015).
Pruitt, K. D. et al. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic acids research 35, D61–65 (2007).
Boutet, E. et al. UniProtKB/Swiss-Prot. Methods in molecular biology (Clifton, N.J.) 406, 89–112 (2007).
Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic acids research 28, 27–30 (2000).
The Gene Ontology Consortium. The Gene Ontology Resource: 20 years and still GOing strong. Nucleic acids research 47, 330–338 (2019).
Persson, E. & Sonnhammer, E. L. L. InParanoid-DIAMOND: faster orthology analysis with the InParanoid algorithm. Bioinformatics 38, 2918–2919 (2022).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl Acad. Sci. USA 117, 9451–9457 (2020).
Abrusán, G. et al. TEclass-a tool for automated classification of unknown eukaryotic transposable elements. Bioinformatics 25, 1329–1330 (2009).
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Current protocols in bioinformatics Chapter 4, 4.10.1–4.10.14 (2009).
Chan, P. P. & Lowe, T. M. tRNAscan-SE: Searching for tRNA Genes in Genomic Sequences. Methods in molecular biology (Clifton, N.J.) 1962, 1–14 (2019).
Lagesen, K. et al. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic acids research 35, 3100–3108 (2007).
Kalvari, I. et al. Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families. Nucleic acids research 46, 335–342 (2018).
Cui, X. et al. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction. Bioinformatics (Oxford, England) 32, 332–340 (2016).
Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res 40, e49 (2012).
Chen, C. et al. TBtools-II: A “one for all, all for one” bioinformatics platform for biological big-data mining. Mol Plant 16, 1733–1742 (2023).
Bai, Y. et al. Improving the genome assembly of rabbits with long-read sequencing. Genomics 113, 3216–3223 (2021).
Xu, M. Q. et al. Genbank https://identifiers.org/ncbi/insdc.gca:GCA_047496845.1 (2025).
Xu, M. Q. et al. Telomere to telomere level genome assembly of the Yarkand hare (Lepus yarkandensis). figshare https://doi.org/10.6084/m9.figshare.29369999.v1 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR36906164 (2026).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997v2 [q-bio.GN] (2013).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Chen, X. et al. Near telomere to telomere genome assembly of Chinese yellow rabbit (Oryctolagus cuniculus). Scientific data 12, 11786 (2025).
Sjodin, B. M. F. et al. Chromosome-Level Reference Genome Assembly for the American Pika (Ochotona princeps). The Journal of heredity 112, 549–557 (2021).
Acknowledgements
This work was supported by the National Natural Science Foundation of China (32260116) and the Central Guidance for Local Projects of Xinjiang (ZYYD2024ZY04). We thank Associate Professor Linqiang Zhong of Xinjiang University for providing the photo of the Yarkand hare (Fig. 1a) and for assisting with sample collection.
Author information
Contributions
In this study, Mengqi Xu was responsible for data collation and paper writing, Yuge Cui and Hongcheng Kuang were responsible for sample collection and collation, Kai Wei was responsible for technical guidance and paper revision, and Wenjuan Shan was responsible for outline writing, project management, and funding acquisition.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Xu, M., Cui, Y., Kuang, H. et al. Telomere to telomere level genome assembly of the Yarkand hare (Lepus yarkandensis). Sci Data (2026). https://doi.org/10.1038/s41597-026-06815-1
DOI: https://doi.org/10.1038/s41597-026-06815-1


