Scientific Data
Telomere to telomere level genome assembly of the Yarkand hare (Lepus yarkandensis)
  • Data Descriptor
  • Open access
  • Published: 12 February 2026

  • Mengqi Xu1,
  • Yuge Cui1,
  • Hongcheng Kuang1,
  • Kai Wei1 &
  • …
  • Wenjuan Shan1 

Scientific Data, Article number: (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Genome
  • Genome assembly algorithms

Abstract

The Yarkand hare (Lepus yarkandensis) is endemic to the Tarim Basin in Xinjiang, China, where it is a key species and a critical component of the ecosystem. However, the lack of a reference genome has hindered evolutionary and genetic studies of this species. Here, we assembled a telomere-to-telomere (T2T) genome of the Yarkand hare (LepYark_1.0) using PacBio HiFi, Nanopore, and Hi-C sequencing. The assembled genome is approximately 2.70 Gb in size, with a scaffold N50 of 126.86 Mb. About 94.88% of the assembled sequence was anchored to 24 pseudo-chromosomes, and BUSCO assessment indicated a completeness of 99.0%. Repetitive sequences comprise 46.38% of the genome, with short interspersed nuclear elements (SINEs) accounting for the largest proportion. We also identified 24 centromeres and 46 telomeres. A total of 32,298 protein-coding genes were annotated using de novo prediction and transcriptome data, of which 85% were functionally annotated. This genome assembly provides a genomic resource for studies on the conservation and adaptive evolution of the Yarkand hare and on the genetic basis of its important traits.
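For readers unfamiliar with the scaffold N50 statistic quoted in the abstract, the following is a minimal sketch of how N50 is conventionally computed from a list of sequence lengths. It is not the authors' pipeline; the function name `n50` and the toy lengths are illustrative assumptions only.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example (hypothetical lengths, not taken from the assembly):
# total = 30, half = 15; cumulative sums are 10, 18, so N50 is 8.
print(n50([10, 8, 6, 4, 2]))  # prints 8
```

Applied to the scaffold lengths of a real assembly (for example, as read from a FASTA index), the same function would yield the kind of value reported above.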

Data availability

The assembled genome data that support the findings of this study are openly available in NCBI at https://identifiers.org/ncbi/insdc.gca:GCA_047496845.1 (ref. 53).

The annotation data that support the findings of this study are openly available in figshare at https://doi.org/10.6084/m9.figshare.29369999.v1 (ref. 54).

The Illumina sequencing data that support the findings of this study are openly available in NCBI at https://identifiers.org/ncbi/insdc.sra:SRR36906164 (ref. 55).

The PacBio sequencing data that support the findings of this study are openly available in NCBI at https://identifiers.org/ncbi/insdc.sra:SRR36906168 (ref. 55).

The Nanopore sequencing data that support the findings of this study are openly available in NCBI at https://identifiers.org/ncbi/insdc.sra:SRR36906166 and https://identifiers.org/ncbi/insdc.sra:SRR36906167 (ref. 55).

The Hi-C sequencing data that support the findings of this study are openly available in NCBI at https://identifiers.org/ncbi/insdc.sra:SRR36906169 (ref. 55).

The RNA sequencing data that support the findings of this study are openly available in NCBI at https://identifiers.org/ncbi/insdc.sra:SRR36906165 (ref. 55).
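As a convenience, the accessions listed in this section can be collected in one place and turned back into resolver links. The sketch below assumes the run accessions are SRR36906164 to SRR36906169 and the assembly accession is GCA_047496845.1, as given in references 53 and 55; the names `ACCESSIONS` and `resolver_urls` are hypothetical and not part of any published tool.

```python
# Compact identifiers as listed in the Data availability statement.
# Assumption: the trailing citation numbers in the page text are not
# part of the accessions themselves.
ACCESSIONS = {
    "assembly": ["insdc.gca:GCA_047496845.1"],
    "illumina": ["insdc.sra:SRR36906164"],
    "rna": ["insdc.sra:SRR36906165"],
    "nanopore": ["insdc.sra:SRR36906166", "insdc.sra:SRR36906167"],
    "pacbio": ["insdc.sra:SRR36906168"],
    "hic": ["insdc.sra:SRR36906169"],
}

# URL prefix used by the links in this section.
RESOLVER = "https://identifiers.org/ncbi/"


def resolver_urls(data_type):
    """Return the resolver URLs for one data type, e.g. 'nanopore'."""
    return [RESOLVER + accession for accession in ACCESSIONS[data_type]]


for url in resolver_urls("nanopore"):
    print(url)
```

The annotation files are hosted on figshare under their own DOI and are therefore left out of this table.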

Code availability

All software used in this work is in the public domain, and the parameters are described in the Methods. Where no detailed parameters are given for a program, the default parameters suggested by its developers were used.

References

  1. Sweet-Jones, J. et al. Genotyping and Whole-Genome Resequencing of Welsh Sheep Breeds Reveal Candidate Genes and Variants for Adaptation to Local Environment and Socioeconomic Traits. Frontiers in genetics 12, 612492 (2021).

  2. Tian, H. et al. Population genetic diversity and environmental adaptation of Tamarix hispida in the Tarim Basin, arid Northwestern China. Heredity 133, 298–307 (2024).

  3. Shan, W. J. et al. Genetic consequences of postglacial colonization by the endemic Yarkand hare (Lepus yarkandensis) of the arid Tarim Basin. Chinese Science Bulletin 56, 1370–1382 (2011).

  4. Weiss, B. et al. Unraveling a Lignocellulose-Decomposing Bacterial Consortium from Soil Associated with Dry Sugarcane Straw by Genomic-Centered Metagenomics. Microorganisms 9, 5995 (2021).

  5. Wang, J. et al. Corrigendum: Genetic diversity, population structure, and selective signature of sheep in the northeastern Tarim Basin. Frontiers in genetics 14, 1336294 (2023).

  6. Hui, X. H. & Zhao, M. F. Analysis of characteristics of Lepus yarkandensis adapting to desert ecology. Contemporary Animal Husbandry 15, 43–44 (2013).

  7. Zhang, J. et al. Higher expression levels of aquaporin (AQP)1 and AQP5 in the lungs of arid-desert living Lepus yarkandensis. J Anim Physiol Anim Nutr (Berl) 104, 1186–1195 (2020).

  8. Luo, S. et al. Expression Regulation of Water Reabsorption Genes and Transcription Factors in the Kidneys of Lepus yarkandensis. Front Physiol 13, 856427 (2022).

  9. Li, Z. et al. Selective sweep analysis of the adaptability of the Yarkand hare (Lepus yarkandensis) to hot arid environments using SLAF-seq. Animal genetics 55, 681–686 (2024).

  10. Wang, R., Tursun, M. & Shan, W. Complete Mitogenomes of Xinjiang Hares and Their Selective Pressure Considerations. Int J Mol Sci 25, 11925 (2024).

  11. Michell, C. et al. High quality genome assembly of the brown hare (Lepus europaeus) with chromosome-level scaffolding. Peer Community Journal 4, e26 (2024).

  12. Marques, J. P. et al. An Annotated Draft Genome of the Mountain Hare (Lepus timidus). Genome Biol Evol 12, 3656–3662 (2020).

  13. Feng, S. et al. Chromosome-scale genome assembly of Lepus oiostolus (Lepus, Leporidae). Scientific data 11, 183 (2024).

  14. Dong, X. et al. A chromosome-level genome assembly of Cape hare (Lepus capensis). Scientific data 11, 1081 (2024).

  15. Nurk, S. et al. The complete sequence of a human genome. Science (New York, N.Y.) 376, 44–53 (2022).

  16. Zhao, H. et al. Telomere-to-telomere genome assembly of the goose Anser cygnoides. Scientific data 11, 741 (2024).

  17. Luo, L. Y. et al. Telomere-to-telomere sheep genome assembly identifies variants associated with wool fineness. Nature genetics 57, 218–230 (2025).

  18. Liu, J. et al. The complete telomere-to-telomere sequence of a mouse genome. Science (New York, N.Y.) 386, 1141–1146 (2024).

  19. Pavlova, A. S. et al. Genomic DNA extraction protocol using DNeasy Blood & Tissue Kit (QIAGEN) optimized for Gram-Negative bacteria, https://doi.org/10.17504/protocols.io.paadiae (2018).

  20. Modi, A. et al. The Illumina Sequencing Protocol and the NovaSeq 6000 System. Methods Mol Biol 2242, 15–42 (2021).

  21. Duckworth, A. T. et al. Profiling DNA Ligase Substrate Specificity with a Pacific Biosciences Single-Molecule Real-Time Sequencing Assay. Curr Protoc 3(3), e690 (2023).

  22. Lu, H., Giordano, F. & Ning, Z. Oxford Nanopore MinION Sequencing and Genome Assembly. Genomics Proteomics Bioinformatics 14, 265–279 (2016).

  23. Tang, H. et al. JCVI: A versatile toolkit for comparative genomics analysis. iMeta 3, e211 (2024).

  24. Cheng, H. Y. et al. Haplotype-resolved assembly of diploid genomes without parental data. Nature Biotechnology 40, 1332–1335 (2022).

  25. Zhang, X. T. et al. Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data. Nature Plants 5, 833–845 (2019).

  26. Shen, W. et al. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PloS one 11, e0163962 (2016).

  27. Lin, Y. et al. quarTeT: a telomere-to-telomere toolkit for gap-free genome assembly and centromeric repeat identification. Hortic Res 10, 127 (2023).

  28. Formenti, G. et al. Merfin: improved variant filtering, assembly evaluation and polishing via k-mer validation. Nature Methods 19, 696–704 (2022).

  29. Rhie, A. et al. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biology 21, 245 (2020).

  30. Huang, N. & Li, H. Compleasm: a faster and more accurate reimplementation of BUSCO. Bioinformatics (Oxford, England) 39, 10 (2023).

  31. Tegenfeldt, F. et al. OrthoDB and BUSCO update: annotation of orthologs with wider sampling of genomes. Nucleic acids research 53, D516–D522 (2025).

  32. Stanke, M. et al. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic acids research 32, 309–312 (2004).

  33. Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. Genome research 14, 988–995 (2004).

  34. Kim, D. & Salzberg, S. L. TopHat-Fusion: an algorithm for discovery of novel fusion transcripts. Genome biology 12, R72 (2011).

  35. Cabau, C. et al. Compacting and correcting Trinity and Oases RNA-Seq de novo assemblies. PeerJ 5, e2988 (2017).

  36. Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome biology 9, R7 (2008).

  37. Simão, F. A. et al. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics (Oxford, England) 31, 3210–3212 (2015).

  38. Pruitt, K. D. et al. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic acids research 35, D61–65 (2007).

  39. Boutet, E. et al. UniProtKB/Swiss-Prot. Methods in molecular biology (Clifton, N.J.) 406, 89–112 (2007).

  40. Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic acids research 28, 27–30 (2000).

  41. The Gene Ontology Consortium. The Gene Ontology Resource: 20 years and still GOing strong. Nucleic acids research 47, 330–338 (2019).

  42. Persson, E. & Sonnhammer, E. L. L. InParanoid-DIAMOND: faster orthology analysis with the InParanoid algorithm. Bioinformatics 38, 2918–2919 (2022).

  43. Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl Acad. Sci. USA 117, 9451–9457 (2020).

  44. Abrusán, G. et al. TEclass-a tool for automated classification of unknown eukaryotic transposable elements. Bioinformatics 25, 1329–1330 (2009).

  45. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Current protocols in bioinformatics Chapter 4, 4.10.1–4.10.14 (2009).

  46. Chan, P. P. & Lowe, T. M. tRNAscan-SE: Searching for tRNA Genes in Genomic Sequences. Methods in molecular biology (Clifton, N.J.) 1962, 1–14 (2019).

  47. Lagesen, K. et al. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic acids research 35, 3100–3108 (2007).

  48. Kalvari, I. et al. Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families. Nucleic acids research 46, 335–342 (2018).

  49. Cui, X. et al. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction. Bioinformatics (Oxford, England) 32, 332–340 (2016).

  50. Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res 40, e49 (2012).

  51. Chen, C. et al. TBtools-II: A “one for all, all for one” bioinformatics platform for biological big-data mining. Mol Plant 16, 1733–1742 (2023).

  52. Bai, Y. et al. Improving the genome assembly of rabbits with long-read sequencing. Genomics 113, 3216–3223 (2021).

  53. Xu, M. Q. et al. Genbank https://identifiers.org/ncbi/insdc.gca:GCA_047496845.1 (2025).

  54. Xu, M. Q. et al. Telomere to telomere level genome assembly of the Yarkand hare (Lepus yarkandensis). figshare https://doi.org/10.6084/m9.figshare.29369999.v1 (2025).

  55. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR36906164 (2026).

  56. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997v2 [q-bio.GN] (2013).

  57. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

  58. Chen, X. et al. Near telomere to telomere genome assembly of Chinese yellow rabbit (Oryctolagus cuniculus). Scientific data 12, 11786 (2025).

  59. Sjodin, B. M. F. et al. Chromosome-Level Reference Genome Assembly for the American Pika (Ochotona princeps). The Journal of heredity 112, 549–557 (2021).

Acknowledgements

This work was supported by the National Natural Science Foundation of China (32260116) and the Central Guidance for Local Projects of Xinjiang (ZYYD2024ZY04). We thank Associate Professor Linqiang Zhong of Xinjiang University for providing the photo of the Yarkand hare (Fig. 1a) and for assisting with sample collection.

Author information

Authors and Affiliations

  1. Xinjiang Key Laboratory of Biological Resources and Genetic Engineering, College of Life Science and Technology, Xinjiang University, Urumqi, 830017, China

    Mengqi Xu, Yuge Cui, Hongcheng Kuang, Kai Wei & Wenjuan Shan

Contributions

In this study, Mengqi Xu was responsible for data collation and paper writing, Yuge Cui and Hongcheng Kuang were responsible for sample collection and collation, Kai Wei was responsible for technical guidance and paper revision, and Wenjuan Shan was responsible for outline writing, project management, and funding acquisition.

Corresponding authors

Correspondence to Kai Wei or Wenjuan Shan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Table.xlsx

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Xu, M., Cui, Y., Kuang, H. et al. Telomere to telomere level genome assembly of the Yarkand hare (Lepus yarkandensis). Sci Data (2026). https://doi.org/10.1038/s41597-026-06815-1

  • Received: 01 August 2025

  • Accepted: 03 February 2026

  • Published: 12 February 2026

  • DOI: https://doi.org/10.1038/s41597-026-06815-1
