A chromosomal-level genome assembly of Kibakoganea sinica, Bouchard, 2005 (Coleoptera: Scarabaeidae)

Zhu, Ming; Zhang, Jingjing; Yan, Junhui; Han, Yanting

doi:10.1038/s41597-025-05347-4


Data Descriptor
Open access
Published: 17 June 2025


Scientific Data volume 12, Article number: 1012 (2025)



Abstract

The family Scarabaeidae is one of the largest and most ecologically significant groups within the order Coleoptera, comprising over 35,000 described species. However, the limited availability of high-quality genome assemblies has hindered comprehensive studies of their ecology and evolutionary biology. Here, we present a high-quality, chromosome-level genome assembly of Kibakoganea sinica generated by integrating PacBio HiFi, Illumina, and Hi-C sequencing data. The final assembly spans 601.44 Mb and comprises 23 scaffolds (scaffold N50: 60.23 Mb) and 70 contigs (contig N50: 24.49 Mb), with 99.57% of the assembly (598.84 Mb) anchored to 10 chromosomes. BUSCO analysis against the Insecta dataset (n = 1,367) indicates high completeness: 99.2% of BUSCO genes were complete, 95.2% as single-copy and 4.0% as duplicated. Repetitive elements account for 44.43% (267.24 Mb) of the genome, and 12,940 protein-coding genes were predicted. This chromosome-scale assembly provides a foundational resource for future research on the biology and adaptation of Scarabaeidae.


Background & Summary

The Scarabaeidae family, commonly known as scarab beetles, is one of the most diverse lineages within the order Coleoptera, comprising approximately 35,000 described species¹. These beetles occur on every continent except Antarctica and occupy a wide range of ecosystems, including forests, grasslands, deserts, and agricultural landscapes². Based on their feeding strategies, scarab beetles are broadly divided into two groups: coprophagous (dung-feeding) and phytophagous (plant-feeding) species³,⁴. The coprophagous group, the dung beetles, includes two ecologically and economically significant subfamilies, Scarabaeinae and Aphodiinae⁵,⁶. Members of Scarabaeinae are regarded as the “true dung beetles” because they use fresh dung both as food and for reproduction⁷. In contrast, the phytophagous scarabs, collectively known as the Pleurosticti⁸, comprise more than 20,000 species, nearly 70% of the entire Scarabaeidae family⁴,⁹. These beetles feed on a wide array of plant materials, including leaves, roots, decaying wood, and partially decomposed plant litter¹⁰. Such diets require specialized digestive adaptations to break down plant cell wall components such as lignocellulose and hemicelluloses, which are chemically resistant to degradation¹¹.

High-quality genome assemblies are essential for uncovering the genetic mechanisms underlying feeding adaptations in scarab beetles. A recent study generated a chromosome-scale genome assembly and a comprehensive intestinal transcriptome for Trypoxylus dichotomus (Coleoptera: Scarabaeidae), offering insights into its ability to digest lignocellulose-rich plant material¹². As of April 2025, 62 Scarabaeidae genomes had been deposited in the NCBI database, but only 13 species had chromosome-level assemblies. Most of the remaining genomes were assembled from short-read data and are fragmented, with scaffold N50 values typically below 100 kb. This shortage of high-quality genomic resources continues to hinder in-depth research on environmental adaptation, dietary specialization, and evolutionary diversification in Scarabaeidae.

To improve our understanding of adaptive evolution and ecology in scarab beetles, we assembled a chromosome-level genome of Kibakoganea sinica Bouchard, 2005 (Coleoptera: Scarabaeidae) by combining PacBio HiFi, Illumina, and Hi-C data, and annotated its repeats, non-coding RNAs, and protein-coding genes. This high-quality assembly of K. sinica provides a valuable resource for studying the evolution and ecological adaptation of the superfamily Scarabaeoidea.

Methods

Sample collection and sequencing

A K. sinica pupa was collected in Guizhou, China, on December 6, 2023, and used for all sequencing experiments (Illumina, PacBio HiFi, Hi-C, and RNA sequencing). To minimize contamination, the sample was rinsed in phosphate-buffered saline for 10 minutes, flash-frozen in liquid nitrogen for 20 minutes, and stored at –80 °C until processing.

Genomic DNA and RNA were isolated from the specimen using the DNeasy Blood & Tissue Kit (Qiagen) and TRIzol Reagent (Thermo Fisher Scientific), respectively, following the manufacturers’ instructions. PCR-free short-read libraries with 350 bp inserts were prepared using the Illumina TruSeq DNA PCR-Free Kit and sequenced as 150 bp paired-end reads. Hi-C libraries were prepared following a standard protocol¹³, including DNA crosslinking, MboI digestion, end repair, and DNA purification. All short-read sequencing was performed on an Illumina NovaSeq 6000 system. For long-read sequencing, a 20 kb SMRTbell library was constructed with the PacBio SMRTbell Express Template Prep Kit 2.0 and sequenced on a PacBio Sequel II system in HiFi mode. Library construction and sequencing were conducted at Berry Genomics (Beijing, China). In total, sequencing yielded 160.95 Gb of high-quality data, including 36.70 Gb (61.02× coverage) of PacBio HiFi reads, 56.09 Gb (93.26×) of Illumina short reads, and 58.56 Gb (97.36×) of Hi-C data (Table 1).
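The depth values above follow from dividing each data yield by the genome size. The short Python sketch below reproduces that arithmetic under the assumption, not stated in the text, that the 601.44 Mb final assembly length was used as the denominator.

```python
# Fold coverage = sequenced bases / genome size.
ASSEMBLY_SIZE_MB = 601.44  # assumed denominator: final assembly length

# Data yields reported in the text, in gigabases.
YIELDS_GB = {"PacBio HiFi": 36.70, "Illumina": 56.09, "Hi-C": 58.56}


def depth(yield_gb: float, genome_mb: float) -> float:
    """Fold coverage; Gb are converted to Mb so both terms share units."""
    return yield_gb * 1_000 / genome_mb


for platform, gb in YIELDS_GB.items():
    print(f"{platform}: {depth(gb, ASSEMBLY_SIZE_MB):.2f}x")
```

Run as-is, it prints depths that match the reported 61.02× and 93.26× and agree with the reported 97.36× for Hi-C to within rounding.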

Table 1 Statistics of the sequencing data used for genome assembly.


Genome assembly

Raw Illumina reads were quality-controlled with BBTools v38.82¹⁴. Duplicate reads were first removed with clumpify.sh, and bbduk.sh was then used to trim adapter sequences and low-quality bases under strict criteria: reads with quality scores below 20 were discarded, reads containing more than five Ns were removed, poly-A/G/C tails longer than 10 bp were trimmed, and overlapping read pairs were error-corrected. Genome size, heterozygosity, and repeat content were then estimated with a k-mer-based genome survey in GenomeScope v2.0¹⁵. The estimated genome size ranged from 567.23 to 568.25 Mb, with repetitive sequences accounting for 31.57–31.58% of the genome. The survey also indicated high heterozygosity, estimated at 2.11–2.13% (Fig. 1).
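GenomeScope fits a statistical model to a k-mer frequency histogram. As a much simpler illustration of the underlying idea, and explicitly not GenomeScope's model, genome size can be approximated as the total number of k-mers counted divided by the depth of the main k-mer coverage peak. The sketch below assumes error-free simulated reads and ignores heterozygosity and repeats; the parameters `k` and `min_depth` and the simulation settings are illustrative choices. With heterozygosity near 2%, real data would also show a heterozygous peak at about half the main depth, which a naive peak pick like this one could mistake for the main peak.

```python
import random
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def kmer_histogram(reads, k: int) -> Counter:
    """Count canonical k-mers, then map multiplicity -> number of distinct k-mers."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[canonical(read[i:i + k])] += 1
    return Counter(counts.values())


def estimate_genome_size(hist: Counter, min_depth: int = 3) -> float:
    """Total k-mers divided by the depth of the most populated peak.

    Multiplicities below `min_depth` are discarded as likely errors.
    Heterozygosity and repeats are ignored, unlike in GenomeScope.
    """
    usable = {d: n for d, n in hist.items() if d >= min_depth}
    peak = max(usable, key=usable.get)
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak


# Simulate error-free 100 bp reads at ~30x from a random 5 kb "genome".
random.seed(1)
genome = "".join(random.choice("ACGT") for _ in range(5_000))
read_len, coverage = 100, 30
reads = []
for _ in range(len(genome) * coverage // read_len):
    start = random.randint(0, len(genome) - read_len)
    reads.append(genome[start:start + read_len])

estimate = estimate_genome_size(kmer_histogram(reads, k=21))
print(f"estimated size: {estimate:.0f} bp (true size: {len(genome)} bp)")
```

On this toy input the estimate lands close to the true 5,000 bp; on real reads, sequencing errors, heterozygosity, and repeats are exactly why a model-based tool such as GenomeScope is used instead.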

The initial assembly was generated from PacBio HiFi reads using Hifiasm v0.19.8¹⁶ with default parameters. To remove haplotigs resulting from the high heterozygosity, we applied Purge_Dups v1.2.5¹⁷ with a haploid cutoff of 70 (-s 70). For chromosome-scale scaffolding, Hi-C reads were quality-filtered and aligned to the assembly using Juicer v1.6.2¹⁸, and contigs were anchored and ordered into chromosomes using 3D-DNA v.180922¹⁹. The resulting scaffolds were manually reviewed and corrected in Juicebox v.1.11.0¹⁸ to resolve potential misjoins and orientation errors.

To ensure assembly purity, we screened for contaminants using MMseqs2 v1.1²⁰ against the NCBI nucleotide (nt) and UniVec databases and removed any foreign sequences detected. Potential vector contaminants were identified with BLAST v2.11.0²¹ against UniVec, flagging sequences with > 90% similarity as contaminants; sequences with > 80% similarity were further checked by BLASTN searches against nt. All identified bacterial and fungal contaminants were removed from the scaffolds.

Telomeric regions were identified with the TeloExplorer module of QuarTeT v1.2.1²². Telomeres were confirmed by the presence of continuous arrays of the telomeric repeat motif (TTAGG) within 10,000 bp of each chromosome end (Table 3).

The final chromosome-scale assembly of K. sinica spans 601.44 Mb in 23 scaffolds and 70 contigs, close to (about 6% larger than) the survey estimate of 567.23–568.25 Mb. The assembly is highly contiguous, with scaffold and contig N50 values of 60.23 Mb and 24.49 Mb, respectively (Table 2). In total, 99.57% of the assembly (598.84 Mb) was anchored to 10 chromosomes ranging from 16.33 Mb to 103.53 Mb in length (Table 3; Figs. 2 and 3).
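The telomere criterion described above can be illustrated with a simple regular-expression scan. This is not how QuarTeT's TeloExplorer is implemented; it is a minimal sketch that assumes a telomere is called when at least five tandem copies of TTAGG (or its reverse complement, CCTAA) occur within 10 kb of a chromosome end. The `MIN_COPIES` threshold is a hypothetical choice.

```python
import re

WINDOW = 10_000   # bp searched at each chromosome end, as in the text
MOTIF = "TTAGG"   # telomeric repeat motif reported for K. sinica
MIN_COPIES = 5    # hypothetical threshold for a "continuous" array


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (upper or lower case)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def has_tandem_motif(region: str, motif: str = MOTIF, min_copies: int = MIN_COPIES) -> bool:
    """True if the region holds >= min_copies tandem copies of the motif on either strand.

    The region is upper-cased first so soft-masked (lowercase) bases still match.
    """
    region = region.upper()
    for unit in (motif, revcomp(motif)):
        if re.search(f"(?:{unit}){{{min_copies},}}", region):
            return True
    return False


def telomere_status(chrom: str, window: int = WINDOW) -> tuple[bool, bool]:
    """Return (start_has_telomere, end_has_telomere) for one chromosome sequence."""
    return has_tandem_motif(chrom[:window]), has_tandem_motif(chrom[-window:])


# Toy chromosome: CCTAA array at the start, TTAGG array at the end.
toy = "CCTAA" * 20 + "ACGT" * 3_000 + "TTAGG" * 20
print(telomere_status(toy))             # (True, True)
print(telomere_status("ACGT" * 5_000))  # (False, False)
```

Both strands are searched because, on a forward-oriented chromosome, the start typically carries the CCTAA strand of the array and the end the TTAGG strand.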
Chromosomes were numbered in descending order of length, with the longest designated chromosome 1 (Table 3). The BUSCO assessment indicated a genome completeness of 99.2% (Table 2). Together, these results show that the assembly is highly contiguous and complete.
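Contig and scaffold N50, the anchoring rate, and length-based chromosome numbering are all simple functions of sequence lengths. The sketch below shows one common way to compute them; the scaffold IDs and the N50 toy lengths are hypothetical, and the only real numbers used are the anchored (598.84 Mb) and total (601.44 Mb) assembly sizes reported above.

```python
from itertools import accumulate


def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half = sum(ordered) / 2
    # Walk from the longest sequence down until the running sum reaches half.
    return next(length for length, running in zip(ordered, accumulate(ordered)) if running >= half)


def number_by_length(lengths: dict[str, int]) -> dict[str, str]:
    """Name sequences chr1, chr2, ... in descending order of length."""
    ranked = sorted(lengths, key=lengths.get, reverse=True)
    return {seq_id: f"chr{rank}" for rank, seq_id in enumerate(ranked, start=1)}


def anchoring_rate(anchored_mb: float, total_mb: float) -> float:
    """Percentage of assembled sequence placed on chromosomes."""
    return 100 * anchored_mb / total_mb


print(n50([10, 8, 6, 4, 2]))                                      # 8
print(number_by_length({"scaf_a": 5, "scaf_b": 9, "scaf_c": 7}))  # {'scaf_b': 'chr1', 'scaf_c': 'chr2', 'scaf_a': 'chr3'}
print(f"{anchoring_rate(598.84, 601.44):.2f}%")                   # 99.57%
```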

Table 2 Genome assembly statistics for Kibakoganea sinica.


Table 3 Summary of telomere information for the Kibakoganea sinica genome.


Genome annotation

A species-specific repeat library for K. sinica was generated with RepeatModeler v2.0.4²³ and combined with known repeats from RepBase-20130909²⁴ and Dfam 3.5²⁵ to build a comprehensive repeat database. RepeatMasker v4.1.4²⁶ was then run with this custom database to identify and soft-mask repetitive elements across the genome. Repetitive sequences account for 44.43% of the K. sinica assembly. The major repeat classes include unclassified elements (18.09%), LINE retrotransposons (6.81%), LTR retrotransposons (8.53%), and DNA transposons (17.00%), along with other repeat types (Table 4).
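Because the RepeatMasker output here is soft-masked, repeat content can be cross-checked by counting lowercase bases in the masked FASTA. The sketch below assumes that lowercase a/c/g/t marks masked bases and excludes N and other ambiguity codes from the denominator; RepeatMasker's own summary may use a different denominator, so the result need not match the reported percentage exactly.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def softmasked_percent(records) -> float:
    """Percentage of unambiguous bases (A/C/G/T, any case) that are lowercase."""
    masked = called = 0
    for _, seq in records:
        # str.count is much faster than a per-base Python loop on large genomes.
        masked += sum(seq.count(b) for b in "acgt")
        called += sum(seq.count(b) for b in "ACGTacgt")
    return 100 * masked / called if called else 0.0


toy_fasta = [">chr1", "ACGTacgt", "NNNN", ">chr2", "aaaaTTTT"]
print(softmasked_percent(read_fasta(toy_fasta)))  # 50.0
```

For a real assembly, the same functions can be applied to an open file handle, e.g. `softmasked_percent(read_fasta(open("genome.masked.fa")))`, where the file name is hypothetical.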

Table 4 Genome assembly and annotation statistics of Kibakoganea sinica.


Non-coding RNAs (ncRNAs) in K. sinica were identified using Infernal v1.1.4²⁷ with the Rfam v14.10 database²⁸, and tRNAs were detected with tRNAscan-SE v2.0.9²⁹. In total, 596 ncRNAs were annotated, including 312 tRNAs, 101 rRNAs, 74 microRNAs, and 69 small nuclear RNAs (Table 4).
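Counts like these are typically tallied from Infernal's tabular output. The sketch below assumes a `cmscan --tblout` table in the default (format 1) layout, where the first column is the Rfam model name and the seventeenth is the inclusion flag (`!` for hits passing the inclusion threshold); it counts significant hits per model. Grouping models into classes such as rRNA, miRNA, or snRNA would additionally require Rfam family type annotations, which are not shown, and the example rows are invented.

```python
from collections import Counter


def count_rfam_hits(tblout_lines, require_inclusion: bool = True) -> Counter:
    """Count hits per Rfam model in a cmscan --tblout table (format 1 assumed)."""
    counts = Counter()
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue
        # 17 whitespace-delimited columns, then a free-text description.
        fields = line.split(maxsplit=17)
        model, inclusion = fields[0], fields[16]
        if require_inclusion and inclusion != "!":
            continue
        counts[model] += 1
    return counts


# Invented example rows in the assumed column order.
rows = [
    "#target name  accession query name ...",
    "5S_rRNA RF00001 chr1 - cm 1 119 1000 1118 + no 1 0.52 0.0 80.1 1.2e-15 ! 5S ribosomal RNA",
    "U6 RF00026 chr2 - cm 1 104 500 603 + no 1 0.45 0.0 70.0 3.0e-12 ! U6 spliceosomal RNA",
    "U6 RF00026 chr3 - cm 1 104 800 903 + no 1 0.41 0.0 20.5 0.31 ? U6 spliceosomal RNA",
]
print(count_rfam_hits(rows))  # Counter({'5S_rRNA': 1, 'U6': 1})
```

Before relying on this, the column order should be checked against the header line of the actual tblout file, since other output formats (for example `--fmt 2`) add columns.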

Protein-coding genes in the K. sinica genome were annotated with MAKER v3.01.03³⁰, integrating transcriptomic evidence, ab initio predictions, and protein homology. RNA-seq reads were aligned to the genome with HISAT2 v2.2.1³¹, followed by genome-guided transcript assembly with StringTie v2.1.6³². Ab initio prediction was performed with BRAKER v2.1.6³³, which incorporates GeneMark-ES/ET/EP 4.68_lic³⁴ and Augustus v3.4.0³⁵; both predictors were trained using transcriptomic evidence and protein data from OrthoDB v11³⁶. Homology-based prediction was conducted with GeMoMa v1.9³⁷ using protein sequences from five reference species: Drosophila melanogaster (GCF_000001215.4)³⁸, Apis mellifera (GCA_003254395.2)³⁹, Coccinella septempunctata (GCA_907165205.1)⁴⁰, Prosopocoilus inquinatus (GCA_036172665.1)⁴¹, and Tribolium castaneum (GCA_031307605.1)⁴² (Table 5). The pipeline identified 12,940 protein-coding genes with an average gene length of 14,792.6 bp (Table 4). On average, each gene contained 6.3 exons, 5.3 introns, and 6.1 CDS segments, with mean lengths of 357.3 bp for exons, 2,500.2 bp for introns, and 272.5 bp for CDS segments. Completeness of the predicted gene set was evaluated with BUSCO against the Insecta dataset (n = 1,367): 80.6% (1,102) of BUSCOs were complete and single-copy, 18.5% (253) complete and duplicated, 0.8% (10) fragmented, and 0.1% (2) missing, indicating that the annotation is highly complete (99.1% complete BUSCOs).
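Gene-structure averages of this kind can be derived from the annotation GFF3. The following sketch assumes that exon and CDS features carry a `Parent=` attribute pointing to their transcript, treats the gaps between consecutive exons of a transcript as introns, and averages over transcripts, which equals a per-gene average only if each gene has a single transcript. The two-transcript example is invented.

```python
from collections import defaultdict


def _mean(values):
    return sum(values) / len(values) if values else 0.0


def gene_structure_stats(gff_lines):
    """Mean exon/intron/CDS counts per transcript and mean feature lengths from GFF3 lines."""
    exons, cds = defaultdict(list), defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in ("exon", "CDS"):
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        parent = attrs.get("Parent")
        if parent is None:
            continue
        (exons if cols[2] == "exon" else cds)[parent].append((int(cols[3]), int(cols[4])))

    exon_lens, intron_lens, cds_lens = [], [], []
    for segments in exons.values():
        segments.sort()
        exon_lens += [end - start + 1 for start, end in segments]
        # Intron = bases strictly between one exon's end and the next exon's start.
        intron_lens += [nxt[0] - cur[1] - 1 for cur, nxt in zip(segments, segments[1:])]
    for segments in cds.values():
        cds_lens += [end - start + 1 for start, end in segments]

    n = len(exons) or 1  # avoid division by zero on empty input
    return {
        "exons_per_transcript": len(exon_lens) / n,
        "introns_per_transcript": len(intron_lens) / n,
        "cds_per_transcript": len(cds_lens) / n,
        "mean_exon_len": _mean(exon_lens),
        "mean_intron_len": _mean(intron_lens),
        "mean_cds_len": _mean(cds_lens),
    }


def row(ftype, start, end, parent):
    """Build one tab-separated GFF3 line for the toy example."""
    return "\t".join(["chr1", "toy", ftype, str(start), str(end), ".", "+", ".", f"Parent={parent}"])


toy_gff = [
    row("exon", 1, 100, "m1"), row("exon", 201, 300, "m1"), row("exon", 401, 450, "m1"),
    row("CDS", 51, 100, "m1"), row("CDS", 201, 300, "m1"), row("CDS", 401, 420, "m1"),
    row("exon", 1000, 1199, "m2"), row("CDS", 1050, 1149, "m2"),
]
print(gene_structure_stats(toy_gff))
```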

Table 5 Species taxonomic information and accession codes of all samples used in this study.


Functional annotation was performed by aligning the predicted protein sequences against the UniProtKB database using DIAMOND v2.0.11⁴³. Gene Ontology (GO) terms, KEGG/Reactome pathways, and protein domains were further annotated using eggNOG-mapper v2.0.14⁴⁴ and InterProScan 5.53-87.0⁴⁵, with InterProScan integrating five databases: Pfam⁴⁶, SMART⁴⁷, SUPERFAMILY⁴⁸, Gene3D⁴⁹, and CDD⁵⁰. Combining the InterProScan and eggNOG results, 11,414 genes were assigned COG categories, 10,333 genes received GO terms, and 5,009 genes were mapped to KEGG pathways. Chromosomal features, including repeat elements, gene density, and GC content, were visualized with TBtools⁵¹.
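Annotation summaries of this kind are usually tallied per gene: a gene counts once for a given source if it has any non-missing value in the corresponding column. The sketch below assumes an eggNOG-mapper-style tab-separated table whose comment lines start with `##`, whose header line starts with `#`, and in which `-` marks a missing value; the column names `COG_category`, `GOs`, and `KEGG_Pathway` are assumptions about the output layout and may differ between eggNOG-mapper versions. Merging with InterProScan results would amount to taking the union of the per-source gene sets.

```python
import csv


def count_annotated_genes(lines, columns):
    """Number of distinct genes with a non-missing value in each requested column."""
    table = [ln for ln in lines if ln.strip() and not ln.startswith("##")]
    if not table:
        return {col: 0 for col in columns}
    table[0] = table[0].lstrip("#")  # header row, e.g. "#query\tCOG_category\t..."
    reader = csv.DictReader(table, delimiter="\t")
    id_column = reader.fieldnames[0]
    annotated = {col: set() for col in columns}
    for record in reader:
        for col in columns:
            # Treat missing columns, empty cells, and "-" all as "no annotation".
            if (record.get(col) or "-") != "-":
                annotated[col].add(record[id_column])
    return {col: len(genes) for col, genes in annotated.items()}


toy_table = [
    "## invented example in an assumed eggNOG-mapper-like layout",
    "#query\tCOG_category\tGOs\tKEGG_Pathway",
    "g1\tG\tGO:0005975\tmap00500",
    "g2\tS\t-\t-",
    "g3\t-\tGO:0016787\t-",
]
print(count_annotated_genes(toy_table, ["COG_category", "GOs", "KEGG_Pathway"]))
# {'COG_category': 2, 'GOs': 2, 'KEGG_Pathway': 1}
```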

Data Records

The sequencing data generated in this study are available under the following National Center for Biotechnology Information (NCBI) SRA accession numbers: transcriptome reads (SRR31019928)⁵², Hi-C data (SRR31019929)⁵³, Illumina short reads (SRR31019930)⁵⁴, and PacBio HiFi long reads (SRR31019931)⁵⁵. The final genome assembly is available under NCBI accession GCA_043790905.1⁵⁶. Genome annotation data, including repetitive elements, gene structure predictions, and functional annotations, have been deposited in Figshare⁵⁷.

Technical Validation

We evaluated assembly quality using two complementary approaches. First, completeness was assessed with BUSCO v5.0.4⁵⁸ against the Insecta reference set (n = 1,367 conserved single-copy orthologs). The assembly achieved a BUSCO completeness of 99.2%, with 95.2% single-copy, 4.0% duplicated, 0.2% fragmented, and 0.6% missing genes. Second, assembly accuracy was assessed by aligning PacBio, Illumina, and RNA-seq reads to the final assembly with Minimap2 v2.23⁵⁹ and calculating mapping rates with SAMtools v1.9⁶⁰. Mapping rates were high for PacBio (99.85%), Illumina (89.71%), and RNA-seq (93.71%) reads. Together, these analyses confirm the high quality of the genome assembly.
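The text does not state exactly how mapping rates were derived from the alignments. One common approach, sketched below as an assumption, is to parse `samtools flagstat` output and report mapped primary reads as a fraction of all primary reads, subtracting secondary and supplementary records, which flagstat includes in both its total and mapped counts. The flagstat text in the example is invented.

```python
import re

# "<QC-passed> + <QC-failed> <category>" optionally followed by " (...)".
LINE = re.compile(r"^(\d+) \+ (\d+) (.+?)(?: \(.*)?$")


def parse_flagstat(text: str) -> dict[str, int]:
    """Map each flagstat category (e.g. 'in total', 'mapped') to its QC-passed count."""
    counts = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            counts[match.group(3)] = int(match.group(1))
    return counts


def primary_mapping_rate(counts: dict[str, int]) -> float:
    """Percentage of primary records that are mapped."""
    extra = counts.get("secondary", 0) + counts.get("supplementary", 0)
    return 100 * (counts["mapped"] - extra) / (counts["in total"] - extra)


example = """1010 + 0 in total (QC-passed reads + QC-failed reads)
4 + 0 secondary
6 + 0 supplementary
0 + 0 duplicates
1000 + 0 mapped (99.01% : N/A)"""
counts = parse_flagstat(example)
print(f"{primary_mapping_rate(counts):.2f}%")  # 99.00%
```

Newer SAMtools releases also report a `primary mapped` line directly, which could be used instead of the subtraction.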

Code availability

No custom scripts were used in this work. All commands and pipelines used for data processing were run according to the manuals and protocols of the corresponding bioinformatics software.

References

Ratcliffe, B. C. & Cave, R. D. The Dynastinae (Coleoptera: Scarabaeidae) of the Bahamas with a description of a new species of Cyclocephala from Great Inagua Island. Insecta Mundi. 24, 1–10 (2008).
Breeschoten, T. et al. Phylogenetics and biogeography of the dung beetle genus Onthophagus inferred from mitochondrial genomes. Mol Phylogenet Evol 105, 86–95 (2016).
Song, N. & Zhang, H. The mitochondrial genomes of phytophagous scarab beetles and systematic implications. J. Insect Sci. 18, 11 (2018).
Bohacz, C., Harrison, J. du G. & Ahrens, D. Comparative morphology of antennal surface structures in pleurostict scarab beetles (Coleoptera). Zoomorphology 139, 327–346 (2020).
Rodrigues, S. R., de Barros, A. T. M., Puker, A. & Taira, T. L. Diversity of coprophagous scarab beetles (Coleoptera, Scarabaeidae) collected with flight intercept trap in the Southern Pantanal, Brazil. Biota Neotrop 10, 123–127 (2010).
Barbero, E., Palestrini, C. & Rolando, A. Dung beetle conservation: Effects of habitat and resource selection (Coleoptera: Scarabaeoidea). J. Insect Conserv. 3, 75–84 (1999).
Bai, M. et al. Mandible evolution in the Scarabaeinae (Coleoptera: Scarabaeidae) and adaptations to coprophagous habits. Front. Zool. 12, 30 (2015).
Erichson, W. F. Naturgeschichte der Insecten Deutschlands: Erste Abtheilung. Coleoptera; Nabu Press: Charleston, SC, USA, 1847; Volume 1.
Gunter, N. L., Weir, T. A., Slipinski, A., Bocak, L. & Cameron, S. L. If dung beetles (Scarabaeidae: Scarabaeinae) arose in association with dinosaurs, did they also suffer a mass co-extinction at the K-Pg boundary? PLoS ONE 11, e0153570 (2016).
Chandra, K. & Gupta, D. An inventory of scarab beetles (Coleoptera: Scarabaeidae) of Achanakmar-Amarkantak biosphere reserve, Chhattisgarh, India. Int. J. Sci. Nat. 3, 886–891 (2012).
Holter, P. & Scholtz, C. H. Elongated hindguts in desert-living dung beetles (Scarabaeidae: Scarabaeinae) feeding on dry dung pellets or plant litter. J Morphol. 274, 657–662 (2013).
Wang, Q., Liu, L., Zhang, S., Wu, H. & Huang, J. A chromosome-level genome assembly and intestinal transcriptome of Trypoxylus dichotomus (Coleoptera: Scarabaeidae) to understand its lignocellulose digestion ability. GigaScience. 11, giac059 (2022).
Belton, J. M. et al. Hi-C: A comprehensive technique to capture the conformation of genomes. Methods. 58, 268–276 (2012).
Bushnell, B. BBtools. Available online: https://sourceforge.net/projects/bbmap/ (accessed on 1 October 2022) (2014).
Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun. 11, 1432 (2020).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 18, 170–175 (2021).
Guan, D. et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics. 36, 2896–2898 (2020).
Durand, N. C. et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst. 3, 95–98 (2016).
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 356, 92–95 (2017).
Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive datasets. Nat. Biotechnol. 35, 1026–1028 (2017).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
Lin, Y. et al. QuarTeT: a telomere-to-telomere toolkit for gap-free genome assembly and centromeric repeat identification. Hortic. Res. 10, uhad127 (2023).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA. 117, 9451–9457 (2020).
Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA. 6, 11 (2015).
Hubley, R. et al. The Dfam database of repetitive DNA families. Nucleic Acids Res. 44, D81–D89 (2016).
Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker Open-4.0. Available online: http://www.repeatmasker.org (accessed on 1 October 2022) (2013–2015).
Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 29, 2933–2935 (2013).
Griffiths-Jones, S. et al. Rfam: annotating noncoding RNAs in complete genomes. Nucleic Acids Res. 33, D121–D124 (2005).
Chan, P. P. & Lowe, T. M. tRNAscan-SE: Searching for tRNA genes in genomic sequences. Methods Mol Biol. 1962, 1–14 (2019).
Holt, C. & Yandell, M. MAKER2: An annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics. 12, 491 (2011).
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: A fast spliced aligner with low memory requirements. Nat. Methods. 12, 357–360 (2015).
Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 20, 278 (2019).
Bruna, T., Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: Automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom. Bioinform. 3, lqaa108 (2021).
Bruna, T., Lomsadze, A. & Borodovsky, M. GeneMark-EP+: Eukaryotic gene prediction with self-training in the space of genes and proteins. NAR Genom. Bioinform. 2, lqaa026 (2020).
Stanke, M., Steinkamp, R., Waack, S. & Morgenstern, B. AUGUSTUS: A web server for gene finding in eukaryotes. Nucleic Acids Res. 32, W309–W312 (2004).
Kriventseva, E. V. et al. OrthoDB v10: Sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs. Nucleic Acids Res. 47, D807–D811 (2019).
Keilwagen, J., Hartung, F., Paulini, M., Twardziok, S. O. & Grau, J. Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi. BMC Bioinformatics. 19, 189 (2018).
Hoskins, R. A. et al. The Release 6 reference sequence of the Drosophila melanogaster genome. Genome Res. 25, 445–458 (2015).
Gibbs, R. A. et al. Insights into social insects from the genome of the honeybee Apis mellifera. Nature. 443, 931–949 (2006).
Crowley, L. The genome sequence of the seven-spotted ladybird, Coccinella septempunctata Linnaeus, 1758. Wellcome Open Res. 6, 319 (2021).
Pang, B., Zhan, Z. & Wang, Y. A chromosome-level genome assembly of Prosopocoilus inquinatus Westwood, 1848 (Coleoptera: Lucanidae). Sci Data. 11, 808 (2024).
Herndon, N. et al. Enhanced genome assembly and a new official gene set for Tribolium castaneum. BMC Genomics. 21, 47 (2020).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods. 12, 59–60 (2015).
Huerta-Cepas, J. et al. Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper. Mol. Biol. Evol. 34, 2115–2122 (2017).
Finn, R. D. et al. InterPro in 2017—Beyond protein family and domain annotations. Nucleic Acids Res. 45, D190–D199 (2017).
El-Gebali, S. et al. The Pfam protein families database in 2019. Nucleic Acids Res. 47, D427–D432 (2019).
Letunic, I. & Bork, P. 20 years of the SMART protein domain annotation resource. Nucleic Acids Res. 46, D493–D496 (2018).
Wilson, D. et al. SUPERFAMILY—Sophisticated comparative genomics, data mining, visualization and phylogeny. Nucleic Acids Res. 37, D380–D386 (2009).
Lewis, T. E. et al. Gene3D: Extensive Prediction of Globular Domains in Proteins. Nucleic Acids Res. 46, D1282 (2018).
Marchler-Bauer, A. et al. CDD/SPARCLE: Functional classification of proteins via subfamily domain architectures. Nucleic Acids Res. 45, D200–D203 (2017).
Chen, C. et al. TBtools: An Integrative Toolkit Developed for Interactive Analyses of Big Biological Data. Mol. Plant. 13, 1194–1202 (2020).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR31019928 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR31019929 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR31019930 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR31019931 (2025).
NCBI Assembly https://identifiers.org/ncbi/insdc.gca:GCA_043790905.1 (2025).
Zhu, M. Genome annotation (repeats and protein-coding genes). figshare https://doi.org/10.6084/m9.figshare.28696565.v1 (2025).
Waterhouse, R. M. et al. BUSCO Applications from Quality Assessments to Gene Prediction and Phylogenomics. Mol. Biol. Evol. 35, 543–548 (2018).
Li, H. Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics. 34, 3094–3100 (2018).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. GigaScience. 10, giab008 (2021).


Acknowledgements

This study was supported by grants from Henan Science and Technology Research Project (252102320222).

Author information

Authors and Affiliations

School of Geographic Sciences, Xinyang Normal University, Xinyang, 464000, China
Ming Zhu & Junhui Yan
College of Geography and Tourism, Zhengzhou Normal University, Zhengzhou, 450044, China
Jingjing Zhang
College of Life Sciences, Xinyang Normal University, Xinyang, 464000, China
Yanting Han


Contributions

Y.J. contributed to the research design. Z.M., Z.J. and H.Y. collected the samples. Z.M. analyzed the data. Z.M. and H.Y. wrote the draft manuscript and revised the manuscript. All co-authors contributed to this manuscript and approved it.

Corresponding author

Correspondence to Yanting Han.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Zhu, M., Zhang, J., Yan, J. et al. A chromosomal-level genome assembly of Kibakoganea sinica, Bouchard, 2005 (Coleoptera: Scarabaeidae). Sci Data 12, 1012 (2025). https://doi.org/10.1038/s41597-025-05347-4


Received: 14 April 2025
Accepted: 05 June 2025
Published: 17 June 2025
DOI: https://doi.org/10.1038/s41597-025-05347-4