Chromosome-level genome assembly of Chinese water scorpion Ranatra chinensis (Heteroptera: Nepidae)

Liu, Xinzhi; Ma, Ling; Tian, Li; Song, Fan; Xie, Tongyin; Wu, Yunfei; Li, Hu; Cai, Wanzhi; Duan, Yuange

doi:10.1038/s41597-024-03856-2

Data Descriptor
Open access
Published: 18 September 2024

Xinzhi Liu¹, Ling Ma¹, Li Tian¹ (ORCID: orcid.org/0000-0002-7288-9676), Fan Song¹, Tongyin Xie², Yunfei Wu³, Hu Li¹, Wanzhi Cai¹ & Yuange Duan¹ (ORCID: orcid.org/0000-0003-2311-9859)

Scientific Data volume 11, Article number: 1016 (2024)

Abstract

Heteroptera (the true bugs), one of the most diverse lineages of insects, have diversified in feeding strategies and living habitats, making them an ideal lineage for studies of adaptive evolution. The Chinese water scorpion Ranatra chinensis (Heteroptera: Nepidae) is a predaceous bug living in lentic water systems and represents an ideal model for studying habitat transition and adaptation to aquatic environments. However, genetic studies on this water bug remain limited. Here, we obtained a chromosome-level genome of R. chinensis using PacBio HiFi long reads and Hi-C sequencing reads. The total assembly size is 867.89 Mb, with a scaffold N50 length of 26.48 Mb and a GC content of 39.50%. All contigs were assembled into 23 pseudo-chromosomes (N = 19 A + X1X2X3X4), and we predicted 18,424 protein-coding genes in this genome. This study provides valuable genomic resources for future studies on the biology, water adaptation, and genome evolution of water bugs.

Background & Summary

Biodiversity is the backbone of Earth’s life support system. It plays a vital role in sustaining life and maintaining ecosystem functioning, and it underpins numerous benefits from nature that are essential for human well-being^1,2,3. Ecosystems contain an enormous number of organisms, which raises an interesting question: how do animals adapt to different habitats? The ability to fly or to live in water, for example, enables animals to occupy novel niches and escape predators. For a number of animals, the molecular bases underlying adaptive evolution have been uncovered. These include marine mammals⁴, the Asian honeybee⁵, the water strider⁶ and the poultry shaft louse⁷. However, the adaptive mechanisms of the majority of extant species remain unknown.

Insects are a crucial component of biodiversity and play a vital role in maintaining ecological balance; they are estimated to comprise as many as 5.5 million species on Earth⁸. They make up around 80% of animals and around half of all living species, and occupy nearly every terrestrial habitat on the planet^8,9. The true bugs (Hemiptera: Heteroptera) are one of the most diverse lineages of insects and the most diversified lineage among the Hemimetabola. They possess a diversity of feeding strategies, ranging from predation on other arthropods and hematophagy on vertebrates to mycetophagy and phytophagy. Hemipterans also vary in living habitats, which include various terrestrial, aquatic and even marine habitats^10,11. This enormous diversity makes Hemiptera an ideal lineage for exploring adaptive evolution. Previous studies considered the common ancestor of Heteroptera to be terrestrial and inferred multiple independent habitat transitions, from terrestrial to water-surface, aquatic and other habitats, and from aquatic to shoreline^11,12. The infraorder Nepomorpha, which contains many species living in water, is particularly helpful for understanding water adaptation and habitat transition. Exploring the water adaptation of water bugs can pave the way for subsequent studies of habitat adaptation. However, the lack of chromosome-level genomes prevents deep genomic analysis of aquatic true bugs and impedes the discovery of their adaptive mechanisms.

The family Nepidae (Heteroptera: Nepomorpha), or water scorpions, can be recognized by antennae hidden under the head and by a long tail-like respiratory siphon at the rear end of the body that cannot be retracted into the apex of the abdomen, a feature unique among aquatic insects^13,14,15. They are normally found in ponds, lakes, marshes and riversides, and often move far from their aquatic habitat at night¹⁶. As predatory insects, they usually reside in the shade of aquatic plants or between stones, waiting for prey such as small fish, aquatic insects and other aquatic invertebrates¹⁶. Nepidae have thus evolved multiple derived traits compared with their ancestors. They are also potential natural enemies of public-health pests for biocontrol, because they can control mosquito populations by preying on the larvae^17,18. Ranatra chinensis, also known as the Chinese water scorpion, is a representative species of Nepidae with a long caudal breathing tube. Its body length ranges from 41 mm to 48 mm, and its respiratory siphon can be as long as the torso¹⁸. Ranatra chinensis is widely distributed in China, from Heilongjiang Province in the far north to Guangdong Province in the far south, making it a relatively accessible species for scientific studies.

No chromosome-level genome is yet available for the family Nepidae. Analysis of the R. chinensis genome will improve our understanding of the molecular mechanisms behind habitat transition and adaptive evolution in water bugs. With the development of sequencing technologies and bioinformatic tools, obtaining a high-quality reference genome has become feasible for most organisms, enabling investigation of genome evolution and its underlying molecular mechanisms. In this study, we assembled and annotated the chromosome-level genome of R. chinensis by combining PacBio long-read sequencing, Illumina short-read sequencing, and chromosome conformation capture (Hi-C) technologies. Our data provide genomic resources for future exploration of the biology and evolution of water bugs, and will facilitate the understanding of habitat transition and water adaptation.

Methods

Sample preparation

R. chinensis specimens were collected from Xichuan County, Henan Province, China (33.24° N, 111.02° E) and transported on dry ice. All samples were stored at −80°C until further use. A female adult was used for genomic DNA extraction with the CTAB method, and the extracted DNA was purified using a Blood and Cell Culture DNA Midi Kit (QIAGEN, Germany). DNA degradation and contamination were monitored on 1% agarose gels. DNA purity was then assessed using a NanoDrop™ One UV-Vis spectrophotometer (Thermo Fisher Scientific, USA), and DNA concentration was measured using a Qubit® 4.0 Fluorometer (Invitrogen, USA).

Genomic DNA sequencing

An Illumina short-read library was first constructed with an insert size of about 350 bp and then sequenced on the Illumina NovaSeq 6000 platform to generate 150 bp paired-end reads. We obtained 59.89 Gb of raw Illumina data and retained 57.55 Gb of clean data after removing adapters and low-quality reads using fastp version 0.21.0¹⁹ with default parameters (Table 1).

Table 1 Library sequencing data and methods used in this study to assemble the Ranatra chinensis genome.

For long-read sequencing, a PacBio HiFi library with an insert size of 15 kb was generated and sequenced using a SMRT cell on the PacBio Sequel II platform (Pacific Biosciences, Menlo Park, USA). A total of 60.41 Gb of clean data was obtained from the raw long reads generated in Circular Consensus Sequencing (CCS) mode (Table 1).

A single female adult was used for chromosome conformation capture (Hi-C) sequencing, and the library was prepared according to the standard protocol described by Belton et al. with minor modifications²⁰. The sample was cut into pieces and mixed with 2% formaldehyde solution for cross-linking, and then treated with New England Biolabs (NEB) buffer to digest nuclei. Biotinylated nucleotides were used to fill in the cohesive ends, and after ligation the purified DNA was sheared into fragments of 350 bp. DNA was purified using QIAamp DNA Mini Kits (Qiagen). The final Hi-C library was sequenced on the Illumina NovaSeq 6000 platform with 150 bp paired-end reads. Sequencing yielded 59.18 Gb of raw data, and 57.87 Gb of clean data was retained after applying the same filtering criteria as for the short reads (Table 1).

Transcriptome sequencing

Total RNA was extracted from two adults (one female and one male) with TRIzol reagent (Thermo Fisher Scientific, USA) for transcriptome sequencing. A paired-end library was constructed using the TruSeq RNA Library Preparation Kit (Illumina, USA). Transcriptome sequencing was performed on an Illumina NovaSeq 6000 platform, yielding a total of 4.86 Gb of clean RNA-seq data (Table 1).

Genome size estimation

Genomic characteristics, including genome size, heterozygosity and duplication, were estimated from the 57.55 Gb of clean Illumina short reads. The distribution of k-mer copy numbers was calculated in JELLYFISH version 2.1.3²¹. Genome size and heterozygosity were then estimated from the 17-mer depth distribution in GenomeScope version 2.0²² with default parameters, giving 605.54 Mb and 2.27%, respectively (Fig. 1).
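The idea behind a k-mer based size estimate can be illustrated with a minimal Python sketch. It is not the GenomeScope model, which fits a statistical mixture to the k-mer spectrum; it only shows the basic relation that genome size is roughly the total number of k-mer occurrences divided by the depth of the main peak. The function name, the error cutoff `min_depth` and the toy histogram are assumptions made for illustration and are not taken from this study.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Rough genome size from a k-mer depth histogram {depth: n_distinct_kmers}.

    min_depth is an assumed cutoff below which k-mers are treated as
    sequencing errors; it is not a value taken from the study.
    """
    solid = {d: n for d, n in histogram.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above the error cutoff")
    # Main peak: the depth shared by the largest number of distinct k-mers.
    peak_depth = max(solid, key=solid.get)
    # Total k-mer occurrences divided by the peak depth approximates genome size.
    total_kmers = sum(d * n for d, n in solid.items())
    return total_kmers / peak_depth, peak_depth


# Hypothetical toy histogram (not data from this study).
toy_hist = {1: 5000, 2: 800, 18: 100, 19: 300, 20: 600, 21: 300, 22: 100}
size, peak = estimate_genome_size(toy_hist)
print(peak, size)  # prints: 20 1400.0
```

In a highly heterozygous genome the tallest peak may be the heterozygous one rather than the homozygous one, which is one reason a model-based tool is preferable to this simple peak rule.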

Chromosome-level genome assembly

The initial de novo assembly of the R. chinensis genome was performed from the PacBio data using Hifiasm version 0.13 with default parameters²³. The assembly was then processed with the Purge_Dups pipeline²⁴ to remove alternative haplotypes and redundant fragments. Subsequent polishing with the Illumina sequencing data was performed to improve contig quality. Finally, an 867.89 Mb contig-level assembly of R. chinensis was obtained, containing 689 contigs with contig N50 and N90 sizes of 26.48 Mb and 3.80 Mb, respectively, and a GC content of 39.50% (Table 2).
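For readers unfamiliar with the N50 and N90 statistics quoted above, the following Python sketch shows how they are conventionally computed from a list of contig lengths. The function name `nx` and the toy lengths are hypothetical; the authors do not state which tool produced their statistics.

```python
def nx(lengths, percent):
    """Return the Nx length: the shortest sequence in the smallest set of
    longest sequences that together cover at least `percent` of the assembly."""
    if not lengths:
        raise ValueError("empty length list")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length


# Hypothetical contig lengths (not data from this study).
toy_lengths = [100, 80, 60, 40, 20]
n50 = nx(toy_lengths, 50)
n90 = nx(toy_lengths, 90)
print(n50, n90)  # prints: 80 40
```

The same function applies to scaffold lengths, which is how a scaffold N50 differs from a contig N50: only the input list changes.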

Table 2 Statistics of the Ranatra chinensis genome assembly.

The chromosome-scale genome was generated using a scaffolding pipeline based on Durand et al.²⁵. First, BWA-MEM version 0.7.17²⁶ with the parameters ‘mem -SP5M’ was used to map the Hi-C data to the contig assembly. DpnII restriction sites were generated using the ‘generate_site_positions.py’ script in Juicer version 1.5²⁵. Contigs were then assembled into chromosome-level scaffolds using the 3D-DNA pipeline with the parameter ‘-r 2’²⁷. After inspection of the Hi-C contact maps, the chromosome interaction matrix was manually adjusted and corrected using Juicebox version 1.11.08²⁸. Ultimately, we anchored the contigs into 23 pseudo-chromosomes, and the final chromosome-level genome assembly of R. chinensis had a scaffold N50 of 29.80 Mb (Fig. 2; Table 2).

Synteny analysis and the determination of sex chromosome in R. chinensis

Using JCVI v1.1.17 with default parameters²⁹, we performed synteny analysis to identify the sex chromosomes of R. chinensis against public chromosome-level genomes whose sex chromosomes have been verified: Rhynocoris fuscipes (Hemiptera: Reduviidae)³⁰, Triatoma rubrofasciata (Hemiptera: Reduviidae)³¹ and Riptortus pedestris (Hemiptera: Alydidae)³². Chr8, Chr21, Chr22 and Chr23 of R. chinensis exhibited high homology with Chr12 and Chr13 of T. rubrofasciata, with Chr13, Chr14 and Chr15 of R. fuscipes, and with ChrX of R. pedestris (Fig. S1). This result indicates that R. chinensis has the same sex chromosome system as Nepa cinerea (Heteroptera: Nepidae, N = 14 A + X1X2X3X4) and Ranatra linearis (Heteroptera: Nepidae, N = 19 A + X1X2X3X4) reported in a previous study³³, and that Chr8, Chr21, Chr22 and Chr23 correspond to the X1, X2, X3 and X4 chromosomes.

Prediction of repetitive elements

Repeat sequences in the R. chinensis genome were detected with Extensive de novo TE Annotator (EDTA) version 1.9.4³⁴. LTR retrotransposons were identified with LTR FINDER version 1.07³⁵, LTRharvest³⁶ and LTR retriever version 2.9.0³⁷ with default parameters. DNA transposons were classified using TIR Learner³⁸ and HelitronScanner³⁹ with default parameters. RepeatMasker version 4.0.7 (parameters: -gff -xsmall -no_is)⁴⁰ and RepeatProteinMask version 4.0.7 (parameters: -engine wublast) were used to find interspersed repeats against the RepBase database⁴¹ (http://www.girinst.org/repbase). In addition, Tandem Repeats Finder version 4.09⁴² was used to identify tandem repeats with the parameters ‘2 7 7 80 10 50 500 -f -d -m’ based on de novo prediction. RepeatModeler version 2.0.4⁴³ (parameters: ‘-engine ncbi -pa 4’) was used to construct a repetitive sequence library, and RepeatMasker version 4.0.7⁴⁰ was used to annotate repeat elements against this library. In total, 391.73 Mb (45.13%) of the genome was identified as repetitive elements, mainly comprising 22.96% retrotransposons, 3.35% DNA transposons and 11.26% tandem repeats (Table 3). Retrotransposons include LTR, SINE and LINE elements, and LTR elements are further classified into Copia, Gypsy and other LTRs.

Table 3 Statistics of repeat elements in the Ranatra chinensis genome.

Protein-coding gene prediction and functional annotation

Protein-coding genes (PCGs) were predicted by combining homology-based, ab initio and transcriptome-based prediction. HISAT2 version 2.2.1⁴⁴ was used to map the RNA-seq short reads to the genome with the parameter ‘-k 2’, and StringTie version 2.4.0⁴⁵ was then used to assemble the mapped reads into transcripts with default parameters. For homology-based prediction, the protein sequences of eight representative insect species were downloaded from the NCBI GenBank database (Table S1). Homologous proteins were aligned against the R. chinensis genome using Exonerate version 2.4.0 with default parameters to train the gene sets. Additionally, the bam2hints program (parameter: -intronsonly) in AUGUSTUS version 3.2.3⁴⁶ was used to convert the sorted BAM file of mapped RNA-seq reads into a hints file. Coding genes were then predicted from the assembled genome using AUGUSTUS version 3.2.3⁴⁶ with default parameters, taking the trained gene sets and the hints file as input. Finally, MAKER version 2.31.10⁴⁷ was used to merge the homology-based, ab initio and transcript-based predictions into a consensus high-confidence gene set. We predicted a total of 18,424 genes in the R. chinensis genome, with an average gene length of 5,900.48 bp (Table 4). The average lengths of coding sequences (CDS) and protein sequences were 1,189.79 bp and 396.60 aa, respectively (Table 4). The corresponding statistics for sex chromosomes and autosomes are provided in Table S2.
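Summary statistics such as the average gene and CDS lengths can be derived from a GFF3 annotation along the following lines. This is a minimal sketch and not the authors' procedure: the function name, the toy records and the choice to average CDS length per parent transcript are assumptions. Dividing the mean CDS length by three gives only an approximate mean protein length, since the source does not say whether stop codons are counted; the reported values are at least consistent with that relation (1,189.79 / 3 ≈ 396.60).

```python
from collections import defaultdict


def summarize_gff3(lines):
    """Compute simple gene-model statistics from GFF3 text lines."""
    gene_lengths = []
    cds_by_parent = defaultdict(int)  # total CDS length per parent transcript
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed records
        feature, start, end, attrs = cols[2], int(cols[3]), int(cols[4]), cols[8]
        length = end - start + 1  # GFF3 coordinates are 1-based and inclusive
        if feature == "gene":
            gene_lengths.append(length)
        elif feature == "CDS":
            tags = dict(kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)
            cds_by_parent[tags.get("Parent", "")] += length
    n_genes = len(gene_lengths)
    mean_gene = sum(gene_lengths) / n_genes if n_genes else 0.0
    mean_cds = sum(cds_by_parent.values()) / len(cds_by_parent) if cds_by_parent else 0.0
    return {
        "genes": n_genes,
        "mean_gene_len": mean_gene,
        "mean_cds_len": mean_cds,
        "mean_protein_len": mean_cds / 3,  # approximation, see lead-in
    }


# Hypothetical toy annotation (not data from this study).
toy_gff = [
    "##gff-version 3",
    "Chr1\tMAKER\tgene\t1\t1000\t.\t+\t.\tID=g1",
    "Chr1\tMAKER\tmRNA\t1\t1000\t.\t+\t.\tID=g1.t1;Parent=g1",
    "Chr1\tMAKER\tCDS\t1\t300\t.\t+\t0\tID=c1;Parent=g1.t1",
    "Chr1\tMAKER\tCDS\t701\t1000\t.\t+\t0\tID=c2;Parent=g1.t1",
    "Chr2\tMAKER\tgene\t501\t2500\t.\t-\t.\tID=g2",
    "Chr2\tMAKER\tmRNA\t501\t2500\t.\t-\t.\tID=g2.t1;Parent=g2",
    "Chr2\tMAKER\tCDS\t501\t1400\t.\t-\t0\tID=c3;Parent=g2.t1",
]
stats = summarize_gff3(toy_gff)
print(stats)
```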

Table 4 Statistics of predicted protein-coding genes of Ranatra chinensis genome assembly.

The predicted genes were functionally annotated using multiple tools, including eggNOG-mapper⁴⁸ (parameters: -m diamond --tax_scope auto --go_evidence experimental --target_orthologs all --seed_ortholog_evalue 0.001 --seed_ortholog_score 60 --query-cover 20 --subject-cover 0 --override), InterProScan version 5.0⁴⁹ (parameters: -iprlookup -goterms -appl Pfam -f TSV), BLAST version 2.2.28⁵⁰ (parameter: -evalue 1e-5) and HMMER version 3.3.2⁵¹ (parameters: --noali --cut_ga Pfam-A.hmm). These tools were used to search against several public databases: Gene Ontology (GO), Clusters of Orthologous Groups of Proteins (COG), Kyoto Encyclopedia of Genes and Genomes (KEGG), NCBI non-redundant protein (Nr), Swiss-Prot, and Pfam. Overall, 16,262 genes were functionally annotated in at least one public database (Table 5).

Table 5 Number of functionally annotated protein-coding genes in the Ranatra chinensis genome.

Data Records

Genomic Illumina short-read data were deposited at the NCBI Sequence Read Archive under accession number SRR28785292⁵². Genomic PacBio HiFi sequencing data were deposited at the NCBI Sequence Read Archive under accession number SRR28789380⁵³. RNA-seq data were deposited at the NCBI Sequence Read Archive under accession number SRR28788880⁵⁴. The Hi-C sequencing data were deposited at the NCBI Sequence Read Archive under accession number SRR28787538⁵⁵.

The final chromosome assembly was submitted to GenBank at NCBI under accession number JBFDAA000000000⁵⁶. The genome sequence and raw reads have been deposited in GenBank and Sequence Read Archive at NCBI under BioProject PRJNA1103718⁵⁷.

Technical Validation

To assess the accuracy of the final genome assembly, we mapped the Illumina short reads to the R. chinensis genome with BWA-MEM version 0.7.17²⁶; 97.08% of the short reads were successfully mapped. Benchmarking Universal Single-Copy Orthologs (BUSCO version 3.0.2)⁵⁸ was used to evaluate genome completeness against the insecta_odb10 database, giving a completeness of 95.7%. Of the 1,309 complete orthologs, 1,299 were single-copy and 10 were duplicated; a further eight orthologs were fragmented and 48 were missing (Table 6).
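The way BUSCO category counts combine into a completeness percentage can be sketched in Python as follows. The function name and the counts are hypothetical and are not the values from Table 6; BUSCO's own rounding of percentages may also differ slightly from Python's `round`.

```python
def busco_summary(single, duplicated, fragmented, missing):
    """Summarize BUSCO category counts as percentages of all searched orthologs."""
    total = single + duplicated + fragmented + missing
    if total <= 0:
        raise ValueError("at least one ortholog is required")

    def pct(n):
        return round(100 * n / total, 1)

    # Completeness counts both single-copy and duplicated complete orthologs.
    complete = single + duplicated
    return {
        "total": total,
        "complete": complete,
        "complete_pct": pct(complete),
        "single_pct": pct(single),
        "duplicated_pct": pct(duplicated),
        "fragmented_pct": pct(fragmented),
        "missing_pct": pct(missing),
    }


# Hypothetical counts (not the values reported in this study).
toy = busco_summary(single=90, duplicated=5, fragmented=3, missing=2)
print(toy["complete"], toy["complete_pct"])  # prints: 95 95.0
```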

Table 6 BUSCO evaluation for the final genome assembly of Ranatra chinensis.

Code availability

The bioinformatic analyses were performed following the manuals and protocols provided by the software developers, and all manually adjusted parameters are described in the Methods. No custom script or code was used in this study.

References

Cardinale, B. et al. Biodiversity loss and its impact on humanity. Nature 486, 59–67, https://doi.org/10.1038/nature11148 (2012).
Brauman, K. A. et al. Global trends in nature’s contributions to people. Proc. Natl. Acad. Sci. USA 117, 32799–32805, https://doi.org/10.1073/pnas.2010473117 (2020).
Kim, H. J. et al. Understanding the role of biodiversity in the climate, food, water, energy, transport and health nexus in Europe. Sci. Total Environ. 925, 171692, https://doi.org/10.1016/j.scitotenv.2024.171692 (2024).
Yuan, Y. et al. Comparative genomics provides insights into the aquatic adaptations of mammals. Proc. Natl. Acad. Sci. USA 118, e2106080118, https://doi.org/10.1073/pnas.2106080118 (2021).
Ji, Y. et al. Gene reuse facilitates rapid radiation and independent adaptation to diverse habitats in the Asian honeybee. Sci. Adv. 6, eabd3590 https://www.science.org/doi/10.1126/sciadv.abd3590 (2020).
Santos, M., Le Bouquin, A., Crumière, A. & Khila, A. Taxon-restricted genes at the origin of a novel trait allowing access to a new environment. Science 358, 386–390, https://www.science.org/doi/10.1126/science.aan2748 (2017).
Xu, Y. et al. Chromosome-level genome of the poultry shaft louse Menopon gallinae provides insight into the host-switching and adaptive evolution of parasitic lice. GigaScience 13, giae004, https://doi.org/10.1093/gigascience/giae004 (2024).
Stork, N. E. How many species of insects and other terrestrial arthropods are there on earth? Annu. Rev. Entomol. 63, 31–45, https://doi.org/10.1146/annurev-ento-020117-043348 (2018).
Slade, E. M. & Ong, X. R. The future of tropical insect diversity: strategies to fill data and knowledge gaps. Curr. Opin. Insect Sci. 58, 101063, https://doi.org/10.1016/j.cois.2023.101063 (2023).
Schuh, R. T. & Slater, J. A. True bugs of the world (Hemiptera: Heteroptera) classification and natural history. Cornell University Press, Ithaca, NY. (1995).
Weirauch, C., Schuh, R. T., Cassis, G. & Wheeler, W. C. Revisiting habitat and lifestyle transitions in Heteroptera (Insecta: Hemiptera): insights from a combined morphological and molecular phylogeny. Cladistics 35, 67–105, https://doi.org/10.1111/cla.12233 (2019).
Li, H. et al. Mitochondrial phylogenomics of Hemiptera reveals adaptive innovations driving the diversification of true bugs. Proc. R. Soc. B 284, 20171223, https://doi.org/10.1098/rspb.2017.1223 (2017).
Chen, P., Nieser, N. & Ho, J. Review of Chinese Ranatrinae (Hemiptera: Nepidae), with descriptions of four new species of Ranatra Fabricius. Tijd. Entomol. 147, 81–102, https://api.semanticscholar.org/CorpusID:84596642 (2004).
Chen, P., Nieser, N. & Zettel, H. The aquatic and semi-aquatic bugs (Heteroptera: Nepomorpha & Gerromorpha) of Malesia. Fauna Malesiana Handbooks 5. Leiden and Boston, Brill (546 pp), https://api.semanticscholar.org/CorpusID:82862404 (2005).
Polhemus, D. A. & Polhemus, J. T. Guide to the aquatic Heteroptera of Singapore and Peninsular Malaysia. X. Infraorder Nepomorpha – Families Belostomatidae and Nepidae. Raffles Bull. Zool. 61, 25–45, https://api.semanticscholar.org/CorpusID:87760711 (2013).
Ueno, M. Hemiptera. In: Freshwater Biology of Japan. (Edited by Ueno M.), Hokuryukan Pub. Co. Ltd, Tokyo. pp. 567–575, (1973).
Shaalan, E. A. & Canyon, D. V. Aquatic insect predators and mosquito control. Trop. Biomed. 26, 223–261, https://pubmed.ncbi.nlm.nih.gov/20237438/ (2009).
Cui, J. & Cai, W. Taxonomic note on Ranatra Fabricius from Henan of China. Journal of Henan Agricultural Sciences 44, 87–90, https://www.hnnykx.org.cn/EN/Y2015/V44/I2/87 (2015).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. Fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, 884–890, https://doi.org/10.1093/bioinformatics/bty560 (2018).
Belton, J. M. et al. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods 58, 268–276 (2012).
Marcais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770, https://doi.org/10.1093/bioinformatics/btr011 (2011).
Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204, https://doi.org/10.1093/bioinformatics/btx153 (2017).
Cheng, H. et al. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175, https://doi.org/10.1038/s41592-020-01056-5 (2021).
Guan, D. et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics 36, 2896–2898, https://doi.org/10.1093/bioinformatics/btaa025 (2020).
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98, https://doi.org/10.1016/j.cels.2016.07.002 (2016).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760, https://doi.org/10.1093/bioinformatics/btp324 (2009).
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95, https://doi.org/10.1126/science.aal3327 (2017).
Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 3, 99–101, https://doi.org/10.1016/j.cels.2015.07.012 (2016).
Tang, H. et al. Synteny and collinearity in plant genomes. Science 320, 486–488, https://www.science.org/doi/10.1126/science.1153917 (2008).
Ma, L. et al. Comparative genomic analyses on assassin bug Rhynocoris fuscipes (Hemiptera: Reduviidae) reveal genetic bases governing the diet-shift. iScience 27, 110411, https://doi.org/10.1016/j.isci.2024.110411 (2024).
Liu, Q. et al. A chromosomal-level genome assembly for the insect vector for Chagas disease, Triatoma rubrofasciata. GigaScience 8, giz089, https://doi.org/10.1093/gigascience/giz089 (2019).
Huang, H. J. et al. Chromosome-level genome assembly of the bean bug Riptortus pedestris. Mol. Ecol. Resour. 21, 2423–2436, https://doi.org/10.1111/1755-0998.13434 (2021).
Angus, R. B., Jeangirard, C., Stoianova, D., Grozeva, S. & Kuznetsova, V. G. A chromosomal analysis of Nepa cinerea Linnaeus, 1758 and Ranatra linearis (Linnaeus, 1758) (Heteroptera, Nepidae). Comp. Cytogenet. 11, 641–657, https://pubmed.ncbi.nlm.nih.gov/29114353/ (2017).
Ou, S. et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 20, 1–18, https://doi.org/10.1186/s13059-019-1905-y (2019).
Ou, S. & Jiang, N. LTR_FINDER_parallel: parallelization of LTR_FINDER enabling rapid identification of long terminal repeat retrotransposons. Mobile DNA 10, 48–48, https://doi.org/10.1186/s13100-019-0193-0 (2019).
Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinform. 9, 1–14, https://doi.org/10.1186/1471-2105-9-18 (2008).
Ou, S. & Jiang, N. LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiol. 176, 1410–1422, https://doi.org/10.1104/pp.17.01310 (2017).
Su, W., Gu, X. & Peterson, T. TIR-Learner, a new ensemble method for TIR transposable element annotation, provides evidence for abundant new transposable elements in the maize genome. Mol. Plant 12, 447–460, https://doi.org/10.1016/j.molp.2019.02.008 (2019).
Xiong, W., He, L., Lai, J., Dooner, H. K. & Du, C. HelitronScanner uncovers a large overlooked cache of Helitron transposons in many plant genomes. Proc. Natl. Acad. Sci. USA 111, 10263–10268, https://doi.org/10.1073/pnas.141006811 (2014).
Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics 5, 1–14, https://doi.org/10.1002/0471250953.bi0410s25 (2004).
Jurka, J. et al. Repbase update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467, https://doi.org/10.1186/s13100-015-0041-9 (2005).
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580, https://doi.org/10.1093/nar/27.2.573 (1999).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA 117, 9451–9457 (2020).
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360, https://doi.org/10.1038/nmeth.3317 (2015).
Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 20, 1–13, https://doi.org/10.1186/s13059-019-1910-1 (2019).
Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, 435–439, https://doi.org/10.1093/nar/gkl200 (2006).
Cantarel, B. L. et al. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18, 188–196, http://www.genome.org/cgi/doi/10.1101/gr.6743907 (2008).
Cantalapiedra, C. P., Hernández-Plaza, A., Letunic, I., Bork, P. & Huerta-Cepas, J. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol. Biol. Evol. 38, 5825–5829, https://doi.org/10.1093/molbev/msab293 (2021).
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240, https://doi.org/10.1093/bioinformatics/btu031 (2014).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinform. 10, 421–429, https://doi.org/10.1186/1471-2105-10-421 (2009).
Finn, R. D., Clements, J. & Eddy, S. R. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 39, 29–37, https://doi.org/10.1093/nar/gkr367 (2011).
NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRR28785292 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR28789380 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR28788880 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR28787538 (2024).
NCBI GenBank https://identifiers.org/ncbi/insdc:JBFDAA000000000 (2024).
NCBI BioProject https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1103718 (2024).
Simao, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212, https://doi.org/10.1093/bioinformatics/btv351 (2015).

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 32120103006, 31922012) and the 2115 Talent Development Program of China Agricultural University. We thank Shuxian Chen for the help in statistics.

Author information

Authors and Affiliations

Department of Entomology and MOA Key Lab of Pest Monitoring and Green Management, College of Plant Protection, China Agricultural University, Beijing, 100193, China
Xinzhi Liu, Ling Ma, Li Tian, Fan Song, Hu Li, Wanzhi Cai & Yuange Duan
College of Plant Protection, Northeast Agricultural University, Harbin, 150030, China
Tongyin Xie
School of Biology Science and Food Engineering, Chuzhou University, Anhui, 293000, China
Yunfei Wu

Contributions

Y.D. and W.C. conceived the project. L.M., X.L., Y.W. and T.X. collected samples and extracted genomic DNA. L.M., X.L. and Y.W. performed data analysis and wrote the manuscript. L.T., F.S., T.X. and H.L. contributed to data analyses. All authors contributed to revising the manuscript. All authors have read and approved the final version.

Corresponding authors

Correspondence to Wanzhi Cai or Yuange Duan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

supplementary_clean version

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Liu, X., Ma, L., Tian, L. et al. Chromosome-level genome assembly of Chinese water scorpion Ranatra chinensis (Heteroptera: Nepidae). Sci Data 11, 1016 (2024). https://doi.org/10.1038/s41597-024-03856-2

Received: 22 July 2024
Accepted: 03 September 2024
Published: 18 September 2024
Version of record: 18 September 2024
DOI: https://doi.org/10.1038/s41597-024-03856-2