A chromosome-level genome assembly of the heteronomous hyperparasitoid wasp Encarsia sophia

Man, Xiaoming; Huang, Cong; Wu, Shengyong; Guo, Jianyang; Wan, Fanghao; Francis, Frédéric; Yang, Nianwan; Liu, Wanxue

doi:10.1038/s41597-024-04040-2

Download PDF

Data Descriptor
Open access
Published: 19 November 2024

A chromosome-level genome assembly of the heteronomous hyperparasitoid wasp Encarsia sophia

Xiaoming Man^1,2,
Cong Huang¹,
Shengyong Wu¹,
Jianyang Guo¹,
Fanghao Wan¹,
Frédéric Francis²,
Nianwan Yang^1,3 &
…
Wanxue Liu¹

Scientific Data volume 11, Article number: 1250 (2024) Cite this article

2200 Accesses
Metrics details

Subjects

Abstract

Encarsia sophia, a heteronomous hyperparasitoid wasp, is a well-known biological control agent, but its genomic information is limited, hindering molecular investigations and understanding of multitrophic interactions. In this study, we present a chromosome-level genome assembly for E. sophia using Illumina, PacBio HiFi, and Hi-C technologies. The assembled genome size is 398.3 Mb, with a contig N50 of 1.0 Mb and a scaffold N50 of 74.0 Mb. The BUSCO completeness score is 97.1%, and genome coverage reaches 99.1%. Utilizing Hi-C assisted assembly, the genome was organized into five chromosomes, with a mounting rate of 95.1%. Repetitive sequences make up 54.6% of the genome, and 14,914 protein-coding genes were predicted, with 95.5% functionally annotated. The high-quality genome assembly of E. sophia is a significant achievement, marking the first complete genome for a heteronomous hyperparasitoid wasp. This milestone offers valuable insights into the evolution and host interactions of heteronomous hyperparasitoids, laying the foundation for extensive research in biological control.

A chromosome-level genome assembly of the parasitoid wasp Eretmocerus hayati

Article Open access 06 September 2023

Chromosome-level genome assembly of the parasitoid wasp Aenasius arizonensis

Article Open access 17 May 2025

Chromosome-level genome assembly of Chouioia cunea Yang, the parasitic wasp of the fall webworm

Article Open access 26 July 2023

Background & Summary

The Hymenoptera, one of the four largest orders in the class Insecta, is one of the most species-rich groups of insects. With the advancement of sequencing technologies, this order has become a hotspot in insect genomics research^1,2. Currently, the number of sequenced Hymenoptera genomes has reached 557 (on April 2024, based on statistics from NCBI), with 388 species sequenced in the past three years, and annotation information submitted for 125 species. Among these sequenced Hymenoptera species, 258 belong to parasitoids, primarily including 36 species of Cynipoidea, 75 species of Chalcidoidea, 98 species of Ichneumonoidea, 42 species of Proctotrupoidea, 6 species of Chrysidoidea, and 1 species of Orussidea.

Encarsia sophia (Hymenoptera: Aphelinidae) is a dominant parasitoid of the “super pest” Bemisia tabaci (Hemiptera: Aleyrodidae), serving as a crucial biological control agent against global populations of whiteflies due to its remarkable parasitic and destructive capabilities on the host^3,4,5. The reproductive strategy of this parasitoid is rather unique, being a typical heteronomous hyperparasitoid. Males and females develop heteronomously, obtaining their nutritional resources from different host insects. Females, the primary parasitoids, arise from fertilized eggs and parasitize directly within the target host insect, feeding on the larvae or nymphs of the host to complete their development. Conversely, males, arising from unfertilized eggs, act as hyperparasitoids and can only parasitize secondary hosts, i.e., those already parasitized by the primary parasitoids, feeding on the larvae of the primary parasitoids to complete their development^6,7,8,9. Here, mated female E. sophia parasitize directly within the nymphs of the B. tabaci, laying fertilized eggs that develop into female offspring, serving as primary parasitoids. Unmated females, on the other hand, can only parasitize secondary hosts, laying unfertilized eggs within the nymphs of conspecific or heterospecific parasitoids already parasitized within the whitefly nymphs, producing male offspring, acting as hyperparasitoids^10,11. So far, no genome of a heteronomous parasitoid has been reported. In order to gain deeper insights into the characteristics of such parasitoids, we conducted whole-genome sequencing and chromosomal-level assembly of E. sophia using Illumina, PacBio, and Hi-C technologies.

Methods

Parasitoid Wasp Collection and Sequencing

Encarsia sophia population, introduced in 2008 from the Vegetable Pest Integrated Management Laboratory at Texas A&M University, USA. They were reared in the insectarium of the Laboratory of Biological Invasion Research at the Langfang Research and Development Base of the Chinese Academy of Agricultural Sciences, using cotton plant B. tabaci nymphs as hosts (26 ± 1 °C, RH65 ± 5%, light cycle 14 L:10D). The B. tabaci laboratory population originates from the MEAM1 population maintained by the Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences (CAAS), in a greenhouse at the Institute of Plant Protection, CAAS, with no history of pesticide use. The cotton variety used is CCRI 49. E sophia is a typical heteronomous hyperparasitoid with a unique reproductive strategy: females act as primary parasitoids, parasitizing first- to fourth-instar B. tabaci nymphs (primary hosts). In contrast, solitary females produce male offspring, acting as secondary parasitoids parasitizing conspecific or heterospecific parasitoid larvae inside B. tabaci nymphs (secondary hosts). Given that males are secondary parasitoids, we collected newly emerged females for sequencing. To obtain newly emerged parasitoids, we used insect pins to transfer females from black pupae to centrifuge tubes (1.5 mL). We conducted daily checks for newly emerged adults and collected a total of 4,000 females for DNA extraction using the QIAamp DNA Mini Kit (QIAGEN). Following extraction, the purity, concentration, and integrity of the DNA were evaluated with the NanoDrop 2000&8000, Qubit Fluorometer, and Agilent 4200 Bioanalyzer, respectively.

Genome size estimation and assembly

The high-quality DNA samples from E. sophia were randomly sheared using a Covaris ultrasonic disruptor. Subsequent steps, such as end repair, A-tailing, adapter ligation, purification, and PCR amplification, were performed to complete the library construction process. The constructed library was subjected to paired-end sequencing using Illumina HiSeq. By removing reads with adapter sequences and those containing more than 10% uncertain bases (N), as well as discarding single-end reads where the proportion of low-quality bases (quality score below 5) exceeds 20%, we obtained the filtered clean reads. Then, a k-mer frequency histogram was generated using Jellyfish 2.2.7 with the following parameters: “-G 2 -m 17 -C -o kmercount.”¹², yielding the following estimations: a genome size of 412.21 Mbp, corrected to 404.2 Mbp, heterozygosity rate of 0.52%, and a repeat sequence proportion of 52.84% (Fig. 1). To obtain the preliminary genome assembly of E. sophia, we utilized 49,702,845,900 bp of second-generation sequencing data and assembled it using the Soapdenovo software. The assembly was then scaffolded using kmer41. The initial assembly results showed that the genome of E. sophia had a contig N50 of 1,272 bp with a total length of 318,591,742 bp, and a scaffold N50 of 2,192 bp with a total length of 328,391,604 bp (Table 1).

Table 1 Encarsia sophia genome assembly to scaffold results.

Full size table

Sequencing was conducted using the PacBio platform, resulting in a total sequencing volume of 148 G with a coverage depth of 366.16X (calculated based on the survey-estimated genome size of 404.20 M). Additionally, a short-insert library was prepared and sequenced using the Illumina platform (Table 2). Using the sequencing data, de novo assembly of the E. sophia genome was performed with HiFiasm¹³. The genome assembled by Hifiasm has a length of 398.19 Mbp, with a contig N50 of 1.33 Mbp (sequences above 100 bp were selected for the assembly results) (Table 3).

Table 2 Summary of DNA/RNA sequencing data utilized for the genome assembly of Encarsia sophia.

Full size table

Table 3 Encarsia sophia genome denovo assembly results statistics.

Full size table

To obtain the chromosome-level genome of E. sophia, a Hi-C sequencing library was constructed using Hi-C technology¹⁴, incorporating DNA from 20,000 female adults. Hi-C data were obtained from the sequencing, and the contigs/scaffolds assembled were anchored to approximate chromosome-level using the All-hic software¹⁵. Subsequently, the juicebox software (https://github.com/aidenlab/Juicebox) was utilized for manual correction based on chromosomal interaction intensity, resulting in the final chromosome-level genome of E. sophia (Table 4). Following Hi-C-assisted assembly, the E. sophia genome assembled at the chromosome-level comprises a total of 5 sequences, with an additional 189 sequences remaining unassembled at the chromosome-level. The total genome length is 398,274,414 bp, of which 378,887,893 bp is assembled onto chromosomes (Fig. 2). The genome mapping rate achieved is 95.1% (Tables 5, 6). (Results were based on contigs above 100 bp for assembly statistics).

Table 4 Statistical results of the Encarsia sophia genome assembly, both from the initial de novo assembly and after Hi-C scaffolding.

Full size table

Table 5 Encarsia sophia single chromosome cluster number and length statistics of Hi-C assemble.

Full size table

Table 6 Encarsia sophia genome mapping rate of de novo and afer Hi-C scaffolding.

Full size table

Genome quality assessment

We employed different methods to assess the sequence integrity, consistency, and accuracy of the genome assembly. Firstly, the integrity of the E. sophia genome assembly was assessed using BUSCO with the insecta-odb10 database¹⁶, employing software such as MetaEuk and HMMER. The assembly resulted in 97.1% complete BUSCO genes, with 92.1% being single-copy genes and 5.0% being completely duplicated genes. Additionally, a core gene library comprising 248 conservative genes present in six eukaryotic model organisms was used for CEGMA assessment¹⁷ using tblastn, genewise, and geneid software. The assembly successfully identified 233 out of 248 core eukaryotic genes, indicating a completeness rate of 93.9%. Secondly, the sequence consistency of the E. sophia genome was assessed by aligning short-insert library reads using BWA software (http://bio-bwa.sourceforge.net/)¹⁸. The analysis revealed a HiFi reads alignment rate of approximately 97.6% and a genome coverage rate of around 99.1%, demonstrating strong consistency between the reads and the assembled genome. SNP calling was performed using samtools (http://samtools.sourceforge.net/) on the BWA alignment results, and after filtering and statistical analysis¹⁹, the genome exhibited a heterozygous SNP rate of 0.317095% and a homozygous SNP rate of 0.000943%, demonstrating excellent single-base accuracy in the assembly. Thirdly, the sequence accuracy of the E. sophia genome was assessed using Merqury software (https://github.com/marbl/merqury) with Illumina sequencing data. The quality value (Qv) of the genome, calculated based on K-mer using the Merqury-mash module^20,21, was determined to be 33.6653, indicating a base accuracy rate exceeding 99.9%. In conclusion, the E. sophia genome assembly exhibits good consistency, completeness, and accuracy (Table 7).

Table 7 Encarsia sophia genome assembly quality assessment results.

Full size table

Genome annotation

Our approach to repetitive annotation utilizes a thorough strategy that combines homology alignment with de novo search to detect repetitive sequences across the entire genome. We utilized TRF (http://tandem.bu.edu/trf/trf.html)²² for ab initio prediction, extracting tandem repeat sequences. For homology-based prediction, we utilized the standard Repbase database (http://www.girinst.org/repbase)²³, employing RepeatMasker (http://www.repeatmasker.org/)²⁴ and its internal script, RepeatProteinMask, to identify repetitive regions with default settings. In the de novo prediction process, we applied LTR_FINDER (http://tlife.fudan.edu.cn/ltr_finder/)²⁵, RepeatScout (http://www.repeatmasker.org/), and RepeatModeler (http://www.repeatmasker.org/RepeatModeler.html)²⁶ to create a de novo repetitive element database. Subsequently, all repetitive sequences longer than 100 bp with an ‘N’ content below 5% were included in the initial transposable element (TE) library. This custom library, created by merging Repbase with our de novo TE library and refined using uclust to remove redundancy, was then utilized by RepeatMasker for the identification of repetitive sequences at the DNA level. The E. sophia genome contains 214.7 Mb of repetitive sequences, constituting 53.92% of the genome. Among them, long terminal repeats (LTRs) are the most abundant, accounting for 34.59% of the total, followed by Unknown (12.17%), 7.18% DNA elements, 3.96% long interspersed nuclear elements (LINEs), and short interspersed nuclear elements (SINEs) at a mere 0.02% (Table 8).

Table 8 Encarsia sophia genome repeat sequence classification result statistics.

Full size table

The protein-coding gene annotation in the E. sophia genome integrates de novo prediction, homology-based approaches, and RNA-Seq-supported modeling for gene prediction²⁷. For de novo gene prediction, our automated gene prediction pipeline utilized Augustus (v3.2.3) (http://bioinf.uni-greifswald.de/augustus/)²⁸, Geneid (v1.4), Genescan (v1.0), GlimmerHMM (v3.04) (http://ccb.jhu.edu/software/glimmerhmm/)²⁹, and SNAP (http://homepage.mac.com/iankorf/)³⁰. Homologous protein sequences were downloaded from NCBI Nasonia vitripennis (Nvit), Ceratosolen solmsi (Csol), Copidosoma floridanum (Cflo), Trichogramma brassicae (Tbra), Trichomalopsis sarcophagae (Tsar), Trichogramma pretiosum (Tpre). Using TblastN (v2.2.26; E-value ≤ 1e−5), protein sequences were aligned to the E. sophia genome³¹, and GeneWise (v2.4.1)³² software was employed to align matching proteins with homologous genomic sequences for accurate splice alignment and prediction of gene structures within each protein region. We constructed seven RNA-seq libraries, including different developmental stages of female E. sophia (600 eggs, Bemisia tabaci nymphs parasitized for <24 hours, dissected for host sampling; 200 first-instar larvae, B. tabaci nymphs parasitized for 48–60 hours, dissected for host sampling; 200 second-instar larvae, B. tabaci nymphs parasitized for 72–84 hours, dissected for host sampling; 80 third-instar larvae, B. tabaci nymphs parasitized for 120–132 hours, dissected for host sampling; 40 prepupae, B. tabaci nymphs parasitized for 168–178 hours, sampled after removing the host shell; 30 pupae, B. tabaci nymphs parasitized for 216–228 hours, sampled after removing the host shell; 50 adults, eclosed within < 24 hours.). Total RNA extracted from the aforementioned samples were used for library preparation, and sequencing was performed on the Illumina NovaSeq6000 platform³³. The sequencing output generated a total of 60.51 G raw data, and after filtering, 59.88 G clean data was used for genome annotation. For genome annotation, the transcriptome was assembled using Trinity (v2.1.1)³⁴. To refine the annotation, RNA-Seq data were processed with Hisat (v2.0.4)³⁵ under default settings to identify exonic regions and splice sites. The alignment results were subsequently used as input for Stringtie (v1.3.3)³⁶ with its default parameters, facilitating genome-guided transcriptome assembly. A comprehensive, non-redundant reference gene set was then created by merging the predictions from all three methods using EvidenceModeler (EVM, v1.1.1)³⁷, which incorporated masked transposable elements for accurate gene prediction. A total of 14,914 protein-coding genes were predicted in E. sophia genome. The average length of predicted genes was 11,273.01 base pairs, with an average protein-coding region length of 1,451.53 bp. The average lengths of exons and introns were 275.58 and 2,301.66 bp, respectively. On average, each gene contained 5.27 exons (Table 9, Fig. 3).

Table 9 Encarsia sophia statistical results of genome gene structure prediction.

Full size table

Gene functions were determined by aligning the E. sophia protein sequences with the Swiss-Prot database using Blastp, applying a threshold E-value of ≤1e−5 to identify the best matches. InterProScan70 (v5.31)³⁸ was used to annotate protein motifs and domains through searches across various public databases such as ProDom, PRINTS, Pfam, PANTHER, PROSITE and SMART. Gene Ontology (GO) IDs were subsequently assigned based on the relevant InterPro entries. We mapped the genes to the NR20 database using the closest BLAST hits from the Swissprot20 database³⁹ (E-value < 10–5) and DIAMOND (v0.8.22)/BLAST hits (E-value < 10-5). Furthermore, the genome was aligned with KEGG pathways⁴⁰ to determine the best match for each gene. Ultimately, 14,245 genes (95.5% of the total) in E. sophia genome were successfully annotated in at least one database⁴¹ (Table 10).

Table 10 Functional annotation of Encarsia sophia proteins.

Full size table

To annotate non-coding RNAs (ncRNAs) in the E. sophia genome, tRNA genes were predicted using the tRNAscan-SE tool (http://lowelab.ucsc.edu/tRNAscan-SE/)⁴². Given the high conservation of rRNA sequences, we used sequences from closely related species as references and applied BLAST to identify rRNAs. Other ncRNAs, such as snRNAs and miRNAs, were detected by querying the Rfam database⁴³ with the infernal software (http://infernal.janelia.org/)⁴⁴, employing default parameters. In the end, a total of 1,457 non-coding RNAs were predicted, comprising 513 micro-RNAs (miRNAs), 514 transfer RNAs (tRNAs), 328 ribosomal RNAs (rRNAs), and 102 small nuclear RNAs (snRNAs) (Table 11).

Table 11 Encarsia sophia genome non-coding RNA statistical results.

Full size table

Data Records

The sequencing data for the E. sophia genome, including Illumina, PacBio, and Hi-C datasets, have been deposited in the Sequence Read Archive (SRA) at the National Center for Biotechnology Information (NCBI) under accession numbers SRR29702816, SRR29702817, SRR29702818^45,46,47, and in the Genome Sequence Archive (GSA) of the National Genomics Data Center (NGDC) under accession numbers BioProject PRJNA1131600 (NCBI) and CRA017569⁴⁸ (NGDC). The transcriptome data used for annotation, covering various developmental stages of female E. sophia, have been stored in the SRA of NCBI and the GSA of NGDC: Egg (SRR29702811⁴⁹, CRR1218365), 1st instar larva (SRR29702815⁵⁰, CRR1218361), 2nd instar larva (SRR29702814⁵¹, CRR1218362), 3rd instar larva (SRR29702813⁵², CRR1218363), prepupa (SRR29702810⁵³, CRR1218366), pupa (SRR29702809⁵⁴, CRR1218367), and adult (SRR29702812⁵⁵, CRR1218364). This Whole Genome Shotgun project has been deposited at GenBank under the accession JBFBOU000000000⁵⁶. The genome annotation files of Encarsia sophia are available in figshare under a https://doi.org/10.6084/m9.figshare.2642675241.

Technical Validation

The quality, concentration, and integrity of the DNA samples were evaluated using a NanoDrop 2000&8000, a Qubit 3.0 Fluorometer (Thermo Fisher Scientific, USA), and an Agilent 4200 Bioanalyzer (Agilent Technologies, CA, USA), respectively. RNA integrity was assessed using the RNA Nano 6000 kit on the Bioanalyzer 2100 system (Agilent Technologies, CA, USA). High-quality DNA and RNA were selected for library preparation and sequencing. Genome assembly integrity was verified using BUSCO (Benchmarking Universal Single-Copy Orthologs: http://busco.ezlab.org/) and CEGMA (Core Eukaryotic Genes Mapping Approach: http://korflab.ucdavis.edu/datasets/cegma/). The short-read sequences from the fragment library were mapped to the assembled genome using BWA software (http://bio-bwa.sourceforge.net/), and alignment rates, genome coverage, and depth distribution were analyzed to evaluate the completeness and uniformity of the assembly. Additionally, the genome’s quality value (Qv) was determined using the Merqury-mash module (https://github.com/marbl/merqury) to assess the sequence accuracy of the assembled genome.

Code availability

Data processing followed the standard protocols and guidelines of the relevant bioinformatics software, with default parameters applied unless specified otherwise. Details on the software versions and specific parameters are provided in the Methods section.

References

Ye, X. H. et al. A chromosome-level genome assembly of the parasitoid wasp Pteromalus puparum. Mol Ecol Resour. 20, 1384–1402 (2020).
Article PubMed CAS Google Scholar
Zhong, Y. W. et al. A chromosome-level genome assembly of the parasitoid wasp Eretmocerus hayati. Sci Data. 10, 585 (2023).
Article PubMed PubMed Central CAS Google Scholar
Charles, O. A review of management of major arthropod pests affecting cassava production in Sub-Saharan Africa. Crop Prot. 175, 1–15 (2024).
Google Scholar
Katono, K. et al. Effect of Bemisia tabaci SSA1 host density and cassava genotype on host feeding capacity and parasitism by two Hymenoptera parasitoid species. Biocontrol Sci Technol. 33, 19–34 (2022).
Article Google Scholar
Caspary, R. et al. Cutting Dipping Application of Flupyradifurone against Cassava Whiteflies Bemisia tabaci and Impact on Its Parasitism in Cassava. Insects. 14, 796 (2023).
Article PubMed PubMed Central Google Scholar
Walter, G. H. Divergent male ontogenies in Aphelinidae (Hymenoptera, Chalcidoidea): A simplified classification and a suggested evolutionary sequence. Biol. J. Linn. Soc. Lond. 19, 63–82 (1983).
Article Google Scholar
Mills, N. J. et al. Prospective modelling in biological control: An analysis of the dynamics of heteronomous hyperparasitism in a cotton-whitefly-parasitoid system. J Appl Ecol. 33, 1379–1394 (1996).
Article Google Scholar
Williams, T. Invasion and displacement of experimental populations of a conventional parasitoid by a heteronomous hyperparasitoid. Biocontrol Sci Technol. 6, 603–618 (1996).
Article Google Scholar
Hunter, M. S. et al. Evolution and behavioral ecology of heteronomous aphelinid parasitoids. Annu. Rev. Entomol. 46, 251–290 (2001).
Article PubMed CAS Google Scholar
Yang, N. W. et al. Shifting preference between oviposition vs. host-feeding under changing host densities in two aphelinid parasitoids. PLoS One. 7, e41189 (2012).
Article ADS PubMed PubMed Central CAS Google Scholar
Xu, H. Y. et al. Competitive interactions between parasitoids provide new insight into host suppression. PLoS One. 8, e82003 (2013).
Article ADS PubMed PubMed Central Google Scholar
Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 27, 764 (2011).
Article PubMed PubMed Central Google Scholar
Cheng, H. et al. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods. 18, 1–6 (2021).
Article Google Scholar
Belaghzal, H. et al. Hi-C 2.0: An optimized Hi-C procedure for high-resolution genome-wide mapping of chromosome conformation. Methods 123, 56–65 (2017).
Article PubMed PubMed Central CAS Google Scholar
Zhang, X. et al. Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data. Nat. Plants. 5, 833–845 (2019).
Article PubMed CAS Google Scholar
Manni, M. et al. BUSCO Update: Novel and streamlined workfows along with broader and deeper phylogenetic coverage for scoring of Eukaryotic, Prokaryotic, and Viral genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).
Article PubMed PubMed Central CAS Google Scholar
Genis, P. et al. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23, 1061–1067 (2007).
Article Google Scholar
Langmead, B. et al. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 9, 357–359 (2012).
Article PubMed PubMed Central CAS Google Scholar
Li, H. A. statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 27, 2987–2993 (2011).
Article PubMed PubMed Central CAS Google Scholar
Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).
Article PubMed PubMed Central CAS Google Scholar
Rhie, A. et al. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
Article PubMed PubMed Central CAS Google Scholar
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
Article PubMed PubMed Central CAS Google Scholar
Bao, W. et al. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA. 6, 11 (2015).
Article PubMed PubMed Central Google Scholar
Tarailo-Graovac, M. et al. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics 4, 10 (2009).
Google Scholar
Xu, Z. et al. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, 265–268 (2007).
Article Google Scholar
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci USA 117, 9451–9457 (2020).
Article ADS PubMed PubMed Central CAS Google Scholar
Mei, Y. et al. InsectBase 2.0: a comprehensive gene resource for insects. Nucleic Acids Res. 50, D1040–D1045 (2022).
Article ADS PubMed CAS Google Scholar
Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–439 (2006).
Article PubMed PubMed Central CAS Google Scholar
Majoros, W. H. et al. TigrScan and GlimmerHMM: two open-source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004).
Article PubMed CAS Google Scholar
Korf, I. Gene fnding in novel genomes. BMC Bioinformatics 5, 59 (2004).
Article PubMed PubMed Central Google Scholar
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).
Article PubMed PubMed Central Google Scholar
Birney, E. et al. GeneWise and Genomewise. Genome Res. 14, 988–995 (2004).
Article PubMed PubMed Central CAS Google Scholar
Cock, P. J. A. et al. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res. 38, 1767–1771 (2010).
Article ADS PubMed CAS Google Scholar
Bolger, A. M. et al. Trimmomatic: a fexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Article PubMed PubMed Central CAS Google Scholar
Kim, D. et al. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods. 12, 357–360 (2015).
Article PubMed PubMed Central CAS Google Scholar
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
Article PubMed PubMed Central CAS Google Scholar
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 9, R7 (2008).
Article PubMed PubMed Central Google Scholar
Jones, P. et al. InterProScan 5: genome-scale protein function classifcation. Bioinformatics 30, 1236–1240 (2014).
Article PubMed PubMed Central CAS Google Scholar
Bairoch, A. et al. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28, 45–48 (2000).
Article PubMed PubMed Central CAS Google Scholar
Kanehisa, M. et al. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
Article PubMed PubMed Central CAS Google Scholar
Man, X. Genome assembly and annotation files of Encarsia sophia. figshare https://doi.org/10.6084/m9.figshare.26426752 (2024).
Chan, P. P. et al. tRNAscan-SE 2.0: improved detection and functional classifcation of transfer RNA genes. Nucleic Acids. Res. 49, 9077–9096 (2021).
Article PubMed PubMed Central CAS Google Scholar
Kalvari, I. et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids. Res. 49, D192–D200 (2021).
Article PubMed CAS Google Scholar
Nawrocki, E. P. et al. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935 (2013).
Article PubMed PubMed Central CAS Google Scholar
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29702816 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29702817 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29702818 (2024).
CNCB Genome Sequence Archive https://bigd.big.ac.cn/gsa/browse/CRA017569 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29702811 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29702815 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29702814 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29702813 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29702810 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29702809 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29702812 (2024).
Man, X. Encarsia sophia isolate IPP_NW_YANG_2024, whole genome shotgun sequencing project. GenBank https://identifiers.org/ncbi/insdc:JBFBOU000000000.1 (2024).

Download references

Acknowledgements

This work was supported by the National Key Research and Development Program of China (2023YFC2605200), the Tianshan Talent Program for Outstanding Young Scientific and Technological Talents (2022TSYCCX0084) and National Natural Science Foundation of China (32072493).

Author information

Authors and Affiliations

State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
Xiaoming Man, Cong Huang, Shengyong Wu, Jianyang Guo, Fanghao Wan, Nianwan Yang & Wanxue Liu
University of Liege, Gembloux Agro-Bio Tech, Functional & Evolutionary Entomology, B-5030, Gembloux, Belgium
Xiaoming Man & Frédéric Francis
Western Agricultural Research Center, Chinese Academy of Agricultural Sciences, Changji, 831100, China
Nianwan Yang

Authors

Xiaoming Man
View author publications
Search author on:PubMed Google Scholar
Cong Huang
View author publications
Search author on:PubMed Google Scholar
Shengyong Wu
View author publications
Search author on:PubMed Google Scholar
Jianyang Guo
View author publications
Search author on:PubMed Google Scholar
Fanghao Wan
View author publications
Search author on:PubMed Google Scholar
Frédéric Francis
View author publications
Search author on:PubMed Google Scholar
Nianwan Yang
View author publications
Search author on:PubMed Google Scholar
Wanxue Liu
View author publications
Search author on:PubMed Google Scholar

Contributions

N.W.Y. and W.X.L. conceived and designed the study; X.M.M. and S.H.W. collected the samples for sequencing; X.M.M. and C.H. performed the data analysis; X.M.M. wrote the draft manuscript; X.M.M., C.H., J.Y.G., F.H.W., W.X.L., N.W.Y. and F.F. discussed the results and improved and revised the manuscript.

Corresponding authors

Correspondence to Nianwan Yang or Wanxue Liu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Man, X., Huang, C., Wu, S. et al. A chromosome-level genome assembly of the heteronomous hyperparasitoid wasp Encarsia sophia. Sci Data 11, 1250 (2024). https://doi.org/10.1038/s41597-024-04040-2

Download citation

Received: 07 August 2024
Accepted: 23 October 2024
Published: 19 November 2024
Version of record: 19 November 2024
DOI: https://doi.org/10.1038/s41597-024-04040-2