Background & Summary

The Hemerobiidae (brown lacewing) is the third largest family of the order Neuropter, which plays a crucial role in natural ecosystems and agricultural settings due to its significant predatory nature, targeting various agricultural and forestry crop pests1. Globally, approximately 600 species across 28 genera and 11 subfamilies have been identified, exhibiting a wide distribution across major biogeographical regions2,3. Members of this family, including both adults and larvae of brown lacewings are predatory and effectively predator on aphids, scale insects, whiteflies, and other small soft-bodied pests and mites. Their predatory efficiency, widespread distribution, and ability to thrive in diverse environmental conditions make them essential biological control agents in integrated pest management (IPM) systems. By naturally suppressing populations of major agricultural pests, brown lacewing species contribute to reducing the need for chemical insecticides, promoting sustainable farming practices and safeguarding biodiversity4. Notably, brown lacewing possehas a long lifespan, high reproductive capacity, and substantial predatory efficiency, rendering themit highly valuable and promising for biological control5. Currently, various brown lacewing species, which are regarded as reliable control agents, are extensively utilized for the biological control of Pseudococcus citri Risso6, Aphis gossypii Glover7, and Ceratovacuna lanigera Zehntner8,9,10. The brown lacewing Micromus angulatus (Stephens), which belongs to the subfamily Microminae of the family Hemerobiidae, is an important predatory natural enemy of various pests and mites (Fig. 1). This species is widely distributed across world, exhibiting a strong capacity to adapt to various climatic conditions and habitats2,3. Both the adults and larvae of M. angulatus can prey on Chaetosiphon fragaefolii (Cockerell) on strawberries and Myzus persicae (Sulzer) on bell peppers, exhibiting excellent control efficacy11. By releasing first-instar larvae or female adults, green peach aphids on peppers can be effectively controlled within two weeks12. This makes M. angulatus an important predatory resource for the augmentative release in agricultural and forestry ecosystems.

Fig. 1
Fig. 1The alternative text for this image may have been generated using AI.
Full size image

Predation at different developmental stages of Micromus angulatus. (a) Adult preying on Aphis craccivora. (b) 3rd-instar larva preying on Aphis glycines. (Photos were taken by Qingbin Zhan and Yang Zhao).

Despite its agricultural significance, genomic research on the Hemerobiidae family remains limited, particularly when compared to other Neuropteran families such as Chrysopidae (green lacewings). To date, seven neuroptera genomes, including three families (Chrysopoidea13,14,15,16, Sisyridae17,18, and Myrmeleontidae19), have been reported (as of September 2024 from NCBI), with only five species sequenced via PacBio and at the chromosome level. However, studies of Hemerobiidae have focused mainly on morphology, biology and phylogenetics in recent decades, with few studies on genomic information. This has made it difficult to conduct in-depth investigations of this family. Here, we assembled a high-quality chromosome-level genome of M. angulatus via high-frequency (HiFi) and high-throughput chromosome conformation capture (Hi-C) technology. The high-quality chromosome-level genome assembly of M. angulatus provides a genetic basis for further research on this predatory natural enemy. By providing comprehensive genomic data, this study will not only enrich the genetic resources available for M. angulatus but also inform the development of targeted biological control methods, ultimately contributing to more sustainable pest management practices.

Methods

Sample information

The adults of M. angulatus used in this study were collected from Xianghetun, Gaojiadian Town, Xifeng County, Liaoning Province (42°67′13″N, 124°45′92″E) on May 12, 2023. The rearing conditions were set to a temperature of 25 ± 1 °C, a relative humidity of 65 ± 5%, and a light: dark cycle of 16 hours: 8 hours. Six-generation male adults were selected for sequencing under these laboratory conditions. Detailed observations of molting frequency and morphological characteristics allowed for the identification and differentiation of the 1st-, 2nd-, and 3rd- instar larvae, as well as the adult stage1.

Library construction, quality control and sequencing

Genomic DNA was extracted from a single male individual using the CTAB method, and libraries PacBio HiFi sequencing were constructed using the SMRTbell® Express Template Prep Kit 2.0 (Pacific Biosciences #PN 101-853-100, CA, USA) based on the platform PacBio Sequel IIe. After quality control, the samples were fragmented to 15 kb using a Megaruptor instrument (Diagenode B06010001, Liege, Belgium) and concentrated using AMPure® PB Beads (Pacific Biosciences 100-265-900), and SMRTbell libraries were constructed using a 2.0 kit. This process included the removal of single-stranded overhangs, DNA damage repair, end repair, A-tailing, adapter ligation, and enzyme digestion. Fragment selection was performed using the SageELF (Sage Science ELF000) or BluePippin system (Sage Science BLU0001). The construction of the PacBio HiFi 15 K libraries was conducted by Berry Genomics Corporation, Beijing, China (hereafter referred to as Berry). For second-generation whole-genome sequencing, libraries were constructed from a single male individual using the Agencourt AMPure XP-Medium Kit and sequenced on the DNBSEQ-T7 platform, with insert sizes ranging from 200 to 400 bp. The library construction was completed by Berry. For second-generation standard transcriptome sequencing, RNA was extracted through TRIzolTM Reagent, and libraries were constructed with the VAHTS mRNA-seq v2 Library Prep Kit. The library construction was completed by Berry. Third-generation full-length (ONT) transcriptome sequencing was performed using RNA extracted with the DP441 RNA Prep Pure Plant Plus Kit from a single male individual, and libraries were constructed using the SQK-PCS109 + SQK-PBK004 Kit on the ONT PromethION platform. Library construction was conducted by BenaGen, Wuhan, China. Hi-C library construction using a single male individual, including formaldehyde cross-linking, restriction enzyme digestion (using MboI), end repair, DNA cyclization, and DNA purification, was also performed by Berry. The yield and sequencing depth of the raw sequences are detailed in Table 1.

Table 1 Statistics of the DNA sequence data used for the M. angulatus genome assembly.

Genome assembly

The survey inferred genome characteristics based on k-mer frequency distribution, utilizing the khist.sh tool (part of BBTools v38.9020) to generate k-mer frequency histograms and simultaneously assess genome features. The k-mer size was set to 21. The predicted genome size is approximately 1,229,938,700 bp, with a heterozygosity rate of 1.24% and a repeat content of 89.062%, indicating a highly heterozygous and ultra-highly repetitive genome. High-quality Pacbio HiFi reads were generated by Pacific Biosciences’ (PacBio) SMRT (pbccs v6.4.0) sequencing technology (https://github.com/PacificBiosciences/ccs), and sequences with a base quality score above Q30 were retained. The initial assembly was performed by Hifiasm v0.19.621 with the parameters (‘-l 3’). Low-depth sequences, which are likely to be contaminants or erroneous, were excluded, and only contig sequences with a sequencing depth exceeding 6X (approximately 1/10 of the sequencing depth) were retained in the Hifiasm assembly. Given the high base quality of the assembly (QV value exceeding 60, as shown in subsequent genome assessment with Merqury), conventional polishing steps were unnecessary. BUSCO analyses indicated no significant heterozygosity in the assembly, so conventional heterozygosity removal steps were also omitted. Chromosome anchoring and assembly of the contigs were performed using Hi-C data and the YAHS v1.2.422 pipeline. Firstly, Hi-C data were quality controlled using chromap v0.2.523 (including sequence read alignment, duplicate removal, and Hi-C contact extraction). Next, two rounds of anchoring were performed using YAHS v1.2 with the default parameters. Manual correction of the initial round of assembly was performed via Juicebox v1.11.0824 to address issues such as misjoins, translocations, and inversions, followed by the final round of anchoring. The Hi-C interaction heatmap for Micromus angulatus (Fig. S1) illustrates the chromosomal interaction patterns across the assembled genome. The heatmap shows a clear distinction of eight primary scaffolds, representing the pseudochromosomes, which account for a substantial portion of the genome. Our Hi-C anchoring heatmap analysis indicated that chromosome anchoring quality was moderate, producing only 8 relatively long scaffolds (746,006,264 bp, 57.89%), with approximately 40% of the genome unable to be anchored, likely representing highly repetitive regions (later confirmed by repeat annotation). Hi-C sequencing is less effective in highly repetitive regions and fails to produce strong interaction signals25. Subsequent protein-coding gene annotation revealed that the first 8 longest sequences contained 12,124 genes, accounting for 91.50% (12124/13250) of the total, indicating that the majority of the coding regions were within these 8 sequences, which can be considered pseudochromosomes.

Genome quality assessment

Genome integrity was assessed via Compleasm v0.2.4726, which evaluated BUSCO to perform the completeness assessment with the Insecta_odb10 dataset (n = 1,367 single-copy orthologues), which evaluates the completeness of the genome by predicting the presence of nearly universal single-copy orthologous (USCO) genes. Additionally, second-generation genome and transcriptome (both second- and third-generation) raw sequences were aligned back to the genome assembly to verify the utilization rate of the raw data and the completeness of the assembly. The alignment was performed using Minimap2 v2.2827, and the alignment rate was calculated using SAMtools v1.2028. Potential contamination in the assembly was identified using a BLAST-like search with MMseq 2 v13829 against the NCBI nt and UniVec databases. Single-base quality scores (QV values) of the genome were assessed via Merqury v1.3930.

The detailed metrics of the final genome assembly are presented in Table 2. The genome size was 1.29 Gb, with 909 scaffolds and 994 contigs. The maximum scaffold and contig lengths were 169.16 Mb and 51.53 Mb, respectively. The scaffold N50 and contig N50 lengths were 63.78 Mb and 13.65 Mb, respectively, and the GC content was 30.07%. The 8 pseudochromosomes account for 746.01 Mb, accounting for 57.89% of the genome. The BUSCO evaluation of the genome assembly indicated a completeness of 97.00%, with a duplicate BUSCO proportion of 4.75%, indicating no significant redundancy. The alignment rates for the second-generation survey, second-generation RNA, third-generation RNA, and third-generation HiFi data were 96.47%, 90.82%, 97.58%, and 99.90%, respectively, all exceeding 90%, reflecting high sequencing quality. The genome assembly QV was 65.242, corresponding to an error rate of only 2.99086e-07, indicating excellent base quality.

Table 2 Statistics for M. angulatus genome assembly and gene annotation.

Annotation of repetitive sequences

The species-specific de novo repeat library was constructed by RepeatModeller v2.0.51031 with an additional LTR search pipeline (-LTRStruct). This library is based on the principles of repetitive sequence structure-specific and de novo prediction, and it was merged with the Dfam 3.732 and RepBase-2018102633 databases to form the final repeat sequence reference database. Repetitive sequence identification was performed by RepeatMasker v4.1.534, and the results were compared against the constructed repeat sequence database. The results revealed 3,913,133 repetitive sequences (1,077,857,554 bp), accounting for 83.65% of the genome, which is a typical ultrahigh proportion of a repetitive genome. The six most abundant types of repetitive sequences were Unknown (58.79%), LTRs (9.30%), LINEs (7.65%), DNA (4.71%), rolling circles (0.79%), and SINEs (0.59%). The detailed statistics are provided in Table S1.

Compared with the genome of the green lacewing species Chrysopa pallens13, the M. angulatus genome is nearly 700 Mb larger. This genome expansion is due primarily to the proliferation of transposable elements, particularly the Unclassified, LTR, LINE, and DNA transposon categories. In the comparison of repeat composition between M. angulatus and C. pallens (Fig. 2a), notable differences in the abundance and diversity of transposable elements were observed. We found that M. angulatus has a significantly higher proportion of repetitive elements, particularly DNA transposons, LINEs (Long Interspersed Nuclear Elements), and LTRs (Long Terminal Repeats), across a wide range of evolutionary ages as indicated by the Kimura substitution levels. DNA transposons in M. angulatus contribute a greater percentage of the genome compared to C. pallens, indicating some historical activities. Similarly, LINEs and LTRs show higher representation in M. angulatus, suggesting multiple waves of transposable element expansion. These expansions are likely key factors in the larger genome size of M. angulatus, reflecting the evolutionary processes that have shaped its genome architecture. We further compared the significantly expanded transposon families within the LTR, LINE, and DNA categories (defined here as families differing by more than 1 Mb). In M. angulatus, we observed a significant expansion of various transposable elements (TEs) compared to C. pallens (Fig. 2b). Specifically, DNA transposons, LINEs (Long Interspersed Nuclear Elements), and LTRs (Long Terminal Repeats) were prominently expanded. Among DNA transposons, elements such as TcMar-Tc1, hAT-Ac, and PIF-Spy showed the greatest expansion in M. angulatus, with total sizes reaching several megabases (Mb). For LINEs, elements like R1-LOA and I-Jockey demonstrated notable expansions, indicating an active or historically active role in genome evolution. We found that in LTR elements, Gypsy, Pao, and Copia showed extensive amplification in M. angulatus, with sizes exceeding tens of megabases. This substantial expansion of transposable elements, particularly in the Gypsy family of LTRs, suggests that transposable elements played a crucial role in the genome evolution of M. angulatus, contributing to its larger genome size relative to C. pallens. These findings highlight the evolutionary divergence between the two species and the impact of TE proliferation on genome architecture.

Fig. 2
Fig. 2The alternative text for this image may have been generated using AI.
Full size image

Repetitive sequences of Micromus angulatus. (a) Comparison of repeat composition between M. angulatus and C. pallens. (b) Transposon families significantly expanded in M. angulatus compared to C. pallens, including DNA transposons, LINE (Long Interspersed Nuclear Elements), and LTR (Long Terminal Repeats). (c) Interspersed repeat landscape based on K2P divergence (CpG adjusted). (d) Interspersed repeat landscape based on K2P divergence (CpG adjusted) with unclassified repeats excluded.

To further investigate the origins of the transposon expansions, we utilized the script calcDivergenceFromAlign.pl from RepeatMasker-4.1.5 to evaluate the TE landscape by calculating the Kimura 2-parameter divergence for each TE family, which essentially measures the degree of differentiation or genetic distance. As shown in Fig. 2c, there were three distinct waves of transposon expansion in the M. angulatus genome (three peaks). Assuming that the first expansion (the rightmost peak, approximately position 32) occurred around 115 million years ago, which coincides with the appearance of fossil records of family Hemerobiidae35, the second expansion (approximately position 18) could be calculated as 18/32 * 115 = 64.5 million years ago, roughly corresponding to the era of dinosaur extinction. The third expansion (around positions 6/7) began approximately 20 million years ago and is still ongoing. Excluding Unclassified repeats, as illustrated in Fig. 2d, only the leftmost peak (expansion) remains prominent.

Annotation of noncoding RNAs

The annotation of ncRNAs was performed by two strategies: (1) annotating rRNA, snRNA, and miRNA via alignment with known noncoding RNA libraries from the Rfam database using Infernal v1.1.536 software and (2) predicting tRNA sequences within the genome using tRNAscan-SE v2.0.1237 software and filtering low-confidence tRNAs via the built-in ‘EukHighConfidenceFilter’ script. The genome annotation of noncoding RNAs revealed 5,481 ncRNAs, including 4,609 rRNAs, 72 miRNAs, 334 snRNAs, 1 sRNA, 786 tRNAs, 3 ribozymes, 3 lncRNAs, and 1 leader RNA. The snRNAs included 291 spliceosomal RNAs (U1, U2, U4, U5, U6, and U11), 11 minor spliceosomal RNAs (U4atac, U6atac, and U12), 26 C/D box snoRNAs, and 5 HACA-box snoRNAs. The detailed statistics are presented in Table S2.

Identification of protein-coding gene structures

The identification of protein-coding gene structures was performed via the MAKER v3.01.0438 pipeline, which integrates three types of evidence to predict the structures of protein-coding genes.

  1. (1)

    Ab initio gene prediction. Methods: BRAKER v3.0.639 and GeMoMa v1.940 were used to integrate transcriptome and protein evidence. The results from both methods were merged as input files (ab.gff3) for MAKER ab initio to broaden the range of potential coding gene candidates. The transcriptome data were aligned to the genome via HISAT2 v2.2.141 to generate BAM alignment files. BRAKER automatically trained Augustus v3.4.042 and GeneMark-ETP43, integrating arthropod protein sequences (extracted from the OrthoDB11 database44) and transcriptome data to increase prediction accuracy. GeMoMa predicted genes on the basis of protein homology and intron position information via high-quality genome assemblies, protein sequences, and annotated GFF files from closely related species: Drosophila melanogaster (Diptera) (GCA_000001215.445), Bombyx mori (Lepidoptera) (GCA_030269925.246), Apis mellifera (Hymenoptera) (GCA_019321825.147), Chrysoperla carnea (Neuroptera) (GCA_905475395.114), and Tribolium castaneum (Coleoptera) (GCA_000002335.348). GeMoMa uses the parameters GeMoMa.c = 0.4 and GeMoMa.m = 100000 for gene prediction on the basis of homologous information from these species.

  2. (2)

    Transcript alignment for gene structure prediction. Methods: StringTie v2.2.149 was used for reference-guided assembly in mixed mode (–mix) on the basis of second- and third-generation transcriptome data. The BAM alignment files for the second-generation sequences were generated by HISAT2 from step (1), and the third-generation sequences were aligned to the genome via Minimap2 (-ax splice).

  3. (3)

    Homology-based prediction. Methods: Homology prediction was conducted by aligning protein sequences (as used in GeMoMa analysis) to the genome and the Wilcoxon tests was performed by rstatix (https://github.com/kassambara/rstatix). The MAKER pipeline predicted 13,250 protein-coding genes (Table 3), with an average gene length of 18,226.6 bp. Each gene contained an average of 5.3 exons with an average exon length of 432.7 bp and 4.3 introns with an average intron length of 3,933.5 bp. Each gene had an average of 5.0 CDSs, with an average CDS length of 330.6 bp. The BUSCO completeness assessment of the predicted protein-coding genes yielded 97.6% C [S:77.7%, D:19.9%], 0.2% F, 2.2% M, and 1367% n, surpassing the genome completeness of 97.00%, indicating high-quality protein-coding gene annotation. We further compared the metrics of M. angulatus protein-coding genes with those of C. pallens.

    Table 3 Summary of protein-coding genes of Micromus angulatus.

As shown in Table 4 and Fig. 3, the gene and intron lengths in M. angulatus are significantly longer than those in C. pallens, contributing to overall genome expansion. In the comparison of protein-coding genes between M. angulatus and C. pallens (Fig. 3), significant differences were observed across various genomic features. The gene length in M. angulatus was significantly longer than in C. pallens, with average gene lengths being approximately twice as large. Similarly, exon lengths also exhibited a significant difference, with M. angulatus having longer exons on average. We also found that coding sequences (CDS) in M. angulatus were longer than those in C. pallens. A more striking contrast was seen in intron lengths, where M. angulatus displayed significantly longer introns compared to C. pallens. Therefore, our results suggest that the genome of M. angulatus has undergone considerable expansion at the level of both gene structure and intron-exon architecture, which may contribute to its larger genome size and unique functional adaptations compared to C. pallens.

Table 4 Comparison of protein-coding genes between M. angulatus and C. pallens.
Fig. 3
Fig. 3The alternative text for this image may have been generated using AI.
Full size image

Comparison of protein-coding genes between M. angulatus and C. pallens. (a) Gene length (bp). (b) Exon length (bp). (c) CDS length (bp). (d) Intron length (bp).

Functional annotation of protein-coding genes

The functional annotation of genes was conducted via three strategies: (1) Comparison with existing databases. The UniProtKB v202305 (including SwissProt and TrEMBL) database was used. Method: Using the high-sensitivity mode of Diamond v2.1.7.16150 (‘–very-sensitive -e 1e-5’), genes were searched against the UniProtKB database to predict their functions. (2) Comparison with Comprehensive Databases The predicted conserved protein sequences and domains, Gene Ontology (GO) terms, and pathways (KEGG, Reactome) were analysed. Methods: InterPro 5.65-97.051 was used to search four databases: Pfam52, SMART53, Superfamily54, and CDD55. Additionally, eggNOG-mapper v2.1.156 was used to search the eggNOG v5.0.257 database.

The results from these two methods were integrated to produce the final functional gene predictions, which are typically presented in a GFF file format. A total of 12,110 genes (91.40%) matched entries in the UniProtKB database. InterPro identified protein domains in 11,349 protein-coding genes. Combined results from InterPro and eggNOG-mapper-annotated GO terms for 10,151 genes and KEGG pathways for 4,706 genes.

The comparison of gene structure between M. angulatus and C. pallens reveals distinct differences in genomic organization. C. pallens has a slightly lower total number of genes (12,840) compared to M. angulatus (13,250), but the mean gene length in C. pallens is significantly shorter at 7,555.9 bp compared to 18,226.6 bp in M. angulatus. Both species show similar numbers of exons per gene, with M. angulatus averaging 5.3 exons per gene and C. pallens averaging 5.7. However, the mean exon length is slightly shorter in C. pallens (410.7 bp) compared to M. angulatus (432.7 bp). Notably, the coding sequence (CDS) length is longer in M. angulatus, with an average of 330.6 bp compared to 293.7 bp in C. pallens. Although M. angulatus exhibits a lower number of introns per gene (4.3) than C. pallens (4.7), the mean intron length in M. angulatus is significantly longer at 3,933.5 bp compared to 1,120.1 bp in C. pallens. These results indicate that M. angulatus possesses genes with longer mean lengths and introns, which contrasts with the more compact gene structure observed in C. pallens, characterized by shorter genes and introns. This difference might reflect variations in genome evolution and functional genomic organization between the two species. The detailed results are provided in Table 4.

Chromosomal collinearity

To investigate the chromosomal evolution of genomes, we conducted a collinearity analysis between M. angulatus and two closely related species within the same order, C. pallens and Chrysoperla carnea, to obtain relevant information. Given the moderate quality of chromosome assembly in M. angulatus, the eight longest sequences were selected for comparison. Protein sequences between the species were aligned using the Diamond software with blastp-like functionality, applying ultra-sensitive, iterative settings (‘–ultra-sensitive–iterate -e 1e-5 -k 5’). Subsequently, collinearity analysis was performed using MCScanX58, with parameters set to ‘-s 4 -e 1e-5’, meaning that at least four genes were required to be considered a collinear block, with the e-value threshold set to 1e-5. As shown in Fig. 4, a high degree of collinearity is observed between the two species C. pallens and C. carnea from the same family, with chromosomes aligning clearly on a one-to-one basis. However, internal rearrangements and inversions within chromosomes were also evident. Key and major chromosomal variations were highlighted in the figure, notably with the X chromosome being highly conserved across all three species, suggesting that M. angulatus 8 may represent the X chromosome. Due to the more distant phylogenetic relationship between M. angulatus and the other two species from a different family, chromosomal homology was relatively low, and no clear one-to-one correspondence was observed.

Fig. 4
Fig. 4The alternative text for this image may have been generated using AI.
Full size image

Chromosomal synteny among M. angulatus, C. pallens and C. carnea.

Data Records

The raw sequencing data and genome assembly of Micromus angulatus have been deposited at the Sequence Read Archive (SRA) and GenBank databases of the National Center for Biotechnology Information (NCBI), under BioProject number PRJNA1101095. The genome assembly and annotation data have been released under GCA_047833675.159 (master record: JBCARC000000000). The Illumina, Pacbio, Hi-C genome sequencing data, RNA-ONT and Illumina RNASeq sequencing data were submitted to the Sequence Read Archive (SRA) database at under accession numbers SRR3189714260, SRR2992413761, SRR2990847962, SRR3189575963 and SRR2984151164.

Technical Validation

The predicted protein-encoding gene sequences were subjected to BUSCO integrity assessment, yielding the following results: C: 97.6%, n: 1,367. The integrity surpassed 97.00% of the genome assessments. This indicates the high quality of the genome assembly produced in this study. The assembled genome spans 1.29 Gb with 8 pseudochromosomes and scaffold N50 of 63.78 Mb which indicating long, continuous sequences. These metrics ensure that the genome assembly is of high quality with minimal gaps and interruptions. MMseq2 was used to check for contamination by aligning the genome against the NCBI nt and UniVec databases. Merqury software assessed single-base accuracy, yielding a QV score of 65.242 (very low error rate of 2.99086e-07). All these results suggest that we got a high-quality assembly of M. angulatus with high integrity, cotinuity and accuracy.