Abstract
Structural variants such as chromosomal rearrangements and gene duplications can play an important role in the adaptation and diversification of organisms. Here, we used comparative genomics to study the functional implications of structural variants across two families of flies. We compared the reference genomes of eight Asilidae species and six Stratiomyidae species, including the black soldier fly (Hermetia illucens), a species with an ability to convert organic waste into biomass and a recently expanded global range. Our results show that the genomes of Stratiomyidae are generally larger than Asilidae and contain a higher proportion of transposable elements, many of which are recently expanded. Gene families showing more gene duplications are enriched for life history related functions such as metabolism in Stratiomyidae which are known to be active decomposers, and longevity in Asilidae which are predators and have generally longer lifespan than Stratiomyidae. Gene families showing more gene duplications that are specific to H. illucens are mostly related to olfactory and immune responses, while across the Stratiomyidae there is enrichment of digestive and metabolic functions such as proteolysis, providing an explanation for the higher decomposing efficiency and adaptive ability of H. illucens compared to other Stratiomyidae species in decomposing environments. Together, our results shed light on the contribution of structural variants to functional adaptation and gene family expansions that have likely played a role in the ecological success of the black soldier fly.
Introduction
Structural variations (SVs) are large scale variations in sequence (typically >50 bp) caused by insertions, deletions, inversions, duplications and sequence changes due to transposable elements (TEs) (Alkan et al. 2011; Berdan et al. 2024). As a major component of variation in the genome, SVs can play an important role in adaptation and speciation. For example, several inversions between Drosophila pseudoobscura and D. persimilis are associated with hybrid sterility, maintaining reproductive isolation (Noor et al. 2001; Zhang et al. 2021). In the Colorado potato beetle Leptinotarsa decemlineata, SVs are related to insecticide resistance and facilitate the rapid adaptation of this pest species into new agricultural environments (Cohen et al. 2023). In Lepidoptera, SVs play a role in adaptive wing pattern polymorphism (Joron et al. 2011; Brien et al. 2023) and the dynamics of gene family expansion and contraction are associated with diet breadth in butterflies (Dort et al. 2024).
Among the many types of SVs, gene copy number variation (CNV), which includes the gain and loss of gene copies, is an important source of genetic variation that can contribute to adaptation. Duplication and deletion of preexisting genes can generate considerable genetic variation in genomes (Katju and Bergthorsson 2013). Gene duplications can lead to multiple outcomes. On one hand, duplications of a single gene can create functional redundancy and may affect gene dosage (Hahn 2009; Magadum et al. 2013; Kuzmin et al. 2022). Duplicated genes can also either become pseudogenes or be subfunctionalized, whereby both copies adopt parts of the original function (Magadum et al. 2013). Alternatively, redundancy can help the newly duplicated genes “escape” from purifying selection and evolve new functions, known as neofunctionalization (Hahn 2009; Magadum et al. 2013; Birchler and Yang 2022; Kuzmin et al. 2022).
Here we focus on SVs and their functional roles in the black soldier fly (Hermetia illucens) lineage, a Dipteran species of commercial interest due to its ability to convert organic waste into biomass. Hermetia illucens belongs to the Stratiomyidae family (soldier flies), which consists of over 2700 species found around the world (Woodley and Thompson 2001). Stratiomyidae larvae are often found in water or damp substrates such as decaying organic matter. Despite the similar ecology of many Stratiomyidae species, Hermetia illucens stands out as the only species that has spread around the globe as a human commensal and is associated with industrial uses in waste treatment (Nguyen et al. 2015; Tomberlin and van Huis 2020; Siddiqui et al. 2022).
In this study, we used chromosome-level reference genomes of six Stratiomyidae species and eight Asilidae species to explore the potential correlation between genome structure and their life history differences. During the larval stage, Asilidae are often found in soil or other damp substrates such as rotting organic matter similar to those of Stratiomyidae larvae. However, unlike the Stratiomyidae, Asilidae species are long-lived, with a life span ranging from one to three years, while the Stratiomyidae usually only have short life cycles. Compared to Stratiomyidae adults which usually only feed on plant liquids or do not feed at all, Asilidae adults are predators and feed on other insects. On the phylogeny, Asilidae are one of the families in superfamily Asiloidea, which is sister clade to the superfamily Stratiomyomorpha that contains Stratiomyidae (Wiegmann et al. 2011), making this an interesting phylogenetic and phenotypic comparison.
We address the following questions: (1) How much variation in genome size and gene number is there between Stratiomyidae and Asilidae? (2) What is the pattern of types and proportion of repetitive elements among Stratiomyidae and Asilidae species? (3) What are the dynamics of gene birth and death across the phylogeny? (4) How is gene family expansion related to species-specific life history and functional variation?
Materials and methods
Obtaining and pre-processing reference genomes
All genome assemblies were downloaded from the NCBI database with their RefSeq assembly numbers before 1st March 2025. Annotations in GFF format and peptide sequences of the reference genomes (McCulloch et al. 2023; Thomas et al. 2023; Crowley and Garland et al. 2024; Crowley and Sivell et al. 2024; Crowley and Akinmusola et al. 2024a; Crowley, University of Oxford and Wytham Woods Genome Acquisition Lab et al. 2024; Crowley and Akinmusola et al. 2024b; Mitchell et al. 2024a; Mitchell et al. 2024b; Nash et al. 2024; Sivell, Sivell, Natural History Museum Genome Acquisition Lab, et al. 2024; Sivell and McAlister et al. 2024; Sivell, Sivell, Mitchell, et al. 2024) were obtained from the Darwin Tree of Life Project (https://wellcomeopenresearch.org/treeoflife) through its online data portal (https://www.darwintreeoflife.org/genomes/genome-notes/) except for Hermetia illucens and Drosophila melanogaster whose GFF annotations and protein sequences were downloaded from their NCBI RefSeq FTP archive.
Genome quality assessment
To evaluate the completeness of reference genomes assembled via different pipelines, we used BUSCO 5.8.2 (Simão et al. 2015) to summarize genome completeness and assembly quality before downstream analysis. All genomes were compared to the Diptera database diptera_odb10 downloaded from the BUSCO website (https://busco-data.ezlab.org/v4/data/lineages/diptera_odb10.2019-11-20.tar.gz). A summary plot was generated using the script generate_plot.py implemented in BUSCO, based on the output text file of each genome.
All annotation files were filtered with the Python script provided with the OrthoFinder software (https://github.com/davidemms/OrthoFinder/blob/master/tools/primary_transcript.py) to keep only the longest transcript of each gene in each genome. All downloaded annotations were checked for consistency of gene names so that each genome kept a unique naming format for easy identification in downstream analyses.
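The longest-transcript filter described above can be illustrated with a minimal sketch. This is not the OrthoFinder primary_transcript.py script itself; the function name `longest_per_gene` and the (gene_id, transcript_id, sequence) record layout are assumptions made purely for illustration.

```python
def longest_per_gene(records):
    """Keep one transcript per gene: the one with the longest peptide.

    `records` is an iterable of (gene_id, transcript_id, sequence) tuples.
    This layout is hypothetical and is not the format used by OrthoFinder.
    Ties are resolved in favour of the first record seen.
    """
    best = {}
    for gene_id, transcript_id, seq in records:
        current = best.get(gene_id)
        # Replace the stored transcript only if the new one is strictly longer.
        if current is None or len(seq) > len(current[1]):
            best[gene_id] = (transcript_id, seq)
    return best


# Toy input: geneA has two isoforms, geneB has one.
example = [
    ("geneA", "geneA.t1", "MKT"),
    ("geneA", "geneA.t2", "MKTLLV"),
    ("geneB", "geneB.t1", "MAAP"),
]
primary = longest_per_gene(example)
```

Keeping a single isoform per gene prevents alternative transcripts of the same gene from being counted as separate gene copies during orthogroup assignment.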
Repetitive elements identification
We used Earl Grey 5.1.1 (Baril et al. 2024), a pipeline in which RepeatMasker (Tarailo-Graovac and Chen 2009) and RepeatModeler2 (Flynn et al. 2020) are automatically called, to identify repetitive elements in each reference genome. All genomes were used for de novo TE identification using RepeatModeler2 as implemented in the pipeline. For each genome, Earl Grey was run with ten iterations of its “BLAST, Extract, Align, Trim” process (https://github.com/jamesdgalbraith/TEstrainer). The de novo TE library of each genome was used for TE annotation using RepeatMasker. LTR_Finder (Xu and Wang 2007; Ou and Jiang 2019) was also used and its output was combined with the RepeatMasker annotation using RepeatCraft (Wong and Simakov 2019). The final TE annotations of each genome were used for Kimura distance calculation using the script “divergence_calc.py” implemented in Earl Grey. A summary bar chart was generated using the ggplot2 R package (Wickham 2016).
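For readers unfamiliar with the Kimura 2-parameter distance, the standard formula can be sketched as follows. With P the proportion of transition differences and Q the proportion of transversion differences between a TE copy and its consensus, the distance is d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)). The function below is an illustrative sketch of that textbook formula; whether Earl Grey's script applies exactly this expression (for example, with additional corrections) is not stated in the text, and the function and argument names are assumptions.

```python
import math


def kimura_2p(transitions, transversions, aligned_length):
    """Kimura 2-parameter distance from substitution counts.

    transitions, transversions: number of mismatches of each type.
    aligned_length: number of aligned (ungapped) positions compared.
    """
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    p = transitions / aligned_length      # proportion of transitions
    q = transversions / aligned_length    # proportion of transversions
    a = 1 - 2 * p - q
    b = 1 - 2 * q
    if a <= 0 or b <= 0:
        # The correction is undefined for saturated sequences.
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(a * math.sqrt(b))


# Hypothetical copy: 10 transitions and 5 transversions over 100 positions.
d = kimura_2p(10, 5, 100)
```

The corrected distance is always at least as large as the raw proportion of mismatches (0.15 in the toy example), because it accounts for multiple substitutions at the same site.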
Orthogroups identification, phylogeny construction and genome-wide synteny analysis
To evaluate orthology relationships among coding genes, we used OrthoFinder 2.5.5 (Emms and Kelly 2019) to assign protein coding genes in all selected 14 species into orthogroups. 201,275 genes (95.3% of total) were assigned to 15,964 orthogroups by OrthoFinder. Fifty percent of all genes were in orthogroups with 15 or more genes (G50 was 15) and were contained in the largest 4780 orthogroups (O50 was 4780). There were 6653 orthogroups with all species present and 3328 of these consisted entirely of single-copy genes.
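The G50 and O50 summary statistics quoted above can be sketched in a few lines. This is an illustrative reimplementation based on the definitions given in the text, not OrthoFinder code; the function name `g50_o50` is hypothetical, and whether the denominator should be all genes or only assigned genes is left to the optional `total_genes` argument as an assumption.

```python
def g50_o50(orthogroup_sizes, total_genes=None):
    """Return (G50, O50) for a collection of orthogroup sizes.

    G50: size of the orthogroup at which the cumulative gene count,
         taken from the largest orthogroup downwards, reaches 50%.
    O50: number of orthogroups needed to reach that 50%.
    total_genes: denominator for the 50% threshold; defaults to the
         number of genes in the orthogroups supplied (an assumption).
    """
    sizes = sorted(orthogroup_sizes, reverse=True)
    if total_genes is None:
        total_genes = sum(sizes)
    half = total_genes / 2
    running = 0
    for rank, size in enumerate(sizes, start=1):
        running += size
        if running >= half:
            return size, rank
    raise ValueError("orthogroups contain fewer than half of total_genes")


# Toy data: six orthogroups containing 26 genes in total.
g50, o50 = g50_o50([4, 10, 1, 6, 3, 2])
```

In the toy data, the two largest orthogroups (10 and 6 genes) already hold more than half of the 26 genes, so G50 is 6 and O50 is 2.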
When running OrthoFinder, maximum likelihood trees were inferred using multiple sequence alignments method (argument “-M msa”). Under this setting, the species tree is inferred using a concatenated multiple sequence alignment (MSA) of single-copy genes, which is based on the output of the OrthoFinder run. A species tree was constructed using the STAG method (Emms and Kelly 2018) with default setting. A total of 3328 orthogroups with single-copy genes in all species were used for the phylogeny construction. Species tree was visualized using FigTree 1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/).
The results from OrthoFinder were then used as the input for GENESPACE 1.2.3 (Lovell et al. 2022) to construct synteny plots. Two steps of format conversion were performed before running GENESPACE. First, all GFF annotation files were reformatted into bed files using “convert2bed” function (version: 2.4.39) in BEDOPS (Neph et al. 2012). Only gene name, start and end position were kept for each gene in the final bed files. Second, the peptide sequences in fasta format were renamed using corresponding gene names in the bed file for each species. The raw output file folder of OrthoFinder was assigned to GENESPACE using the “rawOrthofinderDir” parameter in GENESPACE R command. The genome of Hermetia illucens was used as reference in the plotting pipeline, and chromosomes of all other genomes were arranged based on the order of H. illucens chromosomes (“1”, “2”, “3”, “4”, “5”, “6”, “X”).
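The first format conversion step can be illustrated with a small sketch. It is not the BEDOPS convert2bed implementation; it only shows the coordinate shift involved (GFF is 1-based and end-inclusive, BED is 0-based and end-exclusive) and the reduction to sequence name, start, end and gene name. The function name and the use of the `ID` attribute as the gene name are assumptions.

```python
def gff_genes_to_bed(gff_lines, id_key="ID"):
    """Convert GFF3 'gene' features to (seqid, start, end, name) BED tuples."""
    bed = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue  # keep gene-level features only
        seqid, start, end = fields[0], int(fields[3]), int(fields[4])
        # Parse the key=value attribute column into a dictionary.
        attributes = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        name = attributes.get(id_key)
        if name is None:
            continue  # no usable gene identifier
        # Shift the start by one; the end is unchanged.
        bed.append((seqid, start - 1, end, name))
    return bed


# Hypothetical two-gene annotation.
example_gff = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t101\t500\t.\t+\t.\tID=gene1;Name=abc",
    "chr1\tsrc\tmRNA\t101\t500\t.\t+\t.\tID=rna1;Parent=gene1",
    "chr2\tsrc\tgene\t1\t90\t.\t-\t.\tID=gene2",
]
bed_rows = gff_genes_to_bed(example_gff)
```

The gene names written to the bed file must match the renamed peptide headers exactly, which is why the second conversion step renames the FASTA sequences to the same identifiers.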
Gene family expansion and contraction analysis
A CAFE5 (Mendes et al. 2021) pipeline was used to calculate the gene family evolutionary rate Lambda (λ). The species tree constructed by OrthoFinder was first converted into an ultrametric tree using r8s (Sanderson 2003). A calibration of divergence time of 165 million years ago (MYA) was set between Drosophila melanogaster and Hermetia illucens based on a previous study (Wiegmann et al. 2011). Input control file for r8s was generated using the implemented python script in CAFE5 (https://github.com/hahnlab/CAFE5/blob/master/docs/tutorial/prep_r8s.py). The output tree file of r8s was used as the input of CAFE5 pipeline to estimate the evolution rate of gene families.
We used two types of models in CAFE5 to find the best fit to the dataset. First, three base models were used to determine the appropriate number of Lambda values: a model where all terminal and non-terminal nodes across the phylogeny share a single Lambda value (number of Lambda: 1), a model where Stratiomyidae, Asilidae and the outgroup each have their own Lambda value (number of Lambda: 3), and a model in which each node has a unique Lambda value (number of Lambda: 28). Each model setting was run three times, and the second model setting was chosen for the next round of testing based on its highest final likelihood (Supplementary Table 1). In the second round of test runs, Gamma models instead of base models were used to allow different gene families to belong to different evolutionary rate categories, where the Gamma value represents the number of categories. The Gamma value (-k) was set to 1, 2, 3, 4 and 5, and in all models Stratiomyidae, Asilidae and the outgroup each had their own Lambda value (number of Lambda: 3). Each model was run three times, and the one with 5 Gamma categories had the highest average likelihood (Supplementary Table 1). The best model setting was then run another seven times. Five of the ten runs had a high overall failure rate, in which at least one gene family had a failure rate >20%, and were therefore not considered when choosing the best run. The run with the highest likelihood and without a high failure rate was chosen as the final result (Replicate 10, Supplementary Table 1). All raw output files in the output directory of CAFE5 were used as input for CafePlotter (https://github.com/moshi4/CafePlotter) for visualization.
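The replicate selection rule described above (discard runs with a high failure rate, then keep the run with the highest likelihood) can be sketched as follows. The dictionary keys, the likelihood values and the function name are hypothetical and do not correspond to CAFE5 output fields. The sketch assumes log-likelihoods where higher is better; if a program reports the negative log-likelihood instead, the sign would need to be flipped first.

```python
def select_best_run(runs, max_failure_rate=0.20):
    """Pick the replicate with the highest log-likelihood among eligible runs.

    runs: list of dicts with keys 'name', 'log_likelihood' and
          'max_failure_rate' (the highest per-family failure rate observed
          in that run). All keys are hypothetical.
    max_failure_rate: runs above this threshold are excluded.
    """
    eligible = [r for r in runs if r["max_failure_rate"] <= max_failure_rate]
    if not eligible:
        raise ValueError("no run passed the failure-rate filter")
    return max(eligible, key=lambda r: r["log_likelihood"])


# Made-up replicates for illustration only.
replicates = [
    {"name": "rep1", "log_likelihood": -250100.0, "max_failure_rate": 0.35},
    {"name": "rep2", "log_likelihood": -250400.0, "max_failure_rate": 0.05},
    {"name": "rep3", "log_likelihood": -250250.0, "max_failure_rate": 0.10},
]
best = select_best_run(replicates)
```

In this toy example rep1 has the highest likelihood but is excluded by the failure-rate filter, so rep3 is selected.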
Functional enrichment analysis
Duplicated genes were extracted from the OrthoFinder output and used for functional enrichment analysis in the Database for Annotation, Visualization, and Integrated Discovery (DAVID) (Sherman et al. 2022). For Hermetia illucens genes, gene IDs in its annotation were used directly as the input gene list. For orthogroups with duplications on the most recent common ancestor nodes of Stratiomyidae and Asilidae, enrichment analyses were performed using their orthologue gene IDs in the Drosophila melanogaster annotation.
Outputs from the enrichment analysis were downloaded as text files and visualized with jvenn (Bardou et al. 2014) and ggplot2.
Results
Genome size expansion in Stratiomyidae
We first summarized BUSCO scores of all reference genomes to explore consistency between genomes assembled and annotated by different pipelines. When compared against a database of 3285 Dipteran BUSCO genes, the 14 reference genomes reached an average completeness of 96.20% (SD = 0.0102), showing a consistent level of assembly quality across multiple pipelines (Supplementary Fig. 1).
On average, Stratiomyidae (mean genome size = 0.721 gigabases) have larger genomes than Asilidae (mean genome size = 0.559 gigabases) (Table 1, Supplementary Fig. 2). However, Dioctria linearis and D. rufipes from the Asilidae family have the largest genomes of all species in the study. The average gene number of Stratiomyidae species (15,972.83) also exceeds that of Asilidae (11,714.25), with the exceptions of Stratiomys singularior and Beris chalybata. Interestingly, despite having large genomes, the two Dioctria species do not possess more genes than other Asilidae species in the dataset (Table 1).
Genome synteny analysis shows many chromosomal rearrangements across the phylogeny (Fig. 1, Supplementary Fig. 2). Even between species from the same genus there are a large number of inversion, fission and fusion events (Supplementary Table 2).
Syntenic blocks are divided and aligned based on the order of chromosomes in Hermetia illucens. The species tree was rooted using outgroup Drosophila melanogaster, which is not shown in the tree. Barplots at the bottom of species names represent the proportion of different types of transposable elements on the genome. Simple repeats, microsatellites and repeat RNAs were combined as “Other” in the barplots. Stratiomyidae and Asilidae families were marked with green and orange branch colors in the phylogeny, respectively. The species tree was calibrated based on estimated divergence time between Drosophila melanogaster and Stratiomyidae (MYA) (Wiegmann et al. 2011).
Variation of repetitive elements across the phylogeny
In general, Stratiomyidae have a higher proportion of repetitive elements compared to most Asilidae species except for the two Dioctria spp. (Fig. 1, Supplementary Table 3). The proportion of repetitive elements in the genome shows a moderate negative correlation with BUSCO completeness (Pearson’s r = −0.63, p = 0.015) (Supplementary Fig. 3A), indicating potential effects of assembly quality on the repeat identification process. However, compared to BUSCO completeness, the correlation between genome size and TE proportion is stronger (Pearson’s r = 0.84, p = 0.00018) (Supplementary Fig. 3B). There is considerable variation in types of repetitive elements across the phylogeny. For example, long terminal repeats (LTRs) are common in Asilidae genomes, but not in Stratiomyidae. Among Stratiomyidae, DNA repeats dominated in Stratiomys singularior and Microchrysa polita, but were a relatively low proportion of the repeats in Hermetia illucens, where long interspersed nuclear elements (LINEs) were more common compared to other subclasses (Fig. 1, Supplementary Table 3).
When divided into subclasses, the majority of repetitive elements show a pattern of recent activity, indicated by the peaks of closely related copies near the right side of the X-axis (Supplementary Fig. 4). For all species in the dataset, the dominant repetitive elements tend to show lower Kimura 2-Parameter distances (Supplementary Fig. 4), indicating recent emergence of those repeats. Although the proportion of TEs is generally larger in Stratiomyidae than in Asilidae (Welch Two Sample t-test, p = 0.01317), the types and divergence times of TEs do not show any strong phylogenetic pattern, consistent with very rapid turnover.
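The Welch two-sample t-test used for the between-family comparison does not assume equal variances in the two groups. A minimal sketch of the test statistic and the Welch-Satterthwaite degrees of freedom is given below; the input values are made up for illustration and are not the TE proportions from Supplementary Table 3. Converting t and the degrees of freedom into a p value requires the t distribution, which is omitted here.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard errors of the two sample means.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical values for two groups of five observations each.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

The fractional degrees of freedom are the distinguishing feature of the Welch test compared with the pooled-variance Student's t-test.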
Gene duplication events and gene family evolution
Using Drosophila melanogaster as an outgroup, 201,275 out of 211,239 genes (95.3%) were assigned into 15,964 orthogroups, among which 1707 were species-specific with orthologues found in only one species, and 6653 had orthologues in all selected species in the dataset (Supplementary Table 4).
Gene duplication events were counted for each of the orthogroups. 32,493 gene duplication events were identified across the phylogenetic tree (Fig. 2, Supplementary Table 5). Similar to the TE proportion, gene duplication events also showed a moderate negative correlation with BUSCO completeness (Pearson’s r = −0.61, p = 0.019) (Supplementary Fig. 3C), while the number of gene duplication events identified is strongly correlated with the total number of genes (Pearson’s r = 0.90, p = 0.0000093) (Supplementary Fig. 3D).
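The Pearson correlation coefficients reported here follow the usual definition, the covariance of the two variables divided by the product of their standard deviations. The sketch below shows that calculation on made-up numbers; it is not the analysis code used in the study, and the function name is an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values, for illustration only.
r = pearson_r([1, 2, 3], [1, 3, 2])
```

With 14 genomes the coefficient rests on few data points, so single outlying species can have a large influence on r.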
Numbers of duplications on each node are marked on the right side of the node. Only gene duplication events with support higher than 0.5 were considered. The most recent common ancestor nodes of Stratiomyidae and Asilidae are marked as “N_Stratiomyidae” and “N_Asilidae”, respectively. Stratiomyidae and Asilidae families are marked with green and orange branch colors in the phylogeny, respectively. The species tree was calibrated based on the estimated divergence time between Drosophila melanogaster and Stratiomyidae (MYA) (Wiegmann et al. 2011).
Most of the duplications were on terminal nodes. Compared to Asilidae, Stratiomyidae have more gene duplication events, particularly on terminal nodes. Stratiomys singularior has the lowest number of terminal duplication events among the Stratiomyidae species.
To study the function of duplicated genes, genes within orthogroups with duplication events showing support values > 0.5 on the most recent common ancestor node of Stratiomyidae (“N_Stratiomyidae” in Fig. 2) and Asilidae (“N_Asilidae” in Fig. 2) were extracted, and GO and KEGG enrichment analyses were performed on those genes. Although several functional terms showing gene duplications were shared between families, the number of overlapping orthogroups was limited. Only 27 orthogroups were shared among the 180 orthogroups that had high support for duplication events in the common ancestor of Asilidae and the 338 in Stratiomyidae (Fig. 3A). There was greater overlap between orthogroups with duplications in the common ancestor of Stratiomyidae and those in Hermetia illucens (Fig. 3A). After excluding the shared orthogroups, duplicated genes on the terminal node of Hermetia illucens showed a different pattern compared to those duplicated in the common ancestor of Stratiomyidae. Aside from proteolysis, which still had the highest number of enriched genes, more genes were enriched in immune and antibacterial responses and olfaction (Fig. 3B). In KEGG pathways, the most enriched terms are metabolism-related, while “Toll and Imd signaling pathway”, involved in the immune response, is also enriched (Fig. 3B). These terms are consistent with adaptation to the decomposition-related life history of H. illucens.
A Venn diagram of orthogroups with duplication events. B GO biological process and KEGG enriched functional terms of Hermetia illucens specific duplicated genes. C GO biological process and KEGG enriched functional terms of duplicated genes in the most recent common ancestor node of Stratiomyidae. D GO biological process and KEGG enriched functional terms of duplicated genes in the most recent common ancestor node of Asilidae. The length of the bars represents number of genes that are enriched onto each functional term, and p value was represented by color.
In both Stratiomyidae and Asilidae, the enriched biological process term with the highest gene count is “proteolysis” (GO:0006508, p = 3.43E-19 for Stratiomyidae and p = 1.66E-29 for Asilidae). “Lipid metabolic process” (GO:0006629, p = 1.01E-06 for Stratiomyidae and p = 1.55E-10 for Asilidae) and “lipid catabolic process” (GO:0016042, p = 3.73E-04 for Stratiomyidae and p = 4.18E-07 for Asilidae) are also present in both families. Duplicated genes in the common ancestor of Stratiomyidae are mainly enriched for metabolism related biological processes and pathways (Fig. 3C, Supplementary Table 6), while duplicated genes in the common ancestor of Asilidae are more enriched for functions related to responses to environmental stimulations, longevity regulation and protein refolding (Fig. 3D, Supplementary Table 6).
We used the gene family evolutionary rate Lambda (λ), which is the probability of any gene to be gained or lost in a gene family, as calculated in the CAFE5 pipeline (Mendes et al. 2021), to infer the pattern of expansion and contraction of orthogroups across the phylogeny. Each orthogroup assigned in the previous analysis is considered as a “gene family” and its size was mapped to each node on the species tree. Two types of models were tested in the CAFE5 pipeline: the default base model and the Gamma model where each gene family is allowed to belong to a different evolutionary rate category. After test runs with three independent replicates, the model that best fit the dataset is a 3-category Gamma model where Stratiomyidae, Asilidae and outgroup (Drosophila melanogaster) have their own Lambda values, and each node within the same Stratiomyidae or Asilidae shares the same Lambda (Supplementary Table 1). Under this model, the Lambda value for Stratiomyidae is 0.0052, and the Lambda value for Asilidae is 0.0023 (Fig. 4), suggesting a higher gene family evolutionary rate in Stratiomyidae species.
Number on the left side of each node represents the gene birth-death parameter λ of this node. Numbers with “+” and “-” beside each node represent the total number of expanded (“+”) and contracted (“−”) orthogroups on this node. Stratiomyidae and Asilidae families were marked with green and orange branch colors in the phylogeny, respectively. The species tree was calibrated based on an estimated divergence time between Drosophila melanogaster and Stratiomyidae (MYA) (Wiegmann et al. 2011). Dashed lines are not included as a part of the branch length.
In Hermetia illucens, 70 orthogroups showed significant (p < 0.05) changes from the last common ancestor node (Supplementary Table 7), most of which were expansions rather than contractions. These significantly expanded orthogroups include several gene families involved in immune response (OG0000160 and OG0000220) and olfaction (OG0000020 and OG0000025) (Supplementary Table 7, Supplementary Table 8).
In the top twenty expanded orthogroups in Hermetia illucens, only one (OG0000005) has a lower copy number compared to its closest relative Microchrysa polita (Supplementary Table 8). In general, when compared to the most recent common ancestor node of Stratiomyidae, many of these orthogroups expanded in both species, but their expansions in H. illucens are larger than in Microchrysa polita.
Discussion
The black soldier fly Hermetia illucens has become a major commercial insect species used to consume organic waste. In addition to its value in the industry, it also provides an interesting example of rapid range expansion and human-mediated evolution. By comparing high quality chromosome-level reference genomes from the Stratiomyidae and Asilidae families, we provide evidence for the adaptation of H. illucens and its close relatives to their specific life history. We have shown recent activity of TEs, as well as large amounts of gene duplication on terminal nodes, high gene birth-death rate and expansions of metabolism and immune related gene families.
The quality of genome assembly can impact downstream analysis in comparative genomic studies (Florea et al. 2011; Mariene and Wasmuth 2025). In our study, most genomes used in the analyses were generated using the same pipeline in the Darwin Tree of Life project (https://www.darwintreeoflife.org/), except those of Hermetia illucens and Drosophila melanogaster (Table 1). However, the annotations of Microchrysa polita, Chorisops tibialis, Beris morrisii and Beris chalybata were generated without RNA-Seq data (Table 1). There was also slight variation in BUSCO completeness across the dataset (Table 1). This variation shows a moderate negative correlation with both TE proportion and gene duplication events (Supplementary Fig. 3A, C), indicating potential effects of assembly quality on SV detection, such as fragmented gene sections being identified as inserted repetitive elements or duplicated orthologues. However, TE proportion is more strongly correlated with genome size than with BUSCO completeness (Supplementary Fig. 3A, B). Similarly, the number of gene duplication events is more strongly correlated with total gene number than with BUSCO completeness (Supplementary Fig. 3C, D). Although some effects of assembly quality are likely, their extent remains difficult to disentangle from other factors in the current dataset.
TEs have been suggested to be an important factor driving insect evolution (Gilbert et al. 2021). Zhou et al. found that genome size expansion is associated with TE proportion, and that the expansion of TEs is enabled by DNA methylation (Zhou et al. 2020). The accumulation of TEs might also occur during geographical range expansion (Jiang et al. 2024). We found varying proportions of TEs ranging from 26.85% of the genome in Leptogaster cylindrica to 75.18% in Chorisops tibialis (Supplementary Table 3), which corresponds to previous findings in Diptera where the proportion of TEs varies widely across the phylogeny (Petersen et al. 2019). The TEs in Stratiomyidae and Asilidae vary not only in abundance but also in divergence time. In all species, DNA elements, LINEs and LTRs show recent activity, represented by the peaks of base pair numbers with Kimura 2-Parameter Distance near zero (near the right side of the X-axis, Supplementary Fig. 4). This suggests a recent expansion of these TEs in both the Stratiomyidae and Asilidae families. In Hermetia illucens, the activity of LINEs is consistent across the X-axis (Supplementary Fig. 4K), suggesting that this TE superfamily is not only present on a recent timescale, but has always been a major component of the TEs in the H. illucens genome through its evolutionary history. However, in Chorisops tibialis (Supplementary Fig. 4L), SINEs also show activity at higher Kimura 2-Parameter Distance, while the same TE superfamily does not exist in Dioctria rufipes (Supplementary Fig. 4H), Stratiomys singularior (Supplementary Fig. 4I), and Beris morrisii (Supplementary Fig. 4M). In general, the types of TEs in Stratiomyidae and Asilidae correspond to phylogenetic relationships, with more similar proportions of TE types found between more closely related species (Fig. 1), which is consistent with previous findings in the majority of insects (Gilbert et al. 2021).
However, the distribution of Kimura 2-Parameter Distance of different TE types does not show a similar phylogenetic pattern, consistent with the patterns found in Drosophila (Petersen et al. 2019). Despite the limited sample size and taxon coverage in our study, there was a significant positive linear correlation between TE proportion and genome size in Stratiomyidae and Asilidae species (Supplementary Fig. 3B), indicating a likely contribution of TE expansion to genome size changes in the two families, consistent with previous findings in other organisms (Marburger et al. 2018; Naville et al. 2019; Wong et al. 2019; Zhou et al. 2020; Oggenfuss et al. 2021). The recent expansion of TEs may also contribute to adaptation to new environments and stress responses (Chénais et al. 2012), which are also indicated by the enriched functional terms in Fig. 3.
Domestication represents an excellent system in which to study recent adaptation. Hermetia illucens is a putative example of recent domestication, with populations brought into captivity repeatedly around the world. Changes of food supply often lead to metabolism-related functional shifts in domesticated animals (Gering et al. 2019). In other species, domestication is associated with changes in diet (Luca et al. 2010; Axelsson et al. 2013), immunity (Chen et al. 2017) and digestive system gene functions (Axelsson et al. 2013; Chen et al. 2017; Pajic et al. 2019; Glazko et al. 2021). The individual of Hermetia illucens that was sequenced for the reference genome was from an industrial strain (Generalovic et al. 2021), so we cannot here distinguish between adaptations associated with the species generally and those that have happened since its domestication. It is possible that duplicated genes related to carbohydrate metabolic process that are not shared by other Stratiomyidae species (Fig. 3B) are related to the recent domestication of H. illucens, involving adaptation to a more starch-rich diet, similar to previous findings in other domesticated mammals (Axelsson et al. 2013; Li et al. 2013; Ollivier et al. 2016; Reiter et al. 2016; Lye and Purugganan 2019; Pajic et al. 2019). In addition to carbohydrate, several other metabolic pathways can also be found in the enriched functional terms, such as drug, cysteine, methionine, sucrose and even caffeine metabolism (Fig. 3B) in H. illucens. These enriched pathways could be associated with higher efficiency of H. illucens on certain diets compared to other Stratiomyidae species, and potential influences of the domestication process. Two major functions, olfactory sensory and immune responses, are specifically enriched in H. illucens compared to the most recent common ancestor node of Stratiomyidae (Fig. 3A, Table 2). 
Two families of olfactory genes, odorant receptor 2a, 33b, and 59a (Or2a, Or33b and Or59a, OG0000020) and general odorant-binding protein 99a (Obp99a, OG0000025), both show double the copy number in H. illucens compared to its close relative Microchrysa polita (Supplementary Table 8). Similar patterns can also be found in the genes encoding peptidoglycan recognition protein 3 (PGLYRP3, OG0000136), cecropin (CecA1, OG0000160) and gram-negative bacteria-binding protein 3 (GNBP3, OG0000220) (Supplementary Table 8). The expansion of olfactory and immune related genes in H. illucens can potentially explain why it has become so widely used in bioconversion of organic waste (Tomberlin and van Huis 2020) and is often seen as the “dominant” species in compost piles. In addition, the enrichment of immune-related functions might also contribute to the symbiosis between H. illucens and its diverse range of gut microbiota (Jiang et al. 2019; Zhan et al. 2020; Eke et al. 2023; Luo et al. 2023; Yu et al. 2023). All these offer clues into how H. illucens has adapted to human influenced diets and a composting environment compared to its Stratiomyidae relatives.
Despite the differences in adult diet, habitat and life span, the top hit in biological process GO terms is proteolysis (Fig. 3D) for both the Stratiomyidae and Asilidae ancestral nodes. This suggests that some specific gene families and their functions may have been involved in adaptation to different life history traits. The same pattern can be observed for lipid metabolic process (Fig. 3C, D). However, more of the enriched functional terms in Asilidae are related to responses to external stimuli, including response to stress, heat and hypoxia (Fig. 3D). One of the top hits in the KEGG pathway category in Asilidae, the longevity regulating pathway (Fig. 3D), is not enriched among genes that are duplicated in Stratiomyidae, which may be associated with the longer adult life span in Asilidae compared to Stratiomyidae.
Among the gene families that had significantly changed in H. illucens, the largest expansion appears in the CYP (cytochrome P450) gene family (Supplementary Table 7). The only CYP-related orthogroup among the top 20 expanded orthogroups in the H. illucens lineage, OG0000004, experienced recurrent expansions from the common ancestor node between Stratiomyidae and Asilidae to the terminal node of H. illucens (Supplementary Fig. 5), with 66 copies in H. illucens compared to 35 in its close relative Microchrysa polita. The CYP family has been studied as a major contributor to insect genome evolution (Feyereisen 2006), and is involved in important functions such as ecdysteroid metabolism (Feyereisen 1999; Iga and Kataoka 2012), detoxification (Scott et al. 1998; Chandor-Proust et al. 2013; Cui et al. 2016; Lu et al. 2021; Xing et al. 2021) and sex pheromone biosynthesis (Fujii et al. 2020; Wang et al. 2024). Of the 66 OG0000004 copies, 59 are located within a single region from 173348395 bp to 174874698 bp on Chromosome 1. Except for LOC119646713, which was not characterized in the reference genome annotation, these genes all belong to the CYP3 clade of CYP genes, which has been found to be the most abundant of the CYP clades and to have experienced recent expansions (Feyereisen 2006). Although the specific functions of CYP gene families in H. illucens are unknown, it is possible that the repeated expansions of these gene families during the speciation process played a role in its adaptation, especially to human-influenced diets and environments where resistance to toxins is essential to survival.
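As a rough illustration of the arithmetic behind these figures, the following Python sketch summarizes the OG0000004 cluster using only the numbers quoted above. The function and variable names are hypothetical, and the assumption that the coordinates are 1-based and inclusive is ours rather than something stated in the annotation; the sketch is not part of the analysis pipeline.

```python
# Values quoted in the text for orthogroup OG0000004.
# Assumption: coordinates are 1-based and inclusive (not stated in the source).
CLUSTER_START = 173_348_395   # bp, Chromosome 1
CLUSTER_END = 174_874_698     # bp, Chromosome 1
COPIES_IN_CLUSTER = 59        # copies located inside the region
COPIES_H_ILLUCENS = 66        # total copies in H. illucens
COPIES_M_POLITA = 35          # total copies in Microchrysa polita


def cluster_summary(start, end, in_cluster, total, relative_total):
    """Summarize a tandem gene cluster (hypothetical helper)."""
    span_bp = end - start + 1  # inclusive span under the stated assumption
    return {
        "span_bp": span_bp,
        "span_mb": span_bp / 1e6,
        # share of all copies that sit inside the region
        "fraction_in_cluster": in_cluster / total,
        # copy number relative to the close relative
        "fold_vs_relative": total / relative_total,
        # average amount of sequence per clustered copy
        "mean_bp_per_copy": span_bp / in_cluster,
    }


summary = cluster_summary(
    CLUSTER_START,
    CLUSTER_END,
    COPIES_IN_CLUSTER,
    COPIES_H_ILLUCENS,
    COPIES_M_POLITA,
)

for key, value in summary.items():
    print(f"{key}: {value:.3f}")
```

Under these assumptions the region spans roughly 1.53 Mb and holds about 89% of the copies, which is consistent with most of the expansion having arisen through local tandem duplication, although the sketch itself cannot test that.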
Apart from those gene families that were expanded, several gene families have been completely lost in H. illucens (Supplementary Table 7). One example is the gene family that includes Or83c and its orthologues (OG0000615). In Drosophila melanogaster, this odorant receptor is related to sensitivity to farnesol, a volatile present in the rinds of many ripe citrus fruits (Ronderos et al. 2014). The fact that this gene and its orthologues were lost while other groups of olfactory receptor genes such as Or2a, Or59a, Or33a, Or33b, Or33c, Obp44a, Obp83g, Obp99a, Obp99b and Obp99c are substantially expanded in H. illucens suggests potential species-specific diet preference and diversification in olfactory-related gene functions.
Conclusions
Structural variants, and gene duplications in particular, are a driving force of functional adaptation. Using a comparative genomics approach, we show evidence of functional adaptation of the soldier flies (Stratiomyidae) and the robber flies (Asilidae) to their habitats and life history. We found a generally larger proportion of TEs in Stratiomyidae compared to Asilidae, and the majority of these are recently expanded. More gene duplication events were found on terminal nodes in Stratiomyidae than in Asilidae, and only 27 duplicated orthogroups are shared between the most recent common ancestors of these two families. We found 120 duplicated orthogroups that are shared between the most recent common ancestor of Stratiomyidae and H. illucens, indicating functional similarity of gene duplications. Certain genes involved in metabolism-related functions in H. illucens are also duplicated in the common ancestor of Stratiomyidae, but more duplicated genes are related to olfactory sensory and immune responses in H. illucens, suggesting functional specialization in gene families that are beneficial to the decomposing life history in this lineage. Together, our results provide insights into the relationship between structural variation, especially gene duplications, and functional adaptation in two diverse families. We also provide directions for future experimental validation of these gene families and their specific roles in functional pathways such as immune response, proteolysis and other metabolic processes.
Data archiving
Not applicable. All raw data used in this study was obtained from public databases.
References
Aken BL, Ayling S, Barrell D, Clarke L, Curwen V, Fairley S, Fernandez Banet J, Billis K, García Girón C, Hourlier T et al. (2016) The Ensembl gene annotation system. Database 2016:baw093.
Alkan C, Coe BP, Eichler EE (2011) Genome structural variation discovery and genotyping. Nat Rev Genet 12:363–376.
Axelsson E, Ratnakumar A, Arendt M-L, Maqbool K, Webster MT, Perloski M, Liberg O, Arnemo JM, Hedhammar Å, Lindblad-Toh K (2013) The genomic signature of dog domestication reveals adaptation to a starch-rich diet. Nature 495:360–364.
Bardou P, Mariette J, Escudié F, Djemiel C, Klopp C (2014) jvenn: an interactive Venn diagram viewer. BMC Bioinform 15: 293.
Baril T, Galbraith J, Hayward A (2024) Earl grey: a fully automated user-friendly transposable element annotation and analysis pipeline. Mol Biol Evol 41:msae068.
Berdan EL, Aubier TG, Cozzolino S, Faria R, Feder JL, Giménez MD, Joron M, Searle JB, Mérot C (2024) Structural variants and speciation: multiple processes at play. Cold Spring Harb Perspect Biol 16:a041446.
Birchler JA, Yang H (2022) The multiple fates of gene duplications: deletion, hypofunctionalization, subfunctionalization, neofunctionalization, dosage balance constraints, and neutral variation. Plant Cell 34:2466–2474.
Brien MN, Orteu A, Yen EC, Galarza JA, Kirvesoja J, Pakkanen H, Wakamatsu K, Jiggins CD, Mappes J (2023) Colour polymorphism associated with a gene duplication in male wood tiger moths. eLife 12:e80116.
Chandor-Proust A, Bibby J, Régent-Kloeckner M, Roux J, Guittard-Crilat E, Poupardin R, Riaz MA, Paine M, Dauphin-Villemant C, Reynaud S et al. (2013) The central role of mosquito cytochrome P450 CYP6Zs in insecticide detoxification revealed by functional expression and structural modelling. Biochem J 455:75–85.
Chen X, Wang J, Qian L, Gaughan S, Xiang W, Ai T, Fan Z, Wang C (2017) Domestication drive the changes of immune and digestive system of Eurasian perch (Perca fluviatilis). PLOS ONE 12:e0172903.
Chénais B, Caruso A, Hiard S, Casse N (2012) The impact of transposable elements on eukaryotic genomes: from genome size increase to genetic adaptation to stressful environments. Gene 509:7–15.
Cohen ZP, Schoville SD, Hawthorne DJ (2023) The role of structural variants in pest adaptation and genome evolution of the Colorado potato beetle, Leptinotarsa decemlineata (Say). Molecular Ecol 32:1425–1440.
Crowley L, Akinmusola R, University of Oxford and Wytham Woods Genome Acquisition Lab, Darwin Tree of Life Barcoding collective, Wellcome Sanger Institute Tree of Life Management, Samples and Laboratory team, Wellcome Sanger Institute Scientific Operations: Sequencing Operations, Wellcome Sanger Institute Tree of Life Core Informatics team, Tree of Life Core Informatics collective, Darwin Tree of Life Consortium (2024a) The genome sequence of the common awl robberfly, Neoitamus cyanurus (Loew, 1849) [version 2; peer review: 3 approved]. Wellcome Open Res 9.
Crowley L, Akinmusola R, University of Oxford and Wytham Woods Genome Acquisition Lab, Darwin Tree of Life Barcoding collective, Wellcome Sanger Institute Tree of Life Management, Samples and Laboratory team, Wellcome Sanger Institute Scientific Operations: Sequencing Operations, Wellcome Sanger Institute Tree of Life Core Informatics team, Tree of Life Core Informatics collective, Darwin Tree of Life Consortium (2024b) The genome sequence of the yellow-legged black legionnaire, Beris chalybata (Forster, 1771) [version 2; peer review: 2 approved]. Wellcome Open Res 9.
Crowley L, Garland S, Akinmusola R, University of Oxford and Wytham Woods Genome Acquisition Lab, Darwin Tree of Life Barcoding collective, Wellcome Sanger Institute Tree of Life Management, Samples and Laboratory team, Wellcome Sanger Institute Scientific Operations: Sequencing Operations, Wellcome Sanger Institute Tree of Life Core Informatics team, Tree of Life Core Informatics collective, Darwin Tree of Life Consortium (2024) The genome sequence of the small yellow-legged robberfly, Dioctria linearis (Fabricius, 1787) [version 1; peer review: 2 approved with reservations]. Wellcome Open Res 9.
Crowley L, Sivell O, Sivell D, Mitchell R, Newell R, University of Oxford and Wytham Woods Genome Acquisition Lab, Natural History Museum Genome Acquisition Lab, Darwin Tree of Life Barcoding collective, Wellcome Sanger Institute Tree of Life Management, Samples and Laboratory team, Wellcome Sanger Institute Scientific Operations: Sequencing Operations, Wellcome Sanger Institute Tree of Life Core Informatics team, Tree of Life Core Informatics collective, Darwin Tree of Life Consortium (2024) The genome sequence of the Red-legged Robberfly, Dioctria rufipes (Scopoli, 1763) [version 1; peer review: 1 approved]. Wellcome Open Res 9.
Crowley L, University of Oxford and Wytham Woods Genome Acquisition Lab, Darwin Tree of Life Barcoding collective, Wellcome Sanger Institute Tree of Life Management, Samples and Laboratory team, Wellcome Sanger Institute Scientific Operations: Sequencing Operations, Wellcome Sanger Institute Tree of Life Core Informatics team, Tree of Life Core Informatics collective, Darwin Tree of Life Consortium (2024) The genome sequence of the Black-horned Gem soldier fly, Microchrysa polita (Linnaeus, 1758) [version 1; peer review: 1 approved]. Wellcome Open Res 9.
Cui S, Wang L, Ma L, Geng X (2016) P450-mediated detoxification of botanicals in insects. Phytoparasitica 44:585–599.
Dort H, van der Bijl W, Wahlberg N, Nylin S, Wheat CW (2024) Genome-wide gene birth–death dynamics are associated with diet breadth variation in lepidoptera. Genome Biol Evol 16:evae095.
Eke M, Tougeron K, Hamidovic A, Tinkeu LSN, Hance T, Renoz F (2023) Deciphering the functional diversity of the gut microbiota of the black soldier fly (Hermetia illucens): recent advances and future challenges. Anim Microbiome 5:40.
Emms DM, Kelly S (2018) STAG: species tree inference from all genes. Preprint at bioRxiv: https://doi.org/10.1101/267914.
Emms DM, Kelly S (2019) OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol 20: 238.
Feyereisen R (1999) Insect P450 enzymes. Annu Rev Entomol 44:507–533.
Feyereisen R (2006) Evolution of insect P450. Biochem Soc Trans 34:1252–1255.
Florea L, Souvorov A, Kalbfleisch TS, Salzberg SL (2011) Genome assembly has a major impact on gene content: a comparison of annotation in two Bos Taurus Assemblies. PLOS ONE 6:e21400.
Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, Smit AF (2020) RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci USA 117:9451–9457.
Fujii T, Rong Y, Ishikawa Y (2020) Epoxidases Involved in the biosynthesis of type II sex pheromones. In: Ishikawa Y (ed). Insect sex pheromone research and beyond: from molecules to robots. Springer, Singapore. p 169–181.
Generalovic TN, McCarthy SA, Warren IA, Wood JMD, Torrance J, Sims Y, Quail M, Howe K, Pipan M, Durbin R et al. (2021) A high-quality, chromosome-level genome assembly of the Black Soldier Fly (Hermetia illucens L.). G3: Genes, Genomes, Genetics 11:jkab085.
Gering E, Incorvaia D, Henriksen R, Wright D, Getty T (2019) Maladaptation in feral and domesticated animals. Evolutionary Appl 12:1274–1286.
Gilbert C, Peccoud J, Cordaux R (2021) Transposable elements and the evolution of insects. Annual Rev Entomol 66:355–372.
Glazko VI, Zybaylov BL, Kosovsky YG, Glazko GV, Glazko TT (2021) Domestication and microbiome. Holocene 31:1635–1645.
Hahn MW (2009) Distinguishing among evolutionary models for the maintenance of gene duplicates. J Heredity 100:605–617.
Iga M, Kataoka H (2012) Recent studies on insect hormone metabolic pathways mediated by cytochrome P450 enzymes. Biol Pharm Bull 35:838–843.
Jiang C-L, Jin W-Z, Tao X-H, Zhang Q, Zhu J, Feng S-Y, Xu X-H, Li H-Y, Wang Z-H, Zhang Z-J (2019) Black soldier fly larvae (Hermetia illucens) strengthen the metabolic function of food waste biodegradation by gut microbiome. Microb Biotechnol 12:528–543.
Jiang J, Xu Y-C, Zhang Z-Q, Chen J-F, Niu X-M, Hou X-H, Li X-T, Wang L, Zhang YE, Ge S et al. (2024) Forces driving transposable element load variation during Arabidopsis range expansion. Plant Cell 36:840–862.
Joron M, Frezal L, Jones RT, Chamberlain NL, Lee SF, Haag CR, Whibley A, Becuwe M, Baxter SW, Ferguson L et al. (2011) Chromosomal rearrangements maintain a polymorphic supergene controlling butterfly mimicry. Nature 477:203–206.
Katju V, Bergthorsson U (2013) Copy-number changes in evolution: rates, fitness effects and adaptive significance. Front Genet 4:273.
Kuzmin E, Taylor JS, Boone C (2022) Retention of duplicated genes in evolution. Trends Genet 38:59–72.
Li M, Tian S, Jin L, Zhou G, Li Y, Zhang Y, Wang T, Yeung CKL, Chen L, Ma J et al. (2013) Genomic analyses identify distinct patterns of selection in domesticated pigs and Tibetan wild boars. Nat Genet 45:1431–1438.
Lovell JT, Sreedasyam A, Schranz ME, Wilson M, Carlson JW, Harkess A, Emms D, Goodstein DM, Schmutz J (2022) GENESPACE tracks regions of interest and gene copy number variation across multiple genomes. eLife 11:e78526.
Lu K, Song Y, Zeng R (2021) The role of cytochrome P450-mediated detoxification in insect adaptation to xenobiotics. Curr Opin Insect Sci 43:103–107.
Luca F, Perry GH, Rienzo AD (2010) Evolutionary adaptations to dietary changes. Annu Rev Nutr 30:291–314.
Luo X, Fang G, Chen K, Song Y, Lu T, Tomberlin JK, Zhan S, Huang Y (2023) A gut commensal bacterium promotes black soldier fly larval growth and development partly via modulation of intestinal protein metabolism. mBio 14:e01174-23.
Lye ZN, Purugganan MD (2019) Copy Number Variation In Domestication. Trends Plant Sci 24:352–365.
Magadum A, Banerjee U, Murugan P, Gangapur D, Ravikesavan R (2013) Gene duplication as a major force in evolution. J Genet 92:155–161.
Marburger S, Alexandrou MA, Taggart JB, Creer S, Carvalho G, Oliveira C, Taylor MI (2018) Whole genome duplication and transposable element proliferation drive genome expansion in Corydoradinae catfishes. Proc R Soc B: Biol Sci 285:20172732.
Mariene GM, Wasmuth JD (2025) Genome assembly variation and its implications for gene discovery in nematodes. Int J Parasitol 55:239–252.
McCulloch J, University of Oxford and Wytham Woods Genome Acquisition Lab, Darwin Tree of Life Barcoding collective, Wellcome Sanger Institute Tree of Life programme, Wellcome Sanger Institute Scientific Operations: DNA Pipelines collective, Tree of Life Core Informatics collective, Darwin Tree of Life Consortium (2023) The genome sequence of the yellow-legged black legionnaire, Beris morrisii (Dale, 1841) [version 1; peer review: 2 approved]. Wellcome Open Research 8.
Mendes FK, Vanderpool D, Fulton B, Hahn MW (2021) CAFE 5 models variation in evolutionary rates among gene families. Bioinformatics 36:5516–5518.
Mitchell R, Natural History Museum Genome Acquisition Lab, Darwin Tree of Life Barcoding collective, Wellcome Sanger Institute Tree of Life Management, Samples and Laboratory team, Wellcome Sanger Institute Scientific Operations: Sequencing Operations, Wellcome Sanger Institute Tree of Life Core Informatics team, Tree of Life Core Informatics collective, Darwin Tree of Life Consortium (2024a) The genome sequence of the Brown Heath Robberfly, Tolmerus cingulatus (Fabricius, 1781) [version 1; peer review: 2 approved]. Wellcome Open Research 9.
Mitchell R, Natural History Museum Genome Acquisition Lab, Darwin Tree of Life Barcoding collective, Wellcome Sanger Institute Tree of Life Management, Samples and Laboratory team, Wellcome Sanger Institute Scientific Operations: Sequencing Operations, Wellcome Sanger Institute Tree of Life Core Informatics team, Tree of Life Core Informatics collective, Darwin Tree of Life Consortium (2024b) The genome sequence of the Downland Robberfly, Machimus rusticus (Meigen, 1820) [version 1; peer review: 1 approved with reservations]. Wellcome Open Research 9.
Nash W, Halstead A, Natural History Museum Genome Acquisition Lab, Darwin Tree of Life Barcoding collective, Wellcome Sanger Institute Tree of Life Management, Samples and Laboratory team, Wellcome Sanger Institute Scientific Operations: Sequencing Operations, Wellcome Sanger Institute Tree of Life Core Informatics team, Tree of Life Core Informatics collective, Darwin Tree of Life Consortium (2024) The genome sequence of the Golden-tabbed robberfly, Eutolmus rufibarbis (Meigen, 1820) [version 1; peer review: 2 approved]. Wellcome Open Research 9.
Naville M, Henriet S, Warren I, Sumic S, Reeve M, Volff J-N, Chourrout D (2019) Massive changes of genome size driven by expansions of non-autonomous transposable elements. Curr Biol 29:1161–1168.e6.
Neph S, Kuehn MS, Reynolds AP, Haugen E, Thurman RE, Johnson AK, Rynes E, Maurano MT, Vierstra J, Thomas S et al. (2012) BEDOPS: high-performance genomic feature operations. Bioinformatics 28:1919–1920.
Nguyen TTX, Tomberlin JK, Vanlaerhoven S (2015) Ability of Black Soldier Fly (Diptera: Stratiomyidae) Larvae to Recycle Food Waste. Environ Entomol 44:406–410.
Noor MAF, Grams KL, Bertucci LA, Reiland J (2001) Chromosomal inversions and the reproductive isolation of species. Proc Natl Acad Sci USA 98:12084–12088.
Oggenfuss U, Badet T, Wicker T, Hartmann FE, Singh NK, Abraham L, Karisto P, Vonlanthen T, Mundt C, McDonald BA et al. (2021) A population-level invasion by transposable elements triggers genome expansion in a fungal pathogen. eLife 10:e69249.
Ollivier M, Tresset A, Bastian F, Lagoutte L, Axelsson E, Arendt M-L, Bălăşescu A, Marshour M, Sablin MV, Salanova L et al. (2016) Amy2B copy number variation reveals starch diet adaptations in ancient European dogs. Royal Soc Open Sci 3:160449.
Ou S, Jiang N (2019) LTR_FINDER_parallel: parallelization of LTR_FINDER enabling rapid identification of long terminal repeat retrotransposons. Mobile DNA 10: 48.
Pajic P, Pavlidis P, Dean K, Neznanova L, Romano R-A, Garneau D, Daugherity E, Globig A, Ruhl S, Gokcumen O (2019) Independent amylase gene copy number bursts correlate with dietary preferences in mammals. eLife 8:e44628.
Petersen M, Armisén D, Gibbs RA, Hering L, Khila A, Mayer G, Richards S, Niehuis O, Misof B (2019) Diversity and evolution of the transposable element repertoire in arthropods with particular reference to insects. BMC Ecol Evol 19:11.
Reiter T, Jagoda E, Capellini TD (2016) Dietary variation and evolution of gene copy number among dog breeds. PLOS ONE 11:e0148899.
Ronderos DS, Lin C-C, Potter CJ, Smith DP (2014) Farnesol-detecting olfactory neurons in Drosophila. J Neurosci 34:3959–3968.
Sanderson MJ (2003) r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics 19:301–302.
Scott JG, Liu N, Wen Z (1998) Insect cytochromes P450: diversity, insecticide resistance and tolerance to plant toxins1. Comparative Biochem Physiol Part C: Pharmacol Toxicol Endocrinol 121:147–155.
Sherman BT, Hao M, Qiu J, Jiao X, Baseler MW, Lane HC, Imamichi T, Chang W (2022) DAVID: a web server for functional enrichment analysis and functional annotation of gene lists (2021 update). Nucleic Acids Res 50:W216–W221.
Siddiqui SA, Ristow B, Rahayu T, Putra NS, Widya Yuwono N, Nisa’ K, Mategeko B, Smetana S, Saki M, Nawaz A et al. (2022) Black soldier fly larvae (BSFL) and their affinity for organic waste processing. Waste Manag 140:1–13.
Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM (2015) BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31:3210–3212.
Sivell O, McAlister E, Mitchell R, Falk S, Natural History Museum Genome Acquisition Lab, University of Oxford and Wytham Woods Genome Acquisition Lab, Darwin Tree of Life Barcoding collective, Wellcome Sanger Institute Tree of Life Management, Samples and Laboratory team, Wellcome Sanger Institute Scientific Operations: Sequencing Operations, Wellcome Sanger Institute Tree of Life Core Informatics team, et al. (2024) The genome sequence of a soldierfly, Chorisops tibialis (Meigen, 1820) [version 1; peer review: 1 approved, 1 approved with reservations]. Wellcome Open Research 9.
Sivell O, Sivell D, Mitchell R, Natural History Museum Genome Acquisition Lab, Darwin Tree of Life Barcoding collective, Wellcome Sanger Institute Tree of Life Management, Samples and Laboratory team, Wellcome Sanger Institute Scientific Operations: Sequencing Operations, Wellcome Sanger Institute Tree of Life Core Informatics team, Tree of Life Core Informatics collective, Darwin Tree of Life Consortium (2024) The genome sequence of a soldierfly, Stratiomys singularior (Harris, 1776) [version 1; peer review: 3 approved]. Wellcome Open Research 9.
Sivell O, Sivell D, Natural History Museum Genome Acquisition Lab, Darwin Tree of Life Barcoding collective, Wellcome Sanger Institute Tree of Life Management, Samples and Laboratory team, Wellcome Sanger Institute Scientific Operations: Sequencing Operations, Wellcome Sanger Institute Tree of Life Core Informatics team, Tree of Life Core Informatics collective, Darwin Tree of Life Consortium. (2024) The genome sequence of a robberfly Leptogaster cylindrica (De Geer, 1776) [version 1; peer review: 2 approved]. Wellcome Open Res 9.
Tarailo-Graovac M, Chen N (2009) Using repeatmasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinform 25:4.10.1–4.10.14.
Thomas S, Mitchell R, Crowley L, Natural History Museum Genome Acquisition Lab, University of Oxford and Wytham Woods Genome Acquisition Lab, Darwin Tree of Life Barcoding collective, Wellcome Sanger Institute Tree of Life programme, Wellcome Sanger Institute Scientific Operations: DNA Pipelines collective, Tree of Life Core Informatics collective, Darwin Tree of Life Consortium (2023) The genome sequence of the Kite-tailed Robberfly, Machimus atricapillus (Fallén, 1814) [version 1; peer review: 4 approved]. Wellcome Open Res 8.
Tomberlin JK, van Huis A (2020) Black soldier fly from pest to “crown jewel” of the insects as feed industry: An historical perspective. J Insects Food Feed 6:1–4.
Wang T, Liu X, Luo Z, Cai X, Li Z, Bian L, Xiu C, Chen Z, Li Q, Fu N (2024) Transcriptome-wide identification of cytochrome P450s in Tea Black Tussock Moth (Dasychira baibarana) and candidate genes involved in type-II Sex Pheromone Biosynthesis. Insects 15:139.
Wickham H (2016) ggplot2: elegant graphics for data analysis. Springer, 189–201.
Wiegmann BM, Trautwein MD, Winkler IS, Barr NB, Kim J-W, Lambkin C, Bertone MA, Cassel BK, Bayless KM, Heimberg AM et al. (2011) Episodic radiations in the fly tree of life. Proc Natl Acad Sci USA 108:5690–5695.
Wong WY, Simakov O (2019) RepeatCraft: a meta-pipeline for repetitive element de-fragmentation and annotation. Bioinformatics 35:1051–1052.
Wong WY, Simakov O, Bridge DM, Cartwright P, Bellantuono AJ, Kuhn A, Holstein TW, David CN, Steele RE, Martínez DE (2019) Expansion of a single transposable element family is associated with genome-size increase and radiation in the genus Hydra. Proc Natl Acad Sci USA 116:22915–22917.
Woodley NE, Thompson FC (2001) A world catalog of the Stratiomyidae (Insecta: Diptera). North American Dipterists’ Society 11:1–476.
Xing X, Yan M, Pang H, Wu F, Wang J, Sheng S (2021) Cytochrome P450s are essential for insecticide tolerance in the endoparasitoid wasp meteorus pulchricornis (Hymenoptera: Braconidae). Insects 12:651.
Xu Z, Wang H (2007) LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res 35:W265–W268.
Yu Y, Zhang Jia, Zhu F, Fan M, Zheng J, Cai M, Zheng L, Huang F, Yu Z, Zhang Jibin. (2023) Enhanced protein degradation by black soldier fly larvae (Hermetia illucens L.) and its gut microbes. Front Microbiol 13:1095025.
Zhan S, Fang G, Cai M, Kou Z, Xu Jun, Cao Y, Bai L, Zhang Y, Jiang Y, Luo X et al. (2020) Genomic landscape and genetic manipulation of the black soldier fly Hermetia illucens, a natural waste recycler. Cell Res 30:50–60.
Zhang L, Reifová R, Halenková Z, Gompert Z (2021) How important are structural variants for speciation?. Genes 12:1084.
Zhou W, Liang G, Molloy PL, Jones PA (2020) DNA methylation enables transposable element-driven genome expansion. Proc Natl Acad Sci USA 117:19359–19366.
Acknowledgements
We thank Darwin Tree of Life project (https://www.darwintreeoflife.org/) and Wellcome Sanger Institute for the public reference genome resources and the guidance on data portal usage. We thank René Feyereisen and the anonymous reviewers for their comments and suggestions. WZ acknowledges support of the Cambridge Trust and China Scholarship Council (202206380035).
Author information
Contributions
C.D.J., D.K., and W.Z. conceived and designed the study. W.Z. carried out analyses and visualization. The paper was written by W.Z. and C.D.J., edited and approved by all authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Associate editor: Omer Gokcumen.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zhou, W., Kunz, D. & Jiggins, C.D. Comparative analysis of gene family evolution demonstrates expansion of digestive, immunity and olfactory functions in the black soldier fly (Hermetia illucens) lineage. Heredity 135, 187–198 (2026). https://doi.org/10.1038/s41437-025-00805-6
DOI: https://doi.org/10.1038/s41437-025-00805-6