Introduction

The Lygaeoidea is a large superfamily within Pentatomomorpha (Hemiptera: Heteroptera), comprising over 4,700 described species in approximately 700 genera worldwide1,2. While most Lygaeoidea species are phytophagous, a few are known to be predatory3,4,5,6. Although multiple methods have been applied to the phylogeny of Lygaeoidea and its relationship with other Pentatomomorpha superfamilies, the classification of families within Lygaeoidea remains controversial. Schuh and Slater identified five families within Lygaeoidea based on previous studies7. However, the classification has been further scrutinized in recent years8,9,10,11,12. Notably, Henry elevated several subfamilies to family status based on 57 morphological characters13. Currently, 15 families are recognized within Lygaeoidea: Artheneidae, Colobathristidae, Cryptorhamphidae, Heterogastridae, Malcidae, Piesmatidae, Berytidae, Blissidae, Cymidae, Geocoridae, Lygaeidae, Ninidae, Oxycarenidae, Pachygronthidae, and Rhyparochromidae13. Additionally, the family Meschiidae was recently established based on comprehensive morphological characteristics, bringing the total to 16 families within the superfamily Lygaeoidea14,15.

Research on Lygaeoidea has traditionally focused on entomological taxonomy using morphological data. For instance, Gao et al. constructed a cladogram of 15 families based on the fine structure of abdominal trichobothria16. Zhang et al. explored the structure of the metathoracic spiracle across 14 families, proposing its potential use in taxonomic and phylogenetic studies within Lygaeoidea17. However, many Lygaeoidea species are small and dull-colored, so few specimens have been collected for research, which has hindered molecular phylogenetic studies18. Recently, with the advancement of high-throughput sequencing technology and the accumulation of data by entomologists worldwide, mitochondrial genome-based phylogenetic analyses of Lygaeoidea have been conducted19,20,21.

Mitochondria are involved in various physiological processes and have independent gene regulatory mechanisms22,23,24. The mitochondrial genome is characterized by a stable structure, rapid evolution, a low frequency of gene recombination, and numerous homologous genes25,26. These features make it widely applicable in phylogenetic studies, population genetic structure analysis, and genetic drift studies27,28,29,30. Although representative mitochondrial genomes have been reported for some families, phylogenetic relationships within the species-rich Lygaeoidea remain underexplored. Moreover, most mitochondrial sequence-based phylogenetic analyses of Lygaeoidea have involved only a few families, resulting in inconsistent or even contradictory findings regarding the phylogenetic relationships among families31,32.

Dimorphopterus japonicus (Hidaka, 1959) belongs to the order Hemiptera, superfamily Lygaeoidea, and family Blissidae. This species exhibits two morphs: macropterous, characterized by a trapezoidal pronotum, and brachypterous, characterized by a square-like pronotum33. The host plants of D. japonicus include sorghum, foxtail millet, reeds, and bamboo34. In northern China, it causes significant damage to sorghum and foxtail millet, where it is commonly referred to as the “sorghum blissid.” D. japonicus typically inhabits concealed areas at the base of leaf sheaths and feeds on the plant’s vegetative organs35.

In this study, the mitochondrial genome of D. japonicus was sequenced and annotated. We analyzed codon preference, RNA secondary structure, and evolutionary rates, and reconstructed the phylogeny of the major lineages of Lygaeoidea. Our results, which include phylogenetic trees for Lygaeoidea and an evaluation of divergence times, provide molecular evidence and a theoretical reference for systematic taxonomic research on the superfamily Lygaeoidea.

Results

Mitochondrial genome features

The complete mitochondrial genome of D. japonicus spans 15,473 bp and contains 37 genes, including 13 protein-coding genes (PCGs), 22 tRNA genes, two rRNA genes, and a control region (Fig. 1; Table 1). Of these, 14 genes are encoded on the N-strand (nad1, nad4, nad4L, nad5, trnQ, trnC, trnY, trnF, trnH, trnP, trnL1(CUN), trnV, rrnS, and rrnL), while the remaining 23 genes are encoded on the J-strand. The arrangement of these 37 genes follows the typical order observed in Heteroptera insects, with no unusual rearrangements. Among the coding genes, there are 18 overlapping regions ranging from 1 to 7 bp, with the longest overlap occurring between atp8 and atp6. Additionally, seven intergenic spacer regions were identified, ranging from 1 to 51 bp in length, with the longest spacer located between nad4 and trnH.

Fig. 1 Mitochondrial genome structures of D. japonicus.

Table 1 Organization of the mitochondrial genome of D. japonicus.

Nucleotide composition and codon usage

The nucleotide composition of the D. japonicus mitochondrial genome is detailed in Table 2. The overall A + T content is 77.6%, indicating a clear AT bias. Among the genomic regions (PCGs, tRNAs, rRNAs, and the control region), the control region exhibits the highest A + T content (80.8%), while the PCGs have the lowest (76.8%). Among the 13 PCGs, atp8 has the highest A + T content (88.4%), whereas cox1 has the lowest (70.7%). Across the three codon positions within the PCGs, the A + T content at the first position (69.6%) is lower than that at the second (78.1%) and third (82.7%) positions. The base skew analysis of the 13 PCGs reveals that seven genes have AT-skew values less than 0, indicating a T bias, while nine genes exhibit GC-skew values less than 0, indicating a C bias.

Table 2 Nucleotide composition and skewness of the mitochondrial genome of D. japonicus.

The relative synonymous codon usage (RSCU) for the D. japonicus mitochondrial genome is presented in Fig. 2; the analysis covers 4,385 codons, excluding stop codons. The most frequently used codons are UUA (L), GCU (A), and GGU (G), while the least used codons are CCG (P), CUG (L), and GGC (G). Among the frequently used codons, those ending in A/U appear at high frequencies, and codons with RSCU values greater than 1 are predominantly NNU/NNA. For example, within the synonymous codons encoding Asn, AAU (RSCU value 1.163) is used much more frequently than AAC (RSCU value 0.137), and for Lys, AAA (RSCU value 1.160) is far more common than AAG (RSCU value 0.140).

Fig. 2 The relative synonymous codon usage (RSCU) in the mitogenome of D. japonicus.
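As an illustration of how RSCU values of this kind are obtained, the following minimal Python sketch divides the observed count of each codon by the mean count of all synonymous codons for the same amino acid. It is not the MEGA routine used in this study. The function names, the two-family `FAMILIES` table (Asn and Lys under the invertebrate mitochondrial code), and the toy sequence are hypothetical and chosen only for demonstration.

```python
from collections import Counter


def codon_counts(cds: str) -> Counter:
    """Count codons in an in-frame coding sequence, written in RNA letters."""
    seq = cds.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def rscu(counts, families):
    """RSCU = observed count / mean count of the codon's synonymous family."""
    values = {}
    for codons in families.values():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent, RSCU undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts.get(c, 0) / expected
    return values


# Hypothetical subset of synonymous families (assumption for this sketch).
FAMILIES = {"Asn": ["AAU", "AAC"], "Lys": ["AAA", "AAG"]}

# Hypothetical toy sequence, not taken from D. japonicus.
toy_cds = "".join(["AAT", "AAT", "AAT", "AAC", "AAA", "AAA", "AAG"])

for codon, value in sorted(rscu(codon_counts(toy_cds), FAMILIES).items()):
    print(codon, round(value, 3))
```

A value above 1 means the codon is used more often than expected under equal usage of its synonyms; a value below 1 means it is used less often.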

Protein-coding genes

The 13 protein-coding genes of the D. japonicus mitochondrial genome collectively span 1,846 bp. The longest gene is nad5 at 1,705 bp, while atp8 is the shortest at 156 bp. Except for cox1, which uses TTG as a start codon, all other PCGs use ATN as their start codon. Regarding stop codons, cox2, cox3, nad1, and nad5 terminate with a single T, nad3 and cytb end with TAG, and the remaining seven PCGs use TAA as their stop codons.

This study also calculated the synonymous substitution rate (Ks), non-synonymous substitution rate (Ka), and the Ka/Ks ratio for each of the 13 protein-coding genes in the D. japonicus mitochondrial genome (Fig. 3). The results indicate that atp8 has the highest Ka value, while cox1 has the lowest. The nad1 gene has the highest Ks value, while cox2 has the lowest. The Ka/Ks ratios for all 13 PCGs are less than 1, indicating purifying selection. The evolutionary rate of the protein-coding genes is as follows: atp8 > nad3 > nad6 > nad2 > nad5 > atp6 > nad4 > nad4L > cox2 > nad1 > cytb > cox3 > cox1.

Fig. 3 Ka/Ks ratios of the 13 PCGs in the mitochondrial genome of D. japonicus.
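The interpretation of Ka/Ks described above can be sketched in a few lines of Python. The snippet is illustrative only: the function names are hypothetical, and the Ka and Ks values in `toy_rates` are invented placeholders, not the values estimated for D. japonicus.

```python
def selection_regime(ka: float, ks: float, tol: float = 1e-9) -> str:
    """Classify a gene by its Ka/Ks ratio (below 1: purifying selection)."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form the Ka/Ks ratio")
    ratio = ka / ks
    if ratio < 1 - tol:
        return "purifying"
    if ratio > 1 + tol:
        return "positive"
    return "neutral"


def rank_by_rate(rates: dict) -> list:
    """Order genes from the highest to the lowest Ka/Ks ratio."""
    return sorted(rates, key=lambda g: rates[g][0] / rates[g][1], reverse=True)


# Hypothetical (Ka, Ks) pairs used only to exercise the functions.
toy_rates = {"geneA": (0.30, 0.60), "geneB": (0.05, 0.50), "geneC": (0.10, 0.40)}

for gene in rank_by_rate(toy_rates):
    ka, ks = toy_rates[gene]
    print(gene, round(ka / ks, 3), selection_regime(ka, ks))
```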

tRNA genes and rRNA genes

The mitochondrial genome of D. japonicus contains 22 tRNA genes with a total length of 1,435 bp. Individual tRNA genes range in length from 60 to 73 bp, with trnK being the longest at 73 bp and trnR and trnH the shortest at 60 bp each. Of the 22 tRNA genes, 20 fold into typical cloverleaf secondary structures; the exceptions are trnS1 and trnV, which lack the dihydrouridine (DHU) arm (Fig. 4). Additionally, 16 instances of G-U base mismatches were identified in the secondary structures of these tRNAs.

The mitochondrial genome of D. japonicus also includes two rRNA genes: the rrnS gene (771 bp) and the rrnL gene (1,251 bp), both encoded on the N strand. The rrnS gene is located between trnV and the control region, while the rrnL gene lies between trnL1 and trnV. The secondary structures of these rRNA genes are depicted in Figs. S1 and S2. The secondary structure of the rrnS gene consists of three domains (I, II, and III), while the rrnL gene comprises five domains (I, II, IV, V, and VI), with domain III absent, which is consistent with other arthropods. There are 41 G-U mismatches in the rrnS gene and 55 G-U mismatches in the rrnL gene.

Fig. 4 Secondary structures of tRNA in D. japonicus.

Control region

The control region of the D. japonicus mitochondrial genome spans 990 bp and is situated between the rrnS gene and trnI. The base composition is 41.3% A, 39.5% T, 5.2% G, and 14.0% C, resulting in an A + T content of 80.8%, indicating a strong AT bias (Tables 1 and 2). Within this control region, two repeat sequences, each 356 bp long, are separated by a 203 bp interval (Fig. 5). These repeat sequences contain shorter repetitive elements of 45 bp and 21 bp, which are repeated three times and twice, respectively.

Fig. 5 Organization of the control region in D. japonicus.

Phylogenetic relationships

Before reconstructing the phylogenetic trees, we conducted saturation and heterogeneity analyses on the two datasets (PCG12 and PCG12rRNA). The saturation analysis showed that neither dataset is saturated, with Iss (0.4505) < Iss.c (0.8167) for the PCG12 dataset and Iss (0.3619) < Iss.c (0.8658) for the PCG12rRNA dataset. Heterogeneity analysis showed that the sequence composition exhibited low heterogeneity (Fig. S3), confirming that both datasets are suitable for phylogenetic studies.

Fig. 6 Phylogenetic tree of Lygaeoidea inferred from BI and ML methods based on different datasets.

Fig. 7 Phylogenetic tree of Lygaeoidea inferred from the ML method based on the PCG12rRNA dataset.

Phylogenetic trees were constructed using ML and BI methods based on the PCG12 and PCG12rRNA datasets. Except for the ML tree based on the PCG12rRNA dataset, the topological structures of the other three phylogenetic trees were consistent (Figs. 6 and 7). In all four phylogenetic trees, the monophyly of the Lygaeoidea superfamily was well supported. Neolethaeus assamensis did not cluster with other species of the family Rhyparochromidae, whereas the remaining 15 families formed monophyletic groups. Although the internal topology of Lygaeoidea was not entirely stable, certain clades, such as (Oxycarenidae + Piesmatidae), (Malcidae + Colobathristidae), (Meschiidae + Berytidae), and (Blissidae + (Cymidae + Ninidae)), were consistently recovered in all analyses.

In the BI tree based on the PCG12 dataset, the topology was divided into three major clades. Pachygronthidae and Heterogastridae formed a sister group relationship in the first clade. The topology of the second clade was (Artheneidae + (Oxycarenidae + Piesmatidae)). The other 11 families clustered in the third clade, with the topology (Cryptorhamphidae + (((Lygaeidae + Rhyparochromidae) + (Malcidae + Colobathristidae)) + ((Geocoridae + (Meschiidae + Berytidae)) + (Blissidae + (Cymidae + Ninidae))))). Neolethaeus assamensis was identified as the earliest divergent lineage within Lygaeoidea. D. japonicus and D. gibbus formed a sister group, while the genera Dimorphopterus and Cavelerius also formed a sister group relationship.

Divergence time estimation

We estimated the divergence times within Lygaeoidea based on the PCG12 dataset (Fig. 8). The results indicate that the divergence of Lygaeoidea occurred approximately 99.4 million years ago (Ma) (95% HPD: 83.4–124.8 Ma) during the early Cretaceous period of the Mesozoic era. The family Rhyparochromidae was not found to be monophyletic. The earliest differentiated species within Rhyparochromidae, N. assamensis, diverged around 91.4 Ma (95% HPD: 82.7–94.0 Ma) during the Late Cretaceous.

The divergence times of Artheneidae (89.6 Ma), Heterogastridae (89.3 Ma), Cryptorhamphidae (84.5 Ma), Geocoridae (74.9 Ma), and Meschiidae (69.2 Ma) all fall within the Late Cretaceous (100.5–66 Ma) of the Mesozoic era. The divergence times of Pachygronthidae (64.5 Ma), Cymidae (52.1 Ma), Lygaeidae (48.9 Ma), and Ninidae (45.9 Ma) fall within the Paleogene period (66–23 Ma) of the Cenozoic era. The divergence time of Berytidae was estimated at 31.4 Ma, making it the most recently differentiated family within Lygaeoidea.

The divergence between Oxycarenidae and Piesmatidae was estimated at 82.7 Ma (95% HPD: 73.2–86.3 Ma), during the Cretaceous period of the Mesozoic era. The divergence between Malcidae and Colobathristidae was estimated at 75.3 Ma (95% HPD: 61.0–89.4 Ma), in the same period. The genus Dimorphopterus diverged around 45.9 Ma (95% HPD: 39.8–47.7 Ma) during the Paleogene period of the Cenozoic era.

Fig. 8 Chronogram of divergence times estimated by BEAST analysis.

Discussion

In this study, we sequenced the complete mitochondrial genome of D. japonicus using second-generation sequencing technology. The arrangement of the 37 genes was consistent with that observed in other published Lygaeoidea species. The nucleotide composition of the D. japonicus mitochondrial genome exhibits a high AT content, which is a common characteristic among Hemiptera species36,37. Notably, the longest intergenic region was located between nad5 and trnH, a rare feature in the mitochondrial genomes of Hemiptera insects.

Analyzing codon usage can provide insights into the evolutionary and environmental adaptability of different species. In the D. japonicus mitochondrial genome, the AT bias is evident in the synonymous codon usage frequency of protein-coding genes. Codons with higher usage frequencies predominantly end in A/U, and preferred codons with RSCU values greater than 1 also tend to end in A/U. The most commonly used start codon in the D. japonicus mitochondrial genome is ATN. TTG and GTG are also common start codons in Heteroptera; in this study, the cox1 gene starts with TTG. The mitochondrial genome of D. japonicus prefers TAA as the termination codon, followed by TAG and the incomplete T. Protein-coding genes ending with T are typically completed to a full TAA stop codon through post-transcriptional polyadenylation38. The Ka/Ks ratios of all PCGs were below 1, indicating that these genes have evolved under purifying selection. Among the PCGs, atp8 showed the fastest evolutionary rate, while cox1 exhibited the slowest, consistent with previous studies39,40.

All 22 typical transfer RNA (tRNA) genes were identified in the mitogenome of D. japonicus. Most of the tRNAs exhibited classic cloverleaf structures, except for trnV, which lacks the TΨC stem, and trnS1, which lacks the DHU stem. These atypical tRNA structures are also found in the mitogenomes of many other Heteroptera41. G-U pairing was observed multiple times in the secondary structures of the tRNAs, with most mismatches occurring in the acceptor arm and anticodon stem. These mismatches play an important role in maintaining secondary structure stability42. G-U pairing was also observed in the secondary structures of the rRNAs.

The phylogenetic analysis strongly supported the superfamily Lygaeoidea as a monophyletic taxon. The results indicated that Neolethaeus assamensis represents the most basal lineage within Lygaeoidea. However, N. assamensis did not group with other Rhyparochromidae species, a finding consistent with previous phylogenetic analyses using the mitogenome of N. assamensis31,32,43. Heterogastridae and Pachygronthidae were closely related, forming a stable sister group relationship, a result corroborated by morphological data. In previous mitogenome studies that did not include N. assamensis as a representative of Rhyparochromidae, the clade of Heterogastridae and Pachygronthidae was considered a basal lineage21,44. Assuming that the deposited sequence is correct, further morphological and molecular data will be necessary to better understand the monophyly of Rhyparochromidae.

The relationships among families within the superfamily Lygaeoidea remain highly contentious. All of our analyses consistently recovered Blissidae as the sister group to the clade comprising Ninidae and Cymidae, with strong support in both BI and ML analyses. This relationship aligns with the findings of Ye et al., which were based on nuclear and mitochondrial genes44. Furthermore, Colobathristidae was strongly supported as the sister group to (Meschiidae + Berytidae) in the ML tree based on the PCG12rRNA dataset, confirming that these three families form a monophyletic group, referred to as the “malcid line” by Štys and Henry13,45. However, this grouping was not observed in the other three ML and BI trees. We also recovered the group (Artheneidae + (Oxycarenidae + Piesmatidae)), although this relationship was not supported in the ML tree based on the PCG12rRNA dataset. All of our phylogenetic trees supported Piesmatidae as the sister group to Oxycarenidae, reaffirming its classification as a member of Lygaeoidea rather than as an independent superfamily, consistent with previous studies21,44,46.

The molecular clock method was used to estimate the origin and divergence times of each family, providing insights into the evolutionary history of Lygaeoidea. The divergence of the superfamily Lygaeoidea was estimated to have occurred at 99.4 Ma during the early Cretaceous, a result close to the estimation of Wang et al. (90 Ma)47. Most other previous studies have estimated divergence times ranging from 137 Ma to 171 Ma21,44,47,48. Our results suggest that the divergence of most lygaeoid families occurred from the early Cretaceous to the late Paleogene. The divergence time of Berytidae was estimated at 31.4 Ma, representing the latest differentiation among lygaeoid families. The divergence time of Blissidae was estimated at 55.8 Ma, with the genus Dimorphopterus differentiating at 45.9 Ma.

In this study, we reported the complete mitogenome of D. japonicus for the first time. Coupled with published data, we conducted phylogenetic analyses and divergence time estimations within Lygaeoidea. Due to the relatively small number of mitochondrial genomes available for Lygaeoidea, research on the phylogenetic relationships within this superfamily remains limited and cannot yet provide a stable taxonomic position. Therefore, further research is necessary to increase the number of mitochondrial genomes and to further elucidate the phylogenetic relationships within Lygaeoidea by integrating morphological and biological characteristics.

Methods

Specimen collection and DNA extraction

Adult specimens of D. japonicus used in this study were collected from Xinfu District, Xinzhou City, Shanxi Province in July 2021. After collection, the specimens were preserved in absolute ethanol at −20 °C. Total DNA was extracted from the thoracic muscle tissue of adults following the protocol of the Shanghai Sangon ONE-4-ALL DNA Extraction Kit (BS88504). DNA quality was assessed using a NanoDrop 2000 nucleic acid analyzer. Qualified samples were sent to Shanghai Personalbio Biotechnology Co., Ltd. for sequencing.

Genome annotations and sequence analyses

The sequencing data were assembled using A5-miseq (v20150522)49 and SPAdes (v3.9)50, and the complete mitochondrial genome sequence was annotated in Geneious (v10.1.3)51. Protein-coding genes were annotated using the open reading frame (ORF) finder tool (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) and by identifying start and stop codons. tRNA genes were annotated using the online tool MITOS (http://mitos.bioinf.uni-leipzig.de/)52. rRNA genes were determined by reference to published mitochondrial rRNA genes of Blissidae and to their secondary structures.

MEGA (v7.0)53 was used to analyze codon usage, nucleotide composition, and relative synonymous codon usage (RSCU). AT-skew and GC-skew were calculated using the formulas AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/(G + C), respectively. Tandem repeats in the control region were identified with Tandem Repeats Finder (http://tandem.bu.edu/trf/trf.html).
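The two skew formulas can be written as a short Python sketch. The helper names `skew` and `base_skews` are hypothetical; the worked example applies the formulas to the control-region base percentages reported in the Results (41.3% A, 39.5% T, 5.2% G, 14.0% C).

```python
def skew(x: float, y: float) -> float:
    """Return (x - y) / (x + y); AT-skew uses (A, T), GC-skew uses (G, C)."""
    total = x + y
    if total == 0:
        raise ValueError("cannot compute skew when both values are zero")
    return (x - y) / total


def base_skews(seq: str) -> tuple[float, float]:
    """Return (AT-skew, GC-skew) for a DNA sequence; non-ACGT symbols are ignored."""
    s = seq.upper()
    a, t, g, c = (s.count(base) for base in "ATGC")
    return skew(a, t), skew(g, c)


# Control-region base percentages as reported in the Results.
at_skew = skew(41.3, 39.5)
gc_skew = skew(5.2, 14.0)
print(round(at_skew, 3), round(gc_skew, 3))  # 0.022 -0.458
```

Because both formulas are ratios, they give the same result whether base counts or percentages are supplied.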

Phylogenetic analyses

To study the phylogenetic relationships among taxa within the superfamily Lygaeoidea and the taxonomic position of Blissidae, we selected 30 representative species of Lygaeoidea (sequence information shown in Table 3), with two species of Coreoidea as outgroups. Phylogenetic analyses were performed with two data matrices that exclude third codon positions and stop codons: (1) the PCG12 matrix, comprising the first and second codon positions of the 13 PCGs; and (2) the PCG12rRNA matrix, comprising the PCG12 dataset plus the two rRNA genes rrnL and rrnS. The protein-coding genes were aligned under a codon model, whereas the RNA genes were aligned as nucleotide sequences. The alignments were concatenated for each species using Sequence Matrix (v.1.7.8)54.
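The construction of a PCG12-style matrix can be sketched as follows. This is a minimal illustration, not the Sequence Matrix workflow used in the study: the function names, the species labels, and the toy alignments are hypothetical, and the input is assumed to be in-frame codon alignments from which stop codons have already been removed.

```python
def first_second_positions(aligned_cds: str) -> str:
    """Keep codon positions 1 and 2 of an in-frame codon alignment row."""
    if len(aligned_cds) % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3")
    return "".join(aligned_cds[i:i + 2] for i in range(0, len(aligned_cds), 3))


def build_pcg12(gene_alignments: dict) -> dict:
    """Concatenate position-1/2 sites of every gene, per species.

    gene_alignments maps gene name -> {species name -> aligned sequence}.
    """
    genes = list(gene_alignments)
    if not genes:
        return {}
    species = set(gene_alignments[genes[0]])
    for gene in genes:
        if set(gene_alignments[gene]) != species:
            raise ValueError(f"species set differs for gene {gene}")
    return {
        sp: "".join(first_second_positions(gene_alignments[g][sp]) for g in genes)
        for sp in sorted(species)
    }


# Hypothetical toy alignments (not real data).
toy = {
    "cox1": {"sp_A": "ATGAAATTT", "sp_B": "ATGAAGTTC"},
    "atp8": {"sp_A": "ATTCCA", "sp_B": "ATCCCG"},
}
matrix = build_pcg12(toy)
for name, seq in matrix.items():
    print(name, seq)
```

In this toy case the two species differ only at third codon positions, so their PCG12 rows are identical, which shows how removing third positions discards the fastest-changing sites.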

Before constructing the phylogenetic trees, base substitution saturation and sequence composition heterogeneity were analyzed for both datasets. DAMBE (v.7.0.35)55 was used to calculate the base substitution saturation index (Iss); Iss < Iss.c indicates that a dataset is not saturated and can be used for phylogenetic analysis. Heterogeneity analysis was performed using AliGROOVE (v.1.0.8)56; datasets with low heterogeneity were considered suitable for phylogenetic analysis. The BI tree was generated using MrBayes (v3.2)57, with the GTR + I + G model selected by ModelFinder58, and run for 10,000,000 generations. The run was stopped when the convergence diagnostic value fell below 0.01. The ML analyses were implemented in IQ-TREE 2 (v2.1.3)59, and tree support was evaluated using ultrafast bootstrapping with UFBoot2 (-bb 1000)60.

Table 3 List of species used to reconstruct the phylogenetic relationships in this study.

Divergence time estimation

The concatenated PCG12 alignment was used to estimate divergence times in BEAST (v10.5.0)61 under a relaxed lognormal clock model. We set up a GTR + I + G partition model with the calibrated Yule model as the tree prior. Two fossil calibration points were used: Eoblissus gallicus (58.3–53.7 Ma) for Blissidae and Metacanthus serratus (33.9–28.4 Ma) for Berytidae. Two replicate Markov chain Monte Carlo (MCMC) runs were performed over a total of 50 million generations, with the tree and parameter values sampled every 1000 steps. Effective sample sizes (ESS) and convergence to the stationary distribution were checked with Tracer (v1.7.2)62. The sampled trees were then summarized, and 95% highest posterior density (95% HPD) intervals were displayed.
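For readers unfamiliar with HPD intervals, the sketch below shows one common sample-based definition: the narrowest interval that contains a given fraction of the posterior samples. It is an illustration only and is not claimed to reproduce the exact routine used by BEAST or Tracer; the function name and the toy node ages are hypothetical.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Return the narrowest interval containing `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("no samples supplied")
    if not 0 < mass <= 1:
        raise ValueError("mass must be in (0, 1]")
    # Number of samples inside the interval; the small epsilon guards
    # against floating-point overshoot before rounding up.
    k = max(1, math.ceil(mass * n - 1e-9))
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


# Hypothetical posterior node ages (Ma), for demonstration only.
toy_ages = [1, 2, 3, 4, 10, 20, 30, 40]
print(hpd_interval(toy_ages, mass=0.5))  # (1, 4)
```

Unlike an equal-tailed interval, the HPD interval need not be symmetric around the mean or median, which is why the reported intervals can sit unevenly around the point estimates.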