Introduction

The Anopheles genus includes more than 480 species, 70 of which are known to transmit malaria1. The genus includes several species complexes whose members differ in vectorial competence, ecology and/or behavior1. Several of the most dangerous vectors of malaria in the Holarctic region belong to the Anopheles maculipennis complex2. The identification of the species within this complex is hampered by their morphological similarity; indeed, the species were first defined based on the morphological features of their eggs3. Subsequently, species were also defined on cytogenetic and molecular bases4,5. Moreover, wing landmarks were identified as useful for discriminating among the species of the complex6. However, the taxonomy of the complex continues to be debated because of the intrinsic difficulty of defining clear diagnostic characters for delimiting species. Over the years, new species were described, and others were synonymized, such as Anopheles subalpinus, which was synonymized with An. melanoon7,8. No consensus has been reached within the scientific community on the taxonomic status of Anopheles daciae Linton, Nicolescu & Harbach9,10,11. Therefore, in this article, we refer to it as species inquirenda (sp. inq.), namely a species of doubtful identity that requires further investigation.

In recent decades, with the development of biotechnologies, molecular approaches have played a central role in modern entomology, especially as a support for classical morphological approaches for species identification but also for the delimitation of taxa12,13. Molecular species delimitation methods are particularly useful for delimiting sibling insect taxa or species within species complexes14,15,16; thus, they often provide answers to debated taxonomic questions17,18,19. These methods were also used to define the taxonomic status of the taxa within complexes of the genus Anopheles20,21,22,23. The molecular discrimination of Anopheles species and the phylogenetic relationships among taxa of the genus have often been determined by using two markers: the mitochondrial gene cytochrome c oxidase subunit I (COI), usually the 5′ region, and the nuclear internal transcribed spacer 2 (ITS2) region between the ribosomal 5.8S gene and the 26S rRNA22,23,24. Although COI is recommended as a universal marker for insect molecular identification in DNA barcoding and metabarcoding25, it can lead to misidentification26. In fact, because mitochondrial DNA is maternally inherited, variability in COI can be influenced by mitochondrial introgression following hybridization24. On the other hand, the biparentally inherited ITS2 region has proven to be particularly suitable for the identification of species of the genus Anopheles because it is often conserved within species and variable between species21,24. The low levels of intraspecific variation of this marker are probably driven by mechanisms of concerted evolution, which promote the homogenization of repeated sequences within species. At the same time, this non-coding region can accumulate mutations between species24,27. In summary, using the COI and ITS2 markers together facilitates cross-verification of species identifications, especially in instances where hybridization may complicate interpretations.

The main aim of this study is to explore the species boundaries within the Anopheles maculipennis complex using an integrated taxonomy approach, combining DNA-based species delimitation methods and wing morphometry. Moreover, the study evaluates the congruence between DNA marker signals and their consistency with morphological information to identify the most effective approach for delimiting species within the complex. We analyzed mosquito data gathered from more than one hundred sites across the Po Plain in Italy, along with data from public repositories.

Results

COI and ITS2 features of the collected species

The ITS2 region was successfully amplified, and high-quality sequences were obtained for 1226 of the 1276 individuals processed, with lengths from 376 to 391 bases depending on the species; the COI marker was amplified from a subsample of 827 mosquitoes, and high-quality sequences of 659 bases were obtained for 480 specimens. DNA sequences of both markers were obtained for 454 individuals.

The sequences generated were ascribable to four different species: An. daciae sp. inq. (1025, 81.9%), An. maculipennis s. s. (202, 16.1%), An. atroparvus (18, 1.4%) and An. melanoon (7, 0.6%) (Fig. 1). Moreover, we obtained eggs from 22 of the collected mosquitoes; all the laid eggs belonged to the An. messeae/An. daciae morphotype. We obtained the ITS2 sequence from all but one of these specimens and the COI sequence from 12 specimens.

Fig. 1

Position of collection sites, with reference to identified species (a) and location of the monitored area on the map of Italy (b). Haplotype median joining network of the ITS2 (c) and COI (d) sequences obtained in this study. Mutations are represented by one-step edges; the diameter of network circles is proportional to the number of sequences (empty circles represent 10 sequences). Map circles: blue, An. daciae sp. inq. (An. da.); yellow, An. maculipennis s. s. (An. ma.); green, An. da. and An. ma.; red, An. atroparvus, An.da. and An. ma.; light blue, An. melanoon (An. me.), An. da. and An. ma.; purple, An. me. and An. da.

In the haplotype network obtained from all the available ITS2 sequences, four clearly separated groups were present (Fig. 1d). For this marker, ambiguous bases were recorded within the sequences of single individuals as a result of intraindividual polymorphisms. COI produced a more dispersed network, which included subspecific groups, owing to the higher variability of the marker (Fig. 1c). Conversely, the translated amino acid sequences were identical for the majority of conspecifics, and nonsynonymous mutations were detected within the same species in 26 sequences (14 from An. daciae sp. inq., 11 from An. maculipennis s. s., 1 from An. atroparvus).

Species delimitation

A subset of the COI and ITS2 sequences generated in this study, along with orthologous sequences obtained from previous works, was used to construct datasets encompassing 118 individuals from the following species: An. atroparvus, An. beklemishevi, An. daciae sp. inq., An. labranchiae, An. maculipennis s. s., An. messeae, An. melanoon and An. sacharovi, plus An. plumbeus (serving as an outgroup of the Maculipennis species complex).

Automatic Barcode Gap Discovery (ABGD) analyses highlighted the absence of a clear barcoding gap in the frequency distributions of pairwise nucleotide distances for both the COI and ITS2 datasets. In the case of the COI dataset, ABGD delimited from 2 to 97 hypothetical species. The highest congruence between the analyzed morphospecies and the retrieved hypothetical species was found in partitions with prior nucleotide distances ranging from 0.34% to 0.43%, where nine groups were identified. Among these groups, a perfect match with the morphospecies was obtained only for An. atroparvus, An. beklemishevi, An. labranchiae and An. plumbeus. Within this range of nucleotide distances (0.34% to 0.43%), all the sequences of An. messeae and An. daciae sp. inq. were merged into a single group, and those of An. melanoon clustered with one group of An. maculipennis s. s.; the latter species was split into two separate groups. In addition, An. sacharovi was split into two groups (Fig. 2a). In the ABGD analysis of the ITS2 dataset, perfect matches between the initial and recursive partitions were observed for prior nucleotide distances from 0.46% to 10%. The number of groups identified by ABGD ranged from nine (with prior nucleotide distances of 0.10–0.28%) to three (with a prior nucleotide distance of 10%). In the partition with eight groups, a perfect match between the identified hypothetical species and the morphological species was observed for seven of the nine morphospecies included in the dataset: An. atroparvus, An. beklemishevi, An. labranchiae, An. maculipennis s. s., An. melanoon, An. plumbeus and An. sacharovi. The two remaining presumptive species, An. messeae and An. daciae sp. inq., were lumped together (Fig. 2b).

Fig. 2

Species delimitation of the Anopheles maculipennis species complex. (a) Bayesian ultrametric tree inferred from the COI dataset. (b) Bayesian ultrametric tree inferred from the ITS2 dataset. Morphological assignment of the individuals (an asterisk indicates individuals identified through egg morphology) and results of species delimitation analyses (ABGD, GMYC, mPTP) are reported using vertical bars, with different colors according to the different species delimitation analyses performed and different line textures indicating taxa delimited in the same unit but not adjacent to the tree. GenBank accession numbers are reported near to the species name. Support of the nodes is shown when > 0.7, omitted for minor lineages. The scale bar indicates the distance in substitutions per site. Sequences of An. messeae deposited in public repositories were renamed according to the presence of ITS2 polymorphisms diagnostic for An. daciae sp. inq.

Species delimitation analyses implementing the coalescent tree-based methods led to broadly similar results. In the case of COI, the Generalized Mixed Yule Coalescent (GMYC) model resulted in a greater log likelihood than the null model, identifying a statistically significant transition from the Yule model to the coalescent model (logL_GMYC = 801.8, logL_NULL = 777, P(LR test) < 0.001), indicating a threshold between the inter- and intraspecific levels at − 0.031 relative time units (root time − 1, tip time 0). This threshold identifies 11 maximum likelihood entities (95% CI 11–17 units), six of which perfectly match the morphospecies An. atroparvus, An. beklemishevi, An. labranchiae, An. melanoon, An. sacharovi and An. plumbeus (Fig. 2a). The individuals morphologically identified as An. maculipennis s. s. were paraphyletic and were split into two well-supported clades (Bayesian posterior probability (BPP) ≥ 0.99) recognized by GMYC as different Evolutionarily Significant Units (ESUs). In the COI tree, individuals of the An. messeae and An. daciae sp. inq. morphospecies clustered together into a well-supported clade (BPP = 1). Within this clade, GMYC identified three separate ESUs: the first included individuals of An. daciae sp. inq. plus the individual An. messeae KZ3 AY258182 (clade 1, C1; Fig. 2a); the second consisted only of the individual An. daciae BG2 AY258172 (clade 2, C2; Fig. 2a); the third was composed of individuals of An. daciae sp. inq. plus one individual whose identification is uncertain (An. messeae/daciae A22) and An. messeae KO3 AY258178 (clade 3, C3; Fig. 2a). Interestingly, the delimitation achieved using the multi-rate Poisson Tree Processes (mPTP) method on the COI dataset (Fig. 2a) was mainly congruent with that achieved using the GMYC method; however, mPTP recognized all the An. messeae and An. daciae sp. inq. individuals as belonging to a single ESU.

For ITS2 (Fig. 2b), the GMYC model exhibited a greater log likelihood than did the null model (logL_GMYC = 1019.3, logL_NULL = 984.2, P(LR test) < 0.001), and the Yule coalescent threshold identified by the model (threshold relative time − 0.005) split the nine Anopheles morphospecies into 9 ESUs (confidence interval 8–9). Almost perfect congruence with the existing morphospecies was observed. The only exceptions were An. daciae sp. inq. and An. messeae, where two ESUs were identified (Fig. 2b). These ESUs do not correspond to the two morphospecies, since some sequences of An. daciae sp. inq. clustered with the two An. messeae sequences in a single clade (not corresponding to any COI clade), while the majority of sequences of An. daciae sp. inq. were recognized within a separate clade (Fig. 2b). Delimitation using mPTP identified eight ESUs, seven of which were consistent with the morphospecies An. atroparvus, An. beklemishevi, An. labranchiae, An. maculipennis s. s., An. melanoon, An. plumbeus and An. sacharovi, while sequences of An. messeae and An. daciae sp. inq. were grouped into a single clade (Fig. 2b).
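The likelihood ratio tests reported for GMYC can be checked by hand from the log likelihoods given above. The following Python sketch assumes that the statistic 2(logL_GMYC − logL_NULL) is compared against a chi-square distribution with two degrees of freedom; the degrees of freedom and the function name are assumptions made for illustration, not details stated in the text. With two degrees of freedom, the chi-square tail probability has the closed form exp(−LR/2), so no statistics library is needed.

```python
import math


def gmyc_likelihood_ratio_test(logl_gmyc, logl_null):
    """Return (LR statistic, p-value) for the GMYC model versus the null model.

    Assumption: the statistic is referred to a chi-square distribution with
    2 degrees of freedom, whose survival function is exactly exp(-x / 2).
    """
    lr = 2.0 * (logl_gmyc - logl_null)
    # Clamp at 1 so that a null model with higher likelihood gives p = 1.
    p_value = min(1.0, math.exp(-lr / 2.0))
    return lr, p_value


# Log likelihoods reported in the text for the two markers.
reported = [("COI", 801.8, 777.0), ("ITS2", 1019.3, 984.2)]
for marker, logl_gmyc, logl_null in reported:
    lr, p = gmyc_likelihood_ratio_test(logl_gmyc, logl_null)
    print(f"{marker}: LR = {lr:.1f}, p = {p:.2e}")
```

Under this assumption both p-values fall far below 0.001, in line with the significance reported for the two markers.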

Morphometry results

The first three CDA eigenvalues together explained 99.3% of the total variance (CDA1 = 66.3%, CDA2 = 22.4% and CDA3 = 10.6%), suggesting a difference in wing shape among the sibling species (Fig. 3). Moreover, the permutation test on the mean CDA scores showed that the pairwise comparisons of wing shape distance were all statistically significant (Fig. 3).

Fig. 3

Wing landmarks and canonical discriminant analysis (CDA) reported in 3D for the four sibling species: An. atroparvus, An. maculipennis s. s., An. melanoon, An. daciae sp. inq. Tanglegrams between the UPGMAs of the molecular markers and the wing shape.

The first PC dimension (explained variance: PCA1 = 66%) showed no statistically significant signal when tested against the COI Unweighted Pair-Group Method with Arithmetic mean (UPGMA) dendrogram for any of the metrics considered (Table S1), whereas the test against the ITS2 UPGMA dendrogram was significant for the metrics Cmean and lambda (p value < 0.05, Table S1). The wing shape UPGMA showed a Baker’s gamma index of − 0.23 and a cophenetic correlation of − 0.25 with the COI UPGMA, and a Baker’s gamma index of 0.85 and a cophenetic correlation of 0.89 with the ITS2 UPGMA, indicating a greater association of wing shape with ITS2 than with COI (Fig. 3).
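To make the cophenetic correlation between two dendrograms concrete, the sketch below builds a UPGMA tree from each of two distance matrices and correlates the resulting cophenetic distances (the merge height at which two tips first join). It is a minimal pure-Python illustration, not the software used in the study: the function names and the toy distances are hypothetical, and the merge height is taken as the average inter-cluster distance, which affects only the scale and not the correlation.

```python
import itertools
import math


def upgma_cophenetic(labels, dist):
    """Cluster with UPGMA and return {frozenset({a, b}): merge height}.

    `dist` maps frozenset({label_a, label_b}) to a pairwise distance.
    """
    clusters = {i: (lab,) for i, lab in enumerate(labels)}
    d = {}
    for i, j in itertools.combinations(range(len(labels)), 2):
        d[frozenset((i, j))] = dist[frozenset((labels[i], labels[j]))]
    cophenetic = {}
    next_id = len(labels)
    while len(clusters) > 1:
        pair = min(d, key=d.get)  # closest pair of current clusters
        i, j = tuple(pair)
        height = d[pair]
        for a in clusters[i]:
            for b in clusters[j]:
                cophenetic[frozenset((a, b))] = height
        n_i, n_j = len(clusters[i]), len(clusters[j])
        for k in clusters:
            if k in (i, j):
                continue
            # UPGMA update: size-weighted mean of the two old distances.
            d_ik = d[frozenset((i, k))]
            d_jk = d[frozenset((j, k))]
            d[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        for key in [key for key in d if i in key or j in key]:
            del d[key]
        merged = clusters.pop(i) + clusters.pop(j)
        clusters[next_id] = merged
        next_id += 1
    return cophenetic


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def cophenetic_correlation(labels, dist_a, dist_b):
    """Correlate the cophenetic distances of two UPGMA dendrograms."""
    coph_a = upgma_cophenetic(labels, dist_a)
    coph_b = upgma_cophenetic(labels, dist_b)
    pairs = [frozenset(p) for p in itertools.combinations(labels, 2)]
    return pearson([coph_a[p] for p in pairs], [coph_b[p] for p in pairs])


def toy_matrix(labels, close_pairs, near=1.0, far=4.0):
    """Hypothetical distances: `near` for the listed pairs, `far` otherwise."""
    close = {frozenset(p) for p in close_pairs}
    return {
        frozenset(p): (near if frozenset(p) in close else far)
        for p in itertools.combinations(labels, 2)
    }


taxa = ["A", "B", "C", "D"]
shape = toy_matrix(taxa, [("A", "B"), ("C", "D")])
marker_same = toy_matrix(taxa, [("A", "B"), ("C", "D")])
marker_other = toy_matrix(taxa, [("A", "C"), ("B", "D")])
print(cophenetic_correlation(taxa, shape, marker_same))
print(cophenetic_correlation(taxa, shape, marker_other))
```

In this toy setting, identical groupings give a correlation of 1, whereas conflicting groupings give a negative value, which is the kind of contrast reported above between the ITS2 and COI dendrograms.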

At the intraspecific level, the first two PCs of the morphospace of An. maculipennis s. s. and An. daciae sp. inq. showed a random ordination of the groups identified by the species delimitation analyses performed on COI (Fig. S1). For each species, we tested the associations between COI groups and wing shape. The LDA model performed poorly on all the metrics computed on the testing set and failed to detect differences between the two groups obtained from the COI species delimitation analyses within An. maculipennis s. s. and within An. daciae sp. inq. (Fig. S2).

Discussion

The morphological identification of species of the Anopheles maculipennis complex is challenging, which is why DNA-based molecular identification is widely used to discriminate between these species. In this study, we generated DNA sequences for the COI and ITS2 markers, which are widely used in the identification of Anopheles species, and used these sequences coupled with orthologous sequences available in online databases to investigate the species boundaries within the complex. To integrate morphological and molecular evidence, we used sequences generated from individuals identified with methods other than DNA-based ones, such as analyses of egg morphology or chromosome inversions.

The two molecular markers analyzed in this study showed different levels of nucleotide variability. ITS2 showed limited intraspecific variability compared to the mitochondrial marker COI, which was more variable within species. This finding is consistent with the expected low intraspecific variability of ITS2, likely due to concerted evolution, while its higher interspecific variability may be attributed to the ability of this non-coding region to accumulate variations between species24. Additionally, the results confirm that COI is less suited for discriminating among recently diverged taxa24,36. The morphospecies An. atroparvus, An. beklemishevi, An. sacharovi and An. labranchiae were clearly delimited as separate evolutionarily significant units (ESUs) by the molecular species delimitation methods adopted, with both COI and ITS2, while equivocal results were obtained for An. melanoon, An. maculipennis s. s., An. daciae sp. inq. and An. messeae. Considering the results obtained with the two markers independently, the ITS2 gene tree showed well-supported clades, and all the delimitation methods used largely corroborated the classical division into morphospecies of the complex. The only exceptions were An. messeae and An. daciae sp. inq., which were lumped in a single ESU or, in the case of GMYC, in two ESUs. Indeed, GMYC split An. messeae and An. daciae sp. inq. into two groups whose structure had no apparent congruence with the taxonomy or with the geography or climatic conditions of the collection sites of the individuals.

On the other hand, the COI tree exhibited a more complex branching pattern than the ITS2 tree. The species delimitation obtained from the COI marker identified meaningful evolutionary units that do not always reflect the current species boundaries, as in the cases of An. maculipennis s. s. and An. messeae-An. daciae sp. inq. Commonly, the high intraspecific nucleotide variability that this marker possesses in some groups of insects allows the detection of cryptic species28,29 and phylogeographic patterns30,31. However, these COI features can also lead to an overestimation of species number. The most relevant split observed involved An. maculipennis s. s., whose individuals were clustered into two separate, well-supported monophyletic clades in the ultrametric tree inferred from COI sequences. Based on this marker, the An. maculipennis s. s. morphospecies was polyphyletic, and the two groups were separated by monophyletic An. labranchiae and An. melanoon. All the species delimitation methods applied to COI, including ABGD, which does not rely on a gene tree as input, were congruent in identifying two ESUs within the An. maculipennis s. s. morphospecies, contrary to ITS2, which supported the monophyly of this morphospecies. The polyphyly observed in An. maculipennis s. s. based on mitochondrial gene analysis could be attributed to at least four potential explanations: i. introgression events between a lineage of An. maculipennis and closely related species; ii. vertical transmission of bacteria, such as Wolbachia and Cardinium, influencing host reproduction and the spread within populations of ancestral or introgressed haplotypes; iii. artifactual tree topology resulting from high nucleotide variation in COI sequences; iv. the presence of cryptic diversity within the morphospecies An. maculipennis s. s.

In detail, introgression, defined as the genetic exchange between species through the backcrossing of interspecific hybrids, is a common phenomenon in mosquitoes32. This phenomenon has previously been proposed as the most plausible explanation for the topology observed in the COI phylogenetic tree of Anopheles species from the southwest Pacific26, which, similar to what was observed in this study, contrasted with that obtained with the ITS2 marker. Moreover, introgression was recently proposed as a fundamental phenomenon driving speciation in Anopheles complexes33. Based on the results obtained in this study with the ABGD method, which merged An. melanoon and one group of An. maculipennis s. s. into a single taxonomic entity, we can suppose that introgression occurred among the progenitors of these groups. However, we cannot disregard the possibility that the observed results may be influenced by vertically transmitted bacteria, such as Wolbachia and Cardinium, which can induce reproductive segregation and incompatibility. Although this phenomenon has already been documented in Anopheles mosquitoes34, the topology of the COI tree of this study, notably the clustering of one of the two groups of An. maculipennis s. s. with An. melanoon and An. labranchiae, diminishes the likelihood of this hypothesis.

An artifactual topology of the COI tree could be linked to the high nucleotide variability of the COI gene, which is particularly notable at third codon positions. This variability can impede the accurate resolution of relationships, even among closely related taxa35. Consequently, the observed tree topology may be a result of inherent COI features intertwined with the evolutionary history of An. maculipennis s. s. This interplay might have generated a tree in which two An. maculipennis s. s. clusters are separated by a basal split, and the artifact is represented by An. labranchiae and An. melanoon, which clustered with one of them. Moreover, we cannot dismiss incomplete lineage sorting as an alternative explanation, given how difficult it is to distinguish from other phenomena based solely on phylogenetic evidence36.

Despite the aforementioned inconsistencies in the delimitation of An. messeae and An. daciae sp. inq. across markers and methods, attributed solely to the molecular delimitation performed with GMYC, the results of the other two molecular delimitation methods are concordant. Specifically, these methods merge the two species into one ESU, aligning with morphological evidence. In fact, the discrimination between An. messeae and An. daciae sp. inq. is unattainable through adult features and the differences in their egg characteristics are negligible and lack statistical significance37,38. Initially, An. daciae sp. inq. was distinguished from An. messeae based on the presence of five single-nucleotide polymorphisms in the ITS2 sequences37. However, subsequent studies have identified intraindividual polymorphisms in this marker9, including within three of the five diagnostic sites for An. daciae sp. inq. and An. messeae39, leaving only two informative sites for discriminating between these two species. Other methodologies that could aid in determining the species status of these taxa include crossing experiments to evaluate hybrid sterility, cytogenetic markers and molecular investigations, which have yielded conclusive results for several Anopheles species complexes5. However, laboratory experiments on hybridizations between An. daciae sp. inq. and An. messeae may be challenging to conduct, and hybrids have been reported in various studies based on molecular and cytogenetic evidence9,10,40,41. The occurrence of hybridization between these species is also conceivable considering the variability observed in several ITS2 sequences from Russia and Kazakhstan, locally recorded in a large proportion of specimens, featuring polymorphic sites at the two key positions distinguishing An. messeae and An. daciae sp. inq.42.

Moreover, inversions in polytene chromosomes are widely used as markers to discriminate between species in complexes, including those within the genus Anopheles and even within the Anopheles maculipennis complex2. Specifically, regarding An. messeae, inversions in polytene chromosomes led to the early definition of chromosomal forms A and B for this species and subsequently, An. messeae A was proposed to be synonymous with An. daciae43. Differences in the frequencies of inversions have been reported in various populations of An. messeae and An. daciae sp. inq.10,41.

The morphometric analysis of mosquito wings revealed significant differences between An. maculipennis s. s., An. daciae sp. inq., An. melanoon and An. atroparvus, confirming previous results6,44. At the same time, the intraspecific groups obtained by the COI marker were not supported by morphometric data, suggesting that this marker should be used with caution in species delimitation work, at least in those concerning Anopheles species, while the ITS2 marker showed better performance in this context.

Unfortunately, no wings from specimens classified as An. messeae were available to test for morphometric differences from specimens of An. daciae sp. inq. However, notably, the morphometric results for the other species corroborated the results obtained for ITS2, and this marker does not support subgroups of An. messeae.

The use of morphometry in this work allowed us to evaluate, with an independent approach, the species delimitation results obtained for the COI and ITS2 markers. This approach is promising and deserves wider recognition because of the improvements it can achieve by exploiting new information technology methods. The use of tools to capture images and wingbeat frequency, the analysis of such data by artificial intelligence and deep learning, and the use of species distribution models combined with machine learning algorithms and global sensitivity and uncertainty analysis are innovative and widespread approaches in ecology45. Convolutional neural networks have demonstrated high accuracy in image and spectrogram classification46. Advances in automated mosquito identification and machine learning could produce a critical tool for monitoring mosquito populations, surveillance and spatial and temporal prediction using environmental layers47.

Conclusions

The present study revealed that An. atroparvus, An. beklemishevi, An. sacharovi and An. labranchiae can be accurately distinguished using multiple approaches, including the analysis of morphological features and DNA variation of both the COI and ITS2 markers. However, COI cannot be considered an adequate marker for identifying An. maculipennis s. s., owing to likely introgression events or incomplete lineage sorting. Furthermore, all the delimitation methods and approaches adopted failed to adequately distinguish between An. daciae sp. inq. and An. messeae. In the absence of unequivocal results, caution is suggested in accepting the species status of An. daciae sp. inq., at least until more robust evidence is produced.

Although it might seem to be a very specialist issue, the definition of taxonomic boundaries inside mosquito complexes is important from a public health perspective, particularly for the Anopheles genus, which includes species that are vectors of malaria. The occurrence of cryptic species can lead to significant problems in surveillance and control when they differ in vector capacity due to differences in their biology, ethology and propensity to bite humans1. In addition, hybridization between evolving lineages can produce mosquitoes with intermediate behavioral and ecological characteristics, which can give rise to speciation events48. This picture is complicated by the strong evolutionary pressure to which mosquitoes are subjected, often due to anthropic actions, such as landscape variations and chemical control, which shape the drivers of speciation processes for these insects. Nevertheless, the definition of systematic relationships within the complex is a stimulating task because Anopheles complexes are an ideal context for defining the boundaries of sibling species and the mechanism of speciation.

Methods

Sample collection and identification

The specimens were collected in the Po Plain area in the Emilia-Romagna and Lombardy regions (Northern Italy). This zone is a suitable environment for Anopheles life and reproduction since it features many potential breeding sites, such as rice fields (e.g., Lomellina area) or wetland areas near the Po River delta, which is one of the largest wetland areas in Europe (Valli di Comacchio and Po River Delta). The majority of the specimens were collected between 2017 and 2018, while a few specimens were collected between 2011 and 2016. Some specimens were retrieved during entomological surveillance of the West Nile virus in Italy at 103 sites on the Po Plain; these mosquitoes were sampled using CO2-baited attractive traps49. Further samples were collected by direct aspiration of resting adults at 43 sites, including farms with a variety of animals (cattle, horses, goats and poultry), to collect engorged and host-seeking individuals, and uninhabited buildings, to collect overwintering mosquitoes50.

Some live engorged females were allowed to lay eggs in glass tubes equipped with a wet blotting paper strip. Laid eggs were used to identify adults to the species level following the egg identification keys4.

DNA extraction, PCR and sequencing

DNA was extracted from one leg of the sampled mosquitoes, including individuals identified to the species level through egg analysis, using the Qiagen DNeasy Blood and Tissue Kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. The two markers, COI and ITS2, were amplified by PCR according to the conditions reported by Jalali et al.51 and Marinucci et al.52, respectively. Both strands of the obtained amplicons were sequenced using ABI technology (Applied Biosystems, Foster City, CA, USA) and marker-specific primers. The obtained electropherograms were edited, and the consensus sequences were acquired using Geneious Pro 11 (Biomatters Ltd., NZ). In the case of COI sequences, the open reading frame was checked using the tool EMBOSS Transeq (https://www.ebi.ac.uk/Tools/st/emboss_transeq/). The sequences were deposited in the BOLD system53 under the project ‘Anopheles maculipennis complex in Northern Italy (ANMA)’ (BOLD IDs: ANMA001-21 to ANMA1247-21) and in the GenBank database54 (COI accession numbers from PQ640095 to PQ640174, ITS2 accession numbers from PQ640180 to PQ640259).

The map of the distribution of mosquitoes in the surveyed area according to the geolocated sampling site was generated using QGIS software (https://www.qgis.org/en/site/index.html#).

Sequence dataset preparation

Orthologous COI and ITS2 sequences of selected Palearctic species from 33 individuals of the Anopheles maculipennis complex (i.e., An. atroparvus, An. beklemishevi, An. daciae sp. inq., An. labranchiae, An. maculipennis s. s., An. melanoon, An. messeae and An. sacharovi) were retrieved from previously published studies37,43,55,56. Preference was given to studies in which the specimens were identified to the species level by observing egg morphology. COI and ITS2 sequences were retrieved only when both markers were obtained from the same individual. For 76 individuals for which morphological identification was not possible, molecular identification was performed using Basic Local Alignment Search Tool analysis (BLAST: http://www.ncbi.nlm.nih.gov/BLAST) with the default parameters. Species-level identification was assigned only when a similarity ≥ 99% and an E-value < 1 × 10^−20 were obtained from the comparison between the query and the reference sequence.
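The acceptance rule for BLAST-based identification can be written as a small filter. The Python sketch below is only an illustration of the two thresholds stated above: the `BlastHit` record, the function name, the reading of "similarity" as percent identity, and the choice of the best passing hit are all assumptions, and the example hits are invented.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class BlastHit:
    """Hypothetical summary of one BLAST hit against a reference sequence."""
    species: str
    percent_identity: float
    evalue: float


def assign_species(
    hits: Sequence[BlastHit],
    min_identity: float = 99.0,
    max_evalue: float = 1e-20,
) -> Optional[str]:
    """Return a species name only if a hit meets both thresholds.

    A hit passes when identity >= 99% and E-value < 1e-20. Among passing
    hits, the one with the highest identity (then lowest E-value) is used;
    this tie-breaking is an assumption of the sketch.
    """
    passing = [
        h for h in hits
        if h.percent_identity >= min_identity and h.evalue < max_evalue
    ]
    if not passing:
        return None  # leave the specimen unidentified
    best = max(passing, key=lambda h: (h.percent_identity, -h.evalue))
    return best.species


# Invented example hits for a single query sequence.
example_hits = [
    BlastHit("An. maculipennis s. s.", 99.6, 1e-150),
    BlastHit("An. melanoon", 97.8, 1e-140),
]
print(assign_species(example_hits))
```

A query whose best hit falls below either threshold returns `None`, so no species-level name is assigned.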

The retrieved sequences were merged with the de novo COI and ITS2 sequences obtained in this study. The COI sequences were aligned using MUSCLE with default parameters. COI haplotypes were identified in R version 3.6.2 (R Core Team, 2019) using the library haplotypes (https://biolsystematics.wordpress.com/r/). One representative sequence for each COI haplotype was selected. In the cases in which the same haplotype was shared between two groups (or more), the sequences were maintained for all species (the same procedure was also adopted in Bellin et al.44). The COI dataset obtained was further reduced to balance the number of individuals belonging to each species (balancing the intraspecific sample size before performing the species delimitation analysis) by randomly removing the COI haplotypes of overrepresented species according to previous studies57. In this step, the sequences developed within this study were removed. The ITS2 dataset was then pruned to retain the sequences of the individuals maintained in the COI dataset. The ITS2 sequences of An. messeae obtained were further checked and, if needed, renamed according to the presence of the diagnostic polymorphisms of the ITS2 that distinguish An. daciae sp. inq. (according to Lilja et al.39). The COI sequences developed from the same individuals were also relabeled accordingly. The ITS2 sequences were aligned using MAFFT with G-INS-i as the search strategy58.
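The haplotype reduction step can be sketched as follows: one representative is kept per distinct aligned sequence, but a haplotype shared by several species keeps one representative for each of them. This Python sketch is an illustration under simplifying assumptions, not the R procedure used in the study: haplotypes are defined by exact string identity of the aligned sequences (gaps and ambiguity codes are not treated specially), and the record layout and identifiers are hypothetical.

```python
def representative_haplotypes(records):
    """Keep one record per (haplotype, species) combination.

    `records` is an iterable of (sequence_id, species, aligned_sequence).
    The first record seen for each combination is retained, so a haplotype
    shared between species is represented once in every species.
    """
    first_seen = {}
    for seq_id, species, sequence in records:
        key = (sequence.upper(), species)
        if key not in first_seen:
            first_seen[key] = seq_id
    return [
        (seq_id, species, haplotype)
        for (haplotype, species), seq_id in first_seen.items()
    ]


# Invented toy alignment: s1 and s2 share a haplotype within one species,
# while s3 carries the same haplotype in a different species.
toy_records = [
    ("s1", "An. daciae sp. inq.", "ACGTAC"),
    ("s2", "An. daciae sp. inq.", "ACGTAC"),
    ("s3", "An. messeae", "ACGTAC"),
    ("s4", "An. messeae", "ACGTTC"),
]
for record in representative_haplotypes(toy_records):
    print(record)
```

In the toy data, `s2` is dropped as a duplicate of `s1`, while `s3` is kept because the shared haplotype occurs in a second species.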

Haplotype networks and molecular species delimitation analysis

The alignments of COI and ITS2 sequences obtained from the specimens collected in the Po Valley (Italy) were used for haplotype network reconstruction. Haplotype networks were obtained via the median joining network method using PopArt software59.

Different molecular species delimitation methods were adopted to delimit the evolutionarily significant units present in the datasets (ESUs)60. In detail, the following distance-based and coalescent tree-based methods were adopted: a. the automatic barcode gap discovery (ABGD) tool61, which is known to be efficient in delimiting phylogenetically closely related species as well as distantly related species13; b. the generalized mixed Yule-coalescent (GMYC) model62; and c. the multirate Poisson Tree Processes (mPTP)63. These approaches have been widely and successfully adopted in previous studies addressing similar questions on different insect groups15,16,64,65. Species delimitation analyses were performed separately for the COI and ITS2 datasets. ABGD analyses were performed through the web-based interface (http://wwwabi.snv.jussieu.fr/public/abgd/abgdweb.html) adopting the Kimura two parameter as a model of nucleotide evolution (transition/transversion = 2); prior divergence of intraspecific diversity ranging from 0.001 to 0.1; relative gap width of 0.1 and 0.5 for the COI and ITS2 datasets, respectively; and number of bins = 20; moreover, 20 steps were set in the case of COI, and the remaining parameters were left at the default setting.
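Distance-based delimitation starts from pairwise distances under the chosen substitution model. As an illustration of the Kimura two-parameter model named above, the Python sketch below computes d = −½ ln[(1 − 2P − Q)·√(1 − 2Q)], where P and Q are the proportions of transitions and transversions between two aligned sequences. It covers only the distance calculation, not the ABGD partitioning itself; the function name and the decision to skip gaps and ambiguity codes are assumptions of the sketch.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes (assumption)
        compared += 1
        if a == b:
            continue
        # A change within purines or within pyrimidines is a transition.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy sequences of 10 sites differing by a single transition (A <-> G).
print(round(k2p_distance("ACGTACGTAC", "GCGTACGTAC"), 4))
```

For short distances the K2P value stays close to the raw proportion of differing sites and grows faster as substitutions accumulate, since the model corrects for multiple hits at the same site.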

Bayesian single-locus ultrametric phylogenetic trees for each of the two markers considered in this study were inferred using BEAST 2.666 and used as input for the species delimitation analyses. The best nucleotide substitution models and partition scheme were estimated for each dataset using PartitionFinder267 and selected according to the corrected Akaike Information Criterion. The COI dataset was partitioned according to codon positions: the 1st codon position with the GTR model of nucleotide substitution + gamma distribution (\(\Gamma\)) and nucleotide frequencies estimated with maximum likelihood from the data; the 2nd codon position with the Tamura-Nei model + \(\Gamma\) and the estimated nucleotide frequencies; and the 3rd codon position with the HKY model and estimated nucleotide frequencies. For the ITS2 dataset, the best model of nucleotide substitution was the Tamura-Nei model + \(\Gamma\) with estimated nucleotide frequencies.

Two independent BEAST runs were performed for each dataset using the following parameters: Markov chain length of \(150 \times 10^{6}\) generations for the COI and ITS2 datasets; sampling of trees and parameters every 1000 generations; models of nucleotide evolution as previously mentioned; birth–death process as the tree prior, suitable for trees describing the relationships between individuals from different species, with a uniform prior; and the other priors were set to their default values. The convergence of the two BEAST runs was visualized and examined using TRACER68. The runs were then pooled after removal of the tree burn-in fraction, and the maximum clade credibility tree was obtained using TreeAnnotator67.
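The pooling of independent runs after burn-in removal can be sketched in a few lines of Python. This is a generic illustration, not the BEAST tooling used here: the function name `pool_after_burnin` is hypothetical, and the burn-in fraction is left as an explicit argument because its value is not stated in the text.

```python
def pool_after_burnin(runs, burnin_fraction):
    """Discard the first part of each run and concatenate the remainder.

    runs: list of lists, each holding the sampled states of one MCMC run
          in the order they were logged.
    burnin_fraction: proportion of each run to discard (0 <= value < 1).
    """
    if not 0 <= burnin_fraction < 1:
        raise ValueError("burnin_fraction must be in [0, 1)")
    pooled = []
    for samples in runs:
        n_discard = int(len(samples) * burnin_fraction)
        pooled.extend(samples[n_discard:])
    return pooled


# Toy example: two runs of ten samples each, with a hypothetical 20% burn-in.
run_1 = list(range(10))
run_2 = list(range(100, 110))
pooled = pool_after_burnin([run_1, run_2], burnin_fraction=0.2)
```

With the chain length and sampling interval reported above, each run would log on the order of 150,000 states, so the pooled posterior sample summarized in the maximum clade credibility tree is large even after burn-in removal.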

Species delimitation analyses using single-threshold GMYC were performed through the R package Splits (species limits by threshold statistics), available at http://r-forge.r-project.org/projects/splits/, using the maximum clade credibility ultrametric trees as input after outgroup removal. Tree manipulation was performed using the R package ape69. The tree-based mPTP method was applied to the same trees used as inputs for GMYC. mPTP analyses were performed through the web interface available at https://mptp.h-its.org/ adopting maximum likelihood delimitation.

Geometric morphometrics

The wings of the selected specimens were removed mechanically, brushed gently with a thin-tipped brush to remove the scales and mounted on a glass slide in Hoyer’s medium. The landmarks were then fixed manually using the Clic package70 for 356 mosquitoes of the four species An. maculipennis s. s. (n = 67), An. daciae sp. inq. (n = 165), An. atroparvus (n = 10) and An. melanoon (n = 4) and were then digitized6,44. The landmark coordinates were aligned in a common reference system using generalized Procrustes analysis (GPA). Canonical discriminant analysis (CDA) was used to fit a linear combination of the GPA coordinates to estimate the maximum separation among wing shapes and to visualize differences between species in a reduced space. A permutation test (n = 1000) was used to determine differences in the CDA scores among each pair of species. The p value of each test was adjusted using Bonferroni correction. Linear discriminant analysis (LDA) was applied to morphometric data at the intraspecific COI group level recorded in An. maculipennis s. s. and An. daciae sp. inq.

GPA was performed using the R package geomorph71. The CDA was fitted with the R package MorphoTools272 and visualized with ggplot273. The permutation test was performed using the R package Morpho74.
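The logic of the pairwise permutation tests with Bonferroni adjustment can be illustrated with a simplified Python sketch. It is deliberately one-dimensional (a single score per specimen, compared through the absolute difference of group means), whereas the analysis described above was run on CDA scores with the R package Morpho; the function names, the test statistic and the random seed are assumptions made for illustration.

```python
import random
from itertools import combinations
from statistics import mean


def permutation_p_value(x, y, n_perm, rng):
    """Two-sided permutation test on the difference of group means."""
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign specimens to groups at random
        stat = abs(mean(pooled[:len(x)]) - mean(pooled[len(x):]))
        if stat >= observed:
            hits += 1
    # Count the observed arrangement itself, so p is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def pairwise_bonferroni(groups, n_perm=1000, seed=0):
    """Bonferroni-adjusted permutation p values for every pair of groups.

    groups: dict mapping a group name to a list of scores.
    """
    rng = random.Random(seed)
    pairs = list(combinations(sorted(groups), 2))
    adjusted = {}
    for a, b in pairs:
        p = permutation_p_value(groups[a], groups[b], n_perm, rng)
        # Bonferroni: multiply by the number of comparisons, cap at 1.
        adjusted[(a, b)] = min(1.0, p * len(pairs))
    return adjusted


# Toy scores (invented): groups A and C overlap, group B is shifted.
toy = {
    "A": [0.1 * i for i in range(10)],
    "B": [5 + 0.1 * i for i in range(10)],
    "C": [0.1 * i for i in range(10)],
}
adjusted_p = pairwise_bonferroni(toy)
```

With four species there are six pairwise comparisons, so under this scheme each raw p value would be multiplied by six before being compared with the significance threshold.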

Moreover, at the intraspecific level, we also assessed the statistical relationships between wing shape and the molecular marker groups (ITS2 and COI) identified through the phylogenetic analyses. Two species were well represented in the dataset: An. maculipennis s. s. and An. daciae sp. inq. At the intraspecific level, the GPA coordinates were used to fit a principal component analysis (PCA) morphospace, and the COI group identities were superimposed to improve data visualization. We used a statistical learning framework to test the phylogenetic signal. For each species, linear discriminant analysis (LDA) was performed considering the principal components (PCs) of the morphospace as predictors and the COI groups as the dependent variable. The first 10 PCs of the morphospaces were used as predictor variables in the LDA models, as selected using the elbow plot method. To balance the dataset, the individuals were subsampled toward the minority class, taking into account the proportions of the COI groups. The LDA model was trained on a training set comprising 70% of the data, while the remaining 30% was used as a testing set to further assess the model’s performance. The training set was further split into 5 folds, and the cross-validation of the LDA model was computed. For both the fivefold cross-validation and the testing set, five different performance metrics were computed: accuracy, F1 score, kappa, sensitivity and specificity. The LDA model and the performance metrics were computed using the R package caret75.
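The data handling around the LDA models (class balancing, the 70/30 split and one of the performance metrics) can be sketched in plain Python. This is not the caret workflow used in this study: the function names are hypothetical, balancing is implemented as random downsampling of every group to the size of the smallest one, and stratifying the split by group is an assumption.

```python
import random
from collections import Counter, defaultdict


def downsample_to_minority(labels, seed=0):
    """Indices of a subsample in which every class has the minority size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    n_min = min(len(members) for members in by_class.values())
    keep = []
    for lab in sorted(by_class):
        keep.extend(rng.sample(by_class[lab], n_min))
    return sorted(keep)


def stratified_split(indices, labels, train_fraction=0.7, seed=0):
    """Split indices into training and testing sets within each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    train, test = [], []
    for lab in sorted(by_class):
        members = by_class[lab][:]
        rng.shuffle(members)
        cut = round(train_fraction * len(members))
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


def cohen_kappa(truth, predicted):
    """Agreement between true and predicted labels, corrected for chance.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(truth)
    observed = sum(t == p for t, p in zip(truth, predicted)) / n
    t_counts, p_counts = Counter(truth), Counter(predicted)
    expected = sum(t_counts[c] * p_counts[c] for c in t_counts) / n ** 2
    return (observed - expected) / (1 - expected)


# Toy labels (invented): 30 specimens in one COI group, 10 in another.
labels = ["g1"] * 30 + ["g2"] * 10
balanced = downsample_to_minority(labels)
train_idx, test_idx = stratified_split(balanced, labels)
```

Kappa is shown because, unlike raw accuracy, it discounts the agreement expected by chance, which is useful when judging whether wing shape predicts COI group membership better than random assignment.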

Assessing the phylogenetic signal between wing shape and gene trees

The statistical dependence between the resemblance of the right wing landmark coordinates and the molecular marker (ITS2 and COI) gene trees was quantified according to the definition of phylogenetic signal as the “tendency for related species to resemble each other more than they resemble species drawn at random from the tree”76. The mean wing shape of each species was computed using the GPA coordinates and projected into the PCA morphospace. The Euclidean distances between pairs of species were computed. The distance matrix between pairs of species was used to fit a UPGMA-generated wing shape tree. The COI and ITS2 sequences of the specimens analyzed with geometric morphometrics were considered. For each species, both molecular marker sequences were aligned using the ClustalW algorithm, and representative species consensus sequences were obtained. A further multiple sequence alignment among the species consensus sequences was performed using the DECIPHER algorithm77, and a UPGMA phylogenetic tree was fitted.

To assess the statistical dependence between wing shape and phylogeny, we considered two different levels of representation of the species’ mean shapes: the PCA morphospace and the UPGMA wing shape tree. To evaluate the phylogenetic signal, the first dimension of the PCA morphospace was tested against the UPGMA phylogenetic tree considering five different statistics: Cmean, Moran’s I, K-statistic, K* and Pagel’s λ using the R package phylosignal78. The statistical correlation and similarity between the UPGMA trees of the wing shapes and the UPGMA trees of the molecular markers were estimated using Baker’s Gamma Index and the cophenetic correlation, both computed with the R package dendextend79.
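The comparison between a wing shape tree and a molecular tree can be illustrated with a small Python sketch that builds UPGMA clusterings from two distance matrices and correlates their cophenetic distances (the height at which each pair of taxa is first joined). This is a simplified stand-in for the dendextend analysis, with hypothetical function names and invented distances; it uses the Pearson correlation and does not reproduce Baker's Gamma Index.

```python
import math
from itertools import combinations


def upgma_cophenetic(names, dist):
    """Cophenetic distances from UPGMA clustering.

    dist: dict mapping frozenset({a, b}) to the distance between taxa a and b.
    Returns a dict with the same keys holding the merge height of each pair.
    """
    clusters = [(n,) for n in names]  # each cluster is a tuple of leaf names
    d = {frozenset([(a,), (b,)]): dist[frozenset([a, b])]
         for a, b in combinations(names, 2)}
    coph = {}
    while len(clusters) > 1:
        # Merge the two closest clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset([a, b])]
        for leaf_a in a:
            for leaf_b in b:
                coph[frozenset([leaf_a, leaf_b])] = height
        merged = a + b
        clusters = [c for c in clusters if c not in (a, b)]
        for c in clusters:
            # UPGMA update: average weighted by cluster size.
            d[frozenset([merged, c])] = (
                len(a) * d[frozenset([a, c])] + len(b) * d[frozenset([b, c])]
            ) / len(merged)
        clusters.append(merged)
    return coph


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cophenetic_correlation(names, dist_1, dist_2):
    """Pearson correlation between the cophenetic distances of two UPGMA trees."""
    c1 = upgma_cophenetic(names, dist_1)
    c2 = upgma_cophenetic(names, dist_2)
    pairs = [frozenset(p) for p in combinations(names, 2)]
    return pearson([c1[p] for p in pairs], [c2[p] for p in pairs])


def toy_matrix(names, close_pairs, near=1.0, far=4.0):
    """Invented distances: listed pairs are close, all others are far apart."""
    close = {frozenset(p) for p in close_pairs}
    return {frozenset(p): (near if frozenset(p) in close else far)
            for p in combinations(names, 2)}


taxa = ["A", "B", "C", "D"]
shape = toy_matrix(taxa, [("A", "B"), ("C", "D")])
marker_same = toy_matrix(taxa, [("A", "B"), ("C", "D")], near=2.0, far=8.0)
marker_diff = toy_matrix(taxa, [("A", "C"), ("B", "D")])
r_same = cophenetic_correlation(taxa, shape, marker_same)
r_diff = cophenetic_correlation(taxa, shape, marker_diff)
```

A correlation close to 1 indicates that the two trees group the taxa in the same way regardless of scale, whereas values near or below zero indicate conflicting groupings.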