Background & Summary

Mosquitoes are the most important arthropod vectors of pathogens worldwide, e.g. for dengue and Zika virus1. Global change, particularly global warming and globalization, facilitates the spread of mosquitoes and their pathogens, increasing the risk of mosquito-borne pathogen transmission in previously unaffected areas2. This highlights the need for mosquito monitoring and surveillance methods to develop early prevention measures. Therefore, it is essential to accurately identify mosquito species as the ecology and vector capacity strongly vary among species3.

Mosquito species are commonly identified using taxonomic keys based on morphological characters4. The morphological species identification can be time-consuming and requires intensive entomological experience, limiting its scalability for large-scale studies5. In addition, it has been widely recognized that entomological expertise is declining as traditional taxonomy becomes less central to the biological curriculum6. Alternative methods (e.g. DNA barcoding) require specialized equipment and technical expertise7, which can be challenging to use in low-resource settings. These limitations highlight the need for complementary approaches and comprehensive datasets to improve species identification accuracy.

Wing geometric morphometrics offers a promising alternative to traditional identification methods of mosquitoes, as it reliably captures interspecific variations8,9,10. This approach utilizes the coordinates of anatomical features of the wing to analyse shape differences between species. Furthermore, the method can be used to detect subtle intraspecific differences, providing valuable insights into population structure, breeding conditions, or fecundity11,12,13. This makes wing geometric morphometrics a versatile and powerful tool for entomological research, offering a scalable and cost-effective solution for large-scale studies in mosquito biology and vector control. However, the analysis is based on the manual setting of selected landmarks on the mosquito wing image, a time-consuming and observer-biased process14.

Advances in computer vision, utilizing image processing and machine learning techniques, offer new pathways by automating the species and landmark identification process15. Convolutional Neural Networks (CNN) have demonstrated a high potential in accurately identifying mosquito species based on images of either the entire body16,17,18 or solely the wings19,20. For instance, Sauer et al. reported that a CNN model trained on wing images achieved a 91% macro-F1 score in classifying seven Aedes species19. Additionally, segmentation and regression models were used to automatically identify landmarks in Diptera. For example, Geldenhuys et al. (2023) demonstrated this for Tse Tse flies (Glossina spp.), training their model on 14,534 pairs of wings of two different species and achieving automated landmark detection with a mean distance error of 3.43 pixels per landmark21.

Despite these advancements, the public availability of large high-quality datasets remains a bottleneck for the widespread application of geometric morphometrics and machine learning techniques. Existing wing datasets are often limited in size or scope, hindering the development of robust and generalizable models22,23. Herein, we present a dataset with a comprehensive collection of diverse wing images from 72 mosquito taxa from 12 countries on five continents collected between 2008 and 202424. This dataset supports traditional morphometric studies but also enables the application of advanced machine learning techniques, creating opportunities for new insights and innovations in mosquito surveillance and research. By providing this information to the scientific community, we aim to accelerate the progress of research in mosquito biology, vector control, and disease prevention.

Methods

Mosquito Sampling

The wing dataset is composed of images consolidated from various experimental and field studies. This study consolidates the efforts of 22 research projects conducted from 2008 to 2024, incorporating a total of 10,500 mosquito specimens from 12 countries across 5 continents (Fig. 1). Various methods were used to collect the mosquitoes, including CO2-baited traps (n = 4,614), aspirators (1,630), ovitraps (308), egg-raft collection (1,683), and rearing from breeding facilities (2,113). Most of the sampled mosquitoes were identified as female (n = 9,049), with a smaller proportion being male (1,425). Species identification was primarily conducted using morphological methods (n = 4,658)4 or molecular techniques such as COI/nad4 gene barcoding (1,914)25, ITS2 gene barcoding for the identification of Anopheles species (621)26 and qPCR targeting CQ11 and ACE2 genes to identify the taxa morphologically identified as Culex pipiens s.l./torrentium (2,092)27. The remaining samples were identified based on their association with a laboratory colony (1,839).

Fig. 1
figure 1

Geographic distribution of images in the dataset. Countries colour-coded by the number of images (A). Panel (B) shows the mosquito sampling locations in Europe and (C) sampling locations in Southeast Asia. We included both Europe and Asia in the figure due to their higher variance in sampling locations compared to Africa and South America.

The complete wing dataset comprises specimens from nine genera: Culex (n = 3,980), Aedes (5,029), Anopheles (1,135), Coquillettidia (141), Culiseta (158), Uranotaenia (1), Armigeres (49), Mansonia (6), and Toxorhynchites (1). Many mosquito species are difficult or impossible to identify based solely on morphology. However, the dataset includes specimens from such groups, which were identified using morphological characteristics alone. Thus, to present the taxonomic information in a machine-readable format, we developed a hierarchical system of taxonomic levels that also describes the uncertainties in species identification (Supplementary Table 1). The first level corresponds to the family, the second level to the genus and the fourth level to species. The third taxonomic level encompasses morphologically very similar species pairs (e.g. Ae communis/Ae. punctor), species groups (e.g. Ae. annulipes group), species complexes (An. maculipennis s.l.) or combinations of these aggregated taxa (e.g. Cx. pipiens s.l./Cx. torrentium). The species names (fourth taxonomic level) for these specimens were assigned only when the identification was confirmed through molecular assays. Information on subspecies or biotypes is presented under the fifth taxonomic level.

Wing preparation and image capture

Wings were removed from mosquitoes with tweezers under a stereo microscope. The wings were placed on a microscopic slide and embedded in Euparal (Carl Roth, Karlsruhe, Germany) with a cover slide for long-term storage. Detailed instructions on the wing removal process are provided in the supplementary material (see supplementary material: wing_removal_instructions.pdf). In total 18,104 images were captured using different stereomicroscopes and a smartphone with an attached macro-lens. Most images (n = 12,462) were captured using the Olympus SZ61 (Olympus, Tokyo, Japan) in conjunction with the Olympus DP23 camera (Olympus, Tokyo, Japan), followed by 3,577 images with the Leica M205c microscope (Leica Microsystems, Wetzlar, Germany) and 1,685 images using an iPhone SE 3rd generation (Apple Inc., Cupertino, USA) in combination with a macro-lens taken at 24x magnification (Apexel-24XMH, Apexel, Shenzhen, China). The images were captured in TIF format, with resolutions of 3024 × 3024 for smartphone images, 3088 × 2076 for Olympus DP23 images and 2560 × 1920 for images captured using the Leica M205c. Imaging settings were not standardized and thus varied between image collection projects in parameters not further recorded, e.g. exposure time or lighting conditions (Fig. 2). The images captured with the stereomicroscopes are displayed with a scale providing a reference for wing size measurements such as wing length.

Fig. 2
figure 2

Example wing images of different projects of Ae. aegypti from the dataset. Images 2a, 2 d, 2e and 2 f were captured using the Olympus SZ61, 2b was captured using Leica M205c and 2c was captured using an iPhone SE with an attached micro-lens.

Data Records

The image dataset was uploaded to the Bioimage Archive and published under a CC-BY 4.0 license (S-BIAD1478, https://doi.org/10.6019/S-BIAD1478)24. Images are organized in a hierarchical folder structure with separate folders per 2nd taxonomic level, i.e. genus. The images are named according to the following template and comprehensive metadata (e.g., sampling location or image capture device) is provided for each image (Supplementary Table 2).

<2. Taxonomic Level > _ <project> _ <sex> _ <wing-side> _ <image_id>.TIF

Technical Validation

Mosquito taxa were identified using mostly two approaches: by trained taxonomists with a dichotomous taxonomic key4 or through established molecular techniques such as gene barcoding or taxa-specific PCRs as described in the method section25,26,27.

The validity of the mosquito wing images has been demonstrated in previous studies. Specifically, several projects within this dataset, such as “landmark-mosquito-identification“8, “aegypti-diversity-study“28 and “landmark-aedes-collection“9 have shown that the wing images exhibit species-specific characteristics useful for wing geometric morphometrics for species identification or the analysis of population structure. Moreover, the data set has been proven valuable for deep learning applications, i.e. Sauer et al.19 (Project: CNN-study) and Nolte et al.20 (Project: ConVector) successfully trained CNN models to identify mosquito species. Moreover, Maciel-de-Freitas et al. (Project: aegypti-Wolbachia-study) demonstrated the utility of these image data to analyse the relationship between wing measurements, such as wing length and shape, and mosquito fitness parameters, such as fecundity, in Ae. aegypti11.

Despite extensive efforts to ensure comprehensive data collection, some metadata entries remain incomplete. Capture location is unavailable for 7.9% of samples, capture method is missing for 1.4%, and precise collection dates are missing for 24.1% of wing samples. Notably, the majority of samples lacking a collection date (75.5%) were derived from laboratory colonies.

Additionally, 48.4% of the wing images in this dataset originate from unpublished projects that adhered to data collection standards consistent with those in previously published studies. The remaining 9433 images were associated with 8 different publications8,9,11,19,20,29,30,31 of which half provided the images as part of their publication32,33,34,35.

Usage Notes

This dataset supports the development and testing of geometric morphometric methods and machine learning models. However, users should be aware of certain limitations. The images were not captured under standardized conditions, resulting in significant variation across projects, such as differences in lighting and background (Fig. 2). This is due to the retrospective nature of data collection. Additionally, the dataset includes images of damaged wing samples (n = 598), which lack certain morphological features.

As further image data are collected for mosquitoes and other dipteran vectors, we plan to expand this dataset regularly. We also invite contributions from the scientific community to enhance this growing collection of wing images by contacting the authors.