Background and Summary

Integrating machine learning and citizen science has high potential to benefit both fields and can support the efficient and accurate collection of biodiversity data1,2. Technological developments such as camera traps and audio recorders, together with the engagement of the public in data collection, have considerably increased the amount of available biodiversity data3,4. Manual processing of the collected data, such as species identification by experts, is time-consuming and expensive, and it limits the amount of data that can be processed. Machine learning approaches, especially deep learning, have advanced considerably in recent years and can significantly reduce the time needed for image processing and classification5,6. Deep learning algorithms have already been applied to identify species from different taxonomic groups such as plants7,8, insects9,10 and vertebrates11,12. In addition, deep learning models can provide direct feedback to users of species identification applications and increase the educational value of such applications2. They are already included in species identification apps such as iNaturalist and Flora Incognita, where they provide users with highly accurate species classifications8,13. While deep learning algorithms have high potential to accelerate the processing of biodiversity data, their implementation can be impeded by the large amount of data needed for training4,14. Datasets collected by citizen scientists can be a valuable resource for model training, especially when high data quality can be ensured through stringent quality assurance procedures1.

Not all species groups are well suited to identification from images. While many insect species in particular are difficult to identify by macroscopic characteristics alone, butterflies can be recognized comparatively easily. In addition, they react sensitively to changes in environmental conditions15,16, inhabit a wide range of terrestrial habitats16 and are representative of many (but not all) groups of terrestrial taxa17,18,19. They are therefore widely used as biodiversity indicators20,21,22. Butterflies are perceived positively by the public23 and are well suited for observation in citizen science projects. The development of efficient species identification methods for this important indicator group can therefore benefit from the large amounts of data that are collected in such initiatives. Deep learning models have already been trained to identify butterfly and moth (Macrolepidoptera) species with high accuracy10,24,25,26. The datasets used in these studies ranged from fewer than 1,000 to over 34,000 images covering ten to 636 species10,26.

The dataset presented here is considerably larger than those used in previous studies. It contains 530,404 images of 185 butterfly and moth species that were recorded in Austria. The dataset was collected by citizen scientists with the application “Schmetterlinge Österreichs” (https://www.schmetterlingsapp.at/) of the Billa Foundation “Blühendes Österreich”. Correct species identification was ensured by an experienced entomologist. The dataset has a strong class imbalance, with the number of images per species ranging from 1 to nearly 30,000. Such an imbalance is common in species records from citizen science projects and has multiple causes: some species are more common than others, are easier to detect or are preferentially recorded by citizen scientists27,28,29.

The dataset offers valuable opportunities to train neural networks on the fine-grained classification task of identifying butterfly and moth species and to assess the performance of different neural network architectures and hyperparameter settings. To demonstrate the use of the dataset for training deep learning models, it was used to fine-tune a Multi-Axis Vision Transformer (MaxViT) model as proposed by Tu et al.30, starting from a model that was pre-trained on the ImageNet dataset31. The dataset has already been used to train a ResNet152 model for a Master’s thesis, which analysed different methods of handling class imbalance and the performance for different species32.

Methods

Butterfly images were taken and uploaded by users of the application “Schmetterlinge Österreichs” (https://www.schmetterlingsapp.at/) of the Billa Foundation “Blühendes Österreich” in Austria between 2016 and 2023. Over 25,000 users were involved in the collection of images. Registered users took images with their smartphones and could upload them directly via a mobile app. The app is also available as a desktop version, which is especially useful for uploading images taken with a stand-alone camera. The user who uploaded an observation, or other members of the community, could propose a species-level classification of the images. The species that can be reported include 157 butterfly and 32 moth species. Images of 185 of these species were uploaded. Helmut Höttinger, an experienced entomologist and expert on butterflies and moths, continuously validated the classification of all images. Over 11,000 images that showed eggs, larvae or pupae, or more than one butterfly species, were manually deleted from the dataset. Some such images were not detected, however (see Technical Validation).

Data Records

The dataset is available at figshare (https://doi.org/10.25452/figshare.plus.29135618)33.

The whole dataset contains 541,677 images of 185 butterfly and moth species and has a size of 315 GB. Files are organized in a folder structure, with one folder per species. Image dimensions vary because the images were taken with different devices. The mean image width is 1,887 px (min: 66 px, max: 12,000 px) and the mean height is 1,906 px (min: 66 px, max: 8,000 px). All images are in the JPEG file format. See Fig. 1 for a random selection of images from the dataset. There are 29,612 images of Aglais io, the most frequently photographed species, while other species are represented by only one image. For 131 species there are fewer than 1,000 images, and for 62 species fewer than 100 (Fig. 2; Table S1 in the Supplementary Information).
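
Given the one-folder-per-species layout, the number of images per species can be tallied directly from the directory tree. The following Python sketch is illustrative only: the function name `count_images_per_species`, the root path passed to it and the accepted file extensions (`.jpg`, `.jpeg`) are assumptions and are not taken from the dataset documentation.

```python
from collections import Counter
from pathlib import Path

# Assumed extensions; the text only states that all images are JPEG files.
JPEG_SUFFIXES = {".jpg", ".jpeg"}


def count_images_per_species(root):
    """Return a Counter mapping species folder name -> number of JPEG files."""
    counts = Counter()
    for species_dir in sorted(Path(root).iterdir()):
        if not species_dir.is_dir():
            continue  # skip loose files next to the species folders
        counts[species_dir.name] = sum(
            1
            for f in species_dir.iterdir()
            if f.is_file() and f.suffix.lower() in JPEG_SUFFIXES
        )
    return counts
```

Calling `count_images_per_species(root).most_common(5)` on the extracted dataset would list the five best-represented species, which is a quick way to inspect the class imbalance before training.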

Fig. 1

Random selection of images of the butterfly and moth images dataset collected with the application “Schmetterlinge Österreichs”.

Fig. 2

Distribution of the number of images per species for the dataset with >500,000 images of butterflies and moths that were collected with the application “Schmetterlinge Österreichs” of the Billa Foundation “Blühendes Österreich” between 2016 and 2023 (Figure from Barkmann et al. 2025).

The dataset33 contains images of 77.6% of the 210 butterfly species (superfamily Papilionoidea) that occur in Austria, excluding five regionally extinct species34. The moth species that occur in Austria are less well represented, as only 32 of the nearly 4,000 species35 (of which 1,243 can be considered Macrolepidoptera) can be recorded with the application. The selected moth species can be observed easily, and many of them have characteristic morphological features in at least one life stage. In Europe, there are 496 species of butterflies36 and about 8,200 moth species, about 3,000 of which are Macrolepidoptera37.

While some butterfly species such as Aglais io and Vanessa atalanta have wing patterns that are unique in Austria, species of the genera Pyrgus (Fig. 3) and Erebia can be highly similar. Other species groups, such as the tribe Melitaeini, contain species that are highly similar on one side of the wings but mostly have patterns on the other side that are characteristic enough for species determination (Fig. 3). Images vary in the size of the depicted butterfly or butterflies, the angle at which individuals were photographed and the background (Fig. 4).

Fig. 3

Examples of similar-looking species that can be difficult to identify from images alone. (a) from left to right: Pyrgus armoricanus, Pyrgus malvae, Pyrgus carthami; (b) from left to right: Fabriciana adippe, Fabriciana niobe, Speyeria aglaja.

Fig. 4

Examples for the variability of images of the same species. (a) different size of the butterfly in the image, (b) different sides of the wings and angles at which they are photographed, (c) different backgrounds.

Technical Validation

Model training

To demonstrate the training of a deep learning model on the dataset33 and to estimate the number of images that show life stages other than adults or depict more than one species, a deep learning model was fine-tuned using the dataset. Its performance was assessed and misclassified images evaluated.

For model training, only images of species with at least 50 records were used to allow for a reasonable partition of the data and evaluation at species level. This dataset contained 529,835 images of 162 species, 31 of which were moth species. Ten percent of the images were selected as test data with a stratified approach that ensured that the species were represented proportionally to their number of images in the whole dataset. The remaining images were divided into 80% training and 20% validation data, again using a stratified approach.
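
The two-stage stratified partition described above can be sketched in plain Python as follows. This is a minimal illustration, not the code used for the study: the function name `stratified_split`, the rounding rule and the random seed are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.10, val_frac=0.20, seed=0):
    """Split sample indices into train/val/test, stratified by label.

    test_frac is taken from all images of a class; val_frac is taken from
    the images of that class that remain after removing the test images.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_frac)
        n_val = round((len(idxs) - n_test) * val_frac)
        test.extend(idxs[:n_test])
        val.extend(idxs[n_test:n_test + n_val])
        train.extend(idxs[n_test + n_val:])
    return train, val, test
```

Under these assumptions, a species at the 50-image minimum contributes 5 test, 9 validation and 36 training images, which illustrates why a lower threshold would leave very few images for evaluation at species level.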

Images were augmented to increase the variability of the training data. Images were cropped to up to half of their size and the aspect ratio was changed by a factor of 0.8 to 1.2. Images were rotated between −50° and 50°, flipped horizontally and vertically with a probability of 30% and distorted with a scale of 0.2 with a probability of 40%. All images were cropped to 224 × 224 pixels. The RGB channels of the images were normalized based on the ImageNet dataset standards with the means 0.485, 0.456, 0.406 and standard deviations 0.229, 0.224, 0.225. Images that were used for model evaluation were only resized and cropped to 224 × 224 pixels, and the same normalization of colour channels as for the training data was applied.
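
The channel normalization can be written out for a single pixel as a short worked example. The sketch assumes that 8-bit pixel values are first scaled to the range [0, 1] before the ImageNet statistics are applied, which is the usual convention but is not stated explicitly in the text; the function name `normalize_pixel` is hypothetical.

```python
# ImageNet channel statistics as given in the text (R, G, B).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb_8bit):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    # Assumption: division by 255 precedes the mean/std normalization.
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb_8bit, IMAGENET_MEAN, IMAGENET_STD)
    )
```

With these statistics, a black pixel maps to roughly −2.1 and a white pixel to roughly 2.2 in the red channel, so the model input is approximately centred on zero.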

A Multi-Axis Vision Transformer model (MaxViT-T)30 that was pre-trained on the ImageNet dataset31 was used. MaxViT models combine elements of convolutional neural networks (CNNs) and Vision Transformers. On ImageNet image classification, they were reported to outperform other models while being more efficient in terms of parameters and computation30.

The model was trained for 300 epochs on 8 Graphics Processing Units (GPUs) with a batch size of 16 images on each GPU. To facilitate longer training, stochastic gradient descent was used as optimizer with a momentum of 0.9. To address the class imbalance of the dataset and ensure better representation of minority classes during training, a weighted loss function was applied. The weights were proportional to the inverse of the number of images in each class.
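
The inverse-frequency weighting can be illustrated with a small, dependency-free sketch. The rescaling of the weights to a mean of 1 and the function names are assumptions made for illustration; the text only states that the weights were proportional to the inverse class counts. The loss follows the weighted-mean form that PyTorch's `CrossEntropyLoss` uses when class weights are supplied with the default mean reduction.

```python
import math


def inverse_frequency_weights(class_counts):
    """Weights proportional to 1 / n_c, rescaled so that they average to 1."""
    raw = {c: 1.0 / n for c, n in class_counts.items()}
    scale = len(raw) / sum(raw.values())  # assumed normalization, see lead-in
    return {c: w * scale for c, w in raw.items()}


def weighted_cross_entropy(probs, targets, weights):
    """Weighted mean of -log p(target) over a batch.

    probs: list of dicts mapping class -> predicted probability.
    targets: list of true classes, one per sample.
    """
    num = sum(weights[t] * -math.log(p[t]) for p, t in zip(probs, targets))
    den = sum(weights[t] for t in targets)
    return num / den
```

For example, a species with 50 images receives a weight 20 times larger than a species with 1,000 images, so errors on rare species contribute correspondingly more to the loss.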

Model performance on the whole dataset was assessed with the top-1, top-3 and top-5 accuracy. Additionally, precision and recall for each species were calculated. To estimate how many images of eggs, larvae, pupae and multiple species were not detected during data cleaning, all misclassified images were assessed manually.
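
The evaluation metrics named above can be sketched in plain Python. The function names and the list-based data layout are hypothetical and chosen for readability, not efficiency.

```python
def top_k_accuracy(scores, targets, k):
    """Fraction of samples whose true class is among the k highest scores.

    scores: one list of per-class scores per sample.
    targets: true class index per sample.
    """
    hits = 0
    for row, target in zip(scores, targets):
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += target in top_k
    return hits / len(targets)


def precision_recall(predictions, targets, cls):
    """Precision and recall for one class; None where a metric is undefined."""
    tp = sum(p == cls and t == cls for p, t in zip(predictions, targets))
    fp = sum(p == cls and t != cls for p, t in zip(predictions, targets))
    fn = sum(p != cls and t == cls for p, t in zip(predictions, targets))
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall
```

Averaging the per-species values without weighting by class size gives mean precision and recall figures in which every species counts equally, which is why these can be lower than the overall accuracy on an imbalanced dataset.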

The PyTorch library38 was used for model training and validation. For parallel computing, the Distributed Data Parallel (DDP) framework39 and the Accelerate library provided by Hugging Face40 were used.

Model training was conducted on the EuroHPC supercomputer LUMI hosted by CSC (Finland) and the LUMI consortium.

Results

The highest validation accuracy of 0.9806 was reached after 225 epochs. The highest training accuracy was 0.9971 (Fig. 5).

Fig. 5

Accuracy and loss during training of a MaxViT-T model on the butterfly and moth image dataset collected with the application “Schmetterlinge Österreichs”.

On the test dataset, the model achieved an accuracy of 97.87%. Mean recall over all species was 93.54% and mean precision was 96.31%. Precision was >70% for all species, while recall was <50% for some of the species that are represented by only a few images in the dataset (Fig. 6). See Table S1 in the Supplementary Information for the number of images, recall and precision for each species.

Fig. 6

Precision and recall that were achieved by the MaxViT-T model on test data for the different species (n = 185) in the dataset against the number of images of each species.

On the test dataset, 1,127 images were not correctly classified by the model. Of these, 101 showed more than one (mostly two) species, 11 showed eggs, 31 larvae and 7 pupae. Together, these 150 images comprise 0.28% of the test dataset. The true number of images showing more than one species is likely higher, as the model can classify such images correctly when it identifies the species to which the label refers. Even assuming that the dataset contains twice as many images with more than one species as were detected here, the share of images that do not show a single adult of one species is still <0.5%.
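
The percentages in this paragraph can be checked with a few lines of arithmetic. The test set size is not stated explicitly, so the sketch assumes it is 10% of the 529,835 images used for model training; the exact size of the stratified test set may differ slightly.

```python
# Counts reported in the text for the misclassified test images.
multi_species, eggs, larvae, pupae = 101, 11, 31, 7
misclassified = 1127

# Assumption: the test set is about 10% of the 529,835 eligible images.
n_test = round(0.10 * 529_835)

non_target = multi_species + eggs + larvae + pupae
share_detected = non_target / n_test
# Scenario from the text: twice as many multi-species images as detected.
share_doubled = (2 * multi_species + eggs + larvae + pupae) / n_test
accuracy = 1 - misclassified / n_test

print(f"{share_detected:.2%} {share_doubled:.2%} {accuracy:.2%}")
# prints "0.28% 0.47% 97.87%"
```

Under this assumption, the detected share and the accuracy match the reported 0.28% and 97.87%, and the doubled scenario stays below the 0.5% bound.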

Usage Notes

The dataset33 is highly imbalanced, which can negatively affect model performance for minority classes and should be considered when training models on the dataset41. The dataset does not contain all butterfly and moth species that occur in Austria. Some butterfly species that are difficult to determine to species level from images are not part of the dataset, and the moths are represented by only a few conspicuous species. The species pairs Aricia agestis/A. artaxerxes, Phengaris alcon/rebeli, Colias hyale/alfacariensis and Leptidea sinapis/juvernica are treated as one species each, as they cannot be distinguished from images alone. Due to the incomplete coverage, the accuracy that can be obtained for an automatic classification of all species in Austria is likely lower than for this dataset. Although most images of eggs, larvae and pupae and most images showing more than one species were manually excluded, the dataset still contains a few such images.