Background & Summary

Since the approval of the Habitats Directive 92/43/EEC and the Birds Directive 2009/147/EC, Member States of the European Union (hereafter EU) have had the duty of reporting the conservation status of habitats and species every six years. In Spain, there is not yet a national mammal monitoring program to assess the status (distribution and abundance) of species not of Community interest, nor to report the presence of species at coarse resolution.

Different data sources have been proposed to feed the sexennial reports. On the one hand, citizen science1,2 is a useful collaborative model for wildlife monitoring at large scales, particularly for taxa such as birds and butterflies3,4. Its application to mammals presents additional challenges due to their nocturnal behavior, lower detectability and the need for species-specific observation techniques. Therefore, opportunistic observations of mammals through citizen science are less frequent5. Nonetheless, citizen science initiatives employing camera-trapping are used to detect mammal species and to increase the number of opportunistic observations in areas where mammal sightings are limited6,7,8.

On the other hand, hunting yields of game species are another important source of data on species distribution and abundance, since: (i) they are updated yearly, as hunting is carried out every year9; (ii) they are spatially referenced9 and cover most of each country's territory, excluding some protected areas, such as national parks, and safety zones around urban areas and roads10; and (iii) hunting managers may be obliged to report harvested animals to the administration (e.g. hunting managers in Spain are required to report these data to their respective Autonomous Communities, as outlined in Decree 506/1971, of 25th March, which approves the Regulation for the execution of the Hunting Law of 4th April, 1970). In fact, hunting yields are often considered a proxy of abundance11,12 and are frequently used in a wide variety of studies13,14,15,16. Therefore, hunting yields could be considered the most widespread source of information for reporting the distribution and relative abundance of game species at national and European levels.

The National Wildlife Research Institute (IREC; CSIC-UCLM-JCCM) gathered and merged hunting yield data in Spain for wild ungulates and the red fox for the period 2013–2022. The data set comprised hunting yields for eight ungulate species: Barbary sheep (Ammotragus lervia), Southern chamois (Rupicapra pyrenaica), fallow deer (Dama dama), Iberian wild goat (Capra pyrenaica), European mouflon (Ovis aries), red deer (Cervus elaphus), roe deer (Capreolus capreolus) and wild boar (Sus scrofa); and one carnivore species, the red fox (Vulpes vulpes). The Autonomous Communities reported heterogeneous data sets, requiring data harmonization into a common structure9,17. The Autonomous Community is the primary political and administrative division below the national level, to which environmental policies, such as hunting and wildlife monitoring, are delegated. Autonomous Communities may have varying priorities, resources and methodologies, which can complicate national assessments of wildlife. A structured database following the Darwin Core standard was used for data harmonization18. However, there may be constraints on making these data public, which prevent raw data from being shared publicly at the game management unit level. The data contain hunting yield information that is considered sensitive in some Autonomous Communities, such as the number, sex and age of hunted individuals, the number of individuals seen alive on a hunting day, the number of hunters, beaters and dogs, and the hunting method9. Hunting constitutes an important economic resource in some Spanish regions and involves diverse stakeholders19,20,21,22. For this reason, we reached data-sharing agreements with each Autonomous Community that limit the information that can be shared and keep raw hunting yield data private at the finest spatial scale (Fig. 1a).
Nonetheless, if hunting yields are transformed to a coarser spatial resolution and simplified to presence-only records, the information can be made publicly available.

Fig. 1

Stages of the hunting yield transformation: from collection to public availability in the GBIF repository.

Consequently, a 5 × 5 km grid was used to transform hunting yield information into presence-only data. The transformed data sets include presence-only records for these wild species in Spain over the last decade. These data are the most up-to-date and complete distribution data known to be available for the nine species in Spain, as national-scale public information on mammals dates back to 200717. They can support future studies on species management, disease spread and species distribution modelling13,14,23, among others.

Methods

We received the data after holding meetings with the game services of each Autonomous Community in Spain (i.e. 17 administrative regions). We reached a data-sharing agreement with each of them that restricted the publication of data: raw data cannot be shared publicly at the finest resolution provided, but they can be transformed or aggregated into derived information products. The received data included hunting yields at the hunting ground level for the 2013/2014 to 2022/2023 hunting seasons (Figs. 1a and 2).

Fig. 2

Autonomous Communities (Spanish administrative regions) that reported hunting yield data for each game species.

Since each Autonomous Community has its own data set structure, we transformed and standardized the received data following the Wildlife Data Model template (WLDM24) developed by the ENETWILD consortium under EFSA18 (https://enetwild.com/), which follows the Darwin Core standard. Once a data set had been harmonized and joined with its spatial data (hunting ground perimeters), we validated the structure of each harmonized data source using ShinyIVT25. We used the tidyverse 2.0.026 and sf 1.0-1627 packages of R 4.3.3 software28 for data management.
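The harmonization step can be illustrated with a minimal Python sketch (the authors worked in R). It assumes each Autonomous Community delivers tabular rows with its own column names, which are renamed to a shared set of fields. The source column names, the region keys and the fields `season` and `huntingGroundID` are hypothetical; `scientificName` and `individualCount` are Darwin Core terms, but the actual WLDM template fields are not reproduced here.

```python
# Hypothetical column mappings for two Autonomous Communities; the real source
# schemas and the WLDM template fields are NOT reproduced here.
COLUMN_MAPS = {
    "region_A": {"especie": "scientificName", "capturas": "individualCount",
                 "temporada": "season", "coto": "huntingGroundID"},
    "region_B": {"species": "scientificName", "n_hunted": "individualCount",
                 "season": "season", "estate_code": "huntingGroundID"},
}
# Common target fields (season and huntingGroundID are hypothetical names).
COMMON_FIELDS = ("scientificName", "individualCount", "season", "huntingGroundID")


def harmonize(rows, region):
    """Rename one region's columns to the common fields and coerce counts to int."""
    mapping = COLUMN_MAPS[region]
    out = []
    for row in rows:
        # Keep only the columns that have a mapping, under their common name.
        renamed = {mapping[k]: v for k, v in row.items() if k in mapping}
        missing = [f for f in COMMON_FIELDS if f not in renamed]
        if missing:
            raise ValueError(f"{region}: missing fields {missing}")
        renamed["individualCount"] = int(renamed["individualCount"])
        out.append(renamed)
    return out
```

Failing loudly on missing fields, rather than filling defaults, keeps structural problems visible before the validation step.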

To update the distribution of wild ungulates and red fox in Spain, we presented the information on the 5 × 5 km grid of the European Environment Agency29, masked to Spain. To facilitate comparisons with the latest publicly available information on mammal occurrence17, which has a grid resolution of 10 × 10 km, an additional field (verbatimCoordinates) is included to allow conversion from the 5 × 5 km grid to the 10 × 10 km grid.
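The 5 × 5 km to 10 × 10 km conversion can be sketched as follows, assuming cells are identified by the lower-left corner of each 5 km cell in ETRS89-LAEA (EPSG:3035) metres and that the target code follows the EEA-style pattern `10kmE<easting/10000>N<northing/10000>`. The exact content of the published verbatimCoordinates field is not described here, so the helper name and input format are hypothetical.

```python
def parent_10km_cell(x_ll, y_ll):
    """Return an EEA-style code for the 10 km cell containing a 5 km cell.

    Hypothetical helper. x_ll, y_ll: lower-left corner of the 5 km cell,
    in ETRS89-LAEA (EPSG:3035) metres.
    """
    if x_ll % 5000 or y_ll % 5000:
        raise ValueError("corner is not aligned to the 5 km grid")
    # Floor division by the 10 km cell size gives the parent cell indices.
    e = int(x_ll // 10000)
    n = int(y_ll // 10000)
    return f"10kmE{e}N{n}"
```

Because both grids share the same origin, every 10 km cell contains exactly four 5 km cells, so the conversion is a floor division rather than a spatial join.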

We then transferred the presence-only records of each species to the grid separately for each hunting season (Fig. 3) and for the total period. We considered a species present in a cell for a given hunting season only if at least one individual had been hunted in a hunting estate overlapping that cell, using the function gridPresence (Fig. 1b, see code section). Otherwise, the cell was assigned unknown presence. Note that the latter does not mean the species is absent from that cell; it only indicates that the species has not been hunted or reported there, and no further inference can be made about its absence.
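The presence rule can be expressed compactly. The authors' gridPresence function is written in R and operates on the actual hunting ground perimeters; the hypothetical `grid_presence` below is only a Python sketch of the same rule, simplified by approximating each hunting ground with its bounding box instead of its polygon. A cell becomes present if any overlapping ground reports at least one hunted individual; cells touched only by zero-count grounds, or not touched at all, stay of unknown presence, never absent.

```python
import math
from dataclasses import dataclass

CELL = 5000  # grid cell size in metres (ETRS89-LAEA)


@dataclass(frozen=True)
class HuntingGround:
    """Hunting ground approximated by its bounding box (a simplification)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    count: int  # individuals of one species hunted in one season


def cells_overlapping(g):
    """Yield lower-left corners of grid cells whose interior intersects the box."""
    for i in range(math.floor(g.xmin / CELL), math.ceil(g.xmax / CELL)):
        for j in range(math.floor(g.ymin / CELL), math.ceil(g.ymax / CELL)):
            yield (i * CELL, j * CELL)


def grid_presence(grounds):
    """Hypothetical sketch: map each touched cell to 'present' or 'unknown'.

    Cells not touched by any ground are omitted; they are also of unknown
    presence. No cell is ever labelled 'absent'.
    """
    status = {}
    for g in grounds:
        for cell in cells_overlapping(g):
            if g.count >= 1:
                status[cell] = "present"  # one hunted individual is enough
            else:
                status.setdefault(cell, "unknown")  # never overrides 'present'
    return status
```

The result does not depend on the order of the hunting grounds. With real perimeters, the bounding-box step would be replaced by a polygon intersection test, which avoids flagging cells that a ground's box touches but its polygon does not.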

Fig. 3

Reported presence-only (green) and unknown presence (grey) of each species per year in Spain based on hunting yields, projected on the 5 × 5 km grid.

An important consideration is that hunting yields are reported per hunting season, which generally spans from September to March of the following year. Therefore, it is not possible to assign a hunting yield to a specific calendar year. The criterion for assigning presence to a year was to take the first calendar year of the hunting season, since animals are mostly hunted between September and December30,31. For example, presence-only records for the 2013/2014 hunting season are assigned to the year 2013.
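The season-to-year rule amounts to keeping the first year of the season label. This sketch assumes labels of the form `YYYY/YYYY`, as in 2013/2014; the function name is hypothetical.

```python
def season_to_year(season):
    """Hypothetical helper: assign a season such as '2013/2014' to its first year."""
    first, second = (int(part) for part in season.split("/"))
    # Guard against malformed labels, e.g. non-consecutive years.
    if second != first + 1:
        raise ValueError(f"unexpected season label: {season!r}")
    return first
```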

In addition, we arranged the data into two monitoring periods (2013–2018 and 2019–2022), which match the reporting periods of the EU Habitats Directive. We grouped the cell records and classified a cell as “present” if the species was present in that cell in any year of the monitoring period, or as “unknown presence” if it was not recorded as present in any of those years. We followed the same criterion to determine presence or unknown presence for the whole decade (2013–2022; Fig. 4).
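Aggregating yearly grids into monitoring periods is a union over years: a cell is present for a period if it is present in any year of that period, and every other cell is of unknown presence. In this sketch, yearly results are assumed to be stored as a mapping from year to the set of present cell identifiers; the names are hypothetical.

```python
# Monitoring periods used in the text (inclusive year bounds).
PERIODS = {"2013-2018": (2013, 2018), "2019-2022": (2019, 2022), "2013-2022": (2013, 2022)}


def period_presence(yearly_present, start, end):
    """Hypothetical helper: cells recorded as present in any year from start to end."""
    cells = set()
    for year, present in yearly_present.items():
        if start <= year <= end:
            cells |= present  # union: present in at least one year
    return cells
```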

Fig. 4

Presence-only records (green) and unknown presence (grey) of each species based on hunting yields for the two monitoring periods and the whole period, projected on the 5 × 5 km grid.

Standardization

We formatted the data set according to the Darwin Core standard32 (Fig. 1c) and published it in the Spanish node of the Global Biodiversity Information Facility (GBIF) through the Integrated Publishing Toolkit33 (IPT v3.0.4). Other countries could readily format and publish their data sets in GBIF following our procedure, as hunting yields from (almost) all European countries follow the same harmonization procedure13,14: hunting yields are transformed into the WLDM template, joined with their hunting ground perimeters, and validated using ShinyIVT25. However, a significant challenge is the varying spatial resolution at which each country reports its hunting yields, ranging from fine-scale resolutions (e.g., hunting grounds) to coarse-scale resolutions (e.g., NUTS2 level). Therefore, making our approach available may help address the need for up-to-date information on the state of species and habitats not only at the national but also at the European level.
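As an illustration of the formatting step, the sketch below writes presence records as rows of a Darwin Core occurrence table. `occurrenceID`, `scientificName`, `occurrenceStatus`, `year`, `countryCode` and `verbatimCoordinates` are Darwin Core terms, but the set of terms and values in the published archive may differ; the identifier scheme and cell identifiers are hypothetical.

```python
import csv
import io

# Darwin Core terms; the published archive's actual term set is not reproduced here.
FIELDS = ["occurrenceID", "scientificName", "occurrenceStatus", "year",
          "countryCode", "verbatimCoordinates"]


def to_dwc_csv(records):
    """records: iterable of (scientific_name, year, cell_id) presence tuples."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for name, year, cell in records:
        writer.writerow({
            # Hypothetical ID scheme: species + year + cell identifier.
            "occurrenceID": f"{name.replace(' ', '_')}_{year}_{cell}",
            "scientificName": name,
            "occurrenceStatus": "present",  # only presences are published
            "year": year,
            "countryCode": "ES",
            "verbatimCoordinates": cell,  # hypothetical cell identifier
        })
    return buf.getvalue()
```

In practice, the IPT adds the archive descriptor and metadata when such a table is published.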

The main bottlenecks in current biodiversity data flows are data integration and data accessibility. Only if different monitoring programs are harmonized, and particularly if the spatial components of their data are merged, will it be possible to derive essential biodiversity variables34 from subsequent data streams, such as those proposed by EuropaBON for a European Biodiversity Observation Coordination Centre (EBOCC). Our data sets, presented spatially in a grid, align with the requirements of the Habitats Directive and provide sufficient resolution to evaluate temporal changes, analyze spatial patterns, and investigate drivers of change. They are also available for inclusion in the derivation of these biodiversity variables.

Data Records

The data set is available at GBIF35 as a Darwin Core Archive (DwC-A) under a Creative Commons Attribution 4.0 International License (CC BY 4.0). The data sets presented here correspond to version 1.4, which is the most up-to-date compilation of presence-only records available for wild ungulates and red fox in Spain, derived from hunting yield data sets (Fig. 3).

Note that the data sets do not contain complete information for all years, depending on the species and Autonomous Community (Fig. 2). Nevertheless, the hunting yields from which the data sets were derived are reported to public administrations on a mandatory, annual basis. Therefore, even if these data sets currently have some limitations for certain species, hunting yields are continuously reviewed and updated, making them a valuable data source with extensive spatial and temporal coverage.

The current data set will be continuously updated as new data collections from providers are received and harmonized.

The DwC-A contains 1,818,198 records.

Remarks of the Darwin Core Archive available at GBIF are provided below:

In addition, we generated a simulated count data set for three years, provided as a GeoPackage file on Zenodo36, because we cannot share the collected hunting yields due to data confidentiality agreements. We created random polygons in Spain and simulated count data for each year from a Poisson distribution whose lambda parameter was drawn from a normal distribution (mean = 50, sd = 10). Moreover, each year we selected 25% of the polygons and set their count to 0 to simulate unknown presences. The simulated count data are intended to demonstrate the transformation process from count data to presence-only data.
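The simulation procedure can be sketched with the Python standard library. Assumptions: polygons are represented only by their index (no geometry), one lambda is drawn from Normal(50, 10) per polygon and year and clipped at 0, the 25% of polygons set to 0 are chosen at random each year, and Poisson variates are drawn with Knuth's multiplication method, which is adequate for lambda values around 50. This is not the authors' simulation script, and the function names and years are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counts(n_polygons, years, seed=1, zero_share=0.25):
    """Hypothetical sketch: simulated counts per year for n_polygons polygons."""
    rng = random.Random(seed)
    data = {}
    for year in years:
        # Assumption: one lambda ~ Normal(50, 10) per polygon and year, clipped at 0.
        counts = [poisson(max(0.0, rng.gauss(50, 10)), rng) for _ in range(n_polygons)]
        # Assumption: a random 25% of polygons is set to 0 to mimic unknown presences.
        for idx in rng.sample(range(n_polygons), k=round(zero_share * n_polygons)):
            counts[idx] = 0
        data[year] = counts
    return data
```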

Technical Validation

The data sets are difficult to validate, since the most up-to-date information available dates back to 200717. Nonetheless:

  (a)

    The data sets contain information on wild ungulate species and the red fox. Public administrations collect this information mainly through different departments: agriculture and wildlife services, as well as animal health departments, the latter aimed at disease surveillance. The information sent to agriculture and environmental departments is normally collected by hunting managers, whereas the information sent to health departments is, in some regions, the responsibility of veterinarians carrying out big game meat inspection. Therefore, the collected information can be double-checked. However, while this double-checking process is available for big game species, it is not feasible for small game species. Nevertheless, holding hunts on different days with the participation of multiple hunters mitigates the risk of errors in species identification.

  (b)

    Reporting hunting yields to public administrations is mandatory in Spain. Hunting estates that violate the law by not submitting their annual activity reports may face penalties, such as hunting bans in subsequent seasons. However, these sanctions are not always applied, leading to potential data gaps for certain years and hunting grounds.

  (c)

    Hunting or population control for conflict management (e.g., to prevent crop damage, including in areas not administratively considered hunting estates, such as national parks or peri-urban areas) may not always be reported or included in hunting yield databases in some Autonomous Communities. This may result in data gaps, especially in areas where regular hunting is not allowed (e.g., some blank cells in our maps, such as for wild boar).

  (d)

    The data set with full hunting yields by region (RAW data) was harmonized and checked before being transformed into a presence-only grid. Data harmonization was performed by the same team following the same manual procedure to ensure consistency. After the data were harmonized, the team checked the quality of the data product by: i) ensuring no missing values, by summing the total number of individuals hunted per species and hunting season and comparing the result with the RAW data total; ii) avoiding overcounting, by identifying duplicate hunting yields per species and hunting season; and iii) verifying that counts correspond to their respective hunting grounds, by randomly selecting 30 rows from each tercile of the harmonized data set and cross-checking the hunting ground and counts against the RAW data. Data validation was performed with the Integrated Validation Tool (IVT25) developed by ENETWILD.

  (e)

    In addition, a simulated data set is provided together with the code used to generate presence-only data, which ensures the reproducibility of the whole process.
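Two of the quality checks listed in (d) can be sketched as follows: comparing totals per species and hunting season between RAW and harmonized data, and flagging duplicated records. The row layout (dictionaries with `scientificName`, `season`, `huntingGroundID`, `individualCount`) is hypothetical, and treating a record as a duplicate when species, season and hunting ground repeat is an assumption about how duplicates were defined. The tercile-based manual cross-check is not reproduced.

```python
from collections import Counter, defaultdict


def totals(rows):
    """Sum individualCount per (scientificName, season)."""
    acc = defaultdict(int)
    for r in rows:
        acc[(r["scientificName"], r["season"])] += r["individualCount"]
    return dict(acc)


def total_mismatches(raw_rows, harmonized_rows):
    """(species, season) keys whose totals differ between RAW and harmonized data."""
    raw, harm = totals(raw_rows), totals(harmonized_rows)
    return sorted(k for k in raw.keys() | harm.keys() if raw.get(k, 0) != harm.get(k, 0))


def duplicate_records(rows):
    """Assumed duplicate definition: same species, season and hunting ground."""
    keys = Counter((r["scientificName"], r["season"], r["huntingGroundID"]) for r in rows)
    return sorted(k for k, n in keys.items() if n > 1)
```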

Usage Notes

Monitoring programs developed for mammal species of Community interest mainly cover those species for which conservation status was reserved and are carried out at small scales37,38,39,40,41. By contrast, widespread and abundant mammals, most of which are game species, have lacked monitoring programs to assess either their presence or abundance42. There are only a few proposals for monitoring those species that have started to cause economic damage to agriculture and livestock or through vehicle collisions43,44,45,46. Therefore, hunting yields have become the most commonly used source of information on their distribution and abundance at national or European scale. The data set provided, which updates the presence-only records of several game species per year, could reveal changes in their distribution ranges. It can also be used to evaluate species-environment relationships through modelling or to provide important information to data-integration models23. Moreover, the methodology proposed here could be extended to other game species, such as lagomorphs, game birds and fish species.

The presence-only records provided in the data set (Fig. 3) correspond to all data received (Fig. 2). Not all Autonomous Communities reported the same hunting seasons and/or species, as this also depends on species distribution. For example, Southern chamois is distributed only in the Cantabrian and Pyrenean mountain ranges, not in southern Spain; conversely, European mouflon is distributed in the southern half of Spain, but not in the north. Future updates will incorporate new data obtained from providers. European and Spanish authorities are increasingly involved in the development of wildlife monitoring programs47. Accordingly, there are funded projects with this aim at the European (ENETWILD and EOW projects) and national (HAWIPO, AGROBOAR, FAUNET) scales. These projects will provide the opportunity to maintain an updated presence-only data set for these game species. For that reason, we encourage readers to check the GBIF repository for updated versions after this publication.

The function developed for transforming the data (gridPresence, see code availability) could also be used to transform other types of count data into presence-only data.