Background & Summary

The ongoing biodiversity crisis is leading to alarming rates of population decline and species extinction1,2. Actions and policies are needed to address biodiversity losses by mitigating impacts and, where possible, reversing population declines through the restoration of degraded natural habitats or improvements in habitat quality. However, their effectiveness is often hampered by deficient data and knowledge gaps regarding basic aspects of the ecology of declining species (e.g. demography and distribution)3. Obtaining accurate spatial data on biodiversity is particularly challenging because of the relatively limited economic resources allocated to conservation science4 and the high costs and massive logistical effort that biodiversity data collection at broad spatial scales requires5.

Hence, large-scale biodiversity assessments are often difficult to perform, especially in environmentally heterogeneous areas with variable levels of accessibility6. This is the case even for well-known taxa such as birds, for which most information is concentrated in densely sampled areas while other areas lack adequate coverage7,8,9. In such contexts, appropriate quantitative approaches make it possible to capitalise on data from intensively surveyed areas to infer the possible occurrence of species in less densely monitored ones.

The development of Species Distribution Models (SDMs) over the last few decades has promoted the quantitative estimation of species ranges based on presence probabilities or environmental suitability. SDMs (also called Environmental Niche Models, or Environmental or Habitat Suitability Models) link the presence (geographical locations) of a species to environmental variables that are spatially explicit within a study area, and provide an effective measure of the environmental suitability of that area for the investigated species10. This enables not only improved estimation of current distributions, but also predictions of temporal variations in distributions due to climate change11,12,13, changes in land use, disturbance14,15 or other anthropogenic pressures.

The use of SDMs has increasingly become standard practice in many conservation applications. Examples include defining areas of major importance for conservation16, inferring ecological networks17 and future variations in connectivity18, and improving the interpretation of demographic trends19. In some cases, relevant demographic parameters, such as the abundance or reproductive success of a species, strongly correlate with environmental suitability calculated from SDMs20,21,22.

SDMs allow species distributions to be inferred over large areas from standardised survey data23 or opportunistic datasets24, drastically reducing the costs and effort required to obtain comparable information through dedicated large-scale surveys. Hence, SDMs have been increasingly adopted for biodiversity mapping by inferring the potential distribution of species9,25,26, at least partly ‘filling’ knowledge gaps by estimating environmental suitability (or the probability of presence) in unsurveyed or poorly surveyed areas. SDMs are highly flexible and can be based either on presence-absence data27 or on presence-only data28.

Here, we provide habitat suitability maps for 225 breeding bird species in Italy, i.e. all widespread bird species breeding in the country, corresponding to ~83% of the total number of breeding species (Fig. 1). These suitability maps have a species-specific spatial resolution, with three different scales: 0.81 km2 (0.9 × 0.9 km; 176 species with small breeding home ranges), 9 km2 (3 × 3 km; 45 species with larger home ranges), and 81 km2 (9 × 9 km; four species that regularly exploit very large areas during the breeding period). The models used for producing these maps were based on data collected for the second Italian breeding bird atlas by over 3,000 skilled observers, mainly through the Ornitho.it web portal (www.ornitho.it)9.

Fig. 1

Schematic workflow for generating habitat suitability maps for 225 widespread breeding bird species in Italy with data collected for the second Italian breeding bird atlas9.

We encourage researchers and policymakers to use these high-resolution raster files, for example to improve biodiversity monitoring schemes, evaluate the effectiveness of the protected area network, identify priority areas for conservation, support landscape planning, and characterise threats to Italian biodiversity, including current and future interactions with human infrastructure (e.g. roads, power lines and wind energy facilities29,30,31). In the Usage Notes, we discuss potential applications of the dataset and call for careful use of habitat suitability values, while in Supplementary Information 1 we provide an example of the use of the dataset for investigating the macroecological drivers of avian species richness in Italian urban areas.

Methods

Italian breeding bird atlas and occurrence data

Bird taxonomy followed HBW-BirdLife International32, which is also adopted by the checklist of Italian birds33 and by the second European Breeding Bird Atlas (EBBA2)25. Most occurrence data for the second Italian breeding bird atlas were bird observations or counts collected through the Ornitho.it web portal by over 3,000 skilled observers (2,360,284 records), but data collected through other methods (e.g. ringing), standardised monitoring or regional survey projects were also included (630,312 records; see Lardelli et al.9 for further details). The data were used to generate distribution maps at the 10 km scale9 [Universal Transverse Mercator (UTM) projection system, 10 × 10 km grid cells].

The vast majority of Ornitho.it occurrence data were occasional observations, followed in frequency by complete lists of species recorded over variable (and generally unknown) spatial extents. Such heterogeneous data have previously been used to develop robust SDMs34. Among data derived from sources other than Ornitho.it, 80% originated from the Farmland Bird Index project35, a large-scale, country-wide monitoring initiative based on point counts that contributes to the Pan-European Common Bird Monitoring Scheme (www.pecbms.info).

The database contained records of species showing reproductive evidence (possible, probable or confirmed breeding, based on behavioural or physiological cues such as territorial behaviour, nest building or the observation of fledglings) collected during the breeding seasons from 2010 to 2016. Reproductive evidence was assigned by observers in the field according to the standard European Bird Census Council methodology (see https://ebba2.info/about/methodology/). Data archived in Ornitho.it undergo regular expert validation on a regional or local basis, and were subjected to additional quality controls before generating both distribution maps and SDMs. We excluded incomplete or incorrect data36, including records of colonial and rare species supported only by weak breeding evidence. Only spatially accurate occurrence records (see Data selection, spatial scales and background points) were used for building SDMs and producing habitat suitability maps for all widespread species (see Lardelli et al.9). Figure 2 shows the distribution of five sample species at the 10 km scale9.

Original bird occurrence data used for generating distribution maps and building SDMs could not be made publicly available due to restrictions on their public release by data owners (i.e. individual observers or entities contributing data for realizing the atlas), especially for conservation-reliant taxa or those highly sensitive to human disturbance (see Lardelli et al.9).

Modelling habitat suitability

Owing to the nature of the occurrence data (i.e. presence-only9), SDMs were built using MaxEnt37,38, the most effective method for modelling this kind of data39. Known drawbacks of MaxEnt40,41 were addressed through careful selection of background points and through parametrization and tuning of models (see below). MaxEnt relies on background data to characterise the environmental conditions available to the target species within the study region. We parametrized MaxEnt models to account for non-uniform sampling and optimized model complexity by carefully selecting environmental variables, in order to obtain robust and generalizable results20,42,43,44,45. The following paragraphs summarise the approach adopted for model development, from the preparation of input data to model validation (Fig. 1).

Fig. 2

Cumulative distribution (grey dots) at the 10 km scale of a sample of five common Italian breeding species (Parus major, Sturnus vulgaris, Anthus spinoletta, Corvus cornix, Sylvia melanocephala) representative of different habitats, species groups or biogeographical regions (respectively, forest, farmland, mountain, generalist and Mediterranean species), accounting for 10.7% of records (UTM grid), highlighting the relatively homogeneous spatial coverage of the dataset used for producing the second Italian breeding bird atlas9 and species distribution models.

Data selection, spatial scales and background points

To build SDMs, we used occurrence data with high spatial accuracy (<1 km), i.e. excluding records associated with broader areas such as municipalities or protected areas (depending on how data were uploaded to Ornitho.it). These records were assigned to 1 × 1 km UTM grid cells. The final database consisted of 2,577,222 unique occurrence records (i.e. including only a single breeding record per species per cell).
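A minimal sketch of this selection and de-duplication step follows. It assumes, for illustration only, that each record carries its UTM zone, projected coordinates in metres and a numeric positional uncertainty; the record structure and function name are hypothetical and do not reflect the Ornitho.it export format.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class Record(NamedTuple):
    """Hypothetical occurrence record with projected UTM coordinates."""
    species: str
    zone: int          # UTM zone number (Italy spans zones 32-34)
    easting: float     # metres
    northing: float    # metres
    accuracy_m: float  # positional uncertainty reported for the record, in metres


def unique_cell_records(records: Iterable[Record], cell_size: float = 1000.0,
                        max_accuracy: float = 1000.0) -> set[tuple[str, int, int, int]]:
    """Drop imprecise records and keep one record per species per 1 x 1 km cell."""
    kept: set[tuple[str, int, int, int]] = set()
    for r in records:
        if r.accuracy_m >= max_accuracy:  # e.g. records tied to a whole municipality
            continue
        col = int(r.easting // cell_size)   # grid column within the UTM zone
        row = int(r.northing // cell_size)  # grid row within the UTM zone
        kept.add((r.species, r.zone, col, row))  # the set removes duplicates
    return kept
```

Keeping the UTM zone in the key avoids merging cells that share indices in different zones; how zones were actually handled in the atlas workflow is not described here.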

Considering the substantial differences in species’ ecology, spatial scales for SDM development were differentiated in order to assess species-environment relationships at the scale at which they are most ecologically relevant46. For instance, small passerines are influenced by environmental variables at a very local scale (i.e. thousands of square metres or a few hectares47), whereas large raptors are influenced by the environmental characteristics of much larger territories48. Given the resolution of both bird occurrence data (1 × 1 km UTM grid) and environmental data (see ‘Environmental variables’), we set the finest spatial scale at 0.81 km². Two broader spatial scales were identified by taking into account the average home range sizes of the analysed species. The exact extent of each scale depended on the number of cells that, under a focal feature approach (see ‘Environmental variables’), provided the closest match to the desired area. The three spatial scales considered for environmental variables were: ‘micro’ (variables computed over 0.81 km²), ‘meso’ (variables computed over 9 km²), and ‘mega’ (variables computed over 81 km²). Additional details on the generated maps are provided in Supplementary Table 1.
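Choosing the window that gives the closest match to a desired area can be illustrated with a small helper. This is a hypothetical sketch: the function name and the target areas used below (e.g. 1 km² for the micro scale) are assumptions for illustration, not values reported in this study.

```python
from __future__ import annotations


def closest_odd_window(target_km2: float, cell_km: float, max_n: int = 51) -> tuple[int, float]:
    """Return the odd window width n (in cells) whose n x n area is closest to the target."""
    best_n, best_area = 1, cell_km ** 2
    for n in range(1, max_n + 1, 2):  # odd widths keep the focal cell at the centre
        area = (n * cell_km) ** 2
        if abs(area - target_km2) < abs(best_area - target_km2):
            best_n, best_area = n, area
    return best_n, best_area
```

With 100 m cells and an assumed 1 km² target, the helper selects n = 9 (0.81 km²) rather than n = 11 (1.21 km²); with 1 km cells, targets of 9 and 81 km² give n = 3 and n = 9, matching the meso and mega windows.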

Background generation is a fundamental step in developing realistic and robust SDMs, and it is even more important when modelling citizen science data collected unevenly over broad areas49. For instance, placing background points in unsurveyed areas could cause the environmental conditions at those points to be treated as unsuitable for a species because of the lack of observations, rather than because of actual avoidance of those conditions. We therefore sought to limit the effects of undersampling on the definition of the background as much as possible. To avoid including unsurveyed areas, we used the centroids of effectively sampled 1 × 1 km UTM cells as background points. For diurnal species, given the much larger number of observations (2,542,772 records) compared with nocturnal species (34,450 records), we used as background points the centroids of the most intensively sampled cells (i.e. those with at least 20 occurrence records), whereas for nocturnal species we used all sampled cells. This resulted in 30,180 background points for diurnal species and 14,672 for nocturnal ones.
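The background rule can be expressed as a short sketch. It assumes that the input is one cell identifier (UTM zone, column, row) per record of the relevant set, so that counting identifiers gives the sampling effort per cell; this input format and the function names are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Cell = Tuple[int, int, int]  # (utm_zone, col, row) of a 1 x 1 km cell


def background_cells(record_cells: Iterable[Cell], nocturnal: bool,
                     min_records: int = 20) -> List[Cell]:
    """Cells whose centroids serve as background points."""
    effort = Counter(record_cells)  # number of records per sampled cell
    if nocturnal:
        return sorted(effort)  # nocturnal species: every sampled cell
    # diurnal species: only intensively sampled cells
    return sorted(c for c, n in effort.items() if n >= min_records)


def cell_centroid(cell: Cell, cell_size: float = 1000.0) -> Tuple[int, float, float]:
    """UTM zone plus easting/northing (metres) of the cell centre."""
    zone, col, row = cell
    return zone, (col + 0.5) * cell_size, (row + 0.5) * cell_size
```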

Environmental variables

The environmental variables used in the models belonged to three groups: climatic, topographic and land use. Climatic variables (1981–2010) were retrieved from the CHELSA database (https://chelsa-climate.org) at a resolution of 30 arc-seconds (less than 1 km at the latitude of the study area)50. Although this period does not perfectly overlap with the bird occurrence data, it is likely still highly representative of the relative effect of predictors, and it is relevant for understanding the indirect effects of climate on bird distribution, e.g. through effects on vegetation characteristics51, which depend more on long-term climate than on year-to-year weather variations. Several bioclimatic variables, which summarise monthly values into biologically meaningful predictors commonly used in SDMs52, were calculated. As bioclimatic variables from the CHELSA database were not available for open water bodies (coastal lagoons), their values for cells with these habitats were recalculated from the original climate data using the CHELSA database algorithms53.
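For readers recomputing bioclimatic layers (as was done here for coastal lagoon cells), the sketch below derives a few standard BIOCLIM variables from monthly values using their textbook definitions. It is not the CHELSA code: the variables actually used are listed in Supplementary Table 2, and CHELSA's own algorithms53 and scaling conventions may differ.

```python
from __future__ import annotations

import numpy as np


def bioclim_subset(tmin, tmax, tmean, prec) -> dict[str, np.ndarray]:
    """A few BIOCLIM variables from 12 monthly layers (axis 0 = month)."""
    tmin, tmax, tmean, prec = (np.asarray(a, dtype=float) for a in (tmin, tmax, tmean, prec))
    for a in (tmin, tmax, tmean, prec):
        if a.shape[0] != 12:
            raise ValueError("expected 12 monthly values along axis 0")
    bio5 = tmax.max(axis=0)  # max temperature of the warmest month
    bio6 = tmin.min(axis=0)  # min temperature of the coldest month
    return {
        "bio1": tmean.mean(axis=0),          # annual mean temperature
        "bio2": (tmax - tmin).mean(axis=0),  # mean diurnal range
        "bio5": bio5,
        "bio6": bio6,
        "bio7": bio5 - bio6,                 # temperature annual range
        "bio12": prec.sum(axis=0),           # annual precipitation
    }
```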

Topographic variables were calculated in GRASS GIS54 from a digital elevation model (DEM; https://www.eea.europa.eu/en/datahub/datahubitem-view/d08852bc-7b5f-4835-a776-08362e2fbf4b) with a resolution of 25 m. These were slope (in degrees) and solar radiation at the summer solstice (total daily value, in kWh/m²), accounting for the shading effect of relief. Land use/land cover variables were obtained from the CORINE Land Cover database55; the 2012 edition was chosen as it best overlapped with the period of occurrence data collection (2010–2016).
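Slope in degrees is commonly derived from a DEM with Horn's 3 × 3 finite-difference estimator. The NumPy sketch below implements that estimator for illustration. We assume, rather than claim, that it approximates the GRASS computation used here. It does not cover the solar radiation model.

```python
import numpy as np


def horn_slope_degrees(dem: np.ndarray, res: float) -> np.ndarray:
    """Slope (degrees) from a DEM using Horn's 3 x 3 finite differences.

    Border cells are NaN because their 3 x 3 neighbourhood is incomplete.
    """
    z = np.asarray(dem, dtype=float)
    # neighbourhood layout:  a b c / d e f / g h i  (e = focal cell)
    a, b, c = z[:-2, :-2], z[:-2, 1:-1], z[:-2, 2:]
    d, f = z[1:-1, :-2], z[1:-1, 2:]
    g, h, i = z[2:, :-2], z[2:, 1:-1], z[2:, 2:]
    dzdx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * res)
    dzdy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * res)
    slope = np.full(z.shape, np.nan)
    slope[1:-1, 1:-1] = np.degrees(np.arctan(np.hypot(dzdx, dzdy)))
    return slope
```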

Environmental variables were calculated on raster grids of 100 × 100 m and 1 × 1 km, and attributed to cells using a ‘focal features’ (or ‘moving window’) approach, which aggregates information from neighbouring cells at the pixel level56. For each focal cell (i.e. the raster cell containing the centroid of the UTM cell associated with observations or background points), we calculated a summary value of each environmental variable within a neighbourhood of surrounding cells (Fig. 3). At each spatial scale, the neighbourhood was sized to approximate the spatial scale of bird data as closely as possible. At the micro-scale, 9 × 9 cells of 100 m each were considered, covering a total area of 0.81 km²; at the meso-scale, 3 × 3 cells of 1 km each (total 9 km²); and at the mega-scale, 9 × 9 cells of 1 km each (total 81 km²). This approach allowed us to best handle calculations of environmental variables for UTM grid cells spanning different UTM zones, which have slightly irregular and deformed shapes. Furthermore, it improved the ecological realism of bird-habitat associations, especially at the broader spatial scales.
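The moving-window summary can be sketched with NumPy as a focal mean over an n × n window (n = 9 on a 100 m grid for the micro scale; n = 3 or n = 9 on a 1 km grid for the meso and mega scales). Using the mean is an assumption: on a 0/1 land-cover layer it yields the proportion of that cover class in the window, but the text only states that a summary value was computed. The function name is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def focal_mean(grid: np.ndarray, n: int) -> np.ndarray:
    """Mean of an n x n moving window centred on each cell (n must be odd).

    Cells beyond the raster edge are padded with NaN and ignored, so border
    windows average only the cells that exist.
    """
    if n % 2 == 0:
        raise ValueError("window width must be odd so the focal cell is centred")
    r = n // 2
    padded = np.pad(np.asarray(grid, dtype=float), r, constant_values=np.nan)
    windows = sliding_window_view(padded, (n, n))  # shape: (rows, cols, n, n)
    return np.nanmean(windows, axis=(-2, -1))
```

For example, `focal_mean(forest, 9)` on a hypothetical 100 m forest/non-forest raster gives the share of forest within 0.9 × 0.9 km around each cell.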

Fig. 3

Schematic representation of the focal features (moving window) approach for calculating environmental variables at the ecologically relevant scale for each species: for each cell (focal cell, in darker colour), the values of different environmental variables were computed over the surrounding cells (in light colour). The arrows show the directions along which the number of cells was counted for the computation at the micro-scale (9 × 9 cells of 100 m each; see the text for the other scales).

Training and test datasets

To develop robust and generalizable SDMs, it is crucial to assess their predictive capacity on independent data. To this end, a fraction of the data was not used for model building and was kept as a test dataset to evaluate model performance. To select spatially independent data for the test dataset, we relied on the ‘checkerboard 2’ function of the ENMeval R package43. Checkerboard 2 was chosen to obtain subsets of occurrence records that were spatially independent yet representative of the entire study region, and to achieve a suitable split of occurrence records between training and test data. Only presence data, not background data, were split between training and test datasets. At each scale, the procedure produced a ‘double checkerboard’ with grouping factors of 10 and 2 at its two levels, resulting in four subsets of data. Three subsets constituted the training dataset, while the remaining subset was used as the test dataset. This process yielded two spatially independent sets of data, corresponding roughly to ¾ and ¼ of the original data (training and test datasets, respectively).
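The idea behind a two-level checkerboard partition can be sketched as follows. This is a hypothetical re-implementation, not ENMeval's code: it assumes that presences are indexed by raster row and column, that the first level aggregates 10 × 10 cells and the second 2 × 2 first-level blocks, and its group numbering need not match ENMeval's.

```python
import numpy as np


def checkerboard2_groups(rows, cols, fine: int = 10, coarse: int = 2) -> np.ndarray:
    """Assign each presence to one of four spatial groups (1-4).

    The parity of the fine blocks (fine x fine cells) and of the coarse blocks
    (coarse x coarse fine blocks) together define the group.
    """
    rows, cols = np.asarray(rows), np.asarray(cols)
    fine_parity = (rows // fine + cols // fine) % 2
    big = fine * coarse
    coarse_parity = (rows // big + cols // big) % 2
    return 1 + fine_parity + 2 * coarse_parity
```

Holding out one group, e.g. `test = groups == 4` and `train = ~test`, gives roughly the ¾ / ¼ split described above.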

Model building

To reduce the number of variables included in the models and to avoid collinearity, sets of environmental variables were defined for different species groups based on their ecology. Hence, different environmental variables were considered for forest, farmland, wetland, mountain, and generalist species (Supplementary Table 2).

Although machine learning methods (such as MaxEnt) are much less sensitive to correlated predictors than classical statistical methods, including strongly collinear predictors may lead to unpredictable effects when extrapolating45. Therefore, at each spatial scale and for each subgroup of variables identified according to species’ ecology, correlations among all predictors were assessed, and collinear variables were excluded from models as described in step 4 below.
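Flagging pairs of predictors at or above the |r| ≥ 0.8 threshold used in step 4 can be done with a Pearson correlation matrix. A minimal sketch, assuming predictor values at occupied cells are arranged as a matrix with one column per variable; the function name is illustrative.

```python
from __future__ import annotations

import numpy as np


def collinear_pairs(X, names: list[str], threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Predictor pairs whose absolute Pearson correlation is >= threshold."""
    r = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)  # columns = predictors
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(r[i, j])))
    return pairs
```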

Model calibration was performed on the training dataset and aimed at identifying the best combination of model parameters for each species. Careful parameterization substantially improves model predictions57, prevents overfitting and increases the ecological relevance of species-habitat relationships, resulting in robust and effective predictions of species distributions.

MaxEnt models for each species were tuned as described below:

  1. functions for species-habitat relationships: to avoid overfitting, we considered only linear or quadratic effects of environmental predictors, i.e. simple response functions whose ecological realism can be easily evaluated. This precautionary approach may slightly reduce model accuracy on training data, but it reduces the risk of fitting unlikely species-habitat relationships that are inconsistent with real ecological effects.

  2. number of iterations: the maximum number of iterations was set to 1,000; models that converged earlier stopped before reaching this limit. In practice, the final model converged in fewer than 1,000 iterations for all species.

  3. value of regularization multiplier: the regularization multiplier is a crucial parameter for SDMs, as it relaxes or tightens the effect of environmental parameters on suitability and hence determines whether predicted distributions are more fragmented or more homogeneous. The most suitable value was selected by testing values from 0.5 to 4 in steps of 0.5 (i.e. 8 values)58. The AICc (Akaike’s Information Criterion corrected for small samples59,60) was then calculated for each model, and the value producing the most parsimonious model (lowest AICc) was chosen.

  4. selection of environmental variables: after step 3, we first calculated the correlation among all possible pairs of environmental predictors across all cells with at least one occurrence record at the relevant spatial scale, in order to consider a large and representative set of environmental conditions. We then identified highly correlated pairs of variables (|r| ≥ 0.8, a threshold similar to that recommended by Dormann et al.61 and Feng et al.45): three pairs at the micro scale and two at the broader scales. For each highly correlated pair, we built SDMs including each predictor singly and kept for subsequent analyses only the predictor leading to the better-supported model (lower AICc). We then built a model including all non-collinear predictors and simplified it by first removing any variable whose lambda coefficient (i.e. the coefficient expressing the variable’s contribution to predicting the distribution) was zero, and which was hence irrelevant. Finally, a variable selection procedure based on AICc was performed: for each remaining predictor, its permutation importance (the importance of that predictor in explaining the species’ distribution according to MaxEnt) was calculated. The variable with the lowest permutation importance was removed from the model and the AICc recalculated; if the model improved (i.e. the AICc decreased), we continued removing the least important variable until no further improvement in AICc was obtained. The resulting model was considered the final model for each species. AICc was calculated using a recently developed ad hoc method34.
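Steps 3 and 4 can be sketched as an AICc-based grid search over the regularization multiplier followed by backward elimination by permutation importance. The AICc below is the standard small-sample formula (k = number of non-zero parameters, n = number of occurrences). The models here used an ad hoc AICc method34, so this formula is only a stand-in. The `score` and `fit` callbacks are hypothetical placeholders for fitting a MaxEnt model, and the pairwise screening and zero-lambda pruning are not shown.

```python
from __future__ import annotations

import math
from typing import Callable, Dict, Sequence, Tuple


def aicc(log_likelihood: float, k: int, n: int) -> float:
    """Small-sample corrected AIC; infinite when k >= n - 1 (undefined)."""
    if k >= n - 1:
        return math.inf
    return 2 * k - 2 * log_likelihood + (2 * k * (k + 1)) / (n - k - 1)


RM_GRID = [0.5 * i for i in range(1, 9)]  # 0.5, 1.0, ..., 4.0 (8 values)


def best_rm(score: Callable[[float], float]) -> float:
    """Regularization multiplier with the lowest AICc; `score(rm)` fits and scores a model."""
    return min(RM_GRID, key=score)


def backward_eliminate(
    variables: Sequence[str],
    fit: Callable[[Sequence[str]], Tuple[float, Dict[str, float]]],
) -> list[str]:
    """Drop the least important predictor while AICc keeps decreasing.

    `fit(vars)` (hypothetical) returns (AICc, permutation importance per variable).
    """
    current = list(variables)
    current_aicc, importance = fit(current)
    while len(current) > 1:
        weakest = min(current, key=lambda v: importance[v])
        candidate = [v for v in current if v != weakest]
        cand_aicc, cand_importance = fit(candidate)
        if cand_aicc >= current_aicc:
            break  # no further improvement: keep the current model
        current, current_aicc, importance = candidate, cand_aicc, cand_importance
    return current
```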

Data Records

The habitat suitability maps for 225 widespread breeding bird species in Italy during 2010–2016, including SDM statistical outputs, are publicly available at https://doi.org/10.13130/RD_UNIMI/LUC3K662.

Suitability maps are provided as raster (.tif) files. The raster file names include the EURING species codes (see https://euring.org/data-and-codes/euring-codes) and the first three letters of genus and species (e.g. “E00070_Tac.ruf.tif” for the little grebe Tachybaptus ruficollis).
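When working with many rasters, the naming convention can be parsed programmatically. The pattern below assumes every file name consists of the letter ‘E’, a five-digit EURING code, an underscore, three letters of the genus, a dot, three letters of the species epithet and the .tif extension, as in the example above. Files that depart from this pattern raise an error, and the regular expression may need adjusting.

```python
from __future__ import annotations

import re

# Assumed layout: "E" + 5-digit EURING code + "_" + Gen + "." + spe + ".tif"
NAME_RE = re.compile(
    r"^E(?P<euring>\d{5})_(?P<genus>[A-Za-z]{3})\.(?P<species>[A-Za-z]{3})\.tif$"
)


def parse_raster_name(filename: str) -> dict[str, str]:
    """Split a suitability raster file name into EURING code and name abbreviations."""
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return m.groupdict()
```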

The statistical outputs corresponding to each species’ SDM are available as folders, named with the scientific name of the species. Folders contain: (1) model evaluation (.csv file); (2) model results (.csv file); (3) permutation importance of used environmental predictors (.csv file); (4) barplot of the five most important predictors (.jpg); and (5) all the response plots to environmental predictors (.jpg). For ease of visualisation, we provide a single PDF file with the response plots for the five most important predictors for each species, as well as the barplot with their permutation importance (Supplementary Information 2).

Species-specific threshold values useful for deriving binary predictions of species occurrence63, namely Maximum Training Sensitivity plus Specificity (MTSS) and the 10th percentile training presence, are reported in Supplementary Table 3.
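Users who wish to recompute thresholds or apply them to the rasters may find the following sketch useful. MTSS is found by scanning candidate thresholds for the maximum of sensitivity (training presences at or above the threshold) plus specificity (background points below it). The 10th percentile uses NumPy's default linear interpolation, which may differ slightly from MaxEnt's own convention; the values in Supplementary Table 3 remain the reference. Function names are illustrative.

```python
import numpy as np


def mtss_threshold(presence, background) -> float:
    """Threshold maximising sensitivity + specificity (presences vs background)."""
    presence = np.asarray(presence, dtype=float)
    background = np.asarray(background, dtype=float)
    candidates = np.unique(np.concatenate([presence, background]))
    best_t, best_score = candidates[0], -np.inf
    for t in candidates:
        sens = np.mean(presence >= t)   # presences predicted present
        spec = np.mean(background < t)  # background predicted absent
        if sens + spec > best_score:
            best_t, best_score = t, sens + spec
    return float(best_t)


def p10_threshold(presence) -> float:
    """Suitability below which 10% of training presences fall."""
    return float(np.quantile(np.asarray(presence, dtype=float), 0.1))


def binarize(suitability, threshold: float) -> np.ndarray:
    """1 where suitability >= threshold, 0 elsewhere; NaN cells stay NaN."""
    s = np.asarray(suitability, dtype=float)
    return np.where(np.isnan(s), np.nan, (s >= threshold).astype(float))
```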

Technical Validation

Model evaluation and validation

We tested the robustness of each species' model using the test dataset. Evaluation statistics were calculated for the final models on both the training and the test datasets. The most important aspect of validation was the consistency of predictive ability between the training and test datasets64: comparable values indicate a generalizable model that is free from overfitting and able to predict environmental suitability at sites not used for its construction as well as at those used for its development.

To evaluate model performance, four reference statistics were considered:

  1. AUC (Area Under the Curve of the Receiver Operating Characteristic plot), which assesses the discriminatory ability of a model65. A value of 0.5 represents chance-level performance, while 1 indicates a perfect ability to distinguish between presences and background points. In absolute terms, AUC is not a good measure of model accuracy, as rare species tend to have higher AUC values than common ones66; we therefore mainly used AUC to compare values calculated on training and test datasets for the same species. A model can be regarded as valid and generalizable when similar AUC values are obtained for the training and test datasets (difference < 0.05).

  2. TSS (True Skill Statistic), defined as sensitivity + specificity − 1, which compares the number of correct predictions, minus those attributable to chance, with that of a hypothetical set of perfect predictions. TSS ranges from −1 to +1, where +1 indicates a perfect match and zero indicates performance no better than chance67. Differences in TSS greater than 0.05 between training and test datasets suggest possible overfitting.

  3. Minimum training presence omission rate on the test dataset, i.e. the proportion of test occurrences falling below the lowest suitability value recorded at the locations used to develop the model (training dataset). Ideally, it should be zero or close to zero (no or very few test locations with suitability values lower than the minimum recorded at training locations).

  4. 10th percentile omission rate on the test dataset, i.e. the proportion of test occurrences falling below the 10th percentile of suitability values at training occurrences. Ideally, it should be close to 0.1 (10% of test records); values substantially higher than 0.1 (e.g. >0.2–0.3) indicate likely overfitting.
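The four statistics can be sketched in NumPy from predicted suitability at presence and background points. AUC is computed here as the probability that a random presence scores higher than a random background point (ties count half), which equals the area under the ROC curve. The threshold used for TSS is left to the caller, as the text does not state which one was used. Function names are illustrative.

```python
import numpy as np


def auc(presence, background) -> float:
    """Probability that a random presence outranks a random background point (ties = 0.5)."""
    p = np.asarray(presence, dtype=float)[:, None]
    b = np.asarray(background, dtype=float)[None, :]
    return float(np.mean((p > b) + 0.5 * (p == b)))


def tss(presence, background, threshold: float) -> float:
    """True Skill Statistic = sensitivity + specificity - 1 at a given threshold."""
    sens = np.mean(np.asarray(presence, dtype=float) >= threshold)
    spec = np.mean(np.asarray(background, dtype=float) < threshold)
    return float(sens + spec - 1)


def omission_rate(test_presence, threshold: float) -> float:
    """Share of test presences predicted below the threshold."""
    return float(np.mean(np.asarray(test_presence, dtype=float) < threshold))
```

With `train` and `test` holding suitability at training and test presences, `omission_rate(test, train.min())` gives the minimum training presence omission rate and `omission_rate(test, np.quantile(train, 0.1))` the 10th percentile omission rate. The pairwise AUC builds an n × m matrix, so for tens of thousands of points a rank-based formula is preferable.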

Validation statistics for all SDMs are reported in Supplementary Table 4. The minimum sample size for implementing SDMs was set at 50 occurrence records for non-colonial species and 20 for colonial ones (for which occurrence records are necessarily scarcer because of their aggregated distribution). For a few species whose models were based on very small samples [i.e. Mediterranean gull (Larus melanocephalus), n = 20; Eurasian spoonbill (Platalea leucorodia), n = 30; Savi’s warbler (Locustella luscinioides), n = 67; great spotted cuckoo (Clamator glandarius), n = 78], the evaluation statistics were unreliable, and model validity was instead assessed visually on the basis of the concordance between predicted and observed distributions. Overall, the mean difference between training and test datasets was 0.00 ± 0.03 for TSS (mean ± SD; only 10 species showed a difference >0.05, always <0.09) and 0.00 ± 0.01 for AUC (only one species showed a difference >0.05, namely 0.06), indicating very good performance on this criterion. Omission rates at the minimum training presence and 10th percentile thresholds also generally showed optimal values, with a few exceptions mostly related to species with small samples of occurrence records. Furthermore, the reliability of each final model was verified through visual inspection of the resulting habitat suitability map by species experts (see Lardelli et al.9).

Restriction of environmental suitability predictions to presence-only areas

For biogeographical reasons, not all species may actually occur in all areas classified as suitable by distribution models68. Given the main purpose of these models (i.e. assisting in defining species distributions, including in poorly investigated areas), we excluded such regions from the environmental suitability maps by setting environmental suitability to zero in areas where a given species has never been recorded. To this end, environmental suitability maps were intersected with actual distribution maps in order to exclude regions outside a given species’ distribution range from potentially suitable sites. For instance, the Eurasian nuthatch (Sitta europaea) does not breed in Sardinia for biogeographical/historical reasons; hence we set its suitability to zero there, despite suitable woodland habitats being identified by the SDM. The types of correction applied to environmental suitability maps based on species distribution were as follows (see Supplementary Table 1 for details and species concerned):

  1. Exclusion of mainland Italy from suitable areas for species limited to the islands.

  2. Exclusion of Sardinia and/or Sicily and/or adjacent smaller islands from suitable areas for species breeding only on the Italian mainland.

  3. Restriction to a buffer surrounding actual occurrence sites: for species with a concentrated distribution not falling into either of the previous cases, a layer representing the actual range was generated. For most species, a buffer of 200 km (set empirically based on previous experience) around presence sites was used; for some species with a particularly restricted or concentrated distribution (e.g. some grouse and owl species occurring only in the Alps), the buffer was set to 50 km. For species distributed throughout Italy, even if patchily, no range restriction was applied.
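Correction 3 amounts to zeroing suitability beyond a fixed distance from occupied cells. A brute-force sketch follows, assuming a boolean raster of occupied cells aligned with the suitability raster and distances measured between cell centres. The function name is hypothetical. Corrections 1 and 2 correspond to the simpler case of multiplying by a mainland or island mask.

```python
import numpy as np


def restrict_to_range(suitability, presence_mask, cell_km: float, buffer_km: float) -> np.ndarray:
    """Set suitability to zero in cells farther than buffer_km from any occupied cell.

    Brute-force distances between cell centres; practical for small grids only.
    NaN (no-data) cells are left unchanged.
    """
    s = np.asarray(suitability, dtype=float)
    rows, cols = np.indices(s.shape)
    pr, pc = np.nonzero(presence_mask)
    if pr.size == 0:
        return np.where(np.isnan(s), s, 0.0)
    # distance (km) from every cell to every occupied cell, then the nearest one
    d = np.hypot(rows[..., None] - pr, cols[..., None] - pc) * cell_km
    nearest = d.min(axis=-1)
    return np.where(np.isnan(s) | (nearest <= buffer_km), s, 0.0)
```

For national-scale rasters, `scipy.ndimage.distance_transform_edt(~presence_mask, sampling=cell_km)` returns the distance from every cell to the nearest occupied cell without building the full distance matrix.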

Interpretation of environmental suitability maps

The raster maps indicate the environmental suitability for each species as estimated by its SDM. Cell values (obtained through the cloglog transformation of raw MaxEnt output) range from zero (unsuitable) to one (maximum suitability). For a correct interpretation of the habitat suitability maps, it should be kept in mind that: (1) they represent habitat suitability, not the true probability of species presence, given that absence data are unavailable and prevalence is unknown69; although the two quantities are highly correlated and the cloglog transformation is meant to approximate the probability of occurrence, an environmental suitability of 0.5 does not necessarily correspond to a 50% probability of species presence; (2) they do not represent abundance, even though there is published evidence of positive correlations between environmental suitability and local density20,21.
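In MaxEnt version 3.4 and the maxnet R package, the cloglog output is 1 − exp(−exp(H) · raw), where raw is the raw MaxEnt output and H the entropy of the fitted distribution. The sketch below simply evaluates that formula. It shows that the transformation is monotonic, so it preserves the ranking of cells, but it does not turn suitability into a calibrated probability of presence.

```python
import numpy as np


def cloglog_from_raw(raw, entropy: float) -> np.ndarray:
    """MaxEnt complementary log-log output: 1 - exp(-exp(H) * raw)."""
    return 1.0 - np.exp(-np.exp(entropy) * np.asarray(raw, dtype=float))
```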

Usage Notes

The habitat suitability raster maps we generated may be used for a variety of applied or theoretical purposes. For management and planning, suitability rasters allow a rapid evaluation of the potential occurrence of species of conservation relevance or species sensitive to disturbance or habitat alteration. Similarly, they can be used to identify species-rich and priority areas for different groups of species70, or to model community patterns at varying spatial scales, up to the national level. For instance, in Supplementary Information 1 we present and discuss in detail a case study on patterns and drivers of bird species richness in Italian urban areas.
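Stacking thresholded maps is one simple way to estimate species richness from these rasters. The sketch below assumes that all suitability arrays have already been read (e.g. with a GeoTIFF library such as rasterio) and resampled to a common grid. This step is needed because the maps come at species-specific resolutions. Thresholds would come from Supplementary Table 3, and NaN cells are treated as absences. The function name is hypothetical.

```python
from __future__ import annotations

import numpy as np


def stacked_richness(suitability_maps: dict[str, np.ndarray],
                     thresholds: dict[str, float]) -> np.ndarray:
    """Number of species predicted present per cell (suitability >= species threshold)."""
    total = None
    for species, suit in suitability_maps.items():
        s = np.nan_to_num(np.asarray(suit, dtype=float), nan=0.0)  # NaN -> absent
        present = s >= thresholds[species]
        total = present.astype(int) if total is None else total + present
    if total is None:
        raise ValueError("no maps supplied")
    return total
```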

When using SDMs to predict species occurrence or community traits, it is essential to bear in mind some practical aspects related to both the intrinsic characteristics of the models and the data used to generate them. First, even though sampling coverage was generally satisfactory at the national level (see Fig. 2), survey intensity was much more heterogeneous at smaller scales. Hence, we cannot exclude that some extrapolation to unsampled combinations of environmental predictors may have occurred: such cases, if any, are likely to be very limited, but care should be taken when interpreting habitat suitability values at local scales. At these scales, other factors not considered in the modelling procedure (e.g. interspecific interactions, local disturbance, availability of key specific resources) may also become increasingly relevant in determining actual occurrence or absence. Second, we did not explicitly model spatial autocorrelation. Therefore, we cannot exclude that, for some species, the models might not be the best-performing ones (and the resulting spatial suitability patterns not the most accurate) because of unmodelled spatial patterns in the data. This is to some extent illustrated by the latitudinal effect in inferred species richness detected among Italian urban bird communities (see Supplementary Information 1). Third, the selection of thresholds for binary classification and for generating predicted distributions is partly species- and context-dependent. In our worked example, MTSS led to more reliable outcomes than the 10th percentile, but other thresholds may be better suited depending on the model, the species and the purpose of the reclassification. Fourth, the buffer distances we used to exclude unoccupied areas from model predictions (200 km and 50 km) might exclude some occupied areas or, conversely, include unoccupied ones. A careful investigation of local contexts and of up-to-date information on regional species distributions is required to properly evaluate model outcomes near the margins of ‘cropped’ suitability. Fifth, habitat selection often occurs at multiple spatial scales71. Unfortunately, testing environmental drivers of habitat suitability at multiple ecologically relevant scales was unfeasible with the available data (presence-only citizen science data at 1 km resolution), given that for most species habitat selection operates at finer spatial scales. However, for some species, models would have been even more informative and representative had they integrated multiple spatial scales, an approach worth pursuing in other applications where records can be collected at very fine scales. Finally, we built SDMs with different spatial scales and specific combinations of environmental predictors according to broad ecological groups of species (Supplementary Table 2). Although we calibrated and tuned models according to species-specific results, we adopted the same modelling framework for all species assigned to a given ecological group. It could be argued that, for some species, different approaches might lead to better results, both for model construction and for the subsequent cropping of suitable areas using the predefined buffers.