Species richness variation in marine and terrestrial fauna across widespread, fragmented territories: assessing inherent challenges of data scarcity at local and regional scales

Barreiro, Kilian; Benestan, Laura; Moritz, Charlotte; Ducatez, Simon; Gaertner, Jean-Claude; Le Luyer, Jérémy; Monaco, Cristián J.

doi:10.1038/s41598-025-06631-4

Article
Open access
Published: 01 July 2025

Scientific Reports volume 15, Article number: 21043 (2025)

Abstract

The ongoing biodiversity crisis calls for a complete biodiversity inventory of marine and terrestrial ecosystems. The task is particularly challenging for fragmented island territories, where baseline biodiversity information is often difficult to procure. By centralising information from different sources (museums, research institutions, citizen scientists), ‘big-data’ platforms provide an opportunity to evaluate species biodiversity information of understudied regions. Using data primarily sourced from the Global Biodiversity Information Facility (GBIF), and complemented by a review of 56 potential data sources—of which nine provided unique, non-redundant records—we curated the first biogeographic dataset for both marine and terrestrial animal species in French Polynesia, a large territory composed of 124 islands and atolls that belongs to the Central Pacific region, a marine biodiversity hotspot facing conservation challenges. The dataset revealed heterogeneous species richness across archipelagos and islands, prompting an investigation into potential sampling biases (institutional, taxonomic, spatial) as well as an assessment of island-specific accessibility biases. We estimated that the archipelagos and islands had an inventory completeness rate that ranges from 1.9 to 98.4%, suggesting that a large proportion of the studied area remains poorly documented. Spatial and temporal sampling biases were partly explained by accessibility constraints (proximity to airports, roads or ports), and inventory completeness was higher for marine than terrestrial species. The biases quantified here challenge our ability to conduct biogeographic analyses that integrate the land-sea meta-ecosystem. Our database allows identifying taxa and sampling locations that require urgent attention, as well as comprehensively recorded species that can serve as indicators for environmental degradation. 
Explicitly acknowledging the inherent biases of biodiversity datasets is the first step towards a more comprehensive characterization of species diversity across fragmented territories. This information is crucial for guiding sound adaptive-management and conservation planning strategies.

Introduction

Humans are driving an unprecedented erosion of marine and terrestrial biodiversity, fundamentally altering the structure and functioning of ecosystems, and in turn threatening the beneficial contributions that nature provides^1,2,3. Implementing conservation actions to confront this crisis requires comprehensive and spatially explicit baseline information on species diversity across the planet⁴. Ultimately, these data are essential for guiding conservation management based on a sound understanding of the ecological and evolutionary processes that drive spatial and temporal patterns of species distribution across ecosystems^5,6.

Thanks to the concerted efforts of museums, research institutions, citizen scientists, and ‘big-data’ platforms facilitating the integration of information, biodiversity records are increasingly available^7,8,9. Over the last two decades, many initiatives to centralise species occurrence data have emerged, notably online repositories such as the Global Biodiversity Information Facility (GBIF, https://www.gbif.org/) and the Ocean Biodiversity Information System (OBIS, https://obis.org/). By adhering to the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) and to metadata-sharing standards such as the Darwin Core (DwC)¹⁰, Ecological Metadata Language (EML)¹¹, and BioCASe¹², these intergovernmental research infrastructures promise to expedite the study of biodiversity across ecosystems. GBIF and OBIS are the largest open-access occurrence data portals for terrestrial and marine species, and both are routinely used to inform resource management and conservation programs (e.g.,^{13,14,15,16,17}).

Despite their growing popularity, open-access biodiversity databases have been criticised on the grounds of poor data quality, potentially limiting their scope and applicability¹⁸. Important shortfalls that are often cited include standardisation issues during sampling^19,20, incomplete and/or incorrect records (e.g., species misidentification) and sampling biases, either spatial/temporal (i.e., unbalanced sampling efforts across space/time), taxonomic (i.e., skewed sampling favouring certain taxa), or both^20,21,22,23. While cleaning and filtering methods can readily correct incomplete and/or incorrect entries, sampling biases are difficult to diagnose and require special attention²⁴. The spatial sampling bias, considered one of the main challenges limiting our comprehensive understanding of large-scale biodiversity patterns²⁵, can be partly explained by socio-economic reasons (e.g., wealthy zones are more likely to be surveyed²⁶), a scientific bias towards certain taxa²¹, differences in sampling standards²⁷, and/or logistical difficulties in accessing certain locations^28,29. Notably, implementing standardized sampling methods in future research is essential, as it enhances comparability and integration, increases the reproducibility of findings, and improves data quality and reliability, while also saving time and resources³⁰. These shared data collection protocols should be adopted and facilitated for both scientific and non-scientific personnel³¹.

Remote oceanic islands are likely to show sampling gaps due to their geographical isolation, which ultimately results in patchy and poorly representative data for the study region. The difficulties and high costs associated with organising monitoring campaigns further exacerbate these biases. As a result, some islands are underrepresented in long-term monitoring schemes³² and, aside from a few exceptions (e.g.,³³), comprehensive biodiversity studies across widespread archipelagos remain rare. This paucity of information for islands and atolls is particularly detrimental because they are a priori highly vulnerable ecosystems that potentially harbour high levels of endemism due to their isolation^34,35. Additionally, fragmented archipelagos are unique natural laboratories that provide opportunities for studying the ecological and evolutionary processes driving biodiversity patterns, dispersal potential, endemism and extinction rates, for both marine and terrestrial organisms. However, a proper understanding of these biogeographical processes first requires robust baseline information on species distribution^36,37.

With 124 high islands and atolls spread across five archipelagos covering 4.8 million km²^38,39, French Polynesia represents the epitome of a fragmented territory. The large number of islands, their relative isolation, and the sheer variation in geomorphological characteristics they exhibit complicate efforts to survey the entire region or avoid sampling biases. Indeed, the marine and terrestrial biogeography of French Polynesia has only been partly studied, with a remarkable skew towards specific taxonomic groups. In the marine realm, targeted investigations have mainly focused on marine molluscs, brown seaweeds (Phaeophyceae), and reef fishes^{40,41,42,43,44,45,46,47,48}. In the terrestrial realm, data compilations include a checklist of the recorded land and fresh-water arthropods⁴⁹, a biogeographic atlas of birds⁵⁰, and an inventory of the vascular flora^51,52,53, as well as some rare studies focusing on the phylogeographic origins of specific terrestrial biota (e.g.,^54,55). Overall, the lack of a centralised, complete, and unbiased dataset for the region prevents an exhaustive analysis of the biogeographical status of marine and terrestrial species across French Polynesia. As a model of a highly fragmented island system, improving our fundamental understanding of French Polynesian biogeography is not only critical for cataloguing the existing fauna of the region, but also for contributing to our general comprehension of the ecological processes driving the current biodiversity crisis in isolated systems^35,56.

Using data originally downloaded from open-access portals (GBIF, OBIS) and 56 additional occurrence data sources, we compiled and curated the first biogeographic dataset for both marine and terrestrial animal species in French Polynesia. We used these data to: (1) provide a baseline characterization of the number of species in the region; (2) identify taxonomic groups that might require further investigation, as well as comprehensively recorded species that can serve as indicators for environmental degradation; (3) identify poorly- and well-surveyed islands; and (4) quantify island-specific accessibility biases leading to heterogeneous sampling efforts.

Materials and methods

Data collection

We downloaded occurrence data from the GBIF portal (http://gbif.org; https://doi.org/10.15468/dl.gaxgr7) on May 24, 2023, covering French Polynesia (polygon spanning between 5°S and 30°S, and 134°W and 155°W). To identify any additional biodiversity records not included in GBIF, we reviewed 56 data sources, including local reference guides, expedition reports, and repositories (Table S2). Nine sources contained unique georeferenced species records that were integrated into our final dataset. The remaining sources were used to cross-check metadata, validate completeness, or support contextual interpretation. Species occurrences are defined as records of a particular species (or other taxonomic rank), with a geographic location and timestamp. These raw data were treated following the Darwin Core¹⁰, Ecological Metadata Language¹¹, and BioCASe¹² standards. The data were pre-filtered to exclude records missing geographic location and/or taxonomic classification (e.g., not available or zeros), yielding 343,780 records (Fig. 1). Because GBIF and OBIS had signed a data-sharing agreement that was in effect when we downloaded the data, the marine data from OBIS were also contained in our GBIF data. The coastline shapefiles used to analyse the region included 120 geographical structures, most of which were atolls and high islands. Hereafter, we refer to all geographical structures as “islands”. Each record retrieved from the GBIF dataset was assigned to its nearest island based on geographic distances estimated using the function st_nearest_feature available in the sf package v.1.0-15⁵⁷ in R⁵⁸.
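The nearest-island assignment can be illustrated with a minimal Python sketch. It is not the authors' R workflow: st_nearest_feature works on the island geometries themselves, whereas this sketch assumes each island is reduced to a single reference point and uses the haversine distance. The coordinates, the `ISLANDS` table and the function names are illustrative assumptions.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def nearest_island(lat, lon, islands):
    """Return the name of the island whose reference point is closest."""
    return min(islands, key=lambda name: haversine_km(lat, lon, *islands[name]))

# Approximate reference points (assumed for illustration, not the study's shapefiles).
ISLANDS = {
    "Tahiti": (-17.65, -149.43),
    "Moorea": (-17.53, -149.83),
    "Rangiroa": (-15.13, -147.65),
}

# A hypothetical occurrence record located just off the north coast of Moorea.
record = {"species": "Carcharhinus melanopterus", "lat": -17.49, "lon": -149.85}
record["island"] = nearest_island(record["lat"], record["lon"], ISLANDS)
```

With real coastline polygons, the distance should be measured to the polygon boundary rather than to a single point, which matters for large or elongated islands.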

Validation of the taxonomic information

To clean, homogenise, and validate the taxonomic information in the dataset, we assumed that misidentifications would occur at the species level. To ensure taxonomic reliability we validated each species name using ad hoc taxonomic data repositories. We first validated the species name of each recognized taxon with WoRMS (World Register of Marine Species, https://www.marinespecies.org/) using the wm_records_name function from the R package worrms⁵⁹. We then assigned a taxonomic status (i.e., “accepted”, “doubtful”, “synonym”) to each record following the criteria outlined by the GBIF Backbone taxonomy (see https://doi.org/10.15468/39omei, https://hosted-datasets.gbif.org/datasets/backbone/). Taxa that were assigned as either “doubtful” or “synonym” were replaced by the updated taxonomic name provided by WoRMS. Taxa not recognized by the WoRMS repository were further examined using the gna_verifier function from the R package taxize⁶⁰, which provides a means to validate species names by accessing several additional repositories via specific Application Programming Interfaces (e.g., ITIS: Integrated Taxonomic Information System, https://www.itis.gov/; CoL: Catalogue of Life, https://www.catalogueoflife.org/; BOLD: Barcode of Life Data System, https://www.boldsystems.org/). Taxa that were recognized by neither WoRMS nor taxize were submitted to TAXREF (taxonomic reference curated by the French National Museum of Natural History, https://inpn.mnhn.fr/programme/referentiel-taxonomique-taxref) using the rt_taxa_search function from the rtaxref R package⁶¹. A final manual check was done for records that could not be identified in the aforementioned taxonomic repositories.
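The validation cascade described above (WoRMS first, then taxize, then TAXREF, then a manual check) can be sketched as a simple fallback chain. The Python below is only an illustration of that control flow: the lookup tables stand in for the remote repositories, the species names are placeholders rather than real taxa, and `validate_name` is a hypothetical helper, not a function of any of the R packages cited.

```python
def validate_name(name, resolvers):
    """Try each (source, lookup) pair in order and return the first match."""
    for source, lookup in resolvers:
        accepted = lookup(name)
        if accepted is not None:
            return {"query": name, "accepted": accepted, "source": source}
    # Nothing matched: flag the name for the final manual check.
    return {"query": name, "accepted": None, "source": "manual check"}

# Toy lookup tables (placeholder names, assumed for illustration only).
WORMS = {"Aus bus": "Aus bus", "Aus oldus": "Aus bus"}  # second key mimics a synonym
GNA = {"Cus dus": "Cus dus"}
TAXREF = {"Eus fus": "Eus fus"}

RESOLVERS = [("WoRMS", WORMS.get), ("taxize", GNA.get), ("TAXREF", TAXREF.get)]

queries = ["Aus oldus", "Cus dus", "Eus fus", "Gus hus"]
results = [validate_name(q, RESOLVERS) for q in queries]
```

Ordering the resolvers from most to least authoritative for the study system keeps the provenance of each accepted name explicit, which helps when auditing the records that end up in the manual check.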

Habitat classification and biogeographical status

Habitat classifications for marine and terrestrial species were verified using WoRMS and TAXREF, respectively. Habitat information was split into four categories (i.e., marine, brackish, freshwater and terrestrial) according to the classification scheme favoured by WoRMS. Missing habitat information was completed using the TAXREF database. For our analyses of terrestrial and marine ecosystems, we focused on species that were classified as exclusively “marine” or exclusively “terrestrial”. Species classified as amphibious or those inhabiting both terrestrial and marine environments at different life stages (e.g., seabirds like Gygis alba or Sula sula) or during specific phases of their life cycle (e.g., insects with aquatic larval stages) were included in the cleaned dataset (labeled as “Mixed” in Fig. 1) but excluded from further analyses.

Data filtration sequence

Because geographic, taxonomic and accuracy standards have changed over time^20,62, and notable errors were detected in older records, we retained entries dating from 1950 onwards and excluded those without timestamps (Fig. 1). Subsequently, we removed all absence data to rule out potential biases due to false negatives and no-observation data⁶³. We then restricted occurrences to those described in the basisOfRecord column as “human observation”, “machine observation”, “material sample”, “material citation” and “preserved specimen”, following the recommendations of Smith et al. 2018⁶⁴. True duplicates were identified by cross-checking values in the decimalLatitude, decimalLongitude, ScientificName, Year, Month and Day fields, and were removed. Finally, records lacking species names and Habitat information were removed from the dataset (Fig. 1).
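The filtration sequence can be summarised as a chain of record-level tests. The following Python sketch is a simplified illustration, not the authors' pipeline: it assumes records are dictionaries keyed by Darwin Core-style field names, that absences are flagged through an `occurrenceStatus` field, and that basis-of-record values use upper-case codes. The toy records are invented.

```python
ALLOWED_BASIS = {
    "HUMAN_OBSERVATION", "MACHINE_OBSERVATION", "MATERIAL_SAMPLE",
    "MATERIAL_CITATION", "PRESERVED_SPECIMEN",
}
DEDUP_KEYS = ("decimalLatitude", "decimalLongitude", "scientificName",
              "year", "month", "day")

def filter_records(records, first_year=1950):
    """Apply the date, absence, basis-of-record, duplicate and name filters in turn."""
    kept, seen = [], set()
    for rec in records:
        year = rec.get("year")
        if year is None or year < first_year:      # undated or pre-1950
            continue
        if str(rec.get("occurrenceStatus", "PRESENT")).upper() == "ABSENT":
            continue                               # absence data
        if rec.get("basisOfRecord") not in ALLOWED_BASIS:
            continue
        key = tuple(rec.get(k) for k in DEDUP_KEYS)
        if key in seen:                            # exact duplicate
            continue
        seen.add(key)
        if not rec.get("scientificName"):          # no usable species name
            continue
        kept.append(rec)
    return kept

# Invented toy records: only the first and the last should survive.
base = {"scientificName": "Sp A", "decimalLatitude": -17.5,
        "decimalLongitude": -149.8, "year": 2011, "month": 5, "day": 2,
        "basisOfRecord": "HUMAN_OBSERVATION", "occurrenceStatus": "PRESENT"}
RAW = [base, dict(base), {**base, "year": 1932}, {**base, "year": None},
       {**base, "occurrenceStatus": "ABSENT"},
       {**base, "basisOfRecord": "FOSSIL_SPECIMEN"}, {**base, "day": 3}]
cleaned = filter_records(RAW)
```

Counting the records dropped at each step, rather than only the final total, would reproduce the kind of flow summary shown in Fig. 1.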

Taxonomic biases: identifying under- and over-represented groups

We estimated the taxonomic bias at the Class level based on its over- or under-representation, relative to an “ideal sampling effort index”. The ideal number of records for a given class was estimated based on the hypothetical scenario where each species received the same number of records, and therefore each class received a number of records directly proportional to its number of species²¹ according to:

Ideal = N_rec * (N_{sp_group}/ N_{sp_tot})

where N_rec = total number of records, N_{sp_group} = number of distinct species within the taxonomic group, and N_{sp_tot} = total number of species present in the whole dataset. Taxonomic bias was assessed based on the difference between the ideal and observed sampling efforts, calculated for each class with more than 100 records in marine habitats and more than ten records in terrestrial habitats. To highlight values that deviated significantly from the ideal, we applied an inverse hyperbolic sine transformation to the data. We also identified the top ten most representative species for each habitat.
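As a worked example of the ideal sampling effort, the Python sketch below plugs in the marine totals reported in the Results (141,181 records, 5,953 species) for two classes. Two choices here are assumptions made for illustration: that the totals are taken over the marine subset only, and that the bias is expressed as observed minus ideal before the inverse hyperbolic sine transformation.

```python
from math import asinh

def ideal_records(n_rec, n_sp_group, n_sp_tot):
    """Records a group would receive if every species were sampled equally."""
    return n_rec * (n_sp_group / n_sp_tot)

def taxonomic_bias(observed, ideal):
    """Inverse hyperbolic sine of (observed - ideal); the sign convention is assumed."""
    return asinh(observed - ideal)

N_REC, N_SP_TOT = 141_181, 5_953  # marine totals reported in the Results

# (observed records, number of species) per class, as reported in the Results.
classes = {"Teleostei": (76_248, 1_547), "Gastropoda": (20_446, 2_028)}

bias = {}
for name, (observed, n_species) in classes.items():
    ideal = ideal_records(N_REC, n_species, N_SP_TOT)
    bias[name] = taxonomic_bias(observed, ideal)

ideal_teleostei = ideal_records(N_REC, 1_547, N_SP_TOT)
```

Under these assumptions Teleostei receive roughly twice their ideal share (about 36,689 ideal versus 76,248 observed records) and come out over-represented, whereas Gastropoda fall well below theirs. The asinh transformation behaves like a logarithm for large magnitudes but remains defined for zero and negative differences, which a plain log transform would not handle.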

Spatial and temporal heterogeneity in the sampling effort

To examine spatial heterogeneity in the sampling effort, we first mapped the number of records and species estimated for each island within each archipelago. The agreement between the number of records and species per island (log₁₀-transformed) was evaluated based on the Pearson correlation coefficient and its statistical significance. Additionally, to quantify the prevalence of heterogeneous sampling effort across space, we assessed the proportion of species recorded only once at each island, a common method in biodiversity studies to evaluate sampling completeness and detect potential under-sampling biases⁶⁵. The parameter uniqueness (species that have only been collected once) is widely recognized as an indicator of incomplete sampling^66,67, allowing researchers to infer the adequacy of the sampling effort and identify areas that may require further investigation. We considered q_k as the number of species documented in k sampling-effort units, so that the number of species observed in a single sampling-effort unit is q₁ (i.e., unique), the number of duplicates is q₂, and so on.
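The frequency counts q_k are straightforward to derive from a list of records. The Python sketch below assumes that each record counts as one sampling-effort unit and uses invented species labels; `frequency_counts` is a hypothetical helper name.

```python
from collections import Counter

def frequency_counts(species_per_record):
    """Map k -> q_k, the number of species documented in exactly k records."""
    records_per_species = Counter(species_per_record)
    return Counter(records_per_species.values())

# Invented records for one island: sp1 x3, sp2 x1, sp3 x2, sp4 x1.
observations = ["sp1", "sp1", "sp1", "sp2", "sp3", "sp3", "sp4"]

q = frequency_counts(observations)
s_obs = sum(q.values())          # number of distinct species observed
prop_unique = q[1] / s_obs       # share of species recorded only once
```

Here q₁ = 2 and q₂ = 1 out of four species, so half of the species are uniques, a pattern that would point to an incomplete inventory.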

Inventory completeness

To investigate the degree of inventory completeness in the dataset for both marine and terrestrial ecosystems at the scale of the archipelago and the island, we estimated the species inventory completeness percentage (C), calculated as:

C_(i) = (Sobs_(i)/ Sest_(i)) * 100

where i = each island or archipelago, Sobs = number of species observed, and Sest = number of species estimated at each archipelago and island⁶⁸. To estimate Sest, we used the Species Accumulation Curve (SAC) approach that describes the relationship between species richness and sampling effort, i.e., the number of records available in a grid cell⁶⁹. To derive the SACs, we split the area into 0.05° (~ 25 km²) grid cells. We described the SACs using the specaccum function (method = “exact”) available in the R package vegan v.2.6–4⁷⁰. We fitted the Michaelis-Menten model with the fitspecaccum function (model = “michaelis-menten”) to provide estimates of the number of species likely to be present (i.e., Sest, which corresponds to the asymptotic richness, or parameter Vm in the Michaelis-Menten equation) and the number of records required to capture 50% (K) of the estimated number of species predicted by the model^71,72,73. Using the poolspecaccum function available in the vegan package, we also compared this expected number of species with the nonparametric richness estimators Chao1 and Jackknife 2, which are recommended when the data contain a high number of unseen species⁷⁴. Because biodiversity assessments can be biased by grid cells with extremely low species records, we considered a minimum threshold of ten observations to run the SACs, as was done in previous studies^68,75.
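The quantities involved in the completeness estimate can be written out in a few lines. The Python sketch below is illustrative only: it evaluates the Michaelis-Menten curve for given parameters rather than fitting it (the study used vegan for that), and it uses the standard bias-corrected form of Chao1, which may differ from the variant computed by the authors' software. All numerical values are hypothetical.

```python
def michaelis_menten(n, vm, k):
    """Expected species count after n records; vm is the asymptote (Sest)."""
    return vm * n / (k + n)

def completeness(s_obs, s_est):
    """Inventory completeness C, in percent."""
    return 100.0 * s_obs / s_est

def chao1(s_obs, q1, q2):
    """Bias-corrected Chao1 from the numbers of singletons (q1) and doubletons (q2)."""
    return s_obs + q1 * (q1 - 1) / (2 * (q2 + 1))

# Hypothetical fitted parameters for one island.
vm, k = 500.0, 120.0
half_richness = michaelis_menten(k, vm, k)   # at n = K the curve reaches Vm / 2
c_island = completeness(400, vm)             # 400 observed species out of 500 estimated
chao_estimate = chao1(400, q1=60, q2=20)
```

The first check makes the meaning of K concrete: it is the number of records at which half of the asymptotic richness is expected to have been detected, so a large K relative to the available records signals an inventory that is far from saturation.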

While we aimed to calculate the SACs for each island and archipelago based on grid cells, as recently done in several studies on macroecology using GBIF datasets^75,76,77, only eight islands (i.e., Anaa, Huahine, Moorea, Nuku Hiva, Raiatea-Tahaa, Rangiroa, Tahiti and Ua Pou) were sufficiently large to yield enough cells (greater than or equal to 10) to fit the Michaelis-Menten model for terrestrial data. We therefore generated archipelago-scale models based on 5-km grid cells, while for the island-scale models we used the geographic coordinates associated with the species records. A preliminary comparison between these two approaches (0.05-degree resolution grid cells vs. records) revealed a significant correlation between them for the archipelago scale (R² = 0.96). Therefore, we only presented SACs based on records for both archipelagos and islands. To evaluate inventory completeness, we determined the total number of islands with more than 100 records and C greater than or equal to 80%, meaning that at least 80% of the species have been sampled^77,78. We then examined the correlation between the number of records and C to test whether these proxies of sampling effort and reliability were associated. We used a Spearman correlation test for non-parametric data. Statistical significance was evaluated based on α = 0.05.
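For readers unfamiliar with the Spearman test, the coefficient is simply a Pearson correlation computed on ranks. The Python sketch below shows that calculation for a subset of the island-level values reported in the Results (marine records and C for eleven islands). It does not compute a P-value, and because it uses only a subset of islands its output is not expected to match the statistics reported in the paper.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Marine records and completeness (%) for eleven islands named in the Results.
n_records = [2826, 520, 1226, 521, 263, 385, 352, 376, 234, 360, 343]
c_values = [82.2, 82.0, 65.9, 60.8, 28.5, 75.6, 69.1, 65.8, 68.6, 62.6, 52.0]
rho = spearman(n_records, c_values)
```

Working on ranks makes the statistic insensitive to the extreme record counts of a few heavily sampled islands such as Moorea, which is the main reason to prefer it over a Pearson coefficient on raw counts.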

Sampling bias due to accessibility

To explore the influence of accessibility constraints on these sampling biases, we used a Bayesian approach to estimate how sampling rates vary with proximity to several common anthropic accessibility factors (i.e., rivers, roads, cities, airports, and ports). Using the calculate_bias function from the sampbias R package v. 2.0.0⁷⁹, we estimated the bias weights (w), which quantify the impact of each accessibility factor on sampling rates. These weights are calculated assuming an exponential decline in sampling rates as distance from accessibility factors increases. This package also provides spatially explicit estimates of the number of records (i.e., expected records) using a Poisson sampling process while accounting for the influence of the accessibility factors. Because the geospatial data contained by default in the sampbias package is incomplete for French Polynesia (Natural Earth Data, https://www.naturalearthdata.com/), we manually inputted vector data for rivers, roads, cities with > 1,000 inhabitants, airports, and ports. These data were provided by the French Polynesian agency for marine resources, the Direction des Ressources Marines. We defined a grid (inp_raster parameter) contained within the same polygon used for downloading the GBIF data, with 0.05 degrees resolution (~ 5.5 km). This was done for consistency with the SAC analyses. Each grid cell was assigned to the nearest island based on geographic distances estimated using the function st_nearest_feature available in the sf R package v.1.0-15⁵⁷.
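The exponential decline in sampling rate can be made concrete with a small Python sketch. It assumes that the expected number of records in a cell equals a baseline rate q multiplied by exp(−Σ w_j d_j), where d_j is the distance to accessibility factor j and w_j its bias weight. This functional form, the weights and the distances are illustrative assumptions and should not be read as the exact parameterisation used by the sampbias package.

```python
from math import exp

def expected_records(q, weights, distances_km):
    """Baseline rate q damped exponentially by weighted distances to each factor."""
    decay = sum(weights[factor] * distances_km[factor] for factor in weights)
    return q * exp(-decay)

# Hypothetical bias weights (per km) for three accessibility factors.
WEIGHTS = {"airport": 0.02, "road": 0.10, "port": 0.05}

at_source = expected_records(50.0, WEIGHTS, {"airport": 0.0, "road": 0.0, "port": 0.0})
near = expected_records(50.0, WEIGHTS, {"airport": 5.0, "road": 1.0, "port": 2.0})
far = expected_records(50.0, WEIGHTS, {"airport": 80.0, "road": 30.0, "port": 60.0})
```

A larger weight means that sampling drops off faster with distance from that factor, so comparing weights across factors indicates which type of infrastructure most constrains where records are collected.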

Results

Curated dataset for French Polynesian marine and terrestrial species

From the original 343,780 records included in the dataset, we removed 20,967 records that were either dated before 1950, or which did not have a time stamp (Fig. 1). Then, 58,490 records with no or non-usable species taxonomy were discarded, 86.6% of which came from institutional sources (e.g., Muséum national d’Histoire naturelle, Smithsonian Institution). A total of 77,651 records were identified as duplicated, 21.5% of which originated from citizen science sources. The number of records accessible per year has increased over time since 1950, reaching maximum values in 2011, 2012, and 2006, with 20,636, 11,263, and 11,039 records, respectively (Fig. S1). This increase in records was mainly explained by the one-off contributions of two out of 130 publishers: OBIS-SEAMAP and UMS PatriNat (OFB-CNRS-MNHN, Paris). The mean number of records per species was 25.8 (median = 3), ranging from 1 to 12,339. Records produced by citizen scientists accounted for 21.7% of the total, corresponding to 40,394 records. Data collected by citizen scientists were also the main source of data (i.e., > 50% of occurrences sourced as citizen science) for 62 islands, and the only source (100%) for two islands (Fangatau, Marutea nord). Human observation, including institutional and citizen science publishers, was the most frequently used recording method, with 75.7% (140,650 records) of total records. Preserved specimen and material sample categories accounted for 20.1% and 3.9% of records, respectively. WoRMS validated the taxonomy of 90.6% of total records and 99.1% of non-terrestrial records. Only 268 species lacked information on their habitat, which we completed manually. The resulting cleaned dataset was composed of 185,758 records including 141,181 marine, 15,940 terrestrial and 28,637 mixed records for 5,953 marine, 1,032 terrestrial and 203 mixed species, collected from 1950 to 2023 (Figs. 1 and S1).
The curated dataset is available in SEANOE (https://www.seanoe.org/data/00878/99018/).

Taxonomic composition and biases

The number of recorded species was ~ 5.8 times higher for marine than terrestrial ecosystems, with 5,953 marine and 1,032 terrestrial species, respectively. For marine taxa, the dataset included 18 phyla, with three major groups: Mollusca (2,337 species), Chordata (1,733 species), and Arthropoda (1,148 species), accounting for over 95.4% of marine records (134,734 records). Five classes alone accounted for 78.4% of the observations: Teleostei (76,248 records, 1,547 species), Gastropoda (20,446 records, 2,028 species), Malacostraca (6,788 records, 1,076 species), Bivalvia (4,368 records, 276 species), and Mammalia (2,807 records, 25 species; Fig. 2). The most represented marine species were Carcharhinus amblyrhynchos (grey reef shark, 4,428 records), Carcharhinus melanopterus (blacktip reef shark, 3,987 records), and Triaenodon obesus (whitetip reef shark, 941 records; Fig. 2). A total of 90% of the marine species had 33 or fewer records, and 26% were unique records.

The terrestrial taxa comprised five phyla: Arthropoda (761 species), Mollusca (194 species), Chordata (73 species), Platyhelminthes (3 species), and Nematoda (1 species). The five most recorded classes were Aves (9,061 records, 62 species), Insecta (3,545 records, 687 species), Gastropoda (2,198 records, 194 species), Arachnida (532 records, 62 species), and Squamata (430 records, 10 species), representing 98.9% of all terrestrial species records. A total of 90% of the terrestrial species had 23 records or fewer, and 41% were unique records. Three introduced bird species, Geopelia striata (zebra dove), Acridotheres tristis (common myna) and Pycnonotus cafer (red-vented bulbul), were the most recorded terrestrial species, with 1,174, 1,140 and 957 occurrences, respectively (Fig. 2), of which 93.1% were provided by the “Cornell Lab of Ornithology”.

Spatial and temporal heterogeneity in sampling effort and the number of recorded species

We observed a significant and strong correlation between the log₁₀-transformed number of records per island (i.e., a proxy for sampling effort) and the number of species per island for both marine (ρ = 0.984, P < 0.001) and terrestrial (ρ = 0.969, P < 0.001) ecosystems (Fig. S2). This analysis excluded islands that lacked records in both marine and terrestrial habitats.

Our dataset included marine species records for 118 out of 124 islands. The number of records per island was heterogeneous (Fig. 3), ranging from 1 to 60,473, with a mean of 1,196 records (median = 77). The number of species present was also highly heterogeneous across space, ranging from 1 to 2,770 species per island, with a mean of 199 species (median = 58) per island. The Society Archipelago (13 islands) held 57.1% of all marine-species records, 75.0% of which were observed in Moorea (60,473 records), Tahiti (9,920 records), and Raiatea-Tahaa (4,213 records; Fig. 4). Considering the other four archipelagos, the islands that exhibited the highest number of records were Rapa (5,797 records) in the Austral islands (11 islands), Fakarava (6,585 records) in the Tuamotu (69 islands), Nuku Hiva (3,356 records) in the Marquesas (17 islands), and Mangareva (2,504 records) in the Gambier (11 islands; Fig. 3). Gambier was the least sampled archipelago, accounting for 2.9% of all marine records, and for only 13.3% of all marine species identified.

Considering the terrestrial habitat, our dataset identified 68 islands with at least one species record, and 52 islands with no records. As for the marine habitat, the number of terrestrial species records per island was heterogeneous (Fig. 3), ranging from 1 to 4,705, with a mean of 234 records per island (median = 8.5). The number of species identified per island ranged from 1 to 384, with a mean of 34 species per island (median = 4.5). The Society Archipelago held 74.4% of all terrestrial species records, 85.2% of which were registered in the trio Moorea (4,314 records, 384 species), Tahiti (4,705 records, 301 species), and Raiatea-Tahaa (1,076 records, 201 species; Fig. 4). Considering the other four archipelagos, the islands showing the highest number of records were Anaa (683 records) in the Tuamotu, Rurutu (441 records) in the Austral, Nuku Hiva (548 records) in the Marquesas, and Mangareva (123 records) in the Gambier (Fig. 4). As for the marine database, the Gambier archipelago had the lowest number of terrestrial records, representing only 4.7% of all terrestrial species identified.

Inventory completeness

Considering the archipelago scale, the SAC analysis showed that the number of species recorded increased with sampling effort. Although the curves for both marine and terrestrial datasets exhibited a plateau, they did not reach a clear saturation point (Fig. S3). Our calculations suggest that marine inventory completeness was comparable among archipelagos, with 76.6%, 74.9%, 75.6%, 79.9% and 76.6% for the Austral, Gambier, Marquesas, Society and Tuamotu Archipelagos, respectively, indicating that at least 70% of the species were detected overall. According to the asymptote values based on the Michaelis-Menten model (Sest), marine species richness was lowest at the Gambier (Sest = 1,055, Chao1 = 1,221, Jackknife 2 = 1,317 species) and highest at the Society (Sest = 5,230, Chao1 = 6,380, Jackknife 2 = 6,969 species) archipelagos. The Austral, Marquesas and Tuamotu Archipelagos showed similar asymptote values of 2,538 (Chao1 = 3,067, Jackknife 2 = 3,291), 2,648 (Chao1 = 3,497, Jackknife 2 = 2,120), and 2,120 (Chao1 = 2,639, Jackknife 2 = 2,807) expected species, respectively (Table 1).

Table 1 Archipelago-scale Michaelis-Menten model output parameters based on a records approach (N records as sampling units).

Inventory completeness for terrestrial species was highly heterogeneous across archipelagos, ranging from 43.0% for the Marquesas, the northernmost and most remote archipelago, to 82.1% in the Society. Inventory completeness for terrestrial species was higher than for marine species in the Society (C = 82.1% versus 79.9%) and Tuamotu Archipelagos (C = 81.5% versus 76.6%), but lower in the Gambier (C = 64.1% versus 74.9%) and Marquesas (C = 43.0% versus 75.6%). For terrestrial species, the asymptote values based on the Michaelis-Menten model (Sest) ranged from 76 (Gambier, Chao1 = 73, Jackknife 2 = 84) to 640 species (Marquesas, Chao1 = 639, Jackknife 2 = 609), with 484 (Chao1 = 595, Jackknife = 578), 606 (Chao1 = 894, Jackknife = 981), and 100 (Chao1 = 152, Jackknife = 165) species estimated for the Austral islands, Society and Tuamotu, respectively.

At the island scale and for the marine dataset, we fitted SACs for 73 out of 119 islands having at least 10 records (Table 2). Inventory completeness was highly heterogeneous, ranging from 1.9% (Takaroa, Tuamotus, 12 records) to 82.2% (Moorea, Society, 2,826 records), with an average (± SD) of 39.2% (± 20.0%). Assuming a threshold of C ≥ 80% and at least 100 records, only two islands were classified as well-sampled: Moorea (2,826 records, C = 82.2%) and Fakarava (520 records, C = 82.0%). Among the islands with the highest number of records, we identified low to moderate inventory completeness for Tahiti (1,226 records, C = 65.9%), Bora-Bora (521 records, C = 60.8%) and Raiatea-Tahaa (263 records, C = 28.5%) in the Society, Rapa (385 records, C = 75.6%) and Raivavae (352 records, C = 69.1%) in the Austral Islands, Nuku Hiva (376 records, C = 65.8%) and Hiva Oa (234 records, C = 68.6%) in the Marquesas, and Tikehau (360 records, C = 62.6%) and Rangiroa (343 records, C = 52.0%) in the Tuamotu. The correlation between inventory completeness and the number of records per island was moderate (R² = 0.41; P-value < 0.001).

Table 2 Island-scale Michaelis-Menten model output parameters based on a records approach (N records as sampling units) for the marine ecosystem.


For the terrestrial dataset, 27 islands had sufficient records (> 10 records) to fit SACs (Table 3). Inventory completeness ranged from 27.3% for Fakarava Atoll (12 records, Tuamotu) to 98.4% for Anaa Atoll (67 records, Tuamotu). Other well-sampled islands (C ≥ 80% and > 10 records) included Ua Huka (39 records, C = 87.5%), Hatutaa (23 records, C = 88.7%) and Tahuata (25 records, C = 88.4%) in the Marquesas, and Tenararo (26 records, C = 97.5%) in the Tuamotu. The islands with the highest number of records, Moorea (851 records, C = 76.8%) and Tahiti (1,039 records, C = 68.9%), approached but did not reach the C ≥ 80% threshold. Terrestrial species inventory completeness and sampling effort were not significantly correlated across these islands (R² = 0.20; P-value > 0.05).

Table 3 Island-scale Michaelis-Menten model output parameters based on a records approach (N records as sampling units) for the terrestrial ecosystem.


Some islands exhibited contrasting patterns between terrestrial and marine inventories. For instance, Anaa was well sampled for terrestrial species (67 records, C = 98.4%) but poorly sampled for marine species (31 records, C = 17.6%). Fakarava showed the opposite trend, with an almost complete marine inventory (520 records, C = 82.0%), while its terrestrial inventory was sparse (12 records, C = 27.3%). Overall, our results indicate that the observed species richness exceeds the Michaelis-Menten estimate by a factor of one to three, suggesting that even our current estimates likely underestimate the true species diversity. This reinforces the notion that species inventories in this region remain incomplete.

Islands for which we were unable to fit SACs were classified as either “neglected islands” (i.e., with no data at all) or “poorly-documented islands” (i.e., with not enough data). For the marine and terrestrial data, we identified three and 52 neglected islands, respectively. The problem of missing data was prevalent across archipelagos, but less important in the Society and Austral archipelagos (Fig. 5). We found 34 and 36 poorly-documented islands for marine and terrestrial ecosystems, respectively. The data scarcity was particularly pronounced in the largest archipelago, the Tuamotu, as well as in the southernmost archipelago, the Gambier (Fig. S4).
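The island categories used in this section can be summarised as a simple decision rule. The Python sketch below is an illustration only: the function name and the label "incompletely sampled" are hypothetical, and the default thresholds (at least 10 records to fit a SAC, C ≥ 80% and at least 100 records to be well-sampled) are those quoted for the marine dataset, exposed as parameters because the terrestrial analysis used a lower record threshold.

```python
def classify_island(n_records, completeness=None,
                    min_records=10, c_threshold=80.0, min_records_well=100):
    """Return a sampling category for one island and one realm.

    n_records:    number of occurrence records for the island
    completeness: inventory completeness C in percent, or None if no SAC was fitted
    """
    if n_records == 0:
        return "neglected"              # no data at all
    if n_records < min_records or completeness is None:
        return "poorly documented"      # too few records to fit a SAC
    if completeness >= c_threshold and n_records >= min_records_well:
        return "well sampled"
    return "incompletely sampled"       # hypothetical label for fitted islands below threshold


# Marine values quoted in the text: (records, C in %).
marine_examples = {
    "Moorea": (2826, 82.2),
    "Tahiti": (1226, 65.9),
    "Takaroa": (12, 1.9),
}
for island, (n, c) in marine_examples.items():
    print(island, classify_island(n, c))
```

Keeping the thresholds as arguments makes it straightforward to test how sensitive the number of well-sampled islands is to the chosen cut-offs.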

Sampling bias due to human accessibility

Sampling effort for marine species was primarily influenced by proximity to roads (w = 0.063), indicating a strong spatial bias towards areas with developed infrastructure. Airports had a moderate effect on sampling distribution (w = 0.008). In contrast, proximity to cities (w = 0.001) and waterbodies (w = 0.0005) had negligible impacts on sampling intensity (Fig. 6; Table S1).

Similar results were found for the terrestrial data, where the presence of roads contributed the most to the accessibility bias (w = 0.060). The effect of airports and ports was moderate (w = 0.031) while the influence of cities and water bodies was negligible (cities’ w = 0.004, waterbodies’ w = 0.001) (Fig. 6). The model also revealed a low number of marine and terrestrial records (Table S1), even after correcting for accessibility biases, in the Tuamotu and Gambier Archipelagos, except for Mangareva, Hao, and Arutua Islands. In contrast, most islands in the Society Archipelago were oversampled relative to the overall sampling effort across French Polynesia.
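One way to read the weights reported above is as distance-decay rates. The sketch below assumes, without confirmation from this section, that the expected sampling rate declines as exp(−w × d) with distance d from a given feature, which is the form used in some accessibility-bias models. The distance unit is not given here, so the values are expressed in arbitrary distance units, and the function names are hypothetical.

```python
import math


def relative_sampling_rate(weight, distance):
    """Sampling rate at a given distance, relative to the rate at distance 0.

    Assumption: exponential decay exp(-weight * distance).
    """
    return math.exp(-weight * distance)


def half_distance(weight):
    """Distance at which the assumed sampling rate falls to half its maximum."""
    return math.log(2) / weight


# Marine weights quoted in the text.
marine_weights = {
    "roads": 0.063,
    "airports": 0.008,
    "cities": 0.001,
    "waterbodies": 0.0005,
}

for feature, w in marine_weights.items():
    rate_at_10 = relative_sampling_rate(w, 10.0)
    print(feature, round(rate_at_10, 3), round(half_distance(w), 1))
```

Under this assumption, a larger weight means that sampling effort drops off more quickly away from the feature, which is why roads stand out relative to cities and waterbodies.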

Discussion

Our study compiles the most comprehensive open-source database on animal biodiversity in French Polynesia, illuminating regional- and island-scale biodiversity patterns of marine and terrestrial fauna across this vast and fragmented territory. While our results highlight significant disparities in sampling effort across islands, this work offers valuable quantitative insights into completeness of taxonomic and spatial data throughout French Polynesia. This work also highlights understudied areas and taxonomic groups, providing a practical tool for conservation planners to guide future sampling strategies and enhance biodiversity representativeness. We argue that this integrative approach is essential for explicitly addressing the inherent biases often present in large-scale biodiversity studies²³.

Building an accurate open-source biodiversity dataset

While open-source biodiversity datasets offer unique opportunities for studying macroecological processes, global repositories face criticism due to significant variation in data quality and quantity, depending on geographic, temporal, and taxonomic factors²². Ignoring these caveats can lead to erroneous conclusions. However, when carefully considered, they can enhance the utility of open-source data by highlighting critical biodiversity knowledge gaps (e.g.,^80,81,82). Addressing uncertainties in the data first requires acknowledging that open-source biogeographic datasets are likely to be incomplete²⁵, especially in vast and fragmented regions and for specific groups of organisms. Secondly, standardised taxonomic repositories (e.g., WoRMS) offer workflows for cleaning data retrieved from open-source platforms while adhering to FAIR data-sharing principles. Here, by applying previously validated filtering protocols^63, we enhanced the geographic and taxonomic accuracy of GBIF records for French Polynesia, closely matching recent expert taxonomic assessments.

Our database contains a total of 7,188 species, including 1,893 vertebrates and 5,295 invertebrates. Regarding vertebrates, we found that every known marine mammal (26 out of 26 species) and a large number of birds (126 out of 175 species) previously documented in the region are represented⁸³. Our database includes 2,552 marine molluscs out of 3,022 referenced in a recently published checklist and identification guide⁴⁶, and the Teleostei class included 1,547 species, which is more than the 1,310 reported in the most complete identification guides for the region^84,85. While the taxonomic coverage is reassuring for marine species, it remains relatively limited for terrestrial species. For example, our records include only 757 out of 2,497 insect species (Insecta) and 63 out of 365 spider species (Arachnida) described in the region⁴⁹. Data scarcity for insects is a global issue, and in some regions, it is partly driven by species extinction rates that outpace discovery rates^86,87. Islands, which harbour approximately 20% of the world’s terrestrial biodiversity, are critical reservoirs of fragile and threatened biodiversity⁵⁶. This highlights the urgent need to document the exceptional biodiversity of insular countries like French Polynesia, where some taxonomic groups, such as ground beetles, contribute significantly to global biodiversity^56,88. Our study provides an efficient framework for identifying poorly sampled species, which can be extended to other taxonomic groups in French Polynesia (e.g., plants or algae) and applied more broadly to other regions.

Linnean shortfall

The Linnean shortfall (i.e., only a fraction of the planet’s species has been described) is a major gap in our understanding of biodiversity¹⁸, limiting our ability to effectively address the ongoing extinction crisis². The Linnean shortfall is partly driven by taxonomic sampling biases, where societal preferences influence which groups are more frequently recorded²¹. This explains why patterns of sampling efforts are often represented by homogeneously-sampled taxonomic groups such as marine mammals⁸², fishes⁸⁹ or insects⁹⁰. Notably, our taxonomic bias analysis revealed a significant under-representation of non-charismatic invertebrate classes such as Gastropoda, Malacostraca, Anthozoa, Bivalvia and Polychaeta in the marine environment, as well as Insecta, Gastropoda, Arachnida and Malacostraca in terrestrial ecosystems. This finding aligns with Troudet et al. (2017)²¹, who also identified biases against these classes at the global scale. Conversely, vertebrates were well-represented, with the humpback whale (Megaptera novaeangliae) being one of the most frequently recorded species. This discrepancy often stems from the aesthetic appeal of certain species, which influences both public interest and scientific focus^91,92,93. Furthermore, studies have shown that visual appeal shapes the perception and prioritisation of species in research and conservation⁹³. To address these biases and enhance biodiversity inventories in French Polynesia, our dataset can help guide future research priorities, focusing on the underrepresented invertebrates and terrestrial species identified. By addressing these gaps, we can move towards a more comprehensive and balanced understanding of biodiversity, which is crucial for developing effective conservation strategies.

Wallacean shortfall

Another significant gap in our understanding of biodiversity is the incomplete knowledge of species’ geographic distribution, also known as the Wallacean shortfall^25,94. Despite extensive efforts, biodiversity sampling remains a resource-intensive, time-consuming and costly process, often resulting in substantial gaps in the spatial coverage of species records. Short-term projects frequently fail to capture the full spectrum of species within an assemblage because many species can be cryptic, rare or elusive, ultimately leading to incomplete assessments of global biodiversity patterns. However, these data gaps and uncertainties can be gauged and possibly mitigated through robust modelling approaches²³. In our study, marine inventory completeness was consistently moderate across French Polynesia’s archipelagos, reaching up to 74% of known species at the regional scale. Furthermore, none of the species accumulation curves for the archipelagos reached saturation, indicating that species richness predictions require more sampling to improve accuracy. Statistical methods to correct these biases (e.g.,⁶⁶) could be used for comparing community assemblages among archipelagos, as has been recently done with woody plants⁹⁵. Another strategy is to focus on well-documented groups with complete inventories, enabling the description of their spatial distribution patterns⁹⁶.

For terrestrial species, we found that inventory completeness was more variable than that of marine species. The Marquesas Archipelago was especially under-surveyed, as only half of the total estimated animal species have been documented. Owing to their geographical isolation and intricate topography, the Marquesas Islands harbour a high level of floral and faunal endemism, with many native and endemic arthropod species probably yet to be discovered⁵⁵. Indeed, many studies have highlighted the uniqueness of this archipelago in terms of species assemblages^43,97 and genetic diversity⁹⁸. This biological distinctiveness, combined with the underrepresentation of terrestrial studies compared to marine ones, likely accounts for the discrepancy with other archipelagos, despite the strong interest that scientists have expressed for this biodiversity hotspot⁹⁹. Prioritising terrestrial biodiversity research in the Marquesas is crucial for establishing reliable comparisons across the land-to-sea continuum in this archipelago. Similarly, a more sustained sampling effort is much needed in the Gambier and Tuamotu Archipelagos, where a significant number of islands remain insufficiently inventoried. This is an urgent call because, while scientific expeditions could potentially discover new species (e.g.,¹⁰⁰), other species could become extinct before being documented (e.g.,^101,102).

Sampling effort biases can obscure the true spatial distribution of biodiversity, complicating the identification of biodiversity hotspots and the quantification of biodiversity loss¹⁰³. Decision-makers rely on data to inform and justify their political choices. However, gaps in biodiversity inventories can hinder conservation efforts and limit our ability to assess their effectiveness. For instance, if the distribution of an endangered species is poorly documented, it becomes difficult to identify and prioritise areas for protection. In French Polynesia, a marine mammal observation network has been established, along with sanctuaries on three islands (Rurutu, Tahiti, and Moorea), contributing to the protection of these threatened species. Nevertheless, other areas also merit consideration for protection due to their high marine mammal diversity. Notably, Raiatea-Tahaa stands out, having recorded the highest number of species (23 species) and a significant number of sightings (104 records). Furthermore, Raiatea and Tahaa, which together form the largest lagoon in the Society Archipelago, may host a particularly high level of biodiversity not fully reflected in current GBIF data. This hypothesis is supported by research showing that 26 of the 32 marine sponges recorded across French Polynesia were found in Raiatea-Tahaa¹⁰⁴. Similarly, our findings confirmed that the island of Rapa harbours remarkable marine diversity, as evidenced by studies on coral-reef and terrestrial communities, including taxa unique to this island^105,106. However, despite being one of the best documented islands in the archipelago (C = 75.6%), Rapa’s inventory completeness remains below the 80% threshold, suggesting that further sampling efforts are necessary to fully capture this island’s biodiversity. Overall, our study contributes to addressing this gap by pinpointing overlooked locations of the Polynesia-Micronesia biodiversity hotspot.

Conservation science is often compelled to assist in decision-making based on limited and incomplete data¹⁰⁷. The spatial heterogeneity in sampling effort that we identified for both marine and terrestrial fauna in French Polynesia is considerable, with up to 70% of islands lacking data on their terrestrial environments. This striking data deficiency was also evidenced by another study using GBIF data to analyse species diversity in a remote region⁶³. An additional challenge, particularly for vast and fragmented territories such as French Polynesia, is the need for data at a sufficiently high spatial resolution to capture island-wide variation. We identified 52 islands that either lacked digital data entirely or were poorly documented, likely due to their remoteness. To fill the spatial gaps in biodiversity data for French Polynesia, we recommend that future sampling efforts prioritise these islands, while also considering the disparity in data coverage between marine and terrestrial ecosystems.

The marine-terrestrial sampling bias

Marine and terrestrial ecosystems are often studied separately, partly due to historical, cultural, or practical reasons^108,109. However, because the land-sea continuum operates as an integrated meta-ecosystem, this research divide hampers our ability to fully understand and effectively protect interconnected ecosystems^103,110. Maintaining a healthy land-sea ecosystem is particularly crucial in small-island territories, where biodiversity is vulnerable to human activities^35,56 and where the wellbeing of local populations heavily depends on local natural resources, especially through fishing and tourism. French Polynesia is no exception, with tourism as its primary economic activity and fish and invertebrates as staples in the local diet¹¹¹. Unlike the global trend¹⁰³, our data show that French Polynesian biodiversity is better documented in marine ecosystems than in terrestrial ones. This discrepancy is partly due to the focus of scientific research and exploration on marine environments (e.g., the oldest of the two major ecology research units in French Polynesia, the CRIOBE, is entirely focused on marine environments) and to the inaccessibility of the mountainous regions⁵⁴ and seamounts¹¹². The gap is also likely influenced by the huge difference in surface area between land (4,167 km²) and sea (2.5 × 10⁶ km²), which may also explain why the marine habitats host 20 times more species than terrestrial ones. While surface-area differences are a factor to consider, our records indicate that the disparity is also driven by a lack of terrestrial data for over 52 islands, compared to just two islands with missing marine data. The observed imbalance in marine versus terrestrial data coverage is not only due to the inherent differences between these ecosystems but also reflects underlying biases in sampling practices, exacerbated by accessibility factors.

Sampling bias is partly influenced by accessibility factors

The accessibility bias hypothesis posits that more accessible areas tend to be surveyed more frequently than less accessible zones⁷⁹. This can significantly impact the global understanding of natural communities^103,113. Our database revealed a pronounced geographic bias in species records, with the most accessible islands (i.e., Tahiti and Moorea in the Society Archipelago, Fakarava in the Tuamotu) being heavily sampled. In contrast, less accessible islands (e.g., Tureia, Napuka and Tenarunga in the Tuamotu, Motu One and Motu Nao in the Marquesas) are poorly documented. However, Rapa Island stands out as an exception, having attracted significant attention from the scientific community due to its hosting of several threatened endemic plant and animal species^54,105,106,114. The sampling bias in Tahiti and Moorea is also likely related to the presence of local research institutions (e.g., CNRS-EPHE-Université de Perpignan CRIOBE station, Ifremer, IRD, UC-Berkeley Gump station, University of French Polynesia) there. While Tahiti’s international airport contributes to the sampling bias observed in the Society Islands, our accessibility bias analysis indicated that the distance from ‘airports and ports’ was not the main anthropogenic factor explaining the variance in sampling effort at the scale of French Polynesia. Overall, our accessibility bias analysis showed that sampling efforts in both marine and terrestrial datasets are predominantly skewed towards areas near roads and, to a lesser extent, airports/ports. This aggregation pattern around roads is well-documented in the literature for both terrestrial and marine species^103,115, particularly in studies based on citizen-science data¹¹⁶.

Accessibility biases can vary depending on geographic and taxonomic contexts¹¹⁶, highlighting the importance of considering situations on a case-by-case basis. For instance, Freitag et al. (1998)¹¹⁷ found that records of smaller species in African terrestrial ecosystems were minimally affected by accessibility biases, whereas larger species were disproportionately represented in protected areas. Similarly, Cardoso et al. (2024)¹¹⁸ identified various accessibility-bias factors for marine species in the western Atlantic Ocean, including proximity to the coastline, research institutions, ports, protected areas, and urban centres. Recognising and understanding the nuances underlying these various biases is crucial for enhancing the accuracy and comprehensiveness of biodiversity datasets.

Institutional bias in open-source databases

While accessibility factors provide important insights into sampling patterns, they are not the sole source of bias impacting our biodiversity records. Institutional biases, particularly those associated with open-source databases, might also play a crucial role. The unevenness in data contributions often stems from disparities in funding, data-sharing policies, and digitization efforts across different regions and institutions. The soaring popularity of GBIF data worldwide is reflected in our dataset for French Polynesia, where the number of records per year increased from 10 in 1950 to 1,866 in 2022. We anticipate that the dataset will continue to grow with the engagement of additional contributors, thereby enhancing its reliability¹¹⁹ if institutions continue to adhere to standardisation protocols¹⁰. Interestingly, the surge in data during 2006, 2011, and 2012, which constitutes the bulk of the dataset, was driven by the digitization of the French Museum of Natural History dataset (managed by PatriNat) and a major field sampling campaign by Cornell University (USA). The patchiness in data contributions to global open-source databases can be attributed to differences in funding and data-sharing policies across countries, inadequate efforts in digitalising local and national databases, and the sporadic and spatially heterogeneous nature of formal research campaigns²⁶. Combining GBIF records with national databases can yield more complete inventories, as demonstrated by De Araujo et al. (2022)⁷⁵ for Amazonian epiphytes. In our study, we applied this approach by not only relying on GBIF as the primary data source but also integrating nine additional local datasets to enhance the completeness of our inventory. This selective integration of external data, including local sources, helped reduce coverage gaps while maintaining data quality, underscoring the importance of leveraging both global and local data sources to mitigate biases in biodiversity records. 
In the case of French Polynesia, engaging local research institutions, private entities and government agencies, and developing a citizen science network to compile and share existing (but often inaccessible) information, would significantly reduce biases and strengthen the database. For example, existing portals such as FauneFrance (https://www.faune-france.org/) or iNaturalist (https://www.inaturalist.org/) could be adapted to local flora and fauna to further centralise the collection and compilation of local naturalist data.

Capitalising on citizen science while reducing biases in open-source datasets

Addressing biases and shortfalls in open-source biodiversity datasets is crucial to ensure their efficiency and accuracy in describing species distribution patterns. Citizen science has been increasingly recognized as an effective method for filling gaps in biodiversity information, especially in areas where formal scientific campaigns are limited or sporadic^14,120. In our database for French Polynesia, we observed an increase in species records driven by citizen science initiatives, in agreement with the global trend⁹. Indeed, a substantial 21.7% of records originated from participatory science efforts. While citizen scientists may not always adhere to standard scientific protocols, their contributions provide valuable insights into broader trends, which can then be rigorously analysed. To minimise taxonomic and geographic biases, the involvement of taxonomic experts remains crucial⁶².

Conclusions and perspectives

Centralising biodiversity information from museums, research institutions, and citizen scientists into big-data platforms offers a transformative opportunity for evaluating species biodiversity in understudied regions. These platforms enable comprehensive data analysis, facilitate global collaboration, engage the public in science, and ultimately contribute to more informed conservation strategies and biodiversity management. Our study provides significant insights into the biodiversity patterns of both marine and terrestrial fauna across the vast and fragmented territory of French Polynesia. We found that while marine inventory completeness is relatively high, averaging up to 76% of known species at the regional scale, terrestrial biogeography remains underexplored (average of 65%), particularly in the Marquesas and Gambier Archipelagos. The analysis indicates a notable skew in the data toward specific taxonomic groups, highlighting the urgent need for comprehensive surveys to fill these gaps. Furthermore, our findings underscore the value of citizen science initiatives, demonstrating their potential to enhance biodiversity knowledge in regions where formal scientific efforts are limited. Overall, this research not only emphasises the richness of biodiversity in French Polynesia but also calls for collaborative efforts to centralise and analyse biodiversity data. These efforts are crucial for aiding in conservation strategies and improving management of the unique ecosystems in the Indo-Pacific region, a global biodiversity hotspot that includes Micronesia, Polynesia, and Fiji¹²¹. By providing a reliable, spatially resolved biodiversity dataset, this study lays the foundations for future macroecological research in French Polynesia that will help respond to both fundamental and applied environmental questions.

Data availability

The analysis scripts are available on GitHub (https://github.com/KilianBARREIRO/biogeography_datadiv). The data are available on SEANOE (https://www.seanoe.org/data/00878/99018/).

References

IPBES. Global Assessment Report on Biodiversity and Ecosystem Services of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (Bonn, 2019).
Ceballos, G. & Ehrlich, P. R. Mutilation of the tree of life via mass extinction of animal genera. Proc. Natl. Acad. Sci. U. S. A. 120, e2306987120 (2023).
Gorman, C. E. et al. Reconciling climate action with the need for biodiversity protection, restoration and rehabilitation. Sci. Total Environ. 857, 159316 (2023).
Singh, J. S. The biodiversity crisis: A multifaceted review. Curr. Sci. 82, 638–647 (2002).
Newmark, W. D., Jenkins, C. N., Pimm, S. L., McNeally, P. B. & Halley, J. M. Targeted habitat restoration can reduce extinction rates in fragmented forests. Proc. Natl. Acad. Sci. 114, 9635–9640 (2017).
Pilowsky, J. A., Colwell, R. K., Rahbek, C. & Fordham, D. A. Process-explicit models reveal the structure and dynamics of biodiversity patterns. Sci. Adv. 8, eabj2271 (2022).
Farley, S. S., Dawson, A., Goring, S. J. & Williams, J. W. Situating ecology as a Big-Data science: current advances, challenges, and solutions. BioScience 68, 563–576 (2018).
Kays, R., McShea, W. J. & Wikelski, M. Born-digital biodiversity data: millions and billions. Divers. Distrib. 26, 644–648 (2020).
Heberling, J. M., Miller, J. T., Noesgaard, D., Weingart, S. B. & Schigel, D. Data integration enables global biodiversity synthesis. Proc. Natl. Acad. Sci. U. S. A. 118, e2018093118 (2021).
Wieczorek, J. et al. Darwin core: an evolving Community-Developed biodiversity data standard. PLOS ONE. 7, e29715 (2012).
Fegraus, E. H., Andelman, S., Jones, M. B. & Schildhauer, M. Maximizing the value of ecological data with structured metadata: an introduction to ecological metadata Language (EML) and principles for metadata creation. Bull. Ecol. Soc. Am. 86, 158–168 (2005).
Güntsch, A., Berendsohn, W. G. & Mergen, P. The BioCASE Project - a Biological Collections Access Service for Europe. (2007).
Levin, N. et al. Biodiversity data requirements for systematic conservation planning in the mediterranean sea. Mar. Ecol. Prog Ser. 508, 261–281 (2014).
Amano, T., Lamming, J. D. L. & Sutherland, W. J. Spatial gaps in global biodiversity information and the role of citizen science. BioScience 66, 393–400 (2016).
Underwood, E., Taylor, K. & Tucker, G. The use of biodiversity data in Spatial planning and impact assessment in Europe. RIO 4, e28045 (2018).
Lin, H., Caley, M. J. & Sisson, S. A. Estimating global species richness using symbolic data meta-analysis. Ecography e05617 (2022).
Takashina, N. & Kusumoto, B. A perspective on biodiversity data and applications for spatio-temporally robust Spatial planning for area-based conservation. Discov Sustain. 4, 1 (2023).
Hortal, J. et al. Seven shortfalls that beset Large-Scale knowledge of biodiversity. Annu. Rev. Ecol. Evol. Syst. 46, 523–549 (2015).
Troia, M. J. & McManamay, R. A. Filling in the GAPS: evaluating completeness and coverage of open-access biodiversity databases in the united States. Ecol. Evol. 6, 4654–4669 (2016).
Zizka, A. et al. No one-size-fits-all solution to clean GBIF. PeerJ 8, e9916 (2020).
Troudet, J., Grandcolas, P., Blin, A., Vignes-Lebbe, R. & Legendre, F. Taxonomic bias in biodiversity data and societal preferences. Sci. Rep. 7, 9132 (2017).
García-Roselló, E., González-Dacosta, J. & Lobo, J. M. The biased distribution of existing information on biodiversity hinders its use in conservation, and we need an integrative approach to act urgently. Biol. Conserv. 283, 110118 (2023).
Rocchini, D. et al. A quixotic view of Spatial bias in modelling the distribution of species and their diversity. Npj Biodivers. 2, 10 (2023).
Schiesari, L., Grillitsch, B. & Grillitsch, H. Biogeographic biases in research and their consequences for linking amphibian declines to pollution. Conserv. Biol. 21, 465–471 (2007).
Wüest, R. O. et al. Macroecology in the age of big Data – Where to go from here? J. Biogeogr. 47, 1–12 (2020).
Beck, J., Böller, M., Erhardt, A. & Schwanghart, W. Spatial bias in the GBIF database and its effect on modeling species’ geographic distributions. Ecol. Inf. 19, 10–15 (2014).
König, C. et al. Biodiversity data integration—the significance of data resolution and domain. PLoS Biol. 17, e3000183 (2019).
Kadmon, R., Farber, O. & Danin, A. Effect of roadside bias on the accuracy of predictive maps produced by bioclimatic models. Ecol. Appl. 14, 401–413 (2004).
Engemann, K. et al. Limited sampling hampers big data Estimation of species richness in a tropical biodiversity hotspot. Ecol. Evol. 5, 807–820 (2015).
Borges, P. A. V. et al. Global Island monitoring scheme (GIMS): a proposal for the long-term coordinated survey and monitoring of native Island forest biota. Biodivers. Conserv. 27, 2567–2586 (2018).
Alves, C., João Aguuiar, C., Cristina, R., João Pradinho, H. & Ângela, L. Research data management in the field of ecology: an overview. Int. Conf. Dublin Core Metadata Appl. https://doi.org/10.23106/dcmi.952138986 (2018).
Stephenson, P. et al. Priorities for big biodiversity data. Front. Ecol. Environ. 15, 124–125 (2017).
Hachich, N. F. et al. Island biogeography patterns of marine shallow-water organisms in the Atlantic. J. Biogeogr. 42, 1871–1882 (2015).
Simberloff, D. Extinction-proneness of Island species-causes and management implications. Raffles Bull. Zool. 48, 1–9 (2000).
Russell, J. C. & Kueffer, C. Island biodiversity in the anthropocene. Annu. Rev. Environ. Resour. 44, 31–60 (2019).
Warren, B. H. et al. Islands as model systems in ecology and evolution: prospects Fifty years after MacArthur-Wilson. Ecol. Lett. 18, 200–217 (2015).
Whittaker, R. J., Fernández-Palacios, J. M., Matthews, T. J., Borregaard, M. K. & Triantis, K. A. Island biogeography: taking the long view of nature’s laboratories. Science 357, eaam8326 (2017).
Andréfouët, S. & Adjeroud, M. Chapter 38 - French polynesia. In world seas: an environmental evaluation volume II: Indian Ocean to the Pacific. 827–854 (2019).
Galzin, R. & Meyer, J. Y. H. Les 124 Îles de La polynésie française: types, superficies, noms et occupation humaine. Bull. De La. Société Des. Études Océaniennes 123–136 (2024).
Kulbicki, M. Biogeography of reef fishes of the French territories in the South Pacific. Cybium 31, 275–288 (2007).
Salvat, B. Dominant benthic mollusks in closed atolls, French Polynesia. Galaxea J. Coral Reef. Stud. 11, 197–206 (2009).
Tröndlé, J. & Boutet, M. Inventory of marine molluscs of French Polynesia. Atoll Res. Bull. 1–87. https://doi.org/10.5479/si.00775630.570.1 (2009).
Delrieu-Trottin, E. et al. Shore fishes of the Marquesas islands, an updated checklist with new records and new percentage of endemic species. Cl 11, 1758 (2015).
Article Google Scholar
Delrieu-Trottin, E. et al. A DNA barcode reference library of French Polynesian shore fishes. Sci. Data. 6, 114 (2019).
Article PubMed PubMed Central Google Scholar
Salvat, B. & Tröndlé, J. Biogéographie des mollusques marins de polynésie Française. Revec 72, 215–257 (2017).
Article Google Scholar
Boutet, M., Gourguet, R. & Letourneux, J. Marine Molluscs of French Polynesia / Mollusques Marins De Polynésie Française (Au Vent Des Iles, 2020).
Vieira, C. et al. Global biogeography and diversification of a group of brown seaweeds (Phaeophyceae) driven by clade-specific evolutionary processes. J. Biogeogr. 48, 703–715 (2021).
Article Google Scholar
Vieira, C. et al. Diversity, systematics and biogeography of French Polynesian Lobophora (Dictyotales, Phaeophyceae). Eur. J. Phycol. 58, 226–253 (2023).
Article CAS Google Scholar
Ramage, T. Checklist of the terrestrial and freshwater arthropods of French Polynesia (Chelicerata; myriapoda; crustacea; Hexapoda). Zoosystema 39, 213 (2017).
Article Google Scholar
Thibault, J. C., Cibois, A. & Lynx birds of Eastern Polynesia: A biogeographic atlas. (Barcelona, 2017).
Florence, J. Flore De La Polynésie Française 1 (IRD édition/MNHN, 1997).
Florence, J. Flore De La Polynésie Française 2 (IRD Éditions/MNHN, 2004).
Chevillotte, H., Ollier, C. & Meyer, J. Y. Base De Données Botaniques Nadeaud De l’Herbier De La Polynésie Française (PAP). Institut Louis Malardé, Délégation À La Recherche, Papeete, Tahiti http://nadeaud.ilm.pf (Institut Louis Malardé, 2019).
Gillespie, R. G., Claridge, E. M. & Goodacre, S. L. Biogeography of the fauna of French polynesia: diversification within and between a series of hot spot archipelagos. Phil Trans. R Soc. B. 363, 3335–3346 (2008).
Article PubMed PubMed Central Google Scholar
Hembry, D. H. Evolutionary biogeography of the terrestrial biota of the Marquesas islands, one of the world’s remotest archipelagos. J. Biogeogr. 45, 1713–1726 (2018).
Article Google Scholar
Fernandez-Palacios, J. M. et al. Scientists’ warning – The outstanding biodiversity of Islands is in peril. Global Ecol. Conserv. 31, e01847 (2021).
Article Google Scholar
Pebesma, E. & Bivand, R. Spatial Data Science: with Applications in R (Chapman and Hall/CRC, 2023).
R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2024).
Chamberlain, S., Vanhoorne, B. & worrms World Register of Marine Species (WoRMS) Client. R package. (2023).
Chamberlain, S. et al. taxize: Taxonomic information from around the web. R package. (2020).
Grenié, M. & Gruson, H. rtaxref: An R Client for TAXREF the French taxonomical reference API. R package. (2022).
Maldonado, C. et al. Estimating species diversity and distribution in the era of B Ig D ata: to what extent can we trust public databases? Glob. Ecol. Biogeogr. 24, 973–984 (2015).
Article PubMed PubMed Central Google Scholar
Bonnet-Lebrun, A. S. et al. Opportunities and limitations of large open biodiversity occurrence databases in the context of a marine ecosystem assessment of the Southern ocean. Front. Mar. Sci. 10, 1150603 (2023).
Article Google Scholar
Smith, J. R. et al. A global test of ecoregions. Nat. Ecol. Evol. 2, 1889–1896 (2018).
Article PubMed Google Scholar
Lim, G. S., Balke, M. & Meier, R. Determining species boundaries in a world full of rarity: singletons, species delimitation methods. Syst. Biol. 61, 165–169 (2012).
Article PubMed Google Scholar
Chao, A. et al. Quantifying sample completeness and comparing diversities among assemblages. Ecol. Res. 35, 292–314 (2020).
Article Google Scholar
Montes, E. et al. Optimizing Large-Scale biodiversity sampling effort: toward an unbalanced survey design. Oceanog 34, 80–91 (2021).
Article Google Scholar
Soberón, J., Jiménez, R., Golubov, J. & Koleff, P. Assessing completeness of biodiversity databases at different Spatial scales. Ecography 30, 152–160 (2007).
Article Google Scholar
Deng, C., Daley, T. & Smith, A. Applications of species accumulation curves in large-scale biological data analysis. Quant. Biol. 3, 135–144 (2015).
Article CAS PubMed PubMed Central Google Scholar
Oksanen, J. et al. Vegan: Community ecology package. (2024).
Chao, A. Nonparametric Estimation of the number of classes in a population. Scand. J. Stat. 11, 265–270 (1984).
MathSciNet Google Scholar
Chao, A. Estimating the population size for Capture-Recapture data with unequal catchability. Biometrics 43, 783 (1987).
Article MathSciNet CAS PubMed Google Scholar
Colwell, R. K. & Coddington, J. A. Estimating terrestrial biodiversity through extrapolation. Phil Trans. R Soc. Lond. B. 345, 101–118 (1994).
Article CAS Google Scholar
Chao, A. & Chun-Huo, C. Species richness: Estimation and Compariso. Wiley StatsRef: Stat. Ref. Online. 1, 26 (2016).
Google Scholar
De Araujo, M. L., Quaresma, A. C. & Ramos, F. N. GBIF information is not enough: National database improves the inventory completeness of Amazonian epiphytes. Biodivers. Conserv. 31, 2797–2815 (2022).
Article Google Scholar
Ramírez, F., Sbragaglia, V., Soacha, K., Coll, M. & Piera, J. Challenges for marine ecological assessments: completeness of findable, accessible, interoperable, and reusable biodiversity data in European seas. Front. Mar. Sci. 8, 802235 (2022).
Article Google Scholar
Chanachai, J. et al. What remains to be discovered: A global assessment of tree species inventory completeness. Divers. Distrib. e13862 https://doi.org/10.1111/ddi.13862 (2024).
Soberón, J. & Peterson, T. Biodiversity informatics: managing and applying primary biodiversity data. Phil Trans. R Soc. Lond. B. 359, 689–698 (2004).
Article Google Scholar
Zizka, A., Antonelli, A. & Silvestro, D. sampbias, a method for quantifying geographic sampling biases in species distribution data. Ecography 44, 25–32 (2021).
Article Google Scholar
Meyer, C., Weigelt, P. & Kreft, H. Multidimensional biases, gaps and uncertainties in global plant occurrence information. Ecol. Lett. 19, 992–1006 (2016).
Article PubMed Google Scholar
Cornwell, W. K., Pearse, W. D., Dalrymple, R. L. & Zanne, A. E. What we (don’t) know about global plant diversity. Ecography 42, 1819–1831 (2019).
Article Google Scholar
Moudrý, V. & Devillers, R. Quality and usability challenges of global marine biodiversity databases: an example for marine mammal data. Ecol. Inf. 56, 101051 (2020).
Article Google Scholar
Clements, J. F. et al. The eBird/Clements checklist of birds of the world. (2024).
Bacchet, P., Zysman, T. & Lefevre, Y. Guide Des poissons de Tahiti et Ses Îles. (Éditions Au Vent des Îles, Tahiti (Polynésie Francaise), 2017).
Siu, G. et al. Shore fishes of French Polynesia. Cybium 41, 245–278 (2017).
Google Scholar
Porch, N., Smith, T. R. & Greig, K. Five new Pycnomerus Erichson (Coleoptera: zopheridae: Pycnomerini) from Raivavae. Fr. Polynesia Zootaxa. 4718, 239–250 (2020).
Google Scholar
Rocha-Ortega, M., Rodriguez, P. & Córdoba-Aguilar, A. Geographical, Temporal and taxonomic biases in insect GBIF data on biodiversity and extinction. Ecol. Entomol. 46, 718–728 (2021).
Article Google Scholar
Liebherr, J. The first precinctive Carabidae from moorea, society islands: new Mecyclothorax spp. (Coleoptera) from the summit of Mont Tohiea. ZK 224, 37–80 (2012).
Article Google Scholar
Mora, C., Tittensor, D. P. & Myers, R. A. The completeness of taxonomic inventories for describing the global diversity and distribution of marine fishes. Proc. R Soc. B. 275, 149–155 (2008).
Article PubMed Google Scholar
Sánchez-Fernández, D., Fox, R., Dennis, R. L. H. & Lobo, J. M. How complete are insect inventories? An assessment of the British butterfly database highlighting the influence of dynamic distribution shifts on sampling completeness. Biodivers. Conserv. 30, 889–902 (2021).
Article Google Scholar
Stokes, D. L. Things we like: human preferences among similar organisms and implications for conservation. Hum. Ecol. 35, 361–369 (2007).
Article Google Scholar
Ducarme, F., Luque, G. M. & Courchamp, F. What are charismatic species for conservation biologists?. BioSci. Master Rev. (2013).
De Pinho, J. R., Grilo, C., Boone, R. B., Galvin, K. A. & Snodgrass, J. G. Influence of aesthetic appreciation of wildlife species on attitudes towards their conservation in Kenyan agropastoralist communities. PLoS ONE. 9, e88842 (2014).
Article PubMed PubMed Central Google Scholar
Lomolino, M. V. Conservation biogeography. in Frontiers of Biogeography: New Directions in the Geography of Nature (eds Lomolino, M. V. & Heaney, L. R.) 293–296 (Sinauer Associates, Sunderland, MA, (2004).
Google Scholar
Kusumoto, B. et al. Occurrence-based diversity estimation reveals macroecological and conservation knowledge gaps for global woody plants. Sci. Adv. 9 (2023).
Shirey, V., Belitz, M. W., Barve, V. & Guralnick, R. A complete inventory of North American butterfly occurrence data: narrowing data gaps, but increasing bias. Ecography 44, 537–547 (2021).
Article Google Scholar
Biodiversité, T. et Marine des Îles marquises, polynésie française. (Paris, 2016).
Reisser, C. M. O. et al. Population connectivity and genetic assessment of exploited and natural populations of Pearl oysters within a French Polynesian Atoll lagoon. Genes 11, 426 (2020).
Article CAS PubMed PubMed Central Google Scholar
Mittermeier, R. A. et al. Hotspots Revisited: Earth’s Biologically Richest and Most Endangered Terrestrial Ecoregions (The University of Chicago Press, 2005).
Williams, J. T., Delrieu-Trottin, E. & Planes, S. A new species of Indo-Pacific fish, Canthigaster criobe, with comments on other Canthigaster (Tetraodontiformes: Tetraodontidae) at the Gambier Archipelago. Zootaxa 3523, (2012).
Zimmermann, G., Gargominy, O. & Fontaine, B. Quatre espèces nouvelles d’endodontidae (Mollusca, Pulmonata) Éteints de Rurutu (Îles australes, polynésie française). Zoosystema 31, 791–805 (2009).
Article Google Scholar
Richling, I. & Bouchet, P. Extinct even before scientific recognition: a remarkable radiation of helicinid snails (Helicinidae) on the gambier islands, French Polynesia. Biodivers. Conserv. 22, 2433–2468 (2013).
Article Google Scholar
Hughes, A. C. et al. Sampling biases shape our view of the natural world. Ecography 44, 1259–1269 (2021).
Article Google Scholar
Hall, K. A. et al. Affinities of sponges (Porifera) of the Marquesas and society islands, French Polynesia. Pac. Sci. 67, 493–511 (2013).
Article Google Scholar
Terrestrial biodiversity of the Austral Islands, french polynesia. (Muséum d’Histoire Naturelle, 2014).
Adjeroud, M. et al. Reefs at the edge: coral community structure around rapa, southernmost French Polynesia. Mar. Ecol. 37, 565–575 (2016).
Article Google Scholar
Soulé, M. E. What is conservation biology?? A new synthetic discipline addresses the dynamics and problems of perturbed species, communities, and ecosystems. BioScience 35, 727–734 (1985).
Google Scholar
Raffaelli, D., Solan, M. & Webb, T. J. Do marine and terrestrial ecologists do it differently? Mar. Ecol. Prog. Ser. 304, 283–289 (2005).
Google Scholar
Munguia, P. & Ojanguren, A. F. Bridging the gap in marine and terrestrial studies. Ecosphere 6, 1–4 (2015).
Article Google Scholar
Álvarez-Romero, J. G. et al. Integrated Land-Sea conservation planning: the missing links. Annu. Rev. Ecol. Evol. Syst. 42, 381–409 (2011).
Article Google Scholar
Gillett, R. & Tauati, M. I. Fisheries of the Pacific islands. Regional and National information. FAO Fisheries Aquaculture Tech. Paper. 625, 401 (2018).
Google Scholar
Hanafi-Portier, M. & Samedi, S. Les monts sous-marins de polynésie française, etat des lieux des connaissances et recommandations scientifiques. https://hal.science/hal-04713244 (2024).
Mangiacotti, M. et al. Assessing the Spatial scale effect of anthropogenic factors on species distribution. PLoS ONE. 8, e67573 (2013).
Article CAS PubMed PubMed Central Google Scholar
Barrett, R. L., Taputuarai, R., Meyer, J. Y. H., Bruhl, J. J. & Wilson, K. L. Reassessment of the taxonomic status of Cyperaceae on Rapa iti, Austral islands, French polynesia, with a new combination, Morelotia involuta. Telopea 24, 171–187 (2021).
Article Google Scholar
Reddy, S. & Dávalos, L. M. Geographical sampling bias and its implications for conservation priorities in in Africa. J. Biogeogr. 30, 1719–1727 (2003).
Article Google Scholar
Mair, L. & Ruete, A. Explaining Spatial variation in the recording effort of citizen science data across multiple taxa. PLoS ONE. 11, e0147796 (2016).
Article PubMed PubMed Central Google Scholar
Freitag, S., Hobson, C., Biggs, H. C. & Van Jaarsveld, A. Testing for potential survey bias: the effect of roads, urban areas and nature reserves on a Southern African mammal data set. Anim. Conserv. 1, 119–127 (1998).
Article Google Scholar
Cardoso, M. N. M. et al. Causes and effects of sampling bias on marine Western Atlantic biodiversity knowledge. Divers. Distrib. 30, e13839 (2024).
Article Google Scholar
Ivanova, N. V. & Shashkov, M. P. The possibilities of GBIF data use in ecological research. Russ J. Ecol. 52, 1–8 (2021).
Article Google Scholar
Isaac, N. J. B., Van Strien, A. J., August, T. A., De Zeeuw, M. P. & Roy, D. B. Statistics for citizen science: extracting signals of change from noisy ecological data. Methods Ecol. Evol. 5, 1052–1060 (2014).
Article Google Scholar
Fan, H. et al. Conservation priorities for global marine biodiversity across multiple dimensions. Natl. Sci. Rev. 10, nwac241 (2023).
Article PubMed Google Scholar

Download references

Acknowledgements

We are grateful to Serge Andréfouët and Jean-Yves Hiro Meyer for their insightful comments on the biogeography of French Polynesia and for highlighting relevant prior research in the region.

Author information

Kilian Barreiro and Laura Benestan contributed equally to this work.

Authors and Affiliations

IFREMER, IRD, Institut Louis-Malardé, Univ Polynésie française, UMR SECOPOL, Vairao, Tahiti, French Polynesia
Kilian Barreiro, Laura Benestan & Cristián J. Monaco
IFREMER, Univ Brest, CNRS, IRD, UMR-6539 LEMAR, Plouzané, Brittany, France
Laura Benestan & Jérémy Le Luyer
CMOANA Consulting, BP 71607, Taravao Tahiti, 98719, French Polynesia
Charlotte Moritz
IRD, IFREMER, Institut Louis-Malardé, Univ Polynésie française, UMR SECOPOL, Tahiti, French Polynesia
Simon Ducatez & Jean-Claude Gaertner

Contributions

All authors have read and approved the final version of the manuscript. K.B. and L.B. designed the study, curated the data, produced the original draft, and contributed to the writing. C.M. contributed to the methodological design, the original draft, and writing. S.D., J.-C.G., and J.L.L. contributed to the methodological design and writing. C.J.M. acquired funding and contributed to the methodological design, the original draft, and writing.

Corresponding author

Correspondence to Cristián J. Monaco.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Barreiro, K., Benestan, L., Moritz, C. et al. Species richness variation in marine and terrestrial fauna across widespread, fragmented territories: assessing inherent challenges of data scarcity at local and regional scales. Sci Rep 15, 21043 (2025). https://doi.org/10.1038/s41598-025-06631-4

Received: 11 February 2025
Accepted: 10 June 2025
Published: 01 July 2025
DOI: https://doi.org/10.1038/s41598-025-06631-4
