Abstract
Rapid and comprehensive data sharing is vital to the transparency and actionability of wildlife infectious disease research and surveillance. Unfortunately, most best practices for publicly sharing these data are focused on pathogen determination and genetic sequence data. Other facets of wildlife disease data – particularly negative results – are often withheld or, at best, summarized in a descriptive table with limited metadata. Here, we propose a minimum data and metadata reporting standard for wildlife disease studies. Our data standard identifies a set of 40 data fields (9 required) and 24 metadata fields (7 required) sufficient to standardize and document a dataset consisting of records disaggregated to the finest possible spatial, temporal, and taxonomic scale. We illustrate how this standard is applied to an example study, which documented a novel alphacoronavirus found in bats in Belize. Finally, we outline best practices for how data should be formatted for optimal re-use, and how researchers can navigate potential safety concerns around data sharing.
Introduction
Infectious disease is a widely studied topic in wildlife biology and ecosystem science1. Every year, countless scientific studies report new data on the prevalence of macroparasites (e.g., ticks and tapeworms) and microparasites (e.g., bacteria, viruses, and other classically defined “pathogens”), hereafter “parasites” for simplicity2, in wild animals. These datasets are incredibly valuable, and – especially in aggregate – can be used to test ecological theory3; monitor the impacts of climate change4,5, land use change6,7, and biodiversity loss8; and even track emerging threats to human and ecosystem health9,10,11.
Disease ecologists engaged in synthesis research are often faced with reconciling datasets that vary greatly in their scope and granularity. For example, many studies do not report information about sampling effort over space and time, and may not even report the location of sampling sites9,12. Similarly, researchers often collect a wealth of host-level data that might help to understand infection processes (e.g., sex, age, life stage, or body size). However, many studies only provide summary statistics for parasite prevalence across different sites, species, or time points, which cannot be disaggregated back to the host level. For example, out of 110 studies we recently reviewed9 that have tested wild bats for coronaviruses, 96 only reported data in a summarized format (see Supplemental File 4). When studies did share individual-level data, they often did so only for positive results (11 of 14 studies), making it impossible to compare prevalence across populations, years, or species.
To address these issues, wildlife disease ecology would benefit from best practices for dataset standardization and sharing, similar to those that have been developed for other types of foundational data in the biological sciences13,14,15. Data standards facilitate the sharing, (re)use, and aggregation of data by humans and machines through the use of a common structure, set of properties, and vocabulary. Here, we designed a simple and flexible minimum data standard that is intended to be accessible to a range of practitioners, while providing sufficient structure for large-scale data analysis and meeting expectations for Findable, Accessible, Interoperable, and Reusable (FAIR) research practices16. We describe the required properties and structure for wildlife disease data that conform to the standard, building on a set of similar templates for sharing datasets related to arthropod disease vectors17,18,19,20 that focus on utility and ease of use. We document the development of the data standard, show how it can be applied to a simple dataset reporting coronavirus detection in wild bats, and suggest additional best practices for data sharing.
Methods
Our goal in this project was to develop guidelines for how researchers can collect and share standardized, well-documented wildlife disease datasets, with a focus on documenting sampling methods and findings. We developed our data standard based on: (i) experience conducting and publishing wildlife disease research, and collaborating with government programs doing the same; (ii) common practices already followed by most scientists in the literature when sharing disaggregated data, including the decisions made by major data sources such as the USAID PREDICT 2 project’s data release21; (iii) best practices for sharing ecological data that minimize room for error or loss of data22,23,24,25,26,27; and (iv) interoperability with standards used by other platforms, such as the Global Biodiversity Information Facility (GBIF)27. We assumed that parasite genetic sequence data and associated types (e.g., metatranscriptomes) are already widely archived on platforms like NCBI’s GenBank and Sequence Read Archive (SRA), following a different set of best practices, and are unlikely to be stored in the same data structure as we describe here.
The guiding philosophy of the data standard is that researchers should share their raw wildlife disease data in a format that data scientists refer to as “rectangular data” or “tidy data”28, where each row corresponds to a single measurement, here meaning the outcome of a diagnostic test. Tests, samples, and individual animals can each have many-to-many relationships due to common practices such as repeated sampling of the same animal, confirmatory tests, or sequencing of samples that test positive, and pooling of samples (sometimes from multiple animals and locations) for a single test. Based on this, there are three main categories of information collected: sample data, host animal data, and the parasite data itself, including both test results and any data characterizing a parasite once it has been detected (e.g., GenBank accession). We developed the fields associated with each of these categories through an iterative process using real-world data, as part of the ongoing development of a new dedicated platform for wildlife disease data, the Pathogen Harmonized Observatory (PHAROS) database (pharos.viralemergence.org). Project-level metadata was developed using the DataCite Metadata Schema as recommended by the Generalist Repository Ecosystem Initiative29,30.
Results
When to use the data standard
Before applying this standard, we encourage researchers to verify that their dataset describes wild animal samples that were examined for parasites, accompanied by information on the diagnostic methods used and the date and location of sampling. Examples of project types that would be suitable for the data standard include, but are not limited to: the first report of a parasite in a wildlife species31; investigation of a mass wildlife mortality event32; longitudinal, multi-site sampling of multiple wildlife species for a parasite33; regular parasite screening in a single monitored wildlife population34; screening of wildlife during an investigation of a human disease outbreak35; or a passive surveillance program that tests wildlife carcasses submitted by the public36.
Some closely-related types of data are better documented using a different data standard: for example, records of free-living macroparasites (e.g., tick dragging data) can be stored in Darwin Core format like any other biodiversity dataset27,37, or can adhere to the MIReAD (Minimum Information for Reusable Arthropod Abundance Data) data standard, which was designed with disease vector surveillance in mind19. Similarly, arthropod blood meal datasets can follow another recently-published data standard18. Finally, environmental monitoring datasets (e.g., soil, water, or air microbiome metagenomics) not associated with a specific animal under direct or indirect observation should also be handled following other best practices38,39.
The data standard
Our proposed data standard includes 40 core fields (11 related to sampling, 13 related to the host organism being sampled, and 16 related to the parasite itself) and 24 fields related to project metadata. The contents of the 40 core fields and their interpretation are described in Tables 1–3 (split into three tables for the reader’s ease).
Many of the fields are open text, and this flexibility is intentional. The diversity of collection, detection, and measurement methods that researchers use is likely to be beyond the scope of a single controlled vocabulary. Restrictive values may therefore limit the adoption of the data standard by the community. To that end, we have elected to leave these fields as open text in this version of the data standard, but may restrict values as the standard matures. Nevertheless, we encourage users to take advantage of existing controlled vocabularies (see Supporting Information) when using this standard.
In Table 4, we show how a real, previously published dataset40 could be formatted using the data standard. The example dataset describes a single vampire bat (BZ19-114) tested for coronaviruses in Belize in 2019: a rectal swab tested negative, while an oral swab tested positive, leading to the identification of a novel alphacoronavirus. All mandatory and relevant fields are shown, and cells are left blank if they do not apply (e.g., parasite identity is always empty for negative test results). The data in Table 4 are only a subset of the full dataset, which is shared in full on the PHAROS platform (project: prjRPayEvMecN). While project-level metadata will likely be captured upon deposit in a scientific data repository, we include metadata for the example project in Table S4 (see Supporting Information).
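As a rough illustration of the layout described above, the two tests on bat BZ19-114 could be written as two rows of a flat file, one row per diagnostic test. The column names in this sketch are simplified stand-ins chosen for illustration, not the official field names of the standard (those are listed in Tables 1–3), and only values stated in the text are filled in.

```python
import csv
import io

# Hypothetical, simplified column names for illustration only;
# the official field names are listed in Tables 1-3.
COLUMNS = [
    "animal_id", "sample_type", "country", "collection_year",
    "test_target", "detection_outcome", "parasite_identity",
]

# One row per diagnostic test: the same animal appears twice because
# two different swabs were tested.
rows = [
    {"animal_id": "BZ19-114", "sample_type": "rectal swab",
     "country": "Belize", "collection_year": "2019",
     "test_target": "coronaviruses", "detection_outcome": "negative",
     "parasite_identity": ""},  # left blank for a negative result
    {"animal_id": "BZ19-114", "sample_type": "oral swab",
     "country": "Belize", "collection_year": "2019",
     "test_target": "coronaviruses", "detection_outcome": "positive",
     "parasite_identity": "alphacoronavirus"},
]


def to_csv(records, columns):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


table_text = to_csv(rows, COLUMNS)
print(table_text)
```

Keeping the negative rectal swab as its own row, rather than dropping it, is what allows prevalence to be computed later from the shared file.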
How to use the data standard
For researchers who want to apply the data standard to their own projects, we recommend following five basic steps:
1. Fit for purpose. Confirm that the dataset (or the data to be collected) describes wild animal samples that were examined for parasites. Each record must include the host identification, the diagnostic method used to identify parasites, the outcome of the diagnostic method, the parasite identification (where a parasite was detected), and the date and location of sampling.
2. Tailor the standard. Researchers should consult the list of fields in Tables 1–3 and identify (i) which fields beyond the required fields are applicable to their study design, (ii) which ontologies or controlled vocabularies may be appropriate for free text fields, and (iii) whether additional fields are needed.
3. Format the data. Template files in .csv and .xlsx format are available in both the supplement of this paper and from GitHub (github.com/viralemergence/wdds).
4. Validate the data. We have provided both a JSON Schema that implements the standard, and a simple R package (available from GitHub at github.com/viralemergence/wddsWizard) with convenience functions to validate data and metadata against the JSON Schema.
5. Share the data. Researchers should make their data available in a findable, open-access generalist repository (e.g., Zenodo) and/or specialist platform (e.g., the PHAROS platform).
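Before running the official validation tools, a quick completeness check can catch obvious gaps. The following is a minimal sketch in plain Python and is not a substitute for the JSON Schema or the wddsWizard package. The column names, the list of required fields, and the example values are all assumptions made for illustration. Parasite identity is left out of the required list here because it is blank for negative results.

```python
# Hypothetical column names; the authoritative list of required fields
# is given in Tables 1-3 and in the project's JSON Schema.
REQUIRED = [
    "host_identification", "detection_method", "detection_outcome",
    "collection_year", "latitude", "longitude",
]


def find_missing_required(records, required=REQUIRED):
    """Return (row_number, field) pairs where a required value is absent or blank."""
    problems = []
    for row_number, record in enumerate(records, start=1):
        for field in required:
            value = record.get(field)
            if value is None or str(value).strip() == "":
                problems.append((row_number, field))
    return problems


# Made-up records, for illustration only.
example = [
    {"host_identification": "species A", "detection_method": "PCR",
     "detection_outcome": "negative", "collection_year": "2019",
     "latitude": "17.0", "longitude": "-88.0"},
    {"host_identification": "species A", "detection_method": "PCR",
     "detection_outcome": "", "collection_year": "2019",
     "longitude": "-88.0"},  # outcome blank, latitude column missing
]

problems = find_missing_required(example)
print(problems)  # [(2, 'detection_outcome'), (2, 'latitude')]
```

Reporting row numbers alongside field names makes it easier to trace each gap back to the original spreadsheet.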
We discuss best practices for some of these steps in greater depth below.
Best practices for flexibility and extensibility
Although our data standard is intended to capture a minimal set of information, not all fields are applicable to every study design. For example, studies that use PCR as a diagnostic method have different applicable fields (“Forward primer sequence,” “Reverse primer sequence,” “Gene target,” “Primer citation”) than those using ELISA (“Probe target,” “Probe type,” “Probe citation”; see Table 3). Similarly, some studies that use a pooled testing approach may leave the “Animal ID” field blank, because animals are not individually identified by researchers (e.g., testing of mosquito pools for arboviral diseases); in other cases, a pooled test may be linked to multiple Animal ID values, and researchers can provide associated metadata on individual animals in a supplemental file (see Fig. 1).
Fig. 1. Examples of one-to-one, many-to-one, and one-to-many relationships between fields of the minimum data standard, including commonly-encountered “special cases.” In a simple study design (top row), one sample corresponds to one animal, one sampling method, one parasite test, and potentially, one parasite detection. However, in other studies, multiple samples may be collected from the same animal (e.g., blood and wing punch collected from a bat), a single sample may be tested multiple times (e.g., the blood sample is screened for both coronaviruses and paramyxoviruses), or multiple parasites may be detected in one sample (e.g., the blood sample tests positive for a coronavirus and a paramyxovirus) (second row). Nested detections (third row) can occur when a parasite associated with one animal itself harbors another parasite (e.g., a flea is sampled from a rat, and the flea also tests positive for Yersinia pestis). Researchers may also combine samples from multiple animals into a single pooled sample (bottom row). In some cases, the associated animals are “unidentified” (e.g., a pooled sample of 30 mosquitoes). However, if a researcher does have data on each animal linked to a pooled sample, they can provide it in an additional file.
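The relationships described above can be sketched with a small link table that connects samples to animals, kept separate from the test records. This is only one possible way to organize a supplemental file; the identifiers and key names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical identifiers, for illustration only.
tests = [
    {"test_id": "T1", "sample_id": "S1"},     # one sample, one test
    {"test_id": "T2", "sample_id": "S2"},     # second sample from the same animal,
    {"test_id": "T3", "sample_id": "S2"},     # tested twice
    {"test_id": "T4", "sample_id": "POOL1"},  # pooled sample
]

# Supplemental link table: one (sample, animal) pair per line.
sample_animals = [
    ("S1", "A1"),
    ("S2", "A1"),
    ("POOL1", "A2"),
    ("POOL1", "A3"),
]


def animals_per_test(test_records, links):
    """Map each test ID to the list of animals behind its sample."""
    by_sample = defaultdict(list)
    for sample_id, animal_id in links:
        by_sample[sample_id].append(animal_id)
    # Samples with no linked animals (e.g., unidentified pools) map to [].
    return {t["test_id"]: by_sample.get(t["sample_id"], []) for t in test_records}


linked = animals_per_test(tests, sample_animals)
print(linked)
```

Because the link table is separate, the main file can still hold one row per test even when a pooled test involves several animals.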
Some datasets may not be able to meet a comprehensive standard for documentation. When data are missing or fields are inapplicable, researchers should leave fields or cells blank instead of using placeholder values like “NA”41. For example, in some projects, limited funding or study protocols may prevent some captured animals from being sampled, or some samples from being tested. Researchers might therefore include records of animals or samples with no attached test data (i.e., leaving “Detection outcome” blank). Similarly, archival samples that are rescued from old projects, or older museum specimens that are sampled for parasites42, may not always have complete date information, leading to “Collection day” and “Collection month” being left blank. We encourage researchers to adapt our data standard to their specific purposes and, as appropriate, to consider sharing their data in multiple applicable formats. For instance, in the partial-testing scenario above, researchers might choose to share their test results on the PHAROS platform and also share a more comprehensive record of all sampling on Zenodo.
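A minimal sketch of how placeholder values might be converted to blank cells before sharing is shown below. The list of placeholder strings is an assumption and should be adapted to each dataset; the `skip_fields` argument is a hypothetical safeguard for columns where such a string is a legitimate value.

```python
# Assumed list of placeholder strings; adjust for the dataset at hand.
PLACEHOLDERS = {"na", "n/a", "nan", "null", "none"}


def blank_placeholders(record, skip_fields=()):
    """Return a copy of the record with placeholder values replaced by ''."""
    cleaned = {}
    for field, value in record.items():
        text = "" if value is None else str(value).strip()
        if field not in skip_fields and text.lower() in PLACEHOLDERS:
            text = ""
        cleaned[field] = text
    return cleaned


record = {
    "Collection day": "NA",
    "Collection month": " n/a ",
    "Detection outcome": "positive",
}
print(blank_placeholders(record))
```

Some columns need to be excluded from this cleanup: for instance, "NA" is also the ISO 3166-1 two-letter code for Namibia, so a country-code column would be passed in `skip_fields`.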
Researchers may also wish to include additional fields beyond the minimum data standard to share other kinds of information. For example, researchers might add fields for “Health status” (example values: “healthy”; “sick”; “injured”) or “Reproductive status” (“pregnant”; “lactating”), or might use an all-purpose “Notes” column to flag unusual records or non-standardized information about sampling (e.g., the circumstances under which a dead animal was found, such as opportunistic roadkill collection). Similarly, in cases where findings are particularly sensitive for public health or economic reasons, researchers might consider including some guidance on how to interpret them in the data itself. For example, the data shared by the USAID PREDICT 2 project includes a field called “Interpretation,” which provides guidance such as this disclaimer on a positive test result: “[The virus detected in this sample] is the known ebolavirus, Bombali virus, detected in an Angolan free-tailed bat. This virus has previously been found in bats in Sierra Leone as part of the PREDICT project. Further characterization is ongoing to understand the zoonotic potential of this virus.”
Best practices for sharing (and withholding) data
When using the data standard, we suggest that researchers should follow scientific conventions and best practices for data science, such as: reporting measurements in metric units; reporting taxonomic information at the most granular level possible for both the host and parasite; and leaving empty and non-applicable cells blank, rather than assigning a placeholder such as “NA”41. Researchers should also ensure that their manuscript comprehensively describes all important aspects of sampling methodology, such as the circumstances (e.g., systematic and planned sampling versus opportunistic collection of unusual carcasses), how animal taxonomy was determined (e.g., expert opinion based on morphology versus DNA barcoding), and how samples were prepared (e.g., specific products or kits used, or specific details about the methods used in parasitological dissections). These details will often be the same for each individual row of data, so we exclude them from the template. However, interpreting a study’s data correctly may still depend on these data being available. Researchers should also ensure that their study documents any relevant epidemiological observations (e.g., unusual disease presentation or nearby indicators of human-wildlife contact such as hunting traps, farms, or sewage discharge). Finally, whenever possible, researchers should also share all sequence data in an open repository.
As with other kinds of biodiversity data43,44, sharing wildlife disease data paired with high-resolution location data can sometimes be unsafe or inadvisable. For example, sharing the location of a bat roost where viruses have been detected may lead to animal culling, which in turn increases the risk of viral exposure for local human communities45,46. There may also be biosafety or biosecurity risks associated with location data, depending on the characteristics of the parasite in question; for example, anthrax spores can persist at a carcass site for several years47,48. In sensitive cases, researchers could consider truncating longitude and latitude values, or, potentially, jittering records with random noise. They should then carefully and clearly document the obfuscation process; guidance on this practice exists for other kinds of biodiversity data49. In some cases, this obfuscation may still be insufficient to prevent malicious use50. In high-risk cases, journal editors should work closely with authors to ensure that neither the manuscript itself nor any supplementary data have a significant potential to cause harm.
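The two obfuscation options mentioned above, truncation and jittering, can be sketched as follows. This is an illustration under stated assumptions, not a recommended protocol: the number of decimal places, the maximum offset, and the coordinates themselves are arbitrary choices made for the example.

```python
import math
import random


def truncate_coordinate(value, decimals):
    """Drop (rather than round) digits beyond `decimals` decimal places."""
    factor = 10 ** decimals
    return math.trunc(value * factor) / factor


def jitter_coordinate(value, max_offset, rng, lower, upper):
    """Add uniform noise in [-max_offset, max_offset] degrees, clamped to [lower, upper]."""
    noisy = value + rng.uniform(-max_offset, max_offset)
    return min(max(noisy, lower), upper)


# Made-up coordinates, for illustration only.
latitude, longitude = 17.123456, -88.987654

coarse = (truncate_coordinate(latitude, 2), truncate_coordinate(longitude, 2))
print(coarse)  # (17.12, -88.98)

# A fixed seed is used here only so the sketch is repeatable.
rng = random.Random(2024)
jittered = (
    jitter_coordinate(latitude, 0.05, rng, -90.0, 90.0),
    jitter_coordinate(longitude, 0.05, rng, -180.0, 180.0),
)
print(jittered)
```

Two caveats apply. An offset expressed in degrees of longitude covers less ground distance at higher latitudes, so the effective blur varies by location. In practice the random seed should not be published alongside the data, since it would allow the noise to be reproduced and removed.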
Best practices for publishing datasets
Published data should be stored in commonly used, non-proprietary flat file formats, like comma-separated values (i.e., .csv with UTF-8 encoding and a period decimal separator), to increase accessibility, interoperability, and utility. Non-proprietary file formats increase access by removing the requirement to have a particular piece of software to open a file. Formats like .csv can also be used across all major operating systems, programming languages, and scientific analysis software suites, greatly expanding interoperability and utility.
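A minimal sketch of exporting records to such a file with the Python standard library is shown below. The column names, values, and file name are hypothetical. Floats are written with `repr`, which always uses a period as the decimal separator regardless of the system locale.

```python
import csv
import os
import tempfile


def format_value(value):
    """Render a cell: blank for missing values, period decimals for floats."""
    if value is None:
        return ""
    if isinstance(value, float):
        return repr(value)  # locale-independent, e.g. 17.25
    return str(value)


def write_flat_file(path, records, columns):
    """Write records to a UTF-8 encoded .csv file with a header row."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        for record in records:
            writer.writerow({c: format_value(record.get(c)) for c in columns})


# Hypothetical columns and values, for illustration only.
columns = ["site_name", "latitude", "notes"]
records = [{"site_name": "Río site", "latitude": 17.25, "notes": None}]

output_path = os.path.join(tempfile.mkdtemp(), "example.csv")
write_flat_file(output_path, records, columns)

with open(output_path, encoding="utf-8", newline="") as handle:
    print(handle.read())
```

Passing `newline=""` when opening the file lets the `csv` module control line endings itself, which avoids doubled line breaks on some platforms.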
The data deposit should contain sufficient documentation to facilitate discovery and use by researchers outside of the project. Data contributors can take steps to increase data discoverability by providing complete project metadata. Persistent identifiers (PIDs) create explicit links between the dataset and related entities: digital object identifiers (DOIs) for related publications, Open Researcher and Contributor IDs (ORCIDs) for individuals, Research Organization Registry (ROR) identifiers for institutional affiliations, and CrossRef Funder identifiers for funding sources. These strong semantic links improve search results and allow for automated indexing of relationships. Our approach to project-level metadata is based on the DataCite Metadata Schema29, and includes fields recommended by the Generalist Repository Ecosystem Initiative30 to maximize data discoverability and metadata interoperability. Much of this metadata, if not more, will be captured upon deposit in scientific repositories.
Researchers must be able to interpret the data in order to use it appropriately. To that end, it is important that data contributors include a written description of the data, its intended use, and known limitations (e.g., explanations of missing values or fields) in the project metadata, as well as a data dictionary describing the fields of the flat data file. By using a data standard, data producers can quickly create a data dictionary. To ensure this data standard remains interoperable with other data initiatives, we provide cross-mapping of the fields to the Darwin Core terms51 used for biodiversity observations, as well as links to different GenBank data products through unique identifiers. These fields are validated automatically when using the Wildlife Disease Data Standard JSON Schema through the wddsWizard R package. For further specificity, data producers may use terms from ontologies or controlled vocabularies when referring to specific measurements or tests.
To ensure that data producers get credit for their work, data should be deposited into archival platforms that can provide a PID like a DOI, capture project metadata, and surface relevant works via search. Commonly used archives include Zenodo, OSF.io, DataDryad, and figshare. Some journals have agreements with archival data platforms that can waive the costs of archiving data, in addition to creating a semantic link between the DOI of the publication and the DOI of the dataset.
Data producers are encouraged to deposit material in multiple archives, including discipline-specific and generalist repositories. Publishing the flat files on multiple data platforms has a series of advantages. First, increasing the number of copies decreases dependency on a single platform, increases data longevity, and reduces the risk of deletion or modification. Second, having data on multiple platforms (and especially discipline-specific platforms) maximizes the chances that they are discovered. Finally, for data contributors, depositing data in general-purpose repositories also offers additional flexibility in terms of archiving record- or project-level information that is not in the scope of our data standard. For example, the ImmPort platform uses a data model that allows researchers to provide direct links to NIH resources, detailed lists of personnel involved in a project, and direct connections to relevant biomedical ontologies52.
Discussion
Here, we propose a data standard for wildlife infectious disease studies. With minimal modifications, the same template could also be used for related types of data, such as records of plant pathogens, or infections in captive animal populations, such as those in zoos and wildlife sanctuaries. However, other types of spatiotemporal disease data may already have associated best practices and dedicated or otherwise well-suited repositories. For example, disaggregated but carefully de-identified human infectious disease data can be shared in epidemic settings on the Global.health platform53; host, vector, and parasite occurrence data can also all be documented in Darwin Core format and shared in GBIF54,55,56.
We encourage researchers to adopt this minimum standard, and to deposit their data in generalist repositories (e.g., Figshare, Data Dryad, or Zenodo) and specialist platforms (e.g., PHAROS), so that their data are findable, accessible, interoperable, and reusable (FAIR) by other scientists16. Doing so will help researchers meet the minimum requirements for data sharing now adopted by most journals and scientific funders. Researchers could even consider sharing data before or independent of manuscript publication, especially in cases where negative data might not be publishable, or where timely sharing of findings might be particularly relevant to public health or conservation. Progress toward open, timely data sharing will make wildlife disease research a richer and more rigorous field, leading to better insights about emerging threats to human and animal health.
Data availability
The example dataset and blank templates are available from GitHub at github.com/viralemergence/wdds.
Code availability
An R package to validate data against the data standard described in this paper is available from GitHub at github.com/viralemergence/wddsWizard.
References
McCallen, E. et al. Trends in ecology: shifts in ecological research themes over the past four decades. Front Ecol Environ. 17, 109–116 (2019).
Lafferty, K. D. & Kuris, A. M. Trophic strategies, animal diversity and body size. Trends Ecol Evol. 17, 507–513 (2002).
Stephens, P. R. et al. The macroecology of infectious diseases: a new perspective on global-scale drivers of pathogen distributions and impacts. Ecol Lett. 19, 1159–1171 (2016).
Cohen, J. M., Sauer, E. L., Santiago, O., Spencer, S. & Rohr, J. R. Divergent impacts of warming weather on wildlife disease risk across climates. Science. 370, https://doi.org/10.1126/science.abb1702 (2020).
Xu, Y. et al. Continental‐scale climatic gradients of pathogenic microbial taxa in birds and bats. Ecography 2023, https://doi.org/10.1111/ecog.06783 (2023).
Heckley, A. M., Lock, L. R. & Becker, D. J. A meta‐analysis exploring associations between habitat degradation and Neotropical bat virus prevalence and seroprevalence. Ecography 2024, https://doi.org/10.1111/ecog.07041 (2024).
Warmuth, V. M., Metzler, D. & Zamora-Gutierrez, V. Human disturbance increases coronavirus prevalence in bats. Sci Adv. 9, eadd0688 (2023).
Carlson, C. J. et al. Pathogens and planetary change. Nat Rev Biodivers. 1, 32–49 (2025).
Cohen, L. E., Fagre, A. C., Chen, B., Carlson, C. J. & Becker, D. J. Coronavirus sampling and surveillance in bats from 1996–2019: a systematic review and meta-analysis. Nature Microbiology 8, 1176–1186 (2023).
Becker, D. J., Crowley, D. E., Washburne, A. D. & Plowright, R. K. Temporal and spatial limitations in global surveillance for bat filoviruses and henipaviruses. Biol Lett. 15, 20190423 (2019).
Tolsá, M. J., García-Peña, G. E., Rico-Chávez, O., Roche, B. & Suzán, G. Macroecology of birds potentially susceptible to West Nile virus. Proc Biol Sci. 285, 20182178 (2018).
Albery, G. F., Sweeny, A. R., Becker, D. J. & Bansal, S. Fine‐scale spatial patterns of wildlife disease are common and understudied. Funct Ecol. 36, 214–225 (2022).
Leigh, D. M. et al. Best practices for genetic and genomic data archiving. Nat Ecol Evol. 8, 1224–1232 (2024).
Groom, Q. et al. Improved standardization of transcribed digital specimen data. Database (Oxford). https://doi.org/10.1093/database/baz129 (2019).
Schneider, F. D. et al. Towards an ecological trait‐data standard. Methods Ecol Evol. 10, 2006–2019 (2019).
Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 3, 160018 (2016).
Wu, V. Y. et al. A minimum data standard for vector competence experiments. Sci Data. 9, 634 (2022).
Wehmeyer, M. L., Sauer, F. G. & Lühken, R. A minimum data standard for reporting host-feeding patterns of vectors. Available: https://www.researchsquare.com/article/rs-3896902/latest (2024).
Rund, S. S. C. et al. MIReAD, a minimum information standard for reporting arthropod abundance data. Sci Data. 6, 40 (2019).
Ryan, S. J. et al. MIReVTD, a Minimum Information Standard for Reporting Vector Trait Data. bioRxiv. https://doi.org/10.1101/2025.01.27.634769 (2025).
PREDICT Consortium. PREDICT Emerging Pandemic Threats Project. USAID Development Data Library. Available: https://data.usaid.gov/d/tqea-hwmr (2021).
Poisot, T., Bruneau, A., Gonzalez, A., Gravel, D. & Peres-Neto, P. Ecological Data Should Not Be So Hard to Find and Reuse. Trends Ecol Evol. 34, 494–496 (2019).
Guralnick, R., Walls, R. & Jetz, W. Humboldt Core - toward a standardized capture of biological inventories for biodiversity monitoring, modeling and assessment. Ecography 41, 713–725 (2018).
Augustine, S. P., Bailey-Marren, I., Charton, K. T., Kiel, N. G. & Peyton, M. S. Improper data practices erode the quality of global ecological databases and impede the progress of ecological research. Glob Chang Biol. 30, e17116 (2024).
Costello, M. J. & Wieczorek, J. Best practice for biodiversity data management and publication. Biol Conserv. 173, 68–73 (2014).
Keller, A. et al. Ten (mostly) simple rules to future‐proof trait data in ecological and evolutionary sciences. Methods Ecol Evol. 14, 444–458 (2023).
Wieczorek, J. et al. Darwin Core: an evolving community-developed biodiversity data standard. PLoS One. 7, e29715 (2012).
Wickham, H., Çetinkaya-Rundel, M. & Grolemund, G. R for Data Science. O’Reilly Media, Inc. (2023).
DataCite Metadata Working Group. DataCite metadata schema documentation for the publication and citation of research data and other research outputs v4.5. DataCite. https://doi.org/10.14454/G8E5-6293 (2024).
Curtin, L. et al. GREI Metadata and Search Subcommittee Recommendations_V01_2023-06-29. https://doi.org/10.5281/ZENODO.8101957 (Zenodo; 2023)
Chase, E. C. et al. Rat Lungworm (Angiostrongylus cantonensis) in the Invasive Cuban Treefrog (Osteopilus septentrionalis) in Central Florida, USA. J Wildl Dis. 58, 454–456 (2022).
Gamarra-Toledo, V. et al. Mass mortality of sea lions caused by highly pathogenic avian influenza A(H5N1) virus. Emerg Infect Dis. 29, 2553–2556 (2023).
Schatz, J. et al. Twenty years of active bat rabies surveillance in Germany: a detailed analysis and future perspectives. Epidemiol Infect. 142, 1155–1166 (2014).
Hayward, A. D. et al. Long-term temporal trends in gastrointestinal parasite infection in wild Soay sheep. Parasitology. 149, 1749–1759 (2022).
Nichol, S. T. et al. Genetic identification of a hantavirus associated with an outbreak of acute respiratory illness. Science. 262, 914–917 (1993).
Osterman Lind, E. et al. First detection of Echinococcus multilocularis in Sweden, February to March 2011. Euro Surveill. 16, https://doi.org/10.2807/ese.16.14.19836-en (2011).
Paull, S. H., Thibault, K. M. & Benson, A. L. Tick abundance, diversity and pathogen data collected by the National Ecological Observatory Network. GigaByte. 2022, gigabyte56 (2022).
Vangay, P. et al. Microbiome Metadata Standards: Report of the National Microbiome Data Collaborative's Workshop and Follow-On Activities. mSystems. 6, https://doi.org/10.1128/msystems.01194-20 (2021).
Huttenhower, C., Finn, R. D. & McHardy, A. C. Challenges and opportunities in sharing microbiome data and analyses. Nat Microbiol. 8, 1960–1970 (2023).
Becker, D. J. et al. Serum proteomics identifies immune pathways and candidate biomarkers of coronavirus infection in wild vampire bats. Front Immunol. 14, 2022.01.26.477790 (2023).
White, E. et al. Nine simple ways to make it easier to (re)use your data. Ideas in Ecology and Evolution 6, 1–10 (2013).
Wood, C. L. et al. A reconstruction of parasite burden reveals one century of climate-associated parasite decline. Proc Natl Acad Sci USA 120, e2211903120 (2023).
Tulloch, A. I. T. et al. A decision tree for assessing the risks and benefits of publishing biodiversity data. Nat Ecol Evol. 2, 1209–1217 (2018).
Lunghi, E., Corti, C., Manenti, R. & Ficetola, G. F. Consider species specialism when publishing datasets. Nat Ecol Evol. 3, 319 (2019).
Shapiro, J. T. et al. Setting the Terms for Zoonotic Diseases: Effective Communication for Research, Conservation, and Public Policy. Viruses. 13, https://doi.org/10.3390/v13071356 (2021).
Amman, B. R. et al. Marburgvirus resurgence in Kitaka Mine bat population after extermination attempts, Uganda. Emerg Infect Dis. 20, 1761–1764 (2014).
Carlson, C. J. et al. Spores and soil from six sides: interdisciplinarity and the environmental biology of anthrax (Bacillus anthracis). Biol Rev Camb Philos Soc. 93, 1813–1831 (2018).
Barandongo, Z. R. et al. The persistence of time: the lifespan of Bacillus anthracis spores in environmental reservoirs. Res Microbiol. 174, 104029 (2023).
Chapman, A. D. & Grafton, O. Guide to best practices for generalising sensitive/primary species occurrence-data. Version 1.0. Available: https://repository.oceanbestpractices.org/handle/11329/605 (2008).
Beery, S. & Bondi, E. Can poachers find animals from public camera trap images? arXiv [cs.CV]. Available: http://arxiv.org/abs/2106.11236 (2021).
Darwin Core Maintenance Group. Darwin Core List of Terms. In: Biodiversity Information Standards (TDWG) [Internet]. [cited 18 Apr 2025]. Available: http://rs.tdwg.org/dwc/doc/list/2023-09-18 (2023).
Bhattacharya, S. et al. ImmPort, toward repurposing of open access immunological assay data for translational and clinical research. Sci Data. 5, 180015 (2018).
Benjamin, A. et al. Global.health: a scalable plaform for pandemic data integration, analytics, and preparedness. Research Square. https://doi.org/10.21203/rs.3.rs-1528783/v1 (2022).
Salim, J. A., Seltmann, K., Poelen, J. & Saraiva, A. Indexing Biotic Interactions in GBIF data. Biodivers Inf Sci Stand. 6, https://doi.org/10.3897/biss.6.93565 (2022).
Astorga, F. et al. Biodiversity data supports research on human infectious diseases: Global trends, challenges, and opportunities. One Health. 16, 100484 (2023).
Edmunds, S. C. et al. Publishing data to support the fight against human vector-borne diseases. Gigascience. 11, https://doi.org/10.1093/gigascience/giac114 (2022).
Acknowledgements
This work was supported by an NSF Biology Integration Institute grant (NSF DBI 2021909, 2213854, and 2515340). We also thank countless colleagues for conversations and work that shaped this data standard, especially Noam Ross.
Author information
Contributions
C.J.S. developed the wddsWizard R package. All authors contributed to the conceptualization and writing, and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Schwantes, C.J., Sánchez, C.A., Stevens, T. et al. A minimum data standard for wildlife disease research and surveillance. Sci Data 12, 1054 (2025). https://doi.org/10.1038/s41597-025-05332-x