In the life sciences, the FAIR principles have reshaped research policy, but their implementation still relies largely on individual researchers – many of whom lack the expertise or support needed to make data truly reusable. Realising FAIR’s promise requires sustained investment in the infrastructures that organise, standardise, and curate data: deposition databases and knowledgebases. These biodata resources are especially critical for AI, which depends on large, high-quality, and consistent data. Landmark advances like AlphaFold and the COVID-19 response illustrate how sustained curation and standardisation in expert resources such as UniProt and the Protein Data Bank have enabled rapid innovation. Yet biodata resources remain precariously funded, jeopardising long-term sustainability and the expert workforce they require. To support ambitious, data-driven science, funders must align policy and budgets by establishing dedicated mechanisms that allocate a small (e.g., 1%) but strategic and stable share of research funding to core data infrastructures. This would maximise the value of public investment, strengthen open science and international collaboration, and unlock the full potential of FAIR.
Acknowledgements
This study received funding from ELIXIR, the research infrastructure for life-science data. In addition, we acknowledge SNSF grant #205085 to C.D. and SNSF/CHIST-ERA grant #217525 to P.R.
Author information
Contributions
This article was developed within the framework of Work Package 5 of the ELIXIR Data Platform Workplan (2024–2028). L.P. and C.D. initiated and coordinated the manuscript drafting based on the group’s prior work. L.P. and C.D. wrote the first draft. All authors contributed to the conceptual development of the arguments and provided comments, suggestions, and revisions. All authors approved the final version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Poveda, L., Farrell, G., Tosatto, S.C.E. et al. The missing link in FAIR data policy: biodata resources in life sciences. Sci Data (2026). https://doi.org/10.1038/s41597-026-06690-w
DOI: https://doi.org/10.1038/s41597-026-06690-w