
Comment


  • Centralized data repositories are foundational to the open science movement, yet they face persistent challenges that threaten the long-term viability of the FAIR data principles. Issues of sustainability, cost, and vulnerability to single points of failure cast doubt on whether today’s data will remain Findable and Accessible for future generations. Emerging decentralized storage technologies, such as the InterPlanetary File System (IPFS) and blockchain-based platforms like Arweave, propose a paradigm shift toward data permanence, integrity, and resilience. This commentary explores the potential of these systems to create a more robust and censorship-resistant scientific record. It weighs the benefits, including long-term preservation and verifiable data integrity that enhance Reusability, against significant obstacles such as usability, governance, and economic sustainability. While decentralized storage is not a panacea, its principles warrant serious consideration. This commentary advocates for a hybrid model that integrates the strengths of centralized and decentralized systems, ensuring that the scientific record remains FAIR for generations while addressing the challenge of managing the evolution of scientific knowledge within a permanent system. (A minimal sketch of the content-addressing idea underlying these systems follows this entry.)

    • Tenzin Tamang
    Comment | Open Access
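
The verifiable integrity that systems like IPFS offer comes from content addressing: an object’s identifier is derived from a hash of its bytes, so the address itself commits to the content. The toy Python store below illustrates only this principle; it is a hedged sketch, not the actual IPFS CID algorithm (real CIDs add multihash and multibase encoding, chunking, and Merkle DAGs), and all names are invented for illustration.

```python
import hashlib

# Toy content-addressed store: the key for each object is the SHA-256
# digest of its bytes, so any retrieved object can be re-hashed and
# checked against its identifier.
class ContentStore:
    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        cid = hashlib.sha256(data).hexdigest()
        self._objects[cid] = data
        return cid

    def get(self, cid: str) -> bytes:
        data = self._objects[cid]
        # Integrity is verifiable by construction: the address commits
        # to the content, so tampering is detectable on retrieval.
        assert hashlib.sha256(data).hexdigest() == cid, "corrupted object"
        return data

store = ContentStore()
cid = store.put(b"sequencing run 42, sample A")
assert store.get(cid) == b"sequencing run 42, sample A"
```
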
  • Genomic data portals collect, annotate, and make data files available to researchers and, increasingly, AI algorithms. They are run by, among others, broad data archive repositories or consortium-specific Data Coordination Centers. Their design may seem a niche topic, but these portals realize the open data principles by making millions of data files findable, accessible, interoperable, and reusable (FAIR). Almost every researcher uses them, yet we are unaware of published guidance on how web data portals should be funded, built, and run. We present lessons we have learned from creating genomics-focused data portals. We highlight the importance of funders in defining rules, human data wranglers as liaisons, a flexible and simple metadata schema, and a user-centered engineering process. We also present concrete suggestions on accessions, metrics, testing, controlled access, and licenses. Finally, we discuss the unsolved problems of interoperability, portal reuse, and long-term stability. We hope these guidelines can help funders and creators of new data portals develop a better understanding of the unique challenges they may face and possible solutions.

    • Matthew L. Speir
    • Wei Kheng Teh
    • Maximilian Haeussler
    Comment | Open Access
  • The Demographic and Health Surveys (DHS) program, established in 1984 and supported by USAID, was one of the most substantial data collection efforts in global health. It supported the collection of critical data across 90+ countries. The DHS program’s standardized methodology enabled cross-country comparisons, longitudinal analyses, and monitoring of health indicators, including maternal and child health, mortality, and intervention coverage. These datasets have generated thousands of peer-reviewed publications and guided policy decisions globally. This commentary examines the precarious nature of the program following budget cuts by the current US government. The precarity of the DHS Program threatens to disrupt a trusted and standardized global data collection and dissemination ecosystem at a time when such data are most needed. Other data sources exist; however, few match DHS’s rigor, standardization, and comprehensiveness. Recent efforts have, for the time being, preserved access to data and related documentation. Emergency funding is supporting some data collection activities. However, long-term solutions should involve inclusive funding and leadership structures, potentially shifting global health power dynamics.

    • Brian Wahl
    • Gautam I. Menon
    • Bhramar Mukherjee
    Comment | Open Access
  • Blockchain technology holds transformative potential for Material Genome Engineering (MGE) by offering a decentralized, secure, and transparent framework for data sharing. Immutable ledgers provide tamper-proof provenance, ensuring trust in multi-institutional collaborations through precise tracking of data lifecycles (a minimal sketch of this hash-chaining idea follows this entry). Smart contracts automate access control and enforce agreements upon consensus, enhancing efficiency and security while reflecting collective organizational decisions that require clear rules and aligned stakeholder interests. Unified protocols further enable conditional cross-platform interoperability, integrating heterogeneous data repositories and computational tools to support global-scale collaboration. Despite these advantages, challenges remain, including scalability limits, cross-system interoperability, computational and energy overheads, and institutional adoption barriers. To address these, this work investigates hybrid architectures that combine blockchain’s strengths in provenance and trust with centralized infrastructures optimized for high-throughput processing. This approach provides a pragmatic pathway to scalable, efficient, and secure solutions. Focusing on five critical stages (data integration, data trading and circulation, data-driven computation, governance, and security and privacy), we demonstrate how blockchain can underpin auditable and interoperable materials data ecosystems. The proposed framework aligns blockchain capabilities with the demands of modern materials research, enabling collaborative innovation and accelerating the discovery of next-generation materials.

    • Ran Wang
    • Fangwen Ye
    • Cheng Xu
    Comment | Open Access
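
The tamper-proof provenance attributed to immutable ledgers above can be illustrated with a hash chain: each record commits to the digest of its predecessor, so altering any historical entry invalidates every hash that follows it. The Python sketch below shows only that one idea; a real blockchain adds distributed consensus, replication, and smart-contract execution, and the event and dataset identifiers here are invented.

```python
import hashlib
import json
import time

def _digest(entry: dict) -> str:
    # Canonical JSON serialization so the digest is stable across runs.
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()

# Minimal tamper-evident provenance log: each entry stores the hash of the
# previous entry, forming a chain that makes retroactive edits detectable.
class ProvenanceLedger:
    def __init__(self):
        self.entries = []

    def append(self, event: str, dataset_id: str) -> dict:
        entry = {
            "event": event,
            "dataset_id": dataset_id,
            "timestamp": time.time(),
            "prev_hash": _digest(self.entries[-1]) if self.entries else "0" * 64,
        }
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute every link; any mutation upstream breaks the chain.
        for prev, curr in zip(self.entries, self.entries[1:]):
            if curr["prev_hash"] != _digest(prev):
                return False
        return True

ledger = ProvenanceLedger()
ledger.append("upload", "alloy-xrd-001")
ledger.append("computation", "alloy-xrd-001")
assert ledger.verify()
```
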
  • The Brain Imaging Data Structure (BIDS) is an increasingly adopted standard for organizing scientific data and metadata. It makes data sharing and reuse easier and more straightforward. BIDS currently encompasses several biomedical imaging and non-imaging techniques, and as more research groups begin to use it, additional experimental techniques are being incorporated into the standard, allowing diverse experimental methods to be stored within the same cohesive structure. Here, we present an extension for magnetic resonance spectroscopy (MRS) data, termed MRS-BIDS. (A hypothetical example of such a dataset layout follows this entry.)

    • Amy E. Bouchard
    • Dickson Wong
    • Mark Mikkelsen
    Comment | Open Access
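
To make the idea of a cohesive structure concrete, the tree below sketches what a minimal MRS-BIDS dataset might look like. It is a hypothetical illustration assuming the usual BIDS conventions of per-subject folders, a datatype directory (here `mrs`), and JSON sidecars; the authoritative entity names and suffixes are those defined in the MRS-BIDS specification itself.

```
my_mrs_study/
├── dataset_description.json
├── participants.tsv
└── sub-01/
    └── mrs/
        ├── sub-01_acq-press_svs.nii.gz   # single-voxel spectroscopy data
        └── sub-01_acq-press_svs.json     # acquisition metadata sidecar
```
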
  • Researchers may avoid taking on interdisciplinary projects out of concern that publications outside of their own field would not be rewarded in their home departments. As a result, data with an interdisciplinary focus are underprovided. We propose that the parent publishers of journals facilitate “cross-listed” publications, where papers can be submitted to and peer-reviewed simultaneously by two journals in different fields, with joint publication under a single DOI.

    • Jason A. Aimone
    • Charles N. Noussair
    Comment | Open Access
  • Research software has become indispensable in contemporary research and is now viewed as essential infrastructure in many scholarly fields. Encompassing source code, algorithms, scripts, computational workflows, and executables generated during or specifically for research, it plays a critical role in advancing scholarly knowledge. The research software field includes considerable open-source use and links to the broader open science movement. In this context, it has been argued that the well-established FAIR (Findable, Accessible, Interoperable, Reusable) principles for research data should be adapted for research software under the label FAIR4RS. However, the level of uptake of FAIR4RS principles is unclear. To gauge FAIR4RS’s status, international research funders involved in supporting research software (n = 36) were surveyed about their awareness of the concept. The survey reveals much greater familiarity with the more established FAIR principles for data (73% ‘extremely familiar’) than FAIR4RS (33% ‘extremely familiar’). Nevertheless, there is still considerable recognition of the relatively new FAIR4RS concept, a significant achievement for efforts to extend open science policies and practices to encompass research software.

    • Eric A. Jensen
    • Daniel S. Katz
    Comment | Open Access
  • The CARE Principles (Collective Benefit, Authority to Control, Responsibility, Ethics) were developed to ensure ethical stewardship of Indigenous data. However, their adaptability makes them an ideal framework for managing data related to vulnerable populations affected by armed conflicts. This essay explores the application of CARE principles to wartime contexts, with a particular focus on internally displaced persons (IDPs) and civilians living under occupation. These groups face significant risks of data misuse, ranging from privacy violations to targeted repression. By adapting CARE, data governance can prioritize safety, dignity, and empowerment while ensuring that data serves the collective welfare of affected communities. Drawing on examples from Indigenous data governance, open science initiatives, and wartime humanitarian challenges, this essay argues for extending CARE principles beyond their original scope. Such an adaptation highlights CARE’s potential as a universal standard for addressing the ethical complexities of data management in humanitarian crises and conflict-affected environments.

    • Yana Suchikova
    • Serhii Nazarovets
    Comment | Open Access
  • Recent trends within computational and data sciences show an increasing recognition and adoption of computational workflows as tools for productivity and reproducibility that also democratize access to platforms and processing know-how. As digital objects to be shared, discovered, and reused, computational workflows benefit from the FAIR principles, which stand for Findable, Accessible, Interoperable, and Reusable. The Workflows Community Initiative’s FAIR Workflows Working Group (WCI-FW), a global and open community of researchers and developers working with computational workflows across disciplines and domains, has systematically addressed the application of both FAIR data and software principles to computational workflows. We present recommendations with commentary that reflects our discussions and justifies our choices and adaptations. These are offered to workflow users and authors, workflow management system developers, and providers of workflow services as guidelines for adoption and fodder for discussion. The FAIR recommendations for workflows that we propose in this paper will maximize their value as research assets and facilitate their adoption by the wider community.

    • Sean R. Wilkinson
    • Meznah Aloqalaa
    • Carole Goble
    Comment | Open Access
  • Scientists are increasingly required by funding agencies, publishers, and their institutions to produce and publish data that are Findable, Accessible, Interoperable and Reusable (FAIR). This requires curatorial activities, which are expensive in terms of both time and effort. Based on our experience of supporting a multidisciplinary research team, we provide recommendations that direct researchers’ efforts towards affordable ways to achieve a reasonable degree of “FAIRness”, so that their data become reusable upon publication. The recommendations are accompanied by concrete insights into the challenges faced when trying to implement them in an actual data-intensive reference project.

    • Gorka Fraga-González
    • Hester van de Wiel
    • Eva Furrer
    Comment | Open Access
  • Ensuring the integrity of research data is crucial for the accuracy and reproducibility of any data-based scientific study. This can only be achieved by establishing and implementing strict rules for the handling of research data. Essential steps for achieving high-quality data involve planning what data to gather, collecting it in the correct manner, and processing it in a robust and reproducible way. Despite its importance, a comprehensive framework detailing how to achieve data quality is currently unavailable. To address this gap, our study proposes guidelines designed to establish a reliable approach to data handling. They provide clear and practical instructions for the complete research process, including an overall data collection strategy, variable definitions, and data processing recommendations. In addition to raising awareness about potential pitfalls and establishing standardization in research data usage, the proposed guidelines serve as a reference for researchers to provide a consistent standard of data quality. Furthermore, they improve the robustness and reliability of the scientific landscape by emphasising the critical role of data quality in research. (A minimal sketch of turning variable definitions into executable checks follows this entry.)

    • Gregor Miller
    • Elmar Spiegel
    Comment | Open Access
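
One practical way to apply the kind of variable definitions and processing rules described above is to make them executable. The Python sketch below is purely illustrative: the column names, types, and plausible ranges are invented, and a real project would draw them from its own data documentation.

```python
import pandas as pd

# Hypothetical variable definitions: expected type plus a plausible range,
# as a study's data documentation might specify them.
VARIABLE_DEFINITIONS = {
    "age_years": {"dtype": "int64", "min": 0, "max": 120},
    "weight_kg": {"dtype": "float64", "min": 0.5, "max": 500.0},
}

def validate(df: pd.DataFrame) -> list[str]:
    """Check a dataframe against the variable definitions; return problems found."""
    problems = []
    for column, rule in VARIABLE_DEFINITIONS.items():
        if column not in df.columns:
            problems.append(f"missing variable: {column}")
            continue
        if str(df[column].dtype) != rule["dtype"]:
            problems.append(f"{column}: expected {rule['dtype']}, got {df[column].dtype}")
        out_of_range = df[(df[column] < rule["min"]) | (df[column] > rule["max"])]
        if not out_of_range.empty:
            problems.append(
                f"{column}: {len(out_of_range)} values outside [{rule['min']}, {rule['max']}]"
            )
    return problems

df = pd.DataFrame({"age_years": [34, 150], "weight_kg": [70.2, 81.5]})
print(validate(df))  # flags the implausible age of 150
```
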
  • A key means of preserving biodiversity is the ex situ storage of seeds in what are known as germplasm banks (GBs). Unfortunately, wild species germplasm bank databases, often maintained by resource-limited botanical gardens, are highly disparate: they capture information about their collections in a wide range of underlying data formats, on different storage platforms, following different standards, and with varying degrees of data accessibility. It is therefore extremely difficult to build conservation strategies for wild species by integrating data from these GBs. Here, we envisage that applying the FAIR Principles to wild species and crop wild relatives information, through the creation of a federated network of FAIR GB databases, would greatly facilitate cross-resource discovery and exploration, thus assisting with the design of more efficient conservation strategies for wild species and bringing more attention to these key data providers.

    • Alberto Cámara Ballesteros
    • Elena Aguayo Jara
    • Mark D. Wilkinson
    Comment | Open Access
  • The release of ChatGPT has focused global attention on artificial intelligence (AI), and AI for science is becoming a hot topic in the scientific community. When we think about unleashing the power of AI to accelerate scientific research, the first question that comes to mind is whether there is a continuous supply of highly available data at a sufficiently large scale.

    • Yongchao Lu
    • Hong Wang
    • Hang Su
    Comment | Open Access
  • We present an extension to the Brain Imaging Data Structure (BIDS) for motion data. Motion data is frequently recorded alongside human brain imaging and electrophysiological data. The goal of Motion-BIDS is to make motion data interoperable across different laboratories and with other data modalities in human brain and behavioral research. To this end, Motion-BIDS standardizes the data format and metadata structure. It describes how to document experimental details, considering the diversity of hardware and software systems for motion data. This promotes findable, accessible, interoperable, and reusable data sharing and Open Science in human motion research. (A hypothetical example of such a layout follows this entry.)

    • Sein Jeung
    • Helena Cockx
    • Julius Welzel
    Comment | Open Access
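
The tree below sketches what a minimal Motion-BIDS recording might look like on disk. It is a hypothetical illustration assuming the standard BIDS pattern of per-subject folders plus tabular time series with JSON and channel-description sidecars; the authoritative entities and suffixes are those defined in the Motion-BIDS specification.

```
my_motion_study/
└── sub-01/
    └── motion/
        ├── sub-01_task-walking_tracksys-imu_motion.tsv    # time series, one column per channel
        ├── sub-01_task-walking_tracksys-imu_motion.json   # recording metadata (sampling rate, etc.)
        └── sub-01_task-walking_tracksys-imu_channels.tsv  # per-channel descriptions (type, units, placement)
```
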
  • Developing Earth science data products that meet the needs of diverse users is a challenging task for both data producers and service providers, as user requirements can vary significantly and evolve over time. In this comment, we discuss several strategies to improve Earth science data products that everyone can use.

    • Zhong Liu
    • Tian Yao
    Comment | Open Access
  • Curated resources that support scientific research often go out of date or become inaccessible. This can happen for several reasons including lack of continuing funding, the departure of key personnel, or changes in institutional priorities. We introduce the Open Data, Open Code, Open Infrastructure (O3) Guidelines as an actionable road map to creating and maintaining resources that are less susceptible to such external factors and can continue to be used and maintained by the community that they serve.

    • Charles Tapley Hoyt
    • Benjamin M. Gyori
    Comment | Open Access
  • The solution of the longstanding “protein folding problem” in 2021 showcased the transformative capabilities of AI in advancing the biomedical sciences. AI was characterized as successfully learning from protein structure data, which then spurred a more general call for AI-ready datasets to drive forward medical research. Here, we argue that it is the broad availability of knowledge, not just data, that is required to fuel further advances in AI in the scientific domain. This represents a quantum leap in a trend toward knowledge democratization that had already been developing in the biomedical sciences: knowledge is no longer primarily applied by specialists in a sub-field of biomedicine, but rather multidisciplinary teams, diverse biomedical research programs, and now machine learning. The development and application of explicit knowledge representations underpinning democratization is becoming a core scientific activity, and more investment in this activity is required if we are to achieve the promise of AI.

    • Christophe Dessimoz
    • Paul D. Thomas
    Comment | Open Access
  • As the number of cloud platforms supporting scientific research grows, there is an increasing need to support interoperability between two or more cloud platforms. A well-accepted core concept is to make data in cloud platforms Findable, Accessible, Interoperable and Reusable (FAIR). We introduce a companion concept that applies to cloud-based computing environments, which we call a Secure and Authorized FAIR Environment (SAFE). SAFE environments require data and platform governance structures and are designed to support the interoperability of sensitive or controlled-access data, such as biomedical data. A SAFE environment is a cloud platform that has been approved, through a defined data and platform governance process, as authorized to hold data from another cloud platform, and that exposes appropriate APIs for the two platforms to interoperate. (A purely hypothetical sketch of such an authorization check follows this entry.)

    • Robert L. Grossman
    • Rebecca R. Boyles
    • Stan Ahalt
    Comment | Open Access
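
As a purely hypothetical illustration of the SAFE idea, the sketch below shows a platform checking, before data moves between two clouds, that the destination has been approved by a governance process for that data’s classification. Every name here is invented; the actual governance structures and APIs are whatever a SAFE deployment defines.

```python
# Hypothetical governance registry: platform id -> data classifications it
# has been authorized to hold by the governance process.
APPROVED_ENVIRONMENTS = {
    "cloud-platform-b": {"public", "controlled-access"},
}

def authorize_transfer(destination: str, classification: str) -> bool:
    """Return True only if governance approved the destination for this class of data."""
    return classification in APPROVED_ENVIRONMENTS.get(destination, set())

assert authorize_transfer("cloud-platform-b", "controlled-access")
assert not authorize_transfer("cloud-platform-b", "identifiable")
```
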
  • The ongoing debate on secondary use of health data for research has been renewed by the passage of comprehensive data privacy laws that shift control from institutions back to the individuals from whom the data were collected. Rights-based data privacy laws, while lauded by individuals, are viewed as problematic for researchers due to the distributed nature of data control. Efforts such as the European Health Data Space initiative seek to build a new mechanism for secondary use that erodes individual control in favor of broader secondary use for beneficial health research. Yet health information sharing platforms do exist that embrace rights-based data privacy while simultaneously providing a rich research environment for secondary data use. Embracing rights-based data privacy to promote transparency of data use, along with control over one’s own participation, builds the trust necessary for more inclusive, diverse, and representative clinical research.

    • Scott D. Kahn
    • Sharon F. Terry
    Comment | Open Access
  • Data harmonization is an important method for combining or transforming data. To date, however, articles about data harmonization are field-specific and highly technical, making it difficult for researchers to derive general principles for how to engage in and contextualize data harmonization efforts. This commentary provides a primer on the tradeoffs inherent in data harmonization for researchers who are considering undertaking such efforts or who seek to evaluate the quality of existing ones. We derive this guidance from the extant literature and from our own experience harmonizing data for the emergent and important new field of COVID-19 public health and safety measures (PHSM). (A minimal, purely illustrative harmonization step follows this entry.)

    • Cindy Cheng
    • Luca Messerschmidt
    • Joan Barceló
    Comment | Open Access
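
A core harmonization step is mapping differently coded source data onto one shared vocabulary before merging. The Python sketch below illustrates that step only; the source formats, column names, and category labels are invented, not drawn from the PHSM dataset the commentary describes.

```python
import pandas as pd

# Two hypothetical sources coding the same policy concept differently.
source_a = pd.DataFrame({"country": ["DEU"], "measure": ["school closing"]})
source_b = pd.DataFrame({"iso3": ["FRA"], "policy_type": ["Schulschließung"]})

# Shared vocabulary: every source label maps to one harmonized category.
SHARED_VOCAB = {
    "school closing": "school_closure",
    "Schulschließung": "school_closure",
}

# Align column names, then concatenate and map onto the shared vocabulary.
harmonized = pd.concat(
    [
        source_a.rename(columns={"measure": "raw_label"}),
        source_b.rename(columns={"iso3": "country", "policy_type": "raw_label"}),
    ],
    ignore_index=True,
)
harmonized["measure_harmonized"] = harmonized["raw_label"].map(SHARED_VOCAB)
print(harmonized[["country", "measure_harmonized"]])
```

Keeping the original `raw_label` column alongside the harmonized one preserves provenance, one of the tradeoffs the commentary highlights between comparability and fidelity to the sources.
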
