Deciphering the information will take a long time, and will need every available mind on the job. And so it is essential that the sequence is available to the whole biological community. No single individual or group can credibly claim that they have the expertise to deal with it. When the commercial company that became Celera Genomics was launched in May 1998 with the stated aim of becoming ‘the definitive source of genomic and associated medical information’, the whole future of biology came under threat. For one company was bidding for monopoly control of access to the most fundamental information about humanity, information that is — or should be — our common heritage.”1

Sir John Sulston

The arc of recent history in biological data sharing looks more like the peaks and troughs of an electrocardiogram trace. As a fundamental tenet of science, data sharing has underpinned our most exciting and transformative discoveries. This was recently exemplified by the 2024 Nobel Prize in Chemistry, awarded to researchers who cracked the decades-old problem of protein structure prediction from primary sequence. The artificial intelligence methods used crucially relied on the Protein Data Bank, a standard-bearer for open science, which has made experimentally determined protein structures publicly and freely available since the 1970s2. In the 1990s, the international public efforts of the Human Genome Project (HGP) led to the formalization of the Bermuda Principles3 that govern the sharing of genome sequence data (Box 1). Towards the end of the HGP, these principles were extended to community resource projects in the Fort Lauderdale Agreement4 and updated to introduce the concept of ‘tripartite responsibility’ for biological data sharing with explicit roles for funders, data generators and data users (Box 1 and Supplementary Box 1). These principles have served as a foundation for the broader ‘open science’ movement in the past few decades; however, many scientists are unaware of them and take for granted the widespread availability of biological data that underpins modern biological science5.

The agreements cover ‘genome projects’ and ‘community resource projects’, but their authors could not have anticipated the scale and complexities of population-scale genomics today, 20 years later, in which scientists have moved on to assay natural genomic variation in thousands or even millions of individuals to drive discoveries. Here, exemplars include the UK Biobank, the All of Us research programme, and global microbiome and pathogen sequencing programmes. The agreements therefore do not take into account the current situation in which the interpretation of genomic data is only meaningful when accompanied by contextual ‘metadata’ that describe the source, traits, clinical features and other information that may be potentially sensitive (for example, owing to privacy, national or commercial interests). However, at their core, the Bermuda and Fort Lauderdale principles answer fundamental questions, such as: what do we want science to look like; and, as the data flood advances, who do we want to include in and exclude from the mammoth task of turning data into knowledge?

Extending from ‘genome projects’ (which we would now consider ‘reference genome projects’) and ‘community resource projects’ to studies exploring ‘natural genomic variation’ to generate discoveries, or aggregating data from multiple studies to further knowledge, the concept of tripartite responsibility remains key. Funders support these projects on the understanding, implicit or explicit, that the data will be used to maximal public benefit and ‘scientific advancement’. In nearly all cases of genomics-driven science, this requires that data are shared beyond the primary study team and included in pooled or aggregate analyses, which in turn generate summary-level information that is even more widely shareable to maximize advancement and benefits6. Therefore, funders should directly support sharing-related activities for individual projects and the development of appropriate data management systems and governance structures, as well as be clear and explicit in their expectations concerning the accessibility of the data they pay to generate.

Data generators should focus on their primary scientific aims but also recognize the value of the data beyond the immediate project, and share these data as rapidly and as openly as possible. Data sharing needs to be done in responsible ways with appropriate governance to protect the privacy of individual participants, and in ways that respect the data-generating scientists’ legitimate interests in publication; but it should not be delayed or withheld out of commercial or professional interests. Privacy and ethical considerations of study participants and groups must be respected, and carefully managed with appropriate data management and governance structures to provide a clear roadmap (for example, see ref. 7) to enable safe global data sharing and optimize public benefit. However, particularly given the widespread public support for data sharing for public benefit8, this should not be used as an excuse for scientists to escape their obligation to share data for scientific discovery. Indeed, individuals who participate in research studies donate their samples and data with the expectation of public benefit, not benefit to individual researchers or companies. Due diligence is necessary to determine whether there are substantive privacy or ethical risks with respect to how data are shared, with the default position being approaches that maximize data sharing while minimizing these risks (for example, when individual-level data cannot be shared, then summary-level datasets and statistics are shared). As part of these discussions, the views of representative stakeholders and study participant groups as to when and how data are shared should be included7.

Secondary data users should respect the scientific aims of the generators; state and pursue their own scientific goals; acknowledge all data used; and consider carefully whether collaboration with data generators is warranted. Roles and responsibilities of data generators and data users should be clearly articulated through data use policies, wherever relevant. Clarity and consensus on these issues should be supported by international professional organizations, such as the Global Alliance for Genomics and Health (GA4GH) and the Public Health Alliance for Genomic Epidemiology (PHA4GE), who provide overarching principles for sharing different types of genomics data (for example, the ‘Framework for Responsible Sharing of Genomic and Health-Related Data’9, and the Microbial Data-Sharing Accord10). These sit within the broader context of international legal and ethical frameworks, such as the OECD’s Principles and Guidelines for Access to Research Data from Public Funding11 and the European Union’s General Data Protection Regulation (GDPR), and provide guidance on navigating tensions between public benefit and individual privacy.

Although not explicit in the earlier agreements (Box 1), one of the rationales for public data sharing in the HGP was to minimize the encumbrances on sequence data by patents12, the view being that patents should apply to specific inventions, not vague concepts to ‘use’ genes, proteins or variants as targets. In our view, the same principles apply today. Commercial companies should be encouraged to pursue business models that develop unique technologies, products or services using genomic data resources. The data should not be treated as the product, especially when it is primarily based on samples or data collected using public funds (whether by research councils or health systems).

With the struggle to keep our promise of sharing data5, there is a slippery slope that we must climb and climb again, constantly reaffirming our core principles and applying them to an ever-changing present day. As Sulston explained1, “today any scientist anywhere can access the sequence freely at no cost and use the information to make his or her own further discoveries. We wrote this book so that people might understand how close the world came to losing that freedom”.