Identifying Data Sharing in Biomedical Literature

Piwowar, Heather; Chapman, Wendy

doi:10.1038/npre.2008.1721.2

Manuscript
Open access
Published: 04 August 2008

Nature Precedings (2008)

Abstract

Many policies and projects now encourage investigators to share their raw research data with other scientists. Unfortunately, it is difficult to measure the effectiveness of these initiatives because data can be shared through so many different mechanisms and in so many different locations. We propose a novel approach to finding shared datasets: using natural language processing (NLP) techniques to identify declarations of dataset sharing within the full text of primary research articles. Applying regular expression patterns and machine learning algorithms to open access biomedical literature, our system identified 61% of articles with shared datasets (recall) at 80% precision. A simpler version of our classifier achieved higher recall (86%) but lower precision (49%). We believe these results demonstrate the feasibility of this approach, and we hope they inspire further study of dataset retrieval techniques and policy evaluation.
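To make the precision/recall trade-off described above concrete, here is a minimal sketch of a regular-expression stage for flagging sharing declarations, followed by the standard precision and recall computation. The patterns (GEO and ArrayExpress accession formats, "deposited in ... GEO" phrasing), the function names `declares_sharing` and `precision_recall`, and the toy sentences are all hypothetical illustrations, not the authors' actual patterns, features, or data; the machine learning component of the original system is not modeled here.

```python
import re

# HYPOTHETICAL patterns for illustration only; not the patterns used in the paper.
SHARING_PATTERNS = [
    re.compile(r"\b(?:GSE|GDS)\d+\b"),  # GEO series/dataset accession numbers
    re.compile(r"\bE-[A-Z]{4}-\d+\b"),  # ArrayExpress accession numbers
    re.compile(
        r"\b(?:deposited|submitted|available)\b.{0,60}"
        r"\b(?:GEO|Gene Expression Omnibus|ArrayExpress)\b",
        re.IGNORECASE,
    ),  # a deposit verb followed, within 60 characters, by a repository name
]


def declares_sharing(text: str) -> bool:
    """Return True if any hypothetical sharing pattern matches the text."""
    return any(p.search(text) for p in SHARING_PATTERNS)


def precision_recall(predicted: list[bool], actual: list[bool]) -> tuple[float, float]:
    """Compute precision and recall for binary article-level labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy examples (invented): the last one cites a reused dataset rather
    # than sharing new data, so a pure pattern match is a false positive.
    articles = [
        "Microarray data have been deposited in GEO under accession GSE1234.",
        "All data are available from the authors on request.",
        "Expression profiles were submitted to ArrayExpress (E-MEXP-123).",
        "We analysed GSE5678, a previously published dataset.",
    ]
    truth = [True, False, True, False]
    preds = [declares_sharing(a) for a in articles]
    p, r = precision_recall(preds, truth)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=1.00
```

Broad patterns like these tend to favor recall at the cost of precision, because accession numbers also appear when authors reuse someone else's data; filtering such matches is the kind of task a learned classifier on top of the pattern stage could address.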

Author information

Authors and Affiliations

University of Pittsburgh
Heather Piwowar & Wendy Chapman

Corresponding author

Correspondence to Heather Piwowar.

Rights and permissions

Creative Commons Attribution 3.0 License.

About this article

Cite this article

Piwowar, H., Chapman, W. Identifying Data Sharing in Biomedical Literature. Nat Prec (2008). https://doi.org/10.1038/npre.2008.1721.2

Received: 04 August 2008
Accepted: 04 August 2008
Published: 04 August 2008
DOI: https://doi.org/10.1038/npre.2008.1721.2