Nature Precedings
Identifying Data Sharing in Biomedical Literature
  • Manuscript
  • Open access
  • Published: 04 August 2008

  • Heather Piwowar¹ &
  • Wendy Chapman¹

Nature Precedings (2008)

Abstract

Many policies and projects now encourage investigators to share their raw research data with other scientists. Unfortunately, it is difficult to measure the effectiveness of these initiatives because data can be shared through so many different mechanisms and in so many different locations. We propose a novel approach to finding shared datasets: using natural language processing (NLP) techniques to identify declarations of dataset sharing within the full text of primary research articles. Using regular expression patterns and machine learning algorithms on open access biomedical literature, our system identified 61% of articles with shared datasets (recall) at 80% precision. A simpler version of our classifier achieved higher recall (86%), though lower precision (49%). We believe our results demonstrate the feasibility of this approach and hope to inspire further study of dataset retrieval techniques and policy evaluation.
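As a rough illustration of the regular-expression side of this approach, the sketch below flags sentences that appear to declare dataset sharing and then scores the predictions with precision and recall. The patterns, the example sentences, and the function names declares_sharing and precision_recall are all assumptions made for this illustration; they are not the authors' patterns or data, and the machine learning step mentioned in the abstract is not shown.

```python
import re

# Hypothetical patterns (NOT the authors' published patterns): each tries to
# catch wording that declares a dataset was deposited or made available.
SHARING_PATTERNS = [
    re.compile(
        r"\b(?:deposited|submitted)\s+(?:in|into|to|at)\b.{0,80}"
        r"\b(?:database|repository|GEO|ArrayExpress)\b",
        re.IGNORECASE,
    ),
    re.compile(r"\baccession\s+(?:number|no\.?|code)s?\b", re.IGNORECASE),
    re.compile(
        r"\bdata\b.{0,60}\b(?:are|is)\s+(?:publicly\s+)?available\s+"
        r"(?:at|from|in|through)\b",
        re.IGNORECASE,
    ),
]


def declares_sharing(text):
    """Return True if any hypothetical pattern matches the text."""
    return any(p.search(text) for p in SHARING_PATTERNS)


def precision_recall(predicted, actual):
    """Compute (precision, recall) from parallel lists of booleans."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up sentences with made-up gold labels (True = dataset was shared).
examples = [
    ("The microarray data were deposited in the GEO database.", True),
    ("Raw data are available from the authors upon request.", False),
    ("We analyzed expression profiles from 40 tumour samples.", False),
    ("Our expression profiles can be downloaded from the project website.", True),
]

predicted = [declares_sharing(text) for text, _ in examples]
actual = [label for _, label in examples]
precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy set the second sentence is a false positive and the fourth a false negative, which mirrors the trade-off the abstract reports: looser patterns would catch more sharing statements (higher recall) but also flag more articles that did not share data (lower precision).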

Author information

Authors and Affiliations

  1. University of Pittsburgh

    Heather Piwowar & Wendy Chapman

Corresponding author

Correspondence to Heather Piwowar.

Rights and permissions

Creative Commons Attribution 3.0 License.

About this article

Cite this article

Piwowar, H., Chapman, W. Identifying Data Sharing in Biomedical Literature. Nat Prec (2008). https://doi.org/10.1038/npre.2008.1721.2

  • Received: 04 August 2008

  • Accepted: 04 August 2008

  • Published: 04 August 2008

  • DOI: https://doi.org/10.1038/npre.2008.1721.2

Keywords

  • natural language processing
  • data sharing
  • bioinformatics
  • microarrays
  • databases