Nature Precedings
Performance of the Charniak-Lease parser on biological text using different training corpora
  • Manuscript
  • Open access
  • Published: 18 September 2008

  • Alison Callahan¹ &
  • Michel Dumontier²

Nature Precedings (2008)

Abstract

Part-of-speech (POS) tagging is used as the first step in many natural language processing (NLP) workflows, yet the accuracy of tag assignment frequently goes unchecked. We hypothesize that changing the training corpus of a parser affects its POS tagging of a target corpus. To test this, we train the Charniak-Lease parser on the WSJ corpus and on two biomedical corpora, and evaluate its output against that of MedPost, a POS tagger with a reported 97% accuracy on biomedical text. Our findings indicate that using biomedical training corpora significantly improves performance, but also that minor differences between the biomedical training corpora have a significant effect on the correctness of POS tagging. Specifically, the tagging of hyphenated words and verbs was affected. This work suggests that the choice of training corpora is crucial to domain-targeted NLP analysis.
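
The comparison described in the abstract, scoring one tagger's output against another's, can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes both taggings are already aligned token for token and use a shared tag set, and the function name `tag_agreement`, the example tokens, and the tags are all hypothetical.

```python
from collections import Counter


def tag_agreement(candidate, reference):
    """Return (overall, per_tag) agreement of candidate tags with reference tags.

    Both arguments are lists of (token, tag) pairs over the same tokens.
    per_tag is keyed by the reference tag, so it shows which categories
    (for example verbs) the candidate tagger most often gets wrong.
    """
    if len(candidate) != len(reference):
        raise ValueError("taggings must cover the same number of tokens")

    total = Counter()    # occurrences of each reference tag
    matched = Counter()  # occurrences where the candidate tag agrees
    for (tok_c, tag_c), (tok_r, tag_r) in zip(candidate, reference):
        if tok_c != tok_r:
            raise ValueError(f"token mismatch: {tok_c!r} vs {tok_r!r}")
        total[tag_r] += 1
        if tag_c == tag_r:
            matched[tag_r] += 1

    n_tokens = sum(total.values())
    overall = sum(matched.values()) / n_tokens if n_tokens else 0.0
    per_tag = {tag: matched[tag] / count for tag, count in total.items()}
    return overall, per_tag


# Hypothetical toy data: the reference stands in for MedPost-style output,
# the candidate for a parser's output; the hyphenated token is tagged differently.
reference = [("IL-2", "NN"), ("binds", "VBZ"), ("the", "DT"), ("receptor", "NN")]
candidate = [("IL-2", "JJ"), ("binds", "VBZ"), ("the", "DT"), ("receptor", "NN")]

overall, per_tag = tag_agreement(candidate, reference)
print(overall, per_tag)
```

Breaking the score down by reference tag is one way to surface category-specific effects such as those reported for verbs and hyphenated words. Note that agreement with a reference tagger is bounded by that tagger's own accuracy.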

Author information

Authors and Affiliations

  1. Faculty of Information, University of Toronto

    Alison Callahan

  2. Department of Biology, Carleton University

    Michel Dumontier

Corresponding author

Correspondence to Michel Dumontier.

Rights and permissions

Creative Commons Attribution 3.0 License.

About this article

Cite this article

Callahan, A., Dumontier, M. Performance of the Charniak-Lease parser on biological text using different training corpora. Nat Prec (2008). https://doi.org/10.1038/npre.2008.2310.1

  • Received: 18 September 2008

  • Accepted: 18 September 2008

  • Published: 18 September 2008

  • DOI: https://doi.org/10.1038/npre.2008.2310.1

Keywords

  • natural language processing
  • part of speech tagging
  • biomedical text