The CALBC RDF Triple store: retrieval over large literature content
  • Presentation
  • Open access
  • Published: 18 January 2011

SWAT4LS 2010

  • Samuel Croset¹,
  • Christoph Grabmüller¹,
  • Chen Li¹,
  • Silvestras Kavaliauskas¹ &
  • Dietrich Rebholz-Schuhmann¹

Nature Precedings (2011)

  • 229 Accesses

  • 2 Citations

Abstract

Background

Integrating the scientific literature into a biomedical research infrastructure requires processing the literature, identifying the named entities (NEs) and concepts it contains, and representing this content in a standardised way. Little effort has so far been spent on integrating content from literature text into RDF triple stores. The CALBC project partners (PPs) have produced a large-scale biomedical corpus annotated for four semantic groups by harmonising the annotations of automatic text-mining solutions; the result is the first Silver Standard Corpus (SSC-I). The four semantic groups are chemical entities and drugs (CHED), genes and proteins (PRGE), diseases and disorders (DISO) and species (SPE). The corpus annotations have been transformed into an RDF triple store representation so that the content can be queried together with bioinformatics data resources (UniProtKB, ArrayExpress) using the RDF query language SPARQL.
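To make the transformation step more concrete, the following is a minimal Python sketch of how a single corpus annotation might be serialised as N-Triples. It is an illustration under stated assumptions, not the CALBC conversion: the namespace `http://example.org/calbc/`, the predicates (`inDocument`, `semanticGroup`, `denotes`, `start`, `end`) and the `Annotation` record are hypothetical and do not reflect the vocabulary actually used in the CALBC RDF Triple Store.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical namespace for illustration only; not the CALBC vocabulary.
BASE = "http://example.org/calbc/"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
SEMANTIC_GROUPS = {"CHED", "PRGE", "DISO", "SPE"}


@dataclass(frozen=True)
class Annotation:
    """One named-entity mention in the corpus (hypothetical record layout)."""
    doc_id: str   # document identifier
    start: int    # character offset where the mention begins
    end: int      # character offset where the mention ends
    group: str    # semantic group: CHED, PRGE, DISO or SPE
    concept: str  # identifier of the normalised concept


def iri(*parts: str) -> str:
    """Build an N-Triples IRI term under the hypothetical base namespace."""
    return "<" + BASE + "/".join(parts) + ">"


def string_literal(value: str) -> str:
    """Plain string literal with backslashes and double quotes escaped."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def int_literal(value: int) -> str:
    """Integer literal typed as xsd:integer."""
    return f'"{value}"^^<{XSD_INTEGER}>'


def annotation_to_ntriples(ann: Annotation) -> List[str]:
    """Serialise one annotation as N-Triples lines, one subject node per mention."""
    if ann.group not in SEMANTIC_GROUPS:
        raise ValueError(f"unknown semantic group: {ann.group}")
    node = iri("annotation", ann.doc_id, f"{ann.start}-{ann.end}-{ann.group}")
    return [
        f"{node} {iri('vocab', 'inDocument')} {iri('document', ann.doc_id)} .",
        f"{node} {iri('vocab', 'semanticGroup')} {string_literal(ann.group)} .",
        f"{node} {iri('vocab', 'denotes')} {iri('concept', ann.concept)} .",
        f"{node} {iri('vocab', 'start')} {int_literal(ann.start)} .",
        f"{node} {iri('vocab', 'end')} {int_literal(ann.end)} .",
    ]


if __name__ == "__main__":
    example = Annotation(doc_id="doc1", start=10, end=13, group="PRGE", concept="protein-P1")
    print("\n".join(annotation_to_ntriples(example)))
```

Giving each mention its own subject node, rather than linking documents directly to concepts, keeps the offsets and the semantic group attached to the individual mention, which is the information a later collocation query needs.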

Results

All four CALBC PPs contributed annotated data sets for generating the SSC-I. In addition, 12 challenge participants (CPs) provided annotated data sets, which were evaluated against the SSC-I and used to generate the SSC-II. The SSC-II contains the following numbers of annotations: CHED 238,431, PRGE 435,797, DISO 245,524 and SPE 304,503. The content of the SSC-II has been fully integrated into the RDF Triple Store (4,568,678 triples) and aligned with content from the GeneAtlas (182,840 triples), UniProtKB (12,552,239 triples for human) and the lexical resource LexEBI (BioLexicon). The RDF Triple Store thus enables querying the scientific literature and bioinformatics resources at the same time, for example for evidence of gene-disease links that involve immunological processes. In total, the CALBC RDF Triple Store makes use of 1,224,255 annotations from the corpus (the sum of the four SSC-II counts above) to expose links between entities that are supported by evidence in the text. It is implemented as a retrieval engine that allows querying for collocations of named entities together with associated information from the bioinformatics data resources (UniProtKB, ArrayExpress).
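A collocation query of the kind described above can be sketched with a toy in-memory triple store. This is a minimal illustration, not the CALBC retrieval engine: the `TripleStore` class, the predicates (`inDocument`, `semanticGroup`, `denotes`, `involvedIn`) and all identifiers (`doc1`, `protein:P1`, `disease:D1`, and so on) are hypothetical, and the `involvedIn` triples merely stand in for the kind of functional annotation a resource such as UniProtKB could contribute. The sketch finds documents in which a PRGE concept and a DISO concept are co-annotated, then keeps only the gene-disease pairs whose gene is linked to an immunological process.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Iterable, List, Optional, Set, Tuple

# A triple is (subject, predicate, object); plain strings keep the sketch small.
Triple = Tuple[str, str, str]


class TripleStore:
    """Toy in-memory triple store with a predicate index (hypothetical)."""

    def __init__(self, triples: Iterable[Triple] = ()) -> None:
        self._triples: Set[Triple] = set()
        self._by_predicate: Dict[str, Set[Triple]] = defaultdict(set)
        for triple in triples:
            self.add(triple)

    def add(self, triple: Triple) -> None:
        self._triples.add(triple)
        self._by_predicate[triple[1]].add(triple)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        candidates = self._by_predicate.get(p, set()) if p is not None else self._triples
        return [t for t in candidates
                if (s is None or t[0] == s) and (o is None or t[2] == o)]


def collocations(store: TripleStore, group_a: str, group_b: str) -> List[Triple]:
    """Sorted (document, concept_a, concept_b) for two groups co-annotated in a document."""
    concepts: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for annotation, _, group in store.match(p="semanticGroup"):
        document = store.match(s=annotation, p="inDocument")[0][2]
        concept = store.match(s=annotation, p="denotes")[0][2]
        concepts[document][group].add(concept)
    pairs = set()
    for document, by_group in concepts.items():
        for a, b in product(by_group.get(group_a, set()), by_group.get(group_b, set())):
            pairs.add((document, a, b))
    return sorted(pairs)


def immune_gene_disease_links(store: TripleStore,
                              process: str = "immune response") -> List[Triple]:
    """Keep PRGE-DISO collocations whose gene is linked to the given process."""
    return [(doc, gene, disease)
            for doc, gene, disease in collocations(store, "PRGE", "DISO")
            if store.match(s=gene, p="involvedIn", o=process)]


# Hypothetical toy data: two documents, each with one gene and one disease mention.
store = TripleStore([
    ("ann1", "inDocument", "doc1"), ("ann1", "semanticGroup", "PRGE"), ("ann1", "denotes", "protein:P1"),
    ("ann2", "inDocument", "doc1"), ("ann2", "semanticGroup", "DISO"), ("ann2", "denotes", "disease:D1"),
    ("ann3", "inDocument", "doc2"), ("ann3", "semanticGroup", "PRGE"), ("ann3", "denotes", "protein:P2"),
    ("ann4", "inDocument", "doc2"), ("ann4", "semanticGroup", "DISO"), ("ann4", "denotes", "disease:D2"),
    # Stand-ins for triples a protein resource such as UniProtKB might contribute.
    ("protein:P1", "involvedIn", "immune response"),
    ("protein:P2", "involvedIn", "oxygen transport"),
])

if __name__ == "__main__":
    print(immune_gene_disease_links(store))  # [('doc1', 'protein:P1', 'disease:D1')]
```

In an actual RDF store the same join would typically be written as a SPARQL graph pattern; the predicate index here only mimics the kind of pattern lookup such a store performs.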

Conclusions

The CALBC RDF Triple Store is the first resource of its kind to expose content extracted from the scientific literature in combination with a large-scale terminological resource, enabling queries for causes of immunological diseases across the most relevant bioinformatics data resources.

Author information

Authors and Affiliations

  1. European Bioinformatics Institute (EBI)

    Samuel Croset, Christoph Grabmüller, Chen Li, Silvestras Kavaliauskas & Dietrich Rebholz-Schuhmann

Corresponding authors

Correspondence to Samuel Croset, Christoph Grabmüller, Silvestras Kavaliauskas or Dietrich Rebholz-Schuhmann.

Rights and permissions

Creative Commons Attribution 3.0 License.

About this article

Cite this article

Croset, S., Grabmüller, C., Li, C. et al. The CALBC RDF Triple store: retrieval over large literature content. Nat Prec (2011). https://doi.org/10.1038/npre.2011.5383.2

  • Received: 18 January 2011

  • Accepted: 18 January 2011

  • Published: 18 January 2011

  • DOI: https://doi.org/10.1038/npre.2011.5383.2

Keywords

  • Triple Store
  • text mining
  • Data integration
  • Semantic Web