The CALBC RDF Triple store: retrieval over large literature content

Croset, Samuel; Grabmüller, Christoph; Li, Chen; Kavaliauskas, Silvestras; Rebholz-Schuhmann, Dietrich

doi:10.1038/npre.2011.5383.2

Presentation
Open access
Published: 18 January 2011

SWAT4LS 2010

Nature Precedings (2011)

Abstract

Background

Integrating the scientific literature into a biomedical research infrastructure requires processing the literature, identifying the named entities (NEs) and concepts it contains, and representing the content in a standardised way. Little effort has been spent on integrating content from the literature text into RDF Triple Stores. The CALBC project partners (PPs) have produced a large-scale annotated biomedical corpus covering four semantic groups by harmonising annotations from automatic text mining solutions; the first version is the Silver Standard Corpus I (SSC-I). The four semantic groups are chemical entities and drugs (CHED), genes and proteins (PRGE), diseases and disorders (DISO) and species (SPE). The annotations of the corpus have been transformed into an RDF Triple Store representation so that the content can be queried in combination with bioinformatics data resources (UniProtKb, ArrayExpress) using the RDF query language SPARQL.

Results

All four PPs from the CALBC project contributed annotated data sets for generating the SSC-I. In addition, 12 challenge participants (CPs) provided annotated data sets for evaluation against the SSC-I and for the generation of the SSC-II. The SSC-II contains the following annotations: CHED 238,431, PRGE 435,797, DISO 245,524, and SPE 304,503. The content of the SSC-II has been fully integrated into the RDF Triple Store (4,568,678 triples) and has been aligned with content from the GeneAtlas (182,840 triples), UniProtKb (12,552,239 triples for human) and the lexical resource LexEBI (BioLexicon). The RDF Triple Store enables querying the scientific literature and bioinformatics resources at the same time for evidence of gene-disease links that involve immunological processes. In total, the CALBC RDF Triple Store makes use of 1,224,255 annotations in the corpus (the sum of the four semantic group counts above) to expose links between entities that are supported by evidence in the text. The RDF Triple Store is implemented as a retrieval engine that allows querying for collocations of named entities and for associated relevant information from the bioinformatics data resources (UniProtKb, ArrayExpress).
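
The idea of querying for collocations of named entities can be illustrated with a minimal sketch in plain Python. It is not the CALBC implementation: the real store is queried with SPARQL, whereas this stand-in keeps triples as tuples in a list. The predicate names (`ex:inSentence`, `ex:label`, `ex:group`), the node identifiers and the sample annotations are all hypothetical and invented for illustration; only the four semantic group codes come from the abstract.

```python
from collections import defaultdict
from itertools import product

# Hypothetical annotations: (sentence_id, surface_text, semantic_group).
# Group codes follow the abstract; sentences and entities are invented toy data.
ANNOTATIONS = [
    ("s1", "TNF", "PRGE"),
    ("s1", "rheumatoid arthritis", "DISO"),
    ("s2", "IL6", "PRGE"),
    ("s2", "human", "SPE"),
    ("s3", "IL6", "PRGE"),
    ("s3", "asthma", "DISO"),
    ("s3", "dexamethasone", "CHED"),
]


def to_triples(annotations):
    """Turn each annotation into three (subject, predicate, object) triples."""
    triples = []
    for i, (sentence, text, group) in enumerate(annotations):
        node = f"ex:ann{i}"  # hypothetical identifier for the annotation
        triples.append((node, "ex:inSentence", f"ex:{sentence}"))
        triples.append((node, "ex:label", text))
        triples.append((node, "ex:group", group))
    return triples


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def collocations(triples, group_a, group_b):
    """Pair labels from two semantic groups annotated in the same sentence."""
    label = {s: o for s, _, o in match(triples, p="ex:label")}
    sentence = {s: o for s, _, o in match(triples, p="ex:inSentence")}
    # sentence -> semantic group -> labels annotated in that sentence
    by_sentence = defaultdict(lambda: defaultdict(list))
    for node, _, group in match(triples, p="ex:group"):
        by_sentence[sentence[node]][group].append(label[node])
    pairs = []
    for sent in sorted(by_sentence):
        groups = by_sentence[sent]
        for a, b in product(groups[group_a], groups[group_b]):
            pairs.append((sent, a, b))
    return pairs


TRIPLES = to_triples(ANNOTATIONS)
GENE_DISEASE = collocations(TRIPLES, "PRGE", "DISO")
print(GENE_DISEASE)
```

In SPARQL, the same lookup would be a basic graph pattern in which two annotation nodes share one sentence variable; the join on the sentence is what `collocations` does by hand here.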

Conclusions

The CALBC RDF Triple Store is the first of its kind to expose content extracted from the scientific literature in combination with a large-scale terminological resource, enabling queries for causes of immunological diseases across the most relevant bioinformatics data resources.

Author information

Authors and Affiliations

European Bioinformatics Institute (EBI)
Samuel Croset, Christoph Grabmüller, Chen Li, Silvestras Kavaliauskas & Dietrich Rebholz-Schuhmann

Corresponding authors

Correspondence to Samuel Croset, Christoph Grabmüller, Silvestras Kavaliauskas or Dietrich Rebholz-Schuhmann.

Rights and permissions

Creative Commons Attribution 3.0 License.

About this article

Cite this article

Croset, S., Grabmüller, C., Li, C. et al. The CALBC RDF Triple store: retrieval over large literature content. Nat Prec (2011). https://doi.org/10.1038/npre.2011.5383.2

Received: 18 January 2011
Accepted: 18 January 2011
Published: 18 January 2011
DOI: https://doi.org/10.1038/npre.2011.5383.2