Table 1 Main characteristics of PGxCorpus in comparison with related corpora.

From: PGxCorpus, a manually annotated corpus for pharmacogenomics

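| Corpus name | Subcorpus | Corpus size: sent. | Corpus size: rel. | Sent. w/o rel.: % | Sent. w/o rel.: nb. | Key entities: Dru. | Key entities: Gen. | Key entities: Phe. | #Ent. types | #Rel. types | #Mod. | Nested entities | Discont. entities |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SNPPhenA | | 483 | 1,300 | 0 | 0 | | | | 2 | 1 | 5 | | |
| EU-ADR | drug-disease | 244 | 176 | 0 | 0 | | | | 2 | 1 | 3 | | |
| EU-ADR | drug-target | 247 | 310 | 0 | 0 | | | | 3 | 1 | 3 | | |
| EU-ADR | target-disease | 355 | 262 | 0 | 0 | | | | 3 | 1 | 3 | | |
| SemEval DDI | DrugBank | 5,675 | 3,805 | 65.9 | 3,739 | | | | 4 | 4 | 1 | | |
| SemEval DDI | MEDLINE | 1,301 | 232 | 87.1 | 1,133 | | | | 4 | 4 | 1 | | |
| ADE-EXT | | 5,939 | 6,701 | 28.9 | 1,719 | | | | 2 | 1 | 1 | | |
| PGxCorpus | | 945 | 2,871 | 2.7 | 26 | | | | 10 | 7 | 4 | | |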

  1. Corpus sizes are reported in terms of the number of sentences (sent.) and annotated relationships (rel.). The number of sentences without any annotated relation (Sent. w/o rel.) is reported both as a percentage (%) and as an absolute number of sentences (nb.). The presence of the PGx key entities, i.e. drugs (Dru.), genetic factors (Gen.) and phenotypes (Phe.), is reported under the Key entities column. The overall numbers of entity types and relation types used in the annotations are reported as #Ent. types and #Rel. types, respectively. #Mod. refers to the number of modalities available for the annotation of relations (e.g. positive, hypothetical, negative).
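
The two "Sent. w/o rel." columns are linked by simple arithmetic: the percentage is the absolute number of sentences without a relation divided by the corpus size in sentences. The following is a minimal sketch of that check, not part of the original article. The counts are copied from the table; the names `CORPORA` and `pct_without_relation` are hypothetical and introduced only for illustration.

```python
# Sentence counts copied from Table 1: (total sentences, sentences without relation).
# The dictionary and function names are illustrative assumptions, not from the paper.
CORPORA = {
    "SemEval DDI (DrugBank)": (5675, 3739),
    "SemEval DDI (MEDLINE)": (1301, 1133),
    "ADE-EXT": (5939, 1719),
    "PGxCorpus": (945, 26),
}


def pct_without_relation(n_sentences: int, n_without_relation: int) -> float:
    """Return the share of sentences without any annotated relation, in percent."""
    return 100.0 * n_without_relation / n_sentences


if __name__ == "__main__":
    for name, (sent, nb) in CORPORA.items():
        print(f"{name}: {pct_without_relation(sent, nb):.2f}%")
```

Under these assumptions, the computed values agree with the percentages reported in the table to within about 0.1 percentage point; small differences can come from how the published figures were rounded.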