Table 1 Main characteristics of PGxCorpus in comparison with related corpora.

Corpus name	Subcorpus	Corpus size		Sent. w/o rel.		Key entities			#Ent. types	#Rel. types	#Mod.	Nested entities	Discont. entities
Corpus name	Subcorpus	sent.	rel.	%	nb.	Dru.	Gen.	Phe.	#Ent. types	#Rel. types	#Mod.	Nested entities	Discont. entities
SNPPhenA	—	483	1,300	0	0		✓	✓	2	1	5
EU-ADR	drug-disease	244	176	0	0	✓		✓	2	1	3
	drug-target	247	310	0	0	✓	✓		3	1	3
	target-disease	355	262	0	0		✓	✓	3	1	3
SemEval DDI	DrugBank	5,675	3,805	65.9	3,739	✓			4	4	1
	MEDLINE	1,301	232	87.1	1,133	✓			4	4	1
ADE-EXT	—	5,939	6,701	28.9	1,719	✓		✓	2	1	1
PGxCorpus	—	945	2,871	2.7	26	✓	✓	✓	10	7	4	✓	✓

Corpus sizes are reported in terms of the number of sentences (sent.) and of annotated relationships (rel.). The number of sentences without any annotated relation (Sent. w/o rel.) is reported both as a percentage (%) and as an absolute number of sentences (nb.). The presence of the key PGx entities, i.e. drugs (Dru.), genetic factors (Gen.) and phenotypes (Phe.), is indicated under the Key entities column. The numbers of entity and relation types used in the annotations are reported as #Ent. types and #Rel. types, respectively. #Mod. refers to the number of modalities used to annotate relations (e.g. positive, hypothetical, negative). A check mark (✓) under Nested entities or Discont. entities indicates that the corpus includes entities of that kind.
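The percentage in the Sent. w/o rel. columns is the absolute number (nb.) divided by the sentence count (sent.), multiplied by 100. As a minimal sketch, the Python snippet below recomputes it from the counts in Table 1; the dictionary `SENTENCE_COUNTS` and the function `percent_without_relation` are illustrative names assumed here, not part of any PGxCorpus release.

```python
# Sentence counts copied from Table 1:
# (corpus, subcorpus) -> (total sentences, sentences without any annotated relation).
# The names used here are hypothetical and only serve this illustration.
SENTENCE_COUNTS = {
    ("SNPPhenA", "-"): (483, 0),
    ("EU-ADR", "drug-disease"): (244, 0),
    ("EU-ADR", "drug-target"): (247, 0),
    ("EU-ADR", "target-disease"): (355, 0),
    ("SemEval DDI", "DrugBank"): (5675, 3739),
    ("SemEval DDI", "MEDLINE"): (1301, 1133),
    ("ADE-EXT", "-"): (5939, 1719),
    ("PGxCorpus", "-"): (945, 26),
}


def percent_without_relation(total_sentences: int, without_relation: int) -> float:
    """Return the share of sentences with no annotated relation, as a percentage."""
    if total_sentences <= 0:
        raise ValueError("total_sentences must be positive")
    return 100.0 * without_relation / total_sentences


if __name__ == "__main__":
    # Print the recomputed percentage for every corpus/subcorpus row.
    for (corpus, subcorpus), (sent, nb) in SENTENCE_COUNTS.items():
        print(f"{corpus:12} {subcorpus:15} {percent_without_relation(sent, nb):6.2f}%")
```

The recomputed values agree with the reported percentages to within 0.1 percentage point; for PGxCorpus, 26/945 gives about 2.75%.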
