Table 1 General statistics of BioWiC [35] resources.

From: A Dataset for Evaluating Contextualized Representation of Biomedical Concepts in Language Models

| Dataset | Ontologies | Semantic types | Documents | Sentences | Mentions |
|---|---|---|---|---|---|
| Medmentions | UMLS | 21 UMLS types | 4,392 | 44,903 | 203,282 |
| BC5CDR | MeSH | Disease, Chemical | 1,500 | 11,562 | 13,343 |
| NCBI Disease | MeSH, OMIM | Disease | 792 | 3,891 | 6,892 |

  1. The sentence count in each source is determined using the PySBD library [39], version 0.3.4.
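
The counts in Table 1 can be turned into simple density figures (sentences per document, mentions per sentence) that make the three resources easier to compare. The following is a minimal sketch rather than part of the original work: the names RESOURCES, density, STATS and TOTAL_MENTIONS are hypothetical, and the only inputs are the numbers copied from the table.

```python
# Counts copied from Table 1 as (documents, sentences, mentions).
# All names in this sketch are illustrative and not taken from the paper.
RESOURCES = {
    "Medmentions": (4392, 44903, 203282),
    "BC5CDR": (1500, 11562, 13343),
    "NCBI Disease": (792, 3891, 6892),
}


def density(documents, sentences, mentions):
    """Return per-document and per-sentence averages for one resource."""
    return {
        "sentences_per_document": sentences / documents,
        "mentions_per_sentence": mentions / sentences,
    }


# Derived figures for every resource, keyed by dataset name.
STATS = {name: density(*counts) for name, counts in RESOURCES.items()}

# Total number of annotated mentions across the three resources.
TOTAL_MENTIONS = sum(counts[2] for counts in RESOURCES.values())

if __name__ == "__main__":
    for name, stats in STATS.items():
        print(
            f"{name}: "
            f"{stats['sentences_per_document']:.2f} sentences/document, "
            f"{stats['mentions_per_sentence']:.2f} mentions/sentence"
        )
    print(f"Total mentions: {TOTAL_MENTIONS}")
```

Because the sentence counts depend on the segmenter named in the footnote, the per-sentence figures would shift if a different sentence splitter were used.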