Table 1 List of datasets used to curate AqSolDB.

From: AqSolDB, a curated reference set of aqueous solubility and 2D descriptors for a diverse set of compounds

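| Dataset ID | Original Size | Filtered Size | Compound Representations | Solubility Units |
|------------|---------------|---------------|--------------------------|------------------|
| A [14]     | 14,180        | 6,110         | name, CAS                | g/L, mg/L, μg/L  |
| B [15]     | 5,764         | 4,651         | name, CAS                | LogS             |
| C [16]     | 2,603         | 2,603         | name, SMILES             | LogS             |
| D [17]     | 2,267         | 2,115         | name, CAS                | LogS             |
| E [1]      | 1,291         | 1,291         | name, SMILES, CAS        | LogS             |
| F [8]      | 1,210         | 1,210         | SLN                      | LogS             |
| G [2]      | 1,144         | 1,144         | name, SMILES             | LogS             |
| H [8]      | 578           | 578           | SLN                      | LogS             |
| I [20]     | 105           | 94            | name, SMILES, InChI      | μM               |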

  1. Dataset ID: identifier assigned to the dataset during the curation process. Original Size: number of instances in the dataset as collected. Filtered Size: number of instances remaining after preprocessing. Compound Representations: compound representations available in the dataset as collected. Solubility Units: units in which the dataset reports its experimental solubility values.
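
Because the source datasets report solubility in different units, values have to be brought onto one scale before they can be compared. The following is a minimal sketch of such a conversion, not the authors' curation code. It assumes LogS means the base-10 logarithm of the molar solubility in mol/L, and that the molar mass (in g/mol) of each compound is supplied by the caller. The function name `to_log_s` and the lookup tables are hypothetical and introduced only for illustration.

```python
import math

# Hypothetical lookup tables (not from the source): factors that bring each
# unit listed in the table to a common base.
MASS_UNITS_TO_G_PER_L = {"g/L": 1.0, "mg/L": 1e-3, "μg/L": 1e-6}
MOLAR_UNITS_TO_MOL_PER_L = {"μM": 1e-6}


def to_log_s(value, unit, molar_mass=None):
    """Convert a solubility value to LogS, assumed here to be log10(mol/L).

    molar_mass is in g/mol and is required only for mass-based units.
    """
    if unit == "LogS":
        # Already on the target scale.
        return value
    if unit in MOLAR_UNITS_TO_MOL_PER_L:
        mol_per_l = value * MOLAR_UNITS_TO_MOL_PER_L[unit]
    elif unit in MASS_UNITS_TO_G_PER_L:
        if molar_mass is None:
            raise ValueError("molar_mass (g/mol) is required for mass-based units")
        # g/L divided by g/mol gives mol/L.
        mol_per_l = value * MASS_UNITS_TO_G_PER_L[unit] / molar_mass
    else:
        raise ValueError(f"unsupported unit: {unit}")
    if mol_per_l <= 0:
        raise ValueError("solubility must be positive to take a logarithm")
    return math.log10(mol_per_l)
```

For example, `to_log_s(250.0, "mg/L", molar_mass=250.0)` corresponds to 0.001 mol/L. Mass-based units (dataset A) need a molar mass for every compound, whereas molar units (dataset I) need only a scale factor.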