Table 1. Datasets in the benchmark. \(N_S\) and \(N_P\) are the numbers of drugs and diseases, respectively, involved in at least one nonzero drug-disease association; \(F_S\) and \(F_P\) are the numbers of drug and disease features. The sparsity s is the number of unknown (neither positive nor negative) drug-disease matches times 100, divided by the total number of possible drug-disease matches (rounded to one decimal place). The imbalance ratio IR is the number of negative associations divided by the number of positive associations in the dataset, times 100 (rounded to two decimal places). The private version of PREDICT is the one generated from the notebooks in the original GitHub repository, whereas the public version is the one deposited on Zenodo (ref. 14). The association matrix of the Fdataset is taken from ref. 34, whereas its drug and disease features are taken from ref. 33.
From: Comprehensive evaluation of pure and hybrid collaborative filtering in drug repurposing
| Type | Dataset | Paper | \(N_S\) | \(F_S\) | \(N_P\) | \(F_P\) | #Positive | #Negative | s (%) | IR (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Text-mining | Cdataset | | 663 | 663 | 409 | 409 | 2,532 | 0 | 99.1 | 0 |
| | Fdataset | | 593 | 593 | 313 | 313 | 1,933 | 0 | 99.0 | 0 |
| | DNdataset | | 550 | 1,490 | 360 | 4,516 | 1,008 | 0 | 99.5 | 0 |
| Biological | Gottlieb | | 593 | 1,779 | 313 | 313 | 1,933 | 0 | 99.0 | 0 |
| | LRSSL | | 763 | 2,049 | 681 | 681 | 3,051 | 0 | 99.4 | 0 |
| | PREDICT (private) | | 1,351 | 6,265 | 1,066 | 2,914 | 5,624 | 152 | 99.6 | 2.70 |
| | PREDICT (public) | | 1,014 | 1,642 | 941 | 1,490 | 4,627 | 132 | 99.5 | 2.85 |
| | TRANSCRIPT | | 204 | 12,096 | 116 | 12,096 | 401 | 11 | 98.3 | 2.74 |
| Artificial | Synthetic | | 300 | 25 | 300 | 25 | 200 | 100 | 99.7 | 50 |
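The sparsity and imbalance ratio defined in the caption can be recomputed directly from the counts in the table. The snippet below is a minimal sketch for checking them. It assumes that the total number of possible matches is \(N_S \times N_P\) and that IR is reported as a percentage, as the IR (%) column suggests. The helper names `sparsity_percent` and `imbalance_ratio_percent` are hypothetical and do not come from any released package.

```python
def sparsity_percent(n_drugs: int, n_diseases: int, n_pos: int, n_neg: int) -> float:
    """Percentage of unknown (neither positive nor negative) drug-disease pairs.

    Hypothetical helper; assumes all N_S x N_P pairs are possible matches.
    """
    total = n_drugs * n_diseases
    unknown = total - (n_pos + n_neg)
    return 100.0 * unknown / total


def imbalance_ratio_percent(n_pos: int, n_neg: int) -> float:
    """Number of negative associations over positive ones, times 100 (hypothetical helper)."""
    return 100.0 * n_neg / n_pos


# (dataset, N_S, N_P, #positive, #negative), counts copied from Table 1
rows = [
    ("Cdataset", 663, 409, 2532, 0),
    ("LRSSL", 763, 681, 3051, 0),
    ("PREDICT", 1351, 1066, 5624, 152),
    ("PREDICT", 1014, 941, 4627, 132),
    ("TRANSCRIPT", 204, 116, 401, 11),
    ("Synthetic", 300, 300, 200, 100),
]

for name, n_s, n_p, pos, neg in rows:
    s = sparsity_percent(n_s, n_p, pos, neg)
    ir = imbalance_ratio_percent(pos, neg)
    # Ordinary rounding: one decimal place for s, two for IR
    print(f"{name} ({n_s} x {n_p}): s = {s:.1f}%, IR = {ir:.2f}%")
```

For these rows, ordinary rounding reproduces the values listed in Table 1. For instance, LRSSL gives s ≈ 99.41%, which appears as 99.4, and TRANSCRIPT gives IR ≈ 2.743%, which appears as 2.74.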