Table 1 Comparison of LeMeDISCO’s J-score with the XD-score, NG, SAB score and Symptom Similarity Score for correlations with comorbidity quantified by the log(RR) score, φ-score, and recall^a.

From: LeMeDISCO is a computational method for large-scale prediction & molecular interpretation of disease comorbidity

 

| Dataset / method | Log(RR) score | φ-score | Recall | Precision | AUROC | AUPRC |
|---|---|---|---|---|---|---|
| 191,966 disease pairs^b | | | | | | |
| LeMeDISCO | 0.116 (0.0) | 0.090 (0.0) | 37.1% | 77.2% | 0.528 | 0.780 |
| Permute drug–protein [p value] | 0.050 ± 0.004 (0.0) [1.3 × 10^−54] | 0.060 ± 0.004 (0.0) [2.1 × 10^−12] | 8.8 ± 0.7% [2.0 × 10^−315] | 74.7 ± 1.3% [0.027] | 0.495 ± 0.006 [3.5 × 10^−9] | 0.755 ± 0.004 [4.9 × 10^−11] |
| Permute drug–disease [p value] | 0.0026 ± 0.0056 (0.24) [4.0 × 10^−92] | 0.0029 ± 0.0075 (0.19) [1.2 × 10^−31] | 0.0137 ± 0.0828% [0.0] | 54.3 ± 46.5% [0.31] | 0.500 ± 0.001 [1.7 × 10^−112] | 0.757 ± 0.0006 [2.3 × 10^−294] |
| 29,658 pairs^c | | | | | | |
| LeMeDISCO | 0.146 (0.0) | 0.106 (0.0) | 44.5% | 80.6% | 0.531 | 0.812 |
| XD-score (ref. 8) | 0.042 (2.8 × 10^−13) | 0.071 (9.7 × 10^−35) | 6.4% | 77.8% | 0.510 | 0.801 |
| NG^d | 0.0047 (0.42) | 0.053 (6.6 × 10^−20) | | | | |
| 943 disease pairs^e | | | | | | |
| LeMeDISCO | 0.0986 (0.0024) | 0.0886 (0.0065) | 68.6% | 77.7% | 0.529 | 0.798 |
| SAB score (ref. 7) | −0.0620 (0.057) | −0.0413 (0.205) | 8.0% | 85.3% | 0.434 | 0.761 |
| 2621 disease pairs^f | | | | | | |
| LeMeDISCO | 0.140 (5.2 × 10^−13) | 0.135 (3.8 × 10^−12) | 63.7% | 79.3% | 0.512 | 0.814 |
| Symptom similarity (ref. 6) | 0.322 (0.0) | 0.194 (1.4 × 10^−23) | 100% | 79.6% | 0.587 | 0.856 |

  1. ^a Numbers in parentheses “()” are the p values of the corresponding correlation. Bold indicates the best results for the given dataset. For the permutations of drug–protein and drug–disease relationships, the average ± standard deviation of 100 runs with different random seeds is given; the number in brackets “[]” is the p value converted from the z-score = (LeMeDISCO value − average)/standard deviation, which characterizes the statistical significance of the difference between LeMeDISCO and the permutation tests.
  2. ^b Mapping the DOID IDs from the human DO database to the ICD-9 IDs of ref. 1 gives a set of 191,966 disease pairs.
  3. ^c Mapping the ICD-9 disease codes to our DOIDs of DO gave a consensus subset of 29,658 disease pairs from Table 1’s dataset of 97,665 disease pairs in ref. 8.
  4. ^d NG is the number of shared genes between disease pairs in ref. 8.
  5. ^e Consensus set of 943 disease pairs from the dataset of ref. 7 and our dataset of 191,966.
  6. ^f A consensus dataset of 2621 disease pairs was obtained by comparing Supplementary dataset 4 of ref. 6 with our set of 191,966 pairs.
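
The z-score conversion described in the first footnote can be illustrated with a minimal Python sketch. The function names are hypothetical, and the one-sided (upper-tail) normal p value is an assumption: the footnote does not state the tail convention, although this choice is consistent with the bracketed precision entries when recomputed from the rounded table values. Entries with very large z-scores will not be reproduced exactly from the rounded averages and standard deviations.

```python
import math


def permutation_z_score(observed: float, perm_mean: float, perm_sd: float) -> float:
    """z = (LeMeDISCO value - average of permuted runs) / standard deviation."""
    if perm_sd <= 0:
        raise ValueError("perm_sd must be positive")
    return (observed - perm_mean) / perm_sd


def upper_tail_p_value(z: float) -> float:
    """One-sided p value P(Z >= z) for a standard normal Z.

    The upper-tail convention is an assumption, not stated in the footnote.
    """
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Precision on the 191,966-pair set:
# LeMeDISCO 77.2% vs. drug-protein permutation 74.7 +/- 1.3%.
z = permutation_z_score(77.2, 74.7, 1.3)
p = upper_tail_p_value(z)
print(f"z = {z:.2f}, p = {p:.3f}")  # z = 1.92, p = 0.027
```

Applying the same two functions to the drug–disease permutation row for precision (54.3 ± 46.5%) gives a p value of about 0.31, matching the bracketed entry in that row.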