Table 1 Basic characteristics of rare disease test sets

From: A phenotype-based AI pipeline outperforms human experts in differentially diagnosing rare diseases using EHRs

 

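| | Public test set | | | | EHR test set | |
|---|---|---|---|---|---|---|
| Subsets | RAMEDIS | MME | LIRICAL | HMS | PUMCH-L^a | PUMCH-ADM |
| Countries/regions | European | Canada | Multi-Country | Germany | China | China |
| Number of cases included | 375 | 40 | 370 | 88 | 988 | 75 |
| Total (per test set) | 873 | | | | 1063 | |
| Categories of diseases | 63 | 17 | 252 | 39 | 73 | 16 |
| Total (per test set) | 362 | | | | 80 | |
| Age, years: median (average) | 12 days (4.7) | NA | 9 (14.5) | 44 (43.4) | 34 (35.4) | 29 (31.6) |
| Female (%) | 200 (53.3%) | NA | 174 (47.0%) | 54 (61.4%) | 512 (51.8%) | 36 (48%) |
| Number of cases per disease | | | | | | |
| Minimum | 1 | 1 | 1 | 1 | 1 | 3 |
| Median | 2 | 1 | 1 | 1 | 1 | 5 |
| Maximum | 82 | 11 | 19 | 11 | 200 | 8 |
| Number of cases diagnosed | | | | | | |
| OMIM | 375 | 40 | 370 | 69 | 723 | 70 |
| ORPHANET | 375 | 39 | 227 | 81 | 983 | 75 |
| CCRD | 257 | 1 | 6 | 13 | 529 | 75 |
| Number of HPO terms per case | | | | | | |
| Minimum | 3 | 3 | 3 | 5 | 3 | 3 |
| Median | 9 | 10.5 | 11 | 17.5 | 31 | 16 |
| Maximum | 46 | 26 | 95 | 54 | 101 | 47 |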

^a In the PUMCH-L dataset, the HPO terms were extracted by PBTagger. In contrast, in all other PUMCH datasets, HPO terms were manually annotated. The PUMCH-ADM dataset was used for the Human-computer experiment.
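
The per-test-set totals and the female percentages in the table can be cross-checked from the per-subset counts. The snippet below is a minimal sketch of that arithmetic, not part of the original work: the dictionary layout and the helper names `total_cases` and `female_percent` are illustrative assumptions, and the numbers are transcribed from the table as shown (MME reports NA for sex, so it is omitted from the female counts).

```python
# Case counts per subset, transcribed from Table 1.
cases = {
    "RAMEDIS": 375, "MME": 40, "LIRICAL": 370, "HMS": 88,
    "PUMCH-L": 988, "PUMCH-ADM": 75,
}

# Female counts per subset; MME is reported as NA and therefore left out.
females = {
    "RAMEDIS": 200, "LIRICAL": 174, "HMS": 54,
    "PUMCH-L": 512, "PUMCH-ADM": 36,
}

# Grouping of subsets into the two test sets named in the table header.
PUBLIC = ("RAMEDIS", "MME", "LIRICAL", "HMS")
EHR = ("PUMCH-L", "PUMCH-ADM")


def total_cases(subsets):
    """Sum the number of cases over the given subsets (hypothetical helper)."""
    return sum(cases[name] for name in subsets)


def female_percent(name):
    """Female share of one subset in percent, rounded to one decimal (hypothetical helper)."""
    return round(100 * females[name] / cases[name], 1)


if __name__ == "__main__":
    print("Public test set total:", total_cases(PUBLIC))
    print("EHR test set total:", total_cases(EHR))
    for name in females:
        print(name, "female %:", female_percent(name))
```

Under these assumptions the sums come out to the reported totals of 873 and 1063 cases, and the recomputed percentages agree with the table to one decimal place.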