Table 2 Population-level and subject-level inter-rater agreement for clinical MDS-UPDRS III subscores.

From: The CloudUPDRS smartphone software in Parkinson’s study: cross-validation against blinded human raters

| Subtest | Number of categories | Kappa (SEM) | 95% confidence interval | Agreement | p value | Subject-level agreement, % (a/b/c) |
|---|---|---|---|---|---|---|
| Left hand rest tremor | 4 | 0.68 (0.06) | 0.65–0.71 | Substantial | <0.00001 | 96.7/1.6/1.6 |
| Right hand rest tremor | 4 | 0.62 (0.06) | 0.58–0.65 | Substantial | <0.00001 | 72.6/27.4/0 |
| Left leg rest tremor | 3 | 0.58 (0.07) | 0.54–0.61 | Moderate | <0.00001 | 93.5/6.45/0 |
| Right leg rest tremor | 3 | 0.72 (0.06) | 0.69–0.75 | Substantial | <0.00001 | 82.5/17.5/0 |
| Left hand postural tremor | 4 | 0.76 (0.06) | 0.73–0.79 | Substantial | <0.00001 | 79.3/20.6/0 |
| Right hand postural tremor | 4 | 0.75 (0.06) | 0.72–0.78 | Substantial | <0.00001 | 82.5/17.5/0 |
| Left hand kinetic tremor | 3 | 0.62 (0.07) | 0.59–0.66 | Substantial | <0.00001 | 69.8/30.2/0 |
| Right hand kinetic tremor | 3 | 0.45 (0.07) | 0.41–0.49 | Moderate | <0.00001 | 61.9/38.1/0 |
| Left fingertap | 5 | 0.54 (0.04) | 0.52–0.56 | Moderate | <0.00001 | 50.0/48.4/1.6 |
| Right fingertap | 4 | 0.64 (0.05) | 0.61–0.66 | Substantial | <0.00001 | 64.5/35.5/0 |
| Left pronation/supination | 5 | 0.42 (0.05) | 0.40–0.44 | Moderate | <0.00001 | 42.9/52.4/4.8 |
| Right pronation/supination | 4 | 0.37 (0.05) | 0.34–0.39 | Fair | <0.00001 | 42.9/52.4/4.8 |
| Left leg agility | 5 | 0.55 (0.05) | 0.53–0.58 | Moderate | <0.00001 | 57.1/41.3/1.6 |
| Right leg agility | 4 | 0.57 (0.06) | 0.54–0.60 | Moderate | <0.00001 | 61.9/38.1/0 |

  1. Population-level inter-rater agreement for each MDS-UPDRS III subscore was calculated using Fleiss’ kappa, which ranges from −1 to 1: 0 indicates chance-level agreement, 1 indicates complete agreement, and −1 indicates complete disagreement. Kappa values are shown for each subtest with the standard error of the mean (SEM) and the 95% confidence interval. The number of categories is the number of score categories available in the sample across all raters. All estimates of agreement were highly significant (p < 0.00001). Subject-level agreement was calculated as the percentage of subjects for whom the blinded rating clinicians (a) agreed completely (all 3 raters agreed), (b) agreed moderately (2 raters agreed) or (c) disagreed (all 3 ratings were different); it is shown as a/b/c in the table above. Complete disagreement was rare (<5% in all subitems).
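The two agreement measures described in the footnote can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code. It uses the standard Fleiss' kappa formula and the a/b/c definition given above. The function names, the input layout (one list of three scores per subject) and the example ratings are all hypothetical. The `agreement_label` helper assumes the Landis and Koch bands, which the table's "Fair", "Moderate" and "Substantial" labels appear consistent with, although the source does not name the scale.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for `ratings`: one list of scores per subject,
    with the same number of raters for every subject."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][j] = number of raters who gave subject i the j-th category.
    counts = [[Counter(r)[c] for c in categories] for r in ratings]
    # Observed agreement for each subject, then averaged over subjects.
    p_i = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects
    # Chance agreement from the overall proportion of each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(len(categories))
    ]
    p_e = sum(p * p for p in p_j)
    # Undefined (division by zero) if every rating falls in one category.
    return (p_bar - p_e) / (1 - p_e)


def subject_level_agreement(ratings):
    """Percentages (a, b, c) for three raters: all agree, two agree,
    all three differ."""
    n_subjects = len(ratings)
    distinct = Counter(len(set(r)) for r in ratings)
    return tuple(100.0 * distinct[k] / n_subjects for k in (1, 2, 3))


def agreement_label(kappa):
    """Verbal label, assuming the Landis and Koch bands (an assumption)."""
    if kappa < 0:
        return "Poor"
    for upper, label in [(0.20, "Slight"), (0.40, "Fair"),
                         (0.60, "Moderate"), (0.80, "Substantial")]:
        if kappa <= upper:
            return label
    return "Almost perfect"


# Hypothetical ratings: 4 subjects, 3 raters, scores 0 or 1.
example = [[0, 0, 0], [1, 1, 1], [0, 0, 1], [1, 1, 0]]
print(round(fleiss_kappa(example, categories=[0, 1]), 3))  # 0.333
print(subject_level_agreement(example))  # (50.0, 50.0, 0.0)
print(agreement_label(0.68))  # Substantial
```

For real analyses, an established implementation such as `statsmodels.stats.inter_rater.fleiss_kappa` is likely a better choice than hand-rolled code, since it is tested and works with aggregated count tables.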