Table 2 Population-level and subject-level inter-rater agreement for clinical MDS-UPDRS III subscores.

Subtest	Number of categories	Kappa (SEM)	95% confidence interval	Agreement	p value	Subject-level agreement, % (a/b/c)
Left hand rest tremor	4	0.68 (0.06)	0.65−0.71	Substantial	<0.00001	96.7/1.6/1.6
Right hand rest tremor	4	0.62 (0.06)	0.58−0.65	Substantial	<0.00001	72.6/27.4/0
Left leg rest tremor	3	0.58 (0.07)	0.54−0.61	Moderate	<0.00001	93.5/6.5/0
Right leg rest tremor	3	0.72 (0.06)	0.69−0.75	Substantial	<0.00001	82.5/17.5/0
Left hand postural tremor	4	0.76 (0.06)	0.73−0.79	Substantial	<0.00001	79.3/20.6/0
Right hand postural tremor	4	0.75 (0.06)	0.72−0.78	Substantial	<0.00001	82.5/17.5/0
Left hand kinetic tremor	3	0.62 (0.07)	0.59−0.66	Substantial	<0.00001	69.8/30.2/0
Right hand kinetic tremor	3	0.45 (0.07)	0.41−0.49	Moderate	<0.00001	61.9/38.1/0
Left fingertap	5	0.54 (0.04)	0.52−0.56	Moderate	<0.00001	50.0/48.4/1.6
Right fingertap	4	0.64 (0.05)	0.61−0.66	Substantial	<0.00001	64.5/35.5/0
Left pronation/supination	5	0.42 (0.05)	0.40−0.44	Moderate	<0.00001	42.9/52.4/4.8
Right pronation/supination	4	0.37 (0.05)	0.34−0.39	Fair	<0.00001	42.9/52.4/4.8
Left leg agility	5	0.55 (0.05)	0.53−0.58	Moderate	<0.00001	57.1/41.3/1.6
Right leg agility	4	0.57 (0.06)	0.54−0.60	Moderate	<0.00001	61.9/38.1/0

Population-level inter-rater agreement for each MDS-UPDRS III subscore was calculated using Fleiss’ kappa, which ranges from −1 to 1: 0 indicates chance-level agreement, 1 indicates complete agreement, and −1 indicates complete disagreement. Kappa values are shown for each subtest with the standard error of the mean (SEM) and the 95% confidence interval. The number of categories is the number of distinct rating categories used across all raters in the sample. All estimates of agreement were highly significant (p < 0.00001). Subject-level agreement was calculated as the percentage of subjects for whom the three blinded rating clinicians (a) agreed completely (all 3 raters gave the same rating), (b) agreed moderately (2 of the 3 raters gave the same rating), or (c) disagreed (all 3 ratings were different); it is reported as a/b/c in the table above. Complete disagreement was rare (<5% for every subtest).
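
As an illustration of the two measures described in this note, the following minimal Python sketch computes Fleiss' kappa and the a/b/c subject-level percentages from a list of per-subject ratings. It is not the authors' analysis code: the software they used is not stated, the function names `fleiss_kappa` and `subject_level_agreement` are hypothetical, the toy ratings are invented for demonstration, and the SEM, confidence interval and p value are not computed here.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of subjects, each a list of one score per rater.

    Assumes every subject was scored by the same number of raters and that
    at least two distinct categories occur (otherwise 1 - p_e is zero).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    # Categories are those actually used in the sample, across all raters.
    categories = sorted({score for row in ratings for score in row})
    # counts[i][j] = number of raters who assigned subject i to category j
    counts = [[row.count(c) for c in categories] for row in ratings]

    # Observed agreement: proportion of agreeing rater pairs, per subject.
    p_i = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall proportion of ratings per category.
    total = n_subjects * n_raters
    p_j = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_e = sum(p * p for p in p_j)

    return (p_bar - p_e) / (1 - p_e)


def subject_level_agreement(ratings):
    """Return (a, b, c) percentages for exactly three raters per subject.

    a: all 3 ratings equal, b: exactly 2 equal, c: all 3 different.
    """
    tally = Counter(len(set(row)) for row in ratings)
    n = len(ratings)
    return tuple(100.0 * tally[k] / n for k in (1, 2, 3))


# Invented toy data: 4 subjects, 3 raters, scores 0-2 (not from the study).
toy = [[0, 0, 0], [1, 1, 1], [0, 0, 1], [0, 1, 2]]
print(round(fleiss_kappa(toy), 3))   # 0.268
print(subject_level_agreement(toy))  # (50.0, 25.0, 25.0)
```

In the toy data two subjects show complete agreement, one shows two matching ratings and one shows three different ratings, giving 50/25/25 in the a/b/c notation used in the table.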
