Table 1 Model performance for each experiment

From: Self-supervised identification and elimination of harmful datasets in distributed machine learning for medical image analysis

| Experiment | Revisit cycle | Error | Accuracy | Sensitivity | Specificity | F1-score | AUROC | Best cycle^a |
|---|---|---|---|---|---|---|---|---|
| Pre-trained model | n/a | n/a | 67% | 72% | 63% | 67% | 73% | 10 |
| Clean baseline | | | 75% (71–75) | 75% (71–75) | 75% (70–75) | 73% (69–73) | 82% (79–82) | 26 (23–29) |
| Dirty baseline 1st scenario | | | 54% | 0% | 100% | 0% | 49% | 11 |
| Dirty baseline 2nd scenario | | | 54% | 0% | 100% | 0% | 45% | 11 |
| 1st scenario | 2 | 2% | 73% | 70% | 75% | 70% | 80% | 25 |
| | 2 | 3% | 70% | 75% | 65% | 70% | 78% | 20 |
| | 2 | 4% | 71% | 64% | 77% | 67% | 77% | 22 |
| | 2 | 5% | 71% | 60% | 80% | 66% | 78% | 20 |
| | 5 | 2% | 69% | 65% | 72% | 66% | 73% | 20 |
| | 5 | 3% | 71% | 63% | 78% | 67% | 78% | 20 |
| | 5 | 4% | 71% | 63% | 77% | 67% | 77% | 22 |
| | 5 | 5% | 71% | 64% | 77% | 67% | 77% | 22 |
| 2nd scenario | 2 | 2% | 72% | 73% | 70% | 70% | 78% | 21 |
| | 2 | 3% | 72% | 72% | 72% | 70% | 78% | 26 |
| | 2 | 4% | 55% | 9% | 91% | 16% | 48% | 12 |
| | 2 | 5% | 54% | 0% | 100% | 0% | 47% | 11 |
| | 5 | 2% | 72% | 78% | 67% | 72% | 78% | 26 |
| | 5 | 3% | 70% | 71% | 70% | 69% | 78% | 26 |
| | 5 | 4% | 55% | 9% | 91% | 16% | 48% | 12 |
| | 5 | 5% | 54% | 0% | 100% | 0% | 47% | 11 |
| Real 83 centers | 2 | 2% | 61% | 30% | 87% | 41% | 66% | 15 |
| | 2 | 3% | 72% | 71% | 73% | 71% | 78% | 22 |

^a Best cycle for the ablation studies and the Parkinson's disease classifier is counted on top of the pre-trained model, which was trained for 10 cycles. Thus, best cycle 11 indicates that the best model needed only one additional training cycle, while best cycle 26 indicates that 16 additional cycles were needed beyond the pre-trained model.
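The metrics reported in the table follow the standard binary-classification definitions. As a minimal sketch (not the authors' evaluation code), they can be computed from confusion-matrix counts; the example counts below are hypothetical, chosen only to illustrate the degenerate 0% sensitivity / 100% specificity pattern seen in the dirty baselines:

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true-positive rate (recall)
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true-negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if (precision + sensitivity) else 0.0)
    return accuracy, sensitivity, specificity, f1

# A classifier that labels every case negative (illustrative counts, not from
# the paper) yields 0% sensitivity, 100% specificity, and 0% F1 — the pattern
# the dirty baselines exhibit:
acc, sens, spec, f1 = classification_metrics(tp=0, fp=0, tn=54, fn=46)
```

A collapse to the majority class therefore still reports moderate accuracy while sensitivity and F1 drop to zero, which is why the table lists all four metrics alongside AUROC.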