Table 2. Hyperparameter tuning of transformer models on the validation dataset and evaluation of generalizability on the test dataset. The final model was selected by highest sensitivity on the validation dataset (bold). * indicates the mean metric across the ensembled models rather than the voting-ensemble metric. AUROC refers to the model's "Area Under the Receiver Operating Characteristic" curve metric, LR refers to the model's "Learning Rate", and PPV refers to the model's "Positive Predictive Value". The Pr@50Re metric reports a model's precision when it achieves 50% recall, while the Sp@50Se metric reports a model's specificity when it reaches 50% sensitivity.
| Experiment | Exp. Param. | Sensitivity | Specificity | Accuracy | AUROC | PPV | NPV | F1 Score |
|---|---|---|---|---|---|---|---|---|
| Ensembled Models | Grouped Codes | 60.10% | 61.90% | 61.90% | 65.50%* | 2.50% | N/A | 4.70% |
| | All Codes | 61.80% | 60.80% | 60.80% | 66.10%* | 2.50% | N/A | 4.70% |
| | Pretrained | 68.10% | 61.40% | 61.50% | 70.90%* | 2.70% | N/A | 5.30% |
| | Pretrained w/ LR decay | 72.00% | 57.90% | 58.10% | 71.20%* | 2.70% | N/A | 5.10% |
| | Pretrained w/ LR decay, batch size = 64 | 72.30% | 58.00% | 58.20% | 71.10%* | 2.70% | 99.24% | 5.20% |
| Final Model; Test Dataset | Pretrained w/ LR decay, batch size = 64 | 70.90% | 56.90% | 57.10% | 69.60%* | 2.40% | 99.22% | 4.70% |
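The standard definitions behind the table's columns can be sketched from confusion-matrix counts as follows; the counts in the usage example are purely illustrative and are not taken from the study.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the metrics reported in Table 2 from confusion-matrix counts.

    tp/fp/tn/fn are true-positive, false-positive, true-negative, and
    false-negative counts. AUROC is omitted, as it requires per-example
    scores rather than a single thresholded confusion matrix.
    """
    sensitivity = tp / (tp + fn)              # recall / true positive rate
    specificity = tn / (tn + fp)              # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    ppv = tp / (tp + fp)                      # positive predictive value (precision)
    npv = tn / (tn + fn)                      # negative predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Illustrative counts for a rare-positive task (hypothetical, not study data):
# with few positives, PPV and F1 stay low even when sensitivity is high.
metrics = classification_metrics(tp=70, fp=2830, tn=5000, fn=30)
```

With such heavily imbalanced counts, sensitivity is 70% while PPV falls below 3%, mirroring the pattern of high sensitivity and low PPV seen across the table's rows.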