Table 2 Hyperparameter tuning of transformer models on the validation dataset and evaluation of generalizability on the test datasets. The final model was selected by highest sensitivity on the validation dataset (bold). * indicates the mean metric of the ensembled models rather than the voting ensemble metric. AUROC refers to the model's area under the receiver operating characteristic curve, LR to the learning rate, PPV to the positive predictive value, and NPV to the negative predictive value. The Pr@50Re metric reports the precision of a model when it achieves 50% recall, while the Sp@50Se metric reports the specificity of a model when it reaches 50% sensitivity.

From: Transformer-based deep learning ensemble framework predicts autism spectrum disorder using health administrative and birth registry data

| Experiment | Exp. Param. | Sensitivity | Specificity | Accuracy | AUROC | PPV | NPV | F1 Score |
|---|---|---|---|---|---|---|---|---|
| Ensembled Models | Grouped Codes | 60.10% | 61.90% | 61.90% | 65.50%* | 2.50% | N/A | 4.70% |
| Ensembled Models | All Codes | 61.80% | 60.80% | 60.80% | 66.10%* | 2.50% | N/A | 4.70% |
| Ensembled Models | Pretrained | 68.10% | 61.40% | 61.50% | 70.90%* | 2.70% | N/A | 5.30% |
| Ensembled Models | Pretrained w/ LR decay | 72.00% | 57.90% | 58.10% | 71.20%* | 2.70% | N/A | 5.10% |
| Ensembled Models | Pretrained w/ LR decay, batch size = 64 | 72.30% | 58.00% | 58.20% | 71.10%* | 2.70% | 99.24% | 5.20% |
| Final Model; Test Dataset | Pretrained w/ LR decay, batch size = 64 | 70.90% | 56.90% | 57.10% | 69.60%* | 2.40% | 99.22% | 4.70% |

  1. Validation Dataset: Pr@50Re: 3.2%; Sp@50Se: 76.2%.
  2. Test Dataset: Pr@50Re: 2.8%; Sp@50Se: 73.5%.
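
The Pr@50Re and Sp@50Se figures above are operating-point metrics: the decision threshold is lowered until recall (sensitivity) reaches 50%, and precision and specificity are read off at that point. The source does not show how these were computed, so the following is only a minimal sketch under stated assumptions: the function name `metrics_at_recall`, the tie-handling rule, the choice of the first threshold at which recall is at least 50%, and the toy labels and scores are all hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def metrics_at_recall(
    y_true: Sequence[int],
    y_score: Sequence[float],
    target_recall: float = 0.5,
) -> Tuple[float, float]:
    """Return (precision, specificity) at the highest score threshold
    whose recall (sensitivity) is at least target_recall.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Sweep thresholds from the highest score downwards.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Admit every example tied at this score before evaluating.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if tp / n_pos >= target_recall:
            precision = tp / (tp + fp)
            specificity = (n_neg - fp) / n_neg
            return precision, specificity
    raise ValueError("target recall is not reachable (must be at most 1.0)")


# Toy data (made up): 4 positives and 6 negatives, already sorted by score.
toy_labels = [1, 0, 1, 0, 0, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
pr_at_50re, sp_at_50se = metrics_at_recall(toy_labels, toy_scores)
```

On the toy data, recall first reaches 50% at the third-ranked example (2 true positives, 1 false positive), giving a precision of 2/3 and a specificity of 5/6. With a rare outcome, as in the table above, precision at the same operating point is much lower even when specificity remains high, because false positives are drawn from a far larger negative class.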