Table 5 Key hyperparameters for dimensionality reduction techniques and classifiers.
| Component | Hyperparameters and configuration |
|---|---|
| Dimensionality reduction | |
| PCA | n_components=128 |
| t-SNE | n_components=128, perplexity=30, early_exaggeration=12, random_state=42 |
| UMAP | n_components=128, n_neighbors=15, min_dist=0.1, random_state=42 |
| Traditional classifiers | |
| KNN | n_neighbors=5, weights='uniform' |
| Logistic Regression | penalty='l2', C=1.0, solver='lbfgs', max_iter=1000 |
| Naïve Bayes | MultinomialNB: alpha=1.0 on TF-IDF features; GaussianNB: var_smoothing=\(10^{-9}\) on reduced features |
| Random Forest | n_estimators=100, criterion='gini', max_depth=None |
| XGBoost | n_estimators=100, learning_rate=0.1, max_depth=6 |
| MLP | hidden_layer_sizes=(128, 64), activation='relu', solver='adam', alpha=0.0001 |
| Deep learning | |
| ClinicalBERT | Fine-tuned for 3 epochs, learning rate \(\eta = 2 \times 10^{-5}\), max sequence length = 512 |
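The scikit-learn components in Table 5 can be instantiated directly from these settings. The sketch below assumes the standard scikit-learn implementations; t-SNE/UMAP (umap-learn), XGBoost (xgboost), and ClinicalBERT fine-tuning (transformers) require additional third-party packages and are omitted here.

```python
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

# Dimensionality reduction: PCA down to 128 components
pca = PCA(n_components=128)

# Traditional classifiers, configured per Table 5
classifiers = {
    "KNN": KNeighborsClassifier(n_neighbors=5, weights="uniform"),
    "LogReg": LogisticRegression(penalty="l2", C=1.0, solver="lbfgs",
                                 max_iter=1000),
    # MultinomialNB is applied to non-negative TF-IDF features;
    # GaussianNB to the (possibly negative) reduced features.
    "MultinomialNB": MultinomialNB(alpha=1.0),
    "GaussianNB": GaussianNB(var_smoothing=1e-9),
    "RandomForest": RandomForestClassifier(n_estimators=100,
                                           criterion="gini",
                                           max_depth=None),
    "MLP": MLPClassifier(hidden_layer_sizes=(128, 64), activation="relu",
                         solver="adam", alpha=0.0001),
}
```

Each estimator can then be fitted on the PCA-reduced features (or raw TF-IDF vectors for MultinomialNB) with the usual `fit`/`predict` interface.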