Table 5 Key hyperparameters for dimensionality reduction techniques and classifiers.

From: An empirical evaluation of dimensionality reduction and class balancing for medical text classification

| Component | Hyperparameters and configuration |
|---|---|
| **Dimensionality reduction** | |
| PCA | n_components=128 |
| t-SNE | n_components=128, perplexity=30, early_exaggeration=12, random_state=42 |
| UMAP | n_components=128, n_neighbors=15, min_dist=0.1, random_state=42 |
| **Traditional classifiers** | |
| KNN | n_neighbors=5, weights='uniform' |
| Logistic Regression | penalty='l2', C=1.0, solver='lbfgs', max_iter=1000 |
| Naïve Bayes | MultinomialNB with alpha=1.0 on TF-IDF features; GaussianNB with var_smoothing=1e-9 on reduced features |
| Random Forest | n_estimators=100, criterion='gini', max_depth=None |
| XGBoost | n_estimators=100, learning_rate=0.1, max_depth=6 |
| MLP | hidden_layer_sizes=(128, 64), activation='relu', solver='adam', alpha=0.0001 |
| **Deep learning** | |
| ClinicalBERT | Fine-tuned for 3 epochs, learning rate = 2e-5, max sequence length = 512 |
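The dimensionality-reduction rows map directly onto scikit-learn and umap-learn constructors. A minimal sketch is below; the input matrix `X` is a placeholder standing in for the document feature vectors, which are not specified here. Note that scikit-learn's default Barnes-Hut t-SNE only supports up to 3 output dimensions, so reaching n_components=128 requires the exact method.

```python
# Minimal sketch of the Table 5 dimensionality-reduction configurations.
# X is a placeholder; in the paper it would be the document feature matrix.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import umap

X = np.random.rand(500, 768)  # dummy data standing in for document vectors

# PCA: project to 128 components.
X_pca = PCA(n_components=128).fit_transform(X)

# t-SNE: Barnes-Hut only supports n_components <= 3, so the exact
# method is needed (an assumption) to produce a 128-dimensional embedding.
X_tsne = TSNE(n_components=128, perplexity=30, early_exaggeration=12,
              random_state=42, method="exact").fit_transform(X)

# UMAP with the listed neighborhood and spacing settings.
X_umap = umap.UMAP(n_components=128, n_neighbors=15, min_dist=0.1,
                   random_state=42).fit_transform(X)
```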
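The traditional classifiers likewise correspond to standard scikit-learn and xgboost estimators. The sketch below instantiates each with the hyperparameters from the table; the split between the two Naïve Bayes variants follows the table's note that MultinomialNB (which requires non-negative inputs) runs on TF-IDF features while GaussianNB runs on the reduced features.

```python
# Minimal sketch of the Table 5 traditional-classifier configurations.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB, GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

classifiers = {
    "KNN": KNeighborsClassifier(n_neighbors=5, weights="uniform"),
    "Logistic Regression": LogisticRegression(penalty="l2", C=1.0,
                                              solver="lbfgs", max_iter=1000),
    # MultinomialNB needs non-negative counts, so it is fit on TF-IDF;
    # GaussianNB handles the (possibly negative) reduced features.
    "MultinomialNB (TF-IDF)": MultinomialNB(alpha=1.0),
    "GaussianNB (reduced)": GaussianNB(var_smoothing=1e-9),
    "Random Forest": RandomForestClassifier(n_estimators=100,
                                            criterion="gini", max_depth=None),
    "XGBoost": XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=6),
    "MLP": MLPClassifier(hidden_layer_sizes=(128, 64), activation="relu",
                         solver="adam", alpha=0.0001),
}
```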
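For the ClinicalBERT row, a fine-tuning sketch using the Hugging Face Trainer API is shown below. Only the 3 epochs, the 2e-5 learning rate, and the 512-token maximum length come from the table; the checkpoint name, the number of labels, and the dataset variables are assumptions for illustration.

```python
# Minimal ClinicalBERT fine-tuning sketch. Checkpoint, num_labels, and the
# dataset are assumptions; epochs, learning rate, and max length are from Table 5.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

model_name = "emilyalsentzer/Bio_ClinicalBERT"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2)  # num_labels=2 is an assumption

def tokenize(batch):
    # Truncate/pad to the 512-token maximum sequence length from Table 5.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=512)

args = TrainingArguments(output_dir="clinicalbert-finetune",
                         num_train_epochs=3, learning_rate=2e-5)

# train_ds / eval_ds are assumed Hugging Face Datasets with "text" and
# "label" columns, e.g. tokenized via train_ds.map(tokenize, batched=True):
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```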