Table 7 Hyperparameters for random forest in Arabic text classification.
From: Quantum computing and machine learning for Arabic language sentiment classification in social media
Hyperparameter | Value | Description |
---|---|---|
n_estimators | 100 | The number of decision trees in the Random Forest ensemble. More trees generally reduce the variance of the ensemble and make it more robust to noise in the data, at the cost of longer training and prediction time |
max_depth | None | The maximum depth allowed for each decision tree in the ensemble. A deeper tree can capture more complex relationships in the data. Setting it to None lets each tree expand until all leaves are pure or until every leaf contains fewer than min_samples_split samples |
min_samples_split | 2 | The minimum number of samples required to split an internal node during the construction of a decision tree. Higher values limit overfitting by preventing splits on nodes with too few samples; the value 2 is the smallest possible and places no practical restriction on splitting |
min_samples_leaf | 1 | The minimum number of samples required at a leaf node. Higher values limit overfitting by ruling out leaves with too few instances; the value 1 imposes no restriction |
max_features | "auto" | The number of features to consider when looking for the best split at each tree node. "sqrt" uses the square root of the total number of features and "log2" uses its base-2 logarithm. In scikit-learn's RandomForestClassifier, "auto" is equivalent to "sqrt" (it does not use all features) and has been removed in recent releases. Considering fewer features per split reduces the correlation among trees and enhances diversity |
bootstrap | True | A Boolean indicating whether bootstrap samples are used when building the decision trees. With True, each tree is trained on a random sample drawn with replacement, which introduces randomness and diversity among the trees; with False, every tree is built on the whole dataset |
class_weight | None | An optional parameter that assigns weights to classes. With None, all classes have weight one. For an imbalanced dataset, "balanced" sets the weights inversely proportional to the class frequencies, giving more weight to minority classes |
random_state | None | A seed for the random number generator. Fixing it to an integer makes results reproducible across runs. With None, each execution draws different randomness and therefore produces a different ensemble |
n_jobs | None | The number of parallel jobs to run for fitting and predicting. Specifying None uses one job, while -1 uses all available processors, potentially speeding up the training and prediction process |
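
The settings in Table 7 can be passed to a classifier roughly as follows. This is a minimal sketch, assuming the model is scikit-learn's `RandomForestClassifier`, which the parameter names suggest but the table does not state. The synthetic data from `make_classification` is a stand-in for vectorized text and is not the paper's dataset. `max_features="sqrt"` replaces `"auto"`, which recent scikit-learn releases no longer accept and which meant the same thing for classifiers.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Values taken from Table 7. "sqrt" stands in for "auto" (assumption:
# the classifier is scikit-learn's, where the two were equivalent).
rf_params = {
    "n_estimators": 100,
    "max_depth": None,
    "min_samples_split": 2,
    "min_samples_leaf": 1,
    "max_features": "sqrt",
    "bootstrap": True,
    "class_weight": None,
    "random_state": None,  # set an integer here for reproducible runs
    "n_jobs": None,        # -1 would use all available processors
}

# Hypothetical stand-in for a vectorized text corpus (not the paper's data).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

clf = RandomForestClassifier(**rf_params)
clf.fit(X, y)

# One fitted tree per estimator requested.
print(len(clf.estimators_))
```

Because `random_state` is None, repeated runs build different ensembles, so accuracy figures obtained this way will vary slightly between executions.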