Table 5 Default values used and remarks.
| Model | Default values used | Remarks |
|---|---|---|
| Decision tree | `random_state=42`, `max_depth=None`, `min_samples_split=2`, `min_samples_leaf=1` | No maximum depth (trees split until leaves are pure). Larger depths risk overfitting; smaller depths can underfit. |
| LR | `fit_intercept=True`, `normalize=False` (deprecated; default behavior) | Assumes a linear relationship; no direct regularization. Sensitive to multicollinearity and outliers. |
| RR | `alpha=1.0`, `fit_intercept=True` | L2 regularization shrinks coefficients; helps with multicollinearity and reduces overfitting. |
| Lasso regression | `alpha=1.0`, `fit_intercept=True` | L1 regularization encourages sparsity by driving the coefficients of less important features to zero. |
| SVR | `kernel='rbf'`, `C=1.0`, `epsilon=0.1`, `gamma='scale'` | Learns a function within an ε-tube. Sensitive to `C`, `epsilon`, and `gamma`; may require careful feature scaling and tuning for best results. |
| RF | `random_state=42`, `n_estimators=100`, `max_depth=None` | Ensemble of decision trees built via bagging with random feature subsets. Generally robust to outliers and able to handle high-dimensional data. |
| GB | `random_state=42`, `n_estimators=100`, `learning_rate=0.1`, `max_depth=3` | Sequentially adds weak learners to minimize the loss. Can overfit if `n_estimators` is large without regularization. |
| AdaBoost | `random_state=42`, `n_estimators=50`, `learning_rate=1.0` | Boosts performance by reweighting mis-predicted samples. Works well with shallow base estimators (e.g., short decision trees). |
| XGBoost | `random_state=42`, `n_estimators=100`, `learning_rate=0.1`, `max_depth=6`, `subsample=1.0`, `colsample_bytree=1.0` | Efficient gradient-boosting library with built-in regularization and tree pruning. Can overfit if parameters are not tuned. |
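
The following is a minimal sketch of how the models in Table 5 could be instantiated with the listed values, assuming the scikit-learn and xgboost Python packages and a regression task; the dictionary name `models` and the training arrays `X_train`/`y_train` are illustrative, not taken from the original study. Parameters not listed in the table are left at library defaults.

```python
# Sketch only: scikit-learn / xgboost regressors configured with the values from Table 5.
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.svm import SVR
from sklearn.ensemble import (
    RandomForestRegressor,
    GradientBoostingRegressor,
    AdaBoostRegressor,
)
from xgboost import XGBRegressor

models = {
    "Decision tree": DecisionTreeRegressor(
        random_state=42, max_depth=None, min_samples_split=2, min_samples_leaf=1
    ),
    # `normalize` is omitted here because it is deprecated in recent scikit-learn releases;
    # the table notes it corresponds to the default behavior.
    "LR": LinearRegression(fit_intercept=True),
    "RR": Ridge(alpha=1.0, fit_intercept=True),
    "Lasso regression": Lasso(alpha=1.0, fit_intercept=True),
    "SVR": SVR(kernel="rbf", C=1.0, epsilon=0.1, gamma="scale"),
    "RF": RandomForestRegressor(random_state=42, n_estimators=100, max_depth=None),
    "GB": GradientBoostingRegressor(
        random_state=42, n_estimators=100, learning_rate=0.1, max_depth=3
    ),
    "AdaBoost": AdaBoostRegressor(random_state=42, n_estimators=50, learning_rate=1.0),
    "XGBoost": XGBRegressor(
        random_state=42,
        n_estimators=100,
        learning_rate=0.1,
        max_depth=6,
        subsample=1.0,
        colsample_bytree=1.0,
    ),
}

# Hypothetical usage: fit each configured model on a training set (X_train, y_train
# are placeholders and not defined in this sketch).
# for name, model in models.items():
#     model.fit(X_train, y_train)
```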