Table 5 Default values used and remarks.

| Model | Default values used | Remarks |
|---|---|---|
| Decision tree | `random_state = 42`, `max_depth = None`, `min_samples_split = 2`, `min_samples_leaf = 1` | No maximum depth (splits until leaves are pure). Larger depths risk overfitting; smaller depths can underfit. |
| LR (linear regression) | `fit_intercept = True`, `normalize = False` (deprecated parameter; default behavior) | Assumes a linear relationship; no direct regularization. Sensitive to multicollinearity and outliers. |
| RR (ridge regression) | `alpha = 1.0`, `fit_intercept = True` | L2 regularization shrinks coefficients; helps with multicollinearity and reduces overfitting. |
| Lasso regression | `alpha = 1.0`, `fit_intercept = True` | L1 regularization encourages sparsity, driving the coefficients of less important features to exactly zero. |
| SVR | `kernel = 'rbf'`, `C = 1.0`, `epsilon = 0.1`, `gamma = 'scale'` | Learns a function within an ε-tube. Sensitive to `C`, `epsilon`, and `gamma`; may require careful feature scaling and tuning for best results. |
| RF (random forest) | `random_state = 42`, `n_estimators = 100`, `max_depth = None` | Ensemble of decision trees via bagging. Generally robust to outliers and can handle high-dimensional data. |
| GB (gradient boosting) | `random_state = 42`, `n_estimators = 100`, `learning_rate = 0.1`, `max_depth = 3` | Sequentially adds weak learners to minimize the loss. Can overfit if `n_estimators` is large without regularization. |
| AdaBoost | `random_state = 42`, `n_estimators = 50`, `learning_rate = 1.0` | Boosts performance by focusing on mis-predicted samples. Works well with shallow base estimators (e.g., short decision trees). |
| XGBoost | `random_state = 42`, `n_estimators = 100`, `learning_rate = 0.1`, `max_depth = 6`, `subsample = 1.0`, `colsample_bytree = 1.0` | Efficient gradient-boosting library with built-in regularization and tree pruning. Can overfit if parameters are not tuned. |
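
For reference, the sketch below shows how these defaults map onto estimator constructors. It is a minimal illustration, not code from the paper: it assumes the models were built with scikit-learn and the xgboost Python package (whose parameter names match those in the table), and the training arrays in the usage comment are hypothetical.

```python
# Minimal sketch (not code from the paper): instantiating each model in
# Table 5 with the listed defaults, assuming scikit-learn and the xgboost
# Python package.
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.svm import SVR
from sklearn.ensemble import (
    RandomForestRegressor,
    GradientBoostingRegressor,
    AdaBoostRegressor,
)
from xgboost import XGBRegressor

models = {
    # Unlimited depth: splits until leaves are pure, which risks overfitting.
    "Decision tree": DecisionTreeRegressor(
        random_state=42, max_depth=None, min_samples_split=2, min_samples_leaf=1
    ),
    # `normalize` was removed from recent scikit-learn; scale features
    # beforehand (e.g., with StandardScaler) if normalization is needed.
    "LR": LinearRegression(fit_intercept=True),
    # L2 penalty shrinks coefficients; mitigates multicollinearity.
    "RR": Ridge(alpha=1.0, fit_intercept=True),
    # L1 penalty can zero out coefficients of less important features.
    "Lasso": Lasso(alpha=1.0, fit_intercept=True),
    # RBF kernel with an epsilon-insensitive tube; sensitive to feature scaling.
    "SVR": SVR(kernel="rbf", C=1.0, epsilon=0.1, gamma="scale"),
    # Bagged ensemble of unpruned regression trees.
    "RF": RandomForestRegressor(random_state=42, n_estimators=100, max_depth=None),
    # Sequential boosting of shallow (depth-3) trees on the loss gradient.
    "GB": GradientBoostingRegressor(
        random_state=42, n_estimators=100, learning_rate=0.1, max_depth=3
    ),
    # Reweights mis-predicted samples each round; shallow base learners.
    "AdaBoost": AdaBoostRegressor(random_state=42, n_estimators=50, learning_rate=1.0),
    # Regularized gradient boosting; subsample/colsample control stochasticity.
    "XGBoost": XGBRegressor(
        random_state=42,
        n_estimators=100,
        learning_rate=0.1,
        max_depth=6,
        subsample=1.0,
        colsample_bytree=1.0,
    ),
}

# Hypothetical usage with training arrays X_train, y_train:
# for name, model in models.items():
#     model.fit(X_train, y_train)
```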