Table 2 Model configuration and workflow.

From: Efficacy of machine learning in simulating precipitation and its extremes over the capital cities in North Indian states

RF (Random forest)

Underlying algorithm: RF is an ensemble method that uses multiple decision trees to improve classification accuracy. Each tree is built from a random subset of the training data, and the final prediction is obtained by aggregating the outputs of all trees [98].

Workflow: (1) Collect data; (2) preprocess the data; (3) split the data into training and testing sets; (4) train multiple decision trees on random subsets of the training data; (5) aggregate the predictions of all trees; (6) evaluate the model on the test set [37].

Parameter settings used for training and evaluation: MinMaxScaler, train-test split, iterations; n_estimators: 100, max_depth: None, criterion: 'gini', random_state: 42
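A minimal, runnable sketch of this RF setup, assuming scikit-learn and the settings listed above; the arrays X and y are synthetic placeholders, not the paper's precipitation data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-ins for the predictors and precipitation classes.
rng = np.random.default_rng(42)
X = rng.random((500, 6))
y = rng.integers(0, 2, size=500)

# (2) Preprocess: scale features to [0, 1].
X = MinMaxScaler().fit_transform(X)

# (3) Split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# (4)-(5) Train the tree ensemble with the listed settings; predictions
# are aggregated across all trees internally.
rf = RandomForestClassifier(n_estimators=100, max_depth=None,
                            criterion="gini", random_state=42)
rf.fit(X_train, y_train)

# (6) Evaluate on the held-out test set.
print("RF accuracy:", accuracy_score(y_test, rf.predict(X_test)))
```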

SVM (Support vector machine)

Underlying algorithm: SVM finds the hyperplane that best separates the data into classes, maximizing the margin between the closest points of each class (the support vectors) [99].

Workflow: (1) Collect data; (2) preprocess the data; (3) split the data into training and testing sets; (4) choose a kernel function; (5) train the SVM model to find the optimal hyperplane; (6) evaluate the model on the test set [37].

Parameter settings used for training and evaluation: MinMaxScaler, train-test split, iterations; probability: True, random_state: 42, kernel: 'rbf', C: 1.0, gamma: 'scale'
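A corresponding sketch for the SVM configuration, again assuming scikit-learn and using the same synthetic placeholder data:

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic placeholder data (not the paper's inputs).
rng = np.random.default_rng(42)
X = MinMaxScaler().fit_transform(rng.random((500, 6)))
y = rng.integers(0, 2, size=500)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# (4)-(5) RBF kernel with the listed settings; probability=True enables
# probability estimates via internal cross-validation (slower to train).
svm = SVC(kernel="rbf", C=1.0, gamma="scale",
          probability=True, random_state=42)
svm.fit(X_train, y_train)

# (6) Evaluate on the test set.
print("SVM accuracy:", accuracy_score(y_test, svm.predict(X_test)))
```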

XGB (XGBoost)

Underlying algorithm: XGBoost is an optimized gradient-boosting algorithm that builds an ensemble of trees sequentially, with each tree correcting the errors of the previous ones [100].

Workflow: (1) Collect data; (2) preprocess the data; (3) split the data into training and testing sets; (4) define the boosting parameters; (5) train the XGBoost model iteratively; (6) evaluate the model on the test set [37].

Parameter settings used for training and evaluation: MinMaxScaler, train-test split, iterations; n_estimators: 100, learning_rate: 0.1, max_depth: 6
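A sketch of the XGB setup, assuming the xgboost Python package alongside scikit-learn, with the same synthetic placeholder data:

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from xgboost import XGBClassifier

# Synthetic placeholder data (not the paper's inputs).
rng = np.random.default_rng(42)
X = MinMaxScaler().fit_transform(rng.random((500, 6)))
y = rng.integers(0, 2, size=500)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# (4)-(5) Boosting parameters as listed; each new tree is fit to the
# residual errors of the current ensemble.
xgb = XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=6)
xgb.fit(X_train, y_train)

# (6) Evaluate on the test set.
print("XGB accuracy:", accuracy_score(y_test, xgb.predict(X_test)))
```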

KNN (K-nearest neighbors)

Underlying algorithm: KNN classifies a data point according to the majority class of its K nearest neighbors in the feature space [101].

Workflow: (1) Collect data; (2) preprocess the data; (3) split the data into training and testing sets; (4) choose the value of K; (5) compute the distance between the test point and all training points; (6) assign the class by majority vote of the K nearest neighbors; (7) evaluate the model on the test set [37].

Parameter settings used for training and evaluation: MinMaxScaler, train-test split, iterations; n_neighbors: 5, algorithm: 'auto'
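A sketch of the KNN setup, assuming scikit-learn and the same synthetic placeholder data; min-max scaling matters here because KNN is distance-based:

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

# Synthetic placeholder data (not the paper's inputs).
rng = np.random.default_rng(42)
X = MinMaxScaler().fit_transform(rng.random((500, 6)))
y = rng.integers(0, 2, size=500)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# (4)-(6) K = 5 neighbors; distances to all training points are computed
# and the class is assigned by majority vote among the nearest K.
knn = KNeighborsClassifier(n_neighbors=5, algorithm="auto")
knn.fit(X_train, y_train)

# (7) Evaluate on the test set.
print("KNN accuracy:", accuracy_score(y_test, knn.predict(X_test)))
```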