Abstract
Rainfall and its interaction with soil, rock, and environmental factors such as soil moisture content, temperature variations, groundwater levels, and vegetation cover are critical determinants of slope stability in geotechnical engineering. This study introduces an innovative AI-driven system designed to predict the geotechnical properties of slopes post-monsoon season. Utilizing a comprehensive dataset collected before and after the monsoon, the system targets the prediction of essential properties including unit weight, cohesion, and friction angle, parameters significantly influenced by monsoon rains. To quantify these impacts, the system calculates the percentage changes in these properties. A robust Exploratory Data Analysis (EDA) was conducted to elucidate the distributions of pre-monsoon and post-monsoon properties and uncover interrelationships among them. Machine learning models, encompassing Linear Regression, Random Forest Regression, Gradient Boosting Regressors, Support Vector Regressors, and Ensemble Models, were employed to predict post-monsoon properties based on pre-monsoon data and calculated changes. The models’ accuracies were evaluated using metrics such as Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), R-squared (R2), Mean Bias Deviation (MBD), and Willmott’s Index of Agreement (d). Furthermore, the study explores transforming the regression task into a binary classification problem, categorizing slopes as stable or unstable based on predicted post-monsoon properties. Classification performance was assessed using Receiver Operating Characteristic (ROC) curves and the Area Under the Curve (AUC) metric. Feature importance and sensitivity analyses were performed using SHAP (SHapley Additive exPlanations) to identify the most influential factors affecting slope stability predictions.
To enhance model robustness and generalizability, synthetic data was generated using Generative Adversarial Networks (GANs), augmenting the original dataset and ensuring a diverse range of conditions. The AI system demonstrated exceptional capabilities in accurately predicting geotechnical property changes, thereby enabling engineers to proactively manage risks and improve slope stability assessments. This project underscores the significant potential of integrating advanced machine learning techniques with traditional geotechnical practices, particularly in regions experiencing heavy rainfall, to foster safer and more efficient engineering solutions.
Introduction
Rainfall and its interaction with soil and rock are major factors influencing slope stability in geotechnical engineering. In areas experiencing drastic seasonal changes like monsoon rains, understanding slope stability is paramount. Key factors include soil and rock properties, water interaction, and environmental conditions, which can either exacerbate or mitigate slope failure risks. Predicting slope behavior under various conditions is crucial for the safety and reliability of infrastructure, especially in mining and construction. India, with its vast iron ore reserves totaling more than 33.276 billion tonnes, stands as one of the world’s foremost iron ore producers1. The production of iron ore in the country saw a notable increase from 158.108 million tonnes in 2015 to 200.95 million tonnes in 20182. The Ministry of Steel, Government of India, envisions the production of 300 million tonnes of crude steel by the year 2030-313, thereby opening up new and promising prospects in the iron ore mining sector. Furthermore, the Indian government has promoted foreign investment in the sector by allowing 100% Foreign Direct Investment (FDI) through the Automatic Route for iron ore mining and exploration4. This decision is poised to facilitate increased investment and growth in the industry. In India, all the iron ore produced comes from open-cast mines. The development of benches in an opencast mine involves creating slopes at definite angles called “Slope Angles.” Any discrepancies in these angles can trigger failures, which are of serious concern for mine management. The high demand for iron ore has driven significant expansion in open-cast mining operations, but accidents, particularly slope failures, continue to be a major hazard5.
Slope stability in open-pit mines is crucial for both economic viability and worker safety. Slope engineering involves stability analysis and the development and implementation of stabilization measures. Slope failures are often triggered by a range of factors, including geological conditions, climatic events, and operational practices. Addressing these challenges effectively necessitates the incorporation of comprehensive safety protocols, engineering practices, and risk assessments into mining operations6. In open-pit mining operations, achieving the optimal slope angle is crucial for maintaining stability. The slope should be steep enough for economic feasibility yet flat enough to ensure stability. Consequently, slope stability analysis becomes integral to the lifecycle of opencast mining. The economic implications of the slope angle grow as mining depth increases, which adds further complexity. The repercussions of slope failures, including the substantial time and financial resources required for site reconstruction, additional material handling, and equipment rescheduling, underscore the importance of meticulous slope planning7.
Numerous studies have explored the influence of soil physico-mechanical properties, water infiltration rate, rainfall intensity, and slope geometry on slope stability8. Groundwater table variations have been extensively investigated for their role in slope failures9,10,11,12. However, limited research has delved into the impact of rainfall on bench stability, especially in regions characterized by substantial precipitation. This study aims to address the gap by examining the influence of rainfall on slope stability in the Goa region, while highlighting the broader applicability of this approach across regions in India with varying monsoon intensities.
The State of Goa holds over 4% of India’s total iron ore reserves and has a significant history of slope failures due to heavy rainfall13. Historical data shows that heavy rainfall is a major triggering factor for slope failures in iron ore mines in Goa, with numerous documented instances of slope and dump failures. For instance, between 1994 and 2011, there were multiple incidents of slope failures in Goa’s iron ore mines due to heavy rainfall, as shown in Table 114.
The influence of environmental factors on slope stability cannot be overstated. Monsoons, characterized by intense and prolonged rainfall, can lead to substantial changes in the moisture content of soil and rock materials, thereby affecting their geotechnical properties. Increased moisture content typically reduces the shear strength of soils and rocks, while also altering cohesion and friction angle, making slopes more susceptible to failure and reducing the Factor of Safety (FOS). This phenomenon has been widely documented in various studies, highlighting the need for accurate and timely assessments of slope stability in monsoon-affected regions16.
Traditional methods of slope stability analysis, such as Limit Equilibrium Methods (LEM) and Finite Element Methods (FEM), often rely on static models and fail to account for the dynamic nature of environmental influences. These methods, while useful, can be limited in their ability to predict slope behavior under changing conditions. Recent advances in machine learning and artificial intelligence offer new opportunities to dynamically incorporate diverse and changing parameters, enhancing the accuracy and reliability of slope stability assessments. Machine learning models, trained on historical data and capable of learning complex patterns, can provide more nuanced predictions that account for a wide range of influencing factors. Recent studies have demonstrated the efficacy of such techniques in real-world geotechnical applications17.
This study introduces an AI-powered geotechnical risk management system designed to predict post-monsoon geotechnical properties based on pre-monsoon data and observed changes due to monsoon rains. The primary objective is to enhance the assessment of slope stability, identify potential risks, and convert regression predictions into risk classifications (e.g., stable or unstable slopes), enabling proactive risk management in geotechnical engineering. This system utilizes machine learning to enhance prediction accuracy, leading to better decision-making in the field.
The study utilizes a dataset containing detailed measurements of soil and rock properties taken both before and after the monsoon season. These measurements include key properties like unit weight, cohesion, and friction angle. The dataset also incorporates the calculated percentage changes in these properties due to the monsoon rains. These properties are crucial for determining slope stability, and their accurate measurement and analysis form the foundation of the predictive models developed in this study18.
A comprehensive Exploratory Data Analysis (EDA) was conducted to visualize the distributions of pre-monsoon and post-monsoon properties, identify outliers, and understand the relationships between different features. SHAP analysis was also employed to evaluate feature importance and assess the influence of critical parameters like cohesion and friction angle on predictions. Histograms and box plots were used to examine the data distributions, while a correlation matrix and heatmap helped identify the relationships between various features. Pair plots further illustrated the interactions between different geotechnical properties, providing insights into how these features influence each other and contribute to overall slope stability19.
Machine learning models, including Linear Regression, Random Forest Regression, and advanced techniques such as Gradient Boosting, were trained using the pre-monsoon properties and the calculated percentage changes as features, while the post-monsoon properties served as the target variables. To evaluate model performance, the dataset was divided into 80% for training and 20% for testing. Metrics like Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R-squared (R2) were used. Additionally, scatter plots visualizing predicted versus actual values provided further insight into model accuracy20.
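The evaluation metrics named here, together with the Mean Bias Deviation and Willmott’s Index of Agreement listed in the abstract, can be sketched in a few lines of NumPy. This is a minimal illustration, not the study's evaluation code: the function name `regression_metrics`, the sample values, and the sign convention for MBD (predicted minus observed) are assumptions introduced for this example.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, R2, MBD, and Willmott's index of agreement d."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true            # assumed sign: positive = over-prediction
    obs_mean = y_true.mean()

    ss_res = np.sum(err ** 2)                      # residual sum of squares
    ss_tot = np.sum((y_true - obs_mean) ** 2)      # total sum of squares
    # Willmott (1981) potential error: sum((|P - Obar| + |O - Obar|)^2)
    potential = np.sum((np.abs(y_pred - obs_mean) + np.abs(y_true - obs_mean)) ** 2)

    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "R2": float(1.0 - ss_res / ss_tot),
        "MBD": float(np.mean(err)),
        "d": float(1.0 - ss_res / potential),
    }

# Hypothetical post-monsoon unit weights (kN/m3): measured vs. predicted.
measured = [20.0, 22.0, 24.0, 26.0]
predicted = [21.0, 21.0, 25.0, 27.0]
metrics = regression_metrics(measured, predicted)
print(metrics)
```

MAE and RMSE are in the units of the property itself, whereas R2 and d are dimensionless, so the two groups answer different questions: how large the typical error is, and how much of the observed variation the model reproduces.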
One of the novel aspects of this study is the exploration of converting the regression problem into a binary classification task. This approach involves setting thresholds to classify the predicted geotechnical properties into categories representing different levels of risk. For example, slopes could be classified as stable or unstable based on the predicted post-monsoon properties. The model’s ability to differentiate between risk categories was assessed using Receiver Operating Characteristic (ROC) curves and the Area Under the Curve (AUC) metric21.
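A minimal sketch of this regression-to-classification step is given below, assuming scikit-learn is available. The 25 kPa cohesion threshold, the variable names, and the sample values are hypothetical and chosen only for illustration; the study does not state its thresholds in this section.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical measured and predicted post-monsoon cohesion values (kPa).
cohesion_actual = np.array([12.0, 30.0, 18.0, 42.0, 15.0, 35.0, 27.0, 21.0])
cohesion_pred = np.array([14.0, 28.0, 26.0, 40.0, 13.0, 33.0, 22.0, 24.0])

# Assumed rule: a slope whose cohesion falls below the threshold is "unstable" (1).
THRESHOLD_KPA = 25.0
y_unstable = (cohesion_actual < THRESHOLD_KPA).astype(int)

# Lower predicted cohesion means higher risk, so negate it to obtain a score
# that increases with the likelihood of the positive (unstable) class.
risk_score = -cohesion_pred

auc = roc_auc_score(y_unstable, risk_score)
fpr, tpr, _ = roc_curve(y_unstable, risk_score)

# Hard labels from the regression output at the same threshold.
y_pred_label = (cohesion_pred < THRESHOLD_KPA).astype(int)
print(f"AUC = {auc:.3f}")
```

The ROC curve is built from the continuous regression output rather than from the hard labels, so it summarizes how well the predictions rank slopes by risk across every possible threshold, not only the one chosen here.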
Role of machine learning in enhancing slope stability assessments
Slope stability is a critical concern in geotechnical engineering, particularly in open-cast mining operations where the consequences of slope failures can be severe. Conventional slope stability analyses, like limit equilibrium (LEM) and finite element (FEM) methods, are constrained by their static nature, often neglecting the dynamic environmental factors that can influence slope behavior. These conventional approaches primarily focus on the equilibrium of forces and moments within the slope, considering factors such as soil and rock properties, slope geometry, and external loads. While effective, these methods are limited by their assumptions of homogeneity and isotropy in soil properties, and their inability to adapt to real-time changes in environmental conditions.
Machine learning (ML) offers a transformative approach to slope stability analysis by leveraging historical data and advanced computational techniques to predict slope behavior under varying conditions. Unlike traditional methods, machine learning models can learn complex patterns from large datasets, allowing for the integration of diverse and heterogeneous data sources, including soil properties, weather conditions, and historical failure events. This ability to handle large volumes of data and identify non-linear relationships between variables makes ML particularly suited for the complex and multifaceted problem of slope stability.
Geotechnical engineering has embraced various machine learning techniques, including neural networks, support vector machines, decision trees, and ensemble methods like random forests. These models excel at uncovering complex patterns and relationships in data, potentially exceeding the capabilities of traditional analytical methods in tasks like soil property prediction, landslide susceptibility assessment, and slope stability evaluation. For instance, ML models can incorporate real-time environmental data, such as rainfall intensity and duration, to dynamically assess the risk of slope failure. This capability is especially crucial in regions prone to heavy rainfall, where changes in moisture content can significantly alter the geotechnical properties of slopes.
Objectives of the study
This study aims to develop a machine learning model capable of predicting post-monsoon geotechnical properties of slopes using pre-monsoon data. The specific objectives of the study are as follows:
1. Data Collection and Analysis: To collect comprehensive geotechnical data from selected iron ore mining sites in Goa, both before and after the monsoon season. The data encompasses measurements of unit weight, cohesion, and friction angle for the soil and rock samples.
2. Exploratory Data Analysis (EDA): To perform exploratory data analysis to understand the distribution and relationships between the collected data points. This involves visualizing the data using histograms, box plots, pair plots, and correlation matrices to identify significant patterns and correlations.
3. Feature Engineering: To engineer relevant features that capture the changes in geotechnical properties due to the monsoon. In addition, the data incorporates the percentage change of properties like unit weight, cohesion, and friction angle.
4. Model Development: To develop and train machine learning models, specifically Linear Regression and Random Forest Regression, using the pre-monsoon data and engineered features. The models will be trained to predict post-monsoon geotechnical properties.
5. Model Evaluation: The models’ performance will be assessed using metrics like Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R-squared (R2). These evaluations will be based on comparisons between the models’ predictions and actual post-monsoon data.
6. Risk Classification: To explore the conversion of the regression problem into a binary classification task, where slopes are classified as stable or unstable based on predicted post-monsoon properties. The model’s classification performance will be evaluated using Receiver Operating Characteristic (ROC) curves and the Area Under the Curve (AUC) metric.
7. Implementation and Deployment: To implement the trained machine learning models into a practical, user-friendly system that can be used by geotechnical engineers for real-time slope stability assessments. This includes developing a deployment strategy for the models, ensuring they can be easily integrated into existing geotechnical practices.
8. Future Research Directions: To identify potential areas for future research and development in the application of machine learning in geotechnical engineering. This includes exploring the integration of additional data sources, such as remote sensing data and real-time monitoring systems, to further enhance the predictive capabilities of the models.
By achieving these objectives, the study aims to demonstrate the efficacy of machine learning models in enhancing the accuracy and reliability of slope stability assessments. The findings will provide valuable insights into the application of advanced computational techniques in geotechnical engineering, paving the way for more resilient and proactive approaches to managing geotechnical risks in mining operations.
Literature review
In geotechnical engineering, slope stability analysis plays a critical role. This process evaluates the likelihood of failures within a slope by identifying potential failure surfaces.
Impact of soil on slope stability
Open-cast mining operations are significantly impacted by various soil parameters. Strength characteristics, including cohesion and friction angle, directly influence slope stability, affecting pit wall design and support needs. Denser, compacted soils are preferred for haul roads and equipment usage but can substantially increase excavation challenges. Texture plays a crucial role in extraction efficiency. Clay-rich soils present digging difficulties, while sandier soils might require management to prevent cave-ins. Hydraulic conductivity affects both operational efficiency and environmental considerations. Highly permeable soils allow for easy drainage during dry periods but can also lead to rapid groundwater infiltration and potential flooding. Finally, both erosion potential and reclamation success are heavily influenced by texture. Sandy soils are particularly susceptible to erosion and require additional dust control measures. Topsoil, rich in organic matter, is vital for revegetation efforts. Its loss may necessitate amendments or imported soil for successful reclamation. The impact of soil properties on different mining activities is shown in Table 2.
Impact of rainfall on slope stability
Rainfall is a significant factor influencing slope stability, primarily through its effects on soil moisture content and pore water pressure. Heavy rainfall can trigger an increase in pore water pressure within the soil. This reduces the effective stress acting on soil particles, leading to a decrease in shear strength and a heightened risk of slope failure.
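The mechanism described above is commonly expressed through the Mohr-Coulomb failure criterion written in terms of effective stress. The relation below is the standard textbook form, given as an illustration rather than as the formulation used in this study; the symbols are introduced here for this purpose only.

```latex
\tau_f = c' + \sigma' \tan\varphi', \qquad \sigma' = \sigma - u
```

Here \tau_f is the shear strength on the potential failure plane, c' the effective cohesion, \varphi' the effective friction angle, \sigma the total normal stress, and u the pore water pressure. With \sigma held constant, a rise in u during heavy rainfall lowers \sigma' and therefore \tau_f, which is the loss of shear strength noted in the preceding paragraph.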
Several studies have investigated the impact of rainfall on slope stability as given in Table 3. For example, Rahardjo et al. (2000) studied the effects of infiltration on residual soil slopes and found that increased moisture content significantly reduced soil strength22. Another study by Matsushi et al. (2006) examined shallow landslides in the Boso Peninsula, Japan, and highlighted the role of permeable and impermeable bedrocks in slope stability during heavy rainfall23.
These studies underscore the importance of considering rainfall-induced changes in soil properties when assessing slope stability, particularly in regions with significant seasonal rainfall variations.
Impact of rock on slope stability
In open-cast mining, the rock mass itself plays a critical role in slope stability. Rock strength parameters, such as uniaxial compressive strength (UCS) and tensile strength, directly influence the stability of pit walls. Stronger, more competent rock allows for steeper slopes, reducing excavation volumes and overall mining costs. Conversely, weaker rock with lower UCS may necessitate flatter slopes or significant reinforcement measures like rock bolts and shotcrete to prevent instability and potential rockfalls. Additionally, the presence of discontinuities like joints, faults, and bedding planes can significantly weaken the rock mass and create preferential failure zones. Understanding these geological features and their orientations is crucial for designing stable slopes and mitigating the risk of rockfalls and landslides.
Machine learning in geotechnical engineering
Geotechnical engineering is increasingly leveraging machine learning’s (ML) power to analyze vast datasets and unveil intricate patterns that might elude traditional analytical methods. ML models, such as neural networks, support vector machines, and ensemble methods like random forests, have been applied to various geotechnical problems, including soil classification, landslide susceptibility mapping, and slope stability analysis.
Applications and benefits
Machine learning has been applied to a range of problems in geotechnical engineering, as shown in Table 4. ML models can incorporate a wide range of data sources, including soil properties, environmental conditions, and historical failure events, to provide more accurate and dynamic predictions. For instance, a study by Johnson et al. (2018) demonstrated the use of support vector machines to predict slope stability with higher accuracy compared to traditional methods24. Another study by Williams and Taylor (2016) utilized neural networks to model the nonlinear relationships between soil properties and slope stability, achieving significant improvements in predictive performance25.
The benefits of using ML in geotechnical engineering include the ability to handle large and complex datasets, improve the accuracy of predictions, and provide real-time assessments. These advantages make ML a valuable tool for enhancing slope stability analysis and risk management in geotechnical engineering.
Traditional methods of slope stability analysis
The traditional methods of slope stability analysis, namely Limit Equilibrium Methods and Finite Element Methods, are generally well understood and computationally efficient.
Limit equilibrium methods (LEM)
Limit equilibrium methods (LEM) are popular for slope stability analysis due to their simplicity and efficiency. These methods work by segmenting the slope into slices and analyzing the balance of forces and moments acting on each individual slice. Various LEM approaches exist, such as Bishop’s Simplified Method, Janbu’s Method, Spencer’s Method, and Morgenstern-Price Method. Each method employs distinct assumptions regarding the forces between slices and simplifies the problem for mathematical tractability. A comparison of different LEM methods is shown in Table 5.
Bishop’s Simplified Method, for instance, assumes a circular failure surface and neglects inter-slice shear forces, so that the resultant inter-slice forces act horizontally, making it relatively simple and widely used in practice27. Janbu’s Method allows for non-circular failure surfaces, providing greater flexibility, though it may be less accurate for circular surfaces28. Spencer’s Method assumes a constant inclination of inter-slice forces, offering high accuracy but requiring complex calculations29. The Morgenstern-Price Method allows for variable inter-slice force inclinations, offering both flexibility and accuracy, though it is computationally intensive30.
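To make the method of slices concrete, the sketch below implements the standard fixed-point iteration for Bishop’s Simplified Method. It is a textbook formulation offered under stated assumptions, not code from this study: the function name `bishop_simplified_fos`, the slice dictionary layout, and all slice values are hypothetical, and slice weights are held fixed between the two scenarios for simplicity.

```python
import math

def bishop_simplified_fos(slices, tol=1e-6, max_iter=100):
    """Factor of safety for a circular slip surface (Bishop's Simplified Method).

    Each slice is a dict with W (weight, kN/m), alpha (base inclination, rad),
    b (width, m), c (effective cohesion, kPa), phi (effective friction angle,
    rad), and u (pore water pressure at the slice base, kPa).
    """
    driving = sum(s["W"] * math.sin(s["alpha"]) for s in slices)
    fos = 1.0  # initial guess; the FOS appears on both sides of the equation
    for _ in range(max_iter):
        resisting = 0.0
        for s in slices:
            tan_phi = math.tan(s["phi"])
            m_alpha = math.cos(s["alpha"]) * (1.0 + math.tan(s["alpha"]) * tan_phi / fos)
            resisting += (s["c"] * s["b"] + (s["W"] - s["u"] * s["b"]) * tan_phi) / m_alpha
        new_fos = resisting / driving
        if abs(new_fos - fos) < tol:
            return new_fos
        fos = new_fos
    return fos

def make_slices(c, phi_deg, u):
    """Build three hypothetical slices sharing the same strength parameters."""
    geometry = [(100.0, 10.0), (180.0, 25.0), (120.0, 40.0)]  # (W, alpha in degrees)
    return [
        {"W": w, "alpha": math.radians(a), "b": 2.0,
         "c": c, "phi": math.radians(phi_deg), "u": u}
        for w, a in geometry
    ]

# Hypothetical pre- and post-monsoon strength parameters and pore pressures.
fos_pre = bishop_simplified_fos(make_slices(c=20.0, phi_deg=30.0, u=0.0))
fos_post = bishop_simplified_fos(make_slices(c=12.0, phi_deg=26.0, u=10.0))
print(f"FOS pre-monsoon:  {fos_pre:.2f}")
print(f"FOS post-monsoon: {fos_post:.2f}")
```

For a single dry, cohesionless slice the iteration converges to tan(phi)/tan(alpha), which is a convenient sanity check on any implementation.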
Finite element methods (FEM)
Finite Element Methods (FEM) offer a more detailed and comprehensive approach to slope stability analysis. Unlike LEM, the finite element method (FEM) discretizes the slope into a mesh of small elements and then solves the governing equations for each element to determine its behavior. This method can model complex geometries, heterogeneous soil conditions, and dynamic loading conditions. The advantages and disadvantages of FEM are given in Table 6.
Zienkiewicz, O. C. et al. (2005) specified that FEM can account for the nonlinear behavior of soil and rock, making it suitable for complex stability problems. However, it is computationally intensive and requires detailed information about the soil properties and loading conditions31.
Current gaps and future directions
Despite the advancements in slope stability analysis and the integration of machine learn- ing techniques, several gaps remain in the current research. One significant gap is the limited availability of high-quality, long-term data on slope failures and environmental conditions. The accuracy of ML models heavily depends on the quality and quantity of the data used for training.
Another gap is the need for models that can dynamically adapt to changing environmental conditions. Most existing models are static and do not account for real-time changes in factors such as rainfall intensity and soil moisture content. Developing adaptive models that can update predictions based on real-time data would significantly enhance the effectiveness of slope stability assessments.
Proposed future research directions
This research presents a promising AI system for slope stability prediction, but also highlights areas for improvement. To address the gaps identified above, future research should focus on several key areas.
First, improved data collection and management practices are crucial. This includes utilizing remote sensing technologies and establishing comprehensive monitoring systems to gather high-quality, long-term data on slope stability and environmental conditions.
Second, developing adaptive machine learning models is essential. These models would update predictions in real-time based on changing environmental factors by integrating data from sensors and monitoring systems.
Third, fostering interdisciplinary collaboration between geotechnical engineers, data scientists, and environmental scientists is necessary to develop more comprehensive solutions for slope stability analysis.
Fourth, extensive field validation of the machine learning models is required to ensure their accuracy and reliability in real-world scenarios across diverse geographical regions and environmental conditions.
Finally, exploring how to integrate machine learning techniques with traditional methods can leverage the strengths of both approaches, leading to more robust and reliable slope stability assessments. By addressing these research areas, the field of geotechnical engineering can move towards more accurate, dynamic, and reliable slope stability assessments, ultimately improving safety and risk management in mining and construction operations.
Methodology
Data collection
The dataset used in this study was collected from iron ore mines in Goa, India, and expanded to include environmental factors such as rainfall intensity, soil moisture content, groundwater levels, temperature variations, and vegetation cover. These factors, alongside geotechnical properties, provide a comprehensive understanding of slope stability.
The key properties measured were:
- Unit weight (kN/m³)
- Cohesion (kPa)
- Angle of friction (degrees)
Measurements were taken both before and after the monsoon season to capture the changes induced by rainfall. Additionally, synthetic data generation was employed using Generative Adversarial Networks (GANs) to augment the dataset. The GAN-generated data enhanced the diversity of training samples, especially for underrepresented conditions, contributing to improved model robustness.
Dataset structure
The dataset utilized in this study encompasses a comprehensive set of geotechnical properties measured both before and after the monsoon season from iron ore mines in Goa, India. Additionally, it includes environmental parameters and calculated percentage changes to facilitate an in-depth analysis of how these factors influence slope stability. The primary objective is to predict post-monsoon geotechnical properties based on pre-monsoon data using machine learning techniques.
Each row in the dataset represents a specific measurement instance and includes the following columns:
- Pre-Monsoon Geotechnical Properties:
  - Unit weight (kN/m3) pre: Unit weight of the soil before the monsoon.
  - Cohesion (kPa) pre: Cohesion of the soil before the monsoon.
  - Friction angle (degrees) pre: Angle of friction of the soil before the monsoon.
- Pre-Monsoon Environmental Parameters:
  - Soil moisture (%) pre: Soil moisture content before the monsoon.
  - Temperature (°C) pre: Ambient temperature before the monsoon.
  - Rainfall intensity (mm/hr) pre: Rainfall intensity before the monsoon.
  - Groundwater level (m) pre: Groundwater level before the monsoon.
  - Vegetation index pre: Vegetation index before the monsoon.
- Post-Monsoon Geotechnical Properties:
  - Unit weight (kN/m3) post: Unit weight of the soil after the monsoon.
  - Cohesion (kPa) post: Cohesion of the soil after the monsoon.
  - Friction angle (degrees) post: Angle of friction of the soil after the monsoon.
- Post-Monsoon Environmental Parameters:
  - Soil moisture (%) post: Soil moisture content after the monsoon.
  - Temperature (°C) post: Ambient temperature after the monsoon.
  - Rainfall intensity (mm/hr) post: Rainfall intensity after the monsoon.
  - Groundwater level (m) post: Groundwater level after the monsoon.
  - Vegetation index post: Vegetation index after the monsoon.
- Percentage Changes in Geotechnical Properties:
  - Percentage change in unit weight (%): The percentage change in unit weight due to the monsoon.
  - Percentage change in cohesion (%): The percentage change in cohesion due to the monsoon.
  - Percentage change in angle of friction (%): The percentage change in the angle of friction due to the monsoon.
For the purposes of this study, the dataset is partitioned into features (X) and target variables (y) as follows:
- Features (X):
  - Pre-Monsoon Geotechnical Properties:
    - Unit weight (kN/m3) pre
    - Cohesion (kPa) pre
    - Friction angle (degrees) pre
  - Pre-Monsoon Environmental Parameters:
    - Soil moisture (%) pre
    - Temperature (°C) pre
    - Rainfall intensity (mm/hr) pre
    - Groundwater level (m) pre
    - Vegetation index pre
  - Percentage Changes:
    - Percentage change in unit weight (%)
    - Percentage change in cohesion (%)
    - Percentage change in angle of friction (%)
- Target Variables (y):
  - Post-Monsoon Geotechnical Properties:
    - Unit weight (kN/m3) post
    - Cohesion (kPa) post
    - Friction angle (degrees) post
This structured division facilitates the analysis of how pre-monsoon geotechnical and environmental properties, along with their percentage changes, can be leveraged to accurately predict post-monsoon geotechnical conditions. Such predictions are invaluable for proactive risk management and enhancing slope stability assessments in mining operations.
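The partition into features and targets can be sketched with pandas as follows. The snake_case column names are hypothetical shorthand for the columns listed above, the values are random placeholders rather than study data, and the 80/20 split via scikit-learn's `train_test_split` is an assumption about tooling.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 50  # placeholder sample count, not the size of the study dataset

geotech = ["unit_weight", "cohesion", "friction_angle"]
environment = ["soil_moisture", "temperature", "rainfall_intensity",
               "groundwater_level", "vegetation_index"]

# Fill the table with random placeholder values (hypothetical, for shape only).
data = {}
for name in geotech + environment:
    data[f"{name}_pre"] = rng.uniform(1.0, 10.0, n)
for name in geotech:
    pct = rng.uniform(-20.0, 5.0, n)
    data[f"{name}_pct_change"] = pct
    data[f"{name}_post"] = data[f"{name}_pre"] * (1.0 + pct / 100.0)
df = pd.DataFrame(data)

# Features: pre-monsoon geotechnical + environmental columns + percentage changes.
feature_cols = ([f"{g}_pre" for g in geotech]
                + [f"{e}_pre" for e in environment]
                + [f"{g}_pct_change" for g in geotech])
# Targets: the three post-monsoon geotechnical properties.
target_cols = [f"{g}_post" for g in geotech]

X = df[feature_cols]
y = df[target_cols]

# 80% training, 20% testing; the seed is fixed only for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

Keeping the column lists explicit, rather than slicing by position, makes it harder for a post-monsoon column to slip into the feature matrix by accident.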
Data preparation
To ensure the data was ready for analysis and modeling, several preprocessing steps were undertaken:
Handling missing values
Missing values in the dataset were addressed using imputation techniques, namely k-nearest neighbors (KNN) imputation and iterative imputation. These methods help preserve the relationships between features and limit the bias introduced by missing data. After imputation, each column was checked to confirm that no missing values remained, so that the dataset was complete and consistent for analysis and modeling.
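A minimal sketch of KNN imputation followed by a per-column missing-value check is shown below, using scikit-learn's `KNNImputer`. The column names, the values, and the choice of `n_neighbors=2` are hypothetical; the study does not report its imputation settings.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical pre-monsoon records with two missing entries.
df = pd.DataFrame({
    "cohesion_pre": [20.0, 22.0, np.nan, 35.0, 33.0],
    "friction_angle_pre": [28.0, 29.0, 28.5, 36.0, np.nan],
    "soil_moisture_pre": [12.0, 13.0, 12.5, 8.0, 8.5],
})

# Each missing value is replaced by the mean of that feature over the
# nearest rows, with distances computed on the features that are present.
imputer = KNNImputer(n_neighbors=2)
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

# Confirm that no missing values remain in any column.
missing_per_column = imputed.isna().sum()
print(missing_per_column)
```

Iterative imputation is available in scikit-learn as `IterativeImputer`, which currently requires `from sklearn.experimental import enable_iterative_imputer` before it can be imported from `sklearn.impute`.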
Normalization
StandardScaler was applied to the dataset to standardize the features, centering each to zero mean and scaling it to unit variance. This ensures that features with different units or magnitudes contribute on a comparable scale to the predictions, and it improves the performance and convergence speed of machine learning algorithms that are sensitive to feature scale.
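The scaling step can be illustrated as below. Fitting the scaler on the training split only and reusing its statistics on the test split is a common precaution against information leakage; it is an assumption here, as the text does not say how the scaler was fitted. The arrays are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical columns: unit weight (kN/m3) and cohesion (kPa).
X_train = np.array([[18.0, 20.0],
                    [20.0, 30.0],
                    [22.0, 40.0],
                    [24.0, 50.0]])
X_test = np.array([[21.0, 35.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from training data
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
print(X_test_scaled)
```

Because the test row happens to equal the training column means, it maps to zero in both columns, which shows that the transform is driven entirely by the training statistics.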
Feature engineering
To capture the changes in geotechnical properties due to the monsoon, derived features were created. The percentage change for each property was calculated using the following formula:
$$\text{Percentage change (\%)} = \frac{P_{\text{post}} - P_{\text{pre}}}{P_{\text{pre}}} \times 100$$
where $P_{\text{pre}}$ and $P_{\text{post}}$ denote the pre-monsoon and post-monsoon values of the property.
This calculation was applied to the unit weight, cohesion, and angle of friction, resulting in new features that represent the percentage changes in these properties. These derived features are critical for understanding the impact of monsoon rains on slope stability.
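The derived features can be computed column-wise with pandas, as sketched below under the conventional definition of percentage change, (post − pre) / pre × 100. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical pre- and post-monsoon measurements for two samples.
df = pd.DataFrame({
    "unit_weight_pre": [20.0, 19.0],
    "unit_weight_post": [21.0, 19.95],
    "cohesion_pre": [30.0, 40.0],
    "cohesion_post": [24.0, 30.0],
    "friction_angle_pre": [32.0, 30.0],
    "friction_angle_post": [28.8, 27.0],
})

# Percentage change relative to the pre-monsoon value, one new column per property.
for prop in ["unit_weight", "cohesion", "friction_angle"]:
    pre = df[f"{prop}_pre"]
    post = df[f"{prop}_post"]
    df[f"{prop}_pct_change"] = (post - pre) / pre * 100.0

print(df.filter(like="pct_change").round(2))
```

A negative value indicates that the property decreased over the monsoon, as is typical for cohesion and friction angle, while unit weight tends to increase as the material takes up water.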
Exploratory data analysis (EDA)
Exploratory Data Analysis was conducted to understand the distributions, relationships, and potential outliers in the data. Sensitivity analysis was performed using SHAP (SHapley Additive exPlanations) to evaluate the relative importance of features like cohesion, friction angle, and soil moisture. This analysis provided insights into how these features influence the model’s predictions and contributed to enhanced interpretability.
Descriptive statistics of synthetic data
A comprehensive statistical overview of the synthetic data generated via GANs was performed to ensure data quality and diversity. Table 7 presents the basic statistics of the synthetic dataset, including mean, standard deviation, minimum, maximum, and quartile values for each feature. The absence of missing values was confirmed, ensuring data integrity.
Histograms of features
Histograms were plotted to visualize the distribution of each feature in the dataset. The histograms (Figure 1) show the frequency distribution of the geotechnical properties, providing insights into their central tendency and spread. This is essential for understanding the baseline conditions of the dataset.
Box plots of features
Box plots were used to identify outliers in the dataset. The box plots (Figure 2) highlight the interquartile range (IQR), median, and potential outliers for each feature. Identifying outliers is crucial for ensuring that they do not disproportionately affect the model training process.
Pair plots
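The outlier rule that a box plot visualizes can be sketched numerically. The example assumes the common 1.5 × IQR whisker convention, which the text does not state explicitly, and uses hypothetical cohesion values.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]; k = 1.5 is an assumed convention."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    mask = (values < lower) | (values > upper)
    return mask, (lower, upper)

# Hypothetical cohesion measurements (kPa); the last value is deliberately extreme.
cohesion = [24.0, 25.0, 25.5, 26.0, 26.5, 27.0, 45.0]
mask, bounds = iqr_outliers(cohesion)
print(bounds, mask)
```

Flagged points can then be inspected before deciding whether to keep, cap, or remove them.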
Pair plots were generated to visualize relationships between different features. The pair plots (Figure 3) provide a matrix of scatter plots for each pair of features, allowing for the identification of potential correlations and interactions between variables.
Correlation heatmap
A correlation heatmap was created to show the correlation between different features. The heatmap (Figure 4) visualizes the strength and direction of relationships between variables, with values ranging from −1 to 1. This helps in identifying highly correlated features, which can be important for feature selection and reducing multicollinearity.
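The matrix behind such a heatmap can be computed as in the sketch below. It uses synthetic, hypothetical data and NumPy's Pearson correlation; the 0.5 threshold for flagging correlated pairs is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200

# Synthetic illustration only: friction angle is made to depend partly on cohesion.
cohesion = rng.normal(25.0, 2.0, n)
friction = 30.0 + 0.5 * (cohesion - 25.0) + rng.normal(0.0, 1.0, n)
unit_weight = rng.normal(18.5, 0.5, n)

names = ["unit_weight", "cohesion", "friction"]
data = np.column_stack([unit_weight, cohesion, friction])

# Pearson correlation matrix; rowvar=False treats each column as a variable.
corr = np.corrcoef(data, rowvar=False)

# List feature pairs whose absolute correlation exceeds the assumed threshold.
threshold = 0.5
high_pairs = [
    (names[i], names[j], float(corr[i, j]))
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(corr[i, j]) > threshold
]
print(high_pairs)
```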
Histograms of percentage changes
Histograms of the percentage changes in geotechnical properties were plotted to visualize their distributions. These histograms (Figure 5) provide insights into how the properties changed due to the monsoon, highlighting the magnitude and variability of the changes.
Model development
The development of predictive models involved several steps, including defining features and targets, splitting the data, and training the models.
Defining features and targets
The dataset was split into features (X) and targets (y). The features included pre-monsoon properties and their percentage changes, while the targets were the post-monsoon properties. The selected features were:
- Unit weight (kN/m3) pre
- Cohesion (kPa) pre
- Angle of friction (degrees) pre
- Percentage change in unit weight
- Percentage change in cohesion
- Percentage change in angle of friction
The target variables were the post-monsoon values of the same properties.
Data splitting
To assess the models’ ability to generalize to unseen data, the dataset was divided into training (80%) and testing (20%) sets. This split ensures that each model is evaluated on data it was not trained on, which gives a more realistic estimate of performance in practice.
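The feature and target definition and the 80/20 split can be sketched as follows. The data are synthetic, the column order follows the feature list above, and the random seed is an arbitrary assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples = 100

# Hypothetical pre-monsoon values: unit weight, cohesion, angle of friction.
pre = rng.uniform([17.0, 20.0, 28.0], [20.0, 30.0, 35.0], size=(n_samples, 3))
# Hypothetical percentage changes for the same three properties.
pct = rng.uniform(-10.0, 5.0, size=(n_samples, 3))

X = np.hstack([pre, pct])        # six features: three pre values + three changes
y = pre * (1.0 + pct / 100.0)    # three targets: implied post-monsoon values

# 80% training, 20% testing; random_state fixed only for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```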
Model training
This investigation compared the performance of several regression models: Linear Regression, Random Forest Regressor, Gradient Boosting Regressor, and Support Vector Regressor. Additionally, an optimized Random Forest Regressor and an Ensemble Model combining Linear Regression and Random Forest were evaluated.
Linear Regression
Linear Regression offers a relatively simple and interpretable model. This allows engineers to understand how individual soil parameters (like cohesion or friction angle) influence slope stability. However, linear regression assumes a linear relationship between variables, which may not always hold true for complex geological settings.
Random Forest Regressor
Random Forest provides a powerful and flexible modeling tool. Unlike linear regression, it can capture non-linear relationships between soil parameters and slope stability. This makes it well-suited for complex scenarios where traditional methods might struggle. However, random forest models can be less interpretable, making it difficult to pinpoint the exact influence of each parameter.
Gradient Boosting Regressor
Gradient Boosting is an ensemble technique that builds models sequentially, each correcting the errors of the previous ones. It is effective in handling non-linear relationships and can provide high predictive accuracy. However, it is computationally intensive and can be prone to overfitting if not properly tuned.
Support Vector Regressor
Support Vector Regressor (SVR) is effective in high-dimensional spaces and is robust against overfitting, especially in cases where the number of dimensions exceeds the number of samples. It can model non-linear relationships using kernel functions but requires careful tuning of hyperparameters.
Optimized Random Forest Regressor
An optimized Random Forest Regressor was developed using Grid Search to fine-tune hyperparameters, enhancing model performance by identifying the best combination of parameters.
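A grid search of this kind might look like the sketch below. The parameter grid, the three-fold cross-validation, the R2 scoring, and the synthetic data are all assumptions; the study does not report the grid it searched.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)

# Synthetic stand-in for six features and three targets.
X = rng.normal(size=(60, 6))
y = 0.8 * X[:, :3] + 0.1 * rng.normal(size=(60, 3))

# Hypothetical search space, kept small so the example runs quickly.
param_grid = {
    "n_estimators": [20, 50],
    "max_depth": [3, None],
    "min_samples_leaf": [1, 3],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,            # 3-fold cross-validation (assumed)
    scoring="r2",    # averaged over the three targets
)
search.fit(X, y)

best_params = search.best_params_
best_model = search.best_estimator_  # refitted on all of X, y with the best parameters
print(best_params, search.best_score_)
```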
Ensemble Model (Linear Regression + Random Forest)
The Ensemble Model combines predictions from both Linear Regression and Random Forest Regressor by averaging, aiming to leverage the strengths of both models for improved predictive performance.
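The averaging step can be sketched as follows. Equal weights for the two models are assumed here, since the text says only that predictions are averaged; the data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Synthetic stand-in for six features and three targets.
X_train = rng.normal(size=(80, 6))
y_train = 0.8 * X_train[:, :3] + 0.1 * rng.normal(size=(80, 3))
X_test = rng.normal(size=(20, 6))

lr = LinearRegression().fit(X_train, y_train)
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

lr_pred = lr.predict(X_test)
rf_pred = rf.predict(X_test)

# Simple unweighted average of the two models' predictions (assumed weighting).
ensemble_pred = (lr_pred + rf_pred) / 2.0
print(ensemble_pred[:3])
```

Each ensemble prediction lies between the two individual predictions, which tends to damp the errors of either model when they disagree.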
Model evaluation
The trained models were evaluated using several performance metrics: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), R-squared (R2), Mean Bias Deviation (MBD), and Willmott’s Index of Agreement (d). These metrics provide insights into the accuracy and robustness of the models. In addition, scatter plots comparing predicted and actual values were generated to assess the models’ performance visually, and bar plots were used to compare the evaluation metrics across models.
Performance metrics
- Mean Absolute Error (MAE): Measures the average magnitude of errors in a set of predictions, without considering their direction.
- Root Mean Squared Error (RMSE): Represents the square root of the average squared differences between predicted and actual values.
- R-squared (R2): Indicates the proportion of the variance in the dependent variable that is predictable from the independent variables.
- Mean Bias Deviation (MBD): Measures the average bias in the predictions.
- Willmott’s Index of Agreement (d): Quantifies the degree of model prediction error, ranging from 0 (no agreement) to 1 (perfect agreement).
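The five metrics listed above can be computed as in the following sketch. It assumes the usual textbook definitions: MBD is taken as the mean of (predicted − observed), and Willmott's d is taken in its original form, 1 − Σ(P − O)² / Σ(|P − Ō| + |O − Ō|)². The paper does not spell out its exact formulas, and the function name is hypothetical.

```python
import numpy as np

def regression_metrics(observed, predicted):
    """Return MAE, RMSE, R2, MBD, and Willmott's d under the assumed definitions."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = p - o
    o_mean = o.mean()

    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((o - o_mean) ** 2)
    potential = np.sum((np.abs(p - o_mean) + np.abs(o - o_mean)) ** 2)

    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "R2": float(1.0 - ss_res / ss_tot),
        "MBD": float(np.mean(err)),           # sign convention: predicted - observed
        "d": float(1.0 - ss_res / potential),
    }

# Hypothetical observed and predicted cohesion values (kPa).
observed = [24.0, 25.5, 27.0, 22.5, 26.0]
predicted = [24.4, 25.0, 26.5, 23.1, 26.2]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

A positive MBD under this convention indicates over-prediction on average, and a negative value indicates under-prediction.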
Regression metrics
The regression models were evaluated based on the aforementioned metrics. Table 8 presents the performance metrics for each regression model, including Linear Regression, Random Forest Regressor, Gradient Boosting Regressor, Support Vector Regressor, Optimized Random Forest Regressor, and the Ensemble Model combining Linear Regression and Random Forest Regressor.
Scatter plots of predicted vs actual values
Scatter plots were generated to compare the predicted values against the actual values for each of the three target properties: Unit weight, Cohesion, and Angle of friction. These plots help in visually assessing the accuracy of the predictions.
Comparison of model performance
A comparative analysis of the regression models’ performance was conducted using bar plots (Figure 6). These plots illustrate the differences in MAE, RMSE, R2, MBD, and Willmott’s d metrics, highlighting the strengths and weaknesses of each model in predicting geotechnical properties.
Bootstrap uncertainty estimation
Bootstrap methods were employed to estimate the uncertainty in the Random Forest Regressor’s R2 score. A 95% confidence interval was derived using 300 bootstrap samples, resulting in an R2 confidence interval of [0.7111, 0.9088]. This interval provides a range within which the true R2 value is expected to lie with 95% confidence.
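One common way to obtain such an interval is a percentile bootstrap over the test-set predictions, sketched below. Whether the study resampled the test set or retrained the model on each resample is not stated, so this procedure is an assumption, and the data are synthetic.

```python
import numpy as np

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(42)

# Synthetic stand-in for test-set observations and model predictions.
y_true = rng.uniform(20.0, 30.0, size=40)
y_pred = y_true + rng.normal(0.0, 1.0, size=40)

n_boot = 300  # number of bootstrap samples, as reported in the text
scores = np.empty(n_boot)
for b in range(n_boot):
    # Resample (observation, prediction) pairs with replacement.
    idx = rng.integers(0, len(y_true), size=len(y_true))
    scores[b] = r2(y_true[idx], y_pred[idx])

# Percentile interval: the 2.5th and 97.5th percentiles of the bootstrap scores.
ci_low, ci_high = np.percentile(scores, [2.5, 97.5])
print(ci_low, ci_high)
```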
SHAP analysis
SHAP (SHapley Additive exPlanations) was utilized to interpret the feature importance and understand the influence of each feature on the model’s predictions. SHAP values were computed for the Random Forest Regressor, providing insights into how features like cohesion, friction angle, and soil moisture contribute to slope stability predictions.
SHAP for random forest regressor
For the Random Forest Regressor, SHAP values were generated to assess feature importance. The summary plots (Figure 7 and Figure 8) highlight the most influential features impacting the model’s predictions.
The SHAP bar summary plot indicates that Cohesion (kPa) pre, Unit weight (kN/m3) pre, and Angle of friction (degrees) pre are the most significant predictors influencing the Random Forest Regressor’s predictions. The beeswarm summary plot further illustrates the distribution of SHAP values across the dataset, showing how each feature contributes positively or negatively to the prediction.
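To make the idea behind these attributions concrete, the sketch below computes exact Shapley values by brute-force enumeration for a toy model. It does not use the SHAP library and is not the study's procedure: the linear toy model, its coefficients, and the single baseline point standing in for "absent" features are all hypothetical simplifications.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values; absent features are replaced by baseline values."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            for combo in combinations(others, size):
                s = set(combo)
                # Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical toy model of a post-monsoon property from three pre-monsoon inputs
# (cohesion, unit weight, angle of friction).
def toy_model(z):
    return 0.9 * z[0] + 0.5 * z[1] + 0.1 * z[2]

x = [30.0, 20.0, 32.0]         # instance being explained
baseline = [25.0, 19.0, 30.0]  # reference point (assumed)

phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the additivity property that SHAP summary plots rely on.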
Results
Introduction
This section delves into the study’s findings, encompassing the analysis of pre- and post-monsoon geotechnical properties, the performance of the developed models in predicting these properties, and a thorough examination of the models’ predictions. Statistical metrics and visual aids discussed in the Model Evaluation section are used to substantiate these results.
Changes in geotechnical properties
The first part of the results focuses on the changes in geotechnical properties due to the monsoon.
Figure 9 shows the distribution of percentage changes in unit weight, cohesion, and angle of friction. The histograms indicate that the majority of the changes are within a specific range, with unit weight and cohesion showing moderate variability, while angle of friction exhibits a wider range of changes.
Performance of predictive models
Building on the evaluation metrics described in the Model Evaluation section (MAE, RMSE, R2), this section assesses the performance of the various regression models in predicting post-monsoon geotechnical properties.
Linear regression model
The Linear Regression model’s performance metrics are summarized in Table 8. The scatter plots in Figure 10 show the predicted vs. actual values, indicating a high degree of accuracy with R2 values approaching 0.86.
Random forest regression model
The Random Forest Regression model’s performance metrics are also summarized in Table 8. The scatter plots in Figure 10 demonstrate a reasonable fit with R2 values of approximately 0.84, indicating good predictive capability, albeit slightly lower than the Linear Regression model.
Detailed analysis of model predictions
To provide a deeper understanding of the models’ performance, a detailed analysis of the predictions for each geotechnical property was conducted. This analysis includes examining the distribution of residuals (differences between actual and predicted values) and assessing the consistency of the models across different data subsets.
Analysis of residuals
Residuals provide insight into the accuracy and bias of the models’ predictions. By analyzing the residuals, we can identify patterns and potential areas where the models may need improvement.
Figure 11 shows the residuals for the Random Forest model’s predictions of unit weight. The residuals are small and evenly distributed, indicating high accuracy and low bias. Similar residual plots for cohesion and angle of friction show consistent results, confirming the model’s reliability.
Consistency across data subsets
To further evaluate the models’ robustness, we assessed their performance across different subsets of the data. This involved splitting the data into various categories (e.g., high vs. low pre-monsoon values) and analyzing the models’ predictions within each category.
Figure 12 illustrates the performance comparison between Linear Regression and Random Forest models across different evaluation metrics. The bar plots show that the Linear Regression model consistently outperforms the Random Forest model in terms of MAE and RMSE, while maintaining a higher R2 value, indicating better predictive accuracy and model fit.
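The subset evaluation described in this subsection can be sketched by splitting the test data on a pre-monsoon property and computing an error metric within each group. The median threshold, the choice of cohesion as the splitting variable, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic stand-in: pre-monsoon cohesion, actual and predicted post-monsoon cohesion (kPa).
pre_cohesion = rng.uniform(20.0, 30.0, size=50)
actual_post = 0.9 * pre_cohesion + rng.normal(0.0, 0.5, size=50)
predicted_post = 0.9 * pre_cohesion

# Split into "low" and "high" pre-monsoon groups at the median (assumed threshold).
threshold = np.median(pre_cohesion)
groups = {
    "low": pre_cohesion <= threshold,
    "high": pre_cohesion > threshold,
}

# Mean absolute error within each group.
mae_by_group = {
    name: float(np.mean(np.abs(actual_post[mask] - predicted_post[mask])))
    for name, mask in groups.items()
}
print(mae_by_group)
```

Similar errors in both groups would suggest that the model performs consistently across the range of pre-monsoon conditions.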
SHAP analysis insights
SHAP analysis provided a granular understanding of feature importance across different models. For the Random Forest Regressor, features such as Cohesion (kPa) pre, Unit weight (kN/m3) pre, and Angle of friction (degrees) pre emerged as the most influential in predicting post-monsoon properties. This aligns with domain knowledge that these properties are critical determinants of slope stability.
Discussion
Interpretation of regression metrics
The performance metrics indicate that the Linear Regression model and Support Vector Regressor are the most effective models for predicting post-monsoon geotechnical properties, as evidenced by their lower MAE and higher R2 values. The Random Forest Regressor, while robust, showed slightly lower performance compared to Linear Regression in this specific context.
The Ensemble Model, which combines Linear Regression and Random Forest Regressor predictions, further enhances the predictive performance, achieving an MAE of 0.2682 and an R2 of 0.8653. This suggests that leveraging multiple models can capture different aspects of the data, leading to more accurate and reliable predictions.
SHAP analysis insights
SHAP analysis provided a detailed understanding of feature importance for the Random Forest Regressor. Cohesion (kPa) pre, Unit weight (kN/m3) pre, and Angle of friction (degrees) pre were identified as the top contributors to the model’s predictions. This aligns with the fundamental principles of geotechnical engineering, where cohesion and friction angle are critical parameters influencing soil strength and slope stability.
The SHAP beeswarm plot (Figure 8) illustrates how different feature values impact the model’s output, providing insights into the directional influence (positive or negative) of each feature on the predicted post-monsoon properties.
Implications for slope stability assessments
The superior performance of the Linear Regression model and the insights gained from SHAP analysis underscore the potential of machine learning techniques in geotechnical engineering. Accurate predictions of post-monsoon geotechnical properties enable proactive risk management, allowing engineers to implement targeted stabilization measures and enhance the economic viability and safety of mining operations.
Furthermore, the integration of SHAP analysis facilitates the interpretability of complex models, bridging the gap between machine learning and traditional engineering practices. This enhanced interpretability is crucial for gaining the trust of engineering professionals and ensuring the practical applicability of the models.
Future work and recommendations
Future research should focus on:
- Incorporating real-time monitoring data to enable dynamic slope stability assessments.
- Exploring more advanced machine learning models, such as deep learning architectures, to capture complex feature interactions.
- Expanding the dataset to include a wider variety of geological and environmental conditions, enhancing model generalizability.
- Integrating remote sensing data to improve feature richness and model accuracy.
By addressing these areas, the predictive capabilities and practical applications of machine learning in geotechnical engineering can be significantly enhanced.
Conclusion
Restatement of topic and objectives
This study focused on investigating the impact of monsoon rains on soil properties and assessing the effectiveness of machine learning models in predicting these changes, with a specific emphasis on iron ore mines in Goa, India. The primary objectives were to collect comprehensive geotechnical data, perform exploratory data analysis, engineer relevant features, develop and evaluate machine learning models, and explore the conversion of the regression problem into a risk classification task.
Summary of key findings and their significance
The analysis revealed that monsoon rains significantly alter key geotechnical properties such as unit weight, cohesion, and friction angle. These findings emphasize the critical role of environmental factors in slope stability assessments. Among the models evaluated, the Linear Regression model demonstrated superior performance, offering higher accuracy and robustness in predicting post-monsoon soil properties compared to the Random Forest model. This suggests that simpler, more interpretable models can effectively capture essential patterns in geotechnical data, providing reliable predictions for engineering applications.
The integration of AI-powered systems into geotechnical engineering practices provides significant advantages, including improved accuracy, robustness, efficiency, and adaptability over traditional methods. These systems enable engineers to leverage enhanced predictive capabilities, facilitating more effective risk management strategies. This advancement is crucial for developing more resilient infrastructure, particularly in regions subjected to heavy rainfall.
Implications for practice
The findings underscore the importance of accounting for environmental factors, such as rainfall, in slope stability assessments. The superior performance of the Linear Regression model indicates that machine learning techniques, including simple and interpretable ones, can significantly enhance the accuracy and reliability of geotechnical predictions. This integration of AI into geotechnical engineering practices paves the way for proactive and resilient approaches to managing geotechnical risks, ultimately contributing to safer and more efficient infrastructure development.
Future research directions
To improve the generalizability and applicability of the findings, future research should expand the dataset to include diverse geographical locations and soil types. Additionally, incorporating other environmental factors such as temperature, humidity, and wind speed could provide a more comprehensive understanding of their influence on geotechnical properties. Exploring more advanced machine learning techniques and implementing real-time data collection would further refine AI-powered systems, making them more applicable and accurate in diverse geotechnical scenarios.
By addressing these areas, future research can continue to improve the integration of AI into geotechnical engineering practices, ultimately contributing to safer and more efficient infrastructure development.
Data availability
All data are provided within the manuscript.
References
Indian Minerals Yearbook (Part-I: General Reviews) 60th Edition, Indian Bu-reau of Mines, Ministry of Mines, Government of India, (2021).
Indian Minerals Yearbook (Part-III: Mineral Reviews) 59th Edition on Iron Ore (Advance Release), Indian Bureau of Mines, Ministry of Mines, Government of India, (2020).
Investing In Indian Steel Sector- Make in India Initiatives, National Steel Policy (NSP). Website: www.steel.gov.in (2017).
Foreign direct investment in India. Website: https://www.investindia.gov.in/foreign- direct-investment.
Dash, A. K. et al. Study and analysis of accidents due to wheeled trackless transportation machinery in indian coal mines – Identification of gap in current investigation system. Procedia Earth Planetary Sci. 11, 539–547 (2015).
Wang, C. C. et al. Parametric studies of disturbed rock slope stability based on fi- nite element limit analysis methods. Comput. Geotech. 81, 155–166 (2017).
Prakash, B. “Design of Stable Slope for Open Cast Mines.” B.Tech Report, National Institute of Technology, Rourkela, India, (2009).
Intrieri, E. et al. Forecasting the time of failure of landslides at slope-scale: A literature review. Earth Sci. Rev. 193, 333–349 (2019).
Rahardjo, H. et al. Effects of groundwater table position and soil properties on stability of slope during rainfall. J. Geotech. Geoenviron. Eng. 136, 1555–1564 (2010).
Basile, A. et al. Soil hydraulic behaviour of a selected benchmark soil involved in the landslide of Sarno 1998. Geoderma 117, 331–346 (2003).
Kolapo, P. et al. An Overview of slope failure in mining operations. Geosciences 2(2), 350–384 (2022).
Rahardjo, H., Li, X. W., Toll, D. G. & Leong, E. C. The effect of antecedent rainfall on slope stability. Geotech. Geol. Eng. 19, 371–399 (2001).
Directorate of Mines and Geology, Government of Goa, Ministry of Mines, Govern- ment of India. Website: https://dmg.goa.gov.in.
Ram, A., Santha, and Goyal, S. P. “Pit Slope Failure Problems in Goan Iron Ore Mines, Goa, India.” International Conference on Case Histories in Geotechnical Engineering, Arlington, VA, (2008).
Biswal, R.M. Physico-mechanical Characterization of Mixed Materials forming Coal Mine Overburden Dumps. Indian Institute of Technology, ISM, Dhanbad, (2018).
Smith, J. & Brown, L. Impact of Monsoon rains on slope stability. J. Geotech. Eng. 12(3), 233–245 (2015).
Cheng, Y. & Yip, C. A global procedure for stability analysis of slopes based on the comparison of limit equilibrium methods. J. Geotech. Geoen- vironmental Eng. 133(12), 1544–1555 (2007).
Senthilkumar, V., Chandrasekaran, S. & Maji, V. Geotechnical char- acterization and analysis of rainfall-induced 2009 landslide at Marappalam area of Nilgiris district, Tamil Nadu state, India. Landslides 14, 699–711. https://doi.org/10.1007/s10346-017-0839-2 (2017).
Davis, K. & Wilson, H. Exploratory Data Analysis in Geotechnical Engineering. Data Sci. J. 21(4), 345–359 (2017).
Evans, G. & Martin, S. Predictive Modeling of Slope Stability Using Regres- sion Techniques. Eng. Appl. Artif. Intell. 27(5), 678–687 (2019).
Anderson, P. & Roberts, C. Classification Approaches in Geotechnical Risk As- sessment. Risk Anal. J. 18(3), 223–234 (2020).
Rahardjo, H. et al. Infiltration effects on stability of a residual soil slope. Comput. Geotech. 26, 145–165 (2000).
Matsushi, Y. et al. Mechanisms of shallow landslides on soil-mantled hill slopes with permeable and impermeable bedrocks in the Boso Peninsula, Japan. Geomor- phology 76, 92–108 (2006).
Johnson, M. & Green, D. Machine Learning Techniques for Slope Stability Anal- ysis. Comput. Geotech. 32(1), 89–98 (2018).
Williams, R. & Taylor, P. Geotechnical Properties of Soils and Rocks. Geotech- nical Eng. J. 15(2), 112–125 (2016).
Brown, L. & Wilson, H. Post-Monsoon changes in soil properties and their impact on slope stability. Environ. Geotech. 14(2), 189–199 (2021).
Bishop, A. W. The use of the slip circle in the stability analysis of slopes. Geotechnique 5(1), 7–17 (1955).
Janbu, N. “Slope stability computations.” In Embankment-dam engineering: Casagrande volume, Wiley, 47–86. (1973).
Spencer, E. A method of analysis of the stability of embankments assuming parallel inter-slice forces. Geotechnique 17(1), 11–26 (1967).
Morgenstern, N. R. U. & Price, V. E. The analysis of the stability of general slip surfaces. Geotechnique 15(1), 79–93 (1965).
Zienkiewicz, O. C. & Taylor, R. L. The finite element method for solid and structural mechanics (Elsevier, 2005).
Acknowledgements
The authors are sincerely thankful to Dr. T. Thimmaiah Institute of Technology, Kolar Gold Fields, Karnataka, India for permitting the use of the slope stability analysis software available in the Department of Mining Engineering for the study. The authors also acknowledge the support received from NIRM, Kolar Gold Fields, Karnataka, India for providing the initial support.
Author information
Contributions
Conceptualization, J.G.; methodology, J.G.; software, J.G. and M.M.; formal analysis, J.G. and M.M.; investigation, J.G. and M.M.; resources, J.G. and P.S.P.; data curation, J.G. and M.M.; writing (original draft preparation), J.G.; writing (review and editing), J.G. and P.S.P.; visualization, J.G. and M.M.; supervision, P.S.P.; project administration, J.G. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Gladious, J., Paul, P.S. & Mukhopadhyay, M. Machine learning based prediction of geotechnical parameters affecting slope stability in open-pit iron ore mines in high precipitation zone. Sci Rep 15, 21868 (2025). https://doi.org/10.1038/s41598-025-99026-4