Introduction

Reducing the uncertainty of multi-element geochemical anomaly mapping is a challenging task, yet it is essential, because reliable recognition of multi-element geochemical anomalies can facilitate the detection of hidden deposits1. At the regional scale, multi-element geochemical anomalies are detected using stream-sediment geochemical data. These data are strongly affected by complex geological features2,3,4,5 and therefore constitute a nonlinear multivariate input that requires capable processing models5,6. Traditional procedures lack the capability to process such data, whereas advanced machine learning (AML) frameworks are appropriate substitutes for this task1,3,7,8,9,10,11,12,13,14. Among the applied machine learning models, random forest (RF), artificial neural networks and support vector machines have been the most useful methods for multi-element geochemical anomaly detection15,16,17. The RF method is a developed form of decision trees that can be applied to both classification and regression18,19,20,21. Three key hyperparameters of the RF, namely the number of trees (NT), the number of splits (NS) and the depth (D), must be optimized to reduce the uncertainty of multi-element geochemical anomaly detection. Although the applied ML models yield better conclusions than traditional methods, most users tune their hyperparameters through a trial-and-error procedure. Trial-and-error is an onerous and time-consuming approach that does not necessarily lead to reliable results22,23,24. Fortunately, numerous nature-inspired optimization techniques, commonly inspired by the social behavior of animals, have been designed over the past decade to replace the trial-and-error tuning of ML hyperparameters.
In this regard, the firefly algorithm25,26, dolphin echolocation27,28, cuckoo search29, the bat algorithm30, the whale optimization algorithm24, the grey wolf optimizer31, the wild horse optimizer32, the Harris hawks optimization (HHO) algorithm23 and others have been introduced to optimize ML models applied in the medical, industrial, agricultural and geoscience fields33,34,35,36. These optimization techniques have been widely adopted because they (i) are inspired by nature, (ii) treat problems as black boxes, (iii) avoid becoming trapped in local optima and (iv) are gradient-free37. Optimization techniques are usually selected by asking: (1) which hyperparameters of a specific ML model need to be optimized? and (2) should the objective be maximized or minimized? Over the past decade, several AML frameworks have been constructed by hybridizing ML models with nature-inspired optimization techniques to recognize multi-element geochemical anomalies38,39,40. Accordingly, this research integrates the RF model, chosen for its popularity, computational attractiveness and strong inference power, with the HHO algorithm, chosen for its robust performance against 11 other optimization algorithms23. The hybridization of the RF model with HHO eliminates the trial-and-error of the training stage and decreases the uncertainty of geochemical anomaly mapping. Notably, the results demonstrate the effect of eliminating the trial-and-error tuning of the RF hyperparameters: the performance difference between the AML model and the conventional RF model exceeds 6%, as confirmed by the success-rate curves.

Region of interest

The Feyzabad district is a major mineral potential zone of NE Iran. It is known as a high-potential area for iron oxide copper–gold (IOCG) and vein-type Au–Cu mineralization and is bounded by longitudes 58° 30′ 0″ E and 59° 0′ 0″ E and latitudes 35° 0′ 0″ N and 35° 30′ 0″ N4,41. Its significant mineralization occurrences are Zarmehr (IOCG), Tanourjeh (vein-type), Baharieh (IOCG), Sarsefidal (IOCG), Kamarmard (IOCG) and Kalateh Timor. The area is part of the boundary of the internal Iranian microcontinent, located between the Lut Block and the Central Iran zones. Numerous faults and fractures are related to the mineralization occurrences in this area; in particular, the Darouneh fault, the longest fracture, plays a significant role in the formation of the deposits of the Feyzabad district. Granodiorite, diorite, pyroxene andesite and diabasic gabbroic rocks are the most significant igneous units frequently observed there (Fig. 1). Alternations of sedimentary and carbonate rock units, comprising reddish sandstone and conglomerate, gypsiferous marl, dolomitic limestone, silty shale and quartz latite of middle to upper Cambrian age, accompany the mentioned volcanic units (Fig. 1)42. The vein-type Au–Cu and IOCG deposits are mainly hosted by diorite and granodiorite intrusions of Eocene–Oligocene age. The elements Au, Cu, Bi, Pb, Zn, Sb and As show spatial correlation with the mineralization occurrences; however, the pathfinder elements Au, Cu, Sb, Zn and Pb, with specific thresholds, were chosen based on a deep framework presented by7,18 to trace the mineralization occurrences in the study area43.

Fig. 1

Simplified geological map (1:100,000) of the Feyzabad district, NE Iran. This map is an improved version of the original published publicly by the Geological Survey & Mineral Exploration Organization of Iran (https://gsi.ir/). It was improved using the GIS software (version 10.6) toolbox.

Methods and materials

Conventional random forest method

The RF method, first introduced by44, is applied to classification and regression. Training in the RF method is performed using the "bagging" procedure: the method creates many decision trees and aggregates them to make precise predictions. Each decision tree is trained on a random sample of the inputs, and within the forest a sample is assigned to the class that receives the majority of votes over all decision trees. A schematic flowchart of the conventional RF classification algorithm is shown in Fig. 2. Three hyperparameters of the RF, namely NT, NS and D, should be tuned to obtain reliable classification. Increasing NT can increase the classification accuracy, but its value should be optimized because additional trees waste training time. An unsuitable NS value can cause under-fitting in the prediction procedure, because a decision tree with more splits is considered deeper45. Likewise, an unsuitable D value can cause over-fitting during training45. More information on the conventional RF methodology can be found in10,40,44,46,47,48.
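As a hedged illustration (the authors worked in MATLAB; this is not their implementation), the three hyperparameters above map naturally onto scikit-learn's RandomForestClassifier. The synthetic data stand in for a real geochemical table, and the specific values shown are only examples.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a 5-column geochemical table (Au, Cu, Sb, Zn, Pb)
# with four classes (strong/weak anomaly, high/low background).
X, y = make_classification(n_samples=1033, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

# NT -> n_estimators, D -> max_depth; scikit-learn controls splitting via
# min_samples_split / max_features rather than a single NS value.
rf = RandomForestClassifier(n_estimators=280, max_depth=2, random_state=0)
rf.fit(X_train, y_train)
print(round(rf.score(X_test, y_test), 3))
```

Bagging and majority voting are handled internally by the ensemble; only the three hyperparameters discussed in the text need to be chosen.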

Fig. 2

Flowchart of the classification procedure using the CRF algorithm.

Harris hawks optimization algorithm

The superb performance of the HHO algorithm in comparison with 11 powerful optimization algorithms was demonstrated by23. The Harris hawk is a predatory bird of Arizona, USA, which seeks, attacks and shares prey with other family members. Nature-inspired algorithms generally include two stages: exploration and exploitation. In the exploration stage, Harris hawks search for and locate the prey from high altitude using their keen eyesight over a desert region. The best location of a Harris hawk relative to the prey is the closest distance to it, which can be mathematically simulated as follows:

$$X\left( {t + 1} \right) = \left\{ {\begin{array}{*{20}l} {X_{rand} \left( t \right) - r_{1} \left| {X_{rand} \left( t \right) - 2r_{2} X\left( t \right)} \right|,} & {q \ge 0.5} \\ {\left( {X_{prey} \left( t \right) - X_{m} \left( t \right)} \right) - r_{3} \left( {LB + r_{4} \left( {UB - LB} \right)} \right),} & {q < 0.5} \\ \end{array} } \right.$$
(1)

where r1, r2, r3, r4 and q (the perching chance of the Harris hawks) are random values in the range (0, 1), and LB, UB, Xprey, Xrand, X(t) and X(t + 1) are the lower bound, the upper bound, the location of the prey at iteration t, a randomly chosen Harris hawk from the current population, and the locations of the Harris hawk at iterations t and t + 1, respectively. Also, Xm(t), the mean location of the Harris hawks, is calculated as follows:

$$X_{m} \left( t \right) = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} X_{i} \left( t \right)$$
(2)

where Xi(t) is the location of each Harris hawk at iteration t and N is the number of Harris hawks. Notably, the fleeing behavior of the prey changes the response of the Harris hawks in the HHO algorithm. In this case, an escaping energy factor is defined using Eq. (3). While the escaping energy satisfies \(\left|E\right| \ge 1\), the Harris hawks continue seeking the prey, but the exploitation behavior starts when \(\left|E\right| < 1\).

$$E = 2E_{0}\left(1 - \frac{t}{T}\right)$$
(3)

where T is the maximum number of iterations, and E and E0 are the escaping energy at iteration t and the initial escaping energy, respectively. At each iteration, E0 varies within the range (− 1, 1): E0 decreasing from 0 to − 1 indicates that the prey is becoming exhausted, while E0 increasing from 0 to 1 indicates that the prey is gaining strength. In the exploitation stage, the Harris hawks have targeted their prey and intend to attack it, while the targeted prey tries to flee the surprise pounce by performing random jumps. The chance of a successful escape is modeled as r < 0.5, and an unsuccessful escape as r ≥ 0.5. Based on the different escape chances and the escaping energy of the prey, four possible strategies are mathematically simulated in the HHO algorithm. In the first strategy (\(r \ge 0.5\) and \(\left| E \right| \ge 0.5\)), the Harris hawks intend to tire the prey before attacking, because the prey still has enough escaping energy but will eventually capitulate. In this strategy, named soft besiege, the location of the Harris hawks at iteration t + 1 is expressed as follows:

$$X (t + 1) = \left(\Delta X(t)\right)-E\left|J{X}_{prey}\left(t\right)-X(t)\right|$$
(4)
$$\Delta X(t) = \left({X}_{prey}\left(t\right)-X\left(t\right)\right)$$
(5)
$$J=2(1-{r}_{5})$$
(6)

where r5 is also a random value in (0, 1) and J is the random jump strength of the prey during the escape, which varies randomly at each iteration to model the nature of the prey's movements. In the second strategy (\(r \ge 0.5\) and \(\left| E \right| < 0.5\)), the prey has capitulated and the Harris hawk applies the surprise pounce. In this strategy (hard besiege), the location of the Harris hawks at iteration t + 1 is given by the following equation:

$$X (t + 1) = {X}_{prey}\left(t\right)-E\left|\Delta X(t)\right|$$
(7)

In the third strategy (\(r < 0.5\) and \(\left| E \right| \ge 0.5\)), the prey still has enough escaping energy and a soft besiege is performed before the surprise pounce. This strategy (soft besiege with progressive rapid dives) is more intelligent than the first one. Accordingly, the locations of the Harris hawks are updated via Eqs. (8)–(10):

$$\text{Y }= {X}_{prey}\left(t\right)-E\left|J{X}_{prey}\left(t\right)-X(t)\right|$$
(8)

The Harris hawks compare the previous motion of the prey with their previous dive to judge whether the dive response was appropriate. Accordingly, in response to the deceptive movements of the prey, they also carry out irregular, sudden and rapid dives. In23, a function LF(x) was suggested to model the various dives of the Harris hawks along the zigzag deceptive movements of the escaping prey, as follows:

$$Z = Y + S \times LF(x)$$
(9)
$$LF\left( x \right) = 0.01{ } \times { }\frac{u \times \sigma }{{\left| \omega \right|^{{\frac{1}{\beta }}} }},{{ \sigma }} = \left( {\frac{{\Gamma \left( {1 + \beta } \right) \times \sin \left( {\frac{\pi \beta }{2}} \right)}}{{\Gamma \left( {\frac{1 + \beta }{2}} \right) \times \beta \times 2^{{\left( {\frac{\beta - 1}{2}} \right)}} }}} \right)^{{\frac{1}{\beta }}}$$
(10)

where LF(x) is the levy flight function, β is equal to 1.5 and Γ is the gamma function. Also, u and ω are random values in the range (0, 1) and S is a random vector. Eventually, the location of the Harris hawk at iteration t + 1 is expressed as follows:

$$X(t + 1) = \left\{\begin{array}{ll} Y & \text{if } F\left(Y\right) < F\left(X\left(t\right)\right) \\ Z & \text{if } F\left(Z\right) < F\left(X\left(t\right)\right) \end{array}\right.$$
(11)

In the fourth strategy (\(r < 0.5\) and \(\left|E\right| < 0.5\)), the prey is exhausted and its chance of escape is very low. Accordingly, the Harris hawks apply a hard besiege with progressive rapid dives, which is simulated as:

$$X(t + 1) = \left\{\begin{array}{ll} Y & \text{if } F\left(Y\right) < F\left(X\left(t\right)\right) \\ Z & \text{if } F\left(Z\right) < F\left(X\left(t\right)\right) \end{array}\right.$$
(12)
$$Y ={X}_{prey}\left(t\right)-E\left|{JX}_{prey}\left(t\right)-{X}_{m}\left(t\right)\right|$$
(13)
$$Z = Y + S \times LF(x)$$
(14)

Examples of the soft and hard besiege behaviors are presented in Fig. 3.

Fig. 3

Some cases of the exploitation phase: (a) example of overall vectors in the case of hard besiege; (b) example of overall vectors in the case of soft besiege with progressive rapid dives.
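The escaping-energy decay (Eq. 3) and the levy-flight step (Eq. 10) above can be sketched in a few lines. This is a minimal illustration of those two formulas only, not the full HHO algorithm, and the population size and iteration count are arbitrary choices for the demonstration.

```python
import numpy as np
from math import gamma, sin, pi

rng = np.random.default_rng(0)

def escaping_energy(t, T, E0):
    """Eq. (3): the energy decays linearly toward 0 as t approaches T."""
    return 2.0 * E0 * (1.0 - t / T)

def levy_flight(dim, beta=1.5):
    """Eq. (10): levy-flight step used in the rapid-dive strategies."""
    sigma = ((gamma(1 + beta) * sin(pi * beta / 2)) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.random(dim) * sigma
    omega = rng.random(dim) + 1e-12  # avoid division by zero
    return 0.01 * u / np.abs(omega) ** (1 / beta)

# |E| shrinks with t, switching the hawks from exploration (|E| >= 1)
# to exploitation (|E| < 1).
T = 100
E0 = rng.uniform(-1, 1)
energies = [abs(escaping_energy(t, T, E0)) for t in range(T)]
print(energies[0] >= energies[-1])
```

The linear decay of |E| is what schedules the transition between the four besiege strategies described above.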

Validation methods

The area under the receiver operating characteristic (ROC) curve (AUC) was employed as an aggregated classification metric to validate the classified geochemical samples. The AUC value lies in the range [0.5, 1]: an AUC of 0.5 means the performance of the applied machine learning model is no better than random guessing, while an AUC of 1 means the model has been perfectly trained. The success-rate curve method was initially presented by49 to evaluate the spatial accuracy of targeting models. In this validation tool, the proportion of mineralization occurrences correctly placed within the recognized anomaly zones is plotted on the vertical axis against the corresponding proportion of the study area on the horizontal axis. A diagonal gauge line discriminates between the efficiency and inefficiency of the applied targeting model and the geochemical map it produces: a success-rate curve above the gauge line indicates a strong spatial correlation between the produced geochemical map and the mineralization occurrences, whereas a curve below the gauge line indicates a weak spatial correlation. Moreover, of two curves, the higher one indicates greater prediction ability.
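Both validation tools can be sketched as follows, assuming binary ground truth and model anomaly scores are available as arrays; the variable names and synthetic data are hypothetical stand-ins.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=200)          # 1 = near a known occurrence
scores = y_true * 0.6 + rng.random(200) * 0.4  # model anomaly scores

# AUC: 0.5 ~ random guessing, 1.0 ~ perfect ranking of the samples.
auc = roc_auc_score(y_true, scores)

# Success-rate curve: rank cells by score, then plot the cumulative share
# of occurrences captured against the cumulative share of study area.
order = np.argsort(-scores)
captured = np.cumsum(y_true[order]) / y_true.sum()
area_frac = np.arange(1, len(scores) + 1) / len(scores)
print(auc > 0.5)
```

A curve that climbs quickly (high `captured` at low `area_frac`) corresponds to a success-rate curve well above the diagonal gauge line.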

Geochemical sample preparation and analysis

The study area measures 44 × 54 km2, over which a dense sampling grid (1.4 × 1.4 km2) was laid out. A total of 1033 stream-sediment samples were collected to examine the variation of the concentrations of 27 elements across the Feyzabad district. The collected geochemical samples were analyzed by inductively coupled plasma–optical emission spectroscopy (ICP-OES) after a near-total four-acid digestion (hydrochloric, nitric, perchloric and hydrofluoric acids)50. The analytical precision (< 10%) was monitored using duplicate sub-samples for every 20 measurements.

Preparation of training data

Classification of the stream-sediment geochemical data is critical for producing the required geochemical layers. Stream-sediment geochemical data suffer from the inherent closure problem5,51. Hence, the centered log-ratio (clr) transformation was performed to eliminate the closure problem, using Eq. (15).

$$clr\left( x \right) = \left( {\log \left( {\frac{x_{1}}{g\left( x \right)}} \right), \ldots ,\log \left( {\frac{x_{D}}{g\left( x \right)}} \right)} \right)$$
(15)

where x is the composition vector with D dimensions, xD is its D-th component and g(x) is the geometric mean of the composition x52. Then, the table values were scaled to the range [0, 1]. Accordingly, a geochemical data table comprising the transformed values of the elements Au, Cu, Sb, Zn and Pb was classified to map the geochemical anomalies in the study area. In detail, a geochemical data table with five columns containing the transformed values of the pathfinder elements and one column containing their labels was constructed for all 1033 collected samples to train the model (Fig. 4). The pre-defined labels of the training data were assigned using several ranges of the transformed values (Table 1). The table was thus divided into four class types, namely strong anomaly, weak anomaly, high background and low background, based on the suggested ranges, and the corresponding labels were allocated to the samples. For instance, a geochemical sample with transformed values Au = 0.648, Cu = 0.703, Sb = 0.927, Zn = 0.811 and Pb = 0.806 has a strong spatial correlation with nearby mineralization occurrences, belongs to the strong-anomaly population and receives label 4 (Table 1). In this research, we also implemented a root mean squared error (RMSE) cost function during the training procedure, based on Eq. (16).

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left({C}_{R}-{C}_{p}\right)^{2}}$$
(16)

where n, CR and Cp are the number of samples, the real class allocated to a sample and the predicted class, respectively.
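A hedged sketch of the preparation steps above, i.e. the clr transformation (Eq. 15), the [0, 1] scaling and the RMSE cost (Eq. 16), using synthetic concentrations in place of the real table:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(1.0, 100.0, size=(1033, 5))  # stand-in for Au, Cu, Sb, Zn, Pb

# Eq. (15): centered log-ratio removes the closure effect; each row is
# divided by its geometric mean before taking logarithms.
gmean = np.exp(np.log(X).mean(axis=1, keepdims=True))
clr = np.log(X / gmean)

# Scale each column to [0, 1], as done before assigning class labels.
scaled = (clr - clr.min(axis=0)) / (clr.max(axis=0) - clr.min(axis=0))

# Eq. (16): RMSE between the real and predicted class labels.
def rmse(c_real, c_pred):
    return np.sqrt(np.mean((np.asarray(c_real) - np.asarray(c_pred)) ** 2))

print(np.allclose(clr.sum(axis=1), 0.0))  # clr rows sum to zero by construction
```

The zero row sums are the defining property of the clr transform: the closed (constant-sum) constraint on raw concentrations is replaced by a symmetric log-ratio representation.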

Fig. 4

Hybridization of the RF model with the HHO algorithm using training and testing data.

Table 1 Classes defined for training data with criterions.

Results and discussion

Training conventional RF

The MATLAB R2022a environment was used to implement the conventional RF (CRF) and the Harris hawks optimized random forest (HHORF) networks. 70% of the samples (in-bag data) were randomly selected for training and the remaining 30% (out-of-bag data) were used for testing. For training the conventional RF network, the hyperparameters NT, NS and D were experimentally searched within the ranges 1–300, 1–8 and 1–4, respectively. The hyperparameters tuned by this trial-and-error procedure are presented in Table 2. The optimum value of NT (280) was used to train the CRF with tenfold cross-validation. Notably, increasing NT does not necessarily decrease the uncertainty, but it does increase the computation time. Furthermore, NS and D were set to 5 and 2, respectively, as these parameters have a lower impact on the CRF performance (Table 2).
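The trial-and-error search described above can be sketched (not the authors' MATLAB code) as an exhaustive evaluation over the same hyperparameter ranges with tenfold cross-validation; the data and the coarse grid values are synthetic stand-ins.

```python
from itertools import product

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=4, random_state=0)

best_score, best_params = -np.inf, None
# Coarse stand-in for the NT (1-300) and D (1-4) ranges given in the text.
for nt, depth in product([50, 150, 280], [1, 2, 3, 4]):
    scores = cross_val_score(
        RandomForestClassifier(n_estimators=nt, max_depth=depth,
                               random_state=0),
        X, y, cv=10)  # tenfold cross-validation, as in the paper
    if scores.mean() > best_score:
        best_score, best_params = scores.mean(), (nt, depth)

print(best_params is not None)
```

Even this coarse grid requires twelve cross-validated model fits, which illustrates why exhaustive trial-and-error quickly becomes onerous as the ranges grow.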

Table 2 Tuned hyperparameter values of the CRF and HHORF models.

Training HHORF and comparison

A schematic flowchart of the hybridization of the HHO algorithm with the RF method is presented in Fig. 4. The HHO is effectively attracted toward optimal solutions in the best locations of the search space, and the number of random parameters in the HHO is limited; therefore, the initial population of Harris hawks is significant in this algorithm. Before optimizing the hyperparameters of the HHORF model, an appropriate number of iterations (100), the Harris hawks population size (30), and the lower bound (1) and upper bound (100) values, together with tenfold cross-validation, were set. In the HHORF procedure, the best location of the prey (Xprey) represents the relevant hyperparameters of the RF method in the selected features for all cross-validation folds. The hyperparameters of the RF tuned by the HHO algorithm were NT = 636, NS = 7 and D = 3 (Table 2). The cost function of the optimization procedure over all iterations is exhibited in Fig. 5: the minimization converges after the 58th iteration with a cost value of 0.466. The stable part of the cost function after the 58th iteration, with the lowest cost value, confirms the proper tuning of the model hyperparameters. The AUCs presented in Fig. 6 compare the prediction ability of the models trained in this research. Clearly, the HHORF is more accurate than the CRF: the AUC values of the samples classified by the HHORF (class 4 = 0.931, class 3 = 0.937, class 2 = 0.943, class 1 = 0.925; Fig. 6b) are greater than those of the samples classified by the CRF (class 4 = 0.811, class 3 = 0.916, class 2 = 0.802, class 1 = 0.797; Fig. 6a; Table 3).
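A hedged sketch of the hybrid idea: each hawk position encodes candidate RF hyperparameters, and the fitness is a cross-validated cost. For brevity this uses random initialization only, standing in for the full HHO update rules; the bounds, population size and data are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=4, random_state=0)
rng = np.random.default_rng(3)

def fitness(pos):
    """Cost of one hawk position = 1 - mean CV accuracy of the decoded RF."""
    nt, depth = int(pos[0]), int(pos[1])
    rf = RandomForestClassifier(n_estimators=nt, max_depth=depth,
                                random_state=0)
    return 1.0 - cross_val_score(rf, X, y, cv=3).mean()

lb, ub = np.array([10, 1]), np.array([300, 4])  # hypothetical NT, D bounds
hawks = rng.uniform(lb, ub, size=(5, 2))        # small population for speed
costs = np.array([fitness(h) for h in hawks])
x_prey = hawks[costs.argmin()]                  # best hyperparameters so far
print(costs.min() < 1.0)
```

In the full algorithm, the besiege strategies of Eqs. (4)–(14) would iteratively move the hawks toward `x_prey`, driving the cost curve down until it stabilizes, as shown in Fig. 5 for the real model.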

Fig. 5

Minimization of the cost values over all iterations.

Fig. 6

Area under the receiver operating characteristic curve (AUC) for the geochemical data classified using (a) the CRF model and (b) the HHORF model.

Table 3 The AUC values for geochemical data classified applying the CRF and HHORF models.

Multi-element geochemical anomaly mapping and validation

The classified testing data were used to map multi-element geochemical anomalies through the inverse distance weighted (IDW) interpolation tool of the GIS software (version 10.6) toolbox (Fig. 7). Owing to the greater classification accuracy of the HHORF procedure, the map plotted from its classified samples clearly provides better prediction of the geochemical anomalies linked to the mineralization occurrences. In other words, the high-potential zones of the multi-element geochemical map produced by the HHORF approach (Fig. 7b) capture more mineralization occurrences. This claim is also demonstrated by the success-rate curves obtained for both maps (Fig. 8), in which two observations are notable. First, both success-rate curves lie meaningfully above the diagonal gauge line, meaning both produced maps have acceptable ability in predicting geochemical anomalies. Second, the success-rate curve of the map produced from the HHORF-classified samples lies above the other curve, meaning the prediction ability of the HHORF procedure is greater: a higher proportion of the mineralization occurrences is detected within a lower proportion of the corresponding area. For instance, the HHORF predicted 86.53% of the mineralization occurrences within 30% of the corresponding area, whereas the CRF predicted 80.14% within the same area (Fig. 8).
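A hedged sketch of IDW interpolation (not the GIS toolbox implementation), gridding hypothetical class labels at scattered sample points onto a raster:

```python
import numpy as np

rng = np.random.default_rng(4)
pts = rng.uniform(0, 44, size=(100, 2))            # sample coordinates (km)
vals = rng.integers(1, 5, size=100).astype(float)  # class labels 1-4

def idw(query, pts, vals, power=2.0, eps=1e-12):
    """Inverse distance weighted estimate at one query point."""
    d = np.linalg.norm(pts - query, axis=1)
    w = 1.0 / (d + eps) ** power
    return float(np.sum(w * vals) / np.sum(w))

# Evaluate on a coarse grid covering a hypothetical study area.
grid = np.array([[idw(np.array([gx, gy]), pts, vals)
                  for gx in np.linspace(0, 44, 10)]
                 for gy in np.linspace(0, 54, 10)])
print(vals.min() <= grid.min() and grid.max() <= vals.max())
```

Because each estimate is a weighted average of the observed labels, the interpolated surface stays within the observed class range; nearby samples dominate through the inverse-distance weights.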

Fig. 7

Multi-element geochemical anomaly maps plotted using (a) the CRF model and (b) the HHORF model.

Fig. 8

Success-rate curves for the results obtained using (a) the CRF model and (b) the HHORF model.

Conclusions

In this research, a hybridized random forest model was successfully constructed to classify the multi-element geochemical data table linked to the mineralization occurrences in the Feyzabad district, NE Iran. The concluding remarks are as follows:

  • The CRF is a powerful and popular method for classifying geochemical data, but its hyperparameters must be carefully optimized to achieve reliable conclusions.

  • Optimizing the hyperparameters of the CRF method is time-consuming and onerous when a trial-and-error procedure is executed.

  • A nature-inspired procedure, the Harris hawks optimization algorithm, reliably tuned the hyperparameters of the CRF without wasting much time.

  • Advanced machine learning frameworks can be constructed by hybridizing appropriate optimization algorithms with powerful machine learning models.

  • Advanced machine learning frameworks can meaningfully decrease uncertainties, yielding reasonable geochemical anomaly maps.