Introduction

Predation plays a crucial role in ecosystem dynamics, functioning as a population regulation mechanism that contributes to maintaining ecological balance. This interaction is fundamental for the structuring of biological communities, as controlling prey numbers prevents overpopulation and, consequently, resource depletion1. Predation can also promote ecological diversity by influencing species distribution and abundance, inducing selective pressures that drive species evolution and adaptation2,3. Investigating predator–prey relationships is an important component of ecosystem dynamic studies because it can identify factors that shape and structure communities4,5,6.

The risk of predation can be influenced by various factors, including those related to environmental conditions7. Certain aspects, such as marked seasonality, can induce different types of microhabitats and landscapes8. For instance, during the dry season in seasonal dry forests, vegetation cover is lost, which often serves as shelter for prey, affecting the use of shelters and, consequently, the risk of predation9. In addition, the reduced availability of food items and the decrease in water in the environment can lead to changes in the food consumed and different predation behaviors compared to those observed in more abundant scenarios10,11. Some organisms, for example, adopt generalist and opportunistic feeding habits as a way to expand the possibilities of utilizing available resources12.

In this context of water and food scarcity, the ability to explore a wide range of food sources offers an adaptive advantage to organisms. Dietary diversity enables more efficient access to nutrients and energy, reducing the effort required to find food and lowering exposure to potential predators (see13). Additionally, generalist feeding helps reduce competition for limited resources, as organisms do not rely on a single type of food. In contrast, specialists may become more vulnerable when environmental changes occur, as their food sources may decrease or disappear14. Therefore, feeding strategies serve as an adaptive response to both biotic challenges, such as predation and competition, and abiotic ones, such as the scarcity of natural resources15,16,17,18,19.

Among the many feeding strategies observed in lizards is the behavior known as saurophagy, which involves the attack and consumption of other lizards, be they individuals of different species or the same species, in both competitive and non-competitive contexts2,20,21,22,23,24,25,26,27. Invertebrates usually compose almost the entire diet of most lizard species28, but saurophagy was suggested to be one of the mortality causes in sympatric lizard populations29. Squamates, including lizards, are well-known to feed on other squamates30. However, the factors driving the evolution of saurophagy in squamates are poorly understood.

South America exhibits remarkable landscape heterogeneity, ranging from dense tropical rainforests to arid deserts, the grasslands and savannas of the Cerrado and Pampas, to the mountains of the Andes31,32. This environmental complexity results in a wide variety of habitats and ecological conditions capable of influencing the distribution, abundance, and interactions of species throughout their ranges33. This influence is particularly important for ectothermic animals, such as lizards, which rely on environmental conditions to regulate their body temperatures and develop their daily activity34,35. Tropical forests, with their warm and humid climates, support immense biodiversity and, consequently, higher productivity, while drier biomes, such as the Chaco and Caatinga, pose more complex physiological challenges for species36,37. These challenges arise from a seasonal climate regime, with low and restricted rainfall occurring over just a few months of the year and frequent temperature changes8, which in turn lead to reduced water and food availability38,39,40. Such conditions could promote formerly rare behaviors, especially in more environmentally sensitive organisms such as lizards.

The first official record of saurophagy in South American lizards was published 32 years ago (see41). Since then, reports of this behavior have increased considerably in this region, with occurrences documented in various lizard families, such as Gekkonidae, Gymnophthalmidae, Liolaemidae, Phyllodactylidae, Scincidae, Teiidae, and Tropiduridae42. Despite the many reports of saurophagy in South America, no effort has yet been made to compile such data in search of occurrence patterns and their causes. Thus, we remain uncertain as to whether and how abiotic and biotic factors influence the occurrence patterns of this behavior. Herein, we investigate saurophagy in South America by (1) uncovering biological and spatial patterns in its occurrence, (2) testing for any climatic contribution to the frequency of saurophagy, and (3) identifying key factors influencing predator decision-making.

Results

Families, prey age and type of interaction

We gathered 127 records of saurophagy from the literature, across 47 predator species belonging to nine families: Diploglossidae, Gekkonidae, Gymnophthalmidae, Leiosauridae, Liolaemidae, Phyllodactylidae, Scincidae, Teiidae, and Tropiduridae. Tropidurids acted as predators in 62 records (49%), followed by Teiidae with 26 records (20%), Leiosauridae, Scincidae, and Liolaemidae with nine records each (7%), Gekkonidae with six records (5%), Phyllodactylidae with three records (2%), Gymnophthalmidae with two records (2%), and Diploglossidae with one record (1%) (Fig. 1a). The family Tropiduridae was also the most represented in terms of number of predator species in saurophagy records (32%; 15 species) (Fig. 1c). A total of 61 lizard species were recorded as prey items, belonging to 11 families: Tropiduridae (26%; n = 33), Teiidae (16%; n = 21), Liolaemidae (15%; n = 20), Gekkonidae (13%; n = 16), Scincidae (10%; n = 13), Gymnophthalmidae (7%; n = 9), Phyllodactylidae (6%; n = 8), Sphaerodactylidae (3%; n = 3), Anolidae (2%; n = 2), and Leiosauridae and Diploglossidae (1%; n = 1 each) (Fig. 1b,d).

Fig. 1

Number and relative frequency (percentage) of records of saurophagy by lizard families (a,b) and number of species by family involved in records of saurophagy (c,d) as predator (a,c) or prey (b,d) in South America. A total of 47 species were recorded as predators and 61 species were recorded as prey items.

The number of nodes in the trophic network, representing distinct lizard families or groups, was 40, connected by 49 edges, which represent the feeding interactions between them. The modularity of the network was 0.451. Among the lizard families, Tropiduridae exhibited the highest number of interactions (39.6%; 19 links), followed by Teiidae (29.2%; 14 links), Scincidae (12.5%; 6 links), Leiosauridae (6.2%; 3 links), Gymnophthalmidae (4.2%; 2 links), and several other families (Diploglossidae, Gekkonidae, Liolaemidae, and Phyllodactylidae) with a single interaction each (2.1%; 1 link per family) (Fig. 2).

Fig. 2

Trophic network of each family and which genera they interact with. The thickness of the lines represents the number of records for each specific family-genus interaction (interaction force). Colors represent each family. Figure produced using Gephi (http://gephi.org).

Among the records, 63% (n = 80) were interspecific (predation) interactions and 37% (n = 47) were intraspecific (cannibalism) (Supplementary Fig. S5a). The family Tropiduridae had the highest number of cannibalism records (51%; n = 24), followed by Liolaemidae, Gekkonidae, and Teiidae (13%; n = 6), Scincidae (4%; n = 2), and Leiosauridae, Diploglossidae, and Gymnophthalmidae (2%; n = 1 each) (Supplementary Fig. S5b). Juvenile lizards were prey in 53% (n = 38) of predation records and 88% (n = 38) of cannibalism records (Supplementary Fig. S5c,d).

Relationship and proportion of predator and prey body size

Predatory lizards had a mean size (SVL) of 86.41 mm, median of 81.15 mm, maximum size of 250 mm, and minimum of 35 mm, whereas prey lizards had a mean of 38.56 mm, median of 33.40 mm, maximum of 110 mm, and minimum of 5.70 mm. The general linear regression (Table 1) showed a positive relationship between predator and prey body sizes (n = 68, y = 43 + 1.1x, r2 = 0.58, p ≤ 0.001; Fig. 3a), with larger predators capturing larger prey.

Table 1 Generalized linear model (GLM) values involving predator and prey body sizes.
Fig. 3

Relationship between predator and prey body size represented by a linear regression (a) and variance of the prey body size proportion in relation to the predator’s body size (b).

The average proportional consumption was 46% of the predator’s body size, with a median of 46%, a maximum of 69%, and a minimum of 0.7% (Fig. 3b). The ANOVA test indicated that there is no significant difference among families in the variance of consumption proportional to predator body size (F = 1.454; p = 0.194).

Environmental and geographic correlates of saurophagy occurrence

Of the 127 saurophagy records, 107 (84%) occurred in open formations and 20 (16%) in forest formations (Supplementary Table S1; Fig. 4a,b). A heatmap based on kernel density estimation indicated that saurophagy is most likely to occur in the Caatinga and Fernando de Noronha Archipelago of NE Brazil, the coastal sand dunes of SE Brazil, the Mediterranean vegetation between Chile and Argentina, and the insular xeric scrub of the Galápagos (Fig. 4c).

Fig. 4

Records of saurophagy in open and forest phytophysiognomies in South America (a), number and percentage of records in each type of formation (b), and heatmap of Kernel density estimation for saurophagy records (c). Both maps in the figure were produced using Qgis (as described in the Methods section).

The SEM fit indices indicated that the model was appropriate (comparative fit index, CFI = 0.986; RMSEA = 0.072; SRMR = 0.065). In this model, evapotranspiration (β = 0.49; p = 0.02), Bio3 (isothermality; β = −0.43; p = 0.05), and longitude (β = 0.43; p = 0.001) were the environmental and geographic predictors significantly and directly related to the frequency of saurophagy (Fig. 5; Supplementary Fig. S6).

Fig. 5

Structural Equation Model (SEM) summarization of geographic and bioclimatic variable effects on the variation of the frequency of saurophagy records in South America. Lat = latitude; Lon = longitude; Bio3 = isothermality; Bio5 = max temperature of warmest month; Bio15 = precipitation seasonality; Bio18 = precipitation of warmest quarter; ET = evapotranspiration; PP = primary productivity. The arrow thickness represents the effect size of predictor variables. Solid arrows represent significant effects (values in bold) and dotted arrows represent no significant effect. Red arrows represent negative associations and blue arrows represent positive associations. β = standardized regression coefficients. Saurophagy illustration by: Lucas Rosado.

Predictors of predator decision-making

The RF model showed an error rate of 19% for the training set and was able to predict/classify the type of interaction with an accuracy of 82%, sensitivity of 83%, and specificity of 81%. The most important predictors were predator family, followed by prey size and predator size (according to MDA), and predator family, followed by prey habitat use and prey size (according to Gini) (Fig. 6). The Shapley values for these predictors were: predator family (phi = 0.31; var.phi = 0.26), prey habitat use (phi = 0.15; var.phi = 0.17), prey size (phi = 0.06; var.phi = 0.06), and predator size (phi = 0.02; var.phi = 0.02) (Supplementary Fig. S7).
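As a reading aid, accuracy, sensitivity, and specificity all follow directly from a two-class confusion matrix. The sketch below is written in Python rather than the R used in this study. The function name and the counts in the usage line are hypothetical and are not the study's confusion matrix, and which interaction type is treated as the "positive" class is likewise an assumption.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn: positive-class records classified correctly/incorrectly.
    tn/fp: negative-class records classified correctly/incorrectly.
    """
    total = tp + fn + fp + tn
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("each class needs at least one record")
    return {
        "accuracy": (tp + tn) / total,     # share of all records classified correctly
        "sensitivity": tp / (tp + fn),     # true-positive rate
        "specificity": tn / (tn + fp),     # true-negative rate
    }


# Hypothetical counts, for illustration only (not data from this study).
example = classification_metrics(tp=40, fn=10, fp=5, tn=45)
```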

Fig. 6

Variable importance for predicting the predator choice based on the Mean Decrease in Accuracy (a) and Mean Decrease in Gini (b) metrics from a random forest model.

Discussion

We found that saurophagy occurs in nine of 14 families of South American lizards (see43,44,45,46,47), but that it is more common in a few of them. This behavior was not recorded in Iguanidae, Hoplocercidae, Polychrotidae, Sphaerodactylidae, and Anguidae. Juveniles and non-conspecifics are prey in most cases. We also found a positive relationship between predator and prey body size. This behavior was more common in open and dry biomes and was partially explained by instability in temperature (low isothermality) and high evapotranspiration. We also demonstrated that, although saurophagy is an opportunistic behavior, predators seem to choose prey non-randomly, probably being influenced by predator family characteristics, prey body size, and prey habitat use.

Linear regression demonstrated a positive relationship between predator and prey body sizes. In lizards, this relationship could be explained by bite force capacity, which is influenced by the relationship between body and head size48,49. Although large species eat larger prey, we did not find significant differences in predator–prey body size proportion between families (on average, predators invested in prey with 46% of their body size). Our findings on this matter show a trend opposite to that of anurophagy. Recently, it was demonstrated that smaller frog species can consume proportionally larger frogs6. Unlike amphibians, lizards have higher movement and displacement rates, as they possess physiological adaptations that help prevent excessive dehydration50,51, which may result in more opportunities to find food items that are not too large to hinder digestion, yet not too small to be insufficiently energetic. Additionally, the presence of a tail may further limit predation in larger lizards, as it increases the overall size of the prey, making it harder to consume it whole. Moreover, the proportional gape size relative to body size in lizards may be smaller compared to generalist frogs, where prey consumption can be closer to the predator’s own body size52. This may restrict lizards to consuming prey that are proportionally smaller relative to their body size. Future studies are needed to clarify these issues and to better understand how body size influences proportional prey consumption in lizards.

We found that most of the saurophagy records in South America have been made in open phytophysiognomies, and the frequency of this behavior is influenced by isothermality longitudinally. Isothermality is a highly important bioclimatic variable for ectothermic animals, being partially responsible for the distribution patterns of many reptiles53,54,55. In general, the stability of environmental temperature promotes more predictable activity patterns and more suitable conditions for growing and developing in ectothermic species56,57,58, especially because stable environments provide more stable food availability59,60,61. Here, we observed that isothermality was inversely related to saurophagy occurrence, which could mean that this behavior is more frequent in more unstable environments, especially with respect to night–day temperature oscillations.

Although we did not find a significant influence of primary productivity (used as a proxy of prey availability62) on saurophagy occurrence, we found a positive effect of evapotranspiration longitudinally on this behavior. Open environments tend to be less isothermic and less humid due to their sparse vegetation structure and high evapotranspiration, which results from lower precipitation rates and greater exposure to UV radiation8. In regions where open environments predominate, lizards often experience reduced water availability and lower prey diversity during certain times of the year63, as the reproductive cycles of some arthropods they consume are closely tied to precipitation patterns64. In some snake species, warmer habitats promote increased movement and prolonged activity65, leading to heightened predation pressure66. Despite being phylogenetically closely related to snakes, lizards respond differently to high temperatures by reducing exploratory behavior and seeking refuge in suitable shelters67. As a result, in open environments and under high evapotranspiration, lizards likely focus on prey that is easier to locate, such as closely related species that utilize similar landscape features for shelter. In fact, our trophic network demonstrated that most predators prioritized prey from the same family or genus, which are usually syntopic.

The random forest model demonstrated that predator family was the most important predictor of predator decision-making. This was expected, since Tropiduridae and Teiidae were the most common families in saurophagy events. Among the families with saurophagy records in the literature, these two are the most abundant in open habitats68,69, where we found most of the saurophagy records. Prey body size was another important predictor of saurophagy, which corroborates our finding that small species and juveniles are the main targets, especially in cannibalism events, as shown by the greater percentage of cases involving juveniles as prey. Juvenile individuals spend more time foraging compared to adults, maximizing energy intake based on their size and prey availability70,71. Consequently, encounter rates with juveniles may be higher since they are more active than adults. Moreover, juveniles have fewer defenses29,72. Another predictor evidenced was prey habitat use, which was already observed in a previous study investigating autotomy as a proxy of predation pressure in snakes and amphisbaenian species in South America and West Africa73. A possible explanation is based on the exposure level of prey to predators according to the substrate structure (see7): warm habitats are usually more susceptible to predation pressure66, which also corroborates the higher occurrence of saurophagy in open environments.

The genus Tropidurus was the most frequent in our records, especially in terms of cannibalism. The diet of these sit-and-wait predators74 is mainly composed of arthropods and plants10,69,75,76,77,78,79, but also includes vertebrates such as amphibians78,80,81,82, mammals83,84, birds85,86, snakes87,88, and other lizards77,89,90,91,92,93. Tropidurus males often fight each other for territory and females, especially adults and juveniles72,94,95. Cannibalism is also very frequent in this genus2,72,96, perhaps due to their propensity for territorial combat encounters.

Among our saurophagy records, cannibalistic behavior seems to be particularly common in generalist lizard species97. The high number of cannibalism records indicates that this type of interaction is not only common but also important to the ecology of many species, as it strongly influences population dynamics98, for example, by reducing the number of potential competitors99. Several eco-evolutionary factors result from cannibalism, including population density self-regulation100,101, a reduction in genetic variability within populations102, and increased risk of parasite infections103,104.

We could not account for every possible factor influencing the response variable. For example, one unstudied factor that may influence the spatial pattern of saurophagy is the diversity of lizard species in a given area6. Since areas closer to the Equator have greater diversity, this raises the question: Are areas with greater lizard diversity more prone to occurrences of saurophagy? Future studies are needed to determine whether high diversity within specific groups is a reliable predictor of predator–prey interactions within those same groups. Another factor could be the possible influence of the distribution of urban centers, number of specialists, and local research institutions that drive the publication of such data105. However, we focused on accounting for natural biotic and abiotic predictors independently of human influence. Furthermore, we believe that the distribution of urban centers could not explain the patterns of saurophagy occurrence that we found, as the biggest cities in territory and demography in all South America are in the Southeast region of Brazil (specifically, São Paulo state and city), where we found very few records compared to other regions.

Conclusion

Our study demonstrates that, despite numerous records, saurophagy remains a rare event in South American lizard assemblages, especially considering the large geographic scale and long sampling period of the present study. However, in open landscapes, this behavior becomes less rare due to higher temperatures, lower precipitation, and increased evapotranspiration rates, which can reduce food availability for most of the year. Key factors influencing predator decision-making include predator family, prey size, and prey habitat use. Our findings highlight potential drivers of saurophagy that can serve as a foundation for hypothesis testing in other regions, contributing to a better understanding of predator–prey interactions and their evolutionary and ecological implications. Given the ongoing climate change and increasing aridification, future studies on lizard social interactions and life history traits could improve predictions of dietary patterns and their deviations. Since climate change can affect invertebrate prey availability40, its potential role in intensifying saurophagy, especially cannibalism, should not be overlooked. Lastly, we emphasize the value of scientific collections, as many of the studies reviewed here were based on specimens from these invaluable resources.

Methods

Information sources and search strategy

We performed an extensive search for saurophagy records in South America until August 2024 in all volumes of Herpetological Review and on the main search platforms: Scopus, Scielo, ResearchGate, ScienceDirect, Web of Science, JSTOR, Directory of Open Access Journals & Articles (DOAJ), ScienceResearch, WorldWideScience, Springer, Science.gov, and Google Scholar. We used an association of 15 terms, in English, Portuguese, and Spanish, to retrieve all the terms listed in a single document and also separated into different documents: lizard or lagarto or lagartija or saurophagy or saurofagia or South America or América do Sul or América del Sur or predation or predação or predación or cannibalism or canibalismo or diet or dieta. We also used the “snowball” method, i.e., searching for records in the references of articles obtained in the research. We did not consider records of undergraduate or master’s dissertations, or theses or abstracts presented in congresses or symposiums. For those publications which reported more than one event, we considered each record separately. We considered predation events recorded in all their phases106,107.

From each record, we collected the following information: predator and prey species names and body sizes, prey age, and name and coordinates of the locality (Supplementary Table S1). We complemented our dataset with a qualitative environmental classification of each locality (macrohabitat). We classified biomes as “open” areas when characterized by a dry climate (high temperatures and low precipitation rates) and sparse, low vegetation, and as “forest” areas when characterized by a wet climate and dense vegetation cover. We based our macrohabitat classification on the Dinerstein et al.108 biome classification (Supplementary Table S1). We also added information about the species (predator and prey taxonomic family, predator foraging mode, predator and prey body size, and habitat use type of the prey), syntopy (overlap in the main microhabitat type used by the predator and prey species), and type of interaction during the recorded event (inter- or intraspecific) to our dataset. For qualitative classification of body size, we considered species with snout-vent length ≤ 60 mm as “small”, ≥ 61 and ≤ 200 mm as “medium”, and > 200 mm as “large”, based on the taxonomic literature of the species. The literature we gathered, including species descriptions and ecology papers, uses this categorization when characterizing species or species of the same genus. We also used the literature to classify foraging mode (Supplementary Table S2).
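The body-size categories can be written as a small function. This is a minimal sketch in Python (the analyses in this study were run in R), and the function name `size_class` is hypothetical. It assumes that a snout-vent length of exactly 200 mm is "medium" and that values between 60 and 61 mm are also "medium"; the text does not settle either boundary case.

```python
def size_class(svl_mm: float) -> str:
    """Classify a species by snout-vent length (SVL, in millimetres)."""
    if svl_mm <= 0:
        raise ValueError("SVL must be positive")
    if svl_mm <= 60:
        return "small"
    if svl_mm <= 200:  # assumption: exactly 200 mm counts as medium
        return "medium"
    return "large"


# Example: classify a few SVL values (mm).
classes = [size_class(v) for v in (35, 86.41, 250)]
```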

Data analyses

Descriptive analyses and statistical tests

We calculated the number and percentage of records in different categories: predator and prey family, prey age, type of interaction, and macrohabitat. To assess patterns of macrohabitat distribution, we plotted all the records gathered in a Qgis map109. After that, we applied the Kernel Density tool to identify hotspots of saurophagy occurrence. To graphically represent the predator–prey relationship among lizard families and genera, we created a trophic network among species and plotted it using the software Gephi110. The structure of such networks can be analyzed using modularity, a metric that quantifies the extent to which the network can be divided into distinct modules or communities. Since modularity identifies clusters where interactions are denser within than between modules, it provides insight into how feeding interactions are distributed among lizard families, helping to determine whether certain families tend to share similar prey types or occupy distinct trophic niches. To assess the relationship between prey and predator body length (SVL), we performed a generalized linear model (GLM) analysis using discrete data on the snout–vent length of predators and prey available in the literature. Due to the data distribution, a Poisson family GLM with a logarithmic link function was applied. To analyze the difference in the proportional consumption of prey in relation to the predator’s body size (prey SVL/predator SVL), we performed an analysis of variance (ANOVA). We plotted values and calculated the probability density of paired data (predator and prey sizes) using the ‘ggplot2’111, ‘car’112, ‘tidyverse’113, and ‘viridis’114 R packages implemented in R version 3.5.1115.
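To make the modularity metric concrete, the sketch below computes Newman's modularity Q for a given division of a network into modules, using Q = Σ_c [L_c/m − (d_c/2m)²], where m is the number of edges, L_c the number of edges inside module c, and d_c the summed degree of its nodes. It is an illustration in Python, not the Gephi routine used here, and it assumes an undirected, unweighted network and a module assignment supplied by the user. The function name and the toy network are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of an undirected, unweighted network.

    edges: iterable of (u, v) node pairs.
    community: dict mapping each node to its module label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("network has no edges")
    within = defaultdict(int)  # edges with both ends in the same module
    degree = defaultdict(int)  # summed node degree per module
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            within[cu] += 1
    return sum(within[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy network: two triangles joined by a single edge (hypothetical data).
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
toy_modules = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
q = modularity(toy_edges, toy_modules)
```

Values of Q near zero indicate no more within-module interactions than expected by chance, while larger positive values indicate a more compartmentalized network.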

Abiotic predictors of saurophagy

To investigate the possible influence of environmental and geographic features on the occurrence of saurophagy, we mapped all saurophagy records over 59 grid cells with 2° grain throughout South America, including oceanic islands. We used that grain size to create enough variation between the grid cells (each one had between 1 and 9 records) (Supplementary Fig. S3). While creating that grid, we also produced a presence-absence matrix using the function ‘lets.presab.points’ from the ‘letsR’ R package116. This matrix contained grid cell identification, centroid geographical coordinates of each grid cell, and the records of each grid cell. We also used the ‘rgdal’117, ‘sp’118, ‘maps’119, and ‘terra’120 R packages in those steps. Using those centroid geographical coordinates, we extracted values of 19 bioclimatic variables (corresponding to the period between 1970 and 2000, at 10 min resolution, ~ 340 km2) from WorldClim 2.1 (Fick and Hijmans121) using the ‘raster’122, ‘sp’118, and ‘rgeos’123 R packages and applying the functions ‘extent’, ‘crop’, and ‘extract’.
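The gridding step can be illustrated with a short sketch that assigns each record to a 2° cell, counts records per cell, and returns cell centroids. It is written in Python for illustration and does not reproduce the ‘letsR’ workflow. It assumes a grid aligned to 0° longitude and latitude, which may differ from the grid origin actually used, and the function names and coordinates are hypothetical.

```python
import math
from collections import Counter


def grid_cell(lon, lat, grain=2.0):
    """Return the (column, row) index of the cell containing a point."""
    return (math.floor(lon / grain), math.floor(lat / grain))


def cell_centroid(cell, grain=2.0):
    """Return the (lon, lat) centroid of a cell index."""
    col, row = cell
    return ((col + 0.5) * grain, (row + 0.5) * grain)


def count_records(points, grain=2.0):
    """Count records per cell; points are (lon, lat) in decimal degrees."""
    return Counter(grid_cell(lon, lat, grain) for lon, lat in points)


# Hypothetical coordinates, for illustration only.
records = [(-40.5, -9.3), (-41.9, -8.1), (-70.2, -33.4)]
counts = count_records(records)
centroids = {cell: cell_centroid(cell) for cell in counts}
```

The per-cell counts play the role of the response variable, and the centroids are the points at which raster values would be extracted.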

Those bioclimatic variables represent variations in temperature (Bio1–Bio11) and precipitation (Bio12–Bio19). After that, we eliminated collinearity between the set of bioclimatic variables using the functions ‘colindiag’ and ‘findCorrelation’ from the R packages ‘metan’124 and ‘caret’125, respectively. The first function computes collinearity diagnostics, such as variance inflation factors (VIF), eigenvalues, and condition indices, and the second identifies and removes highly correlated variables based on a specified correlation threshold (we used 0.7), helping to reduce redundancy in predictors and improve model stability. Hereafter, we used only Bio3 (isothermality), Bio5 (max temperature of warmest month), Bio15 (precipitation seasonality), and Bio18 (precipitation of warmest quarter).
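The idea behind threshold-based removal of correlated predictors can be sketched as follows. This Python example is a simplified greedy filter, not the algorithm implemented in ‘findCorrelation’, which decides which member of a correlated pair to drop using mean absolute correlations. Here a variable is kept only if its absolute Pearson correlation with every variable already kept does not exceed the threshold. The function names and data are hypothetical, and variables are assumed to be non-constant.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # assumes neither variable is constant


def drop_correlated(data, threshold=0.7):
    """Return names of variables kept after a greedy correlation filter.

    data: dict mapping variable name -> list of values (insertion order
    determines which variable of a correlated pair is retained).
    """
    kept = []
    for name, values in data.items():
        if all(abs(pearson(values, data[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical values: "b" is almost a multiple of "a" and is dropped.
toy = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10.5], "c": [3, 1, 4, 1, 5]}
kept = drop_correlated(toy, threshold=0.7)
```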

We also took values from MODIS VI evapotranspiration (MOD16A2: MODIS Global Terrestrial Evapotranspiration 8-Day Global 1 km) and primary productivity (MYD17A2H.061: Aqua Gross Primary Productivity 8-Day Global 500 m) rasters using Qgis. Those rasters were downloaded from Google Earth Engine (https://earthengine.google.com/)126. Then, we created a dataset containing grid cell number and their corresponding latitude, longitude, number of records, and bioclimatic variables (Supplementary Table S4). We used this dataset to perform Structural Equation Modeling (SEM) running the ‘lavaan’ 0.6.16 R package127,128, using the number of records as the response variable and Bio3, Bio5, Bio15, Bio18, evapotranspiration, and primary productivity as predictor variables. These bioclimatic variables and evapotranspiration affect the microclimate and habitat variation at the local scale resulting in physiological consequences for ectotherms, such as body water loss129 and physical performance changes due to dehydration130. The primary productivity could indirectly represent prey availability, affecting ecological interactions in food web dynamics62. With SEM we were able to verify the direct and indirect effects among all variables simultaneously through regression models.

Biotic predictors for the predator decision-making

We built a Random Forest (RF) classifier131,132 to identify important predictors of the predator choice before catching its prey (that is, which predator and prey characteristics are important for an individual to decide whether to prey on a conspecific or an individual of another species). RF produces an ensemble of decision trees from a bootstrap set built from an initial training data set (usually 70% of the observations). The remaining observations (usually 30%) compose the test group, treated as observations to be predicted by the model fitted to the training group133. It is a high-accuracy method even when applied to small sample sizes with high-dimensional feature spaces, and is also easily parallelizable; thus, it is ideal for use with natural systems134. We used the type of interaction (two classes: interspecific and cannibalism) as the response (dependent) variable. As predictor (independent) variables, we used eight categorical variables: predator family (considering the possibility of phylogenetic influence), predator foraging mode (the movement rate could affect the attack performance), predator size (large organisms usually require more energy and are also capable of capturing a wider variety of items), prey age (juveniles could be easier to capture because of their lower experience in escaping and combat), prey size (small lizards could be easier to capture, hold, swallow, and digest), prey habitat use (some microhabitats could influence the detection and capture of prey by the predator), macrohabitat (saurophagy occurrence could differ between open and forest biomes), and syntopy (the predator could more easily capture prey if both share a microhabitat).

Although the RF approach does not specifically answer why an individual preys on another lizard (perhaps because the response lies in abiotic factors), it helps us to understand how this decision is made. To apply RF, we first defined the training and test groups and set k-fold cross-validation (method = cv; number = 10) using the function ‘trainControl’. After that, we looked for the best mtry (subset of variables randomly sampled as candidate predictors for splitting at each node of each tree) and ntree (number of trees to be built) for our dataset using the function ‘train’ from the ‘caret’ R package125. We then ran the function ‘randomForest’ from the ‘randomForest’ R package135 using the suggested parameters mtry = 5 and ntree = 1000. Finally, we ranked the importance of each predictor based on the Mean Decrease in Accuracy (MDA)131 and Mean Decrease in Gini (GINI) metrics132 and Shapley values136.
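The 10-fold cross-validation scheme can be illustrated with the sketch below, which partitions record indices into ten folds and lets each fold serve once as the held-out set. It is a plain Python illustration, not the ‘caret’ implementation, and it does not balance the folds by class. The function names and the random seed are hypothetical, and the 127 in the usage line is simply the total number of records compiled in this study.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Split indices 0..n-1 into k shuffled folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_splits(n, k=10, seed=0):
    """Yield (train, test) index lists; each fold is the test set once."""
    folds = k_fold_indices(n, k, seed)
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# Ten train/test partitions of 127 record indices.
splits = list(cv_splits(127, k=10))
```

Candidate values of a tuning parameter such as mtry would be compared by fitting the model on each training partition and averaging performance over the ten held-out folds.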