Introduction

Pavement design is the process of selecting materials, layer thicknesses, and construction techniques to create economical and durable structures capable of withstanding traffic loads for a given period1,2,3. A flexible pavement system typically comprises four layers: surface, base, sub-base, and subgrade4,5,6. These layers support the pavement structure and carry the static and repeated loads imposed by vehicles. Engineers select appropriate materials and layer thicknesses so that the pavement can sustain these loads throughout its useful life, accounting for factors such as traffic loads, vehicle types, and environmental conditions. Every pavement rests on a subgrade composed of one or more types of soil. The distribution of loads across the various layers leads to deformation, compression, and distortion of the subgrade soils7. During the initial phase of vehicle traffic loading, the subgrade soil accumulates plastic strain under the applied deviator stress (δd); with successive stress cycles, the rate of plastic strain accumulation progressively declines and almost stagnates, leaving a predominantly recoverable (elastic) response. The resilient modulus (MR), the elastic modulus under repeated loading, can then be determined from the applied deviator stress and the recoverable axial strain8,9,10.

The MR is defined as the ratio of the maximal cyclic stress (σcyc) to the recoverable resilient (elastic) strain (εr) under repeated dynamic loading:

$$M_{R}=\frac{\sigma_{cyc}}{\epsilon_{r}}$$
(1)
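For illustration, a cyclic deviator stress of 60 kPa that produces a recoverable axial strain of 0.001 would give MR = 60 kPa / 0.001 = 60,000 kPa = 60 MPa (values chosen purely for illustration, not taken from the study's database).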

In pavement engineering, MR is a critical metric for characterizing the elastic behaviour of different types of soil. The resilient modulus of subgrade soils is essential for structural analysis and pavement design, being taken into account by both the AASHTO pavement design method11 and the Mechanistic-Empirical Pavement Design Guide (MEPDG)12. Studies have demonstrated that MR significantly affects the design, especially the base course and asphalt layer thicknesses13. There are two main approaches to determining the MR of subgrade soils: in situ test equipment, and rapid load testing (RLT) of consolidated-undrained triaxial specimens taken from the site. To mimic field conditions, the RLT test imposes varying combinations of confining and deviator stresses on either remoulded or undisturbed specimens14,15,16. This approach is reliable and accurately characterizes the specimen, but it is time-consuming and demands skilled, experienced operators and advanced laboratory instruments. As a result, many state agencies have been reluctant to adopt RLT testing and have turned to other methods17,18,19. In recent decades, in situ test devices have been proposed and employed as such an alternative. However, this approach first requires accurate relationships between field-test and laboratory-test measurements, and the development of such relationships is still in its infancy13.

Recent advances in machine learning (ML) have introduced new methodologies for predicting MR20,21,22,23,24. Machine learning techniques can recognize complex correlations, patterns, and trends in data, offering improved interpretability, more precise predictions, and a deeper understanding of the underlying phenomena25,26,27. These capabilities make machine learning a promising method for capturing the complex interactions that govern MR. Despite this potential, ML approaches are not yet widely applied to forecasting MR values in pavement subgrade soils28,29,30. For example, an adaptive neuro-fuzzy inference system (ANFIS) was utilized by Sadrossadat et al.31 to estimate MR values in subgrade soils for flexible pavements. Kim et al.32 developed an advanced ANN model to estimate the MR of subgrade soils by incorporating essential physical soil characteristics and stress conditions as input features. Sarkhani Benemaran et al.33 and He et al.34 used ensemble methods to predict MR. However, these methods are considered "black box" designs, which can limit their accuracy and transparency35. To overcome these limitations, techniques such as genetic expression programming (GEP) and multi-expression programming (MEP) are used36,37,38.

GEP, an extension of GP, employs fixed-length segments to encode compact programs39,40,41,42. GEP offers practical usability in addition to precise empirical equations, and its use of symbolic expressions to represent physical processes has earned it the moniker "grey box model"43,44,45,46. Consequently, GEP-based models are believed to outperform neural-network-based ANN models47. Nafees et al.48 used MLPNN, GEP, and ANFIS models to forecast the compressive strength (CS) and tensile strength (TS) of concrete containing silica fume, drawing on databases of 283 and 145 data points, respectively. All three models demonstrated a high degree of prediction accuracy, but the GEP models proved superior, yielding markedly higher R2 values than the other approaches. GEP nevertheless has several shortcomings, including complicated and lengthy expressions, high computing demands, laborious hyperparameter tuning, overfitting issues, restricted scalability, and no guarantee of optimality, even though it excels at solving complex, nonlinear problems. MEP is an innovative approach created to overcome these restrictions. Compared with other evolutionary algorithms, MEP is a more advanced variant of GP that can produce correct results even when the complexity of the target is unknown49. One distinctive characteristic of MEP is its ability to encode several equations (chromosomes) in a single program; the final representation of the problem is chosen from the best-performing chromosome50. Despite its advantages over other evolutionary algorithms, MEP remains comparatively underutilized in civil engineering. Even in the presence of intricate interactions, MEP shows promise in capturing nonlinearities and providing trustworthy forecasts. Alavi et al.51, for example, used MEP to classify soils as sand, fine-grained, or gravel from variables such as soil colour, plastic limit, and liquid limit. Furthermore, MEP has been used in a number of investigations52 to forecast the elastic modulus of both normal- and high-strength concrete53,54,55,56.

During the evaluation and modelling stages, both MEP and GEP approaches often draw on databases from the literature. In the field of sustainable concrete, the application of linear GP techniques such as MEP and GEP has grown dramatically, as they have proven more effective in predicting various concrete properties57,58,59. Grosan and Abraham60 emphasized the advantages of merging MEP with linear genetic programming (LGP) over alternative neural-network-based techniques. Notably, the working mechanism of GEP is more complex than that of MEP, and the two differ in several key respects. MEP offers greater flexibility in representing solutions due to its less dense encoding50. It allows code reuse, leading to concise and efficient solution representations, and it permits non-coding sections anywhere on the chromosome, which enhances flexibility in encoding genetic information61,62,63. Additionally, by explicitly encoding references to function arguments, MEP improves the transparency and interpretability of models. By utilizing the conventional GEP chromosomal architecture, with distinct head and tail segments and accurate symbols, MEP can encode programs with syntactically sound structures64. Because of these unique qualities, MEP is a reliable and adaptable approach for precisely modelling and resolving complicated problems in a range of settings. Given the respective advantages of GEP and MEP, it is important to compare the outcomes of both techniques and identify the more appropriate one.

Although various applications of GEP exist in geotechnical engineering, GEP and MEP have not been used to predict the MR of subgrade soils. This study aims to utilize GEP and MEP for MR prediction and compare them with ANN. Additionally, various statistical measures were employed to validate the models. Sensitivity and SHAP (SHapley Additive exPlanations) analysis were performed to evaluate the effect of inputs on MR. The models' outcomes were also compared with literature to ensure robustness and accuracy. A comparative analysis was performed to provide insight into the effectiveness and accuracy of these techniques, enhancing the understanding and forecasting of MR in pavement subgrade soils. This innovative approach represents a significant advancement in pavement design, offering more accurate and reliable methods for predicting the resilient modulus of subgrade soils.

Data collection

A database containing 2813 records from twelve compacted subgrade soils (classified as CL, CH, and CL-ML under USCS, or A-4, A-6, and A-7-6 under AASHTO) was collected for this study. The data were gathered from earlier investigations, including those of Rahman et al.65, Ding et al.66, Ren et al.67, Zou et al.68, and Solanki et al.69. The factors considered for MR prediction were moisture content (MC, in %), confining stress (δc, in kPa), deviator stress (δd, in kPa), number of freeze–thaw cycles (NFT), dry density (γd, in kN/m3), and weighted plasticity index (wPI). These factors were selected after a careful analysis of the literature and the suggestions of authorities in the field70,71,72,73.

This section first examines the links between MR and the major input variables identified above (γd, wPI, δd, δc, NFT, and MC). Confining stress (δc) and deviator stress (δd) affect MR through their influence on soil compaction, shear strength, and deformation behaviour. The number of freeze–thaw cycles (NFT) reflects the seasonal durability challenges pavements face, impacting soil resilience and potential damage mechanisms. Dry density (γd) governs soil compactness and load-bearing capacity, which are crucial for pavement stability and deformation resistance. Lastly, the weighted plasticity index (wPI) captures soil compressibility and its role in pavement performance. Of all these variables, MC has the strongest association with MR, with a correlation coefficient of −0.35. The correlation coefficients between MR and the other variables are wPI = −0.025, γd = −0.27, δd = −0.114, δc = −0.033, and NFT = −0.324, as shown in Table 1.
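As a minimal, runnable sketch of how such a correlation screen can be reproduced (the random stand-in data and column names below are illustrative only; in practice, the assembled 2813-record database would be loaded in their place):

```python
import numpy as np
import pandas as pd

# Random stand-in with the same shape and column names as the study's database;
# replace with the real assembled data to reproduce Table 1.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(size=(2813, 7)),
                  columns=["MC", "dc", "dd", "NFT", "gd", "wPI", "MR"])

# Pearson correlation of each input with the target MR (cf. Table 1)
corr_with_mr = df.corr()["MR"].drop("MR")
print(corr_with_mr.sort_values())  # with the real data, MC (about -0.35) ranks strongest
```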

Table 1 Correlation measures for database.

Tables 2 and 3 provide detailed statistical summaries of the input parameters for the training and testing sets, highlighting their distribution characteristics. Skewness and kurtosis were employed to assess the symmetry and shape of each distribution relative to a normal distribution: skewness indicates the degree of asymmetry in the data, while kurtosis measures its tailedness. For robust predictive models, adherence of the data distribution to the suggested ranges (skewness close to 0, kurtosis between −10 and +10) is crucial for reliable statistical analysis74. Since the skewness and kurtosis of the input variables fall within these advised limits, the data points are suitably dispersed throughout the whole range, supporting the robustness of the model. To visualize the relationships between the input variables and MR, contour maps (Fig. 1) and frequency distribution plots (Fig. 2) were utilized. These tools depict data density and the range of the data distribution, confirming the randomness and dispersion of data points across the entire spectrum. Such analyses are essential in machine learning (ML) applications to validate data integrity and model reliability, and this thorough examination confirms the relevance of the selected parameters for predicting MR.
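A short sketch of the skewness/kurtosis screen described above, using SciPy's standard estimators on stand-in columns (illustrative data; the limits are those cited in the text):

```python
import numpy as np
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(0)
columns = {"MC": rng.uniform(size=2813), "gd": rng.normal(size=2813)}  # stand-in inputs

# Screen each input: skewness close to 0, kurtosis between -10 and +10.
for name, col in columns.items():
    s, k = skew(col), kurtosis(col)  # Fisher kurtosis: 0 for a normal distribution
    print(f"{name}: skewness={s:.2f}, kurtosis={k:.2f}, within limits: {abs(k) <= 10}")
```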

Table 2 Statistical summary for training set.
Table 3 Statistical summary for the testing set.
Figure 1
figure 1

Contour plots for inputs against MR.

Figure 2
figure 2

Frequency distribution plots for inputs.

Methodology

This section provides a comprehensive overview of the methodologies employed in GEP and MEP for predicting the MR of compacted subgrade soil. MEP models were developed using the MEPx program, leveraging its specialized capabilities in genetic programming, while GEP models were constructed with the GeneXproTools software, renowned for its robust features in evolutionary computation (Supplementary information)75. These tools enable precise modeling and prediction of material properties through advanced genetic-programming techniques tailored to the challenges posed by subgrade soil analysis75.

Artificial neural networks (ANN)

Artificial neural networks (ANNs), among the most widely used computational techniques, are modelled after the information-processing mechanisms of the human brain76,77,78. Instead of relying on explicit programming, they acquire knowledge by recognizing patterns and correlations in data through experience. An ANN is made up of many processing elements (PEs), or artificial neurons, coupled by coefficients (weights), and these neurons are arranged in layers. Each processing element has weighted inputs, a transfer function, and a single output. The behaviour of the network depends on the learning rule, the overall architecture, and the transfer functions of its neurons79,80,81. Because the weights are adjustable parameters, the neural network is a parameterized system. The weighted sum of a neuron's inputs forms its activation, and its output is obtained by passing this activation through the transfer function. The transfer function gives the network its non-linearity, improving its capacity to represent intricate patterns. During training, the inter-unit connections are optimized to minimize prediction errors and reach a predetermined accuracy level; after training and testing, the network can forecast outputs from fresh input data. There are many kinds of neural networks, and new ones are continually being developed, but despite their variations all share the same defining characteristics: the transfer functions of their neurons, the learning rule they follow, and their connection formulae82. ANNs, first presented by McCulloch et al.83, are employed to classify and solve non-linear regression problems with high efficiency. Neural networks are built from layers of neurons, the fundamental processing units, with every neuron in one layer linked to every neuron in the adjacent layers. As seen in Fig. 3a, the computational structure is organized hierarchically and is composed of at least three layers: the input layer (input neurons), one or more computational (hidden) layers, and the output layer. The variables used to train and assess the model are fed to the input layer. The hidden layers connect the input and output layers, processing the data before delivering it to the output layer, which produces the model's results84. The literature extensively documents a number of widely used activation functions (AFs) that improve the effectiveness of ANN models85; ANNs most often use logistic sigmoid, linear, and hyperbolic tangent sigmoid functions86. The linear transfer function (PURELIN) and the hyperbolic tangent sigmoid transfer function (TANSIG) were used in this investigation. Increasing the number of neurons with these functions improves statistical performance during the training phase but can impair accuracy during the testing and validation stages87. An ANN is trained by propagating data from the input neurons through the network; to minimize output error, the weights are modified according to specified criteria. The model is then assessed and validated using a separate dataset set aside for testing.
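The forward pass described above reduces, per layer, to a weighted sum, a threshold (bias) term, and a transfer function. A minimal NumPy sketch of one such pass with the TANSIG/PURELIN pair used in this study (the weights and input below are random placeholders, not the trained network):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder 6-10-1 network: 6 inputs (gd, wPI, dd, dc, NFT, MC), 10 hidden neurons, 1 output (MR)
W1, b1 = rng.normal(size=(10, 6)), rng.normal(size=10)  # hidden-layer weights and thresholds
W2, b2 = rng.normal(size=(1, 10)), rng.normal(size=1)   # output-layer weights and threshold

def tansig(x):
    # Hyperbolic tangent sigmoid transfer function
    return np.tanh(x)

def forward(x):
    k = W1 @ x + b1      # integrated input of the hidden layer
    h = tansig(k)        # non-linear hidden activation
    return W2 @ h + b2   # PURELIN (linear) output layer

print(forward(rng.uniform(size=6)))  # prediction for one placeholder input vector
```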

Figure 3
figure 3

(a) ANN architecture, (b) GEP architecture and (c) MEP architecture.

Genetic expression programming (GEP)

Genetic algorithms (GA) employ strings of fixed length. Koza extended GA and proposed an alternative approach called genetic programming (GP)88,89,90. The addition of a parse-tree structure made GP an effective machine-learning technique. Nonetheless, of the three genetic operators found in GP, only tree crossover is heavily employed, which results in a massive population of parse trees36. To decide which programs to retain, GEP employs a selection mechanism during reproduction that eliminates underperforming trees91. The GEP algorithm includes a mutation mechanism specifically designed to avoid premature convergence and preserve a wide range of genetic variation. The components of GEP include the terminal set, fitness metrics, the outcome-identification mechanism, run-control parameters, and basic functions. This strategy improves the selection process by eliminating the programs with the lowest fitness during reproduction92. During execution, the least suitable trees are removed, and the surviving trees repopulate the population via the selection technique. This evolutionary process ensures the model's convergence, guaranteeing continuous progress toward superior solutions92. The GP approach has five key components that are necessary for both modelling and research applications. First, fitness evaluation serves as the foundation for judging prospective solutions against predetermined standards. By ensuring that only the best candidates advance, this step improves the model's accuracy and efficacy in resolving challenging problems across a variety of fields.

Second, the basic domain functions are the essential elements that control the modifications made to the genetic material during the evolutionary process. These functions specify how genetic programs change and adapt over successive generations, and they influence the diversity and complexity of the solutions the GP algorithm produces. Together with fitness evaluation, they constitute the fundamental principles that propel the optimization process and enable GP to address problems in a variety of research contexts, from modelling and classification to optimization and prediction93. GP builds models automatically, but its crossover operator produces extensive parse trees, and complex expressions are needed because the non-linear parse tree must serve the twin roles of genotype and phenotype54. This combination ensures that complicated relationships in the data can be explored and represented by the GP model, but the GP algorithm's disregard for neutral genomes is one of its drawbacks, and the non-linear structure of both genotype and phenotype makes it difficult to derive widely accepted, simple empirical equations from GP. To address these shortcomings, Ferreira developed a new form of GP, the GEP technique52,94,95. GEP combines the fixed-length linear chromosomes of GA with the parse trees of GP. One notable modification in GEP is that, since modifications happen at the simple linear level, only the genome is passed on to the following generation, negating the need to reproduce the complete structure. Another noteworthy characteristic is that the model is developed from a single chromosome made up of multiple genes, each divided into head and tail parts96,97,98. In a GEP model, every gene has a fixed length and comprises mathematical operators and terminals, and the genetic code maintains a stable correspondence between chromosome symbols and the functions or terminals they represent. The chromosomes contain the information needed to build the GEP model, and a dedicated language known as "Karva" has been created to interpret this information. Phenotypes are derived from genotypes by translating each gene's Karva expression (K-expression) into an expression tree (ET), tracing nodes from the root to the deepest layer. The ETs employed in GEP affect this conversion and may contain duplicated sequences that are not necessary for genome mapping99, which can lead to length disparities between the K-expression and the GEP gene. At the beginning of the GEP process, fixed-length chromosomes (and their ETs) are randomly created and each individual's fitness is assessed. The algorithm then runs an iterative cycle of selection and reproduction over numerous generations until optimal solutions are reached, as shown in Fig. 3b. To hasten the population's evolution, genetic operators including crossover, mutation, and reproduction are applied.
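To make the genotype-to-phenotype translation concrete, the sketch below decodes a K-expression breadth-first into an expression tree and evaluates it, in the spirit of the Karva scheme described above (binary operators only; the example gene and values are illustrative, not taken from the study's model):

```python
import operator
from collections import deque

FUNCS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def eval_kexpression(symbols, env):
    """Decode a K-expression level by level and evaluate the resulting tree."""
    nodes = [[s, []] for s in symbols]   # [symbol, children] per gene position
    queue, used = deque([nodes[0]]), 1
    while queue:
        node = queue.popleft()
        if node[0] in FUNCS:             # functions here are all binary
            for _ in range(2):
                child = nodes[used]
                used += 1
                node[1].append(child)
                queue.append(child)
        # terminals take no children; unread trailing symbols form the
        # "non-coding" tail that GEP tolerates in every gene
    def ev(node):
        sym, kids = node
        return FUNCS[sym](ev(kids[0]), ev(kids[1])) if sym in FUNCS else env[sym]
    return ev(nodes[0])

# (a + b) * c encoded head-first as "*+cab", followed by two unused tail symbols
print(eval_kexpression(list("*+cabdd"), {"a": 2, "b": 3, "c": 4, "d": 0}))  # -> 20
```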

Multi expression programming (MEP)

MEP differs from GEP in several respects. Because MEP's encoding is less dense, solutions can be represented with greater flexibility100. MEP's code-reuse feature makes compact and effective solution representations possible, and non-coding sections can exist at any location along the chromosome, providing further flexibility in the encoding of genetic information. MEP encodes references to function arguments explicitly, improving model interpretability and transparency. It follows the traditional GEP chromosomal structure, with distinct head and tail segments whose accurate symbols efficiently encode programs with syntactically sound structures. These unique qualities establish MEP as a potent and adaptable technique for precise modelling and for resolving difficult problems in a variety of fields. Like its GEP counterpart, the MEP model allows its performance to be optimized through alterations to a number of important factors, providing a high degree of flexibility101. These factors, which influence how well MEP performs in various applications, include the function set, crossover probability, subpopulation size, and code length. For example, MEP's capacity to manage computational efficiency and assessment complexity is highly influenced by the size and variety of its subpopulations, particularly when working with big datasets and varied program structures. Moreover, the code length and the probability of crossover events similarly influence the MEP model's performance across tasks101: the code length directly affects the length and complexity of the mathematical expressions MEP produces, and hence the interpretability and computational demand of the model. The architecture of MEP is shown in Fig. 3c.
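The code-reuse property described above can be illustrated with a minimal sketch of an MEP chromosome, assuming the usual linear representation in which each gene is either a terminal or a function applied to the results of earlier genes (the chromosome and values are illustrative only):

```python
import operator

FUNCS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def eval_mep(chromosome, env):
    """Evaluate every gene; each one encodes a candidate expression."""
    results = []
    for gene in chromosome:
        if gene in env:                   # terminal gene
            results.append(env[gene])
        else:                             # function gene (op, i, j) with i, j < current index
            op, i, j = gene
            results.append(FUNCS[op](results[i], results[j]))
    return results                        # the best-scoring gene becomes the model output

# Genes 0-1 are terminals; gene 2 computes a+b; gene 3 reuses gene 2: (a+b)*a
chromosome = ["a", "b", ("+", 0, 1), ("*", 2, 0)]
print(eval_mep(chromosome, {"a": 2.0, "b": 3.0}))  # -> [2.0, 3.0, 5.0, 10.0]
```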

Model structures

The key to creating a strong artificial-intelligence prediction model is careful selection of the most important input variables. During this process, every input parameter present in the database was thoroughly analyzed: through preliminary runs and statistical studies, the potential impact of each variable on the MR of subgrade soil was carefully assessed. Variables that showed little effect on MR, and characteristics that were infrequently covered in the pertinent literature, were excluded. This strict selection procedure ensured that only the most important variables entered the final predictive model. The procedure culminates in Eq. (2), the adopted model form for predicting the MR of subgrade soil, built on the input variables that showed strong correlations with MR in the early analysis. By concentrating on these critical elements, the model leverages insights from both theoretical considerations and practical evidence to produce reliable MR predictions. In the realm of geotechnical engineering, this approach not only increases the predictive accuracy of the model but also guarantees its relevance and applicability in real situations.

$$M_{R}\left(\mathrm{MPa}\right)=f(wPI,\ \delta_{c},\ MC,\ \delta_{d},\ \gamma_{d},\ N_{FT})$$
(2)

Appropriate hyperparameter selection and optimization are crucial to building scalable and reliable GEP, ANN, and MEP models. To determine the best combinations, these factors were adjusted iteratively with reference to previous research publications51. Among them, the population size is particularly important in controlling how the programs in the model evolve. Larger populations frequently produce more accurate and complex models, but as the population grows, so does the computing load and the time needed to find the optimal answer. When setting the population size, it is therefore crucial to balance the trade-off between model complexity and available computational resources. Furthermore, increasing the population size beyond a certain threshold can result in overfitting, a condition in which the model becomes extremely specialized to the training set and performs badly on fresh data. Overall, the reliability and scalability of the GEP, ANN, and MEP models are strongly influenced by the careful selection of hyperparameters such as population size. Through careful trial-and-error optimization of these hyperparameters, guided by prior research, a more reliable and efficient modelling framework can be created. This approach guarantees that the models are efficient and accurate and generalize well to new data.

GEP model

GeneXproTools software was used to create the GEP model, which predicts the MR of subgrade soil. Genetic-operator selection has a major impact on how well genetic-based modelling works. For instance, the crossover rate plays a significant role in the convergence of GEP and MEP models: larger crossover rates are frequently associated with earlier convergence and lower diversity. Mutation rates, on the other hand, govern exploration; too high a mutation rate may disrupt important genetic information. The transposition operator greatly influences the size and organization of a program. Model performance can be improved by extending the head size, code length, and linking functions, but doing so may also make the model more complex and less efficient. Furthermore, the accuracy of the GEP and MEP models is strongly affected by modifications to the other genetic operators. Previous research102 guided the initial parameter selection in this work, which aimed to determine the ideal GEP model setup. The population size determines the number of chromosomes within a population and is a crucial element affecting the model's performance. In addition, the number of genes and the size of the head section are two important factors that define the structure of the GEP model. In this model, the head-section size is set to 10, which governs the intricacy of each generated expression and thus the complexity of the model. The overall GEP model results from combining the expression trees (sub-ETs), and four genes define its structure and functioning. For maximum performance, key parameters, including head size, population size, and gene count, must be carefully regulated. Table 4 summarizes the carefully chosen hyperparameters that guarantee precision and dependability. The number of genes and the head size were increased, with sub-ETs linked through a multiplication function, and the process was repeated several times to improve the model.

Table 4 GEP hyperparameters.

MEP model

The MEP model was initially deployed with ten subpopulations to construct compact equations using basic mathematical operations (addition, subtraction, multiplication, and division). To maximize accuracy, crossover-probability settings between 50 and 90% were evaluated during the model's development103. These adjustments were essential for refining the MEP models, which were created with the MEPx program, a specialized application for Multi-Expression Programming. The MEP modelling procedure was initiated with a subpopulation count of 10 and a subpopulation size of 260; the hyperparameters utilized are listed in Table 5. Basic arithmetic operations were retained so as to arrive at a straightforward final equation. Setting a limit on the number of generations allowed iterative improvement until the required level of precision was reached. Creating a generation-based termination criterion and making strategic hyperparameter decisions were essential to building a strong MEP model; more generations resulted in fewer errors and increased accuracy. The likelihood of offspring undergoing genetic operations such as crossover and mutation was regulated by varying the crossover probability and mutation rate, and the ideal set of values was found through rigorous testing of several hyperparameter combinations (Table 5). To ensure the model's accuracy and dependability, the evaluation period for each generation was chosen carefully. Although the model could in principle evolve indefinitely, a halting criterion was defined in this investigation: evolution stops when the fitness function changes by less than 0.1% or after 1000 generations. These criteria act as performance indicators, showing the point at which further evolution ceases to yield appreciable improvement, and they make it possible to control the model's performance effectively, guaranteeing the right balance of efficiency and accuracy.
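A minimal sketch of the halting criterion described above, under the stated rule (stop when the best fitness changes by less than 0.1% between generations, or after 1000 generations); the one-generation step below is a hypothetical stand-in for the actual selection/crossover/mutation cycle:

```python
import random

def evolve(step, max_generations=1000, tol=0.001):
    """Run until the best fitness improves by < 0.1% or the generation cap is hit."""
    best_prev = None
    for gen in range(1, max_generations + 1):
        best = step()  # best fitness (error) of this generation
        if best_prev and abs(best_prev - best) / best_prev < tol:
            return gen, best          # relative change below 0.1%: stop
        best_prev = best
    return max_generations, best_prev

# Hypothetical stand-in for one generation of selection, crossover, and mutation:
state = {"best": 100.0}
def step():
    state["best"] *= random.uniform(0.95, 1.0)  # errors shrink stochastically
    return state["best"]

print(evolve(step))
```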

Table 5 MEP Parametric settings.

ANN model

The ANN models were created using the Levenberg–Marquardt training technique, with ten hidden neurons and a random division of the data. A feed-forward back-propagation network type was utilized for the iteration procedure, and a trial-and-error approach was adopted to find the number of hidden layers that attains the required performance. The parameters for the ANN modelling process in this study are shown in Table 6. Forward propagation is used, in which each neuron processes information and passes it to the neurons in the next layer. Each input (γd, wPI, δd, δc, NFT, and MC) is scaled by a weight that indicates its importance in relation to the output. Each node also adds a threshold value (n) to the sum of the weighted signal inputs, after which the integrated input (Kn) is transformed by a non-linear activation function (AF). The AFs are essential in ANNs because they introduce non-linearity, which greatly affects the model's efficacy; choosing the right AF is therefore essential.
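Levenberg–Marquardt is a least-squares method, so the training loop can be sketched with SciPy's LM implementation applied to the residuals of a 6-10-1 TANSIG/PURELIN network. This is only a hedged illustration on placeholder data; the study's actual toolchain is not reproduced here:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 6))  # placeholder inputs (gd, wPI, dd, dc, NFT, MC)
y = rng.uniform(size=200)       # placeholder MR targets

def unpack(p):
    # 6*10 + 10 + 10*1 + 1 = 81 trainable parameters
    return p[:60].reshape(10, 6), p[60:70], p[70:80].reshape(1, 10), p[80:81]

def residuals(p):
    W1, b1, W2, b2 = unpack(p)
    h = np.tanh(X @ W1.T + b1)      # TANSIG hidden layer, 10 neurons
    pred = (h @ W2.T + b2).ravel()  # PURELIN output layer
    return pred - y

fit = least_squares(residuals, rng.normal(scale=0.5, size=81), method="lm")
print("final sum of squared errors:", np.sum(fit.fun ** 2))
```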

Table 6 ANN hyperparameters.

Model’s performance evaluation

The coefficient of correlation, denoted as R, is a common metric used to evaluate a model's efficacy and efficiency. However, R has its limitations; it is insensitive to simple mathematical transformations such as scaling the outputs by a constant, and can therefore mask underlying issues. To address these limitations, this study incorporates additional error measures into the evaluation system: mean absolute error (MAE), relative root mean square error (RRMSE), relative square error (RSE), and root mean squared error (RMSE). Using these measures alongside R provides a more comprehensive evaluation of the model's accuracy and predictive power, enabling a detailed examination of performance across various scenarios. RSE provides insight into overall prediction accuracy by calculating the relative difference between actual and projected values. RMSE and MAE measure the average size of errors, with RMSE placing more emphasis on larger errors due to its squared form. RRMSE offers a relative measure of prediction accuracy by normalizing RMSE against the mean of the observed values, complementing the information provided by RMSE. Collectively, these measures present a clearer picture of the model's fit to real data and its generalizability across different datasets and conditions. To further enhance the evaluation, additional indices such as the A10-index, variance accounted for (VAF)104, scatter index (SI)35, and index of agreement (IA)33 are used; these offer diverse perspectives on model accuracy and reliability, contributing to a more robust assessment framework. The study also uses a performance index, ρ105, which combines the RRMSE and R functions and thus integrates the strengths of both. To ensure generalizability, the data were split into training and testing sets, with 70% of the data used for training and 30% for testing; this validates the model's performance on unseen data and provides a realistic assessment of its predictive capabilities. By employing this multi-faceted evaluation strategy, the model's performance is assessed more accurately, its results can be interpreted more reliably in practical applications, and potential shortcomings that single metrics might overlook are addressed. The equations for each metric are presented in Table 7.
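For reference, the headline metrics can be implemented in a few lines; the definitions below follow the standard forms (cf. Table 7), with the performance index ρ written in its commonly used form RRMSE/(1 + R), which should be checked against the study's exact definition:

```python
import numpy as np

def rmse(m, g):
    return np.sqrt(np.mean((m - g) ** 2))        # root mean squared error

def mae(m, g):
    return np.mean(np.abs(m - g))                # mean absolute error

def rrmse(m, g):
    return rmse(m, g) / np.mean(m)               # RMSE normalized by the observed mean

def r_coeff(m, g):
    return np.corrcoef(m, g)[0, 1]               # Pearson correlation coefficient R

def perf_index(m, g):
    return rrmse(m, g) / (1 + r_coeff(m, g))     # rho, combining RRMSE and R
```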

Table 7 Statistical equations for evaluation metrics.

The equations above use the following notation: "\({m}_{c}\)" stands for the cth experimental output, "\({g}_{c}\)" for the model's predicted output, "\({\overline{m}}_{c}\)" for the average of the experimental outputs, "\({\overline{g}}_{c}\)" for the mean of the model outputs, and "w" for the total number of samples. These formulas provide a useful tool for evaluating the relative error or disagreement between actual and model estimates with respect to their average values. For a model to be considered reliable and well calibrated, it must have a high R value, denoting a strong correlation between predicted and actual values; low RMSE, RRMSE, and MAE values likewise indicate that the model accurately captures the underlying data patterns and aligns closely with the experimental data. Researchers should therefore aim for high R values and minimized error metrics, as these indicators together validate the model's accuracy and its fidelity to real-world data. In machine learning, over-training a model on the data is called overfitting; it causes testing errors to rise quickly while training errors continue to decline106. Equation (16) presents the objective function (OBF) proposed by Gandomi and Roke107 to address this problem. Acting as the fitness function and directing the choice of the ideal model configuration, this OBF counteracts the overfitting effect: it balances generalizability against complexity, ensuring the model remains reliable and accurate in forecasting new data. A strong correlation between measured and predicted values, commonly indicated by an R-value higher than 0.8 (with 1 representing a perfect fit), has been suggested as an indicator of a model's efficacy, while a lower RRMSE value, which ranges from 0 to positive infinity, indicates better performance. This multi-metric approach guarantees a thorough assessment of the model's correctness and dependability.

$$OBF=\left(\frac{w_{f}-w_{s}}{w}\right){\rho }_{f}+2\left(\frac{w_{s}}{w}\right){\rho }_{s}$$
(16)

The subscripts "f" and "s" in the equation designate training and validation (or testing) data, respectively, while "w" denotes the total number of data points. To appropriately evaluate overall model performance, the objective function (OBF) takes into account R and RRMSE (through ρ) as well as the relative distribution of dataset entries across these sets. Reducing the OBF is essential: as its value approaches zero, it indicates a precise and well-calibrated model. By running simulations with over 20 different combinations of the fitting parameters, the study identified the model with the lowest OBF, efficiently selecting the best-performing model among all parameter combinations investigated.
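A direct transcription of Eq. (16), assuming (as the notation paragraph suggests) that the two subset weights are the training and testing sample counts:

```python
def obf(rho_train, rho_test, n_train, n_test):
    """Eq. (16): weights each subset's performance index rho by its data share,
    penalizing train/test disagreement and hence overfitting; lower is better."""
    w = n_train + n_test
    return ((n_train - n_test) / w) * rho_train + 2 * (n_test / w) * rho_test
```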

Results and discussion

GEP model formulation

Figure 4 presents the expression trees (ETs) generated by GeneXproTools, the software used to extract empirical expressions for predicting the resilient modulus (MR) of subgrade soil. These ETs are decoded to derive an empirical expression based on the basic arithmetic operations: addition (+), subtraction (−), multiplication (×), and division (÷). The derived expressions simplify to Eqs. (17)–(21). This straightforward expression facilitates efficient and user-friendly MR prediction for subgrade soil, highlighting its practical utility in modeling and analysis.

Figure 4
figure 4

GEP ETs.

$$M_{R}\,(\mathrm{MPa})=A\times B\times C\times D$$
(17)
$$A=\left[-2.051+\frac{Exp\cdot wPI}{(-14.3+wPI)(-7.02\,\gamma_{d}+MC\cdot wPI)}\right]$$
(18)
$$B=\left[Exp\left(8.18+\frac{8.89}{\delta_{c}-{SD}^{2}+wPI-42.17\,N_{FT}}\right)+\delta_{c}\right]$$
(19)
$$C=\left[68.18-\delta_{c}+\frac{\gamma_{d}\cdot N_{FT}\cdot wPI}{Exp}-MC-N_{FT}\right]$$
(20)
$$D=\left[-\frac{0.749}{7.4025\,Exp+2.3\,\delta_{d}-3.1\,\gamma_{d}}\right]$$
(21)

MEP model formulation

An empirical equation was developed to estimate the resilient modulus (MR) of subgrade soil using the MEPx software. The MEP analysis considered six important independent variables, which were vital to the investigation. These variables were selected based on their significant impact on the MR, ensuring that the final equation accurately reflects the complex interactions affecting soil resilience. Thus, the derived equation, as seen in Eq. (22), is a reliable mathematical expression designed to predict MR values. This equation, developed through MEP, is straightforward to use in practice and is particularly valuable in geotechnical applications where precise forecasts of soil behaviour are crucial.

$$M_{R}\,(\mathrm{MPa})=\frac{\gamma_{d}\,\delta_{c}}{\gamma_{d}-N_{FT}+\delta_{d}+wPI}+\frac{MC\,(\gamma_{d}+wPI-N_{FT})}{2MC-\gamma_{d}+N_{FT}}-\frac{(\gamma_{d}-N_{FT}+wPI)(\gamma_{d}+2MC)\,wPI}{(\gamma_{d}+2MC-wPI^{2})^{2}(\gamma_{d}-2MC-N_{FT})}-\frac{(\delta_{d}-N_{FT}+wPI)(\gamma_{d}+2MC)}{(\gamma_{d}+2MC-wPI^{2})(\gamma_{d}-2MC-N_{FT})}$$
(22)
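For practical use, Eq. (22) can be transcribed directly into a small function; the version below follows the published expression term by term and should be verified against the paper before engineering use:

```python
def mr_mep(wPI, dc, MC, dd, gd, NFT):
    """MEP closed-form estimate of MR (Eq. 22).

    Units as in the study: stresses dc, dd in kPa, gd in kN/m3, MC in %,
    NFT as a cycle count; the result is in MPa.
    """
    t1 = gd * dc / (gd - NFT + dd + wPI)
    t2 = MC * (gd + wPI - NFT) / (2 * MC - gd + NFT)
    t3 = ((gd - NFT + wPI) * (gd + 2 * MC) * wPI
          / ((gd + 2 * MC - wPI ** 2) ** 2 * (gd - 2 * MC - NFT)))
    t4 = ((dd - NFT + wPI) * (gd + 2 * MC)
          / ((gd + 2 * MC - wPI ** 2) * (gd - 2 * MC - NFT)))
    return t1 + t2 - t3 - t4
```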

Accuracy evaluation of GEP model

The scatter plots of the proposed GEP equation for both the training and testing stages are provided in Fig. 5. The analysis reveals that the GEP model delivers impressive performance. As shown in Fig. 5, the model achieved an R-value of 0.996 during the training phase, indicating a very high correlation between predicted and actual values; during the testing phase it maintained an R-value of 0.99, further demonstrating its strong predictive capability. Figure 6a and b illustrate the relationship between the absolute error of the GEP equation and the corresponding experimental values. The maximum, minimum, and average errors of the GEP model are also identified in Fig. 11a and b, providing additional insight into its performance characteristics. The maximum error during the GEP training phase was 12.045 MPa, with an average error of 1.715 MPa; during the testing stage, the average error was 2.570 MPa and the maximum error 14.445 MPa. The consistency of these error values between the training and testing phases underscores the precision and versatility of the proposed GEP expression, indicating the model's effectiveness in predicting outcomes for new, unobserved data while minimizing the risk of overfitting.

Figure 5
figure 5

Regression plots for GEP.

Figure 6
figure 6

Overall error distribution in GEP model.

Accuracy evaluation of MEP model

The scatter plots for the MEP model are presented in Fig. 7. The experimental and predicted values are very close, with slope values of 0.94 and 0.96 in the training and testing stages, respectively. The MEP model achieved an R value of 0.97, demonstrating a strong correlation between the predicted and experimental values. Furthermore, error analysis was performed, with the overall error in the model shown in scatter and 3D plots in Fig. 8a and b, and additional radar plots provided in Fig. 11a and b for the training and testing sets separately. The maximum, minimum, and average errors were noted as 18 MPa and 19 MPa, 3 MPa and 2 MPa, and 5 MPa and 4 MPa, respectively, for the two sets. These results indicate that the MEP model predicts outcomes with a high degree of accuracy and reliability, with consistently low errors and strong alignment between predicted and actual values, underscoring its effectiveness in predicting the mechanical characteristics of subgrade soil.

Figure 7
figure 7

Regression plots for MEP.

Figure 8
figure 8

Overall error distribution in MEP model.

Accuracy evaluation of ANN model

A thorough statistical analysis was conducted to evaluate the ANN model's capability. The scatter plots for the ANN model are presented in Fig. 9. Slope values should ideally be close to 1; in the figure, the experimental and predicted values are very close, with slope values of 0.95 and 0.94 in the training and testing stages, respectively. In the training phase, the ANN model's R-value was 0.967, while in the testing phase it was 0.953. The overall error in the model is shown in scatter and 3D plots in Fig. 10a and b, and radar plots (Fig. 11a and b) visualize the range and average of the errors in both the training and testing stages. The error analysis revealed that, in the training phase, the model's errors ranged from a minimum of 0.002 MPa to a maximum of 26.00 MPa; during the testing phase, they ranged from a minimum of 0.008 MPa to a maximum of 30.710 MPa, with average errors of 4 MPa and 3 MPa, respectively. While the ANN demonstrated the ability to produce highly accurate predictions under certain conditions, it was outperformed by both the MEP and GEP models, whose recorded errors were lower.

Figure 9
figure 9

Regression plots for ANN.

Figure 10
figure 10

Overall error distribution in ANN model.

Figure 11
figure 11

GEP, MEP and ANN errors range in (a) Training and (b) Testing sets.

External verification of GEP and MEP and ANN models

The GEP, MEP, and ANN models were also evaluated by external validation to assess their performance against standards recommended by earlier studies. Table 8 briefly summarizes the outcomes of these external verification checks. The values \(k\) and \({k}{\prime}\) represent the gradients of the regression lines passing through the origin; these should ideally approach 1. The accuracy of all the proposed equations proved impressive, effectively satisfying the given criteria. Although the GEP equation shows the lowest statistical error and the greatest correlation coefficient, all the models have strong predictive power, and the compact character of the GEP expression further facilitates its deployment in practical applications. Another study advised evaluating \({{R}_{o}}^{2}\) and \({{R{\prime}}_{o}}^{2}\), the coefficients of determination between the experimental data and the proposed model's results, and between the model's results and the experimental data, respectively; these values should ideally approach 1. As indicated in Table 8, all three proposed models satisfy this condition, demonstrating their dependability and precision in predicting the MR of subgrade soil.

Table 8 External validation of proposed models.

Sensitivity analysis

Sensitivity analysis is a methodical way of determining how changes in a model's inputs (independent variables) affect its output (dependent variables). Its primary objective is to evaluate the model's sensitivity, or responsiveness, to changes in the input parameters, which have a major influence on the model's predictions. The results of sensitivity tests depend on the number of input variables and the data points used to build the model. The machine-learning method used here can evaluate each parameter's contribution separately, although it is important to note that changes in the proportions of the inputs, or the addition of further input variables, could alter these evaluations. The relative importance of each input parameter in predicting the MR of subgrade soil is quantified through Eqs. (23) and (24). In addition to highlighting the significance of each variable, this analysis provides insightful direction for further study and for real-world applications in geotechnical engineering.

$${W}_{b}={h}_{max}\left({p}_{b}\right)-{h}_{min}({p}_{b})$$
(23)
$$SA=\frac{{W}_{b}}{\sum_{c=1}^{w}{W}_{c}}$$
(24)

In the equations above, \(h_{max}({p}_{b})\) and \(h_{min}({p}_{b})\) denote the upper and lower bounds of the predicted output over the b-th input domain, with the remaining inputs held at their average values during each sweep. The findings demonstrate the proportionate impact of each parameter on the outcome projection: dry density (26.89%), confining stress (21.6%), weighted plasticity index (14.92%), moisture content (14.6%), deviator stress (13.2%), and freeze–thaw cycles (10.3%). The influence of each input component on MR prediction is represented graphically in Fig. 12, which shows how these factors contribute across the spectrum. The computed sensitivity percentages present the parameters' relative importance within the model in a clear and concise manner.
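The one-at-a-time scheme of Eqs. (23) and (24) can be sketched as follows, assuming `model` is any trained predictor mapping an input vector to MR (a hypothetical callable, not the study's released code):

```python
import numpy as np

def sensitivity(model, X, steps=50):
    """Sweep each input over its observed range with the others held at their
    means; normalize the output ranges per Eqs. (23)-(24)."""
    means = X.mean(axis=0)
    W = []
    for b in range(X.shape[1]):
        grid = np.tile(means, (steps, 1))
        grid[:, b] = np.linspace(X[:, b].min(), X[:, b].max(), steps)
        out = np.array([model(row) for row in grid])
        W.append(out.max() - out.min())  # W_b = h_max(p_b) - h_min(p_b)
    W = np.array(W)
    return W / W.sum()                   # SA for each input, Eq. (24)
```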

Figure 12
figure 12

Sensitivity analysis outcomes.

SHAP analysis

As sensitivity analysis only provides information regarding the influence of each variable on output, it is also important to examine the positive or negative impact of each variable on the output. For this purpose, SHAP (SHapley Additive exPlanations) analysis was also performed in this study. SHAP analysis provides a unified measure of feature importance, allowing for the decomposition of individual predictions to understand the impact of each feature on the model’s output, both positively and negatively108. The summary plots for SHAP outputs are shown in Fig. 13. Additionally, feature plots representing the positive and negative influences are also provided, as shown in Fig. 14. It can be seen that moisture content (MC) has both positive and negative influences on the resilient modulus (MR), represented by red and blue dots, respectively. Similarly, dry density and deviator stress have a higher positive influence, while NFT has a negative influence on MR. These findings align well with the literature, demonstrating the reliability of the models used in this study109. The SHAP analysis thus provides a more nuanced understanding of how each variable affects the output, enhancing the interpretability and robustness of the model's predictions.
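A summary plot of the kind shown in Fig. 13 can be produced with the shap library's model-agnostic KernelExplainer; the sketch below uses a linear stand-in for the trained model and random placeholder data, so only the workflow (not the attributions) mirrors the study:

```python
import numpy as np
import shap  # pip install shap

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 6))  # placeholder rows of (gd, wPI, dd, dc, NFT, MC)
feature_names = ["gd", "wPI", "dd", "dc", "NFT", "MC"]

def predict(X):
    # Stand-in for the trained model (e.g., the GEP/MEP equation or the ANN)
    return X @ np.array([0.4, -0.1, 0.2, 0.15, -0.3, -0.35])

background = shap.sample(X, 50)  # background sample keeps KernelExplainer tractable
explainer = shap.KernelExplainer(predict, background)
shap_values = explainer.shap_values(X[:100])

shap.summary_plot(shap_values, X[:100], feature_names=feature_names)  # cf. Fig. 13
```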

Figure 13
figure 13

SHAP summary plot.

Figure 14
figure 14

SHAP feature plot.

Comparative study of ANN, MEP, and GEP model outcomes

The outcomes of the comparative study between the ANN, MEP, and GEP models are shown in Fig. 15 and Table 9, which demonstrate the superior performance of the GEP model during both the training and testing phases. The GEP model consistently exhibited lower error values and higher coefficients of determination (R2) than the MEP and ANN models. During the training phase, the GEP model achieved an RMSE of 2.408 MPa and an R2 value of 0.992, outperforming the MEP model (RMSE: 3.821 MPa, R2: 0.980) and the ANN model (RMSE: 4.937 MPa, R2: 0.966), indicating that the GEP model's predictions were not only more accurate but also more consistent. Similarly, during the testing phase, the GEP model maintained its superiority with an RMSE of 3.636 MPa and an R2 value of 0.981, compared with the MEP model (RMSE: 4.889 MPa, R2: 0.966) and the ANN model (RMSE: 7.384 MPa, R2: 0.946). In terms of MAE, the GEP model showed lower values (1.715 MPa in training and 2.570 MPa in testing) than the MEP (2.499 MPa and 3.284 MPa) and ANN models (3.102 MPa and 5.481 MPa). Additionally, the GEP model showed higher IA, A10-index, and VAF values and lower SI and PI values, highlighting its better ability to capture the variability of the observed data. The complete summary of these measures is presented in Table 9. The ranks of the proposed models were also calculated, as shown in Fig. 16: the highest rank of 10 was assigned to the best-performing model and the lowest rank of 1 to the poorest. The GEP model consistently achieved the highest ranks during both the training and testing phases, with ranks of 9.200 and 8.000, respectively, whereas the ANN model had the lowest (6.600 in training and 4.500 in testing). When the models are compared on the percentage improvement in average errors, the GEP model performed approximately 55% better than the ANN model and 20% better than the MEP model, reflecting its enhanced predictive capability and robustness.

Figure 15
figure 15

Scatter plot representing model’s performance comparison.

Table 9 Summary of statistical evaluations for GEP, ANN and MEP.
Figure 16
figure 16

Summated rank of developed models.

Additionally, Taylor diagrams (Fig. 17) provide a visual comparison of model performance by displaying the correlation coefficient and standard deviation of the model predictions against the observed data for both the training and testing sets. The GEP model again performed best, showing a higher correlation than the MEP and ANN models.

Figure 17
figure 17

Taylor diagram for (a) training and (b) testing set.

Comparative analysis in the light of literature

Although no computational models using GEP and MEP for forecasting MR exist, various other machine-learning techniques have been applied in the past, as shown in Table 10. The table shows that the models proposed in this study achieve accuracy comparable to that of the existing models in predicting MR, and the close agreement between their respective R2 and RMSE values confirms the efficiency of the developed models. The proposed MEP and GEP models thus offer a reliable and accurate method for forecasting the MR of subgrade soil.

Table 10 Comparison of proposed models with literature.

Conclusion

Understanding the resilient modulus (MR) is essential for comprehending the stress–strain behaviour of subgrade materials, especially their non-linear characteristics. Traditional procedures for determining MR through laboratory testing are frequently complex, time-consuming, expensive, and difficult. This study highlights the efficacy of advanced machine learning techniques, including GEP, MEP, and ANN, in accurately forecasting MR in subgrade soil, potentially enhancing our capacity to predict and understand soil behaviour. The key findings of this study are as follows.

  • The GEP model demonstrated superior performance with training phase metrics of R: 0.996, MAE: 1.715 MPa, RMSE: 2.408 MPa, and testing phase metrics of R: 0.990, MAE: 2.570 MPa, RMSE: 3.636 MPa.

  • The ANN model showed good performance with training phase metrics of R: 0.983, MAE: 3.102 MPa, RMSE: 4.937 MPa, and testing phase metrics of R: 0.973, MAE: 5.481 MPa, RMSE: 7.384 MPa.

  • The MEP model also performed well with training phase metrics of R: 0.990, MAE: 2.499 MPa, RMSE: 3.821 MPa, and testing phase metrics of R: 0.983, MAE: 3.284 MPa, RMSE: 4.889 MPa.

  • The GEP model outperformed the MEP and ANN models with lower statistical errors (RMSE, MAE, SI, and PI) and higher correlation indices (VAF and R2).

  • GEP exhibited approximately 55% better performance than ANN and 20% better than MEP based on average errors.

  • Sensitivity and SHAP analyses showed results consistent with the literature and identified dry density (26.89%), confining stress (21.6%), and moisture content (14.6%) as among the most influential parameters.

  • In conclusion, the study demonstrates that advanced machine learning models, especially GEP, offer significant potential for accurately predicting the resilient modulus of subgrade soils. These findings underscore the practical value and feasibility of employing machine learning techniques in geotechnical engineering, paving the way for more efficient and cost-effective soil behaviour analysis.

Future research directions

Although the findings from the GEP, ANN, and MEP models are promising, it is crucial to compare their effectiveness against other advanced machine learning methods. Future research could explore the integration of these models with hybrid approaches or deep learning techniques to further enhance prediction accuracy. Additionally, expanding the dataset to include a broader range of soil types and conditions could provide a more comprehensive evaluation of model performance.