Enhanced characterization of hydraulic conductivity via standard penetration test for sandy soils and weathered rocks

Seo, Wanhyuk; Yang, Eomzi; Yun, Tae Sup

doi:10.1038/s41598-025-08300-y

Article
Open access
Published: 02 July 2025

Wanhyuk Seo¹,
Eomzi Yang² &
Tae Sup Yun¹

Scientific Reports volume 15, Article number: 23594 (2025)

Abstract

This study introduces a novel methodology for predicting hydraulic conductivity (K) from standard penetration test (SPT) N-values, addressing the critical challenges of conventional field measurements that result in sparse K data. The research objectives were to: (1) establish empirical correlations between N and K, (2) develop a robust prediction model with quantifiable bounds, and (3) demonstrate practical applications for enhanced subsurface characterization. Analysis of 3508 boreholes across South Korea revealed a statistically significant negative correlation between N and K in sandy soils. Quantile regression enabled prediction of both point estimates and percentile ranges. Evaluation of six empirical equations for K estimation identified the Chapuis equation as optimal, which was integrated with field measurements to strengthen the regression model. For weathered rocks, a consistent K range was established. The methodology’s novelty lies in combining readily available SPT data with advanced statistical techniques to generate high-resolution 3D K domains, as demonstrated through kriging. Despite relatively low R² values, the methodology achieves practical accuracy with most predictions falling within one order of magnitude of measured values. This approach significantly enhances spatial and depth-wise characterization of subsurface K, offering a practical solution for groundwater flow modeling and geotechnical design with improved resolution.

Introduction

Hydraulic conductivity (K) is an essential soil property used to analyze groundwater flow, evaluate effective stress distributions, and predict contaminant transport behavior^1,2. Despite its importance, reliable field measurement of K is significantly challenging due to the high variability inherent in natural soil deposits and methodological limitations^3,4,5,6. Traditional field methods such as falling head or pumping tests are often costly, time-consuming, and typically provide limited data points without detailed vertical resolution, resulting in sparse and insufficient data for precise subsurface characterization⁷.

Conversely, the standard penetration test (SPT), a common site characterization tool, provides cost-effective and easily accessible data in the form of blow counts (N-values), which have established correlations with various geotechnical properties^{8,9,10,11,12,13,14,15}. Due to its simplicity and cost-effectiveness, the SPT is routinely performed and provides continuous depth-wise profiles, enabling the potential creation of detailed 3D maps through geostatistical methods. Although a direct physical relationship between N and K is not clear, since N-values mainly reflect soil resistance influenced by effective stress, whereas K is primarily controlled by pore structure, an indirect correlation through void ratio (e) and effective stress is plausible and warrants empirical investigation^16,17,18. Investigating a phenomenological correlation (i.e., statistical relationship observed consistently in data, without necessarily implying a direct theoretical mechanism) is worthwhile, because this effort enables constructing 3D-flow domains from N-values.

This study aims to establish the correlation between N and K specifically targeting sandy soils and weathered rocks. A database comprising 3,508 borehole records from various geotechnical investigations was analyzed. Empirical relationships were developed between N and measured K, supplemented by indirect estimates using established empirical equations (e.g., Chapuis equation) based on void ratio and grain size. To address data variability and uncertainty, quantile regression was employed, offering both central estimates and practical prediction intervals. In addition, the practical effectiveness of the proposed methodology was demonstrated by creating high-resolution 3D K models using ordinary kriging, enhancing subsurface characterization capabilities.

Research significance

This research contributes to geotechnical and hydrogeological practice by providing an effective methodology to predict K using widely available SPT data.

Novelty and literature gap

Previous studies have primarily focused on correlating N with mechanical properties such as modulus, relative density, or shear strength. This study addresses an important gap in literature by establishing correlations between N and K, using a comprehensive dataset from diverse geological settings.

High-resolution subsurface characterization

The methodology transforms routine SPT data into continuous vertical profiles of K, enabling high-resolution 3D characterization of subsurface hydraulic properties.

Cost-effectiveness and generalizability

By using existing SPT data, this approach eliminates the need for additional specialized hydraulic testing, making it particularly valuable for preliminary site assessments and projects with limited resources. The global standardization of SPT procedures further enhances the potential transferability of this approach to various geographical contexts.

The empirical foundation of this approach, supported by field observations, aligns with geotechnical engineering practices where correlations based on field data often prove valuable regardless of underlying theoretical relationships. This approach can significantly improve subsurface K characterization, leading to more reliable groundwater flow modeling and informed geotechnical designs.

Available data and correlation

Data acquisition and processing

Data were drawn from geotechnical investigation reports prepared for apartment complex construction sites. A total of 318 reports containing 3,508 boreholes were analyzed to extract the following data:

Field-measured N-value profiles were available for all 3,508 boreholes.
Field-measured K values (K_{Field-measured}) from 653 cases were available with N-value profiles.
A pair of void ratio (e) and effective diameter (D₁₀) calculated by index properties from 281 cases were available with N-value profiles.
98 sets of both K_{Field-measured} and e–D₁₀ pairs were available with N-value profiles.

Consequently, the following datasets were prepared with a possible combination of each property: 653 N–K sets (555 N–K sets + 98 N–K–e–D₁₀ sets) and 281 N–e–D₁₀ sets (183 N–e–D₁₀ sets + 98 N–K–e–D₁₀ sets). Figure 1 shows how these datasets were categorized into three distinct types: (A) cases with only N and K measurements, (B) cases with N, e, and D₁₀ measurements but without K values, and (C, D) cases with complete data including all measurements. The following section summarizes each testing method described in the report.

Measuring hydraulic conductivity: field permeability test

A field permeability test (falling head test) was conducted following ASTM D6391-11¹⁹. A casing was installed down to the upper boundary of the depth range where K was to be measured. Water was then injected from outside the casing until the water level rose to the casing top. The subsequent drop in water level was recorded over time, and K was calculated from it. This procedure provided a single value of K to represent the depth range where the casing was installed.

Measuring N-value: borehole drilling and SPT

SPT was conducted in accordance with KS F 2307²⁰. Where penetration was less than 30 cm even after 50 blows, the final penetration depth was recorded and the blow count was linearly rescaled to 30 cm so that the data could be compiled consistently (e.g., an N-value of 50 blows per 10 cm was converted to 150 blows per 30 cm). The N-values were compiled along the depth, and relevant data such as layer type, soil classification, and groundwater level (GWL) were summarized as reported in the document without any further correction. The N-values were obtained at discrete intervals along the borehole, whereas K_{Field-measured} represents the overall depth range.

As shown in Fig. 1, this difference in measurement scale necessitated a methodology to determine a representative N-value (N^*) for each K-measured depth range. Three different approaches to extract N^* were explored:

Linear interpolation (N^*_interp): The N-value at the midpoint of the K-measured range was estimated through linear interpolation between adjacent N-values.
Arithmetic mean (N^*_mean): The average of all N-values within the K-measured range was calculated.
Weighted average (N^*_weighted): The weighted average of all N-values within the K-measured range was calculated using weight factors inversely proportional to the distance from the midpoint.

Among these approaches, the linear interpolation (N^*_interp) exhibited the highest correlation with K and was therefore adopted for all subsequent analyses. This can be attributed to several factors: (1) linear interpolation captures the continuous depth-dependent variation of N-values more accurately than simple averaging methods, (2) the midpoint of the K-measurement range typically represents the most characteristic hydraulic properties of that interval, and (3) arithmetic mean can be disproportionately influenced by extreme values, while weighted average introduces arbitrary assumptions about the influence of distance. The linear interpolation method provided correlations approximately 5–8% stronger than the alternative approaches, validating its selection for this study.
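The three ways of extracting a representative N-value can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the handling of midpoints outside the measured profile, and the small offset `eps` that keeps the inverse-distance weights finite are all assumptions.

```python
from bisect import bisect_left


def rescale_refusal(blows, penetration_cm):
    """Linearly rescale a blow count to an equivalent 30 cm penetration."""
    return blows * 30.0 / penetration_cm


def n_interp(depths, n_values, top, bottom):
    """N at the midpoint of [top, bottom], linearly interpolated between
    the adjacent SPT depths (depths must be sorted in ascending order)."""
    mid = 0.5 * (top + bottom)
    if mid <= depths[0]:          # assumption: clamp outside the profile
        return n_values[0]
    if mid >= depths[-1]:
        return n_values[-1]
    i = bisect_left(depths, mid)  # first index with depths[i] >= mid
    z0, z1 = depths[i - 1], depths[i]
    n0, n1 = n_values[i - 1], n_values[i]
    return n0 + (n1 - n0) * (mid - z0) / (z1 - z0)


def n_mean(depths, n_values, top, bottom):
    """Arithmetic mean of the N-values measured inside [top, bottom].
    Raises ZeroDivisionError if no SPT depth falls inside the range."""
    inside = [n for z, n in zip(depths, n_values) if top <= z <= bottom]
    return sum(inside) / len(inside)


def n_weighted(depths, n_values, top, bottom, eps=0.1):
    """Mean of N-values inside [top, bottom], weighted by 1 / (distance
    to the midpoint + eps); eps is a hypothetical regularizer."""
    mid = 0.5 * (top + bottom)
    pairs = [(1.0 / (abs(z - mid) + eps), n)
             for z, n in zip(depths, n_values) if top <= z <= bottom]
    return sum(w * n for w, n in pairs) / sum(w for w, _ in pairs)
```

For example, with SPT depths of 1.5, 3.0, 4.5, and 6.0 m, N-values of 8, 12, 20, and 30, and a K-measured range of 2.0 to 5.0 m, the midpoint is 3.5 m. Interpolation gives roughly 14.7, while the arithmetic mean of the two values inside the range is 16.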

Estimating the void ratio and effective diameter: soil index property tests

Reports included soil index properties such as water content, specific gravity, grain size distribution (GSD) curve, and Atterberg limits, from laboratory tests^21,22,23,24. For boreholes lacking K_{Field-measured}, the void ratio (e) and effective diameter (D₁₀) that were used to estimate K in further sections were determined using $S \cdot e = w \cdot G_{s}$ and interpolated from the GSD curve, respectively.
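As a rough sketch of this step, the void ratio follows directly from $S \cdot e = w \cdot G_{s}$, and D₁₀ can be read off the GSD curve by interpolation. The default of full saturation (S = 1) and the choice to interpolate linearly in log grain size are assumptions made here for illustration; the reports may have used a different interpolation rule.

```python
import math


def void_ratio(water_content, specific_gravity, saturation=1.0):
    """Void ratio from S * e = w * Gs, with w and S given as fractions.
    saturation=1.0 (fully saturated) is an assumed default."""
    return water_content * specific_gravity / saturation


def d10_from_gsd(sizes_mm, percent_finer, target=10.0):
    """Grain size [mm] at `target` percent finer, interpolated between the
    two bracketing sieve points, linearly in log10(grain size)."""
    pts = sorted(zip(percent_finer, sizes_mm))
    for (p0, d0), (p1, d1) in zip(pts, pts[1:]):
        if p0 <= target <= p1 and p1 > p0:
            t = (target - p0) / (p1 - p0)
            log_d = math.log10(d0) + t * (math.log10(d1) - math.log10(d0))
            return 10 ** log_d
    raise ValueError("target percent finer is outside the measured curve")
```

For instance, a water content of 25% and a specific gravity of 2.65 give e ≈ 0.66 under the saturated assumption.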

Overview of methodological workflow

The methodological framework adopted in this study is schematically summarized in Fig. 1b, illustrating a structured workflow from data acquisition to the practical application of the developed model. The workflow involves the following sequential steps:

Data classification and representative N-value determination: Depending on the availability of parameters, data were grouped into three types: (1) N–K sets, (2) N–e–D₁₀ sets, and (3) N–K–e–D₁₀ sets. For each case, representative N-values at the K measurement depth were calculated using interpolation (Fig. 1a).

Empirical correlation development: Direct correlations between N and K_{Field-measured} were analyzed (Fig. 3). In parallel, K estimations were performed (Fig. 5) using empirical equations from soil index properties (i.e., e and D₁₀).

Quantile regression modeling: A quantile regression model was developed to capture both central trends and prediction intervals (10th–90th percentiles), reflecting the variability of field data (Fig. 7).

Model validation and comparison: The predictive model was further validated using order-based analysis (Fig. 8), and comparison with a multivariate random forest regression (Fig. 9) to assess its practical reliability.

3D hydraulic conductivity modeling: The predicted K were applied to create high-resolution 3D subsurface domains using ordinary kriging (Fig. 10), demonstrating the method’s practical utility.

Regional distribution and characteristics of the dataset

To evaluate the representativeness and spatial diversity of the dataset, all 3508 boreholes were grouped into nine regions (A–I) based on geographic proximity and similarities in soil composition, as illustrated in Fig. 2. Each region is characterized by its number (No.) of boreholes, average (Avg.) GWL, and average sampling or testing depth (i.e., the depth from which K or soil index properties were obtained), with statistics summarized in a tabular format.

Pie charts show the proportion of each soil type based on unified soil classification system (USCS) within each region. SM (silty sand) was the dominant soil type across all regions, with some areas consisting entirely of sand. In some regions (A, B, C, D, and I), the average test depth exceeds the average GWL, indicating that samples were typically taken from fully saturated zones. However, in the other regions (E, F, G, and H), the average GWL is deeper than the test depth, suggesting that certain samples were collected above the water table. These samples were excluded from correlation analysis to ensure consistency in saturated conditions. This spatial and soil-type-based overview highlights the diversity yet consistency of the dataset.

Correlations between available data

Correlation between measured N-value and hydraulic conductivity

Figure 3 shows the correlation between the measured and representative N-values and the corresponding K_{Field-measured} for each borehole (653 N–K sets; (A, C, D) in Fig. 1). Each soil type, ranging from clay to weathered rock, is labelled with a different symbol and color. Clayey and silty soils, characterized by weak SPT resistance and inherently low K, are plotted in the lower-left region. Conversely, gravel, with a high K and strong SPT resistance, appears in the upper part of the plot. The weathered rock consistently exhibits N-values exceeding 50 blows/10 cm, with K_{Field-measured} ranging between 10^–5 and 10^–3 cm/s.

On the other hand, the negative correlation between N and K is predominant for sandy soils, as regressed by a black line (Eq. 1) in Fig. 3, despite its relatively low coefficient of determination (R²) (= 0.3869) and high mean absolute percentage error (MAPE) (= 10.31%) calculated in logarithmic scale. It is noted that such scatter and relatively low R² values are typical in geotechnical correlations involving SPT N-values, yet these correlations are widely accepted and used in practice^14,25. This level of correlation can be considered acceptable particularly for K, where order of magnitude differences are required to significantly influence flow analyses.

$$K{\text{ [m/s]}} = 0.4124 \cdot N^{ - 0.5533}$$

(1)
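A power law such as Eq. (1) is a straight line in log–log space, so it can be fitted by ordinary least squares on log₁₀ N and log₁₀ K. The sketch below assumes that the regression was performed in this way and that the MAPE "in logarithmic scale" is the mean of |log₁₀ K_measured − log₁₀ K_predicted| / |log₁₀ K_measured|; neither definition is spelled out in the text.

```python
import math


def fit_power_law(n_values, k_values):
    """Least-squares fit of log10(K) = log10(a) + b * log10(N).
    Returns (a, b, r2), with r2 evaluated in log space."""
    x = [math.log10(n) for n in n_values]
    y = [math.log10(k) for k in k_values]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    log_a = my - b * mx
    ss_res = sum((yi - (log_a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 10 ** log_a, b, 1.0 - ss_res / ss_tot


def log_mape(k_measured, k_predicted):
    """Mean absolute percentage error on log10(K) (assumed definition).
    Undefined if any measured K equals exactly 1, where log10(K) = 0."""
    terms = [abs(math.log10(m) - math.log10(p)) / abs(math.log10(m))
             for m, p in zip(k_measured, k_predicted)]
    return 100.0 * sum(terms) / len(terms)
```

Because the error is normalized by log₁₀ K rather than K itself, a MAPE of about 10% at K ≈ 10^–4 corresponds to a deviation of roughly 0.4 orders of magnitude.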

Correlation between measured N-value and estimated void ratio

The void ratio calculated from the measured water content and specific gravity is plotted against the measured N-values (281 N–e–D₁₀ sets; (B, C, D) in Fig. 1) in Fig. 4. Each soil type is distinguished using different symbols and colors. As highlighted in the previous section, clayey soil exhibits low K and N-values and a higher void ratio than sand. However, gravel and weathered rock show notably scattered data within specific ranges. Sandy soil shows a clear negative correlation between the N-values and void ratio, with the N-values increasing with an increase in soil density. The correlation between these variables indicates that N-values can be indirectly linked to K through their effect on void ratio.

A higher N-value indicates greater resistance in sand, which is often associated with higher effective stress. While effective stress itself does not directly cause changes in the void ratio, a higher effective stress typically corresponds to a lower void ratio from a phenomenological perspective. As soils become denser under higher effective stress (reflected by higher N-values), the pore structure undergoes fundamental changes that control hydraulic behavior, not only through reduced void volume, but also through potential changes in pore connectivity and increased flow path tortuosity. These modifications in pore structure may lead to reduced permeability pathways between soil particles, providing a physical basis for the observed negative correlation between N and K. This indirect physical connection supports the empirical correlation developed in this study.

Application of the empirical equation for estimating K

Comparative analysis of empirical equations

Among the 281 N–e–D₁₀ sets (183 N–e–D₁₀ sets + 98 N–K–e–D₁₀ sets; (B, C, D) in Fig. 1), the initial focus was on estimating K for the 183 N–e–D₁₀ sets ((B) in Fig. 1) that lacked field-measured values. Here, K_{Field-measured} values were not available; however, N-values, void ratio (e), and effective diameter (D₁₀) were present and included in the analysis. The K for granular media increases with higher e and higher D₁₀²⁶. Various empirical equations reflecting these trends are summarized in Table 1^{27,28,29,30,31,32,33,34,35}.

Table 1 Empirical equations for estimating hydraulic conductivity (K [cm/s]) based on void ratio (e) or porosity (n), and effective diameter (D₁₀ [mm]).

The data collected from boreholes with both K_{Field-measured} and the corresponding index properties (98 N–K–e–D₁₀ sets; (C, D) in Fig. 1) were used to evaluate the applicability of each model for estimating K. The estimated hydraulic conductivity (K_Estimated) by each equation in Table 1 is plotted in Fig. 5. The underestimation or overestimation of K_Estimated originated from the limited applicability range designated for each model. Among the six equations, the Chapuis equation (Fig. 5f), which has a broad applicability in terms of D₁₀, demonstrated the best performance with the lowest MAPE of 17.38%, making it particularly suitable for K estimation in sandy soils, including silty sands.
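For orientation, the Chapuis relation can be evaluated in a few lines. Table 1 is not reproduced in this text, so the form below is the one commonly cited from Chapuis (2004), with K in cm/s and D₁₀ in mm. Treat the coefficients as an assumption to be checked against Table 1 of the article; the function name is hypothetical.

```python
def chapuis_k_cm_s(d10_mm, e):
    """Hydraulic conductivity [cm/s] from effective diameter D10 [mm] and
    void ratio e, using the commonly cited Chapuis (2004) form:
        K = 2.4622 * (D10^2 * e^3 / (1 + e)) ** 0.7825
    """
    return 2.4622 * (d10_mm ** 2 * e ** 3 / (1.0 + e)) ** 0.7825
```

With D₁₀ = 0.1 mm and e = 0.7, this form gives K of about 1.9 × 10^–2 cm/s, and the estimate increases with either D₁₀ or e, in line with the trend noted for granular media.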

Validating the estimated hydraulic conductivity using quantile regression

Quantile regression methodology

Quantile regression is an advanced statistical technique that extends the linear regression model to estimate conditional quantiles of the response variable distribution³⁶. In this study, this is employed to address the scattered and enveloped distribution observed in the relationship between N-values and K, as indicated in Fig. 3. Although a negative correlation between N-values and K is evident, the data exhibit significant variability and spread, with no single linear trend capturing the full range of the possible K values for a given N-value. This variability highlights the inherent uncertainty in K distributions, which cannot be adequately represented by traditional linear regression methods that predict only a single central tendency.

Unlike traditional linear regression, quantile regression can model multiple conditional quantiles (e.g., 10th, 50th, and 90th percentiles). For example, the 50th quantile corresponds to the median of the distribution, whereas the 10th and 90th quantiles represent the lower and upper extremes, respectively. This allows for a more comprehensive analysis of the variability in K. This approach predicts not only a specific K value but also a range of likely values, thereby affording the potential to effectively capture the bounded distribution.
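For reference, the standard formulation of quantile regression replaces the squared-error loss of ordinary regression with an asymmetric absolute loss. The display below assumes a model that is linear in log₁₀ N and log₁₀ K, which is consistent with the power-law bounds of Eqs. (3) and (4) but is not stated explicitly in the text; the symbols β₀, β₁, and τ are introduced here for illustration.

$$\left( \hat{\beta}_{0}(\tau), \hat{\beta}_{1}(\tau) \right) = \mathop{\arg\min}\limits_{\beta_{0}, \beta_{1}} \sum\limits_{i} \rho_{\tau}\left( \log_{10} K_{i} - \beta_{0} - \beta_{1} \log_{10} N_{i} \right), \qquad \rho_{\tau}(u) = u\left( \tau - \mathbb{1}[u < 0] \right)$$

$$K_{\tau} = 10^{\hat{\beta}_{0}(\tau)} \cdot N^{\hat{\beta}_{1}(\tau)}$$

For τ = 0.5 the loss is symmetric and the fit is the conditional median. For τ = 0.9 an underprediction is penalized nine times more heavily than an overprediction of the same size, which pushes the fitted curve upward until about 90% of the data lie below it.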

Quantile regression for measured and empirically estimated hydraulic conductivity

Among the results presented in Fig. 3, the K_{Field-measured} of sandy soils (410 sets; (A, C, D) in Fig. 1) were subjected to quantile regression, and the results are presented in Fig. 6a. The black line represents the 50th quantile (i.e., median prediction). The red-colored region represents the 25th–75th quantile range (i.e., near the median) and includes 49.76% of the data, while the blue-colored region indicates the 10th–90th quantile range and captures 79.27% of the data. Theoretically, ideal quantile ranges of 25th–75th and 10th–90th should cover 50% and 80% of the data, respectively. The selection of the 10th–90th and 25th–75th percentile ranges was based on both statistical and practical considerations. The 10th–90th range captures approximately 80% of the data while excluding extreme outliers that may result from measurement errors or highly localized anomalies, making it suitable for engineering design purposes. The 25th–75th range represents the interquartile range, a robust measure of spread that is less sensitive to outliers than the standard deviation.

The quantile regression results derived from K_{Field-measured} were used as a benchmark to assess the validity of K_Estimated. The 183 N–e–D₁₀ sets ((B) in Fig. 1) with available N-values, void ratio, and effective diameter and without K_{Field-measured} were used to calculate K_Estimated using the Chapuis empirical equation. These K_Estimated values were then plotted against their corresponding N-values in Fig. 6b. Importantly, rather than developing new quantile ranges from K_Estimated, these values were overlaid on the previously established quantile ranges from K_{Field-measured}. This approach allows for an independent validation of the Chapuis equation’s performance against the empirically observed distribution patterns.

The alignment of K_Estimated with the quantile ranges derived from K_{Field-measured} was evaluated using a quantile–quantile (Q–Q) plot, a graphical method for comparing two probability distributions, shown as an inset in Fig. 6b. The Q–Q plot demonstrates that the distribution of K_Estimated closely aligns with the quantile distribution of K_{Field-measured}, with most points falling near the 1:1 line. This alignment was further quantified by calculating the quantile coverage: 86.07% of K_Estimated fell within the 10^th–90^th quantile range, and 53.28% fell within the 25^th–75^th quantile range. These values indicate that the distribution of K_Estimated satisfactorily matches the variability observed in K_{Field-measured}, demonstrating the reliability of the Chapuis equation for estimating K.
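The quantile coverage quoted above is simply the share of points that fall between two bounding curves. A minimal sketch, in which the function name and the demonstration curves are hypothetical and not taken from the paper:

```python
def quantile_coverage(n_values, k_values, lower, upper):
    """Percentage of (N, K) points lying between two bounding curves.
    `lower` and `upper` are callables that map N to the K value of the
    lower and upper quantile curve, respectively."""
    inside = sum(1 for n, k in zip(n_values, k_values)
                 if lower(n) <= k <= upper(n))
    return 100.0 * inside / len(k_values)


# Hypothetical bounding curves and points, for illustration only.
demo_lower = lambda n: 0.01 / n
demo_upper = lambda n: 1.0 / n
demo_cov = quantile_coverage([10, 10, 20, 20],
                             [0.05, 0.5, 0.01, 0.0001],
                             demo_lower, demo_upper)
```

Here two of the four demonstration points lie inside the band, so `demo_cov` is 50.0. Applied to the estimated K values with the field-derived 10th–90th curves, the same calculation would yield the kind of percentage reported in the text.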

The Chapuis equation demonstrated superior performance not only in directly estimating K values but also in maintaining consistency with the N–K relationship. This dual effectiveness is further supported by the similar distribution and scattered tendency between K_{Field-measured} versus N-value (Fig. 6a) and K_Estimated versus N-value (Fig. 6b). The consistency across different data sources (i.e., field measurements and empirical estimations) supports the claim that the relationship between N-values and K is not merely coincidental; rather it reflects a physical relationship. The Chapuis equation’s effectiveness in bridging N-values and K validates the physical basis of our correlation, as it explicitly incorporates void ratio, the link between penetration resistance and hydraulic conductivity. This consistency demonstrates the reliability of K prediction even in scenarios where direct field measurements of K might be limited or unavailable.

Proposed regression model for predicting hydraulic conductivity

Given the reliability of the derivation of K_Estimated from the void ratio and effective diameter, the values of K_Estimated were included in the final regression analysis to improve robustness. The entire dataset presented in Fig. 6a,b was combined and plotted in Fig. 7 to propose the regression model provided in Eq. (2), which is represented as a blue line.

$$K{\text{ [m/s]}} = 0.3873 \cdot N^{-0.5338}; \quad R^{2} = 0.3497 \;{\text{and}}\; {\text{MAPE}} = 9.8\%$$

(2)

Equation (1) is phenomenologically derived only from K_{Field-measured}, whereas Eq. (2) incorporates both measured data and estimated data calculated from void ratio and effective diameter (K_{Field-measured + Estimated}), integrating K data from different sources to enhance comprehensiveness. These different data sources were integrated to: (1) increase the sample size, which potentially leads to more statistically reliable results, and (2) demonstrate the ability of the model to reconcile direct measurements with theoretically derived estimates, which further validates the underlying physical relationships.

The histogram in the lower left corner of Fig. 7 presents the residuals calculated on a logarithmic scale for each data point. Its near-normal distribution suggests that the regression model captures the underlying data pattern well and that the error terms are approximately normally distributed. Despite the scattered data distribution, the upper and lower limits can be bounded by considering the 10th–90th quantile range (blue zone), as indicated in Eqs. (3) and (4). These upper and lower bounds provide practical reference limits, establishing a range within which predicted K values can be considered acceptable for engineering applications. The 10th–90th quantile range captures approximately 80% of the observed data points, offering a reliable prediction interval for practical purposes.

The quantile range gradually narrows with increasing N-values, suggesting that K becomes more predictable in denser soils where N-values are higher. This narrowing trend reflects the relationship between depth, effective stress, soil density, and void ratio. As depth increases, N-values typically increase due to higher effective stress. At shallow depths where effective stress is low, soils exhibit wide variations in their initial density states, leading to diverse void ratios and consequently diverse K values at similar N-values. Conversely, higher effective stress at greater depths induces natural densification of initially loose soils, resulting in more uniform void ratios. This convergence in void ratios explains the reduced variation in K values observed at higher N-values. The proposed regression is valid for N-values below 50 blows/10 cm. This upper limit corresponds to the transition from soil-like to rock-like behavior, where the relationship between density state and hydraulic conductivity fundamentally changes from matrix-controlled to fracture-dominated flow.

$$K_{\text{Lower limit (10\%)}}{\text{ [m/s]}} = 0.0771 \cdot N^{-0.3809}$$

(3)

$$K_{\text{Upper limit (90\%)}}{\text{ [m/s]}} = 3.0255 \cdot N^{-0.8311}$$

(4)
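Equations (2) to (4) can be wrapped into a single helper that returns the central estimate together with its 10th and 90th percentile bounds. The coefficients are copied as printed, so the stated units should be checked against the original article before any design use. The upper validity limit is expressed here as a converted N-value of 150 blows per 30 cm, which is one reading of the "50 blows/10 cm" limit and is an assumption, as are the function and key names.

```python
def predict_k(n, n_max=150.0):
    """Central estimate and 10th-90th percentile bounds of K for sandy
    soils, from Eqs. (2)-(4) with the coefficients as printed.
    n     : SPT N-value (converted to blows per 30 cm)
    n_max : assumed upper validity limit (50 blows/10 cm -> 150)"""
    if not 0 < n < n_max:
        raise ValueError("N is outside the assumed validity range")
    return {
        "lower_10": 0.0771 * n ** (-0.3809),   # Eq. (3)
        "median": 0.3873 * n ** (-0.5338),     # Eq. (2)
        "upper_90": 3.0255 * n ** (-0.8311),   # Eq. (4)
    }


def band_ratio(n):
    """Ratio of the upper to the lower bound, a measure of band width."""
    p = predict_k(n)
    return p["upper_90"] / p["lower_10"]
```

Comparing `band_ratio(5)` with `band_ratio(50)` shows the band shrinking from more than an order of magnitude to well under one, which is the narrowing with increasing N described in the text.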

The correlation was not clearly pronounced for weathered rocks where N-values exceed 50 blows/10 cm (equivalent to a converted N-value of 150 blows)^37,38, as indicated by the red symbols in Fig. 7. Instead, it is distributed within a relatively narrow range. Therefore, the average K of 10^–4 cm/s was delineated, regardless of the N-values that depend on the degree of weathering, with an 80% confidence interval (i.e., red zone).

The consistent K observed in weathered rocks, irrespective of N-values, can be attributed to their rock-like nature, where fluid flow is mainly governed by discontinuities such as fractures and joints, and not only by the density or pore size of the structure^39,40. These discontinuities, as primary flow paths for fluids, dominate K in weathered rocks, which lead to its relatively consistent behavior. While our data shows approximately one order of magnitude variation in K values for weathered rocks, this simplified characterization provides a practical approach for engineering applications, though users should be aware of potential limitations in highly fractured or heterogeneous rock masses.

Order-based validation of the proposed regression model

The practical applicability of the proposed regression model for sandy soils was further evaluated using an order of magnitude analysis (Fig. 8). K_{Field-measured} values were categorized into two orders of magnitude on the horizontal axis, with –4 representing values between 10^–4 and 10^–3 cm/s (362 samples) and –3 representing values between 10^–3 and 10^–2 cm/s (158 samples). Very low conductivity samples (order of –5) and high conductivity samples (order of –2) were excluded from this analysis due to insufficient sample sizes (11 and 1 samples, respectively).

For each order category, the match rate between K_{Field-measured} and K values predicted from N-values (K_Predicted) was quantified at three precision levels: within ± 0.5 order, within ± 1 order, and within ± 2 orders of magnitude. The left vertical axis in Fig. 8 represents these match rates as percentages. For soils with K_{Field-measured} in the 10^–4 to 10^–3 cm/s range, the model demonstrated excellent reliability, with 88.4% of predictions falling within ± 0.5 order of magnitude and 100% within ± 1 order. Similarly, for soils with K_{Field-measured} in the 10^–3 to 10^–2 cm/s range, 67.7% of predictions were within ± 0.5 order of magnitude and 98.7% within ± 1 order. In both cases, all predictions fell within ± 2 orders of magnitude.
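The match rates described above can be computed from the absolute difference in log₁₀ K. The sketch below assumes that "within ± x orders" means |log₁₀ K_Predicted − log₁₀ K_measured| ≤ x, which the text does not define explicitly; the function name is hypothetical.

```python
import math


def order_match_rates(k_measured, k_predicted, tolerances=(0.5, 1.0, 2.0)):
    """Return ({tolerance: match rate in %}, mean order difference), where
    the order difference is |log10(K_predicted) - log10(K_measured)|."""
    diffs = [abs(math.log10(p) - math.log10(m))
             for m, p in zip(k_measured, k_predicted)]
    rates = {tol: 100.0 * sum(d <= tol for d in diffs) / len(diffs)
             for tol in tolerances}
    return rates, sum(diffs) / len(diffs)
```

The second return value corresponds to the average order difference that Fig. 8 plots on its right vertical axis.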

The average order difference between predicted and measured values, represented by the red squares in Fig. 8 (right vertical axis), was 0.23 for the lower conductivity range (–4) and 0.41 for the higher conductivity range (–3). This pattern of increasing divergence with higher K_{Field-measured} aligns with the quantile regression results shown in Fig. 7, where the prediction bands narrow with increasing N-values (corresponding to lower K). This systematic behavior confirms that the regression model performs more consistently in denser soils with higher N-values and lower K.

Despite scattered data distribution and the prediction model’s relatively low R² value, this order-based analysis validates the practical utility of the N-value-based prediction. A high percentage of K_Predicted values fall within one order of magnitude of the measured values, which is generally acceptable for most geotechnical applications. This level of accuracy, achieved using only readily available SPT data, highlights the model’s effectiveness for practical use, particularly in groundwater flow analyses. However, the exclusion of very low and high conductivity samples (order of –5 and –2, respectively) represents a limitation of the current validation, as the model’s performance at these extreme ranges remains unverified. Future studies with larger datasets including these extreme conductivity values would be valuable for extending the model’s applicable range.

Multivariate regression analysis using machine learning

While the proposed N–K regression model provides a practical and interpretable approach for estimating K using only SPT N-values, it is important to assess whether incorporating additional soil parameters can improve predictive performance. At the same time, recent studies have demonstrated the potential of machine learning models in predicting geotechnical properties from basic soil data or N-values^{41,42,43,44,45,46,47,48}. However, these methods often come with challenges such as increased model complexity, overfitting risk, and reduced transparency, which may limit their applicability in routine engineering practice. To evaluate both the benefit of additional input variables and the comparative performance of advanced modeling techniques, a multivariate regression analysis was performed using a random forest (RF), a widely used machine learning algorithm capable of modeling complex non-linear interactions among multiple predictors.

Using the complete N–K–e–D₁₀ sets ((C, D) in Fig. 1), the data were randomly split into training (80%) and testing (20%). Three random forest models with progressively expanding input features were developed: RF-I (N and e), RF-II (N, e, D₁₀, and median grain size (D₅₀)), and RF-III (N, e, D₁₀, D₅₀, coefficient of uniformity (C_u), and fines content (FC)). Figure 9a–c presents the comparison between measured and predicted K values for each model, and Fig. 9d–f illustrates the relative importance of each feature in the corresponding models.

Inspection of Fig. 9a–c reveals that the scatter patterns of predicted versus measured K values remain notably similar across all three models despite the increasing number of input features, indicating that additional parameters beyond N-values provide minimal improvement in prediction capability. The performance metrics for each model are summarized in Table 2. For the training data, R² improved from 0.5418 to 0.6629 as additional features were incorporated, with a corresponding decrease in MAPE from 7.8411% to 6.6513%. On the test data, however, the results were mixed: R² improved slightly from 0.2670 in RF-I to 0.2911 in RF-II, but declined to 0.2652 in RF-III, falling below even the RF-I value. Test MAPE increased from 7.2015% to 7.5114% as model complexity increased. This pattern of deteriorating test performance despite improving training metrics indicates overfitting in the more complex models. Several strategies could potentially mitigate this overfitting: (1) implementing k-fold cross-validation during model training to better assess generalization performance, (2) employing feature selection techniques to identify the most informative predictors, (3) applying regularization methods such as limiting the number of estimators, or (4) acquiring larger datasets to better support complex models. However, even with these mitigation strategies, the fundamental challenge remains that comprehensive datasets with all required parameters are scarce in practice. Despite the theoretical advantages of including additional soil parameters, the practical utility of the simpler N–K regression model becomes evident when considering both model performance and data availability in typical geotechnical investigations.

Feature importance analysis (Fig. 9d–f) consistently identified the N-value as the most influential predictor across all models, accounting for 59.76% of predictive power in RF-I, 42.59% in RF-II, and 38.50% in RF-III. These findings confirm our central hypothesis that N-values serve as robust predictors of K in sandy soils, even when considered alongside traditional soil parameters such as void ratio and GSD characteristics. The consistent identification of the N-value as the dominant predictor demonstrates that while additional input features contribute to K variation, N-values effectively capture the primary factors affecting K in sandy soils.
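For readers who want to reproduce this kind of comparison, the following sketch defines R² and MAPE in their standard forms and computes the gap between training and test R² from the values quoted in the text. The text does not state which variable (K or log K) the metrics were evaluated on, so that choice is left to the caller; the function names are illustrative.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)

def generalization_gap(train_r2, test_r2):
    """Difference between training and test R²; a growing gap suggests overfitting."""
    return train_r2 - test_r2

# R² values quoted in the text for the simplest and the most complex model.
gap_rf1 = generalization_gap(0.5418, 0.2670)
gap_rf3 = generalization_gap(0.6629, 0.2652)
print(f"RF-I gap: {gap_rf1:.4f}, RF-III gap: {gap_rf3:.4f}")
```

The gap widens from RF-I to RF-III, which is the numerical signature of the overfitting discussed above.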

Table 2 Performance metrics for random forest regression models with different input features.

Despite the marginal improvements in training accuracy with more complex models, the practical utility of the N-value-based approach becomes evident when considering data availability. Complete N–K–e–D₁₀ sets required for multivariate analysis are relatively scarce (98 sets in this study), whereas N-values are abundantly available from standard site investigations (3,508 boreholes in this study). Therefore, while incorporating additional soil parameters might theoretically improve prediction accuracy, the simple N–K regression model proposed in Eq. (2) offers a more practical solution for widespread application in geotechnical practice.

Generating 3D hydraulic conductivity domains using kriging

Constructing accurate flow domains of K is essential for analyses involving groundwater flow, contaminant transport, and settlement prediction. However, generating these domains using only in-situ measured K is challenging because of the limited data availability, which typically restricts the modeling to 2D analyses. Conversely, utilizing SPT N-values enables the assignment of K at a greater number of spatial locations and across depth profiles, facilitating the construction of more detailed 3D flow domains.

Figure 10a shows a plan view of the sample study area with borehole locations, along with the digital elevation model (DEM) of the area. Black symbols indicate boreholes where both K and N-values were available (17 locations), while red symbols indicate boreholes where only N-values were available (214 additional locations). Combining both datasets enabled the construction of a 3D flow domain over an area of 2.8 km × 2.5 km that captures both horizontal and vertical variations of K.

Ordinary kriging, a widely used geostatistical interpolation method for spatially distributed data, was employed to construct the flow domain⁴⁹. It was selected for its well-established performance in geostatistical modeling and its ability to account for spatial autocorrelation while maintaining computational efficiency⁵⁰. The kriging implementation involved a spherical variogram model (a function describing spatial correlation as a function of distance) to estimate spatial relationships, with optimized grid spacing for efficiency. Kriging was performed to generate and compare two 3D K domains: one using only K_{Field-measured}, and the other using both K_{Field-measured} and K_Predicted. For sandy soils, K was predicted based on Eq. (2), while for weathered rock, a constant K value of 10⁻⁴ cm/s was applied.
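The following is a minimal, self-contained sketch of ordinary kriging with a spherical variogram, included only to make the procedure concrete. It is not the implementation used in the study: the variogram parameters, the isotropic 3D distance, the choice of log10(K) as the kriged variable, and the sample points are all assumptions.

```python
import math

def spherical_variogram(h, nugget, sill, vrange):
    """Spherical semivariance at lag h; equals the sill at and beyond the range."""
    if h == 0.0:
        return 0.0
    if h >= vrange:
        return sill
    ratio = h / vrange
    return nugget + (sill - nugget) * (1.5 * ratio - 0.5 * ratio ** 3)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ordinary_kriging(points, values, target, nugget=0.0, sill=1.0, vrange=500.0):
    """Estimate the value at target; returns (estimate, weights)."""
    n = len(points)

    def gamma(p, q):
        return spherical_variogram(math.dist(p, q), nugget, sill, vrange)

    # Kriging system: semivariances plus the unbiasedness constraint (weights sum to 1).
    a = [[gamma(points[i], points[j]) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(p, target) for p in points] + [1.0]
    weights = solve(a, b)[:n]  # the last unknown is the Lagrange multiplier
    return sum(w * v for w, v in zip(weights, values)), weights

# Hypothetical samples: (x, y, elevation) in metres and log10 of K in cm/s.
points = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0), (0.0, 100.0, 0.0), (100.0, 100.0, -10.0)]
log_k = [-3.0, -3.2, -2.9, -3.5]
estimate, weights = ordinary_kriging(points, log_k, target=(50.0, 50.0, -5.0))
k_estimate = 10 ** estimate  # back-transform to cm/s
```

In practice the variogram parameters would be fitted to the experimental variogram of the data, and horizontal and vertical ranges usually differ, so an anisotropic distance would be more realistic than the single range assumed here.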

Kriging with only K_{Field-measured}

The results showed that K remains nearly constant at a given elevation and decreases with an increase in depth. The near-constant K at the same elevation is attributed to the limited number of boreholes with K_{Field-measured} and the inconsistent elevations where measurements are conducted. The decrease in K with depth aligns with the negative correlation between the N-values and K discussed in previous sections, because N-values typically increase with depth.

Kriging with both K_{Field-measured} and K_Predicted

The results of kriging with both K_{Field-measured} and K_Predicted are shown in Fig. 10b,c. Figure 10b presents the reconstructed 3D K distribution, where the surface mesh represents the DEM. By incorporating K_Predicted, the kriging results revealed detailed horizontal and vertical variations in K, which were not discernible in domains generated using only K_{Field-measured}. Figure 10c provides horizontal cross-sections of the 3D domain at elevations of 10, 15, and 20 m. These cross-sections illustrate how incorporating N-based predictions enhances the resolution of the K distribution in the horizontal direction. In addition, the variation in K in the vertical direction is captured more precisely. Although the overall trend shows a decrease in K with depth, localized zones where K increases were also identified.

The inclusion of N-based K predictions addressed the limitations posed by sparse K_{Field-measured}. The kriging results demonstrated how this approach enables a more robust representation of subsurface conditions, capturing localized variations and providing a continuous 3D distribution of K values. The ability to resolve horizontal variations and depth-dependent trends in K has the potential to improve the accuracy and utility of flow domain models for geotechnical and hydrogeological applications.
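One way to assemble the kriging input from both data sources is sketched below. The precedence rule (a measured K overrides any estimate) and the dictionary keys are assumptions; the sand predictor is passed in as a function, and the placeholder used here is a hypothetical log-linear stand-in with invented coefficients, not Eq. (2) of the article.

```python
WEATHERED_ROCK_K = 1e-4  # cm/s, constant value stated in the text

def assign_k(sample, predict_sand):
    """Return (K in cm/s, source label) for one borehole sample."""
    if sample.get("k_measured") is not None:
        return sample["k_measured"], "measured"
    if sample["material"] == "weathered_rock":
        return WEATHERED_ROCK_K, "constant"
    return predict_sand(sample["n_value"]), "predicted"

def placeholder_predictor(n_value):
    """Hypothetical stand-in for the N-K model; coefficients are invented."""
    return 10 ** (-2.5 - 0.02 * n_value)

# Hypothetical samples, for illustration only.
samples = [
    {"material": "sand", "n_value": 12, "k_measured": 2.0e-3},
    {"material": "sand", "n_value": 35, "k_measured": None},
    {"material": "weathered_rock", "n_value": 50, "k_measured": None},
]
assigned = [assign_k(s, placeholder_predictor) for s in samples]
```

Keeping the source label alongside each value makes it easy to krige the measured-only subset and the combined set separately and compare the resulting domains.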

Conclusions

This study presented a comprehensive approach for predicting hydraulic conductivity (K) in sandy soils and weathered rocks using standard penetration test (SPT) N-values to overcome the scarcity of hydraulic data. A robust and generalized regression model was developed by integrating field data with empirical equations, despite the absence of a direct physical relationship between N and K. The main findings are as follows:

A negative correlation was identified between N-values and K in sandy soils. For weathered rocks, a consistent range of K values was observed; however, no direct correlation with N was found.
K values estimated from empirical equations, particularly the Chapuis equation, were incorporated into the N-based prediction model. This integration accounted for variability in K and strengthened the model, especially in cases where field measurements were sparse or inconsistent.
The quantile regression provided not only point predictions but also probabilistic ranges of K. This approach acknowledges the inherent variability in soil properties and offers more comprehensive predictions, supporting better decision making in geotechnical engineering.
Additional validation through order-based analysis and multivariate machine learning techniques confirmed the practical utility and robustness of the N-based prediction model. Most predictions fell within one order of magnitude of measured values, while random forest analysis consistently identified N-values as the dominant predictor of K.
A comparison between kriging results using only measured K and those incorporating predicted values highlighted the practical advantages of N-based predictions in constructing 3D K domains. The inclusion of predicted values significantly improved spatial resolution, offering a more detailed understanding of both horizontal and vertical variations of subsurface hydraulic characteristics.

The correlation between K and N enabled more detailed spatial modeling of K. The prediction model demonstrated practical utility despite the inherent data variability. The robustness of the proposed methodology is supported by multiple parallel validation approaches, including empirical equation consistency, quantile regression analysis, and order-based validation, which collectively establish confidence in the model’s reliability. This study contributes to the field by improving the accuracy and applicability of K predictions, particularly in data-limited environments. While it demonstrates the practical utility of N-based K prediction for enhancing subsurface modeling, several limitations should be acknowledged, such as the relatively low R² values and the simplified approach for weathered rock characterization. Future work should focus on external validation with datasets from more diverse geological settings to further assess the model’s broader applicability.

Data availability

Data will be made available from the corresponding author upon reasonable request.

References

Rangarajan, S., Rahardjo, H., Satyanaga, A. & Li, Y. Influence of 3D subsurface flow on slope stability for unsaturated soils. Eng. Geol. 339, 107665 (2024).
Goodarzi, M. R., Vazirian, M. & Niazkar, M. Hydraulic conductivity estimation: Comparison of empirical formulas based on new laboratory experiments. Water 16, 1854 (2024).
Deb, S. K. & Shukla, M. K. Variability of hydraulic conductivity due to multiple factors. Am. J. Environ. Sci. 8, 489–502 (2012).
Gofar, N. et al. Factors affecting hydraulic anisotropy of soil. Geomech. Eng. 36, 343–353 (2024).
Hu, W., Shao, M., Wang, Q. & She, D. Effects of measurement method, scale, and landscape features on variability of saturated hydraulic conductivity. J. Hydrol. Eng. 18, 378–386 (2013).
Lee, B.-J. Improvement of field falling-head test and determination of hydraulic conductivity using Darcy’s equation. Sci. Rep. 14, 17928 (2024).
Palmer, M. & El-Idrysy, H. Comparison of borehole testing techniques and their suitability in the hydrogeological investigation of mine sites. in Agreeing on Solutions for More Sustainable Mine Water Management–Proceedings of the 10th ICARD and IMWA Annual Conference, Santiago, Chile (2015).
Anbazhagan, P., Parihar, A. & Rashmi, H. N. Review of correlations between SPT N and shear modulus: A new correlation applicable to any region. Soil Dyn. Earthq. Eng. 36, 52–69 (2012).
Bol, E. A new approach to the correlation of SPT-CPT depending on the soil behavior type index. Eng. Geol. 314, 106996 (2023).
Cubrinovski, M. & Ishihara, K. Empirical correlation between SPT N-value and relative density for sandy soils. Soils Found. 39, 61–71 (1999).
Fabbrocino, S., Lanzano, G., Forte, G., Santucci de Magistris, F. & Fabbrocino, G. SPT blow count vs. shear wave velocity relationship in the structurally complex formations of the Molise Region (Italy). Eng. Geol. 187, 84–97 (2015).
Ji, P. et al. Energy measurement in standard penetration tests. Sustainability 15, 1–15 (2023).
Kang, C. et al. Examination of the correlation between SPT and undrained shear strength: Case study of clay till in Alberta, Canada. Eng. Geol. 334, 107510 (2024).
Mujtaba, H., Farooq, K., Sivakugan, N. & Das, B. M. Evaluation of relative density and friction angle based on SPT-N values. KSCE J. Civ. Eng. 22, 572–581 (2018).
Panjamani, A., Manohar, D. R., Moustafa, S. S. R. & Al-Arifi, N. S. N. Selection of shear modulus correlation for SPT N values based on site response studies. J. Eng. Res. 4, 18–42 (2016).
Morin, R. H. Negative correlation between porosity and hydraulic conductivity in sand-and-gravel aquifers at Cape Cod, Massachusetts, USA. J. Hydrol. 316, 43–52 (2006).
Rosas, J. et al. Determination of hydraulic conductivity from grain-size distribution for different depositional environments. Groundwater 52, 399–413 (2014).
Sperry, J. M. & Peirce, J. J. A model for estimating the hydraulic conductivity of granular material based on grain shape, grain size, and porosity. Groundwater 33, 892–898 (1995).
ASTM-D6391-11. Standard Test Method for Field Measurement of Hydraulic Conductivity Using Borehole Infiltration. ASTM International (2020).
KS-F-2307. Method for Standard Penetration Test. Korean Industrial Standards (2022).
KS-F-2303. Test Method for Liquid and Plastic Limit of Soils. Korean Industrial Standards (2022).
KS-F-2306. Standard Test Method for Water Content of Soils. Korean Industrial Standards (2020).
KS-F-2308. Test Method for Density of Soil Particles. Korean Industrial Standards (2022).
KS-F-2302. Test Method for Particle Size Distribution of Soils. Korean Industrial Standards (2022).
Tsai, C. C., Kishida, T. & Kuo, C. H. Unified correlation between SPT–N and shear wave velocity for a wide range of soil types considering strain-dependent behavior. Soil Dyn. Earthq. Eng. 126, 105783 (2019).
Chapuis, R. P. Predicting the saturated hydraulic conductivity of soils: A review. Bull. Eng. Geol. Environ. 71, 401–434 (2012).
NavfacDM7. Design Manual-Soil Mechanics, Foundations, and Earth Structures. US Gov. Print. Off. (1974).
Carman, P. C. Fluid flow through granular beds. Trans. Inst. Chem. Eng. Lond. 15, 150–156 (1937).
Kozeny, J. Ueber kapillare leitung des wassers im boden. Sitzungsberichte Akad. Wissenschaften Wien 136, 271 (1927).
Carman, P. C. Flow of Gases Through Porous Media (Butterworths, 1956).
Chapuis, R. P., Gill, D. E. & Baass, K. Laboratory permeability tests on sand: influence of the compaction method on anisotropy. Can. Geotech. J. 26, 614–622 (1989).
Hazen, A. Some physical properties of sand and gravel with special reference to their use in filtration, in 24th Ann. Rep. Mass. State Board Heal. Boston, 1893 (1893).
Chapuis, R. P. Predicting the saturated hydraulic conductivity of sand and gravel using effective diameter and void ratio. Can. Geotech. J. 41, 787–795 (2004).
Slichter, C. S. Theoretical investigation of the motion of ground waters, in 19th Ann. Rep. US Geophys Surv. 304–319 (1899).
Terzaghi, K. Principles of soil mechanics: III. Determination of permeability of clay. Eng. News Rec. 95, 832–836 (1925).
Hao, L. & Naiman, D. Q. Quantile Regression (Sage, 2007).
Seoul-Metropolitan-Government. Geotechnical Investigation Manual. 17 (2006).
Ministry-of-Land-Infrastructure-and-Transport. Road Design Manual. 402 (2000).
Chicco, J. M., Comina, C., Mandrone, G., Vacha, D. & Vagnon, F. Field surveys in heterogeneous rock masses aimed at hydraulic conductivity assessment. SN Appl. Sci. 5, 374 (2023).
Zoorabadi, M., Saydam, S., Timms, W. & Hebblewhite, B. Analytical methods to estimate the hydraulic conductivity of jointed rocks. Hydrogeol. J. 30, 111–119 (2022).
Olamide Taiwo, B. et al. Explosive utilization efficiency enhancement: An application of machine learning for powder factor prediction using critical rock characteristics. Heliyon 10, e33099 (2024).
Rabbani, A. et al. Optimization of an artificial neural network using four novel metaheuristic algorithms for the prediction of rock fragmentation in mine blasting. J. Inst. Eng. Ser. D https://doi.org/10.1007/s40033-024-00781-x (2024).
Rabbani, A. et al. Utilization of tree-based ensemble models for predicting the shear strength of soil. Transp. Infrastruct. Geotechnol. 11, 2382–2405 (2024).
Rabbani, A. et al. A comprehensive study on the application of soft computing methods in predicting and evaluating rock fragmentation in an opencast mining. Earth Sci. Inf. https://doi.org/10.1007/s12145-024-01488-z (2024).
Rabbani, A., Samui, P. & Kumari, S. Implementing ensemble learning models for the prediction of shear strength of soil. Asian J. Civ. Eng. 24, 2103–2119 (2023).
Rabbani, A., Samui, P. & Kumari, S. A novel hybrid model of augmented grey wolf optimizer and artificial neural network for predicting shear strength of soil. Model. Earth Syst. Environ. 9, 2327–2347 (2023).
Rabbani, A., Samui, P. & Kumari, S. Optimized ANN-based approach for estimation of shear strength of soil. Asian J. Civ. Eng. 24, 3627–3640 (2023).
Rabbani, A. et al. Optimization of an artificial neural network using three novel meta-heuristic algorithms for predicting the shear strength of soil. Transp. Infrastruct. Geotechnol. 11, 1708–1729 (2024).
Wackernagel, H. Ordinary Kriging. Multivariate Geostatistics: An Introduction with Applications (Springer, 2003). https://doi.org/10.1007/978-3-662-05294-5_11.
Li, Y. et al. Database of soil properties incorporating organic content from roots and soil organisms for regional slope stabilisation. Sci. Rep. 15, 1066 (2025).

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (Nos. RS-2021-NR060085, RS-2023-NR076991). This work was based on data obtained from “GS Engineering and Construction”.

Author information

Authors and Affiliations

School of Civil and Environmental Engineering, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul, 03722, Republic of Korea
Wanhyuk Seo & Tae Sup Yun
Department of Geotechnical Engineering Research, Korea Institute of Civil Engineering and Building Technology, Goyang, 10223, Republic of Korea
Eomzi Yang

Authors

Wanhyuk Seo
Eomzi Yang
Tae Sup Yun

Contributions

Wanhyuk Seo: Visualization, Methodology, Software, Formal analysis, Investigation, Writing—Original Draft. Eomzi Yang: Methodology, Formal analysis, Investigation. Tae Sup Yun: Conceptualization, Supervision, Validation, Writing—Review and Editing.

Corresponding author

Correspondence to Tae Sup Yun.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Seo, W., Yang, E. & Yun, T.S. Enhanced characterization of hydraulic conductivity via standard penetration test for sandy soils and weathered rocks. Sci Rep 15, 23594 (2025). https://doi.org/10.1038/s41598-025-08300-y

Received: 28 February 2025
Accepted: 20 June 2025
Published: 02 July 2025
Version of record: 02 July 2025
DOI: https://doi.org/10.1038/s41598-025-08300-y