Introduction

Ground settlement prediction is a particularly significant topic in road construction practice, where it is closely tied to ground improvement and monitoring1,2. Three kinds of approaches are commonly adopted to predict ground settlement: analytical methods3,4,5, numerical methods6,7,8,9 and data-driven methods10,11,12,13,14,15. Analytical methods are usually based on the consolidation characteristics of soils, whereas numerical methods incorporate a soil constitutive model. Because the consolidation characteristics and constitutive parameters must be obtained through experiments and tests, which are prone to errors caused by measurement and disturbance, the predictions of analytical and numerical methods typically involve uncertainties. Further uncertainties arise in the transformation models that relate site investigation data to a particular soil parameter of interest16,17,18,19,20. It is widely accepted in geotechnical engineering that such uncertainties can significantly affect the results of risk assessment11,21,22,23,24,25 (e.g., geotechnical reliability analysis and soil liquefaction assessment). By contrast, data-driven methods are usually based on the monitoring data of settlement displacement, which reflect the in-situ state of the soils. With the development of the internet of things (IoT) and real-time monitoring systems, monitoring data are now routinely obtained in engineering practice, and data-driven methods have become increasingly popular for ground settlement prediction. However, their accuracy depends strongly on the volume of data, and sparse data are often encountered in engineering practice. Hence, prediction methods suited to sparse data are needed in the data-driven context.

The data-driven methods for ground settlement prediction can generally be classified into two categories: (a) data-based back analysis methods11,26,27,28; and (b) time series prediction methods10,29,30,31,32,33,34,35. Table 1 summarizes the data-driven methods used for ground settlement prediction in previous studies. In data-based back analysis methods, soil properties (e.g., cohesion and friction angle) are obtained through back analysis of the monitoring data and are subsequently input into analytical equations or numerical analyses for settlement prediction. For example, Park et al.26 used a back-analysis method based on a genetic algorithm to evaluate the settlement of a thick soft clay deposit in the southern part of the Korean peninsula. Tian and Wang27 used a back-analysis method based on a Bayesian learning framework to evaluate the ground settlement of a real ground improvement project. In their method, the optimum constitutive parameters are obtained by numerical analysis of the monitoring data and then input into a finite element model to compute the development of settlement. Soil properties obtained by back analysis are generally closer to the actual ones than those obtained by experiments and tests, but the assumptions associated with the analytical equations and constitutive models remain unavoidable. By contrast, time series prediction methods predict ground settlement by learning the time series features of the historical monitoring data, without relying on such model assumptions. Various regression methods are commonly adopted for this purpose, such as the hyperbolic method, exponential curve method, Asaoka method and grey model method (i.e., GM(1,1))29,36,37.
In recent decades, artificial intelligence (AI) techniques have often been adopted to learn the time series features of measured settlement in ground settlement prediction31,32,33,34,35,38,39,40. For example, Wen et al.34 used a convolutional neural network to learn the time series features of monitoring data obtained by IoT and to predict the settlement caused by shield tunneling. Ning et al.41 combined grey relational analysis and a long short-term memory network to perform real-time prediction of ground settlement during foundation excavation on a cloud platform. AI techniques generally enhance prediction accuracy among data-driven methods. However, training an accurate AI model requires a large volume of data, which is unrealistic in some engineering situations (e.g., the early stage of monitoring or manually monitored projects). In addition, AI prediction techniques usually operate as a ‘black box’, which makes it difficult to assess their general applicability to ground settlement prediction.

Table 1 The data-driven methods for ground settlement prediction in previous studies.

Interpolation is another approach to ground settlement prediction in the context of time series prediction. In an interpolation framework, a weight factor is identified for each sample point, and the weighted samples are combined to interpolate the predicted value. The Kriging method is a popular interpolation method in engineering practice; it was formalized by the French mathematician Georges Matheron42. Under the framework of Kriging interpolation, the weight factors are calculated from the autocorrelation between sample points, which yields the best linear unbiased prediction. In geotechnical engineering, the Kriging method is often used in geostatistics and geotechnical reliability analysis43,44,45,46,47,48, but few studies have applied Kriging to time series prediction. In other fields, the use of Kriging in time series prediction has been explored. For example, Huang et al.49 used topological Kriging to predict runoff time series in ungauged catchments. Liu et al.50 adopted the Taylor Kriging method to predict wind speed time series and found that it performed better than the moving average method. Farmer et al.51 used an extended ordinary Kriging method to predict daily streamflow in ungauged watersheds. Shtiliyanova et al.52 used a Kriging-based approach to predict air temperature time series. They found that the ordinary Kriging approach produced accurate predictions of hourly temperature data but did not perform satisfactorily for daily temperature data. In the context of ground settlement prediction, the measured settlement increases monotonically with time, so the trend structure of the sample data must be identified and filtered out under the Kriging framework. In this regard, regression Kriging (RK) is a better option for ground settlement prediction.
In RK, the increasing trend of the sample data is usually determined by polynomial regression, and the best estimates of the sample residuals (i.e., the values obtained by subtracting the trend value from the original sample data) are obtained through Kriging interpolation. With sparse sample data, a main challenge of using Kriging in ground settlement prediction is the difficulty in achieving stationarity of the sample residuals (i.e., a normal distribution with constant mean and variance). Stationarity is a precondition for Kriging interpolation, but it has seldom been considered in previous investigations of Kriging-based time series prediction. Apart from the stationarity of the sample residuals, the choice of trend structure may have a non-negligible influence on the prediction accuracy of RK, and this influence should be understood when using RK to predict ground settlement.

The aim and novelty of the current work are to propose a new Kriging-based time series prediction method for ground settlement with sparse data, and to highlight the significance of the stationarity of sample residuals and of the trend structure. To address the stationarity of sample residuals, the Box–Cox transformation53 is applied to the training samples, which brings the probability distribution of the sample residuals closer to a normal distribution (i.e., the Box–Cox transformation helps to achieve stationarity). The predicted results obtained by RK with the Box–Cox transformation are compared with those obtained without it, to reveal the significance of the stationarity of sample residuals. Subsequently, various orders of the trend function are considered to identify the appropriate trend structure of the sample data for ground settlement prediction under the RK framework. Moreover, comparative studies are conducted between the proposed method and classical methods for ground settlement prediction. Finally, the significance of stationarity and of an appropriate trend structure is discussed.

The paper is divided into six sections. The “Introduction” section emphasizes the importance of ground settlement prediction and the challenges of sparse data, and discusses data-driven methods and the potential of Kriging. The “Case description and data resources” section presents the monitoring data incorporated in the current work. The “Methodology” section explains the RK method, Box–Cox transformation, and the implementation procedures of the proposed method. The “Results” section shows the impacts of Box–Cox transformation and trend structure on prediction accuracy, and compares RK with other methods. The “Discussion” section highlights the significance of sample residual stationarity and trend structure. Finally, the “Conclusion” section summarizes the main findings of this study.

Case description and data resources

The monitoring data considered in this study were collected from a highway construction project in a coastal area of China. The highway is 965 m long and 8.5 m wide, and its subgrade mainly comprises silty soils with a thickness of 10–14 m. According to the ground investigation, the subgrade exhibits high compressibility, high moisture content, low permeability and low strength; settlement is therefore a crucial problem in this engineering case. To measure the ground settlement, four settlement plates were installed at locations along the highway where significant settlement was observed, as shown in Fig. 1. The settlement plates No. 1–4 are denoted by S1–S4, respectively. The ground settlement at each settlement plate was measured manually with a leveling instrument, and the monitoring data (i.e., the data resources used in the current work) are shown in Fig. 2. As can be observed from Fig. 2, the four monitoring points show similar settlement trends. The settlement rate is higher in the first 25 days and subsequently tends to stabilize. Sudden increases in settlement occurred during the monitoring period, possibly caused by construction disturbance of the highway. Each time series in Fig. 2 contains only 25 data samples (i.e., sparse time-series data). Of these, 20 data samples are used as training samples, and the remaining 5 samples are used to check the accuracy of the prediction methods. Note that in this study, “sparse data” refers specifically to the sparse time-series data monitored at each measuring point, rather than to the spatial distribution of the measuring points.

Fig. 1

Diagram of the highway and layout of the monitoring points (Drawn by Shan-pian Yang).

Fig. 2

Data resources of the proposed study (S1–S4 denote the settlement plates No. 1–4, respectively).

Methodology

Regression Kriging (RK)

The Kriging interpolation technique can be used for both spatial and temporal prediction. In the context of regression Kriging and time series prediction, the trend structure \(\mu(t)\) of the time series is determined first and then filtered out using the following linear equation to obtain the sample residual \(\varepsilon(t)\) at each time node:

$$z(t) = \mu (t) + \varepsilon (t)$$
(1)

where \(z(t)\) denotes the time series data at time t.

For ground settlement prediction, as soil settlement usually increases monotonically, simple polynomials can be used to express the trend function:

$$\mu (t) = B_{0} + B_{1} t + B_{2} t^{{2}} + B_{3} t^{3} + \ldots + B_{n} t^{n}$$
(2)

where \(B_{0}\) denotes the constant term of the polynomial; and \(B_{1}, B_{2}, B_{3}, \ldots, B_{n}\) denote the regression coefficients of the corresponding orders, which can be obtained by the least-squares method. Under the framework of RK, the autocorrelation of the sample residuals must be identified and incorporated in determining the interpolation weights. In the current work, the following Gaussian function is adopted to express the autocorrelation of the sample residuals:

$$\rho (\tau ) = \exp \left[ { - \left( {\frac{\tau }{\theta }} \right)^{2} } \right]$$
(3)

where \(\rho\) denotes the autocorrelation coefficient; \(\tau\) denotes the lag between any two time nodes of the time series; and \(\theta\) denotes the autocorrelation parameter, which is determined by the maximum likelihood method reported by Liu et al.43.
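As a rough illustration of Eqs. (1)–(3), the Python sketch below fits the polynomial trend by least squares, extracts the sample residuals and selects \(\theta\) by maximum likelihood. The concentrated Gaussian log-likelihood, the zero-mean assumption for the residuals, the grid search and the small nugget added for numerical stability are assumptions made for illustration; they are not necessarily the exact formulation of Liu et al.43, and all function names are hypothetical.

```python
import numpy as np


def fit_trend(t, z, order=1):
    """Least-squares coefficients of the polynomial trend mu(t) in Eq. (2), highest power first."""
    return np.polyfit(t, z, order)


def sample_residuals(t, z, coeffs):
    """Sample residuals eps(t) = z(t) - mu(t), from Eq. (1)."""
    return z - np.polyval(coeffs, t)


def gaussian_corr(t1, t2, theta):
    """Gaussian autocorrelation rho(tau) = exp[-(tau/theta)^2] of Eq. (3) for all node pairs."""
    tau = np.subtract.outer(t1, t2)
    return np.exp(-(tau / theta) ** 2)


def neg_log_likelihood(theta, t, eps, nugget=1e-6):
    """Concentrated negative log-likelihood of zero-mean Gaussian residuals (constants dropped)."""
    n = len(t)
    R = gaussian_corr(t, t, theta) + nugget * np.eye(n)  # nugget keeps R well conditioned
    _, logdet = np.linalg.slogdet(R)
    sigma2 = eps @ np.linalg.solve(R, eps) / n            # ML estimate of the process variance
    return 0.5 * (n * np.log(sigma2) + logdet)


def estimate_theta(t, eps, grid):
    """Grid-search maximum-likelihood estimate of the autocorrelation parameter theta."""
    nll = [neg_log_likelihood(th, t, eps) for th in grid]
    return grid[int(np.argmin(nll))]
```

A grid search is used only to keep the sketch short; a bounded scalar optimizer would serve equally well.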

After the autocorrelation function is determined, the covariance matrices can be constructed54, and the predicted residual of the ground settlement can be obtained from the following equations46,47:

$$\mathbf{V}_{0}\,\boldsymbol{\beta}^{(j)} = \mathbf{V}_{c}^{(j)}$$
(4)
$$\varepsilon_{p}(t) = \mathbf{z}\,\boldsymbol{\beta}^{(j)}$$
(5)

where \(\varepsilon_{p}(t)\) represents the predicted residual at time t (the jth future time node); \(\mathbf{V}_{0}\) represents the covariance matrix between the known samples of a time series (i.e., the training samples); \(\mathbf{V}_{c}^{(j)}\) denotes the jth column of the covariance matrix between the training samples and the future time nodes; \(\boldsymbol{\beta}^{(j)}\) represents the vector of weights of the known samples for the jth future time node; and \(\mathbf{z}\) denotes the row vector of the residuals of the training samples.
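A minimal sketch of Eqs. (4) and (5) follows. It assumes zero-mean residuals, unit process variance (the variance cancels out of the weights) and a tiny nugget on the diagonal of the covariance matrix V0 for numerical stability; the function names are hypothetical. Solving against all columns of Vc at once returns every weight vector \(\boldsymbol{\beta}^{(j)}\) in a single call.

```python
import numpy as np


def gaussian_corr(t1, t2, theta):
    """Gaussian autocorrelation of Eq. (3) for every pair of time nodes."""
    return np.exp(-(np.subtract.outer(t1, t2) / theta) ** 2)


def krige_residuals(t_train, eps_train, t_future, theta, nugget=1e-10):
    """Simple-Kriging residual predictor of Eqs. (4)-(5), assuming zero-mean residuals.

    Returns the predicted residuals and the weight matrix whose jth column is beta^(j).
    """
    V0 = gaussian_corr(t_train, t_train, theta) + nugget * np.eye(len(t_train))
    Vc = gaussian_corr(t_train, t_future, theta)  # jth column: training nodes vs future node j
    beta = np.linalg.solve(V0, Vc)                # solves V0 beta^(j) = Vc^(j) for every j
    return eps_train @ beta, beta                 # eps_p(t_j) = training residuals . beta^(j)
```

With the Gaussian correlation of Eq. (3), the weights decay rapidly with the lag, so for future nodes far beyond the training window (relative to \(\theta\)) the predicted residual tends to zero and the RK predictor reverts to the trend \(\mu(t)\).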

Box–Cox transformation

Under the framework of RK, the stationarity of the sample residuals is an essential prerequisite55. The sample residuals are usually assumed to follow a normal distribution with constant variance before Kriging interpolation is conducted43. However, this assumption is not always satisfied for engineering data, especially when data are sparse. In this case, transformation techniques can be used to process the sample data. For instance, the log-normal transformation is often used because of its simplicity45. In the current work, a flexible, data-driven transformation technique, the Box–Cox transformation53, is applied to the sample residuals:

$$\varepsilon_{\text{trans}}(t) = \begin{cases} \dfrac{(\varepsilon(t) + \lambda_{2})^{\lambda_{1}} - 1}{\lambda_{1}\left[gm(\boldsymbol{\varepsilon} + \boldsymbol{\lambda})\right]^{\lambda_{1} - 1}}, & \text{if } \lambda_{1} \ne 0 \\ gm(\boldsymbol{\varepsilon} + \boldsymbol{\lambda})\,\log(\varepsilon(t) + \lambda_{2}), & \text{if } \lambda_{1} = 0 \end{cases}$$
(6)

where \(\boldsymbol{\lambda}\) represents a vector whose elements all equal \(\lambda_{2}\), which is introduced to ensure \(\varepsilon(t) + \lambda_{2} > 0\); \(\boldsymbol{\varepsilon}\) is a vector containing the residuals of the training samples; \(\lambda_{1}\) is determined by minimizing the residual sum of squares of the transformed residuals; \(\varepsilon_{\text{trans}}(t)\) denotes the transformed sample residual at time t; and gm(·) represents the geometric mean of \(\boldsymbol{\varepsilon} + \boldsymbol{\lambda}\). When predicting the ground settlement by RK, the sample residual \(\varepsilon(t)\) is transformed into \(\varepsilon_{\text{trans}}(t)\) using the Box–Cox transformation. The predicted results are thus first obtained in the transformed space and subsequently back-transformed to the original space through the inverse of Eq. 6. The values of λ1 and λ2 are set according to the characteristics of the sample, which tailors the transformation to the specific data distribution and improves its effectiveness in achieving the desired stationarity.
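The transformation of Eq. (6) and the inverse needed for back-transformation can be sketched in Python as below. The rule used here for \(\lambda_2\) (shifting the residuals so that their minimum equals a hypothetical offset of 1) and the grid search for \(\lambda_1\) over [-2, 2] are illustrative assumptions; \(\lambda_1\) is chosen by minimizing the residual sum of squares of the transformed residuals, which the fixed geometric-mean scaling makes comparable across candidate values.

```python
import numpy as np


def geometric_mean(y):
    """Geometric mean gm(.) of a positive vector."""
    return float(np.exp(np.mean(np.log(y))))


def boxcox_forward(eps, lam1, lam2, gm):
    """Eq. (6): geometric-mean-scaled Box-Cox transform of eps + lam2 (> 0)."""
    y = eps + lam2
    if abs(lam1) < 1e-12:
        return gm * np.log(y)
    return (y ** lam1 - 1.0) / (lam1 * gm ** (lam1 - 1.0))


def boxcox_inverse(eps_trans, lam1, lam2, gm):
    """Inverse of Eq. (6), used to back-transform predicted residuals."""
    if abs(lam1) < 1e-12:
        y = np.exp(eps_trans / gm)
    else:
        # For NumPy inputs, NaN results if the bracket is negative and 1/lam1 is not an integer
        y = (eps_trans * lam1 * gm ** (lam1 - 1.0) + 1.0) ** (1.0 / lam1)
    return y - lam2


def fit_boxcox(eps, lam_grid=np.linspace(-2.0, 2.0, 81), offset=1.0):
    """Choose lam2 (hypothetical shift rule) and lam1 (grid search on transformed RSS)."""
    lam2 = offset - float(np.min(eps))        # makes min(eps + lam2) equal to offset > 0
    gm = geometric_mean(eps + lam2)           # gm is held fixed while lam1 varies
    rss = []
    for lam1 in lam_grid:
        e = boxcox_forward(eps, lam1, lam2, gm)
        rss.append(float(np.sum((e - e.mean()) ** 2)))
    lam1 = float(lam_grid[int(np.argmin(rss))])
    return lam1, lam2, gm
```

Because gm is computed from the training residuals, the same gm, \(\lambda_1\) and \(\lambda_2\) must be reused when back-transforming the predicted residuals.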

Implementation procedures of the RK-based time series prediction

The implementation procedures of the proposed RK-based prediction method are described through a flowchart, as shown in Fig. 3, and the descriptions of the steps are presented as follows:

Step 1:

Collect the training sample, which is a time series of monitoring data.

Step 2:

Determine the trend structure of the time series, \(\mu(t)\), based on a simple polynomial (Eq. 2).

Step 3:

Filter the trend from the time series to obtain the sample residual, \(\varepsilon(t)\) (Eq. 1).

Step 4:

Perform the Box–Cox transformation on the sample residual to obtain the transformed residual, \(\varepsilon_{\text{trans}}(t)\), using Eq. (6).

Step 5:

Obtain the predicted residual \(\varepsilon_{p,\text{trans}}(t)\) by RK. The autocorrelation of \(\varepsilon_{\text{trans}}(t)\) is identified using Eq. (3) and the maximum likelihood method, and the best estimates of the residuals in the transformed space are determined by RK (Eqs. 4 and 5).

Step 6:

Back-transform \(\varepsilon_{p,\text{trans}}(t)\) to the original space, \(\varepsilon_{p}(t)\), through the inverse of Eq. (6).

Step 7:

Obtain the predicted ground settlement through Eq. (1) (i.e., \(z_{p}(t) = \mu(t) + \varepsilon_{p}(t)\)).

Step 8:

Evaluate the accuracy of the predicted result using the evaluation metrics, including the root mean square error (RMSE) (Eq. 7), mean absolute percentage error (MAPE) (Eq. 8), mean arctangent absolute percentage error (MAAPE) (Eq. 9) and scatter index (SCI) (Eq. 10), which are given as follows56:

$${RMSE} = \sqrt{\frac{1}{N}\sum\limits_{i = 1}^{N} \left( x_{i} - y_{i} \right)^{2}}$$
(7)
$${MAPE} = \frac{1}{N}\sum\limits_{i = 1}^{N} {\left| {\frac{{x_{i} - y_{i} }}{{x_{i} }}} \right|}$$
(8)
$${MAAPE} = \frac{1}{N}\sum\limits_{i = 1}^{N} {{\text{arctan}}\left| {\frac{{x_{i} - y_{i} }}{{x_{i} }}} \right|} \times 100\%$$
(9)
$${SCI} = \frac{\sqrt{\frac{1}{N}\sum\limits_{i = 1}^{N} \left( x_{i} - y_{i} \right)^{2}}}{\overline{x}}$$
(10)

where \(x_{i}\) denotes the measured value; \(y_{i}\) denotes the predicted value; \(\overline{x}\) denotes the mean of the measured values; and N denotes the number of predicted values. Smaller values of RMSE, MAPE, MAAPE and SCI indicate higher prediction accuracy.
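The four metrics of Eqs. (7)–(10) translate directly into NumPy. This sketch follows the equations as written, including the ×100 factor in Eq. (9) and the fractional (not percentage) form of Eq. (8); x holds the measured values and y the predicted values.

```python
import numpy as np


def rmse(x, y):
    """Root mean square error, Eq. (7)."""
    return float(np.sqrt(np.mean((x - y) ** 2)))


def mape(x, y):
    """Mean absolute percentage error as a fraction, Eq. (8)."""
    return float(np.mean(np.abs((x - y) / x)))


def maape(x, y):
    """Mean arctangent absolute percentage error, Eq. (9), including the x100 factor."""
    return float(np.mean(np.arctan(np.abs((x - y) / x))) * 100.0)


def sci(x, y):
    """Scatter index, Eq. (10): RMSE normalised by the mean measured value."""
    return rmse(x, y) / float(np.mean(x))
```

For example, measured values [100, 200] and predictions [90, 210] give RMSE = 10, MAPE = 0.075 and SCI = 10/150. Because MAPE and MAAPE divide by the measured value, they are undefined wherever a measured value is zero.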

Fig. 3

Flowchart of the implementation procedures of the RK-based time series prediction.
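To tie Steps 2–7 together, the following self-contained Python sketch chains the trend fit, Box–Cox transformation, maximum-likelihood selection of \(\theta\), Kriging of the transformed residuals and back-transformation. It is an illustrative reconstruction under stated assumptions, not the authors' implementation: the grids for \(\lambda_1\) and \(\theta\), the \(\lambda_2\) shift rule, the nugget, and the centring of the transformed residuals on their sample mean before Kriging are all hypothetical choices, and `rk_boxcox_predict` is a hypothetical name.

```python
import numpy as np


def gauss_corr(a, b, theta):
    """Gaussian autocorrelation of Eq. (3) for every pair of time nodes in a and b."""
    return np.exp(-(np.subtract.outer(a, b) / theta) ** 2)


def boxcox(y, lam1, gm):
    """Eq. (6) applied to y = eps + lam2 (> 0) with geometric mean gm."""
    if abs(lam1) < 1e-12:
        return gm * np.log(y)
    return (y ** lam1 - 1.0) / (lam1 * gm ** (lam1 - 1.0))


def inv_boxcox(e, lam1, gm):
    """Inverse of Eq. (6); returns y = eps + lam2 (NaN if e leaves the valid range)."""
    if abs(lam1) < 1e-12:
        return np.exp(e / gm)
    return (e * lam1 * gm ** (lam1 - 1.0) + 1.0) ** (1.0 / lam1)


def rk_boxcox_predict(t_train, z_train, t_future, order=1,
                      lam_grid=np.linspace(-2.0, 2.0, 81),
                      theta_grid=np.linspace(0.5, 10.0, 96),
                      nugget=1e-6):
    n = len(t_train)
    # Step 2: polynomial trend mu(t) by least squares (Eq. 2)
    coeffs = np.polyfit(t_train, z_train, order)
    # Step 3: sample residuals eps(t) = z(t) - mu(t) (Eq. 1)
    eps = z_train - np.polyval(coeffs, t_train)
    # Step 4: Box-Cox; lam2 shifts the residuals so their minimum is 1 (hypothetical rule)
    lam2 = 1.0 - eps.min()
    y = eps + lam2
    gm = np.exp(np.mean(np.log(y)))
    rss = [np.sum((e - e.mean()) ** 2) for e in (boxcox(y, lam, gm) for lam in lam_grid)]
    lam1 = lam_grid[int(np.argmin(rss))]
    e_tr = boxcox(y, lam1, gm)
    mean_tr = e_tr.mean()
    r = e_tr - mean_tr  # centred transformed residuals (mean treated as known)

    # Step 5a: theta by minimising the concentrated negative log-likelihood
    def nll(theta):
        R = gauss_corr(t_train, t_train, theta) + nugget * np.eye(n)
        _, logdet = np.linalg.slogdet(R)
        return n * np.log(r @ np.linalg.solve(R, r) / n) + logdet

    theta = theta_grid[int(np.argmin([nll(th) for th in theta_grid]))]
    # Step 5b: weights from V0 beta = Vc (Eq. 4) and predicted residuals (Eq. 5)
    V0 = gauss_corr(t_train, t_train, theta) + nugget * np.eye(n)
    Vc = gauss_corr(t_train, t_future, theta)
    beta = np.linalg.solve(V0, Vc)
    e_pred_tr = mean_tr + r @ beta
    # Step 6: back-transform to the original residual space
    eps_pred = inv_boxcox(e_pred_tr, lam1, gm) - lam2
    # Step 7: z_p(t) = mu(t) + eps_p(t)
    return np.polyval(coeffs, t_future) + eps_pred
```

For a 25-sample record such as those in Fig. 2, the first 20 values would be passed as `t_train` and `z_train` and the last 5 time nodes as `t_future`. If a Kriged value falls outside the valid range of the inverse transform, the back-transformation returns NaN, so the transformed predictions are worth inspecting before Step 7.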

Results

This section examines the influences of the Box–Cox transformation and the trend structure on the accuracy of the ground settlement predicted by RK. In addition, the predictions of the proposed RK method are compared with those of other methods, including the hyperbolic method, exponential curve method, Asaoka method, grey model method (GM(1,1)) and back propagation neural network (BPNN). The equations of these methods are presented in the “Appendix”.

Influence of Box–Cox transformation on the prediction accuracy

Stationarity of the sample residual \(\varepsilon(t)\) is a prerequisite for RK. However, previous studies of Kriging-based time series prediction have seldom considered it, and the issue becomes particularly significant when data are sparse. The Box–Cox transformation is often used to help achieve stationarity. In this section, the influence of the Box–Cox transformation on the accuracy of RK-based time series prediction is investigated.

Figure 4a–d compare the cumulative settlement predicted by RK with and without the Box–Cox transformation for the settlement plates S1–S4, respectively. Each time series of measured cumulative settlement contains 25 data samples, of which 20 are used as training samples and the remaining 5 as testing samples. The first-order polynomial is adopted as the trend function. As can be observed from Fig. 4a–d, the predictions obtained by RK with the Box–Cox transformation are significantly closer to the measured values than those obtained without it.

Moreover, Table 2 presents the evaluation metrics of the predicted results with and without the Box–Cox transformation for the settlement plates S1–S4. The evaluation metrics decrease significantly when the Box–Cox transformation is incorporated. Therefore, the Box–Cox transformation can significantly increase the accuracy of the RK method in predicting ground settlement; the reasons are discussed in section “Discussion”.

Fig. 4

Comparison between the predicted results of cumulative settlement obtained by regression Kriging (RK) with and without Box–Cox transformation related to the settlement plates: (a) S1; (b) S2; (c) S3; and (d) S4.

Table 2 Evaluation metrics of the predicted results obtained by the RK method with and without Box–Cox transformation.

Influence of trend structure on the prediction accuracy

An appropriate trend structure is essential for the prediction accuracy of RK. Polynomials are commonly used as trend functions in RK46, and the appropriate polynomial order differs between engineering situations. In this section, the influence of the polynomial order on the prediction accuracy is investigated using the RK-based time series prediction method with the Box–Cox transformation. Figure 5a–d compare the cumulative settlement predicted by RK with first-order, second-order and third-order polynomial trend functions (Eq. 2) for the settlement plates S1–S4, respectively. The predictions obtained with the first-order trend function are significantly closer to the measured values than those obtained with the second-order and third-order trend functions. Table 3 presents the corresponding evaluation metrics, which are significantly smaller for the first-order trend function than for the second-order and third-order trend functions. Therefore, the first-order polynomial is the appropriate trend function in the current engineering situation; the reasons are discussed in section “Discussion”.

Fig. 5

Comparison between the predicted results of cumulative settlement obtained by regression Kriging (RK) incorporating various trend structures related to the settlement plates: (a) S1; (b) S2; (c) S3; and (d) S4.

Table 3 Evaluation metrics of the predicted results obtained by the RK method under various trend functions.

Comparison between the RK-based time series prediction method and other methods

This section presents the cumulative displacements predicted by the RK-based time series prediction method and by other methods. The commonly used classical ground settlement prediction methods, namely the hyperbolic method, exponential curve method, Asaoka method and GM(1,1) method, are included, together with a machine learning-based technique (i.e., BPNN). The BPNN model has one neuron in each of the input and output layers and two hidden layers, with 5 neurons in the first hidden layer and 10 neurons in the second. For the RK method, the Box–Cox transformation is incorporated with the first-order trend function. Figure 6a–d show the cumulative displacements predicted by the various methods for the settlement plates S1–S4, respectively. The classical methods (i.e., hyperbolic method, exponential curve method, Asaoka method and GM(1,1) method) and BPNN generally underpredict the cumulative displacement, whereas the predictions of the RK method are closer to the measured values than those of the other methods. Among the classical methods, the hyperbolic method produces the predictions closest to the measured values, and the exponential curve method, Asaoka method and GM(1,1) method give generally similar results. Notably, BPNN yields the least accurate predictions, possibly because the sparse training samples are insufficient for training BPNN accurately. When sample data are sparse, models with simple structures (i.e., the interpolation method and the classical prediction methods) may perform better.

To compare the prediction methods more intuitively, Table 4 presents the evaluation metrics of the predictions obtained by the various methods, and Table 5 presents the results of paired t-tests for the statistical significance of the differences between the RK method and the other methods. As can be observed in Table 4, the evaluation metrics of the RK method are significantly smaller than those of the other methods, indicating that the RK method achieves the highest accuracy among the various prediction methods. The t-test results in Table 5 indicate that these differences are statistically significant. In addition, Table 4 shows that the hyperbolic method yields the smallest evaluation metrics among the classical prediction methods, and BPNN has the lowest accuracy among all the prediction methods.

Fig. 6

Comparison between the predicted results obtained by the various prediction methods related to the settlement plates: (a) S1; (b) S2; (c) S3; and (d) S4.

Table 4 Evaluation metrics of the predicted results obtained by the various prediction methods.
Table 5 Results of the paired t-test regarding the comparison of performances of the various prediction methods.

Discussion

This study investigates the use of an interpolation technique for ground settlement prediction with sparse sample data. As ground settlement prediction is a time series problem, the regression Kriging interpolation method is adopted, which combines polynomial regression with best linear unbiased interpolation. Under the RK framework, the stationarity of the sample residual and the trend structure are key factors for prediction accuracy, particularly when sample data are sparse. The Box–Cox transformation can help achieve the stationarity of the sample residuals, and polynomials are the commonly used trend structure in RK.

In section “Results”, the influences of the Box–Cox transformation and the polynomial order on the predicted cumulative settlement are investigated. The predictions with the Box–Cox transformation are significantly more accurate than those without it. Previous studies of Kriging-based time series prediction49,50,51 have seldom mentioned the significance of stationarity. This is because, with an adequate sample size, stationarity of the raw sample data can generally be achieved, so its influence may not be significant. With sparse sample data, however, stationarity of the raw sample residuals is usually difficult to achieve. In this situation, the Box–Cox transformation helps achieve the stationarity of the sample residuals and yields more accurate predictions.

Regarding the order of the polynomial trend function, the predicted cumulative displacements under the first-order trend function are significantly more accurate than those under the second-order and third-order trend functions, indicating that the first-order polynomial better captures the pattern of ground settlement in this case. This is consistent with the nature of the settlement data: in the current engineering case, the ground is mainly composed of saturated silt, and ground settlement in such soil is mainly due to primary consolidation. As a result, the settlement rate does not vary significantly during the monitoring period. Mathematically, a first-order polynomial has a constant slope, which aligns well with the relatively stable settlement rate of the ground. In contrast, higher-order polynomials are designed to capture more complex and varying patterns, which are not present in the settlement data, and may therefore introduce unnecessary complexity and over-fitting. Therefore, given its better prediction accuracy and its compatibility with the nature of the settlement data, the first-order polynomial trend function is the more appropriate choice for ground settlement prediction in this case.

Conclusion

This paper proposes a regression Kriging-based interpolation method for ground settlement prediction with sparse sample data and demonstrates the significance of the stationarity of sample residuals and of the trend structure when sample data are sparse. The influences of the Box–Cox transformation and the order of the polynomial trend function are investigated. Moreover, the proposed RK method is compared with other methods for ground settlement prediction, including the classical methods (i.e., hyperbolic method, exponential curve method, Asaoka method and GM(1,1) method) and BPNN. The main findings are as follows:

(a) In the proposed RK method for ground settlement prediction, incorporating the Box–Cox transformation can significantly increase the prediction accuracy when sample data are sparse, as reflected by significantly lower evaluation metrics. For example, for settlement plate S1, the RMSE decreases from 36.17 to 2.5, and the MAPE decreases from 0.30 to 0.02.

(b) The predictions obtained by the proposed RK method with the first-order trend function are significantly more accurate than those with the second-order and third-order trend functions. For instance, for settlement plate S1, the RMSE is 2.5 with the first-order trend function, compared with 58.98 with the second-order and 107.37 with the third-order trend function. This is attributed to the ground being mainly composed of saturated silt, where settlement is due to primary consolidation and the settlement rate is therefore relatively stable; the constant slope of a first-order polynomial aligns well with this stability. In contrast, higher-order polynomials, which are designed for more complex patterns that are not present in the settlement data, can introduce unnecessary complexity and over-fitting.

(c) The comparative study shows that the proposed RK method produces more accurate predictions of cumulative displacement than both the classical ground settlement prediction methods and BPNN when sample data are sparse, and that BPNN performs worst among the methods. For example, for settlement plate S1, the RMSE of the RK method is 2.5, compared with 9.28 for the hyperbolic method, 15.34 for the exponential curve method, 15.38 for the Asaoka method, 15.59 for the GM(1,1) method, and 20.52 for BPNN.

This study reveals that Kriging interpolation can be an effective way to address sparse sample data in ground settlement prediction, and that the stationarity of sample residuals and the trend structure are significant for achieving accurate predictions. However, this study is based on a single highway engineering case, and certain limitations should be recognized. The specific conditions of the tested location, including traffic loads, soil types and groundwater conditions, may limit the direct application of the developed model to other scenarios with different characteristics. Changes in these factors may affect the model's performance, and further investigation is needed to assess its performance in such situations. Future work would benefit from collecting data from a wider range of locations and conditions, which would enable a more comprehensive validation and refinement of the model and enhance its generalizability and practical significance.