Introduction

In complex engineering areas, geological bodies often exhibit internal hollowing due to interbedding, intrusion, and erosion processes. These highly complex geological structures are typically loose in composition and weak in compressive strength, representing key risk factors affecting the stability and safety of underground engineering projects1. With the rapid development of underground space utilization and large-scale infrastructure construction, engineering activities are increasingly exposed to high-stress and highly complex geological environments, which place greater demands on the refined understanding and three-dimensional visualization of geological structures. Accurate reconstruction of the three-dimensional spatial distribution of such complex geological bodies not only facilitates the identification of potential geological hazards and the optimization of support and reinforcement designs but also provides a reliable modeling foundation for subsequent stability analyses and risk predictions2.

Traditional geological modeling methods can generally be classified into two main categories: interface-based interpolation and reconstruction approaches3,4,5 and voxel-based property simulation approaches. The interface-based methods convert discrete geological data (e.g., borehole data) into geological interfaces using appropriate interpolation techniques. Non-Uniform Rational B-Splines (NURBS)6,7 and Radial Basis Functions (RBF)8,9 are commonly applied, along with sequential segmentation, trimming, and merging operations, to construct geological models. However, these methods are highly subjective due to the dependence on the modeler’s experience, and the modeling process remains complex with a low degree of automation9,10. The voxel-based geological modeling methods (such as multiple-point statistics, MPS) focus on predicting and classifying the geological attributes of each unit within a geological body11,12. Nevertheless, the results obtained from geostatistical approaches are only consistent with the original probability data, failing to produce the distinct geological boundaries required for engineering applications13. In addition, voxel-based approaches are rarely applied in engineering geology, mainly because: (1) engineering projects typically have limited borehole data and lack geophysical constraints; (2) the dense voxel structure often leads to data redundancy; and (3) engineering practice emphasizes clear and accurate geological boundaries, whereas the complex data structure of voxel models makes them difficult to integrate with interface-based models.

With the rapid advancement of artificial intelligence and computer vision technologies, deep learning has opened new avenues for geological modeling. Deep-learning-based approaches train neural networks on large-scale datasets to automatically identify geological features, extract spatial structures, and generate geological models14,15. Compared with traditional methods, deep learning reduces reliance on expert experience and significantly enhances modeling automation and generalization. For instance, Guo14 applied geomagnetic data and a convolutional neural network (CNN) to classify and reconstruct geological structures. Among deep learning architectures, Generative Adversarial Networks (GANs) have become particularly promising due to their capacity to learn complex spatial patterns from sparse or incomplete data and to generate realistic geological samples. GAN-based methods can reconstruct complex geological bodies under limited observations while maintaining lithological continuity and statistical consistency. Sun16 developed a GAN-based 3D facies simulation method capable of modeling arbitrary stratigraphic relationships and spatial correlations. Mosser17 reformulated geological modeling as an image reconstruction problem using a Deep Convolutional GAN (DCGAN). Similarly, Yang18 employed an improved DCGAN to generate multi-attribute sedimentary facies models from preprocessed cross-sectional data, and Hu19 trained a GAN to produce geological images closely matching river-pattern datasets.

Overall, GAN-based deep learning methods exhibit strong adaptability and high accuracy potential for modeling highly complex geological bodies. However, several challenges remain: (1) limited availability of labeled geological data; (2) reliance on natural image datasets, causing semantic shifts in feature recognition; and (3) insufficient representation of geological topology and engineering interpretability. Achieving high-precision and structurally interpretable geological modeling under data-scarce conditions therefore remains a key challenge in this field.

To address the limitations of existing geological modeling methods—such as low automation, unclear structural representation, and strong dependence on sample data—this study proposes a deep-learning-based geological modeling method that integrates image completion and super-resolution generation. The proposed method formulates the interpolation of geological attributes in high-genus geological bodies as an image completion and super-resolution task. First, an improved EdgeConnect image completion algorithm is employed to transform the geological data interpolation into an automatic layer-by-layer voxel image completion process (Sect. “EdgeConnect image completion algorithm based on improved Canny detection”). Second, based on a Deep Plug-and-Play Super-Resolution Generative Adversarial Network (DPSRGAN), smoother and more accurate stratigraphic boundaries are generated (Sect. “Improved DPSRGAN image super-resolution algorithm”). Finally, a simplification method is proposed to convert voxel models into mesh models (Sect. “Voxel model to mesh model method”). Validation through a typical engineering case demonstrates that the proposed method achieves high precision and stability in the spatial reconstruction of high-genus geological bodies and in the delineation of stratigraphic boundaries.

Modeling method

In 2019, Nazeri20 proposed EdgeConnect, a two-stage generative adversarial network model for image completion. It decomposes the completion task into two steps, edge prediction and colour filling, each handled by its own stage: an edge generator followed by an image completion network. The edge generator first produces an “imaginary edge” for the missing area of the image, and the image completion network then uses this edge together with the original incomplete image to fill in the missing pixels. Compared with other image completion algorithms, EdgeConnect relies on edge information, so its interpolation results tend to follow the actual boundaries, which makes it suitable for geological modelling. However, when large areas of the image are missing, the predicted edges can be inaccurate and the completed image blurred20,21,22.

The DPSRGAN is a super-resolution model for arbitrary blur kernels, as proposed by Zhang23. It comprises a degradation model considering arbitrary blur kernels and a network based on the improved SRResNet. It can obtain more precise and realistic images than other methods.

Therefore, the above algorithms can be improved for geological modelling tasks. First, the drilling data were converted into a voxel drilling model with a given resolution, and the model was then divided into batches of voxel plane images to be interpolated according to elevation. The voxel image uses the EdgeConnect algorithm to obtain preliminary interpolation results, and then the DPSRGAN algorithm is used to obtain a more precise and smoother voxel image. A high genus geological model can be generated by reading the image information and simplifying the voxel model. The framework of the modelling process is illustrated in Fig. 1.
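The first step of this workflow, turning borehole logs into elevation-wise voxel images, can be sketched as follows. This is a minimal illustration under stated assumptions: the borehole record layout, the function name `boreholes_to_slices`, and the use of 0 for unknown pixels are hypothetical and not taken from the study.

```python
from typing import List, Tuple

# Hypothetical borehole record: (x, y) position in metres plus a list of
# (top_elevation, bottom_elevation, stratum_id) intervals. The layout is an
# illustrative assumption, not the data format used in the study.
Borehole = Tuple[float, float, List[Tuple[float, float, int]]]

UNKNOWN = 0  # pixel value for cells without borehole information (to be completed)


def boreholes_to_slices(boreholes: List[Borehole], width: int, height: int,
                        z_top: float, n_layers: int,
                        cell: float = 1.0) -> List[List[List[int]]]:
    """Return n_layers images (height x width), one per elevation step."""
    slices = [[[UNKNOWN] * width for _ in range(height)] for _ in range(n_layers)]
    for x, y, intervals in boreholes:
        col, row = int(x // cell), int(y // cell)
        if not (0 <= col < width and 0 <= row < height):
            continue  # borehole lies outside the modelled area
        for layer in range(n_layers):
            z = z_top - (layer + 0.5) * cell  # elevation of the layer centre
            for top, bottom, stratum in intervals:
                if bottom <= z < top:
                    slices[layer][row][col] = stratum
                    break
    return slices


# One borehole at (2 m, 1 m): stratum 1 from 100 m to 98 m, stratum 2 from 98 m to 95 m.
holes = [(2.0, 1.0, [(100.0, 98.0, 1), (98.0, 95.0, 2)])]
slices = boreholes_to_slices(holes, width=4, height=3, z_top=100.0, n_layers=5)
```

Each resulting image contains known stratum labels only at borehole pixels; all remaining pixels are the region that the completion network has to fill.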

Fig. 1

Automatic modelling method process of high genus geological bodies based on improved EdgeConnect-DPSRGAN.

EdgeConnect image completion algorithm based on improved Canny detection

EdgeConnect adopts a two-stage generation model. In the first stage, the edge generator takes three inputs: the borehole voxel grayscale image \(I_{\text{gray}}\), the edge contour image to be completed \(E\) (written \(E_{\text{gt}}\) in Eq. (1)), and the mask image \(\text{MASK}\). It then outputs the complete edge-contour prediction image \(E_{\text{pred}}\). In Eq. (1), \(G_1\) denotes the first-stage GAN generator.

$${E_{{\text{pred}}}}={G_1}({I_{{\text{gray}}}},{E_{{\text{gt}}}},{\text{MASK}})$$
(1)

The original edge contours of the image were generated using a Canny edge detection operator. The conventional Canny edge detection process is as follows24.

  1) Use a Gaussian filter to smooth the image and filter out noise.

  2) Calculate the gradient intensity and direction of each pixel in the image.

  3) Apply non-maximum suppression to eliminate spurious responses caused by edge detection.

  4) Apply double-threshold detection to determine actual and potential edges.

  5) Complete edge detection by suppressing isolated weak edges.

The Gaussian filter in the classical Canny operator loses edge information25,26, and the manually chosen thresholds in the fourth step also affect edge extraction27. Therefore, Gaussian filtering was replaced by bilateral filtering to improve the detection accuracy of geological boundary lines28,29, and the Otsu method was used to obtain adaptive high and low thresholds30,31.

Geological boundaries generally exhibit smooth transitions rather than abrupt noisy variations. The traditional Gaussian filter, while effective for noise suppression, often leads to boundary blurring during the denoising process. In contrast, bilateral filtering preserves spatial smoothness while effectively maintaining the edge features of lithological interfaces. Moreover, geological images often present significant contrast differences and complex texture variations, making it difficult to accurately identify boundaries under fixed threshold conditions. Therefore, the Otsu adaptive thresholding method was employed in this study to dynamically determine the optimal segmentation thresholds, thereby enhancing the robustness and adaptability of geological boundary extraction.

Bilateral filtering is a nonlinear filtering method. In principle it combines two Gaussian kernels, one weighting the spatial proximity of pixels and the other the similarity of their values (colour similarity), so that it smooths the image while preserving edges. It replaces the single filter weight \(\omega\) of Gaussian filtering with the product of two weights and then convolves the resulting filter with the original image to obtain the filtered image \(g\):

$$ws(i,j,k,l)={e^{( - \frac{{{{(i - k)}^2}+{{(j - l)}^2}}}{{2\sigma _{s}^{2}}})}}/\sum\nolimits_{{(k,l) \in S(i,j)}} {{e^{( - \frac{{{{(i - k)}^2}+{{(j - l)}^2}}}{{2\sigma _{s}^{2}}})}}}$$
(2)
$$wr(i,j,k,l)={e^{( - \frac{{\parallel f(i,j) - f(k,l){\parallel ^2}}}{{2\sigma _{r}^{2}}})}}/{\sum\nolimits_{{(k,l) \in S(i,j)}} e ^{( - \frac{{\parallel f(i,j) - f(k,l){\parallel ^2}}}{{2\sigma _{r}^{2}}})}}$$
(3)
$$w=ws \times wr$$
(4)
$$g(i,j)=\sum\nolimits_{{(k,l) \in S(i,j)}} {f(k,l) \times w}$$
(5)

\((i,j)\) is the coordinate of the calculation point; \(S(i,j)\) is the \((2N+1) \times (2N+1)\) neighbourhood centred on \((i,j)\); \((k,l)\) is the coordinate of a pixel within that neighbourhood; \(f(i,j)\) is the pixel value of the calculation point; \(\sigma_s\) and \(\sigma_r\) are the standard deviations of the two Gaussian functions; \(ws\) is the spatial-proximity Gaussian weight; and \(wr\) is the pixel-value-similarity Gaussian weight.
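The filter can be illustrated with a small pure-Python sketch. It is only an illustration under assumptions: the function name `bilateral_filter` and the test image are hypothetical, border pixels simply use the neighbours that fall inside the image, and the combined weight \(ws \times wr\) is normalised once over the window, which is the usual bilateral-filter form rather than a confirmed detail of the implementation used in the study.

```python
import math
from typing import List

Image = List[List[float]]


def bilateral_filter(f: Image, n: int, sigma_s: float, sigma_r: float) -> Image:
    """Bilateral filter with a (2n+1) x (2n+1) window S(i, j).

    Border pixels use only the neighbours that fall inside the image.
    """
    rows, cols = len(f), len(f[0])
    g = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            num = den = 0.0
            for k in range(max(0, i - n), min(rows, i + n + 1)):
                for l in range(max(0, j - n), min(cols, j + n + 1)):
                    # Spatial-proximity weight (numerator of the ws expression).
                    ws = math.exp(-((i - k) ** 2 + (j - l) ** 2) / (2 * sigma_s ** 2))
                    # Value-similarity weight (numerator of the wr expression).
                    wr = math.exp(-((f[i][j] - f[k][l]) ** 2) / (2 * sigma_r ** 2))
                    w = ws * wr
                    num += f[k][l] * w
                    den += w
            g[i][j] = num / den  # normalise the combined weight once
    return g


# A sharp step between two "strata" (values 0 and 100) with one noisy pixel.
step = [[0.0, 0.0, 100.0, 100.0],
        [0.0, 8.0, 100.0, 100.0],
        [0.0, 0.0, 100.0, 100.0]]
smoothed = bilateral_filter(step, n=1, sigma_s=1.0, sigma_r=10.0)
```

In this toy image the noisy pixel is pulled towards its similar neighbours, while pixels next to the step keep their value because the similarity weight across the step is close to zero. This is the edge-preserving behaviour that motivates replacing the Gaussian filter.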

Using a fixed threshold may result in a loss of edge information, so the Otsu method can be used to calculate the thresholds adaptively. Otsu judges the separability of classes statistically, by either the maximum inter-class variance or the minimum intra-class variance30. The basic idea is therefore to divide the data into several classes and to take as optimal the thresholds that maximise the inter-class variance.

According to the Otsu method, the image values after non-maximum suppression can be divided into three classes: \(L_1\), \(L_2\), and \(L_3\). \(L_1\) contains pixels with gradient amplitudes \(\{t_1,t_2,...,t_k\}\), which are the non-edge points in the image. \(L_2\) contains pixels with gradient amplitudes \(\{t_{k+1},t_{k+2},...,t_m\}\), which are the candidate points that still have to be judged as edges or non-edges. \(L_3\) contains pixels with gradient amplitudes \(\{t_{m+1},t_{m+2},...,t_l\}\), which are the edge points in the image. Assume that the total number of pixels in the image is \(N\) and that the number of pixels with gradient amplitude \(t_j\) is \(n_j\). The probability of \(t_j\) is then:

$${p_{\text{j}}}={n_{\text{j}}}/N,(j=1,2,...l)$$
(6)

Then, the gradient amplitude expectation of the entire interval is:

$$E=\sum\nolimits_{{j=1}}^{l} {{t_{\text{j}}} \times {p_{\text{j}}}}$$
(7)

The expected gradient amplitude values of \(\:{\text{L}}_{1}\),\(\:{\text{L}}_{2}\),\(\:{\text{L}}_{3}\) are respectively:

$${e_1}(k)=\frac{{\sum\nolimits_{{j=1}}^{k} {{t_{\text{j}}} \cdot {p_{\text{j}}}} }}{{\sum\nolimits_{{j=1}}^{k} {{p_{\text{j}}}} }};{\text{ }}{e_2}(k,m)=\frac{{\sum\nolimits_{{j=k+1}}^{m} {{t_{\text{j}}} \cdot {p_{\text{j}}}} }}{{\sum\nolimits_{{j=k+1}}^{m} {{p_{\text{j}}}} }};{\text{ }}{e_3}(m)=\frac{{\sum\nolimits_{{j=m+1}}^{l} {{t_{\text{j}}} \cdot {p_{\text{j}}}} }}{{\sum\nolimits_{{j=m+1}}^{l} {{p_{\text{j}}}} }}$$
(8)
$$p(k)=\sum\nolimits_{{j=1}}^{k} {{p_{\text{j}}}} ;{\text{ }}p(k,m)=\sum\nolimits_{{j=k+1}}^{m} {{p_{\text{j}}}} ;{\text{ }}p(m)=\sum\nolimits_{{j=m+1}}^{l} {{p_{\text{j}}}}$$
(9)

Inter-class variance is:

$${\sigma ^2}(k,m)={\left[ {{e_1}(k) - E} \right]^2} \cdot p(k)+{\left[ {{e_2}(k,m) - E} \right]^2} \cdot p(k,m)+{\left[ {{e_3}(m) - E} \right]^2} \cdot p(m)$$
(10)

For the image after non-maximum suppression, \(t_j\) and \(p_j\) in the above formulas can be obtained from the gradient histogram. The number of gradient levels \(l\) can be set manually; 10 histogram bins are used here (see Algorithm 1). \(\sigma^2(k,m)\) is a function of the two variables \(k\) and \(m\), where \(k\) takes a value in \([1,l]\) and \(m\) takes a value in \([k+1,l]\). Searching for the maximum of \(\sigma^2(k,m)\) therefore yields the dividing points \(k\) and \(m\) with the best separation between classes, and the corresponding gradient values are the desired low and high thresholds, respectively. The enhanced computational process is illustrated in Algorithm 1.
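The threshold search of Eqs. (6)–(10) can be sketched as an exhaustive loop over \((k,m)\). The function name `otsu_two_thresholds` and the histogram values are hypothetical, and indices are 0-based in the code whereas the equations count from 1.

```python
from typing import List, Tuple


def otsu_two_thresholds(t: List[float], counts: List[int]) -> Tuple[float, float]:
    """Search (k, m) maximising the inter-class variance of Eq. (10).

    t[j] are gradient levels in ascending order and counts[j] their pixel counts n_j.
    Returns (low threshold t_k, high threshold t_m).
    """
    total = sum(counts)
    p = [c / total for c in counts]                   # Eq. (6)
    mean_all = sum(tj * pj for tj, pj in zip(t, p))   # Eq. (7)
    n_levels = len(t)
    best, best_km = -1.0, (0, 1)
    # 0-based classes: L1 = [0, k], L2 = [k+1, m], L3 = [m+1, n_levels-1].
    for k in range(n_levels - 2):
        for m in range(k + 1, n_levels - 1):
            var = 0.0
            for lo, hi in ((0, k), (k + 1, m), (m + 1, n_levels - 1)):
                prob = sum(p[lo:hi + 1])              # Eq. (9)
                if prob == 0.0:
                    continue                          # empty class contributes nothing
                mean = sum(t[j] * p[j] for j in range(lo, hi + 1)) / prob  # Eq. (8)
                var += (mean - mean_all) ** 2 * prob  # Eq. (10)
            if var > best:                            # first maximum wins on ties
                best, best_km = var, (k, m)
    k, m = best_km
    return t[k], t[m]


# Ten gradient levels with three well-separated clusters (hypothetical histogram).
grad_levels = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
counts = [50, 40, 0, 0, 20, 25, 0, 0, 8, 10]
t_low, t_high = otsu_two_thresholds(grad_levels, counts)
```

With only 10 levels the double loop evaluates a few dozen candidate pairs, so the exhaustive search is cheap. For the toy histogram it separates the three clusters at 20 and 60.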

Algorithm 1

Improved Canny detection (Pseudocode, in-text).

  • Input: grayscale voxel image I; bilateral filter (σs, σr, kernel = 5 × 5); Otsu bins N = 10.

  • Output: binary edge map E.

  • (1). Apply bilateral filtering on I with (σs, σr) → If.

  • (2). Compute Sobel gradients Gx, Gy; magnitude M and direction Θ.

  • (3). Non-maximum suppression on M along Θ.

  • (4). Otsu(N = 10) on the gradient histogram → adaptive thresholds (TL, TH).

  • (5). Hysteresis thresholding: strong if M ≥ TH; weak if TL ≤ M < TH; link weak to strong → E.

  • (6). Return E.
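Step (5) of Algorithm 1 can be sketched as a breadth-first growth from strong pixels through weak ones. The function name `hysteresis`, the 8-connectivity, and the sample magnitudes are assumptions made for illustration only.

```python
from collections import deque
from typing import List


def hysteresis(mag: List[List[float]], t_low: float, t_high: float) -> List[List[int]]:
    """Keep weak pixels only if they are 8-connected to a strong pixel."""
    rows, cols = len(mag), len(mag[0])
    edges = [[0] * cols for _ in range(rows)]
    queue = deque()
    for i in range(rows):
        for j in range(cols):
            if mag[i][j] >= t_high:           # strong edge pixel
                edges[i][j] = 1
                queue.append((i, j))
    while queue:                               # grow edges through weak pixels
        i, j = queue.popleft()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                k, l = i + di, j + dj
                if (0 <= k < rows and 0 <= l < cols and not edges[k][l]
                        and t_low <= mag[k][l] < t_high):
                    edges[k][l] = 1
                    queue.append((k, l))
    return edges


# Suppressed gradient magnitudes (hypothetical): one strong pixel, a weak chain
# attached to it, and an isolated weak pixel that should be dropped.
mag = [[0, 0, 0, 0, 0],
       [90, 40, 35, 0, 30],
       [0, 0, 0, 0, 0]]
edge_map = hysteresis(mag, t_low=20, t_high=60)
```

The weak pixels with magnitudes 40 and 35 are kept because they connect to the strong pixel, whereas the isolated weak pixel with magnitude 30 is discarded.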

Compared with natural images, geological profile images exhibit pronounced stratigraphic directionality and structural continuity. By incorporating edge constraints during reconstruction, the EdgeConnect model effectively preserves the coherence of geological boundaries and the topological relationships of structures, thereby avoiding common issues such as “fault blurring” and “interlayer misconnection” that often occur in texture-based networks. This edge-constrained mechanism enables the network to better conform to the spatial structural characteristics of geological bodies, thereby enhancing the geological interpretability of the reconstructed models.

Improved DPSRGAN image super-resolution algorithm

Because drilling data are relatively sparse, the images completed by the EdgeConnect algorithm lose some quality and contain noise21,22, and the interpolation results appear blurry at the geological boundaries. Therefore, image super-resolution technology can be used to obtain more accurate and smoother geological boundaries. In image super-resolution, the degradation model that relates high-resolution and low-resolution images is crucial23,32. Currently, two degradation models are widely used. The first is the general degradation model given by Eq. (11):

$$y=\left( {x \otimes k} \right){ \downarrow _s}+n$$
(11)

where \(x\) is the high-resolution image, \(k\) is the blur kernel, \(\otimes\) denotes convolution, \(\downarrow_s\) represents the downsampling operation with scale factor \(s\), and \(n\) is the image noise. Most methods built on this degradation model assume that the blur kernel is known a priori, yet the kernel is difficult to estimate in practical applications.

Another commonly used degradation model is bicubic degradation, the form of which is given by Eq. (12), where \(\:{\downarrow\:}_{\text{s}}\) represents a bicubic downsampler with a scale factor of \(\:\text{s}\). This degradation model is the most widely used and is primarily used in image super-resolution methods based on deep learning algorithms. Although simple, this model yields poor results in many practical situations.

$$y=x{ \downarrow _s}$$
(12)

Zhang et al. proposed a deep plug-and-play super-resolution framework that adopts a new degradation model considering arbitrary blur kernels. Existing deblurring methods can also be used to estimate the blur kernels. The model is expressed by Eq. (13). An algorithm trained with the new degradation model can better handle the super-resolution of complex blurry images and generalises better. The DPSRGAN is formed by improving the SRResNet network and inserting it into this framework.

$$y=(x{ \downarrow _s}) \otimes k+n$$
(13)
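The difference between Eq. (11) and Eq. (13) is only the order of blurring and downsampling, which the following sketch makes explicit. It is a simplified illustration: every function name is hypothetical, plain decimation stands in for the bicubic downsampler, and a 3 × 3 box kernel stands in for an arbitrary blur kernel.

```python
import random
from typing import List

Image = List[List[float]]


def downsample(x: Image, s: int) -> Image:
    """Keep every s-th pixel. A stand-in for the bicubic downsampler (assumption)."""
    return [row[::s] for row in x[::s]]


def blur(x: Image, k: Image) -> Image:
    """Apply an odd-sized kernel k, replicating edge pixels at the borders.

    For the symmetric kernel used below, correlation and convolution coincide.
    """
    rows, cols, r = len(x), len(x[0]), len(k) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for a in range(-r, r + 1):
                for b in range(-r, r + 1):
                    ii = min(max(i + a, 0), rows - 1)
                    jj = min(max(j + b, 0), cols - 1)
                    acc += x[ii][jj] * k[a + r][b + r]
            out[i][j] = acc
    return out


def add_noise(x: Image, sigma: float, seed: int = 0) -> Image:
    """Add zero-mean Gaussian noise n with standard deviation sigma."""
    rng = random.Random(seed)
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in x]


def degrade_general(x: Image, k: Image, s: int, sigma: float = 0.0) -> Image:
    """Eq. (11): blur the HR image, then downsample, then add noise."""
    return add_noise(downsample(blur(x, k), s), sigma)


def degrade_dpsr(x: Image, k: Image, s: int, sigma: float = 0.0) -> Image:
    """Eq. (13): downsample the HR image, then blur the LR image, then add noise."""
    return add_noise(blur(downsample(x, s), k), sigma)


box = [[1 / 9] * 3 for _ in range(3)]       # 3 x 3 box blur kernel
hr = [[float((i // 2 + j // 2) % 2) * 90 for j in range(8)] for i in range(8)]
lr_general = degrade_general(hr, box, s=2)
lr_dpsr = degrade_dpsr(hr, box, s=2)
```

Both calls return a 4 × 4 low-resolution image from the same 8 × 8 input, but the pixel values differ because in Eq. (13) the kernel acts on the low-resolution grid.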

DPSRGAN increases the number of feature maps to 96 and removes the batch normalisation layers from the network, which suppresses artefacts and improves training efficiency and stability. However, it easily falls into local optima33, and its residual learning is limited to adjacent layers. Therefore, residual-in-residual dense blocks (RRDBs) were introduced to improve the network: they create local dense connections and yield a deeper and more complex network structure, which improves network performance and alleviates the vanishing-gradient problem.

The RRDB adopts a two-layer residual structure, combining residual and dense blocks to connect all layers. Rich local features are extracted through the densely connected convolution layers, and the previous and current local features are then used to learn more effective features34. The structure of the improved DPSRGAN is shown in Fig. 2. Although the discriminator architecture is not displayed in the figure, it employs a U-Net-based skip-connection design with spectral normalization to provide more refined pixel-level feedback and to enhance the stability of adversarial training.

Fig. 2

DPSRGAN’s generator structure.

Because the new degradation model is complex, the discriminator must have a stronger discrimination ability for the training output; therefore, the original discriminator network was replaced by a U-Net structure with skip connections. The U-Net outputs a realness value for each pixel and thus provides detailed pixel-by-pixel feedback to the generator35. In addition, spectral normalisation was added to the discriminator to stabilise training and reduce artefacts. Spectral normalisation constrains the Lipschitz constant of each convolutional layer by dividing its weight matrix by its largest singular value. This limits the gradient magnitude during backpropagation, preventing discriminator overfitting and mode collapse. Consequently, the generator receives smoother gradient feedback, which helps suppress high-frequency noise and reduces artefacts in the reconstructed geological images.
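The normalisation step can be illustrated with power iteration on a small weight matrix. This is a conceptual sketch rather than the training code of the study: the function names are hypothetical, a convolution kernel is assumed to have been reshaped into a two-dimensional matrix beforehand, and a non-zero matrix is assumed.

```python
import math
from typing import List

Matrix = List[List[float]]


def spectral_norm(w: Matrix, iterations: int = 50) -> float:
    """Estimate the largest singular value of w by power iteration on w^T w."""
    rows, cols = len(w), len(w[0])
    v = [1.0 / math.sqrt(cols)] * cols                 # unit-length start vector
    sigma = 0.0
    for _ in range(iterations):
        u = [sum(w[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # u = W v
        sigma = math.sqrt(sum(x * x for x in u))       # ||W v|| with ||v|| = 1
        v = [sum(w[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # v = W^T u
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return sigma


def spectrally_normalise(w: Matrix) -> Matrix:
    """Divide the weights by their largest singular value."""
    sigma = spectral_norm(w)
    return [[x / sigma for x in row] for row in w]


weights = [[3.0, 0.0],
           [0.0, 1.0]]                                 # singular values 3 and 1
sigma_max = spectral_norm(weights)
normalised = spectrally_normalise(weights)
```

After normalisation the largest singular value of the layer is 1, so the layer cannot amplify the norm of its input, which is the Lipschitz constraint described above.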

Geological images are often characterized by low contrast, high noise levels, and multi-scale structural variations. Traditional interpolation methods frequently lead to boundary diffusion and the loss of fine structural details. In contrast, the proposed DPSRGAN model combines perceptual loss with residual-in-residual dense blocks (RRDB) to enhance resolution while effectively preserving lithological boundaries and stratigraphic details. This multi-scale feature reconstruction mechanism enables the network to better adapt to the complex textures and layered characteristics of geological imagery.

Voxel model to mesh model method

In engineering geology, the purpose of a three-dimensional geological model is to accurately describe the contact relationships between the interfaces of various strata. Therefore, a method was proposed to simplify the redundant voxel model into a bounding mesh model. The specific steps were as follows, and the process of simplification is shown in Fig. 3:

  1) Establish the voxel point data set \(P\) of a given stratum, containing the vertex sets \(P_i\ (i=0,1,2,\dots,n-1)\) of all voxels of that stratum, where each unit voxel has eight vertices \(P_{ij}\ (j=0,1,2,\dots,7)\) with their three-dimensional coordinates. Also establish the voxel mesh-surface data set \(F\), containing the six mesh surfaces \(f_{ik}\ (k=0,1,2,\dots,5)\) of each voxel and the topology \(T_{ik}\) of each mesh surface, i.e. the vertices that make up the surface (for example, \(T_{i0}: P_{i0}, P_{i1}, P_{i2}, P_{i3}\)).

  2) Traverse the set \(P\) and, for each \(P_{ij}\), record the number \(n\) of points in the set that have the same coordinates as \(P_{ij}\), together with their serial numbers. If \(n<8\), add the serial numbers \(i\) and the corresponding \(j\) of these points to the dictionary \(E\) (e.g. “1”: 1, 3, 5 means that points 1, 3, and 5 of the first voxel are edge points). After this check, remove these \(n\) points from \(P\).

  3) Traverse \(T_{ik}\) sequentially according to the \(i\) values in \(E\), and check whether any of the four points that make up the surface has a serial number \(j\) that is not in the list corresponding to \(i\) in dictionary \(E\). If there is no such point, add the index \(ik\) to the set \(G\).

  4) Finally, extract the strata boundary mesh from \(F\) according to the indices in \(G\).
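The effect of the simplification can be illustrated with a short sketch. Note that it swaps in a simpler criterion than the vertex-sharing count of the steps above: a face is kept when the neighbouring voxel across that face does not belong to the same stratum. The function name `boundary_faces` and the integer voxel grid are assumptions made for illustration.

```python
from typing import List, Set, Tuple

Voxel = Tuple[int, int, int]

# The six face directions of a unit voxel (-x, +x, -y, +y, -z, +z).
FACE_DIRECTIONS: List[Voxel] = [(-1, 0, 0), (1, 0, 0), (0, -1, 0),
                                (0, 1, 0), (0, 0, -1), (0, 0, 1)]


def boundary_faces(voxels: Set[Voxel]) -> List[Tuple[Voxel, Voxel]]:
    """Return (voxel, direction) pairs for faces not shared within the stratum."""
    faces = []
    for x, y, z in sorted(voxels):
        for dx, dy, dz in FACE_DIRECTIONS:
            if (x + dx, y + dy, z + dz) not in voxels:
                faces.append(((x, y, z), (dx, dy, dz)))
    return faces


# A 3 x 3 x 3 block of one stratum with the centre voxel removed (an internal void).
block = {(x, y, z) for x in range(3) for y in range(3) for z in range(3)}
hollow = block - {(1, 1, 1)}

total_faces = 6 * len(hollow)   # faces stored by the raw voxel model
kept = boundary_faces(hollow)   # faces that bound the stratum
```

For this hollow block the raw voxel model stores 156 faces, of which 60 remain: 54 on the outer surface and 6 around the internal void. Internal voids therefore survive the simplification as closed inner surfaces.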

Fig. 3

Voxel model simplified to mesh model.

To enhance the clarity and reproducibility of the proposed automatic modeling framework, the main parameter configurations of each module are summarized in Table 1. The table includes the preprocessing parameters of the improved Canny edge detection, as well as the training hyperparameters of the EdgeConnect image completion network and the DPSRGAN super-resolution network, together forming a hierarchical generative modeling workflow. The improved Canny operator, as a non-trainable preprocessing step, integrates bilateral filtering with Otsu adaptive thresholding to automatically determine optimal edge thresholds and generate high-precision edge maps. The EdgeConnect module performs voxel-level image completion based on these edge maps, while the DPSRGAN module further enhances image resolution and boundary smoothness through residual-in-residual dense blocks (RRDB) and a U-Net discriminator with spectral normalization. This parameterized configuration ensures reproducible and stable modeling performance in high-dimensional geological body reconstruction tasks. The detailed training procedures, data preparation, and evaluation settings are further described in the section “Engineering applications”.

Table 1 Main parameter settings of the improved Canny–EdgeConnect–DPSRGAN framework.

Engineering applications

Experimental data

The dam foundation model of a hydropower station in southwest China was used as an example to verify the effectiveness of the proposed method. Due to various geological sedimentation, rock intrusion, and erosion phenomena, high genus characteristics are evident. The borehole models for this area are shown in Fig. 4.

Fig. 4

Borehole models.

All experiments in this study were conducted on a laboratory workstation. The implementation was developed based on the PyTorch 1.13.1 deep learning framework. Model training and testing were performed on a workstation equipped with an NVIDIA RTX 3060 GPU (12 GB VRAM), an Intel Core i7 processor (3.4 GHz), and 32 GB of system memory, running on a 64-bit Windows 10 operating system. Both the training and inference processes were executed on a single GPU. This configuration ensured the stability and reproducibility of all results presented in this study.

Model training

Geological data with high-genus characteristics are scarce, which makes it difficult to train image-processing algorithms on them directly. Therefore, a transfer-learning strategy was used for model training. The training process of the proposed network model consists of three main stages. In the first stage, the edge generation network of EdgeConnect is trained using incomplete images and edge maps generated by the improved Canny operator. In the second stage, the predicted edge maps are combined with the incomplete images to train the image completion network of EdgeConnect. In the third stage, the completed images produced by EdgeConnect are used to train the improved ESRGAN (DPSRGAN) network to enhance image resolution and the smoothness of geological boundaries. As illustrated in Fig. 5, intermediate results from each training stage are visualized to demonstrate the effectiveness of the overall training workflow.

Fig. 5

Overall training process of the proposed framework.

For the improved DPSRGAN, only high-resolution (HR) images need to be supplied; the corresponding low-resolution (LR) images are obtained through the degradation model to train the network. With the training strategy fixed, the required high-genus voxel image training data comprise complete voxel images, mask images, images to be interpolated, complete edges, and masked edges. The complete voxel image dataset comes from the geological survey data of multiple hydropower projects. The mask image is obtained by drawing, on the original image, a mask that conforms to the distribution pattern of boreholes. The image to be interpolated is calculated from the mask and the complete voxel image. The complete edge is obtained by applying the improved Canny edge detection to the complete voxel image, and the masked edge is calculated from the complete edge and the mask image in the same way as the image to be interpolated.
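The derivation of the masked training inputs can be sketched as an element-wise operation. The function name `apply_mask`, the convention that a mask value of 1 marks unknown pixels, and the small label image are assumptions made for illustration.

```python
from typing import List

Image = List[List[int]]


def apply_mask(image: Image, mask: Image, fill: int = 0) -> Image:
    """Blank out pixels where mask == 1 (assumed convention: 1 = unknown region)."""
    return [[fill if m else v for v, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


# Hypothetical 3 x 4 stratum-label image and its edge map.
complete = [[1, 1, 2, 2],
            [1, 1, 2, 2],
            [3, 3, 3, 2]]
complete_edge = [[0, 1, 0, 0],
                 [0, 1, 0, 0],
                 [1, 1, 1, 0]]
# Only two "borehole" pixels are known; everything else is masked.
mask = [[1, 1, 1, 1],
        [0, 1, 1, 0],
        [1, 1, 1, 1]]

incomplete = apply_mask(complete, mask)         # image to be interpolated
masked_edge = apply_mask(complete_edge, mask)   # masked edge map
```

Because the same mask is applied to the image and to its edge map, each training sample keeps exactly the information that would be available at borehole positions.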

The main geological data used for fine-tuning consist of 200 voxel image samples. During transfer learning, two natural image datasets were used for pretraining: the EdgeConnect model was pretrained on 800 images from the Places2 dataset, while the improved DPSRGAN model was pretrained on 800 images from the DIV2K dataset. As shown in Fig. 6, for the EdgeConnect model, we followed a training data preprocessing workflow (voxel image → mask → incomplete image → edge map) and randomly paired the Places2 images with large-size masks from the testing mask set to generate pretraining samples. The same workflow was adopted for fine-tuning using geological voxel images. After pretraining, both models were fine-tuned on the 200 geological samples. Therefore, the ratio of geological to natural images during the pretraining–fine-tuning process is approximately 1:4.

Transfer learning is effective in this context because pretraining on natural images allows the model to learn low-level, domain-independent features such as edges, contours, and texture gradients, which also form the fundamental representations of geological images. The subsequent fine-tuning stage adapts these representations to lithological boundaries and stratigraphic patterns, leading to faster convergence and more stable reconstruction performance.

In EdgeConnect training, the Adam optimiser is used with \(\beta_1=0\) and \(\beta_2=0.9\). The initial learning rate \(\alpha\) of the first two steps is set to \(10^{-4}\) and is reduced to \(10^{-5}\) after \(10^{5}\) iterations until convergence. In the third step, \(\alpha\) is set to \(10^{-6}\) and the model is trained end-to-end until convergence. In the improved DPSRGAN training, the Adam optimiser is also used, with \(\beta_1=0.9\) and \(\beta_2=0.999\). The initial learning rate is \(10^{-4}\) and is halved every \(10^{5}\) iterations until it falls below \(10^{-7}\). The learning rate for transfer learning is set to \(10^{-6}\).
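The two learning-rate schedules can be written as small functions of the iteration count. The function names are hypothetical, and the sketch only reproduces the schedule arithmetic, not the optimiser itself.

```python
def edgeconnect_lr(iteration: int, stage: int) -> float:
    """Stages 1 and 2: 1e-4, lowered to 1e-5 after 1e5 iterations; stage 3: 1e-6."""
    if stage == 3:
        return 1e-6
    return 1e-4 if iteration < 100_000 else 1e-5


def dpsrgan_lr(iteration: int, base_lr: float = 1e-4, step: int = 100_000) -> float:
    """Halve the learning rate every `step` iterations."""
    return base_lr * 0.5 ** (iteration // step)


# Number of halvings needed before the DPSRGAN rate falls below 1e-7.
halvings = 0
while 1e-4 * 0.5 ** halvings >= 1e-7:
    halvings += 1
```

Under this reading the DPSRGAN rate drops below \(10^{-7}\) after ten halvings, that is, after about \(10^{6}\) iterations.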

The interpolation model was obtained through the training described above. Subsequently, a 240 × 64 \(\text{m}^2\) dam foundation area was selected for testing. There were 51 boreholes in the study area, of which eight were randomly selected as the test set. A total of 35 voxel images were obtained at intervals of 1 m for the computation. The accuracy (ACC) was used as the evaluation index for the error analysis of the modelling results, calculated as in Eq. (14):

$${\text{ACC}}=100\% \times \left[ {1 - \frac{1}{w}\sum\nolimits_{{i=1}}^{m} {\sum\nolimits_{{j=1}}^{n} {\sum\nolimits_{{k=1}}^{p} {({R_{ijk}} - {T_{ijk}})} } } } \right]$$
(14)

where \(w\) is the total number of evaluated pixels; \(m\) is the number of geological voxel images; \(n\) is the number of boreholes in the test set; \(p\) is the number of pixels affected by a given borehole position on the corresponding image; and \(R_{ijk}\) and \(T_{ijk}\) are the true value and the test value of the geological category of the \(k\)th pixel affected at the \(j\)th drilling position in the \(i\)th geological voxel image. The term \(R_{ijk}-T_{ijk}\) is taken as 0 when the two categories are the same and as 1 when they differ, so the triple sum counts the misclassified pixels and ACC is the percentage of correctly classified ones.
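As a sketch, ACC can be read as the share of evaluated borehole pixels whose predicted stratum matches the borehole record. The function name `acc` and the toy labels are hypothetical; the last line only reuses the counts reported for the test set later in this section.

```python
from typing import Sequence


def acc(true_labels: Sequence[int], predicted_labels: Sequence[int]) -> float:
    """Percentage of evaluated pixels whose predicted stratum equals the true one."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label sequences must be non-empty and of equal length")
    mismatches = sum(1 for r, t in zip(true_labels, predicted_labels) if r != t)
    return 100.0 * (1.0 - mismatches / len(true_labels))


# Toy example: 10 borehole pixels, 2 of them predicted as the wrong stratum.
truth = [1, 1, 2, 2, 2, 3, 3, 3, 3, 1]
pred = [1, 1, 2, 2, 3, 3, 3, 3, 1, 1]
score = acc(truth, pred)

# Reported test-set counts: 1,145 matching voxels out of 1,295.
reported = 100.0 * 1145 / 1295
```

The toy example gives 80%, and the reported counts give about 88.4%, in line with the ACC stated for the test set.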

Fig. 6

Modeling result of high genus geological bodies at dam foundation (a Layered interpolation results; b Overall geological model; c Stratigraphic distribution maps).

The layer-by-layer interpolation results for the strata in the study area are shown in Fig. 6. It is evident that most of the strata are not layered but interlaced with irregular morphologies. The number of voxels in each stratum in Fig. 7 shows that some strata are small in volume, and the many intrusion phenomena lead to obvious high-genus characteristics.

Fig. 7 The number of voxels in each stratum.

The borehole locations in the test set cover 1,295 voxels, of which 1,145 have the same attributes as the boreholes, giving an ACC of 88.4%. This accuracy is high and meets the requirements of geological modelling for hydraulic dam foundations. The modelling results are shown in Fig. 6.
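The ACC value can be checked from the two reported counts. The sketch below assumes that ACC is the percentage of test-borehole pixels whose predicted category matches the logged one; the function name `acc` and the synthetic label lists are illustrative only and reproduce nothing beyond the counts of 1,295 and 1,145.

```python
def acc(true_labels, predicted_labels):
    """Percentage of positions where the predicted category equals the true one."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("at least one label is required")
    # Each mismatch contributes 1 to the sum, each match contributes 0.
    mismatches = sum(1 for r, t in zip(true_labels, predicted_labels) if r != t)
    return 100.0 * (1.0 - mismatches / len(true_labels))


# Synthetic labels that reproduce only the reported counts:
# 1,295 compared voxels, 1,145 of which agree with the boreholes.
true_labels = [0] * 1295
predicted_labels = [0] * 1145 + [1] * 150
reported_acc = acc(true_labels, predicted_labels)

if __name__ == "__main__":
    print(f"{reported_acc:.1f}%")  # prints 88.4%
```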

Data simplification and format conversion of the voxel model

In the voxel model, the total number of mesh faces was 3,225,600 (number of faces = number of voxels × 6). The model was simplified using the method proposed in the section “Voxel model to mesh model method”. After simplification, the number of faces was 267,629, a reduction in data volume of 91.7%.
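The idea behind the reduction can be illustrated with a small sketch. It assumes that a face is redundant when the voxel on its other side belongs to the same stratum, so only faces on the outside of the model or between different strata are kept; this is an illustration of boundary-face extraction in general, not the exact procedure of the section cited above. The names `boundary_faces` and `reduction_percent` are hypothetical, and the voxel count 240 × 64 × 35 assumes 1 m cells, which reproduces the reported total.

```python
# Offsets to the six face-adjacent neighbours of a voxel.
NEIGHBOUR_OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def boundary_faces(voxels):
    """Return the faces kept after dropping redundant internal faces.

    `voxels` maps (x, y, z) to a stratum label (labels must not be None).
    A face is kept when the neighbour across it is missing or has a different label.
    """
    kept = []
    for (x, y, z), label in voxels.items():
        for dx, dy, dz in NEIGHBOUR_OFFSETS:
            if voxels.get((x + dx, y + dy, z + dz)) != label:
                kept.append(((x, y, z), (dx, dy, dz)))
    return kept


def reduction_percent(faces_before, faces_after):
    """Relative reduction in the number of faces, in percent."""
    return 100.0 * (1.0 - faces_after / faces_before)


# Toy example: a 2 x 2 x 2 block of a single stratum.
cube = {(x, y, z): "A" for x in range(2) for y in range(2) for z in range(2)}
cube_faces_before = 6 * len(cube)
cube_faces_after = len(boundary_faces(cube))

# Figures reported for the dam foundation model.
voxel_count = 240 * 64 * 35          # assumes 1 m cells
faces_before = 6 * voxel_count       # six faces per voxel
reported_reduction = reduction_percent(faces_before, 267_629)

if __name__ == "__main__":
    print(cube_faces_before, cube_faces_after)
    print(faces_before, f"{reported_reduction:.1f}%")
```

In the toy block, half of the 48 faces are internal and are removed. In this sketch an interface between two strata is kept once for each stratum, so that every geological body retains its own closed surface; a method that stores shared interfaces only once would count fewer faces.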

Fig. 8 Voxel model simplification results (a model before redundant mesh simplification; b model after simplification).

The internal redundant meshes were removed, as shown in Fig. 8. The overburden in the study area is uneven in thickness, and the layer as a whole is loose and broken, with abrupt changes in some strata. The Q3 + 4 silty clay stratum in Fig. 9(a) is distributed within this layer and contains many penetrating holes. With conventional geological interface modelling methods, multiple geological surfaces would be needed to cut out the hollow areas. The front-view morphology of the C1ds carbonaceous shale in Fig. 9(b) is even more irregular: owing to the intrusion and erosion of other strata, its inclined hollows are not as regular in shape as those in Fig. 9(a), and such holes are difficult to express well even by cutting and combining stratal interfaces. The voxel modelling method can model such complex areas quickly, and the proposed simplification method automatically generates the mesh surfaces of high genus geological bodies.

Fig. 9 High genus geological body.

Discussion

Comparison of various modelling methods

Several methods commonly used in geological modelling were selected for comparison: geostatistical kriging interpolation, NURBS surface modelling, RBF implicit surface modelling, and modelling with a DCGAN generative adversarial network.

Table 2 Comparison of the accuracy and applicability of modeling methods for high genus strata.

Geostatistical kriging interpolation is an unbiased optimal estimation method widely used in geological facies research. However, kriging relies on the assumption that the spatial variability of geological attributes follows a consistent autocorrelation structure. High genus geological bodies typically show strong spatial variability and non-uniformity, which makes kriging poorly suited to them and leads to substantial modelling errors.

NURBS surface modelling constructs a high-order mathematical function surface that fits the boundary points of the borehole strata. When the interface geometry is regular and the topological relationships are clear, NURBS can precisely control the surface shape through control points, weights, and knot vectors, achieving smooth continuity (C¹–C²) with good interpretability and editability. However, surfaces containing holes, such as those of high genus geological bodies, cannot be fitted directly with such functions, and with complex topology or sparse and uneven observation data it is difficult to establish a unified and stable parametric representation. Suitable surfaces can only be drawn from the geological contact relationships by setting reasonable control points and parameters, and a reasonable geological model is then obtained through Boolean segmentation operations. This process is time-consuming and requires substantial manual interaction. In this study, NURBS is used mainly as a geometric prior or an initial constraint surface for regular regions, which is then combined with the subsequent edge-constrained completion and super-resolution stages.

For the Radial Basis Function (RBF) method, closed stratigraphic interfaces can be automatically generated using implicit functions. This approach can effectively represent high-genus geological interfaces and is suitable for attribute or interface interpolation driven by sparse and irregular sampling points, as shown in Fig. 10(b). However, repeated or missing surfaces often occur at the boundaries between geological bodies, requiring manual trimming and filling, which makes precise boundary fitting difficult. Therefore, the RBF method is better suited as a fast tool for generating an initial closed model. In this study, it is combined with the improved Canny–EdgeConnect–DPSRGAN workflow: RBF is first used to obtain implicit closed interfaces, followed by edge-constrained completion and super-resolution refinement to improve boundary coherence and ACC under complex structures or missing data conditions.

Fig. 10 Application of the NURBS and RBF methods, respectively, to model a high genus model.

The interpolation accuracy values show that the proposed method outperforms the conventional methods, as shown in Fig. 11. Compared with Kriging, NURBS, RBF, DCGAN, and the method before improvement, its accuracy is higher by 19.2%, 3.7%, 11.0%, 20.9%, and 15.3%, respectively. The interpolation result images also show that it produces more accurate results than the unimproved network and DCGAN. The comparison results are summarised in Table 2.

Fig. 11 Comparison of interpolation results.

Comparison of efficiency between automatic modelling methods and manual modelling

Although manual modelling achieves high accuracy in engineering projects, it requires many operating steps. The manual modelling process, shown in Fig. 12, is as follows: first, determine appropriate cross-sections based on the borehole locations, extract the geological boundary points on each section, and manually draw the boundaries according to the geological trend; then, select an appropriate interpolation method to generate the geological interfaces; and finally, build the models by Boolean segmentation according to the geological contact relationships.

In this case, there are many types of overburden and bedrock, and each geological body in the study area exhibits non-layered, high genus, and complex geometric characteristics. If conventional modelling methods are used to model these high genus characteristics, geological engineers must perform complex cutting and combination operations on the geological interfaces. As an example, the NURBS method was used to interpolate the Q3 + 4 silty clay stratum model.

Fig. 12 Manual modelling process.

In terms of efficiency, fully automatic modelling with the algorithm in this study took approximately 35 min, most of which was spent generating and simplifying the voxel model. Manual modelling takes approximately 2.5 h to establish one complex stratigraphic model accurately and about 9.5 h to establish the overall model of this case, with the exact time depending on the professional ability of the modeller. Manual modelling is slow because it requires detailed, knowledge-based analysis of factors such as strata direction, interpolation points, interpolation parameters, and strata contact relationships. Relative to manual modelling of the overall model, the time is shortened by 93.8%; the automatic modelling method proposed in this study therefore significantly improves modelling efficiency.
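The time saving can be reproduced from the two approximate durations. The short check below is only an arithmetic sketch with illustrative variable names; because both times are rounded in the text, it recovers the reported 93.8% to within about 0.1 percentage points rather than exactly.

```python
automatic_minutes = 35.0        # approximate automatic modelling time
manual_minutes = 9.5 * 60.0     # approximate manual time for the overall model

# Relative reduction in modelling time, in percent.
time_reduction = 100.0 * (1.0 - automatic_minutes / manual_minutes)

if __name__ == "__main__":
    print(f"{time_reduction:.2f}%")
```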

Conclusion

This study proposes an automatic modelling method for high genus geological bodies. The main contributions of this study are as follows:

  1. Borehole data are converted into voxel image data, and the improved EdgeConnect and DPSRGAN algorithms are combined to obtain accurate interpolation results. This addresses the unsuitability of conventional methods for high genus characteristics and their low degree of modelling automation.

  2. A voxel model simplification method is proposed that converts the voxel model into a mesh model by extracting the boundary voxel surfaces. This makes the voxel model easier to integrate and utilise in future work.

  3. The case study demonstrated that, compared with the geostatistical kriging interpolation, NURBS, RBF and DCGAN methods, the accuracy was improved by 19.2%, 3.7%, 11.0% and 20.9%, respectively. Regarding modelling efficiency, the time was shortened by 93.8% compared with manual modelling.

This study relied on a large amount of suitable geological data for training. A database containing many geological models would therefore be valuable, and the training data could be extended further through sample generation techniques. In addition, the proposed method achieves better accuracy and generalisation ability than the other methods but requires a longer training time, which can be improved in future work.