Introduction

Rock core integrity (denoted as RCI in this paper) directly reflects the stability and homogeneity of underground rock layers. In underground projects such as tunnels, mines and pits, the advance acquisition of RCI helps engineers to identify potentially unstable areas and to anticipate possible geologic hazards such as landslides, landslides and rockbursts. Thus, disaster prevention and mitigation measures can be deployed in advance during engineering design and construction to ensure the safety of construction personnel and equipment1,2. Most of the traditional RCI evaluation methods rely on manual observation, experimental testing and empirical judgment, which are not only time-consuming and costly, but also susceptible to the influence of human factors, resulting in subjectivity and uncertainty in the evaluation results. Especially when a large number of rock core samples need to be processed, the efficiency is obviously insufficient3.

Image processing technology has been gradually applied to RCI evaluation by virtue of its advantages of automation and long distance. For example, Hall et al.4. analyzed the rock core geometry through the Hough transform algorithm, which simplified the rock core photographs into relevant image texture attributes for the judgment of rock core integrity. Saricam et al.5. first used the edge tracking algorithm to detect the cross-section of the rock core block, then used the thresholding algorithm to eliminate the shadows produced by the light source and segment the rock core block, and finally calculated the RCI by measuring the length of the rock core’s central axis. Image processing techniques can quickly process a large number of rock core images, greatly improving the efficiency of RCI evaluation. However, this type of algorithm is very sensitive to lighting conditions and core features (surface color, texture, etc.). Rock core images under different conditions require different parameter settings, resulting in poor robustness of the algorithms. Secondly, the recognition accuracy of image processing techniques still needs to be improved when dealing with complex backgrounds, low-contrast images, and fine fissures6,7. In recent years, deep learning has achieved a large number of significant results in data detection, recognition, and classification of geotechnical engineering fields, which provides a new and effective solution for evaluating RCI. Alzubaidi et al.8. established a rock core classification model based on a Convolutional Neural Network (CNN), which realizes automated classification of rock cores. Su et al.9. realized the segmentation of rock core blocks with Mask RCNN and Unet al.gorithms, and the segmentation results was used for the intelligent evaluation of RCI. Fu et al.10. improved the YOLOv5 target detection algorithm by using channel and spatial attention to realize the accurate identification of rock core blocks, and automatically predicted the RCI by using the developed post-processing program. Compared with image processing techniques, deep learning algorithms automatically extract features from core images through multilayer neural networks, which avoids the tedious process of manually setting thresholds and improves the accuracy and comprehensiveness of feature extraction11. In addition, the robustness of the deep learning algorithm is significantly enhanced, which is mainly reflected in the ability to adapt to different lighting conditions, changes in color and texture features, and complex backgrounds12.

However, most studies directly apply the existing deep learning algorithms to RCI evaluation, which lacks the analysis for the characteristics of the rock core images and the targeted design of the network, as reflected in the following three aspects:

(1) Existing studies generally recognize intact rock core blocks by deep learning algorithms, and then calculate the proportion of intact rock core blocks occupying the total rock cores. However, the color and striations of rock in different properties vary greatly (Fig. 1a), which makes it difficult to construct a model that can accurately identify all types of rock, resulting in a narrow application scenario. (2)The shape of the convolution kernel of the traditional CNN is a fixed square (generally 1 × 1, 3 × 3), which makes the network not well adapted to the target deformation or targets with different angles when performing feature extraction. The rock core fissures are characterized by features with variable geometry, extension direction and edge features (Fig. 1b), It is difficult to accurately recognize such targets with square convolution kernels. (3) There are often handwritten numbers, lighting, shadows and other external interference in the rock core image (Fig. 1c). Therefore, it is critical to combine the global information and local information to distinguish the target and the background accurately. Currently, the commonly used intelligent detection algorithms are mainly divided into CNN-based and Self Attention (SA)-based target detection algorithms13,14. CNN extract local features layer by layer by convolution operation, which is suitable for processing image data with strong local correlation, but it is hard to model long-distance dependencies. SA can directly capture the global features by calculating the relationship between all positions in the input sequence, but the ability of local feature extraction is insufficient. The construction of network structures that can fully extract both global and local information from images still needs to be further improved in the existing research. In summary, the rock cores obtained from field sampling have complex characteristics including obvious color and striation differences, multiple external interferences, and variable variable geometry features of fissure, etc. The existing intelligent algorithms for rock core integrity often suffer from poor resistance to environmental interference, bad generalization performance, and difficulty in extracting global and local features of the target simultaneously, which leads to poor recognition accuracy in practical engineering applications.

Fig. 1
figure 1

Images of drilled rock cores.

Aiming at solving the above issues, a deep learning-based algorithm named IDA-RCF (Intelligent detection algorithm for Rock Core Fissure) is proposed in this paper, which realizes the automatic evaluation of rock core integrity in accordance with the fissure identification results. The main contributions of this paper are as follows:

  1. (1)

    A two-branch feature extraction network is constructed, in which branch one fully extracts the complex and varied local details of the rock core fissures through Deformable Convolution, and branch two captures the global context information through EfficientViT based on SA, so as to effectively improve the network’s ability to locate the fissures and resist the external interference.

  2. (2)

    A multi-level feature fusion network is proposed for adaptively fusing local and global features from the same level and the fused feature information from the previous level, thereby capturing more valid information and eliminating redundancies .

  3. (3)

    A high-quality rock core fissure dataset is build by collecting, organizing and labeling the drill hole images collected at the construction site, which contains rich rock core samples with different geological, integrity, and background conditions.

  4. (4)

    The experimental results show that the prediction accuracy of rock core integrity automatically calculated according to the fissure identification results of IDA-RCF is 93.8%, which is suitable to accomplish the task of rock core integrity evaluation with high efficiency and precision.

Intelligent detection algorithm for rock core fissures

Methodology

In this paper, a deep learning-based algorithm for intelligent detection algorithm of core fissures, named IDA-RCF (Fig. 2), is proposed. In this paper, a deep learning-based algorithm for intelligent detection algorithm of rock core fissures is constructed, named IDA-RCF (Fig. 2). IDA-RCF consists of three modules: Two-branch feature extraction network, multi-level feature fusion network, and feature decoder. Among them, two-branch feature extraction network is used to extract local and global features of the rock core image, the multi-level feature fusion network is used to adaptively fuse feature information from different levels, the feature decoder module is used to decode the fused feature information and output the identification results of rock core fissures. In the following, the principles of two-branch feature extraction network and multi-level feature fusion network will be elaborated. The feature decoding module is consistent with YOLOv815, which will not be described in this paper.

Fig. 2
figure 2

IDA-RCF network structure.

Two-branch feature extraction network

Two-branch feature extraction network consists of an EfficientViT-based global feature extraction branch (denoted as branch one) and a deformable convolution-based local feature extraction branch (denoted as branch two). The rock core images are input into the two branches in parallel. In branch one, the image is extracted by multiple EfficientViT modules to obtain three global feature layers C1, C2, and C3 with sizes of 13 × 13, 26 × 26, and 52 × 52. In branch two, the image is extracted by multiple DBS (Deformable convolution-Batch Normalization-SiLu) modules consisting of Deformable convolution, Batch Normalization and SiLu activation function to obtain three local feature layers S1, S2, and S3 with sizes of 13 × 13, 26 × 26, and 52 × 52. The principles of EfficientViT which is the core module in branch one, and the deformable convolution which is the core module in branch two are introduced in the following.

EfficientViT

The network structure of EfficientViT is shown in Fig. 3, which consists of Token Interaction, Feed forward network and Cascaded Group Attention. Among them, Token Interaction is used to enhance the model’s ability to process local information, Feed forward network is used to enhance the model’s nonlinear representation ability, and Cascaded Group Attention is used to extract the global context information of the image. The core structure of EfficientViT is the Cascaded Group Attention (CGA), which reduces the computational cost of the SA and enhances the feature representation capability of the model by means of feature segmentation and cross-head cascading.

Fig. 3
figure 3

EfficientViT network structure.

In CGA, the input feature layer (assuming that the dimension is [N, C, H, W], where N is the batch size, C is the number of channels, H and W are the height and width, respectively) is first segmented into G sub-feature layers, each of which has a dimension of [N, C/G, H, W]. Each sub-feature layer is then input independently into an attention head. In the attention head, sub-feature layer S is first multiplied with three different weight matrices (WQ, WK, WV) to generate Query matrix Q, Key matrix K, and Value matrix V. Then the weighted context vectors are obtained by performing SA computation via Eq. (2), which represent the degree of attention that each position in the input sequence pays to the other positions:

$$Q=S \cdot {W_Q},K=S \cdot {W_K},V=S \cdot {W_V}$$
(1)
$$Attention (Q,K,V)=Softmax\left( {\frac{{Q \cdot {K^T}}}{{\sqrt {{d_k}} }}} \right) \cdot V$$
(2)

where dk is the dimension of K.

Then a cascade operation is performed on the outputs of all the attention heads, as shown in Fig. 9b, which adds the outputs of each head to the subsequent heads, gradually optimizing the feature representation to form a feature layer containing richer information. The cascaded features are then projected through a linear layer to make the cascaded feature representation consistent with the original input dimensions with the following formula:

$$F=Linear(Concatenate[Attention1,Attention2,Attention3, \ldots \ldots ,AttentionG])$$
(3)

Deformable convolution

In the traditional convolution operation, the value of each position in the output feature layer is computed by applying a convolution kernel to a localized region on the input feature layer. Setting the input feature layer as X, the convolution kernel as W, and the output feature layer as Y, the convolution operation can be denoted as:

$$y(p)=\sum\limits_{{{p_n} \in R}} {W({p_n})} X(p+{p_n})$$
(4)

where p is the position of the output feature layer, R is the receptive field of the convolution kernel, and pn is the position of each element in the convolution kernel with respect to the center point.

In deformable convolution, the sampling positions in the convolution operation are learned by introducing learnable offsets Δpn. The mathematical expression for deformable convolution is:

$$y(p)=\sum\limits_{{{p_n} \in R}} {W({p_n})} X(p+{p_n}+\Delta {p_n})$$
(5)

where Δpn is the offset corresponding to position pn in the convolutional kernel, which are learned from the input feature layer X through a separate convolutional layer. Assuming that the convolutional kernel of the offset generation layer is Woffset, ΔP is computed as follows.

$$\Delta {p_n}={W_{offset}} * X$$
(6)

The number of output channels of Woffset is 2×k×k, where k×k is the size of the convolution kernel, and 2 indicates that there are two offsets for each position (corresponding to the x and y directions, respectively).

Since the offset Δpn may be a floating point number, p + pn + Δpn may not be an integer position. In this regard, deformable convolution uses bilinear interpolation to compute the feature values at non-integer positions. Setting p′=p + pn + Δpn, the bilinear interpolation is computed as follows: four neighboring integer points p0, p1, p2, and p3 of p′ are firstly localized. then a weighted average is carried out according to the distances from p′ to the four integer points. During the training process, the offset Δpn and the convolution kernel Woffset are updated simultaneously to minimize the loss of the network.

By introducing learnable offsets, the deformable convolution is able to sample at irregular locations, which enhances the network’s ability to handle geometric deformations and different scale objects. Figure 4 shows the difference in the feature extraction process with deformable convolution and standard convolution. As the number of network layers increases, the deformable convolution is able to perform better shape feature extraction for fissures.

Fig. 4
figure 4

Standard convolution and deformable convolution.

Multi-level feature fusion network

To effectively obtain different levels of local features and global features, enhancing the network’s anti-interference ability for complex backgrounds and capturing the detailed fissure features. A multi-level feature fusion network is constructed in this paper, which consists of two parts: upward fusion module and downward fusion module.

In the upward fusion module, the global feature layer Ci(i = 1, 2, 3) and the local feature layer Si outputted from the two-branch feature extraction network are fused through the MFFM to obtain the one-stage fusion feature layer FFi. then the smaller-size FFi is sequentially fused with the larger-size FFi−1 through the bottom-up path to obtain the two-stage fusion feature layer SFi−1. The core structure of the upper fusion module is the MFFM, as shown in Fig. 5. In MFFM, the global feature layer will be input into the channel attention, which utilizes the interdependence between channel mappings to improve the feature representation of specific semantics, which helps to reduce noise interference and enhance the robustness of image detection. The local feature layer is input into the spatial attention, which enhances the local details and suppresses irrelevant regions by adjusting the weights, thus improving the accuracy of image detection. Finally, the global and local feature layers after applying attention are fused. In the downward fusion module, the larger SFi is sequentially fused with the smaller SFi+1 through a top-down path, which outputs three three-stage fused feature layers with sizes of 13 × 13, 26 × 26, and 52 × 52.

Fig. 5
figure 5

MFFM network structure.

Data set

More than 7000 rock core images with good diversity were collected at the project site, containing rock core types of different integrity, color, texture, and background complexity, which is beneficial for constructing a RCI prediction model with good generalization performance. Some of the rock core images are shown in Fig. 6.

Fig. 6
figure 6

Rock core image.

The length and width of the images are adjusted to be the same when inputted into the network. However, the rock core image is generally presented as a long strip, which will lead to distortion in length compression if it is inputted into the network alone, as shown in Fig. 7a. To avoid this phenomenon, the horizontal splicing method is adopted in this paper to preprocess the images, as shown in Fig. 7b. A total of 973 spliced images of rock cores with similar length and width are obtained with a size of 640 × 640, which are divided into 60% training set, 20% validation set and 20% test set. Then, the LabelImg annotation tool was used to label the fissure regions in the rock core images (drawing bounding boxes and marking category information) (Fig. 7c) to obtain the final dataset.

Fig. 7
figure 7

Data set construction method.

Rock core fissure identification experiments

The experiment mainly consists of the following two stages: (1) Comparison experiment of rock core fissure identification: with the data set constructed in this paper as the basis, seven target detection algorithms are selected to conduct a comparison experiment with the algorithms constructed in this paper, to verify the effectiveness of IDA-RCF in identifying the fissures of different geological, integrity, and background conditions. (2) Ablation experiment: the effect of Deformable Convolution, Two-branch feature extraction network, and MFFM adopted in this paper are tested through ablation experiment.

Comparison experiment of rock core fissure identification

Comparison network

To compare the effectiveness of IDA-RCF for rock core fissure identification task, seven well-performing target detection networks were selected for comparison. A brief description of each network is given below:

DETR16: This network is constructed with the Transformer architecture, which utilizes the global attention mechanism to perform target detection for the first time.

Faster-RCNN17: a two-stage network which first generates candidate regions by Region Proposal Network and then performs fine-grained classification and regression on these candidate regions.

EfficientDet18: it combines EfficientNet as the feature extractor and BiFPN as the feature fusion module, which drastically reduces the computational complexity while maintaining high accuracy.

YOLOX19: an anchor-free frame design and a dynamic label assignment strategy are adopted, which simplifies the training and inference process and improves the detection accuracy. In addition, a decoupled detection header is introduced to optimize the classification and regression tasks independently, which further improves the model performance.

YOLOv5: Compared to the predetermined anchor points of YOLOv4, YOLOv5 adopts an adaptive anchor point design, which makes the anchor points closer to the distribution and morphology of the actual objects.

YOLOv8 (Varghese et al., 2024): the C3 module of YOLOv5 was replaced by the C2f module with richer gradient streams, and the classification and detection heads were separated in the decoupled head.

YOLOv5-ECASA (YOLOv5 enhanced by ECASA attention module) (Alzubaidi et al., 2022): the ECASA attention is used to improve the YOLOv5 algorithm, which realizes the intelligent identification of rock core fissures.

Evaluation indicators

The evaluation metrics selected in this paper include Recall, Precision, F1, and mean average precision (mAP). Recall indicates the proportion of actually existing fissures that are correctly detected. Precision indicates the proportion of detected fissures that are actual existing fissures. The calculation formula is shown below.

$$Precision=\frac{{TP}}{{TP+FP}}$$
(7)
$$Recall=\frac{{TP}}{{TP+FN}}$$
(8)

where TP (True Positives) denotes the number of fissures correctly identified as fissures; FP (False Positives) denotes the number of backgrounds incorrectly identified as fissures. FN (False Negatives) denotes the number of fissures incorrectly identified as backgrounds.

F1 is the reconciled mean of precision and recall. mAP denotes the mean of precision at different recall levels.

$$F1\_score=\frac{{2Precision \cdot Recall}}{{Precision+Recall}}$$
(9)
$$mAP=\int_{0}^{1} {Precision\left( {Recall} \right)d} (Recall)$$
(10)

mAP@0.5 is the mean value of the average accuracy calculated when the Intersection over Union (IoU) is set to 0.5. IoU represents the degree of overlap between the predicted and real frames, which is calculated as the ratio of the intersection area to the concatenation area. of the predicted box to the real box. mAP@0.5:0.95 is the mean value of mAP calculated at different IoUs (from 0.5 to 0.95 with a step size of 0.05). mAP@0.5 can reflect the model’s identification performance in general application scenarios. mAP@0.5:0.95 takes into account the model’s performance under multiple IoU thresholds, which comprehensively reflects the detection capability of the model.

Experimental results

The evaluation metrics results of each model are shown in Table 1. The F1, mAP@0.5和mAP@0.5:0.95 of IDA-RCF are 93.09%, 94.44%, and 84.61%, respectively, which are the highest among all the models, indicating that IDA-RCF has a better identification accuracy of rock core fissures. Compared with DETR, the F1, mAP@0.5和mAP@0.5:0.95 of IDA-RCF is improved by 4.62%, 4.82%, and 9.41%, respectively. DETR is built based on Transformer, which has good global feature capability but lacks in extracting the local features of the fissure, while two-branch feature extraction of the IDA-RCF takes advantage of both CNN in local feature extraction and SA in global feature extraction, thus achieving better identification accuracy. Among the four types of models built based on the YOLO framework (YOLOX, YOLOv5, YOLOv5-ECASA, YOLOv8), YOLOv5-ECASA has the best evaluation metrics, which is due to the fact that the model introduces the ECASA attention, which is able to filter the redundant information better for the fissure images with complex background interference and help the network put more attention to the fissure region. However, YOLOv5-ECASA is constructed entirely by convolutional neural networks. Due to the limitation of the localization of the convolutional kernel, it is insufficient in capturing the global contextual information, thus the accuracy is lower than that of IDA-RCF. In addition, IDA-RCF has the smallest difference between mAP@0.5:0.95 and mAP@0.5, which indicates that IDA-RCF is more broadly applicable, which can be used in more project scenarios that have different requirements on the detection accuracy.

Table 1 Results of model performance metrics.

The different geological conditions are mainly reflected in the color differences of rock cores. Figure 8a and b show the fissure identification results for rock cores in different geologic conditions. IDA-RCF, DETR, Faster-RCNN, EfficientDet, YOLOX, YOLOv5, YOLOv5-ECASA, and YOLOv8 are denoted as A, B, C, D, E, F, G, H in the figure. For a more objective comparison, the shown images are similar in the integrity and the background complexity. When the color of the rock is light (Fig. 8a), the models are able to accurately identify the coarser fissures. For the thin fissures, DETR and Faster-RCNN showed missed recognition. When the core color is dark (Fig. 8b, the overall confidence of each model appears to decrease, which is due to the fact that the color of fissure region is nearly black, a darker background will make it difficult for the models to differentiate the demarcation area between the background and the fissure. There are five fractures in Fig. 8b. IDA-RCF correctly identified all the fissures, while DETR, Faster-RCNN, EfficientDet, YOLOX, YOLOv5, YOLOv5-ECASA, and YOLOv8 identified 4, 2, 2, 3, 3, 3, 3, respectively. The above results show that IDA-RCF has better geological applicability in identifying the fissures under different core colors.

Fig. 8
figure 8

Results of fissure identification for different geological conditions.

Figure 9 shows the fissure identification results with different integrity of rock core. The fragmentation of the cores in Fig. 9a–d increases sequentially. The more fragmented the core is, the more number of fissures and the richer the morphology, which makes the identification more difficult. For the core with good integrity in Fig. 9a, all models accurately identified the fissure locations. In Fig. 9b, EfficientDet, YOLOv5, and YOLOv8 showed incomplete identification of the fissure area. Faster-RCNN, EfficientDet, YOLOv5-ECASA incorrectly identified one fissure into two rectangular boxes. In Fig. 9c and d, the missed identification of fissures in the comparison models is obviously increased, and the phenomenon that the predicted rectangular box area is obviously larger than the actual fissure area occurs. Comprehensively analyzing the identification results of IDA-RCF, although a small amount of missed identification occurs with the increase of fragmentation, it is still the least among all the models, in addition, the rectangular box can well reflect the position occupied by the fissures in the core, which can provide the basis for the subsequent accurate calculation of the fissure rate in the rock core.

Fig. 9
figure 9

Results of fissure identification under different integrity.

Figure 10 shows the identification results for the complex background case. In Fig. 10a, there is a lot of dark interference in the background, Faster-RCNN, EfficientDet, YOLOX, YOLOv5 and YOLOv8 incorrectly identify the dark material as fissure. In Fig. 10b, combination of low light brightness, small fissure widths, and interference in the background causes great difficulty in identification. EfficientDet and YOLOX did not identify any fissures, Faster-RCNN, EfficientDet, YOLOX, YOLOv5 and YOLOv8 showed varying degrees of misidentification. IDA-RCF accurately identifies all fissures without incorrect identification, indicating that the proposed algorithm has good resistance to environmental interference, which is suitable for complex and variable conditions.

Fig. 10
figure 10

Results of fissure identification in complex background.

Ablation experiment

The effect of Deformable convolution, Two-branch feature extraction network, and MFFM adopted in this paper are tested through ablation experiment.

The effect of deformable Convolution

The deformable convolution in IDA-RCF is replaced with the traditional convolution to construct the comparison model. The F1, mAP@0.5and mAP@0.5:0.95 of this model are 91.35%, 92.36%, and 80.22%, respectively, which are 1.74%, 2.08%, and 4.39% lower compared with IDA-RCF. Analyzing Fig. 11 (The model with deformable convolution are denoted as A1, the model with traditional convolution are denoted as A2), the model with traditional convolution correctly identifies the transverse fissures with relatively regular shapes, but the identification of curved and diagonal fissures is unsatisfactory. The model with deformable convolution accurately localizes the fissures in different directions. The above results show that the deformable convolution has better ability to deal with geometric deformation and different scales of fissures.

Fig. 11
figure 11

Identification results of models with deformable convolution and traditional convolution.

The effect of two-branch feature extraction network

Branch one and branch two are respectively used as the backbone to construct the comparison model, the test results are shown in Table 2; Fig. 12 (The model with two-branch feature extraction network are denoted as B1, the model with branch one are denoted as B2, the model with Branch two are denoted as B3). The F1, mAP@0.5and mAP@0.5:0.95 of the model with two-branch feature extraction network are improved by 3.1%, 3.6%, and 4.14% compared with the model with branch one, and 2.27%, 2.69%, and 3.29% compared with the model with branch two, which indicates that the combination of the advantages of CNN and SA can effectively improve the identification accuracy of fissures.

Table 2 Model evaluation indicators with different branch.
Fig. 12
figure 12

Identification results of models with different branch.

The effect of MFFM

To investigate the effectiveness of MFFM in fusing global features from branch one and local features from branch two, the models with and without MFFM are compared. The results show that the F1, mAP@0.5和mAP@0.5:0.95 of the model with MFFM is improved by 1.27%, 1.15%, and 2.08% compared to that of the model without MFFM, respectively. Figure 13 (The model with MFFM one are denoted as C1, The model witho MFFM one are denoted as C2) shows that the model with MFFM has better anti-interference ability for complex backgrounds and the ability to capture the detailed fissure features. The model without MFFM appeared to incorrectly identify the background as a fissure several times. In addition, for fissures with small widths, the model without MFFM had missed identifications and incomplete identifications.

Fig. 13
figure 13

Identification results of models with and without MFFM.

Rock core integrity evaluation experiments

The degree of RCI was evaluated using the fissure rate, which is the average number of fissures per unit length (1 m). The relationship between fissure rate and the degree of RCI is shown in Table 3. In this paper, the fissure rate is automatically calculated based on the fissure detection results of IDA-RCF. The calculation process is shown in Fig. 14. Sixteen rock core images are selected for RCI evaluation experiments. The results are shown in Table 4. The relative error between the fissure rate calculated by IDA-RCF and manual statistics is only 4.38%, and the prediction accuracy of RCI degree is 93.8%. The above results indicates that the proposed method can accurately evaluate the degree of RCI, which can greatly reduce the manpower and time consumed by manual evaluation means if it is applied to the practical projects.

Table 3 Relationship between fissure rate and degree of rock core integrity.
Fig. 14
figure 14

Process for predicting the degree of RCI.

Table 4 Predicted results of rock core integrity.

Conclusion

In this paper, an intelligent detection algorithm for rock core fissures based on deep learning, named IDA-RCF, is proposed by combining Deformable convolution, Two-branch feature extraction network, and Multi-level feature fusion network, which realizes the automatic evaluation of rock core integrity in accordance with the fissure identification results. The main conclusions are as follows:

  1. (1)

    The F1, mAP@0.5和mAP@0.5:0.95 of IDA-RCF are 93.09%, 94.44%, and 84.61%, respectively, which has higher fissure identification accuracy compared with other commonly used networks. In addition, IDA-RCF has the smallest difference between mAP@0.5:0.95 and mAP@0.5, which indicates that IDA-RCF is more broadly applicable, which can be used in more project scenarios that have different requirements on the detection accuracy.

  2. (2)

    Compared with other models, IDA-RCF has better applicability to the task of fissure identification under different geology, rock core integrity, and complex backgrounds, which is mainly reflected in the fact with less cases of missing identification and incorrectly identifying the background as fissures.

  3. (3)

    Based on the detection results of IDA-RCF, the fissure rate is automatically calculated and used to predict the degree of rock core integrity. The relative error between the fissure rate calculated by IDA-RCF and manual statistics is only 4.38%, and the prediction accuracy of RCI degree is 93.8%, which indicates that the proposed method can accurately evaluate the degree of RCI.

Finally, it is noted that the evaluation of RCI with only the fissure rate is not comprehensive. In the future work, we plan to build a high-precision semantic segmentation algorithm for intelligent prediction of RQD (Rock Quality Designation), which is another key evaluation index reflecting the RCI. Then, combining the two indexes of RQD and fissure rate to evaluate the degree of RCI to further improve the prediction accuracy.