Introduction

In the global energy landscape, shale oil and gas resources have demonstrated extraordinary development potential and strategic value. By successfully exploiting shale oil and shale gas, the United States has effectively eased the pressure on its own energy supply. This example has quickly made the exploration and development of tight shale oil and gas resources a focus of attention for countries around the world1,2,3,4,5,6. With conventional oil and gas resources becoming increasingly scarce, shale oil and gas, as an emerging energy source, opens up a new path for global energy supply. Its development can expand the total energy supply, diversify the categories of energy, reduce excessive dependence on conventional energy, and promote the implementation of sustainable energy development strategies7,8,9,10. Accurate and rapid evaluation of shale oil and gas reservoir characteristics is therefore a key prerequisite for the efficient development and rational utilization of these reservoirs.

Scanning electron microscopy (SEM) imaging plays a pivotal role in reservoir evaluation. The technique enables high-resolution characterization and in-depth analysis of the microstructure inside shale reservoirs. It clearly reveals the pore structure, mineral composition and organic matter characteristics of shale, which form a crucial information base for the evaluation of shale oil and gas reservoirs11,12,13,14. SEM imaging offers high resolution, high sensitivity and high accuracy. It captures subtle structures and features at the micro scale and thus provides intuitive and reliable image data for shale reservoir evaluation15,16,17,18,19,20. Traditional manual image analysis, however, has significant drawbacks: it is inefficient, inaccurate and subjective, and it cannot meet the requirements of real-time image detection and analysis. Even for SEM professionals, it is challenging to accurately distinguish the various types of materials in the shale microstructure, which hinders the progress of reservoir evaluation work.

The boom in deep learning technology presents a new opportunity to overcome this dilemma (Fig. 1). In the traditional characterization of shale SEM images, researchers often annotate the images manually with computer graphics software such as PS and drawing software21,22. The subjectivity of manual annotation is difficult to avoid, because annotators differ in professional knowledge and judgment. Moreover, reservoir evaluation often involves processing massive numbers of shale SEM images, and such an inefficient workflow falls far short of practical demands. In recent years, many scholars have introduced deep learning into the reservoir evaluation of unconventional oil and gas exploration and development in order to improve efficiency and accuracy4,23,24,25. However, shale SEM images are highly heterogeneous, which makes it difficult for existing deep learning models to achieve the desired accuracy in small object detection tasks.

Fig. 1

The history of deep learning object detection and its application in the oil and gas field.

In view of this, this study takes a deep learning object detection model as its core framework. Different types of convolution and an attention mechanism are introduced, and a small object detection layer is added, to construct a deep learning model adapted to shale SEM images. By integrating the improved object detection model with the specific properties of shale SEM images, the study aims at intelligent and accurate identification of microscopic substances in these images. Combining deep learning with shale oil and gas reservoir evaluation in this way improves the accuracy and efficiency of the evaluation, and provides technical support and methodological guidance for the in-depth development and rational utilization of shale oil and gas resources.

Datasets and hyperparameters

Samples were collected from the Longmaxi Formation, Longtan Town, Youyang Tujia and Miao Autonomous County, Chongqing, China. To obtain a high-quality dataset, SEM-Maps was used to acquire the SEM images. The SEM experiments were carried out at the Chongqing Research Institute of Jilin University in Chongqing, China, using a ZEISS Gemini 360 with a magnification of up to 100,000x. The observation surface of each prepared sheet sample was polished with a Gantan 697 high-speed argon ion polishing instrument and then sprayed with gold before the SEM experiments.

The SEM-Maps image was cropped as shown in Fig. 2, and the clear images were selected as the base dataset (504 images in total). The labelimg tool was used to annotate quartz, pyrite, organic matter, organic matter pores and inorganic matter pores in the images. Data augmentation was then applied to the base dataset: the images were mirrored, randomly rotated, adjusted in brightness, corrupted with added noise, and randomly cropped, giving a final dataset of 5544 images with 5544 corresponding label files. To compare the performance of different models fairly, the hyperparameters of all deep learning network models were kept consistent. Table 1 lists the hyperparameters.
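The dataset sizes above are consistent with each of the 504 base images contributing the original plus ten augmented variants (504 × 11 = 5544); the paper does not state the per-image count, so this reading is an inference. For object detection, geometric augmentations must also transform the label files. The following is a minimal pure-Python sketch of a horizontal mirror applied to an image and to a YOLO-style label (class, x_center, y_center, width, height, all normalized to [0, 1]). The label format and all function names are illustrative assumptions, not the authors' pipeline.

```python
def mirror_image_horizontally(image):
    """Mirror an image stored as a list of pixel rows (left <-> right)."""
    return [list(reversed(row)) for row in image]


def mirror_label_horizontally(label):
    """Mirror one normalized YOLO-style box: only the x centre changes."""
    cls, xc, yc, w, h = label
    return (cls, 1.0 - xc, yc, w, h)


def augmented_dataset_size(n_base, variants_per_image):
    """Total images when every base image is kept and augmented k times."""
    return n_base * (1 + variants_per_image)


# Toy 2 x 3 "image" and a single hypothetical annotation.
image = [[0, 1, 2],
         [3, 4, 5]]
label = (2, 0.25, 0.5, 0.1, 0.2)

flipped_image = mirror_image_horizontally(image)
flipped_label = mirror_label_horizontally(label)

# Assumed: ten augmented variants per base image.
total = augmented_dataset_size(504, 10)
```

Rotation and cropping need analogous label transforms, whereas brightness and noise changes leave the boxes untouched.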

Fig. 2

Sample production method and process.

Table 1 Hyperparameter settings for the Yolov3, Yolov5, Yolov7, Yolov8 and improved Yolov8 models.

Model innovation

In this paper, a fast and intelligent evaluation model for shale SEM images is designed as an improved version of the Yolov8 object detection model. The structure of Yolov8 is divided into two parts, feature extraction and object detection26, which use the CSPDarknet network structure and the Yolov4-Head detection head structure, respectively. This design effectively reduces the number of parameters and calculations of the model and improves the efficiency of feature extraction. More importantly, instead of predicting the position and size of anchor boxes, Yolov8 uses an Anchor-Free detection method, which greatly improves detection speed and accuracy. For these reasons, Yolov8 was chosen as the base model of this study.

Model construction

The Yolov8 deep learning model was improved in three steps. In the first step, a small object detection layer was added to the Neck section. The shale microstructures in SEM images are small, while the downsampling factor of the Yolov8 model is relatively large. The small object detection layer therefore performs detection on the concatenated shallow and deep feature maps, which improves the model’s ability to recognize microscopic substances. In the second step, because the small object detection layer greatly increases the computational effort of the model, MBConv and FusedMBConv were added to the backbone of Yolov8 to keep the model lightweight. In the third step, the SE attention mechanism was introduced by replacing all the C2f modules in the Yolov8 model with C2f_SE modules. In this way, the attention mechanism is introduced without substantially increasing the network parameters, which improves the performance and generalization ability of the model. The structure of the improved Yolov8 object detection model is shown in Fig. 3.

Fig. 3

Structure diagram of improved Yolov8 deep learning object detection model.

SE attention mechanism and small object detection layer

In computer vision, attention mechanisms are widely recognized for improving model performance and the extraction of key features. Because shale microstructure materials have complex textures and fine details, the SE (Squeeze-and-Excitation) attention mechanism was selected27 (Fig. 4a). The SE module compresses the spatial dimensions of each feature map into a single channel descriptor and uses these descriptors to learn a weight for every channel, which can improve the model’s ability to detect small objects at very little additional computational cost, so the improved Yolov8 deep learning model remains lightweight. More importantly, the SE attention mechanism dynamically adjusts the channel weights of the feature map to enhance informative features and suppress useless information.
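To make the channel reweighting concrete, the following is a minimal pure-Python sketch of a generic squeeze-and-excitation step: global average pooling per channel (squeeze), a two-layer bottleneck ending in a sigmoid (excitation), and a per-channel rescaling. It illustrates the general SE idea only; the weights, the reduction to a single hidden unit, and the function name `se_block` are hypothetical and do not reproduce the paper's C2f_SE module.

```python
import math


def se_block(feature_maps, w1, b1, w2, b2):
    """Apply squeeze-and-excitation to C feature maps, each an H x W list."""
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))
        for fm in feature_maps
    ]
    # Excitation, layer 1: bottleneck fully connected layer with ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(ws, squeezed)) + b)
        for ws, b in zip(w1, b1)
    ]
    # Excitation, layer 2: back to C channels, sigmoid keeps weights in (0, 1).
    weights = [
        1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(ws, hidden)) + b)))
        for ws, b in zip(w2, b2)
    ]
    # Scale: multiply every pixel of a channel by that channel's weight.
    scaled = [
        [[v * wt for v in row] for row in fm]
        for fm, wt in zip(feature_maps, weights)
    ]
    return scaled, weights


# Two 2 x 2 channels with hand-picked (hypothetical) weights.
feature_maps = [[[2.0, 2.0], [2.0, 2.0]],
                [[1.0, 1.0], [1.0, 1.0]]]
w1, b1 = [[1.0, -1.0]], [0.0]           # 2 channels -> 1 hidden unit
w2, b2 = [[1.0], [-1.0]], [0.0, 0.0]    # 1 hidden unit -> 2 channels

scaled, weights = se_block(feature_maps, w1, b1, w2, b2)
```

In a trained network the two small fully connected layers are learned, so the extra cost is limited to those layers plus one multiplication per feature value.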

Fig. 4

Attention mechanism module and small object detection layer. (a) SE attention mechanism, (b) small object detection layer.

The design of the small object detection layer can enhance the ability to capture details and extract features at multiple scales28 (Fig. 4b). By increasing upsampling and feature fusion, the model can better capture and characterize the detailed features of small objects. At the same time, feature extraction through convolution operation can enhance the model’s resistance to noise and interference, and improve the recognition accuracy of the model in complex scenes and noise interference. This further enhances the characterization ability and robustness of the model.
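The trade-off between finer detection and extra computation can be illustrated with simple grid arithmetic. The sketch below assumes a 640 × 640 input, the three detection strides of 8, 16 and 32 used by the standard Yolov8 heads, and an added head at stride 4. The input size and the stride of the added layer are assumptions for illustration, since the text does not state them.

```python
def detection_grids(input_size, strides):
    """Side length of the prediction grid for each detection stride."""
    return {stride: input_size // stride for stride in strides}


def total_cells(grids):
    """Total number of grid cells over all detection heads."""
    return sum(side * side for side in grids.values())


INPUT_SIZE = 640  # assumed network input resolution

base_grids = detection_grids(INPUT_SIZE, (8, 16, 32))
small_obj_grids = detection_grids(INPUT_SIZE, (4, 8, 16, 32))

base_cells = total_cells(base_grids)
small_obj_cells = total_cells(small_obj_grids)
```

Under these assumptions the finest grid cell shrinks from 8 × 8 to 4 × 4 pixels, which favors tiny pores, while the number of prediction cells roughly quadruples. That growth is in line with the computational cost that the lightweight convolutions are meant to offset.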

MBConv and FusedMBConv

The small object detection layer increases the computational cost of the model. To keep the model lightweight, we introduced MBConv (Mobile Inverted Residual Block) and FusedMBConv29. Their main role is to improve the efficiency of the network while maintaining as high an accuracy as possible (Fig. 5). MBConv uses an inverted residual structure, in which the input features are added to the output after an expansion convolution (increasing the number of channels), a depthwise convolution (convolving each channel independently), and a pointwise convolution (a linear combination of all channels). This enhances the expressive power of the features, so MBConv can extract rich features while maintaining high efficiency and performs well in tasks such as image classification and object detection. FusedMBConv fuses multiple operations, such as the depthwise convolution and a pointwise convolution, into a single operation, reducing memory access and computational overhead. This fusion typically also involves merging the activation functions before and after the convolution, which reduces the amount of storage space and computation required. Since FusedMBConv reduces the amount of computation too much, it is not conducive to feature point extraction. Therefore, the study chose to use MBConv and FusedMBConv in combination.

Fig. 5

Structure of MBConv and FusedMBConv.

Analysis and discussion of training results

In this study, confusion matrix, precision, recall, F1-Score, mAP50 and mAP50-95 were selected as the key indicators to evaluate the performance of the improved Yolov8 object detection model. They can comprehensively reflect the performance of the model in different aspects, so as to provide strong guidance for model optimization and improvement.

The confusion matrix is a table used to evaluate the performance of a classification model. It shows the correspondence between the model predictions and the true labels, providing detailed information about the model’s performance. The confusion matrix consists of four parts. (1) True Positive (TP): the number of samples correctly predicted by the model as positive. (2) True Negative (TN): the number of samples correctly predicted by the model as negative. (3) False Positive (FP): the number of samples that the model incorrectly predicts as positive. (4) False Negative (FN): the number of samples that the model incorrectly predicts as negative. The confusion matrix is written as follows:

$$\text{Confusion matrix}=\begin{bmatrix}TN & FP\\ FN & TP\end{bmatrix}$$
(1)

Precision measures the proportion of samples predicted as positive by the model that are actually positive. The higher the precision, the better the model’s performance in positive class predictions. The formula for precision is as follows:

$$\text{Precision}=\frac{TP}{TP+FP}$$
(2)

Recall measures the proportion of all actual positive samples that the model correctly predicts as positive. The higher the recall, the better the model is at capturing the positive class. The formula for recall is as follows:

$$\text{Recall}=\frac{TP}{TP+FN}$$
(3)
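As a small worked example of Eqs. (2) and (3), the sketch below computes precision and recall from raw counts, together with the F1-Score listed among the evaluation indicators. The text gives no formula for F1, so the standard definition as the harmonic mean of precision and recall is used here. The counts are hypothetical and chosen only for illustration.

```python
def precision(tp, fp):
    """Eq. (2): share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Eq. (3): share of actual positives that the model finds."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for one class.
tp, fp, fn = 85, 10, 15

p = precision(tp, fp)
r = recall(tp, fn)
f1 = f1_score(p, r)
```

The zero-denominator guards return 0.0 for a class with no predictions or no ground-truth instances, which is one common convention rather than the only one.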

Training results of the improved Yolov8 model

The main diagonal of the confusion matrix gives a visual indication of whether the model classifies the different samples correctly. As can be seen from Fig. 6a, the values on the main diagonal are much larger than those at other locations. This indicates that the improved Yolov8 model makes largely correct predictions for quartz, pyrite, organic matter, inorganic pores and organic matter pores. To represent the training results more clearly, we normalized the confusion matrix (Fig. 6b). The values on the main diagonal of the normalized confusion matrix are 1, 0.97, 0.99, 0.97 and 0.85, respectively. The proportion of correct predictions is close to 1 for most categories, which indicates that the improved Yolov8 model performs well in these categories.
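The normalization step can be sketched as follows. The sketch assumes a layout in which rows are predicted classes and columns are true classes, with each column divided by its sum so that a diagonal entry reads as the fraction of a true class that was predicted correctly. The paper does not state which axis was normalized, so this convention is an assumption, and the two-class counts are hypothetical.

```python
def normalize_columns(matrix):
    """Divide every entry by its column sum (columns = true classes)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    col_sums = [sum(matrix[r][c] for r in range(n_rows)) for c in range(n_cols)]
    return [
        [matrix[r][c] / col_sums[c] if col_sums[c] else 0.0 for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Hypothetical raw counts: rows = predicted class, columns = true class.
counts = [[85, 3],
          [15, 97]]

normalized = normalize_columns(counts)
diagonal = [normalized[i][i] for i in range(len(normalized))]
```

With this convention, a diagonal value of 0.85 means that 85% of the instances of that true class were assigned to the correct class.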

Fig. 6

Confusion matrix results of the improved Yolov8 model. (a) Confusion matrix, (b) normalized confusion matrix.

As shown in Fig. 7, each evaluation index fluctuated greatly in the early stage of training because of the influence of the transfer learning weights and the interference of data augmentation. After 150 iterations, each evaluation index tended to be stable. The curves show that the maximum values of Precision, Recall, mAP50 and mAP50-95 of the improved Yolov8 model are 0.94442, 0.91695, 0.9579 and 0.71547, respectively, which indicates that the improved Yolov8 model proposed in this study can effectively identify the microscopic materials in shale SEM images. However, the mAP50-95 of the improved Yolov8 model is not particularly high, indicating that the model may perform poorly in detection tasks that require more precise localization, which is also closely related to the strong heterogeneity of shale. The indicator curves are consistent with the predictions of the improved Yolov8 model, which accurately identifies most microscopic substances but misses a small number of more subtle and complex ones (Fig. 8).
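The gap between mAP50 and mAP50-95 is easier to interpret through the intersection over union (IoU) on which both are based. mAP50 counts a detection as correct when its IoU with the ground-truth box is at least 0.50, whereas mAP50-95 averages over the ten thresholds 0.50, 0.55, ..., 0.95. The sketch below uses hypothetical box coordinates to show how a two-pixel shift on a ten-pixel object already fails the stricter thresholds.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds averaged by mAP50-95.
THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

# Hypothetical 10 x 10 pixel pore and a prediction shifted by 2 pixels.
truth = (10.0, 10.0, 20.0, 20.0)
prediction = (12.0, 10.0, 22.0, 20.0)

overlap = iou(truth, prediction)
hits = [overlap >= t for t in THRESHOLDS]
```

For small objects, a localization error of a few pixels removes a large share of the overlap, which is one plausible reason why mAP50-95 stays well below mAP50 on fine shale structures.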

Fig. 7

Improved Yolov8 model evaluation index change curve.

Fig. 8

Prediction results of the improved Yolov8 model. (a) Organic matter; (b) OM Pores; (c) Pyrite; (d) Quartz and inorganic pores.

Comparison of models before and after adding a small object detection layer

To illustrate the difference in model performance before and after adding the small object detection layer, the same evaluation indexes were used to show the training results of the model without the small object detection layer. As can be seen in Fig. 9, the Precision and Recall of the model without the small object detection layer are almost the same as those of the improved Yolov8 model, which shows that the overall performance changes little when the small object detection layer is added. However, there is a significant difference in mAP50 and mAP50-95, which demonstrates that the model with the small object detection layer performs better when detecting more complex and tiny objects. The same conclusion can be drawn from the confusion matrix (Fig. 10). The objects in this study were Organic matter, OM Pores, Pyrite, Quartz and Inorganic Pores. The pores in organic matter are usually defined as OM Pores, and they have more complex structural characteristics. The model without the small object detection layer was 78% accurate in identifying OM Pores, while the model with the small object detection layer was 85% accurate (Figs. 6b and 10b). The small object detection layer therefore plays an important role in the improved Yolov8 model.

Fig. 9

Performance difference between the improved Yolov8 and the model without a small object detection layer.

Fig. 10

Confusion matrix results of the model without the small object detection layer. (a) Confusion matrix, (b) normalized confusion matrix.

Comparison of results from different models

The confusion matrices obtained from training the different models show that almost every model recognizes quartz, pyrite, organic matter and inorganic pores well, with accuracies mostly above 90%. However, the identification of organic matter pores was not satisfactory (Fig. 11). The accuracy of the Yolov3 model is about 50%, and the accuracies of the Yolov5, Yolov7 and Yolov8 models are about 75%. The reason is that organic matter pores mostly develop in areas of organic matter enrichment, which makes it harder for an object detection model to identify them accurately. The improved Yolov8 model proposed in this paper reaches an accuracy of 85% for the identification of organic matter pores (Fig. 6), which is higher than that of the other models and 7 percentage points higher than that of the model without the small object detection layer. Figure 12 compares the maximum values of the evaluation indicators for the different models. The Precision, Recall, mAP50 and mAP50-95 of the improved Yolov8 model were higher than those of Yolov3, Yolov5, Yolov7 and Yolov8. The maximum values of these evaluation indexes for the improved Yolov8 model were 2.403%, 3.005%, 2.345% and 3.897% higher than those of the Yolov8 model, respectively. This indicates that the improved Yolov8 model is more accurate in identifying shale microscopic materials and copes better with complex environments.

Fig. 11

Confusion matrix results for different models. (a) Yolov3 model; (b) Yolov5 model; (c) Yolov7 model; (d) Yolov8 model.

Fig. 12

Comparison of the maximum values of various evaluation indicators of different models.

It is of great significance to accurately identify the microscopic materials in shale SEM images in shale reservoir evaluation. Through the accurate identification and analysis of quartz, pyrite, organic matter and various pores in shale SEM images, the mineral composition and microstructure characteristics of shale can be deeply understood. Revealing the characteristics of pore development, the content and distribution of quartz, the type and enrichment of organic matter and other information can help to determine the development of pores, the sedimentary environment of shale, the diagenesis process and the possible development of micro-fractures. This provides intuitive and accurate data support for further exploration of the petrological properties of shale. Equally important, the results of object detection can be used to calculate surface porosity and mineral ratios by methods such as threshold segmentation or watershed segmentation, which are critical for lithofacies evaluation30,31. It also provides comprehensive and in-depth technical support and scientific basis for the exploration and development of shale oil and gas resources.
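The step from detection boxes to surface porosity can be sketched with a simple global threshold. The example assumes that pores appear darker than the surrounding matrix in the grayscale SEM image and that a fixed gray-level threshold separates them. The threshold value, the toy image and the function names are hypothetical, and a real workflow would calibrate the threshold or use watershed segmentation as mentioned above.

```python
def crop(gray, box):
    """Cut a detection box (x1, y1, x2, y2), in pixels, out of a 2D image."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in gray[y1:y2]]


def surface_porosity(gray, threshold):
    """Fraction of pixels darker than the threshold (treated as pore)."""
    total = sum(len(row) for row in gray)
    pore = sum(1 for row in gray for value in row if value < threshold)
    return pore / total if total else 0.0


# Toy 4 x 4 grayscale image: bright matrix (200) with three dark pore pixels.
image = [[200, 200, 200, 200],
         [200,  30,  40, 200],
         [200,  35, 180, 200],
         [200, 200, 200, 200]]

THRESHOLD = 100          # hypothetical gray level separating pore from solid
box = (1, 1, 3, 3)       # hypothetical detection box around the pore cluster

porosity_in_box = surface_porosity(crop(image, box), THRESHOLD)
porosity_whole_image = surface_porosity(image, THRESHOLD)
```

Restricting the threshold to detected boxes keeps dark non-pore features elsewhere in the image from being counted, which is the practical benefit of running detection first.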

Conclusion

(1) In this paper, we propose a deep learning model suitable for identifying microscopic materials in shale. The training results show that the Precision, Recall, mAP50 and mAP50-95 of the improved Yolov8 model are higher than those of the other models, with maximum values of 0.94442, 0.91695, 0.9579 and 0.71547, respectively. These values are 2.403%, 3.005%, 2.345% and 3.897% higher than those of the Yolov8 model. This demonstrates that the improved Yolov8 model is suitable for identifying microscopic materials in shale SEM images.

(2) The prediction results show that the improved Yolov8 model performs well on images that were not used in training. Because its mAP50-95 is relatively low, the model may be weaker at more demanding object detection tasks. However, compared with the original Yolov8 model, the improved Yolov8 model is more accurate in identifying shale microscopic materials and copes better with complex environments.

(3) Combining the improved Yolov8 model with SEM technology can help researchers capture more representative images during the acquisition of shale SEM images, which reduces the difficulty of reservoir evaluation and improves its efficiency and accuracy. The model also has considerable development potential: the improved Yolov8 can be trained on different data and is expected to be applicable to other fields such as medicine and materials science.