Introduction

Breakthroughs have recently been made in natural gas exploration of bauxite on the southern margin of the Ordos Basin1,2; these breakthroughs have greatly expanded the field of natural gas exploration and demonstrated the potential of large-scale reserves and broad exploration prospects of bauxite. In the past, because of their wide distribution, poor physical properties and strong capping ability, aluminium (Al)-bearing rocks and dark mudstone deposited in the transitional facies of the late Palaeozoic sea and land were regarded as the direct cap of the lower Palaeozoic weathering crust gas reservoir in the Ordos Basin3,4,5. The industrial breakthrough of bauxite, as a new type of natural gas reservoir, has fundamentally changed the traditional geological understanding that Al-bearing rock can only cap and cannot form an effective reservoir. Presently, the identification of gas reservoirs is becoming a popular topic of study in the efficient exploration and development of bauxite series.

At present, in the absence of coring, geophysical logging is usually used to identify effective reservoirs6,7. Conventional log intersection and superposition charts are considered the broadest and most low-cost methods for log-based reservoir identification8,9. However, this approach has limitations and challenges for bauxite reservoirs. The sedimentary and diagenetic processes of bauxite reservoirs exhibit significant complexity10, resulting in a more intricate rock composition, strong heterogeneity, and substantial interference and overlapping log information. First, bauxite may contain a variety of mineral components with varying contents such as hydrodiaspore, diaspore, anatase, haematite, siderite, kaolinite, illite, etc.11,12,13. The response characteristics of these minerals on the log differ significantly, contributing to the complexity of the bauxite rock response. Second, the development of bauxite reservoirs may entail the superposition and alternation of multiple diagenetic stages14, leading to considerable vertical and lateral heterogeneity in Al-bearing rock sequences, encompassing lithological variations and porosity fluctuations. In particular, abrupt lithological changes vertically result in overlapping logging responses, thereby increasing the complexity of lithological identification. Owing to the presence of multiple solutions for log responses and potential overlap in response characteristics, a single type of log response may correspond to various geological phenomena or rock types. Additionally, distinct rock types and pore structures may also exhibit similar response characteristics on the log curve. For example, in conventional clastic rock logs, a high clay content in nonreservoir strata can lead to increased natural gamma-rays as well as heightened density but reduced resistivity15,16. 
However, within Al-bearing rocks, a mudstone section with high contents of heavy minerals and potassium at the bottom of an aluminous formation can show the same response characteristics as a bauxite gas reservoir. If the conventional intersection diagram is used directly, the mudstone at the bottom is often mistaken for an effective reservoir, which hinders the exploration and development of bauxite gas reservoirs.

As a mathematical statistical method, principal component analysis (PCA) can reduce the dimensionality of stratum geophysical information and concentrate it while largely preserving the original information17,18,19. Since the 1980s, Schlumberger has researched the use of well log data for automated electrofacies zoning of wells through PCA, facilitating geological sequence determination, reservoir zonation, and well-to-well correlation20. PCA has been widely used in well logging identification and evaluation of oil and gas reservoirs in recent years. In view of the diverse lithology and complex components of glutenite in the Dongying hollow, Gao and Li established a lithological recognition method and porosity evaluation model for tight sandstone based on PCA, which improved the accuracy of porosity logging evaluations of a tight glutenite reservoir21. Jahan et al. identified different fault patterns in the Upper, Middle, and Lower Bakken members and the Three Forks Formation by integrating multiple attributes of seismic data through PCA; the resulting fault cuts correlated well with those from missing well log sections22. Karimi et al. used PCA to identify a carbonate reservoir in the Middle East and extended an automatic well-to-well correlation approach, noting that PCA increases well-to-well correlation accuracy by reducing the dependency among correlation parameters and by extracting features from log data23. Li et al. used PCA to reconstruct missing essential neutron and sonic logs from the available coalbed methane well logs (density, resistivity, gamma ray, spontaneous potential, and calliper logs) and compared the results with empirical methods, neural network models, and multivariate regression methods24. The results revealed that the key logging curves reconstructed based on PCA were more reliable in evaluating the coalbed methane content. 
In view of the failure of fracture characterization and classification in volcanic reservoirs with well logging data, Ge et al. proposed a method for identifying the degree of fracture development on the basis of the combination of PCA and multifractal detrended fluctuation analysis, which successfully predicted the degree of fracture development of volcanic reservoirs in the Tarim Basin25. In summary, the PCA method is now highly recognized for the types of reservoirs that have been discovered and produced at scale.

In this study, the authors aim to use the PCA method, which has proven successful in other reservoirs, to derive a parameter or model that reliably indicates effective gas reservoirs in the Longdong area of the Ordos Basin. This paper first provides a comprehensive overview of the essential characteristics of Al-bearing rock layers through an analysis of geochemistry, rock minerals, and logging features. This study identifies diaspore as the characteristic mineral of an effective gas-bearing reservoir and emphasizes the importance of the aluminium oxide content as a key parameter. The conventional logging curves are then reduced in dimension via PCA to derive new, mutually uncorrelated variables (principal components). On the basis of the relationships among lithology, the Al2O3 content, and the principal components obtained via PCA, a lithological division chart and an effective reservoir identification model based on PCA are established. The lithological division method uses 2 principal components derived from 8 conventional logging curves to preliminarily identify potential reservoirs. The reservoir identification model has a reasonable goodness of fit and avoids the multicollinearity that affects direct linear regression. Moreover, the weight coefficient assigned to each variable reflects genuine geological principles, providing a reliable foundation for subsequent research endeavours. Verified by production test data, the method based on PCA processing constitutes an effective and rapid approach for the identification of bauxite gas reservoirs without other test data during exploration and development.

Geological background

The study area is situated in the southwestern Ordos Basin and spans two tectonic units: the Tianhuan depression and the Yishan slope (Fig. 1a). At the end of the Ordovician, influenced by the Caledonian Movement, the North China epicontinental basin was uplifted to become land, causing a sedimentary hiatus of approximately 130 million years from the Middle Ordovician to the Early Carboniferous, and the absence of Silurian to Devonian strata. During this period, carbonate rocks of the Ordovician Majiagou Formation were exposed to the surface and gave rise to karst landforms. At the end of the Carboniferous Benxi Age, seawater from the Qinqi sea and the North China sea invaded the central palaeouplift in the central and western parts of the Ordos basin, and the Ordos area subsided and received new deposits. Nevertheless, owing to its location in the karst highlands, the Longdong area has not received the depositions of the Benxi Formation. In the early Permian Taiyuan stage, the transgression expanded, and the Longdong area deposited Taiyuan Formation strata.

From the Carboniferous to Permian, the North China Block was located in the low and middle latitudes of the Northern Hemisphere. It had mainly tropical and subtropical summer wet ecosystems, as well as equatorial tropical ever-wet ecosystems. The Lower Palaeozoic karst system developed due to high temperatures, rainfall, and the rise and fall of sea level. These factors provided favourable drainage structures for the weathering and leaching of the original upper Palaeozoic deposits. Moreover, the rise and fall of sea level caused ferrallitization and gleization of the original sediments from the Carboniferous to the Permian, which led to the formation of Al-bearing rock series on the basin margin.

Fig. 1

Comprehensive geological map of the Ordos Basin. (a) Structural division of the Ordos Basin and location of the study area. (b) Columnar map of Palaeozoic sedimentary strata in the study area.

The Palaeozoic strata in the study area include the lower Palaeozoic Cambrian system (the Maozhuang Formation, Xuzhuang Formation, Zhangxia Formation, and the Sanshanzi Formation), the Ordovician system (the Majiagou Formation only) and the upper Palaeozoic Permian system (the Taiyuan Formation, Shansi Formation, Shihezi Formation, and Shiqianfeng Formation). The bauxite reservoir discussed in this paper is in the Taiyuan Formation of the upper Palaeozoic Permian system. The Taiyuan Formation is within a transitional sedimentary environment, and features coal seams, carbonaceous mudstone, mudstone containing limestone, and the Al-bearing rock series (Fig. 1b). The bauxite reservoir discussed in this paper is located in the middle stratum of this aluminous rock series (Fig. 1b). The Al-bearing rocks show significant vertical variation in element enrichment due to weathering and leaching. To identify the gas reservoirs in the Al-bearing rock series efficiently and accurately on the basis of the predicted principal element content, a regression relationship was established between conventional logging and principal element content via tests, core observations, cuttings, and conventional logging data.

Methods

Major element test

Owing to the comprehensive effects of palaeostructure, palaeolatitude and palaeoclimate, a bauxite-rich stratum with a bean-shaped oolitic structure and diaspore formed26,27,28. Under the transformation of palaeokarst14,29, the soluble components in the bean-shaped oolitic structure dissolved to form intragrain soluble pores, matrix soluble pores and residual lattice pores, causing bauxite to become a favourable place for oil and gas storage. Therefore, the distribution of bauxite reservoirs is largely controlled by the content of diaspore minerals. The layer with a high diaspore content, well-developed bean-shaped ooids, and detritus is usually the main effective reservoir with good porosity. In contrast, clay rock with a low content of aluminium and without ooids and detritus cannot be used as an effective reservoir because of its weak karst corrosion and poor physical properties.

The primary chemical components play crucial roles in the origin, formation, and evolutionary processes of rocks. Specifically, they offer valuable insights into sedimentation and diagenesis. This test utilized the continuous variation in major elements along the vertical well section to preliminarily determine the mineral composition of the Al-bearing rock and aid in understanding pore formation as well as the relationship between porosity and aluminium mineral content. The concentrations of major element oxides were determined by 36 inductively coupled plasma (ICP) measurements. The samples were obtained from Well LD58 in the study area. After grinding and sieving procedures, these samples underwent high-temperature chemical treatment to enable dissolution, yielding a 40-millilitre solution for further analysis.

X-ray diffraction analysis

While bauxite rocks present higher aluminium contents, clay rocks also exhibit significant concentrations of aluminium. X-ray diffraction (XRD) analysis not only reveals the source minerals responsible for aluminium but also provides insights into the enrichment mechanisms based on longitudinal variations in mineral composition. After the XRD patterns were obtained, the sample X-ray diffraction data were compared with the standard mineral data for qualitative analysis. The Rietveld full-spectrum fitting method was used for semiquantitative analysis, with specific methods and steps described in the literature30. Twenty-three core samples were acquired from the well section for X-ray diffraction analysis. Preprocessing and testing of all the rock samples were performed at the Continental Dynamics Key Laboratory at Northwestern University. A D8 Advance X-ray diffractometer from Bruker AXS in Germany was utilized to perform this test, with instrument parameters including a Lynx XE array detector, a Cu target, a ceramic X-ray tube, a voltage of 40 kV, and a current of 40 mA. Continuous scanning mode was adopted, with a range of angles of − 110 to 168° (2θ), an angle accuracy of 0.0001°, a step size of 0.02°, and a scanning speed of 0.2 s per step.

Principal component analysis

PCA belongs to the unsupervised learning branch of machine learning. The core concept is to compute new variables as linear transformations of the original variables. The new variables are called principal components (PCs); they are uncorrelated with each other and ranked in order of importance from the largest to the smallest18,20,23,31. PCA thereby achieves dimension reduction, noise removal and simplification of the data. Multiple logging indices are transformed into a few principal components that together retain most of the original logging information, so that the essential relationship between the logging and geological information is shown to the greatest extent with fewer variables. The concept can be visualized through a low-dimensional illustration, as shown in Fig. 2, in which three genes are summarized by two principal components, PC1 and PC2. For a more extensive set of log parameters, the mathematics remains unchanged, but visualizing the geometry is challenging32.

The PCA method involves three key steps33,34 (Fig. 2). First, the data sets are standardized by the standard deviation normalization method to obtain a standardized matrix. Second, the covariance matrix of the standardized variables is calculated; its main diagonal holds the variance of each variable, whereas the upper and lower triangles hold the covariances between distinct variables. Finally, the eigenvalues and eigenvectors are derived from the covariance matrix to reveal the underlying patterns in the data. The eigenvectors are then arranged in descending order of their eigenvalues, that is, of the variance each one explains. Typically, the first two components capture most of the variance of the data needed for evaluation35,36. In this study, eight conventional well logs, namely gamma ray (GR), compensated neutron log (CNL), density (DEN), uranium (U), thorium (TH), acoustic time (AC), resistivity (R), and potassium (K) logs, were selected as source data. Z-score standardization was applied to these logs before their covariance matrix was calculated. Following the core principles of PC extraction, multiple uncorrelated principal components with decreasing importance could be extracted.
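The three steps described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' workflow: the function name `pca_from_logs` is hypothetical, and the input is synthetic random data standing in for 8 logs sampled at 50 depths, not the study's measurements.

```python
import numpy as np

def pca_from_logs(X):
    """Return eigenvalues, eigenvectors and PC scores for the columns of X."""
    # Step 1: Z-score standardization (zero mean, unit standard deviation per log).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 2: covariance matrix of the standardized logs
    # (identical to the correlation matrix of the raw logs).
    C = np.cov(Z, rowvar=False)
    # Step 3: eigen-decomposition; eigh is intended for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(C)
    # Arrange components by decreasing eigenvalue (explained variance).
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Project the standardized logs onto the eigenvectors to get PC scores.
    scores = Z @ eigvecs
    return eigvals, eigvecs, scores

# Synthetic stand-in data: 50 depth samples x 8 logs (assumption, not real logs).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
X[:, 1] += 0.8 * X[:, 0]  # make two of the synthetic "logs" correlated
eigvals, eigvecs, scores = pca_from_logs(X)
```

Because the logs are standardized, the eigenvalues sum to the number of logs (8 here), and the covariance matrix of the scores is diagonal, which is what "uncorrelated principal components" means in practice.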

Fig. 2

Schematic diagram of PCA theory.

Results

Major element contents

The results of major element tests of 36 rock samples are shown in Fig. 3. The major elements clearly stratify in the vertical direction. The well sections from 4038.6 to 4041.2 m and 4042.8 to 4045.2 m have low contents of Al2O3, TiO2, and K2O and high contents of SiO2. The well sections from 4041.2 to 4042.8 m and 4045.8 to 4051.1 m have high contents of Al2O3 and TiO2 and low contents of K2O and SiO2. The well section from 4051.1 to 4060 m has low contents of Al2O3 and TiO2 and high contents of K2O and SiO2.

Fig. 3

Vertical distribution of oxide concentrations.

The vertical chemical composition of the Al-bearing layer reflects the degree of weathering and leaching37,38,39,40. During chemical weathering, alkali metals and alkaline earth metals in high-aluminium-containing sections are typically dealkalized and leached to the lower layer because of their strong activity, resulting in a high content of K in the lower layer. Al2O3 and TiO2 showed an evident positive correlation, whereas Al2O3 and SiO2 showed an evident negative correlation (Fig. 4). During bauxite mineralization, the inactive elements, Al and Ti, exhibit the geochemical behaviour of comigration and coenrichment. In the chemical weathering process, Al and Si are initially enriched by dealkalization, and then continuous leaching and desilication cause Si to migrate downwards, resulting in the accumulation of Si at the bottom of the formation.

Fig. 4

Relationships between Al2O3 and TiO2 or SiO2.

The Al-bearing rock series can be divided into seven types (Fig. 5) on the basis of the contents of silicon oxide (SiO2), aluminium + titanium oxide (Al2O3 + TiO2), and ferric oxide (Fe2O3): Fe-poor bauxite, bauxite, ferruginous bauxite, bauxitic ferruginous ore, clayey bauxite, clayey iron rock, and bauxitic clay41,42. Table 1 shows the mineral composition characteristics of the different lithologies. Only three of the seven classes shown in Fig. 5 are present in the study area: Fe-poor bauxite (for simplification, hereinafter referred to as bauxite), clayey bauxite, and bauxitic clay.

Fig. 5

Al2O3–SiO2–Fe2O3 classification diagram for high-aluminium deposits in the study area, modified from Krasnova and Rostovtseva43, with illustrative points in the study area. The lithologic names and characteristics of the seven types are shown in Table 1.

Table 1 Mineral composition characteristics of seven rock types.

XRD mineral content

The results of the whole-rock X-ray diffraction analysis (Fig. 6) indicate that diaspore and clay minerals are the main components of the Al-bearing rock. Diaspore accounts for 36.6–94.1% of the total composition and is concentrated in the middle strata at depths ranging from 4043 to 4055 m, where it exceeds 70%. The high Al2O3 content in Al-bearing rocks is caused mainly by the diaspore. Clay minerals were mostly detected in the upper and lower parts (shallower than 4043 m and deeper than 4055 m, respectively) of the aluminous series, ranging from 2.1 to 35.7% (Fig. 6). In addition, minor occurrences of potassium feldspar, pyrite, and anatase were observed. Kaolinite, montmorillonite, chlorite, and illite are the main clay minerals present in this Al-bearing rock series. Kaolinite contents range from 1.3 to 24.3%, chlorite contents range from 0.7 to 19.3%, and illite contents range from 0.2 to 6.9%. The presence of significant amounts of illite below 4055 m again indicates that soluble K was leached to the lower strata, where it formed illite and was enriched at the bottom.

Fig. 6

Mineral content from X-ray diffraction (XRD).

Logging response

On the basis of the above test data, a comparison of log values across intervals with different aluminium contents within the Al-bearing rock series of the Taiyuan Formation indicates that favourable gas formations with higher aluminium contents generally present high GR, high CNL, high DEN, high U, high TH, low AC, low R, and low K values (Table 2; Fig. 7).

Table 2 Log response characteristic values of a continuous well section of a typical Al-bearing layer.

Figure 7 illustrates the correlation between the Al2O3 content and each logging value, and thereby the relationship between the diaspore content and the logging response. An increase in diaspore content corresponds to a significant increase in U, Th, CNL, and GR values. Particularly noteworthy is the strong correlation between the GR values and the Al content, highlighting the influence of diaspore on GR. There is a clear negative correlation between the K logging value and the Al content, and the resistivity likewise decreases as the Al content increases. In contrast, AC and DEN show no obvious correlation with the Al content.

The radioactivity of bauxite is very low, but the Al-rich well section shows a high GR on the log curve. The accumulation of radioactive U and Th in bauxite due to its strong adsorption is not the reason why the gamma value is high in well logs, because aluminous rock in the lower section of the Al-bearing rock series is stronger in terms of adsorption but lower in U and Th contents. Notably, the anatase content in the bauxite layer is always positively correlated with diaspore; anatase often coexists with highly radioactive U and Th and is the key to bauxite formation, resulting in high gamma values in the log curve. However, radioactive K is not enriched in the Al-rich layer, which is directly related to the downwards leaching migration of K+ during the late evolution of the aluminous rock.

There are two main reasons for the high neutron response values in the highly aluminized strata. On the one hand, the oolites and detritus developed in bauxite often undergo strong transformation in the late diagenetic stage and form dissolution pores. Dissolution is more common during diagenesis when bean-shaped oolites and detritus are developed in reservoirs, thus leading to a higher porosity and higher content of hydrogen-containing fluid and CNL. On the other hand, bauxite inevitably contains a large amount of diaspore, and the H in its crystal structure also results in a high CNL value.

Fig. 7

Relationships between the Al2O3 content and each logging value. (a) Al2O3-U. (b) Al2O3-TH. (c) Al2O3-CNL. (d) Al2O3-GR. (e) Al2O3-DEN. (f) Al2O3-K. (g) Al2O3-AC. (h) Al2O3-RLLD.

In summary, GR, CNL, U and Th are the main log curves that respond significantly to bauxite with high aluminium contents. In addition, owing to the presence of various metal minerals, such as diaspore, anatase, and pyrite, the log curves of the bauxite layer are characterized by high density. Moreover, dissolution pores in the highly aluminized strata result in low resistivity. On the one hand, the formation water content is relatively high in the strata featuring developed pores, which constitutes the primary cause of the low resistivity. On the other hand, the intervals with developed pores are frequently those with high aluminium contents. Owing to the presence of metal elements such as Al, to a certain extent, the conductivity of the rock matrix is also increased, thereby resulting in a low resistivity. A log identification standard for Al-bearing formations is established on the basis of the Al2O3 content and conventional logging data.

In the well-developed bauxite deposits of the study area, clayey rocks are generally distributed at the top and bottom of the Al-bearing formations. Compared with bauxite, clay rocks have evident differences in GR, CNL, U, Th, R and other log curves. The K content of clay rock developed at the bottom is significantly different from that of clay rock developed at the top, which is directly related to the downwards leaching migration of K+ during the late evolution of the Al-bearing rock. Clayey bauxite and bauxitic clay rock are transitional lithologies between clay and bauxite. Clayey bauxite has higher GR, CNL, R, U, and TH values and lower AC and K values than bauxitic clay rock. According to the degree of dissolution pore development, the bauxite layer should be the first choice for effective gas reservoirs, followed by clayey bauxite and bauxitic clay rock.

On the basis of the above analysis, bauxite deposits with higher concentrations of alumina can serve as more effective reservoirs, whereas clayey bauxite and bauxitic clay with high clay contents tend to have poorer reservoir properties. To further support this view, Fig. 8 presents the core photograph, thin section, and porosity data from Well LD58 in the study area, illustrating the physical characteristics of all types of rocks.

Reservoir porosity

The porosity data from conventional tests were sourced from the Changqing Oilfield. Figure 8 clearly shows patterns of vertical variation in aluminium content and porosity throughout the different rock layers of Well LD58. For the clayey bauxite and bauxitic clay layers in the upper well section (4042–4045.2 m), the relatively dense core has an Al2O3 content of less than 40.0% and an average porosity of 7.74%. Although clumps and clastic layers are distributed throughout, there is no significant dissolution, and pyrite and siderite nodules can sometimes be observed. In the bauxite formation (4045.2–4053.5 m), the core is characterized by porous structural features. The slide deformation observed in the thin section of the rock suggests that local slip and collapse occurred during the early or middle stage of sediment-consolidated diagenesis, as the sediments were not yet fully consolidated. The oolites and beans were extensively dissolved, and residual oolitic and clastic materials developed, creating pores of various sizes. This well section contains a significant reservoir in the Al-bearing rock series, and the Al2O3 content is high, averaging 60% and reaching up to 75%. The average porosity of the bauxite layer is 15.66%, with a maximum value of 28.7%. The development of secondary dissolution pores that contribute significantly to porosity indicates that bauxite underwent chemical dissolution during diagenesis. This dissolution eventually led to the leaching of soluble elements and the enrichment of Al. In the lower well section (4053.5–4057.8 m), the clayey bauxite and bauxitic clay rocks are dominated by clay minerals, with an average Al2O3 content of 40.0% and a porosity of 7.75%. The particles exhibit flattening and elongation along the surface, and sedimentary lamination is well developed. 
The thin section shows that the breccia is microcrystalline bauxite with subangular shapes, indicating that the previously formed bauxite rock underwent some transportation and abrasion again, but the corners were not completely rounded, and some original shapes were retained. Secondary dissolution pores are developed in this layer, but aluminium ions or other Al-containing substances in the fluid reprecipitated under appropriate conditions, forming diaspore and completely filling the original dissolution pore space, resulting in poorly developed pores in this section.

Fig. 8

Comparison of aluminium content and reservoir porosity in different lithologies of Well LD58.

Figure 8 shows that bauxite deposits with higher Al2O3 concentrations exhibit improved porosity due to mineral dissolution. In contrast, rocks with high clay contents tend to have lower porosities, making it difficult for hydrocarbons to migrate and accumulate. The above segmented structure of the bauxite series is closely related to sea level rise and fall. The entire sedimentary environment of the late Palaeozoic North China landmass was an epicontinental sea. When the sea level fell, the original sediment was exposed at the surface and subjected to weathering, leaching, activation and migration of soluble elements, and enrichment of relatively stable elements such as aluminium, resulting in an increase in the Al2O3 content and the formation of dissolution pores. When the sea level rose, the sediment was under water and was exposed to a reducing environment. Silicification occurred, aluminium was replaced, and the aluminium content decreased to form a clay layer with high silicon and low aluminium contents. Owing to the limited circulation of water, dissolution pores did not develop. Therefore, alumina-rich bauxite clearly serves as a more effective reservoir than clay-rich rocks. To summarize, the data presented in Fig. 7 support our main thesis, which is that a high alumina concentration is a critical factor in determining the effectiveness of gas reservoirs. Specifically, when the Al2O3 content within the reservoir rock exceeds 60%, distinct dissolution pores are generated within the rock, and this type of reservoir consistently exhibits a relatively high hydrocarbon content in gas logging. Consequently, these reservoirs can be initially recognized as effective ones. With this insight, the exploration of bauxite gas resources can be better predicted and optimized.

Principal component

After linear dimension reduction of the standardized data, a total of 8 principal components (PCs) were generated (Table 3); the eigenvalues of Components 1 and 2 (denoted Y1 and Y2, respectively) were both greater than 1, at 4.417 and 2.041, respectively. The contribution ratios of Y1 and Y2 were 55.217% and 25.509%, respectively. Together, Y1 and Y2 account for 80.726% of the total variance, indicating that these two components are sufficient to represent the formation logging information with little loss of information and can better show the essential trend44. Therefore, the first two components are used as the main components.
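The contribution ratios quoted above follow directly from the eigenvalues. Because the 8 logs are standardized, the total variance equals 8, so each ratio is the eigenvalue divided by 8. The short sketch below reproduces that arithmetic from the two rounded eigenvalues given in the text; small differences in the third decimal place relative to the reported percentages are expected from rounding. The variable names are illustrative only.

```python
# Eigenvalues of the first two components as reported in the text (rounded).
eigenvalues = {"Y1": 4.417, "Y2": 2.041}
n_logs = 8  # standardized logs, so total variance equals the number of logs

# Contribution ratio of each component, in percent of total variance.
ratios = {name: 100.0 * val / n_logs for name, val in eigenvalues.items()}
cumulative = sum(ratios.values())

# Eigenvalue-greater-than-one screening applied to the reported components.
retained = [name for name, val in eigenvalues.items() if val > 1.0]

for name, pct in ratios.items():
    print(f"{name}: {pct:.2f}% of total variance")
print(f"Y1 + Y2: {cumulative:.2f}%")
```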

Table 3 PCA results. Y1, Y2…, and Y8 represent the 8 principal components.

Combined with the weight coefficient (feature vector), the two principal components Y1 and Y2 can be expressed as follows:

$${Y_1} = - 0.078 \times {Z_{AC}} + 0.427 \times {Z_{CNL}} + 0.064 \times {Z_{DEN}} + 0.466 \times {Z_{GR}} - 0.287 \times {Z_R} + 0.443 \times {Z_{TH}} + 0.445 \times {Z_U} - 0.338 \times {Z_K}$$
(1)
$$\begin{gathered} {Y_2}= - 0.641 \times {Z_{AC}} - 0.106 \times {Z_{CNL}}+0.662 \times {Z_{DEN}} - 0.031 \times {Z_{GR}} \\ - 0.104 \times {Z_R}+0.035 \times {Z_{TH}}+0.085 \times {Z_U}+0.344 \times {Z_K} \\ \end{gathered}$$
(2)

By comparing the weight coefficients of the logging standard values, it can be inferred that the first component, Y1, predominantly reflects lithologic information on the basis of its higher weights for GR, U and TH. Similarly, the second component, Y2, primarily represents pore information as indicated by its higher weights for AC and DEN.
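As a hedged illustration of Eqs. (1) and (2), the sketch below applies the published weight coefficients to a vector of Z-scored log values. The function name `principal_components` and the sample vector are hypothetical; converting raw logs to Z-scores requires the per-log means and standard deviations of the data set, which are not listed in this section.

```python
import numpy as np

LOGS = ["AC", "CNL", "DEN", "GR", "R", "TH", "U", "K"]
# Weight coefficients copied from Eqs. (1) and (2), in the order of LOGS.
W1 = np.array([-0.078, 0.427, 0.064, 0.466, -0.287, 0.443, 0.445, -0.338])
W2 = np.array([-0.641, -0.106, 0.662, -0.031, -0.104, 0.035, 0.085, 0.344])

def principal_components(z):
    """Map Z-scored logs (last axis ordered as LOGS) to the pair (Y1, Y2)."""
    z = np.asarray(z, dtype=float)
    return z @ W1, z @ W2

# Hypothetical sample: CNL, GR, TH and U one standard deviation above the mean,
# R and K one standard deviation below, AC and DEN at the mean.
z_sample = np.array([0.0, 1.0, 0.0, 1.0, -1.0, 1.0, 1.0, -1.0])
y1, y2 = principal_components(z_sample)
```

A quick consistency check on the published coefficients is that each weight vector has approximately unit length and that the two vectors are approximately orthogonal, as expected for eigenvectors of a symmetric covariance matrix.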

Discussion

Lithological division and identification

According to the lithological identification results from the cross plot from Well LD58 (Fig. 9), the distribution of log data of different lithologies has a large overlap zone, and there is no clear boundary between different lithologies. On the basis of Eqs. (1) and (2), the major element test data from 31 Al-bearing rock samples were used to conduct PCA. During PCA-based lithological identification, only Al-bearing rock samples were utilized, as siliceous rocks and carbonaceous mudstones could be identified without requiring special treatment. The PCA results, the principal components of Y1 and Y2 and the comprehensive score Y of the 8 conventional log curves produced the correlations shown in Fig. 10.

Fig. 9

Lithology identification intersection diagram in the study area. (a) AC-GR. (b) AC-U. (c) AC-Th. (d) AC-K.

Fig. 10

Relationships among the principal component, aluminium content and lithology. (a) Y-Al2O3. (b) Y1-Al2O3. (c) Y2-Al2O3. (d) Y-Y1. (e) Y-Y2. (f) Y2-Y1.

The results show that the principal component Y1 and comprehensive score Y increase linearly with increasing aluminium content, whereas Y2 tends to decrease with increasing aluminium content. The lithology shows evident linear zonation in the Y-Y1 chart. Bauxite is distributed mainly in the Y1 ≥ 2 and Y ≥ 1 regions, clayey bauxite is distributed in the 2 > Y1 ≥ −0.5 and 1 > Y ≥ −0.5 regions, bauxitic clay rock is distributed in the −0.5 > Y1 ≥ −2 and −0.5 > Y ≥ −1 regions, and clay rock is located in the Y1 < −2 and Y < −1 regions. The lithology presents evident two-dimensional zoning in the Y-Y2 and Y1-Y2 charts. Owing to some overlap of Y2 for different lithologies, the lithology can be divided according to Y2 as an alternative scheme. An effective bauxite gas reservoir has the highest Y and Y1 and the lowest Y2, whereas clay rock has the lowest Y and Y1 and the highest Y2.
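The Y-Y1 boundaries listed above translate into a simple lookup. The following is a minimal sketch under one added assumption: when the Y1 zone and the Y zone disagree, the sample is returned as "undetermined" (the text does not say how such cases were handled). The function name is hypothetical.

```python
def classify_lithology(y1, y):
    """Classify an Al-bearing sample from Y1 and the comprehensive score Y."""
    def zone(value, upper, middle, lower):
        # Map a score to a zone index: 3 is the highest zone, 0 the lowest.
        if value >= upper:
            return 3
        if value >= middle:
            return 2
        if value >= lower:
            return 1
        return 0

    names = ["clay rock", "bauxitic clay rock", "clayey bauxite", "bauxite"]
    zone_y1 = zone(y1, 2.0, -0.5, -2.0)   # Y1 boundaries from the Y-Y1 chart
    zone_y = zone(y, 1.0, -0.5, -1.0)     # Y boundaries from the Y-Y1 chart
    # Assumption: require both criteria to agree, otherwise flag the sample.
    return names[zone_y1] if zone_y1 == zone_y else "undetermined"

label = classify_lithology(2.5, 1.2)   # falls in the bauxite region
```

Samples flagged as undetermined would be candidates for the alternative Y2-based scheme mentioned in the text.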

Al2O3 content prediction and reservoir location

The linear regression model of the Al2O3 content (TAL) and principal components Y1 and Y2 is as follows:

$$TAL=9.012 \times {Y_1} - 0.289 \times {Y_2}+42.916$$
(3)

By substituting Eqs. (1) and (2) into Eq. (3) and reversing the standardization, the Al2O3 content can be rewritten directly in terms of the log values as follows:

$$\begin{aligned} TAL&= - 0.011AC+0.147CNL+1.160DEN+0.023GR \\ &\quad - 0.121R+0.132TH+0.549U - 2.073K+19.235 \end{aligned}$$
(4)

Equation (4) shows that the Al2O3 content of the bauxite gas reservoir is negatively correlated with AC, R and K and positively correlated with CNL, DEN, GR, TH and U, which is consistent with the well logging geological response characteristics of bauxite reservoirs mentioned previously.
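The sign pattern of Eq. (4) can be checked by substituting Eqs. (1) and (2) into Eq. (3). The sketch below does this for the standardized logs; converting to raw-log coefficients additionally requires each curve's mean and standard deviation, which are not given in this section, so `raw_coefficients` takes them as hypothetical inputs.

```python
LOGS = ["AC", "CNL", "DEN", "GR", "R", "TH", "U", "K"]
W1 = [-0.078, 0.427, 0.064, 0.466, -0.287, 0.443, 0.445, -0.338]   # Eq. (1)
W2 = [-0.641, -0.106, 0.662, -0.031, -0.104, 0.035, 0.085, 0.344]  # Eq. (2)
B1, B2, B0 = 9.012, -0.289, 42.916                                 # Eq. (3)

# Coefficient of each standardized log Z_j in TAL: B1 * w1_j + B2 * w2_j.
z_coeff = {name: B1 * w1 + B2 * w2 for name, w1, w2 in zip(LOGS, W1, W2)}

def raw_coefficients(mean, std):
    """Undo Z = (X - mean) / std; mean and std are dicts keyed by curve name."""
    coeff = {n: z_coeff[n] / std[n] for n in LOGS}
    intercept = B0 - sum(z_coeff[n] * mean[n] / std[n] for n in LOGS)
    return coeff, intercept

negative = sorted(n for n, c in z_coeff.items() if c < 0)
positive = sorted(n for n, c in z_coeff.items() if c > 0)
```

Because each standard deviation is positive, the rescaling cannot flip a sign, so the negative set (AC, K, R) and positive set (CNL, DEN, GR, TH, U) obtained here carry over to Eq. (4).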

Figure 11 shows the prediction results with and without PCA. The Al2O3 content calculated via the PCA-based regression model is in good agreement with the test results. The fitting coefficient is 0.93 (Fig. 11a), which indicates that the model predicts the Al2O3 content well and can be used to indicate favourable reservoir strata with high aluminium contents. Compared with PCA-based regression, direct multiple linear regression of the Al2O3 content on the log data is a more concise statistical method, and its fitting formula is as follows:

$$TAL=\sum\limits_{{i=1}}^{m} {({\beta _i} \times {X_i})} +{\beta _0},(i=1, \ldots ,m)$$
(5)

According to the principle of least squares, the fitting coefficients βi of the log curves Xi and the intercept β0 can be obtained via the Cholesky decomposition method:

$$\begin{aligned} TAL&= - 0.032AC+0.273CNL - 3.678DEN+0.055GR \\ &\quad - 0.133R+0.034TH - 0.165U+0.170K+26.943 \end{aligned}$$
(6)
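A least-squares fit of Eq. (5) via Cholesky decomposition can be sketched as below. This assumes the normal-equations formulation, which is the usual setting for a Cholesky-based solve; the function name and the synthetic data are hypothetical and unrelated to the study's samples.

```python
import numpy as np

def fit_ols_cholesky(X, y):
    """Return (beta_0, beta) minimizing ||y - beta_0 - X @ beta||^2."""
    # Design matrix with a leading column of ones for the intercept.
    A = np.column_stack([np.ones(len(X)), X])
    # Normal equations (A^T A) b = A^T y; A^T A is symmetric positive
    # definite when the columns of A are linearly independent.
    G = A.T @ A
    rhs = A.T @ y
    L = np.linalg.cholesky(G)          # G = L @ L.T, L lower triangular
    w = np.linalg.solve(L, rhs)        # forward step:  L w = rhs
    b = np.linalg.solve(L.T, w)        # backward step: L^T b = w
    return b[0], b[1:]

# Synthetic, noise-free example with known coefficients.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = 2.0 + X @ np.array([1.0, -2.0, 0.5])
beta0, beta = fit_ols_cholesky(X, y)
```

`np.linalg.solve` is a general solver used here for brevity; a dedicated triangular solver would exploit the structure of L. When the predictors are strongly correlated, A^T A becomes ill-conditioned and the fitted coefficients can become unstable.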
Fig. 11

Al2O3 content from ICP and prediction. (a) With PCA. (b) Without PCA.

The fitting degree (R2) between the predicted and measured values was 0.949 (Fig. 11b), exceeding that of the PCA-based regression. However, log curves such as GR, U and TH are inherently correlated, so the linear regression model must be assessed for multicollinearity. Each independent variable in Eq. (5) was regressed against the other independent variables, and the tolerance (T) and variance inflation factor (VIF) were then computed (Table 4). The results indicate that the regression model without PCA exhibits multicollinearity, as evidenced by a tolerance value below 0.1 and a VIF exceeding 10. As geological researchers, we strive not only for enhanced model accuracy but also for a sound geological interpretation of the independent variables, and the weight coefficients of Eq. (6) are not consistent with the geological significance of each log curve. Our major element tests show that the Al2O3 content is strongly positively correlated with U, with a correlation coefficient of 0.66, whereas the coefficient of U in the multiple regression is negative (−0.165). Moreover, an effective reservoir is closely related to the leaching of soluble K+, which implies that the Al content should be negatively correlated with the K content, yet the coefficient of K in Eq. (6) is positive (+0.170). These inconsistencies are caused by the strong correlation among the logs, which leads to serious multicollinearity among the logging variables. Therefore, multiple regression is feasible as a statistical method for relating the variables, but it does not provide a reasonable geological interpretation of the logs.

Table 4 Results of the multicollinearity test of the linear regression algorithm.
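The multicollinearity test described above can be sketched as follows: each predictor is regressed on the others, tolerance is 1 − R2 of that auxiliary regression, and VIF is its reciprocal. The function name and the synthetic predictors are hypothetical; they only mimic two strongly correlated curves and one independent curve.

```python
import numpy as np

def tolerance_and_vif(X):
    """Return (tolerance, VIF) arrays, one entry per column of X."""
    n, m = X.shape
    tol = np.empty(m)
    for j in range(m):
        target = X[:, j]
        # Auxiliary regression of column j on all other columns plus intercept.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        tol[j] = 1.0 - r2
    return tol, 1.0 / tol

# Synthetic example: columns 0 and 1 are nearly collinear, column 2 is not.
rng = np.random.default_rng(2)
a = rng.normal(size=300)
b = a + 0.05 * rng.normal(size=300)
c = rng.normal(size=300)
tol, vif = tolerance_and_vif(np.column_stack([a, b, c]))
```

In this synthetic case the two collinear columns fall below the tolerance cutoff of 0.1 and above the VIF cutoff of 10, while the independent column stays near a VIF of 1.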

In addition to dimension reduction and linear regression, we employed random forest (RF), decision tree classifier (DTC), and support vector machine (SVM) algorithms to predict the Al2O3 content (Fig. 12). Compared with the results in Fig. 11, the different approaches yield distinct errors and levels of fit. The SVM algorithm, despite its higher complexity, has a relatively low R2 value, whereas the remaining three algorithms show a relatively high goodness of fit. The PCA regression method may yield slightly lower R2 values than the RF, linear regression, and DTC methods because the retained components do not carry a 100% contribution ratio. Notably, this study aims to find an algorithm that can rapidly and effectively identify potential reservoirs while providing information for subsequent logging data mining and geological research. It is therefore crucial not only to achieve high accuracy but also to ensure that the model reflects geological phenomena and laws. Although the DTC, RF, and linear regression algorithms achieved relatively high numerical accuracy, geologists prefer visually interpretable models that suit geological research while retaining reasonable accuracy. The PCA-based formula for the Al2O3 content yields relatively accurate predictions, and its advantage over the RF, DTC, and SVM algorithms is that the relationship between the source data and the target data is not a black box; that is, the correlation between the Al2O3 content and the well logging data can be visualized. Consequently, for field applications and geological research, the PCA-based regression equation is more applicable to reservoir identification.

Fig. 12

The prediction results of RF, DTC, and SVM.

Application effect

The above research results were applied to the lithological identification of the bauxite-bearing interval of the Taiyuan Formation in Well LD58. Coal seams and carbonaceous mudstones can be identified directly from the acoustic and density curves. The lithology of the aluminous rocks is determined from the boundary values of Y, Y1, and Y2 corresponding to the different lithologies in Fig. 10. For example, in the well sections from 4043.2 to 4044.7 m and from 4046.3 to 4053.7 m in Fig. 13, where Y1 ≥ 2 and Y ≥ 1, the lithology is identified as bauxite according to Fig. 10; similarly, in the well sections from 4040.1 to 4043.2 m and from 4053.7 to 4055.2 m in Fig. 13, where 2 > Y1 ≥ −0.5 and 1 > Y ≥ −0.5, the lithology is classified as clayey bauxite. The principal components Y1 and Y2 and the comprehensive score Y of the conventional log curves show evident stratification (Fig. 13).

Fig. 13

Lithologic division of the Al-bearing layer in the Taiyuan Formation of Well LD58.

In Fig. 13, the sixth, seventh and eighth tracks represent the principal components Y1 and Y2 and the comprehensive score Y, respectively. The lithological identification results, which are based on the Al2O3 content and the PC interpretation chart (Fig. 10), are shown in the last track. The Al-bearing layer of the Taiyuan Formation in Well LD58 can be divided into 10 intervals. Two bauxite layers, from 4043.2 m to 4044.7 m and from 4046.3 m to 4053.7 m, are identified. Compared with the ICP results, the regressions both with and without PCA capture the variations in the Al content of the strata; however, the PCA-based regression yields more precise results in the bauxite layer, which indicates that this method is well suited to identifying the lithology of the Al-bearing layer in the Longdong area.

Using the Al2O3 content prediction equation, another well (LD47) was selected for lithological identification and reservoir prediction (Fig. 14). The sixth track is the Al2O3 content predicted via PCA. On the basis of the relationship between the Al2O3 content and the degree of pore development (Fig. 8), the high-aluminium section with an Al2O3 content exceeding 60% is preliminarily identified as an effective reservoir (the red part in the sixth track). The seventh track shows the gas logging results; the high values in the red well section of the sixth track mark a potential gas zone with a relatively high hydrocarbon content, which further validates reservoir prediction based on the Al2O3 content. The PCA-based Al2O3 predictions reveal two high-aluminium intervals at the bottom of the Taiyuan Formation (P1t) in Well LD47: one from 4101.4 to 4105.8 m and the other from 4112.4 to 4119.8 m. The gas logging data from these two intervals show excellent characteristics, with a peak value of 86.8149% and a base value of 2.4753%, which aligns remarkably well with the predicted Al-rich segments.

Fig. 14

Effective reservoir prediction based on the Al2O3 content in Well LD47.
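The 60% Al2O3 cutoff used to flag effective reservoir sections can be applied to a predicted curve as sketched below. The function name, depths and Al2O3 values are hypothetical illustration data, not readings from Well LD47.

```python
def high_al_intervals(depths, tal, cutoff=60.0):
    """Return (top, base) depth pairs of contiguous samples with TAL > cutoff."""
    intervals = []
    start = None   # top depth of the interval currently being tracked
    last = None    # deepest sample seen so far inside that interval
    for depth, value in zip(depths, tal):
        if value > cutoff:
            if start is None:
                start = depth
            last = depth
        elif start is not None:
            # The run above the cutoff has ended; store it.
            intervals.append((start, last))
            start = None
    if start is not None:
        # The curve ended while still above the cutoff.
        intervals.append((start, last))
    return intervals

# Hypothetical depth (m) and predicted Al2O3 content (%) samples.
depths = [4100.0, 4101.0, 4102.0, 4103.0, 4104.0, 4105.0]
tal = [40.0, 62.0, 65.0, 55.0, 61.0, 70.0]
zones = high_al_intervals(depths, tal)
# zones == [(4101.0, 4102.0), (4104.0, 4105.0)]
```

The returned depth pairs can then be compared against gas logging highs, as done for the two intervals discussed in the text.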

The production test curves of the two potential intervals are presented in Fig. 15 and confirm the reliability of the regression model established in this study for identifying effective reservoirs in Al-bearing formations. The perforated intervals were set at 4103 to 4106 m and 4113 to 4115 m. The annulus volume fracturing technique was employed, achieving a high industrial gas flow rate of 67.3832 × 10⁴ m³/day (AOF). A trial production period of 45 days was conducted, with initial oil and casing pressures of 26.72 and 26.75 MPa, respectively. In the initial stage, production was carried out at 30,000 m³/day, with rapid decreases in oil pressure and casing pressure. After 35 days, the production allocation was adjusted to 20,000 m³/day. Throughout the trial production period, a decrease in liquid production was accompanied by an increase in chlorine content, indicating a favourable flowback effect. Posttest production analysis revealed daily decreases of 0.19 MPa and 0.20 MPa for the casing pressure and oil pressure, respectively. The cumulative gas production reached 1,245,166 m³, whereas the cumulative water production reached only 73.6 m³.

Fig. 15

Production test performance curves of potential intervals of Well LD47.

Conclusions

(1) Major elements show regular stratification in the vertical direction. During bauxite mineralization, the comigration and coenrichment of the inactive elements Al and Ti led to an evident positive correlation between Al2O3 and TiO2. Leaching and desilication caused an evident negative correlation between Al2O3 and SiO2, and Si was enriched at the bottom of the Al-bearing formation. The soluble K was leached to the lower strata to form illite, which was enriched at the bottom. According to the Al2O3 content, aluminous rock can be divided into clay rock, bauxitic clay rock, clayey bauxite and bauxite.

(2) A bauxite deposit with a high aluminium content is an effective gas reservoir with dissolved pores, showing the characteristics of “five high and three low” (high GR, CNL, DEN, U and TH and low AC, R and K). The lithological boundaries of Al-bearing rock are not clear in conventional log cross plots. On the basis of linear dimension reduction, the four lithologies can be clearly distinguished in the Y-Y1, Y-Y2 and Y1-Y2 charts. Moreover, the effective bauxite reservoir has the highest Y and Y1 and the lowest Y2. However, more test and well log data are needed in the database underlying the division chart to further refine the boundaries of Y, Y1, and Y2 for Al-bearing lithologies.

(3) The prediction model of the Al2O3 content established via PCA-based regression not only fits the test results reasonably well but also reflects how the Al content of Al-bearing rocks varies with the logging values, avoiding the phenomenon of “value true but meaning false” caused by multicollinearity in multiple regression. The lithological identification chart and the major element content prediction equation established in this paper have good applicability to the identification of effective bauxite gas reservoirs.