Introduction

Soil biodiversity constitutes an essential fraction of the biodiversity on Earth1,2,3. It is diverse and complex, comprising microorganisms and fauna, and it strongly influences the functioning of terrestrial ecosystems, including climate regulation, plant growth, nutrient cycling and food production4,5. As the Earth enters the Anthropocene and faces another wave of mass extinctions, timely documentation of soil biodiversity, abundance, and biogeographical patterns is essential6. The current understanding of the macroecological and biogeographical functioning of soil biota has advanced considerably in recent years owing to the availability of information on the abundance of soil biota and their spatial patterns at the community level, as well as climate modeling tools2,7,8,9,10. To explore these ecosystem functions more fully, it is imperative to understand the global or regional distribution of multiple soil biota components.

In the early 21st century, the American scholars Fierer and Jackson11 pioneered the study of the geographical distribution of soil bacterial diversity on a continental scale. Since then, the geographical distribution of bacterial12,13, fungal12,14 and archaeal diversity15,16 has been studied on a global scale. In recent years, most global studies of large-scale taxonomic diversity patterns in soil microorganisms have employed molecular sequencing methods, providing valuable semi-quantitative information about taxonomic diversity17,18. However, other soil organisms have received far less attention. Soil fauna (e.g., arthropods, nematodes, and earthworms) is also an important repository of biodiversity that supports and regulates ecosystem services19. Currently, only a small number of soil fauna groups, such as earthworms20, springtails21, nematodes22, protozoa23, mites24 and ants10, have been analyzed by integrating data from the available literature to characterize their global geographic distribution patterns. In contrast to the classical ecological concept that species diversity declines with increasing latitude, current evidence suggests different latitudinal diversity gradients for above- and belowground species25,26. For instance, aboveground species diversity often peaks near the equator, but the species richness of soil organisms (e.g., microbes, nematodes, earthworms) does not coincide with the latitudinal gradient of vegetation diversity11,20,22,27,28. Overall, the α diversity of termites29, oribatid mites30 and springtails21 is predicted to be highest in the tropics, whereas earthworms tend to have the highest α diversity in the temperate zones20. Parasitic protozoa are particularly abundant in the tropics23, while nematode diversity is highest in temperate zones (30°−40°), followed by the tropics31. Nevertheless, many soil fauna groups, such as beetles, spiders, diplopods, chilopods and pseudoscorpions, have not been profiled on a regional or global scale.
Furthermore, in contrast to the global distribution of aboveground organisms (plants and aboveground animals), which depends mainly on energy supply and water, the distributional patterns of soil fauna, and especially the mechanisms underlying them, remain poorly understood20,32,33.

Although there has been renewed interest in large-scale biogeographic studies of soil fauna, executing such studies is challenging. First, the methods used by different researchers to collect, classify and identify soil fauna are inconsistent. Second, sampling sites are sparse, a large number of species remain undescribed or incompletely captured, and representative areas such as ecological restoration areas are poorly studied. Third, owing to the lack of technological and methodological innovations, taxonomic identification is not comprehensive, and the available studies rely on time-consuming and labor-intensive morphological identification. In parallel, when describing a complex system such as the biosphere, a fundamental component of global biodiversity may be overlooked if there is no accurate estimate of soil biodiversity34.

Quantitative descriptions of biomass distributions enable the assessment of biological carbon sequestration and the modeling of global biogeochemical cycles, and they help in understanding the historical and future effects of human activities35,36,37,38. Nevertheless, owing to technical and methodological limitations, most soil fauna population studies rely on sampling techniques that measure trends in populations or biomass rather than absolute numbers in an area. Moreover, the current understanding of the global and regional distribution of soil fauna biomass and abundance is limited.

The Loess Plateau is a special region undergoing restoration and regenerative practices, with vegetation cover increasing from 31.6% in 1999 to 59.6% in 2013 and 71% in 2020 (refs. 39,40). The process of vegetation restoration inevitably induces changes in the structure and diversity of soil fauna communities, either directly or indirectly41. Grassland covers the largest area of the Loess Plateau, about 300,000 km2, and extends from the warm-temperate semi-humid zone in the southeast, through the semi-arid zone, to the middle-temperate semi-arid zone in the northwest. The Loess Plateau grassland is therefore a natural setting in which to study the geographic distribution of soil fauna and its drivers. Biogeographic information on the abundance and activity of soil biota is critical for climate modeling and for guiding the formulation of environmental policies7,8,9. Consequently, studying soil fauna geography and assessing soil fauna biomass in a restoring ecosystem such as the Loess Plateau will reveal biodiversity at the regional scale and improve the design of biogeochemical models.

Here, we report scarce area-based data on soil fauna distribution and biomass obtained through extensive field sampling. Our study aimed to (1) describe the current spatial distribution and community structure of soil fauna; (2) identify environmental factors affecting soil fauna assemblages; (3) quantify regional total soil fauna populations and biomass; and (4) combine the relationships between soil fauna and environmental drivers to predict trends in soil fauna communities during vegetation restoration. This comprehensive assessment of the assemblage structure of soil fauna on the Loess Plateau, and especially the assessment of soil fauna biomass, provides new ideas and methodological references for future research on soil fauna. This study adds to the current understanding of the ecological function of soil fauna in restoring ecosystems.

Results

A total of 26,924 individuals belonging to 3 phyla, 10 classes, 21 orders and 4 suborders were collected from 95 sample areas at 19 grassland sites across the Loess Plateau (Fig. 1). Overall, the study covered all major biomes and captured soil fauna with an average density of 20,997 ind. m−2. Among the identified groups, Oribatida (6510 ind. m−2, 31.01%), Mesostigmata (6095 ind. m−2, 29.03%), and Collembola (4607 ind. m−2, 21.94%) were the dominant groups, Prostigmata (1796 ind. m−2, 9.41%) and Astigmata (1166 ind. m−2, 5.55%) were common groups, and the rest were considered rare groups. Notably, the mean biomass of soil fauna in grassland was 18.62 g m−2; the highest average biomass was recorded for Hemiptera (3.86 g m−2) and the lowest for Pauropoda (<0.01 g m−2).

Fig. 1: A phylogenetic tree.

Population density (A) and biomass (B) of the taxonomic groups included in our analysis of soil fauna. Bars represent mean values.

Specifically, the population density and biomass of soil fauna decreased significantly with increasing latitude (Fig. S1). Soil fauna density varied c. 600-fold across latitudes, with maximum densities recorded in Fufeng (mean 82,465 ind. m−2) and minimum densities in Balakong (mean 140 ind. m−2). The average biomass of all soil fauna ranged from about 82.32 g m−2 in Fufeng to about 0.08 g m−2 in Yinchuan. This indicates that the southern regions supported more soil fauna than the northern areas. However, no significant relationships were observed between soil fauna population characteristics and longitude.

The total soil fauna population in the grasslands of the Loess Plateau was estimated at ≈4.06 × 10^15 individuals (Fig. 2). The population was dominated by mites and springtails, whose density ranged from about 85,073 ind. m−2 in the southern semi-humid zone to roughly 142 ind. m−2 in the northwestern arid zone. Mites and springtails accounted for >95% of the total population of soil fauna, with about three-quarters of these being mites. The total regional biomass of soil fauna was estimated at ≈4.26 × 10^6 t of dry weight. Among taxonomic units, Hemiptera contributed the most to soil fauna biomass (20.72%), while springtails and mites contributed 8.48% and 10.80%, respectively. In general, at the regional scale of the Loess Plateau, density and biomass exhibited a zonal spatial pattern, decreasing from southeast to northwest.

Fig. 2: Distribution patterns of density (ind. m−2) and biomass (g m−2) in grasslands of the Loess Plateau based on spatial interpolation of actual sampling points.

The color gradient indicates density and biomass, with low values shown in blue and high values shown in red.

Regression analysis (Figs. S2 and S3) showed that both density and biomass increased with SWC, EC, SOC, TN, TP, AP, LBM, LOC, LTN, LTP, TEM, PRE, and NDVI, but decreased with increasing SBD, pH, and ELE. Similarly, correlation analyses showed that density and biomass were significantly (P < 0.05) negatively correlated with SBD, pH, and ELE but positively (P < 0.05) correlated with the other variables (Table S1). Notably, PRE was found to be a significant predictor of density and biomass, at 72.48% and 58.00%, respectively (Fig. S4). Moreover, when the variables were grouped, climate, soil, geography, and plant properties contributed 72.12% and 61.01%, 11.76% and −1.7%, 11.18% and 9.07%, and 4.94% and 31.61% to the density and biomass models, respectively. Further analysis revealed that PRE and TEM were positively correlated with density and biomass. These results indicate that TEM and PRE were more important than soil, geography, or plant properties in influencing fauna density and biomass. Therefore, TEM and PRE were selected as the main parameters for the soil fauna distribution prediction models (Table S2). In the constructed prediction model, climatic factors predicted 63.35% and 82.48% of the fauna density and biomass distribution, respectively. More than 20 years after the implementation of the Grain-for-Green project, we estimate that regional fauna density and biomass have gradually increased over time, while retaining a distribution pattern that decreases from southeast to northwest (Figs. 3 and 4). Moreover, by a conservative estimate, the individuals and biomass of soil fauna increased by about 27.72% and 29.09%, respectively, after the implementation of the Grain-for-Green project.

Fig. 3

The predicted distribution of fauna density (ind. m−2) in different years.

Fig. 4

The predicted distribution of fauna biomass (g m−2) in different years.

At the regional scale, the distribution patterns of fauna individuals and biomass were influenced mainly by climatic factors but also by anthropogenic disturbance. Here, anthropogenic disturbance does not refer to a specific experimental treatment; it is represented by a proxy, namely changes in grassland area across the region. The area of grassland on the Loess Plateau first increased and then decreased between 2000 and 2022 (Fig. S5). Accordingly, when only anthropogenically induced changes in grassland area were considered, fauna individuals and biomass first increased and then decreased over time (Fig. 5, Figs. S6 and S7). In contrast, individuals and biomass of soil fauna increased gradually over time when only climatic factors were considered (Fig. 5, Figs. S8 and S9). Fauna individuals and biomass also increased gradually over time when anthropogenic and climatic factors were combined (model prediction). When only human-caused changes in grassland area were considered, soil fauna individuals and biomass decreased by 4.30% and 4.83%, respectively, between 2000 and 2022. When only climatic factors (temperature and precipitation changes) were considered, these values increased by 30.04% and 31.69%, respectively. Climatic factors were found to be the primary drivers of change, contributing 99.99% and 99.78% to the predicted values for fauna individuals and biomass, respectively. Anthropogenic factors contributed a smaller portion, 72.41% and 73.97%, respectively.

Fig. 5

Contribution of anthropogenic and climatic changes to the individuals and biomass of soil fauna.

Discussion

By using consistent sampling methods across different climatic zones, this study overcame the limitations of regional soil fauna distribution data. This allowed us to identify a clear pattern of soil fauna distribution on the Loess Plateau. Specifically, Collembola (springtails) and Acari (mites) were the dominant taxa in this region, accounting for more than 95% of the total individuals. Globally, mites and springtails account for more than 95% of the total population of soil arthropods, of which about two-thirds are mites42. Springtails and mites have been shown to be the dominant taxa in many soil ecosystems43,44,45. The abundance and diversity of mites and springtails drove the overall patterns, and the ratio of Acarina to Collembola (A/C) has been used as an important bioindicator of environmental stability46. In this study, the average A/C ratio for the entire region was 3.4, which was lower than that recorded in subalpine meadows and the Songnen grasslands, and lower than that of Robinia pseudoacacia and Pinus tabuliformis forests in the same region43,47,48,49. The lower A/C value in the present study may reflect limited water availability, similar to results obtained in arid and semi-arid regions where soil moisture was the most important factor influencing soil fauna50,51. Moreover, it has been shown that mites have a stronger need for soil moisture than springtails52. Variations in vegetation properties underlie the distribution patterns of soil fauna at local scales. Specifically, plant functional types, diversity levels, and litter quantity and quality fundamentally shape soil fauna community composition and biodiversity. Direct effects arise when plant communities alter faunal food resources and microhabitat conditions through inputs of aboveground biomass and belowground root exudates.
Indirect pathways involve modifications to soil hydrothermal regimes resulting from differences in plant community composition, diversity, and functional types, thereby influencing soil fauna assemblages41. Of note, the restoring ecosystems in our study represent a distinct land-use transformation, which created a dynamic assemblage of Acarina and Collembola in response to large changes in food resources and habitats.

Whether soil fauna abundance peaks at high or low latitudes has long been debated in studies of the geographic distribution of soil fauna diversity. Studies on soil fauna, particularly earthworms, ants, springtails, and nematodes, have reported conflicting results at varying latitudes10,20,21. However, distribution patterns observed for single soil fauna taxa are not easy to translate to soil fauna assemblages. In the present study, we found that the density of soil fauna assemblages on the Loess Plateau decreased significantly as latitude increased, which is similar to the pattern reported in most studies based on a single taxon. Consistent with previous reports42, soil fauna density varied significantly with latitude, ranging from dozens to tens of thousands of individuals per square meter, primarily driven by variations in the dominant taxa (mites and springtails). Specifically, compared with the population densities reported by Rosenberg et al.42, which ranged from about 200,000 to about 1000 ind. m−2, the soil fauna assemblage densities in this study ranged from about 80,000 to about 100 ind. m−2. The biomass of soil fauna depended mainly on population density and body size53. This may explain why the density and biomass shares of some taxa differ sharply. For example, Hemiptera accounted for only 0.60% of the total fauna density but 20.72% of the total fauna biomass (Fig. 1). Similar to soil fauna density, biomass also decreased as latitude increased, but with significant variation. These findings align with previous studies across different ecosystems, likely owing to the varying environmental conditions along latitudinal gradients21. Additionally, using comprehensive datasets on various environmental conditions, we extrapolated the estimates to create high-resolution quantitative maps of soil fauna assemblage density and biomass for the entire Loess Plateau region.
These maps showed a zonal spatial pattern of decreasing soil fauna density and biomass from southeast to northwest. The regional-scale distribution of soil fauna communities was affected by environmental factors such as soil, plant, geography, and climate (Figs. S2 and S3). Among them, the climatic variables precipitation and temperature were the most important drivers of the distribution of soil fauna communities (Fig. S4). Evidence from previous investigations has shown that, at the regional scale, soil fauna community composition is mainly influenced by climatic conditions, particularly precipitation in grassland ecosystems54,55,56, which is consistent with our findings. At large scales, climatic variables may affect soil fauna directly by altering their metabolism, development, and feeding rates, as well as by influencing the interactions among soil faunal groups, and indirectly by altering habitat cover and soil properties57,58. Overall, climatic factors are known to affect the physiology of soil organisms, such that their activities and growth increase at higher temperatures and soil moisture57. Worldwide, climate change has become a main topic in the field of soil fauna ecology44,58,59, with rainfall and warming exerting positive effects on soil fauna60. Meanwhile, temperature and precipitation contributed to the changes in vegetation on the Loess Plateau, with contributions exceeding 26% and 49%, respectively61. Precipitation influences various aspects of soil organisms, including soil fauna, on the Loess Plateau, much of which lies in arid and semi-arid zones. Precipitation may help to replenish water for soil fauna and enhance the development of aboveground plant communities, providing adequate food resources and habitat space for soil fauna. Studies have suggested that changes in soil fauna at small scales are influenced by factors such as soil pH, soil carbon and nitrogen content, soil moisture, plant diversity and litter quality62,63,64.
In the present study, precipitation and temperature were the most important drivers of soil fauna distribution under the same vegetation cover conditions. Notably, soil and plant properties at the regional scale are mainly affected by precipitation and temperature, either directly or indirectly. Therefore, we inferred that precipitation may influence soil fauna in the grasslands of the Loess Plateau, and that precipitation data have a higher resolution in estimating soil fauna densities, providing more accurate estimates. In future, researchers should combine global precipitation and stand condition distribution data for global soil fauna biomass estimation.

By analyzing climate data alongside soil fauna information, we were able to model changes in the soil fauna community distribution over nearly 20 years and to estimate the total abundance and biomass of the soil fauna assemblage. In the 0–30 cm soil layer in 2022, the total soil fauna abundance was 4.06 × 10^15 individuals and the total biomass was 4.26 × 10^6 t. This biomass was equivalent to about 2.13 Mt of carbon (Mt C), about one-sixth and one-fifteenth of the estimated global biomass of ants and nematodes, respectively, and 3.55% of the total human biomass on Earth22,38. In terms of the global average, soil Hexapoda contributed the most to the total soil arthropod biomass (≈77%)42. In this study, Hexapoda accounted for about 60% of the total soil fauna assemblage biomass. The model predicted that soil fauna assemblage abundance and biomass have increased by 27.72% and 29.09%, respectively, over the last 20 years. Climate change was the main driver of this increase, based on comparisons of the contributions of anthropogenic and climatic factors in the time series (Fig. 5 and Fig. S4). If the future climate continues to shift towards warmer and wetter conditions, as assessed in the IPCC Sixth Assessment Report65, soil fauna biomass is likely to increase on the Loess Plateau and globally.
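
The carbon equivalence and the comparisons in the paragraph above can be checked with a few lines of arithmetic. The sketch below is only a consistency check: the regional dry weight and the 50% carbon fraction come from this text, whereas the reference stocks for ants, nematodes and humans are assumptions recalled from the cited global estimates and are not stated here.

```python
# Consistency check of the carbon comparisons (a sketch, not the authors' code).
TOTAL_DRY_BIOMASS_T = 4.26e6   # regional dry weight reported in the text (t)
CARBON_FRACTION = 0.5          # carbon taken as 50% of dry weight (see Methods)

# ASSUMED reference stocks in Mt C; not given in this text.
REFERENCE_STOCKS_MT_C = {
    "ants": 12.0,
    "nematodes": 30.0,
    "humans": 60.0,
}

# Tonnes of dry weight -> megatonnes of carbon.
fauna_mt_c = TOTAL_DRY_BIOMASS_T * CARBON_FRACTION / 1e6

# Share of each reference stock represented by Loess Plateau soil fauna.
shares = {name: fauna_mt_c / stock for name, stock in REFERENCE_STOCKS_MT_C.items()}

for name, share in shares.items():
    print(f"{name}: {share:.4f} of the assumed stock (about 1/{1 / share:.1f})")
```

Under these assumed stocks, the shares come out close to one-sixth, one-fifteenth and 3.55%; different reference values would shift the fractions accordingly.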

Although microbial necromass contributes about 50% of soil organic carbon66, living microorganisms account for less than 5% of soil organic carbon67. In contrast, the quantitative relationship between soil fauna necromass or live biomass and soil organic carbon has not been clarified. Based on field survey data, we estimated that the biomass carbon of living soil fauna in grasslands on the Loess Plateau accounted for about 0.25% of soil organic carbon stocks68. This estimate should be considered a tentative underestimate because of the limited data available, which may also be systematically biased in various ways. Although soil fauna do not contribute much to the soil carbon pool directly, they may affect the pool through their life activities. For example, soil fauna contributed as much as 40% to soil organic matter decomposition69,70. Future research that includes a wider range of soil fauna taxa and incorporates measurements of necromass is needed to expand the understanding of the contribution of soil fauna to soil organic carbon. Nevertheless, our results demonstrate an empirical link between soil fauna and soil organic carbon on a large scale, expanding the quantitative understanding of soil fauna in terrestrial ecosystems.

Conclusion

Information on the global geographic distribution of soil fauna assemblages is scarce. To our knowledge, this represents the first large-scale assessment of the diversity, abundance, and underlying drivers of soil faunal communities across extensive geographical gradients. The study was conducted using harmonized sampling methods, taxonomic criteria, and sampling depths. The following conclusions were obtained: (1) The total soil fauna population and biomass of the Loess Plateau grasslands were estimated to be 4.06 × 10^15 individuals and 4.26 × 10^6 t, respectively. (2) Soil fauna density and biomass decrease with increasing latitude. (3) Climatic factors (temperature, precipitation) are the main drivers of the spatial distribution of soil fauna assemblages. (4) Soil fauna population and biomass on the Loess Plateau increased by 27.72% and 29.09%, respectively, from 2000 to 2022, mainly because of climate change rather than changes in grassland area. Given the role of soil fauna in the ecosystem, estimating soil fauna biomass will provide a foundation for developing strategies to enhance soil biodiversity conservation and material cycling in soil ecosystems, including the carbon cycle. The present findings are expected to improve the accuracy of global soil fauna biomass assessment and provide a scientific reference for better understanding the function of soil fauna in ecosystems.

Methods

Sampling locations

Covering approximately 640,000 km2, the Loess Plateau in China (33°43′−41°16′N, 100°54′−114°33′E) is characterized by semi-humid and semi-arid conditions. The region has a temperate continental monsoon climate, with mean annual temperature decreasing from 14.3 °C to about 3.6 °C and mean annual precipitation decreasing from 800 mm to about 150 mm from southeast to northwest. As the thickest and largest loess deposition area in the world, it has long experienced major ecological and environmental problems such as soil erosion, vegetation degradation and soil nutrient loss. To address these problems, several ecological management projects, such as the Grain-for-Green project, have been implemented, achieving distinct ecological and environmental improvements, with a marked increase in vegetation cover and enhanced capacities for soil-water conservation and carbon sequestration. Except for the most humid southeastern part which is dominated by forests, over 80% of the Loess Plateau is covered by arid and semi-arid grassland ecosystems. Grasslands account for about 40–50% of the entire Loess Plateau. They are highly sensitive to environmental changes and thus serve as indicators of ecosystem restoration. The Loess Plateau, as a special ecological region undergoing restoration and regenerative practices, is a key area for vegetation restoration and biodiversity conservation in China. In this study, we conducted a regional-scale field investigation to collect soil fauna samples from 19 sites (Fig. S10; Table S3) on the Loess Plateau, covering all the bioclimatic zones of the Plateau71.

Collection and identification of soil fauna

Field surveys of grassland soil fauna were performed at 19 sites on the Loess Plateau from July to September 2022, when soil fauna activity is highest within the year72,73. First, five representative grassland sample areas (more than 100 m apart) were selected at each site, each with uniform topography, a sufficiently large area, and little human disturbance. Subsequently, five sample squares (1 × 1 m) were established in each sample area using the diagonal method. The latitude, longitude, and elevation (ELE) of each sample area were determined using a handheld GPS. Soil was collected from 0–30 cm depth at the center of each sample square using a modified sampler (iron square sample frame, area: 1000 cm2, height: 30 cm). The soil fauna was sampled following the methods of Anderson and Ingram74, Kamau et al.75, Zhu et al.76, and Yang et al.41. The main sampling process is presented in Fig. S11. All soil samples from the five samplers were mixed well and placed in trays. In addition, macrofauna were picked by hand on site and preserved in a 75% alcohol solution. The soil samples were transported to the laboratory within 2 h for isolation of meso- and microfauna using a modified Tullgren funnel method. Finally, the isolated soil fauna was identified and classified using the keys of Yin77 and of Krantz and Walter78, and a related website (http://www.collembola.org/).

Collection and measurement of soil and litter

Soil samples in the 0–30 cm soil layer were collected together with the soil fauna. Immediately following field collection, each homogenized soil sample was sieved (2 mm) and separated into two subsamples. One subsample was used to determine soil water content (SWC), while the other was air-dried, followed by the removal of impurities (e.g., roots, stones), and subsequently used for the determination of pH, electrical conductivity (EC), soil organic carbon (SOC), soil total nitrogen (TN), soil total phosphorus (TP), and soil available phosphorus (AP)41,79,80,81,82. The intact soil (steel cylinder core sampler with a diameter of 5 cm) was used to determine soil bulk density (SBD). For all litters, we measured litter organic carbon (LOC), litter total nitrogen (LTN), litter total phosphorus (LTP), and litter biomass (LBM) using standard methods41,81.

Climate data sources

The meteorological data, including annual precipitation (PRE) and mean annual temperature (TEM), were obtained from the Loess Plateau SubCenter, National Earth System Science Data Center, National Science & Technology Infrastructure of China (http://loess.geodata.cn). The dataset comprised monthly mean temperature and precipitation from 2000 to 2022 at a spatial resolution of 1 km (Figs. S12 and S13). The 30 m annual land cover datasets were obtained from the National Cryosphere Desert Data Center (http://www.ncdc.ac.cn)83 (Fig. S5). The normalized difference vegetation index (NDVI) datasets were provided by the National Tibetan Plateau/Third Pole Environment Data Center (http://data.tpdc.ac.cn)84.

Calculation of soil fauna density and biomass

For each taxon, the number of individuals per square meter was calculated from the actual number of individuals captured in the sample squares10,21, and the average densities of all taxa were summed to obtain the average soil fauna density for the site.
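
The density calculation can be sketched as follows. This is a minimal illustration that assumes counts come from the 1000 cm2 sampler frame described above; the function names and the example counts are hypothetical and are not data from the study.

```python
# Minimal sketch: counts per sampler -> density (ind. m^-2), summed over taxa.
CM2_PER_M2 = 10_000
SAMPLER_AREA_CM2 = 1000  # ASSUMPTION: counts refer to the 1000 cm^2 frame


def density_per_m2(count, sampler_area_cm2=SAMPLER_AREA_CM2):
    """Scale a raw count from one sampler to individuals per square metre."""
    return count * CM2_PER_M2 / sampler_area_cm2


def site_density(counts_by_taxon):
    """Return (mean density per taxon, summed site density).

    counts_by_taxon maps a taxon name to a list of per-sample counts.
    """
    taxon_means = {
        taxon: sum(density_per_m2(c) for c in counts) / len(counts)
        for taxon, counts in counts_by_taxon.items()
    }
    return taxon_means, sum(taxon_means.values())


# Hypothetical counts for three samples at one site (illustration only).
example_counts = {
    "Oribatida": [60, 72, 48],
    "Collembola": [40, 35, 45],
}
means, total = site_density(example_counts)
print(means, total)
```

Averaging per taxon before summing keeps taxa that are absent from some samples (recorded as zero counts) from being dropped from the site total.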

Dry weight was selected as the measure of biomass because it is widely used and is typically reported as the individual or mean body weight of the taxa found by sampling. Soil fauna biomass and biomass carbon were calculated following the methods reported by Barnes et al.85, Bar-On et al.38 and Schultheiss et al.10, in the following steps: (1) soil fauna density and the total number of individuals in each group were counted; (2) the body length of individual soil fauna was measured with an accuracy of 0.01 mm; (3) the dry weight of individuals was calculated based on length-mass regression (Table S4); and (4) 50% of the dry weight of an individual was taken as its biomass carbon value.
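
Steps (1) to (4) can be illustrated with a short sketch. It assumes the common power-law form of a length-mass regression, W = a × L^b; both the functional form and the coefficients below are hypothetical placeholders, since the regressions actually used are those listed in Table S4.

```python
# Sketch of steps (1)-(4): length -> dry mass -> biomass -> biomass carbon.
# ASSUMPTION: power-law regression W = a * L**b (W in mg dry mass, L in mm).
# The coefficients are hypothetical and do NOT reproduce Table S4.
LENGTH_MASS_COEFFS = {
    "Collembola": (0.005, 2.8),
    "Oribatida": (0.02, 2.7),
}
CARBON_FRACTION = 0.5  # step (4): carbon taken as 50% of dry weight


def dry_mass_mg(taxon, length_mm):
    """Step (3): individual dry mass from body length."""
    a, b = LENGTH_MASS_COEFFS[taxon]
    return a * length_mm ** b


def biomass_g_per_m2(taxon, lengths_mm, density_per_m2):
    """Mean individual dry mass times density, converted from mg to g."""
    masses = [dry_mass_mg(taxon, length) for length in lengths_mm]
    mean_mass_mg = sum(masses) / len(masses)
    return mean_mass_mg * density_per_m2 / 1000.0


def carbon_g_per_m2(biomass):
    """Step (4): biomass carbon from dry-weight biomass."""
    return biomass * CARBON_FRACTION


# Hypothetical body lengths (mm); the density is the Collembola mean from the Results.
example_biomass = biomass_g_per_m2("Collembola", [0.8, 1.0, 1.4], 4607)
example_carbon = carbon_g_per_m2(example_biomass)
print(example_biomass, example_carbon)
```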

Regional abundance and biomass estimation

Estimating soil fauna biomass at a regional level by simply multiplying the average biomass by the total surface area of such a large region can lead to erroneous estimates. We therefore correlated the response variables with the environmental parameters to obtain more valid estimates (Figs. S4 and S14). Ecological management measures on the Loess Plateau are based on protection, and under this restriction the natural succession of vegetation takes a long time. It can therefore be assumed that areas with similar environmental conditions under the same land-use type have the same soil fauna, provided that there are no major changes in policy. On this basis, regional abundance and biomass were extrapolated as follows (Fig. 6): (1) The main environmental factors (mainly TEM and PRE in this study) were screened from the climate, plant, geography, and soil properties influencing the spatial distribution of soil fauna. (2) After geostatistical analyses, the precipitation and temperature data were superimposed on the zoning to form 8833 zones with the same temperature and precipitation attributes, called same-attribute zones. (3) The actual sampling point data were imported into ArcMap according to latitude and longitude, and the attribute zones in which the sampling points were located were extracted. The measured soil fauna density and biomass were then assigned to the attribute zones with the same attributes as the sampling points (based on the attributes of the 19 sites, this extrapolation yielded a total of 7137 attribute zones). (4) Geostatistical analyses were performed on the 7137 attribute zones, and spatial interpolation was then performed over the entire region to generate the predicted surface. (5) The spatial distribution maps of soil fauna density and biomass in the grasslands of the Loess Plateau were extracted based on the distribution of grasslands in the region. Based on the resultant raster, the number and biomass of soil fauna in the whole region of the Loess Plateau were rasterized using the regional analysis tool under the Spatial Analyst module in ArcToolbox. (6) Correlation functions between environmental parameters and soil fauna were established by combining precipitation, temperature and soil fauna data extracted from the attribute zones. (7) The relevant environmental parameters of the region from 2000 to 2021 were obtained and entered into the constructed function in ArcMap to map the distribution of soil fauna density and biomass in different years and to calculate the total number and total biomass of soil fauna in each year.
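
The core of steps (2), (3) and (5), assigning measured values to zones that share temperature and precipitation attributes and then summing over grassland, can be sketched in plain Python. This is an illustration only: the study performed these steps in ArcMap, the class widths used to define a "same attribute" zone are hypothetical, and the geostatistical interpolation of step (4) is not reproduced (cells without a matching site are simply counted).

```python
from collections import defaultdict

# ASSUMPTION: hypothetical class widths defining a same-attribute zone.
TEM_BIN_C = 1.0     # temperature class width (deg C)
PRE_BIN_MM = 50.0   # precipitation class width (mm)


def attribute_key(tem_c, pre_mm):
    """Step (2): label a location by its temperature and precipitation class."""
    return (int(tem_c // TEM_BIN_C), int(pre_mm // PRE_BIN_MM))


def zone_densities(sites):
    """Step (3): mean measured density for each attribute zone with a site."""
    grouped = defaultdict(list)
    for site in sites:
        grouped[attribute_key(site["tem"], site["pre"])].append(site["density"])
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}


def regional_total(cells, zone_density):
    """Step (5): sum density x area over grassland cells with an assigned value.

    Returns (total individuals, number of grassland cells left unassigned).
    """
    total, unassigned = 0.0, 0
    for cell in cells:
        if not cell["grassland"]:
            continue
        key = attribute_key(cell["tem"], cell["pre"])
        if key in zone_density:
            total += zone_density[key] * cell["area_m2"]
        else:
            unassigned += 1  # would be filled by interpolation in step (4)
    return total, unassigned


# Hypothetical sites and 1 km^2 grid cells (illustration only).
sites = [
    {"tem": 12.4, "pre": 610.0, "density": 80000.0},
    {"tem": 8.2, "pre": 190.0, "density": 150.0},
]
cells = [
    {"tem": 12.9, "pre": 640.0, "area_m2": 1e6, "grassland": True},
    {"tem": 8.7, "pre": 160.0, "area_m2": 1e6, "grassland": True},
    {"tem": 5.0, "pre": 300.0, "area_m2": 1e6, "grassland": True},
    {"tem": 12.1, "pre": 601.0, "area_m2": 1e6, "grassland": False},
]
total_individuals, n_unassigned = regional_total(cells, zone_densities(sites))
print(total_individuals, n_unassigned)
```

Summing density times area cell by cell, rather than multiplying one regional mean by the total area, is what lets the strong southeast-to-northwest gradient carry through to the regional total.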

Fig. 6: General framework for estimating regional soil fauna abundance and biomass.

Soil fauna density and biomass were extrapolated from the actual sampling data in 2022 (1–5). The correlation between the actual sampling values and the environmental parameters in 2022 was then constructed (6). Maps of the regional distribution of soil fauna in other years were produced from the constructed correlation and the environmental parameters of the corresponding year (7). It is assumed that areas with similar environmental conditions have the same biological attributes in the absence of marked changes in current environmental conditions, such as vegetation cover, climate, and land-use practices; that is, areas with similar habitat environments under the same ecosystem are likely to have similar biological compositions. Based on this assumption, the factors with the greatest influence on habitat in this study, precipitation and temperature, were superimposed, and the areas formed after the superimposition were considered to have similar biological attributes.

Statistical analysis

The mean values of soil fauna density and biomass were calculated for each taxonomic unit at each site and then summed across all taxonomic units. To evaluate the distribution characteristics and total biomass of soil fauna, we selected climate (TEM, PRE), plant (LBM, LOC, LTN, LTP, NDVI), soil (SWC, SBD, EC, pH, SOC, TN, TP, AP), and topographic (ELE) variables with known ecological effects on soil fauna, and determined their relationships with density and biomass. Regression analysis was performed to relate soil fauna density and biomass to latitude, longitude, and environmental variables. Pearson correlation analysis (two-tailed test) was used to analyze the relationships between soil fauna and environmental variables. Linear mixed-effects models (LMMs) were used to test the effects of environmental factors on density and biomass, using the R package lmerTest86. The three-dimensional relationship between TEM, PRE and soil fauna was modeled in MATLAB. Statistical analyses were performed using ANOVAs and t-tests in SPSS 26.0 (SPSS, Chicago, IL, USA). Statistical significance was set at P < 0.05. ArcGIS 10.2 was used to calculate regional distribution patterns for density, populations, biomass, and biomass carbon. Prior to spatial interpolation, the normality or near-normality of each data category was assessed using Kolmogorov-Smirnov (K-S) tests complemented by normal quantile-quantile (QQ) plots in exploratory data analysis. When datasets exhibited significant non-normality with marked skewness, logarithmic transformation was applied prior to geostatistical interpolation. Semi-variogram cloud plots were subsequently constructed using georeferenced sample points.
The optimal semi-variogram model was determined through comparative analysis of multiple theoretical models, with selection criteria requiring mean error (ME) values approaching 0 and root mean square error (RMSE) values approaching 1 during cross-validation.
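
The cross-validation step can be illustrated with a small leave-one-out sketch that reports the mean error (ME) and root mean square error (RMSE). To keep the example short and self-contained, inverse distance weighting (IDW) is swapped in for the semi-variogram-based interpolation used in the study; the function names and the sample points are hypothetical.

```python
import math


def idw_predict(x, y, points, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (px, py, value) points.

    NOTE: IDW stands in here for kriging purely to keep the sketch short.
    """
    num = den = 0.0
    for px, py, value in points:
        d = math.hypot(x - px, y - py)
        if d == 0.0:
            return value  # exact hit on a sample point
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def loo_cross_validation(points, power=2.0):
    """Leave each point out in turn, predict it, and return (ME, RMSE)."""
    errors = []
    for i, (x, y, observed) in enumerate(points):
        others = points[:i] + points[i + 1:]
        errors.append(idw_predict(x, y, others, power) - observed)
    me = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return me, rmse


# Hypothetical sample points: (x, y, value), e.g. log-transformed density.
sample_points = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 3.0)]
me, rmse = loo_cross_validation(sample_points)
print(me, rmse)
```

The same loop applies to any interpolator: replacing `idw_predict` with a fitted semi-variogram model allows candidate models to be compared on identical held-out points.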