Leveraging explainable AI to predict soil respiration sensitivity and its drivers for climate change mitigation

Novielli, Pierfrancesco; Magarelli, Michele; Romano, Donato; Di Bitonto, Pierpaolo; Stellacci, Anna Maria; Monaco, Alfonso; Amoroso, Nicola; Bellotti, Roberto; Tangaro, Sabina

doi:10.1038/s41598-025-96216-y

Download PDF

Article
Open access
Published: 11 April 2025

Leveraging explainable AI to predict soil respiration sensitivity and its drivers for climate change mitigation

Pierfrancesco Novielli^1,2,
Michele Magarelli¹,
Donato Romano^1,2,
Pierpaolo Di Bitonto^1,2,
Anna Maria Stellacci¹,
Alfonso Monaco^2,3,
Nicola Amoroso^2,4,
Roberto Bellotti^2,3 &
…
Sabina Tangaro^1,2

Scientific Reports volume 15, Article number: 12527 (2025) Cite this article

3054 Accesses
4 Citations
2 Altmetric
Metrics details

Subjects

Abstract

Global warming is one of the most pressing and critical problems facing the world today. It is mainly caused by the increase in greenhouse gases in the atmosphere, such as carbon dioxide (CO₂). Understanding how soils respond to rising temperatures is critical for predicting carbon release and informing climate mitigation strategies. Q₁₀, a measure of soil microbial respiration, quantifies the increase in CO₂ release caused by a $10^\circ$ Celsius rise in temperature, serving as a key indicator of this sensitivity. However, predicting Q₁₀ across diverse soil types remains a challenge, especially when considering the complex interactions between biochemical, microbiome, and environmental factors. In this study, we applied explainable artificial intelligence (XAI) to machine learning models to predict soil respiration sensitivity (Q₁₀) and uncover the key factors driving this process. Using SHAP (SHapley Additive exPlanations) values, we identified glucose-induced soil respiration and the proportion of bacteria positively associated with Q₁₀ as the most influential predictors. Our machine learning models achieved an accuracy of $0.813 \pm 0.007$, precision of $0.812 \pm 0.008$, an AUC-ROC of $0.884 \pm 0.003$, and an AUC-PRC of $0.883 \pm 0.007$, ensuring robust and reliable predictions. By leveraging t-SNE (t-distributed Stochastic Neighbor Embedding) and clustering techniques, we further segmented low Q₁₀ soils into distinct subgroups, identifying soils with a higher probability of transitioning to high Q₁₀ states. Our findings not only highlight the potential of XAI in making model predictions transparent and interpretable, but also provide actionable insights into managing soil carbon release in response to climate change. This research bridges the gap between AI-driven environmental modeling and practical applications in agriculture, offering new directions for targeted soil management and climate resilience strategies.

The soil microbiome governs the response of microbial respiration to warming across the globe

Article 04 December 2023

The temperature sensitivity of soil: microbial biodiversity, growth, and carbon mineralization

Article 29 March 2021

Quantifying thermal adaptation of soil microbial respiration

Article Open access 06 September 2023

Introduction

Global warming is one of the most pressing and critical problems facing the world today. It is mainly caused by the increase in greenhouse gases in the atmosphere, such as carbon dioxide (CO2), methane (CH4) and nitrous oxide (N2O), resulting from human activities such as burning fossil fuels, deforestation and intensive agriculture. Climate change is causing global temperatures to rise, which in turn directly affects biological processes in soils¹. Soil microbes decompose organic matter and release CO₂ as a by-product of this process (microbial respiration). The rate of this CO₂ release is strongly influenced by soil temperature, and Q₁₀ indicates how much this respiration increases with increasing temperature^2,3. Q₁₀ is widely recognized as a critical metric for quantifying the sensitivity of soil respiration across various ecosystems, including forests, grasslands, and agricultural soils^4,5,6,7. Q₁₀ is an important parameter in ecology and biology, used to describe the sensitivity of a biological or chemical process to temperature changes. Specifically, Q₁₀ measures how much the rate of a biological reaction (such as microbial respiration in soil) increases with a a temperature increase of $10 ^\circ$ Celsius^8,9. Q₁₀ is often used to describe how much soil microbes increase their respiration activity (and therefore CO₂ release) in response to warming. A Q₁₀ of 2, for example, means that for every $10^{\circ }\hbox {C}$ increase in temperature, the rate of respiration doubles. Soils with low Q₁₀ are less responsive to temperature changes. With a low Q₁₀, CO₂ release due to microbial respiration is less pronounced even under warming conditions. A low Q₁₀ indicates greater resilience to temperature changes, with a lower potential for CO₂ release and a higher capacity to retain carbon. This makes such soils crucial for climate change mitigation strategies and sustainable ecosystem management. A high Q₁₀ indicates that microbial respiration increases rapidly with increasing temperature. Increased microbial respiration leads to accelerated CO₂ release and subsequent degradation of soil quality due to the positive feedback between temperature and microbial respiration^10,11. Soils with a high Q₁₀ should be monitored carefully, as they are more vulnerable to warming and may require more active soil management strategies to avoid excessive CO₂ release. For example, agricultural practices that enhance carbon sequestration, such as growing cover crops or adding compost, could be adopted. Q₁₀ is particularly important in the context of climate change because it helps us understand how global warming will affect carbon cycling in soils. Since soils store large amounts of carbon in the form of organic matter, increased microbial respiration due to rising temperatures could lead to accelerated release of CO₂ into the atmosphere, further exacerbating global warming. Therefore, understanding and predicting Q₁₀ in different soil types is essential to assess the risk of positive climate feedback: more CO₂ released from soils could accelerate warming, which in turn could further stimulate microbial respiration.

In recent years, machine learning (ML) and explainable artificial intelligence (XAI) have entered various areas of the agricultural field, providing new ways to improve efficiency, precision, and sustainability¹². These technologies are now being used to address complex agricultural challenges, including the prediction of soil health and the development of data-driven strategies for mitigating the impact of climate change. By making models more interpretable, XAI allows for actionable insights that can be directly applied to agricultural practices¹³.

Previous studies have explored the role of machine learning in predicting soil respiration dynamics, but these efforts primarily focused on traditional black-box models that limit interpretability^14,15,16. This lack of transparency can be a major drawback when applying machine learning in critical areas like agriculture and environmental science. Moreover, these studies often miss the opportunity to uncover more granular insights about specific subgroups of soils that may be at higher risk under warming. As a result, there remains a gap in our ability to identify subtle variations within low Q₁₀ soils that could lead to increased sensitivity to temperature.

By utilizing Machine Learning (ML) techniques, we can delve deeper into the complex interactions between soil properties, microbial communities¹⁷, and environmental factors to identify which soils are higher risk of increasing the sensitivity of soil microbial respiration to warming¹⁸.

Explainable artificial intelligence (XAI) can play a key role in applying machine learning (ML) models to analyze and predict the sensitivity of soil microbial respiration to warming (Q₁₀). The main goal of XAI is to make ML models transparent and interpretable, allowing us to understand why and how models reach certain conclusions¹⁹. This is particularly useful in contexts such as climate change and soil biology, where decisions based on black-box models may not be reliable or understandable to domain experts. XAI enables the interpretation of ML models by identifying the most influential soil characteristics (e.g. organic content, pH) in predicting high Q₁₀ values. With techniques such as SHAP (SHapley Additive exPlanations) it is possible to see the importance and impact of each factor on each individual prediction of the model²⁰.

If the ML model predicts that a particular soil is at high risk of increasing microbial respiration due to warming, XAI can reveal that this is mainly due to low soil organic matter or high nitrogen content. This helps focus attention on those aspects that need to be monitored or mitigated. With interpretable models, XAI could help scientists make more robust hypotheses about underlying microbial mechanisms²¹. Knowing exactly why a model predicts a high Q₁₀ could reveal new insights into the interactions between microbes, soil, and temperature, improving our understanding of soil carbon cycling. XAI improves confidence in model outputs. In a scenario where ML model outputs are used to inform land management policies or climate change mitigation strategies, decision makers need to understand how predictions are generated. For example, an ML system that recommends actions to reduce the increase in soil microbial respiration needs to be able to explain the rationale behind those recommendations²². This increases the acceptance and adoption of models by experts.

One of the strengths of using SHAP values within the XAI framework is that they provide a detailed understanding of feature importance, which can be leveraged to create an embedding space where the separation between classes is more distinct. This embedding, generated from SHAP values, is particularly suited for classification tasks, as it enhances the distinction between different groups, making it easier to apply further analyses like clustering²³. By creating such an embedding, we can not only improve classification accuracy but also explore more granular patterns within the data.

In the context of soil respiration sensitivity (Q₁₀), this study uses SHAP-based embeddings to distinguish soils with high Q₁₀ from those with low Q₁₀. Once this separation was achieved, the objective was to focus on soils classified as low Q₁₀ and determine which among them have characteristics similar to those with high Q₁₀. Through clustering analysis on this embedding, we aimed to identify subgroups of low Q₁₀ soils that may be at higher risk of transitioning to high Q₁₀. This approach allows us to pinpoint the factors most responsible for increasing the risk, providing valuable insights into soil management strategies aimed at mitigating warming-induced CO₂ release.

The objectives of this study are twofold: first, to utilize interpretable machine learning models to assess the extent to which biochemical, microbiome, and environmental soil characteristics are associated with higher or lower Q₁₀ values; and second, to identify subgroups of low Q₁₀ soils that are at greater risk of transitioning to higher Q₁₀ values. By leveraging SHAP-based embeddings and clustering techniques, this research aims to uncover granular patterns within low Q₁₀ soils, providing actionable insights for targeted soil management strategies to mitigate warming-induced CO₂ release.

Materials and methods

In Table 1 detailed information about each input variable is provided. These variables were downloaded from a public repository and have been used for a similar analysis, offering a comprehensive summary of the diverse climate conditions and soil characteristics across various countries globally.

The dataset includes 332 unique soil samples collected from a global survey spanning multiple ecosystems. No duplicate samples were included in the analysis, and each sample represents a distinct soil environment. Sampling points span 29 countries and seven continents, with geographic locations ranging between $67^\circ \hbox {N}$ and $62^\circ \hbox {S}$ latitude and $153^\circ \hbox {E}$ to $159^\circ \hbox {W}$ longitude, encompassing diverse environmental conditions and vegetation types, including forests, grasslands, and shrublands. Field surveys were conducted between 2016 and 2019. Soil samples were collected at a depth of 0–10 cm, a biologically active soil layer, to capture relevant microbial and biochemical activities. Each site included composite samples derived from multiple soil cores, which were homogenized to ensure representative sampling. The climatic conditions represented in the dataset include tropical, temperate, continental, polar, and arid biomes, with mean annual temperatures (MAT) ranging from $-7^\circ \hbox {C}$ to $30^\circ \hbox {C}$ and mean annual precipitation (MAP) ranging from 50 to 4,500 mm/year²⁴.

For each soil sample, 27 distinct factors were utilized, encompassing environmental traits, soil microorganisms, biochemical stability, substrate levels, and mineral protection elements. Additionally, each sample is associated with a Q₁₀ value, providing an in-depth perspective on the sensitivity of soil respiration to temperature variations.

The environmental traits cover soil-specific characteristics and the geographical location of samples. These parameters include longitude, forest presence, MAP, MAT, plant richness, plant cover, pH, electrical conductivity, clay and silt percentage, total phosphorus content, and the Soil C ratio, based on total organic carbon and total nitrogen levels.

Additionally, biochemical stability refers to the resistance of organic compounds to degradation and includes factors such as the proportions of aromatic compounds, alkanes, polysaccharides, and amides. Mineral protection factors emphasize the proportions of particulate organic carbon (POC) and mineral-associated organic carbon (MAOC), which play a role in how organic matter interacts with minerals and its stabilization over time.

Finally, the dataset incorporates information about the soil microbiome, capturing microbial diversity and activity. This includes measures like microbial biomass (estimated via glucose-induced respiration), bacterial, fungal, and protist richness, as well as standardized ratios of bacterial, fungal, and protist taxa associated either positively or negatively with Q₁₀. These microbiome factors provide key insights into how microbial communities respond to soil warming, ultimately affecting soil respiration processes.

The environmental auxiliary variables were derived as follows:

Climatic variables, such as mean annual temperature (MAT) and mean annual precipitation (MAP), were obtained from the WorldClim database v2²⁵;
Soil-specific properties, including pH, electrical conductivity, and fine texture, were derived from direct measurements on the soil samples following standardized laboratory protocols²⁴.

Table 1 Description of the input variables used in the analysis.

Full size table

Data preparation

To evaluate biomarkers distinguishing soils with high and low Q₁₀ values, we partitioned the soil samples into two categories based on the median Q₁₀ value. The median split approach was chosen for statistical reasons, as it is a widely used method for dichotomizing continuous variables while being less sensitive to outliers compared to the mean²⁶. This method ensures a balanced classification while accounting for the broad range of temperature, vegetation types, and soil characteristics represented in our dataset. The group with low Q₁₀ consisted of 166 soil samples, while the group with high Q₁₀ also contained 166 soil samples. This binary classification served as the foundation for further analysis, particularly in identifying the factors contributing to high Q₁₀ and determining soils at greater risk.

Machine learning based classification

The workflow adopted for the analysis is depicted in Figure 1. The illustration shows the mechanism of soil respiration and the process of Q₁₀ prediction. The histogram presents the distribution of the two groups, divided by the median line into high Q₁₀ and low Q₁₀ categories, providing initial insight into the dataset. Subsequently, all variables related to these soils were merged to form the input table for further analysis.

The workflow consisted of two main phases. The first phase applied machine learning (ML) techniques using a classifier to distinguish soils with high Q₁₀ from those with low Q₁₀. In the second phase, explainable artificial intelligence (XAI) methods, specifically SHAP (SHapley Additive Explanations) values, were used to enhance transparency. SHAP values were crucial in identifying new biomarkers and understanding the importance of each feature in the model. Clustering analysis was then performed on the low Q₁₀ group to identify subgroups of soils at higher risk of having high Q₁₀ values.

To ensure the reliability and consistency of our results, we applied a 10-fold nested cross-validation procedure, repeated 20 times. During this process, the parameters of the machine learning algorithm were fine-tuned within the inner loop of the cross-validation. After the model was trained, SHAP values were calculated to provide greater transparency in the model’s decision-making process. These values helped explain the importance and influence of each variable on the prediction outcomes.

To evaluate the machine learning performance for predicting high and low Q₁₀ values, the Extra Trees Classifier was selected because its performance was shown to be superior in the referenced article¹³. The Extra Trees Classifier is an ensemble learning algorithm designed for classification tasks, based on decision tree methodologies. It uses randomization techniques to enhance accuracy and minimize variance. Similar to other ensemble methods, the Extra Trees Classifier constructs multiple decision trees. Each tree is trained on a randomly chosen subset of the data, with random features selected at each split. This approach of random node splitting and data sampling helps the Extra Trees Classifier to be more resistant to overfitting compared to standard decision trees. As a result, it performs well in scenarios where the data is noisy or highly variable. Additionally, Extra Trees is adept at managing datasets with numerous features, making it ideal for tackling complex challenges²⁷. Moreover, the random feature selection during tree construction enables the model to estimate the significance of each feature.

The parameters grid for the ensemble models are as follows:

max_depth $\in$ {None, 3, 5, 7, 9},
n_estimators $\in$ {50, 100, 150, 200, 250}.

We systematically tested a range of parameter values to enhance the classifiers’ generalization to unseen data and improve their overall predictive accuracy.

Evaluation metrics

Evaluation metrics are crucial tools for assessing the performance and effectiveness of machine learning models²⁸. These metrics provide quantitative measures that help quantify how well a model is performing on a given task. The most commonly used evaluation metrics in classification analysis are:

Accuracy:

It is defined as the proportion of correct predictions made by the model out of the total number of predictions
$$\begin{aligned} ACC=\frac{Number of Correct Predictions }{Total Number of Predictions } \end{aligned}$$
(1)
Precision:

It measures how many of the instances predicted as positive are actually positive
$$\begin{aligned} PREC=\frac{True Positives (TP)}{True Positives (TP) + False Positives (FP)} \end{aligned}$$
(2)
Area Under the ROC Curve (AUC-ROC):

The ROC (Receiver Operating Characteristic) curve and AUC (Area Under the Curve) are key metrics used to assess the performance of binary classification models. The ROC curve visually represents the relationship between a model’s sensitivity (true positive rate) and specificity (true negative rate) across different decision thresholds. The AUC provides a numerical measure of this performance, representing the area under the ROC curve: a value near 1 indicates excellent model accuracy, whereas a value near 0.5 indicates that the model’s predictions are no better than random guessing²⁹.
Area Under the PRC Curve (AUC-PRC):

The Precision-Recall Curve (PRC) and the Area Under the Curve (AUC) are used to evaluate the effectiveness of binary classification models. The PRC illustrates how precision (the proportion of true positives among all positive predictions) and recall (the proportion of actual positives correctly identified) vary across different threshold settings. The AUC of the PRC provides a summary measure of the model’s performance in balancing precision and recall. A higher AUC reflects better overall performance, with a perfect model achieving an AUC of 1³⁰.

Explainable artificial intelligence (XAI)

Explainable Artificial Intelligence (XAI) represents a crucial development in AI, aimed at making models more transparent and understandable to humans. One of the main technique is SHapley Additive exPlanations (SHAP) which provides a structured method for interpreting machine learning model outputs by assigning importance to each feature in relation to the model’s predictions^31,32,33. By evaluating the effect of each feature on the predictions, SHAP values offer significant insights into the decision-making process of the model. One key advantage of SHAP is the ability to generate local explanations, where each SHAP value corresponds to an individual instance in the dataset. This allows for a granular understanding of how specific features contribute to the prediction for each instance. The SHAP value for the jth feature of the instance x is calculated by aggregating it across all possible subsets according to the formula:

$$\begin{aligned} \Phi _j(x)=\sum _{F\subseteq S- \{j\}}\frac{|F|!(|S|-|F|-1)!}{|S|!}[f_{x}(F\cup j)-f_{x}(F)] \end{aligned}$$

(3)

|F|! denotes the permutations of features in the subset F, $(|S|-|F|-1)!$ represents the permutations of features in the subset $S-(F\cup \{j\})$, and |S|! signifies the total number of feature permutations³⁴.

The SHAP values, calculated for each individual soil sample, provided a basis for generating embeddings that captured the contribution of each feature at a local level.

In this study, SHAP values explain probabilities, where positive values indicate an increase in the probability of a soil sample being classified as high Q₁₀, and negative values indicate a decrease in this probability. The magnitude of the SHAP value reflects the strength of the feature’s contribution to the prediction. This enables a granular understanding of how specific features influence prediction probabilities.

Clustering analysis

T-distributed Stochastic Neighbor Embedding (t-SNE) is a widely used technique for dimensionality reduction, particularly in data visualization and exploration. It is designed to project high-dimensional data into lower dimensions (typically two or three) while preserving the relative distances between data points. Unlike linear dimensionality reduction techniques like PCA, t-SNE is capable of capturing complex, non-linear relationships in the data, making it an ideal choice for visualizing clusters that may not be evident in high-dimensional space. t-SNE operates by minimizing the divergence between two probability distributions: one representing pairwise similarities in the high-dimensional space and the other representing these similarities in the lower-dimensional space^35,36.

In this study, t-SNE was used to visualize the data by reducing the dimensionality of both the input feature space and the SHAP value space²³. By applying t-SNE to both, we aimed to identify the embedding that best separates the two classes (low Q₁₀ and high Q₁₀). This embedding was then used to perform clustering on the low Q₁₀ soils, identifying subgroups at higher risk of transitioning to high Q₁₀, using three unsupervised clustering algorithms: KMeans, Agglomerative Clustering, and Birch Clustering.

The classification probabilities used to assess the risk of transitioning to high Q₁₀ were derived from the machine learning model trained to distinguish between high-Q₁₀ and low-Q₁₀ soils. Each soil sample was assigned a probability score representing its likelihood of being classified as high Q₁₀. After clustering the low-Q₁₀ soils using SHAP-based embeddings, we analyzed the distribution of these classification probabilities within each cluster. This allowed us to identify subgroups of low-Q₁₀ soils with higher predicted probabilities of transitioning to high Q₁₀, indicating potential heterogeneity within this category.

KMeans Clustering: KMeans is a widely used clustering algorithm that partitions data into a predefined number of clusters (K). It works by minimizing the within-cluster variance by iteratively assigning data points to the nearest cluster centroid and then updating the centroids based on the mean of the points in each cluster³⁷.
Agglomerative Clustering: This is a hierarchical clustering method that builds nested clusters by merging or splitting them successively. In this study, we used a bottom-up approach, starting with each sample as its own cluster and then successively merging the closest pairs of clusters based on a defined distance metric. This process continues until a desired number of clusters is reached³⁸.
Birch Clustering: Birch (Balanced Iterative Reducing and Clustering using Hierarchies) is particularly effective for large datasets. It builds a clustering feature tree to summarize the dataset and then applies hierarchical clustering to create clusters. Birch is efficient and scalable, making it suitable for complex, high-dimensional datasets³⁹.

The number of clusters was determined using the Silhouette Score, a metric that evaluates the consistency and quality of the clustering. The Silhouette Score is calculated by measuring the mean intra-cluster distance (how close each point in a cluster is to other points in the same cluster) and the mean nearest-cluster distance (how close each point is to points in the nearest different cluster). The higher the Silhouette Coefficient, the better defined the clusters are. The clustering method that yielded the highest Silhouette Score was selected as the optimal clustering solution^40,41.

Results

This study applied interpretable machine learning models to analyze the relationships between biochemical, microbiome, and environmental soil characteristics and Q₁₀ values. Through clustering analyses of low Q₁₀ soils using SHAP-based embeddings, we identified distinct subgroups with varying probabilities of transitioning to high Q₁₀.

The average performance of the machine learning classifiers, calculated as the mean performance obtained during the 10-fold cross-validation repeated 20 times, is summarized as follows:

Accuracy (ACC): $0.813 \pm 0.007$;
Precision: $0.812 \pm 0.008$;
Area Under the ROC Curve (AuROC): $0.884 \pm 0.003$;
Area Under the Precision-Recall Curve (AuPRC): $0.883 \pm 0.007$.

This approach was chosen to ensure the robustness and generalizability of the machine learning model.

Figure 2 displays the ROC (a) and Precision-Recall (b) curves along with their respective areas under the curve. The figure shows the average curves with standard deviations derived from the 20 repetitions.

In Figure 3, the global feature importance is represented using the SHAP summary plot. This plot visualizes the impact of each feature on the model’s output based on the calculation of SHAP values. Each point on the plot corresponds to a SHAP value for a particular feature and a specific soil in the dataset. The color gradient represents the feature value, with blue indicating lower feature values and red representing higher feature values. The position along the x-axis indicates whether the effect of the feature increases or decreases the model’s prediction of Q₁₀ sensitivity.

The two most important variables, as indicated by their impact on the model output, are the proportion of bacterial taxa positively associated with Q₁₀ (“Bacteria_Positive”) and the microbial biomass (“Microbial_Biomass”), estimated via glucose-induced soil respiration. Both variables show a strong influence on the prediction of Q₁₀, suggesting that soils with higher values for these features tend to have increased sensitivity to temperature changes.

Other significant features include the percentage of alkanes (“Alkane”) and soil organic carbon content (“SOC”). These features also play a crucial role in distinguishing soils with higher and lower Q₁₀ values.

Among the environmental variables, forest presence, fine soil texture, and soil pH showed notable associations with Q₁₀ sensitivity. As illustrated in the SHAP summary plot (Figure 3), forested soils (represented by red-colored points for the Forest variable) were associated with a higher likelihood of high Q₁₀ values, indicating greater temperature sensitivity. Similarly, soils with a higher proportion of fine texture (clay and silt) demonstrated an increased likelihood of high Q₁₀ sensitivity. Moreover, soils with lower pH values were more likely to be classified as low Q₁₀, highlighting the influence of soil acidity on temperature sensitivity.

In this study, a comparison was made between the outcomes of t-SNE conducted on the input features and those resulting from t-SNE applied to the SHAP values. After computing the SHAP values, t-SNE was performed on these values, and the first two components, t-SNE component 1 and t-SNE component 2, were calculated. The visualization of the results is shown in Figure 4 (b). For comparison, Figure 4 (a) also includes the t-SNE results obtained from the input features. Figure 4 (b) presents the projection of the SHAP values onto the two principal components. A distinct pattern was observed, where there is a clear separation between high and low Q₁₀ soils along t-SNE comp.1. In contrast, the t-SNE results based on the input features (Figure 4 (a)) do not exhibit such clear differentiation between the two classes. The SHAP value embeddings, also referred to as local explanation embeddings, outperform the original input features by better capturing the differences between the two soil categories (high Q₁₀ vs. low Q₁₀). This highlights the effectiveness of SHAP values in capturing key distinctions within the dataset. To quantitatively assess the separability between soils with high and low Q₁₀ values, we calculated the centroids for each class in both projections and computed the Euclidean distance between these centroids. In the input feature space (Figure 4(a)), the centroid distance was 5.22, indicating limited class separation. In contrast, the SHAP-based t-SNE embeddings (Figure 4(b)) showed a centroid distance of 20.59, highlighting a marked improvement in separability.

Based on the clear separation between high and low Q₁₀ soils observed in the previous analysis, it became reasonable to further cluster the soils classified as low Q₁₀ into distinct subgroups, with the assumption that these subgroups might exhibit varying probabilities of transitioning to high Q₁₀. To explore this hypothesis, clustering techniques were applied to the t-SNE components (comp.1 and comp.2) derived from the SHAP values. The optimal number of clusters was determined using the Silhouette score (Figure 5(a)), which evaluated different clustering algorithms: K-means, Birch, and Agglomerative Clustering. Among these methods, K-means produced the highest Silhouette score, indicating that six clusters offered the best-defined grouping for the low Q₁₀ soils.

The six clusters identified by K-means are shown in Figure 5(b). These clusters represent distinct subgroups within the low Q₁₀ soils. As depicted in Figure 5(c), Clusters 4, 6, and 2 have a higher probability of being classified as high Q₁₀ soils compared to the other clusters. Conversely, soils in Clusters 1 and 3 demonstrate the lowest median probability, suggesting they are less likely to transition into high Q₁₀ soils.

Figure 6 presents the box plots of the four most important variables, as identified by the SHAP summary plot (Figure 3). The figure illustrates the distribution of these key variables across the different clusters obtained from the clustering analysis. Notably, in clusters with the highest predicted probabilities of transitioning to high Q₁₀ soils, such as Cluster 4 and Cluster 6, certain variables show distinct patterns.

For instance, in Figures 6(a) and (b), the variables “Microbial_biomass ” and “Bacteria Positive” exhibit significantly higher values in Clusters 4 and 6 compared to the other clusters. This indicates a strong association between elevated levels of these variables and the increased probability of a soil transitioning from low Q₁₀ to high Q₁₀. Conversely, the other clusters, particularly those with lower probabilities (e.g., Clusters 1 and 3), tend to show lower median values for these variables. This trend reinforces the importance of microbial biomass and “Bacteria_Positive” in driving the differences between soils more resistant to temperature sensitivity and those more susceptible.

Discussion

This study makes several significant contributions to the field of environmental research, particularly in predicting soil microbial respiration sensitivity (Q₁₀) using machine learning and explainable AI (XAI) techniques. By leveraging the power of machine learning, we achieved high predictive performance in estimating Q₁₀ values, demonstrating the potential of these models to capture complex soil-climate interactions.

First, our research underscores the critical role of machine learning in predicting Q₁₀ sensitivity with high accuracy across diverse soil types. The high predictive accuracy of our models, as evidenced by metrics such as an AUC-ROC of 0.884 and an AUC-PRC of 0.883, highlights the effectiveness of machine learning in capturing the intricate relationships between soil properties, microbial communities, and environmental factors.

Second, the explainable machine learning approach used in this study successfully uncovered important biochemical, microbiome, and environmental factors that influence Q₁₀ values. These factors, such as glucose-induced soil respiration and the proportion of bacterial taxa positively associated with Q₁₀, emerged as key drivers of soil sensitivity to temperature^42,43,44,45. Our findings align with those of Sáez-Sandino et al. (2023)²⁴. However, while Sáez-Sandino et al. used feature importance embedded within a Random Forest model, our study leverages an explainable AI approach based on post-hoc explanations. This methodological shift offers three key advantages: (1) Model Agnosticism, as XAI can be applied to any machine learning model, enhancing flexibility and generalizability; (2) Local Feature Importance, allowing for the interpretation of individual predictions rather than just global trends; and (3) Directional Insights, providing information on whether a variable positively or negatively influences Q₁₀ in different contexts. The identification of these variables not only deepens our scientific understanding of soil-microbiome-climate interactions but also has practical implications. By highlighting the soil characteristics that contribute to higher Q₁₀ values, this research provides a foundation for developing customized agricultural strategies. These strategies could be used to reduce CO₂ emissions from soils, enhancing carbon sequestration and mitigating the impact of climate change.

Among the environmental variables, forest presence, fine soil texture, and soil pH exhibited associations with Q₁₀ sensitivity. Forested soils were associated with an increased likelihood of high Q₁₀ values. The organic matter-rich environment of forest soils promotes microbial activity, which may lead to greater temperature sensitivity⁴⁶. Similarly, fine-textured soils (high in clay and silt content) showed a positive relationship with Q₁₀, supporting the idea that soil texture influences microbial access to substrates and respiration processes. Prior research suggests that finer-textured soils tend to retain moisture and organic matter, creating conditions that can amplify microbial respiration responses to temperature changes⁴⁷. Additionally, soil pH was found to influence Q₁₀ sensitivity, with lower pH values being associated with a higher likelihood of classification as low-Q₁₀ soils. Soil acidification is known to affect microbial community composition and metabolic processes, potentially leading to reduced respiration sensitivity⁴⁸. Acidic soils may limit microbial activity and enzymatic efficiency, thereby lowering Q₁₀ values and altering carbon cycling dynamics.

Third, by leveraging local explanation embeddings derived from SHAP values and applying unsupervised clustering methods, this study identified distinct subgroups within the low Q₁₀ soils that are at higher risk of temperature sensitivity. Clusters 4 and 6, in particular, were found to have significantly higher probabilities of transitioning to high Q₁₀ soils, making them prime candidates for targeted interventions. This clustering approach represents a significant advancement over traditional methods, as it not only provides a nuanced categorization of soils but also elucidates the reasons why certain soils are more prone to warming-induced sensitivity. Unlike conventional t-SNE analyses based solely on feature variables, which failed to clearly distinguish between high and low Q₁₀ soils, the SHAP-based embeddings effectively captured the intricate patterns and interactions driving soil sensitivity. This enables the identification of high-risk subgroups within the low Q₁₀ category, facilitating targeted interventions and improving our understanding of the soil-microbiome-climate nexus.

Moreover, our study demonstrates the broader potential of combining XAI with machine learning in environmental research. The transparent nature of XAI models bridges the gap between complex machine learning predictions and practical, actionable insights. This is crucial for gaining the trust of stakeholders, including land managers and policymakers, who require clear, evidence-based guidance for implementing sustainable practices. The integration of XAI into environmental models could also facilitate more effective communication of scientific findings, fostering a wider adoption of AI-driven solutions in agriculture and climate change mitigation.

Overall, this work provides a robust, explainable framework for predicting soil sensitivity to temperature increases, contributing to both the scientific understanding and practical management of soil ecosystems. Future research should integrate this approach with long-term field data, exploring the interactions of Q₁₀ with other soil health indicators, and applying the framework to different biomes. By doing so, we can further enhance our ability to predict and manage soil responses to climate change, ultimately supporting more resilient agricultural systems and ecosystems.

Our findings reveal that the presence of forests, fine soil texture, and soil pH are key environmental variables influencing Q₁₀ sensitivity. However, the absence of explicit soil type classifications in the dataset limited our ability to directly associate Q₁₀ variations with conventional soil types (e.g., sandy, clay, loam). Incorporating detailed metadata on soil classifications in future studies would enable a more comprehensive understanding of how specific soil types influence Q₁₀. Additionally, analyzing Q₁₀ variations within these types could provide more targeted insights into soil management strategies for mitigating climate change.

A key limitation of this study is the representativeness of the dataset, which, although global in scope, may not fully capture the diversity of microbiomes and soil environments across all ecosystems. Certain underrepresented soil types or environmental conditions may introduce biases in the model predictions. Additionally, the reliance on SHAP embeddings and clustering techniques involves assumptions regarding the separability of soils in the reduced-dimensionality space. These methods may be sensitive to the chosen dimensionality reduction parameters and clustering algorithms, potentially impacting the robustness of the identified subgroups. Future studies should explore alternative embeddings and validation techniques to minimize these uncertainties and enhance generalizability.

In conclusion, this study highlights the power of explainable AI in uncovering the key drivers of soil respiration sensitivity to temperature. By providing both accurate predictions and interpretability, our approach bridges the gap between complex machine learning models and practical applications in soil management. However, while these findings offer actionable insights, their practical implementation requires validation through field experiments across diverse environmental conditions and soil types. As the impacts of climate change continue to escalate, the integration of XAI into environmental research will be crucial for developing effective mitigation strategies and ensuring the sustainability of ecosystems.

Data and code availability

All data utilized in the analysis are publicly available and have been downloaded from the repository on Figshare at the following DOI: DOI: https://doi.org/10.6084/m9.figshare.20776243.v1. Additionally, all the data preprocessing steps, machine learning models, and analysis scripts used in this study are available in the GitHub repository at https://doi.org/10.5281/zenodo.14751615.

References

Allen, D. E., Singh, B. P. & Dalal, R. C. Soil health indicators under climate change: a review of current knowledge. Soil health and climate change 25–45, https://doi.org/10.1007/978-3-642-20256-8_2 (2011).
Lal, R. Soil health and climate change: an overview. Soil health and climate change 3–24, https://doi.org/10.1007/978-3-642-20256-8_1 (2011).
Patil, A. & Lamnganbi, M. Impact of climate change on soil health: A review. Int. J. Chem. Stud. 6, 2399–2404 (2018).
Google Scholar
Meyer, N., Meyer, H., Welp, G. & Amelung, W. Soil respiration and its temperature sensitivity (q10): Rapid acquisition using mid-infrared spectroscopy. Geoderma 323, 31–40. https://doi.org/10.1016/j.geoderma.2018.02.031 (2018).
Article ADS Google Scholar
Luan, J., Liu, S., Wang, J. & Zhu, X. Factors affecting spatial variation of annual apparent q10 of soil respiration in two warm temperate forests. PLoS One 8, e64167. https://doi.org/10.1371/journal.pone.0064167 (2013).
Article ADS PubMed PubMed Central Google Scholar
Kurganova, I. et al. Temperature sensitivity of soil respiration in two temperate forest ecosystems: the synthesis of a 24-year continuous observation. Forests 13, 1374. https://doi.org/10.3390/f13091374 (2022).
Article Google Scholar
Gromova, M., Matvienko, A., Makarov, M., Cheng, C.-H. & Menyailo, O. Temperature sensitivity (q 10) of soil basal respiration as a function of available carbon substrate, temperature, and moisture. Eurasian Soil Sc. 53, 377–382. https://doi.org/10.1134/S1064229320020052 (2020).
Article ADS CAS Google Scholar
Haaf, D., Six, J. & Doetterl, S. Global patterns of geo-ecological controls on the response of soil respiration to warming. Nat. Clim. Chang. 11, 623–627. https://doi.org/10.1038/s41558-021-01068-9 (2021).
Article ADS Google Scholar
Meyer, N., Welp, G. & Amelung, W. The temperature sensitivity (q10) of soil respiration: controlling factors and spatial prediction at regional scale based on environmental soil classes. Global Biogeochemical Cycles 32, 306–323. https://doi.org/10.1002/2017GB005644 (2018).
Article ADS CAS Google Scholar
Crowther, T. W. et al. The global soil community and its influence on biogeochemistry. Science 365, eaav0550, https://doi.org/10.1126/science.aav0550 (2019).
Mahecha, M. D. et al. Global convergence in the temperature sensitivity of respiration at ecosystem level. science 329, 838–840, https://doi.org/10.1126/science.1189587 (2010).
Novielli, P. et al. Explainable artificial intelligence for genotype-to-phenotype prediction in plant breeding: a case study with a dataset from an almond germplasm collection. Frontiers in Plant Science 15, 1434229. https://doi.org/10.3389/fpls.2024.1434229 (2024).
Article PubMed PubMed Central Google Scholar
Novielli, P. et al. Climate change and soil health: Explainable artificial intelligence reveals microbiome response to warming. Machine Learning and Knowledge Extraction 6, 1564–1578. https://doi.org/10.3390/make6030075 (2024).
Article Google Scholar
Lu, H. et al. Comparing machine learning-derived global estimates of soil respiration and its components with those from terrestrial ecosystem models. Environmental Research Letters 16, 054048. https://doi.org/10.1088/1748-9326/abf526 (2021).
Article ADS CAS Google Scholar
Liu, J., Hu, J., Liu, H. & Han, K. Global soil respiration estimation based on ecological big data and machine learning model. Scientific Reports 14, 13231. https://doi.org/10.1038/s41598-024-64235-w (2024).
Article ADS CAS PubMed PubMed Central Google Scholar
Zhu, X. et al. Estimating ecosystem respiration in the grasslands of northern china using machine learning: Model evaluation and comparison. Sustainability 12, 2099. https://doi.org/10.3390/su12052099 (2020).
Article Google Scholar
Papoutsoglou, G. et al. Machine learning approaches in microbiome research: challenges and best practices. Frontiers in Microbiology 14, https://doi.org/10.3389/fmicb.2023.1261889 (2023).
Roland C. Wilhelm, D. H. B., Harold M. van Es. Predicting measures of soil health using the microbiome and supervised machine learning. Soil Biology and Biochemistry 164, https://doi.org/10.1016/j.soilbio.2021.108472 (2022).
Biecek, P. & Burzykowski, T. Explanatory model analysis: explore, explain, and examine predictive models (Chapman and Hall/CRC, 2021). https://doi.org/10.1080/00401706.2022.2091871.
Li, Z. Extracting spatial effects from machine learning model using local interpretation method: An example of shap and xgboost. Computers, Environment and Urban Systems 96, 101845. https://doi.org/10.1016/j.compenvurbsys.2022.101845 (2022).
Article Google Scholar
Ghalebikesabi, S., Ter-Minassian, L., DiazOrdaz, K. & Holmes, C. C. On locality of local explanation models. Advances in neural information processing systems 34, 18395–18407 (2021).
Google Scholar
Rolnick, D. et al. Tackling climate change with machine learning. ACM Computing Surveys (CSUR) 55, 1–96. https://doi.org/10.1145/3485128 (2022).
Article Google Scholar
Novielli, P. et al. Personalized identification of autism-related bacteria in the gut microbiome using explainable artificial intelligence. iScience 27, https://doi.org/10.1016/j.isci.2024.110709 (2024).
Sáez-Sandino, T. et al. The soil microbiome governs the response of microbial respiration to warming across the globe. Nature Climate Change 13, 1382–1387. https://doi.org/10.1038/s41558-023-01868-1 (2023).
Article ADS Google Scholar
Fick, S. E. & Hijmans, R. J. Worldclim 2: new 1-km spatial resolution climate surfaces for global land areas. International journal of climatology 37, 4302–4315. https://doi.org/10.1002/joc.5086 (2017).
Article ADS Google Scholar
Iacobucci, D., Posavac, S. S., Kardes, F. R., Schneider, M. J. & Popovich, D. L. Toward a more nuanced understanding of the statistical properties of a median split. Journal of Consumer Psychology 25, 652–665. https://doi.org/10.1016/j.jcps.2014.12.002 (2015).
Article Google Scholar
Geurts, P., Ernst, D. & Wehenkel, L. Extremely randomized trees. Machine learning 63, 3–42. https://doi.org/10.1007/s10994-006-6226-1 (2006).
Article Google Scholar
Ferrer, L. Analysis and comparison of classification metrics. arXiv preprint arXiv:2209.05355 https://doi.org/10.48550/arXiv.2209.05355 (2022).
Ozenne, B., Subtil, F. & Maucort-Boulch, D. The precision-recall curve overcame the optimism of the receiver operating characteristic curve in rare diseases. Journal of clinical epidemiology 68, 855–859. https://doi.org/10.1016/j.jclinepi.2015.02.010 (2015).
Article PubMed Google Scholar
Wen, P., Xu, Q., Yang, Z., He, Y. & Huang, Q. Algorithm-dependent generalization of auprc optimization: Theory and algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence https://doi.org/10.1109/TPAMI.2024.3361861 (2024).
Burkart, N. & Huber, M. F. A survey on the explainability of supervised machine learning. Journal of Artificial Intelligence Research 70, 245–317. https://doi.org/10.1613/jair.1.12228 (2021).
Article MathSciNet Google Scholar
Lundberg, S. M. et al. Explainable ai for trees: From local explanations to global understanding. arXiv preprint arXiv:1905.04610 https://doi.org/10.48550/arXiv.1905.04610 (2019).
Novielli, P. et al. Explainable artificial intelligence for microbiome data analysis in colorectal cancer biomarker identification. Frontiers in Microbiology 15, 1348974. https://doi.org/10.3389/fmicb.2024.1348974 (2024).
Article PubMed PubMed Central Google Scholar
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. Advances in neural information processing systems 30, https://doi.org/10.48550/arXiv.1705.07874 (2017).
Van der Maaten, L. & Hinton, G. Visualizing data using t-sne. Journal of machine learning research 9 (2008).
Wattenberg, M., Viégas, F. & Johnson, I. How to use t-sne effectively. Distill https://doi.org/10.23915/distill.00002 (2016).
Ahmed, M., Seraj, R. & Islam, S. M. S. The k-means algorithm: A comprehensive survey and performance evaluation. Electronics 9, 1295. https://doi.org/10.3390/electronics9081295 (2020).
Article Google Scholar
Ackermann, M. R., Blömer, J., Kuntze, D. & Sohler, C. Analysis of agglomerative clustering. Algorithmica 69, 184–215. https://doi.org/10.1007/s00453-012-9717-4 (2014).
Article MathSciNet Google Scholar
Zhang, T., Ramakrishnan, R. & Livny, M. Birch: an efficient data clustering method for very large databases. ACM sigmod record 25, 103–114. https://doi.org/10.1145/235968.233324 (1996).
Article Google Scholar
Januzaj, Y., Beqiri, E. & Luma, A. Determining the optimal number of clusters using silhouette score as a data mining technique. International Journal of Online & Biomedical Engineering 19, https://doi.org/10.3991/ijoe.v19i04.37059 (2023).
Shahapure, K. R. & Nicholas, C. Cluster quality analysis using silhouette score. In 2020 IEEE 7th international conference on data science and advanced analytics (DSAA), 747–748, https://doi.org/10.1109/DSAA49011.2020.00096 (IEEE, 2020).
Pérez-Valera, E., Verdú, M., Navarro-Cano, J. & Goberna, M. Soil microbiome drives the recovery of ecosystem functions after fire. Soil Biology and Biochemistry 149, 107948. https://doi.org/10.1016/j.soilbio.2020.107948 (2020).
Article CAS Google Scholar
Macdonald, C. A. et al. Plant productivity is a key driver of soil respiration response to climate change in a nutrient-limited soil. Basic and Applied Ecology 50, 155–168. https://doi.org/10.1016/j.baae.2020.12.005 (2021).
Article Google Scholar
Jansson, J. K. & Hofmockel, K. S. Soil microbiomes and climate change. Nature Reviews Microbiology 18, 35–46. https://doi.org/10.1038/s41579-019-0265-7 (2020).
Article CAS PubMed Google Scholar
Jansson, J. K. & Hofmockel, K. S. The soil microbiome-from metagenomics to metaphenomics. Current opinion in microbiology 43, 162–168. https://doi.org/10.1016/j.mib.2018.01.013 (2018).
Article CAS PubMed Google Scholar
Winkler, J. P., Cherry, R. S. & Schlesinger, W. H. The q10 relationship of microbial respiration in a temperate forest soil. Soil Biology and Biochemistry 28, 1067–1072. https://doi.org/10.1016/0038-0717(96)00076-4 (1996).
Article CAS Google Scholar
Ding, F., Sun, W., Huang, Y. & Hu, X. Larger q10 of carbon decomposition in finer soil particles does not bring long-lasting dependence of q10 on soil texture. European journal of soil science 69, 336–347. https://doi.org/10.1111/ejss.12530 (2018).
Article CAS Google Scholar
Jin, L. et al. Effect of soil acidification on temperature sensitivity of soil respiration. Agronomy 14, 1056. https://doi.org/10.3390/agronomy14051056 (2024).
Article CAS Google Scholar

Download references

Acknowledgements

Authors want to thank the Funders: METROFOOD-IT project has received funding from the European Union - NextGenerationEU, PNRR - Mission 4 “Education and Research” Component 2: from research to business, Investment 3.1: Fund for the realization of an integrated system of research and innovation infrastructures - IR0000033 (D.M. Prot. n.120 del 21/06/2022); The National Recovery and Resilience Plan (NRRP), Mission 4 Component 2 Investment 1.4-Call for tender No. 3138 of 16 December 2021 of Italian Ministry of University and Research funded by the European Union-NextGenerationEU. Award Number: Project code: CN00000022, Concession Decree No. 1031 of 17 February 2022 adopted by the Italian Ministry of University and Research, CUP H93C22000440007, Project title: “National Research Centre for Agricultural Technologies (Agritech)”.

Author information

Authors and Affiliations

Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, Bari, 70125, Italy
Pierfrancesco Novielli, Michele Magarelli, Donato Romano, Pierpaolo Di Bitonto, Anna Maria Stellacci & Sabina Tangaro
Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, 70125, Italy
Pierfrancesco Novielli, Donato Romano, Pierpaolo Di Bitonto, Alfonso Monaco, Nicola Amoroso, Roberto Bellotti & Sabina Tangaro
Dipartimento Interateneo di Fisica ‘M. Merlin’, Università degli Studi di Bari Aldo Moro, Bari, 70125, Italy
Alfonso Monaco & Roberto Bellotti
Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, Bari, 70125, Italy
Nicola Amoroso

Authors

Pierfrancesco Novielli
View author publications
Search author on:PubMed Google Scholar
Michele Magarelli
View author publications
Search author on:PubMed Google Scholar
Donato Romano
View author publications
Search author on:PubMed Google Scholar
Pierpaolo Di Bitonto
View author publications
Search author on:PubMed Google Scholar
Anna Maria Stellacci
View author publications
Search author on:PubMed Google Scholar
Alfonso Monaco
View author publications
Search author on:PubMed Google Scholar
Nicola Amoroso
View author publications
Search author on:PubMed Google Scholar
Roberto Bellotti
View author publications
Search author on:PubMed Google Scholar
Sabina Tangaro
View author publications
Search author on:PubMed Google Scholar

Contributions

Conceptualization, P.N., S.T.; methodology P.N., S.T., M.M.; software, P.N.; data curation, M.M., P.N., formal analysis, P.N.; validation, A.M.S.; visualization P.N.; supervision, S.T.; project administration, S.T.; funding acquisition, S.T. writing-original draft preparation P.N., M.M., writing-review and editing, All authors; All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Sabina Tangaro.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Novielli, P., Magarelli, M., Romano, D. et al. Leveraging explainable AI to predict soil respiration sensitivity and its drivers for climate change mitigation. Sci Rep 15, 12527 (2025). https://doi.org/10.1038/s41598-025-96216-y

Download citation

Received: 19 October 2024
Accepted: 26 March 2025
Published: 11 April 2025
DOI: https://doi.org/10.1038/s41598-025-96216-y