Introduction

Tomato is one of the world’s most economically important horticultural crops, with annual production exceeding 180 million tons globally, serving as a vital source of nutrients and income for farmers worldwide1. Greenhouse-based cultivation systems are increasingly adopted to enhance yield stability, fruit quality, and environmental sustainability, particularly in regions with unfavorable outdoor climatic conditions2,3. Fruit development in tomatoes is highly influenced by microclimatic factors such as temperature, relative humidity, light intensity, and CO₂ concentration4,5,6. Therefore, understanding the precise relationships between greenhouse environments and fruit developmental dynamics is essential for achieving both high yield and sustainability.

With the rise of Agriculture 4.0, the integration of the Internet of Things (IoT) and artificial intelligence (AI) has revolutionized controlled environment agriculture7,8. IoT-driven systems provide high-frequency, real-time monitoring of multiple environmental factors, enabling growers to track greenhouse dynamics with unprecedented precision9,10,11. However, while sensor networks can generate large volumes of multivariate data, transforming these raw measurements into actionable knowledge for agronomic decision-making remains a key challenge12. Recent studies have highlighted the role of IoT and AI in optimizing irrigation and fertigation strategies, improving crop stress detection, and enhancing resource efficiency13,14,15. Nevertheless, most existing works focus on overall yield prediction or disease monitoring, whereas finer-scale processes such as fruit expansion dynamics are often overlooked16,17.

Machine learning methods, including Random Forests (RF), Support Vector Machines, and Gradient Boosting Decision Trees, have been widely applied in smart agriculture for yield estimation, disease detection, and growth monitoring18,19,20,21. However, many of these models are criticized as “black-box” approaches, limiting their transparency and acceptance in agronomic decision-making22. The emergence of Explainable Artificial Intelligence (XAI), particularly methods such as SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME), has enabled researchers to quantify the contribution of each input variable, thus improving trust and interpretability of model predictions23,24. For example, recent research has demonstrated the potential of explainable models for enhancing transparency in crop recommendations and stress management under smart greenhouse settings25. Despite these advances, there is still limited research combining in-situ IoT sensing systems with explainable machine learning to specifically interpret how microclimatic factors affect tomato fruit expansion, a critical morphological trait directly linked to fruit size, marketability, and consumer preference26.

Recent works on IoT-enabled greenhouse agriculture highlight the potential of integrating sensing with AI to improve crop management, but have not adequately addressed the nonlinear and threshold-driven effects of environmental drivers on fruit diameter growth in real-world greenhouse environments27,28. For instance, IoT-based digital twin approaches29, ML-enabled hydroponics growth modeling30, fault-tolerant IoT for hydroponics31, and sustainable IoT-driven greenhouse frameworks32 have been reported. However, these studies emphasize prediction or system robustness, while the interpretability of fruit growth mechanisms remains limited.

To bridge this gap, this study proposes a novel framework that integrates IoT-based environmental sensing with explainable machine learning (Random Forest regression combined with SHAP and partial dependence plots, PDPs) to unravel the environmental drivers of tomato fruit expansion. Unlike prior works that focus mainly on prediction accuracy, our emphasis is on interpretation, providing growers with actionable insights for climate and fertigation management.

The main contributions of this study are as follows:

(1) Development of a deployable IoT-based environmental sensing platform tailored for tomato cultivation, capturing multivariate real-time microclimatic data.

(2) Integration of Random Forest regression with SHAP and PDPs to interpret the influence of individual environmental factors on tomato fruit expansion.

(3) Identification of critical thresholds for key environmental drivers (soil temperature, light intensity, soil electrical conductivity), thereby providing actionable guidelines for greenhouse climate regulation and fertigation strategies.

(4) Advancement of explainable AI applications in agriculture, shifting the focus from black-box prediction to transparent, agronomically interpretable knowledge.

This study contributes to sustainable smart agriculture by transforming complex greenhouse data streams into interpretable, decision-support knowledge, laying the foundation for precision environmental management in intelligent greenhouses.

Materials and methods

Experimental design

The experiment was conducted in a sunken solar greenhouse (117.166° E, 36.174° N) at the Science and Technology Industrial Park of Shandong Agricultural University (Panhe Campus). The greenhouse featured brick-soil composite walls reinforced with cement, with dimensions of 70 m (east–west) × 9.8 m (north–south), a 0.5 m excavation depth, a 3.8 m rear wall height, and a 5 m ridge height (Fig. 1).

Fig. 1 Solar greenhouse.

The tomato cultivar ‘Shengyulan 3690’, known for its vigorous root system, high resistance to viral pathogens, and cold tolerance, was selected for this study. Seedlings were transplanted on February 24, 2022, at a planting density of 2,400 plants per 666.7 m², using a row spacing of 160 cm and a plant spacing of 17.5 cm. Single-stem pruning was adopted, and the growth cycle lasted 124 days, concluding on June 30, 2022. The growth cycle included 36 days for the seedling stage, 19 days for flowering and fruit setting, and 69 days for fruit development. To reduce nutrient competition, plants were topped after the fifth fruit cluster to remove apical dominance. Water, fertilizer, and pest control practices adhered to standard management procedures, as described in “Section 2.2: Tomato Cultivation and Management” of the paper “Research on Multi-Step Fruit Color Prediction Model of Tomato in Solar Greenhouse Based on Time Series Data”6.

IoT-based environmental data acquisition system

Establishing an IoT-based environmental data acquisition system is crucial for the real-time and accurate monitoring of greenhouse conditions, which serves as the foundation for developing data-driven tomato growth models in controlled environments.

As shown in Fig. 2, the central processing unit is connected to a DAM-3058R current-type acquisition card (manufactured by Beijing Art Technology Development Co., Ltd.) via an RS-485 bus, which supports simultaneous acquisition from eight current-type sensors. The CO₂ sensor (EE820, E + E, Austria), light intensity sensor (TBQ-6, Jinzhou Sunshine Meteorological Technology Co., Ltd.), and air temperature and humidity sensor (DB-171, Dalian Beifang Measurement & Control Engineering Co., Ltd.) output 4–20 mA analog signals, which are transmitted to the acquisition card through its corresponding input channels. The DAM-3058R card converts these analog signals into digital signals, and to minimize errors, an average of ten consecutive readings is calculated before being transmitted to the CPU.
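The acquisition step described above can be sketched as follows. This is a minimal illustration, not the firmware used in the study: the linear mapping is the usual 4–20 mA current-loop convention, while the function names and the 0–2000 ppm span in the example are assumptions chosen for illustration (the actual sensor ranges are those given in Table 1).

```python
from statistics import mean

def current_to_value(current_ma, range_min, range_max):
    """Map a 4-20 mA loop current linearly onto [range_min, range_max]."""
    if not 4.0 <= current_ma <= 20.0:
        raise ValueError(f"current {current_ma} mA is outside the 4-20 mA loop range")
    return range_min + (current_ma - 4.0) / 16.0 * (range_max - range_min)

def averaged_reading(currents_ma, range_min, range_max):
    """Average consecutive samples (ten in the text) before reporting."""
    return mean(current_to_value(c, range_min, range_max) for c in currents_ma)

# Hypothetical example: ten CO2 samples on an assumed 0-2000 ppm span.
samples = [8.0, 8.1, 7.9, 8.0, 8.2, 7.8, 8.0, 8.1, 7.9, 8.0]
co2_ppm = averaged_reading(samples, 0.0, 2000.0)
```

Averaging after conversion, as done here, gives the same result as averaging the raw currents first because the mapping is linear.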

Fig. 2 System architecture of the IoT.

For soil parameter monitoring, the 5TE sensor (Decagon, USA) is utilized, which integrates soil temperature, moisture, and electrical conductivity measurements with a digital output using the SDI-12 protocol. A TRS-1203 bus hub (Dalian Beifang Measurement & Control Engineering Co., Ltd.) converts the SDI-12 signals into RS-485 signals, which are sequentially transmitted to the CPU in the order of soil moisture, soil electrical conductivity, and soil temperature. Finally, the CPU transmits all collected data to the cloud server via an RS-485 bus. The sensor parameters are listed in Table 1 and the sensor layout is shown in Fig. 3; further details are given in the paper “Research on Multi-Step Fruit Color Prediction Model of Tomato in Solar Greenhouse Based on Time Series Data”6.

Table 1 Parameters of environmental sensors.
Fig. 3 The stereogram layout of IoT sensors.

Real-time measurement system for tomato fruit growth

To effectively monitor tomato fruit growth during the expansion stage, a real-time fruit diameter measurement system was developed. This system consists of a clamping device, a linear displacement sensor, and a wireless transmission module.

A KTC-150 linear displacement sensor (manufactured by Hermitt) was selected for its 0–150 mm measurement range, 0.1% independent linearity, and infinite resolution. The sensor was securely attached to the fruit, continuously tracking changes in fruit diameter. The KTC-150 sensor transmits displacement signals via an RS-485 bus to the CPU, where the processed data is uploaded to the server through the IoT network. The measurement frequency was set at one reading per 30 min.

Figure 4 illustrates the fruit diameter measurement sensor, while Fig. 5 presents the real-time monitoring platform for fruit growth. The system features offline alerts, threshold exceedance warnings, data management, and large-screen visualization capabilities. It simultaneously collects diameter data from 10 fruit sensors and provides a geospatial display of sensor deployment locations.

Fig. 4 Fruit transverse diameter measurement sensor.

Fig. 5 Data acquisition system of the IoT.

Data processing

Parameter definition

To analyze the correlation between tomato growth rate and environmental changes, seven environmental indicators and one growth indicator were derived from the collected raw data using a 6-h moving average method:

6-h average environmental parameters: air temperature, air humidity, light intensity, CO₂ concentration, soil temperature, soil moisture, and soil electrical conductivity.

$$f\left( X \right) = \frac{1}{T}\sum\limits_{i = 1}^{T} {X_{i} }$$
(1)

The raw data are collected every 30 min, so each 6-h window contains 12 readings of every environmental parameter and T = 12. \(X_{i}\) is the i-th reading within the 6-h window, and the same calculation is applied to each of the seven environmental parameters.
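Eq. (1) can be illustrated with a short sketch. It assumes a sliding window that advances one reading at a time; the function name and the example temperatures are invented for illustration.

```python
def window_means(readings, T=12):
    """Return the mean of every run of T consecutive readings (Eq. 1)."""
    if T <= 0:
        raise ValueError("T must be positive")
    return [sum(readings[i:i + T]) / T for i in range(len(readings) - T + 1)]

# Hypothetical air-temperature series sampled every 30 min (values invented).
air_temp = [18.0 + 0.5 * k for k in range(14)]
six_hour_means = window_means(air_temp)
```

With 14 readings and T = 12, three complete windows exist, so three 6-h means are returned.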

6-h fruit diameter increment

$$\Delta_{i} = Size_{i + 12} - Size_{i} \;(1 \le i \le 2546)$$
(2)

where \(\Delta_{i}\) represents the fruit diameter increment over six hours, and \(Size_{i}\) refers to the diameter measurement at time step i. Calculations showed that tomato diameter increments during the expansion stage ranged between 0.1 mm and 0.5 mm per 6-h interval.
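The increment in Eq. (2) is a lag-12 difference of the diameter series. A minimal sketch, with a hypothetical function name and invented diameters:

```python
def six_hour_increments(sizes, lag=12):
    """Delta_i = Size_{i+lag} - Size_i for every i where both readings exist (Eq. 2)."""
    return [sizes[i + lag] - sizes[i] for i in range(len(sizes) - lag)]

# Hypothetical diameters (mm), one per 30 min, growing 0.025 mm per step.
diam = [40.0 + 0.025 * k for k in range(15)]
deltas = six_hour_increments(diam)
```

In this invented series each 6-h increment is 0.3 mm, which lies inside the 0.1–0.5 mm range reported above.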

Data preprocessing

Due to power outages and occasional disruptions from farm operations, some fruit diameter data contained anomalies and missing values. Figure 6 presents an example of the raw fruit diameter dataset, where abrupt fluctuations indicate sensor disturbances.

Fig. 6 Primary fruit transverse diameter data.

To address these issues, a Bézier curve smoothing method29 was applied to rectify sudden variations. The final fruit diameter values were determined by averaging the measurements from 10 sensors.

A Bézier curve is constructed using a set of control points. An n-th order Bézier curve consists of n + 1 control points, and the curve’s coordinate at a given time t is determined by the following equation:

$$P\left( t \right) = B_{0}^{n} \left( t \right)P_{0} + B_{1}^{n} \left( t \right)P_{1} + \cdots + B_{n}^{n} \left( t \right)P_{n} = \sum\limits_{i = 0}^{n} {P_{i} B_{i}^{n} \left( t \right)} ,\;0 \le t \le 1$$
(3)

where \(P_{i}\) (i = 0, 1, …, n) are the n + 1 control points, n is a positive integer, and \(B_{i}^{n} \left( t \right)\) is the n-th order Bernstein polynomial, expressed as:

$$B_{i}^{n} \left( t \right) = C_{n}^{i} t^{i} \left( {1 - t} \right)^{n - i} ,C_{n}^{i} = \frac{n!}{{i!\left( {n - i} \right)!}}$$
(4)

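Substituting the Bernstein recurrence \(B_{i}^{n} \left( t \right) = \left( {1 - t} \right)B_{i}^{n - 1} \left( t \right) + tB_{i - 1}^{n - 1} \left( t \right)\), with \(B_{ - 1}^{n - 1} = B_{n}^{n - 1} = 0\), into Eq. (3), the Bézier curve can be formulated recursively as follows: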

$$P\left( t \right) = \sum\limits_{i = 0}^{n} {P_{i} \left( {\left( {1 - t} \right)B_{i}^{n - 1} \left( t \right) + tB_{i - 1}^{n - 1} \left( t \right)} \right)} = \sum\limits_{i = 0}^{n - 1} {\left( {\left( {1 - t} \right)P_{i} + tP_{i + 1} } \right)} B_{i}^{n - 1} \left( t \right)$$
(5)

This recursive process reduces the number of control points from n + 1 to n, continuing until only a single control point remains, yielding the final smoothed value.
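The equivalence between the direct Bernstein form (Eqs. 3 and 4) and the recursive form (Eq. 5) can be checked numerically. The sketch below is illustrative only: the source does not state how control points were chosen from the diameter series, so using four consecutive readings as control points is an assumption, and all values are invented.

```python
from math import comb

def bezier_bernstein(points, t):
    """Evaluate Eq. (3) directly using Bernstein polynomials (Eq. 4)."""
    n = len(points) - 1
    return sum(comb(n, i) * t**i * (1 - t)**(n - i) * p for i, p in enumerate(points))

def bezier_de_casteljau(points, t):
    """Evaluate the same curve with the recursion of Eq. (5)."""
    pts = list(points)
    while len(pts) > 1:
        # Each pass blends neighbouring points, reducing the count by one.
        pts = [(1 - t) * a + t * b for a, b in zip(pts, pts[1:])]
    return pts[0]

# Hypothetical diameters (mm) with a spike at the third reading, used as
# the four control points of a third-order curve.
control = [41.0, 41.1, 43.0, 41.3]
smoothed = [bezier_de_casteljau(control, k / 3) for k in range(4)]
```

The curve passes through the first and last control points and is pulled toward, but does not reach, the spike, which is the property that makes it useful for damping abrupt sensor disturbances.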

In this study, a third-order Bézier curve was selected to smooth abrupt fluctuations in the tomato fruit diameter measurement curve, effectively mitigating anomalies caused by sensor disturbances. The smoothed tomato diameter data is presented in Fig. 7.

Fig. 7 Fruit transverse diameter data after smoothing.

Development of the tomato growth model

The Random Forest (RF) regression model, originally proposed by Breiman30, was employed in this study due to its capability to handle high-dimensional data, resist overfitting, and remain robust in the presence of multicollinearity. RF has been widely applied in agricultural modeling tasks such as predicting fuel consumption in grain harvesters and estimating maize yield.

The RF algorithm constructs multiple decision trees by performing bootstrapped sampling from the original dataset. Each tree is trained on a randomly selected subset of features and samples. The final result is obtained by averaging the outputs of all individual trees. To ensure the model’s generalization ability, RF adheres to two fundamental principles: random sampling of training data and random selection of feature subsets for each split.

This study implemented the RF model using Python 3.9 and Scikit-learn 1.2.1 under a Windows 10 operating system with 32 GB DDR4 memory.

According to Fig. 7, the fruit diameter growth rate during the expansion stage exhibits a pattern of rapid increase followed by a plateau, particularly during the mature green phase, which reflects the physiological characteristics of tomato fruit development. To reduce the influence of genetic variation and focus on environmental effects, we selected data collected between April 1 and April 22, encompassing a total of 996 samples. After preprocessing, the data were split into a training set (70%) and a test set (30%).

Hyperparameter tuning was performed to optimize the number of decision trees (n_estimators). As shown in Fig. 8, the model’s mean squared error (MSE) stabilized when n_estimators exceeded 160. Therefore, 160 trees were selected as the optimal value to balance performance and computational efficiency.
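The split and the sweep over n_estimators can be sketched with scikit-learn as below. This is a hedged illustration on synthetic data, not the study's pipeline: the target function, the random seeds, the coarse grid of tree counts, and the use of the held-out set to score each setting are all assumptions, since the source does not state on which data the error in Fig. 8 was computed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the 996 samples x 7 environmental features;
# the real dataset is not reproduced here.
X = rng.uniform(0.0, 1.0, size=(996, 7))
y = 0.3 - (X[:, 0] - 0.6) ** 2 + 0.1 * X[:, 1] + rng.normal(0.0, 0.01, size=996)

# 70 % / 30 % split as in the text (random_state is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Sweep n_estimators and record the held-out MSE for each setting.
mse_by_trees = {}
for n_trees in (20, 80, 160):
    model = RandomForestRegressor(n_estimators=n_trees, random_state=0)
    model.fit(X_train, y_train)
    mse_by_trees[n_trees] = mean_squared_error(y_test, model.predict(X_test))

# R^2 of the last fitted model (160 trees) on the held-out set.
r2 = r2_score(y_test, model.predict(X_test))
```

In practice, a cross-validated or out-of-bag error on the training data would be a more conservative basis for choosing the number of trees, leaving the test set untouched for the final evaluation.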

Fig. 8 Relationship between mean absolute error and number of decision trees.

To assess model performance, the mean squared error (MSE) and the coefficient of determination (R²) were used:

$$MSE = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {y_{i} - \hat{y}_{i} } \right)^{2} }$$
(6)
$$R^{2} = 1 - \frac{{\sum\nolimits_{i = 1}^{n} {\left( {y_{i} - \hat{y}_{i} } \right)^{2} } }}{{\sum\nolimits_{i = 1}^{n} {\left( {y_{i} - \overline{y} } \right)^{2} } }}$$
(7)

where \(y_{i}\) represents the actual diameter increment, \(\hat{y}_{i}\) is the predicted value, \(\overline{y}\) denotes the mean observed increment, and n is the number of samples.
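The two metrics can be written out directly from Eqs. (6) and (7). The sketch below uses only the standard library; the function names and the example increments are invented for illustration.

```python
def mse(y_true, y_pred):
    """Mean squared error, Eq. (6)."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination, Eq. (7)."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - y_bar) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical increments (mm per 6 h); values invented for illustration.
observed = [0.1, 0.2, 0.3, 0.4, 0.5]
predicted = [0.15, 0.2, 0.25, 0.4, 0.5]
```

For these invented values the MSE is 0.001 and R² is 0.95; a perfect prediction gives an MSE of 0 and an R² of 1.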

The Random Forest model demonstrated satisfactory predictive performance, achieving an R² of 0.82 and an MSE of 0.0046 on the test dataset. This indicates that the selected environmental features explained more than 80% of the variance in fruit diameter increment, confirming the suitability of the chosen predictors for modeling fruit expansion.

Pseudocode of the IoT–XAI framework

The main computational steps of the proposed IoT–XAI framework are summarized in Table 2, which outlines the sequential workflow from data acquisition to model interpretation.

Table 2 Pseudocode of the IoT–XAI framework.

Results and analysis

Statistical characteristics of environmental data

A total of 2,568 samples were collected, with key statistical summaries presented in Table 3. The dataset reflects a broad variation in microclimatic conditions, which provides a robust basis for analyzing environmental effects on fruit expansion.

Table 3 Test data statistics.

Air temperature variation

As shown in Fig. 9, during the fruit expansion stage, air temperature ranged from 8.91 to 36.50 °C, with a mean of 20.13 °C. Daily minimum temperatures typically occurred between 5:00 and 6:00 a.m., while maximum values peaked between 1:00 and 2:00 p.m. A marked upward trend in both day and night temperatures was observed from April 1 to April 11. To mitigate heat stress, upper ventilation was activated starting April 12. Temperature fluctuations corresponding to two rainfall events (April 28–30 and May 8–10) are also evident. Ventilation adjustments effectively stabilized daytime highs below 35 °C and nighttime lows around 16 °C.

Fig. 9 Map of air temperature change.

Air humidity variation

As shown in Fig. 10, relative humidity fluctuated between 23.3 and 90.2%, with an average of 64.94%. Daily RH followed a consistent pattern, peaking around 6:00–7:00 a.m. (pre-ventilation) and dropping to a minimum near 1:00–2:00 p.m. Humidity levels were notably elevated between May 8 and May 13 due to persistent rainy weather.

Fig. 10 Map of air humidity change.

Light intensity variation

As shown in Fig. 11, light intensity ranged from 0.14 to 83.58 Klux, averaging 15.60 Klux. Following the manual removal of insulation mats each morning, light levels increased rapidly, peaking at noon and declining in the afternoon. During overcast or rainy days (e.g., April 29 and May 8–10), light intensity was substantially reduced. The average light intensity in May was only 10.52 Klux, approximately 33% lower than the April–May average (15.60 Klux), which is attributable to reduced solar radiation.

Fig. 11 Map of illumination intensity change.

CO₂ concentration variation

As shown in Fig. 12, CO₂ concentration varied from 242.70 to 1580.72 ppm, with an average of 544.15 ppm. CO₂ levels were lowest during the daytime when ventilation was active, aligning with outdoor ambient conditions (~ 400 ppm). After ventilation closed in the evening, CO₂ accumulated overnight, peaking before reopening in the morning. Limited variation from April 9–12, April 25–28, May 1–7, and after May 15 coincided with high nighttime temperatures when vents remained partially open, corroborated by field logs.

Fig. 12 Map of CO₂ concentration variation.

Soil temperature variation

As shown in Fig. 13, soil temperature ranged from 13.45 to 24.40 °C, with a mean of 19.48 °C. Daily patterns mirrored air temperature but exhibited smaller amplitudes and time lags. Minimum values occurred around 7:00–8:00 a.m., and maximums around 5:00–6:00 p.m. Irrigation events (April 6, 11, 18, 23, 29; May 2, 6, 13, 21) caused noticeable drops due to the use of cool water.

Fig. 13 Map of soil temperature variation.

Soil moisture variation

As shown in Fig. 14, soil volumetric water content (VWC) ranged from 21.26 to 32.02%, averaging 26.21%. Soil moisture dynamics were predominantly driven by irrigation events. After irrigation, moisture levels increased rapidly to 30–32% and then gradually declined over subsequent days. When VWC reached 22–24%, irrigation was reinitiated. Rainy weather from May 8 to May 13 slowed the moisture decrease due to elevated atmospheric humidity.

Fig. 14 Map of soil moisture variation.

Soil electrical conductivity variation

As shown in Fig. 15, EC values ranged from 0.33 to 1.00 dS/m, with a mean of 0.64 dS/m. EC exhibited a rising trend 1–2 days after each irrigation event, differing from soil moisture, which began declining immediately. This pattern is attributed to the delayed dissolution of water-soluble fertilizers (NPK and boron) applied through fertigation.

Fig. 15 Map of soil electrical conductivity variation.

Feature importance analysis using SHAP

Although the constructed Random Forest model captures the relationship between the transverse diameter increment of tomato fruits during the expansion stage and the environment well, its “black-box” nature prevents direct interpretation of the relationships between input variables and model outputs. To address this issue, we adopted the SHapley Additive exPlanations (SHAP) method, originally proposed by Lundberg and Lee33,34. Rooted in cooperative game theory, SHAP treats each input feature as a “player” in a collaborative task, attributing the model output to individual feature contributions.
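The game-theoretic attribution behind SHAP can be illustrated by computing exact Shapley values for a small toy model. This sketch is not the explainer used in the study (the source does not say which one was applied): the toy model, its coefficients, the baseline, and the convention of replacing absent features with baseline values are all assumptions made for illustration. Exact enumeration is feasible only for a handful of features; practical SHAP implementations rely on faster model-specific or sampling-based estimators.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline).

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical toy model of growth: linear terms plus one interaction.
def toy_model(z):
    soil_temp, light, ec = z
    return 0.02 * soil_temp + 0.005 * light + 0.1 * ec + 0.001 * soil_temp * light

x = [21.0, 30.0, 0.7]
baseline = [19.0, 15.0, 0.6]
phi = shapley_values(toy_model, x, baseline)
```

The contributions sum to the difference between the prediction at x and the prediction at the baseline, which is the additivity property that gives SHAP its name. The interaction term is split equally between soil temperature and light.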

As illustrated in Fig. 16, the Random Forest model, validated with MSE and R², identified soil temperature, light intensity, and soil EC as the top three environmental drivers. Quantitatively, soil temperature alone accounted for ~35% of the variance in fruit growth, followed by light intensity (~27%) and soil EC (~18%). Air temperature contributed ~12%, while soil moisture contributed ~6%. CO₂ and RH together explained less than 5%. This ranking demonstrates that stable root-zone conditions and sufficient radiation are more critical than aerial humidity or CO₂ fluctuations in this specific context.

Fig. 16 Feature analysis of random forest model.

Moreover, SHAP value distributions revealed threshold-like behaviors: high soil temperature (> 22.5 °C) shifted SHAP values negatively, while optimal light intensities (> 20 Klux) strongly shifted values positively. These findings emphasize that not only mean values but also extremes and transitions play decisive roles in regulating fruit expansion.

Environmental influence on fruit growth rate based on PDP

To further examine how individual environmental variables affect tomato growth rate, partial dependence plots (PDPs) were used. A PDP visualizes the marginal effect of a single feature on the predicted outcome by averaging the model prediction over the observed distribution of all other features. This technique helps uncover non-linear relationships between input variables and the model’s output34.
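The averaging behind a partial dependence curve can be sketched in a few lines. The toy model, its optimum at 21.8 °C, and the four background rows are invented for illustration and do not come from the study's fitted model.

```python
def partial_dependence(model, rows, feature, grid):
    """Average prediction with one feature forced to each grid value."""
    curve = []
    for g in grid:
        preds = []
        for row in rows:
            z = list(row)
            z[feature] = g  # override only the feature of interest
            preds.append(model(z))
        curve.append(sum(preds) / len(preds))
    return curve

# Hypothetical model with a growth optimum at 21.8 C soil temperature.
def toy_model(z):
    soil_temp, light = z
    return 0.4 - 0.01 * (soil_temp - 21.8) ** 2 + 0.002 * light

rows = [[18.0, 5.0], [20.0, 15.0], [22.0, 25.0], [23.5, 35.0]]
grid = [18.0, 20.0, 21.8, 23.0]
pdp = partial_dependence(toy_model, rows, feature=0, grid=grid)
```

For a fitted scikit-learn estimator, the `sklearn.inspection` module offers `partial_dependence` and `PartialDependenceDisplay`, which perform the same averaging over the training data.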

Effect of soil temperature on growth rate

As shown in Fig. 17, the PDP curve shows a strong non-linear relationship between soil temperature and fruit growth rate. When soil temperature was below 18 °C, the growth rate was significantly suppressed. A rapid increase in growth was observed as temperature rose to 19 °C, reaching its peak around 21.8 °C. Beyond 22.5 °C, the growth rate began to decline, indicating that maintaining soil temperature within this narrow window is critical for maximizing fruit expansion. These results suggest that excessively high soil temperatures may hinder fruit development. Given the tight coupling between soil and air temperature, precise temperature control during the expansion stage is crucial to optimizing tomato growth.

Fig. 17 Effect of soil temperature on growth rate.

Effect of light intensity on growth rate

As shown in Fig. 18, the fruit diameter growth rate of tomato, a light-loving crop, increases with light intensity. The PDP reveals a clear positive relationship between light availability and growth. During cloudy or rainy days, insufficient solar radiation reduces greenhouse light levels and temperatures, restricting fruit development. While this study does not include artificial lighting experiments, related research indicates that supplemental lighting can positively influence growth. Thus, manual adjustment of insulation mat timing (e.g., earlier unveiling, later covering) could extend daily light duration and promote growth.

Fig. 18 Effect of light intensity on growth rate.

Effects of soil electrical conductivity and moisture

As shown in Fig. 19, soil electrical conductivity (EC) and soil moisture (VWC) are analyzed jointly due to their interdependence. The PDP indicates that:

(1) When VWC falls below 24% or exceeds 28.5%, the growth rate declines markedly.

(2) During the experiment, irrigation was typically performed when VWC dropped to 22–24%, and post-irrigation levels exceeded 30%, as shown in the section “Statistical characteristics of environmental data”.

(3) The EC remained below 0.64 dS/m for extended periods, especially during long irrigation intervals, which also negatively impacted growth.

Fig. 19 Effects of soil electrical conductivity and soil moisture on fruit growth rate.

These findings suggest that fine-tuned irrigation—both in timing and fertilization method—is vital for maintaining optimal soil conditions for fruit expansion.

Effect of air temperature on growth rate

As shown in Fig. 20, air temperature, like soil temperature, has a strong non-linear effect on fruit growth: below 15 °C, growth is restricted; the growth rate increases sharply around 17.5 °C and peaks at approximately 23 °C; and temperatures exceeding 25 °C lead to a noticeable decline in growth rate. These results underscore the necessity of climate regulation during the expansion phase to avoid overheating, which can inhibit fruit development.

Fig. 20 Effect of air temperature on growth rate.

Effect of CO₂ concentration on growth rate

As shown in Fig. 21, although CO₂ is a key substrate for photosynthesis and theoretically enhances crop growth, the PDP analysis indicates a relatively weak influence on fruit expansion rate in this experiment. This could be attributed to (1) limited variation in daytime CO₂ levels (~400 ppm) due to continuous ventilation and (2) the temporal mismatch between peak CO₂ concentrations (nighttime) and active photosynthesis (daytime).

Fig. 21 Effects of CO₂ concentration on fruit growth rate.

Thus, simply increasing CO₂ concentration during non-photosynthetic periods may not effectively enhance fruit growth.

Effect of air humidity on growth rate

As shown in Fig. 22, air humidity influences transpiration, which in turn affects nutrient uptake and dry matter accumulation. However, the PDP reveals minimal impact of RH on fruit growth rate in this study. This may be due to (1) the RH being relatively stable during the daytime, when transpiration is most active, and (2) the open ventilation system, which keeps indoor humidity closely aligned with outdoor levels.

Fig. 22 Effects of air humidity on fruit growth rate.

As a result, air humidity did not vary enough to exhibit a strong effect on growth.

The results collectively indicate that root-zone factors (soil temperature, soil EC, soil moisture) and solar radiation (light intensity) exert dominant control over tomato fruit expansion, while atmospheric humidity and CO₂ are less decisive under ventilated greenhouse conditions. Importantly, the thresholds derived from SHAP–PDP analyses (soil temperature ~ 21.8 °C, light intensity > 20 Klux, soil EC ~ 0.6–0.8 dS/m, soil VWC 24–28%) provide actionable guidelines for growers. For instance, maintaining irrigation schedules that prevent VWC from falling below 24% while avoiding > 30% can directly optimize fruit size. Similarly, timely insulation mat operation to increase effective daily light hours can compensate for cloudy conditions.
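As an illustration of how such thresholds might feed a decision-support rule, the sketch below flags readings that fall outside the reported windows. The variable names, the dictionary layout, and the function are hypothetical; the soil temperature window of 19–22.5 °C is an interpretation of the PDP description (rapid increase near 19 °C, decline beyond 22.5 °C) rather than a range stated explicitly in the text.

```python
# Windows taken from the thresholds reported in the text; the dictionary
# layout and function name are illustrative assumptions.
RECOMMENDED = {
    "soil_temp_c": (19.0, 22.5),
    "light_klux": (20.0, None),   # lower bound only
    "soil_ec_ds_m": (0.6, 0.8),
    "soil_vwc_pct": (24.0, 28.0),
}

def out_of_range(readings):
    """Return {variable: 'low' | 'high'} for readings outside their window."""
    flags = {}
    for name, (low, high) in RECOMMENDED.items():
        value = readings.get(name)
        if value is None:
            continue  # variable not measured in this sample
        if low is not None and value < low:
            flags[name] = "low"
        elif high is not None and value > high:
            flags[name] = "high"
    return flags

# Hypothetical 6-h averages for one sample.
sample = {"soil_temp_c": 23.1, "light_klux": 12.0, "soil_ec_ds_m": 0.7, "soil_vwc_pct": 22.5}
flags = out_of_range(sample)
```

A rule of this kind only reports deviations; translating a flag into a control action (ventilation, mat timing, irrigation) would require the validation steps outlined as future work.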

At the same time, while soil temperature, light intensity, and soil electrical conductivity emerged as the top predictors, model outcomes may shift with the inclusion of additional features. For example, soil quality indicators such as organic matter content, nutrient availability, or pH could further refine the interpretability of fruit growth dynamics. In this study, we focused on continuously measurable parameters within the IoT framework; however, integrating soil physicochemical attributes into future sensing systems could enhance predictive accuracy and provide a more holistic understanding of growth drivers. By quantifying both the dominant factors and the potential role of unmeasured soil quality parameters, the framework not only interprets model predictions but also translates them into practical climate and fertigation strategies, thereby enhancing the applicability of explainable AI in real-world greenhouse systems.

Conclusions

This study presents a practical and interpretable framework for analyzing how microclimatic variables influence tomato fruit expansion in greenhouse environments. By continuously monitoring air and soil temperature, humidity, light intensity, CO₂ concentration, and soil electrical properties, we identified that soil temperature, light intensity, and soil electrical conductivity are the most influential variables, with an optimal soil temperature of approximately 21.8 °C. The integration of Random Forest regression with SHAP and PDPs enabled a transparent interpretation of model outputs, revealing nonlinear and threshold-based relationships that align with known physiological responses.

To further emphasize the originality of this work, a comparative analysis with other recent greenhouse studies was conducted (Table 4). While most prior research has focused on predictive accuracy in areas such as evapotranspiration, irrigation, fertilization, or yield estimation, these studies often lack model transparency and practical agronomic thresholds. In contrast, this study uniquely highlights explainable AI for tomato fruit expansion, providing interpretable environmental thresholds that can directly guide growers in precision management.

Table 4 Comparative analysis with related works.

As the comparative summary highlights, our framework achieves comparable performance to state-of-the-art IoT and ML studies, while uniquely providing interpretable thresholds for practical management. The contributions of this study are fourfold: (1) development of a deployable IoT-based sensing system; (2) integration of explainable ML methods to interpret environmental drivers; (3) identification of critical thresholds for environmental regulation; and (4) demonstration of an interpretable framework for real-time agronomic decision support. Unlike traditional black-box models, this approach emphasizes transparency and practical applicability in greenhouse management.

Future work will focus on the following three aspects: (1) Multi-site validation: extend the framework to different greenhouse types, cultivars, and climatic zones to test its generalizability. (2) Real-time integration: couple the explainable AI outputs with automated control systems for adaptive greenhouse regulation. (3) Cross-crop applications: apply the proposed methodology to other horticultural crops to evaluate broader scalability.

Overall, this research contributes to sustainable and precision-controlled horticulture by transforming IoT data into interpretable, actionable knowledge through explainable AI.