Introduction

Greenhouses were developed to raise agricultural production for a growing population and to address the constraint that limited land places on food production1. China owns more than 90% of the world’s greenhouse facilities, and because its territory is vast, climate and environment differ from region to region, resulting in diverse environmental control strategies2. Temperature and humidity are important factors affecting crop growth. High temperature significantly impairs the sexual reproduction of crops, lowers seed setting and yield, reduces the hardness and size of fruits, and alters fruit composition3,4,5. High humidity around crops can promote plant diseases and physiological disorders6. Control strategies for temperature and relative humidity in the greenhouse are therefore very important, and predicting temperature and relative humidity is the key to accurate control.

In recent years, data-driven models based on machine learning have been widely used in weather forecasting7,8. Tsai developed a new framework combining weather forecast data, numerical models, and machine learning methods to simulate and predict soil temperature and volumetric water content in greenhouses9. Dariouchy et al.10 developed an artificial neural network (ANN) to predict the internal parameters of a greenhouse for 7 d. ANN methods have some disadvantages, such as slow convergence, overfitting and poor stability, and cannot handle complex system prediction well. Support vector machine (SVM) regression has attracted increasing attention in predictive models11,12,13, and has been used to establish a temperature prediction model that can handle nonlinearity, many variables and small sample sizes. Particle swarm optimization (PSO) and differential evolution (DE) are both efficient and powerful population-based stochastic search techniques for solving optimization problems, and have been widely used in many fields14. Another study15 proposed a Radial Basis Function Neural Network (RBFNN) to monitor the greenhouse microclimate; after offline training and testing on generated data, it could be used to determine the optimal setpoints for the PID controllers in any weather conditions. Most of these studies used a single prediction method and reported satisfactory results.

Some researchers compared multiple methods for predicting environmental parameters in order to identify the best one. Taki et al. selected the better of an artificial neural network (ANN) and a support vector machine (SVM) to estimate indoor air, soil and plant temperatures and energy exchange in a polyethylene greenhouse in Shahreza City, Isfahan Province, Iran16. Another study17 developed four machine learning approaches, namely Adaptive Neuro-Fuzzy Inference System (ANFIS) with Fuzzy C-Means (FCM), ANFIS with Subtractive Clustering (SC), ANFIS with Grid Partition (GP), and a Long Short-Term Memory (LSTM) neural network, to predict temperature one hour ahead and one day ahead. Mao et al.18 used the Particle Swarm Optimization (PSO) algorithm to optimize the weight coefficients of the values predicted by BiGRU-Attention and LightGBM models at different times, improving the accuracy of predicting air temperature, relative humidity and Photosynthetically Active Radiation (PAR) in a greenhouse. Zhu et al.19 proposed a method for predicting the irradiance of a photovoltaic greenhouse under 9 weather conditions, and found that the Transformer model gave the best predictions for the dataset at one year of usage, while the Pyraformer model gave the best predictions for winter and summer.

Multi-interval time series forecasting is widely used in facility agriculture. Li et al.20 proposed a new short-term multi-interval prediction model, which used an Attention-LSTM-based time series method to accurately predict multi-interval short-term temperature changes in a greenhouse. Using a suitable amount of historical data (about 48 h), the model can provide highly accurate temperature predictions 30 to 480 min ahead. Guo and Yu proposed a new network framework that considers the spatial correlation of exogenous environmental factors, the short- and long-term time dependence of sequences, and the spatio-temporal fusion correlation at different times. The model was applied to forecast the circulating water temperature in a hydroponic greenhouse. The empirical results of global and local indicators show that the dual storage scale network (DMSNet) can accurately predict the water temperature 6 h, 12 h and 24 h ahead21. Most previous studies have used historical data to predict future trends in areas such as northern China, but there are few studies on the prediction of greenhouse temperature and relative humidity in South China, which has a subtropical climate. In addition, the minimum model training interval in current research is 30 min, and no study has examined the impact of smaller time intervals on the model.

In this paper, three models were used to predict the temperature and relative humidity changes of a greenhouse in South China: BPPSO (Back Propagation Particle Swarm Optimization), LSSVM (Least Squares Support Vector Machine) and RBF (Radial Basis Function), with the aim of identifying the best prediction method. Six time intervals were used to analyze the sensitivity and prediction accuracy of the three models, and the results provide a reference for the precise control of greenhouses in South China.

Materials and methods

Experimental materials

This study was conducted at the Baiyun Test Base of Guangdong Academy of Agricultural Sciences (113°25′44′′ E, 23°23′30′′ N), Guangzhou, which is located in South China and has a subtropical monsoon climate. The greenhouse was 30 m long, 32 m wide and 5.6 m high. The Venlo greenhouse used in this experiment was equipped with a fan-pad system, a skylight, a side window, and internal and external sunshade curtains, and it was cooled with the fan-pad system. The length, width, and thickness of the fan-pad system were 16 m, 1.2 m, and 0.2 m, respectively. The greenhouse was equipped with four fans (side length 1.2 m, power 1.1 kW)2. The study was conducted from April 27 to June 11, 2021, a total of 44 days. During the experiment, data on temperature and relative humidity in the greenhouse were collected. Nine sensors (Elitech RC-4; −30 to 60 °C, ±0.1 °C; 0–99% RH, ±3% RH; Jiangsu Jinchuang Electric Co., Ltd., Xuzhou, China) were installed in the greenhouse (Fig. 1). The data recording interval was 15 min, and the total number of samples was 4300. The averages of the temperature and relative humidity readings of the 9 sensors were taken as the greenhouse temperature and relative humidity and were used as the data set (Fig. 2). Data fusion was introduced because readings differ inherently between sensor positions and because sensor failures or data redundancy can produce inaccurate data; it aimed to optimize and calibrate the collected data using weighted methods so that the readings of each sensor are represented more accurately.

Fig. 1

Sensor location.

Fig. 2

Variation of average temperature and relative humidity in greenhouse.

Radial basis function (RBF) model

The RBF network is a three-layer feedforward network whose structure is similar to that of a multi-layer feedforward network. The input layer is composed of signal source nodes. The second layer is the hidden layer, and the number of hidden nodes depends on the problem being described. The transformation function of the hidden nodes is the radial basis function, a non-negative nonlinear function that is radially symmetric about a center point and decays with distance from it. The third layer is the output layer, which responds to the input pattern. The transformation from the input space to the hidden layer space is nonlinear, while the transformation from the hidden layer space to the output layer space is linear. RBF networks have important advantages over other neural network types, including a simpler structure and faster learning. The RBF network is a well-performing feedforward neural network model with proven universal approximation capability and no local minimum problem22. The RBF network configuration is shown in Fig. 3. The outputs of the nonlinear activations are combined linearly with the weight vector of the output layer to produce the network output ym.

$${y_m}=\sum\limits_{{i=0}}^{M} {{\beta _i}{\varphi _i}}$$
(1)

where βi is the connection weight of the ith basis function. The most commonly used radial basis function is the Gaussian function:

$${\varphi _i}(x)=\exp \left( { - \frac{{{{\left\| {x - {c_i}} \right\|}^2}}}{{\sigma _{i}^{2}}}} \right)$$
(2)

where ci and σi are the center and spread of the ith RBF node.
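As an illustration of Eqs. (1) and (2), the following minimal Python sketch evaluates the output of a Gaussian RBF network for one input vector. It is not the MATLAB implementation used in this study; the function name `rbf_forward` and the example centers, spreads and weights are hypothetical values chosen only to show the calculation, and every supplied node is treated as a Gaussian node (no separate bias term).

```python
import numpy as np

def rbf_forward(x, centers, spreads, beta):
    """Evaluate y = sum_i beta_i * phi_i(x) with Gaussian basis functions."""
    x = np.asarray(x, dtype=float)
    # Squared Euclidean distance between x and each center c_i
    sq_dist = np.sum((centers - x) ** 2, axis=1)
    # Gaussian activation of each hidden node, Eq. (2)
    phi = np.exp(-sq_dist / spreads ** 2)
    # Linear combination in the output layer, Eq. (1)
    return float(beta @ phi)

# Hypothetical two-node network with two inputs (illustrative values only)
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
spreads = np.array([1.0, 2.0])
beta = np.array([0.5, -0.25])

y = rbf_forward([0.0, 0.0], centers, spreads, beta)
```

The nonlinear step is confined to the hidden layer, so once the centers and spreads are fixed, fitting the output weights reduces to a linear problem.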

Fig. 3

RBF network configuration.

Least squares support vector machine (LSSVM) model

Support vector machine (SVM) is a supervised learning model that can be used for classification and regression. In SVM, the data are represented in n-dimensional space, and the model predicts which category a new example belongs to. The main goal of SVM is to find a hyperplane in n-dimensional space that separates the data points. Several candidate hyperplanes can distinguish between the two classes of data points, but the ideal hyperplane is the one that maximizes the margin between them, as shown in Fig. 4. The hyperplane is the decision boundary: data points that fall on either side of it are attributed to different classes. The dimension of the hyperplane depends on the number of features23. The hyperplane can be written as Eq. (3):

$$\overrightarrow w \cdot \overrightarrow x +b=0$$
(3)

where \(\overrightarrow w\) is the normal vector to the hyperplane and \(\overrightarrow x\) is a point on the hyperplane. The width of the margin is \(2/\left\| {\overrightarrow w } \right\|\).

Fig. 4

Support vectors and intervals.

Traditional SVM relies on quadratic programming solvers with high computational complexity and slow convergence. LSSVM replaces the traditional SVM loss function with a least squares loss function, which transforms the quadratic programming problem into a set of linear equations; this greatly reduces computation time, lowers the sensitivity to noise, improves prediction accuracy and makes the method suitable for real-time prediction scenarios. The regularization term of LSSVM can effectively suppress outlier interference and, combined with a robust kernel function, reduces the influence of small fluctuations in the input data on the model output. For time-varying data, LSSVM can be combined with dynamic optimization to adjust the model parameters in real time, matching changes in the data distribution and improving long-term prediction stability.
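To make the "linear equations" concrete, the sketch below fits an LSSVM regressor in Python by solving the standard LS-SVM system [[0, 1ᵀ], [1, K + I/γ]] [b; α] = [0; y] with a Gaussian kernel. It is a minimal illustration, not the MATLAB toolbox used in this study: the function names, the kernel width `sigma` and the sine-wave data are assumptions, and only the penalty value γ = 100 mirrors the setting reported in this paper.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma=100.0, sigma=1.0):
    """Solve the LS-SVM linear system for the bias b and coefficients alpha."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                      # constraint row: sum(alpha) = 0
    A[1:, 0] = 1.0                      # bias column
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)       # one linear solve, no QP solver
    return sol[0], sol[1:]

def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    """Prediction f(x) = sum_i alpha_i * K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Hypothetical smooth signal standing in for a normalized temperature curve
X = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y = np.sin(2.0 * np.pi * X[:, 0])
b, alpha = lssvm_fit(X, y, gamma=100.0, sigma=0.2)
y_hat = lssvm_predict(X, b, alpha, X, sigma=0.2)
```

In this formulation every training point receives a coefficient, and the training residual of each point equals its coefficient divided by γ, which is how the penalty parameter trades fit against smoothness.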

Back propagation particle swarm optimization (BPPSO) model

Neural networks have good nonlinear mapping and self-learning abilities. The most popular is the backpropagation (BP) neural network, originally proposed by Rumelhart and McClelland24, which is a multi-layer feedforward neural network trained by the error backpropagation algorithm. However, the BP model easily falls into local optima during training, so its weights and thresholds should be tuned with an optimization algorithm. ANN is prone to overfitting because of its complex network structure and redundant parameters, especially when it deals with nonlinear dynamic data in greenhouses25. Because ANN training relies on iterative gradient descent, it can become trapped in local optima, which reduces prediction accuracy; convergence is also slow and a large amount of data is required, which makes it difficult to meet real-time prediction demands. ANN is sensitive to input noise, and prediction stability decreases when the network structure is fixed.

The particle swarm optimization (PSO) algorithm relies on updating the velocity and position of each particle in the population during evolution, with each update driven by both the particle's own memory and the experience of the population26. In parameter optimization, the standard PSO algorithm is prone to premature convergence and has difficulty finding the global optimum, which limits the accuracy of the model. Because the particle swarm is initialized randomly, traditional PSO has an unclear convergence path, needs many iterations and is inefficient. The parameters of the standard PSO are fixed, which makes it difficult to adapt to the time-varying environment of the greenhouse, and the optimization results fluctuate greatly.

By introducing dynamic inertia weights or gradient information, BPPSO balances global exploration and local exploitation, avoids premature convergence, and improves the accuracy of parameter optimization. At the same time, multiple hyperparameters of the prediction model, such as the neural network weights and kernel function parameters, are optimized to reduce the bias introduced by manual parameter tuning and to improve the model’s adaptability to complex greenhouse data. The learning factors of the particle swarm are adjusted dynamically according to the optimization stage, which accelerates convergence. The gradient descent direction of the BP algorithm guides the particle swarm toward local optima, which shortens the search path and reduces wasted iterations. Dynamic adjustment of the particle swarm parameters according to environmental data enhances adaptability to the time-varying characteristics of the greenhouse. Outlier interference is suppressed by a regularization strategy or a robust loss function, and combining this with the stable parameter set found by PSO improves the anti-interference ability of the model. Therefore, the PSO algorithm can be used to optimize the BP model for temperature prediction in the greenhouse (Fig. 5).

Fig. 5

BPPSO working flow chart.
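The PSO stage of such a hybrid can be sketched in Python as follows. This is a plain PSO with a fixed inertia weight searching the weights and thresholds of a small one-hidden-layer network; it is a simplified illustration, not the BPPSO implementation of this study. The function names, the inertia weight `w`, the learning factors `c1` and `c2`, the network size and the toy data are all assumptions; only the population size of 5 and the 30 population updates echo settings reported in this paper.

```python
import numpy as np

def pso_minimize(fitness, dim, n_particles=5, n_iter=30,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` over a dim-dimensional vector with a basic PSO."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1.0, 1.0, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                  # particle self-memory
    pbest_val = np.array([fitness(p) for p in pos])
    g = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]    # population memory
    history = [gbest_val]
    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Velocity update: inertia + pull toward personal and global bests
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([fitness(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = int(np.argmin(pbest_val))
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]
        history.append(gbest_val)
    return gbest, gbest_val, history

def make_network_fitness(X, y, n_hidden):
    """Return (fitness, dim): the MSE of a one-hidden-layer tanh network."""
    n_in = X.shape[1]
    k = n_in * n_hidden
    def fitness(theta):
        W1 = theta[:k].reshape(n_in, n_hidden)          # input-to-hidden weights
        b1 = theta[k:k + n_hidden]                      # hidden thresholds
        W2 = theta[k + n_hidden:k + 2 * n_hidden]       # hidden-to-output weights
        b2 = theta[-1]                                  # output threshold
        pred = np.tanh(X @ W1 + b1) @ W2 + b2
        return float(np.mean((pred - y) ** 2))
    return fitness, k + 2 * n_hidden + 1

# Hypothetical toy data (not greenhouse measurements)
X = np.linspace(-1.0, 1.0, 40).reshape(-1, 1)
y = X[:, 0] ** 2
fitness, dim = make_network_fitness(X, y, n_hidden=3)
theta, best_val, history = pso_minimize(fitness, dim)
```

In a hybrid scheme the vector `theta` found by PSO would then serve as the starting point for gradient-based BP training, so that backpropagation refines a solution that has already been moved away from poor local optima.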

Data processing

The samples were the average temperature and relative humidity of the 9 sensors in the greenhouse, as described in the “Experimental materials” section. They were divided into a training set, a validation set and a testing set at a ratio of 7 : 1 : 2. Following previous studies, six time intervals were selected to study the influence of the time interval on the prediction accuracy of the different models (Table 1).

Table 1 Division of the training group and the verification group.

The models adopted in this study for temperature prediction were programmed in MATLAB 2018a on a computer with a 12th Gen Intel(R) Core(TM) i9-12900 processor, and the LSSVM toolbox was used in MATLAB 2018a. The modeling steps are as follows:

Step1: Historical temperature data were collected and cleaned to remove missing values and outliers. The data were then normalized to make them easier for the model to learn.

Step2: The data were divided into a feature set (input) and a label set (output), and further divided into a training set, a validation set and a testing set.

Step3: Parameters were set for each of the three models. The fitness accuracy of the normalized samples was set to 0.001. For the RBF model, the expansion rate (spread) of the radial basis function was set to 1000. For the BPPSO model, the number of training iterations was set to 1000, the target error was set to 10−6, the learning rate was set to 0.01, c1 and c2 were both set to 4.494, the population size was set to 5, and the number of population updates was set to 30. For the LSSVM model, the kernel and penalty parameters were set to 100, with type f for regression and type c for classification.

Step4: The model was trained and its performance indexes were calculated. The prediction results were analyzed to determine the accuracy and generalization ability of the model.

Step5: The model was used to predict the testing set and obtain the predicted results.
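Steps 1 and 2 can be sketched in Python as below. This is an illustrative outline, not the MATLAB code used in this study: the helper names, the use of lagged values as input features (`n_lags`), the chronological 7 : 1 : 2 split and the synthetic series are assumptions, since the paper does not specify these details. For brevity the whole series is normalized at once; fitting the scaling on the training portion only would be the stricter choice.

```python
import numpy as np

def resample(series, base_minutes, target_minutes):
    """Keep every k-th sample to move from the base to the target interval."""
    step = target_minutes // base_minutes
    return series[::step]

def minmax_normalize(series):
    """Scale a series to [0, 1]; also return the bounds for denormalization."""
    lo, hi = series.min(), series.max()
    return (series - lo) / (hi - lo), lo, hi

def denormalize(scaled, lo, hi):
    """Map normalized values back to physical units."""
    return scaled * (hi - lo) + lo

def make_windows(series, n_lags):
    """Features = the previous n_lags values, label = the next value."""
    X = np.stack([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

def split_7_1_2(X, y):
    """Chronological split into training, validation and testing sets."""
    n = len(y)
    n_train, n_val = int(0.7 * n), int(0.1 * n)
    return ((X[:n_train], y[:n_train]),
            (X[n_train:n_train + n_val], y[n_train:n_train + n_val]),
            (X[n_train + n_val:], y[n_train + n_val:]))

# Synthetic 15 min series with 4300 samples standing in for the sensor average
t = np.arange(4300)
temperature = 28.0 + 5.0 * np.sin(2.0 * np.pi * t / 96.0)

hourly = resample(temperature, base_minutes=15, target_minutes=60)
scaled, lo, hi = minmax_normalize(hourly)
X, y = make_windows(scaled, n_lags=4)
train, val, test = split_7_1_2(X, y)
```

Keeping `lo` and `hi` allows predictions made on the normalized scale to be converted back to °C or % RH before the evaluation indexes are computed.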

Model evaluation index

The root-mean-square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and coefficient of determination (R2) were used to evaluate the forecasting capacity of the models described above. The best prediction model was the one with the smallest errors and the highest correlation. The indexes were calculated from the following equations:

$${\text{MAE}} = \frac{1}{n}\sum\limits_{{i = 1}}^{n} {\left| {Y_{i} - X_{i} } \right|}$$
(4)
$${\text{RMSE}} = \sqrt {\frac{1}{n}\sum\limits_{{i = 1}}^{n} {\left( {Y_{i} - X_{i} } \right)^{2} } }$$
(5)
$${\text{MAPE}} = \frac{1}{n}\sum\limits_{{i = 1}}^{n} {\frac{{|X_{i} - Y_{i} |}}{{X_{i} }}} \times 100\%$$
(6)
$${\text{R}}^{2} = \frac{{\left[ {\sum\nolimits_{{i = 1}}^{n} {\left( {X_{i} - \bar{X}} \right)} \left( {Y_{i} - \bar{Y}} \right)} \right]^{2} }}{{\sum\nolimits_{{i = 1}}^{n} {\left( {X_{i} - \bar{X}} \right)^{2} } \sum\nolimits_{{i = 1}}^{n} {\left( {Y_{i} - \bar{Y}} \right)^{2} } }}$$
(7)

where Xi and Yi represent the measured and predicted values at the ith time interval, respectively; \(\bar{X}\) and \(\bar{Y}\) represent the corresponding mean values; n is the number of data points. The closer R2 is to 1, the better the model performs; MAE, MAPE, and RMSE all range from 0 (perfect fit) to ∞ (worst fit).
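The four indexes can be computed with a few lines of Python, shown here as a sketch on made-up numbers rather than on the study data. The function name `evaluate` is hypothetical, and MAPE is expressed in percent, which is an assumption based on the magnitude of the MAPE values reported in this paper.

```python
import numpy as np

def evaluate(measured, predicted):
    """Return MAE, RMSE, MAPE (in percent) and R2 following Eqs. (4)-(7)."""
    x = np.asarray(measured, dtype=float)
    y = np.asarray(predicted, dtype=float)
    err = y - x
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    # Assumes strictly positive measured values, as for temperature in degrees C or RH in %
    mape = np.mean(np.abs(err) / np.abs(x)) * 100.0
    dx, dy = x - x.mean(), y - y.mean()
    # Squared correlation between measured and predicted values, Eq. (7)
    r2 = np.sum(dx * dy) ** 2 / (np.sum(dx ** 2) * np.sum(dy ** 2))
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}

# Illustrative values only (not measurements from this study)
scores = evaluate(measured=[20.0, 25.0, 30.0], predicted=[21.0, 24.0, 30.0])
```

Note that R2 as defined in Eq. (7) is the squared correlation coefficient, so it measures how well the predicted series tracks the measured one rather than the size of the errors, which is why it is reported alongside MAE, MAPE and RMSE.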

Results

Performance of three models in temperature prediction

The model parameters were set as described above, and the results were denormalized to obtain the predicted greenhouse temperature and relative humidity. The 4300 temperature samples were divided at intervals of 90 min, 75 min, 60 min, 45 min, 30 min and 15 min to evaluate the performance of the three models. Figure 6 shows the temperature prediction results of the different models. As Fig. 6 shows, the prediction accuracy of the three models improved as the time interval decreased; at the 15 min interval, the R-square of BPPSO, LSSVM and RBF was 0.923, 0.923 and 0.912, respectively (Fig. 7).

Fig. 6

Prediction results of temperature with different models.

Figure 7 shows the temperature prediction performance of the different models in terms of MAE, MAPE and RMSE. Over the time intervals used, the maximum MAE of the three models was 1.452, 1.442 and 1.461, respectively, at the 75 min interval, and the minimum MAE was 0.571, 0.574 and 0.605, respectively, at the 15 min interval. The maximum MAPE of the three models was 5.137, 5.080 and 5.157, respectively, at the 75 min interval, and the minimum MAPE was 1.932, 1.941 and 2.038, respectively, at the 15 min interval. The maximum RMSE of the three models was 1.898, 1.942 and 1.879, respectively, at the 75 min interval, and the minimum RMSE was 0.866, 0.867 and 0.924, respectively, at the 15 min interval.

Fig. 7

Performance of different prediction models in temperature.

Performance of three models in relative humidity prediction

The 4300 relative humidity samples corresponding to the temperature samples were divided at time intervals of 90 min, 75 min, 60 min, 45 min, 30 min and 15 min to evaluate the performance of the three models. Figure 8 shows the relative humidity prediction results of the different models. As Fig. 8 shows, the prediction accuracy of the three models improved as the time interval decreased; the R-square of BPPSO, LSSVM and RBF was 0.948, 0.952 and 0.948, respectively, at the 15 min interval (Fig. 9).

Fig. 8

Prediction results of relative humidity with different models.

Figure 9 shows the relative humidity prediction performance of the different models in terms of MAE, MAPE and RMSE. Over the time intervals used, the maximum MAE of the three models was 4.911, 4.307 and 4.682, respectively, at the 75 min interval, and the minimum MAE was 1.591, 1.574 and 1.793, respectively, at the 15 min interval. The maximum MAPE of the three models was 5.784, 5.147 and 5.565, respectively, at the 75 min interval, and the minimum MAPE was 1.992, 1.983 and 2.154, respectively, at the 15 min interval. The maximum RMSE of the three models was 6.061, 5.583 and 5.859, respectively, at the 75 min interval, and the minimum RMSE was 2.491, 2.396 and 2.493, respectively, at the 15 min interval.

Fig. 9

Performance of different prediction models in relative humidity.

Model error of three models

Figure 10 shows the model error of the three models when predicting temperature and relative humidity. The LSSVM model performed well in predicting temperature and relative humidity in the South China greenhouse (Figs. 6, 7, 8 and 9). However, for all three models some predicted values deviated from the actual values. The number of deviation points increased as the time interval decreased and reached its maximum at the 15 min interval. The median error of the three models was 0.32, 0.31 and 0.31, respectively, when predicting temperature, and 0.80, 0.76 and 1.10, respectively, when predicting relative humidity. The temperature error of the three models at the 15 min interval was mainly within [0.13, 0.79], [0.15, 0.82] and [0.13, 0.79], respectively. The relative humidity error of the three models at the 15 min interval was mainly within [0.27, 2.31], [0.40, 2.25] and [0.66, 2.18], respectively.

Fig. 10

Relative error of three prediction models.

*The white dot is the mean, the box spans the 25th to 75th percentiles of the data, and the whiskers contain 99.3% of the data.

Discussion

It is very important for growers to predict changes in greenhouse temperature and relative humidity in advance, so that they can take measures as early as possible to prevent crop growth from being hindered by an unfavorable environment. The predicted temperature and relative humidity fitted the original data well. The R-square of temperature ranged from 0.618 to 0.923 (Fig. 7), and the R-square of relative humidity ranged from 0.565 to 0.952 (Fig. 9). At the 15 min interval, the temperature prediction R-square of the BPPSO, LSSVM and RBF models was 0.923, 0.923 and 0.912, respectively, while that of relative humidity was 0.948, 0.952 and 0.948, respectively. This shows that the smaller the time interval, the higher the prediction accuracy of the model, which is similar to the results of Yu11. In northern China, temperature is more important and easy to control, but in South China, relative humidity must be taken into account in temperature control, which helps to control the temperature accurately. The fitting of relative humidity was better than that of temperature, which is similar to the results of Mao18.

The BPPSO model shows high precision in practical application: it provides accurate prediction results, is simple and easy to implement and operate, and is suitable for complex problems. However, its generalization ability tends to decline because of the fixed network depth and parameter redundancy, so the risk of BP overfitting is significant when the sample size is insufficient. LSSVM solves computationally demanding problems better than SVM and handles large-scale problems more easily and efficiently27. The global kernel function of LSSVM is vulnerable to outlier interference and requires additional regularization, which can improve the precision of the model. The RBF model is widely used in modeling and controlling nonlinear systems, and it achieves accurate performance prediction by reducing input noise28. The number of nodes in the RBF hidden layer can be adjusted dynamically through clustering to avoid overfitting and improve model accuracy. All three models can be used to predict temperature and relative humidity, but the LSSVM model had higher accuracy because of its ability to handle large-scale problems. Kohzadvand also found that the LSSVM model performed better than the RBF model when multi-layer perceptron (MLP), RBF and LSSVM models were used to predict the contact angle of H2/mineral/brine systems29.

For the LSSVM model at the 15 min interval, the R2, MAE, MAPE and RMSE of temperature were 0.923, 0.574, 1.941 and 0.867, respectively, while those of relative humidity were 0.952, 1.574, 1.983 and 2.396, respectively. The R2 of all three models increased as the time interval decreased, but MAE, MAPE and RMSE first increased and then decreased, with the maximum values at the 75 min interval and the minimum values at the 15 min interval. This may be because the number of samples was too small and the fitting accuracy was poor, resulting in small MAE, MAPE and RMSE errors. In future work, a larger amount of data will be collected and the existing methods will be optimized to obtain higher-precision temperature and humidity predictions in real time.

As the time interval shortens, the relative errors of each model and evaluation index also change. Table 2 shows the relative error of the evaluation indexes. As the time interval decreased, the relative error of MAE, MAPE and RMSE predicted by the three models first decreased and then increased, while the relative error of the R2 of temperature showed no obvious trend, similar to that of the R2 of relative humidity. As the time interval decreased, the relative error of MAE, MAPE and RMSE predicted by the three models showed a gradual upward trend. The relative error results show that the time interval has an obvious effect on the MAE, MAPE and RMSE of the prediction models.

Table 2 Relative error of evaluation index.

Conclusions

This research studied time series prediction of temperature and relative humidity in a South China greenhouse using the BPPSO, LSSVM and RBF models. R2, MAE, MAPE and RMSE were adopted to evaluate the performance of the prediction models. We came to the following conclusions:

  1. The R2 of temperature and relative humidity increased gradually with the decrease of the time interval, and reached the maximum value when the time interval was 15 min. The R2 of the temperature predicted by the three models was 0.923, 0.923 and 0.912, and the R2 of the relative humidity was 0.948, 0.952 and 0.948, respectively. The prediction accuracy of relative humidity was higher than that of temperature.

  2. All three models could be used to predict temperature and relative humidity in greenhouses in South China, among which LSSVM had the highest fitting accuracy (R2). At the 15 min interval, the MAE, MAPE and RMSE of LSSVM were 0.574, 1.941 and 0.867, respectively, for temperature, and 1.574, 1.983 and 2.396, respectively, for relative humidity.

This study provides a reference for early intervention in greenhouse temperature and relative humidity management. In future work, we will adopt regression prediction methods that combine the internal and external environmental parameters of the greenhouse to obtain more accurate prediction models.