RNN and GNN based prediction of agricultural prices with multivariate time series and its short-term fluctuations smoothing effect

Min, Youngho; Kim, Young Rock; Hyon, YunKyong; Ha, Taeyoung; Lee, Sunju; Hyun, Jinwoo; Lee, Mi Ra

doi:10.1038/s41598-025-97724-7

Article
Open access
Published: 21 April 2025

Scientific Reports volume 15, Article number: 13681 (2025)

Abstract

In this study, we investigate appropriate machine learning methods for predicting agricultural commodity prices. Since environmental factors including weather affect price fluctuations of agricultural commodities, we constructed a multivariate time series dataset combining wholesale prices of four agricultural commodities in South Korea, six weather variables, and week numbers. We adopted two prominent prediction methods based on recurrent neural networks (RNN) and graph neural networks (GNN): one is the stacked long short-term memory, and the other consists of two GNN-based methods, the spectral temporal graph neural network (StemGNN) and the temporal graph convolutional network. Also, we utilized a univariate prediction model as a control to evaluate the effectiveness of the multivariate approach for predicting agricultural commodity prices. In this investigation, we applied five different smoothing time window lengths to evaluate the effect of mitigating short-term fluctuations on the predictive performance of the models. The experimental results showed that the mitigation of short-term fluctuations had a greater impact on improving the performance of multivariate prediction models compared to the univariate prediction model. Among the multivariate prediction models, the GNN-based network outperformed the RNN-based network. In view of the trained model, we analyzed the main weather variables affecting agricultural commodity prices by utilizing the adjacency weight matrices in the self-attention mechanism of StemGNN.

Introduction

Agricultural commodities play a vital role in the global economy, serving as essential food resources and raw materials for various industries^{1,2,3,4,5,6,7,8,9}. Their prices are influenced by complex factors such as weather, seasonality, and market dynamics^{10,11,12}. Climate is one of the most influential factors in agricultural production, as rising temperatures, changes in precipitation patterns, and extreme weather events such as heatwaves, droughts, and floods significantly affect crop growth and yields^{13,14,15,16,17}. These climatic changes negatively affect the growth of certain crops and, in severe cases, necessitate the relocation of production sites^{18,19,20,21,22}. The uncertainty of weather conditions causes anxiety among agricultural suppliers and hinders the formation of stable agricultural commodity prices. These challenges have exacerbated price volatility, making accurate price prediction difficult. For this reason, although it is becoming an increasingly difficult task, price prediction for agricultural commodities is becoming increasingly important for protecting the economic environment^{23,24,25}. This holds true not only for direct consumers of agricultural commodities but also for food manufacturers, as the prediction of raw material costs plays an important role in making informed purchasing decisions. Through price forecasting, companies can optimize their sourcing strategies and effectively manage inventory to minimize costs and improve profitability. Price fluctuations can also affect the entire supply chain; therefore, accurate price prediction can help companies mitigate risks^{26,27,28}. Consequently, there is a growing need for advanced methods capable of handling the intricate interdependencies within agricultural data²⁴.

Traditional time-series models and univariate time series models have been widely used for agricultural commodity price forecasting^{29,30,31,32,33}. While these methods perform well when patterns are consistent, they often fail to account for external factors like weather, leading to significant forecast errors. There is a substantial body of research on agricultural forecasting, involving a wide range of forecasting models. Table 1 gives a comprehensive overview of the modeling techniques available for agricultural price forecasting: it systematically reviews the current state of research, categorizes and lists existing studies, and highlights their contributions and limitations in the descriptions column. Existing studies are classified by modeling technique into traditional time-series models (e.g., ARIMA, exponential smoothing), machine learning models (e.g., Random Forest (RF), Support Vector Machines (SVM)), deep learning models (e.g., LSTM, GNN), hybrid models (combinations of traditional and machine learning models, e.g., ARIMA-LSTM), and ensemble models (stacking, bagging, boosting, etc.). Deep learning models, particularly Recurrent Neural Network (RNN)-based models like Long Short-Term Memory (LSTM), have improved performance by capturing temporal dependencies^{34,35}. However, these models primarily focus on single-variable patterns or struggle to handle complex multivariate dependencies, limiting their effectiveness in real-world scenarios. Additionally, recent advances in Graph Neural Network (GNN)-based models have demonstrated their potential in capturing spatial and temporal relationships³⁶, yet their application to agricultural commodity price forecasting remains underexplored. Recent work, such as that of Jin, Ming, et al.³⁷ on GNNs for time series forecasting, emphasizes the growing interest in GNN applications across domains. Furthermore, advancements in memristor-based neural network hardware open new avenues for efficient implementation of such models, including applications in associative memory and operant conditioning^{38,39}.

Table 1 Existing research on agricultural commodity forecasting models.

According to the studies presented in Table 1, most models that did not employ deep learning performed univariate predictions using only a single agricultural commodity price. Even in univariate predictions, machine learning and deep learning models were introduced to capture the non-linearity of agricultural commodity prices, with LSTM receiving particular attention. Additionally, studies comparing statistical methods with LSTM consistently found that LSTM demonstrated superior predictive performance. As models advanced to predict multiple agricultural commodity prices simultaneously, interest in capturing correlations among multivariate data increased. This correlation analysis was not limited to relationships among different agricultural commodities but was also applied to examining the relationship between weather variables and agricultural commodity prices. Related studies have shown that weather variables play a crucial role in agricultural commodity price forecasting. However, a key limitation of these studies was that they did not consider weather variables. Additionally, existing GNN-based studies on agricultural commodity price forecasting used only a single graph network and lacked comparative analyses across different graph-based architectures.

To address these limitations, this study introduces two distinct GNN networks to analyze the relationships between agricultural commodity prices and weather variables. By comparing their performance with LSTM, we evaluate the overall effectiveness of both RNN-based and GNN-based models in agricultural commodity price forecasting. Specifically, we employ Stacked LSTM as the representative RNN model and introduce Spectral Temporal Graph Neural Networks (StemGNN) and Temporal Graph Convolutional Networks (T-GCN) as GNN-based models^{36,48}. These models are tested on datasets integrating weather variables, agricultural commodity prices, and temporal features. We further examine the impact of smoothing techniques, such as rolling averages, on mitigating short-term price volatility and enhancing predictive accuracy.

Our findings contribute to the literature in several key ways. First, we provide a comprehensive comparison of RNN and GNN approaches for multivariate time series forecasting in the context of agricultural commodity prices. Second, we demonstrate the effectiveness of GNN models, particularly StemGNN and T-GCN, in capturing complex inter-variable relationships and temporal dependencies. Third, we evaluate the role of smoothing techniques in improving model performance, particularly for commodities with high price volatility. Finally, we offer insights into the influence of weather and temporal factors on agricultural commodity prices through interpretability analyses using GNN models.

This research underscores the potential of GNN-based models to outperform traditional RNN-based approaches by effectively capturing the intricate spatial and temporal dynamics present in agricultural data. By introducing smoothing of short-term fluctuations, this study overcomes limitations of existing methods and provides valuable insights and practical implications for stakeholders seeking reliable agricultural commodity price forecasts.

Data sets

In this section, we introduce the data set used to predict agricultural commodity prices. We select representative agricultural commodities that demonstrate active consumption in South Korea and predict their wholesale prices. The characteristics considered in the selection of agricultural commodities include the annual production frequency, storability, importability, and market behavior. The production frequency is a feature influenced by factors such as the growing period of the agricultural commodities and the agricultural environment, which includes field farming and greenhouse cultivation. Based on these characteristics, we have chosen four agricultural commodities: potatoes, onions, lettuce, and cucumbers. The information on the characteristics of each agricultural commodity is based on data from the Rural Development Administration (RDA) of South Korea⁴⁹. This is described in Table 2.

Table 2 Characteristics of agricultural commodities.

As described in Table 2, the four agricultural commodities exhibit distinct characteristics. These characteristics influence the wholesale price data of agricultural commodities, resulting in different types of time series data. Hence, they were selected as predictor variables suitable for researching time series data with diverse characteristics.

For this purpose, we collected wholesale market price data for the four agricultural commodities. These data were provided by the Korea Agricultural Marketing Information Service (KAMIS) and represent daily prices traded at the Seoul Wholesale Market. In addition, we collected weather data including maximum temperature, minimum temperature, precipitation, average humidity, average temperature, and daily temperature range from the Korea Meteorological Administration (KMA). We used data from January 1, 2000 to April 1, 2022. The agricultural product prices have missing values because no transactions take place on weekends or public holidays. To address this issue, we imputed each missing value with the price of the previous day. In actual markets, traders also refer to the prices of the previous business day when making decisions on weekends and public holidays. Therefore, our approach to handling missing values reflects real market practices.
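The previous-day imputation described above corresponds to a forward fill on a complete daily calendar. The following is a minimal sketch under that reading, not the authors' code; the function name `fill_non_trading_days`, the column name, and the toy prices are hypothetical.

```python
import pandas as pd


def fill_non_trading_days(prices: pd.DataFrame) -> pd.DataFrame:
    """Reindex to every calendar day and carry the last traded price forward."""
    full_index = pd.date_range(prices.index.min(), prices.index.max(), freq="D")
    # Days without a transaction become NaN after reindexing; ffill copies
    # the most recent available price into them.
    return prices.reindex(full_index).ffill()


# Toy example (hypothetical values): a Friday and the following Monday.
raw = pd.DataFrame(
    {"potato": [1000.0, 1100.0]},
    index=pd.to_datetime(["2021-01-08", "2021-01-11"]),
)
filled = fill_non_trading_days(raw)
```

In this toy case the Saturday and Sunday rows take the Friday price, which mirrors the market practice the text describes.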

We then constructed a tabular dataset with columns representing the collected features and rows representing specific time instances. In addition, to reflect annual seasonality, we included a feature representing the week number within a year for each date. This feature indicates which of the 52 weeks of the year the date falls in. The features in our dataset consist of four agricultural commodity prices, six weather variables, and the week number.
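One way to derive such a week-number feature is sketched below. The text does not state how the weeks are counted, so the convention here (7-day blocks counted from January 1, with the last one or two days of the year folded into week 52) is an assumption, and `week_of_year` is a hypothetical helper name.

```python
from datetime import date


def week_of_year(d: date, n_weeks: int = 52) -> int:
    """Map a date to a week index in 1..n_weeks.

    Assumption: weeks are 7-day blocks starting on January 1, and the
    remaining day(s) at the end of the year are assigned to the last week.
    """
    day_of_year = d.timetuple().tm_yday  # 1..365, or 1..366 in leap years
    return min((day_of_year - 1) // 7 + 1, n_weeks)


first = week_of_year(date(2020, 1, 1))
second = week_of_year(date(2020, 1, 8))
last = week_of_year(date(2020, 12, 31))
```

An ISO-8601 week number would be an alternative, but it can take the value 53, which does not match the 52-week description.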

Data preprocessing

Our dataset relies on daily price data for agricultural commodities in wholesale markets, where prices are established through public auctions or sales counters. These prices are determined through competitive bidding, considering factors such as the quality and quantity of agricultural commodities available on a given day. Consequently, agricultural product prices reflect their fair market value, driven by supply and demand, with additional uncertainty introduced by the auction process.

To discern the trend of agricultural product prices, we treat the price fluctuations caused by auctions as noise and smooth the data. Smoothing reduces this noise and makes the main trends and patterns easier to see. This allows the model to focus on the actual patterns rather than noise, which improves the prediction accuracy. Specifically, the rolling average method clearly reveals the periodicity and trends in the data and highlights important data features. In addition, as the trend becomes clearer, the correlation between multivariate time series can be revealed more clearly, which can affect the performance of the model.

The length of the rolling average period should be selected based on the data and the purpose of the analysis. According to a report by Statistics Korea⁵⁰, the time taken to survey harvested agricultural commodities varies by crop. For onions, the survey includes harvest evaluation, quality inspection, analysis of storage and distribution conditions, and price surveys, and it takes more than two weeks. For potatoes, it takes about a month because potatoes have a long storage period and potential changes during storage must be evaluated. The results of these surveys affect decisions about import and export quantities. Therefore, for the Korean market it is more appropriate to analyze and forecast price trends that develop over two weeks or more than to use wholesale market price data determined in a single period.

As a result, we applied a four-step smoothing process to derive datasets from the original dataset, using the rolling average method for data preprocessing. By setting the rolling window sizes to 7 days (1 week), 14 days (2 weeks), 21 days (3 weeks), and 28 days (4 weeks), we obtained five datasets including the original data. To easily distinguish the datasets, we named the original data \(ra_0\) and the 7-day rolling average data \(ra_7\) (and, for 14, 21, and 28 days, \(ra_{14}\), \(ra_{21}\), and \(ra_{28}\), respectively). Because of the smoothing, the length of each dataset is different. To evaluate the models fairly on the five datasets, we fixed the training, validation, and test datasets to specific periods. Specifically, the training period is from January 30, 2000 to December 31, 2015, the validation period is from January 1, 2016 to December 31, 2019, and the test period is from January 1, 2020 to April 1, 2022.
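The derivation of the five datasets and the fixed splits can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the window is taken to be trailing (each value averages the current day and the preceding days), every column is smoothed alike, and the synthetic `toy` series is purely hypothetical.

```python
import numpy as np
import pandas as pd

WINDOWS = [0, 7, 14, 21, 28]  # 0 denotes the unsmoothed original data


def make_smoothed_datasets(df: pd.DataFrame, windows=WINDOWS) -> dict:
    """Return {'ra_0': ..., 'ra_7': ..., ...} using trailing rolling means."""
    out = {}
    for w in windows:
        if w == 0:
            out["ra_0"] = df.copy()
        else:
            # The first w-1 rows have no full window and are dropped,
            # which is why the datasets differ in length.
            out[f"ra_{w}"] = df.rolling(window=w).mean().dropna()
    return out


def split_by_period(df: pd.DataFrame):
    """Slice the fixed training, validation, and test periods from the text."""
    train = df.loc["2000-01-30":"2015-12-31"]
    val = df.loc["2016-01-01":"2019-12-31"]
    test = df.loc["2020-01-01":"2022-04-01"]
    return train, val, test


# Synthetic daily series over the study period (hypothetical values).
idx = pd.date_range("2000-01-01", "2022-04-01", freq="D")
rng = np.random.default_rng(0)
toy = pd.DataFrame({"onion": rng.normal(1000.0, 50.0, len(idx))}, index=idx)

datasets = make_smoothed_datasets(toy)
splits = {name: split_by_period(d) for name, d in datasets.items()}
```

With a trailing 28-day window the first smoothed value falls on January 28, 2000, so a training period starting on January 30, 2000 is available in all five datasets and the splits have identical lengths.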

Methods

RNN and GNN-based machine learning methods

This study employs both RNN-based and GNN-based models for agricultural commodity price forecasting. Stacked LSTM, an RNN-based model, is used to predict both univariate and multivariate time series, whereas StemGNN and T-GCN, both GNN-based models, are applied only to multivariate prediction. The GNN-based models utilize a graph structure representing the relationships between agricultural commodity prices and weather variables, making them suitable exclusively for multivariate forecasting.

All models share common hyperparameters. The input sequence length, referred to as the window size, determines how many past observations are used for training and prediction. The prediction horizon specifies how far ahead the model forecasts beyond the last observation in the input window.

Forecasting process

In this subsection, we describe the prediction process that is commonly applied to Stacked LSTM, StemGNN, and T-GCN. To ensure fair evaluation, we use the same prediction process for all models.

Let T be the number of timestamps in the training period and m be the number of features. The data can then be represented as a matrix \(X\in {\mathbb {R}}^{T\times m}\), with \(X_t\in {\mathbb {R}}^m\) for each time step t. In a multivariate prediction model, m is greater than 1, while in a univariate prediction model, m is set to 1. Assuming the window size is n, each input sample is an \(n\times m\) matrix. We stack these two-dimensional input samples to prepare a three-dimensional data set of size \((T-n+1) \times n \times m\). The validation and test data are prepared in the same way. The horizon hyperparameter, denoted h, specifies that the target is the observation h days after the last time step of the input window, where h is smaller than T. To represent the learning and prediction process, we define a machine learning model as a function F. The model is trained to predict a target vector of size m, which represents the values h days ahead. This process is described by the equation \(F([X_{t-n+1}, X_{t-n+2}, \dots , X_{t}], \Phi ) = \hat{X}_{t+h}\), where \(\hat{X}_{t+h}\) represents the predicted value h days ahead, and \(\Phi\) represents the parameters of the model.
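The window construction can be made concrete with a short NumPy sketch. The helper `make_windows` is a hypothetical name. One detail is an assumption of this sketch rather than something stated in the text: only windows whose target \(X_{t+h}\) lies inside the same period are kept, which gives \(T-n-h+1\) input and target pairs rather than \(T-n+1\).

```python
import numpy as np


def make_windows(X: np.ndarray, n: int, h: int):
    """Build (inputs, targets) from X of shape (T, m).

    inputs[i]  = [X_i, ..., X_{i+n-1}]   shape (n, m)
    targets[i] = X_{i+n-1+h}             shape (m,)
    """
    T = X.shape[0]
    n_samples = T - n - h + 1
    if n_samples <= 0:
        raise ValueError("series too short for this window size and horizon")
    inputs = np.stack([X[i : i + n] for i in range(n_samples)])
    targets = X[n - 1 + h : n - 1 + h + n_samples]
    return inputs, targets


# Toy data: T = 10 time steps, m = 2 features, window n = 3, horizon h = 2.
X = np.arange(20).reshape(10, 2)
inputs, targets = make_windows(X, n=3, h=2)
```

Here the first window covers time steps 0 to 2 and its target is time step 4, two steps after the end of the window.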

The structures of models

In this study, we employ three machine learning models: Stacked LSTM, StemGNN, and T-GCN. This subsection provides a detailed description of their architectures and the mechanisms through which they capture temporal and spatial dependencies in time series data.

Stacked LSTM

A stacked LSTM is a type of RNN in which LSTM cells are stacked in multiple layers. The LSTM cell is used to remember both long-term and short-term states of the data and to predict the state at the next time step. By stacking multiple LSTM layers, the stacked LSTM can learn more complex time series patterns.

We applied an SBU-LSTM⁵¹ to forecast univariate and multivariate time series data. The SBU-LSTM is a stacked architecture with multiple layers of unidirectional LSTM or bidirectional LSTM components⁵¹. In this study, we adopted a bidirectional LSTM stacked with one unidirectional LSTM layer. To avoid any confusion, we will refer to the SBU-LSTM applied for forecasting univariate and multivariate time series data as univariate stacked LSTM and multivariate stacked LSTM, respectively.

StemGNN

StemGNN is a graph-based model designed to capture both inter-series correlations and temporal patterns in multivariate time series data³⁶. Within the StemGNN framework, the data is embedded into a graph representation through a latent correlation layer, and predicted results are obtained through the StemGNN layer, which comprises two residual StemGNN blocks.

A self-attention mechanism is employed in the latent correlation layer to learn dependencies between different time series, ensuring that the model captures complex interactions among variables. In addition, StemGNN applies Graph Fourier Transform (GFT) to analyze time series data in the frequency domain. By transforming time series signals into the spectral domain, GFT allows the model to identify dominant frequency components and filter noise, enhancing predictive performance.

The model first passes data through Gated Recurrent Units (GRUs) to extract hidden states for each timestamp, which are then used to construct the adjacency matrix via self-attention. The transformed graph is processed through the StemGNN layer, where spectral graph convolution is applied to extract meaningful frequency-domain representations. Finally, the model reconstructs the time series from these spectral features to generate accurate forecasts.
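The two ingredients described above, an attention-derived adjacency matrix and a graph Fourier transform, can be illustrated with NumPy. This is a simplified sketch and not the StemGNN implementation: the hidden states `H` stand in for GRU outputs, the projection matrices are random rather than learned, and symmetrizing the attention matrix before building the normalized Laplacian is an assumption made here so that the eigenvectors are real and orthogonal.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_adjacency(H, Wq, Wk):
    """H: (m, d), one hidden vector per series -> (m, m) weight matrix."""
    Q, K = H @ Wq, H @ Wk
    return softmax(Q @ K.T / np.sqrt(Q.shape[1]), axis=1)


def graph_fourier_basis(W):
    """Eigen-decompose the normalized Laplacian of the symmetrized graph."""
    A = 0.5 * (W + W.T)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals, U = np.linalg.eigh(L)  # columns of U form the Fourier basis
    return eigvals, U


rng = np.random.default_rng(0)
m, d = 11, 8                      # 11 features as in the dataset; d is arbitrary
H = rng.normal(size=(m, d))       # placeholder for GRU hidden states
Wq = 0.1 * rng.normal(size=(d, d))
Wk = 0.1 * rng.normal(size=(d, d))

W = attention_adjacency(H, Wq, Wk)
eigvals, U = graph_fourier_basis(W)

X = rng.normal(size=(m, 30))      # m series observed over 30 time steps
X_hat = U.T @ X                   # graph Fourier transform
X_rec = U @ X_hat                 # inverse transform recovers the input
```

Each row of `W` sums to one, so row i can be read as how strongly series i attends to every other series.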

T-GCN

T-GCN combines Graph Convolutional Networks (GCNs) and Gated Recurrent Units (GRUs) to model both spatial and temporal dependencies in multivariate time series data⁴⁸. The GCN layer captures spatial correlations between variables, while the GRU layer processes sequential patterns over time. This structure makes T-GCN particularly effective for forecasting problems where relationships between multiple variables evolve dynamically.

We applied T-GCN, which was originally designed for traffic prediction⁴⁸, to agricultural commodity price forecasting. Instead of modeling road networks, we construct a spatio-temporal graph representation where each node represents an agricultural commodity or weather variable. The edges between nodes are defined based on their correlations, forming an adjacency matrix that encodes the interdependencies between commodity prices and weather conditions. The GCN layer extracts spatial features from this graph, which are then processed by the GRU layer to learn temporal dependencies in price fluctuations. By leveraging both spatial and temporal relationships, T-GCN has the potential to improve agricultural commodity price forecasting by capturing complex interactions that traditional time series models may overlook.
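A correlation-based adjacency matrix and a single graph-convolution step can be sketched as follows. The text does not say how correlations are turned into edges, so the absolute-correlation threshold of 0.3 is an assumption, as are the helper names and the synthetic data. The GRU part of T-GCN is omitted.

```python
import numpy as np


def correlation_adjacency(X: np.ndarray, threshold: float = 0.3) -> np.ndarray:
    """X: (T, m). Connect variables i and j when |corr(i, j)| >= threshold.

    The threshold rule is an illustrative assumption.
    """
    C = np.corrcoef(X, rowvar=False)
    A = (np.abs(C) >= threshold).astype(float)
    np.fill_diagonal(A, 0.0)  # self-loops are added later in gcn_layer
    return A


def gcn_layer(A: np.ndarray, X_t: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One graph convolution: relu(D^{-1/2} (A + I) D^{-1/2} X_t W)."""
    A_tilde = A + np.eye(len(A))
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    A_hat = d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]
    return np.maximum(A_hat @ X_t @ W, 0.0)


rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
X = np.column_stack([
    x0,
    x0 + 0.1 * rng.normal(size=200),  # strongly correlated with x0
    rng.normal(size=200),
    rng.normal(size=200),
])

A = correlation_adjacency(X)
X_t = X[-1][:, None]                  # node features at the last time step, (m, 1)
W_gcn = rng.normal(size=(1, 4))       # random stand-in for learned weights
out = gcn_layer(A, X_t, W_gcn)        # (m, 4) spatial features per node
```

In a full T-GCN, features like `out` would be computed at every time step of the input window and passed through the GRU.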

Results

Comparative analysis of prediction errors among models

In this subsection, we evaluated and analyzed the performance of each model using the mean absolute percentage error (MAPE), \(MAPE = \frac{100}{n} \sum _{i=1}^n \left| \frac{Y_i - \hat{Y}_i}{Y_i} \right|\), where \(Y_i\) is the actual value, \(\hat{Y}_i\) is the predicted value, and n is the total number of samples. Here, we compared the MAPE of StemGNN, T-GCN, multivariate stacked LSTM, and univariate stacked LSTM across five datasets, from \(ra_0\) to \(ra_{28}\). To assess the performance of the four models, we set the prediction horizons to 7 days and 14 days. These horizons are practical indicators that reflect local real market conditions. The MAPEs of the models across all hyperparameters are presented in Supplementary Table S1. Using the results in Supplementary Table S1, we presented the mean MAPE for horizons 7 and 14 on the rolling average (RA) datasets for each product in Fig. 1.
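The MAPE definition above translates directly into code. The sketch below follows the formula; the function name and the sample numbers are illustrative only.

```python
import numpy as np


def mape(y_true, y_pred) -> float:
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if np.any(y_true == 0):
        # The ratio |Y - Y_hat| / |Y| is undefined for a zero actual value.
        raise ValueError("MAPE is undefined when an actual value is zero")
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))


# Two samples, each off by 10 percent of the actual value.
example = mape([100.0, 200.0], [110.0, 180.0])
```

Because prices are strictly positive, the zero check is only a safeguard for this setting.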

In Fig. 1, we calculated the slopes of the dashed lines through linear fitting. The slopes of the dashed lines for all products showed that MAPE values decreased as the rolling window sizes increased. These slopes illustrate the rate of change in MAPE as the rolling window size increases across the datasets. For example, in the top-left panel in Fig. 1, the slope for StemGNN is \(-0.2134\), for T-GCN is \(-0.2480\), for univariate stacked LSTM is \(-0.1610\), and for multivariate stacked LSTM is \(-0.1959\). Similar trends are observed in the other panels. For the univariate prediction model (univariate stacked LSTM), MAPE also decreased with larger rolling window sizes, but the improvement was relatively small. Among the four models, we found that T-GCN exhibited the steepest decline in MAPE with increasing rolling window sizes, followed by StemGNN, multivariate stacked LSTM, and univariate stacked LSTM. These results indicate that mitigating short-term fluctuations in data is particularly beneficial for multivariate prediction models, with GNN-based models showing significant improvements.
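A slope of this kind can be obtained with an ordinary least-squares line through the (rolling window size, MAPE) points. The sketch below assumes the window size is measured in days on the horizontal axis, which the text does not state explicitly, and the MAPE values are made up for illustration rather than taken from the paper.

```python
import numpy as np


def mape_slope(window_sizes, mapes) -> float:
    """Slope of the least-squares line MAPE = slope * window + intercept."""
    slope, _intercept = np.polyfit(window_sizes, mapes, deg=1)
    return float(slope)


windows = [0, 7, 14, 21, 28]                  # ra_0 ... ra_28
toy_mapes = [20.0, 18.6, 17.2, 15.8, 14.4]    # hypothetical values
slope = mape_slope(windows, toy_mapes)
```

A more negative slope means that the model benefits more from each additional day of smoothing.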

In Fig. 2, we presented the predicted prices of onions with \(ra_{28}\), a window size of 60, and a prediction horizon of 14 days. (The predicted prices for all agricultural commodities with \(ra_{28}\), window sizes of 45 and 60, and a prediction horizon of 14 days were provided in Supplementary Fig. S2.) The black, red, green, blue, and pink curves represented the actual prices and the predicted prices of StemGNN, univariate stacked LSTM, multivariate stacked LSTM, and T-GCN, respectively. Time shift phenomena were clearly visible in the magnified region of Fig. 2. To measure the time shift between the predictions of all tested networks and the actual data, we introduced cross-correlation, which measures the correlation between two time series while considering time shifts. It evaluates how well the values of one time series match the values of another time series at a specific time shift. Using cross-correlation, we found the optimal time shift that maximized the correlation between the predicted data and the actual data for each network. We presented these optimal time shift values in Table 3. The multivariate prediction models showed smaller shift values than the univariate prediction model. Moreover, the optimal shift values for the multivariate prediction models were smaller than the prediction horizon, whereas the optimal shift values for the univariate stacked LSTM were equal to or larger than the prediction horizon. The univariate stacked LSTM maintained a trend almost identical to the actual data and had lower MAPE, but it showed significant shift values, failing to accurately reflect price fluctuations at specific times. Therefore, MAPE alone was insufficient for evaluating model performance; accurately capturing short-term fluctuations was more important.
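The search for the optimal shift can be sketched as a scan over candidate lags, keeping the lag with the highest Pearson correlation. This is an illustration and not the authors' procedure: restricting the search to non-negative lags (predictions trailing the actual series) and the choice of `max_shift` are assumptions, and the random-walk data are synthetic.

```python
import numpy as np


def optimal_shift(actual, predicted, max_shift: int):
    """Return (s, r): the lag s in 0..max_shift that maximizes
    corr(actual[t], predicted[t + s]), and that correlation r."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    best_s, best_r = 0, -np.inf
    for s in range(max_shift + 1):
        a = actual[: len(actual) - s]
        p = predicted[s:]
        r = np.corrcoef(a, p)[0, 1]
        if r > best_r:
            best_s, best_r = s, r
    return best_s, float(best_r)


# Synthetic check: the "prediction" is the actual series delayed by 5 days.
rng = np.random.default_rng(0)
base = np.cumsum(rng.normal(size=205))
actual = base[5:]
predicted = base[:-5]          # predicted[t] equals actual[t - 5]
shift, corr = optimal_shift(actual, predicted, max_shift=20)
```

A shift equal to the prediction horizon would indicate that a model is essentially repeating the most recent observed value.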

Table 3 Comparison of optimal shift ranges among StemGNN, T-GCN, multivariate stacked LSTM, and univariate stacked LSTM models for \(ra_{28}\) dataset.

The multivariate stacked LSTM showed predicted prices that followed the actual price trends and reduced the shift phenomena, reflecting price fluctuations at the appropriate times. However, it showed larger variances between predicted prices and actual prices, with a larger MAPE compared to the other two models. StemGNN followed the overall price trends and showed predicted prices with a small MAPE range compared to the actual prices. While it exhibited more fluctuations than the univariate prediction model, it had the lowest MAPE because of the mitigated shift phenomena. T-GCN had a shift value between StemGNN and the univariate stacked LSTM. Therefore, considering both performance and time shifts, GNN-based models showed better predictions of short-term fluctuations compared to univariate models. Specifically, StemGNN exhibited the least shift phenomena among them.

Performance analysis of prediction models for each agricultural commodity

In this subsection, we mainly analyzed the effects of short-term fluctuations on each agricultural commodity by examining the prediction errors. We also conducted a simple exploratory data analysis (EDA) of seasonality and price trends, and examined price volatility for additional information. First, the results in Fig. 3 were obtained by Seasonal-Trend decomposition using LOESS (STL)⁵² by year. Since the seasonalities and trends were quite different across commodities, we could not easily find common behavior. We therefore considered volatility to extract more information about the prices. As evident from Fig. 3, each agricultural commodity exhibited a different level of volatility. As shown in Table 4, the price volatilities⁵³ of lettuce and cucumber were larger than those of the other agricultural commodities over the roughly 22-year period (January 1, 2000–April 1, 2022).

The volatility in agricultural commodity prices was caused by various complicated factors, including the characteristics of the products themselves and the uncertainty of auction prices in the wholesale market. Hence, it was not easy to identify a specific factor. However, according to Table 2, potato and onion were relatively stable agricultural commodities in terms of storability and importability. In contrast, lettuce and cucumber were more dependent on domestic production due to their relatively lower storability. Additionally, lettuce and cucumber had shorter cultivation periods compared to the other two crops, resulting in more frequent price fluctuations. This aligned with our volatility analysis, which showed that lettuce and cucumber exhibited high levels of volatility.

Table 4 The volatility measured on the price dataset with different rolling window sizes.

In Fig. 1, we observed that the prediction errors for cucumber and lettuce, which showed high volatility, were larger than those for the other agricultural commodities in both univariate and multivariate prediction models. However, as found in the previous subsection, mitigating short-term fluctuations improved prediction performance for both univariate and multivariate prediction models, with a more significant effect in the multivariate prediction models.

We compared the prediction performance between stable commodities (potatoes and onions) and highly volatile commodities (lettuce and cucumber). As shown in Fig. 1, we observed that predicting the stable commodities was generally easier than predicting the highly volatile commodities. However, the reduction in MAPE for the highly volatile commodities when short-term fluctuations were mitigated was more significant. This suggested that while predicting the prices of highly volatile commodities was inherently more challenging than stable commodities, the impact of mitigating short-term fluctuations was bigger for the former. When comparing the performance of the multivariate and univariate prediction models for each group, we found that for the stable commodities, the average slope of the MAPE for the multivariate models was approximately \(-0.2261\), while for the univariate model it was around \(-0.1372\). On the other hand, for the highly volatile commodities, the average MAPE slope for the multivariate models was about \(-0.5140\), while for the univariate model it was approximately \(-0.4690\). This indicated that mitigating short-term fluctuations significantly enhanced prediction performance, especially when multivariate prediction models were applied to highly volatile commodities. Furthermore, when comparing the results of the GNN-based models with those of the RNN-based models, we observed that the GNN-based models showed a more significant improvement in prediction performance when short-term fluctuations in agricultural commodity data were mitigated. This suggested that GNN-based models were better equipped to handle the effects of short-term volatility in agricultural data.

Next, we examined the relationship between agricultural commodity prices and weather data. Since agricultural commodity prices and weather data were complex time series, extracting correlations between them was a challenging task. However, StemGNN had an advantageous structural component, the latent correlation layer, whose output was the adjacency weight matrix of the graph. We evaluated the performance for different window sizes and prediction horizons to determine the optimal hyperparameters for StemGNN. Because the best performance in the experiments was achieved with \(ra_{28}\), we restricted this analysis to the models trained on \(ra_{28}\). The errors for each agricultural commodity are summarized in Table 5, with the lowest errors highlighted in bold. For all agricultural commodities, the errors were smaller when the horizon was set to 7 days than when it was set to 14 days. Potato achieved the best performance with a window size of 45 days when the horizon was 7 days, while the other agricultural commodities achieved the best performance with a window size of 30 days. Moreover, the performance of the model varied with the prediction horizon, and the optimal window size changed accordingly. The relationships among the variables, as captured by the adjacency weight matrix, are shown in Fig. 4. Note that the adjacency weight matrix depended on the target variable, as it was learned from data with varying target variables. In Fig. 4, panel (a) showed the adjacency weight matrix obtained with the \(ra_{28}\) dataset, a horizon of 7 days, and a window size of 45 days. The rows in panel (a) represented the features used, and the first row showed the connection strength between the price of potatoes and the other features. Similarly, panel (b) showed the adjacency weight matrix obtained with the \(ra_{28}\) dataset, a horizon of 7 days, and a window size of 30 days. The prices of lettuce, onion, and cucumber achieved the best performance using the model with this adjacency weight matrix.
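To make the roles of the window size and the horizon concrete, the following sketch builds training samples from a multivariate series. It assumes that the window is the number of past days fed to the model and that the horizon is the number of future days to predict; the function name and the random data are hypothetical, and only the feature count (four prices, six weather variables, and the week number) follows the dataset described in this article.

```python
import numpy as np

def make_samples(data, window, horizon):
    """Split a (time, features) array into (input window, future horizon) pairs."""
    data = np.asarray(data, dtype=float)
    n_samples = data.shape[0] - window - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for this window and horizon")
    # Each input covers `window` consecutive days; its target is the next `horizon` days.
    inputs = np.stack([data[i:i + window] for i in range(n_samples)])
    targets = np.stack([data[i + window:i + window + horizon] for i in range(n_samples)])
    return inputs, targets

# Hypothetical data: 365 days, 11 features (4 prices, 6 weather variables, week number).
rng = np.random.default_rng(0)
series = rng.normal(size=(365, 11))

x30, y7 = make_samples(series, window=30, horizon=7)     # shapes (329, 30, 11), (329, 7, 11)
x45, y14 = make_samples(series, window=45, horizon=14)   # shapes (307, 45, 11), (307, 14, 11)
```

A longer window or horizon leaves fewer samples from the same series, which is one reason the two hyperparameters have to be tuned together.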

Table 5 The prediction errors of all agricultural commodities when the rolling window size is 28 days. Bold values indicate the lowest MAPE for each agricultural commodity across window sizes and horizons.


The results in Fig. 4 summarized the feature importance of the agricultural and weather features, which is further detailed in Table 6. This table compared the importance of the agricultural and weather variables for each agricultural commodity. For potato, the most significant agricultural feature was onion, with the week number emerging as the most important weather variable. Average temperature and average humidity also played important roles. In contrast, for onion, potato was the most influential agricultural feature, while average humidity was the dominant weather variable, with temperature and precipitation also having notable effects. Lettuce predictions were most influenced by onion prices, while the week number showed minimal importance among the weather variables and the remaining weather variables had similar levels of importance. As for cucumber, potato was the most important agricultural feature, with average humidity again being the most influential weather variable. The importance of weather variables thus varied depending on the crop. For potato, week number, average temperature, and humidity were crucial, while for onion and cucumber, average humidity and precipitation were more impactful. These findings were consistent with real-world agricultural production variables. For example, Dahal et al. reported that potatoes are highly sensitive to temperature changes, making them particularly susceptible to climate variations, which significantly affect potato production⁵⁴. This supported the view that our machine learning model's findings, specifically the relationships between agricultural commodity prices and weather data represented in the GNN-generated graph, were consistent with actual agricultural insights.
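Reading feature importance from one row of an adjacency weight matrix can be sketched as follows. The feature order, the helper name, and the matrix values are assumptions made for illustration: the matrix is random and row-normalized with a softmax as a stand-in for attention weights, so it is not the matrix learned by StemGNN in this study.

```python
import numpy as np

# Assumed feature order; the actual order used in the experiments is not stated here.
FEATURES = ["potato", "onion", "lettuce", "cucumber",
            "max_temp", "min_temp", "precipitation", "avg_humidity",
            "mean_temp", "temp_range", "week_number"]

def rank_features(adjacency, target, names=FEATURES):
    """Return the other features sorted by their weight in the target's row."""
    row = np.asarray(adjacency, dtype=float)[names.index(target)]
    order = np.argsort(row)[::-1]  # largest weight first
    return [(names[j], float(row[j])) for j in order if names[j] != target]

# Hypothetical adjacency weights: random logits passed through a row-wise softmax.
rng = np.random.default_rng(42)
logits = rng.normal(size=(len(FEATURES), len(FEATURES)))
adjacency = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

potato_ranking = rank_features(adjacency, "potato")
```

With a trained model, the same ranking applied to the potato row would list the price and weather features in order of connection strength, which is the kind of comparison summarized in Table 6.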

Table 6 Feature importance derived from the adjacency weight matrix for each agricultural commodity.


Discussion

As shown in the previous section, the comparison of multivariate and univariate prediction performance with the stacked LSTM showed that mitigating short-term fluctuations through the rolling average technique was effective in improving the performance of multivariate prediction models. Univariate prediction models captured characteristic patterns within a single time series, while multivariate prediction models not only captured these patterns but also learned the latent relations between different time series. This result suggested that mitigating short-term fluctuations helped multivariate prediction models capture the relations between different time series.

Another interesting observation was the time shift phenomenon in the prediction results. According to the optimal shift values in Table 3, which were calculated using cross-correlations, the time shift in multivariate prediction models was shorter than in univariate ones. The reason was that the more strongly correlated the time series were, the smaller the time shift was. Therefore, properly correlated multivariate time series data were a more suitable choice than univariate time series data for reducing the time shift phenomenon.
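One way to estimate such a time shift is to find the lag that maximizes the correlation between the actual and predicted series. The sketch below is an assumed procedure, since the exact computation behind Table 3 is not given here; the function name and the synthetic data are hypothetical.

```python
import numpy as np

def optimal_shift(actual, predicted, max_lag):
    """Lag (in steps) at which the prediction best aligns with the actual series.

    A positive result means the prediction trails the actual series by that many steps.
    """
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    best_lag, best_corr = 0, -np.inf
    for lag in range(max_lag + 1):
        # Pearson correlation between actual[t] and predicted[t + lag] on the overlap.
        corr = np.corrcoef(a[:len(a) - lag], p[lag:])[0, 1]
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag

# Synthetic check: a "prediction" that repeats the actual series three days late.
rng = np.random.default_rng(1)
actual = np.cumsum(rng.normal(size=400))   # random-walk stand-in for a price series
predicted = np.roll(actual, 3)
predicted[:3] = actual[0]                  # overwrite the values wrapped around by np.roll

shift = optimal_shift(actual, predicted, max_lag=10)
```

In this synthetic case the recovered shift equals the three-day delay that was built in; a model whose predictions track the actual series without delay would give a shift of zero.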

Considering the complexity of the time series, lettuce and cucumber exhibited higher volatility than the other agricultural commodities, and this volatility was found to affect the performance of the prediction models. Specifically, for highly volatile commodities, smoothing short-term fluctuations using the rolling average technique significantly enhanced the prediction performance. These results showed that smoothing short-term fluctuations helped prediction models handle volatile data more effectively. In particular, GNN-based models showed the best performance on datasets with smoothed short-term fluctuations. Thus, multivariate GNN-based prediction models trained on data with smoothed short-term fluctuations were more suitable for capturing potential relationships in highly volatile data.
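The rolling average used for smoothing can be sketched in a few lines. The version below is a trailing average, which is an assumption: whether the \(ra_{28}\) dataset was built with a trailing or a centered window is not stated here. The price values are hypothetical.

```python
import numpy as np

def rolling_average(values, window):
    """Trailing moving average; the first window - 1 entries are NaN."""
    values = np.asarray(values, dtype=float)
    smoothed = np.full(values.shape, np.nan)
    if window <= len(values):
        kernel = np.ones(window) / window
        # mode="valid" returns one average per full window, ending at each day.
        smoothed[window - 1:] = np.convolve(values, kernel, mode="valid")
    return smoothed

prices = np.array([10.0, 12.0, 8.0, 14.0, 6.0, 16.0])  # hypothetical daily prices
ra_2 = rolling_average(prices, window=2)                # [nan, 11, 10, 11, 10, 11]
```

The smoothed series varies far less from day to day than the raw one, which is the effect referred to as mitigating short-term fluctuations; a 28-day window would correspond to `rolling_average(prices, window=28)` on a sufficiently long series.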

As expected, we also found that weather variables played a crucial role in predicting agricultural product prices. According to the adjacency weight matrix obtained by GNN-based models, the average temperature was an important variable for predicting potato prices, while the highest and lowest temperatures played a significant role in predicting lettuce prices. These findings were consistent with real-world agricultural production factors, where weather conditions directly impacted crop yields. Therefore, combining weather variables in predictive models could significantly enhance the accuracy of agricultural product price forecasts.

This study focused on forecasting agricultural commodity prices within South Korea. Nevertheless, because the approach is data-driven, the models could in principle be used to predict prices in other countries and regions, especially where the price series follow similar patterns. However, the base model is unlikely to achieve the expected performance on time series whose trends and seasonality differ substantially. This barrier can be overcome by fine-tuning the model with local data. The investigation and analysis of the applicability of the trained model to other countries and regions are left for future studies.

Conclusion

In conclusion, this study demonstrates that mitigating short-term fluctuations can significantly improve the performance of multivariate prediction models, especially for highly volatile agricultural commodities. Furthermore, it reveals that GNN-based prediction models are more effective than RNN-based prediction models for multivariate prediction involving agricultural and weather data. We showed that methodologies that effectively leverage multivariate data can improve the accuracy of agricultural commodity price prediction, which has important implications for practical market applications.

Future research should aim to further enhance the performance of prediction models through more comprehensive data analysis that includes a wider range of agricultural commodities and their international import and export price data. Such an analysis might be more applicable to real agricultural commodity prices, but it would also be a harder task.

Data availability

The datasets analyzed in this study were obtained from publicly accessible sources. Wholesale market price data for agricultural commodities were provided by the Korea Agricultural Marketing Information Service (KAMIS) and are available at https://www.kamis.or.kr. Weather data, including variables such as maximum and minimum temperature, precipitation, average humidity, mean temperature, and daily temperature range, were retrieved from the Korea Meteorological Administration (KMA) and can be accessed at https://www.kma.go.kr/neng.

References

Tilman, D., Cassman, K. G., Matson, P. A., Naylor, R. & Polasky, S. Agricultural sustainability and intensive production practices. Nature 418, 671–677 (2002).
Alexandratos, N. World food and agriculture: Outlook for the medium and longer term. Proc. Natl. Acad. Sci. 96, 5908–5914 (1999).
David, B., Wolfender, J.-L. & Dias, D. A. The pharmaceutical industry and natural products: Historical status and new trends. Phytochem. Rev. 14, 299–315 (2015).
Sharma, N., Allardyce, B., Rajkhowa, R., Adholeya, A. & Agrawal, R. A substantial role of agro-textiles in agricultural applications. Front. Plant Sci. 13, 895740 (2022).
Madurwar, M. V., Ralegaonkar, R. V. & Mandavgane, S. A. Application of agro-waste for sustainable construction materials: A review. Constr. Build. Mater. 38, 872–878 (2013).
Finlay, M. R. The industrial utilization of farm products and by-products: The USDA regional research laboratories. Agric. Hist. 64, 41–52 (1990).
Pan, Z., Zhang, R. & Zicari, S. Integrated Processing Technologies for Food and Agricultural By-Products (Academic Press, 2019).
Morin-Crini, N., Lichtfouse, E., Torri, G. & Crini, G. Applications of chitosan in food, pharmaceuticals, medicine, cosmetics, agriculture, textiles, pulp and paper, biotechnology, and environmental chemistry. Environ. Chem. Lett. 17, 1667–1692 (2019).
Gollin, D. Agricultural productivity and economic growth. Handb. Agric. Econ. 4, 3825–3866 (2010).
Habib-ur Rahman, M. et al. Impact of climate change on agricultural production: Issues, challenges, and opportunities in Asia. Front. Plant Sci. 13, 925548 (2022).
Arora, N. K. Impact of climate change on agriculture production and its sustainable solutions. Environ. Sustain. 2, 95–96 (2019).
Kukal, M. S. & Irmak, S. Climate-driven crop yield and yield variability and climate change impacts on the us great plains agricultural production. Sci. Rep. 8, 1–18 (2018).
Uprety, D. C. & Reddy, V. Crop Responses to Global Warming (Springer, 2016).
Schlenker, W., Hanemann, W. M. & Fisher, A. C. The impact of global warming on us agriculture: An econometric analysis of optimal growing conditions. Rev. Econ. Stat. 88, 113–125 (2006).
Rahat, S. H. et al. Bracing for impact: How shifting precipitation extremes may influence physical climate risks in an uncertain future. Sci. Rep. 14, 17398 (2024).
Sivakumar, M. V. Climate extremes and impacts on agriculture. Agroclimatol. Link. Agric. Clim. 60, 621–647 (2020).
Heino, M. et al. Increased probability of hot and dry weather extremes during the growing season threatens global crop yields. Sci. Rep. 13, 3583 (2023).
Olesen, J. E. et al. Impacts and adaptation of European crop production systems to climate change. Eur. J. Agron. 34, 96–112 (2011).
King, M. et al. Northward shift of the agricultural climate zone under 21st-century global climate change. Sci. Rep. 8, 7904 (2018).
Deryng, D., Conway, D., Ramankutty, N., Price, J. & Warren, R. Global crop yield response to extreme heat stress under multiple climate change futures. Environ. Res. Lett. 9, 034011 (2014).
Bisbis, M. B., Gruda, N. & Blanke, M. Potential impacts of climate change on vegetable production and product quality—a review. J. Clean. Prod. 170, 1602–1620 (2018).
Das, H. P. Agrometeorological impact assessment of natural disasters and extreme events and agricultural strategies adopted in areas with high weather risks. In Natural Disasters and Extreme Events in Agriculture: Impacts and Mitigation, 93–118 (Springer, 2005).
Pearce, D. W. & Pretty, J. N. Economic Values and the Natural World (Earthscan, 1993).
Tomek, W. G. & Kaiser, H. M. Agricultural Product Prices (Cornell University Press, 2014).
Klemm, T. & McPherson, R. A. The development of seasonal climate forecasting for agricultural producers. Agric. For. Meteorol. 232, 384–399 (2017).
Albarune, A. & Habib, M. M. A study of forecasting practices in supply chain management. Int. J. Supply Chain Manag. (IJSCM) 4, 55–61 (2015).
Chopra, S. & Meindl, P. Supply Chain Management. Strategy, Planning & Operation (Springer, 2007).
Minner, S. Strategic Safety Stocks in Supply Chains, Vol. 490 (Springer, 2012).
Menculini, L. et al. Comparing prophet and deep learning to arima in forecasting wholesale food prices. Forecasting 3, 644–662 (2021).
Jakaša, T., Andročec, I. & Sprčić, P. Electricity price forecasting—arima model approach. In 2011 8th International Conference on the European Energy Market (EEM), 222–225 (IEEE, 2011).
Pai, P.-F. & Lin, C.-S. A hybrid arima and support vector machines model in stock price forecasting. Omega 33, 497–505 (2005).
Weng, Y. et al. Forecasting horticultural products price using arima model and neural network based on a large-scale data set collected by web crawler. IEEE Trans. Comput. Soc. Syst. 6, 547–553 (2019).
Mgale, Y. J., Yan, Y. & Timothy, S. A comparative study of arima and holt-winters exponential smoothing models for rice price forecasting in Tanzania. Open Access Lib. J. 8, 1–9 (2021).
Omar, M. I., Dewan, M. F. & Hoq, M. S. Analysis of price forecasting and spatial co-integration of banana in Bangladesh. Eur. J. Bus. Manage 6, 244–255 (2014).
Park, T.-S., Keum, J., Kim, H., Kim, Y. R. & Min, Y. Predicting Korean fruit prices using LSTM algorithm. J. Korean Soc. Ind. Appl. Math. 26, 23–48 (2022).
Cao, D. et al. Spectral temporal graph neural network for multivariate time-series forecasting. Adv. Neural. Inf. Process. Syst. 33, 17766–17778 (2020).
Jin, M. et al. A survey on graph neural networks for time series: Forecasting, classification, imputation, and anomaly detection. IEEE Trans. Pattern Anal. Mach. Intell. 46, 10466–10485 (2024).
Sun, J., Yue, Y., Wang, Y. & Wang, Y. Memristor-based operant conditioning neural network with blocking and competition effects. IEEE Trans. Ind. Inform. 22, 10209–10218 (2024).
Sun, J., Zhai, Y., Liu, P. & Wang, Y. Memristor-based neural network circuit of associative memory with overshadowing and emotion congruent effect. IEEE Trans. Neural Netw. Learn. Syst. 36, 3618–3630 (2024).
Kaewchada, S., Ruang-On, S., Kuhapong, U. & Songsri-in, K. Random forest model for forecasting vegetable prices: A case study in Nakhon Si Thammarat Province, Thailand. Int. J. Electr. Comput. Eng. (IJECE) 13, 5265–5272 (2023).
Paul, R. K. et al. Machine learning techniques for forecasting agricultural prices: A case of brinjal in Odisha, India. PLoS ONE 17, e0270553 (2022).
Zhang, Q. et al. Short-term forecasting of vegetable prices based on LSTM model—evidence from Beijing’s vegetable data. PLoS ONE 19, e0304881 (2024).
Kurumatani, K. Time series forecasting of agricultural product prices based on recurrent neural networks and its evaluation method. SN Appl. Sci. 2, 1434 (2020).
Bhardwaj, M. R. et al. An innovative deep learning based approach for accurate agricultural crop price prediction. In 2023 IEEE 19th International Conference on Automation Science and Engineering (CASE), 1–7 (IEEE, 2023).
Jin, D., Yin, H., Gu, Y. & Yoo, S. J. Forecasting of vegetable prices using STL-LSTM method. In 2019 6th International Conference on Systems and Informatics (ICSAI), 866–871 (IEEE, 2019).
Ribeiro, M. & dos Santos Coelho, L. Ensemble approach based on bagging, boosting and stacking for short-term prediction in agribusiness time series. Appl. Soft Comput. 86, 105837 (2020).
Fang, Y., Guan, B., Wu, S. & Heravi, S. Optimal forecast combination based on ensemble empirical mode decomposition for agricultural commodity futures prices. J. Forecast. 39, 877–886 (2020).
Zhao, L. et al. T-gcn: A temporal graph convolutional network for traffic prediction. IEEE Trans. Intell. Transp. Syst. 21, 3848–3858 (2019).
Nongsaro. Agricultural technology portal nongsaro. http://www.nongsaro.go.kr (2024). In Korean.
Statistics Korea. Agriculture and forestry. https://kosis.kr/eng/statisticsList/statisticsListIndex.do?parentId=K1.1&menuId=M_01_01&vwcd=MT_ETITLE&parmTabId=M_01_01 (2024). Accessed: August 2, 2024.
Cui, Z., Ke, R., Pu, Z. & Wang, Y. Stacked bidirectional and unidirectional LSTM recurrent neural network for forecasting network-wide traffic state with missing values. Transp. Res. Part C Emerg. Technol. 118, 102674 (2020).
Cleveland, R. B., Cleveland, W. S., McRae, J. E. & Terpenning, I. STL: A seasonal-trend decomposition produced based on loess. J. off. Stat 6, 3–73 (1990).
D’Ecclesia, R. L. & Clementi, D. Volatility in the stock market: ANN versus parametric models. Ann. Oper. Res. 299, 1101–1127 (2021).
Dahal, K., Li, X.-Q., Tai, H., Creelman, A. & Bizimungu, B. Improving potato stress tolerance and tuber yield under a climate change scenario—a current overview. Front. Plant Sci. 10, 563 (2019).


Acknowledgements

Youngho Min was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (Grant No. RS-2023-00243988) and by the Kwangwoon University Research Grant in 2025. Mi Ra Lee, YunKyong Hyon, Taeyoung Ha, and Sunju Lee were supported by a National Institute for Mathematical Sciences (NIMS) grant funded by the Korean government (No. B24910000). Young Rock Kim and Jinwoo Hyun were supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2021R1A2C1011467). Young Rock Kim was supported by the Hankuk University of Foreign Studies Research Fund.

Author information

Authors and Affiliations

Ingenium College of Liberal Arts, Kwangwoon University, Seoul, 01897, Republic of Korea
Youngho Min
Major in Mathematics Education, Graduate School of Education, Hankuk University of Foreign Studies, Seoul, 02450, Republic of Korea
Young Rock Kim
Division of Industrial Mathematics, National Institute for Mathematical Sciences, Daejeon, 34047, Republic of Korea
YunKyong Hyon, Taeyoung Ha, Sunju Lee & Mi Ra Lee
Department of Mathematics, Graduate School, Hankuk University of Foreign Studies, Yongin-si, Gyeonggi-do, 17035, Republic of Korea
Jinwoo Hyun


Contributions

All authors contributed to the conceptualization and design of the study. Data was collected by S.L. and M.L., while experiments were conducted by Y.M. and J.H. All authors participated in the analysis of the results. Supervision and project administration were managed by Y.K., Y.H., and T.H. Manuscript drafting and visualization were handled by Y.M. The manuscript was reviewed by all authors, with editing performed by Y.M. and J.H. The review process was managed by the primary corresponding author, J.H.

Corresponding authors

Correspondence to Jinwoo Hyun or Mi Ra Lee.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1.

Supplementary Information 2.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Min, Y., Kim, Y.R., Hyon, Y. et al. RNN and GNN based prediction of agricultural prices with multivariate time series and its short-term fluctuations smoothing effect. Sci Rep 15, 13681 (2025). https://doi.org/10.1038/s41598-025-97724-7


Received: 30 August 2024
Accepted: 07 April 2025
Published: 21 April 2025
DOI: https://doi.org/10.1038/s41598-025-97724-7
