Abstract
This research explores the integration of weather-based forecasting and profitability analysis to optimize groundnut-based cropping patterns in Tamil Nadu. Groundnut, a crucial oilseed crop, is significantly influenced by weather variability, which impacts its price and profitability. The study leverages advanced boosting algorithms, including Light Gradient Boost, XGBoost, HistGradientBoosting, and CatBoost, to forecast groundnut prices using a multivariate approach that incorporates weather parameters like Minimum and Maximum Temperature, Relative Humidity and Rainfall. Weather parameters were discretized using \(\:k\)-means clustering. Crop prices were decomposed using Seasonal-Trend decomposition based on LOESS and each component was forecasted separately. HistGradientBoosting consistently outperforms other models, achieving the lowest multivariate Mean Absolute Error (MAE) across most crops and districts, underscoring its capability to handle complex, high-dimensional data effectively. The results reveal a substantial improvement in forecasting accuracy with multivariate models compared to univariate ones, establishing the importance of integrating weather features. Groundnut-related cropping patterns were analyzed for profitability using forecasted prices, with patterns involving high-value crops, such as onion in Namakkal, achieving the highest benefit-cost ratio (BCR) of 2.18. Patterns involving black gram also consistently outperformed green gram in economic efficiency. The findings emphasize the need for region-specific, weather-informed cropping strategies to maximize returns for farmers while mitigating risks.
Introduction
Groundnut is one of the most important oilseed crops globally1, playing a pivotal role in ensuring food security and contributing to the agricultural economy2,3. Based on the quadrennial average area under groundnut for the 2019–2022 period, Gujarat (35%), Rajasthan (15%), Andhra Pradesh (14%), Karnataka (11%), and Tamil Nadu (7%) are the main contributing states for groundnut production in India4. However, the cultivation and pricing of groundnut are heavily influenced by weather parameters5. Fluctuations in these parameters often lead to unpredictable price dynamics, thereby affecting farmer profitability and market stability6,7,8,9,10,11. Accurate forecasts of groundnut prices based on weather parameters are crucial for empowering farmers, improving market efficiency, and formulating effective agricultural policies.
The impact of weather variability on agricultural prices has been extensively researched12,13, with earlier studies establishing the role of climatic factors such as temperature and precipitation in influencing crop yields and prices. Traditional forecasting models like ARIMA and SARIMA have been applied to agricultural price prediction14, but they often fail to capture the nonlinear complexities of high-dimensional data. Modern machine learning techniques, including boosting algorithms like XGBoost and LightGBM, have demonstrated superior capabilities15,16.
However, the profitability of groundnut farming is highly sensitive to price volatility17 and weather-related uncertainties18,19. Market prices for groundnuts are influenced by a complex interplay of factors, including seasonal supply fluctuations20, climatic variations, and broader market dynamics. In recent years, these uncertainties have been exacerbated by climate change, which has intensified weather variability21 and disrupted traditional cropping patterns.
Forecasting groundnut prices with high accuracy is crucial for enabling farmers to make informed decisions about crop planning, market timing, and resource allocation6. Traditional statistical models have offered limited success in capturing the nonlinear and dynamic relationships22,23 between weather conditions24 and price trends25. As a result, there is growing interest in the application of advanced machine learning techniques, particularly boosting algorithms, that can handle complex datasets and improve predictive performance.
Boosting algorithms are often chosen over other machine learning models because they combine the strengths of multiple weak learners to create a strong predictive model26. They handle complex datasets with non-linear relationships and can significantly reduce bias and variance. Boosting methods like XGBoost, LightGBM, and CatBoost are known for their high accuracy and robustness, especially in structured data and competitive machine learning tasks27. In this regard, boosting algorithms, which iteratively minimize prediction errors28, have shown promise in agricultural forecasting. For instance, deep neural networks have been used to predict crop yields with higher accuracy than traditional models29. Similarly, deep Gaussian processes have been applied to forecast regional crop production under varying climatic conditions30, demonstrating their adaptability to diverse environmental scenarios31. Despite these advancements, there is limited research specifically targeting groundnut price forecasting using these methods, particularly in Tamil Nadu. Addressing this gap is vital, given the region’s dependence on groundnut farming.
Profitability analysis adds a crucial dimension to forecasting by linking price predictions to economic outcomes. Earlier work32 emphasized integrating profitability metrics to provide actionable insights for farmers. It is important to forecast prices accurately and to focus on the economic viability of groundnut farming in Tamil Nadu33. Incorporating profitability analysis6,34 with advanced forecasting methods offers a holistic framework for supporting farmer decision-making and policy design.
This study aims to bridge these gaps by leveraging state-of-the-art boosting algorithms, including CatBoost35, alongside advanced feature engineering techniques36,37 such as seasonal decomposition and discretization.
Materials and methods
Data collection
The data collection process for this study involved both primary and secondary sources to comprehensively address the research objectives. The study was conducted in three districts of Tamil Nadu: Erode, Namakkal, and Salem, selected based on their geographical proximity and significant contribution to groundnut cultivation. The data collection process consisted of a primary survey to gather farmer-specific data and a secondary survey to acquire long-term market price and weather data.
Primary data collection
The primary survey aimed to estimate the cost of cultivation and yield associated with identified groundnut-based cropping patterns. A multistage sampling technique was employed to ensure representation and relevance of the data collected. In the first stage, three districts were purposively chosen based on proximity and their importance in groundnut cultivation. In the second stage, two blocks were selected from each district based on groundnut cultivation intensity and accessibility. Subsequently, two villages were chosen from each block, using purposive sampling to include locations actively engaged in groundnut farming. Finally, 15 farmers were randomly selected from each village, resulting in a total of 180 farmers across the three districts. The details of the selected blocks and villages in each district are described in Table 1.
Data were collected from the farmers through personal interviews using a structured questionnaire. The questionnaire covered various aspects such as the cost of inputs, cultivation practices, yield, and crop diversification patterns. The survey was conducted with the approval of, and under guidelines provided by, Tamil Nadu Agricultural University, Coimbatore, Tamil Nadu, India, and farmers’ details were collected following the protocols provided by the Assistant Director of Agriculture in each block of the district. The survey also gathered qualitative data regarding farmer perceptions of weather variations and their effects on crop performance, though these qualitative data are not discussed in this paper. This information was crucial for analysing the profitability of various cropping patterns under different scenarios.
Secondary data collection
Secondary data were collected to support the forecasting and profitability analysis. The secondary data primarily consisted of weekly market prices of groundnut and weekly weather parameters for the period spanning January 1, 2010, to December 31, 2023. The market price data were sourced from the AGMARKNET portal, a reliable and comprehensive database for agricultural market prices in India. Weather data38,39, including Minimum Temperature (Tmin), Maximum Temperature (Tmax), Relative Humidity (RH) and Rainfall (RF), were obtained from the NASA POWER project portal.
Statistical analysis
The statistical analysis for this study focused on understanding the behavior of price and weather data over time and evaluating their interrelationships. Various tests and measures were applied to ensure robust analysis:
Test for seasonality
Seasonality in the price and weather parameters was examined using the Kruskal-Wallis test40,41, a non-parametric method42 for comparing medians across groups. The test was performed43 for different periodicities, including monthly, yearly, and weekly intervals, to identify significant seasonal patterns. If there is seasonality in the data, the values in different seasons can be expected to follow distinct distributions. The hypotheses for this test can be stated as,
\(\:{H}_{0}\): The groups belong to an identical population
\(\:{H}_{1}\): At least one of the groups belongs to a different population than the others
The hypothesis can be tested using the following test statistic:
\[H=\frac{12}{N(N+1)}\sum_{i=1}^{k}\frac{{R}_{i}^{2}}{{n}_{i}}-3(N+1)\]
where \(\:N\) is the total number of observations across all groups, \(\:k\) is the number of groups, \(\:{n}_{i}\) is the number of observations in group \(\:i\) and \(\:{R}_{i}\) is the sum of the ranks of the observations in group \(\:i\). Under the null hypothesis, the test statistic \(\:H\sim{\chi\:}_{(k-1)\:df}^{2}\), and thus the larger the value of \(\:H\), the more evidence there is against the null hypothesis.
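As a minimal sketch of how the test statistic defined above can be computed, the following standard-library Python code ranks the pooled observations, sums the ranks per group, and evaluates \(H\). The function names and the toy quarterly groups are hypothetical, and no tie correction is applied, so the value may differ slightly from library implementations when the data contain ties.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1), no tie correction."""
    pooled = list(chain.from_iterable(groups))
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted_sum, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])  # R_i
        weighted_sum += rank_sum ** 2 / len(group)       # R_i^2 / n_i
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * weighted_sum - 3 * (n_total + 1)


# Hypothetical observations grouped into three seasons (illustrative only).
seasons = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_stat = kruskal_wallis_h(seasons)
degrees_of_freedom = len(seasons) - 1
```

The resulting statistic is compared against a chi-square distribution with \(k-1\) degrees of freedom, as described above.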
Cross-correlation analysis
Cross-correlation analysis was conducted on the training datasets to explore the lagged relationships between price and weather parameters44. Lags ranging from 0 to 52 weeks were examined to identify significant correlations. The Pearson correlation coefficient45 was employed for this purpose, enabling the identification of meaningful lagged features for inclusion in the boosting machine models. By conducting these statistical analyses, the study established a solid foundation for the forecasting and profitability models46, ensuring that the underlying patterns and relationships in the data were effectively captured.
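The lagged correlation scan described above can be sketched as follows. This is an illustrative standard-library implementation, not the authors' code: the function names are hypothetical, and the synthetic series is constructed so that price follows the weather signal with a three-week delay.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def lagged_correlations(weather, price, max_lag=52):
    """Correlate price[t] with weather[t - lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        shifted_weather = weather[:len(weather) - lag]  # weather at t - lag
        aligned_price = price[lag:]                     # price at t
        result[lag] = pearson(shifted_weather, aligned_price)
    return result


# Synthetic example: price repeats the weather signal three weeks later.
weather = [math.sin(0.3 * t) for t in range(120)]
price = [0.0] * 3 + weather[:-3]
corr = lagged_correlations(weather, price, max_lag=10)
best_lag = max(corr, key=lambda lag: corr[lag])
```

Lags with the largest absolute correlation are natural candidates for lagged weather features.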
Feature engineering
To improve model performance and forecast prices accurately47,48, feature engineering techniques were applied to both the price and weather data in the training datasets. These methods ensured that the models could effectively capture the temporal49 and environmental50 factors influencing price dynamics.
STL decomposition and component-wise forecasting
Price data were decomposed using Seasonal-Trend Decomposition based on LOESS (STL)51,52 to extract the trend, seasonal, and residual components. Each component was analysed individually43, and separate forecasts were generated for the trend, seasonal, and residual components. These individual forecasts were then combined to produce the final price forecasts. This decomposition approach improved model interpretability and accuracy by isolating distinct temporal patterns in the price data.
Feature transformations
Weather parameters, including Minimum Temperature, Maximum Temperature, Rainfall, and Relative Humidity, were transformed for effective integration into the forecasting models. The transformation was carried out through discretization53, where weather parameters were categorized into distinct bins using k-means clustering. This approach enabled the model to capture non-linear relationships between weather and price, leveraging the strengths of boosting algorithms in handling categorical data54. The decision to use discretized weather data, rather than raw inputs, was driven by the improved forecasting performance observed with this transformation55. Additionally, using transformed weather data aligned with the study’s objective of utilizing weather forecasts for price prediction.
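A minimal sketch of k-means discretization of a single weather variable is given below. It is a one-dimensional Lloyd's algorithm written with the standard library only; the quantile-based initialization, the function names, and the temperature values are assumptions made for illustration, and the toy example uses three bins rather than the number used in the study.

```python
def kmeans_1d(values, k, n_iter=100):
    """Lloyd's algorithm in one dimension with quantile-based initial centres."""
    ordered = sorted(values)
    centres = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            clusters[nearest].append(v)
        # Keep the old centre when a cluster is empty (can happen with heavy ties).
        new_centres = [sum(c) / len(c) if c else centres[i]
                       for i, c in enumerate(clusters)]
        if new_centres == centres:
            break
        centres = new_centres
    return sorted(centres)


def discretize(values, centres):
    """Replace each value by the index of its nearest centre (the bin label)."""
    return [min(range(len(centres)), key=lambda c: abs(v - centres[c]))
            for v in values]


# Hypothetical weekly maximum temperatures in degrees Celsius.
tmax = [24.1, 24.5, 25.0, 30.2, 30.8, 31.1, 36.0, 36.4, 37.2]
centres = kmeans_1d(tmax, k=3)
bins = discretize(tmax, centres)
```

Because the centres are sorted before labelling, bin indices increase with the underlying weather value, which keeps the categorical codes ordinal.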
Time-based features and label encoding
To capture temporal patterns45, time-based features were generated from the date information. Year, Month and Week were encoded as categorical variables and integrated into the forecasting models, enabling the models to leverage temporal patterns effectively. Combining these engineered features with the transformed weather data56 yielded robust models for price forecasting. Label encoding converts categorical labels to integer codes so that they can serve as model features; this representation is considered efficient, especially for tree-based models.
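The following sketch shows one way to derive Year, Month and Week features from a date and to label-encode a categorical column. The use of ISO week numbers is an assumption (the text does not state how weeks were numbered), and the helper names and dates are hypothetical.

```python
from datetime import date


def time_features(d):
    """Extract calendar features; the week is the ISO week number (assumed)."""
    _, iso_week, _ = d.isocalendar()
    return {"year": d.year, "month": d.month, "week": iso_week}


def fit_label_encoder(labels):
    """Map each distinct label to an integer code, in sorted order."""
    return {label: code for code, label in enumerate(sorted(set(labels)))}


# Hypothetical weekly observation dates.
dates = [date(2010, 1, 4), date(2010, 1, 11), date(2011, 6, 6)]
rows = [time_features(d) for d in dates]

year_codes = fit_label_encoder(r["year"] for r in rows)
encoded_years = [year_codes[r["year"]] for r in rows]
```

Near year boundaries the ISO week can belong to the neighbouring ISO year, so a production pipeline would need to decide how such weeks are assigned.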
Boosted learning machines
For the purpose of this study, LightGBM (LGBM), XGBoost (XGBM), HistGradientBoosting (HGBM), and CatBoost (CBM) were chosen. They are advanced machine learning algorithms designed for structured data, and they have been increasingly utilized for time series forecasting57 due to their robust performance and flexibility. These boosting algorithms are ensemble methods that combine the predictive power of multiple decision trees58, sequentially trained, to minimize errors and improve generalization59. They handle large datasets with high-dimensional features and are particularly adept at capturing complex, nonlinear relationships within the data, which is essential for accurate time series forecasting.
LightGBM (LGBM)60,61 is an algorithm designed to be fast and efficient, particularly for large datasets. It employs a histogram-based approach to decision tree learning, which significantly reduces memory usage and computational time compared to traditional tree-based methods. LGBM also supports features such as categorical variable handling, early stopping, and custom loss functions, making it well-suited for time series forecasting27. The algorithm is particularly effective at identifying long-term trends and seasonality in time series data62, especially when combined with appropriate feature engineering, such as lagged features or rolling statistics. Its ability to handle missing values and categorical data natively further enhances its suitability for real-world time series problems.
XGBoost Machine (XGBM) is another widely used boosting algorithm, known for its scalability and robustness. XGBoost uses gradient boosting techniques and incorporates regularization to prevent overfitting, making it highly effective for time series forecasting where data noise can be a challenge63. Its ability to handle missing data and outliers allows it to adapt to real-world conditions effectively. XGBoost is versatile, capable of capturing both short-term fluctuations64 and long-term trends in time series data65 through the incorporation of lagged variables, moving averages, and other engineered features. However, its computational demands can be higher compared to LightGBM, particularly for very large datasets, which is a factor to consider in time-sensitive forecasting tasks.
HistGradientBoosting Machine (HGBM)66, a more recent implementation available in scikit-learn, is a histogram-based gradient boosting model. Like LightGBM, HGBM uses histograms to accelerate the training process, making it efficient for large datasets. It also handles missing values natively and uses regularization to enhance generalization. HGBM is particularly useful for time series forecasting tasks that involve a mix of continuous and categorical variables, as it can handle these seamlessly67. Additionally, its integration within the scikit-learn ecosystem allows for easy application of time series-specific cross-validation techniques, which is critical for reliable forecasting model evaluation.
CatBoost Machine (CBM)68 is another powerful boosting algorithm, specifically designed to handle categorical data more effectively than other boosting methods. It employs an innovative encoding mechanism for categorical variables, reducing the risk of overfitting and improving model accuracy. CatBoost’s capability to handle unevenly distributed data and time-lagged relationships makes it a strong candidate for time series forecasting69. It is also known for its ease of use, as it requires minimal parameter tuning compared to other boosting methods. However, CatBoost can be computationally intensive, especially for large datasets, which may pose challenges for time-critical applications.
The multistep direct forecasting strategy was chosen for this study due to its ability to produce more accurate and horizon-specific predictions compared to recursive and hybrid methods. In the context of groundnut price forecasting, where short- and long-term dynamics are influenced by differing sets of variables—such as immediate weather fluctuations versus broader market trends—direct forecasting allows each model to specialize in capturing the unique patterns of its respective horizon70,71. Unlike recursive methods, which suffer from cumulative error propagation as predictions are fed back into the model, the direct approach mitigates this issue by independently modeling each forecast step. This is particularly advantageous for longer horizons where small inaccuracies can compound significantly. Moreover, while hybrid methods attempt to blend recursive and direct strategies, they often inherit the complexity of both and may still suffer from intermediate error buildup. Although computationally intensive, the direct method’s capacity to tailor learning to specific time points leads to more reliable forecasts, especially in applications involving volatile inputs such as weather data. Given the high-stakes nature of agricultural pricing decisions, this trade-off in favor of greater accuracy and stability is both justified and necessary.
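To illustrate the direct strategy described above, the sketch below builds one training set per forecast horizon and fits one model per horizon, so no prediction is ever fed back as an input. The `MeanChangeModel` class is a deliberately trivial stand-in for a boosting regressor, and all names and the price series are hypothetical.

```python
def make_direct_dataset(series, n_lags, horizon):
    """Pair the last n_lags observations at time t with the value at t + horizon."""
    features, targets = [], []
    for t in range(n_lags - 1, len(series) - horizon):
        features.append(series[t - n_lags + 1:t + 1])
        targets.append(series[t + horizon])
    return features, targets


class MeanChangeModel:
    """Toy stand-in for a boosting regressor: last value plus the mean h-step change."""

    def fit(self, features, targets):
        changes = [target - row[-1] for row, target in zip(features, targets)]
        self.delta = sum(changes) / len(changes)
        return self

    def predict(self, row):
        return row[-1] + self.delta


def fit_direct(series, n_lags, max_horizon):
    """Fit one independent model for each horizon 1..max_horizon."""
    models = {}
    for horizon in range(1, max_horizon + 1):
        features, targets = make_direct_dataset(series, n_lags, horizon)
        models[horizon] = MeanChangeModel().fit(features, targets)
    return models


# Hypothetical weekly price series with a steady upward trend.
prices = [100 + 2 * t for t in range(30)]
models = fit_direct(prices, n_lags=4, max_horizon=3)
last_window = prices[-4:]
forecasts = [models[h].predict(last_window) for h in (1, 2, 3)]
```

Every horizon uses the same observed window as input, which is what prevents the error accumulation that recursive forecasting suffers from; the cost is that the number of fitted models grows with the forecast horizon.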
Training and validation
The chosen boosting machines for this study included LightGBM, XGBoost, HistGradientBoosting, and CatBoost, each known for their efficiency in handling structured data and complex relationships. To optimize the forecasting models, hyperparameter tuning was performed using a Bayesian search technique. Bayesian optimization72 is an efficient method for exploring the hyperparameter73 space by balancing exploration and exploitation. Instead of evaluating all possible combinations, the algorithm uses prior knowledge and probabilistic models to identify promising regions of the hyperparameter space.
The Bayesian search74 process involved defining the objective function as the forecasting model’s performance metric, Mean Squared Error (MSE). Key hyperparameters, such as learning rate, number of estimators, maximum depth, and minimum samples per leaf, were iteratively optimized over a specified number of iterations. Each iteration refined the search space based on the results of previous evaluations, enabling the algorithm to focus computational resources on identifying the most effective hyperparameter configurations within a reasonable timeframe.
To ensure robust evaluation75,76, the optimized models were validated using time-series cross-validation. This technique divided the data into sequential training and testing sets, accounting for the temporal nature of the data and ensuring that the validation results reflected real-world forecasting scenarios. By combining Bayesian optimization with rigorous cross-validation, the study developed highly tuned models capable of accurately forecasting groundnut prices under varying weather conditions. Details on the size of the datasets used for training and testing are given in Table 2.
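A minimal sketch of time-series cross-validation with an expanding training window is shown below. The number of folds and the test size are hypothetical, since the text does not report them, and the function name is illustrative; the scheme is similar in spirit to scikit-learn's `TimeSeriesSplit`.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) where training always precedes testing."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for i in range(n_splits):
        start = first_test_start + i * test_size
        train_idx = list(range(start))                    # everything before the fold
        test_idx = list(range(start, start + test_size))  # the next block in time
        yield train_idx, test_idx


# Twelve weekly observations, three folds of two weeks each (illustrative sizes).
splits = list(expanding_window_splits(n_samples=12, n_splits=3, test_size=2))
```

Each fold trains only on observations that precede its test block, so no future information leaks into model fitting.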
Model testing
The tuned models were evaluated in two distinct phases to examine the impact of weather features on forecasting performance77. Initially, the models were trained and tested using only the price data. Performance metrics, including Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE)78156, were calculated to establish baseline performance. This phase focused on analyzing the temporal patterns inherent in the price data to evaluate the models’ capability to predict prices without any external influences.
In the second phase79, weather features were incorporated into the models as additional predictors. These features included transformed weather parameters and time-based categorical variables. The models were subsequently retrained and tested, with the MAE and MAPE metrics recalculated. By integrating weather data, this phase provided a more comprehensive assessment80 of price dynamics, incorporating the effects of external environmental factors on market prices.
To quantify the impact of weather features81, the percentage difference between the metrics obtained in the two phases was computed. This comparison highlighted the extent to which weather parameters enhanced the models’ forecasting accuracy. Detailed analysis of these results shed light on the relationship between weather conditions and price fluctuations, underscoring the significance of leveraging weather data in predictive modelling82,83.
Finally, the best-performing model was selected based on the performance metrics84. This model was then utilized to forecast groundnut prices for the year 2024. The model demonstrating the greatest reduction in MAE and MAPE85 with the inclusion of weather features was identified as the most effective for forecasting groundnut prices86. This systematic evaluation ensured that the chosen model not only delivered high accuracy but also maximized the utility of weather data in improving predictive performance87. The metrics that were used in the study are given in the Eqs. (2–4):
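\[\text{MAE}=\frac{1}{N}\sum_{t=1}^{N}\left|{Y}_{t}-{\widehat{Y}}_{t}\right|\]
\[\text{MSE}=\frac{1}{N}\sum_{t=1}^{N}{\left({Y}_{t}-{\widehat{Y}}_{t}\right)}^{2}\]
\[\text{MAPE}=\frac{100}{N}\sum_{t=1}^{N}\left|\frac{{Y}_{t}-{\widehat{Y}}_{t}}{{Y}_{t}}\right|\]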
where, \(\:N\) is the number of observations in the series, \(\:{Y}_{t}\) is the observed value at time \(\:t\) and \(\:{\widehat{Y}}_{t}\) is the predicted value at time \(\:t\).
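The error metrics named in this section (MAE, MSE and MAPE) can be computed from observed and predicted values as in the sketch below, which uses their standard definitions. The function names and the price values are hypothetical, and MAPE is expressed in percent and assumes non-zero observed values.

```python
def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean Squared Error."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent; actual values must be non-zero."""
    ratios = [abs((y - y_hat) / y) for y, y_hat in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(actual)


# Hypothetical observed and forecasted prices.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

mae_value = mae(actual, predicted)
mse_value = mse(actual, predicted)
mape_value = mape(actual, predicted)
```

MAE and MSE are in the units of the price (and its square), whereas MAPE is scale-free, which makes it easier to compare across crops with different price levels.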
Profitability analysis
Following the price forecast for all crops in the identified groundnut-based cropping patterns using the best model, profitability analysis88 was conducted to determine the most viable cropping pattern. For each crop, the gross income and net income were calculated using the estimated cost of cultivation and yield89. Gross income (Eq. 4)90,91 was derived from the product of forecasted price and yield, while net income was calculated by subtracting the cost of cultivation from the gross income.
To assess the economic feasibility of each cropping pattern, the Benefit-Cost Ratio (BCR)92 was computed. The BCR (Eq. 5), defined as the ratio of gross income to the cost of cultivation, provided a comprehensive measure of profitability93. The cropping pattern with the highest BCR was identified as the most economically viable option. This systematic and analytical approach ensured that both revenue generation and cost implications were considered, leading to robust recommendations for selecting optimal cropping patterns under diverse94.
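\[\text{Gross income}=\text{Forecasted price}\times\text{Yield}\]
\[\text{Net income}=\text{Gross income}-\text{Cost of cultivation}\]
\[\text{BCR}=\frac{\text{Gross income}}{\text{Cost of cultivation}}\]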
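The profitability measures defined in this section can be sketched as follows. All prices, yields and costs are hypothetical, and aggregating a cropping pattern by summing gross income and cost across its crops is an assumption made for illustration rather than a procedure stated in the text.

```python
def profitability(price, crop_yield, cost):
    """Return gross income, net income and benefit-cost ratio for one crop."""
    gross_income = price * crop_yield
    net_income = gross_income - cost
    bcr = gross_income / cost
    return gross_income, net_income, bcr


# Hypothetical pattern: (forecasted price per quintal, yield in quintals per ha,
# cost of cultivation per ha). Values are illustrative only.
pattern = {
    "groundnut": (6000.0, 20.0, 60000.0),
    "black gram": (3000.0, 10.0, 15000.0),
}

per_crop = {crop: profitability(*values) for crop, values in pattern.items()}

# Assumed aggregation: pattern-level BCR from summed gross income and summed cost.
total_gross = sum(result[0] for result in per_crop.values())
total_cost = sum(values[2] for values in pattern.values())
pattern_bcr = total_gross / total_cost
```

A BCR above 1 means gross income exceeds the cost of cultivation; patterns can then be ranked by this ratio.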
Results and discussion
The weekly market prices of various crops included in groundnut-based cropping patterns across the selected districts are represented in Fig. 1. The price of groundnut shows an upward trend with a similar distribution across all districts95. Black gram in Namakkal and green gram in both Namakkal and Salem exhibit a step-like upward trend in their price movements. The price of maize in Namakkal also follows an upward trend but with stronger fluctuations96. On the other hand, the price of tobacco in Erode and onion in Namakkal appears relatively stationary. Similarly, black gram in Salem shows a mostly stationary pattern97, except for a sudden spike in prices followed by a period of stability.
Partial autocorrelation
The partial autocorrelation function (PACF) plots for the weekly market prices of various crops across Erode, Namakkal, and Salem districts are presented in Fig. 2 and provide valuable insight into lagged dependencies in the data. Groundnut prices in all districts show strong PACF spikes at lag 1, indicating significant short-term autocorrelation and suggesting that current prices are heavily influenced by recent past values98. For tobacco in Erode and onion in Namakkal, sharp first-lag spikes with minimal higher-lag contributions reflect a stationary price pattern with limited long-term dependence99. Similarly, PACF plots for black gram and green gram in Namakkal and Salem highlight a dominant first-lag spike, aligning with step-like upward trends and diminishing higher-order lag influence100,101. Maize in Namakkal shows a strong first-lag correlation along with notable contributions from subsequent lags, consistent with the crop’s more volatile price movement102,103. These findings support the persistence prediction analysis, indicating that recent historical prices are reliable predictors of near-future values, which is crucial for effective time-series modeling and forecasting strategies.
Cross correlation
The cross-correlation analysis shown in Fig. 3 reveals distinct relationships between weather parameters (Tmax, Tmin, RH, and RF) and crop prices over varying lag periods. Maximum and minimum temperatures generally exhibit a sinusoidal pattern104,105 with positive and negative correlations depending on the lag, indicating a delayed effect on crop prices. Relative humidity (RH)106 shows relatively stable patterns with moderate positive correlations across most lags, while rainfall (RF) often demonstrates peaks and troughs, reflecting its influence on crop prices at specific time intervals. The observed trends suggest that weather parameters exert both immediate and lagged impacts on crop prices, with the magnitude and direction of influence varying by parameter and lag duration103,107.
Seasonality
The Kruskal-Wallis test108 results shown in Table 3 reveal significant seasonality109 in crop prices across districts and time intervals, highlighting the importance of temporal patterns in forecasting. In Erode, groundnut and tobacco exhibit strong yearly seasonality110, with moderate to significant effects at the quarterly and monthly levels, emphasizing the role of broader cycles in price variations. Namakkal shows pronounced yearly seasonality111 for black gram and green gram, along with strong seasonal influences for groundnut and maize at yearly and quarterly intervals, with onion displaying the most striking monthly and quarterly seasonality. In Salem112,113, black gram, green gram, and groundnut demonstrate highly significant yearly seasonality, with groundnut also showing strong effects at the quarterly and monthly levels. These findings underline the critical role of seasonality in crop price fluctuations114,115 and suggest that incorporating these patterns as features in boosting machine models can significantly enhance the accuracy of price forecasts, particularly for crops and districts with strong seasonal influences.
Feature transformations
The discretization of continuous weather variables into categorical bins is a crucial preprocessing step for boosting algorithms, particularly when handling high-frequency time-series data with nonlinear interactions. The selection of 20 bins for k-means clustering was guided by both theoretical and empirical considerations, as supported by recent literature55,116.
Firstly, k-means clustering effectively captures the natural groupings and variance within continuous data without relying on arbitrary thresholds. By transforming continuous weather parameters such as temperature, humidity, and rainfall into 20 discrete clusters, the method balances granularity and generalization. Fewer bins (e.g., less than 10) may result in excessive information loss, masking subtle but important variations in weather patterns. Conversely, too many bins (e.g., over 30) can introduce noise and reduce the robustness of the model by overfitting to minor fluctuations.
Secondly, the choice of k = 20 aligns with earlier findings117, which demonstrated that this level of discretization provides an optimal trade-off between model interpretability and predictive performance in weather-dependent forecasting tasks. That work shows that 20 clusters adequately preserve the shape and distribution of the original data while ensuring that boosting models such as CatBoost and HGBM maintain high accuracy.
Empirical tests conducted as part of this study further validate this choice. Cross-validation results indicate that models trained with 20-bin discretized weather features achieved higher R² scores and lower RMSE values compared to models using raw continuous data or alternative binning levels. This suggests that the 20-bin approach enhances the model’s ability to learn meaningful patterns from weather inputs without compromising computational efficiency. Therefore, the use of 20 bins in k-means discretization is justified by both its theoretical soundness and empirical effectiveness in improving model performance for groundnut price forecasting under varying weather conditions. The visual representation of discretized parameters is shown in Fig. 4.
Model training and testing
Hyperparameter tuning
Table 4 presents the results of hyperparameter tuning for the four boosting models (LGBM, XGBM, HGBM and CBM), with their respective parameter ranges optimized for groundnut crops.
Training and testing
The performance error metrics for price forecasts by the different models under univariate and multivariate settings are given in Fig. 5. The comparison between univariate and multivariate forecasting118,119 reveals a substantial improvement in accuracy when discretized weather features are added to the forecasting models. Across all districts and crops, multivariate forecasting67 consistently achieves lower MAE120 values compared to univariate forecasting. When the performance of all models is aggregated, the percentage improvement in MAE is striking and emphasizes the value of incorporating weather features121 into the forecasting process122.
In Erode, across all crops and forecasting models, the average Mean Absolute Error (MAE) for univariate forecasting is significantly higher compared to multivariate forecasting123,124,125. For instance, in the case of groundnut, the average univariate MAE across all models is 779.54, whereas the average multivariate MAE drops sharply to 206.04, reflecting an improvement of approximately 73.6%. Similarly, for tobacco, the average MAE declines from 697.81 (univariate) to 198.88 (multivariate), amounting to a 71.5% improvement. These significant reductions in error clearly demonstrate the advantage of incorporating multivariate features—particularly weather parameters—into forecasting models126.
In Salem, a similar trend is observed. For groundnut, the univariate MAE averages 786.85, while the multivariate MAE is reduced to 209.64, reflecting a 73.4% improvement. In the case of black gram, the average univariate MAE is 780.54, which decreases to 208.29 with multivariate forecasting—an improvement of 73.3%. These findings further confirm the consistency and robustness of multivariate models in delivering more accurate price predictions across different crops and geographical areas126,127.
The results from Namakkal also reinforce this pattern. For groundnut, the average univariate MAE is 774.92, while the multivariate MAE significantly drops to 203.29, reflecting a 73.8% improvement. Similarly, for maize, the MAE improves from 773.73 (univariate) to 202.85 (multivariate), again demonstrating a 73.8% increase in forecast accuracy. These consistent and substantial reductions in MAE across all crops and districts underscore the impact of weather-informed multivariate modeling on improving the reliability of agricultural price forecasts128,129.
When considering all forecasting models and regions collectively, the average MAE for univariate models is consistently higher than for multivariate models. This trend affirms the overall superiority of multivariate forecasting, especially when discretized weather features are incorporated130,131,132 (Thangavelu et al., 2025).
Once the improvement of forecast accuracy through multivariate modeling is established, the focus shifts to identifying the best-performing algorithm. Based on multivariate MAE values, HistGradientBoosting Machine (HGBM) consistently emerges as the top-performing model84,133,134.
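The percentage improvements quoted for the three districts can be reproduced with a short calculation. The sketch below assumes the improvement is defined as the relative reduction in MAE, (univariate − multivariate) / univariate; the function name `pct_improvement` is illustrative and is not taken from the study.

```python
def pct_improvement(univariate_mae, multivariate_mae):
    """Relative reduction in MAE, in percent (assumed definition)."""
    return 100.0 * (univariate_mae - multivariate_mae) / univariate_mae


# Average MAE values across all models, as quoted in the text:
# (district, crop) -> (univariate MAE, multivariate MAE)
reported = {
    ("Erode", "groundnut"): (779.54, 206.04),
    ("Erode", "tobacco"): (697.81, 198.88),
    ("Salem", "groundnut"): (786.85, 209.64),
    ("Salem", "black gram"): (780.54, 208.29),
    ("Namakkal", "groundnut"): (774.92, 203.29),
    ("Namakkal", "maize"): (773.73, 202.85),
}

improvements = {
    key: round(pct_improvement(uni, multi), 1)
    for key, (uni, multi) in reported.items()
}

for (district, crop), value in improvements.items():
    print(f"{district:9s} {crop:11s} {value:.1f}%")
```

Under this definition, the computed values agree with the percentages stated for each district and crop to one decimal place.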
- In Erode, HGBM achieves the lowest multivariate MAE for groundnut (179.29) and tobacco (101.51).
- In Salem, HGBM records the lowest multivariate MAE for groundnut (182.28) and black gram (180.57)135.
- In Namakkal, HGBM delivers superior results for groundnut (171.00) and maize (174.86)84.
The best-performing models for groundnut in the Erode, Namakkal and Salem districts are presented in Table 5. Mean Absolute Error (MAE) and Mean Squared Error (MSE) were used to identify the best-performing model. The comparison covers LightGBM (LGBM), XGBoost (XGBM), Histogram-based Gradient Boosting (HGBM), and CatBoost (CBM) in each of the three districts.
Erode: The HGBM model significantly outperforms the other models, achieving the lowest MAE (265.15) and MSE (99413.9). LGBM and XGBM show relatively higher errors, with CBM having the highest errors. This suggests that HGBM is better suited for the Erode district’s data patterns, possibly due to its robustness in handling non-linear relationships.
Namakkal: Across the models, errors are substantially higher than in Erode and Salem. HGBM again has the lowest MAE (1397.48) and MSE (2352526), indicating better performance. CBM and LGBM models have significantly higher errors, suggesting they may not effectively capture the variability in Namakkal’s data.
Salem: HGBM consistently shows superior performance, with the lowest MAE (420.56) and MSE (249828.3). LGBM and CBM perform worse, with CBM being the least accurate model. The trend of HGBM outperforming other models holds true in this district.
Table 6 reveals distinct differences in forecasting errors (MAE and MAPE) across districts and crops. In Erode, groundnut and tobacco have moderate MAE values (245.14 and 165.84, respectively) and similar MAPE values of around 5.4–5.6%, suggesting comparable prediction accuracy. In Namakkal, groundnut and onion stand out with the highest MAE values (356.21 and 533.64, respectively), while black gram and green gram have the lowest (70.26 and 63.85, respectively), indicating better forecasting accuracy for these crops. MAPE values show a similar trend, with onion reaching a markedly higher error rate (17.48%), likely due to market or environmental volatility. For Salem, groundnut has the highest MAE (327.75) and MAPE (6.21%), while black gram performs best with an MAE of 93.20 and a MAPE of 1.40%. This suggests more reliable forecasting for black gram than for other crops in this district.
Although other boosting models such as LightGBM and XGBoost also perform competitively, their multivariate MAE values are consistently higher than those of HGBM across most crops and districts135. CatBoost, by contrast, tends to underperform, often yielding higher MAE values than the other algorithms. CatBoost is a robust model for domains rich in categorical complexity (e.g., user behavior, text, recommender systems). This agricultural forecasting setting, however, is characterized by discretized numerical weather inputs and the need for fast, high-accuracy regression across many models, and here HistGradientBoosting Machine (HGBM) performs better owing to its alignment with the data structure, its optimization for binned inputs, and its computational efficiency.
Hence, these findings strongly establish HGBM as the most reliable and effective model for price forecasting in the context of groundnut-based profitable cropping patterns, particularly when discretized weather variables are utilized66,136. The model’s ability to handle high-dimensional, noisy, and nonlinear data enables it to consistently outperform competing methods in both accuracy and stability.
The high accuracy of the forecasting models, particularly the Histogram Gradient Boosting Model (HGBM), has significant implications for groundnut farmers in the Erode, Namakkal, and Salem districts. Reliable forecasts of groundnut prices enable farmers to make more informed decisions regarding the timing of sales, storage, and choice of cropping pattern. For instance, accurate price predictions can help farmers delay sales during periods of anticipated low prices or switch to more profitable cropping combinations in subsequent seasons.
Moreover, integrating these models into mobile-based decision support systems or local agricultural extension services could democratize access to predictive insights, especially for smallholder farmers who often lack sophisticated market information. This can improve their bargaining power and reduce income volatility. On a policy level, accurate price forecasts can support better planning of minimum support prices and procurement strategies by government agencies.
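For reference, the three error metrics used in Tables 5 and 6 can be written out in a few lines. This is a minimal sketch using the standard textbook definitions of MAE, MSE and MAPE; the price series below is hypothetical and is not taken from the study's data.

```python
def mae(actual, predicted):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean Squared Error: average of (actual - predicted)^2."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Assumes no actual value is zero, which holds for market prices.
    """
    return 100.0 * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)
    ) / len(actual)


# Hypothetical weekly prices, for illustration only.
actual_prices = [5000.0, 5200.0, 5100.0, 5300.0]
forecast_prices = [4900.0, 5300.0, 5150.0, 5250.0]

print("MAE :", mae(actual_prices, forecast_prices))
print("MSE :", mse(actual_prices, forecast_prices))
print("MAPE:", round(mape(actual_prices, forecast_prices), 2), "%")
```

MAE and MSE are expressed in price units (and squared price units), so they scale with the price level of each crop, whereas MAPE is scale-free. This is one reason a high-priced, volatile crop can show a large MAE and a large MAPE at the same time, while low MAE values for pulses need not imply proportionally small percentage errors.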
Price forecasting
Following the training and testing process, the weekly price forecasts for the different crops produced by the Histogram Gradient Boosting model for the period 2024–2025 are plotted in Fig. 6.
Profitability analysis
From the primary survey, the estimated cost of cultivation and estimated yield for each crop are given in Table 7. Using the forecasted prices and the survey estimates, the Gross Income and Benefit-Cost Ratio (BCR) for each identified cropping pattern were calculated (Table 8).
Table 8 provides insights into the economic viability of groundnut-based cropping patterns in the districts of Erode, Namakkal, and Salem, focusing on their average cost of cultivation, average gross income, and BCR. These metrics reflect the profitability and efficiency of different groundnut-based crop combinations, helping to identify the most economically profitable patterns for farmers.
In Erode, two groundnut-based cropping patterns have been analyzed: “Groundnut – Tobacco” and “Groundnut – Groundnut – Tobacco.” The “Groundnut – Tobacco” cropping pattern involves an average cost of cultivation of ₹ 71,702.77 and generates an average gross income of ₹ 1,10,242.22. This results in a Benefit-Cost Ratio (BCR) of 1.54, indicating that for every rupee invested, farmers earn ₹1.54 in return. This reflects a modest yet stable level of profitability, demonstrating that this pattern is economically viable and offers reasonable returns on investment137.
In contrast, the “Groundnut – Groundnut – Tobacco” cropping pattern, which incorporates an additional cycle of groundnut, incurs a higher average cost of cultivation at ₹ 1,05,005.54. However, it also leads to an increased average gross income of ₹ 1,44,128.57. Despite the higher income, the BCR for this pattern is slightly lower at 1.09, suggesting that while total earnings rise, the efficiency of return on each rupee invested decreases slightly138,139. This analysis suggests that although the inclusion of a second groundnut cycle in the cropping pattern increases both costs and gross returns, the marginal improvement in profitability is relatively small. The lower BCR in the three-crop system implies that the additional input costs are not matched proportionately by the output gains. Therefore, while the “Groundnut – Groundnut – Tobacco” pattern may be attractive in terms of total income, the “Groundnut – Tobacco” system remains more efficient in terms of cost-effectiveness and return per unit of investment.
In Namakkal, three groundnut-based cropping patterns have been evaluated, each demonstrating greater profitability compared to those observed in Erode. These patterns highlight varying levels of economic efficiency, depending on the crop combinations involved.
The “Groundnut – Maize – Black gram” cropping pattern involves an average cost of cultivation of ₹ 70,199.55 and yields a gross income of ₹ 1,22,136.00. This results in a Benefit-Cost Ratio (BCR) of 1.74, indicating a highly favourable balance between investment and return. The BCR reflects strong profitability and suggests that this combination effectively utilizes available resources to maximize returns140.
The “Groundnut – Maize – Green gram” pattern incurs a slightly lower cultivation cost of ₹ 69,556.92 and produces a gross income of ₹ 1,17,109.85, yielding a BCR of 1.68. While the cost is marginally reduced compared to the black gram variant, the income is also slightly lower. This suggests that substituting green gram for black gram results in a modest decrease in overall profitability. However, due to its lower input requirements and relatively high return, this pattern is still considered economically efficient and viable141.
The “Groundnut – Onion” cropping pattern stands out as the most economically advantageous among the three. With an average cost of cultivation of ₹ 66,250.63 and a gross income of ₹ 1,44,817.91, it yields a BCR of 2.18. This indicates that for every rupee invested, farmers receive ₹2.18 in return. The inclusion of onion significantly enhances profitability, making this cropping pattern the most lucrative option in Namakkal. It reflects both high income potential and excellent cost efficiency142. Overall, all three groundnut-based cropping sequences in Namakkal outperform those in Erode in terms of profitability. Among them, the “Groundnut – Onion” pattern offers the highest return on investment, followed by “Groundnut – Maize – Black gram” and “Groundnut – Maize – Green gram.” These findings underscore the importance of crop selection and sequence in maximizing agricultural income and resource efficiency.
In Salem, two groundnut-based cropping patterns have been analyzed: “Groundnut – Green gram” and “Groundnut – Black gram.” These patterns illustrate the profitability and economic viability of integrating pulses into groundnut cultivation systems. The “Groundnut – Green gram” pattern incurs an average cost of cultivation of ₹ 46,573.03 and generates a gross income of ₹ 79,631.71. This results in a Benefit-Cost Ratio (BCR) of 1.71, indicating a favourable return on investment. The relatively low input cost combined with moderate income reflects strong profitability and suggests that this cropping sequence is both economically and agronomically viable143.
In comparison, the “Groundnut – Black gram” pattern has a slightly higher cost of cultivation at ₹ 46,775.46. However, it yields a significantly higher gross income of ₹ 93,012.28, leading to a BCR of 1.99. This substantial improvement in income relative to the marginal increase in cost demonstrates superior economic efficiency. As a result, this cropping sequence is considered more profitable and is the preferred groundnut-based system in Salem144.
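The Benefit-Cost Ratio arithmetic for individual patterns can be checked directly. The sketch below assumes that BCR is the average gross income divided by the average cost of cultivation, which is consistent with the "return per rupee invested" interpretation used in the text; the function name `benefit_cost_ratio` is illustrative. If the reported ratios were instead averaged over individual farms, a ratio of the averages need not reproduce every tabulated value exactly.

```python
def benefit_cost_ratio(gross_income, cost_of_cultivation):
    """BCR as gross income per rupee of cultivation cost (assumed definition)."""
    if cost_of_cultivation <= 0:
        raise ValueError("cost of cultivation must be positive")
    return gross_income / cost_of_cultivation


# Pattern -> (average cost of cultivation, average gross income), in rupees,
# as quoted in the text for three of the patterns.
patterns = {
    "Erode: Groundnut - Tobacco": (71702.77, 110242.22),
    "Salem: Groundnut - Green gram": (46573.03, 79631.71),
    "Salem: Groundnut - Black gram": (46775.46, 93012.28),
}

bcr = {
    name: round(benefit_cost_ratio(income, cost), 2)
    for name, (cost, income) in patterns.items()
}

# Rank patterns from most to least efficient per rupee invested.
for name, value in sorted(bcr.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:32s} BCR = {value:.2f}")
```

For these three patterns the computed ratios match the reported values of 1.54, 1.71 and 1.99.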
Groundnut-related cropping patterns ranked by Benefit-Cost Ratio for Erode, Namakkal and Salem are shown in Fig. 7. Overall, the analysis across Erode, Namakkal, and Salem highlights the substantial profitability of groundnut-based cropping patterns145. Among all the systems studied, those incorporating high-value crops such as onion, particularly in Namakkal, achieve the highest BCR, demonstrating exceptional economic returns. Additionally, cropping patterns that include black gram consistently outperform those with green gram in terms of return on investment, as evidenced by patterns in both Namakkal and Salem146. These findings underscore the importance of optimizing groundnut-based cropping sequences tailored to regional agro-climatic conditions, input costs, and market opportunities. By aligning crop selection with profitability metrics and local resource dynamics, farmers across Erode, Namakkal, and Salem can maximize returns and enhance overall farm income21,147. This strategic approach to cropping pattern design is essential for achieving sustainable agricultural development and economic resilience in these regions.
Conclusion
The research highlights the critical role of weather-based forecasting and profitability analysis in optimizing groundnut-based cropping patterns in Tamil Nadu10. The integration of advanced boosting algorithms, namely LightGBM, XGBoost, HistGradientBoosting (HGBM), and CatBoost, with feature engineering techniques such as Seasonal-Trend decomposition (STL) and data discretization significantly improves the accuracy of groundnut price forecasts148,154,155. Among the tested models, HGBM consistently outperforms others by achieving the lowest multivariate Mean Absolute Error (MAE) across most crops and districts. This reflects its robust capacity to manage complex, high-dimensional, and nonlinear data effectively. The findings also confirm that multivariate forecasting models149,150, particularly those that incorporate weather parameters, consistently outperform univariate models. This reinforces the value of environmental variables in predictive modeling for agricultural markets151.
Moreover, the study finds that groundnut-based cropping patterns involving high-value crops, particularly onion in Namakkal, yield the highest Benefit-Cost Ratio (BCR), indicating substantial profitability152. This demonstrates the economic advantage of aligning groundnut with high-yield, high-market-value crops in region-specific sequences. Patterns that include black gram consistently outperform those with green gram in economic efficiency, as observed in both Namakkal and Salem. These findings underscore the importance of location-specific, weather-informed cropping strategies153 to improve farm profitability, mitigate risk, and enhance market outcomes.
From a policy standpoint, the study underscores the need for decentralized, data-informed crop planning tools that can guide farmers and extension officers in making adaptive decisions based on both market signals and climatic forecasts. The demonstrated superiority of multivariate, weather-integrated forecasting models provides a strong case for investment in digital agriculture infrastructure, particularly platforms that can disseminate localized forecasts and cropping recommendations in real time. This research offers actionable insights for policymakers, agricultural planners, and farmers alike. It encourages a paradigm shift toward precision agriculture, where region-specific cropping strategies, informed by predictive analytics and profitability metrics, can drive higher farm incomes, greater resilience to weather variability, and improved market alignment across groundnut-growing regions of Tamil Nadu.
Data availability
The datasets generated and analysed during the current study are not publicly available as this is a part of an ongoing project but are available from the corresponding author on reasonable request.
References
Tyagi, S., Maman, S., Ajesh, B. R., Shashidhar, B. R. & Tyagi, A. Genetics and plant breeding to improve the yield of oilseed crops. Oilseed Crops, 11 (1), 265–297. (2025).
Mulungu, K., Pangapanga-Phiri, I. & Ngoma, H. Impacts of fall armyworm, groundnut rosette, and soybean rust diseases on smallholder welfare and the effectiveness of control strategies. Food Energy Secur. 14 (3), e70078 (2025).
Kabato, W., Getnet, G. T., Sinore, T., Nemeth, A. & Molnár, Z. Towards climate-smart agriculture: strategies for sustainable agricultural production, food security, and greenhouse gas reduction. Agronomy 15 (3), 565 (2025).
Desai, P. & Sharma, A. An Analysis of Sectoral Growth and its Contribution To Economic Development in Gujaratp. 73 (Growth Trajectory of Gujarat—Public Policy Intervention, 2025).
Ullah, M. & Hassan, A. Deep learning; A climate smart agriculture tool for groundnut farmers. Int. J. Cur Res. Sci. Eng. Tech. 8 (1), 164–173 (2025).
Manogna, R., Dharmaji, V. & Sarang, S. A novel hybrid neural network-based volatility forecasting of agricultural commodity prices: empirical evidence from India. J. Big Data. 12 (1), 85 (2025).
Jin, B. & Xu, X. Predicting wholesale edible oil prices through Gaussian process regressions tuned with bayesian optimization and cross-validation. Asian J. Econ. Bank. 9 (1), 64–82 (2025).
Banda, T. & Kabubi, M. Examining the Profitability of Soya Beans Production in Agri-business: A Case Study of Mumbwa District in Farm Block A.
Yadav, J. The role of price forecasting in empowering brinjal farmers in Eastern Uttar Pradesh. Am. J. Manage. Econ. Innovations. 7 (01), 1–4 (2025).
Amuthasurabi, M., Maaran, C. & Vyas, H. of Yields Using Machine Learning Algorithms. Deep Learning and Blockchain Technology for Smart and Sustainable Cities, : p. 313. (2025).
Ajith, S., Vijayakumar, S. & Elakkiya, N. Yield prediction, pest and disease diagnosis, soil fertility mapping, precision irrigation scheduling, and food quality assessment using machine learning and deep learning algorithms. Discover Food. 5 (1), 1–23 (2025).
Lobell, D. B., Schlenker, W. & Costa-Roberts, J. Climate trends and global crop production since 1980. Science 333 (6042), 616–620 (2011).
Iizumi, T. & Ramankutty, N. Changes in yield variability of major crops for 1981–2010 explained by climate change. Environ. Res. Lett. 11 (3), 034003 (2016).
Pindyck, R. S. & Rubinfeld, D. L. Econometric models and economic forecasts. (1988).
Chen, T. & Guestrin, C. Xgboost: A scalable tree boosting system. in Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. (2016).
Ke, G. et al. LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems (pp. 3146–3154). (2017).
Mangani, R. et al. The Impact of Climate Change on Crop Production and Food Security: A South African Perspective. Climate Change, Food Security, and Land Management: Strategies for a Sustainable Future, pp. 1–28 (2025).
Li, S. An Analysis of the Interdependence Between Peanut and Other Agricultural Commodities in China’s Futures Market. arXiv preprint arXiv:2501.16697, (2025).
Khalifa, J. The Impacts of Climate Change, Agricultural Productivity, and Food Security on Economic Growth in Tunisia: Evidence from an Econometrics Analysis, pp. 577–599 (Research on World Agricultural Economy, 2025).
Huda, A. Assessing and Managing Agroclimatic Risks and Opportunities toward Enhancing Food Security: Application of Crop Models, in Sustainable Production and Food Security: An Overview through Climate Smart Agricultural Interventions. World Scientific. pp. 167–178. (2025).
Bastia, D. K. et al. Building Resilience: Climate Adaptation Practices in Indian Rainfed Farming, in Mitigation and Adaptation Strategies against Climate Change in Natural Systems, pp. 109–147 (Springer, 2025).
Akhigbe, E. E., Ajayi, A. J., Agbede, O. O. & Egbuhuzor, N. S. Development of innovative financial models to predict global energy commodity price trends. Int. Res. J. Modernization Eng. Technol. Sci. 7 (2), 509–523 (2025).
RL, M. & Lahiri, P. The dynamic nexus between futures basis in Indian and global agricultural markets. J. Modelling Manage. 20(4), (2025).
Hasan, M. R., Islam, M. R. & Rahman, M. A. Developing and implementing AI-driven models for demand forecasting in US supply chains: A comprehensive approach to enhancing predictive accuracy. Edelweiss Appl. Sci. Technol. 9 (1), 1045–1068 (2025).
Jaiswal, R., Jha, G. K., Kumar, R. R. & Choudhary, K. NARX Model for Potato Price Prediction Utilising Multimarket Information. Potato Research, pp. 1–22 (2025).
Sharma, N. K., Chauhan, A. S., Fatima, S. & Saxena, S. Enhancing heart disease diagnosis: leveraging classification and ensemble machine learning techniques in healthcare decision-making. J. Integr. Sci. Technol. 13 (1), 1016–1016 (2025).
Alsulamy, S. Predicting construction delay risks in Saudi Arabian projects: A comparative analysis of catboost, xgboost, and LGBM. Expert Syst. Appl. 268, 126268 (2025).
V.U, GA, Y. Assessing the predictive power of boosting techniques for diabetes. Multimedia Tools Appl. 9, pp. 1–25 (2025).
Khaki, S. & Wang, L. Crop yield prediction using deep neural networks. Front. Plant Sci. 10, 621 (2019).
You, J., Li, X., Low, M., Lobell, D. B. & Ermon, S. Deep gaussian process for crop yield prediction based on remote sensing data. in Proceedings of the AAAI conference on artificial intelligence. (2017).
Jin, B., Xu, X. & Zhang, Y. Peanut oil price change forecasts through the neural network. foresight, (2025).
Dillon, J. L. & Hardaker, J. B. Farm Management Research for Small Farmer Development Vol. 41 (Food & Agriculture Org, 1980).
Prajapati, C. S. et al. The role of participatory approaches in modern agricultural extension: bridging knowledge gaps for sustainable farming practices. J. Experimental Agric. Int. 47 (2), 204–222 (2025).
Saqlain, M., Kumam, P. & Kumam, W. Optimizing agricultural decision-making with integrated MCDM-MCDA methods: A case study on crop economics. Yugoslav J. Oper. Res. 20, pp. 8–8 (2025).
Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V. & Gulin, A. CatBoost: unbiased boosting with categorical features. Adv. Neural. Inf. Process. Syst. 31, (2018).
Chaudhary, U. Machine Learning with Brain Data: Feature Engineering and Analysis Methods, in Expanding Senses Using Neurotechnology: Volume 1–Foundation of Brain-Computer Interface Technology, pp. 179–223 (Springer, 2025).
Deng, Y. A hybrid network congestion prediction method integrating association rules and LSTM for enhanced Spatiotemporal forecasting. Trans. Comput. Sci. Methods. 5(2), 1-14 (2025).
Araujo-Carrillo, G. A., Duarte-Carvajalino, J. M., Estupinan-Casallas, J. M. & Gómez-Latorre, D. A. Evaluation of evapotranspiration data and gridded products using robust linear estimators in Colombia. Theoret. Appl. Climatol. 156 (4), 1–16 (2025).
Wang, X. et al. Learning County from pixels: corn yield prediction with attention-weighted multiple instance learning. Int. J. Remote Sens. 46 (7), pp. 1–31 (2025).
Mohd Yazid, N. K., Shaadan, N., Hamzah, M., Ibrahim, F., Mansor, M. & N. I., & Climate change detection in four locations in Peninsular Malaysia for green sustainability awareness. Malaysian J. Comput. (MJoC). 10 (1), 2059–2070 (2025).
Mahani, S. F. et al. Data-Driven Insights into Transportation Mode Choices and Their Impact on Sustainable Mobility. Available at SSRN 5159715, (2025).
Anoohi, M. Forecasting Load at Residential, Industrial, and Commercial Stations for Dubai Electricity and Water Authority: A Machine Learning Approach to Managing Generation During Peak and Off-Peak Times Over the Coming Years. (2025).
Chudo, S. B. & Terdik, G. Modeling and forecasting Time-Series data with multiple seasonal periods using periodograms. Econometrics 13 (2), 14 (2025).
Kumar, A. et al. The Impact of Meteorological Factors on Crop Price Volatility in India: Case studies of Soybean and Brinjal. arXiv preprint arXiv:2503.11690, 2025.
Petchpol, K. & Boongasame, L. Enhancing Predictive Capabilities for Identifying At-Risk Stocks Using Multivariate Time-Series Classification: A Case Study of the Thai Stock Market. Applied Computational Intelligence and Soft Computing 2025 (1), 3874667 (2025).
Ahsun, A., Khan Sio, A. J. & Winner, B. Developing an Optimal Data Management Framework for Enhanced Financial Forecasting with Big Data Analytics. (2025).
Zha, D., Zhang, S. & Cao, Y. Can extremely high-temperature weather forecast oil prices?. Front. Eng. Manage. 12 (1), pp. 1–14 (2025).
Ibebuchi, C. C. Day-Ahead energy price forecasting with machine learning: role of endogenous predictors. Forecasting 7 (2), 18 (2025).
Wen, Q. & Liu, Y. Feature engineering and selection for prosumer electricity consumption and production forecasting: A comprehensive framework. Appl. Energy. 381, 125176 (2025).
Amiri, B., Haddadi, A. & Mojdehi, K. F. A Novel Hybrid GCN-LSTM Algorithm for Energy Stock Price Prediction: Leveraging Temporal Dynamics and Inter-Stock Relationships (IEEE Access, 2025).
Bandara, K., Hyndman, R. J. & Bergmeir, C. MSTL: A Seasonal-trend Decomposition Algorithm for time Series with Multiple Seasonal Patterns (International Journal of Operational Research, 2025).
Das, P. & Barman, S. Perspective Chapter: An Overview of Time Series Decomposition and Its Applications. (2025).
Zhang, Q. & Qian, J. Identification and Temporal distribution of typical rainfall types based on K-Means + + Clustering and probability distribution analysis. Hydrology 12 (4), 88 (2025).
Abbasi, M., Váz, P., Silva, J. & Martins, P. Machine learning approaches for predicting maize biomass yield: leveraging feature engineering and comprehensive data integration. Sustainability 17 (1), 256 (2025).
Al Sukhni, H. et al. Data-Driven weather prediction: integrating deep learning and ensemble models for robust weather forecasting. J. Cybersecur. Inform. Manage. 15(2), PP. 260-284 (2025).
Abiramasundari, S. & Ramaswamy, V. Distributed denial-of-service (DDOS) attack detection using supervised machine learning algorithms. Sci. Rep. 15 (1), 13098 (2025).
Gupta, S., Nachappa, S. & Paramanandham, N. Stock market time series forecasting using comparative machine learning algorithms. Procedia Comput. Sci. 252, 893–904 (2025).
Chandran, D. & Chithra, N. Predictive performance of ensemble learning boosting techniques in daily streamflow simulation. Water Resour. Manage. 39 (3), 1235–1259 (2025).
Zhou, Z. H. Ensemble Methods: Foundations and Algorithms (CRC, 2025).
Verma, P. et al. ABIDS-VEM: leveraging an equilibrium optimizer and data ramification in association with ensemble learning for anomaly-based intrusion detection system. J. Supercomputing. 81 (7), 1–35 (2025).
Fan, J., Wang, Z., Niu, Y., Mu, C. & Xue, Y. Energy Consumption Analysis and Prediction of Key Enterprises Based on Lightgbm. Available at SSRN 5249514.
Monteiro, F. P. et al. A hybrid methodology using machine learning techniques and feature engineering applied to time series for Medium-and Long-Term energy market price forecasting. Energies 18 (6), 1387 (2025).
Lin, S., Wang, Y., Wei, H., Wang, X. & Wang, Z. Hybrid method for oil price prediction based on feature selection and XGBOOST-LSTM. Energies 18 (9), 2246 (2025).
Ripatti, V. Forecasting Prescription Medication Utilization: A Comparative Study of SARIMA, Prophet, XGBoost and LSTM Models. (2025).
Cheng, M. et al. A Comprehensive Survey of time Series Forecasting: Concepts, Challenges, and Future Directions (Authorea Preprints, 2025).
Bartram, F., Yuan, B. & Zhang, K. M. Predicting solar photovoltaic generation impacted by severe wildfire smoke. Environ. Res. Lett. 20 (6), 064015 (2025).
Sakib, M., Mustajab, S. & Alam, M. Ensemble deep learning techniques for time series analysis: a comprehensive review, applications, open issues, challenges, and future directions. Cluster Comput. 28 (1), 1–44 (2025).
Elshaarawy, M. K. & Armanuos, A. M. Predicting seawater intrusion wedge length in coastal aquifers using hybrid gradient boosting techniques. Earth Sci. Inf. 18 (2), 243 (2025).
Orellana, C. B. Mathematical and Computational Modeling of Collective Dynamics (Arizona State University, 2025).
Song, W., Cao, W., Yuan, Y., Liu, K. Z. & Wu, M. MISMS: a multistep-ahead and interpretable sequential modelling scheme for the long-term dynamic forecasting in process industries. Int. J. Syst. Sci. 56 (1), pp. 1–18 (2025).
Ye, X., Liu, C., Xiong, X. & Qi, Y. Recurrent attention encoder–decoder network for multi-step interval wind power prediction. Energy. 315, 134317 (2025).
Hemant, M. K. Optimizing Hyperparameters in Deep Learning Models Using Bayesian Optimization. (2025).
Papenmeier, L., Cheng, N., Becker, S. & Nardi, L. Exploring Exploration in Bayesian Optimization. arXiv preprint arXiv:2502.08208, (2025).
Wang, S. et al. Improved hyperparameter Bayesian Optimization-Bidirectional Long Short-Term Memory optimization for high-precision battery state of charge estimation. Energy, p. 136598 (2025).
Lai, M. et al. Temporal cross-validation in forecasting: A case study of COVID‐19 incidence using wastewater data. Qual. Reliab. Eng. Int. 41 (2), 672–688 (2025).
Bansal, A., Balaji, K. & Lalani, Z. Temporal Encoding Strategies for Energy Time Series Prediction. arXiv preprint arXiv:2503.15456, (2025).
Elgazwy, A., Elgazzar, K. & Khamis, A. Predicting Pedestrian Crossing Intentions in Adverse Weather with Self-Attention Models (IEEE Transactions on Intelligent Transportation Systems, 2025).
Jamil, M. H., Jagirdar, R., Kashem, A., Ali, M. N. & Deb, D. Modeling of Marshall stability of plastic-reinforced asphalt concrete using machine learning algorithms and SHAP. Hybrid. Adv. 10, 100483 (2025).
Taha, K. Empirical and experimental insights into data mining techniques for crime prediction: A comprehensive survey. ACM Trans. Intell. Syst. Technol. 16 (2), 1–75 (2025).
Srichaiyan, P., Tippayawong, K. Y. & Boonprasope, A. Forecasting Soybean Futures Prices with Adaptive AI Models (IEEE Access, 2025).
Kumar, V. et al. A comparative study of machine learning models for daily and weekly rainfall forecasting. Water Resour. Manage. 39 (1), 271–290 (2025).
AlSowail, A., Almashal, M., Saeed, M. B., Nasser, N. & Benoumhani, S. Snow Forecasting with Machine Learning to Unravel Meteorological Patterns for Precision Weather Prediction. in 8th International Conference on Data Science and Machine Learning Applications (CDMA). IEEE. (2025).
Shmuel, A., Lazebnik, T., Glickman, O., Heifetz, E. & Price, C. Global lightning-ignited wildfires prediction and climate change projections based on explainable machine learning models. Sci. Rep. 15 (1), 7898 (2025).
Patil, Y., Ramachandran, H., Sundararajan, S. & Srideviponmalar, P. Comparative analysis of machine learning models for crop yield prediction across multiple crop types. SN Comput. Sci. 6 (1), 64 (2025).
Jallow, H., Mwangi, R. W., Gibba, A. & Imboga, H. Transfer learning for predicting of gross domestic product growth based on remittance inflows using RNN-LSTM hybrid model: a case study of the Gambia. Front. Artif. Intell. 8, 1510341 (2025).
Anand, A. & Jhajharia, K. Advanced crop yield prediction using machine learning and deep learning: a comprehensive review. TELKOMNIKA (Telecommunication Comput. Electron. Control). 23 (2), 402–415 (2025).
Selvam, A. P. & Al-Humairi, S. N. S. Environmental impact evaluation using smart real-time weather monitoring systems: a systematic review. Innovative Infrastructure Solutions. 10 (1), 1–24 (2025).
GUPTA, R. & TRIPATHI, J. A comparative study of electricity, irrigation and cropping pattern in three villages of Uttar pradesh, India. Asian J. Agric. 9(1), 103-111 (2025).
Ortman, S. G., Bogaard, A., Munson, J., Lawrence, D., Green, A. S., Feinman, G. M.,… Leyk, S., Changes in agglomeration and productivity are poor predictors of inequality across the archaeological record. Proceedings of the National Academy of Sciences, 2025. 122(16): p. e2400693122.
Rumallang, A., Salam, M., Fudjaja, L. & Diansari, P. Gross margin, profitability index, and financial feasibility analyses of potato farming: Empirical facts from Gowa Regency, Indonesia. in IOP Conference Series: Earth and Environmental Science. IOP Publishing. (2025).
Hamad, M. Estimating the optimal volumes of the cost function of Cowpea crop in Al-Hawija district for the 2023 production season. J. Agricultural Sci. Sustainable Dev. 2 (1), 28–37 (2025).
Ahmed, S. et al. Enhancing profitability, sustainability, and resilience of rice-based cropping systems by including premium quality rice and intensifying and diversifying cropping systems. (2025).
Khan, S. F. A., Nisa, M., Naz, S., Muluk, A. A. & Mughul, A. Profitability analysis of Cold-Water fish farming in Northern Pakistan: A case study of Aziz trout farm district Chitral. J. Social Sci. Archives. 3 (1), 56–65 (2025).
Absanto, G., Mkunda, J. & Nyangarika, A. Economic Viability of Micro-Irrigation Technologies in Smallholder Horticultural Farming: A Comparative Study with Traditional Furrow Irrigation in Northern Tanzania. (2025).
Nakalembe, C., Frimpong, D. B., Kerner, H. & Sarr, M. A. A 40-year remote sensing analysis of Spatiotemporal temperature and rainfall patterns in Senegal. Front. Clim. 7, 1462626 (2025).
Baburaja, S. & Pichaipillai, S. Economic evaluation of maize cultivation practices in Salem district, Tamil Nadu, India. Int. J. Econ. Bus. Adm. (IJEBA). 13 (1), 141–151 (2025).
Gao, C., Ma, H., Pei, Q. & Chen, Y. Dynamic graph-based graph attention network for anomaly detection in industrial multivariate time series data. Appl. Intell. 55 (6), 517 (2025).
Zhang, N. et al. Comparative physiological and co-expression network analysis reveals potential hub genes and adaptive mechanisms responsive to NaCl stress in peanut (Arachis Hypogaea L). BMC Plant Biol. 25 (1), 294 (2025).
Prihandi, I., Wijono, S., Sembiring, I. & Maria, E. Implementation of ARIMA with Min-Max Normalization for Predicting the Price and Production Quantity of Red Chili Peppers in North Sumatra Province Considering Rainfall and Sunlight Duration Factors. Engineering, Technology & Applied Science Research. 15 (2), 21876–21887 (2025).
Issac, S. S., Jangirala, S. & Mandal, A. Forecasting Carbon Intensity in Great Britain: A Sarimax Approach. Available at SSRN 5251053, (2025).
Li, D., Tang, H. & Wang, Y. The Onset of the British Imperial Retreat from China: Evidence from the Chinese Sovereign Bond Market in London (Asia-Pacific Economic History Review, 2025).
Gaffoor, M. & Assa, H. Examining the Impact of Weather Factors on Agricultural Market Price Risk: an XAI Approach, in Quantitative Risk Management in Agricultural Business, pp. 249–272 (Springer, 2025).
Geng, Y. et al. Time-Lag of seasonal effects of extreme climate events on grassland productivity across an altitudinal gradient in Tajikistan. Plants 14 (8), 1266 (2025).
Suharso, A., Herdiyeni, Y., Tarigan, S. D. & Arkeman, Y. Comparison of sugarcane drought stress based on climatology data using machine learning regression model in East Java. Jurnal Rekayasa Sistem dan Teknologi Informasi (RESTI). 9 (2), 225–238 (2025).
Grage, K., Perry, K. I., Manning, K. & Bahlai, C. Time Matters: Enhancing an Ecological time Series Analysis Tool for Nonlinear Systems and Multi-Trophic Interactions (Kent State University, 2025).
Kumar, M. et al. Field based analysis of vegetation and climate impacts on the hydrological properties of urban vegetated slope. Sci. Rep. 15 (1), 7702 (2025).
Dulin, S. et al. Quantifying the compounding effects of natural hazard events: a case study on wildfires and floods in California. Npj Nat. Hazards. 2 (1), 1–11 (2025).
Kumar, R., Lad, Y. A. & Kumari, P. Forecasting Potato Prices in Agra: Comparison of Linear Time Series Statistical vs. Neural Network Models. Potato Research, pp. 1–22 (2025).
Prudnikov, V. B., Timiryanova, V. M., Rossinskaya, G. M. & Krasnoselskaya, D. K. Spatial dependence of prices for seasonally demanded products across price segments in Russian regions. R-Economy. 11 (1), 5–21 (2025).
Hieronymus, T. A. It’s All Speculation: Collected Writings on Markets and Trading (Ceres Books LLC, 2025).
Faiz, B. Studies on transfer of radioactive materials from soil to plant in Cox’s Bazar and Sylhet areas (University of Dhaka, 2025).
Sethi, R. R. et al. Impact of groundwater pumping on CO2 emissions and water productivity in Western Gujarat agriculture. Water Conserv. Sci. Eng. 10 (1), 1–16 (2025).
Mostafa, S. Tribal Food Habit and Nutrient Composition of some Specific Fruits and Vegetables in Bangladesh (University of Dhaka, 2025).
Zhao, T., Chen, G., Suraphee, S., Phoophiwfa, T. & Busababodhin, P. A hybrid TCN-XGBoost model for agricultural product market price forecasting. PLoS One. 20 (5), e0322496 (2025).
Shafique, R., Khan, S. H., Ryu, J. & Lee, S. W. Weather-Driven predictive models for Jassid and Thrips infestation in cotton crop. Sustainability 17 (7), 2803 (2025).
Levent, İ., Şahin, G., Işık, G. & van Sark, W. G. Comparative analysis of advanced machine learning regression models with advanced artificial intelligence techniques to predict rooftop PV solar power plant efficiency using indoor solar panel parameters. Appl. Sci. 15 (6), 3320 (2025).
Zhang, Y. & Sonta, A. OccuEMBED: Occupancy Extraction Merged with Building Energy Disaggregation for Occupant-Responsive Operation at Scale. arXiv preprint arXiv:2505.05478, (2025).
Kim, J., Kim, H., Kim, H., Lee, D. & Yoon, S. A comprehensive survey of deep learning for time series forecasting: architectural diversity and open challenges. Artif. Intell. Rev. 58 (7), 1–95 (2025).
Abdelli, K. et al. Forecasting of Weather-Induced State of Polarization Changes in Aerial Fibers (Journal of Lightwave Technology, 2025).
Miah, M. S. U. et al. REDf: a deep learning model for short-term load forecasting to facilitate renewable integration and attaining the SDGs 7, 9, and 13. PeerJ Comput. Sci. 11, e2819 (2025).
Kunac, D. Comparative Analysis of time Series Models for River Flood forecasting (University of Guelph, 2025).
Zhang, M., Ma, H., Yang, Y., Gao, Y. & Zhang, Q. Multiscale session-enhanced long time series modeling for power transformer oil temperature prediction. J. Supercomputing. 81 (7), 813 (2025).
Nguyen, T. S., Nguyen, V. T. & Nguyen, D. M. D. Enhancing Time Series Forecasting via a Parallel Hybridization of ARIMA and Polynomial Classifiers. arXiv preprint arXiv:2505.06874, (2025).
Min, Y. et al. RNN and GNN based prediction of agricultural prices with multivariate time series and its short-term fluctuations smoothing effect. Sci. Rep. 15 (1), 13681 (2025).
Khan, Y., Kumar, V., Gacem, A., Satpathi, A., Setiya, P., Surbhi, K., … Kisi, O. Comparative evaluation of hybrid and individual models for predicting soybean yellow mosaic virus incidence. Sci. Rep. 15 (1), 1–22 (2025).
Htay, H. S., Ghahremani, M. & Shiaeles, S. Enhancing bitcoin price prediction with deep learning: integrating social media sentiment and historical data. Appl. Sci. 15 (3), 1554 (2025).
Jiang, M., Che, J., Li, S., Hu, K. & Xu, Y. Incorporating key features from structured and unstructured data for enhanced carbon trading price forecasting with interpretability analysis. Appl. Energy. 382, 125301 (2025).
Ahmed, S. et al. Enhancing food security: machine learning-based wheat yield prediction using remote sensing and climate data in Pakistan. Theoret. Appl. Climatol. 156 (5), 1–19 (2025).
Makokha, J. W., Barasa, P. W. & Khamala, G. W. Enhancing Climate Resilience: A Data-Driven North Rift Weather Prediction System for Real-Time Forecasting and Agricultural Decision Support. Heliyon (2025).
Thimmegowda, M. N. et al. Comparative analysis of machine learning and statistical models for cotton yield prediction in major growing districts of Karnataka, India. J. Cotton Res. 8 (1), 6 (2025).
Beddows, M., Durrant, A. & Leontidis, G. VisionTreeS: A Hybrid Tree-Based Visual Masked Autoencoder Approach for Strawberry Yield Forecasting from Low-Resolution Data. Available at SSRN 5202498, (2025).
Thangavelu, G., Ponnusamy, G., Ravikumar, K., Bajurulla, M. A. & Ganesan, M. Weather forecasting using machine learning. in AIP Conference Proceedings. AIP Publishing LLC. (2025).
Rodrigues, E. M., Baghoussi, Y. & Mendes-Moreira, J. KDBI special issue: explainability feature selection framework application for LSTM multivariate time‐series forecast self optimization. Expert Syst. 42 (2), e13674 (2025).
Cao, Y., Tian, Z., Guo, W. & Liu, X. MSPatch: A multi-scale Patch Mixing Framework for Multivariate time Series Forecasting. Expert Systems with Applications, 126849 (2025).
Ghasemi, Z., Neshat, M., Aldrich, C., Zanin, M. & Chen, L. Optimising Sag Mill Throughput and Circulating Load Using Machine Learning Models: A Multi-Objective Approach for Identifying Optimal Process Parameters. Available at SSRN 5154483.
Sharma, D., Kushwaha, P., Verma, A. & Chaudhary, A. Harnessing IoT and Machine Learning for Efficient Water Management in Urban Infrastructure. in International Conference on Pervasive Computational Technologies (ICPCT). IEEE. (2025).
Schafer, M., Paudel, K. P. & Upadhyaya, K. Family strategies: labor migration, multigenerational households, and children’s schooling in Nepal. Am. J. Econ. Sociol. 84 (1), 135–152 (2025).
Akchaya, K., Parasuraman, P., Pandian, K., Vijayakumar, S., Thirukumaran, K., Mustaffa, M. R. A. F., … Choudhary, A. K. Boosting resource use efficiency. (2025).
Ramana, M. V., Kumari, C. P., Karthik, R., Alibaba, M., Reddy, G. K., Chiranjeevi, K., … Hossain, A. Integrated Farming Systems Improve the Income of Small Farm Holdings—An Overview of Earlier Findings in the Indian Context. Food Energy Secur. 14 (2), e70064 (2025).
Yadav, M., Dhingra, B., Batra, S., Saini, M. & Aggarwal, V. ESG scores and stock returns during COVID-19: an empirical analysis of an emerging market. Int. J. Soc. Econ. 52 (3), 390–405 (2025).
Tripathi, S. C., Kumar, N., Venkatesh, K. & Meena, R. P. Enhancing energetics, system productivity, profitability and soil fertility in maize based cropping system by conservation vis a vis conventional agriculture in North-West India. Int. J. Plant. Prod. 19, 117–130 (2025).
Venkatesh, P., Rupak, S. & Shakeel Ahamed, E. Optimizing Market Interventions: a Data-Driven Approach to Price Stabilization. in 2025 International Conference on Multi-Agent Systems for Collaborative Intelligence (ICMSCI). IEEE. (2025).
Sauwamah, A., Tie, V. G. & Pamungkas, I. D. The impact of profitability on stock prices: The moderating role of firm size in publicly listed companies. in Proceeding International Conference on Accounting and Finance. (2025).
Houghton, L. & Glynn, P. The Ontology of a Product-Based Circular Economy. in Circular Economy Applications in Energy Policy, p. 273 (2025).
Kabir, M., Bello, T. & Shittu, E. Weed species composition and profitability of groundnut (Arachis Hypogaea L.) production as affected by variety and Intra-Row spacing in Kano, Sudan Savannah, Nigeria. Afr. J. Agricultural Sci. Food Res. 18 (1), 47–58 (2025).
Krithika, C., Santhi, R., Maragatham, S., Devi, P. & Vijayalakshmi, D. R. Unleashing the potential of blackgram: exploring the impact of diverse nutrient prescription techniques on Alfisols. Commun. Soil Sci. Plant Anal. 56 (10), 1–21 (2025).
Heiss, N., Meier, J., Gessner, U. & Kuenzer, C. A review: potential of Earth observation (EO) for mapping Small-Scale agriculture and cropping systems in West Africa. Land 14 (1), 1–47 (2025).
Gaglo, E. K., Chaste, E., Luyssaert, S., Roupsard, O., Jourdan, C., Sow, S., … Valade, A. Sensitivity of a Sahelian groundwater-based agroforestry system to tree density and water availability using the land surface model ORCHIDEE (r7949). EGUsphere 2025, 1–38 (2025).
Mohanty, P., Subhadarshini, K., Nayak, R., Pati, U. C. & Mahapatra, K. Exploring data-driven Multivariate Statistical Models for the Prediction of Solar Energy, in Computer Vision and Machine Intelligence for Renewable Energy Systems, pp. 85–101 (Elsevier, 2025).
Jacoby, D., Messer, H. & Ostrometzky, J. Spatio-Temporal Model for Predicting Multivariate Weather-Induced Attenuation in Wireless Networks (IEEE Transactions on Instrumentation and Measurement, 2025).
Bai, X., Zhang, L., Feng, Y., Yan, H. & Mi, Q. Multivariate temperature prediction model based on CNN-BiLSTM and randomforest. J. Supercomputing. 81 (1), 162 (2025).
Dalwai, A. & Singh, R. Secondary Agriculture: Farm-Linked Micro-enterprises, in Secondary Agriculture: Upgrading Agriculture for Jobs and Income, pp. 139–196 (Springer, 2025).
Dhanya, P., Geethalakshmi, V., Ramanathan, S., Senthilraja, K., Dhasarathan, M., Sreeraj, P., … Vigneswaran, S. Accuracy of climate and weather early warnings for sustainable crop water and river basin management, in Hydrosystem Restoration Handbook, pp. 121–133 (Elsevier, 2025).
Kumari, P., Goswami, V., Harshith, N. I. & Pundir, R. S. Recurrent neural network architecture for forecasting banana prices in Gujarat, India. PLoS One. 18 (6), e0275702 (2023).
Harshith, N. & Kumari, P. Memory based neural network for cumin price forecasting in Gujarat, India. Journal of Agriculture and Food Research. 15, 101020, pp. 1–10 (2024).
Kumari, P., Satish, K. M., Vekariya, P., Shubhra, N. K., Jignesh, M. & Mishra, P. Predicting potato prices in Agra, UP, India: An H2O AutoML approach. Potato Research. 68, 127–142 (2024).
Funding
NASF/SSPA/9035.
Author information
Contributions
M.K., S.D., K.K., and C.G. conceived and designed the research. V.P. and K.M.S. managed the economic aspects of the study, while M.K. and C.S.S. supervised the machine learning components. A.S., N.B., and S.K. prepared the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical statement
All methods were carried out in accordance with relevant guidelines and regulations. The study was conducted in line with the ethical standards of Tamil Nadu Agricultural University and was exempt from formal ethical review, as it involved anonymous survey responses and posed minimal risk to participants. Informed consent was obtained from all participants.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Muthuswamy, K., Dolli, S., Khandeparkar, K. et al. Weather-driven groundnut price forecasting and profitability assessment of cropping patterns in Tamil Nadu using boosting algorithms. Sci Rep 15, 33880 (2025). https://doi.org/10.1038/s41598-025-08573-3
DOI: https://doi.org/10.1038/s41598-025-08573-3