Predicting land suitability for wheat and barley crops using machine learning techniques

Ganati, Bikila Abebe; Sitote, Tilahun Melak

doi:10.1038/s41598-025-99070-0

Article
Open access
Published: 07 May 2025

Scientific Reports volume 15, Article number: 15879 (2025)

Abstract

Ensuring food security to meet the demands of a growing population remains a key challenge, especially for developing countries like Ethiopia. The government and stakeholders have designed various policies and strategies to confront the challenge, one of which is using technology to increase crop productivity. Precision agriculture using advanced technology has been utilized to increase crop yield. Identifying suitable land for a crop is one of the important factors that affect the crop’s yield. The existing approach to land suitability identification for a crop is time-consuming, expensive, and inaccurate. In this study, land suitability has been predicted for the two widely grown cereal crops in Ethiopia, wheat and barley, using machine learning techniques. The dataset was obtained from the Engineering Corporation of Oromia (ECO) and pre-processed to make it suitable for modelling. Features were selected with univariate feature selection (UFS), recursive feature elimination with cross-validation (RFECV), and sequential forward selection (SFS). Then, random forest (RF), gradient boosting (GB), and K-nearest neighbour (KNN) were used to predict the land suitability of the two selected crops. To optimize the performance of the models, hyperparameters were tuned with cross-validated randomized searches. The performance of the models was evaluated using stratified tenfold cross-validation with accuracy, precision, recall, and F1-score as metrics. GB with SFS performed better than the other models, with an accuracy of 99.41%, precision of 99.37%, recall of 99.34%, and an F1-score of 99.35%. We believe that predicting land suitability accurately using machine learning techniques for the two commonly cultivated cereal crops in Ethiopia will help increase the crops’ productivity. Given its accuracy, the developed model can be used to build a decision support system that identifies land suitable for the two crops.

Introduction

Agriculture constitutes about 4% of the global gross domestic product (GDP), and its growth enhances shared prosperity¹. Agricultural growth alleviates extreme poverty by minimizing food shortages. According to recent estimates, 68.7% of the population is multidimensionally poor in Ethiopia². Despite efforts to diversify the country’s economy, agriculture is still the main source of income, and 80–85% of people’s livelihoods depend on it. Agricultural land is a limited resource that needs protection and has to be used wisely. Population growth increases demand and puts pressure on the availability of land³. Land use and land cover changes lead to a shortage of agricultural land, which results in food shortages⁴. Hence, there is a need for innovative ways to meet the food demand of the growing population⁵.

Agricultural land in Ethiopia is facing various problems such as erosion, landslides, deforestation, and groundwater depletion. The country’s average farm size is less than one hectare per household⁶. The livestock population is also considerable and exceeds the capacity of the available grazing land. Hence, optimal use of the available land is vital. Land evaluation is an important step in the process of optimal land use planning. Identifying land suitability for a specific crop can support decision-making in land-use planning and improve crop yield³,⁶. However, a lack of knowledge about which features best characterize the land, and about how to match crops to a particular land unit, is a challenge that affects productivity. In Ethiopia, agricultural experts perform land suitability evaluations. They use geographical information systems (GIS) and the analytic hierarchy process (AHP) to classify land into suitability classes for a particular crop. According to the Food and Agriculture Organization (FAO) guidelines, effective land suitability classification requires an extensive process of mapping land attributes to crop nutrient requirements, which is time-consuming, labour-intensive, and expensive³. A mismatch between the actual requirements and what is implemented on the land affects crop yield⁴. Hence, land suitability assessment to determine the suitable land unit for a particular crop is an important process to enhance crop yield.

Machine learning has been used in agriculture to improve productivity in several areas, such as species recognition, soil management, land suitability prediction, species breeding, water management, crop management, yield prediction, crop quality management, crop disease detection, weed recognition, and livestock management. The same approach can be applied to the problem addressed in this study, land suitability prediction⁵.

In this study, we predicted land suitability for the two commonly cultivated cereal crops in Ethiopia, wheat and barley⁶, using machine learning techniques. Features describing soil, climate, topography, and the crops’ nutrient requirements have been used to predict land suitability. We believe that effective land suitability classification can help farmers identify appropriate crops for their land and improve crop yield⁷.

Related works

Determining the suitable land unit for a particular crop is a crucial process to enhance crop yield. Many studies have been conducted on land suitability identification using machine learning techniques. The major related works are discussed below.

Komolafe et al. proposed a land suitability predictive model for cassava⁸. Support vector machine (SVM) and decision tree have been used to predict land suitability. The data was collected from the Institute of Agriculture, Research, and Training (IART), Ibadan, a major agricultural research institute in Nigeria. The dataset contains chemical properties, physical properties, climate, and topography as independent features, and land suitability as the dependent feature. There were a total of 252 records. The accuracy of the model was 87.5%, so 12.5% of the instances were misclassified. The dataset is small for machine learning-based modelling. In addition, land attributes were not mapped to crop requirements in this study.
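Accuracy and misclassification rate are complements, and the precision, recall, and F1-score reported in the abstract are derived from the same per-class counts. The following is a minimal sketch of those definitions, not the code of any cited study. The function name `suitability_report`, the example labels, and the choice of macro averaging are assumptions made for illustration.

```python
def suitability_report(y_true, y_pred):
    """Accuracy, misclassification rate, and macro-averaged precision/recall/F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "misclassification": 1 - accuracy,
        "precision": sum(precisions) / n,  # macro average (an assumption)
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Made-up labels for eight land units; one S1 unit is predicted as S2.
y_true = ["S1", "S1", "S2", "S2", "S3", "N", "N", "N"]
y_pred = ["S1", "S2", "S2", "S2", "S3", "N", "N", "N"]
report = suitability_report(y_true, y_pred)
print(report)
```

With seven of eight units classified correctly, accuracy is 87.5% and the misclassification rate is 12.5%; the two always sum to 100%.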

Kennedy et al. predicted land suitability for sorghum using a parallel random forest classifier⁹. The dataset consisted of eight features: drainage, annual rainfall, rooting depth, salinity hazard (ECE), sodicity hazard (ESP), moisture-holding capacity (MHC), slope, and mean annual temperature. They trained different machine learning models such as Random Forest (RF), Logistic Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbour (KNN), Gaussian Naïve Bayes (GNB), and SVM, and evaluated their performance. RF outperformed the other models with an accuracy of 90% under tenfold cross-validation. However, some of the most important determinants of land suitability, such as pH, N, and P, were not considered.
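Tenfold cross-validation, and the stratified variant named in the abstract, rests on how the records are split into folds. The sketch below illustrates one simple way to build stratified folds so that each fold keeps roughly the overall class proportions. It is an illustration only: the function names and the 60/40 example labels are hypothetical, the split is deterministic (no shuffling), and it is neither the procedure of the cited study nor a library implementation.

```python
from collections import defaultdict


def stratified_folds(labels, k=10):
    """Deal record indices into k folds, class by class, round-robin."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    position = 0  # keeps counting across classes so fold sizes stay balanced
    for label in sorted(by_class):
        for index in by_class[label]:
            folds[position % k].append(index)
            position += 1
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    for held_out, test_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != held_out
                     for i in fold]
        yield train_idx, test_idx


# Hypothetical dataset: 60 suitable and 40 not suitable land units.
labels = ["suitable"] * 60 + ["not suitable"] * 40
folds = stratified_folds(labels, k=10)
splits = list(train_test_splits(folds))
for fold in folds:
    print(sum(labels[i] == "suitable" for i in fold), "suitable of", len(fold))
```

Each model is then trained on nine folds and scored on the held-out fold, and the ten scores are averaged. In practice the indices within each class would usually be shuffled before dealing.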

Fereydoon et al. predicted land suitability for wheat in the Kouhin region of Iran using SVM¹⁰. The dataset consists of 32 representative soil profiles and 10 land features. Features include climatic factors like precipitation and temperature, topographic factors like relief and slope, and soil-related factors like soil texture, CaCO₃, OC, coarse fragments, pH, and gypsum parameters. They implemented a two-class SVM model with a non-linear class boundary, using MATLAB 8.2 to train and test the model. They reported a root mean square error (RMSE) of 3.72 and a coefficient of determination (R²) of 0.84. The RMSE value in this study is larger than the optimum RMSE value, which is in the range between 0.2 and 0.5. Model tuning, such as hyperparameter tuning with different kernel values, was not explicitly described.
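For reference, the two metrics quoted above are usually defined as follows. These are the standard textbook definitions, given here as an assumption, since the exact formulation used in the cited study is not stated. Here y_i is the observed value, ŷ_i the predicted value, ȳ the mean of the observed values, and n the number of samples.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
```

RMSE is expressed in the units of the target variable, whereas R² is unitless and equals 1 for a perfect fit.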

Bhimanpallewar and Narasinagrao determined land suitability for Jowar using a decision tree¹¹. Features including climate (temperature and rainfall), topography (slope), soil (pH, EC, N, K, P, and soil moisture), and Jowar crop requirement ranges have been used. The class labels were the suitability classes: highly suitable, moderately suitable, marginally suitable, currently not suitable, and permanently not suitable. They used FAO methods to map land attributes to crop requirements and calculate land suitability. The dataset was real-time data collected using aggregated reports. The accuracy of the model is 98.9%. However, the real-time data were not described in detail. The performance evaluation was not comprehensive, and no adequate validation was presented to show that the model had not overfitted.
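Mapping land attributes to crop requirement ranges can be pictured as a small rule-based lookup. The sketch below uses a most-limiting-factor rule, in which the worst-rated attribute decides the class of the land unit. This rule is a common convention assumed here for illustration; the cited study does not specify which rule it applied. All feature names and threshold values are hypothetical placeholders, not agronomic recommendations for any crop.

```python
# FAO-style suitability classes, ordered from best to worst.
CLASSES = [
    "highly suitable",
    "moderately suitable",
    "marginally suitable",
    "currently not suitable",
    "permanently not suitable",
]

# HYPOTHETICAL requirement ranges (low, high), one per class except the last.
# A value outside every range falls into the last class.
REQUIREMENTS = {
    "ph": [(6.0, 7.5), (5.5, 8.0), (5.0, 8.5), (4.5, 9.0)],
    "slope_percent": [(0, 4), (0, 8), (0, 16), (0, 30)],
    "rainfall_mm": [(600, 900), (500, 1000), (400, 1200), (300, 1500)],
}


def rate_feature(feature, value):
    """Return the index of the best class whose range contains the value."""
    for rank, (low, high) in enumerate(REQUIREMENTS[feature]):
        if low <= value <= high:
            return rank
    return len(CLASSES) - 1


def land_suitability(land_unit):
    """Most-limiting-factor rule: the worst feature rating sets the class."""
    worst = max(rate_feature(name, value) for name, value in land_unit.items())
    return CLASSES[worst]


# Hypothetical land unit: good pH and rainfall, but a fairly steep slope.
unit = {"ph": 6.5, "slope_percent": 10, "rainfall_mm": 700}
print(land_suitability(unit))
```

Labels produced by an expert-driven procedure of this kind are what a classifier such as a decision tree is then trained to reproduce from the raw attributes.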

Ogunde and Olanbo proposed a web-based decision support system for evaluating soil suitability for the cassava crop¹². They acquired secondary data from reviewed articles on soil suitability for cassava plantations. The dataset contains soil attributes such as pH, NPK, and organic matter as independent features and land suitability as the dependent feature. The classification was carried out using the J48 decision tree algorithm, and the accuracy of the suitability prediction model was 76.5%. Only five soil features were used to determine land suitability; climate and topographic data, which are crucial in determining land suitability, were not considered. Performance is also modest, with 23.5% of instances misclassified.

Schmidt et al. carried out land suitability assessments for rain-fed wheat and barley crops using machine learning techniques in Kurdistan, Iran⁴¹. The main objective of this study was to evaluate and map land suitability for both rain-fed crops using parametric and machine learning algorithms such as RF and SVM, and to compare the outcome with traditional approaches. They concluded that the machine-learning-based approach outperformed traditional land suitability mapping in terms of accuracy. However, comparisons with other machine learning models were not carried out.

The related works show several shortcomings, including a significant amount of misclassification and small datasets that are prone to overfitting. Consequently, they are less generalizable to real-world applications. Only one or two of the related works address wheat and barley, and the models may also not be applicable here because of differences in crop and area context. The related works, including the methods used and the gaps identified, are summarized in Table 1. We therefore conducted this research in an effort to address those gaps, on the premise that performance could be enhanced by using appropriate machine learning techniques and hyperparameter optimization.

Table 1 Summary of related works.
