Abstract
Accurate crop yield prediction is crucial for enhancing food security and agricultural sustainability; however, existing models frequently struggle to capture the intricate relationships between environmental drivers and crop performance. Here we leveraged a large, spatially explicit yield monitor dataset from U.S. commercial maize (Zea mays) and soybean (Glycine max) fields (134 unique crop-site-years). Machine learning models predicted yield with high accuracy (R² > 0.87, RMSE < 1.13 Mg ha−1), and Shapley Additive Explanations (SHAP) were used to quantify how weather, soil, and terrain properties contribute to predicted yield variability. Our results highlight the potential of machine learning to disentangle environmental constraints on crop production, providing actionable insights for more resilient U.S. food systems. This work presents a novel approach to identifying maize and soybean yield constraints that can inform the next generation of crop breeding and precision management strategies.
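The abstract states that Shapley Additive Explanations were used to attribute predicted yield to weather, soil, and terrain variables. As a rough illustration of the underlying idea (not the pipeline used in this study), the sketch below computes exact Shapley values by enumerating every feature coalition, with "absent" features set to a single reference field. The function `toy_yield_model`, its coefficients, the four feature names, and both input vectors are hypothetical and invented for illustration. The SHAP library relies on faster estimators (for example, TreeSHAP for tree ensembles) and can average over a background dataset rather than one baseline.

```python
from __future__ import annotations

from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def toy_yield_model(x: Sequence[float]) -> float:
    """Hypothetical yield model (Mg/ha); coefficients are illustrative only."""
    precip, tmax, som, slope = x
    heat = tmax - 30.0  # degrees above a hypothetical 30 C threshold
    return (
        6.0
        + 0.010 * precip         # main effect of precipitation
        - 0.30 * heat            # main effect of heat
        + 0.50 * som             # main effect of soil organic matter
        - 0.20 * slope           # main effect of terrain slope
        - 0.002 * precip * heat  # illustrative precipitation x heat interaction
    )


def shapley_values(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley values of f at x, with absent features set to baseline."""
    n = len(x)

    def value(coalition: frozenset[int]) -> float:
        # Features in the coalition keep their observed values; all others
        # are replaced by the single reference point.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


features = ["precip_mm", "tmax_c", "som_pct", "slope_pct"]  # hypothetical names
field = [450.0, 33.0, 3.0, 4.0]      # hypothetical observed field
reference = [500.0, 30.0, 2.5, 2.0]  # hypothetical reference field

phi = shapley_values(toy_yield_model, field, reference)
for name, contribution in zip(features, phi):
    print(f"{name}: {contribution:+.3f} Mg/ha")

# Efficiency property: attributions sum to f(field) - f(reference).
print(f"sum of attributions: {sum(phi):+.3f}")
print(f"prediction gap:      {toy_yield_model(field) - toy_yield_model(reference):+.3f}")
```

For these hypothetical inputs the attributions are approximately −0.35 (precipitation), −3.75 (temperature), +0.25 (soil organic matter), and −0.40 (slope) Mg ha−1. They sum to the gap between the two predictions (−4.25 Mg ha−1), and the interaction term is shared between precipitation and temperature. Exact enumeration grows exponentially with the number of features, which is why practical SHAP implementations use approximations or model-specific algorithms.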
Data availability
Yield monitor data have been kept private at the request of farmer participants. All other data used in this study are available on Google Earth Engine (https://developers.google.com/earth-engine/datasets) or through the Google Earth Engine Community Catalogue (https://gee-community-catalog.org/). Please contact Harrison Smith at hws001@uark.edu to request data from this study.
Code availability
Code, documentation, and metadata are available in the corresponding author’s GitHub repository (https://github.com/harrisonwsmith/harvesting_insights). Requests for the code used in this study may also be sent to Harrison Smith at hws001@uark.edu.
References
USDA ERS. Farming and Farm Income: U.S. Farm Sector Cash Receipts (2023). https://www.ers.usda.gov/data-products/farm-income-and-wealth-statistics.
USDA NASS. Crop Production Annual Summary, 2023 (2024). https://usda.library.cornell.edu/concern/publications/k3569432s.
USDA FAS. Global Agricultural Trade System (GATS) (2024). https://apps.fas.usda.gov/gats.
Godfray, H. C. J. et al. Food security: the challenge of feeding 9 billion people. Science 327, 812–818 (2010).
Tilman, D., Balzer, C., Hill, J. & Befort, B. L. Global food demand and the sustainable intensification of agriculture. PNAS 108, 20260–20264 (2011).
Persson, U. M. The impact of biofuel demand on agricultural commodity prices: A systematic review. in Advances in Bioenergy 465–482 (John Wiley & Sons, Ltd, 2016).
Lauer, J. G. et al. The scientific grand challenges of the 21st century for the Crop Science Society of America. Crop Sci. 52, 1003–1010 (2012).
Reddy, B. V. S., Reddy, P. S., Bidinger, F. & Blümmel, M. Crop management factors influencing yield and quality of crop residues. Field Crops Res. 84, 57–77 (2003).
Peng, B. et al. Towards a multiscale crop modelling framework for climate change adaptation assessment. Nat. Plants 6, 338–348 (2020).
Andorf, C. et al. Technological advances in maize breeding: past, present and future. Theor. Appl. Genet. 132, 817–849 (2019).
Boehm, J. D. Jr. et al. Genetic improvement of US soybean in maturity groups V, VI, and VII. Crop Sci. 59, 1838–1852 (2019).
Prasanna, B. M. Diversity in global maize germplasm: characterization and utilization. J. Biosci. 37, 843–855 (2012).
Xavier, A., Thapa, R., Muir, W. M. & Rainey, K. M. Population and quantitative genomic properties of the USDA soybean germplasm collection. Plant Genet. Resour. 16, 513–523 (2018).
Heinemann, J. A. et al. Sustainability and innovation in staple crop production in the US Midwest. Int. J. Agric. Sustain. 12, 71–88 (2014).
Egli, D. B. Comparison of corn and soybean yields in the United States: Historical trends and future prospects. Agron. J. 100, 79–88 (2008).
Yost, M. A. et al. A long-term precision agriculture system sustains grain profitability. Precis. Agric. 20, 1177–1198 (2019).
Gage, J. L. et al. The effect of artificial selection on phenotypic plasticity in maize. Nat. Commun. 8, 1348 (2017).
Kang, Y. & Özdoğan, M. Field-level crop yield mapping with Landsat using a hierarchical data assimilation approach. Remote Sens. Environ. 228, 144–163 (2019).
Lobell, D. B. & Burke, M. B. On the use of statistical models to predict crop yield responses to climate change. Agric. For. Meteorol. 150, 1443–1452 (2010).
Khaki, S. & Wang, L. Crop yield prediction using deep neural networks. Front. Plant Sci. 10, 621 (2019).
Shahhosseini, M., Hu, G. & Archontoulis, S. V. Forecasting corn yield with machine learning ensembles. Front. Plant Sci. 11, 1120 (2020).
Molnar, C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable (2025).
Kaspar, T. C. et al. Relationship between six years of corn yields and terrain attributes. Precis. Agric. 4, 87–101 (2003).
Jagadamma, S., Lal, R., Hoeft, R. G., Nafziger, E. D. & Adee, E. A. Nitrogen fertilization and cropping system impacts on soil properties and their relationship to crop yield in the central Corn Belt, USA. Soil Tillage Res. 98, 120–129 (2008).
Erickson, N. et al. AutoGluon-Tabular: robust and accurate AutoML for structured data. Preprint at https://doi.org/10.48550/arXiv.2003.06505 (2020).
Geurts, P., Ernst, D. & Wehenkel, L. Extremely randomized trees. Mach. Learn. 63, 3–42 (2006).
Ke, G. et al. LightGBM: A highly efficient gradient boosting decision tree. in Advances in Neural Information Processing Systems Vol. 30 (2017).
Dhillon, R., Takoo, G., Sharma, V. & Nagle, M. Utilizing machine learning framework to evaluate the effect of climate change on maize and soybean yield. Comput. Electron. Agric. 221, 108982 (2024).
Chang, Y., Latham, J., Licht, M. & Wang, L. A data-driven crop model for maize yield prediction. Commun. Biol. 6, 1–9 (2023).
Chaney, N. W. et al. POLARIS soil properties: 30-m probabilistic maps of soil properties over the contiguous United States. Water Resour. Res. 55, 2916–2938 (2019).
Lv, X. et al. Heat stress and sexual reproduction in maize: unveiling the most pivotal factors and the greatest opportunities. J. Exp. Bot. 75, 4219–4243 (2024).
Hoffman, A. L., Kemanian, A. R. & Forest, C. E. The response of maize, sorghum, and soybean yield to growing-phase climate revealed with machine learning. Environ. Res. Lett. 15, 094013 (2020).
Ray, D. K., Gerber, J. S., MacDonald, G. K. & West, P. C. Climate variation explains a third of global crop yield variability. Nat. Commun. 6, 5989 (2015).
Ashworth, A. J., Allen, F. L. & Saxton, A. M. Using partial least squares and regression to interpret temperature and precipitation effects on maize and soybean genetic variance expression. Agronomy 13, 2752 (2023).
Bhattarai, B., Leasor, Z. & Reis, A. F. D. B. Incorporating soil moisture data into a machine learning framework improved the predictive accuracy of corn yields in the U.S. Agric. Water Manage. 319, 109762 (2025).
Kravchenko, A. N. & Bullock, D. G. Correlation of corn and soybean grain yield with topography and soil properties. Agron. J. 92, 75–83 (2000).
Cairns, J. E. et al. Identification of drought, heat, and combined drought and heat tolerant donors in maize. Crop Sci. 53, 1335–1346 (2013).
Valliyodan, B. et al. Genetic diversity and genomic strategies for improving drought and waterlogging tolerance in soybeans. J. Exp. Bot. 68, 1835–1849 (2017).
Safi, A. R., Karimi, P., Mul, M., Chukalla, A. & de Fraiture, C. Translating open-source remote sensing data to crop water productivity improvement actions. Agric. Water Manage. 261, 107373 (2022).
Jin, Z. et al. Smallholder maize area and yield mapping at national scales with Google Earth Engine. Remote Sens. Environ. 228, 115–128 (2019).
Celis, J., Xiao, X., Wagle, P., Adler, P. R. & White, P. A review of yield forecasting techniques and their impact on sustainable agriculture. in Transformation Towards Circular Food Systems: Sustainable, Smart and Resilient Citrus Supply Chains in Mediterranean Areas 139–168 (Springer Nature, 2024).
Bullock, D. S. et al. The data-intensive farm management project: changing agronomic research through on-farm precision experimentation. Agron. J. 111, 2736–2746 (2019).
Vega, A., Córdoba, M., Castro-Franco, M. & Balzarini, M. Protocol for automating error removal from yield maps. Precis. Agric. 20, 1030–1044 (2019).
Thornton, M. M., Shrestha, R., Wei, Y., Thornton, P. E. & Kao, S. C. Daymet: Daily surface weather data on a 1-km grid for North America, version 4 R1 (ORNL Distributed Active Archive Center, 2022). https://doi.org/10.3334/ORNLDAAC/2129.
U.S. Geological Survey. 3D elevation program 1-meter resolution digital elevation model (2019). https://www.usgs.gov/the-national-map-data-delivery.
Safanelli, J. L. et al. Terrain analysis in Google Earth Engine: a method adapted for high-performance global-scale analysis. ISPRS Int. J. Geo-Inf. 9, 400 (2020).
Lehner, B., Verdin, K. & Jarvis, A. New global hydrography derived from spaceborne elevation data. Eos Trans. Am. Geophys. Union 89, 93–94 (2008).
Gorelick, N. et al. Google Earth Engine: planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 202, 18–27 (2017).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. in Advances in Neural Information Processing Systems Vol. 30 (2017).
Mosca, E., Szigeti, F., Tragianni, S., Gallagher, D. & Groh, G. SHAP-based explanation methods: a review for NLP interpretability. in Proceedings of the 29th International Conference on Computational Linguistics 4593–4603 (2022).
Ying, X. An overview of overfitting and its solutions. J. Phys. Conf. Ser. 1168, 022022 (2019).
Acknowledgements
The authors would like to acknowledge the data contributions of farmers from the Data Intensive Farm Management Project, without which this project would not have been possible. The U.S. Department of Agriculture (USDA) prohibits discrimination in all its programs and activities on the basis of race, color, national origin, age, disability, and where applicable, sex, marital status, familial status, parental status, religion, political beliefs, reprisal, or because all or part of an individual’s income is derived from any public assistance. (Not all prohibited bases apply to all programs.) Persons with disabilities who require alternative means for communication of program information (Braille, large print, audiotape, etc.) should contact USDA’s TARGET Center at 202-720-2600 (voice and TDD).
Funding
This research was supported by the intramural research program of the U.S. Department of Agriculture, National Institute of Food and Agriculture, Agriculture and Food Research Initiative Predoctoral Fellowship Program, award number 2023-67011-40361. This research was also funded in part by USDA-NRCS On-Farm Trials Conservation Innovation Grant, “Improving the Economic and Ecological Sustainability of US Crop Production through On-Farm Precision Experimentation,” award number NR213A7500013G021, and by USDA-NIFA Hatch Project 470-362.
Author information
Contributions
H.W.S. wrote the main manuscript text. H.W.S. and C.J.H. wrote the software used in this work and conducted the analyses. D.S.B., A.J.A., and P.R.O. conducted data acquisition, and H.W.S., A.J.A., L.L.N., P.R.O., D.S.B., and J.T. contributed to project conception. All authors contributed to data interpretation. All authors reviewed and approved the final manuscript.
Corresponding author
Correspondence to Harrison Smith (hws001@uark.edu).
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Smith, H.W., Heffernan, C.J., Ashworth, A.J. et al. Harvesting insights: interpretable machine learning to understand environmental drivers of U.S. maize and soybean yield. Sci Rep (2026). https://doi.org/10.1038/s41598-026-38724-z
DOI: https://doi.org/10.1038/s41598-026-38724-z