Scientific Reports
Harvesting insights: interpretable machine learning to understand environmental drivers of U.S. maize and soybean yield
  • Article
  • Open access
  • Published: 13 February 2026

  • Harrison W. Smith1,
  • Christopher J. Heffernan2,
  • Amanda J. Ashworth3,
  • L. Lanier Nalley4,
  • David S. Bullock5,
  • Jason Tullis6 &
  • Phillip R. Owens7 

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Computational biology and bioinformatics
  • Ecology
  • Environmental sciences
  • Plant sciences

Abstract

Accurate crop yield prediction is crucial for enhancing food security and agricultural sustainability; however, existing models frequently struggle to capture the intricate relationships between environmental drivers and crop performance. Here we leveraged a large, spatially explicit yield monitor dataset of U.S. commercial maize (Zea mays) and soybean (Glycine max) fields (134 unique crop-site-years). Machine learning models were trained to predict yield with high accuracy (R² > 0.87, RMSE < 1.13 Mg ha⁻¹), and Shapley Additive Explanations were used to quantify how weather, soil, and terrain properties predict yield variability. Our results highlight the potential of machine learning to disentangle environmental constraints on crop production, thereby providing actionable insights for more resilient U.S. food systems. The results presented here represent a novel approach to identifying maize and soybean yield constraints that can inform the next generation of crop breeding and precision management strategies.
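
As a rough illustration of the two ideas named in the abstract, the sketch below computes the reported accuracy metrics (R² and RMSE) and exact Shapley values for a single prediction. It is not the authors' pipeline: `toy_yield_model`, its coefficients, the three features (rainfall, clay content, slope), and the background rows are all hypothetical, and the brute-force enumeration of feature subsets stands in for the faster estimators a SHAP library would use.

```python
from itertools import combinations
from math import factorial, sqrt


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target (e.g. Mg/ha)."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def toy_yield_model(x):
    """Hypothetical yield response (Mg/ha); coefficients are made up."""
    rain_mm, clay_pct, slope_deg = x
    return (6.0 + 0.01 * rain_mm - 0.02 * clay_pct
            - 0.3 * slope_deg + 0.0005 * rain_mm * slope_deg)


def shapley_values(model, x, background):
    """Exact Shapley values for one input x.

    The value of a feature subset S is the mean model output when the
    features in S take their values from x and the remaining features
    take their values from each background row in turn.
    """
    n = len(x)

    def value(subset):
        total = 0.0
        for row in background:
            mixed = [x[j] if j in subset else row[j] for j in range(n)]
            total += model(mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return value(set()), phi


if __name__ == "__main__":
    background = [[400.0, 30.0, 1.0], [600.0, 10.0, 5.0]]  # hypothetical fields
    x = [500.0, 20.0, 2.0]                                 # field to explain
    base, phi = shapley_values(toy_yield_model, x, background)
    print("baseline prediction:", base)
    print("per-feature contributions:", phi)
    print("baseline + contributions:", base + sum(phi))
    print("model prediction:", toy_yield_model(x))
```

The baseline plus the per-feature contributions reproduces the model's prediction for that field, which is the additivity property that lets each contribution be read as a share of the predicted yield. The enumeration grows exponentially with the number of features, so it is only practical for very small examples like this one.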

Data availability

Yield monitor data has been kept private at the request of farmer participants. All other data used in this study are available on Google Earth Engine (https://developers.google.com/earth-engine/datasets) or through the Google Earth Engine Community Catalogue (https://gee-community-catalog.org/). Please contact Harrison Smith at hws001@uark.edu to request data from this study.

Code availability

Code, documentation, and metadata are available from the corresponding author’s GitHub repository: https://github.com/harrisonwsmith/harvesting_insights, or contact Harrison Smith at hws001@uark.edu to request the code used in this study.

Acknowledgements

The authors would like to acknowledge the data contributions of farmers from the Data Intensive Farm Management Project, without which this project would not have been possible. The U.S. Department of Agriculture (USDA) prohibits discrimination in all its programs and activities on the basis of race, color, national origin, age, disability, and where applicable, sex, marital status, familial status, parental status, religion, political beliefs, reprisal, or because all or part of an individual’s income is derived from any public assistance. (Not all prohibited bases apply to all programs.) Persons with disabilities who require alternative means for communication of program information (Braille, large print, audiotape, etc.) should contact USDA’s TARGET Center at 202-720-2600 (voice and TDD).

Funding

This research was supported by the intramural research program of the U.S. Department of Agriculture, National Institute of Food and Agriculture, Agriculture and Food Research Initiative Predoctoral Fellowship Program, Award Number 2023-67011-40361. This research was also funded in part by USDA-NRCS On-Farm Trials Conservation Innovation Grant, “Improving the Economic and Ecological Sustainability of US Crop Production through On-Farm Precision Experimentation,” award number NR213A7500013G021, and by USDA-NIFA Hatch Project 470-362.

Author information

Authors and Affiliations

  1. Environmental Dynamics Program, University of Arkansas, Fayetteville, AR, USA

    Harrison W. Smith

  2. Department of Computer Science and Computer Engineering, University of Arkansas, Fayetteville, AR, USA

    Christopher J. Heffernan

  3. USDA-ARS Poultry Production and Product Safety Research Unit, Fayetteville, AR, USA

    Amanda J. Ashworth

  4. Agricultural Economics and Agribusiness Department, University of Arkansas, Fayetteville, AR, USA

    L. Lanier Nalley

  5. Department of Agricultural and Consumer Economics, University of Illinois Urbana-Champaign, Urbana, IL, USA

    David S. Bullock

  6. Department of Geosciences, University of Arkansas, Fayetteville, AR, USA

    Jason Tullis

  7. USDA-ARS Dale Bumpers Small Farms Research Center, Booneville, AR, USA

    Phillip R. Owens

Contributions

H.W.S. wrote the main manuscript text. H.W.S. and C.J.H. wrote software used in the work and conducted analyses. Data acquisition was conducted by D.S.B., A.J.A., and P.R.O., while H.W.S., A.J.A., L.L.N., P.R.O., D.S.B., and J.T. contributed to project conception. All authors contributed to data interpretation. All authors reviewed and approved the final manuscript.

Corresponding author

Correspondence to Harrison W. Smith.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Smith, H.W., Heffernan, C.J., Ashworth, A.J. et al. Harvesting insights: interpretable machine learning to understand environmental drivers of U.S. maize and soybean yield. Sci Rep (2026). https://doi.org/10.1038/s41598-026-38724-z

  • Received: 11 August 2025

  • Accepted: 30 January 2026

  • Published: 13 February 2026

  • DOI: https://doi.org/10.1038/s41598-026-38724-z
