Figure 1

Importance of management-based variables in a random forest model predicting soybean yield. Feature importance was measured as the ratio of the model error after permuting the values of a feature to the original model error. A ratio of 1 indicates an unimportant predictor; larger ratios indicate greater importance. Points are the medians of the ratio over 20 repeated permutations. Bars span the 5% to 95% quantiles of the ratio. Sowing date was expressed as the number of days from January 1. Growing degree days and the aridity index were annualized categorical constructs used within the definition of technology extrapolation domains (TEDs). Foliar fungicide or insecticide use, seed treatment use, starter fertilizer use, and lime and manure applications were all binary variables for the use (or not) of the practice. Iron deficiency was likewise binary (symptoms were observed or not). Topsoil texture, plant available water holding capacity in the rooting zone, row spacing, and herbicide program were categorical variables with five, seven, five, and four levels, respectively.
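
The permutation-based importance ratio described in the caption can be illustrated with a minimal sketch. It is not the authors' code: the use of mean squared error as the model error, the linear-interpolation quantile, and the names `permutation_importance_ratio`, `predict`, `sowing_date`, and `row_spacing` are all assumptions made for illustration, and a toy prediction function stands in for the fitted random forest.

```python
import random
import statistics


def mean_squared_error(y_true, y_pred):
    # Assumed error metric; the caption does not state which error was used.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def quantile(sorted_vals, q):
    # Linear interpolation between order statistics (an assumed convention).
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def permutation_importance_ratio(predict, X, y, feature, n_repeats=20, seed=0):
    """Summarize (error after permuting `feature`) / (original error).

    `predict` maps one row (a dict) to a prediction; `X` is a list of rows.
    The original error must be non-zero for the ratio to be defined.
    """
    rng = random.Random(seed)
    base_error = mean_squared_error(y, [predict(row) for row in X])
    ratios = []
    for _ in range(n_repeats):
        # Shuffle one column, leaving every other feature untouched.
        column = [row[feature] for row in X]
        rng.shuffle(column)
        X_perm = [dict(row, **{feature: v}) for row, v in zip(X, column)]
        perm_error = mean_squared_error(y, [predict(row) for row in X_perm])
        ratios.append(perm_error / base_error)
    ratios.sort()
    return {
        "median": statistics.median(ratios),  # the plotted point
        "q05": quantile(ratios, 0.05),        # lower end of the bar
        "q95": quantile(ratios, 0.95),        # upper end of the bar
    }


# Toy data (hypothetical): yield depends on sowing date only.
data_rng = random.Random(1)
X = [
    {"sowing_date": data_rng.uniform(100, 160),
     "row_spacing": data_rng.choice([19, 38, 76])}
    for _ in range(200)
]
y = [5.0 - 0.02 * (row["sowing_date"] - 100) + data_rng.gauss(0, 0.1) for row in X]


def predict(row):
    # Stand-in for a fitted model; it ignores row_spacing entirely.
    return 5.0 - 0.02 * (row["sowing_date"] - 100)


important = permutation_importance_ratio(predict, X, y, "sowing_date")
unimportant = permutation_importance_ratio(predict, X, y, "row_spacing")
```

In this toy setting the stand-in model never reads `row_spacing`, so permuting it leaves the error unchanged and the ratio is 1, whereas permuting `sowing_date` inflates the error and yields a ratio well above 1.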