Introduction

Phenotyping of biologically relevant traits is a high priority to accelerate genetic gain in crop breeding programs1 and close notable yield gaps2. Phenotyping in the field, and more specifically in the target population of environments3, is critical to capture the genotype-by-environment (G×E) interactions that define phenotypic outcomes3,4. Currently, high-throughput genotyping is routinely used in maize (Zea mays L.) breeding programs to aid marker-assisted selection and whole-genome prediction5. However, the scale of genomic data now far exceeds that of phenotypic data6, in terms of both data quantity and cost per data point. Phenotypic data are essential to provide context to genomic data4 and currently present a bottleneck in research and crop improvement activities7,8. Although modeling efforts aid in predicting phenotypes from genotype, they typically rely on phenotypic training data limited to either (1) the highest-priority traits, such as grain yield, or (2) traits that can be captured at large scale for low cost9. Phenotyping of component traits often adds key biological insights for understanding yield outcomes and G×E10,11, but these traits are often difficult and time-consuming to measure.

Plant breeding companies continually improve maize genotypes by testing, advancing, and recycling germplasm through their testing pipelines12. How these new genotypes interact with management factors once in the hands of farmers is also of considerable interest and must be assessed through small-plot and on-farm trials. Management factors such as plant density13 and nitrogen management14 can be varied experimentally to more fully understand Genotype by Environment by Management (GxExM) interactions4. These efforts require phenotypic characterization of the crop throughout the season to determine the morphological changes that occur and to further explain grain yield outcomes under a variety of environmental and management conditions. Substantial resources are required to collect relevant morphological phenotypes such as stem diameter, leaf area index (LAI), and plant biomass15,16,17,18. Depending on the trait of interest, a physical measurement may require 10 or more minutes of human effort to collect a single data point. Time and labor constraints mean that field scientists must limit the number of treatments, genotypes, and locations where morphological traits are collected, further contributing to the phenotyping bottleneck observed in maize research trials. In general, currently available maize phenotyping methods cannot provide a complete assessment of plant growth processes, which leads to an incomplete understanding of GxExM.

Unmanned aerial vehicles (UAVs) have been deployed on a broad scale to collect RGB, multispectral, hyperspectral, and thermal data. Images from these flights capture biologically relevant traits such as plant height19, leaf area index20, and the normalized difference vegetation index (NDVI)21. Data fusion approaches have been implemented to integrate signals captured by different sensor types, which partially overcomes the problem of canopy saturation for some of the most informative spectral bands22. However, in-canopy traits, such as ear height and stem diameter, remain difficult, if not impossible, to predict from above-canopy phenotyping tools. Compact, in-canopy robotic platforms carrying cameras and sensors are an active area of development aimed at accessing below-canopy phenotypic traits at scales similar to those of above-canopy trait collection.

Ground-based, under-canopy phenotyping platforms have taken various forms. Initial developments included wheeled, human-pushed platforms23, tractor-carried platforms24, and motor-propelled but human-driven platforms25. More recently, self-propelled platforms that can travel between crop rows and require little human operation26 have also been demonstrated. This last category of ground robots, motorized and autonomous, offers the greatest flexibility and the best opportunity for parallel, simultaneous operation of multiple robots for high-throughput phenotyping. Ground-based platforms can measure several traits in maize or sorghum, including plant height and volume24, stem diameter27, stem strength28, and leaf area index29. However, each of these studies used a lower-throughput system that required substantial human assistance (i.e., pushing or driving the platform through the field) to collect the information needed for trait calculation. Automating self-propelled, in-row, under-canopy systems for measuring phenotypic traits can therefore deliver substantial value through increased throughput and reduced human labor.

Autonomous robotic platforms for under-canopy phenotyping, such as TerraSentia26 (Fig. 1A) and MARS30, have become more available to practitioners. While proof-of-concept demonstrations of new phenotypic traits continue to increase, there are few examples of autonomous robots deployed at a scale sufficient for a large-scale breeding, trait evaluation, or agronomic testing program. In the only such example, an autonomous robot collected LiDAR scans on 2103 experimental units of maize hybrids31. Novel latent-space phenotypes, derived with a machine-learning-based system that characterizes and discriminates differences among a series of image sequences for a set of treatments32, were extracted from these scans and showed heritabilities as high as 0.4431.

Fig. 1: Deployment details for the TerraSentia robots between 2019 and 2023 by teams from Corteva Agriscience, EarthSense, and the University of Illinois.

A A TerraSentia® robot in a typical corn breeding experimental unit with 0.76 m spaced corn rows. B Aerial image of a typical research trial location showing the corn plots laid out in columns and ranges. Each experimental unit is typically 5.3–9.3 m long with two rows spaced 0.76 m apart. C Locations where the robot was operated in the United States and Canada. D The number of locations, number of locations by collection dates, experimental units with raw multi-sensor data, and the number of experimental units with at least one trait delivered.

A significant hurdle to deploying self-propelled under-canopy phenotyping robots is reliable in-row navigation under variable, real-world field and crop-canopy conditions. Several groups have studied this problem and proposed solutions33,34,35,36. The primary challenge in autonomous navigation under the crop canopy is that Global Navigation Satellite Systems (GNSS), such as the Global Positioning System (GPS), do not provide reliable localization for ground-based robots navigating within rows, even with augmentation protocols such as Real-Time Kinematic (RTK) positioning. Overcoming this challenge requires algorithms and systems capable of row-following with other exteroceptive sensors such as LiDAR and cameras37,38,39,40,41. Autonomous robots that use such algorithms can navigate reliably within rows under the canopy and can handle obstacles in real-world fields, such as weeds, ditches, and ruts. However, prior to this study, such autonomous robotic systems had not been deployed at a scale sufficient to establish their use in modern breeding and agricultural product development pipelines.

Our objectives were to (1) determine whether autonomous ground-based robots could be used to collect phenotypic data at the scale required for a large commercial corn breeding and research operation, (2) assess whether multi-sensor data collected from underneath the maize plant canopy by these robots could accurately estimate phenotypic traits that are commonly collected in maize hybrid development and associated with grain yield, and (3) investigate GxE and GxExM interactions for grain yield in different experimental designs to gain new insights into how maize hybrids respond to variation in nitrogen supply, planting density, and precipitation during the growing season.

Results and discussion

Small under-canopy phenotyping robots have high deployment potential

From 2019 through 2023, three to five TerraSentia robots (EarthSense, Inc., Champaign, IL) per year were deployed by multiple individuals located in various parts of the United States and Canada (Fig. 1C). We collected multi-sensor data from 198,249 maize plots (experimental units, or EUs; Fig. 1B) measuring 5.4–9.4 m in length by 1.52 m in width. The data were collected across 142 unique locations (Fig. 1D), an average of three to four times each, resulting in 418 unique location-by-time data collections. The compact size (Supplementary Fig. 1B) and low weight of the TerraSentia robots, relative to other ground-based autonomous robots30, were among the main factors that allowed broad deployment across many locations and across experimental units within a location. Beyond size, a survey of operators indicated continued improvement over the study years in robot reliability (e.g., maintaining the Wi-Fi connection to the tablet, maintaining a cool operating temperature, and hot-swappable batteries) and a simpler user interface (Supplementary Fig. 1C) that engaged more operators.

Significantly improved autonomous navigation was critical to high-throughput data collection. A typical day of collection could total up to 11,000 m of driving. The average distance between human interventions increased from 27 m in 2019 to a peak of 3629 m of unassisted autonomous navigation in 2023, owing to improved algorithms37,38,39,40,41. Overall, the learning-based autonomy methods generalized well once trained on a sufficiently large and diverse dataset at the beginning of the study, and the autonomy models required very little retraining or updating across years. Robots did not need to be retrained when moved to a different location, and the same autonomy algorithm was used at all locations within a year. Model improvements and robot retraining took place during the part of the year when no crop phenotyping occurred. These results demonstrate, for the first time, the feasibility of under-canopy robots for high-throughput phenotyping and other applications across a diverse set of environments using row-following navigation that is not GPS-guided.
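To put the intervention distances in perspective, the short calculation below divides the upper end of a day's driving (11,000 m) by the mean distance between human interventions reported for 2019 (27 m) and 2023 (3629 m). It is a back-of-the-envelope sketch that assumes interventions are spread evenly along the route, which real field conditions will not strictly satisfy.

```python
def interventions_per_day(daily_distance_m: float, mean_spacing_m: float) -> float:
    """Expected human interventions in one day of driving, assuming
    interventions occur at a constant average spacing along the route."""
    if mean_spacing_m <= 0:
        raise ValueError("mean distance between interventions must be positive")
    return daily_distance_m / mean_spacing_m


DAILY_DISTANCE_M = 11_000  # upper end of a typical day's driving, from the text

# Mean distance between interventions reported for the first and last study years
for year, spacing_m in [(2019, 27), (2023, 3629)]:
    n = interventions_per_day(DAILY_DISTANCE_M, spacing_m)
    print(f"{year}: about {n:.0f} interventions per {DAILY_DISTANCE_M} m day")
```

Under this simplification, a full day of driving would have needed roughly 407 interventions in 2019 but only about 3 in 2023, which is what makes a single operator managing a robot for a full day practical.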

From the imaging and LiDAR data, plant height, stem diameter, and leaf area index or ear height were calculated for each of 132,176 experimental units, delivering nearly 400,000 individual phenotypic trait measurements. Over the years of data collection, the number of experimental units with video data increased 24× and the number of locations increased 20× (Fig. 1D). A requirement for any useful high-throughput system is the collection of high-resolution images suitable for computer vision and machine learning42. We note a marked improvement between 2019, when traits were estimated for 982 (36%) of the 2731 experimental units with multi-sensor data, and 2022, when 98% of the 67,148 experimental units with multi-sensor data successfully delivered at least two traits. The dataset collected in 2022 exceeds by an order of magnitude the number of field-grown maize experimental units phenotyped using UAVs22, and by two orders of magnitude the largest phenotyping effort with unmanned ground vehicles (UGVs)26,43.

Case studies for high-throughput phenotyping in agriculture

From this large set of robotically collected data, we selected five case studies that demonstrate the capability of the TerraSentia phenotyping platform to deliver high volumes of phenotypic traits. These traits (1) replace traits commonly collected manually and (2) provide new opportunities to collect difficult-to-measure phenotypic data that inform GxExM in the context of a commercial maize breeding and agronomic research program.

Case study I. Replacing common but manual trait collection in a maize breeding program

A commercial program evaluates hundreds of crosses at multiple locations each year12, under varying levels of biotic and abiotic stress, with the goal of finding high-performing genotypes with acceptable traits. The height of the maize ear is a trait of particular interest because of its correlation with lodging risk44. Previously, this trait was collected manually by measuring a few plants within each experimental unit with a measuring stick45. This method requires at least one individual to enter each experimental unit and measure from the soil surface to the nodal attachment point of the ear, typically for one to four plants, followed by manual data entry, which takes approximately one to two minutes of human effort per experimental unit. Notably, maize ear height cannot be measured from above-canopy imagery collected with drones, let alone from aircraft or satellites.

With the TerraSentia robots, we delivered ear-height measurements for 4826 experimental units across five research locations (Table 1). The ear-height algorithm detects the ear, determines its upward or downward orientation, and then identifies the node where the ear attaches to the stem. Automated ear-height measurement in this experiment required approximately 20 h of in-field data collection, compared with an estimated 150 h of human labor for equivalent manual measurements. TerraSentia-reported ear heights ranged from 59 to 152 cm depending on the location (Table 1), comparable to manual ear-height measurements collected from genetically diverse maize populations grown over multiple years and locations in the US45,46. Within-location heritability was estimated for each of the five locations using the best linear unbiased estimates (BLUEs) of germplasm effects by trial and ranged between 0.43 and 0.67, indicating substantial genotypic variation for the trait at each location. One experiment common to three locations produced a heritability estimate of 0.66. By collecting variable ear-height data across a large number of locations and genotypes, we show that the robot can feasibly replace manual data collection for an important agronomic trait like ear height and produce data with moderate to high heritability, both within and across locations, on which plant breeders can select.
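Heritability of this kind is commonly summarized as the ratio of genotypic to phenotypic variance on an entry-mean basis, H² = σ²G / (σ²G + σ²e / r), where r is the number of replicate plots per genotype. The sketch below estimates it from a balanced one-way ANOVA with genotypes treated as random effects. It is a simplified stand-in, not the BLUE-based procedure used in this study: it assumes a balanced design with no block or trial effects, and the example values are hypothetical.

```python
from statistics import mean


def entry_mean_heritability(plots: dict[str, list[float]]) -> float:
    """Broad-sense heritability on an entry-mean basis,
    H2 = var_g / (var_g + var_e / r), from a balanced one-way ANOVA
    with genotypes as random effects and r replicate plots per genotype."""
    reps = {len(v) for v in plots.values()}
    if len(reps) != 1:
        raise ValueError("design must be balanced (same number of plots per genotype)")
    r = reps.pop()
    g = len(plots)
    if g < 2 or r < 2:
        raise ValueError("need at least two genotypes and two replicates")

    geno_means = {k: mean(v) for k, v in plots.items()}
    grand_mean = mean(x for v in plots.values() for x in v)

    # Mean squares among genotypes and within genotypes (residual)
    ss_g = r * sum((m - grand_mean) ** 2 for m in geno_means.values())
    ss_e = sum((x - geno_means[k]) ** 2 for k, v in plots.items() for x in v)
    ms_g = ss_g / (g - 1)
    ms_e = ss_e / (g * (r - 1))

    # Variance components from expected mean squares; negative var_g truncated to 0
    var_g = max((ms_g - ms_e) / r, 0.0)
    var_e = ms_e
    denominator = var_g + var_e / r
    return var_g / denominator if denominator > 0 else 0.0


# Hypothetical ear heights (cm) for three genotypes with two plots each
example = {"G1": [100.0, 102.0], "G2": [110.0, 112.0], "G3": [120.0, 118.0]}
print(round(entry_mean_heritability(example), 3))
```

Mixed-model software fitting trial, block, and location terms would be needed to reproduce estimates like those in Table 1; the one-way version only illustrates how replicate-to-replicate noise lowers H² relative to genotypic spread.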

Table 1 Descriptive statistics and heritability for maize ear height collected in a pre-commercial breeding program using the TerraSentia robot at five locations in 2022 in the Midwest USA

Case study II. Multi-factor experiments using component trait phenotyping

Agronomic management choices such as planting density and nitrogen rate allow farmers to optimize crop productivity and profitability at the field and farm level. Small-plot research trials are used to study multiple management factors in combination to understand their main effects and potential interactions. In this experiment, six hybrids were evaluated at one location in four replicate plots for each combination of four N fertilizer levels and four planting densities, for a total of 384 experimental units. The experiment was established to estimate grain yield for these treatment combinations and to use the TerraSentia robot, operated by one individual, to characterize more deeply how crop morphological traits responded to the treatments.
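The plot count follows directly from the full factorial structure. The sketch below enumerates it; the hybrid, N-level, and density labels are hypothetical placeholders because this section does not list every factor level, and a real field layout would also randomize treatments within each replicate.

```python
from itertools import product

# Placeholder labels: the section does not list every factor level.
HYBRIDS = [f"hybrid_{i}" for i in range(1, 7)]   # six hybrids
N_LEVELS = ["N1", "N2", "N3", "N4"]               # four N fertilizer levels
DENSITIES = ["D1", "D2", "D3", "D4"]              # four planting densities
REPLICATES = [1, 2, 3, 4]                         # four replicate plots

# One experimental unit per replicate x hybrid x N level x density combination
experimental_units = [
    {"rep": rep, "hybrid": h, "n_level": n, "density": d}
    for rep, h, n, d in product(REPLICATES, HYBRIDS, N_LEVELS, DENSITIES)
]

treatment_combinations = {
    (u["hybrid"], u["n_level"], u["density"]) for u in experimental_units
}
print(len(treatment_combinations), len(experimental_units))
```

This prints 96 treatment combinations and 384 experimental units (6 × 4 × 4 = 96, repeated in four replicates).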

Nitrogen rate and plant density interacted, with greater leaf area index (LAI) (p ≤ 0.01) and lower plant height (p = 0.01) as plant density increased across treatments (Fig. 2A). While LAI was greater at higher N rates, LAI responded less to N rate at the lower plant densities (Fig. 2B). In contrast to LAI, plant height decreased with increasing plant density and interacted with N rate (Fig. 2C). Both plant height and LAI in maize have previously been reported to respond to nitrogen rate and plant density16,17. Here, we observed that these traits showed distinct responses to plant density in unfertilized compared with fertilized plots, with plant height progressively reduced by each increase in plant density, whereas LAI was maximized at a density of 9.3 plants m⁻². Stem diameter was smaller at greater plant densities regardless of N treatment (Fig. 2D), in line with prior observations16,47.

Fig. 2: Plant canopy traits collected shortly after silk appearance for an experiment with four N rates, four plant densities, and six hybrids.

Three traits were analyzed; the ANOVA indicates the level of significance of main effects and interactions (A). The change in each trait with increasing plant density is reported, along with the standard error, for LAI (B), plant height (C), and stem diameter (D). Letters within (B) and (C) indicate significant (p ≤ 0.05) separation of N-rate means within a given planting density, and in (D) separation among planting densities. Data collected during 2022 near Windfall, IN.

The TerraSentia platform dramatically increased the efficiency with which LAI and stem diameter were measured across these 384 experimental units. On average, multi-sensor data were collected on 250 EUs h⁻¹. A single person with a robot spent approximately 1.6 h collecting multi-sensor data that, after processing, provided measurements of stem diameter, LAI, and plant height. By comparison, collecting these traits manually would have required approximately 192 h of human effort (30 min per experimental unit) across three separate operations: (1) removing leaves from four random plants and scanning them through a leaf area meter in the lab, (2) measuring plant height with a measuring stick on four plants, and (3) measuring stem diameter on ten plants with calipers. Simultaneous measurement of these three traits has already been shown to predict sorghum biomass43 and could also be applied to maize phenotyping.
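The time comparison above is simple arithmetic on the reported rates. The sketch below reproduces it from the plot count, the average robot throughput, and the per-plot manual time; it covers in-field collection only and does not count data processing.

```python
def robot_hours(n_units: int, units_per_hour: float) -> float:
    """In-field collection time at a given robot throughput."""
    return n_units / units_per_hour


def manual_hours(n_units: int, minutes_per_unit: float) -> float:
    """Manual effort at a fixed per-unit time."""
    return n_units * minutes_per_unit / 60


N_UNITS = 384                       # experimental units in case study II
robot = robot_hours(N_UNITS, 250)   # average robot throughput of 250 EUs per hour
manual = manual_hours(N_UNITS, 30)  # 30 min per EU across the three manual operations
print(f"robot: {robot:.2f} h, manual: {manual:.0f} h, ratio: {manual / robot:.0f}x")
```

This gives about 1.54 h of robot collection, close to the approximately 1.6 h reported, against 192 h of manual effort, roughly a 125-fold difference.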

A robotic phenotyping platform thus makes a number of significant contributions to small-plot, multi-factor research trials. First, the number of treatments can be increased, as the desired intensity of phenotyping can be applied to the entire experiment rather than being limited by the human labor available for phenotyping. Second, additional locations can be phenotyped to measure the treatment-by-location interactions that are critical for broad applicability of research results. Labor constraints limit manual data collection to a few individual plants per plot, whereas the robot collects data from most, if not all, of the plants in each experimental unit.

When many people take measurements or collect samples, there is also substantial opportunity for error or variation among the data collected by different individuals, limiting overall data quality and potential usefulness. To evaluate this, we calculated the coefficient of variation (CV) for human- and robot-measured traits used for trait validation. For three traits (plant height, ear height, and stem diameter), the CV was similar (less than 4% difference between robot and human measurements), whereas for LAI the CV was reduced by 16% using the robot compared with human measurements (Supplementary Fig. 2 and Supplementary Table 1). Identical robotic phenotyping platforms deployed across locations remove many of these measurement errors and make it possible to combine datasets across locations, seasons, and research groups in meaningful ways.
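The CV comparison can be sketched as follows. The CV is the sample standard deviation divided by the mean, expressed in percent. The text does not state whether the 16% LAI reduction is relative or in percentage points, so the helper below assumes a relative reduction; the LAI values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values: list[float]) -> float:
    """Sample coefficient of variation in percent (sample SD / mean x 100)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * stdev(values) / m


def relative_cv_reduction(cv_reference: float, cv_new: float) -> float:
    """Percent reduction of cv_new relative to cv_reference
    (assumes a relative, not percentage-point, comparison)."""
    return 100 * (cv_reference - cv_new) / cv_reference


# Hypothetical LAI values for the same validation plots, measured by hand and by robot
manual_lai = [4.1, 4.9, 3.8, 5.2, 4.4]
robot_lai = [4.4, 4.7, 4.1, 5.0, 4.5]
cv_manual = coefficient_of_variation(manual_lai)
cv_robot = coefficient_of_variation(robot_lai)
print(f"CV manual {cv_manual:.1f}%, robot {cv_robot:.1f}%, "
      f"reduction {relative_cv_reduction(cv_manual, cv_robot):.0f}%")
```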

Case study III. Phenotyping genetic diversity for N-response traits

Breeding programs often use managed environments to evaluate germplasm responses to key drivers of productivity, such as soil nutrient supply and water availability. Managed environments enable scaling of precision phenotyping to reveal genetic variation under stress conditions that may occur in the target population of environments12. The University of Illinois maintains a field location where the soils have been managed to limit nitrogen supply in an otherwise high-yield-potential environment. This field site has previously enabled evaluation of nitrogen-responsive phenotypes in maize at population scale22,48. The TerraSentia robot was deployed to phenotype replicated plots of at least 44 experimental hybrids in both 2022 and 2023, where each hybrid was grown with either no N fertilizer or 200 kg ha⁻¹ of N side-dressed at the V3 growth stage. In 2022, data were collected at two times post-flowering (July 22 and August 17) from 312 plots, and in 2023, at the R2 growth stage (July 25) from 480 plots.

Case study II indicated that LAI was the only robot-produced trait that showed a significant hybrid-by-N-rate interaction (Fig. 2A). Figure 3A shows the variation in LAI for the 44 hybrids grown in 2023. Among the 217 total N-response plots, LAI values were greater with nitrogen application for all but three, making LAI an accurate classifier of N treatment regardless of genotype. Each hybrid displayed a positive average N response for LAI, ranging from 1.12 to 1.47. Comparing the six hybrids with the lowest (<1.2) and highest (>1.48) N response indicates that the most responsive hybrids have both lower LAI at low N (mean 3.64 versus 4.01) and higher LAI at high N (mean 5.28 versus 4.83).
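The reported N response values (1.12 to 1.47) are consistent with a ratio of LAI with N fertilizer to LAI without it, but the text does not define the metric explicitly, so the sketch below assumes that definition. As a rough check, it applies the ratio to the group means quoted above (a ratio of group means, not a mean of per-hybrid ratios).

```python
from statistics import mean


def n_response_ratio(lai_fertilized: list[float], lai_unfertilized: list[float]) -> float:
    """N response of LAI for one hybrid, assumed here to be the ratio of
    mean LAI with N fertilizer to mean LAI without N fertilizer."""
    return mean(lai_fertilized) / mean(lai_unfertilized)


def rank_hybrids(lai: dict[str, dict[str, list[float]]]) -> list[tuple[str, float]]:
    """Hybrids sorted from least to most N-responsive.
    `lai` maps hybrid -> {"high_n": [...], "zero_n": [...]} replicate LAI values."""
    ratios = {h: n_response_ratio(v["high_n"], v["zero_n"]) for h, v in lai.items()}
    return sorted(ratios.items(), key=lambda item: item[1])


# Group means from the text for the most and least responsive hybrids
print(round(n_response_ratio([5.28], [3.64]), 2))  # most responsive group
print(round(n_response_ratio([4.83], [4.01]), 2))  # least responsive group
```

The two group means give ratios of about 1.45 and 1.20, near the upper and lower ends of the reported range.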

Fig. 3: Robotic phenotyping from a managed soil-environment enables assessment of maize nitrogen response.

A Nitrogen response of leaf area index (LAI) for 44 maize hybrids collected from three replicate plots in Urbana, Illinois on July 25, 2023. B Correlations among the traits collected by the robot at multiple times during the 2022 and 2023 seasons, manual measurement of ear and plant height, and the component traits for nitrogen use efficiency. C Regression between stalk biomass predicted from robotically collected traits and manual measurements from the same plots collected in 2022.

Figure 3B shows the results of correlation analyses among robot-collected traits at different times over two seasons, ground-truth data for plant height and ear height, and component traits for nitrogen use efficiency (NUE) measured at physiological maturity. As Fig. 3A suggests, LAI has the highest positive correlation with N rate of all factors considered. Although the correlation was significant in both years, the stronger correlation between LAI and N rate in the July 2023 dataset than in the August 2022 dataset likely reflects LAI values being collected near their peak, before the onset of nutrient remobilization and leaf senescence. The moderate correlation between LAI values collected in August 2022 and July 2023 for the same hybrids indicates some consistency across seasons, although the rank correlation among genotypes was not statistically significant. July 2023 LAI values were positively correlated with each of the NUE component traits known to increase with N supply and negatively correlated with nitrogen utilization, which decreases with fertilization. Despite being closer in time to final harvest, the August 2022 LAI values showed only a moderate correlation, and only with grain yield, suggesting that canopy photosynthetic capacity earlier in the season drives these relationships.
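The analysis above uses both linear correlations and a rank correlation among genotypes across seasons. A minimal, dependency-free sketch of both is below; it assumes Spearman's rho as the rank statistic (computed as the Pearson correlation of average ranks, so tied values are handled) and omits the significance tests that accompany the reported correlations.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical hybrid-mean LAI in two seasons
lai_2022 = [4.1, 4.6, 3.9, 5.0, 4.4]
lai_2023 = [4.8, 5.1, 4.9, 5.6, 4.7]
print(round(pearson(lai_2022, lai_2023), 2), round(spearman(lai_2022, lai_2023), 2))
```

Reporting a rank correlation alongside Pearson's r is useful for genotype comparisons because breeders select on rank order, which can be unstable even when overall LAI levels track between seasons.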

In this experiment, LAI was not correlated with either ear or plant height, consistent with LAI capturing spectral rather than architectural features of the crop canopy. Trait values collected with the robots were highly correlated (r² > 0.93) with ground-truth measurements of both ear and plant height from a subset of the 2022 plots, further demonstrating the accuracy of the onboard algorithms across a wide range of phenotypic variation arising from both genetic diversity and N supply. Furthermore, as expected for height traits with high heritabilities31, TerraSentia phenotypes were strongly correlated across measurement times and seasons. Among NUE component phenotypes, ear and plant height showed a significant positive correlation only with stalk biomass, and this correlation was stronger for data collected later in the season, nearer to final harvest. The stronger correlation of stalk biomass with ear height than with plant height is likely due to the higher proportion of stalk biomass accumulated below the ear node than above it.

The relationships between robot-measured traits and human-measured stalk biomass indicate the potential to measure stalk biomass in an automated, non-destructive manner. In addition to height and LAI, the robots delivered average stem diameter for each plot. Stem diameter was correlated with LAI but not with height, and in July 2023, LAI and stem diameter showed very similar associations with NUE traits. One concern was the substantial reduction (25.3%, Supplementary Table 1) in heritability for robot-measured stem diameter compared with manual measurements. However, measuring stem diameter in a high-throughput, automated fashion creates the opportunity to (1) repeat the measurement multiple times to reduce variation and increase heritability and (2) use allometric relationships to estimate total plant biomass15. Figure 3C illustrates how a linear regression model including ear height, plant height, and their interaction provides a good prediction of genetic variation in stalk biomass for both fertilized and unfertilized plots from the 2022 experiment. The robot traits were collected with a single robot in under four hours by one person, and much earlier in the season, compared with approximately 100 person-hours spent measuring stalk biomass. The estimated broad-sense heritability for stalk biomass in the 2022 experiment was 0.89 when measured manually and 0.76 when predicted from the TerraSentia trait model, indicating that the 25-fold reduction in effort from automated phenotyping of stalk biomass may be achieved without a major loss in potential genetic gain.
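A regression of the form described for Fig. 3C, stalk biomass = b0 + b1·EH + b2·PH + b3·EH·PH, can be fit by ordinary least squares. The sketch below assumes plain OLS with NumPy; the study's exact fitting procedure is not described here, and the data are synthetic, in arbitrary biomass units.

```python
import numpy as np


def design_matrix(ear_height, plant_height) -> np.ndarray:
    """Columns: intercept, ear height, plant height, ear x plant height."""
    eh = np.asarray(ear_height, dtype=float)
    ph = np.asarray(plant_height, dtype=float)
    return np.column_stack([np.ones_like(eh), eh, ph, eh * ph])


def fit_biomass_model(ear_height, plant_height, stalk_biomass) -> np.ndarray:
    """Ordinary least-squares coefficients for
    biomass = b0 + b1*EH + b2*PH + b3*EH*PH."""
    X = design_matrix(ear_height, plant_height)
    coef, *_ = np.linalg.lstsq(X, np.asarray(stalk_biomass, dtype=float), rcond=None)
    return coef


def predict_biomass(coef: np.ndarray, ear_height, plant_height) -> np.ndarray:
    return design_matrix(ear_height, plant_height) @ coef


def r_squared(observed, predicted) -> float:
    y = np.asarray(observed, dtype=float)
    ss_res = np.sum((y - np.asarray(predicted, dtype=float)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1 - ss_res / ss_tot)


# Synthetic plot-level data; real inputs would be robot traits and manual biomass
rng = np.random.default_rng(seed=1)
eh = rng.uniform(60, 150, size=40)    # ear height, cm
ph = rng.uniform(180, 300, size=40)   # plant height, cm
biomass = 50 + 2.0 * eh + 1.5 * ph + 0.01 * eh * ph + rng.normal(0, 20, size=40)

coef = fit_biomass_model(eh, ph, biomass)
print(coef, r_squared(biomass, predict_biomass(coef, eh, ph)))
```

The interaction term lets the effect of plant height on predicted biomass depend on ear height, which matches the idea that stalk mass is concentrated below the ear node.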

Case study IV. From research trials to on-farm system evaluations

Genotype selection, coupled with management choices and environmental events, defines the outcome in a farmer's field. This interaction is best understood by phenotyping GxM decisions within the relevant field environment4; however, this is difficult because of field size, accessibility, and within-field heterogeneity, which complicates deciding where traits should be collected.

We were able to cover large transects of the field (Fig. 4A) to capture the influence of soil type (E) on phenotypic traits, which were expected to respond primarily to nitrogen rate (M) and hybrid (G). Grain yield and LAI differed between the two states and among N rates but did not differ (p > 0.05) between the two hybrids or for any of the interactions among state, N rate, and hybrid (Fig. 4C). LAI measurements from the Nebraska and Indiana fields delivered several useful insights. Canopy size was greater at the irrigated Nebraska location than at the Indiana location for all N rates (Fig. 4B). Of particular interest was the 0 kg N ha⁻¹ rate, as it represents the potential of the soil to provide N to the crop via available NO3-N or through mineralization. As expected, LAI was greater at higher N rates but did not respond to N rates greater than 134 kg N ha⁻¹ at the Nebraska site, whereas 202 kg N ha⁻¹ was needed to reach maximum LAI at the Indiana site. Grain yield was linearly correlated with LAI (Fig. 4D) and did not exhibit any saturation effect.
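One common way to locate the N rate beyond which LAI stops responding is a linear-plateau model, LAI = a + b·min(N, c). The text does not say how the 134 and 202 kg N ha⁻¹ thresholds were identified, so this is a swapped-in technique, not the study's method. The sketch grid-searches the breakpoint c over the tested N rates (a simplification; continuous breakpoints are also common) and fits a and b by least squares for each candidate. The LAI values are hypothetical.

```python
def fit_linear_plateau(n_rates, response, candidate_breaks):
    """Fit response = a + b * min(N, c) by grid search over breakpoint c.

    For each candidate c, a and b are ordinary least-squares estimates on
    z = min(N, c). Returns (c, a, b, sse) for the candidate with lowest SSE.
    """
    best = None
    n = len(n_rates)
    my = sum(response) / n
    for c in candidate_breaks:
        z = [min(x, c) for x in n_rates]
        mz = sum(z) / n
        szz = sum((v - mz) ** 2 for v in z)
        if szz == 0:  # breakpoint at or below every rate: slope not identifiable
            continue
        b = sum((v - mz) * (w - my) for v, w in zip(z, response)) / szz
        a = my - b * mz
        sse = sum((w - (a + b * v)) ** 2 for v, w in zip(z, response))
        if best is None or sse < best[3]:
            best = (c, a, b, sse)
    if best is None:
        raise ValueError("no usable breakpoint among the candidates")
    return best


# Hypothetical treatment-mean LAI at the five N rates (kg N/ha) used in the trials
rates = [0, 67, 134, 202, 269]
lai = [3.1, 4.0, 4.6, 4.7, 4.6]
c, a, b, sse = fit_linear_plateau(rates, lai, candidate_breaks=rates[1:])
print(f"plateau begins near {c} kg N/ha; LAI ~ {a:.2f} + {b:.4f} * min(N, {c})")
```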

Fig. 4: Data collection from an on-farm, multi-N, multi-hybrid experiment on 6 July in Indiana and 22 July in Nebraska during 2021.

A Layout of multi-N treatments across a farmer's field in southern Indiana. Colors indicate the N rates of 0, 67, 134, 202, and 269 kg N ha⁻¹ applied at planting. The length of each experimental strip is provided for reference. B Leaf area index and grain yield produced at the field trial shown in (A) and a similarly designed trial in Nebraska. Fitted trait responses were significantly different between the two locations for both grain yield (p-value < 0.001) and LAI (p-value = 0.0085). C ANOVA for yield and LAI response to State, Nitrogen, Hybrid, and interaction effects. D Regression analysis supports a strong linear relationship (R² = 0.957, p-value < 0.001) between green leaf area index and grain yield using N treatment and hybrid mean values from the Indiana and Nebraska locations.

Given the size of these experiments (740 m long in Indiana and up to 1330 m long in Nebraska; Fig. 4A), collecting LAI, stem diameter, and plant height would have been economically and physically infeasible with conventional manual methods. Greater leaf area index, a trait highly sensitive to management and environment, specifically through N and water availability, offers the potential benefit of greater N storage in plant biomass18, which is expected to be advantageous later in the season as an N reserve to complete grain filling. However, greater LAI could come at a cost if water is limited and the greater leaf area drives transpiration in excess of the water available in the soil. The ability to phenotype LAI non-destructively and efficiently throughout the season provides insight into canopy expansion in response to water and N availability and enables calculation of grain yield production efficiency relative to investment in leaf area.

Case study V. Deeper understanding of hybrid evaluation trials with previously elusive traits

Interactions between genotypes and the environment determine the phenotype expressed at a particular location3, and using multiple locations provides additional information on the GxE interaction. Precipitation is a particularly important environmental factor known to influence GxE. In 2022, LAI, plant height, and grain yield were collected for a set of pre-commercial hybrids at three locations in Ontario: Harriston, Atwood, and Hensall. Rainfall in July and August was below the 20-year historical average at each location but was more limited at Atwood and Hensall than at Harriston. Traditional methods of LAI data collection are very labor-intensive, making LAI collection impractical in large maize breeding programs. The robot offered the novel opportunity to collect LAI at multiple research locations multiple times throughout crop development, allowing us to understand how LAI changes over the maize life cycle in response to environmental conditions and, consequently, to investigate genotypic variation in this response among pre-commercial lines.

LAI differed notably among locations. The pre-commercial population at Harriston had an average LAI of 6.5, whereas maximum LAI was 5.25 at Atwood and 5.15 at Hensall (Fig. 5A). Plant height was lower at the sites that received less precipitation, averaging 226 cm at Harriston versus 212 cm at Atwood and 208 cm at Hensall (Fig. 5B). Yield was also lower at these sites: Harriston averaged 14,250 kg ha⁻¹, whereas Hensall and Atwood each averaged 12,250 kg ha⁻¹ (Fig. 5C). Principal component analysis (PCA) indicated a strong correlation of grain yield with both maximum LAI and LAI at R5 (Fig. 5D). Greater LAI at Harriston indicated the potential for greater solar radiation interception, as well as more N stored in the leaves to support kernel filling through remobilization. Notably, LAI has not been reported as a phenotype under active selection in breeding trials. Only recently has LAI been collected via UAS during the growing season on a breeding population10; that work demonstrated that QTLs associated with LAI explain a substantial amount of yield variation, indicating the potential benefit of using LAI as a selection index in maize breeding. Expanding LAI phenotyping to multiple locations across a more diverse set of hybrids and trials will provide insight into how a specific genotype achieves its observed grain yield and enable the future use of LAI as an informative selection metric.
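A PCA like the one summarized for Fig. 5D can be sketched as an eigendecomposition of the correlation structure of plot-level variables. The sketch below assumes the variables are maximum LAI, LAI at R5, and grain yield, each standardized to unit variance because they are on very different scales; the study's exact preprocessing is not stated, and the plot values are hypothetical.

```python
import numpy as np


def standardized_pca(data):
    """PCA of plot-by-variable data after scaling each column to unit variance.

    Returns (explained_variance_ratio, loadings, scores), with components
    ordered from largest to smallest variance."""
    X = np.asarray(data, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # eigh returns ascending eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals / eigvals.sum(), eigvecs, Z @ eigvecs


# Hypothetical plots: columns are maximum LAI, LAI at R5, grain yield (kg/ha)
plots = np.array([
    [6.4, 3.9, 14100], [6.7, 4.2, 14500], [6.3, 3.7, 13900],
    [5.3, 2.8, 12400], [5.1, 2.6, 12100], [5.2, 2.9, 12300],
])
ratio, loadings, scores = standardized_pca(plots)
print(ratio.round(3))
print(loadings[:, 0].round(2))  # loadings of the first principal component
```

When LAI and yield load in the same direction on the first component, plots from the wetter site separate from the drier sites along that axis, which is the pattern described for Harriston.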

Fig. 5: Traits collected from a pre-commercial hybrid experiment conducted at Harriston, Atwood, and Hensall, Ontario, Canada during 2022.

A Robot-measured leaf area index (LAI), B robot-measured plant height (PHT), and C combine grain yield for a pre-commercial breeding trial conducted at locations near Harriston (green; greater precipitation) and Atwood and Hensall (orange and blue, respectively; less precipitation), Ontario, during 2022. D Principal component analysis of LAI (measured two weeks after silk appearance to obtain maximum LAI, and at the R5 growth stage to capture the LAI remaining at the end of the growing season) and grain yield, showing the separation of the Harriston plots from the similar Hensall and Atwood plots.

Robotic platforms that facilitate multi-time-point data collection during the season provide the opportunity to quantify rates of canopy expansion and biomass accumulation as well as end-of-season canopy loss. Measuring LAI and plant height during the season indicates when stress begins and to what extent. Identifying when stress occurs is useful for grouping locations with similar stress patterns into a combined multi-location analysis49. These time-course trait measurements could serve as phenotypic descriptors to inform “enviromics” and further the understanding of genotype-by-environment interactions.
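As an illustration of how such rates could be derived, the sketch below computes interval rates of LAI change from a multi-time-point series for one plot, together with the peak LAI and an end-of-season loss rate measured from the peak to the last observation. It is a minimal sketch under stated assumptions, not the authors' analysis: the function names and the example LAI series are hypothetical, and dates are assumed to be expressed as days after planting.

```python
from typing import List, Tuple


def interval_rates(days: List[float], lai: List[float]) -> List[float]:
    """LAI change per day between consecutive measurement dates."""
    if len(days) != len(lai) or len(days) < 2:
        raise ValueError("need at least two paired (day, LAI) observations")
    return [
        (lai[i + 1] - lai[i]) / (days[i + 1] - days[i])
        for i in range(len(days) - 1)
    ]


def canopy_summary(days: List[float], lai: List[float]) -> Tuple[float, float, float]:
    """Return (maximum expansion rate, peak LAI, end-of-season loss rate).

    The loss rate runs from the peak to the last measurement and is negative
    when green leaf area declines; it is 0.0 if the peak is the last point.
    """
    rates = interval_rates(days, lai)
    peak = max(range(len(lai)), key=lambda i: lai[i])
    if peak == len(lai) - 1:
        loss_rate = 0.0  # no decline observed yet
    else:
        loss_rate = (lai[-1] - lai[peak]) / (days[-1] - days[peak])
    return max(rates), lai[peak], loss_rate


# Hypothetical LAI series for one plot (days after planting).
print(canopy_summary([40, 55, 70, 90, 110], [1.0, 4.0, 5.5, 5.0, 3.5]))
```

Summarizing each plot by a few rates rather than raw LAI values is one way to make series collected on different dates comparable across locations.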

Screening large breeding populations is required to leverage the genetic diversity available for crop improvement50. The success of high-throughput phenotyping across large populations depends not only on the capacity of the phenotyping platform to collect data in a high-throughput manner, but also on the capacity to transfer these large datasets for processing, extract meaningful trait information from imagery and other sensor data, and compile and publish the resulting trait data to enable downstream analyses8. The objective for high-throughput phenotyping is to create a virtuous cycle that automates crop phenotyping, trait computation, data analysis, and selection, followed by the next season of phenotyping42.

Future applications: automating “old” traits, developing new traits, and increasing scale

Deployment of the TerraSentia robots at multiple locations with different types of GxExM experimental designs between 2019 and 2023 demonstrates that ground-based autonomous robots can be deployed at sufficient scale to provide data at geographic scales that match data generated by other sources, such as UAVs, combine harvesters, and genotyping. Both UAVs and ground-based robots can be equipped with imaging cameras and LiDAR, collecting similar data types but from different perspectives, above or below the canopy. Both methods perform very well for plant height and thus appear to offer complementary information. However, for traits such as ear height that are best observed from within rather than above the maize canopy, the ground-based robot offers advantages, particularly when ear-height estimates are collected in conjunction with other traits. Figure 3C illustrates one example in which we combined ear-height estimates with another data type (stover biomass) in a small training set to develop a model that could predict stover biomass for additional hybrids grown in the same trial.

Crop scientists are entering an era of phenotyping in which multiple methods will be available to phenotype the same trait of interest. For example, stem diameter can still be measured with a digital caliper, but an image-based method is now also available. The situation is similar for LAI, for which destructive sampling, handheld sensors, drones, and now in-canopy robots can all provide a measurement. Once a new measurement method has been developed and validated as equivalent to previous methods, swapping methods based on logistics or the availability of technology or labor should be seamless to the user and allow datasets to be built from multiple sources of trait measurements. Robotic, in-canopy trait collection opens an entirely new area of research linking these in-canopy traits to marker association, crop selection, and predictive breeding to further investigate and elucidate GxExM effects. Additionally, we demonstrate that the trait measurements derived from the in-canopy multi-sensor data are agronomically relevant and useful for understanding crop growth, crop stress, and genetic diversity.

Comparison of data collected by human measurement on multi-rate N or plant density trials with the robotic data shows strong similarities in the responses of plant height, stem diameter, leaf area index, and ear height estimated by both methods. Each of these traits has historically been collected at lower coverage through less efficient manual methods. Moving forward, there is potential to phenotype other rarely studied traits captured via in-canopy sensing, such as the influence of brace root characteristics (size, width, origin node, diameter of ground coverage) on lodging risk51. Automated counting of tillers, which form around the base of the main maize stem, could likewise allow this trait to be scored for multiple hybrids at multiple locations to determine its relationship to grain yield52. The list of novel traits that become routine will continue to grow with additional sensors, robotic platforms, and data analytics.

Over the four years of operation, TerraSentia robots navigated along each corn row autonomously, with a user simply following the robot to recover from any crashes and to turn it at the end of the field to begin the next pass of data collection. Future capabilities, currently under development and testing, will bring reliable full-field autonomy to data collection. The robots will navigate through the field autonomously, staying between the crop rows and recovering independently when an obstruction occurs. Additionally, the robots will turn autonomously at the end of each pass by navigating between GPS waypoints before starting the next pass of data collection. Human involvement will then be minimal beyond delivering the robot to the field and exchanging batteries. When “swarming” robots becomes a reality, data collection rates can greatly increase, as each robot is assigned a particular section of the field as its mission.

The next generation of phenotyping traits will be delivered by autonomous, in-canopy robots. Through their capacity to digitize traits critical to crop performance, at high densities of measurement in both time and space and on many genotypes, phenotyping robots will contribute significantly to increasing grain yield, reducing yield gaps, and optimizing agronomic crop management. This technological advance will enhance both the productivity and sustainability of agriculture and the future of food production.

Materials and methods

A ground-based robot capable of driving between maize rows spaced 0.76 m apart (Fig. 1A) was developed by EarthSense, Inc. (Champaign, IL) beginning in 2017. A collaboration between Corteva Agriscience (Johnston, IA) and EarthSense began in 2019, and a collaboration between the University of Illinois (Champaign-Urbana, IL) and EarthSense started in 2021, to determine whether phenotypic traits important to maize growth could be measured from in-canopy multi-sensor data collected by this robot, named TerraSentia.

TerraSentia is a configurable, terrestrial robotic platform that autonomously collects under-canopy data while running between rows of maize or other row crops. The sensors mounted on board include three cameras for RGB video capture, lidar for 3D point clouds, fisheye-lens cameras for hemispherical photography, and depth cameras that provide RGB with depth information. The robot can traverse an experimental unit (5.3 m long) in about 6 s, capturing data for every plant in the unit, and typically runs autonomously for 180 m without manual intervention (i.e., without the user taking control of the robot from a tablet to drive it) once the user has placed it between the rows and started navigation. Additional information specific to the TerraSentia robot and its method of navigation within a plant canopy is provided in the following publications37,40,41,53,54.

The TerraSentia robot is capable of autonomous row-following, meaning it can effectively navigate long distances down rows of crops with minimal input from the user. Row-following is enabled by a combination of LiDAR sensors, cameras, and GPS mounted on the robot (Supplementary Fig. 1A). The onboard computer processes the data from these sensors to map the locations of the crop rows on each side and the orientation of the robot relative to the rows. This information about the robot and its surroundings is then used for path planning and decision making.

The TerraSentia robot is equipped with three adjustable cameras, one on the front and one on each side, and a rear LiDAR sensor that scans the regions above and to the side of the robot. Depending on configuration, the top of the robot can carry either an upward-facing camera with a fisheye lens for LAI data collection or a front-facing depth camera angled upward at 25 degrees. These sensors are used for data collection and trait estimation. Additional information on the development of methods for measuring stand count and stem width in highly cluttered field environments can be found in publications specific to the TerraSentia robot55,56,57.

The robot is configured and maneuvered through a user-friendly tablet interface (Supplementary Fig. 1C). The user can specify details of a data collection mission, such as the dimensions of the field and the planting date. Furthermore, the angles, resolutions, fields of view, and frame rates of the cameras can be customized programmatically for specific traits. These features allow the robot to record data for at least five distinct traits simultaneously (e.g., stem diameter using the left-side camera, plant count using the right-side camera, LAI using the hemispherical lens on top of the robot, and plant height and canopy architecture using the rear lidar), with accurate metadata for future reference.

Data processing

Videos and sensor data collected by the robot, along with metadata input by the user (columns, ranges, crop, desired trait, etc.), are uploaded to the cloud after the robot returns to a base location with stable electrical power (to recharge the robot) and internet connectivity (i.e., not in the field). The videos are pre-processed with automatic checks for quality and integrity and run through an automatic plot-splitting algorithm (Supplementary Fig. 1D). The automatic plot splitter uses a deep learning model to find the periodic gaps visible in the lidar scans of the canopy, which occur between the experimental units being scanned (Fig. 1B). The outputs of the automatic data verification and plot-splitting processes are then reviewed manually to ensure data quality (e.g., an error in plot splitting can lead to catastrophic “off-by-one” errors in the alignment of plot results). The alignment of the split plots with their associated metadata (column/range) is checked against GPS coordinates to further verify plot assignment. The multi-sensor data are then fed into the appropriate algorithm, and the raw results per scan or frame are aggregated and post-processed into final estimates of the desired traits for each experimental unit, which are reported back to the user.
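The plot splitter described above uses a deep learning model; the sketch below swaps in a much simpler threshold heuristic purely to illustrate the idea of locating inter-plot gaps in a lidar-derived canopy profile. It assumes a hypothetical per-scan `canopy_density` value (for example, the fraction of lidar returns falling in the canopy band) indexed by along-row distance, and treats any sufficiently long low-density run as the alley between experimental units. The threshold, minimum gap width, and function names are illustrative assumptions.

```python
from typing import List, Optional, Tuple


def find_plot_boundaries(
    positions_m: List[float],
    canopy_density: List[float],
    threshold: float = 0.2,
    min_gap_m: float = 0.3,
) -> List[float]:
    """Return along-row positions (m) at the centre of each canopy gap.

    A gap is a run of consecutive scans whose canopy density falls below
    `threshold` and spans at least `min_gap_m`; shorter dips (a single
    missing plant, sensor noise) are ignored.
    """
    if len(positions_m) != len(canopy_density):
        raise ValueError("positions and densities must be the same length")
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, density in enumerate(canopy_density):
        if density < threshold and start is None:
            start = i  # entering a low-density run
        elif density >= threshold and start is not None:
            runs.append((start, i - 1))  # leaving the run
            start = None
    if start is not None:  # run continues to the end of the row
        runs.append((start, len(canopy_density) - 1))
    return [
        (positions_m[a] + positions_m[b]) / 2
        for a, b in runs
        if positions_m[b] - positions_m[a] >= min_gap_m
    ]


def split_into_plots(
    boundaries: List[float], row_start: float, row_end: float
) -> List[Tuple[float, float]]:
    """Convert gap centres into (start, end) intervals, one per experimental unit."""
    edges = [row_start] + sorted(boundaries) + [row_end]
    return list(zip(edges[:-1], edges[1:]))


# Hypothetical profile: scans every 0.1 m, a 0.5 m alley at 5.0-5.5 m,
# and a one-scan dip at 8.0 m caused by a missing plant.
positions = [i * 0.1 for i in range(120)]
density = [0.0 if 50 <= i <= 55 or i == 80 else 1.0 for i in range(120)]
gaps = find_plot_boundaries(positions, density)
print(split_into_plots(gaps, positions[0], positions[-1]))
```

Requiring a minimum gap width is one way to guard against the “off-by-one” failure mode, since a single missing plant would otherwise be read as a plot boundary; the resulting intervals would still need to be checked against column/range metadata and GPS coordinates.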

The plant-height algorithm uses the point cloud from the vertically scanning lidar and takes the highest measured points as the height of the maize plant58. This estimate is calibrated to correspond to tassel height in maize. Because manually measured tassel height and uppermost leaf height correlate extremely well, validation against uppermost leaf height is also effective.
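A minimal sketch of the “highest measured points” idea, assuming the plot's lidar returns have already been converted to elevations in metres and a soil-level reference is known. Using a high nearest-rank percentile (the 99th here, an assumption) rather than the single maximum is a common way to make the estimate robust to stray returns; the published algorithm58 and its tassel-height calibration are not reproduced, and the function name is hypothetical.

```python
import math
from typing import Sequence


def plant_height_from_lidar(
    point_z_m: Sequence[float], ground_z_m: float, percentile: float = 99.0
) -> float:
    """Estimate canopy (tassel) height from lidar point elevations in one plot.

    Uses a nearest-rank upper percentile instead of the absolute maximum so
    that a few spurious returns (insects, dust) do not inflate the estimate.
    """
    if len(point_z_m) == 0:
        raise ValueError("no lidar points for this plot")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(point_z_m)
    rank = math.ceil(percentile * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1] - ground_z_m


# Hypothetical plot: most upper-canopy returns near 2.0-2.2 m plus one stray point.
z = [2.0] * 95 + [2.2] * 4 + [9.9]
print(plant_height_from_lidar(z, ground_z_m=0.0))
```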

Stem width is estimated from RGB imagery combined with distance-to-plant-row estimates. Stems are detected using the Mask R-CNN instance segmentation algorithm. Stem widths in pixels are extracted at several sampling points along each vertical stem and converted to real widths using the distance to the stem, estimated from the robot’s lidar and odometry data59. Correlation with manual data can be complicated by millimeter-scale measurement error and a limited measurement window: young plants tend to have many leaves blocking the stem, and later-stage maize tends to have drooping leaves that block the stems.
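The pixel-to-metric conversion can be sketched with a pinhole camera model, in which an object of real width W at distance Z spans W·f/Z pixels for a focal length of f pixels, so W = w·Z/f. The sketch assumes a calibrated focal length and a single camera-to-stem distance per stem, and aggregates the sampling points by their median; these are illustrative choices with a hypothetical function name, not the published method59.

```python
from statistics import median
from typing import Sequence


def stem_width_mm(
    pixel_widths: Sequence[float], distance_m: float, focal_length_px: float
) -> float:
    """Convert stem widths sampled in pixels to millimetres (pinhole model).

    The median over sampling points along the stem damps errors from
    segments that are partially occluded by leaves.
    """
    if distance_m <= 0 or focal_length_px <= 0:
        raise ValueError("distance and focal length must be positive")
    return median(pixel_widths) * distance_m * 1000.0 / focal_length_px


# Hypothetical stem sampled at five heights, 0.5 m from a camera with f = 600 px.
print(stem_width_mm([20, 22, 24, 21, 23], distance_m=0.5, focal_length_px=600.0))
```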

The ear-height algorithm processes data gathered by an RGB + depth sensor. A machine-learning-based object recognition algorithm detects maize ears, and the depth information is then used to estimate the geometric properties of each ear in image space. This geometry is adjusted for the location and orientation of the camera, which are informed by IMU sensors in the robot and in the camera itself, and the height of the ear is computed from the adjusted geometry. The reported ear height, averaged over a number of ears in the plot, correlates well with manual measurements of ear height (Supplementary Fig. 2A).
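The camera-orientation correction can be illustrated for the simplest case, a camera pitched upward with no roll. The sketch assumes the common depth-camera axis convention (x right, y down, z forward along the optical axis), a known camera height above the soil, and a pitch angle such as the 25° upward tilt of the depth camera, which in practice would come from the IMU. The function name, camera height, and detections in the usage example are hypothetical.

```python
import math
from statistics import mean


def ear_height_m(
    y_cam_m: float, z_cam_m: float, camera_height_m: float, pitch_deg: float
) -> float:
    """Height of an ear above the soil from its 3D position in the camera frame.

    With the camera pitched up by `pitch_deg` (no roll), rotating the point into
    a level frame gives a vertical offset of z*sin(pitch) - y*cos(pitch)
    relative to the camera centre (y points down in the image).
    """
    pitch = math.radians(pitch_deg)
    return camera_height_m + z_cam_m * math.sin(pitch) - y_cam_m * math.cos(pitch)


# Hypothetical ear detections (y, z) in metres from one plot; camera assumed
# 0.45 m above the soil and pitched up 25 degrees.
detections = [(-0.10, 0.90), (-0.05, 0.95), (0.00, 1.00)]
plot_ear_height = mean(ear_height_m(y, z, 0.45, 25.0) for y, z in detections)
print(round(plot_ear_height, 3))
```

Averaging over all detected ears in the plot mirrors the plot-level reporting described above.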

The LAI algorithm processes data gathered by an upward-facing camera with a fisheye lens capturing hemispherical imagery. The algorithm identifies every pixel corresponding to blue light from the sky penetrating the canopy (leaves are most opaque at blue wavelengths, so blue pixels can be attributed to sky rather than foliage) and then estimates LAI from the distribution of sky pixels using Miller’s integral formula60. This method is similar to that used by commercial LAI instruments such as the Li-Cor LAI-2200C (Lincoln, NE), and it has been verified previously20, including specifically for maize20,61.
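Miller’s formula expresses LAI as \(2\int_0^{\pi/2} -\ln P(\theta)\cos\theta\sin\theta\,d\theta\), where \(P(\theta)\) is the gap fraction at zenith angle \(\theta\). The sketch below classifies sky pixels by a blue-channel threshold, computes gap fractions in equal-width zenith rings assuming an equidistant fisheye projection, and evaluates the integral with the midpoint rule. The threshold value, projection model, integration over the full 0 to 90° range, and function names are simplifying assumptions; commercial instruments typically restrict the zenith range, and this is not the EarthSense implementation.

```python
import math
from typing import List, Sequence


def sky_mask(blue: Sequence[Sequence[int]], threshold: int = 180) -> List[List[bool]]:
    """Classify pixels as sky where the blue channel exceeds a hypothetical threshold."""
    return [[value > threshold for value in row] for row in blue]


def gap_fraction_by_ring(mask: Sequence[Sequence[bool]], n_rings: int) -> List[float]:
    """Fraction of sky pixels in equal-width zenith rings of a square fisheye image.

    Assumes the horizon circle is inscribed in the image and an equidistant
    projection, so zenith angle grows linearly with distance from the centre.
    """
    size = len(mask)
    centre = (size - 1) / 2
    radius = size / 2
    sky = [0] * n_rings
    total = [0] * n_rings
    for r_idx, row in enumerate(mask):
        for c_idx, is_sky in enumerate(row):
            r = math.hypot(r_idx - centre, c_idx - centre)
            if r >= radius:
                continue  # outside the hemispherical field of view
            ring = min(int(r / radius * n_rings), n_rings - 1)
            total[ring] += 1
            sky[ring] += is_sky
    if 0 in total:
        raise ValueError("a zenith ring contains no pixels; use fewer rings")
    return [s / t for s, t in zip(sky, total)]


def miller_lai(gap_fractions: Sequence[float], min_gap: float = 1e-6) -> float:
    """Midpoint-rule evaluation of Miller's integral over equal-width zenith rings.

    Gap fractions of zero are clipped to `min_gap` to keep the logarithm finite.
    """
    n = len(gap_fractions)
    d_theta = (math.pi / 2) / n
    total = 0.0
    for i, p in enumerate(gap_fractions):
        theta = (i + 0.5) * d_theta  # ring mid-angle
        total += -math.log(max(p, min_gap)) * math.cos(theta) * math.sin(theta) * d_theta
    return 2.0 * total


# Hypothetical 101 x 101 blue channel: bright sky except a dark leaf band.
blue = [[60 if 40 <= r <= 60 else 220 for _ in range(101)] for r in range(101)]
print(round(miller_lai(gap_fraction_by_ring(sky_mask(blue), n_rings=5)), 2))
```

A constant gap fraction \(P\) at all angles gives LAI = −ln P exactly in the continuous integral, which is a convenient sanity check on the numerical integration.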

Validation dataset collection

For each trait currently measured from multi-sensor data collected by the autonomous robot, a specific set of ground-truth data was collected manually, and a correlation analysis was conducted using R (R Core Team). This analysis provided the linear regression equation describing the relationship between measurement methods, the proportion of variation explained (R²), and the significance of the correlation between rover- and human-based measurements. In addition, the coefficient of variation (CV) for each trait and measurement method was calculated as the standard deviation divided by the mean to quantify the variation associated with each measurement method.

Stem diameter was measured manually with a digital caliper at the R1 (silk appearance) growth stage62. The caliper was placed around the stem parallel to the maize row, and the diameter was recorded to the nearest 0.1 mm. For stem diameter, 44 experimental units with between 10 and 40 plants each were measured, with every plant contributing a measurement. Measurements from the TerraSentia robot were well correlated (p < 0.001) with the caliper measurements, with an R² value of 0.89 and a slope of 1.02 (Supplementary Fig. 2B).

Plant height was measured manually with a tall measuring pole at the R2 growth stage (10–14 days after R1)62. Height from the soil surface was recorded to the top of the tassel and to the attachment point of the top leaf (leaf node) for 27 experimental units of about 40 plants each, with every third plant measured. Manually measured tassel height and leaf-node height were extremely highly correlated (R² = 1.00), with leaf-node heights slightly lower than tassel heights (the robot measurements are closer to the tassel measurements than to the leaf-node measurements). Plant heights delivered by the TerraSentia robots from LiDAR data were well correlated (p < 0.001) with the manual plant-height measurements, with an R² value of 0.97 and a slope of 0.97 (Supplementary Fig. 2D).

Ear-height measurements were validated on 79 experimental units of approximately 40 plants each. Manual measurements were made at the R5 growth stage (kernels dented)62, with every third plant measured. The distance from the soil surface to the ear attachment point on the stem (ear node) was recorded using the same pole used for plant height. Ear heights delivered by the TerraSentia robots were well correlated (p < 0.001) with the manual ear-height measurements, with an R² value of 0.87 and a slope of 0.83 (Supplementary Fig. 2A).

Leaf area index was validated in 50 experimental units approximately two weeks after R1, when the robot collected imagery with a 720p camera fitted with a hemispherical lens. Following robotic data collection, LAI was measured manually as follows. First, the length of the rows on either side of the robot was measured to the nearest cm; with a row spacing of 76 cm, the exact ground area could then be calculated. Next, the number of plants in each row was counted to establish plant density. All green leaves that were oriented upward (≥90°) were then removed from four randomly selected plants (two from each row), placed in a large plastic bag, and transported to the lab, where they were run through a LI-3100C leaf area meter (Li-Cor, Lincoln, Nebraska, USA) to determine the total leaf area (cm²) of the four plants. This value was divided by four to obtain the average leaf area per plant and further divided by 10,000 to convert it to m². Leaf area per plant (m²) was multiplied by plant density (plants m⁻²) to calculate leaf area index (m² leaf area m⁻² land area). The correlation between robot-measured and human-measured LAI was significant, but the R² value (0.5) was lower than for the height and diameter traits (Supplementary Fig. 2C).
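The manual LAI arithmetic can be written out as a short function. It assumes, as an illustration, that each measured row represents a ground strip one row spacing (0.76 m) wide, so ground area is total row length times row spacing; the function name and example values are hypothetical.

```python
from typing import Sequence


def manual_lai(
    scanned_leaf_area_cm2: float,
    plants_sampled: int,
    row_lengths_m: Sequence[float],
    plants_counted: int,
    row_spacing_m: float = 0.76,
) -> float:
    """LAI (m² leaf per m² ground) from destructive leaf sampling."""
    leaf_area_per_plant_m2 = scanned_leaf_area_cm2 / plants_sampled / 10_000
    ground_area_m2 = sum(row_lengths_m) * row_spacing_m  # assumed strip per row
    plant_density_per_m2 = plants_counted / ground_area_m2
    return leaf_area_per_plant_m2 * plant_density_per_m2


# Hypothetical plot: 24,000 cm² from four plants; two 5.0 m rows holding 40 plants.
print(round(manual_lai(24_000, 4, [5.0, 5.0], 40), 2))
```

For these values, 0.6 m² of leaf per plant at about 5.26 plants m⁻² gives an LAI of about 3.16.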

Case study 1

As part of the Corteva Agriscience commercial breeding program, trials are grown at several locations to expose pre-commercial lines to a range of environments and management practices and to determine crop response before further advancement. Five locations, near Griswold, IA, Prairie City, IA, Cameron, IL, Table Grove, IL, and Windfall, IN, were selected. The TerraSentia robot, carrying the RealSense depth camera, was used approximately three weeks before combine harvest to collect video for ear detection and ear-height calculation. A human operator followed approximately two meters behind the robot to reorient it after any crashes, save the videos at the end of each pass, and turn the robot at the row ends to start the next pass.

Data were analyzed by examining genotypic performance for the trait of interest, in this case ear height, both within single locations and across locations. For by-location analyses, we generated best linear unbiased estimates (BLUEs) of germplasm effects by trial using ASReml63. To generate BLUEs, we first fit the spatial model (AR1 × AR1) with genotype effects as random. Next, we refit the model with genotype as a fixed effect, fixing the other variance parameters. For across-location analyses, we generated best linear unbiased predictors (BLUPs) using ASReml. In this model, location is a fixed effect, genotype and genotype-by-location interaction effects are random, and the residual variance is fixed at 1, relying on the weights for each genotype from the BLUE analysis. One experiment conducted at three locations (Cameron, IL, Table Grove, IL, and Windfall, IN) was used for the BLUP analysis (Table 1). Heritability (the proportion of phenotypic variance attributable to genotypic variance) was calculated with the Cullis method using equation (1), where \(\sigma_g^2\) is the genotypic variance and \(\bar{v}_{\Delta}^{\mathrm{BLUP}}\) is the mean variance of a difference between two genotypic BLUPs64.

$$H_{\mathrm{Cullis}}^{2}=1-\frac{\bar{v}_{\Delta}^{\mathrm{BLUP}}}{2\,\sigma_{g}^{2}}$$
(1)
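Equation (1) can be evaluated once the prediction error variance (PEV) matrix of the genotype BLUPs and the genotypic variance are available from the mixed model. The sketch below averages Var(BLUPᵢ − BLUPⱼ) = PEVᵢᵢ + PEVⱼⱼ − 2·PEVᵢⱼ over all genotype pairs; the input matrix is assumed to have been extracted from the fitted model, and the function names and example values are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def mean_pairwise_diff_variance(pev: Sequence[Sequence[float]]) -> float:
    """Average variance of a difference between two genotype BLUPs."""
    n = len(pev)
    if n < 2:
        raise ValueError("need at least two genotypes")
    pairs = list(combinations(range(n), 2))
    return sum(pev[i][i] + pev[j][j] - 2 * pev[i][j] for i, j in pairs) / len(pairs)


def cullis_h2(pev: Sequence[Sequence[float]], sigma2_g: float) -> float:
    """Cullis heritability, equation (1): 1 - v_bar / (2 * sigma2_g)."""
    if sigma2_g <= 0:
        raise ValueError("genotypic variance must be positive")
    return 1 - mean_pairwise_diff_variance(pev) / (2 * sigma2_g)


# Hypothetical PEV matrix for three genotypes and a genotypic variance of 1.0.
pev = [[0.20, 0.05, 0.05],
       [0.05, 0.20, 0.05],
       [0.05, 0.05, 0.20]]
print(cullis_h2(pev, sigma2_g=1.0))
```

Because the measure uses the precision of pairwise BLUP comparisons, it remains meaningful for unbalanced, spatially analyzed trials where the classical variance-ratio form is hard to define.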

Case study 2

A multi-treatment, replicated small-plot trial (plots 5.4 m long by 3 m wide) was established near Windfall, Indiana, in a randomized complete block design. Treatments comprised a whole-plot factor of N rate (0, 112, 224, and 336 kg N ha⁻¹), a split-plot factor of plant density (4.4, 6.9, 9.4, and 11.9 plants m⁻²), and a split-split-plot factor of six Pioneer (Corteva Agriscience, Johnston, IA, USA) hybrids. The trial was planted on 20 May 2022 using a custom-designed vacuum planter (Almaco, Nevada, IA, USA), and the nitrogen treatments were applied on 24 May 2022 as urea-ammonium nitrate (UAN) injected approximately 20 cm from each row of the four-row experimental unit.

On 1 July, 29 July and 18 August, the TerraSentia robot was used to collect RGB imagery and LiDAR data on all 384 experimental units. The left camera was set at −15° to image the plant stems, the right camera at −30° to image the soil-to-plant interface, the front camera at 0° to collect basic plot survey data, and the upward-facing hemispherical lens at 90° to collect video for leaf area index. The robot was operated in row-following mode, with an operator following approximately two meters behind to reorient it after any crashes, save the data at the end of each pass, and turn the robot at the row ends to start the next pass. Data from the 29 July collection were selected for analysis because this date (~2 weeks after R1, silk appearance) was close to when the canopy reaches maximum height and LAI. The analysis was conducted using Proc Mixed in SAS65, with N rate, plant density, and hybrid as fixed effects and replication × N rate and replication × plant density as random effects.

Case study 3

Maize hybrids were evaluated as described in ref. 48 at a nitrogen-responsive field site at the Crop Sciences Research and Education Center in Urbana, Illinois. The soil at this site is a Drummer silty clay loam (pH 6.5), which supplied 60–75 kg N ha⁻¹ in each year of the experiment based on both soil testing and measurements of plant N recovery from unfertilized plots. Hybrids were grown in a split-plot design in which each hybrid was grown in paired plots (2 rows, 5.3 m long, 76 cm row spacing) receiving either no N fertilizer or 200 kg N ha⁻¹ as granular ammonium sulfate banded on the soil surface when the plants reached the V3 growth stage (3 leaf collars). All hybrids were planted at a density of 86,000 plants ha⁻¹, with three replicates of each N treatment. Trials were planted on 7 May 2021, 12 May 2022, and 4 May 2023 using a GPS-guided vacuum planter (SeedPro, Almaco, Nevada, IA, USA).

Phenotyping with the robot was performed as described above for Case Study 2, with data collections on 1 September 2021, 17 August and 9 September 2022, and 25 July 2023. Ground-truth measurements of plant and ear height were made with a measuring stick on five plants per plot. Biomass and nitrogen concentrations of stalk plus leaves, cob, and grain were determined from five plants per plot harvested at physiological maturity (R6 growth stage, in September) as described in ref. 48. Data analyses were performed in R Statistical Software66 (R Core Team 2021) using the corrplot67 and ggplot268 packages.

Case study 4

During 2021, a multiple-N-rate by multiple-hybrid experiment was established in two grower fields, one in northeast Nebraska near Homer and one in southwest Indiana near Owensville. Each field contained two separate strips running through the field (Fig. 4A). In each strip, the whole-plot factor was five N rates: 0, 67, 134, 201, and 269 kg N ha⁻¹. N was applied as UAN (28-0-0-0, N-P-K-S) with an applicator designed to place it between rows 1 and 2 and between rows 3 and 4 of each four-row experimental unit; the UAN was injected into the soil with liquid fertilizer coulters (Yetter, Colchester, IL) to a depth of 7.5 cm, 20 cm from each row. The split-plot factor was hybrid (P1197AM or P1359AM; Corteva Agriscience, Johnston, IA). This pattern was replicated as many times as needed to fill the strip across the field (Fig. 4A). Each experimental unit was 5.4 m by 3 m. Strip length ranged from as few as 108 ranges (584 m) to as many as 250 ranges (1350 m).

Approximately two weeks after flowering (silk appearance and pollen shed), the robots were used to collect leaf area index, stem diameter, and plant-height data, traveling through the experimental units at a constant speed of 0.3 m s⁻¹. The left camera was set at −15° to video the plant stems, the right camera at −30° to video the soil-to-plant interface, the front camera at 0° to collect basic plot survey data, and the upward-facing hemispherical lens at 90° to collect video for leaf area index. Grain yield was measured with a specialized plot combine that recorded the weight and moisture of the grain for each experimental unit, and grain weight was standardized to 155 g kg⁻¹ moisture. Data were analyzed using Proc Mixed in SAS65 with state (Nebraska or Indiana), N rate, and hybrid as fixed effects. Random effects were N rate × replication nested within field strip, field strip nested within state, and range nested within field strip.
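Moisture standardization conserves grain dry matter, so a plot weight W harvested at moisture m (g kg⁻¹) becomes W(1000 − m)/(1000 − 155) at 155 g kg⁻¹. The sketch below applies that adjustment and scales to kg ha⁻¹. The harvested area is an input the source does not specify; the 16.2 m² (5.4 m × 3 m) used in the example is illustrative only, and the function name is hypothetical.

```python
def grain_yield_kg_ha(
    harvested_grain_kg: float,
    grain_moisture_g_kg: float,
    harvested_area_m2: float,
    standard_moisture_g_kg: float = 155.0,
) -> float:
    """Plot grain yield (kg/ha) adjusted to a standard moisture content."""
    if harvested_area_m2 <= 0:
        raise ValueError("harvested area must be positive")
    # Dry matter is conserved: W_std = W * (1000 - m) / (1000 - m_std).
    adjusted_kg = (
        harvested_grain_kg
        * (1000 - grain_moisture_g_kg)
        / (1000 - standard_moisture_g_kg)
    )
    return adjusted_kg / (harvested_area_m2 / 10_000)  # 1 ha = 10,000 m²


# Hypothetical plot: 22.0 kg of grain at 180 g/kg moisture from 16.2 m².
print(round(grain_yield_kg_ha(22.0, 180.0, 16.2), 1))
```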

Case study 5

A set of pre-commercial maize hybrids was evaluated at three locations near the towns of Hensall, Atwood, and Harriston, Ontario, Canada. Depending on the location, the robot was run through the experiment four to six times during the season, between the V10 (10 collared leaves) and R6 (maturity) growth stages. At each time point, video was collected with the hemispherical-lens camera mounted on top of the robot and converted into an LAI measurement. Plant height was also collected at each time point using the lidar mounted on the back of the robot.

From these time-course data, the maximum LAI (reached at R1, flowering, and maintained for approximately two weeks afterward) could be determined for each location. An additional measurement, taken on 23 September 2022 and common to all locations, was selected to represent potential loss of green leaf area as the crop senesced before harvest, which occurred 15–30 days later depending on location. Weather data (rainfall and temperature) for 2022 were obtained from weather stations within 10 km of each location. Grain yield was measured with a specialized plot combine that recorded the weight and moisture of the grain for each experimental unit, and grain weight was standardized to 155 g kg⁻¹ moisture. Raw LAI values more than three standard deviations from the mean were considered outliers and removed from the visualization and analysis. Grain yield, plant height, and LAI at the R5 growth stage were subjected to principal component analysis in R66. Before the PCA, each trait was standardized by subtracting the trait mean from each value and dividing by the standard deviation, centering each trait on zero with unit variance.
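The outlier screen and the PCA on standardized traits can be sketched with NumPy. Assumptions: sample standard deviations (ddof=1), traits arranged with plots as rows, and PCA computed by singular value decomposition of the z-scored matrix, which is equivalent to an eigen-decomposition of the trait correlation matrix. The R code actually used is not reproduced; the function names and the simulated data are hypothetical.

```python
import numpy as np


def remove_outliers(values: np.ndarray, n_sd: float = 3.0) -> np.ndarray:
    """Keep values within n_sd sample standard deviations of the mean."""
    centre, sd = values.mean(), values.std(ddof=1)
    return values[np.abs(values - centre) <= n_sd * sd]


def pca_standardized(traits: np.ndarray):
    """PCA of z-scored traits (rows = plots, columns = traits).

    Returns (scores, loadings, explained_variance_ratio).
    """
    z = (traits - traits.mean(axis=0)) / traits.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u * s  # plot coordinates on each principal component
    loadings = vt.T  # trait weights; columns are components
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained


# Simulated plot-level data: grain yield (kg/ha), plant height (cm), LAI at R5.
rng = np.random.default_rng(0)
lai_r5 = rng.normal(4.0, 0.5, 30)
traits = np.column_stack([
    9000 + 900 * lai_r5 + rng.normal(0, 300, 30),
    180 + 8 * lai_r5 + rng.normal(0, 5, 30),
    lai_r5,
])
scores, loadings, explained = pca_standardized(traits)
print(np.round(explained, 3))
```

Standardizing first keeps grain yield, whose raw variance is orders of magnitude larger than that of LAI, from dominating the first component.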