Introduction

According to statistics from the Food and Agriculture Organization, wheat is the third largest crop grown globally and, alongside maize and rice, is a staple food source across the world1. Biotic stress from diseases and pests is one of the major contributors to crop losses across the globe2. Fusarium head blight (FHB) is one of the most serious diseases of wheat, causing large yield and quality losses, predominantly in Europe and North America. The disease also contaminates the seeds with harmful mycotoxins such as deoxynivalenol (DON), leading to poor seed quality3. FHB resistance is quantitative, with a complex genetic architecture that is strongly influenced by genotype-by-environment (G × E) interactions. Therefore, replicated field experiments with reliable phenotypic selection are necessary to develop new cultivars with FHB resistance.

Climate change is another principal driver of disease epidemics, including more frequent and severe FHB epidemics. Together, climate change and increasing food demand pose challenges for wheat breeding: new cultivars must be developed quickly to suit new ecological and economic conditions. The global population is predicted to reach 9 billion by 2050, so efforts to meet the increased food demand must be made quickly. Traditional breeding is time- and resource-consuming; developing a new cultivar typically takes 10 to 12 years of labor-intensive and costly work. Novel approaches have been and are being developed to shorten breeding cycles and reduce costs. The availability of high-density, low-cost marker genotyping platforms makes genomic prediction and selection feasible. Genomic selection (GS) can therefore be implemented to predict the breeding values of progeny lines without costly phenotyping, saving time and money, increasing selection intensity and improving the ability to predict traits4. Genomic prediction uses genetic information, such as single nucleotide polymorphism (SNP) markers, to predict in silico the breeding values of new breeding lines from their genotypic data. Coupled with state-of-the-art data analysis, including machine learning and multi-trait (MT) prediction models, GS can be a potent breeding tool that reduces the need for costly manual phenotyping.

GS has already revolutionized animal breeding, and research on developing GS methodology for plant breeding began over two decades ago5,6,7,8,9,10,11,12,13,14,15. Simulations and implementations of GS in livestock breeding show that genetic gains can be doubled and sometimes tripled16. However, predictive abilities estimated with GS remain low for complex disease resistance traits. In the present study, we therefore use FHB as a case study to improve the predictive ability for disease resistance in wheat by incorporating G × E interactions and trait correlations into prediction models.

To apply GS in a breeding program, a training population that has been phenotyped for the traits of interest and genotyped with a genome-wide dense marker panel must first be created. The phenotypic and genotypic information of the training population is used to fit a prediction equation (GS model) that estimates the marker effects for each trait. This GS model is then used to predict the genomic estimated breeding values (GEBVs) of the test set, which has only been genotyped with the genome-wide markers and not phenotyped. These GEBVs are then used to select the best genotypes for breeding. Many statistical GS models have been developed over the past 20 years, and machine learning methods have also gained popularity. These methods are either parametric or nonparametric. Parametric methods include ridge regression best linear unbiased prediction (RR-BLUP)4 and genomic BLUP (GBLUP), which assume normally distributed SNP effects; BayesA and weighted Bayesian shrinkage regression (wBSR), which assume a prior distribution of effects with a higher probability of moderate to large effects; and BayesB and BayesCπ, which assume that some SNP effects are zero. Nonparametric methods include random forest, reproducing kernel Hilbert space (RKHS) regression and neural network approaches, which were compared in a study by Heslot and co-workers17.
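The training-then-prediction workflow described above can be sketched with ridge regression on markers, which is the idea behind RR-BLUP. This is a minimal illustration under stated assumptions, not the implementation used in this study: markers are assumed coded −1/0/1, the shrinkage parameter `lam` (the ratio of residual to marker-effect variance in RR-BLUP) is assumed known rather than estimated by REML, and the function names and simulated data are hypothetical.

```python
import numpy as np

def rrblup_fit(X, y, lam):
    """Estimate marker effects by ridge regression (RR-BLUP-style shrinkage).

    X   : n x p marker matrix for the phenotyped training lines (coded -1/0/1)
    y   : length-n phenotype vector for the training lines
    lam : ridge penalty, sigma_e^2 / sigma_beta^2 in RR-BLUP (assumed known here)
    """
    mu = y.mean()
    col_means = X.mean(axis=0)
    Xc = X - col_means                       # centre markers so mu is the intercept
    p = X.shape[1]
    # beta_hat = (Xc'Xc + lam * I)^-1 Xc'(y - mu)
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y - mu))
    return mu, col_means, beta

def predict_gebv(model, X_new):
    """GEBVs for lines that are genotyped but not phenotyped."""
    mu, col_means, beta = model
    return mu + (X_new - col_means) @ beta

# Simulated example: 200 training lines, 50 test lines, 50 markers
rng = np.random.default_rng(1)
X = rng.integers(-1, 2, size=(250, 50)).astype(float)
true_effects = rng.normal(0.0, 0.5, size=50)
g = X @ true_effects                          # true genetic values
y = g + rng.normal(0.0, 1.0, size=250)        # phenotypes with residual noise

model = rrblup_fit(X[:200], y[:200], lam=1.0)
gebv = predict_gebv(model, X[200:])
print(np.corrcoef(gebv, g[200:])[0, 1])       # accuracy against simulated truth
```

Increasing `lam` shrinks all marker effects toward zero, so with a very large penalty every GEBV collapses to the training mean; GBLUP gives equivalent predictions when its genomic relationship matrix is built from the same centred markers.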

GS models in both animal and plant breeding programs normally use genetic marker information or pedigree relationships for prediction. In plant breeding, however, G × E interactions are an important factor to consider in the GS prediction model. Several studies have demonstrated that incorporating G × E interactions into the prediction model can substantially improve predictive abilities18,19,20,21,22,23,24,25,26,27,28. One of the foremost studies19 also investigated environmental covariates, and modeling pedigree × environment interactions29 improved predictions in specific environments. Overall, including G × E interactions in the prediction model, together with environmental covariates, substantially increased predictive abilities. Most of these studies evaluated univariate or single-trait (ST) models that predict individual traits, whereas MT models exploit shared information from the genetic correlations between traits30,31,32,33,34. A study on malting quality traits in barley31 reported a 76 percent increase in predictive ability for grain protein content using MT-GS prediction models. Few studies have assessed the potential of multi-trait multi-environment (MTME) models using real-world experimental data. Recently, MTME genomic prediction models were evaluated for improving seed protein and yield in field pea, suggesting that MTME models could improve prediction accuracy34.

Previously, a study comparing marker-assisted selection and GS for predicting traits associated with FHB resistance showed that GS captured more genetic variation, which increased prediction accuracy35. Another study highlighted the potential and practical utility of integrating GS into wheat breeding programs to develop disease-resistant wheat varieties36. Recently, a study used stochastic simulations to compare breeding programs integrating GS and speed breeding, and demonstrated the potential of GS to accelerate breeding for FHB resistance37.

G × E interactions in plant breeding can generally be grouped into two types: those caused by predictable environmental effects, such as soil type or fertilization regime, and those caused by less predictable environmental effects, such as erratic weather conditions that vary from year to year38. While the former can be exploited directly in a breeding program by developing germplasm targeted to specific conditions, the unpredictable G × E is harder to deal with. FHB resistance in wheat belongs to the latter case, because infection levels are largely determined by the weather conditions at anthesis. In general, the disease is favored by warm and humid conditions at this critical developmental stage for establishing infections3,39. Since weather conditions often change from day to day, the disease levels of genotypes in a particular field trial may be strongly affected by the timing of their anthesis: in one trial, the genotypes with early anthesis may be more diseased; in another, the late ones. Such G × E effects add noise to the data, and testing over multiple environments is needed to obtain reliable results40. Although targeted breeding to exploit such G × E interactions is not feasible, we hypothesize that modeling G × E effects can increase the predictive abilities of genomic prediction models for FHB resistance.

The objective of this study was to assess whether genomic prediction for FHB resistance in wheat can be enhanced by incorporating G × E interaction effects into ST and MT mixed models.

Results

Trait correlations

The across-environment means of FHB severity and DON content were moderately and highly significantly correlated in both panels, with r = 0.64 (p < 0.0001) in the NMBU panel and r = 0.34 (p < 0.0001) in the GRAMINOR panel (Table 1). Anther extrusion (AE) was negatively correlated with DON (r = −0.53 and −0.53, p < 0.0001) and FHB (r = −0.58 and −0.45, p < 0.0001), with high statistical significance in both panels. In the NMBU panel, plant height (PH) was negatively correlated with DON (r = −0.34, p < 0.0001), while days to heading (DH) was positively correlated with DON (r = 0.38, p < 0.0001). The same trend was observed in the GRAMINOR panel: PH was negatively correlated with DON (r = −0.28, p < 0.0001) and DH was positively correlated with DON (r = 0.07, p < 0.0001). In the NMBU panel, FHB and DH were not significantly correlated. In the GRAMINOR panel, the correlations between FHB and PH and between DON and DH were not significant, whereas the correlation between FHB and DH was significant (Table 1).
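The correlations in Table 1 are Pearson correlations between across-environment line means. A minimal sketch of that calculation, assuming the means are stored as one array per trait with `np.nan` for lines lacking a value (the helper name and toy values are hypothetical), returns r together with the t statistic on n − 2 degrees of freedom from which the p-value is obtained; `scipy.stats.pearsonr` performs the same test and returns the p-value directly.

```python
import numpy as np

def pearson_r_t(x, y):
    """Pearson r between two traits and its t statistic on n - 2 df.

    x, y : across-environment means per line, np.nan where missing
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ok = ~(np.isnan(x) | np.isnan(y))        # keep lines scored for both traits
    x, y = x[ok], y[ok]
    n = x.size
    r = np.corrcoef(x, y)[0, 1]
    t = r * np.sqrt(n - 2) / np.sqrt(1.0 - r ** 2)
    return r, t, n - 2

# Toy line means for FHB severity and DON (illustrative values only)
fhb = [1.0, 2.0, 3.0, 4.0, 5.0, np.nan]
don = [2.0, 4.0, 5.0, 4.0, 5.0, 3.0]
r, t, df = pearson_r_t(fhb, don)
print(round(r, 4), round(t, 3), df)          # 0.7746 2.121 3
```

Because t grows with the square root of the number of lines, the same r can be non-significant in a small panel and highly significant in a larger one.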

Table 1 Pearson correlations between across-environment means of different traits for a) NMBU panel and b) GRAMINOR panel

Correlations between the environments

Most environments of the NMBU panel were highly correlated with each other, except for FHB disease severity (−0.07) and DON content (0.22) between the 2019 field trials in Morden, Canada (2019_Morden) and Vollebekk, Norway (Fig. 1a, c).

Fig. 1: Correlation heatmaps between the environments.

FHB severity correlations for a NMBU panel and b GRAMINOR panel; and DON content correlations for c NMBU panel and d GRAMINOR panel.

For the GRAMINOR panel, environmental correlations were generally lower than for the NMBU panel, reflecting the lower genetic diversity in this panel consisting purely of elite Norwegian breeding lines41. However, DON content was moderately positively correlated between the field trials in 2021 in Vollebekk, Norway and Morden, Canada (0.62). Low correlations were observed for FHB and DON between the Vollebekk trials in 2020 and 2021 (Fig. 1b, d).
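Heatmaps such as Fig. 1 are built from a matrix of pairwise correlations between environments. As a sketch, assuming line means for one trait are arranged in a lines × environments array with `np.nan` for lines not tested in an environment (an assumed data layout; the function name and values are hypothetical), each correlation can be computed from the lines observed in both environments:

```python
import numpy as np

def env_correlations(Y, min_lines=3):
    """Pairwise Pearson correlations between environments.

    Y : lines x environments array of line means for one trait,
        np.nan where a line was not tested in an environment
    Each pair uses only the lines observed in both environments; pairs with
    fewer than `min_lines` shared lines are left as nan.
    """
    n_env = Y.shape[1]
    R = np.full((n_env, n_env), np.nan)
    for i in range(n_env):
        for j in range(n_env):
            both = ~np.isnan(Y[:, i]) & ~np.isnan(Y[:, j])
            if both.sum() >= min_lines:
                R[i, j] = np.corrcoef(Y[both, i], Y[both, j])[0, 1]
    return R

# Five lines in three environments; the fifth line is missing in environment 1
Y = np.array([[1.0, 2.0, 4.0],
              [2.0, 4.0, 3.0],
              [3.0, 6.0, 2.0],
              [4.0, 8.0, 1.0],
              [np.nan, 5.0, 0.0]])
print(env_correlations(Y).round(2))
```

Using pairwise-complete lines keeps every environment pair usable when panels were not tested in all trials, at the cost of each coefficient resting on a slightly different set of lines.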

Prediction performance of STME and MTME models using CV1

Predictive abilities varied across models depending on the trait, as illustrated in Figs. 2 and 3 for NMBU and GRAMINOR panels, respectively. The EL (environment and line) model consistently underperformed across traits in all environments. Predictive abilities across traits indicated that the EG (environment and genomic information) model outperformed EL, while the inclusion of G×E interactions (EG&G×E) further improved predictive abilities in a few cases.
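The three model types differ in the covariance structures placed on their random effects. Assuming the reaction-norm formulation of Jarquín et al.19, EL combines an environment kernel with an identity-based line kernel, EG replaces the line kernel with a genomic relationship kernel, and EG&GxE adds the cell-by-cell (Hadamard) product of the genomic and environment kernels. The sketch below only builds these record-level matrices; fitting them requires a multi-kernel mixed model or Bayesian solver, which is not shown. The simple genomic relationship matrix G = WW′/p from column-centred markers, the function names and the toy data are all assumptions for illustration.

```python
import numpy as np

def incidence(labels, levels):
    """Records x levels 0/1 matrix mapping each record to its level."""
    pos = {lev: i for i, lev in enumerate(levels)}
    Z = np.zeros((len(labels), len(levels)))
    Z[np.arange(len(labels)), [pos[lab] for lab in labels]] = 1.0
    return Z

def genomic_relationship(M):
    """Simple genomic relationship matrix G = WW'/p (W = centred markers)."""
    W = M - M.mean(axis=0)
    return W @ W.T / M.shape[1]

def model_kernels(env, line, M, line_ids):
    """Record-level kernels for the EL, EG and EG&GxE models.

    env, line : environment and line label of each phenotypic record
    M         : lines x markers matrix, rows ordered as line_ids
    """
    ZE = incidence(env, sorted(set(env)))
    ZL = incidence(line, line_ids)
    KE = ZE @ ZE.T                            # 1 if two records share an environment
    KL = ZL @ ZL.T                            # 1 if two records share a line
    KG = ZL @ genomic_relationship(M) @ ZL.T  # marker-based similarity of lines
    KGE = KG * KE                             # Hadamard product: G x E kernel
    return {"EL": [KE, KL], "EG": [KE, KG], "EG&GxE": [KE, KG, KGE]}

# Three lines with four markers, each tested in two environments
M = np.array([[1, 0, -1, 1], [1, 1, -1, 0], [-1, 0, 1, -1]], dtype=float)
env = ["2019_Vollebekk"] * 3 + ["2019_Morden"] * 3
line = ["A", "B", "C"] * 2
K = model_kernels(env, line, M, ["A", "B", "C"])
print(K["EG&GxE"][2].round(2))
```

The G × E kernel keeps genomic similarity only between records from the same environment, which lets each environment have its own line deviations while still sharing information among related lines.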

Fig. 2: Bar graphs showing the predictive abilities for traits AE, DH, DON, FHB and PH using CV1 for STME and MTME prediction models in NMBU panel.

X-axis labels: AE, anther extrusion; DH, days to heading; DON, deoxynivalenol; FHB, Fusarium head blight; PH, plant height. Legend labels denote the GS prediction models: EL, environment and line; EG, environment and genomic information; EG&GxE, environment, genomic information and genotype-by-environment interaction. Solid bars show single-trait multi-environment (STME) models and striped bars show multi-trait multi-environment (MTME) models.

Fig. 3: Bar graphs showing the prediction abilities for traits DH, DON, FHB and PH using CV1 for STME and MTME prediction models in GRAMINOR panel.

X-axis labels: DH, days to heading; DON, deoxynivalenol; FHB, Fusarium head blight; PH, plant height. Legend labels denote the GS prediction models: EL, environment and line; EG, environment and genomic information; EG&GxE, environment, genomic information and genotype-by-environment interaction. Solid bars show single-trait multi-environment (STME) models and striped bars show multi-trait multi-environment (MTME) models.

Leveraging correlated trait information enabled multi-trait multi-environment (MTME) models to achieve higher predictive abilities than single-trait multi-environment (STME) models. In the NMBU panel, the STME models generally showed lower predictive ability than the MTME models across all model types (EL, EG and EG&GxE). For instance, the MTME EG model achieved a predictive ability of 0.48 for AE, compared with 0.33 for the STME EG model. Using the EG model, a similar gain was observed for FHB (STME: 0.29, MTME: 0.32), whereas for DON the two scenarios were comparable (STME: 0.41, MTME: 0.39). These results highlight the value of incorporating correlated traits. Including G×E interactions further enhanced prediction performance in the EG&G×E model, particularly in the MTME scenario, where predictive ability for DON increased from 0.41 (STME) to 0.43 (MTME) (Fig. 2).
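How an MT model borrows strength from a correlated trait can be illustrated with a multi-trait GBLUP in which the genetic covariance of all line-by-trait values is the Kronecker product G0 ⊗ K. This sketch assumes the genetic (G0) and residual (R0) covariance matrices between traits are known, whereas in practice they are estimated from the data, and it omits the environment dimension of the MTME models for brevity; the function name and all numbers are hypothetical.

```python
import numpy as np

def mt_gblup(y, K, G0, R0, mu):
    """Multi-trait GBLUP with Kronecker covariances (variance components known).

    y  : lines x traits phenotypes, np.nan where not observed
    K  : lines x lines genomic relationship matrix
    G0 : traits x traits genetic (co)variance matrix
    R0 : traits x traits residual (co)variance matrix
    mu : trait means
    Returns lines x traits predicted genetic values.
    """
    n, t = y.shape
    yv = (y - mu).T.reshape(-1)             # stack trait by trait
    Cg = np.kron(G0, K)                     # Cov(g) over all line-trait pairs
    V = Cg + np.kron(R0, np.eye(n))         # Cov(y) = Cov(g) + Cov(e)
    obs = ~np.isnan(yv)                     # condition only on observed values
    g_hat = Cg[:, obs] @ np.linalg.solve(V[np.ix_(obs, obs)], yv[obs])
    return g_hat.reshape(t, n).T

# Two traits (FHB, DON), two unrelated lines; line 0 has only a DON record
y = np.array([[np.nan, 1.0],
              [np.nan, np.nan]])
K = np.eye(2)
G0 = np.array([[1.0, 0.8],
               [0.8, 1.0]])                 # genetic correlation of 0.8
R0 = 0.5 * np.eye(2)
g = mt_gblup(y, K, G0, R0, mu=np.zeros(2))
print(g.round(3))                           # FHB of line 0 predicted from DON alone
```

With a genetic correlation of 0.8 the unobserved FHB value of line 0 is predicted from its DON record; setting the off-diagonal of G0 to zero removes that prediction, which is why MT gains depend on the strength of the trait correlations.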

However, the GRAMINOR panel deviated from this trend, with MTME models performing worse than STME models in certain cases. Exceptions were observed in the environments 2020_Vollebekk and 2021_Vollebekk; for example, for DH in 2021_Vollebekk, the MTME EG model improved predictive ability to 0.22, compared with 0.19 for the STME EG model. Nevertheless, in many cases the EG&G×E model achieved higher predictive ability than EL and EG in both the STME and MTME scenarios (Fig. 3). Across all models (EL, EG and EG&G×E), predictive performance for PH in the environments 2020_Vollebekk and 2021_Morden was relatively low, possibly owing to weaker correlations between these environments (Supplementary Fig. S1).

Tukey tests under CV1 revealed modest but significant differences between models in both the STME and MTME scenarios. In the NMBU panel, the MTME EG&GxE model had the highest predictive ability, significantly outperforming the MTME EL model for DON, FHB and AE, while the MTME EG model showed moderate, occasionally non-significant improvements. In the GRAMINOR panel, the MTME EG&GxE model was superior to the other models for DON and FHB, whereas no significant differences were observed for DH and PH. Similar trends, with lower predictive abilities, were observed for the STME models. In the NMBU panel, the STME EG&GxE model significantly outperformed the STME EL model for DON and FHB, while results for AE were mixed. In the GRAMINOR panel, the STME EG&GxE model slightly outperformed the STME EL model for DON, but differences for DH and PH were not statistically significant (Figs. 2 and 3).
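For a balanced comparison, Tukey's honestly significant difference (HSD) declares two models different when their mean predictive abilities differ by more than q·sqrt(MSE/n). The sketch below assumes an equal number of predictive-ability values per model (for example one per fold and repetition) and takes the studentized-range critical value `q_crit` as an input, to be looked up in a table or, in SciPy 1.7 or later, with `scipy.stats.studentized_range.ppf`. It illustrates the test itself rather than the exact analysis performed here, treating CV replicates as independent observations is a simplifying assumption, and the function name and values are hypothetical.

```python
import numpy as np
from itertools import combinations

def tukey_hsd_balanced(groups, q_crit):
    """Tukey HSD pairwise comparisons for groups of equal size.

    groups : dict model name -> predictive abilities (equal lengths), e.g.
             one value per fold and repetition
    q_crit : studentized-range critical value for k groups and k*(n-1)
             error df at the chosen alpha (looked up separately)
    Returns the HSD and a list of (model_a, model_b, mean_diff, significant).
    """
    names = list(groups)
    data = [np.asarray(groups[m], dtype=float) for m in names]
    k, n = len(data), len(data[0])
    # pooled within-model mean square, the error term of a one-way ANOVA
    mse = sum(((d - d.mean()) ** 2).sum() for d in data) / (k * (n - 1))
    hsd = q_crit * np.sqrt(mse / n)
    results = []
    for (a, da), (b, db) in combinations(zip(names, data), 2):
        diff = da.mean() - db.mean()
        results.append((a, b, diff, bool(abs(diff) > hsd)))
    return hsd, results

# Toy predictive abilities; q_crit is approximately 4.34 for k = 3, df = 6, alpha = 0.05
pa = {"EL": [0.1, 0.2, 0.3], "EG": [0.2, 0.3, 0.4], "EG&GxE": [0.5, 0.6, 0.7]}
hsd, results = tukey_hsd_balanced(pa, q_crit=4.34)
for row in results:
    print(row)
```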

Prediction performance of STME and MTME models using CV2

Predictive abilities under CV2 improved because phenotypic information on the same lines was available from other environments, which was particularly useful for genetically correlated traits. Figures 4 and 5 show the predictive abilities for the NMBU and GRAMINOR panels under the STME and MTME scenarios. In general, predictive abilities under CV2 were higher than under CV1 across models, traits and panels (Figs. 2, 3, 4 and 5). As in CV1, the MTME scenario outperformed STME in CV2, highlighting the advantage of leveraging correlated traits. The baseline EL model performed better under CV2 owing to the information shared on the same lines from other environments. The EG model outperformed EL in only a limited number of cases in both the STME and MTME scenarios. This trend varied across traits and environments, and adding G×E interactions further enhanced predictive ability in most cases (Figs. 4 and 5).

Fig. 4: Bar graphs showing the prediction abilities for traits AE, DH, DON, FHB and PH using CV2 for STME and MTME prediction models in NMBU panel.

X-axis labels: AE, anther extrusion; DH, days to heading; DON, deoxynivalenol; FHB, Fusarium head blight; PH, plant height. Legend labels denote the GS prediction models: EL, environment and line; EG, environment and genomic information; EG&GxE, environment, genomic information and genotype-by-environment interaction. Solid bars show single-trait multi-environment (STME) models and striped bars show multi-trait multi-environment (MTME) models.

Fig. 5: Bar graphs showing the prediction abilities for traits DH, DON, FHB and PH using CV2 for STME and MTME prediction models in GRAMINOR panel.

X-axis labels: DH, days to heading; DON, deoxynivalenol; FHB, Fusarium head blight; PH, plant height. Legend labels denote the GS prediction models: EL, environment and line; EG, environment and genomic information; EG&GxE, environment, genomic information and genotype-by-environment interaction. Solid bars show single-trait multi-environment (STME) models and striped bars show multi-trait multi-environment (MTME) models.

For example, for FHB in the NMBU panel, the STME EG model achieved a predictive ability of 0.32, whereas the MTME EG model improved it to 0.40. Including the G×E interaction term enhanced predictive ability further in the 2019_Vollebekk environment (STME EG&GxE: 0.66, MTME EG&GxE: 0.76) (Fig. 4). A similar trend was evident in the GRAMINOR panel. For instance, in the 2021_Morden environment, the STME EG model achieved a predictive ability of 0.23 for DON and the MTME EG model 0.27, whereas including G×E interactions further improved predictive ability to 0.46 (STME EG&G×E) and 0.59 (MTME EG&G×E) (Fig. 5).

Tukey tests under CV2 revealed more significant differences between models than under CV1. The MTME EG&GxE model significantly outperformed EL, particularly for DON, AE and FHB in the NMBU panel. Although the MTME EG model also showed higher predictive ability, its improvements over EL were not always statistically significant. In the GRAMINOR panel, the MTME EG&GxE model had significantly higher predictive ability for DON and FHB, while DH and PH showed no significant differences among models. A similar pattern, with smaller differences among models, was observed in the STME scenario. In the NMBU panel, the STME EG&GxE model significantly outperformed the STME EL model for DON and AE, while the STME EG model showed moderate but not always statistically significant improvements. In the GRAMINOR panel, EG&GxE had the highest predictive ability for DON and FHB in the STME scenario, whereas DH and PH showed minimal differences among models (Figs. 4 and 5).

Under both cross-validation schemes (CV1 and CV2), predictive abilities varied across environments, with visible differences between the panels. Prediction performance was generally higher in the NMBU panel than in the GRAMINOR panel, which could be attributed to differences in genetic diversity and environmental factors. In most cases, MTME models outperformed STME models owing to their ability to integrate correlations between multiple traits (Figs. 2, 3, 4 and 5).

Discussion

G × E interaction is an important factor that plant breeders consider when evaluating genotypes in multi-environment field experiments. Incorporating G × E interactions into GS prediction models is common practice, and many studies have done so18,19,23,24,25,26,27,28,42. In the present study, we investigated the importance of G × E interactions for enhancing the predictive ability of GS models, based on the models tested and implemented by Jarquín et al.19. We evaluated three models, with and without G × E interactions, in two scenarios based on ST and MT approaches. ST GS models that account for G × E interactions have been studied extensively, but very few studies have used MTME GS models, owing to their complexity and computational demands43,44. In our study, including G × E interactions in the GS models (EG&GxE) increased predictive ability in many cases, both within individual environments and across environments (overall). The genomic prediction models were applied to the NMBU and GRAMINOR panels using the CV1 and CV2 cross-validation schemes. In general, the EG&GxE model, which included the G × E interaction term, produced the highest predictive abilities among the GS models used in our study. ST predictive abilities were broadly similar to MT predictive abilities.

MT prediction is highly dependent on the correlations between traits, especially for complex traits31,45,46,47. It is therefore important to characterize the genetic relationships among associated traits before using them in MT prediction models. Traits associated with FHB resistance have been studied extensively and are well characterized, and previous studies have described the correlations between FHB-related traits41,48,49,50,51,52,53. The correlations among the traits evaluated in the current study agree with those studies. DH can be either negatively or positively correlated with FHB severity and DON, owing to phenological differences in the germplasm and the large impact of weather conditions during flowering, the most susceptible developmental stage for successful Fusarium infection54,55. The negative correlation of FHB with PH is also well known49,50,51 and was confirmed in our study. AE is a highly heritable trait that is important for FHB resistance48,56, and many studies48,50,51,52,53,57 have reported negative correlations of FHB and DON with AE. AE functions as an avoidance mechanism: genotypes or individual plants with higher AE usually have lower FHB infection and disease severity. All this genetic information from correlated traits is valuable for MT prediction, and MT prediction has improved the predictive abilities of GS models in many previous studies31,45,46,47,58,59. We also tested MT models exploiting the correlations among associated traits in the present study, but observed no marked differences in predictive ability compared with ST GS models.

Studying correlations between environments is important for understanding how each trait responds to a particular environment and how the environment influences the expression of each trait. Plant breeders commonly evaluate genotypes in multi-environment experiments, where G × E interaction plays a crucial role in breeding decisions, and a proper understanding of G × E interactions is essential for developing strategies to deal with it38,60. Lado et al.28 assessed the correlations between environments and used these estimates to model the variance-covariance matrix among environments in their GS prediction models. In our study, the correlations between environments revealed substantial variability in predictability across panels, reflecting their genetic diversity and environmental conditions. In the NMBU panel, strong inter-environment correlations were observed for most traits, especially PH and DH, reflecting the panel's diverse genetic makeup and consistent trait expression across the tested environments. In the GRAMINOR panel, correlations between environments were weaker overall, in line with its lower genetic variation. We also found that predictive ability improved when environments were more strongly correlated. This was evident from the predictive abilities for DON and FHB in 2019_Morden and 2019_Vollebekk, two environments that were only weakly correlated (Figs. 1, 2 and 3).

The two cross-validation designs, CV1 and CV218, provided complementary insights into the prediction performance of the models tested in our study. These schemes reflect real-world breeding challenges: CV1 evaluates the prediction of entirely untested genotypes, such as new varieties, while CV2 evaluates the prediction of lines that have been tested in some environments but not in others, using their data from those other environments. CV2 therefore benefits from information from other environments, resulting in higher predictive ability than CV1, where no phenotypic records of the test lines are available for training. Our results showed superior prediction performance with CV2 over CV1, consistent with these expectations. However, our results also revealed some inconsistencies across panels. For instance, STME models outperformed MTME models in the GRAMINOR panel, whereas MTME models generally performed better than STME models in the NMBU panel. These discrepancies suggest that genetic diversity and environmental variation play a critical role in model prediction performance.
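The difference between the two schemes lies in how records are split into folds. A minimal sketch, assuming one record per line and environment and following the common usage of CV1 and CV2 (the function name and data are hypothetical, and sparse-testing designs used in practice can be more structured than random record assignment):

```python
import numpy as np

def cv_folds(lines, k, scheme, rng):
    """Assign each phenotypic record (one line in one environment) to a fold.

    lines  : line identifier of each record
    scheme : "CV1" keeps all records of a line in the same fold, so test
             lines are unobserved in every environment (new lines);
             "CV2" assigns records individually, so a test line is
             usually still observed in other environments.
    """
    lines = np.asarray(lines)
    if scheme == "CV1":
        uniq = np.unique(lines)
        line_fold = dict(zip(uniq, rng.permutation(len(uniq)) % k))
        return np.array([line_fold[name] for name in lines])
    if scheme == "CV2":
        return rng.permutation(len(lines)) % k
    raise ValueError("scheme must be 'CV1' or 'CV2'")

# Ten lines, each tested in three environments (30 records)
rng = np.random.default_rng(42)
lines = np.repeat([f"L{i}" for i in range(10)], 3)
f1 = cv_folds(lines, 5, "CV1", rng)
f2 = cv_folds(lines, 5, "CV2", rng)
print(f1)
print(f2)
```

Under CV1 every record of a test line is held out together, so the model can only rely on marker-based relationships to other lines; under CV2 the same line's records in other environments remain in the training set.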

To obtain an unbiased evaluation of the models, we used a 5-fold cross-validation design with 10 repetitions. Repeating the random partitioning into folds reduces the dependence of the performance estimates on any single partition, and within each repetition every observation is used for testing exactly once and for training in the remaining folds, which enhances the reliability of our findings. Despite these strengths, we observed weak correlations between certain environments, which may have contributed to suboptimal prediction performance in some models. These constraints could be addressed by integrating environmental covariates or by employing advanced machine learning methods capable of capturing environmental variation, thereby further improving predictive ability.
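A repeated k-fold loop of this kind can be sketched as a small harness that accepts any model through a `fit_predict` callable (a hypothetical interface) and computes predictive ability as the correlation between predicted and observed values within each environment, averaged over repetitions. The folds below assign records at random, which corresponds to a CV2-type split; a line-level assignment would be needed for CV1. Aggregating per repetition rather than per fold, the ridge stand-in model and the simulated data are assumptions of this sketch.

```python
import numpy as np

def repeated_kfold_pa(y, env, fit_predict, k=5, reps=10, seed=0):
    """Predictive ability from repeated k-fold cross-validation.

    y           : observed phenotype of each record
    env         : environment label of each record
    fit_predict : callable(train_idx, test_idx) returning predictions for
                  the test records from a model fitted on the training records
    Returns {environment: mean over repetitions of cor(predicted, observed)}.
    """
    y = np.asarray(y, dtype=float)
    env = np.asarray(env)
    rng = np.random.default_rng(seed)
    pa = {e: [] for e in np.unique(env)}
    for _ in range(reps):
        # new random partition each repetition; every record is tested once
        folds = rng.permutation(len(y)) % k
        pred = np.empty_like(y)
        for f in range(k):
            test = np.flatnonzero(folds == f)
            train = np.flatnonzero(folds != f)
            pred[test] = fit_predict(train, test)
        for e in pa:
            in_env = env == e
            pa[e].append(np.corrcoef(pred[in_env], y[in_env])[0, 1])
    return {e: float(np.mean(v)) for e, v in pa.items()}

# Simulated records with markers in two environments
rng = np.random.default_rng(3)
X = rng.integers(-1, 2, size=(120, 30)).astype(float)
y = X @ rng.normal(0.0, 0.5, 30) + rng.normal(0.0, 1.0, 120)
env = np.array(["E1", "E2"] * 60)

def ridge_fit_predict(train, test, lam=1.0):
    # hypothetical stand-in for a GS model: ridge regression on centred markers
    m = X[train].mean(axis=0)
    Xtr = X[train] - m
    mu = y[train].mean()
    beta = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(X.shape[1]),
                           Xtr.T @ (y[train] - mu))
    return mu + (X[test] - m) @ beta

print(repeated_kfold_pa(y, env, ridge_fit_predict))
```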

The predictive ability of genomic prediction models is affected by many factors, including the genetic relatedness of individuals, training population size, marker density and the linkage disequilibrium between markers and QTL. A larger training population increases the chance of improving predictive ability, because it adds more information to the mixed models used to predict the performance of genotypes61. Riedelsheimer et al.9 evaluated the effect of training population size on predictive ability in maize and observed that predictive ability increased with training population size, as larger training populations provide more comprehensive genetic and phenotypic information. This observation is relevant to our study, in which predictive abilities were higher for the NMBU panel than for the GRAMINOR panel. The GRAMINOR panel was assessed in replicated trials for only two years, whereas the NMBU panel was evaluated for more than five years in diverse environments. However, the training population size of both panels exceeds 200 genotypes. Marker density is another important factor affecting predictive ability, as reported in animal, human and plant breeding studies7,62,63,64. An important difference between the two panels that can explain the higher predictive ability of the NMBU panel is that it is more genetically diverse, both in general and in its phenotypic and genotypic diversity for FHB resistance (Nannuru et al. 2024).

In plant breeding, breeders and researchers regularly evaluate and select the best-performing genotypes in multi-environment replicated field experiments. The traits used for selection are influenced by G × E interactions, and considering these interactions when selecting genotypes is crucial for developing high-yielding and disease-resistant cultivars. G × E interactions likewise affect predictive ability, and adding G × E interactions to prediction models enhanced predictive ability in a few studies19,59,65,66. Integrating G × E interactions into GS prediction models using different methodologies has been advocated and tested18,19,22,23,24,25,26,27,42. We constructed our GS prediction models based on those described by Jarquín et al.19. In particular, the STME models of Jarquín et al.19 were used because they model G × E interactions across multiple years and locations without explicitly modeling environmental covariates, which suits FHB field trials with varying inoculation techniques, conditions and test regimes, as previously described41. The MTME model extended this by jointly modeling FHB severity and DON, taking advantage of the correlations between relevant traits. The low correlations between field trials indicate that unidentified environmental factors contributed to phenotypic variation in this study. However, integrating G × E interactions into the prediction models gave considerably higher predictive ability than the other models tested, especially for FHB severity and DON content, which have lower heritability than the other traits. Climate and weather data have already been evaluated as environmental covariates in genomic prediction models19, but we did not test this approach in our study.
Including environmental covariates such as temperature, humidity, rainfall, soil moisture, sunlight and management practices in future studies could refine the modeling of G × E interactions, clarify the main environmental drivers, and improve prediction and selection for FHB resistance and reduced DON content. For example, these weather variables could be summarized over the critical period for infection and early disease development, such as the roughly 10 days following heading. Although MTME prediction models are complex and computationally demanding, deep learning methods offer a promising solution and have already been tested in some studies26,67,68,69. Similarly, using environmental covariates in prediction models is another effective approach to improving predictive ability70,71.
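One way to follow up on this suggestion, in the spirit of the environmental-covariate models of Jarquín et al.19, is to summarize each weather variable over a window after heading and convert the resulting environment × covariate matrix into an environmental similarity kernel Ω = WW′/q. This sketch assumes a single reference heading date per environment and daily weather records stored as a days × variables array; both assumptions, the function names and the values are hypothetical, and line-specific windows based on each line's own heading date would be a natural refinement given the anthesis-timing effects discussed in the Introduction.

```python
import numpy as np

def window_means(daily, heading_day, days=10):
    """Mean of each daily weather variable over `days` days from heading.

    daily       : days x variables array for one environment
                  (e.g. temperature, relative humidity, rainfall)
    heading_day : row index of the reference heading date
    """
    return daily[heading_day:heading_day + days].mean(axis=0)

def env_covariate_kernel(W):
    """Environmental similarity Omega = Ws Ws' / q from standardized covariates.

    W : environments x q matrix of covariates; each column is centred and
        scaled so variables measured on different scales weigh equally.
    """
    sd = W.std(axis=0)
    Ws = (W - W.mean(axis=0)) / np.where(sd > 0, sd, 1.0)
    return Ws @ Ws.T / W.shape[1]

# Hypothetical post-heading summaries for three environments:
# mean temperature (C), relative humidity (%), rainfall (mm/day)
W = np.array([[15.0, 80.0, 2.0],
              [18.0, 70.0, 5.0],
              [12.0, 90.0, 1.0]])
Omega = env_covariate_kernel(W)
print(Omega.round(2))
```

Replacing the identity-based environment kernel with Ω, or multiplying Ω elementwise with a genomic kernel expanded to the record level, is how such covariates can enter a reaction-norm model.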

MT models are expected to achieve higher predictive ability than ST models because they capture the relationships and correlations between multiple traits. However, the quantity and quality of the data used in MT prediction models affect their overall performance. MT prediction models commonly require large, comprehensive data sets recorded for multiple traits across several environments, and when this requirement is not met, their performance can be compromised72. We observed this in the GRAMINOR panel, where the MTME models underperformed, most markedly in the 2020_Vollebekk environment (Fig. 3). Predictive ability might have been higher had the GRAMINOR panel been tested across a wider range of environments. In contrast, MT prediction models performed better in the NMBU panel, owing to its higher genetic diversity and more abundant phenotypic data from several years of testing at different locations (Figs. 2 and 4). MT prediction models need to account for correlations and interactions between traits, which can be challenging to model accurately. The importance of trait correlations for predicting complex traits, and the dependence of MT prediction on these correlations, have been reported previously31,45,46,47. Failure to capture these complex interrelationships can result in suboptimal predictions and reduced model performance73. MT models also often rely on the assumption of homogeneous (co)variances across traits and environments; in particular, the variability of G × E interactions across traits and environments is important. Violations of these assumptions can lead to biased predictions and inaccurate assessments of genotype performance74. Our study showed the importance of considering G × E interactions to increase the predictive ability of GS prediction models: modeling G × E interaction was useful for predicting the disease traits DON content and FHB severity in both the STME and MTME models (Figs. 2, 3, 4 and 5).

Overall, adding G × E interactions to the GS prediction models often improved predictive ability in our study, especially for complex traits such as FHB severity and DON content. These results show that investigating the correlations between environments is important and that integrating genetic and environmental information into genomic prediction models helps attain higher predictive abilities. This is particularly important for traits with complex inheritance and strong dependency on the environment. We compared model performance in the ST and MT scenarios and found no remarkable differences, although the MT models achieved slightly higher predictive abilities.

In general, this study showed that incorporating G × E interactions into genomic prediction models can substantially improve the selection of spring wheat lines for FHB resistance. For research, it provides a clear conceptual model of how genetics and environment jointly shape complex disease traits such as FHB and DON. For practical breeding, the findings suggest that breeders could identify lines with stable resistance across diverse conditions, make more informed decisions in untested environments, and use correlated traits such as DON to enhance selection efficiency, thereby accelerating the development of FHB-resistant cultivars in a resource-efficient way.

Future studies should explore including environmental covariates, in addition to G × E interactions, in genomic prediction models, which could further improve predictive performance.

Methods

Plant material

The phenotypic and genotypic data used in this study were described in detail in our previous genome-wide association study (GWAS) of the same panels41. Two wheat panels were used: the NMBU spring wheat panel (hereafter the NMBU panel), consisting of 296 hexaploid spring wheat cultivars and breeding lines mainly from Norway, other European countries, the USA, CIMMYT (Mexico), China, and Australia, and the Graminor spring wheat panel (hereafter the GRAMINOR panel), consisting of 358 new breeding lines from the commercial spring wheat breeding program of Graminor in Norway.

The NMBU and GRAMINOR panels were chosen to represent complementary sources of genetic variation in spring wheat. The NMBU panel contains more diverse germplasm, including both elite Nordic cultivars and breeding lines and less adapted material, and thus represents a broad genetic base for testing FHB resistance. The GRAMINOR panel, in contrast, consists only of Nordic-selected material: breeding lines and cultivars selected for adaptation to local environments. Together, these panels allow genomic prediction models to be evaluated across contrasting germplasm and adaptation backgrounds and enable the study of G × E interactions relevant to Nordic spring wheat breeding programs.

Field trials, experimental design, and trait assessments

The NMBU panel was tested over five years at four locations (Vollebekk and Staur in Norway, Tulln in Austria, and Morden in Canada), and the GRAMINOR panel was tested over two years at three locations (Vollebekk, Tulln, and Morden). Both panels were evaluated for FHB resistance-related traits: FHB disease severity (%) and DON content in parts per million (ppm). Secondary traits such as plant height (PH), days to heading (DH), and anther extrusion (AE) were also recorded. For details on the inoculation procedure, disease pressure, and assessments, see Nannuru et al.41.

In Norway, the Vollebekk research farm at the Norwegian University of Life Sciences, Ås (59°N, 90 m above sea level), and the Staur research farm near Hamar (60°N, 153 m above sea level) were used as testing locations. The NMBU panel was planted in α-lattice designs with two replicates at Vollebekk in 2013, 2014, and 2019, and at Staur in 2015. The GRAMINOR panel was evaluated in two replicates at Vollebekk in 2020 and 2021, following the same experimental design and methodology as for the NMBU panel.

Both panels were also tested at the experimental station of the Department of Agrobiotechnology in Tulln, Austria (48°N, 177 m above sea level): the NMBU panel in 2020 and the GRAMINOR panel in 2021. Both trials in Tulln were conducted as randomized complete block designs with two replicates.

Similarly, both panels were evaluated at Morden, Manitoba, Canada. The NMBU panel was planted in an α-lattice design with two replicates in 2020, and the GRAMINOR panel was tested in 2021 following the same experimental design and methodology. Not all traits were scored in every field experiment; for complete details, see Nannuru et al.41.

Genotype data

As described by Nannuru et al.41, the lines were genotyped using the Trait Genetics Illumina 25 K SNP Chip. In addition, several kompetitive allele-specific PCR (KASP) and simple sequence repeat (SSR) markers for key agronomic and disease resistance traits75 were included, with the SSR markers converted to a biallelic format. Markers were retained if they had <10% missing data and a minor allele frequency of ≥5%, and heterozygous genotypes were treated as missing data. Positional information was assigned using the Trait Genetics Illumina 25 K SNP Chip. After filtering and removal of redundant markers, 21,652 markers remained in the genotypic dataset. Markers were imputed using Beagle 5.4 (refs. 76,77) following the guidelines in the user manual. After imputation, the two genotypic datasets were merged using PLINK 2.0 (ref. 78) to retain the markers common to both panels, resulting in 15,987 markers that were used in the genomic prediction models.
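The marker quality-control step can be illustrated with a minimal NumPy sketch. It assumes genotypes coded 0/1/2 (with 1 as the heterozygote) in a lines × markers matrix, missing calls stored as NaN, and that heterozygotes are set to missing before the missing-rate and minor allele frequency (MAF) thresholds are applied; the order of these steps and the function name `filter_markers` are illustrative assumptions, not the pipeline actually used in the study.

```python
import numpy as np


def filter_markers(geno, max_missing=0.10, min_maf=0.05):
    """Marker QC on a lines x markers matrix coded 0/1/2 (NaN = missing).

    Hypothetical helper: heterozygous calls (coded 1) are set to missing,
    then markers are kept if their missing rate is below max_missing and
    their minor allele frequency is at least min_maf.
    """
    g = np.asarray(geno, dtype=float).copy()
    g[g == 1] = np.nan                       # heterozygotes -> missing
    n_obs = (~np.isnan(g)).sum(axis=0)       # non-missing calls per marker
    missing_rate = 1.0 - n_obs / g.shape[0]
    # frequency of the allele coded "2" among non-missing calls
    p = np.divide(np.nansum(g, axis=0), 2.0 * n_obs,
                  out=np.full(g.shape[1], np.nan), where=n_obs > 0)
    maf = np.minimum(p, 1.0 - p)
    keep = (missing_rate < max_missing) & (maf >= min_maf)
    return g[:, keep], keep
```

For example, a monomorphic marker (MAF = 0) and a marker with one heterozygous call out of five lines (20% missing after recoding) would both be removed, while markers with MAF of 0.2 or 0.4 and no missing data would be kept.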

Phenotypic data analysis

Least-squares means (LSmeans) were calculated for all recorded phenotypic traits using the R packages “lme4”79 and “lmerTest”80 in R (ref. 81). Models were fitted with the lmer function of “lme4” using restricted maximum likelihood (REML). For the α-lattice design, Eqs. 1 and 2 were used to calculate LSmeans within individual environments and across environments, respectively, for each trait.

$${\,P}_{{iknl}}=\mu +{g}_{i}+{R}_{n}+{R}_{{B}_{k(n)}}\,+{e}_{{iknl}}$$
(1)

and

$${\,P}_{{ijknl}}=\mu +{g}_{i}+{E}_{j}+{g\times E}_{{ij}}\,+{R}_{n}+{R}_{{B}_{k(n)}}+{e}_{{ijknl}}$$
(2)

where \({P}_{{iknl}}\) is the phenotypic value of the trait of interest for the ith line in the nth replicate within the kth block, \(\mu\) is the general mean, \({g}_{i}\) is the fixed effect of the ith line, \({R}_{n}\) is the random effect of the nth replicate, \({R}_{{B}_{k(n)}}\) is the random effect of the kth block within the nth replicate, and \({e}_{{iknl}}\) is the error term. In Eq. 2, \({P}_{{ijknl}}\) is the phenotypic value of the ith line in the nth replicate within the kth block in the jth environment, \(\mu\) is the general mean, \({g}_{i}\) is the fixed effect of the ith line, \({E}_{j}\) is the random effect of the jth environment, \({g\times E}_{{ij}}\) is the random interaction effect of the ith line and the jth environment, \({R}_{n}\) is the random effect of the nth replicate, \({R}_{{B}_{k(n)}}\) is the random effect of the kth block within the nth replicate, and \({e}_{{ijknl}}\) is the error term.

For the randomized complete block design, Eqs. 3 and 4 were used to calculate LSmeans within individual environments and across environments, respectively, for each trait.

$${\,P}_{{inl}}=\mu +{g}_{i}+{R}_{n}+{e}_{{inl}}$$
(3)

and

$${\,P}_{{ijnl}}=\mu +{g}_{i}+{E}_{j}+{R}_{n}+\,{(g\times E)}_{{ij}}\,+{e}_{{ijnl}}$$
(4)

where \({P}_{{inl}}\) is the phenotypic value of the trait of interest for the ith line in the nth replicate, \(\mu\) is the general mean, \({g}_{i}\) is the fixed effect of the ith line, \({R}_{n}\) is the random effect of the nth replicate, and \({e}_{{inl}}\) is the error term. In Eq. 4, \({P}_{{ijnl}}\) is the phenotypic value of the ith line in the nth replicate in the jth environment, \(\mu\) is the general mean, \({g}_{i}\) is the fixed effect of the ith line, \({E}_{j}\) is the random effect of the jth environment, \({(g\times E)}_{{ij}}\) is the random interaction effect of the ith line and the jth environment, \({R}_{n}\) is the random effect of the nth replicate, and \({e}_{{ijnl}}\) is the error term.
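To make the LSmean idea behind Eq. 3 concrete, the following NumPy sketch fits \(P=\mu +{g}_{i}+{R}_{n}+e\) by ordinary least squares for a single environment. It is a simplification, not the lme4/lmerTest analysis used in the study: replicates are treated here as fixed effects with sum-to-zero coding, so each line coefficient is that line's mean averaged over replicate effects. With balanced, complete RCBD data, this should coincide with the arithmetic line mean across replicates. The function name `rcbd_line_lsmeans` is hypothetical.

```python
import numpy as np


def rcbd_line_lsmeans(y, line, rep):
    """Line LSmeans under y = mu + g_i + R_n + e (cf. Eq. 3), replicates fixed.

    Hypothetical helper. Replicates use sum-to-zero (effect) coding, so each
    line coefficient equals the line mean averaged over replicate effects.
    """
    y = np.asarray(y, dtype=float)
    line = np.asarray(line)
    rep = np.asarray(rep)
    lines, reps = np.unique(line), np.unique(rep)
    X_line = (line[:, None] == lines[None, :]).astype(float)  # one column per line
    R = (rep[:, None] == reps[None, :]).astype(float)
    X_rep = R[:, 1:] - R[:, [0]]                              # effect coding for replicates
    X = np.hstack([X_line, X_rep])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(lines.tolist(), beta[: len(lines)].tolist()))
```

With unbalanced data or missing plots, the random-replicate (and, for α-lattices, random-block) mixed model in Eqs. 1 to 4 is what gives properly adjusted LSmeans. This sketch only shows the balanced special case.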

Correlations between traits were calculated for both panels from the across-environment means of all traits using the Pearson method82. In addition, correlations within individual environments were calculated from the LSmeans of FHB disease severity and DON content for both panels, again using the Pearson method. Correlation heatmaps were then generated with the R package “factoextra”83.

Three models were used for both the ST and MT genomic prediction analyses. In both cases, predictions used data from multiple environments, and one of the three models explicitly included G × E interactions. The models follow the study by Jarquín et al.19. Single trait multi-environment (STME) prediction analysis was performed using the R package BGLR84 in R version 4.4.2 (ref. 85).

Single trait multi-environment prediction models

Environment + Line (EL)

This is a standard mixed model with random environment (location × year) and line effects, defined as:

$${Y}_{{ij}}=\,\mu +{L}_{i}{\,+\,E}_{j}+{e}_{{ij}}$$
(5)

where \({Y}_{{ij}}\) is the response of the ith line in the jth environment, \(\mu\) is the overall mean, \({L}_{i}\) is the random effect of the ith line (i = 1,…,I), with \({L}_{i}\sim N(0,{\sigma }_{L}^{2})\), \({E}_{j}\) is the random effect of the jth environment (j = 1,…,J), with \({E}_{j}\sim N(0,{\sigma }_{E}^{2})\), and \({e}_{{ij}}\) is the residual error term, with \({e}_{{ij}}\sim N(0,{\sigma }_{e}^{2})\). This standard model assumes that these effects are independent, so no information is borrowed across lines or environments.
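The independence assumptions of the EL model can be made explicit through the phenotypic covariance they imply, \(\mathrm{Var}(y)={\sigma }_{L}^{2}{Z}_{L}{Z}_{L}^{\prime }+{\sigma }_{E}^{2}{Z}_{E}{Z}_{E}^{\prime }+{\sigma }_{e}^{2}I\), where \({Z}_{L}\) and \({Z}_{E}\) are record-by-line and record-by-environment incidence matrices. The short NumPy sketch below builds this matrix. The function names and variance values are illustrative assumptions, not estimates from the study.

```python
import numpy as np


def incidence(levels):
    """Incidence matrix (records x levels) linking each record to its line or environment."""
    levels = np.asarray(levels)
    uniq = np.unique(levels)
    return (levels[:, None] == uniq[None, :]).astype(float)


def el_covariance(line, env, var_l, var_e, var_res):
    """Phenotypic covariance implied by the EL model (Eq. 5); hypothetical helper.

    Line, environment and residual effects are independent, so two records
    covary only if they share a line (var_l) and/or an environment (var_e).
    """
    Z_L = incidence(line)
    Z_E = incidence(env)
    n = Z_L.shape[0]
    return var_l * Z_L @ Z_L.T + var_e * Z_E @ Z_E.T + var_res * np.eye(n)
```

Two records from different lines tested in different environments have zero covariance under this model, which is why it cannot predict a line that has no phenotypic records of its own.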

Environment + Genomic information (EG)

In this model, the random line effect (\({L}_{i}\)) of the standard model is replaced with the genomic value (\({g}_{i}\)), so that genetic relationships between lines are modeled through a covariance matrix derived from genomic markers. The model is defined as:

$${Y}_{{ij}}=\,\mu +\,{g}_{i}+\,{E}_{j}+{e}_{{ij}}$$
(6)

where \({g}_{i}\) is the genomic value of the ith line (i = 1,…,I), estimated from its marker genotypes. The genomic values are random effects assumed to follow a multivariate normal distribution with zero mean, such that \({Z}_{g}g\sim N(0,{Z}_{g}G{Z}_{g}^{\prime }{\sigma }_{g}^{2})\), where \({Z}_{g}\) is the incidence matrix linking phenotypes to genetic effects, \(G\) is the genomic relationship matrix, and \({\sigma }_{g}^{2}\) is the additive genetic variance. The genomic relationship matrix \(G\) was computed using VanRaden’s method86. The EG model thus captures the covariance among lines from the genomic marker data.
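As an illustration of the genomic relationship matrix, the sketch below implements VanRaden's first method, \(G=W{W}^{\prime }/\left(2{\sum }_{m}{p}_{m}(1-{p}_{m})\right)\), where \(W\) is the marker matrix centred by twice the allele frequency \({p}_{m}\) of each marker. It assumes a complete (imputed) lines × markers matrix coded 0/1/2. The text does not state which software or variant was used to compute \(G\), so this is a sketch of the standard formula, and the name `vanraden_g` is hypothetical.

```python
import numpy as np


def vanraden_g(M):
    """Genomic relationship matrix, VanRaden method 1 (hypothetical helper).

    M: lines x markers matrix coded 0/1/2 with no missing values,
    e.g. after imputation and MAF filtering.
    """
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0               # allele frequency of the "2" allele per marker
    W = M - 2.0 * p                        # centre each marker column
    denom = 2.0 * np.sum(p * (1.0 - p))    # scales G to the additive variance
    return W @ W.T / denom
```

Because monomorphic markers contribute nothing to either \(W\) or the denominator, applying the MAF filter beforehand only removes uninformative columns.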

Environment, Genomic information, and Genomic information x Environment interactions (EG & GxE)

This model extends Eq. (6) by adding the genotype × environment interaction term \({{gE}}_{{ij}}\):

$$\,{Y}_{{ij}}=\,\mu +\,{g}_{i}+\,{E}_{j}+{{gE}}_{{ij}}+{e}_{{ij}}$$
(7)

where the interaction term \({{gE}}_{{ij}}\) is a random effect assumed to follow a multivariate normal distribution, \(N(0,({Z}_{g}G{Z}_{g}^{\prime })\circ ({Z}_{E}{Z}_{E}^{\prime }){\sigma }_{gE}^{2})\); that is, its covariance structure is proportional to the Hadamard product of the genomic and environmental covariance structures, \(\mathrm{Cov}(gE)\propto ({Z}_{g}G{Z}_{g}^{\prime })\circ ({Z}_{E}{Z}_{E}^{\prime })\). Here \({Z}_{g}G{Z}_{g}^{\prime }\) is as defined above, \({Z}_{E}{Z}_{E}^{\prime }\) is the environmental covariance structure, \({Z}_{E}\) is the incidence matrix linking phenotypes to environments, and \(\circ\) denotes the Hadamard product (see Jarquín et al.19 for details).
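The Hadamard-product structure can be checked numerically. In the sketch below, two records share the genomic relationship of their lines in \({Z}_{g}G{Z}_{g}^{\prime }\), share a 1 in \({Z}_{E}{Z}_{E}^{\prime }\) only if they come from the same environment, and therefore share the genomic relationship in the interaction kernel only when they are in the same environment. The integer indexing of lines and environments and the name `gxe_kernels` are assumptions made for illustration.

```python
import numpy as np


def gxe_kernels(G, line_idx, env_idx, n_env):
    """Covariance structures used by the EG and EG & GxE models (Eqs. 6 and 7).

    Hypothetical helper. G: lines x lines genomic relationship matrix;
    line_idx, env_idx: integer line and environment index of each record.
    """
    G = np.asarray(G, dtype=float)
    Z_g = np.eye(G.shape[0])[line_idx]   # records x lines incidence matrix
    Z_E = np.eye(n_env)[env_idx]         # records x environments incidence matrix
    K_g = Z_g @ G @ Z_g.T                # genomic covariance between records
    K_E = Z_E @ Z_E.T                    # 1 if two records share an environment, else 0
    K_gE = K_g * K_E                     # Hadamard (elementwise) product
    return K_g, K_E, K_gE
```

For instance, the same line observed in two different environments has a genomic covariance of 1 in \({Z}_{g}G{Z}_{g}^{\prime }\) (with a diagonal of 1 in \(G\)) but zero covariance in the interaction kernel, so the interaction term captures environment-specific genetic deviations.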

Multi-trait multi-environment prediction models

The MT models extend the ST models to a framework in which multiple traits (t = 1,…,T) are modeled simultaneously. Unstructured covariance matrices account for the genetic and environmental correlations among traits in these models. Accordingly, ST model Eqs. (5), (6), and (7) are extended to MT model Eqs. (8), (9), and (10).

$${Y\,}_{{ij}}^{(t)}=\,\mu +\,{L}_{i}^{(t)}+\,{E}_{j}^{(t)}+{e}_{{ij}}^{(t)}$$
(8)

and

$${Y}_{{ij}}^{(t)}=\,\mu +\,{g}_{i}^{(t)}+\,{E}_{j}^{(t)}+{e}_{{ij}}^{(t)}$$
(9)

and

$$\,{Y}_{{ij}}^{(t)}=\,\mu +\,{g}_{i}^{(t)}+\,{E}_{j}^{(t)}+{{gE}}_{{ij}}^{(t)}+\,{e}_{{ij}}^{(t)}$$
(10)

where \({Y}_{{ij}}^{\left(t\right)}\) is the response of the ith line in the jth environment for the tth trait. In Eq. (8), the random effects of lines \({L}_{i}^{(t)}\) and environments \({E}_{j}^{(t)}\) are assumed to follow multivariate normal distributions that capture trait-specific and environment-specific variation. In Eq. (9), the random line effect is replaced with \({g}_{i}^{(t)}\), which incorporates genetic relationships among lines as in the ST models while keeping an unstructured variance-covariance structure across traits, represented by a T × T matrix. In Eq. (10), the random genotype × environment interaction effect \({{gE}}_{{ij}}^{(t)}\) is added and assumed to follow a multivariate normal distribution, with an unstructured variance-covariance structure across environments represented by a J × J matrix.
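One common way to combine an unstructured T × T genetic covariance matrix \({\Sigma }_{g}\) with the genomic relationship matrix \(G\) is the Kronecker product \({\Sigma }_{g}\otimes G\), the covariance of the genomic effects stacked trait by trait. The sketch below assumes this standard multi-trait GBLUP structure. The text does not spell out the exact parameterization used by the software, and the function names are hypothetical.

```python
import numpy as np


def mt_genomic_covariance(sigma_g, G):
    """Covariance of genomic effects stacked trait by trait: Sigma_g (T x T) kron G (I x I).

    Hypothetical helper. Block (t, t') of the result equals sigma_g[t, t'] * G.
    """
    return np.kron(np.asarray(sigma_g, dtype=float), np.asarray(G, dtype=float))


def genetic_correlation(sigma_g):
    """Genetic correlation matrix implied by an unstructured T x T covariance matrix."""
    s = np.asarray(sigma_g, dtype=float)
    d = np.sqrt(np.diag(s))
    return s / np.outer(d, d)
```

The off-diagonal elements of \({\Sigma }_{g}\) are what allow information on one trait (for example DON content) to inform predictions of a correlated trait (for example FHB severity) for the same or related lines.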

Cross-validation schemes for assessment of the predictive ability

To evaluate model performance within individual environments, we implemented two cross-validation (CV) schemes, CV1 and CV2, designed to resemble real-world challenges in plant breeding18. CV1 evaluates the prediction of new lines that have not been tested in any environment, whereas CV2 evaluates the prediction of lines in a specific environment using their data from other environments. Both schemes used a 5-fold design in which each phenotypic record was assigned to one fold. In CV1, all records of a given line were assigned to the same fold, so the test set contained only unseen lines; in CV2, each record of a given line was randomly assigned to a fold, so data from the same line in other environments could remain in the training set. Figure 6a shows STME CV1, in which a given line is masked across all environments, mimicking the prediction of untested lines. In contrast, STME CV2 randomly masks specific line-environment combinations, so that records of the same line from other environments remain available for model training (Fig. 6b). For MTME, we extended the ST CV designs to multiple traits (Fig. 6c, d). In MTME CV1 (Fig. 6c), a given line is masked across all environments and traits. In MTME CV2, individual line-environment combinations are randomly masked across traits, allowing the models to draw on information from correlated traits in correlated environments (Fig. 6d).
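The difference between the two fold-assignment rules can be sketched as follows, assuming one record per line-environment combination. In CV1 whole lines are assigned to folds, and in CV2 individual records are assigned at random. The balanced `% k` assignment, the fixed random seed, and the name `assign_folds` are illustrative assumptions rather than the exact procedure used with BGLR.

```python
import numpy as np


def assign_folds(lines, scheme, k=5, seed=0):
    """Return a fold label (0..k-1) for every phenotypic record (hypothetical helper).

    lines: line identifier of each record (one record per line x environment).
    CV1: all records of a line share a fold, so the line is masked in every environment.
    CV2: records are assigned independently, so other records of the line stay in training.
    """
    rng = np.random.default_rng(seed)
    lines = np.asarray(lines)
    if scheme == "CV1":
        uniq = np.unique(lines)
        line_fold = rng.permutation(np.arange(len(uniq)) % k)
        lookup = dict(zip(uniq.tolist(), line_fold.tolist()))
        return np.array([lookup[name] for name in lines.tolist()])
    if scheme == "CV2":
        return rng.permutation(np.arange(len(lines)) % k)
    raise ValueError("scheme must be 'CV1' or 'CV2'")
```

For the MTME schemes, the same labels could be applied to line-environment records, with all traits of a masked record set to missing together. Repeating the assignment with different seeds gives the repeated CV runs.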

Fig. 6: Cross validation schemes used for assessing predictive ability.

CV1 and CV2 schemes used for the single trait multi-environment (STME) (a, b) and multi-trait multi-environment (MTME) (c, d) prediction models. Black cells indicate available phenotypic data and white cells indicate missing or masked phenotypic data (testing set). Each cell represents a line-environment combination in the STME prediction models and a line-environment-trait combination in the MTME prediction models.

These cross-validation schemes were applied to both the NMBU and GRAMINOR panels using a 5-fold CV design with 10 repetitions. Each model was run for 12,000 iterations, with the first 5,000 discarded as burn-in. GEBVs were predicted for the masked data, and for each fold, predictive ability was assessed as the Pearson correlation between the GEBVs and the observed phenotypic values. This yielded a distribution of correlation estimates across the 5 folds and 10 repetitions, from which the mean and standard error (SE) were calculated. Tukey’s HSD test was applied to these correlation estimates to identify significant differences among models, with differences deemed significant at p < 0.05. Repeating the 5-fold CV 10 times reduces the dependence of the results on any single random partition of the data and thus reduces bias in the estimated model performance.
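The per-fold predictive ability and its summary can be sketched as below. The text does not give the exact SE formula, so this sketch assumes the common choice SE = SD/√n over the fold-level correlations. Computing correlations within each environment, as described for the CV schemes, would simply add the environment to the grouping key. Tukey's HSD comparison of models is not reproduced here, and the function names are hypothetical.

```python
import numpy as np


def predictive_ability(y_obs, y_pred, folds):
    """Pearson r between predictions and observations within each test fold (hypothetical helper)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    folds = np.asarray(folds)
    rs = []
    for f in np.unique(folds):
        m = folds == f
        rs.append(np.corrcoef(y_pred[m], y_obs[m])[0, 1])
    return np.array(rs)


def summarise(rs):
    """Mean and standard error (assumed SD / sqrt(n)) of the fold-level correlations."""
    rs = np.asarray(rs, dtype=float)
    return rs.mean(), rs.std(ddof=1) / np.sqrt(rs.size)
```

With 5 folds and 10 repetitions, `summarise` would receive 50 correlation estimates per model and trait.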