Introduction

Global genomic prediction has been proposed as a means to integrate datasets from diverse environments and years in horticultural crops, thereby improving prediction accuracy and facilitating cultivar deployment across locations1. Horticultural crops, including high-value fruit and nut species such as strawberry (Fragaria × ananassa), depend on the adoption of improved germplasm that meets grower, consumer, and industry demands2,3. Genotype-by-environment (G×E) interactions are common in plants and must be understood to optimize breeding strategies and cultivar deployment. However, many breeding programs rely on relatively narrow genetic bases derived from a limited set of founding ancestors, which can constrain the ability to capture the full range of G×E interactions4,5,57. Leveraging historical datasets collected across global environments enables breeders to better characterize G×E patterns, understand the genetic basis of complex traits, and identify parents with broader adaptation.

Genomic best linear unbiased prediction (GBLUP) using a genomic relationship matrix (GRM) derived from entry-by-marker genotype data6,7 is widely applied in global genomic prediction because it offers a flexible mixed-model framework8. However, population structure caused by inbreeding, genetic drift, migration, or isolation can influence prediction accuracy by producing differences in allele frequencies and possibly in QTL effects among genetic groups9,10,11,12. Population structure can be quantified from geographic or breeding origin13, pedigree records14, or molecular markers12. If unaccounted for, these differences can inflate variance estimates and bias genomic estimated breeding values (GEBVs), heritability, and predictive ability7. Explicitly incorporating population structure into genomic prediction models may therefore improve accuracy and reduce bias.

PCA-based approach

One approach to account for population structure is to incorporate principal components (PCs) or principal coordinates (PCos) derived from genomic data into prediction models15. Fitting these components as fixed effects can correct for major sources of structure, but because PCs are derived from the same GRM used in the model, this method may result in “double counting” genetic information16,17. Janss et al.16 addressed this by reparameterizing the GBLUP model, partitioning genetic variance across and within subpopulations using eigenvalues from PCA. This PCA-derived relationship matrix in a Gaussian GBLUP framework has been shown to yield higher prediction accuracies than ridge regression or Bayesian methods (BayesA, BayesB)18, with dairy cattle studies reporting slightly higher accuracies compared to the standard GRM19,20,21.

Population-specific GRM approach

Another strategy is to construct a GRM using population-specific allele frequencies rather than overall means, thereby accounting for differences in allele distribution among subpopulations10,22. This method can capture situations where causal variants segregate in only one population. Simulated data suggest that this approach improves prediction accuracy by ~ 2% compared to the standard GRM10. These findings underscore the potential benefits of accounting for population structure, while also indicating that performance gains may vary by species and dataset.

Cultivated strawberry (F. × ananassa) originated in 18th-century France from a hybridization between F. virginiana (North America) and F. chiloensis (South America)23. Today, strawberry is a $15.9 billion global industry24, supported by numerous regionally focused breeding programs. Diversity analyses show that F. × ananassa has a broadly shared genetic base, with structure often aligned to geography or major breeding programs25,26. For example, germplasm from the University of Florida, germplasm from the University of California–Davis, and globally distributed “Cosmopolitan” material form distinct groups26. Additional fine-scale structuring within the USDA-ARS collection further highlights the need to account for population structure when modeling G×E interactions in strawberry25.

Strawberry flavor is a balance of sugars, acids, and aroma compounds27,28,29, with sweetness a key driver of consumer preference30,31,32,33,34. Soluble solids content (SSC), measured by refractometry, is widely used as a proxy for sweetness because sugars comprise 80–90% of SSC35. SSC is a quantitative trait controlled by many minor-effect loci, with few stable across environments27,28,29,36. It can also be negatively correlated with other desirable traits such as firmness and size27,29, making simultaneous improvement challenging. Genomic prediction offers a means to account for environmental and design-related variation, improving selection for SSC while managing trade-offs with other fruit quality traits37.

Only a few studies have applied genomic selection in strawberry38,39, but their results indicate that it can shorten the breeding cycle from three to two years by enabling earlier selection of parents based on GEBVs. Osorio et al.40 reported that predictive ability averaged 0.35 for five polygenic traits when training and validation sets shared individuals, but dropped to 0.24 when they did not, underscoring the role of relatedness.

In this study, we investigate the effect of population structure on genomic prediction for SSC in a large, diverse strawberry panel combining germplasm from breeding programs in the USA, Europe, and Australia. To our knowledge, this is the first genomic prediction study for SSC in strawberry using such a broad and genetically diverse dataset. The results provide insights for the practical implementation of genomic selection for complex traits in strawberry and strategies to effectively control for population structure in global GS datasets.

Materials and methods

Phenotypic data

Soluble solids content was assessed via refractometry (McRoberts 1932) on 2,064 accessions planted in nine trials at seven locations across the U.S.A., Europe, and Australia (Tables 1 and 2). These locations spanned both temperate and subtropical regions. The experimental design of each trial is summarized below.

Further details regarding the experimental trials are provided in Supplementary Note 1.

RosBREED trials (Corvallis, OR & Benton Harbor, MI)

As part of RosBREED41, 425 clonal strawberry entries were evaluated at USDA-ARS (Oregon) and Michigan State University (Michigan), with 399 and 369 genotypes assessed, respectively. Plantings included cultivars and bi-parental populations in randomized designs (2010–2011), with two adjacent clones per genotype forming one experimental unit. Ripe fruits were collected once per plant during peak season and stored at − 20 °C. Soluble solids content (SSC) was measured from thawed, homogenized fruit using a handheld refractometer.

UF trials (F4 & F5, Balm, Florida)

The UF trials were conducted in 2014–2015 using randomized block designs with five blocks. Ripe berries were sampled from each plant in December–January and macerated, and SSC was measured with a refractometer. Values were averaged over five sampling periods.

NIAB-EMR trial (East Malling, UK)

Clonal genotypes were planted in five blocks (two screenhouses) using a randomized design in 2018. SSC was measured on up to three ripe berries per plant and averaged per plant.

IFAPA trials (Málaga, Spain)

Two trials evaluated 66 genotypes in randomized plots. Two ripe fruits per plant were measured for SSC and averaged.

QLD-DAF trials (Australia)

Two trials were held in 2018: N8 (subtropical, Queensland), with 121 genotypes in two-replicate incomplete blocks, and W8 (temperate, Victoria), with 70 genotypes in randomized blocks. SSC was measured at three harvests; for N8, fruit was frozen, thawed, and homogenized, while for W8, juice was measured immediately.

Genotypic data, curation, and imputation

Genotyping for the Oregon USDA (ORUS) and Michigan State University (MSU) breeding programs (trials C1/2 and B1/2) was performed using the 90 K Strawberry Axiom array (Thermo Fisher, Santa Clara, CA, USA)42, while all other programs employed the IStraw35 384HT Axiom array, developed from a subset of probes on the 90 K array28. Allele calling was conducted using the Axiom Analysis Suite software (Thermo Fisher), and a total of 12,591 SNPs shared between the two arrays were retained for analysis.

Data curation involved removing markers not present on the IStraw35 array from the ORUS and MSU datasets, followed by filtration based on Axiom Analysis Suite quality classifications. Only markers classified as “poly high-resolution,” “no minor homozygous,” or “monomorphic high-resolution” across all datasets were retained18. Accessions appearing in multiple studies were compared for identity; those differing at > 5% of markers were considered distinct and assigned unique accession names (e.g., the ‘Mara des Bois’ genotype in Spain differed from the same cultivar in Michigan and Oregon). For accessions with < 5% differences, consensus genotypes were created by converting discordant calls to missing data. Markers with > 25% missing data and accessions with > 20% missing data were excluded, resulting in a final dataset of 2,064 samples and 12,591 SNPs for downstream analyses.
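The duplicate-resolution and missing-data rules above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names, the 0/1/2 dosage coding with `None` for missing calls, the choice to compute discordance only over markers called in both samples, and the order of filtering (markers first, then accessions) are all assumptions.

```python
from typing import List, Optional, Tuple

Genotype = Optional[int]  # allele dosage 0, 1 or 2; None marks a missing call


def merge_duplicates(a: List[Genotype], b: List[Genotype],
                     max_discordance: float = 0.05) -> Optional[List[Genotype]]:
    """Return a consensus genotype, or None if the two samples look distinct.

    Assumption: discordance is the share of differing calls among markers
    that are called in both samples.
    """
    both = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    discordant = sum(x != y for x, y in both)
    if both and discordant / len(both) > max_discordance:
        return None  # treated as distinct accessions
    consensus: List[Genotype] = []
    for x, y in zip(a, b):
        if x is None:
            consensus.append(y)       # fill from the other sample (assumption)
        elif y is None or x == y:
            consensus.append(x)
        else:
            consensus.append(None)    # discordant call becomes missing
    return consensus


def missing_rate(calls: List[Genotype]) -> float:
    """Fraction of missing calls; an empty list counts as fully missing."""
    return sum(c is None for c in calls) / len(calls) if calls else 1.0


def filter_missing(geno: List[List[Genotype]],
                   max_marker_missing: float = 0.25,
                   max_accession_missing: float = 0.20
                   ) -> Tuple[List[List[Genotype]], List[int], List[int]]:
    """Drop markers, then accessions, whose missing rate exceeds the limits.

    geno is accession-by-marker. Returns the filtered matrix and the kept
    row and column indices.
    """
    keep_cols = [j for j in range(len(geno[0]))
                 if missing_rate([row[j] for row in geno]) <= max_marker_missing]
    reduced = [[row[j] for j in keep_cols] for row in geno]
    keep_rows = [i for i, row in enumerate(reduced)
                 if missing_rate(row) <= max_accession_missing]
    return [reduced[i] for i in keep_rows], keep_rows, keep_cols
```

The text does not state how a pair differing at exactly 5% of markers is handled; the sketch merges such pairs.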

Missing genotypes were imputed using FImpute v343, applied both across the entire population and within sub-populations. Imputation accuracy was assessed by masking 2,000 genotypes, imputing them, and calculating Pearson correlations and concordance rates across 10 repetitions. SNP distribution was evaluated in 1 Mb windows (~ 830 Mb genome) using CMplot in R, providing a genome-wide view of marker density and ensuring adequate representation of genomic variation.
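The masking procedure used to assess imputation accuracy can be illustrated with a short sketch. The helper names are hypothetical, and the `impute` argument is a placeholder for any imputation routine (it does not reproduce FImpute); genotypes are assumed to be coded as 0/1/2 dosages with `None` for missing calls.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Matrix = List[List[Optional[int]]]  # accession-by-marker dosages; None = missing


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def concordance(truth: List[int], imputed: List[int]) -> float:
    """Share of masked genotypes that were recovered exactly."""
    return sum(t == i for t, i in zip(truth, imputed)) / len(truth)


def mask_genotypes(geno: Matrix, n_mask: int,
                   rng: random.Random) -> Tuple[Matrix, List[Tuple[int, int]]]:
    """Set n_mask randomly chosen called genotypes to missing in a copy of geno."""
    called = [(i, j) for i, row in enumerate(geno)
              for j, g in enumerate(row) if g is not None]
    positions = rng.sample(called, n_mask)
    masked = [row[:] for row in geno]
    for i, j in positions:
        masked[i][j] = None
    return masked, positions


def evaluate_imputation(geno: Matrix, impute: Callable[[Matrix], Matrix],
                        n_mask: int = 2000, n_reps: int = 10,
                        seed: int = 1) -> List[Tuple[float, float]]:
    """Return (Pearson r, concordance) for each masking repetition."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_reps):
        masked, positions = mask_genotypes(geno, n_mask, rng)
        imputed = impute(masked)
        truth = [geno[i][j] for i, j in positions]
        guess = [imputed[i][j] for i, j in positions]
        results.append((pearson(truth, guess), concordance(truth, guess)))
    return results
```

The defaults of 2,000 masked genotypes and 10 repetitions follow the text; the random seed is arbitrary.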

Population structure

Population structure was characterized using two complementary approaches: ADMIXTURE and principal coordinate analysis (PCoA) based on the genomic relationship matrix. ADMIXTURE analysis was performed with K = 2 ancestral populations, and individuals with ≥ 90% ancestry assigned to a single cluster were classified as “non-admixed,” while those with < 90% ancestry were considered “admixed.” PCoA was conducted using classical multidimensional scaling of the genomic relationship matrix, followed by k-means clustering (K = 2) on the first two principal coordinates. Cluster assignments from both methods were compared to assess concordance. For downstream genomic prediction, ADMIXTURE-based clusters were retained due to their clearer biological interpretability and direct estimation of ancestry proportions, with admixed individuals treated as a separate category.
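The 90% ancestry rule can be written as a small function. This is an illustrative sketch only: the function name is hypothetical, and the mapping of the first and second ADMIXTURE columns to the labels SP1 and SP2 is an assumption, since cluster order in ADMIXTURE output is arbitrary.

```python
from typing import List


def classify_ancestry(q: List[float], threshold: float = 0.90) -> str:
    """Assign an individual to a cluster or to the admixed category.

    q holds the ancestry proportions for one individual (one value per
    cluster, summing to 1). The label "SP<k>" for column k is an assumption.
    """
    best = max(range(len(q)), key=lambda k: q[k])
    if q[best] >= threshold:
        return f"SP{best + 1}"
    return "admixed"


# Example with K = 2 ancestry proportions for four hypothetical individuals.
q_matrix = [[0.95, 0.05], [0.90, 0.10], [0.60, 0.40], [0.08, 0.92]]
labels = [classify_ancestry(q) for q in q_matrix]
```

An individual sitting exactly at 90% is treated as non-admixed, matching the "≥ 90%" wording above.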

The optimal number of clusters (K) was evaluated using two criteria: (i) the silhouette method (R package factoextra44,54), where the K maximizing average silhouette width was selected, and (ii) ADMIXTURE v1.3.045 with 20-fold cross-validation, where the K with the lowest cross-validation error was chosen.

Statistical methods

General mixed model

A general linear mixed model incorporating data from all trials and environments was fitted using ASReml-R55:

$$\mathbf{y}=\mathbf{X}\mathbf{b}+\mathbf{Z}_{\text{g}}\mathbf{a}+\mathbf{Z}_{\text{u}}\mathbf{u}+\mathbf{e}$$

where \(\mathbf{y}\) is the vector of phenotypic observations, \(\mathbf{X}\) is the design matrix for fixed effects (trial × season, block within environment), and \(\mathbf{b}\) is the vector of fixed effects. The matrix \(\mathbf{Z}_{\text{g}}\) links observations to additive genetic effects \(\mathbf{a}\), while \(\mathbf{Z}_{\text{u}}\) links them to non-additive effects \(\mathbf{u}\). The residual term is \(\mathbf{e}\). Certain fixed or random effects were omitted depending on trial design; details are provided in Table 3.

Additive genetic effects and G×E covariance

Additive genetic effects were modelled as a genotype-by-environment (G×E) term:

$$\mathbf{a}\sim\text{N}(0,\,\boldsymbol{\Sigma}_{\mathbf{A}}\otimes\mathbf{G}),$$

where \(\mathbf{G}\) is the genomic relationship matrix among individuals and \(\boldsymbol{\Sigma}_{\mathbf{A}}\) is the additive covariance matrix across environments. To parsimoniously capture cross-environment correlations, a factor-analytic (FA) decomposition was applied to \(\boldsymbol{\Sigma}_{\mathbf{A}}\):

$$\boldsymbol{\Sigma}_{\mathbf{A}}=\boldsymbol{\Lambda}\boldsymbol{\Lambda}^{\top}+\boldsymbol{\Psi}$$

where \(\boldsymbol{\Lambda}\) is the environment-by-factor loading matrix and \(\boldsymbol{\Psi}\) is diagonal, containing environment-specific variances. Competing FA models (FA1–FA3) were compared using AIC, and the most parsimonious was selected. Importantly, the FA structure was applied to \(\boldsymbol{\Sigma}_{\mathbf{A}}\) (the additive covariance).
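To make the factor-analytic structure concrete, the sketch below rebuilds the across-environment covariance matrix from a loading matrix and a vector of environment-specific variances, and counts the parameters of an FA model with k factors against an unstructured matrix. The function names and numeric values are illustrative assumptions, not estimates from this study; the parameter count uses the standard k(k − 1)/2 rotational constraints.

```python
from typing import List


def fa_covariance(loadings: List[List[float]],
                  specific: List[float]) -> List[List[float]]:
    """Return Sigma = Lambda Lambda^T + Psi for an FA model.

    loadings: environment-by-factor matrix (Lambda).
    specific: environment-specific variances (diagonal of Psi).
    """
    n_env = len(loadings)
    n_fac = len(loadings[0])
    sigma = [[sum(loadings[i][k] * loadings[j][k] for k in range(n_fac))
              for j in range(n_env)] for i in range(n_env)]
    for i in range(n_env):
        sigma[i][i] += specific[i]
    return sigma


def fa_param_count(n_env: int, n_fac: int) -> int:
    """Free parameters of an FA-k model after rotational constraints."""
    return n_env * n_fac + n_env - n_fac * (n_fac - 1) // 2


def unstructured_param_count(n_env: int) -> int:
    """Free parameters of an unstructured covariance matrix."""
    return n_env * (n_env + 1) // 2


# Illustrative FA1 model for three environments (made-up numbers).
sigma_a = fa_covariance([[1.0], [0.5], [2.0]], [0.5, 0.25, 1.0])
```

The saving in parameters grows with the number of environments, which is why low-order FA models are attractive for multi-environment analyses.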

Genetic correlations between trials

Additive genetic correlations between environments \(i\) and \(j\) were estimated from \(\boldsymbol{\Sigma}_{\mathbf{A}}\):

$$\text{gCorr}_{ij}=\frac{\boldsymbol{\Sigma}_{\mathbf{A}}(i,j)}{\sqrt{\boldsymbol{\Sigma}_{\mathbf{A}}(i,i)\,\boldsymbol{\Sigma}_{\mathbf{A}}(j,j)}}$$

These additive correlations reflect the consistency of heritable effects across trials and are directly relevant for genomic prediction.
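As a worked illustration of the correlation formula, the following sketch converts an additive covariance matrix across environments into a correlation matrix. The function name and the example values are hypothetical.

```python
import math
from typing import List


def genetic_correlations(sigma: List[List[float]]) -> List[List[float]]:
    """Scale each covariance by the square root of the product of the two variances."""
    n = len(sigma)
    return [[sigma[i][j] / math.sqrt(sigma[i][i] * sigma[j][j])
             for j in range(n)] for i in range(n)]


# Two environments with additive variances 4 and 9 and covariance 2 (made-up).
g_corr = genetic_correlations([[4.0, 2.0], [2.0, 9.0]])
```

Here the off-diagonal element equals 2 / (2 × 3) = 1/3, so the two hypothetical environments would share only a modest part of their additive signal.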

Genomic relationship matrices

To assess the impact of population structure, three approaches were used to construct \(\:\mathbf{G}\):

  1. Standard GBLUP: \(\mathbf{G}\) was computed from centered genotypes using allele frequencies across the entire population:

    $$\mathbf{G}=\frac{\mathbf{M}\mathbf{M}^{\top}}{2\sum_{i}p_{i}(1-p_{i})}$$

    where \(\mathbf{M}\) is the centered marker matrix (rows = individuals, columns = loci) and \(p_{i}\) is the allele frequency at locus \(i\). When required, \(\mathbf{G}\) was bent to be positive-definite46.

  2. P-GBLUP: Principal components (PCs) derived from \(\mathbf{G}\) were included as fixed covariates to control for population structure. Enough PCs were retained to explain ~99% of the genetic variance. This model is equivalent to standard GBLUP with PCA covariates.

  3. Population-specific GRM: Separate GRMs were built for each subpopulation using population-specific allele frequencies10:

    $$\mathbf{G}_{\text{pop}}=\frac{\mathbf{S}_{\text{pop}}\mathbf{S}_{\text{pop}}^{\top}}{\sum_{j}2p_{j,\text{pop}}(1-p_{j,\text{pop}})}$$

    where \(\mathbf{S}_{\text{pop}}\) is the centered genotype matrix for the subpopulation and \(p_{j,\text{pop}}\) is the allele frequency at locus \(j\) in that group.
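The difference between the standard and population-specific relationship matrices can be seen in a small pure-Python sketch. It assumes complete 0/1/2 dosage data, estimates allele frequencies from the genotypes supplied, and covers only the within-subpopulation blocks; how relationships between subpopulations are handled is not specified in the text and is left out. Function names and the toy genotypes are hypothetical.

```python
from typing import List, Optional


def allele_freqs(geno: List[List[int]]) -> List[float]:
    """Allele frequency per locus from 0/1/2 dosages (rows = individuals)."""
    n = len(geno)
    return [sum(row[j] for row in geno) / (2.0 * n) for j in range(len(geno[0]))]


def grm(geno: List[List[int]],
        freqs: Optional[List[float]] = None) -> List[List[float]]:
    """G = M M^T / (2 * sum p(1-p)), with M the dosages centered by 2p.

    Assumes at least one locus is polymorphic so the denominator is non-zero.
    """
    if freqs is None:
        freqs = allele_freqs(geno)
    centered = [[g - 2.0 * p for g, p in zip(row, freqs)] for row in geno]
    denom = 2.0 * sum(p * (1.0 - p) for p in freqs)
    n = len(geno)
    return [[sum(a * b for a, b in zip(centered[i], centered[k])) / denom
             for k in range(n)] for i in range(n)]


def population_grm(geno: List[List[int]], members: List[int]) -> List[List[float]]:
    """GRM for one subpopulation, using that subpopulation's allele frequencies."""
    return grm([geno[i] for i in members])


# Toy data: four individuals, two loci; individuals 0 and 1 form one subpopulation.
toy = [[2, 2], [2, 0], [0, 0], [0, 2]]
g_all = grm(toy)
g_sub = population_grm(toy, [0, 1])
```

In this toy case the relationship between individuals 0 and 1 is 0 under the overall frequencies but -2 under the subpopulation frequencies, because the first locus is fixed within the subpopulation and no longer contributes.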

Within-trial genomic environments were defined as combinations of seasons within a location that exhibited homogeneous additive variance and near-unity pairwise additive correlations. Single-trial models were first fit to define environments and then combined across trials using the FA parameterization of \(\:{{\Sigma\:}}_{\mathbf{A}}\).

Generalized genomic heritability

Generalized genomic heritability was estimated to quantify the proportion of trait variability attributable to genetic differences. Heritability was calculated for each trial following the method described by Hardner et al.4. The heritability for trial \(t\) was computed as:

$$\widehat{h}_{t}^{2*}=1-\frac{\overline{\sigma}_{\Delta A,t}^{2}}{2\times\widehat{\sigma}_{A,t}^{2}}$$

where \(\overline{\sigma}_{\Delta A,t}^{2}\) is the mean variance of the difference between additive predictions at the \(t^{\text{th}}\) trial, estimated from the prediction error variance matrix of additive effects, and \(\widehat{\sigma}_{A,t}^{2}\) is the estimated additive genetic variance at the \(t^{\text{th}}\) trial.
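A minimal numerical sketch of this heritability calculation follows. It assumes that the variance of the difference between two additive predictions is obtained from the prediction error variance (PEV) matrix as PEV(i,i) + PEV(j,j) − 2 PEV(i,j) and averaged over all pairs of individuals; the function name and example values are hypothetical.

```python
from typing import List


def generalized_heritability(pev: List[List[float]], sigma2_a: float) -> float:
    """1 - (mean variance of pairwise prediction differences) / (2 * additive variance).

    pev: prediction error variance matrix of the additive effects at one trial.
    sigma2_a: estimated additive genetic variance at that trial.
    """
    n = len(pev)
    diffs = [pev[i][i] + pev[j][j] - 2.0 * pev[i][j]
             for i in range(n) for j in range(i + 1, n)]
    mean_var_diff = sum(diffs) / len(diffs)
    return 1.0 - mean_var_diff / (2.0 * sigma2_a)


# Two individuals with PEV 0.5 each, PEV covariance 0.1, additive variance 1.0.
h2_example = generalized_heritability([[0.5, 0.1], [0.1, 0.5]], 1.0)
```

With these made-up values the difference variance is 0.8, giving a heritability of 1 − 0.8/2 = 0.6.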

Prediction accuracy

Expected accuracy

Expected prediction accuracy for an individual was computed as:

$$E\left[r\left(\widehat{A},A\right)\right]=\sqrt{1-\frac{\text{PEV}\left(\widehat{A}\right)}{\sigma_{A,t}^{2}}},$$

where \(\widehat{A}\) is the predicted additive effect, \(A\) is the true additive effect, and \(\sigma_{A,t}^{2}\) is the additive genetic variance at trial \(t\).
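For illustration, the expected accuracy of a single individual can be computed directly from its prediction error variance. The function name is hypothetical, and returning zero when the PEV exceeds the additive variance is an assumption made here to keep the square root defined.

```python
import math


def expected_accuracy(pev: float, sigma2_a: float) -> float:
    """sqrt(1 - PEV / additive variance), floored at zero (assumption)."""
    reliability = 1.0 - pev / sigma2_a
    return math.sqrt(max(reliability, 0.0))


# A PEV of 0.36 against an additive variance of 1.0 gives reliability 0.64.
acc_example = expected_accuracy(0.36, 1.0)
```

The quantity under the square root is the reliability, so accuracy here is the square root of reliability.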

Realized accuracy

Realized accuracy was assessed by k-fold cross-validation within and across environments. For each fold, individuals in the validation set were excluded from model fitting, marker effects were estimated from the training set, and genomic breeding values were predicted for the validation set. Accuracy was calculated as the Pearson correlation between predicted breeding values and reference genotypic values57. To account for the imperfect reliability of the phenotypes, this correlation was then divided by the square root of the generalized heritability at the corresponding trial.
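The fold assignment and the heritability-adjusted correlation can be sketched as below. Model fitting is deliberately left out; the function names, the random fold assignment, and the seed are assumptions, and the number of folds is a free parameter because the text does not state k.

```python
import math
import random
from typing import List


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Randomly split individuals 0..n-1 into k validation folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; undefined if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def realized_accuracy(predicted: List[float], reference: List[float],
                      h2: float) -> float:
    """Correlation between predictions and reference values, divided by sqrt(h2)."""
    return pearson(predicted, reference) / math.sqrt(h2)


folds = kfold_indices(10, 3)
```

Because the correlation is divided by the square root of heritability, the adjusted value can exceed 1 in small validation sets or when heritability is low, so it is best read alongside the unadjusted correlation.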

Results

SNP distribution, allele frequency and imputation

SNP markers were evenly distributed across the genome in 1 Mbp windows (Figure S1), with an average density of 84 SNPs per 1 cM. The largest physical gaps between adjacent markers were observed on chromosomes 6 and 3 (up to 35 cM), while the smallest gaps occurred on chromosomes 1, 5, and 7. Allele frequencies differed markedly between the two sub-populations (SP1 and SP2), with 98% pairwise dissimilarity (Figure S3) and a fixation index (Fst) of 0.35. Across populations, 12.5% of loci contained missing genotypes (10% in SP1 and 15% in SP2). Genotype imputation achieved 90% concordance when performed population-wide but nearly 99% when performed within populations, and the latter results were used for downstream analyses.

Population structure

Clustering results from ADMIXTURE and PCoA were largely concordant, with the majority of individuals assigned to the same clusters by both methods (Fig. 1 and Figure S2). ADMIXTURE analysis (K = 2) with 20-fold cross-validation revealed two primary genetic clusters and a subset of individuals with substantial mixed ancestry, defined here as having less than 90% ancestry assigned to either cluster (i.e., more than 10% from each cluster). Using this threshold, 1,111 individuals (54%) were classified as Cluster 1 (SP1), 387 individuals (19%) as Cluster 2 (SP2), and 566 individuals (27%) as admixed (Fig. 1C&D). In parallel, principal coordinate analysis (PCoA) of the genomic relationship matrix, followed by k-means clustering (K = 2) of the first two coordinates, produced similar groupings, with most discrepancies occurring near cluster boundaries and involving the admixed group identified by ADMIXTURE (Fig. 1B & S1). Across the 2,064 accessions planted in seven locations, these genetic clusters were also geographically structured (Figure S1C): SP1 consisted primarily of accessions tested in Florida (P), forming a distinct subclade with only a few accessions from the Australian trials (Nambour, QLD [N], and Wandin, VIC [W]), whereas SP2 was composed almost entirely of accessions tested in Benton Harbor, MI (B); Corvallis, OR (C); East Malling, U.K. (E); and Málaga, Spain (M). The admixed group included accessions from multiple locations, consistent with their intermediate genetic composition. Both silhouette analysis (Figure S1A) and the ADMIXTURE cross-validation error (Figure S1C) supported K = 2 as the optimal number of clusters, with average silhouette widths of 0.09 for SP1 and 0.22 for SP2, indicating moderate within-cluster cohesion.
Given the clearer biological interpretability and direct representation of ancestry proportions, we used ADMIXTURE-defined clusters including the admixed category for downstream genomic prediction to more accurately capture population structure and admixture in the dataset.

Table 1 Summary for the 9 trials (Trial ID) included in this study.
Table 2 Number of accessions within, and in common, across trials (Trial ID are defined in Table 1).
Table 3 Log likelihood for the single-location individual trial (see Table 1 for key to TrialID) and single-location multi-trial models.
Fig. 1

Comparison of population structure inferred by ADMIXTURE and Principal Coordinates Analysis (PCoA) in strawberry samples. Panel (A) shows the ADMIXTURE bar plot where individuals are represented by their ancestry proportions from two clusters. Individuals with less than 90% ancestry from any single cluster are classified as “Admixed,” shown as mixed proportions rather than a solid color. Panel (B) presents the PCoA scatter plot, where samples are grouped into three categories: Cluster 1 (blue), Cluster 2 (green), and Admixed (purple). Panel (C) compares the number of samples assigned to each category by ADMIXTURE and PCoA using side-by-side barplots. Panel (D) displays a confusion heatmap illustrating the correspondence between ADMIXTURE and PCoA group assignments, including the admixed group, highlighting both concordance and discrepancies between these complementary approaches. Information on the optimal K value is provided in Figure S2.

Standard GBLUP

Single location GBLUP

Model complexity was reduced by removing factors and interactions and by combining trials within locations (Table S3, Table 3). There was no interaction between genetic effects and year in the most parsimonious individual-trial models (Table 3 and Tables S1 and S3). Variance component estimates for the single-trial models are presented in Fig. 2 and Table S1. In some trials, the estimated additive genomic variance (vA) exceeded the residual variance (vR), indicating that additive genetic effects contributed more to the observed variation in those trials (Fig. 2 and Table S1). Genetic correlations between individual trials (Fig. 3) provide insight into the stability of genetic values across environments. High positive correlations (such as those between the Nambour and Wandin trials and other individual trials) indicate strong consistency in genetic effects, suggesting shared genetic control and the potential for joint or across-environment model selection. In contrast, correlations close to zero (e.g., between the Corvallis and Kent trials) reflect minimal genetic overlap, implying that these environments differ substantially in their genetic architecture and may need to be analyzed separately in downstream applications. The proportion of total genomic variance explained by additive genomic effects was more variable in the single-trial models for the Málaga and Balm, FL trials (Fig. 2 and Table S1). Generalized heritability was highest for the Florida trial (h2 = 0.45), followed by the Málaga trial (h2 = 0.41), and lowest at East Malling, U.K. and Benton Harbor, MI (h2 = 0.16 and 0.18, respectively) (Fig. 4). Realized prediction accuracy (square root of reliability) ranged from 0.44 at the Benton Harbor, MI trial to 0.72 for the Balm, FL trials (Fig. 4).

Fig. 2

Variance component estimates for the single-location single-trial model (details in Table S1 and Figure S2).

Fig. 3

Genetic correlations between individual trials. These estimates are central to the study, as they indicate the stability of genetic values across environments. Positive correlations suggest shared genetic information and potential for model selection, while correlations near zero indicate limited genetic overlap, implying that environments should be treated independently for downstream analysis.

Fig. 4

Genomic heritability (h2) and reliability (A) for individual environment (IE) and multi-trial models. (B) * Trials from the same locations were combined (IE-M4&5 = M; IE-F4&5 = F) when genomic heritability and prediction accuracy were estimated. IE = individual environment (italic); Gfa = factor analytic (FA) model based on the standard GBLUP model; Pfa = FA model based on the standard GBLUP + PCA model; Wfa = FA model based on the multi-population model (a detailed description of the models is provided in the Materials and methods section).

Multi-location GBLUP

Compared with the single-location models, narrow-sense heritability (h2) and prediction accuracy were higher under the multi-location standard GBLUP approach for all trials (Fig. 2 and Figure S4). Under the multi-location models, heritability was highest for the Florida trial (h2 = 0.61) and lowest for the East Malling, U.K. and Benton Harbor, MI, U.S.A. trials (h2 = 0.27 and 0.28, respectively). This mirrors the ranking observed when environments were modeled individually. On average, the multi-location approach increased h2 estimates by 0.16. Prediction accuracies ranged from 0.53 at the Wandin, Australia trial to 0.75 at the Nambour, Australia trial. On average, prediction accuracies increased by 0.06 when multiple environments were incorporated into the model.

P-GBLUP (Janss PCA method)

The relative size of realized prediction accuracy among trials for the GBLUP model that used a reparametrized GRM based on eigen decomposition (P-GBLUP) was similar to that observed for the standard multi-environment approach, where the Nambour and Wandin trials had the highest (r = 0.79) and lowest (r = 0.56) prediction accuracies, respectively (Fig. 4). The P-GBLUP model explained approximately 76% of the phenotypic variance. For all trials, realized prediction accuracies obtained from the GBLUP + PCA model were higher than the standard GBLUP (Fig. 4).

Population specific GRM approach

Realized genomic prediction accuracy for the multi-population model that accounted for population structure through the kinship matrix displayed the same relative prediction accuracy as the standard GBLUP + PCA approach. The population specific (Wfa) model explained approximately 39% of the phenotypic variance.

Fig. 5

Distribution of best linear unbiased predictions (BLUPs) from the multi-location models: standard GBLUP (Gfa1), standard GBLUP + PCA (Pfa1), and population-specific (Wfa1). Additional information is provided in Table S2. (A) Gfa1 BLUP values vs Pfa1 BLUP values; (B) Gfa1 BLUP values vs Wfa1 BLUP values; (C) Gfa1 BLUP values vs Pfa1 BLUP values.

Comparing the multi-location approaches

Among the multi-location models, the standard model (Gfa) gave the lowest prediction accuracies, whereas the two approaches that account for population structure (Pfa and Wfa) achieved higher and more stable accuracies across trials (Figs. 4 and 5 and Figure S4). The total genomic correlation matrix across genomic environments was estimated from the most parsimonious multivariate model for SSC assessed across breeding trials. Genomic environments are defined as groupings of trial-by-season combinations within which genomic variance is homogeneous and genomic correlations equal 1. The factor analytic (FA) model was selected after comparing FA1 to FA3, with the most parsimonious model chosen for subsequent genomic prediction scenarios (Table S2). In addition, the BLUP distributions and the correlations among the multi-location approaches confirmed that BLUPs varied between models (Figs. 4 and 5). A strong positive correlation of BLUPs was observed between the Gfa, Pfa, and Wfa approaches for the Florida (F; r = 0.83–0.96), Málaga (M; r = 0.75–0.9), and Corvallis, OR (r = 0.7–0.8) trials, whereas unstructured distributions and low correlations between BLUPs were observed across the approaches for the Nambour and Wandin trials (Fig. 5 & Figure S4). Genomic heritability followed the same trend as the prediction accuracy estimates, with the exception of the Nambour, Australia trial, where heritability was noticeably lower for the Pfa and Wfa approaches than for the standard multi-location GBLUP approach.

In most cases, the models that accounted for population structure (standard GBLUP + PCA [Pfa] and multi-population [Wfa]) generated the highest prediction accuracies (r ≈ 0.8) and showed the lowest variation across trials. Genomic heritability followed the same pattern as prediction accuracy, with the multi-population approach yielding high heritability estimates.

Discussion

This study evaluated strategies to account for population structure in genomic prediction models, using a large and diverse panel of global strawberry clones for soluble solids content (SSC). We found that models explicitly accounting for population-specific genomic relationships (multi-population GBLUP) achieved higher prediction accuracies compared to standard GBLUP models that ignore structure. Prediction accuracy varied considerably across environments in the single-trial univariate models, with the highest accuracy observed in the Florida trials (F4 and F5) and the lowest in Benton Harbor, MI. Combining trials from the same location in a multi-trial model increased the size of the reference population and improved prediction accuracy. Further improvements were obtained when population structure was incorporated into the multi-trial analysis, highlighting the benefit of using population-specific genomic relationship matrices rather than a single matrix for the entire population.

Analysis of global genetic relatedness revealed two major sub-populations (SP1 and SP2), broadly associated with subtropical and temperate growing environments. This structure was consistent with previous findings25,26 and is likely the result of historical germplasm exchange, particularly between the Florida and Australian breeding programs. Genetic diversity between the two groups was further supported by differences in allele frequency distributions, which have implications for the unbiased estimation of genetic correlations47. Accounting for this structure proved critical for improving genomic prediction. In our data, correcting for population structure increased prediction accuracy by up to 20% in single-location models and by about 10% in multi-location models. Similar results have been reported in maize, wheat, and cattle, where ignoring structure reduced accuracy, particularly for across-population predictions7,9,10,13,48,49,50,51,52.

Our results confirm that population structure between temperate and subtropical germplasm directly influences prediction accuracy, and models such as Pfa and Wfa generally improved reliability. However, performance gains were not uniform across all locations, indicating that environmental and genetic factors may interact in complex ways. The observed variability in model performance underscores the importance of tailored model development. Environments with low BLUP correlations likely reflect situations where additional covariates or interaction terms are needed. Future research should aim to identify these location-specific factors to further refine model robustness and generalizability.

Implications for breeding

Strong population structure can lead to biased predictions if not addressed, potentially causing false positives and false negatives in marker–trait associations. Best practice involves evaluating population and family structure prior to genomic prediction, using population-specific allele frequencies to construct GRMs, and adopting reduced-dimensionality approaches to handle complex genotype-by-environment covariance structures. Equally important is the use of diverse and representative training populations to ensure shared genetic backgrounds between training and prediction sets4.