Incorporating spatial and genetic competition into breeding pipelines with the R package gencomp

Chaves, Saulo F. S.; Ferreira, Filipe M.; Ferreira, Getulio C.; Gezan, Salvador A.; Dias, Kaio Olimpio G.

doi:10.1038/s41437-024-00743-9

Article
Published: 16 January 2025

Heredity volume 134, pages 129–141 (2025)

Subjects

Plant breeding

Abstract

Genetic competition can obscure the true merit of selection candidates, potentially leading to altered genotype rankings and a divergence between expected and actual genetic gains. Despite a wealth of literature on genetic competition in plant and animal breeding, the separation of genetic values into direct genetic effects (DGE, related to a genotype’s merit) and indirect genetic effects (IGE, related to the effects of a genotype’s alleles on its neighbor's phenotype) in linear mixed models is often overlooked, likely due to the complexity involved. To address this, we introduce gencomp, a new R package designed to simplify the use of (spatial-) genetic competition models in crop and tree breeding routines. gencomp includes functions for constructing the genetic competition matrix, fitting (spatial-) genetic competition models via the variance-component approach, and extracting key results such as variance components, heritabilities, competition classes, and total genetic values. For tree breeding, gencomp also calculates the merit of different clonal mixtures using the estimated DGE and IGE of the selection candidates. In this paper, we first present the theoretical foundation of the methods implemented in the package. We then demonstrate the use of gencomp with two datasets: one simulated from a Eucalyptus spp. trial and a real potato dataset. We used both datasets to demonstrate the influence of genetic competition in variance component estimates, heritabilities and selection. Despite the dependency on ASReml-R, a paid resource, gencomp is a user-friendly tool that can popularize genetic competition models, contributing to more informed decision-making in plant breeding.


Introduction

Plants compete for above- and below-ground resources. The competition intensity will depend on several factors such as the species, density and resource availability. In plant breeding, competition may distort the selection process. This is because competitive plants may impair the capacity of sensitive neighbors to express their full genetic potential. Furthermore, selecting only aggressive individuals may decrease the correspondence between what was observed in trials and the actual performance in commercial farms (Besag and Kempton 1986; Kempton 1982). These are consequences of overlooking competition and selecting based only on the portion due to direct genetic effects (DGE) of the total heritable variation (Bijma 2011; Walsh and Lynch 2018, chapter 22). A possible solution is to consider the underlying competition capacity of selection candidates. This is done by taking the indirect genotypic effects (IGE) into account during the selection process (Bijma et al. 2007; Muir 2005).

IGE can be seen as the influence that the genetic makeup of one individual has on the phenotypic performance of its surrounding neighbors. Evidence suggests that competition ability is a quantitative genetic trait (Bailey and Desjonquères 2022; Griffing 1967; Sakai 1955). Under this scenario, the phenotypic value of an individual v, surrounded by n neighbors, is given by ${P}_{v}={g}_{v}+\mathop{\sum }\nolimits_{k = 1}^{n}{c}_{k}+{e}_{v}$, with g_v being the DGE, c_k being the IGE of the k^th neighbor, and e_v being the non-heritable terms (Bijma 2014), instead of just P_v = g_v + e_v, in traditional modeling approaches. Given that each candidate has its own competition capacity and that genetic competition is relevant to phenotypic expression (i.e., IGE are significant), the most appropriate selection criterion is the summation ${g}_{v}+\hat{\phi }{c}_{v}$, known as total genotypic (or genetic, if kinship is considered) value (TGV), with $\hat{\phi }$ being the mean competition intensity factor (see “Methods” for more details), and c_v the IGE of candidate v (Costa e Silva and Kerr 2013; Ferreira et al. 2023). Note that if c_v ≠ 0, the merit of a selection candidate will be different when comparing models with and without competition. In other words, competition causes change in candidates’ ranking, and, ultimately, in the selection.

Even with the relevant insights related to IGE in animal and plant breeding so far (Bijma 2010; Costa e Silva et al. 2017; Kempton and Lockwood 1984; Muir 2005), its statistical modeling is not straightforward. The main challenge is the construction of a competition matrix (IGE’s incidence matrix), as there are many ways to obtain it and each methodology yields different results. Another difficulty is the incorporation of IGE into a linear mixed model, where it is important to ensure both DGE and IGE are estimated properly. The estimation of competition effects in the genetic part of the model can be done in combination with spatial components so that the residual and genetic competitions are properly calculated (Stringer et al. 2017, 2011).

Here, we propose a new resource called gencomp, an R (R Core Team 2024) package that leverages the average information algorithm implemented in ASReml-R (Gilmour et al. 1995; The VSNi Team 2023) to facilitate the use and fitting of competition models in plant breeding programs. Currently, methods for considering competition in tree and crop breeding are implemented. In addition, the package has functions to: (i) fit (spatial-) genetic competition models, (ii) define competition classes, (iii) compute the total genotypic value and (iv) calculate the total heritable variation. Specifically for tree breeding, gencomp can also predict the performance of commercial forest stands composed of clonal mixtures, based on clones’ competition capacity (Ferreira et al. 2023).

gencomp is designed to make (spatial-) genetic competition models easily accessible, so that plant breeders can take full advantage of this information for decision-making. In this paper, we aimed to show how to properly use gencomp for both tree and crop breeding. Using gencomp, we also aimed to illustrate how considering genetic competition effects (along with spatial trends) can change genetic and non-genetic parameters and the final, and most important, decision for plant breeders: the selection. This paper is organized as follows: first, we describe the Methods implemented within gencomp, highlighting the Theory behind them. Then, using two Motivating examples, one for tree breeding and another for crop breeding, we use the Results and Discussion sections to (i) describe the package's pipeline, from building the competition matrices to extracting the main results of the model; and (ii) compare the outputs of linear mixed models with and without modeling spatial trends and genetic competition.

Methods

Theory

Quantitative genetics base of Griffing (1967)

Using two individuals (1 and 2) in a group, Griffing (1967) described a gene model that considers the competition effects (or, as he called them, associative effects) as follows:

$$\begin{array}{lll}{u}_{{A}_{1}{a}_{1}}\;=\;{\alpha }_{{A}_{1}}+{\alpha }_{{a}_{1}}+{\delta }_{{A}_{1}{a}_{1}}+{\alpha }_{{A}_{2}}+{\alpha }_{{a}_{2}}+{\delta }_{{A}_{2}{a}_{2}}+{(\alpha \alpha )}_{{A}_{1}{A}_{2}}+{(\alpha \alpha )}_{{A}_{1}{a}_{2}}+{(\alpha \alpha )}_{{a}_{1}{A}_{2}}\\\qquad\quad\;\, +\,{(\alpha \alpha )}_{{a}_{1}{a}_{2}}\,+{(\alpha \delta )}_{{A}_{1}{A}_{2}{a}_{2}}+{(\alpha \delta )}_{{a}_{1}{A}_{2}{a}_{2}}+{(\delta \alpha )}_{{A}_{1}{a}_{1}{A}_{2}}+{(\delta \alpha )}_{{A}_{1}{a}_{1}{a}_{2}}+{(\delta \delta )}_{{A}_{1}{a}_{1}{A}_{2}{a}_{2}}\end{array}$$

where ${u}_{{A}_{1}{a}_{1}}$ is the genetic value of locus A in individual 1, ${\alpha }_{{A}_{1}}$ is the direct additive effect of allele A in individual 1 (the same goes for allele a), and ${\delta }_{{A}_{1}{a}_{1}}$ is the direct dominance effect of alleles A and a in individual 1. The effects described so far measure the DGE of individual 1. Proceeding, ${\alpha }_{{A}_{2}}$ is the competition additive effect of allele A in individual 2 as measured on alleles of individual 1 (the same goes for ${\alpha }_{{a}_{2}}$), and ${\delta }_{{A}_{2}{a}_{2}}$ is the competition dominance effect of Aa in individual 2 as measured on A₁a₁. These effects relate to the indirect genetic effects (IGE) of individual 2 to which individual 1 is susceptible. The other “epistatic” effects are additive × additive, additive × dominant, and dominant × dominant interactions between direct and competition effects. This model is the basis of the variance-component approach (also known as the genotypic or treatment interference approach) to model IGE, described in the next topics.

A linear mixed model

Let the starting point be a field trial with N plots wherein V candidates (v = 1, 2, …, V) are assessed. It is assumed that it has R rows and C columns, with each candidate replicated J times (j = 1, 2, …, J). For clarity, each plot is composed of a single plant. A typical linear mixed model fitted to this trial is described as follows:

$${\bf{y}}={\bf{X}}{\boldsymbol{\beta }}+{{\bf{Z}}}_{1}{\bf{u}}+{{\bf{Z}}}_{2}{\bf{p}}+{\mathbf{\varepsilon }}$$

(1)

where y is an N × 1 vector of phenotypic records, organized sequentially according to row and then column nested within row; β is the vector of fixed effects, u is the V × 1 vector of random genotypic effects, p is the vector of non-genetic random effects, and ε is the N × 1 vector of random residuals. The capital letters X, Z₁ and Z₂ are the incidence matrices that connect their respective vectors to y. Depending on the context, p can be absent from Equation (1), and the replication (block) effects can be random or fixed. The random effects are assumed to follow multivariate Gaussian distributions with zero means and variance matrices G_u, G_p, and R. G_u and R are equal to ${\sigma }_{u}^{2}{{\bf{I}}}_{V}$ and ${\sigma }_{\varepsilon }^{2}{{\bf{I}}}_{N}$, respectively. I_V can be replaced by a kinship matrix (based on pedigree or genomic data). The form of G_p depends on the context. Still, it usually follows the same structure as G_u and R, i.e., a variance component multiplied by an identity matrix whose order matches the length of p.

The spatial component

The most popular way of addressing spatial trends in plant breeding is by modeling the residual covariance matrix with a separable first-order autoregressive structure (Gilmour et al. 1997). Under this scenario, ${\bf{R}}={\sigma }_{\varepsilon }^{2}\left({{\boldsymbol{AR1}}}_{R}\otimes {{\boldsymbol{AR1}}}_{C}\right)$, where AR1_R and AR1_C are R × R and C × C first-order autocorrelation matrices for rows and columns, respectively; and ⊗ is the direct (Kronecker) product. Note that R × C = N, so ${\sigma }_{\varepsilon }^{2}\left({{\boldsymbol{AR1}}}_{R}\otimes {{\boldsymbol{AR1}}}_{C}\right)$ replaces ${\sigma }_{\varepsilon }^{2}{{\bf{I}}}_{N}$ as the residual covariance matrix R. One may also want to fully distinguish between spatially dependent errors and independent errors (η). In this case, $\eta \sim N({\bf{0}},{\sigma }_{\eta }^{2}{{\bf{I}}}_{N})$.
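As an illustration of the separable structure, the following minimal R sketch builds the two autocorrelation matrices and their Kronecker product for a small grid. The grid size, autocorrelations and residual variance are made-up values, and `ar1_matrix` is a helper written for this example, not a gencomp or ASReml-R function.

```r
# Helper (illustrative): first-order autoregressive correlation matrix of
# order n, whose (i, j) element equals rho^|i - j|.
ar1_matrix <- function(n, rho) {
  rho^abs(outer(seq_len(n), seq_len(n), "-"))
}

n_rows     <- 3    # R in the text (assumed value)
n_cols     <- 4    # C in the text (assumed value)
rho_r      <- 0.5  # assumed autocorrelation between adjacent rows
rho_c      <- 0.3  # assumed autocorrelation between adjacent columns
sigma2_eps <- 2    # assumed residual variance

AR1_R <- ar1_matrix(n_rows, rho_r)
AR1_C <- ar1_matrix(n_cols, rho_c)

# Separable residual covariance matrix. With AR1_R as the first factor,
# plots are assumed to be sorted by row, then by column within row.
R_mat <- sigma2_eps * kronecker(AR1_R, AR1_C)

dim(R_mat)  # N x N, with N = n_rows * n_cols
```

Under this ordering, two plots in the same row and adjacent columns share a covariance of `sigma2_eps * rho_c`, while two plots in the same column and adjacent rows share `sigma2_eps * rho_r`.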

The genetic competition component

Once spatial trends are defined, we can focus on modeling the genetic competition. For this purpose, we make u a 2V × 1 vector, with ${{\bf{u}}}^{{\prime} }=({{\bf{g}}}^{{\prime} },{{\bf{c}}}^{{\prime} })$, where g and c are the V × 1 vectors of DGE and IGE, respectively. Consequently, Z₁ becomes an N × 2V matrix containing an incidence matrix for DGE $\left[{{\bf{Z}}}_{g}^{(N\times V)}\right]$ and another for IGE, known as the competition matrix $\left[{{\bf{Z}}}_{c}^{(N\times V)}\right]$, i.e., Z₁ = (Z_g, Z_c) (Costa e Silva et al. 2013; Stringer et al. 2011). Under this notation:

$${\rm{Var}}({\bf{u}})=\left[\begin{array}{cc}{\sigma }_{g}^{2}&{\sigma }_{gc}\\ {\sigma }_{gc}&{\sigma }_{c}^{2}\end{array}\right]\otimes {{\bf{I}}}_{V}$$

(2)

where σ_gc is the covariance between g and c. An alternative parameterization disregards σ_gc and takes g and c as two orthogonal effects in the model (Ferreira et al. 2023).

The model differentiates g and c based on their incidence matrices. Z_g is a regular design matrix, i.e., with ones in the corresponding position of a given plot and zero elsewhere. On the other hand, Z_c is filled in the corresponding positions of the genotypes neighboring a given plot. For instance, a plot has the genotype “G7” neighbored by “G6” and “G9” in the horizontal direction, “G5” and “G9” in the vertical direction, and “G4”, “G8”, “G5” and “G4” in the diagonal directions (Fig. 1a). For crop breeding, Z_c will be an incidence matrix with ones in the positions corresponding to the genotypes neighboring “G7” in the horizontal (“G6” and “G9”) or vertical directions (“G5” and “G9”), and zero elsewhere (Stringer et al. 2011). For tree breeding, due to the large area occupied by a single tree and large spacing between trees, the standard procedure is to compute the directional competition intensity factors for each direction by filling the positions corresponding to the genotypes neighboring “G7” in the respective row of Z_c. gencomp has three options to calculate these factors:

MU: In this method proposed by Muir (2005), the competition intensity factors are the inverse of the distance between the focal individual and its neighbors in the diagonal, row, and column directions:
$$\left\{\begin{array}{l}{f}_{d}=\frac{1}{\sqrt{{{\mathcal{D}}}_{r}^{2}+{{\mathcal{D}}}_{c}^{2}}} \\ {f}_{r}=\frac{1}{{{\mathcal{D}}}_{r}} \\ {f}_{c}=\frac{1}{{{\mathcal{D}}}_{c}}\quad \end{array}\right.$$
(3)
CC: In this method proposed by Cappa and Cantet (2008), the distance and the number of neighbors in each direction are considered. This method assumes that the distance between the focal individual and its neighbors in the row is the same as the distance between the focal individual and its neighbors in the column:
$$\left\{\begin{array}{l}{f}_{d}=\frac{1}{\sqrt{2\left({n}_{c}+{n}_{r}\right)+{n}_{d}}}\quad \\ {f}_{r}=\sqrt{\frac{2}{2\left({n}_{c}+{n}_{r}\right)+{n}_{d}}}\quad \\ {f}_{c}=\sqrt{\frac{2}{2\left({n}_{c}+{n}_{r}\right)+{n}_{d}}}\quad \end{array}\right.$$
(4)
SK: This method, proposed by Costa e Silva and Kerr (2013), considers the number of neighbors, the distance between the focal individual and its neighbors, and the difference between distances in the row and column directions:
$$\left\{\begin{array}{l}{f}_{d}=\frac{p}{\sqrt{\left({n}_{r}{p}^{4}\right)+\left({n}_{r}{p}^{2}\right)+\left({n}_{c}{p}^{2}\right)+\left({n}_{d}{p}^{2}\right)+{n}_{c}}} \\ {f}_{r}={f}_{d}\sqrt{1+{p}^{2}} \\ {f}_{c}=\frac{{f}_{d}\sqrt{1+{p}^{2}}}{p}\quad \end{array}\right.$$
(5)

**Fig. 1: Details about genetic competition models implemented in gencomp.**

For all of the above methods, f_r, f_c and f_d are the directional competition intensity factors for a given plot (i.e., a given row of Z_c) in the row, column and diagonal directions, respectively; n_r, n_c and n_d are the number of neighbors in the row, column and diagonal directions, respectively; ${{\mathcal{D}}}_{r}$ and ${{\mathcal{D}}}_{c}$ are the distances between the focal individual and its neighbors in the row and column directions, respectively; and $p={{\mathcal{D}}}_{c}/{{\mathcal{D}}}_{r}$. Note that when the distances between the focal individual and its neighbors in the row and column directions are the same, SK = CC. Table 1 contains the key factors that differentiate the methods implemented in gencomp. Appendix A details the construction of Z_c based on the example of Fig. 1a.
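Equations (3) to (5) can be transcribed directly into a short R function. The sketch below is only an illustration: `comp_factors` and its argument names are hypothetical and do not correspond to gencomp's internal code, and the neighbor counts and distances are made-up values for an interior plot.

```r
# Directional competition intensity factors for one focal plot.
# n_r, n_c, n_d: number of neighbors in the row, column and diagonal directions
# d_r, d_c: distances to the neighbors in the row and column directions
comp_factors <- function(method, n_r, n_c, n_d, d_r, d_c) {
  p <- d_c / d_r
  switch(method,
    MU = c(f_r = 1 / d_r,
           f_c = 1 / d_c,
           f_d = 1 / sqrt(d_r^2 + d_c^2)),               # Equation (3)
    CC = {
      denom <- 2 * (n_c + n_r) + n_d
      c(f_r = sqrt(2 / denom),
        f_c = sqrt(2 / denom),
        f_d = 1 / sqrt(denom))                            # Equation (4)
    },
    SK = {
      f_d <- p / sqrt(n_r * p^4 + n_r * p^2 + n_c * p^2 + n_d * p^2 + n_c)
      c(f_r = f_d * sqrt(1 + p^2),
        f_c = f_d * sqrt(1 + p^2) / p,
        f_d = f_d)                                        # Equation (5)
    },
    stop("method must be 'MU', 'CC' or 'SK'")
  )
}

# Interior plot: 2 row, 2 column and 4 diagonal neighbors (assumed distances)
mu <- comp_factors("MU", n_r = 2, n_c = 2, n_d = 4, d_r = 2, d_c = 3)
sk <- comp_factors("SK", n_r = 2, n_c = 2, n_d = 4, d_r = 2, d_c = 3)

# With equal row and column distances, SK reduces to CC
sk_equal <- comp_factors("SK", n_r = 2, n_c = 2, n_d = 4, d_r = 2, d_c = 2)
cc_equal <- comp_factors("CC", n_r = 2, n_c = 2, n_d = 4, d_r = 2, d_c = 2)
all.equal(sk_equal, cc_equal)
```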

Table 1 Information on the methods implemented in gencomp to build the competition matrix: the type of plant breeding, reference, considered direction, whether inter-row and inter-column distances are treated as the same or different, and the estimator of the mean competition intensity factor ($\hat{\phi }$).


Once Z_c is built and the model is solved, the DGE $(\hat{{\bf{g}}})$ and IGE $(\hat{{\bf{c}}})$ are estimated. With these values, we can compute the total genotypic value (TGV) as:

$${TGV}_{v}={\hat{g}}_{v}+\hat{\phi }\times {\hat{c}}_{v}$$

(6)

where $\hat{\phi }=1$ for crop breeding, or $\hat{\phi }={\bar{n}}_{r}{\bar{f}}_{r}+{\bar{n}}_{c}{\bar{f}}_{c}+{\bar{n}}_{d}{\bar{f}}_{d}$ for tree breeding. In the latter case, $\hat{\phi }$ is the mean competition intensity factor. In crop breeding, $\hat{\phi }=1$ because there is no distance-based weighting, i.e., the genetic competition effects are not weighted by the distance between plots. In this case, the full contribution of IGE to the total heritable variance is considered (Bijma 2014). When the competition effects are significant, the TGV is the most appropriate selection unit, as selection based solely on the DGE (which is equivalent to selecting based on the BLUPs in a typical linear model) may yield biased results, which can be reflected in the ranking (see the subsection “Competition affects the selection” in “Results” for more details). The significance of competition effects can be assessed via the likelihood ratio test.
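A small numerical sketch may help to see how Equation (6) can change the ranking. All values below (neighbor counts, intensity factors, DGE and IGE) are invented for illustration and do not come from the example datasets.

```r
# Hypothetical per-plot neighbor counts (edge plots have fewer neighbors)
n_r <- c(2, 2, 1, 2)
n_c <- c(2, 1, 2, 2)
n_d <- c(4, 2, 2, 4)

# Hypothetical per-plot directional factors (constant across plots here,
# as with the MU method for assumed distances of 2 and 3)
f_r <- rep(1 / 2, 4)
f_c <- rep(1 / 3, 4)
f_d <- rep(1 / sqrt(2^2 + 3^2), 4)

# Mean competition intensity factor for tree breeding
phi_hat <- mean(n_r) * mean(f_r) + mean(n_c) * mean(f_c) + mean(n_d) * mean(f_d)

# Hypothetical DGE and IGE predictions for three candidates
g_hat <- c(G1 = 1.2, G2 = 0.8, G3 = -0.5)
c_hat <- c(G1 = -0.6, G2 = 0.1, G3 = 0.4)

tgv_tree <- g_hat + phi_hat * c_hat  # Equation (6), tree breeding
tgv_crop <- g_hat + 1 * c_hat        # Equation (6), crop breeding (phi_hat = 1)

# Rank 1 is the best candidate under each criterion
rank(-g_hat)
rank(-tgv_crop)
```

In this toy case G1 has the largest DGE, but its negative IGE lowers its TGV below that of G2, so the top-ranked candidate differs between the two criteria.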

The variance component estimates provided by the model can be used to calculate the broad-sense heritability (when no kinship matrix is incorporated). In genetic-competition models, breeders can base their decision-making on the DGE heritability (${H}_{g}^{2}$) or the total heritability (${H}_{t}^{2}$) (Bijma et al. 2007). The first is the portion of the total variance that refers to the DGE. The latter is a ratio between the sum of the total heritable components against the phenotypic variance, and it is an adjusted estimate of the heritability that considers the competition effects and the covariance between DGE and IGE. The expressions for these heritabilities are given below:

$${H}_{g}^{2}=\frac{{\hat{\sigma }}_{g}^{2}}{{\hat{\sigma }}_{y}^{2}}$$

(7)

$${H}_{t}^{2}=\frac{{\hat{\sigma }}_{g}^{2}+2\times \hat{\phi }\times {\hat{\sigma }}_{gc}+{\hat{\phi }}^{2}\times {\hat{\sigma }}_{c}^{2}}{{\hat{\sigma }}_{y}^{2}}$$

(8)

with ${\hat{\sigma }}_{y}^{2}$ being the estimated total phenotypic variance.
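The two heritabilities can be computed from the variance component estimates in a few lines. The values below are hypothetical and chosen only to show the arithmetic of Equations (7) and (8); the phenotypic variance is supplied directly rather than derived from a fitted model.

```r
sigma2_g <- 4.0   # hypothetical DGE variance
sigma_gc <- -1.0  # hypothetical covariance between DGE and IGE
sigma2_c <- 1.5   # hypothetical IGE variance
sigma2_y <- 20.0  # hypothetical total phenotypic variance
phi_hat  <- 1     # crop breeding case

# Equation (7): DGE heritability
H2_g <- sigma2_g / sigma2_y

# Equation (8): total heritability
H2_t <- (sigma2_g + 2 * phi_hat * sigma_gc + phi_hat^2 * sigma2_c) / sigma2_y

c(H2_g = H2_g, H2_t = H2_t)
```

With these numbers the negative covariance outweighs the IGE variance, so the total heritability (0.175) is lower than the DGE heritability (0.2).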

Depending on the data, species and trait’s architecture, the reliability of DGE and IGE can vary, with DGE’s reliability being frequently higher than IGE’s. For this reason, one may want to weigh the sum of these estimated effects by their respective reliabilities when computing the TGV (Ferreira et al. 2024). In this case, the weighted TGV (wTGV) is given by:

$${wTGV}_{v}={\hat{g}}_{v}\times {r}_{{g}_{v}}^{2}+\hat{\phi }\times {\hat{c}}_{v}\times {r}_{{c}_{v}}^{2}$$

(9)

with ${r}_{{g}_{v}}^{2}$ and ${r}_{{c}_{v}}^{2}$ being the reliabilities of DGE and IGE, respectively; which are calculated as:

$${r}_{{g}_{v}}^{2}=1-\frac{{PEV}_{{g}_{v}}}{{\sigma }_{g}^{2}}\quad \,\text{and}\,\quad {r}_{{c}_{v}}^{2}=1-\frac{{PEV}_{{c}_{v}}}{{\sigma }_{c}^{2}}$$

(10)

with ${PEV}_{{g}_{v}}$ and ${PEV}_{{c}_{v}}$ being the prediction error variance of DGE and IGE of the vth genotype.
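Equations (9) and (10) can be sketched as follows. The variance components, predictions and prediction error variances are hypothetical values chosen for the example.

```r
sigma2_g <- 4.0  # hypothetical DGE variance
sigma2_c <- 1.5  # hypothetical IGE variance
phi_hat  <- 1    # crop breeding case

g_hat <- c(G1 = 1.2, G2 = 0.8)   # hypothetical DGE
c_hat <- c(G1 = -0.6, G2 = 0.1)  # hypothetical IGE
pev_g <- c(G1 = 1.0, G2 = 1.2)   # hypothetical PEV of the DGE
pev_c <- c(G1 = 0.9, G2 = 1.05)  # hypothetical PEV of the IGE

# Equation (10): reliabilities
rel_g <- 1 - pev_g / sigma2_g
rel_c <- 1 - pev_c / sigma2_c

# Equation (9): reliability-weighted total genotypic value
wtgv <- g_hat * rel_g + phi_hat * c_hat * rel_c
wtgv
```

Because the IGE reliabilities are lower than the DGE reliabilities in this example, the IGE contribute proportionally less to the wTGV than to the unweighted TGV.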

The estimated competition effects can also be used to categorize the selection candidates into three competition classes: aggressive, homeostatic, and sensitive. For this purpose, we used the classification proposed by Ferreira et al. (2023), and detailed below:

$$\,{\text{Classes}}\,=\left\{\begin{array}{l}\bar{{\rm{c}}}-(\tau ){\rm{sd}}({\rm{c}}) \,<\,{\hat{{\rm{c}}}}_{{\rm{v}}} \,<\, \bar{{\rm{c}}}+(\tau ){\rm{sd}}({\rm{c}})\to \,{\text{Homeostatic}}\,\quad \\ {\hat{{\rm{c}}}}_{{\rm{v}}} \,<\,\bar{{\rm{c}}}-(\tau ){\rm{sd}}({\rm{c}})\to \,{\text{Aggressive}}\,\quad \\ {\hat{{\rm{c}}}}_{{\rm{v}}} \,>\, \bar{{\rm{c}}}+(\tau ){\rm{sd}}({\rm{c}})\to \,{\text{Sensitive}}\,\quad \end{array}\right.$$

(11)

with $\bar{c}$ being the mean IGE in the population, and sd(c) being the IGE’s standard deviation. The parameter τ (not included in the original definition) was added to gencomp for additional flexibility when defining competition classes. τ is a multiplier of sd(c) that sets the thresholds used to declare a genotype aggressive, homeostatic or sensitive. For instance, let $\bar{c}=-2,sd(c)=3,{c}_{G1}=0,{c}_{G2}=-6$ and c_G3 = 6. If τ = 1, the lower and upper limits of the homeostatic class are −2 − 1 × 3 = −5 and −2 + 1 × 3 = 1, so G1 is homeostatic, G2 is aggressive and G3 is sensitive. Now, if τ = 2, the lower and upper limits are −2 − 2 × 3 = −8 and −2 + 2 × 3 = 4, hence G1 and G2 are homeostatic, and G3 is sensitive.
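The classification rule of Equation (11) and the worked example can be reproduced with the short sketch below. `classify_competition` is a hypothetical helper written for this illustration, not the gencomp function; values falling exactly on a threshold, which Equation (11) leaves undefined, are assigned to the homeostatic class here as an assumption.

```r
# Assign competition classes from IGE predictions (Equation 11).
# c_mean and c_sd default to the sample mean and standard deviation of c_hat.
classify_competition <- function(c_hat, c_mean = mean(c_hat),
                                 c_sd = sd(c_hat), tau = 1) {
  lower <- c_mean - tau * c_sd
  upper <- c_mean + tau * c_sd
  ifelse(c_hat < lower, "Aggressive",
         ifelse(c_hat > upper, "Sensitive", "Homeostatic"))
}

# Values from the example in the text
c_hat <- c(G1 = 0, G2 = -6, G3 = 6)

class_tau1 <- classify_competition(c_hat, c_mean = -2, c_sd = 3, tau = 1)
class_tau2 <- classify_competition(c_hat, c_mean = -2, c_sd = 3, tau = 2)

class_tau1  # G1 homeostatic, G2 aggressive, G3 sensitive
class_tau2  # G1 and G2 homeostatic, G3 sensitive
```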

Multi-age model (repeated measures) for tree breeding data

The procedures previously described are required to fit a spatial-genetic competition model that applies to a single measurement (single age). Recently, Ferreira et al. (2024) proposed an extension to fit multi-age (or repeated measures) models that include competition. Let M represent the number of ages or measurements (m = 1, 2, …, M), and $T=\mathop{\sum }\nolimits_{m = 1}^{M}{N}_{m}$ be the total number of records, with N_m being the number of plots at age m. The multi-age spatial-genetic competition model is written as:

$${\bf{y}}={\bf{X}}{\boldsymbol{\beta }}+{{\bf{Z}}}_{g}{\bf{g}}+{{\bf{Z}}}_{c}{\bf{c}}+{\mathbf{\varepsilon }}$$

(12)

All terms of this model were previously described, but note that their dimensions are modified to fit this multi-age definition, e.g., g and c are of dimension VM × 1, and Z_g and Z_c are T × VM. In addition, the age effect is added to the vector of fixed effects. In this model, ${{\bf{Z}}}_{c}=({{\bf{Z}}}_{{c}_{1}},{{\bf{Z}}}_{{c}_{2}},\ldots ,{{\bf{Z}}}_{{c}_{M}})$, i.e., a separate competition matrix must be built for each age. The variance-covariance structure of both g and c is compound symmetry, meaning that Equation (12) estimates their main effects and their interactions with the different ages. The within-age DGE and IGE can be obtained by adding the main effect to its corresponding interaction effect at a specific age.

The residual modeling is also expanded to consider heterogeneous variances and spatial autocorrelations between ages:

$${\bf{R}}={\oplus }_{m = 1}^{M}{\sigma }_{{\varepsilon }_{m}}^{2}\left({{\boldsymbol{AR1}}}_{{C}_{m}}\otimes {{\boldsymbol{AR1}}}_{{R}_{m}}\right)$$

(13)

in which ⊕ represents the direct sum. R is a T × T block diagonal matrix.
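The block-diagonal form of Equation (13) can be illustrated with base R. The sketch below uses two ages, a tiny 2 × 3 grid and made-up variances and autocorrelations; `ar1_matrix` and `direct_sum` are helpers defined for this example only.

```r
# Helper (illustrative): AR1 correlation matrix of order n
ar1_matrix <- function(n, rho) {
  rho^abs(outer(seq_len(n), seq_len(n), "-"))
}

# Helper (illustrative): direct sum of a list of square matrices
direct_sum <- function(blocks) {
  sizes <- vapply(blocks, nrow, integer(1))
  out   <- matrix(0, sum(sizes), sum(sizes))
  end   <- cumsum(sizes)
  start <- end - sizes + 1
  for (m in seq_along(blocks)) {
    idx <- start[m]:end[m]
    out[idx, idx] <- blocks[[m]]
  }
  out
}

# Each age has its own residual variance and autocorrelations (assumed values).
# The factor order follows Equation (13): AR1 for columns, then AR1 for rows.
R_age1 <- 1.5 * kronecker(ar1_matrix(3, 0.4), ar1_matrix(2, 0.6))
R_age2 <- 3.0 * kronecker(ar1_matrix(3, 0.2), ar1_matrix(2, 0.5))

R_mat <- direct_sum(list(R_age1, R_age2))
dim(R_mat)  # T x T, with T = 6 + 6
```

The off-diagonal blocks are zero, which reflects the absence of residual covariance between records from different ages in this structure.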

Clonal composites for tree breeding data

In some tree breeding pipelines, the final goal is to deploy high-performance clones (as occurs in eucalyptus, cacao, rubber tree, poplar, etc.). Some clones can be more competitive and/or endure the competition more efficiently. In that case, there might be a clonal mixture that can coexist harmoniously when planted together. This is the core idea of Ferreira et al. (2023) to propose a method for defining the best composition of a clonal mixture considering competition effects. We implemented a function that simulates a physical planting grid considering the DGE and IGE of a given set of clones. These are positioned differently in each simulation, which enables the modification of focal individual-neighbor dynamics. In each simulation, the expected mean of each clone is predicted using the following equation:

$${\hat{y}}_{ik}=\hat{\mu }+{\hat{g}}_{i}+\mathop{\sum }\limits_{i\ne k}^{n}{\hat{c}}_{k}$$

(14)

where ${\hat{g}}_{i}$ is the DGE of the focal individual i, and ${\hat{c}}_{k}$ is the IGE of the neighbor k (i can have up to n neighbors). Both ${\hat{g}}_{i}$ and ${\hat{c}}_{k}$ are obtained from (spatial-) genetic competition models previously defined. In addition, one may want to weigh the IGE by the distance between the focal individual i and neighbor k (${{\mathcal{D}}}_{ik}$). This makes more biological sense, as the influence of a neighbor k diminishes as its distance to the focal individual increases. In this case, the equation is:

$${\hat{y}}_{ik}=\hat{\mu }+{\hat{g}}_{i}+\mathop{\sum }\limits_{i\ne k}^{n}\frac{1}{{{\mathcal{D}}}_{ik}}\times {\hat{c}}_{k}$$

(15)

As an extension of the approach proposed by Ferreira et al. (2023), gencomp estimates the 95% confidence interval of the predicted means using a bootstrap approach based on 10,000 random samples.
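The logic of the clonal-composite simulation can be sketched in base R as shown below. This is not gencomp's implementation: the DGE, IGE, overall mean, grid size and spacings are invented, the helper functions are hypothetical, and the bootstrap is a simple percentile interval over the simulated clone means with fewer resamples than the 10,000 mentioned above, to keep the example fast.

```r
set.seed(2025)

mu_hat <- 40                             # hypothetical overall mean
g_hat  <- c(A = 3, B = 1, C = -1)        # hypothetical DGE of three clones
c_hat  <- c(A = -1.5, B = 0.2, C = 0.8)  # hypothetical IGE of three clones
row_spacing <- 2                         # assumed distance between adjacent rows
col_spacing <- 3                         # assumed distance between adjacent columns

# Offsets to the (up to) eight neighbors of a tree and their distances
offsets <- expand.grid(dr = -1:1, dc = -1:1)
offsets <- offsets[!(offsets$dr == 0 & offsets$dc == 0), ]
offsets$dist <- sqrt((offsets$dr * row_spacing)^2 + (offsets$dc * col_spacing)^2)

# Expected value of every tree in a grid of clone labels:
# Equation (15) if weighted = TRUE, Equation (14) otherwise
predict_grid <- function(grid, weighted = TRUE) {
  y_hat <- matrix(NA_real_, nrow(grid), ncol(grid))
  for (i in seq_len(nrow(grid))) {
    for (j in seq_len(ncol(grid))) {
      ni <- i + offsets$dr
      nj <- j + offsets$dc
      ok <- ni >= 1 & ni <= nrow(grid) & nj >= 1 & nj <= ncol(grid)
      neigh <- grid[cbind(ni[ok], nj[ok])]
      w <- if (weighted) 1 / offsets$dist[ok] else rep(1, sum(ok))
      y_hat[i, j] <- mu_hat + g_hat[[grid[i, j]]] + sum(w * c_hat[neigh])
    }
  }
  y_hat
}

# One simulation: shuffle an equal-proportion mixture and average per clone
simulate_once <- function(n_row = 6, n_col = 6) {
  labels <- sample(rep(names(g_hat), length.out = n_row * n_col))
  grid   <- matrix(labels, n_row, n_col)
  y_hat  <- predict_grid(grid)
  vapply(split(as.vector(y_hat), as.vector(grid)), mean, numeric(1))[names(g_hat)]
}

sims        <- replicate(200, simulate_once())  # clones in rows, simulations in columns
clone_means <- rowMeans(sims)

# Percentile bootstrap interval for the mean of each clone
boot_ci <- function(x, n_boot = 1000) {
  quantile(replicate(n_boot, mean(sample(x, replace = TRUE))), c(0.025, 0.975))
}
ci <- t(apply(sims, 1, boot_ci))
```

Changing the vector of clone labels passed to `sample()` gives the expected performance of other mixtures, which is the comparison of interest when choosing a composite.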

Here, it is important to clearly distinguish TGV and ${\hat{y}}_{ik}$. As previously described, the focal genotype’s TGV comprises its genetic merit (represented by the DGE) summed to the weighted IGE that it exerts over its neighbors. On the other hand, ${\hat{y}}_{ik}$ consists of the expected mean of a given individual considering its genetic merit and the influence of its neighbors’ IGE on its phenotype. In the first case, we are concerned about the genetic aspects of the competition for selection (Fig. 1C of Ferreira et al. 2023), while in the second one, we are interested in how it affects the phenotypic expression when the selected candidates are planted together in commercial orchards (Fig. 1B of Ferreira et al. 2023).

Motivating examples

Using the example datasets (euca and potato, described below), we demonstrate the usage of gencomp and the effects of modeling the genetic competition. For the second objective, we compare four models: a “traditional” mixed model (TMM, without spatial adjustment and genetic competition), a spatial mixed model (SMM, without genetic competition), a genetic competition mixed model (CMM, without spatial adjustment) and a spatial-genetic competition mixed model (SCMM, the most complete scenario). From each model, we obtained the REML-estimates of variance components, from which we computed the heritabilities. We also investigated the relationships between BLUP-estimates of genotypic values (of TMM and SMM), and DGE and IGE (of CMM and SCMM), and calculated how different the selection would be considering an intensity of 20% and 25%, for the euca and potato datasets, respectively.

Tree breeding

The package gencomp contains a tree breeding dataset, named euca, which users can employ to test the functions and adapt them to their own datasets. In brief, euca is a dataset simulated using parameters from real data of an intermediate-stage clonal eucalyptus trial. It has the mean annual increment values (m³ ha⁻¹ year⁻¹, column MAI) of 100 clones (“C001” to “C100” in clone column) laid out in a randomized complete block design with 13 replicates (“B01” to “B13” in block column). The experimental unit is the same as the observation unit, i.e., there is a single plant per plot. The plants are spaced 2 and 3 m apart in the row and column directions, respectively; and the position of each tree in the field is found in columns row and col. Phenotypes of two ages are available (“3y” and “6y” in age column). This trial was not organized into contiguous blocks: the first six blocks were situated in one area, while the other seven were in another. The dataset includes a column labeled area, distinguishing between these areas. This scenario is common in large tree breeding trials, particularly when trials are divided by geographical features like streams, roads, or steep hills. In such cases, the lack of contiguity between areas must be considered in spatial-genetic competition models because clones on the border of one area do not neighbor clones on the border of another (Fig. 1b). Essentially, the layout information directs the construction of the Z_c matrix.
Furthermore, the autoregressive structure in the residual, described in “The spatial component” subsection, needs to be modified to ${\bf{R}}={\oplus }_{h = 1}^{H}{\sigma }_{{\varepsilon }_{h}}^{2}\left({{\bf{AR1}}}_{{C}_{h}}\otimes {{\bf{AR1}}}_{{R}_{h}}\right)$ or ${\oplus }_{m = 1}^{M}\left[{\oplus }_{h = 1}^{H}{\sigma }_{{\varepsilon }_{mh}}^{2}\left({{{\boldsymbol{AR}}}{\mathbf{1}}}_{{C}_{mh}}\otimes {{{\boldsymbol{AR}}}{\mathbf{1}}}_{{R}_{mh}}\right)\right]$ in the multi-age case, with H being the number of areas (h = 1, 2, …, H). In summary, this modification assumes that no residual covariance exists between plots from different areas.

Crop breeding

To exemplify the usage of genetic competition models in crop breeding, gencomp has a dataset named potato, obtained from the package agridat (connolly.potato; Wright 2024). Originally, this dataset was generated by Connolly et al. (1993) to study the inter-plot competition in single-drill plot trials of potatoes in Scotland. They measured the tuber yield of 20 varieties (column gen), which were replicated four times (column rep). Each replication was an independent row of 20 drills. The maturity class of each variety was also registered (column matur), with representatives of the first early (M1), second early (M2), and maincrop (M3) classes. Each drill had five tubers spaced 45 cm apart, with 75 cm between drills. The rows (replications) are not contiguous, which prevents the use of an autocorrelation structure in the residual part of the model. Thus, in this case, we fitted a genetic-competition model considering the within-row competition. See Connolly et al. (1993) for more details.
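To make the within-row case concrete, the sketch below builds Z_g and Z_c for a toy layout with two non-contiguous rows of four drills, following the description of the crop competition matrix given in the Methods. The layout and genotype labels are invented, and the code is not what prepcrop runs internally. The layout was chosen so that no plot has the same genotype on both sides; how such duplicates are handled is not covered here.

```r
# Toy layout: two non-contiguous rows (replicates) of four drills each
layout <- data.frame(
  row = rep(1:2, each = 4),
  col = rep(1:4, times = 2),
  gen = c("G1", "G2", "G3", "G4", "G3", "G1", "G4", "G2"),
  stringsAsFactors = FALSE
)
genotypes <- sort(unique(layout$gen))

# Z_g: a single 1 per plot, in the column of the genotype grown there
Z_g <- outer(layout$gen, genotypes, "==") * 1
colnames(Z_g) <- genotypes

# Z_c: 1 in the columns of the genotypes adjacent to the plot within its row
Z_c <- matrix(0, nrow(layout), length(genotypes),
              dimnames = list(NULL, genotypes))
for (k in seq_len(nrow(layout))) {
  same_row <- layout$row == layout$row[k]
  adjacent <- same_row & abs(layout$col - layout$col[k]) == 1
  for (g in layout$gen[adjacent]) {
    Z_c[k, g] <- Z_c[k, g] + 1
  }
}

Z_c
```

Plots at the ends of a row have a single neighbor, and the first plot of the second row has no neighbor in the first row, which mirrors the non-contiguity of the replications.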

Results and discussion

gencomp is an optimized tool designed to facilitate fitting spatial-genetic competition models. The example datasets comprise 2288 (euca) and 80 (potato) observations. Taking the larger dataset as an example, on a computer with 8 GB of RAM, a 12th Gen Intel® Core™ i7-1255U processor (base frequency of 1.70 GHz) and the Windows® 11 operating system, the entire process, from constructing the competition matrix to extracting the results of the fitted model, took only about 2 min. Below, we detail how to use gencomp with the example datasets.

First step: competition matrix

Following the logic presented in the “Methods” section, the first step is to build the competition matrix. For this, gencomp has two functions: prepfor and prepcrop. The former is designed to deal with tree breeding trials, and the latter, with crop breeding trials. These functions construct the matrix Z_c and provide an overview of the dataset. Their basic structure is shown in Box 1.

The argument data receives the working data frame, and gen, row, col, and trait receive the name of the column that contains the corresponding information in the data frame. The plt argument is optional (defaulting to NULL) and allows users to specify the name of the column containing plot information. This helps ensure that the functions follow the same order as the data collection in the field. If plt is not provided, the functions will automatically generate a column to differentiate the plots, ordering the dataset by row and column. The effs argument accepts a string vector with the names of columns representing other effects to be considered in the model fitting step. For instance, the effect of block (block) for the tree breeding data, and maturity stage (matur) and block (rep) for the crop breeding data. This is crucial as it transforms these columns into factors in the data, which will then be used in the model fitting functions. The default value is NULL if no additional effects are included. verbose defines whether a progress bar should be shown (TRUE) in the console during the matrix construction process or not (FALSE).

The arguments mentioned in the last paragraph are common to both functions. prepfor has other exclusive arguments. area and age refer to the column that contains the area information (when the trial has non-contiguous blocks, for instance) and the age information (repeated measures), respectively. They both default to NULL. dist.row and dist.col correspond to the distances between rows and columns in the field trial, respectively. The method argument can receive three options: “MU”, “CC” and “SK”. They define which method will be used to estimate the directional competition intensity factors. Finally, n.dec allows users to specify the number of decimal digits to be displayed in the matrix. The sole exclusive argument for prepcrop is direction, which defines which direction will be considered (currently, row or column) for constructing the competition matrix.

The functions prepfor and prepcrop generate objects of classes comprepfor and comprepcrop, respectively, which contain the competition matrix, a data frame with the inputted data merged with Z_c, another data frame containing the phenotypic records of each focal plot and its neighbors (see an example in Table 2), and the mean competition intensity factor (exclusive for prepfor). Two graphs are available from the comprepfor and comprepcrop objects using the S3 method plot: (i) a heatmap illustrating the field trial (Fig. 2A), and (ii) boxplots with each candidate’s performance (Fig. 2B).
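The neighborhood check summarized in Table 2 can be reproduced conceptually with a few lines of base R. The sketch below is only an illustration under stated assumptions: the 3 x 3 grid and its values are hypothetical, and we assume that "row" neighbors are the plots one row away, "column" neighbors are the plots one column away, and "diagonal" neighbors are the four corner plots. The helper names neighbors and mean_or_na are ours and are not part of gencomp.

```r
# Hypothetical 3 x 3 trial, one record per plot, indexed as y[row, col]
y <- matrix(1:9, nrow = 3, ncol = 3)

# Offsets (row shift, column shift) for each assumed direction
off_row  <- rbind(c(-1, 0), c(1, 0))
off_col  <- rbind(c(0, -1), c(0, 1))
off_diag <- rbind(c(-1, -1), c(-1, 1), c(1, -1), c(1, 1))

# Phenotypes of the neighbors of plot (i, j) that fall inside the grid
neighbors <- function(y, i, j, offsets) {
  r <- i + offsets[, 1]
  k <- j + offsets[, 2]
  inside <- r >= 1 & r <= nrow(y) & k >= 1 & k <= ncol(y)
  y[cbind(r[inside], k[inside])]
}

mean_or_na <- function(v) if (length(v) > 0) mean(v) else NA_real_

grid <- expand.grid(row = seq_len(nrow(y)), col = seq_len(ncol(y)))
tab <- do.call(rbind, lapply(seq_len(nrow(grid)), function(p) {
  i <- grid$row[p]
  j <- grid$col[p]
  nr <- neighbors(y, i, j, off_row)
  nc <- neighbors(y, i, j, off_col)
  nd <- neighbors(y, i, j, off_diag)
  data.frame(row = i, col = j, y_focal = y[i, j],
             y_row = mean_or_na(nr), y_col = mean_or_na(nc),
             y_diag = mean_or_na(nd), y_all = mean_or_na(c(nr, nc, nd)),
             n_row = length(nr), n_col = length(nc), n_diag = length(nd))
}))

tab
```

Border plots have fewer neighbors than interior plots, which is why the table also reports the number of neighbors per direction.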

Table 2 Neighborhood check for the euca dataset: phenotypic records of mean annual increment of wood volume (MAI) of each focal individual and mean phenotype of its neighbors in the row, column, and diagonal directions, phenotypic mean of all neighbors, and the number of neighbors in the row, column and diagonal directions.


**Fig. 2: Plots generated from objects of class comprepfor and comprepcrop using the plot S3 method.**

Box 1 Usage of functions prepfor and prepcrop in the example datasets

comp_for = prepfor(data = euca,
                   gen = "clone",
                   row = "row",
                   col = "col",
                   plt = "tree",
                   trait = "MAI",
                   effs = c("block"),
                   dist.row = 2,
                   dist.col = 3,
                   method = "SK",
                   area = "area",
                   age = "age",
                   n.dec = 3,
                   verbose = TRUE)

comp_crop = prepcrop(data = potato,
                     gen = "gen",
                     row = "row",
                     col = "col",
                     trait = "yield",
                     plt = NULL,
                     effs = c("rep", "matur"),
                     direction = "row",
                     verbose = TRUE)

Second step: model fitting

We can proceed with model fitting once Z_c is constructed. The model fitting step is the most computationally intensive. To optimize time and computational resources, gencomp internally utilizes the Average Information (Gilmour et al. 1995) algorithm of the ASReml-R package (The VSNi Team 2023) to solve linear mixed models rapidly. Thus, ASReml-R is currently a strong dependency of gencomp. Future versions will implement the same functionalities using open-source resources. It is worth mentioning that other freely available packages provide alternative solutions to deal with genetic competition, like breedR (Muñoz and Sanchez 2020) and sommer (Covarrubias-Pazaran 2016).

Currently, two functions are responsible for fitting the spatial-genetic competition model: asr and asr_ma. They have the same structure, as outlined in Box 2, but asr_ma can only be used in a multi-ages (repeated measures) context.

The argument prep.out receives the object generated by the functions prepfor or prepcrop. The arguments fixed and random receive formulas describing the fixed and random parts of the linear mixed model, using the usual ASReml-R syntax (more details in the ASReml-R manual). In the random argument, users must specify any random effects other than the genotypic effect, if applicable. These effects must also be previously declared in the effs argument of prepfor and prepcrop. The basic structure implemented in asr and asr_ma internally accounts for the partition of the genotypic effect into DGE and IGE. If there are no other effects, the function uses the default random = ~1. The cor, lrtest, and spatial arguments are logical values that dictate, respectively, whether the fitted model should consider the covariance between DGE and IGE, whether likelihood ratio tests should be performed, and whether the model should use a first-order autoregressive structure to adjust for spatial trends. For instance, if users want to address both spatial and independent errors, then spatial = TRUE and random = ~units should be used. Bear in mind that setting cor = FALSE introduces a bias in the model, as the covariance between DGE and IGE is usually different from zero (Bijma et al. 2007; Costa e Silva et al. 2013; Trebissou et al. 2021). Additional arguments accepted by the asreml function can also be passed to asr and asr_ma. In the asr_ma call of Box 2, for example, maxit is used to increase the maximum number of iterations, which may be necessary if the model does not converge with the default number of iterations (13).
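To make the spatial option more concrete, the sketch below builds a residual covariance matrix with a first-order autoregressive (AR1) structure in base R. It assumes the usual separable form (AR1 for columns, Kronecker product, AR1 for rows) and adds an independent error term, mirroring the combination spatial = TRUE with random = ~units. The trial dimensions, autocorrelations, and variances are hypothetical, and the code is an illustration rather than what ASReml-R does internally.

```r
# AR1 correlation matrix for n equally spaced positions: corr(i, j) = rho^|i - j|
ar1 <- function(n, rho) rho^abs(outer(seq_len(n), seq_len(n), "-"))

n_row <- 3; n_col <- 2          # hypothetical trial dimensions
rho_row <- 0.5; rho_col <- 0.3  # hypothetical autocorrelations

# Separable spatial correlation, plots ordered as rows within columns
R_spatial <- kronecker(ar1(n_col, rho_col), ar1(n_row, rho_row))

# Hypothetical variances: spatially correlated error plus independent error
sigma2_s <- 4
sigma2_e <- 1
R_total <- sigma2_s * R_spatial + sigma2_e * diag(n_row * n_col)

round(R_total, 3)
```

In this toy case, the correlation between the plot in row 1, column 1 and the plot in row 3, column 2 is the product of the column term (0.3) and the squared row term (0.5^2), which shows how the correlation decays with distance in both directions.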

Box 2 Fitting the spatial-genetic competition model using asr and asr_ma

mod_crop = asr(prep.out = comp_crop,
               fixed = yield ~ matur + rep,
               random = ~1,
               spatial = FALSE,
               cor = TRUE,
               lrtest = TRUE)

mod_for = asr_ma(prep.out = comp_for,
                 fixed = MAI ~ age + block:age,
                 random = ~1,
                 lrtest = TRUE,
                 spatial = TRUE,
                 cor = TRUE,
                 maxit = 20)

Third step: main results

The resp function provides a list of the most relevant outputs: (i) results of the likelihood ratio tests (if lrtest = TRUE in asr or asr_ma), (ii) variance components, (iii) heritabilities of the DGE and the total genotypic effects (if cor = TRUE in asr or asr_ma), and (iv) BLUP-estimates of DGE, IGE, and TGV. In the case of multi-age models, resp provides the main DGE and IGE, and the within-ages DGE and IGE. These values are later used to compute the TGVs across and within ages. The structure of the resp function is shown in Box 3. Table 3 has the variance components and likelihood ratio test results, and Table 4 has the heritabilities (of DGE and total genotypic effects) estimated from the models fitted with the euca and potato datasets.
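For readers less familiar with likelihood ratio tests, the arithmetic behind item (i) can be sketched as follows. The function lrt and the two log-likelihood values are hypothetical and serve only to illustrate the computation: twice the difference between the log-likelihoods of the full and reduced models, compared against a chi-square distribution. We do not claim that resp uses exactly this reference distribution.

```r
# Likelihood ratio test between two nested models fitted by REML
# (same fixed effects; the reduced model drops one variance parameter)
lrt <- function(loglik_full, loglik_reduced, df = 1) {
  stat <- 2 * (loglik_full - loglik_reduced)
  p <- pchisq(stat, df = df, lower.tail = FALSE)
  c(statistic = stat, p.value = p)
}

# Hypothetical log-likelihoods: model with IGE versus model without IGE
res <- lrt(loglik_full = -1520.4, loglik_reduced = -1526.9)
res
```

When the tested parameter is a variance component lying on the boundary of the parameter space, a mixture of chi-square distributions is often used instead, which makes the plain chi-square p-value conservative.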

Table 3 Variance component and correlation estimates in euca and potato datasets, their respective standard errors, and likelihood ratio statistics for the direct (DGE) and indirect genotypic effects (IGE), and their interaction with ages (only for euca).


Table 4 Broad-sense heritabilities of the direct genotypic effects (${H}_{g}^{2}$) and of the total genotypic effects (${H}_{t}^{2}$) in the euca and potato datasets.


The arguments prep.out and model receive the objects of class comprepfor or comprepcrop and asreml generated by the functions prepfor or prepcrop, and asr or asr_ma, respectively. The weight.tgv argument is a logical value that determines if DGE and IGE should be weighted by their respective reliabilities when computing the TGV (see Equation 9). Finally, sd.class defines the value of τ, i.e., the weight given to sd(c) when establishing competition classes.
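The role of sd.class can be illustrated with a short base R sketch. It assumes that candidates whose IGE falls below mean(c) − τ·sd(c) are labelled aggressive, those above mean(c) + τ·sd(c) are labelled sensitive, and the remainder homeostatic. This direction of the labels, the function name classify_ige, and the IGE values are assumptions made for illustration and should be checked against the package documentation.

```r
# Classify candidates by their IGE using a threshold of tau standard deviations
classify_ige <- function(ige, tau = 1) {
  lower <- mean(ige) - tau * sd(ige)
  upper <- mean(ige) + tau * sd(ige)
  cls <- ifelse(ige < lower, "aggressive",
                ifelse(ige > upper, "sensitive", "homeostatic"))
  factor(cls, levels = c("aggressive", "homeostatic", "sensitive"))
}

# Hypothetical IGE BLUPs for five candidates
ige <- c(G1 = -2.0, G2 = -0.3, G3 = 0.1, G4 = 0.4, G5 = 1.8)

class_tau1 <- classify_ige(ige, tau = 1)  # narrower homeostatic band
class_tau2 <- classify_ige(ige, tau = 2)  # wider homeostatic band

table(class_tau1)
table(class_tau2)
```

Increasing τ widens the homeostatic band, so fewer candidates are flagged as aggressive or sensitive.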

The function resp provides an object of class comresp. The S3 methods summary, print, and plot generate specific results for comresp objects. Figures 3 and 4 depict the nine alternatives available via plot to illustrate the comresp object results. The lollipop plots illustrate the DGE, IGE (Fig. 3A), and TGV (Fig. 3B) of each selection candidate. Breeders can base selection on the TGV, which acts as a selection index incorporating DGE and IGE, or manually assign weights to DGE and IGE. For example, we might want to prioritize high-performing and homeostatic candidates over sensitive or aggressive ones. It is important not to overlook the reliability of the information: we recommend users consider it for decision-making, even when weight.tgv = FALSE (Ferreira et al. 2024).

**Fig. 3: Plots generated from an object of class comresp using the plot S3 method.**

Figures 3C, D, E and 4D are colored according to the three competition classes. The evaluation of the number of different genotypes as neighbors of the selection candidate can be done using Fig. 3C. Here, we exemplified using the euca dataset, which has neighbors in all directions. Overall, almost all clones neighbored each other, and most of them had homeostatic neighbors. Figure 3D presents the density of IGE values in the euca data and depicts the threshold adopted to determine a candidate’s class. Figure 3E illustrates the relationship between DGE and IGE, which is negative in the present example, a trend usually observed for growth- and yield-related traits (Costa e Silva et al. 2013; Ferreira et al. 2024). The distribution of genotypes according to their competition classes in the field is illustrated in Fig. 4D for the potato data. Note that these plots can change depending on the value set on the argument sd.class.

**Fig. 4: Plots generated from an object of class comresp using the plot S3 method.**

Similar to Figs. 2A and 4D, Fig. 4A–C represent the field trial. The difference is the interpretation of each cell. In Fig. 4A, the cells are filled according to the magnitude of the residual effect. This Figure is useful for investigating extraneous trends in the field. The heatmaps of Fig. 4B and C are filled according to the DGE and IGE of the corresponding genotype of each cell. Even cells with missing values will be filled in this case. This is because the classification into aggressive, sensitive, and homeostatic is performed at the genotype level.

Simulation of clonal composites for tree breeding

The results presented in Figs. 3 and 4 are examples of information users can leverage. Nonetheless, there is one further resource available for tree breeders. We provide a function to simulate clonal composites following Ferreira et al. (2023). The function composite has the structure shown in Box 4.

The function composite (Box 4) uses all information obtained up to this point (arguments prep.out, model, and resp.out). Users may also provide the desired spacing between rows and columns in a numeric vector of size two in d.row.col. Furthermore, it must be indicated whether the IGEs should be weighted by the neighbor-focal individual distance when computing the expected mean in d.weight. The argument n.sim dictates the number of simulations, i.e., how many field grids will be generated. This is useful to guarantee that every neighbor-focal individual combination is sampled. Note that, depending on the number of selected clones, this might not happen. The selected argument receives the names of the clones that should be tested in clonal composites. In the example, we tested the top 10 clones based on their TGV. These results are presented in Table 5. Users can use an iterative process to test several clonal combinations and define which composite yields the best result.
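The logic of simulating clonal composites can be sketched in base R. The example below is not the algorithm implemented in composite; it is a simplified illustration assuming that the expected value of a focal tree is the overall mean plus its DGE plus the distance-weighted sum of the IGE of its (up to eight) neighbors, with weights equal to the shortest neighbor distance divided by each neighbor's distance. The BLUPs, the overall mean, the grid size, and all object names are hypothetical.

```r
set.seed(1)

# Hypothetical BLUPs for three selected clones
blups <- data.frame(clone = c("A", "B", "C"),
                    DGE = c(2.0, 1.0, 0.5),
                    IGE = c(-0.6, 0.1, 0.3),
                    stringsAsFactors = FALSE)
mu <- 40                 # hypothetical overall mean
d_row <- 3; d_col <- 3   # spacing between rows and between columns (m)
n_sim <- 10              # number of simulated field grids
n_r <- 6; n_c <- 6       # grid dimensions

# Eight neighbor positions with assumed distance-based weights
offs <- expand.grid(dr = -1:1, dc = -1:1)
offs <- offs[!(offs$dr == 0 & offs$dc == 0), ]
offs$dist <- sqrt((offs$dr * d_row)^2 + (offs$dc * d_col)^2)
offs$w <- min(offs$dist) / offs$dist

simulate_once <- function() {
  # Randomly allocate the selected clones to the grid
  grid <- matrix(sample(blups$clone, n_r * n_c, replace = TRUE), n_r, n_c)
  idx <- match(grid, blups$clone)
  dge <- matrix(blups$DGE[idx], n_r, n_c)
  ige <- matrix(blups$IGE[idx], n_r, n_c)
  pred <- matrix(NA_real_, n_r, n_c)
  for (i in seq_len(n_r)) for (j in seq_len(n_c)) {
    r <- i + offs$dr
    k <- j + offs$dc
    ok <- r >= 1 & r <= n_r & k >= 1 & k <= n_c
    pred[i, j] <- mu + dge[i, j] + sum(offs$w[ok] * ige[cbind(r[ok], k[ok])])
  }
  data.frame(clone = as.vector(grid), pred = as.vector(pred))
}

sims <- do.call(rbind, replicate(n_sim, simulate_once(), simplify = FALSE))
composite_means <- aggregate(pred ~ clone, data = sims, FUN = mean)
composite_means
```

Repeating the random allocation several times is what increases the chance that every focal-neighbor combination is sampled; with few grids or many clones, some combinations may never occur.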

Table 5 Predicted means of the 10 selected clones when planted together in clonal composites.


Future users can install gencomp from GitHub using the codes presented in Box 5. Given the dependency, ASReml-R should be already installed.

Box 4 Simulating clonal composites using the composite function

sel.clone = results$blups$main[order(results$blups$main$TGV, decreasing = TRUE), 1][1:10]

cc = composite(prep.out = comp_for,
               model = mod_for,
               resp.out = results,
               d.row.col = c(3, 3),
               d.weight = TRUE,
               nsim = 10,
               verbose = TRUE,
               selected = sel.clone)

Competition affects the selection

The availability of a tool that facilitates the management of competition in plant breeding trials is important since genetic competition represents a systematic bias that can distort the candidates’ genotypic values and hamper the selection process (Besag and Kempton 1986; Sakai 1955). Several studies showed that competition can change the candidates’ ranking, meaning that overlooking it may lead to suboptimal results (Ferreira et al. 2024; Hunt et al. 2013; Stringer et al. 2011). Furthermore, there is a hidden heritable variation related to competition effects which can only be accessed using genetic competition models, meaning that these models provide a better overview of a population’s genetic parameters (Bijma 2011; Costa e Silva et al. 2013).

Using gencomp and the available datasets, we observed the same patterns. In the potato dataset, since we cannot fit spatial models, we compared only the traditional mixed model (TMM) with the genetic competition mixed model (CMM). The residual variance decreased from the TMM to the CMM (Fig. 5A), which can be attributed to the explicit separation of DGE and IGE. This decrease resulted in an increased heritability of the DGE (Fig. 5B). As expected, the total heritability was lower than that of the DGE, since there is a negative covariance between DGE and IGE. The top five candidates selected from the TMM’s BLUPs differed by 40% and 20% from those selected using the DGE and the TGV of the CMM, respectively (Fig. 6).
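The percentage difference between two selections can be computed as the share of selected candidates that do not coincide. The sketch below assumes this definition; the function selection_difference and the values for eight hypothetical genotypes are ours and do not correspond to the potato results.

```r
# Percentage of the n_sel selected candidates that differ between two criteria
selection_difference <- function(score_a, score_b, n_sel) {
  top_a <- names(sort(score_a, decreasing = TRUE))[seq_len(n_sel)]
  top_b <- names(sort(score_b, decreasing = TRUE))[seq_len(n_sel)]
  common <- length(intersect(top_a, top_b))
  100 * (n_sel - common) / n_sel
}

# Hypothetical predictions for eight genotypes under two models
blup_tmm <- c(G1 = 9, G2 = 8, G3 = 7, G4 = 6, G5 = 5, G6 = 4, G7 = 3, G8 = 2)
dge_cmm  <- c(G1 = 9, G2 = 8, G3 = 7, G4 = 4, G5 = 3, G6 = 6.5, G7 = 6.2, G8 = 1)

diff_pct <- selection_difference(blup_tmm, dge_cmm, n_sel = 5)
diff_pct
```

Here two of the five selected genotypes change between criteria, so the selections differ by 40%.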

**Fig. 5: Changes in estimates of population parameters by fitting indirect genotypic effects.**

Fig. 6: Relationship between BLUPs (obtained from the traditional mixed model, TMM), direct (DGE) and indirect genotypic effects (IGE), and total genotypic values (TGV) (obtained from the genetic competition mixed model, CMM) in the potato dataset.

In the euca dataset, we fitted a multi-age model and compared linear mixed models with spatial adjustments, either considering genetic competition (SCMM) or not (SMM). As in the potato dataset, there is a clear trend of decreasing residual variance as complexity is added to the model, with SCMM reaching the lowest values (Fig. 7A). The value removed from the residual variance is redistributed to the other variance components (Fig. 7B), and more precise estimates of heritability (both DGE and total) are obtained (Fig. 7C). It is worth mentioning the differences in variance components and heritabilities between ages. There is a trend of losing experimental precision from 3y to 6y, as indicated by the increase in residual variance and the decrease in heritability. This is expected, since the trees have been in the field longer and are exposed to the cumulative effects of the environment. Furthermore, competition tends to increase with age, as individuals demand more space and resources (water, nutrients, and light) (Ferreira et al. 2024). Changes in ranking are more perceptible in the euca dataset (Fig. 8). The selection based on the TMM would differ by 35% from the selection based on the TGV of the SCMM, and by 40% from the selection based on the TGV of the CMM. This shows the importance of considering an index such as the TGV, since the selection would be only 15% different when comparing the TMM with the DGE of the SCMM and the CMM, showing that BLUPs and DGEs are somewhat equivalent. The addition of the spatial component also changes the ranking and the selection, but on a smaller scale.

**Fig. 7: Changes in estimates of population parameters by fitting indirect genotypic effects.**

Fig. 8: Relationship between BLUPs (obtained from the traditional mixed model, TMM; and from the spatial mixed model, SMM), direct (DGE) and indirect genotypic effects (IGE), and total genotypic values (TGV) (obtained from the genetic competition mixed model, CMM and the spatial-genetic competition mixed model, SCMM) in the euca dataset.

Concluding remarks

The R package gencomp stands as a user-friendly tool with the advantage of facilitating the fitting and utilization of (spatial-) genetic competition models, irrespective of users’ programming proficiency. Moreover, the package offers flexibility by including other effects in model fitting, enabling users to adapt models to their specific requirements. It is worth mentioning that gencomp is a work in progress and will continue to evolve as we introduce additional functionalities. Future versions may extend to fitting multi-environment and multi-trait models and integrating kinship matrices. User feedback is vital to this process.

Data availability

The source code and the example datasets can be found at https://github.com/Kaio-Olimpio/gencomp.

References

Bailey NW, Desjonquères C (2022) The indirect genetic effect interaction coefficient ψ: theoretically essential and empirically neglected. J Hered 113:79–90
Besag J, Kempton R (1986) Statistical analysis of field experiments using neighbouring plots. Biometrics 42:231
Bijma P (2010) Multilevel selection 4: modeling the relationship of indirect genetic effects and group size. Genetics 186:1029–1031
Bijma P (2011) A general definition of the heritable variation that determines the potential of a population to respond to selection. Genetics 189:1347–1359
Bijma P (2014) The quantitative genetics of indirect genetic effects: a selective review of modelling issues. Heredity 112:61–69
Bijma P, Muir WM, Van Arendonk JAM (2007) Multilevel selection 1: quantitative genetics of inheritance and response to selection. Genetics 175:277–288
Cappa EP, Cantet RJC (2008) Direct and competition additive effects in tree breeding: Bayesian estimation from an individual tree mixed model. Silvae Genet 57:45–56
Connolly T, Currie ID, Bradshaw JE, McNicol JW (1993) Inter-plot competition in yield trials of potatoes (Solanum tuberosum L.) with single-drill plots. Ann Appl Biol 123:367–377
Costa e Silva J, Kerr RJ (2013) Accounting for competition in genetic analysis, with particular emphasis on forest genetic trials. Tree Genet Genomes 9:1–17
Costa e Silva J, Potts BM, Bijma P, Kerr RJ, Pilbeam DJ (2013) Genetic control of interactions among individuals: contrasting outcomes of indirect genetic effects arising from neighbour disease infection and competition in a forest tree. New Phytol 197:631–641
Costa e Silva J, Potts BM, Gilmour AR, Kerr RJ (2017) Genetic-based interactions among tree neighbors: identification of the most influential neighbors, and estimation of correlations among direct and indirect genetic effects for leaf disease and growth in Eucalyptus globulus. Heredity 119:125–135
Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11:1–15
Ferreira FM, Chaves SFS, Bhering LL et al. (2023) A novel strategy to predict clonal composites by jointly modeling spatial variation and genetic competition. For Ecol Manag 548:121393
Ferreira FM, Chaves SFS, Santos OP et al. (2024) Competition effects can mislead selection in eucalypt breeding trials. For Ecol Manag 561:121892
Gilmour AR, Cullis BR, Verbyla AP (1997) Accounting for natural and extraneous variation in the analysis of field experiments. J Agric Biol Environ Stat 2:269–293
Gilmour AR, Thompson R, Cullis BR (1995) Average Information REML: an efficient algorithm for variance parameter estimation in linear mixed models. Biometrics 51:1440–1450
Griffing B (1967) Selection in reference to biological groups I. Individual and group selection applied to populations of unordered groups. Aust J Biol Sci 20:127–140
Hunt CH, Smith AB, Jordan DR, Cullis BR (2013) Predicting additive and non-additive genetic effects from trials where traits are affected by interplot competition. J Agric Biol Environ Stat 18:53–63
Kempton RA (1982) Adjustment for competition between varieties in plant breeding trials. J Agric Sci 98:599–611
Kempton RA, Lockwood G (1984) Inter-plot competition in variety trials of field beans (Vicia faba L.). J Agric Sci 103:293–302
Muir WM (2005) Incorporation of competitive effects in forest tree or animal breeding programs. Genetics 170:1247–1259
Muñoz F, Sanchez L (2020) breedR: statistical methods for forest genetic resources analysts. https://github.com/famuvie/breedR. R package version 0.12-5.
R Core Team (2024) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
Sakai K-I (1955) Competition in plants and its relation to selection. Cold Spring Harb Symp Quant Biol 20:137–157
Stringer JK, Atkin FC, Gezan SA (2017) Statistical approaches in plant breeding: maximising the use of the genetic information. In: Campos H, Caligari PD (eds) Genetic improvement of tropical crops. Springer International Publishing, Cham, pp 3–17
Stringer JK, Cullis BR, Thompson R (2011) Joint modeling of spatial variability and within-row interplot competition to increase the efficiency of plant improvement. J Agric Biol Environ Stat 16:269–281
The VSNi Team (2023) ASReml: Fits Linear Mixed Models using REML. www.vsni.co.uk. R package version 4.2.0.332.
Trebissou CI, Tahi MG, Munoz F et al. (2021) Cocoa breeding must take into account the competitive value of cocoa trees. Eur J Agron 128:126288
Walsh B, Lynch M (2018) Evolution and selection of quantitative traits, vol. 1. Oxford University Press
Wright K (2024) agridat: Agricultural Datasets. https://CRAN.R-project.org/package=agridat. R package version 1.23.

Acknowledgements

This work was financially supported by the Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)—Finance Code 001. This study was financed, in part, by the São Paulo Research Foundation (FAPESP), Brazil. Process Number #2023/04881-3.

Author information

Authors and Affiliations

Department of General Biology, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
Saulo F. S. Chaves & Kaio Olimpio G. Dias
Institute of Artificial and Computational Intelligence (IDATA), Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
Saulo F. S. Chaves & Kaio Olimpio G. Dias
Plant Production Department, College of Agriculture, São Paulo State University, Botucatú, São Paulo, Brazil
Filipe M. Ferreira
Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, São Paulo, Brazil
Getulio C. Ferreira
VSN International, Hemel Hempstead, England, UK
Salvador A. Gezan


Contributions

S.F.S.C., F.M.F., and K.O.G.D. designed the research. S.F.S.C. wrote the first draft. All authors contributed to the package’s construction, revised drafts of the paper, and approved the final manuscript.

Corresponding author

Correspondence to Kaio Olimpio G. Dias.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Associate editor Yuan-Ming Zhang

Appendix A: Building the competition matrix

Let the toy example of Fig. 1a represent a tree breeding trial, with trees spaced 3 m between rows and 2.5 m between columns. Its Z_c would have the following structure:

$${{\bf{Z}}}_{c}=\left[\begin{array}{cccccc}0&{f}_{c}&{f}_{r}&{f}_{d}&0&0\\ {f}_{c}&{f}_{c}&{f}_{d}&{f}_{r}&0&{f}_{d}\\ 0&{f}_{c}&0&{f}_{c}&{f}_{d}&{f}_{r}\\ {f}_{c}&0&0&0&{f}_{r}&{f}_{d}\\ {f}_{d}&{f}_{d}&{f}_{r}&{f}_{r}&0&{f}_{c}\\ {f}_{r}&{f}_{r}&{f}_{d}&{f}_{c}&{f}_{c}&{f}_{d}\\ {f}_{d}&{f}_{r}&{f}_{c}&0&{f}_{d}&{f}_{r}\\ {f}_{r}&{f}_{d}&0&{f}_{c}&{f}_{r}&{f}_{d}\\ 0&0&{f}_{r}&{f}_{d}&0&{f}_{c}\\ {f}_{c}&0&{f}_{d}&{f}_{r}&{f}_{c}&{f}_{d}\\ 0&0&{f}_{c}&{f}_{d}&{f}_{d}&{f}_{c}\\ {f}_{c}&0&0&0&{f}_{r}&{f}_{d}\end{array}\right]$$

(A1)

Note that when the same genotype is a neighbor in different directions, we use the direction with the lowest distance. For instance, the plot “P3” contains the genotype “G5”, which is a neighbor of “G7” in both the column and diagonal directions (“P4” and “P7”, respectively). In this case f_c will be added to the matrix, since the distance between trees in the column (2.5 m) is lower than the distance in the diagonal ($\sqrt{{3}^{2}+2.{5}^{2}}=3.9$ m).
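The shortest-distance rule can be sketched in base R with symbolic entries in place of numeric competition intensity factors. The layout below is hypothetical (it is not the layout of Fig. 1a), and we assume that f_c applies to a neighbor one column away, f_r to a neighbor one row away, and f_d to a diagonal neighbor. The spacing follows the example above (3 m between rows and 2.5 m between columns).

```r
# Hypothetical 3 x 2 layout (rows x columns), filled column-wise
layout <- matrix(c("G1", "G2", "G3",
                   "G2", "G4", "G1"), nrow = 3, ncol = 2)
d_row <- 3     # distance between rows (m)
d_col <- 2.5   # distance between columns (m)

genos <- sort(unique(as.vector(layout)))
n_r <- nrow(layout)
n_c <- ncol(layout)

# Eight neighbor positions, their distances and assumed direction labels
offs <- expand.grid(dr = -1:1, dc = -1:1)
offs <- offs[!(offs$dr == 0 & offs$dc == 0), ]
offs$dist <- sqrt((offs$dr * d_row)^2 + (offs$dc * d_col)^2)
offs$label <- ifelse(offs$dr == 0, "f_c", ifelse(offs$dc == 0, "f_r", "f_d"))

# Plots x genotypes matrix of labels; "best" stores the shortest distance so far
Zc <- matrix("0", n_r * n_c, length(genos),
             dimnames = list(paste0("P", seq_len(n_r * n_c)), genos))
best <- matrix(Inf, n_r * n_c, length(genos), dimnames = dimnames(Zc))

p <- 0
for (j in seq_len(n_c)) for (i in seq_len(n_r)) {
  p <- p + 1   # plots numbered down each column
  for (k in seq_len(nrow(offs))) {
    r <- i + offs$dr[k]
    cc <- j + offs$dc[k]
    if (r < 1 || r > n_r || cc < 1 || cc > n_c) next
    g <- layout[r, cc]
    # Keep only the label of the closest occurrence of genotype g
    if (offs$dist[k] < best[p, g]) {
      best[p, g] <- offs$dist[k]
      Zc[p, g] <- offs$label[k]
    }
  }
}

Zc
```

In this toy layout, plot P1 has genotype G2 both one row away (3 m) and one column away (2.5 m), so its entry for G2 is f_c, the direction with the shorter distance.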

Now, using the same example to illustrate Z_c for crop breeding, considering competition in the row (${{\bf{Z}}}_{{c}_{1}}$) and in the column (${{\bf{Z}}}_{{c}_{2}}$) directions, the competition matrices would have the following structure:

$${{\bf{Z}}}_{{c}_{1}}=\left[\begin{array}{cccccc}0&1&0&0&0&0\\ 1&1&0&0&0&0\\ 0&1&0&1&0&0\\ 0&1&0&0&0&0\\ 0&0&0&0&0&1\\ 0&0&1&0&1&0\\ 0&0&1&0&0&1\\ 0&0&0&1&0&0\\ 0&0&0&0&0&1\\ 1&0&0&0&1&0\\ 0&0&1&0&0&1\\ 1&0&0&0&0&0\end{array}\right]\quad {{\bf{Z}}}_{{c}_{2}}=\left[\begin{array}{cccccc}0&0&1&0&0&0\\ 0&0&0&1&0&0\\ 0&0&0&0&0&1\\ 0&0&0&0&1&0\\ 0&0&1&1&0&0\\ 1&1&0&0&0&0\\ 0&1&0&0&0&1\\ 1&0&0&0&1&0\\ 0&0&1&0&0&0\\ 0&0&0&1&0&0\\ 0&0&0&0&0&1\\ 0&0&0&0&1&0\end{array}\right]$$

(A2)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Chaves, S.F.S., Ferreira, F.M., Ferreira, G.C. et al. Incorporating spatial and genetic competition into breeding pipelines with the R package gencomp. Heredity 134, 129–141 (2025). https://doi.org/10.1038/s41437-024-00743-9


Received: 23 August 2024
Revised: 23 December 2024
Accepted: 23 December 2024
Published: 16 January 2025
Version of record: 16 January 2025
Issue date: February 2025
DOI: https://doi.org/10.1038/s41437-024-00743-9

