Quantifying genetic and genotypic gain gaps in Eucalyptus: the hidden cost of ignoring inbreeding and dominance

Araujo, Marcio J.; Bush, David; Tambarussi, Evandro V.

doi:10.1038/s41437-025-00792-8

Article
Published: 21 August 2025


Heredity volume 134, pages 542–557 (2025)

Abstract

Understanding the mating system of Eucalyptus species is necessary to accurately estimate genetic parameters and improve breeding programs. Eucalyptus species often exhibit mixed mating systems, leading to complex relationships among progenies. Traditional tree breeding programs that assume a half-sibling relationship for open-pollinated (OP) trials may overestimate genetic gains by neglecting the effects of inbreeding and dominance. This study focuses on Eucalyptus pellita, a species with a mixed mating system, to quantify the impact of selfing and dominance on breeding strategies. We simulated OP trial growth data for 100 randomly selected families in a randomized complete block design, using published estimated parameters for diameter at breast height (DBH). Our analysis indicated that marker-based models, particularly those incorporating dominance effects, provide more accurate genetic parameter estimates and larger predicted genetic gains than pedigree-based models. These results reveal pronounced genetic and genotypic gain gaps when traditional models are employed, underscoring the imperative for integrating dominance-informed genomic selection strategies. Thus, our study provides essential guidance for optimizing breeding programs to sustainably enhance productivity and genetic quality in Eucalyptus plantations.

Introduction

Eucalyptus is among the most widely planted genera in tropical and subtropical forestry, with multiple species cultivated for timber, pulp, and other wood products (Grattapaglia and Kirst 2008; Varghese et al. 2024). Eucalyptus pellita F. Muell. stands out as an important species in tropical silviculture due to its high productivity, economic value, and adaptability. E. pellita is native to northern Australia and parts of Papua New Guinea, and it has been extensively planted in Southeast Asia, particularly in Indonesia and Malaysia (Lukmandaru et al. 2016). The species is prized for its fast growth, broad environmental adaptability, and natural resistance to pests and diseases (Fadwati et al. 2023).

These traits, coupled with wood properties suitable for pulp, paper, veneers, and sawn timber (Thavamanikumar et al. 2020), have led to E. pellita being widely adopted in industrial plantations. Notably, this species has been used to replace Acacia mangium in over half a million hectares of plantations in Indonesia and neighboring countries (Thavamanikumar et al. 2020) that were devastated by disease outbreaks (Eyles et al. 2008; Mendham et al. 2015; Nambiar et al. 2018; Tarigan et al. 2011), underscoring its commercial importance.

Besides its silvicultural importance, E. pellita was also chosen as the model species for this study because it exemplifies the mixed mating system and genetic characteristics typical of the genus Eucalyptus. Like most eucalypts, E. pellita is predominantly allogamous (outcrossing) but maintains a considerable rate of self-fertilization in natural populations (Potts and Savva 1988). Open-pollinated seed from eucalypt trees often includes a considerable proportion of selfed progeny, typically ranging from 5% to over 50%, depending on the species and population conditions (Eldridge et al. 1993; Griffin et al. 2019; House and Bell 1996; Quezada et al. 2022), resulting in inbreeding and complex pedigree relationships among seedlings within families. Furthermore, E. pellita exhibits quantitative genetic parameters in line with other commercial eucalypt species, such as moderate heritability for growth and wood quality traits (Brawner et al. 2010; Hung et al. 2015; Thavamanikumar et al. 2020) and pronounced inbreeding depression in fitness-related traits. These commonalities support the use of E. pellita as a representative model for the broader Eucalyptus genus in simulations of breeding strategies.

Historically, tree breeders have not accounted for all of the relevant sources of genetic variance and/or have ignored the assumptions that underpin the models used in forest tree improvement programs (Lebedev et al. 2020; Tambarussi et al. 2018; Tambarussi et al. 2022). As a result, inaccurate estimates may have limited the realized genetic gains in key tree species, especially when non-additive genetic effects and mating system complexity are not properly accounted for (Araújo et al. 2012; Bouvet et al. 2015; Costa e Silva et al. 2004). A clear understanding of a species’ mating system is essential to accurately estimate genetic parameters and their components and to quantify the expected gains from selection in a breeding program (Bush et al. 2015; Tambarussi et al. 2018; Tambarussi et al. 2022). The magnitude of genetic gain is partially dependent on the species’ mating system, which should therefore be considered when estimating genetic parameters, as it clarifies how genetic information is transmitted between generations (Bush et al. 2015; Namkoong 1966; Wright 1921).

Information on mating systems should be directly applied to breeding strategies, genetic conservation practices, and the production of improved seeds (Gusson et al. 2006). Mating systems can be allogamous, autogamous, or mixed (Goodwillie et al. 2005). Species of the genus Eucalyptus L’Hér. have a mixed mating system (Griffin et al. 2019; Hardner and Potts 1997; Kennington and James 1997; Sampson and Byrne 2008; Yeh et al. 1983). Under favorable conditions, i.e., flowering synchronicity and sufficient pollinators, most offspring are the result of panmictic biparental mating, where crosses between unrelated individuals predominate and there is a small proportion of offspring resulting from self-fertilization and mating between related parents.

This complex mating system influences the evolutionary patterns of species (Griffin et al. 2019; Hardner and Tibbits 1998; Sampson et al. 1995) and breeding strategies (Gonzaga et al. 2016; Griffin and Cotterill 1988). It also leads to different levels of relatedness between progenies within families, creating varying patterns of growth and survival (Burgess et al. 1996). Thus, the amount of additive variance is directly affected by natural patterns of inbreeding and biparental mating. As a result, phenotypic variation in Eucalyptus is not only influenced by additive effects, but also dominance and epistatic effects (Araújo et al. 2012; Costa e Silva et al. 2004; Tan et al. 2018). Failure to account for non-additive variance can inflate heritability, resulting in incorrect estimates of genetic gain (Wright Stephen et al. 2008).

A decline in the phenotypic performance of fitness-related quantitative traits is known as inbreeding depression (ID) (Byers and Waller 1999; Costa e Silva et al. 2010; Costa e Silva et al. 2011; Falconer and Mackay 1996; Lohr and Haag 2015). Inbreeding results in decreased genetic variability within progeny, whereas biparental inbreeding increases kinship between parents and progeny (Griffin and Eckert 2003; Ronfort and Couvet 1995). The spatial structure of natural populations also influences the likelihood of inbreeding, as individuals are more likely to mate with nearby genotypes than individuals in more distant populations (Epperson 1992; Tambarussi et al. 2017). This is typical for eucalypts, as seeds are dispersed over short distances and neighborhood relatedness and inbreeding are common (Jones et al. 2006; McDonald et al. 2003; Pupin et al. 2019). In nature, the intrapopulation spatial genetic structure resulting from restricted gene flow may explain the existence and maintenance of the mixed mating system (Tambarussi et al. 2017). Consequently, there is a decrease in cross-fertilization, and the genes for self-fertilization are transmitted without losing the adaptive value. Thus, Wright’s equilibrium may explain the maintenance of genetic variability in these populations (Coelho and Vencovsky 2003). In mixed systems, individuals with vigorous growth are present in some populations, even if they are inbred, and are similar to those found in panmictic populations in Hardy-Weinberg equilibrium (Coelho and Vencovsky 2003; Vencovsky and Crossa 1999).

Inbreeding patterns create different degrees of kinship between individuals, which may be higher than expected for half-siblings, leading to biased genetic estimates due to the species’ mating system (Ismail and Kokko 2019; Tambarussi et al. 2017; Vencovsky et al. 2001). Most tree breeding programs still use the pedigree-based matrix which assumes half-sib relationships between trees from open-pollinated trials. This assumption has an impact on the true value of genetic estimates (Tambarussi et al. 2022; Tambarussi et al. 2018). As the quantification of inbreeding and dominance effects continues to be neglected by breeders, it is necessary to take a closer look at this problem to improve the accuracy of genetic parameters (Tambarussi et al. 2022). The main objective of the present study is to provide a better understanding of the effects of selfing and dominance for Eucalyptus pellita F.Muell., a species with a mixed mating system (Eldridge et al. 1993). We performed simulations considering different levels of dominance and inbreeding to explore their impacts on breeding programs under different selection intensities. Subsequently, we sought to quantify the effect of dominance under different inbreeding scenarios.

Material and methods

Simulations

We simulated growth data for open-pollinated (OP) breeding populations for a randomized trial design with 20 complete-block replications of 100 families in single-tree plots (STP). Simulated parameters included trial mean (μ), and additive, dominance, plot, and phenotypic variances (${\sigma }_{a}^{2}$, ${\sigma }_{d}^{2}$, ${\sigma }_{c}^{2}$, and ${\sigma }_{p}^{2}$, respectively). Simulations were based on several published diameter at breast height (DBH) datasets for Eucalyptus pellita evaluated at approximately four years after planting (Brawner et al. 2010; Hardiyanto 2003; Harwood et al. 1997; Leksono et al. 2006; Leksono et al. 2008; Leksono et al. 2009; Luo et al. 2006; Nieto et al. 2016; Thavamanikumar et al. 2020). For all simulations, we set all parameters (variances and overall mean) to the median of the analyzed studies for E. pellita four years after planting.

Simulations were carried out for the breeding population and next generation breeding population (progeny test) using the package AlphaSimR (Gaynor et al. 2021), in the R software environment, as outlined below.

Genetic structure of the breeding population

The first step in the simulation was to generate unrelated founder haplotypes for the breeding population (Bp). The method used was Markovian Coalescent Simulation (MaCS), which effectively simulates haplotypes under any population history model (Chen et al. 2008). Simulations of the OP population started with 1000 trees as founders in the breeding population and the quantitative trait under selection was DBH. We assumed 1000 segregating sites (loci) per chromosome, of which 600 were related to the simulated DBH trait. These loci were also assumed to be evenly distributed across 11 chromosomes (as found for all Eucalyptus species) (Myburg et al. 2014). Therefore, the whole-genome sequences for 11 chromosome pairs were constructed by randomly extracting 1000 biallelic single-nucleotide polymorphisms (SNP) as markers per chromosome and randomly assigning 600 SNP as QTL per chromosome. The genotypes were coded as 0 for reference (ancestral) homozygote, 1 for heterozygote, and 2 for alternative (derived) homozygote.

The breeding population for the OP population was assumed to be a breeding seed orchard consisting of 1000 superior trees selected for random mating, with the poorest performing trees removed. The strategy was chosen because it is widely used in eucalypt breeding programs.

Global model parameters for the simulations were set assuming that the genetic control of DBH (g) can be decomposed into additive (a) and dominance (d) effects. To simplify the simulation and analysis, epistasis (e) was not simulated and interactions with the environment (s) (i.e., a × s, d × s, e × s) were assumed to be zero. In summary, the parameter estimates used in the simulations (based on the literature, as described above) were as follows: μ = 10.50 ± 2 cm; ${\sigma }_{p}^{2}$ = 12 ± 2; ${\sigma }_{a}^{2}$ = 3 ± 0.01; ${\sigma }_{d}^{2}$ = from 1.2 to 2.4 ± 0.01. From these parameters, the narrow-sense heritability (${h}_{a}^{2}$) and the coefficient of determination for dominance (${h}_{d}^{2}$) were: ${h}_{a}^{2}$ = 0.25 ± 0.02 and ${h}_{d}^{2}$ = from 0.1 to 0.2 ± 0.01.
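As a quick consistency check, and using only the central values quoted above (the ± ranges are ignored here), the two ratios follow directly from the variance components:

$${h}_{a}^{2}=\frac{{\sigma }_{a}^{2}}{{\sigma }_{p}^{2}}=\frac{3}{12}=0.25,\qquad {h}_{d}^{2}=\frac{{\sigma }_{d}^{2}}{{\sigma }_{p}^{2}}=\frac{1.2}{12}=0.10\ \text{to}\ \frac{2.4}{12}=0.20$$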

Breeding values (a), dominance effects (d), genotypic values (g), and phenotypic values (p)

The simulation of the additive (a) and dominance (d) effects follows the parametrization of classical quantitative trait models (Falconer and Mackay 1996). The a and d effects are defined as genotype dosage scaling with an individual’s raw genotype dosage as the number of copies of the ‘1’ allele at a locus (Gaynor et al. 2021).

The additive effect is sampled in two stages: (1) sampling initial values from a standard normal distribution (stochastic simulation); (2) scaling the magnitude of the initial values to achieve a desired genetic variance. This process began by first calculating the variance in the founder population (breeding population in this study) that accounted for both a and d effects, using the initial sampled effects. Then, a scaling constant was calculated and applied to all effects to achieve the target variance in the breeding population.
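The two-stage sampling described above can be illustrated with a minimal Python sketch. It is not the AlphaSimR implementation: the population size, the number of QTL, and the randomly drawn dosages are hypothetical, and only additive effects are rescaled, whereas the text states that the variance used for scaling accounted for both a and d effects.

```python
import math
import random
import statistics

random.seed(1)

N_TREES, N_QTL = 200, 50   # hypothetical sizes, far smaller than in the study
TARGET_VAR = 3.0           # additive variance target quoted in the text

# Diploid dosages (0, 1, 2) drawn at random; an assumption, not MaCS output.
genotypes = [[random.choice((0, 1, 2)) for _ in range(N_QTL)]
             for _ in range(N_TREES)]

# Stage 1: initial effects sampled from a standard normal distribution.
raw_effects = [random.gauss(0.0, 1.0) for _ in range(N_QTL)]


def genetic_values(effects):
    """Sum of effect * centred dosage (x - 1) for every tree."""
    return [sum(a * (x - 1) for a, x in zip(effects, row)) for row in genotypes]


# Stage 2: one scaling constant, applied to all effects, hits the target variance.
observed_var = statistics.pvariance(genetic_values(raw_effects))
scale = math.sqrt(TARGET_VAR / observed_var)
scaled_effects = [a * scale for a in raw_effects]
scaled_var = statistics.pvariance(genetic_values(scaled_effects))
```

Because every effect is multiplied by the same constant, the genetic variance changes by the square of that constant, which is why a single scaling factor is enough.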

The individual breeding values (${BV}(a)$) were calculated as follows:

$${BV}\left(a\right)=\sum {a}_{q}{x}_{A}$$

Where, ${a}_{q}$ is the additive effect of the q-th QTL and ${x}_{A}$ is the scaled additive genotype dosage (which scales the relative dosage to set the values for opposite homozygotes to −1 and 1), calculated as:

$${x}_{A}=\left(x-\frac{{ploidy}}{2}\right)\left(\frac{2}{{ploidy}}\right)$$

The individual dominance effects ($D\left(d\right)$) were calculated as follows:

$$D\left(d\right)=\sum {d}_{q}{x}_{D}$$

Where, ${d}_{q}$ is the dominance effect of the q-th QTL and ${x}_{D}$ is the scaled dominance dosage (which scales the relative dosage to set the values for opposite homozygotes to 0 and the middlemost heterozygote to 1), calculated as:

$${x}_{D}=x({ploidy}-x){\left(\frac{2}{{ploidy}}\right)}^{2}$$

The dominance effects were calculated as partial dominance. The dominance degree (δ) was assumed to be 0.2 (i.e., partial dominance) and d was calculated as:

$$d=\delta \left|a\right|$$

In summary, the simulations of d were performed by scaling the magnitude of the sampled initial values for the supplied parameters. These supplied parameters are the mean and variance for a normal distribution used to sample the degrees of dominance and are then scaled as described for the additive effects to calculate the scaling constant. The scaling constant is then applied to d. Although this scaling changes the value of the dominance effect, the dominance degree remains unchanged. This makes the specification of the dominance degree distribution independent of the desired genetic variance.

Finally, the individual genotypic value (${GV}(g)$) for the simulated DBH trait was:

$${GV}\left(g\right)={BV}\left(a\right)+D(d)$$

For more details of the a and d simulation, see Traits in AlphaSimR (r-project.org).
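The dosage scalings and the sums for BV, D, and GV given above can be sketched in a few lines of Python. The function names and the example effects are hypothetical; the dominance effect is taken as $d=\delta |a|$ with δ = 0.2, as stated in the text.

```python
def scaled_additive_dosage(x, ploidy=2):
    """x_A: maps opposite homozygotes to -1 and 1 (heterozygote to 0 in diploids)."""
    return (x - ploidy / 2) * (2 / ploidy)


def scaled_dominance_dosage(x, ploidy=2):
    """x_D: maps homozygotes to 0 and the middlemost heterozygote to 1."""
    return x * (ploidy - x) * (2 / ploidy) ** 2


def genotypic_value(dosages, additive_effects, delta=0.2):
    """Return (BV, D, GV) for one individual from its raw QTL dosages."""
    bv = sum(a * scaled_additive_dosage(x)
             for a, x in zip(additive_effects, dosages))
    dom = sum(delta * abs(a) * scaled_dominance_dosage(x)
              for a, x in zip(additive_effects, dosages))
    return bv, dom, bv + dom


# Three hypothetical QTL: one of each genotype class.
bv, dom, gv = genotypic_value([0, 1, 2], [0.5, -1.0, 0.25])
```

In this toy example only the heterozygous locus contributes to D, while only the two homozygous loci contribute to BV.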

To simulate the phenotypic values (P), we simulated the environmental effects (E), calculated based on the broad-sense heritability (${h}_{g}^{2}$). The ${h}_{g}^{2}$ was calculated as the ratio between the genotypic variance (${\sigma }_{g}^{2}={\sigma }_{a}^{2}+{\sigma }_{d}^{2}$) and the phenotypic variance (${\sigma }_{p}^{2}={\sigma }_{g}^{2}+{\sigma }_{e}^{2}$, where ${\sigma }_{e}^{2}$ is the environmental variance; this assumes no interaction between genotype and environment and no other sources of variation in the phenotype), as follows:

$${h}_{g}^{2}=\frac{{\sigma }_{g}^{2}}{{\sigma }_{g}^{2}+{\sigma }_{e}^{2}}$$

The individual E was sampled as $E \sim N(\mu ,\,{\sigma }_{e}^{2})$, where μ is the trial mean given above. Therefore, P was calculated as:

$$P={BV}+D+E$$
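A minimal Python sketch of this step follows, assuming the environmental variance is simply what remains of the phenotypic variance after the additive and dominance variances are removed. The function names are hypothetical, and the environmental term is centred on the trial mean as described in the text.

```python
import random


def environmental_variance(var_a, var_d, var_p):
    """Residual variance implied by sigma_p^2 = sigma_a^2 + sigma_d^2 + sigma_e^2."""
    return var_p - (var_a + var_d)


def broad_sense_h2(var_a, var_d, var_e):
    """Genotypic variance divided by genotypic plus environmental variance."""
    var_g = var_a + var_d
    return var_g / (var_g + var_e)


def simulate_phenotype(bv, dom, mu, var_e, rng):
    """P = BV + D + E, with E drawn from N(mu, sigma_e^2)."""
    env = rng.gauss(mu, var_e ** 0.5)
    return bv + dom + env


var_e = environmental_variance(var_a=3.0, var_d=1.2, var_p=12.0)
h2_g = broad_sense_h2(3.0, 1.2, var_e)
phenotype = simulate_phenotype(bv=0.8, dom=0.1, mu=10.5, var_e=var_e,
                               rng=random.Random(2025))
```

With the lowest dominance variance quoted in the text (1.2), the implied environmental variance is 7.8 and the broad-sense heritability is 0.35.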

Crossing simulations for the progeny test

The progeny test was simulated using the “gene drop” method, which is used to create new haplotypes from simulated haplotypes of the original founder population (breeding population in this study). Based on the whole-genome (distributions of QTL effects, and with sequence and SNP phased alleles and genotypes) and the pedigree of the breeding population generated by the MaCS method (described above), the new haplotypes were created by modeling genetic recombination during meiosis (Hickey and Gorjanc 2012).

A sample of haplotypes with sequence information for each chromosome according to the specified breeding population was then dropped through a pedigree with genetic recombination according to the gamma model (McPeek and Speed 1995). In AlphaSimR, a genetic map is constructed and used to model the distribution of genetic recombination. The crossover interference parameter for a gamma model of recombination is used with a value of v = 2.6 (Broman and Weber 2000), which approximates the degree of crossover interference implied by the Kosambi mapping function (Gaynor et al. 2021). Therefore, the random shuffling of genes created each individual’s genome in the simulated progeny test. In this process, Mendelian sampling variance is generated by the process of randomly sampling parental chromosomes during meiotic division at gametogenesis and estimated from the difference between the individual’s predicted transmission ability and its parent mean (Cole and VanRaden 2011).

The process of creating a new population (progeny test) from the founders (breeding population) described above started with a given pedigree, i.e., a pair of crosses from the breeding population that will produce the offspring for the progeny test. The simulation for the OP population considered common issues that occur in improved seed orchards. Due to the timing of flowering (the probability of all trees in the improved seed orchard flowering at the same time is very low) and collection strategies, only a sample of trees become seed sources. In this case, we randomly selected a hypothetical set of 100 of the 1000 trees to become seed sources and generated 20 progenies from each. In the genomic-based simulations, the relationships between progenies within families were half-sib (HS), full-sib (FS), self-half-sibs (SHS), and self-full-sibs (SFS) (Squillace 1974); however, in the pedigree-based model, all individuals within families are treated as HS, as it assumes panmictic mating.

Therefore, the progeny test for OP populations has 2000 trees distributed across 100 families, four replications/blocks, and single-tree plots. During the crosses, we simulated self-crosses at the following selfing rates: 0, 5, 10, 25, 50, 75, and 95% and replicated each scenario 100 times, for a total of 700 simulations.
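The crossing scheme can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' AlphaSimR code: each seed is assumed to be selfed independently with probability equal to the selfing rate, outcross pollen is assumed to come uniformly from any other orchard tree, and all names are hypothetical.

```python
import random

SELFING_RATES = (0.0, 0.05, 0.10, 0.25, 0.50, 0.75, 0.95)
N_REPLICATES = 100
N_ORCHARD, N_MOTHERS, PROGENY_PER_MOTHER = 1000, 100, 20


def open_pollinated_crosses(selfing_rate, rng):
    """Return (mother, father) pairs for one simulated progeny test."""
    mothers = rng.sample(range(N_ORCHARD), N_MOTHERS)  # random seed sources
    crosses = []
    for mother in mothers:
        for _ in range(PROGENY_PER_MOTHER):
            if rng.random() < selfing_rate:
                father = mother                      # self-fertilization
            else:
                # Uniform draw over all orchard trees except the mother.
                father = rng.randrange(N_ORCHARD - 1)
                if father >= mother:
                    father += 1
            crosses.append((mother, father))
    return crosses


crosses = open_pollinated_crosses(0.25, random.Random(42))
n_scenarios = len(SELFING_RATES) * N_REPLICATES  # 7 rates x 100 replicates
```

Each call yields 100 × 20 = 2000 crosses, and the scenario grid matches the 700 simulations reported above.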

Inbreeding depression

We simulated self-crosses (selfing) for the OP population. In eucalypts, selfing is expected to cause a reduction in field growth compared to outcrossed trees (Hardner and Potts 1995), a phenomenon known as inbreeding depression (I). Inbreeding depression can be understood as the expected progressive decrease of the genotypic value as the inbreeding coefficient increases from zero to one (Wellmann and Bennewitz 2011). We simulated the expectation of inbreeding depression (E(I)) as follows (Wellmann and Bennewitz 2011):

$$E\left(I\right)=\frac{2{\mu }_{\delta }{\sigma }_{a}\mathop{\sum }\nolimits_{i=1}^{{n}_{{QTN}}}{p}_{i}{q}_{i}}{\sqrt{\pi \mathop{\sum }\nolimits_{i=1}^{{n}_{{QTN}}}{p}_{i}{q}_{i}\left[1+({\mu }_{\delta }^{2}+{\sigma }_{\delta }^{2}){({q}_{i}-{p}_{i})}^{2}\right]}}$$

Where, ${\mu }_{\delta }$ is the mean of the dominance effect; ${\sigma }_{a}$ is the additive genetic standard deviation; ${n}_{{QTN}}$ is the number of quantitative trait nucleotides (QTN; all QTN are biallelic and are assumed to have additive (a) and dominance (d) effects); ${p}_{i}$ is the frequency of allele 1 (the mutant allele) at the i-th locus; ${q}_{i}=1-{p}_{i}$ is the frequency of allele 0 (the ancestral allele) at the i-th locus; and ${\sigma }_{\delta }^{2}$ is the variance of the dominance effect. Therefore, the dominance effect was simulated as $\sim N({\mu }_{\delta },\,{\sigma }_{\delta }^{2})$.
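The expectation above translates directly into code. The following Python sketch evaluates the formula for a list of allele frequencies; the function name and the example inputs (four QTN at frequency 0.5 and a dominance standard deviation of 0.1) are hypothetical and are not values from the study.

```python
import math


def expected_inbreeding_depression(allele_freqs, mu_delta, sigma_delta, sigma_a):
    """E(I) for biallelic QTN, term by term as in the equation above."""
    sum_pq = 0.0
    weighted_sum = 0.0
    for p in allele_freqs:
        q = 1.0 - p
        sum_pq += p * q
        weighted_sum += p * q * (
            1.0 + (mu_delta ** 2 + sigma_delta ** 2) * (q - p) ** 2
        )
    return 2.0 * mu_delta * sigma_a * sum_pq / math.sqrt(math.pi * weighted_sum)


# With p = q = 0.5 at every locus the (q - p)^2 term vanishes.
example = expected_inbreeding_depression(
    allele_freqs=[0.5] * 4, mu_delta=0.2, sigma_delta=0.1, sigma_a=math.sqrt(3.0)
)
```

When all frequencies equal 0.5, the expression reduces to $2{\mu }_{\delta }{\sigma }_{a}\sqrt{\sum {p}_{i}{q}_{i}/\pi }$, which gives a convenient check on the implementation.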

Relative phenotypic inbreeding depression due to self-pollination (ID) was calculated as (Hardner and Potts 1995):

$${ID}=\left(\frac{{Phenotypic\; mean\; of\; the\; outcrossed\; trees}-{Phenotypic\; mean\; of\; the\; self\; pollinated\; trees}}{{Phenotypic\; mean\; of\; the\; outcrossed\; trees}}\right)* 100$$
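As a small worked example of the ID formula, the Python sketch below uses made-up DBH values (in cm) for outcrossed and selfed trees; the function name and data are hypothetical.

```python
def relative_inbreeding_depression(outcrossed, selfed):
    """Percentage drop of the selfed mean relative to the outcrossed mean."""
    mean_out = sum(outcrossed) / len(outcrossed)
    mean_self = sum(selfed) / len(selfed)
    return (mean_out - mean_self) / mean_out * 100


# Outcrossed mean = 11.0 cm, selfed mean = 8.8 cm.
id_percent = relative_inbreeding_depression([10.0, 11.0, 12.0], [8.0, 9.0, 9.4])
```

A positive value indicates that selfed trees grow less than outcrossed trees; here the selfed mean is about 20% lower.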

Statistical models for genetic evaluation

To evaluate genetic parameters and predict breeding values, we applied a series of linear mixed models incorporating different sources of genetic information. Initially, we used pedigree-based models (ABLUP), which rely on the numerator relationship matrix (A) constructed from known pedigree records. These models estimate breeding values based on expected additive genetic relationships and are traditionally used in tree breeding programs.

We also employed genomic BLUP (GBLUP) models, which use realized genomic relationship matrices (G) derived from SNP data. Two forms were tested:

An additive-only GBLUP, using the G matrix to model additive genetic effects;
A genomic additive + dominance GBLUP (GDBLUP), which includes both additive (G) and dominance (D) genomic relationship matrices, thus capturing non-additive variance relevant for traits influenced by dominance.

To integrate all individuals, genotyped and non-genotyped, we used single-step GBLUP with dominance effect (ssGDBLUP). This model combines pedigree and genomic information in the H matrix and adds a dominance genomic matrix (D) to account for dominance deviations. In this model, the inverse of the H matrix was computed following Legarra et al. (2009), and all models were fitted using ASReml-R. This modeling framework is especially relevant in open-pollinated populations where selfing and dominance effects can bias additive-only evaluations. Our approach aligns with findings in Eucalyptus pellita that emphasize the importance of dominance modeling for growth traits (Thavamanikumar et al. 2020).

Parameter estimates

We estimated the genetic parameters and genetic values (Breeding Values (BV) = additive effect; and Genotypic Values (GV) = additive + dominance effect) for the different simulated scenarios. First, analyses of variance using pedigree- and marker-based models were performed to obtain the genetic and non-genetic estimates and to calculate the narrow-sense (${h}_{a}^{2}$) and broad-sense (${h}_{g}^{2}$) heritabilities for the different simulated scenarios (Table 1). The coefficient of dominance effect (${h}_{d}^{2}$) was also calculated.

Table 1 Population description and kinship structures of the simulated data used in the analysis.

The following general model was used for analysis based on REML/ABLUP and REML/GBLUP procedures:

$$Y={Xb}+{Z}_{\zeta }a+{Z}_{\tau }d+e$$

Where, Y is the vector of individual observations; X is the known design matrix for fixed effects; b is the vector of fixed effects (overall mean and blocks); ${Z}_{\zeta }$ is the known design matrix for the additive genetic random effects; a is the vector of additive genetic random effects with $a \sim {N}(0,\,{\boldsymbol{\zeta }}{\sigma }_{a}^{2})$, where ${\sigma }_{a}^{2}$ is the additive genetic variance and ${\boldsymbol{\zeta }}$ is equal to:

i. the additive genetic relationship matrix for pedigree-based models (HS and FAM models); the ${\sigma }_{a}^{2}$ in the FAM model was estimated as $4{\sigma }_{p}^{2}$ (${\sigma }_{p}^{2}$ is the variance between families), following the classical approach described by Falconer and Mackay (1996).
ii. the marker-based additive genomic relationship matrix (VanRaden 2008) for marker-based models (GBLUP and GDBLUP models);

${Z}_{{\boldsymbol{\tau }}}$ is the known design matrix of the dominance random effects; d is the dominance random effect vector with ${d} \sim {N}(0,\,{\boldsymbol{\tau }}{\sigma }_{d}^{2})$, where ${\sigma }_{d}^{2}$ is the dominance variance and τ is:

i. the marker-based dominance genomic relationship matrix (Vitezica et al. 2013) for the marker-based model (GDBLUP model);
ii. the combined pedigree and genomic-based dominance relationship matrix (HD matrix; Aliloo et al. 2017; Vitezica et al. 2013; Zhang et al. 2019) for the ssGDBLUP model.

e is the random residual vector with ${e} \sim {N}(0,{I}{\sigma }_{e}^{2})$, where ${\sigma }_{e}^{2}$ is the residual variance. The models that omitted dominance effects used the described model without the ${Z}_{\tau }d$ term. These models were applied to the data sets generated for each selfing rate.
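To make the marker-based additive relationship matrix concrete, the Python sketch below builds a genomic relationship matrix in the form commonly attributed to VanRaden (2008): dosages are centred by twice the allele frequency and the cross-products are divided by $2\sum {p}_{j}(1-{p}_{j})$. Allele frequencies are estimated from the sample itself, which is an assumption of this sketch, and the three-tree genotype table is hypothetical. The study fitted its models in ASReml-R; this is only an illustration of the matrix construction.

```python
def vanraden_g(genotypes):
    """Additive genomic relationship matrix from 0/1/2 dosages (rows = trees)."""
    n_trees = len(genotypes)
    n_markers = len(genotypes[0])
    # Frequency of the counted allele at each marker, estimated from the sample.
    freqs = [sum(row[j] for row in genotypes) / (2 * n_trees)
             for j in range(n_markers)]
    denom = 2 * sum(p * (1 - p) for p in freqs)
    # Centre each dosage by its expected value, 2p.
    centred = [[row[j] - 2 * freqs[j] for j in range(n_markers)]
               for row in genotypes]
    return [[sum(zi * zk for zi, zk in zip(centred[i], centred[k])) / denom
             for k in range(n_trees)]
            for i in range(n_trees)]


G = vanraden_g([[0, 1, 2],
                [2, 1, 0],
                [1, 1, 1]])
```

In this toy example the first two trees carry opposite homozygous genotypes and receive a negative relationship, while the fully heterozygous third tree sits at the population mean and has zero entries.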

The distributions of ${h}_{a}^{2}$ from the HS, GBLUP, GDBLUP, and ssGDBLUP models were compared with the true simulated ${h}_{a}^{2}$ for each selfing rate. We also regressed the simulated true breeding value on the breeding value estimated from each model at each selfing rate.

The ${h}_{a}^{2}$, ${h}_{d}^{2}$, and ${h}_{g}^{2}$ were calculated as follows:

$${h}_{a}^{2}=\frac{{\sigma }_{a}^{2}}{{\sigma }_{a}^{2}+{\sigma }_{d}^{2}+{\sigma }_{e}^{2}}$$

$${h}_{d}^{2}=\frac{{\sigma }_{d}^{2}}{{\sigma }_{a}^{2}+{\sigma }_{d}^{2}+{\sigma }_{e}^{2}}$$

$${h}_{g}^{2}=\frac{{\sigma }_{a}^{2}+{\sigma }_{d}^{2}}{{\sigma }_{a}^{2}+{\sigma }_{d}^{2}+{\sigma }_{e}^{2}}$$

Selective accuracy (r_aa)

The r_aa for breeding values was obtained as the Pearson correlation between the predicted BV from each model described in Table 1 and the true simulated BV.
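For illustration, the accuracy can be computed with a plain Pearson correlation, as in the Python sketch below. The function name and the five-tree example values are hypothetical.

```python
import math


def selective_accuracy(predicted_bv, true_bv):
    """Pearson correlation between predicted and true breeding values."""
    n = len(true_bv)
    mean_pred = sum(predicted_bv) / n
    mean_true = sum(true_bv) / n
    cov = sum((p - mean_pred) * (t - mean_true)
              for p, t in zip(predicted_bv, true_bv))
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted_bv)
    ss_true = sum((t - mean_true) ** 2 for t in true_bv)
    return cov / math.sqrt(ss_pred * ss_true)


true_bv = [1.2, 0.4, -0.3, -0.9, 0.1]
predicted_bv = [0.8, 0.5, -0.1, -0.6, -0.2]
r_aa = selective_accuracy(predicted_bv, true_bv)
```

Because the correlation is invariant to scale, a model whose predictions are shrunk toward zero but correctly ordered can still reach an accuracy close to one.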

Selection gain

We investigated the effect on selection gain for each progeny test structure and model class using the true simulated BV and GV as a baseline. Two different deployment strategies were applied: (1) SGBV – selection of trees for BV to become parents of the next generation of the breeding population; (2) SGGV – selection of trees for GV for cloning. Different levels of selfing rates and different selection intensities (SI) were considered.

For each deployment strategy, we calculated the “genetic gain gap” as the difference between selection based on each model described in Table 1 and the true BV and GV. To do so, the predicted breeding values of the best individual trees from each model (described in Table 1) were assumed to be indirectly selected and compared with the genetic gain from direct selection using the true BV and GV for each selfing rate. The following selection intensities (SI) were assumed: 1, 5, 10, 25, and 50%. Table 2 describes the formulas used to calculate genetic gain for each deployment strategy.

Table 2 Deployment strategies for genetic and genotypic gain calculations.

Genetic and genotypic gain gap

The genetic gain gap quantifies the difference between the maximum achievable genetic gain (when selection is based on the true breeding values) and the gain achieved when individuals are ranked using breeding values (BV) or genotypic values (GV) estimated by different models (Table 1). For each selection intensity (SI%), the genetic gain gap was calculated as:

$${Genetic\; Gain\; Gap}=\left(\frac{\mu +\bar{B{V}_{{True},{SI} \% }}}{\mu }-1\right)-\left(\frac{\mu +\bar{B{V}_{{Model},{SI} \% }}}{\mu }-1\right)$$

Where:

μ is the population mean.
$\bar{B{V}_{{True},{SI} \% }}$ is the mean true breeding value of the top individuals selected based on their true BV at the given selection intensity.
$\bar{B{V}_{{Model},{SI} \% }}$ is the mean true breeding value of individuals ranked using BV estimated by the models described in Table 1 (e.g., Half-Sib, GBLUP, GDBLUP, or ssGDBLUP).

The genotypic gain gap, which measures the difference in genotypic gains (additive + dominance effects), was similarly calculated as:

$${Genotypic\; Gain\; Gap}=\left(\frac{\mu +\bar{G{V}_{{True},{SI} \% }}}{\mu }-1\right)-\left(\frac{\mu +\bar{G{V}_{{Model},{SI} \% }}}{\mu }-1\right)$$

Where:

$\bar{G{V}_{{True},{SI} \% }}$ is the mean true genotypic value of the top individuals selected based on their true GV.
$\bar{G{V}_{{Model},{SI} \% }}$ is the mean true genotypic value of individuals ranked using BV (for HS model) or GV (GBLUP, GDBLUP and ssGDBLUP models) estimated by the models in Table 1.

The models used to estimate BV and GV (Half-Sib, GBLUP, GDBLUP, and ssGDBLUP) are detailed in Table 1, which describes the kinship structures and deployment strategies for genetic and genotypic gains. Because gains are computed from the true BV or GV both for selection on the true values and for selection on the model predictions at each SI%, the comparison reflects only the differences caused by ranking inaccuracies of the models.
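The gap calculation can be sketched in Python as below. The same function serves for the genetic gap (true and predicted BV) and the genotypic gap (true and predicted GV). The function names, the rounding rule for the number of selected trees, and the four-tree example are assumptions made for illustration.

```python
def mean_of_selected(true_values, ranking_values, selection_intensity):
    """Mean TRUE value of the top fraction, ranked by `ranking_values`."""
    n_selected = max(1, round(len(true_values) * selection_intensity))
    order = sorted(range(len(true_values)),
                   key=lambda i: ranking_values[i], reverse=True)
    return sum(true_values[i] for i in order[:n_selected]) / n_selected


def gain_gap(true_values, predicted_values, mu, selection_intensity):
    """Relative gain under perfect ranking minus relative gain under model ranking."""
    best = mean_of_selected(true_values, true_values, selection_intensity)
    achieved = mean_of_selected(true_values, predicted_values, selection_intensity)
    gain_true = (mu + best) / mu - 1
    gain_model = (mu + achieved) / mu - 1
    return gain_true - gain_model


true_bv = [2.0, 1.0, 0.0, -1.0]
predicted_bv = [0.1, 1.5, 0.9, -0.3]   # the model misranks the best tree
gap = gain_gap(true_bv, predicted_bv, mu=10.0, selection_intensity=0.5)
```

Here perfect ranking selects trees with a mean true BV of 1.5 (a 15% gain), while the model ranking selects trees with a mean true BV of 0.5 (a 5% gain), so the gap is 0.10. The gap is zero whenever the model selects the same set of trees as the true values.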

Results

Genetic parameter estimates

In general, the pedigree-based models deviated more from the true values of narrow-sense heritability (${h}_{a}^{2}$) (Fig. 1). It is important to note that, because the additive variance is estimated as $4{\sigma }_{p}^{2}$, the Half-Sib (HS) and Family (FAM) models give equal estimates of ${h}_{a}^{2}$. The median bias (HS/FAM model) was low up to a selfing rate of 10% and increased with greater inbreeding in the population, exceeding 50% at a 95% selfing rate. Even where the median bias was low, the pedigree-based models had a wider distribution around the true ${h}_{a}^{2}$ than the marker-based models.

**Fig. 1: Distribution of narrow-sense heritability (${h}_{a}^{2}$) for simulated progeny test of open-pollinated *Eucalyptus pellita* populations across different selfing rates.**

The single-step model, which assumes that 50% of individuals are genotyped and accounts for the dominance effect (ssGDBLUP model), showed less bias than the pedigree-based model (Fig. 1). At lower selfing rates (up to 50%), the ssGDBLUP model was intermediate between the pedigree and marker-based models in estimating ${h}_{a}^{2}$; at higher selfing rates (greater than 50%), it was similar to the marker-based models, especially the model that does not consider the dominance effect (GBLUP model).

The marker-based model had a low standard deviation (2–5%) at all selfing rates and showed limited bias in estimates of the dominance effect coefficient (${h}_{d}^{2}$) at low (up to 10%) and high (95%) selfing rates (Fig. 2). However, the bias was high at intermediate selfing rates, peaking at a 50% selfing rate. Despite high variability across the simulations, the ssGDBLUP model showed less bias than the marker-based model.

**Fig. 2: Distribution of the dominance effect coefficient (${h}_{d}^{2}$) for a simulated progeny test of open-pollinated *Eucalyptus pellita* populations at different selfing rates.**

The marker-based models were more efficient than pedigree-based models in predicting the breeding values, showing higher accuracy regardless of selfing rate (Fig. 3). In the OP population, the pedigree-based model (Half-Sib) had stable accuracy across selfing rates. The accuracy of the ssGDBLUP model was intermediate between pedigree and marker-based models (Fig. 3). The accuracy of ssGDBLUP increased with the selfing rate.