Abstract
Endosperm, which is derived from two polar nuclei fusing with one sperm, is a triploid tissue in cereals. Endosperm tissue determines the grain quality of cereals. Improving grain quality is one of the important breeding objectives in cereals. However, current statistical methods for mapping quantitative trait loci (QTL) under diploid genetic control have not been effective for dealing with endosperm traits because of the complexity of their triploid inheritance. In this paper, we derive for the first time the conditional probabilities of F3 endosperm QTL genotypes given different flanking marker genotypes in F2 plants. Using these probabilities, we develop a multiple linear regression method implemented via the iteratively reweighted least-squares (IRWLS) algorithm and a maximum likelihood (ML) method implemented via the expectation-maximization (EM) algorithm to map QTL underlying endosperm traits. For IRWLS mapping, we use the mean value of endosperm traits of F3 seeds as the dependent variable and the expectations of genotypic indicators for the additive and dominance effects of a putative QTL flanked by a pair of markers as independent variables. If an endosperm trait can be measured quantitatively using a single endosperm sample, the ML mapping method can additionally be used to separate the two dominance effects. Efficiency of the methods is verified through extensive Monte Carlo simulation studies. Simulation results show that the proposed methods provide accurate estimates of both the QTL effects and locations with very high statistical power. With these methods, we are now ready to map endosperm traits, as we can regular quantitative traits under diploid control.
Introduction
Improving grain quality is one of the important objectives in cereal breeding (Sadimantara et al, 1997; Mazur et al, 1999; Tan et al, 1999; Wang et al, 2001). Many grain quality traits, such as amylose content and gel consistency in rice, protein and amino-acid content in wheat, starch and gum content in barley, and sugar content in sweet corn, are endosperm traits. The phenotypes of most endosperm traits are actually distributed in a continuous fashion and their expression is modified by environmental variables. It is not efficient to study their genetic architectures based on a single gene model. Therefore, we need quantitative genetic models to describe the expression of most endosperm traits. Quantitative genetic models have been developed for endosperm traits (Gale, 1976; Mo, 1987; Bogyo et al, 1988; Foolad and Jones, 1992; Pooni et al, 1992; Zhu and Weir, 1994; Wu et al, 1998). However, they are not designed for QTL mapping using molecular markers.
Recent advances in molecular biology provide tools that can generate saturated molecular markers along the genome. These markers segregate and are inherited based on simple Mendelian laws. They can be used to infer the segregation of quantitative trait loci (QTL) located in the neighborhood of the markers, a technique called QTL mapping. Numerous statistical methods have been developed for mapping QTL using simple crosses derived from two inbred lines (Lander and Botstein, 1989; Haley and Knott, 1992; Martinez and Curnow, 1992; Jansen, 1993,1994; Zeng, 1994; Xu, 1998a,1998b; Kao and Zeng, 1997). However, these QTL mapping statistics are almost exclusively designed for traits under diploid control. They may not be appropriate for mapping QTL for triploid endosperm traits. Genetically, endosperm traits have several unique properties different from those of diploid traits. First, the endosperm is triploid and has a more complicated genetic constitution than the diploid plant. For a locus with two alleles, Q and q, four genotypes, QQQ, QQq, Qqq and qqq, are possible (Mo, 1987; Bogyo et al, 1988), whereas a diploid plant has only three possible genotypes. Second, the occurrence of the fertilized egg is the beginning of a new generation, so that the embryo and endosperm of a plant represent the next generation. Third, the endosperm genotype of a hybrid coming from one mating will differ from that of the reciprocal hybrid. Finally, each single endosperm has an independent genotype, which may be different from each other, and thus endosperm traits are separated based on seeds, whereas diploid traits in plants are separated based on plants. These unique differences associated with endosperm traits should be given sufficient consideration in mapping QTL.
In this study, we develop a statistical method for mapping QTL with attention particularly paid to these unique properties of endosperm traits. We assume that the genetic variance of an endosperm trait is controlled by the segregation of the triploid genome of the endosperm rather than the diploid genome of the maternal plant. The genotypes of the maternal plant, however, are used only for inferring the genotypes of the endosperm.
Theory and methods
Genetic models for endosperm traits
Based on the genetic characteristics of endosperm traits, various genetic models have been proposed to partition the genetic effects of endosperm traits (Gale, 1976; Mo, 1987; Bogyo et al, 1988; Foolad and Jones, 1992; Pooni et al, 1992; Zhu and Weir, 1994; Wu et al, 1998). In order for the paper to be self-contained, these models are summarized and described here. Consider the simplest case of one quantitative trait locus and two alternative alleles, Q and q, with increasing and decreasing effects, respectively. There are four possible genotypes and three genetic effects, a, d1 and d2, where a is the mean substitution effect of Q for q (called the additive effect), d1 is the interaction effect of QQ and q (called the first dominant effect) and d2 is the interaction effect of Q and qq (called the second dominant effect). The four possible endosperm genotypes and their genotypic values are defined as follows: GQQQ = μ + (3/2)a, GQQq = μ + (1/2)a + d1, GQqq = μ − (1/2)a + d2 and Gqqq = μ − (3/2)a, where μ is the mid-point or mean of the two homozygotes QQQ and qqq.
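As a quick numerical check of these definitions, the sketch below evaluates the four genotypic values. The function name is hypothetical, and the parameter values (μ = 20, a = 4, d1 = d2 = 2) are those used for the simulated QTL in design I.

```python
def genotypic_values(mu, a, d1, d2):
    """Genotypic value of each triploid endosperm genotype under the
    one-locus model with additive effect a and dominance effects d1, d2."""
    return {
        "QQQ": mu + 1.5 * a,
        "QQq": mu + 0.5 * a + d1,
        "Qqq": mu - 0.5 * a + d2,
        "qqq": mu - 1.5 * a,
    }


values = genotypic_values(mu=20.0, a=4.0, d1=2.0, d2=2.0)
for genotype, value in values.items():
    print(genotype, value)

# mu is the mid-point of the two homozygotes
midpoint = (values["QQQ"] + values["qqq"]) / 2.0
print("midpoint", midpoint)
```

With these values the two homozygotes differ by 3a = 12 units, and the mid-point recovers μ.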
Linear model
Let yij be the phenotypic value of the jth endosperm on the ith F2 plant, which can be described by the following linear model:

$$y_{ij} = x_{0ij}\mu + x_{1ij}a + x_{2ij}d_1 + x_{3ij}d_2 + \varepsilon_{ij} \qquad (1)$$
where ɛij is the residual error distributed as N(0, σɛ2). Note that if the trait is indeed controlled by a single QTL and the genotype of the QTL is observed for every individual, the residual error purely reflects the random environmental noise. If the trait is controlled by multiple QTL, the residual error will contain the effects of other QTL not included in the model in addition to the environmental error. The independent variables, x0ij, x1ij, x2ij and x3ij, are defined as follows. For any genotype, x0ij = 1. For genotype QQQ, x1ij = 3/2 and x2ij = x3ij = 0; for genotype QQq, x1ij = 1/2, x2ij = 1 and x3ij = 0; for genotype Qqq, x1ij = −1/2, x2ij = 0 and x3ij = 1; and for genotype qqq, x1ij = −3/2 and x2ij = x3ij = 0. The above model cannot be taken as a working model because (i) x1ij, x2ij and x3ij are missing because of the inability to observe the QTL genotype and (ii) most endosperm traits cannot be measured quantitatively using a single endosperm sample; rather, a mixed endosperm sample of many seeds is collected from a single plant and measured together. As a result, we need a working model to perform the estimation and test. The working model is

$$y_i = E(x_{0i}|I_M)\,\mu + E(x_{1i}|I_M)\,a + E(x_{2i}|I_M)\,d_1 + E(x_{3i}|I_M)\,d_2 + e_i \qquad (2)$$

where

$$y_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}, \qquad x_{mi} = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{mij} \quad (m = 0, 1, 2, 3)$$

are the mean values of yij, x0ij, x1ij, x2ij and x3ij, respectively, for the ni seeds sampled from the ith maternal plant (a single F2 individual), and IM stands for the F2 plant marker information. E(x0i|IM), E(x1i|IM), E(x2i|IM) and E(x3i|IM) are the expectations of x0i, x1i, x2i and x3i, respectively, conditional on marker information, and ei is the residual error, which is different from ɛij. Note that E(x0i|IM) = 1 for all i.
Let Xi = (x0i x1i x2i x3i) and b = (μ a d1 d2)T. The model can be expressed in matrix notation as

$$y_i = E(X_i|I_M)\,b + e_i \qquad (3)$$
The expectation and variance of model (3) are

where

To derive the conditional expectations and variances, we need the conditional probabilities of the four possible genotypes of an endosperm given marker information of the maternal plant, denoted by Pi = (pi(111) pi(110) pi(100) pi(000))T for the four endosperm genotypes in the order of QQQ, QQq, Qqq and qqq for plant i. Let M1M1QQM2M2 and m1m1qqm2m2 be the joint genotypes of two inbred lines, P1 and P2, for three loci (two flanking markers and one QTL). Let r be the recombination fraction between the two markers, and r1 and r2 be the recombination fractions of the QTL with the two markers. The F1 hybrid plant has a genotype of M1m1QqM2m2, which produces eight possible gametes if no interference is assumed. From these gametes, we can easily derive the QTL genotypes of the F2 plant and their frequencies conditional on flanking markers (Haley and Knott, 1992). Because we are mapping QTL underlying an endosperm trait rather than an ordinary diploid plant trait, we also need the corresponding conditional probabilities of the QTL genotypes of the F3 endosperms from an F2 maternal plant, conditional on the marker genotypes of the F2 plant. These conditional probabilities are given in Table 1. If the QTL genotype of the F2 plant is QQ, all the F3 endosperms are QQQ; if the F2 QTL genotype is qq, all the F3 endosperms are qqq; if the QTL genotype of the F2 plant is Qq, there are four possible endosperm QTL genotypes, QQQ, QQq, Qqq and qqq, each with probability 1/4. These probabilities are combined with the conditional probabilities of conventional diploid QTL mapping to form the corresponding conditional probabilities of F3 endosperm QTL genotypes (Table 1). From this table we can find Pi for individual i and are now ready to derive the variances and covariances of the x variables.
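The construction just described can be sketched in a few lines. This is an illustration under the stated no-interference assumption, not a reproduction of Table 1: the F2 QTL genotype probabilities are obtained by enumerating pairs of F1 gametes, and the endosperm probabilities follow from the 1, 1/4 and 0 rules above. All function names, the allele coding (1 for the P1 allele, 0 for the P2 allele) and the example recombination fractions are hypothetical.

```python
from itertools import product


def gamete_probs(r1, r2):
    """Probabilities of the eight F1 gametes (m1, q, m2), assuming no
    interference. r1, r2: recombination fractions marker1-QTL, QTL-marker2."""
    probs = {}
    for m1, q, m2 in product((0, 1), repeat=3):
        p = 0.5
        p *= r1 if m1 != q else 1.0 - r1
        p *= r2 if q != m2 else 1.0 - r2
        probs[(m1, q, m2)] = p
    return probs


def f2_qtl_probs(n_m1, n_m2, r1, r2):
    """Return P(QQ), P(Qq), P(qq) for an F2 plant carrying n_m1 copies of M1
    and n_m2 copies of M2 (each 0, 1 or 2) at the flanking markers."""
    gametes = gamete_probs(r1, r2)
    joint = [0.0, 0.0, 0.0]  # indexed by the number of Q alleles
    for (ga, pa), (gb, pb) in product(gametes.items(), repeat=2):
        if ga[0] + gb[0] == n_m1 and ga[2] + gb[2] == n_m2:
            joint[ga[1] + gb[1]] += pa * pb
    total = sum(joint)
    return joint[2] / total, joint[1] / total, joint[0] / total


def endosperm_probs(n_m1, n_m2, r1, r2):
    """Return (p(111), p(110), p(100), p(000)) for F3 endosperms of the plant,
    i.e. the probabilities of QQQ, QQq, Qqq and qqq."""
    p_hom_q_upper, p_het, p_hom_q_lower = f2_qtl_probs(n_m1, n_m2, r1, r2)
    quarter = p_het / 4.0  # a Qq plant yields each endosperm type with prob 1/4
    return [p_hom_q_upper + quarter, quarter, quarter, p_hom_q_lower + quarter]


# Example: plant homozygous M1M1 M2M2, QTL 0.1 recombination units from each marker
probs = endosperm_probs(2, 2, 0.1, 0.1)
print(probs)
```

Whatever the marker genotype, the two middle entries (QQq and Qqq) are equal in this construction, because both arise only from heterozygous F2 plants.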
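Define a 4 × 4 matrix H as

$$H = \begin{pmatrix} 1 & 3/2 & 0 & 0 \\ 1 & 1/2 & 1 & 0 \\ 1 & -1/2 & 0 & 1 \\ 1 & -3/2 & 0 & 0 \end{pmatrix}$$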
and let Hl be the lth row of matrix H. We can now connect Xi with H by Xi=Hl for l=1,…4, if an endosperm from plant i takes the lth ordered triploid genotype. With the above definitions, we have

where diag(Pi) denotes a diagonal matrix with the diagonal elements filled with vector Pi.
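For a single endosperm whose design row equals Hl with probability given by the lth element of Pi, the mean and covariance of a discrete random vector take the form PiTH and HTdiag(Pi)H − (PiTH)T(PiTH). The sketch below evaluates these two quantities in plain Python. It is an illustration only: the function names are hypothetical, and any additional scaling needed when the response is the mean of ni seeds is not shown.

```python
# Rows of H in the order QQQ, QQq, Qqq, qqq; columns are x0, x1, x2, x3.
H = [
    [1.0, 1.5, 0.0, 0.0],
    [1.0, 0.5, 1.0, 0.0],
    [1.0, -0.5, 0.0, 1.0],
    [1.0, -1.5, 0.0, 0.0],
]


def conditional_mean(P):
    """E(X | I_M) = P^T H for one endosperm with genotype probabilities P."""
    return [sum(P[l] * H[l][c] for l in range(4)) for c in range(4)]


def conditional_cov(P):
    """Var(X | I_M) = H^T diag(P) H - (P^T H)^T (P^T H) for one endosperm."""
    mean = conditional_mean(P)
    return [
        [
            sum(P[l] * H[l][r] * H[l][c] for l in range(4)) - mean[r] * mean[c]
            for c in range(4)
        ]
        for r in range(4)
    ]


# Example: the maternal plant is known to be Qq, so each genotype has prob 1/4
P = [0.25, 0.25, 0.25, 0.25]
mean = conditional_mean(P)
cov = conditional_cov(P)
print(mean)
print(cov[1][1], cov[2][2], cov[2][3])
```

The first row and column of the covariance matrix are zero because x0 is constant.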
Weighted least-squares estimation
Define X = (X1T, X2T, …, XkT)T as a k × 4 design matrix for all the k plants, U = E(X|IM), R = diag(R11, R22, …, Rkk) as a diagonal matrix and y = (y1, y2, …, yk)T. The parameters can be estimated using the iteratively reweighted least-squares (IRWLS) method (Xu, 1998a, 1998b). Given an initial guess of b and σɛ2, matrix R is treated as known. Conditional on R, the solutions for the parameters are

and

Since R depends on unknown parameters, it must be updated by the estimates of the parameters and the estimation is then repeated until a certain criterion of convergence has been reached.
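A minimal sketch of one conditional solve is given below, assuming the textbook weighted least-squares estimator with weights 1/Rii. The expression for Rii and the update of the residual variance are deliberately left to the caller, so this is one building block of an IRWLS loop rather than the full algorithm. The helper names and the toy data are hypothetical. Because E(x2i|IM) = E(x3i|IM) in this design, the example uses three columns (intercept, E(x1i|IM), E(x2i|IM)), so the third coefficient is the combined dominance effect d1 + d2.

```python
def solve(A, rhs):
    """Solve the linear system A z = rhs by Gaussian elimination with
    partial pivoting (A is a small square list-of-lists matrix)."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z


def wls_step(U, y, weights):
    """Return b minimising sum_i weights[i] * (y[i] - U[i] . b)^2,
    i.e. the solution of (U^T W U) b = U^T W y with W = diag(weights)."""
    p = len(U[0])
    A = [
        [sum(w * u[r] * u[c] for u, w in zip(U, weights)) for c in range(p)]
        for r in range(p)
    ]
    rhs = [sum(w * u[r] * yi for u, yi, w in zip(U, y, weights)) for r in range(p)]
    return solve(A, rhs)


# Toy, noise-free data generated with mu = 20, a = 4 and d1 + d2 = 4
U = [[1.0, 1.5, 0.0], [1.0, 0.0, 0.25], [1.0, -1.5, 0.0], [1.0, 0.75, 0.125]]
y = [26.0, 21.0, 14.0, 23.5]
weights = [1.0, 2.0, 1.0, 0.5]  # stand-ins for 1 / R_ii
b_hat = wls_step(U, y, weights)
print(b_hat)
```

In an iterative scheme, the weights would be recomputed from the current estimates after each solve and the step repeated until the estimates stop changing.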
The variance–covariance matrix of the estimate b is

The variance–covariance matrix of b̂ is used to construct the test statistic for QTL detection. For example, to test the hypothesis H0: a = d1 = d2 = 0, we define

$$L = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

so that the null hypothesis can be redefined as H0: LTb = 0. The test statistic for this hypothesis is F = (LTb̂)T[LTV(b̂)L]−1(LTb̂). Under the null hypothesis, this test statistic approximately follows a χ2 distribution with three degrees of freedom. Other hypotheses can be tested by redefining matrix L. For example, to test H0: a = 0, one simply defines a new L matrix as L = (0 1 0 0)T. Xu (1998a, 1998b) showed that the above test statistic is very close to a likelihood ratio test statistic and thus can be converted into an LOD score, as is usually done in QTL mapping. The relation between the F-like statistic and the LOD score is LOD = F/4.61 (4.61 ≈ 2 ln 10), which is used in the following simulation studies.
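The conversion to the LOD scale is a one-line computation. The sketch below uses the exact constant 2 ln 10, which the text rounds to 4.61; the function name is hypothetical.

```python
import math


def lod_from_chi2(stat):
    """Convert a chi-square-scaled test statistic (such as a likelihood
    ratio statistic) to a LOD score by dividing by 2 ln 10."""
    return stat / (2.0 * math.log(10.0))


print(lod_from_chi2(4.61))
print(lod_from_chi2(13.8))
```

A statistic of 4.61 therefore corresponds to a LOD score of about 1, and 13.8 to a LOD score of about 3.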
Note that because E(x2i|IM)=E(x3i|IM) in the F3 endosperm generation, the two dominance effects, d1 and d2, cannot be estimated separately by IRWLS method; rather, they are combined as a single dominance effect. If we want to estimate these two dominance effects separately, we must measure the endosperm trait for each seed and then use a maximum likelihood (ML) method.
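The equality follows directly from the endosperm probabilities described earlier. In the short derivation below, π_i = Pr(Qq | I_M) for the maternal plant is a symbol introduced here only for illustration.

```latex
\begin{aligned}
p_i(110) &= p_i(100) = \tfrac{1}{4}\pi_i, \\
E(x_{1i}\,|\,I_M) &= \tfrac{3}{2}p_i(111) + \tfrac{1}{2}p_i(110) - \tfrac{1}{2}p_i(100) - \tfrac{3}{2}p_i(000)
  = \tfrac{3}{2}\left[\Pr(QQ\,|\,I_M) - \Pr(qq\,|\,I_M)\right], \\
E(x_{2i}\,|\,I_M) &= E(x_{3i}\,|\,I_M) = \tfrac{1}{4}\pi_i, \\
E(y_i\,|\,I_M) &= \mu + a\,E(x_{1i}\,|\,I_M) + (d_1 + d_2)\,\tfrac{1}{4}\pi_i .
\end{aligned}
```

Only the sum d1 + d2 enters the expected plant mean, so the two regressors are perfectly collinear and the mean-based analysis identifies the combined dominance effect alone.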
ML estimation
The ML method can be implemented via the EM algorithm. If Xj were observed for every endosperm individual, the MLE of the parameters could be found explicitly in a single step using the following equations:

where N is the number of endosperms measured for the entire mapping population. For example, if there are k plants each with n seeds, then N=kn.
In the case where Xj is missing but the distribution of Xj is given, the EM algorithm can be adopted to take advantage of the above equations. The EM equations simply replace all the terms related to Xj by their expectations, that is,

The expectations are obtained conditional on both marker information and the phenotypic value yj. The connection between the phenotype and the QTL genotype is through the three genetic parameters, but the parameters are what we are trying to find. Therefore, we need iterations on equation (5) by providing some initial values of the parameters to start the iteration. This is the EM algorithm. The E-step is to find the expectations and the M-step is to invoke equation (5) for iterations.
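The iteration can be sketched as follows. This is a minimal illustration, assuming the usual normal-mixture form of the algorithm: the E-step weights each genotype's marker-based probability by the normal density of the phenotype and renormalizes, and the M-step solves the weighted normal equations. Function names, starting values, the random seed and the toy data (three hypothetical classes of marker-based probabilities) are all assumptions made for illustration.

```python
import math
import random

H = [[1.0, 1.5, 0.0, 0.0], [1.0, 0.5, 1.0, 0.0],
     [1.0, -0.5, 0.0, 1.0], [1.0, -1.5, 0.0, 0.0]]  # QQQ, QQq, Qqq, qqq


def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small system."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def genotype_means(b):
    """Expected phenotype H_l b of each of the four endosperm genotypes."""
    return [sum(h * bi for h, bi in zip(row, b)) for row in H]


def e_step(y, priors, b, sigma2):
    """Posterior genotype probabilities and observed-data log-likelihood."""
    means = genotype_means(b)
    post, loglik = [], 0.0
    for yj, pj in zip(y, priors):
        dens = [p * math.exp(-(yj - m) ** 2 / (2.0 * sigma2))
                / math.sqrt(2.0 * math.pi * sigma2) for p, m in zip(pj, means)]
        total = sum(dens)
        post.append([d / total for d in dens])
        loglik += math.log(total)
    return post, loglik


def m_step(y, post):
    """Update b and sigma2 from posterior-weighted normal equations."""
    A = [[0.0] * 4 for _ in range(4)]
    rhs = [0.0] * 4
    for yj, wj in zip(y, post):
        for w, row in zip(wj, H):
            for r in range(4):
                rhs[r] += w * row[r] * yj
                for c in range(4):
                    A[r][c] += w * row[r] * row[c]
    b = solve(A, rhs)
    means = genotype_means(b)
    sigma2 = sum(w * (yj - m) ** 2 for yj, wj in zip(y, post)
                 for w, m in zip(wj, means)) / len(y)
    return b, sigma2


# Toy data: 300 endosperms, true b = (20, 4, 2, 2), residual SD 1
rng = random.Random(1)
classes = [[1.0, 0.0, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25], [0.0, 0.0, 0.0, 1.0]]
priors = [classes[j % 3] for j in range(300)]
truth = genotype_means([20.0, 4.0, 2.0, 2.0])
y = [truth[rng.choices(range(4), weights=p)[0]] + rng.gauss(0.0, 1.0) for p in priors]

b, sigma2 = [sum(y) / len(y), 1.0, 0.0, 0.0], 4.0  # arbitrary starting values
logliks = []
for _ in range(50):
    post, loglik = e_step(y, priors, b, sigma2)
    logliks.append(loglik)
    b, sigma2 = m_step(y, post)
print(b, sigma2)
```

The log-likelihood returned by the E-step never decreases from one iteration to the next, which gives a convenient convergence check.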
Denote the probability of Xj conditional on marker information by Pr(Xj = Hl|IM). This probability is simply Pj=(pj(111) pj(110) pj(100) pj(000))T and may be called the prior probability. After incorporating the phenotypic value, we obtain the posterior probability, denoted by

where

The expectations are actually obtained using the posterior probabilities rather than the prior probabilities. Therefore,

and

Likelihood ratio test
Define the log-likelihood value evaluated at the MLE of parameters as

This is also called the likelihood value under the full model. We need the likelihood value under H0 :LTb=0 (the restricted model) to test null hypothesis of no genetic effects. Let the likelihood value under the restricted model be

The likelihood ratio test statistic is

Simulation studies
Designs of simulation experiments
We designed two simulation experiments, one for a single chromosome (design I) and the other for an entire genome with 12 chromosomes (design II). In design I, the single chromosome is covered by a given number of evenly spaced codominant markers covering 150 cM. A single QTL is located at position 75 cM on the chromosome with the following effects: a=4, d1=d2=2 and μ = 20. We used various sizes of residual variance to control the desired levels of the heritabilities. Factors considered include (1) marker density, (2) QTL heritability, (3) population size and (4) number of endosperms per plant. Marker density was simulated at two levels: 16 and six markers, which correspond to distances of 10 and 30 cM, respectively, between consecutive markers. QTL heritability was simulated at two levels: 10 and 30%. Sample size of the F2 population was simulated at three levels: 50, 100 and 200. Number of endosperms collected per plant was simulated at three levels: 5, 10 and 20. The total number of treatment combinations is 2 × 2 × 3 × 3=36.
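The factorial layout of design I can be enumerated directly. The sketch below lists the treatment combinations and the marker positions, assuming the first and last markers sit at the two ends of the 150 cM chromosome. Variable and function names are hypothetical.

```python
from itertools import product

CHROMOSOME_CM = 150.0
marker_counts = (16, 6)
heritabilities = (0.10, 0.30)
population_sizes = (50, 100, 200)
seeds_per_plant = (5, 10, 20)


def marker_positions(n_markers, length_cm=CHROMOSOME_CM):
    """Positions (cM) of evenly spaced markers, assuming one marker at each
    end of the chromosome."""
    step = length_cm / (n_markers - 1)
    return [i * step for i in range(n_markers)]


treatments = list(
    product(marker_counts, heritabilities, population_sizes, seeds_per_plant)
)
print(len(treatments))
print(marker_positions(6))
print(marker_positions(16)[:3])
```

Under this placement, the QTL at 75 cM lies in the middle of a marker interval at both densities (70 to 80 cM and 60 to 90 cM).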
In design II, a genome consisting of 12 chromosomes was simulated. The number of markers per chromosome and the marker locations were generated randomly. The simulated linkage map information is listed in Table 2. This setup actually mimics the linkage map of the rice genome. In this particular experiment, we simulated six QTL distributed along five of the 12 chromosomes. Their sizes and locations are given in Table 3. The total phenotypic variance of the endosperm trait explained by the six QTL is 50%. The population size is 200 and the number of seeds per plant is 20.
Each treatment combination of the simulation experiments was repeated 100 times. The standard deviation of an estimated parameter among the 100 replicates provides a measure of the standard error of parameter estimation. The statistical power is determined by counting the number of replicates that have a test statistic (LOD score) greater than the empirical critical values obtained from analyses of 1000 additional samples simulated under the null model (zero heritability).
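The power calculation described above can be sketched as follows. The significance level is not stated in the text, so the default of 0.05 is an assumption, as are the function names and the percentile convention.

```python
def empirical_threshold(null_lods, alpha=0.05):
    """Empirical critical value: the (1 - alpha) quantile of the maximum LOD
    scores obtained from data sets simulated under the null model."""
    ordered = sorted(null_lods)
    index = min(len(ordered) - 1, int((1.0 - alpha) * len(ordered)))
    return ordered[index]


def empirical_power(lods, threshold):
    """Proportion of replicates whose LOD score exceeds the threshold."""
    return sum(1 for lod in lods if lod > threshold) / len(lods)


# Toy illustration with made-up LOD scores (not simulation results)
null_lods = [i / 100.0 for i in range(1000)]
threshold = empirical_threshold(null_lods)
power = empirical_power([1.0, 2.0, 12.0, 15.0], threshold)
print(threshold, power)
```

In the setting described above, `null_lods` would hold the 1000 values obtained under zero heritability and `lods` the 100 values from one treatment combination.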
Results
Table 4 shows the means and standard deviations of the estimated QTL effects and locations as well as the empirical powers calculated from 100 repeated simulations under different marker densities, heritabilities and sampling strategies in design I. The results follow the expected general trends: denser marker maps, higher QTL heritabilities and larger sample sizes tend to produce more accurate and precise estimates, whereas lower heritabilities, especially with smaller sample sizes, produce less accurate estimates with large estimation errors. In addition, the estimated positions of QTL tend to be biased toward the middle of the chromosome. Estimates of the additive and dominance effects are reasonably close to the true values. It is surprising to see how powerful the methods are in detecting the QTL. Even when the QTL explains only 10% of the trait variation and only 50 F2 plants are used, the power is almost 100% when the number of endosperms per plant is 20. When the sample size of the F2 population is 100 or more, the powers are all close to 100% across treatments. Only two out of the 36 treatment combinations show a statistical power less than 80%.
The mean estimates and standard deviations of locations and effects of the six QTL and the corresponding statistical powers in design II are listed in Table 5. From this table, we see that only two out of the six QTL, qtl1 and qtl3, have powers less than 100% (72 and 71%, respectively) and all other QTL have 100% power. This is not surprising because the heritabilities of these two QTL are less than 5%. The effect and position estimates of the four larger QTL are fairly accurate and precise. The estimated location of qtl4 shows some deviation from the true value. This may be explained by the fact that there are two QTL on chromosome 10 and qtl4 is smaller than qtl5 on the same chromosome. The LOD score profiles of one sample drawn at random from the 100 replicates are shown in Figure 1. This figure illustrates the kind of output expected in real data analysis for endosperm mapping.
For the data simulated from design I, we also performed the ML analysis for some treatments, hoping to separate the two dominance effects. The main results are in accordance with those reported above for the IRWLS method. In addition, the ML method was indeed able to separate the two dominance effects. For example, when the marker density is 10 cM, QTL heritability is 10%, population size is 100 and the number of F3 endosperms per plant is 10, the empirical power is 100% for the ML method. The MLEs of the QTL position, additive effect, and first and second dominance effects are 74.37±3.61, 3.99±0.33, 2.26±2.12 and 1.75±2.31, respectively.
Discussion
Endosperm traits belong to a group of characters that determine the grain quality in cereals and they are tremendously important to human nutrition. Genetic improvement of such endosperm traits has received considerable attention in plant breeding (Benner et al, 1989; Sadimantara et al, 1997; Mazur et al, 1999). Quantitative genetics models for analyzing the triploid inheritance of endosperm traits have been developed and applied to practical breeding populations in cereals (Gale, 1976; Mo, 1987; Bogyo et al, 1988; Foolad and Jones, 1992; Pooni et al, 1992; Zhu and Weir, 1994; Wu et al, 1998). However, these traditional methods were not designed for QTL mapping; rather, they were developed for analyzing the overall contribution of the genetic variance to the phenotypic variance. Traditional QTL mapping methods for diploid traits have been applied to mapping endosperm traits (Tan et al, 1999; Wang and Larkins, 2001; Wang et al, 2001). The assumption was that the genetic variance of an endosperm trait is controlled by the segregation of QTL in the diploid maternal plants. Two caveats should be noted when a diploid mapping procedure is used for mapping a triploid trait. Firstly, the DNA markers detected (eg, using F2 or BC plants) and the traits measured (using F3 or BC selfing seeds on F2 or BC plants) do not belong to the same generation. Therefore, the application of a diploid mapping model to endosperm traits is equivalent to mapping the maternal effects of the trait in question. Secondly, there is no reason to believe that an endosperm trait is only controlled by the genotype of the maternal plant and that there is no contribution from the genotype of the endosperm tissue itself. For the first time, we here consider the triploid control mechanisms of the endosperm traits for QTL mapping and develop the appropriate probability model to infer the triploid genotype from the diploid marker genotypes of the maternal plant.
The proposed IRWLS mapping method, a second-order approximation to ML, was first proposed by Xu (1998a, 1998b), who compared it via Monte Carlo simulation with the simple linear regression (REG) method of Haley and Knott (1992) and the ML method of Lander and Botstein (1989). We chose the IRWLS for two reasons: (1) it is faster than ML and better than the ordinary least-squares method, as shown by Xu (1998a, 1998b); (2) the endosperm trait is measured as the average of several seeds and it is hard to model the average value using the mixture distribution model. However, if an endosperm trait is measured using a single endosperm sample, the ML mapping method can be used to estimate all genetic effects of endosperm QTL, including the two different dominance effects. Since the purpose of this study is not to compare the efficiencies of different statistical methods, but to apply existing methods to map QTL for endosperm traits, we paid more attention to the genetic model and the implementation of the IRWLS, considering that most endosperm traits cannot be measured using a single endosperm sample.
The next step of endosperm mapping is to consider both the maternal diploid genotype and the triploid endosperm genotypes jointly. The model will become more complicated because there will be five genetic effects involved, which are maternal additive effect, maternal dominant effect, endosperm additive, endosperm first dominance and endosperm second dominance effects. This project is currently under investigation and will be reported in a separate paper.
References
Benner MS, Phillips RL, Kirihara JA, Messing JW (1989). Genetic analysis of methionine-rich storage protein accumulation in maize. Theor Appl Genet 78: 761–767.
Bogyo TP, Lance RCM, Chevalier P, Nilan RA (1988). Genetic models for quantitatively inherited endosperm characters. Heredity 60: 61–67.
Foolad MR, Jones RA (1992). Models to estimate maternally controlled genetic variation in quantitative seed characters. Theor Appl Genet 83: 360–366.
Gale MD (1976). High α-amylase breeding and genetical aspects of the problem. Cereal Res Commun 4: 231–243.
Haley CS, Knott SA (1992). A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69: 315–324.
Jansen RC (1993). Interval mapping of multiple quantitative trait loci. Genetics 135: 205–211.
Kao CH, Zeng ZB (1997). General formulas for obtaining the MLEs and the asymptotic variance–covariance matrix in mapping quantitative trait loci when using the EM algorithm. Biometrics 53: 653–665.
Lander ES, Botstein D (1989). Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199.
Martinez O, Curnow RN (1992). Estimating the locations and the sizes of the effects of quantitative trait loci using flanking markers. Theor Appl Genet 85: 480–488.
Mazur B, Krebbers E, Tingey S (1999). Gene discovery and product development for grain quality traits. Science 285: 372–375.
Mo HD (1987). Genetic expression for endosperm traits. Proceedings of the Second International Conference on Quantitative Genetics. Sinauer Associates, MA, pp 478–487.
Pooni HS, Kumar I, Khush GS (1992). A comprehensive model for disomically inherited metrical traits expressed in triploid tissues. Heredity 69: 166–174.
Sadimantara GR, Abe T, Sasahara T (1997). Genetic analysis of high molecular weight proteins in rice (Oryza sativa L.) endosperm. Crop Sci 37: 1177–1180.
Tan YF, Li JX, Yu SB, Xing YZ, Xu CG, Zhang Q (1999). The three important traits for cooking and eating quality of rice grains are controlled by a single locus in an elite rice hybrid, Shanyou 63. Theor Appl Genet 99: 642–648.
Wang XL, Larkins BA (2001). Genetic analysis of amino acid accumulation in opaque-2 maize endosperm. Plant Physiol 125: 1766–1777.
Wang XL, Woo YM, Kim CS, Larkins BA (2001). Quantitative trait locus mapping of loci influencing elongation factor 1 alpha content in maize endosperm. Plant Physiol 125: 1271–1282.
Wu HP, Chen YS, Chao YT (1998). Studies on the genetic model of cytoplasmic and endospermic effect on quantitative characters of plant. Chin Agron J 8: 7–16.
Xu S (1998a). Iteratively reweighted least squares mapping of quantitative trait loci. Behav Genet 28: 341–355.
Xu S (1998b). Further investigation on regression method of mapping quantitative trait loci. Heredity 80: 364–373.
Zeng ZB (1994). Precision mapping of quantitative trait loci. Genetics 136: 1457–1468.
Zhu J, Weir BS (1994). Analysis of cytoplasmic and maternal effects. 2. Genetic models for triploid endosperms. Theor Appl Genet 89: 160–166.
Acknowledgements
This research was supported by the National Natural Science Foundation of China (Grant 39900080) to CX and the National Institutes of Health Grant GM55321, and the USDA National Research Initiative Competitive Grants Program 00-35300-9245 to SX.
Xu, C., He, X. & Xu, S. Mapping quantitative trait loci underlying triploid endosperm traits. Heredity 90, 228–235 (2003). https://doi.org/10.1038/sj.hdy.6800217
DOI: https://doi.org/10.1038/sj.hdy.6800217