
“Least-squares” Estimation of Distribution Mixtures

Abstract

THE statistical estimation of the unknown fractions Πi (i = 1, …, k) and, when unknown, further parameters θj (j = 1, …, m) in a mixed population which is denoted by its cumulative distribution (taken to be univariate)

    F(x) = Π1F1(x; θ1, …, θm) + … + ΠkFk(x; θ1, …, θm),   Π1 + … + Πk = 1,

is, in general, cumbersome. It therefore seems worth noting the convenience and comparative efficiency, at least in relation to the estimation of the Πi, of estimators of "least-squares" type, defined by the minimization of the integral

    ∫ {dFs(x) − dF(x)}²/dG(x),

where Fs(x) denotes the cumulative distribution of the empirical sample (from, for example, n independent observations), and G(x) is a suitable increasing function of x.

Taking for simplicity the case m = 0, the estimators of the Πi are automatically unbiased, with exact variances readily calculable and with asymptotic normality. For example, if k = 2, we have an estimate p1 for Π1 given by

    p1 = {∫H dFs − ∫H dF2}/∫H² dG,   where H = (dF1 − dF2)/dG,

with the expected value E(p1) = Π1, and variance

    σ²(p1) = {∫H² dF − (∫H dF)²}/{n(∫H² dG)²}.

A considerable choice of H is possible. Thus, considering for definiteness the density case, we define the unweighted least-squares estimator of Π1 by the choice dG = dx, H = f1 − f2. We should, however, expect weighted estimators to be better. In fact, because the variance of dFs is f dx/n, we should try to choose dG = f dx; the term ∫H dF in the expression for σ²(p1), arising from the covariance of dFs(x) and dFs(y), then vanishes (since ∫H dF = ∫(dF1 − dF2) = 0), and σ²(p1) becomes identical with the reciprocal of the information function I(Π1). Because f itself involves the unknown Π1, we may take dG = Π0 dF1 + (1 − Π0) dF2; this weighting will be most efficient when Π0 is close to Π1. Thus if we suspect Π1 to be near 1, ½ or 0, suitable choices for Π0 would be 1, ½ or 0, respectively.

An alternative to ½(dF1 + dF2) is the geometric mean of dF1 and dF2, and another is max(dF1, dF2). The geometric mean has some convenience over the arithmetic mean in theoretical investigations of efficiency (for example, for f1 and f2 normal), but the latter (or alternatively max(dF1, dF2)) has the advantage of approximating to the maximum likelihood estimator, for any value of Π1, in both the extreme cases of f1 and f2 well separated and of f1 → f2.

It has been shown by Hill¹ that the information function I(Π1) for Π1 may be written n[1 − S(Π1)]/(Π1Π2), where

    S(Π1) = ∫ (f1f2/f) dx,   f = Π1f1 + Π2f2.

If f1 and f2 differ by some parameter μ (for example, the mean), and Δμ = μ2 − μ1, then also

    I(Π1) → n(Δμ)²I(μ)   as f1 → f2,

where I(μ) is the information function for μ. Thus for the maximum likelihood estimator Π̂1 the variance, which is Π1Π2/n for f1 and f2 well separated, is 1/[n(Δμ)²I(μ)] as f1 → f2.
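
The following is a minimal numerical sketch, not part of the original letter, of how these estimators behave for a two-component normal mixture: it computes p1 under the unweighted choice dG = dx and the weighted choice dG = Π0 dF1 + (1 − Π0) dF2, and compares Monte Carlo variances with the benchmark 1/I(Π1). Python with numpy and scipy is assumed, and all names (sample_mixture, make_estimator, PI0, and so on) are illustrative rather than from the letter.

```python
# Illustrative sketch only: the estimator and weightings follow the letter,
# but every name here (sample_mixture, make_estimator, PI0, ...) is our own.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

rng = np.random.default_rng(0)

# Known component densities f1, f2: normals with means 0 and 2 (Δμ = 2).
f1 = norm(loc=0.0, scale=1.0).pdf
f2 = norm(loc=2.0, scale=1.0).pdf

PI1_TRUE = 0.3        # the unknown fraction Π1 to be estimated
N = 2000              # observations per sample
LO, HI = -20.0, 30.0  # integration range comfortably covering both components

def sample_mixture(n):
    """Draw n observations from f = Π1 f1 + (1 − Π1) f2."""
    first = rng.random(n) < PI1_TRUE
    return np.where(first, rng.normal(0.0, 1.0, n), rng.normal(2.0, 1.0, n))

def make_estimator(g):
    """Least-squares estimator for the weight density g = dG/dx.

    With H = (f1 − f2)/g, solving ∫H dFs = ∫H dF for p1 gives
        p1 = (mean of H(x_j) − ∫H f2 dx) / ∫H (f1 − f2) dx,
    which is unbiased because E dFs = Π1 dF1 + Π2 dF2.
    """
    H = lambda t: (f1(t) - f2(t)) / g(t)
    c2, _ = quad(lambda t: H(t) * f2(t), LO, HI)             # ∫H dF2
    den, _ = quad(lambda t: H(t) * (f1(t) - f2(t)), LO, HI)  # ∫H² dG
    return lambda x: (np.mean(H(x)) - c2) / den

# Unweighted: dG = dx, so H = f1 − f2.
unweighted = make_estimator(lambda t: np.ones_like(np.asarray(t, dtype=float)))
# Weighted: dG = Π0 dF1 + (1 − Π0) dF2 with a guessed Π0.
PI0 = 0.5
weighted = make_estimator(lambda t: PI0 * f1(t) + (1.0 - PI0) * f2(t))

reps = 500
p_u = [unweighted(sample_mixture(N)) for _ in range(reps)]
p_w = [weighted(sample_mixture(N)) for _ in range(reps)]

# Large-sample benchmark 1/I(Π1), with I(Π1) = n ∫ (f1 − f2)²/f dx.
f_mix = lambda t: PI1_TRUE * f1(t) + (1.0 - PI1_TRUE) * f2(t)
info, _ = quad(lambda t: (f1(t) - f2(t)) ** 2 / f_mix(t), LO, HI)
print(f"unweighted: mean {np.mean(p_u):.4f}, variance {np.var(p_u):.2e}")
print(f"weighted:   mean {np.mean(p_w):.4f}, variance {np.var(p_w):.2e}")
print(f"1/I(Π1) benchmark:              {1.0 / (N * info):.2e}")
```

With Π0 = ½ against a true Π1 of 0.3, the weighted variance should fall close to the 1/I(Π1) benchmark, while the unweighted estimator should remain unbiased but less efficient; moving Π0 towards Π1 should narrow the gap further, as the letter's remark on choosing Π0 suggests.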


References

  1. Hill, B. M., J. Amer. Stat. Assoc., 58, 918 (1963).



Cite this article

Bartlett, M. & Macdonald, P. "Least-squares" Estimation of Distribution Mixtures. Nature 217, 195–196 (1968). https://doi.org/10.1038/217195b0
