Abstract
We propose two novel logarithmic ratio–type estimators for the finite-population mean under simple random sampling without replacement (SRSWOR). The estimators integrate a logarithmic transformation of the auxiliary variable to stabilize variance, reduce the influence of outliers, and better capture nonlinear relationships between study and auxiliary variables. We derive closed-form expressions for first-order bias and mean squared error (MSE) and obtain analytic expressions for the optimal tuning constants by direct minimization of the approximate MSE. A comprehensive numerical study, comprising five real engineering datasets and extensive Monte-Carlo simulations from multivariate normal, log-normal and gamma populations, evaluates finite-sample behavior across a range of sample sizes and correlation structures. The proposed estimators consistently reduce MSE and deliver large percent-relative-efficiency (PRE) gains relative to the classical sample mean and common competitors (empirical PREs ≈ 283; simulation PREs up to ≈ 670), with especially large and stable improvements under skewed or heavy-tailed populations. Theoretical formulas and simulation evidence align closely, showing robustness to nonlinearity and skewness while retaining simple implementation for practitioners. Results are derived under SRSWOR using first-order approximations; extensions to higher-order corrections, stratified and two-phase designs, and uncertainty in auxiliary means are recommended for future work.
Similar content being viewed by others
Introduction
Accurate estimation of population means from sampled data lies at the heart of survey statistics and many applied fields, from official statistics and environmental monitoring to engineering quality control and experimental sciences1. When using simple random sampling (SRS) to collect data on system performance or material properties, utilizing supplementary information, such as known historical measurements or operating conditions, can enhance estimator efficacy. Estimation methods such as the ratio estimator and logarithmic-ratio type estimator utilize this auxiliary data to adjust the primary estimates, reducing the mean squared error (MSE) compared to baseline classical sample means2. For instance, in structural health monitoring, using temperature as an auxiliary variable can improve stress estimation in materials under load3. For a variety of reasons, statisticians urge the incorporation of supplemental information during the estimation step. For instance, the pioneer work in this regard is due to Cochran4. Since auxiliary information are available to the researcher, they can be used effectively at either, the design stage or estimation stage5,6. Since the supplementary information explain variation in the main study variable due to their correlation, it utilizes extra information and enhances the efficiency of estimates7. The method of estimation of parameter rely on the nature of relation between the survey and supplementary variable. When the auxiliary variable has a positive correlation with the primary study variable the ratio-method of estimate performs efficiently8. Product form estimators, on the other hand, typically perform better when the regression streak crosses the origin and the connection between the study and auxiliary variables is linear and strongly negative. These considerations underscore the importance of selecting an appropriate estimator based on the nature of the statistical association, including both the direction and the structure of relationship9,10,11,12,13,14.
The concept of using auxiliary information to enhance estimation accuracy was first introduced by Cochran4, who showed that incorporating related information can substantially improve the precision of survey estimates.. Later, Bahl and Tuteja15, expanded on this concept by proposing exponential ratio and product-type estimators, marking a major step forward in the effective use of auxiliary variables. These estimators exploit the functional relationship between the study and auxiliary variables to obtain more efficient and accurate estimates of population parameters, especially when the variables are highly correlated. Since then, numerous researchers have expanded and refined these concepts in the field of survey sampling. Notable contributions include those by Izunobi and Onyeka16, Kadilar and Cingi14, Singh et al.17, Khoshnevisan et al.18, Onyeka et al.19, Singh et al.20, Bhushan et al., Gupta and Shabbir21, Azeem et al.22, Sher et al.23, Ahmad et al.20, and Subramani25 long et al.15. Building on this extensive body of work, the present study extends the use of auxiliary information by developing logarithmic ratio and product-form estimators, aimed at further improving the estimation of population means.
Although several ratio and product-type estimators have been developed to improve the efficiency of population mean estimation using auxiliary information, there is still a gap in understanding how to effectively combine auxiliary variables with the study variable to achieve the greatest efficiency gains. For example, most of these estimators rely on linear or conventional transformations that may not fully exploit the structure of non-linear between study and auxiliary variables. In particular, limited attention has been given to the use of logarithmic transformations within the ratio estimation framework under simple random sampling. Furthermore, existing logarithmic estimators often lack general applicability and are rarely validated on real-world engineering datasets, where non-linearity and high correlations are common. Thus, there is a need to develop a more flexible and efficient estimator that integrates logarithmic transformation with ratio-type estimation and to evaluate its performance through theoretical comparison and empirical validation16.
Novelty and significance
This study proposes a logarithmic ratio-type estimator for the population mean under simple random sampling without replacement (SRSWOR), specifically designed for engineering applications where relationships between study and auxiliary variables are complex and intricate. By applying a logarithmic transformation to the auxiliary variable, the estimator linearizes variation e.g. \(Y\approx \alpha {X}^{\beta }\), stabilises variance, reduces the influence of outliers which causes nonlinearity. We derive closed-form expressions for the bias and mean square error (MSE) up to the first-order approximation under SRSWOR, and conduct a numerical assessment, based on the correlation, coefficients of variation, and skewness of the auxiliary variable, to evaluate the efficiency gains of the proposed estimator obtained through the logarithmic transformation of the auxiliary variable. Finite-sample performance is assessed through Monte Carlo experiments that emulate engineering sampling conditions (varying sample size and correlation levels) and also validated empirically on real engineering datasets. The estimator’s simplicity, theoretical grounding, and robustness make it a readily applicable improvement for mean estimation in engineering quality control, monitoring, and experimental studies, with straightforward extensions to stratified and two-phase sampling designs.
The structure of the paper is as follows:
The remainder of the paper proceeds as follows. Section 2 formalizes notation, recalls relevant classical estimators, and sets up the linearization framework under SRSWOR. Section 3 introduces the two proposed logarithmic ratio–type estimators and develops their first-order bias and MSE; closed-form optimal constants are derived there. Section 4 discusses analytic efficiency conditions and compares the new forms with established estimators. Section 5 presents empirical evaluations on five engineering datasets and a comprehensive Monte Carlo study across Normal, Lognormal, and Gamma populations. Finally, Sect. 6 summarizes the implications, limitations, and practical recommendations for applying the proposed estimators in engineering and survey practice.
Methodology
Examine a random sample of n units selected from a population of \(\left(N={N}_{1}, {N}_{2},\dots {N}_{i}\right)\) units using the simple random sampling without replacement (SRSWOR) framework. Assuming that y is the research variable and x is the auxiliary variable, a sample of size n is chosen from a population of size N using simple random sampling without replacement (SRSWOR). Assuming that the population mean μₓ of the auxiliary variable is known, denote the study variable by y and the auxiliary variable by x. To enhance the estimation of the population, mean of the research variable y, the estimators consider data from the auxiliary variable. It aims to decrease sampling error and increase precision by fusing the sample statistics with the known population mean of the auxiliary variable x assuming that the auxiliary variable’s population mean, \({\mu }_{x}\), is known.
Let us define the sampling error as follows:
\({\mu }_{x}=\frac{1}{N}\sum_{i=1}^{N}\left({X}_{i}-\mu \right)\) and \({\mu }_{y}=\frac{1}{N}\sum_{i=1}^{N}\left({Y}_{i}-\mu \right),\)
\({e}_{0}=\frac{\overline{y }-{\mu }_{y}}{{\mu }_{y}}\), and \({e}_{1}=\frac{\overline{x }-{\mu }_{x}}{{\mu }_{x}}\), such that \(E({e}_{0})\) = \(E\left({e}_{1}\right)=0\), \(E\left({e}_{0}^{2}\right)=\frac{V\left(\overline{y }\right)}{{\mu }_{y}}\) = \(\left( {\frac{{1 - n}}{n}} \right)\frac{{S_{y}^{2} }}{{\mu _{y}^{2} }}\) , and \(E\left({e}_{1}^{2}\right)=\frac{V\left(\overline{x }\right)}{{\mu }_{x}}\) = \(\left( {\frac{{1 - n}}{n}} \right)\frac{{S_{x}^{2} }}{{\mu _{x}^{2} }}\), and \(E\left( {e_{0} e_{1} } \right) = \left( {\frac{{1 - n}}{n}} \right)\frac{{S_{{yx}} }}{{\mu _{y} \mu _{x} }}.\)
Existing estimators
The sample statistics and the known population mean of the auxiliary variable are combined to create the classical estimator. thereby aiming to reduce the sampling error and enhance estimation precision, the classical is given in the following Eq. (1) as:
where \(\lambda\) is the \(fpc\) (finite population correction factor) given by:
The classic ratio-type estimator was first presented by Cochran (Cochran, 1940) and is represented as follows in Eq. (2):
The estimator t is biased for the population mean μx, and its bias and MSE are determined up to the first-order approximation may be written as follows, where y and x stand for the sample means of the study and auxiliary variables, respectively:
where \({S}_{y}^{2}= \frac{1}{N-1}\sum_{i=1}^{N}{\left({Y}_{i}-{\mu }_{y}\right)}^{2}\) is population variance of y, \({S}_{x}^{2}= \frac{1}{N-1}\sum_{i=1}^{N}{\left({X}_{i}-{\mu }_{x}\right)}^{2}\) is the population variance of x, \({S}_{yx}= \frac{1}{N-1}\sum_{i}^{N}\left({X}_{i}-{\mu }_{x}\right)\left({Y}_{i}-{\mu }_{y}\right)\) is the population covariances of x and y, and \({\text{n}} = \frac{n}{N}\) and \(\text{D}= \frac{{\mu }_{x}}{{\mu }_{y}}\) , represent the sampling fraction and population ratios, respectively.
The product type estimator in the context of utilizing single auxiliary variables, suggested by Subramani and Kumarapandiyan14, and is given as follows:
With MSE given by:
Likewise, a single exponential ratio-type estimator for the population mean \({\mu }_{y}\) was proposed by Bahl and Tuteja17 and is defined as follows:
The bias and mean square error of the exponential ratio-type estimator \({\overline{\text{y}} }_{\text{p}}\), which is biased for \({\mu }_{y}\), with MSE given by:
Similarly, Gupta and Shabbir18 developed the difference type regression estimator given by:
where \({\varphi }_{1= }\frac{{s}_{yx}}{{s}_{x}^{2}}\) is the coefficient of simple regression linked to \({\beta }_{1= }\frac{{S}_{yx}}{{S}_{x}^{2}}\) and the population mean. The MSE of regression estimator are presented in Eq. (5).
The pioneer work on the use of log-auxiliary variable in mean estimation was due to Izunobi and Onyeka19, which is given by:
\({\text{T}}_{5}=\frac{{\mu }_{y}Ln(\overline{y })}{Ln\left(\overline{x }\right)}\), \(Ln\left({\mu }_{x}\right)\ne 0\) and \(Ln\left(\overline{x }\right)\ne 0\)
With MSE given by:
where \(k=\frac{1}{Ln\left({\mu }_{x}\right)}\)
The innovative use of log-transformed auxiliary variable was found in work by Mishra et al20. by developing the following mean estimators:
With MSEs respectively given by:
where \(\alpha =-\left(\frac{\overline{Y}\rho {C }_{y}}{{C}_{x}}\right)\), \(A={\overline{Y} }^{2}\theta {C}_{y}^{2},\) \(B=\theta {C}_{x}^{2}\), \(C={\overline{Y} }^{2}\theta {C}_{y}^{2}\), \(D=\overline{Y}\theta \rho {C }_{y}{C}_{x}\), \(E=\overline{Y }\theta \left(\rho {C}_{y}{C}_{x}-\frac{{C}_{x}^{2}}{2}\right).\)
On the same line most recently Singh and Tiwari21, introduce the following adapted mean estimator in SRSWOR scheme:
With MSE given by:
where
Although the effectiveness of parameter estimates is much increased by the logarithmic-ratio type estimators, log-ratio estimates still require improvement, which is the main goal of our work. The development of enhanced log-ratio type estimators of finite population mean under the SRSWOR scheme is the focus of the following section.
Proposed class of estimators
Taking motivation from Izunobi and Onyeka19, Sher et al.22, and Eisa et al.1, Under SRSWOR, two estimators of the logarithmic ratio type are created for the finite population mean.
The estimator increases the estimate process’s efficiency by employing the auxiliary variable’s logarithmic transformation, particularly in cases when there is a nonlinear relationship between the study and auxiliary variables.
Asymptotic properties of the proposed estimators
Rewriting Eq. (10) and (11) in term of error due to sampling, given in "Methodology", as following:
Expanding the series and after simplification of Eqs. (12) and (13), we obtain the following:
we subtract the population mean from each estimator, express the resulting deviations in terms of sampling errors, perform a Taylor expansion up to first order, and then take expectations to obtain the bias.
Taking expectation of (14) and (15), to obtain Biases of estimator TP1 and TP2 respectively, as given by:
Squaring both side of Eqs. (14) and (15), and after simplification, we get:
Which, after simplifications, we obtain:
where,
\({A}_{1}=1+\lambda {C}_{y}^{2}+\frac{11}{12}\lambda {C}_{x}^{2}-2\lambda {C}_{x}{C}_{y}\),\({B}_{1}=1+\frac{11}{12}\lambda {C}_{x}^{2}\), \({C}_{1}=1+\frac{1}{3}\lambda {C}_{x}^{2}-\lambda {C}_{x}{C}_{y}\), \({D}_{1}=1+\frac{1}{3}\lambda {C}_{x}^{2}\), and \({E}_{1}=1+\frac{11}{12}\lambda {C}_{x}^{2}-\lambda {C}_{x}{C}_{y}\), \({A}_{2}=\frac{\alpha \hspace{0.17em}\left(1+\beta \right)\hspace{0.25em} +\hspace{0.25em} {\alpha }^{2}}{64},{B}_{2}=\frac{\alpha \hspace{0.25em} +\hspace{0.25em} {\alpha }^{2}}{64},{E}_{2}=\frac{\alpha \hspace{0.25em} +\hspace{0.25em} {\alpha }^{2}}{32},{C}_{2}=\frac{\alpha }{4},\boldsymbol{}\alpha =\uplambda {C}_{x}^{2},and\upbeta =\lambda {C}_{y}^{2}\).
To minimize MSE of T_{P1}, we use calculus rule by differentiating it with respect to \({k}_{1}\), \({k}_{2}\) and equating to zero, as following:
We get as system of equations given below:
Using Cramer’s rule to solve the system of equations, we get:
On the same line, we proceed for MSE of TP2 as following:
We obtain:
where, \({\Delta }_{1}={A}_{1}\hspace{0.17em}{B}_{1}\hspace{0.25em} -\hspace{0.25em} {{E}_{1}}^{2}, {\Delta }_{2}={A}_{2}\hspace{0.17em}{B}_{2}\hspace{0.25em} -\hspace{0.25em} {{E}_{2}}^{2}\)
Substituting the values of \({k}_{1}\), \({k}_{2},\) \({k}_{3}\) and \({k}_{4}\) into Eq. (20) and Eq. (21) respectively, yields the following expressions of minimum MSEs:
The proposed estimators are highly significant than their counterparts, which will be assessed numerically in the coming sections with different real as well as synthetic data sets.
Efficiency comparison
The following criteria must be met for our suggested estimate to be more effective than the competing estimators: its mean square error (MSE) must be lower than that of the competing estimators.
Conditions
From Eq. (1) and Eq. (24) and (25), The classical mean estimator \({\mu }_{y}\) will be outperformed by the suggested estimators if:
or
where \({\theta }_{i}=\frac{{A}_{i}{{D}_{i}}^{2}+{B}_{i}{{C}_{i}}^{2}-2{C}_{i}{D}_{i}{E}_{i}}{{A}_{i}{B}_{i}-{{E}_{i}}^{2}}\) , i = 1,2 corresponding to the first and second proposed estimator.
By comparing Eq. (2) and Eq. (24) and (25), \(MSE\left({T}_{P\text{1,2}}\right)< MSE\left({T}_{1}\right)\) if:
By comparing Eq. (3) and Eq. (24), and (25) \(MSE\left({T}_{P\text{1,2}}\right)< MSE\left({T}_{2}\right)\) if:
By comparing Eq. (4) and Eq. (24), and (25), \(MSE\left({T}_{P\text{1,2}}\right)< MSE\left({T}_{3}\right)\) if:
By comparing Eq. (5), (7) and Eq. (24), and (25), \(MSE\left({T}_{P\text{1,2}}\right)< MSE\left({t}_{\text{4,6}}\right)\) if:
By comparing Eq. (6) and Eq. (24), and (25), \(MSE\left({T}_{P\text{1,2}}\right)< MSE\left({T}_{5}\right)\) if:
By comparing Eq. (8) and Eq. (24), and (25), \(MSE\left({T}_{P\text{1,2}}\right)< MSE\left({T}_{7}\right)\) if:
By comparing Eq. (9) and Eq. (24), and (25), \(MSE\left({T}_{P\text{1,2}}\right)< MSE\left({T}_{8}\right)\) if:
The subsequent section confirm that these conditions hold true for all types of real data, when the main study variable is positively correlated with the auxiliary variable.
Numerical illustration
Using PREs as the performance metric, empirical and simulated experiments are conducted to assess the effectiveness of the suggested and competing estimators. To assess the performance of the proposed and competing estimators, we have used PRE (Percent Relative Efficiency) as a performance index, given by
where T0 is the classical/usual estimator and \({T}_{..}\) are the competing and proposed estimators.
Empirical study
To assess the practical performance of the proposed logarithmic ratio–type estimators we conducted an empirical evaluation on five real-world engineering datasets. These datasets were chosen to represent a range of sample sizes, correlation structures, coefficients of variation, and degrees of skewness that commonly occur in engineering and environmental applications (see Table 1 for summary statistics). The datasets comprise: (i) Ozone vs. Solar Radiation [Source: Chambers23], (ii) Ozone vs. Temperature[Source: Chambers23], (iii) Air Time vs. Distance[Source: Wickham24], (iv) Total Deaths vs. Total Vaccinations[Source: Subramanian and Kumar25], and (v) Sales: Capital vs. Pindex[Source: Kadilar and Cingi26]. Each data source and its salient descriptive statistics (population/sample sizes, sample means, coefficients of variation, and sample correlation \({\rho }_{yx}\)) are summarized in Table 1. Figure 1a–e shows scatterplots of the study variable.
\(Y\) against the auxiliary variable \(X\) for the five empirical datasets (Fig. 1a: Ozone vs Solar Radiation; Fig. 1b: Ozone vs Temperature; Fig. 1c: Air Time vs Distance; Fig. 1d: Total Deaths vs Total Vaccinations; Fig. 1e: Sales: Capital vs Pindex), illustrating the varying strengths of association and distributional features that motivate the proposed log-ratio estimators. All empirical comparisons use simple random sampling without replacement (SRSWOR) as the design framework, and we evaluate estimator accuracy with Percent Relative Efficiency (PRE) relative to the classical sample mean:
Table 2 and its corresponding plot Fig. 2 show that the proposed estimators \({\text{T}}_{\text{P}1}\) and \({\text{T}}_{\text{P}2}\) consistently achieve the highest PRE-values across all five real datasets, with TP2 reaching a peak of 282.7 on Data-5. The results confirm that the proposed estimators deliver superior efficiency in real-data applications compared to both the usual estimator and other traditional methods.
Simulation study
The simulation study was designed to (i) evaluate the finite-sample behaviour and stability of the proposed logarithmic ratio–type estimators (TP1, TP2 under controlled distributional regimes, (ii) compare their performance with a broad set of existing estimators (classical mean, ratio/product, exponential and other transformed forms), and (iii) verify whether the first-order analytical bias/MSE formulas and the closed-form optimal constants derived in Sect. 3 provide realistic guidance for finite samples. Synthetic finite populations size N = 1000 were generated from three distributional families that together span the canonical sampling regimes practitioners face: Multivariate Normal (symmetric, light tails), Gamma (positively skewed), and Lognormal (heavy-tailed). These choices allow assessment of estimator robustness to skewness and heavy tails. For each family we constructed three bivariate populations differing only in the study–auxiliary correlation \({\rho }_{yx}\in (0.92,,0.71,,0.38)\) (labelled Data-1, Data-2 and Data-3 respectively). All populations share a common mean vector and a covariance structure given by:
The core experiments draw repeated SRSWOR samples of sizes n ε50, 150, 200 from each synthetic finite population; additional sensitivity runs that include smaller \(n\) (e.g. \(n=20\)) and intermediate values were also performed to check small-sample behavior. Each simulation was repeated 10,000 times in RStudio, and estimator performance was evaluated using Percent Relative Efficiency (PRE), defined as:
Where T0 denotes the reference estimator and \({T}_{i}\) represents the \({i}^{th}\) competing estimator. The variance and mean squared error (MSE) were computed over the 10,000 replications as:
where R = 10,000 is the number of simulation replications.
The mean PRE-values across all replications and sample sizes were recorded and summarized in Table 3 to evaluate estimator efficiency under different distributional and correlation scenarios.
Table To further evaluate the robustness of the proposed estimators under non-normal and positively skewed settings, we conducted a simulation study using a Gamma-distributed finite population with a strong positive auxiliary correlation ρ ≈ 0.90. Figure 3 presents bar-plots of the replicate-wise percent relative efficiency (PRE) for all considered estimators relative to the classical estimator T0 across three sample sizes n ε 50, 150, 200. For each scenario, finite-population characteristics were employed to compute the tuning constants, and optimal parameters for T7and T8 were obtained via a data-driven numerical minimization of the empirical mean-squared error. The results demonstrate a clear and consistent dominance of the proposed estimators TP1 and TP2, both in terms of median PRE and distributional stability, especially as the sample size increases. The remaining logarithmic and exponential-type estimators also exhibit improved performance over the classical mean estimator; however, they are comparatively less stable and efficient than the proposed class. These findings reinforce the efficacy of the proposed estimators under skewed populations and high-correlation structures.
In Tables 6,7,8, we report uncertainty for PREs via both Monte-Carlo standard errors and nonparametric percentile confidence intervals (CI). For simulations we compute per-replicate PREs across RRR Monte-Carlo replicates and report the mean PRE with its Monte-Carlo standard error \((SE = sd(PRE)/\sqrt R )\)together with the empirical 95% percentile CI (2.5%–97.5% quantiles). For empirical/finite-population analyses, repeated SRSWOR resampling from the known population (or a design-consistent bootstrap when the population is not fully available) was used to estimate MSEs and PREs; the resulting replicate PREs are summarized with mean (± MC SE) and percentile CI.
Extremely high PRE-values and wide CIs for classical ratio/product-type estimators T1–T8 reflect instability under skewed Gamma populations, especially with small samples, as revealed by Table 3, 4 and 5. In contrast, the proposed log-ratio estimators TP1, TP2 demonstrate stable PRE, narrow confidence intervals, and robust behavior, confirming their suitability for non-normal data structures (Table 8). The proposed estimators therefore offer a computationally efficient and versatile framework for modern sampling applications, particularly where skewness, auxiliary information, and finite-population structures play a central role.
Results and discussion
In this article, we developed two log-ratio type estimators of finite population mean under SRSWOR scheme. We derived first order Bias and MSE for both estimators to theoretically validate our methodology. The empirical and simulation evidence together provide a consistent and interpretable picture of the proposed estimators’ performance. Table 1 contextualizes the real datasets by summarizing sample sizes, variability and correlation structures, and Table 2 shows that the proposed log-ratio estimators achieve uniformly higher percent-relative-efficiency (PRE) than the classical mean across all five empirical datasets. The simulation tables (Tables 3–8) systematically confirm these gains: as sample size increases from n = 50 to n = 200 the PRE of the proposed estimators rises markedly and they maintain top ranking across Data-I/II/III, with the corrected values for Table 4 removing parsing artefacts and reinforcing the same conclusion. Monte-Carlo summaries (Tables 6, 7, 8) quantify the uncertainty around mean PREs and show that estimator spread and Monte-Carlo standard errors shrink with larger n, indicating both higher efficiency and greater stability. Figures 1a–e visualizes the study–auxiliary relationships that motivate auxiliary-based adjustments, Fig. 2 highlights the empirical PRE advantages, and the box-plots in Fig. 4 demonstrate that the proposed estimators not only improve median PRE but also yield substantially narrower replicate-wise distributions under skewed Gamma populations. Finally, the three-panel grouped figure (n = 50, 150, 200) provides a direct visual comparison across sample sizes and confirms that the proposed estimators deliver robust, monotonic improvements in efficiency as sample size and correlation increase, making them practically attractive for survey and engineering applications where auxiliary information is available.The proposed estimators reduce to or improve upon several established estimators under special parameter choices, confirming theoretical coherence. Relative ranking is stable across scenarios: the two proposed forms dominate classical, product, and exponential competitors in both empirical and simulated environments.
Conclusion
We propose two logarithmic ratio–type estimators, TP1 and TP2, for finite-population mean estimation under simple random sampling without replacement (SRSWOR). Closed-form first-order expressions for bias and mean squared error (MSE) are derived and used to obtain analytic tuning constants. Empirical evaluation on five engineering datasets and extensive Monte-Carlo experiments (multiple distributions, correlations, and sample sizes) shows that the proposed estimators yield consistent and often large percent-relative-efficiency (PRE) gains over the classical sample mean and common competitors; gains increase with sample size and with positive study–auxiliary correlation. The estimators are computationally simple to implement and, when the auxiliary mean is known and the auxiliary variable is positively correlated with the study variable, offer a practical route to substantially reduced MSE and smaller required sample sizes for a given precision.
We also identify important limitations: the current theory assumes SRSWOR, known auxiliary means, and positive correlation between study and auxiliary variables, conditions that facilitate closed-form analysis but may be restrictive in some applied settings. To widen applicability, future research should (i) relax the known-mean assumption via calibration, model-assisted, or empirical-Bayes corrections; (ii) adapt the estimators to stratified and two-phase sampling by incorporating stratum- or phase-specific adjustment factors; (iii) derive higher-order bias/MSE approximations and robust variance estimators for small-sample inference; and (iv) address practical data issues such as nonresponse and measurement error. Finally, while our focus is parameter estimation rather than prediction, the efficiency principles developed here could be embedded within parameter-estimation components of machine-learning pipelines (e.g., regression weighting, EM steps) to improve stability and predictive performance. Overall, TP1 and TP2 provide a theoretically grounded and practically useful improvement for mean estimation when reliable auxiliary information is available.
Data availability
The data used and analyzed in this article are available in the published article as cited against each data sets in the numerical section of this article.
Abbreviations
- N:
-
Size of the population
- n:
-
Size of the sample
- \(S_{y}^{2}\) :
-
Variance of y
- \(S_{x}^{2}\) :
-
Variance of x
- \(s_yx\) :
-
Covariance
- \(C_y\) :
-
Coefficient of variation (y)
- \(C_x\) :
-
Coefficient of variation (x)
- \(C_yx\) :
-
Coefficient of covariance x, y
- \(\alpha \,and\,\beta\) :
-
Generalizing constants
- \(e_0,e_1\) :
-
Relative error in \(\bar{y} = \frac{{\bar{y} - \mu _{y} }}{{\mu _{y} }}\) and \(\bar{x} = \frac{{\bar{x} - \mu _{x} }}{{\mu _{x} }}\)respectively
- \(S_{y}^{2}\) :
-
Population variance of (y)
- \(S_{x}^{2}\) :
-
Population variance of (x)
- \(\rho _{{yx}}\) :
-
Population correlation coefficient
- \(\bar{Y}\) :
-
Mean of the population of y
- \(\bar{X}\) :
-
Population Mean of x
- R:
-
Population ratios
- f:
-
Sampling fraction
- \(T_{{ln_{1} }}\) :
-
First proposed estimator
- \(T_{{ln_{2} }}\) :
-
Second proposed estimator
- \(k_{1} ,~k_{2} ,~k_{3} ,~k_{4}\) :
-
Optimizing constants
References
Eisa, M., Iqbal, M., Ali, H., Mahmood, Z. & Znaidia, S. Optimizing population mean estimation under ranked set sampling with applications to Engineering. Results Eng. 24, 103064 (2024).
Bhushan, S., Kumar, A., Al-Omari, A. I. & Alomani, G. A. Mean Estimation for Time-Based Surveys Using Memory-Type Logarithmic Estimators. Mathematics 11, 2125 (2023).
Han, Q., Ma, Q., Xu, J. & Liu, M. Structural health monitoring research under varying temperature condition: a review. J. Civil Struct. Health Monit. 11, 149–173 (2021).
Cochran, W. G. The estimation of the yields of cereal experiments by sampling for the ratio of grain to total produce. J. Agric. Sci. 30, 262–275 (1940).
Donner, A. The Use of Auxiliary Information in the Design of a Clinical Trial. Biometrics 305–314 (JSTOR, 1977).
Estevao, V. M. & Säarndal, C.-E. Survey Estimates by Calibration on Complex Auxiliary Information. Int. Stat. Rev. 74, 127–147 (2006).
Ali, H., Asim, S. M., Ijaz, M., Zaman, T. & Iftikhar, S. Improvement in variance estimation using transformed auxiliary variable under simple random sampling. Sci. Rep. 14, 8117 (2024).
Lu, Y. Industry 4.0: A survey on technologies, applications and open research issues. J. Indus. Inform. Integr. 6, 1–10 (2017).
Elhakim, A. F. Estimation of soil permeability. Alex. Eng. J. 55, 2631–2638 (2016).
Karaman, Ö. A., Ağır, T. T. & Arsel, İ. Estimation of Solar Radiation Using Modern Methods. Alexandria Engineering Journal, 60 2447–2455 (Elsevier, 2021).
Shalabh & Tsai, J.-R. Ratio and product methods of estimation of population mean in the presence of correlated measurement errors. Communications in Statistics - Simulation and Computation 46, 5566–5593 (2017).
Singh, M. P. Ratio cum product method of estimation. Metrika 12, 34–42 (1967).
Srivenkataramana, T. & Tracy, D. S. On ratio and product methods of estimation in sampling. Stat. Neerl. 33, 37–49 (1979).
Subramani, J. & Kumarapandiyan, G. Estimation of Population Mean Using Known Correlation Coefficient And Median. J. Stat. Theor. Appl. 13, 333–343 (2014).
Long, C., Chen, W., Yang, R. & Yao, D. Ratio Estimation of the Population Mean Using Auxiliary Information under the Optimal Sampling Design. In Probability in the Engineering and Informational Sciences (ed. Yang, R.) (Cambridge University Press, 2022).
Singh, H. P., Rathour, A. & Solanki, R. S. Improved class of estimators of finite population mean using sampling fraction and information on two auxiliary variables in sample surveys. Statistica (Bologna) 73, 353–361 (2013).
Bahl, S. & Tuteja, R. K. Ratio and Product Type Exponential Estimators. J. Inf. Optim. Sci. 12, 159–164 (1991).
Gupta, S. & Shabbir, J. On improvement in estimating the population mean in simple random sampling. J. Appl. Stat. 35, 559–566 (2008).
Izunobi, C. H. & Onyeka, A. C. Logarithmic ratio and product-type estimators of population mean in simple random sampling. Inter. J. Adv. Stat. Prob. 7, 47–55 (2019).
Mishra, P., Adichwal, N. K. & Singh, R. A new log-product-type estimator using auxiliary information. J. Sci. Res. 61, 179 (2017).
Singh, R. & Tiwari, S. N. Improved Estimator for Population Mean Utilizing Known Medians of Two Auxiliary Variables under Neutrosophic Framework. Neutrosophic. Syst. appl. 25, 38–52 (2025).
Sher, K. et al. Novel efficient estimators of finite population mean in simple random sampling. Sci. Afr. 27, e02598 (2025).
Chambers, J. M. Graphical Methods for Data Analysis. (Chapman and Hall/CRC, 2018).
Wickham, H. nycflights13: Flights that Departed NYC in 2013, (2018).
Subramanian, S. V. & Kumar, A. Increases in COVID-19 are unrelated to levels of vaccination across 68 countries and 2947 counties in the United States. Eur. J. Epidemiol. 36, 1237–1240 (2021).
Kadilar, C. & Cingi, H. New Ratio Estimators Using Correlation Coefficient. 4, (2006).
Acknowledgements
The authors extend their appreciation to Northern Border University, Saudi Arabia, for supporting this work through project number (NBU-CRP-2025-1706).
Author information
Authors and Affiliations
Contributions
Writing—original draft, Solution methodology: **Fazal Shakoor** Supervision **: Muhammad Atif** Project administration, Funding acquisition. Conceptualization: **Hameed Ali** Project administration, Funding acquisition: **Abdulrahman Obaid Alshammari** Writing–review & editing, Investigation, Validation: **Bilal Himmat** Writing–review & editing, Investigation, Validation, Supervision: **Khaled Kefi**.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Shakoor, F., Atif, M., Ali, H. et al. Log-ratio type estimation for the finite population mean under simple random sampling without replacement with theory, simulation and application. Sci Rep 16, 29 (2026). https://doi.org/10.1038/s41598-025-29127-7
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s41598-025-29127-7





