Table 2 Standard formulae for simple sample size calculations.
From: The rise to power of the microbiome: power and sample size calculation for microbiome studies
Type of sample size calculations | Standard formulae |
|---|---|
Comparison of two means in normally distributed continuous data | If the two groups contain the same number of samples, the required sample size per group can be calculated as14: |
\(n \,=\, \frac{{2( {Z_{1 \,-\, \frac{\alpha }{2}} \,+\, Z_{1 \,-\, \beta }} )^2}}{{{\Delta}^2}},\) Equation A | |
where: | |
• n = the required sample size in each group | |
• Δ = \(\frac{{\mu _1 \,-\, \mu _2}}{\sigma }\) is the effect size, where \(\mu _1\) and \(\mu _2\) are the two population means and σ is the common standard deviation, σ1 = σ2 = σ | |
• \(Z_{1 \,-\, \frac{\alpha }{2}},Z_{1 \,-\, \beta }\) are the standard normal quantiles associated with the desired type I and type II error rates, α and β, respectively. | |
If the two groups are not equally sized, then let parameter r denote the ratio of the number of individuals in the larger group divided by the number of individuals in the smaller group. The sample size of the smaller group for a two-sided Z test is given as follows: | |
\(n_2 \,=\, \frac{{r \,+\, 1}}{r}\frac{{( {Z_{1 \,-\, \frac{\alpha }{2}} \,+\, Z_{1 \,-\, \beta }} )^2}}{{{\Delta}^2}},\) | |
and the sample size for the larger group, | |
\(n_1 \,=\, r\,n_2,\) Equation B | |
Comparison of the difference in proportions between two groups | The following formula is used to estimate per group sample size for a difference in proportions, assuming equal sample sizes in both groups44: |
\(n \,=\, \frac{{( {Z_{1 \,-\, \frac{\alpha }{2}} \,+\, Z_{1 \,-\, \beta }} )^2\,( {P_1( {1 \,-\, P_1} ) \,+\, P_2( {1 \,-\, P_2} )} )}}{{( {P_1 \,-\, P_2} )^2}},\) Equation C | |
where: | |
• \(P_1\) = the proportion in the first group | |
• \(P_2\) = the proportion in the second group. | |
• \(Z_{1 \,-\, \frac{\alpha }{2}} \,=\, 1.96\) (\(\alpha = 0.05\)), \(Z_{1 - \beta } = 0.84\) (\(\beta = 0.20\)) | |
• \(P_1 \,-\, P_2\) = the effect size (difference in proportions). | |
If \(n_1 \,\ne\, n_2,\) the ratio between the sample sizes of the two groups is \(r \,=\, \frac{{n_1}}{{n_2}}\). Then the formulas that are used to compute sample size and power43 are given below, respectively: | |
\(n_1\) = \(r\,n_2\), and | |
\(n_2 \,=\, ( {\frac{{P_1( {1 \,-\, P_1} )}}{r} \,+\, P_2( {1 \,-\, P_2} )} )( {\frac{{Z_{1 \,-\, \frac{\alpha }{2}} \,+\, Z_{1 \,-\, \beta }}}{{P_1 \,-\, P_2}}} )^2,\) Equation D | |
For a test statistic \(t\), and for the standard normal cumulative distribution function \(\Phi \left( . \right)\), power can be estimated by: | |
\(1 \,-\, \beta \,=\, \Phi ( {t \,-\, Z_{1 \,-\, \frac{\alpha }{2}}} ) \,+\, \Phi ( { - t \,-\, Z_{1 \,-\, \frac{\alpha }{2}}} )\), Equation E | |
Comparison of the odds between two groups | The following formula is used to estimate sample size for an odds ratio; group sizes may differ through the ratio κ, and κ = 1 gives equal sample sizes in both groups45: |
• Define \(\kappa \,=\, \frac{{n_1}}{{n_2}}\) as the ratio of the numbers of individuals in groups 1 and 2, where the groups are defined by the exposure variable X. | |
• Define the odds ratio (OR) as: | |
\(OR \,=\, \frac{{p_1(1 \,-\, p_2)}}{{p_2(1 \,-\, p_1)}},\) | |
where \(p_1\) and \(p_2\) are proportions of the samples where the taxon abundance is above the chosen threshold (e.g., median) in the two exposure groups. | |
Then: | |
\(n_1 \,=\, \kappa n_2\), and | |
\(n_2 \,=\, ( {\frac{1}{{\kappa p_1(1 \,-\, p_1)}} \,+\, \frac{1}{{p_2(1 \,-\, p_2)}}} )( {\frac{{z_{1 \,-\, \frac{\alpha }{2}} \,+\, z_{1 \,-\, \beta }}}{{\ln ( {OR} )}}} )^2,\) Equation F | |
Sample size based on correlations | The sample size required to test the hypothesis that the population correlation (\(\rho _{yx}\)) is equal to a specified value (h; usually we set \(h \,=\, 0\) and test \(\rho _{yx} \,=\, 0\)) for a given confidence level (1 − α) and power (1 − β) is approximately: |
\(n \,=\, 3 \,+\, ( {( {z_{( {1 \,-\, \frac{\alpha }{2}} )} \,+\, z_{1 \,-\, \beta }} )^2/( {\widetilde {\rho ^\ast }_{yx} \,-\, h ^\ast } )^2} ),\) Equation G | |
where \(\widetilde {\rho ^\ast }_{yx} \,=\, \ln ( {\frac{{1 \,+\, \tilde \rho _{yx}}}{{1 \,-\, \tilde \rho _{yx}}}} )/2\) is called the Fisher transformation of \(\tilde \rho _{yx}\), the planning value for \(\rho _{yx}\). The desired null hypothesis value h must also be transformed with the Fisher transformation to h* in Equation G, and the numerator captures the adjustment necessary to obtain the desired type I and type II errors46. |
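
The formulae in Table 2 can be checked numerically with a short script. The following is a minimal Python sketch, not code from the article: the function names are hypothetical, the defaults α = 0.05 and power = 0.80 follow the values quoted under Equation C, all tests are assumed two-sided, and the quantiles come from the standard library's `statistics.NormalDist`. Results are returned unrounded; in practice they would be rounded up to the next whole sample.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def _z_sum_sq(alpha, power):
    """(Z_{1-alpha/2} + Z_{1-beta})^2, the term shared by Equations A-D, F and G."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)  # power = 1 - beta
    return (z_alpha + z_beta) ** 2


def n_two_means(delta, alpha=0.05, power=0.80, r=1.0):
    """Equations A and B. delta = (mu1 - mu2) / sigma; returns (n1, n2), n1 = r * n2."""
    n2 = (r + 1) / r * _z_sum_sq(alpha, power) / delta ** 2
    return r * n2, n2


def n_two_proportions(p1, p2, alpha=0.05, power=0.80, r=1.0):
    """Equations C and D. r = 1 reproduces Equation C; returns (n1, n2)."""
    variance_term = p1 * (1 - p1) / r + p2 * (1 - p2)
    n2 = variance_term * _z_sum_sq(alpha, power) / (p1 - p2) ** 2
    return r * n2, n2


def power_two_sided(t, alpha=0.05):
    """Equation E, using the standard normal cumulative distribution function."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(t - z_alpha) + _STD_NORMAL.cdf(-t - z_alpha)


def n_odds_ratio(p1, p2, alpha=0.05, power=0.80, kappa=1.0):
    """Equation F. kappa = n1 / n2; returns (n1, n2)."""
    log_or = math.log(p1 * (1 - p2) / (p2 * (1 - p1)))
    variance_term = 1 / (kappa * p1 * (1 - p1)) + 1 / (p2 * (1 - p2))
    n2 = variance_term * _z_sum_sq(alpha, power) / log_or ** 2
    return kappa * n2, n2


def n_correlation(rho, h=0.0, alpha=0.05, power=0.80):
    """Equation G. math.atanh is the Fisher transformation ln((1 + x) / (1 - x)) / 2."""
    return 3 + _z_sum_sq(alpha, power) / (math.atanh(rho) - math.atanh(h)) ** 2


if __name__ == "__main__":
    # Illustrative inputs only; these effect sizes are not taken from the article.
    print(math.ceil(n_two_means(0.5)[1]))   # per-group n for delta = 0.5 -> 63
    print(math.ceil(n_correlation(0.3)))    # total n for rho = 0.3 vs h = 0 -> 85
    print(n_two_proportions(0.5, 0.3, r=2.0))
    print(n_odds_ratio(0.5, 0.3, kappa=1.0))
```

Each group-comparison function takes the allocation ratio as an argument, so the equal-allocation formulae (Equations A and C) fall out as the special case r = 1, which makes it easy to see how unequal allocation changes the required totals.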