Table 2 Standard formulae for simple sample size calculations.

From: The rise to power of the microbiome: power and sample size calculation for microbiome studies

Type of sample size calculations

Standard formulae

Comparison of two means in normally distributed continuous data

If the two groups contain the same number of samples, the required sample size per group can be calculated as (ref. 14):

\(n \,=\, \frac{{2( {Z_{1 \,-\, \frac{\alpha }{2}} \,+\, Z_{1 \,-\, \beta }} )^2}}{{{\Delta}^2}},\) Equation A

where:

• n = the required sample size in each group

• Δ = \(\frac{{\mu _1 \,-\, \mu _2}}{\sigma }\) is the effect size, where \(\mu _1\) and \(\mu _2\) are the two population means and σ is the common standard deviation (σ1 = σ2 = σ)

• \(Z_{1 \,-\, \frac{\alpha }{2}}\) and \(Z_{1 \,-\, \beta }\) are the standard normal quantiles (critical values) corresponding to the desired two-sided type I error rate α and type II error rate β, respectively.
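As a rough illustration of Equation A, the following Python sketch computes the per-group sample size from a standardized effect size. The function name `n_per_group_means` and the defaults α = 0.05 and power = 0.80 are assumptions made for this example, not part of the source table. The result is rounded up to the next whole sample.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_means(delta, alpha=0.05, power=0.80):
    """Per-group n for comparing two means (Equation A), equal group sizes.

    delta is the standardized effect size (mu1 - mu2) / sigma.
    Name and defaults are illustrative assumptions.
    """
    std_normal = NormalDist()                      # standard normal, mean 0, sd 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)    # Z_{1 - alpha/2}
    z_beta = std_normal.inv_cdf(power)             # Z_{1 - beta}
    n = 2 * (z_alpha + z_beta) ** 2 / delta ** 2
    return ceil(n)                                 # round up to whole samples


print(n_per_group_means(0.5))
```

Under these assumptions, a standardized effect size of Δ = 0.5 requires 63 samples per group.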

If the two groups are not equally sized, let r denote the ratio of the number of individuals in the larger group to the number of individuals in the smaller group. The sample size of the smaller group for a two-sided Z test is given as follows:

\(n_2 \,=\, \frac{{r \,+\, 1}}{r}\frac{{( {Z_{1 \,-\, \frac{\alpha }{2}} \,+\, Z_{1 \,-\, \beta }} )^2}}{{{\Delta}^2}},\)

and the sample size for the larger group,

\(n_1 \,=\, r\,n_2,\) Equation B
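The unequal-allocation case can be sketched in the same way. The function name `n_unequal_means`, its defaults, and the choice to round the smaller group up before scaling by r are assumptions for illustration only.

```python
from math import ceil
from statistics import NormalDist


def n_unequal_means(delta, r, alpha=0.05, power=0.80):
    """Return (n1, n2) for comparing two means with allocation ratio r = n1 / n2.

    n2 is the smaller group and n1 = r * n2 the larger one (Equation B).
    Name, defaults and rounding strategy are illustrative assumptions.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)    # Z_{1 - alpha/2}
    z_beta = std_normal.inv_cdf(power)             # Z_{1 - beta}
    n2 = ceil((r + 1) / r * (z_alpha + z_beta) ** 2 / delta ** 2)
    n1 = ceil(r * n2)                              # larger group, rounded up
    return n1, n2


print(n_unequal_means(0.5, r=2))
```

With r = 1 the expression reduces to Equation A, which gives a quick consistency check on the formula.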

Comparison of the difference in proportions between two groups

The following formula is used to estimate the per-group sample size for a difference in proportions, assuming equal sample sizes in both groups (ref. 44):

\(n \,=\, \frac{{( {Z_{1 \,-\, \frac{\alpha }{2}} \,+\, Z_{1 \,-\, \beta }} )^2\,( {P_1( {1 \,-\, P_1} ) \,+\, P_2( {1 \,-\, P_2} )} )}}{{( {P_1 \,-\, P_2} )^2}},\) Equation C

where:

• \(P_1\) = the proportion in the first group

• \(P_2\) = the proportion in the second group.

• \(Z_{1 \,-\, \frac{\alpha }{2}} \,=\, 1.96\) (\(\alpha = 0.05\)), \(Z_{1 - \beta } = 0.84\) (\(\beta = 0.20\))

• \(P_1 \,-\, P_2\) = effect size (difference in proportions).
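Equation C can be evaluated with a few lines of Python. This is a minimal sketch: the function name `n_per_group_proportions` and its defaults are assumptions, and exact normal quantiles are used instead of the rounded values 1.96 and 0.84.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for a difference in two proportions (Equation C).

    Assumes equal group sizes and p1 != p2.
    Name and defaults are illustrative assumptions.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    z_beta = std_normal.inv_cdf(power)             # about 0.84 for beta = 0.20
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2
    return ceil(n)


print(n_per_group_proportions(0.5, 0.3))
```

For proportions of 0.5 and 0.3 this gives 91 samples per group, the same result obtained with the rounded quantiles.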

If \(n_1 \,\ne\, n_2,\) let \(r \,=\, \frac{{n_1}}{{n_2}}\) be the ratio between the sample sizes of the two groups. The sample sizes are then given by Equation D and the power by Equation E (ref. 43), where \(p_1\), \(p_2\) and \(z\) denote the same quantities as \(P_1\), \(P_2\) and \(Z\) above:

\(n_1\) = \(r\,n_2\), and

\(n_2 \,=\, ( {\frac{{p_1( {1 \,-\, p_1} )}}{r} \,+\, p_2( {1 \,-\, p_2} )} )( {\frac{{z_{1 \,-\, \frac{\alpha }{2}} \,+\, z_{1 \,-\, \beta }}}{{p_1 \,-\, p_2}}} )^2,\) Equation D
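A hedged sketch of Equation D follows. The function name `n_unequal_proportions`, its defaults, and the rounding of n2 before computing n1 are assumptions made for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_unequal_proportions(p1, p2, r, alpha=0.05, power=0.80):
    """Return (n1, n2) for two proportions with allocation ratio r = n1 / n2.

    Implements Equation D; requires p1 != p2.
    Name, defaults and rounding strategy are illustrative assumptions.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)    # z_{1 - alpha/2}
    z_beta = std_normal.inv_cdf(power)             # z_{1 - beta}
    variance_term = p1 * (1 - p1) / r + p2 * (1 - p2)
    n2 = ceil(variance_term * ((z_alpha + z_beta) / (p1 - p2)) ** 2)
    n1 = ceil(r * n2)
    return n1, n2


print(n_unequal_proportions(0.5, 0.3, r=2))
```

Setting r = 1 recovers the equal-allocation result of Equation C.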

For a test statistic \(t\), and for the standard normal cumulative distribution function \(\Phi \left( \cdot \right)\), power can be estimated by:

\(1 \,-\, \beta \,=\, \Phi ( {t \,-\, Z_{1 \,-\, \frac{\alpha }{2}}} ) \,+\, \Phi ( { - t \,-\, Z_{1 \,-\, \frac{\alpha }{2}}} )\), Equation E
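The power expression in Equation E can be sketched as below. The table does not define the test statistic, so this example assumes the unpooled two-sample statistic for proportions, t = |p1 − p2| / sqrt(p1(1 − p1)/n1 + p2(1 − p2)/n2). That choice, the function names and the defaults are all assumptions rather than part of the source.

```python
from math import sqrt
from statistics import NormalDist


def power_from_statistic(t, alpha=0.05):
    """Two-sided power for a standardized test statistic t (Equation E)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)    # Z_{1 - alpha/2}
    # Sum of the two rejection-region probabilities under the alternative.
    return std_normal.cdf(t - z_alpha) + std_normal.cdf(-t - z_alpha)


def power_two_proportions(p1, p2, n1, n2, alpha=0.05):
    """Power for comparing two proportions.

    ASSUMPTION: t is the unpooled z statistic; the source does not define t.
    """
    standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    t = abs(p1 - p2) / standard_error
    return power_from_statistic(t, alpha)


print(round(power_two_proportions(0.5, 0.3, n1=132, n2=66), 3))
```

When t = 0 the expression reduces to α, which is the expected rejection rate when there is no true difference.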

Comparison of the odds between two groups

The following formula is used to estimate the sample size of each group for an odds ratio, allowing the two groups to differ in size (equal group sizes correspond to κ = 1) (ref. 45):

• Define \(\kappa \,=\, \frac{{n_1}}{{n_2}}\) as the ratio of the numbers of individuals in groups 1 and 2, where the groups are defined by the exposure variable X.

• Define the odds ratio (OR) as:

\(OR \,=\, \frac{{p_1(1 \,-\, p_2)}}{{p_2(1 \,-\, p_1)}},\)

where \(p_1\) and \(p_2\) are proportions of the samples where the taxon abundance is above the chosen threshold (e.g., median) in the two exposure groups.

Then:

\(n_1 \,=\, \kappa n_2\), and

\(n_2 \,=\, ( {\frac{1}{{\kappa p_1(1 \,-\, p_1)}} \,+\, \frac{1}{{p_2(1 \,-\, p_2)}}} )( {\frac{{z_{1 \,-\, \frac{\alpha }{2}} \,+\, z_{1 \,-\, \beta }}}{{\ln ( {OR} )}}} )^2,\) Equation F
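Equation F can be sketched as follows. The function name `n_odds_ratio`, its defaults and the rounding are assumptions for illustration. The critical value is taken as the upper-tail quantile \(z_{1-\alpha/2}\), in line with the other equations in this table.

```python
from math import ceil, log
from statistics import NormalDist


def n_odds_ratio(p1, p2, kappa=1.0, alpha=0.05, power=0.80):
    """Return (n1, n2) for detecting an odds ratio (Equation F).

    kappa = n1 / n2 is the allocation ratio; requires p1 != p2 so that
    ln(OR) is non-zero. Name and defaults are illustrative assumptions.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)    # z_{1 - alpha/2}
    z_beta = std_normal.inv_cdf(power)             # z_{1 - beta}
    odds_ratio = (p1 * (1 - p2)) / (p2 * (1 - p1))
    variance_term = 1 / (kappa * p1 * (1 - p1)) + 1 / (p2 * (1 - p2))
    n2 = ceil(variance_term * ((z_alpha + z_beta) / log(odds_ratio)) ** 2)
    n1 = ceil(kappa * n2)
    return n1, n2


print(n_odds_ratio(0.5, 0.3))
```

For p1 = 0.5 and p2 = 0.3 (an odds ratio of about 2.33) with equal allocation, this sketch gives 96 samples per group.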

Sample size based on correlations

The sample size required to test the hypothesis that the population correlation (\(\rho _{yx}\)) is equal to a specified value (h; usually we set \(h \,=\, 0\) and test \(\rho _{yx} \,=\, 0\)) for a given confidence level (1 − α) and power (1 − β) is approximately:

\(n \,=\, 3 \,+\, ( {( {z_{( {1 \,-\, \frac{\alpha }{2}} )} \,+\, z_{1 \,-\, \beta }} )^2/( {\widetilde {\rho ^\ast }_{yx} \,-\, h ^\ast } )^2} ),\) Equation G

where \(\widetilde {\rho ^\ast }_{yx} \,=\, \ln ( {\frac{{1 \,+\, \tilde \rho _{yx}}}{{1 \,-\, \tilde \rho _{yx}}}} )/2\) is the Fisher transformation of \(\tilde \rho _{yx}\), the planning value for \(\rho _{yx}\). The null hypothesis value h must likewise be Fisher-transformed to \(h^\ast\) before it is used in Equation G, and the numerator captures the adjustment necessary to obtain the desired type I and type II errors (ref. 46).
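A minimal sketch of Equation G is given below. It relies on the identity that the Fisher transformation ln((1 + ρ)/(1 − ρ))/2 equals the inverse hyperbolic tangent, so `math.atanh` is used for both the planning value and the null value. The function name `n_correlation` and its defaults are assumptions for this example.

```python
from math import atanh, ceil
from statistics import NormalDist


def n_correlation(rho_plan, h=0.0, alpha=0.05, power=0.80):
    """Approximate n for testing rho = h against a planning value rho_plan.

    Implements Equation G; requires rho_plan != h and both strictly
    between -1 and 1. Name and defaults are illustrative assumptions.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)    # z_{1 - alpha/2}
    z_beta = std_normal.inv_cdf(power)             # z_{1 - beta}
    rho_star = atanh(rho_plan)                     # Fisher transform of planning value
    h_star = atanh(h)                              # Fisher transform of null value
    n = 3 + (z_alpha + z_beta) ** 2 / (rho_star - h_star) ** 2
    return ceil(n)


print(n_correlation(0.3))
```

With a planning correlation of 0.3, a null value of 0, α = 0.05 and 80% power, the sketch returns 85.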