Table 1: Common errors in the statistical analysis of sex-inclusive preclinical studies

From: Navigating the paradigm shift of sex inclusive preclinical research and lessons learnt

Error

Why?

No statistical tests used to support the conclusions reached [11].

Humans are hard-wired to see patterns, so statistical tests are needed to challenge cognitive biases and to assess whether an observed relationship is unlikely to be explained by chance alone.

Pooling the data for a treatment across the sexes studied [11, 16].

Fails to account for the variation introduced by sex, which reduces sensitivity, and does not allow an assessment of whether the treatment effect depends on sex.

Disaggregating the data by sex during the analysis [11, 16], e.g., running an independent statistical analysis for each sex studied.

Loses statistical power because the data are not shared in a common statistical model, so each analysis estimates its error variance from only part of the data. Does not allow an assessment of whether the treatment effect depends on sex, and encourages the error of comparing p values (the differences in sex-specific significance error).

Differences in sex-specific significance error (DISS error) [11, 46], e.g., finding a statistically significant effect in one sex but not in the other and concluding that the effect depends on sex.

This is flawed reasoning that does not use a statistical test to support the conclusion, and the strategy carries a high risk of false positives. A difference between a significant and a non-significant result could simply reflect sampling variability or differences in statistical power. Fundamentally, the conclusion is unsupported “because the difference between significant and not significant need not itself be statistically significant” [61].

Comparing the females and males within the treatment group to assess whether there was sex-related variation in the treatment effect [11].

This is a flawed strategy. It assumes that any difference between the groups is due solely to the treatment effect, when it could instead be confounded by a baseline sex difference that is present regardless of treatment.
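The DISS error can be made concrete with a minimal Python sketch. It uses hypothetical summary statistics and a large-sample normal (Wald z) approximation. The function names and numbers are illustrative assumptions, not taken from the article. Comparing the two sex-specific p values invites the conclusion that the effect depends on sex. A direct test of the difference between the two effects does not support that conclusion.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p value for a z statistic under a standard normal reference."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def z_test(estimate: float, se: float) -> tuple[float, float]:
    """Wald z test of H0: effect = 0 (large-sample normal approximation)."""
    z = estimate / se
    return z, two_sided_p(z)


# Hypothetical treatment effects (treated minus control) estimated in each sex,
# with their standard errors; the numbers are illustrative only.
effect_f, se_f = 25.0, 10.0
effect_m, se_m = 10.0, 10.0

# DISS-style reasoning: look at each sex's p value separately.
z_f, p_f = z_test(effect_f, se_f)
z_m, p_m = z_test(effect_m, se_m)

# Appropriate question: is the difference between the sex-specific effects
# (the treatment-by-sex interaction) itself significant? The two estimates
# come from independent animals, so their variances add.
diff = effect_f - effect_m
se_diff = math.sqrt(se_f ** 2 + se_m ** 2)
z_int, p_int = z_test(diff, se_diff)

print(f"Females: effect = {effect_f}, p = {p_f:.3f}")
print(f"Males:   effect = {effect_m}, p = {p_m:.3f}")
print(f"Difference between sexes: {diff}, p = {p_int:.3f}")
```

Here the effect is significant in females (p ≈ 0.01) but not in males (p ≈ 0.32). Yet the test of their difference gives p ≈ 0.29, so the data do not show that the effect depends on sex. In a real study, the interaction would normally be estimated within one model fitted to all the data, such as a factorial ANOVA or a regression with a treatment × sex term, rather than from separate summaries.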
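Two of the errors above, pooling the sexes and comparing females with males only within the treated group, can be contrasted with a single factorial analysis. The following is a minimal, stdlib-only Python sketch on hypothetical 2 × 2 (treatment × sex) data. All values and the `contrast` helper are illustrative assumptions. The cell-means model is fitted by hand rather than with a statistics package, and the sketch reports t statistics without p values.

```python
import math
from statistics import mean, variance

# Hypothetical 2 x 2 factorial data (treatment x sex); values are illustrative.
# Control males sit about 19 units above control females (a baseline sex
# difference), and treatment raises the outcome by 10 units in both sexes.
data = {
    ("control", "F"): [10, 12, 11, 9, 13],
    ("control", "M"): [31, 29, 30, 32, 28],
    ("treated", "F"): [21, 23, 20, 22, 19],
    ("treated", "M"): [40, 42, 39, 41, 38],
}
means = {cell: mean(values) for cell, values in data.items()}

# Common model: residual variance pooled within all four cells, so every
# observation contributes to one error estimate (df = N - number of cells).
ss_within = sum(
    (x - means[cell]) ** 2 for cell, values in data.items() for x in values
)
df_within = sum(len(v) for v in data.values()) - len(data)
s2 = ss_within / df_within


def contrast(weights):
    """Estimate, standard error and t statistic for a contrast of cell means."""
    est = sum(w * means[cell] for cell, w in weights.items())
    se = math.sqrt(s2 * sum(w ** 2 / len(data[cell]) for cell, w in weights.items()))
    return est, se, est / se


# Treatment main effect, averaged over sex, with sex kept in the model.
main_est, main_se, main_t = contrast({
    ("treated", "F"): 0.5, ("treated", "M"): 0.5,
    ("control", "F"): -0.5, ("control", "M"): -0.5,
})
# Treatment x sex interaction: does the treatment effect differ between sexes?
int_est, int_se, int_t = contrast({
    ("treated", "M"): 1, ("control", "M"): -1,
    ("treated", "F"): -1, ("control", "F"): 1,
})
# Flawed: comparing males and females within the treated group only.
naive_est, naive_se, naive_t = contrast({("treated", "M"): 1, ("treated", "F"): -1})

# Flawed: pooling the sexes and running a two-sample t test on treatment.
pooled_c = data[("control", "F")] + data[("control", "M")]
pooled_trt = data[("treated", "F")] + data[("treated", "M")]
n_c, n_t = len(pooled_c), len(pooled_trt)
s2_pooled = ((n_c - 1) * variance(pooled_c) + (n_t - 1) * variance(pooled_trt)) / (n_c + n_t - 2)
pooled_est = mean(pooled_trt) - mean(pooled_c)
pooled_t = pooled_est / math.sqrt(s2_pooled * (1 / n_c + 1 / n_t))

print(f"Treatment effect, sex in model: {main_est:.1f} (t = {main_t:.2f})")
print(f"Treatment effect, sexes pooled: {pooled_est:.1f} (t = {pooled_t:.2f})")
print(f"Treatment x sex interaction:    {int_est:.1f} (t = {int_t:.2f})")
print(f"Treated males vs females only:  {naive_est:.1f} (t = {naive_t:.2f})")
```

With these numbers, the treatment effect is 10 in both sexes. Keeping sex in the model gives t ≈ 14.1 for the treatment effect. Pooling the sexes inflates the residual variance with the baseline sex difference and gives only t ≈ 2.2. The interaction contrast is 0, correctly indicating no sex-dependent treatment effect. By contrast, the treated-only comparison reports a large "sex difference" of 19 that is entirely baseline. In practice, the same model would usually be fitted with a statistics package, for example as a two-way ANOVA or a linear model with a treatment × sex term.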