After thirty years of research highlighting the risk of the sex bias in preclinical research, we now have tangible change happening in the research landscape with a rapid increase in the proportions of studies including female and male subjects. In practical terms, this will require a paradigm shift affecting choices around experimental design, analysis and presentation and considerations when drawing conclusions and planning next steps in the pipeline. As the preclinical research community embraces this new way of working, numerous insights are being published and shared on how to proceed. This article will signpost the learnings obtained and potential future steps that should be focused on.
There is a culturally embedded practice of studying only one sex in biology and health research and then generalising to the other, unstudied population1. This strategy introduces a knowledge bias into the research pipeline and contributes to translation and reproducibility issues2,3,4. Despite numerous funding bodies establishing initiatives to encourage researchers to routinely integrate males and females into basic, preclinical, and clinical research, little progress has been observed5. Consequently, numerous funding bodies have recently issued inclusion mandates requiring justification for exclusion in preclinical research5,6,7. These mandates have increased the proportion of published papers including females and males8,9,10; however, inclusive studies still represent a minority of published studies9. Additionally, studies including male and female samples are frequently beset by further critical methodological problems that compromise the integrity of their conclusions8,11. There is some debate in preclinical research around the choice of prioritising sex as a source of variation over other potential sources of biological variation (e.g., age or genetic background)12. In terms of improving reproducibility, embracing biological variation in the designs has been highlighted as the critical step to improve the generalisability of research conclusions13. However, research has not found one critical variable that needs to be embraced to achieve this14. Some have argued that researchers should be allowed to choose the most relevant source of variation for their research question12,15. I would argue that multiple sources of variation can be encompassed in a research study and that including female and male samples in preclinical research is an easy first step to improving generalisability. Furthermore, it is an ethical choice as it does not need an increase in sample size by default16 and uses all animals generated in the breeding process1.
The strategy of studying only one sex was not a decision taken due to researchers perceiving sex was not relevant or that a particular sex was more valuable than another; rather, it was an ethical and pragmatic decision based on the understanding at that time17. Research has shown that scientists believe sex matters but are not running inclusive designs because these are not perceived as achievable16,17,18 due to escalating costs and ethical concerns. Many of these perceived barriers are now understood to be misconceptions16,17 and, to change the status quo, there is a need to focus on cultural change17,19. For inclusive designs to be the norm, the research community needs a paradigm shift in how research is conducted. This is not a minor change, as it requires the community to embrace changes in the language used, the design of experiments, the conduct of experiments, the analysis and presentation of data and the integration of data into in silico models. This article will take the opportunity to showcase some of the steps various researchers have taken to support the transition, and the lessons learnt. It will highlight some of the gaps that are apparent, though, of course, this is based on what is currently understood. It is an exciting time as we traverse this paradigm shift.
Lessons learnt and opportunities
Language use within sex inclusive preclinical research
Culture can be defined as the ideas, customs, and social behaviours of a particular group or society. As we conduct science and use language to communicate, the language used expresses and embodies cultural reality. This matters as the language chosen influences how people conceive of concepts and objects20.
Frequently, in the context of in vivo research, the word gender has been used as an alternative to sex because people are uncomfortable with the word sex, given its cultural connotations as a shortened form of sexual intercourse. However, the words are not synonyms. In science, sex refers to a classification system that utilises a set of physiological attributes (e.g., chromosomes, hormones, reproductive organs) to define an organism as male, female, intersexed or hermaphrodite, whilst gender refers to a human-specific phenomenon, arising from the socially constructed roles, expressions, behaviours and identities of female, male and gender-diverse people21. In fact, with in vivo, ex vivo or in vitro research it is only possible to explore sex-related variation, as the experimental construct cannot be used as a model to explore the human experience of gender. As these terms have very different meanings, it is essential when presenting research to use the correct terminology to represent the work accurately and avoid confusion22.
Focusing further on the term sex, it is often presented as a binary truth with phrases such as ‘the two sexes’ or ‘both sexes’. This aligns with a societal culture which is structured around the concept that sex is a binary, biological truth with mutually exclusive categories23. In reality, both sex and gender are complex, variable and non-binary classification systems23. For example, it is estimated that roughly 5–6% of animal species are hermaphroditic24. Velocci23 asks whether sex is a useful category and highlights that sex is an incoherent category with no universally agreed guidelines for its definition. Historically, reporting guidelines have focused on reporting the sex and number of animals used in published research25. Attention is now focusing on the need to operationalize sex by defining and reporting the variables that were used to distinguish males and females21,26, for example an assessment of chromosomes or hormone concentrations, or a visual assessment of primary and secondary sexual organs. Doing so will increase reproducibility21 and enable the opportunity to consider sex as a constructed category which could operate at multiple biological levels, which would then have the potential to “open up new avenues for enquiry in our study of biological variation”26.
As research practices become more inclusive and identify situations where sex mathematically explains variation in the treatment effect, it is important to be mindful of the trap of perceiving sex as an underlying causal mechanism21. In statistical analysis, the term effect is used for the various elements of the model. The word normally implies causality, but in the context of statistical modelling it means only that the variation in the data associated with that term is explored. Sex is a category that is represented by multiple mechanisms (hormonal, immune, body composition differences etc.) and therefore it is not sex itself that drives sex-related variation but rather one or more mechanisms associated with sex. Frequently, the language used reinforces cultural beliefs that females and males are ‘profoundly and systematically different’27. For example, I myself used the term ‘sexual dimorphism’2 in a study when describing the prevalence of a treatment effect interacting with sex. In the paper, whilst sex explained variation in the treatment effect, the distributions of the continuous variables still overlapped, and yet a term that officially means distinct phenotypic forms was used2. Pape et al. also highlighted the frequent inappropriate use of the phrase ‘sex-specific’, which implies an effect occurs exclusively in one sex category and not the other, when more commonly a statistical difference in the degree of the effect is observed21.
We also need to think about language from a position of inclusion and representation. Research is conducted to aid all, and therefore using language such as ‘two sexes’ or ‘both sexes’ unintentionally excludes a significant proportion of society. This is more than an issue that could be dismissed as wokeism; it is important for challenging cultural biases that can impact the perception of the science being conducted and the conclusions drawn. For example, in 1910, during an expedition to explore Antarctica, George Murray Levick, a zoologist, recorded many same-sex mating events whilst studying a penguin colony; however, none of these notes appeared in Levick’s published work28. To address this, it is necessary to be more explicit by stating the sexes used and to avoid an implicit hierarchy by altering the order each time the sexes are declared.
Inclusive research is new territory for many. It is likely that the imprecise use of language to date has arisen from the ongoing paradigm shift. As the community raises these issues, it is important to embrace the learnings and work out how to address them.
Experimental design
When do we need a sex inclusive design?
The Sex and Gender Equity in Research (SAGER) guidelines21 were published in 2016 to guide the reporting of sex and gender information in study design, data analyses, results and interpretation of findings. They shared a general principle that provides clarity on when a sex inclusive design should be conducted: “Where the subjects of research comprise organisms capable of differentiation by sex, the research should be designed and conducted in a way that can reveal sex-related differences in the results, even if these were not initially expected.” It is a default position of inclusion where the sex can be determined. This means ex vivo samples, primary cell lines and patient-derived cell lines are in scope, whilst research studies collecting data at the level of a cluster of animals (e.g., a herd or litter group) are exempt. Consensus has not yet been reached on whether immortalised cell lines are in scope, probably as it will depend on the situation. Many researchers have raised challenges with identifying the sex of immortalised cell lines due to contamination with micro-organisms and the loss of the Y chromosome due to genetic instability29. Holland and Bradbury argued that whilst it is difficult, it is essential to understand and report your findings even if that means being explicit that you could not determine the sex30. With increasing uptake/adoption of guidelines such as those proposed by Geraghty et al.31, the sex of the original source will have been captured and the use of optimal procedures and a frozen source stock will minimise these challenges.
The SAGER guidelines can also help to determine whether a sex inclusive design is needed in more complex experiments, for example those that have both cell lines and animals within the same study. Consider indirect in vivo clustered regularly interspaced short palindromic repeat (CRISPR) screens32 as an example where both primary cells and animals are used within a single study. In these, a two-step process is used where first, in vitro, the target cell line is modified using CRISPR technology and then the modified cell lines are transplanted into an animal. From the perspective of inclusion, this raises the question: should the attention focus on the cell lines or the animals or both? Utilising the principles of the SAGER guidelines, the subject of the research in this study can be identified as the cell line. However, it is known that implanting male cells into female animals has an impact on viability33. To be a representative model of human disease, the modified cell lines should be implanted into a wild-type homogenous mouse of the same sex as the cell line. It is important to note that a sex inclusive design is not achieved by just sourcing a second cell line of a similar profile of the other sex. Why? If differences were found between the two cell lines, you would be unable to assign causality, as any observed differences could be due to sex-related variation or variation unique to those individual cell lines. To embrace sex inclusion in the design, three or more independently sourced cell lines for each sex would be needed to provide the replication necessary to study the variation arising from sex. Sourcing such cell lines might be prohibitive29,30.
It is important to clarify that, in addition to replication, inclusive studies require the sexes to be studied at the same time within the same experiment. Sometimes, researchers state they will study another sex later. This is, unfortunately, a flawed strategy. When studies are conducted independently, researchers will be unable to assess whether variation in the treatment effect is due to sampling variation or is associated with the difference in sex.
Many funding bodies6,34 have released inclusion mandates setting a position that inclusion is the default, and justification is needed for single sex studies. This is providing an impetus for change, but now requires funding reviewers and institute ethical review boards to consider the justifications. It is important to acknowledge that whilst scientists believe sex matters18, for over 30 years, researchers have been highlighting the embedded bias towards single sex studies in early research9,35,36. Many of the justifications given for single sex studies are now recognised as culturally embedded misconceptions. Examples include the beliefs that female mice are more variable and therefore their inclusion would increase the number of animals needed37,38, that sex differences in the baseline or sex-related variation in the treatment effect will introduce variability and decrease sensitivity to detect the treatment effect of interest38,39, and that inclusion would require a doubling of the sample size18,40,41. Many of these misconceptions arise from trying to use the original statistical toolkit and failing to appreciate that the design has changed, which leads to a concurrent change in the analysis strategy. By embracing the right statistical toolkit, the perceived challenges can be addressed (see Statistical analysis section). Frequently, the resistance to inclusion is that historic research has only been collected on one sex and researchers argue that they wish to compare the results from current to historic. Inclusive research does not hinder a cross comparison of new data to historic in terms of the behaviour; for example, the direction of the effect. The resistance probably represents fear of change and the unknown, but to ensure the science is generalisable, it is necessary to step into the unknown. Inherent in the single sex workflow is an assumption that the results will generalise. Now, inclusive research will assess the truth of this assumption.
A recent publication has shared the Sex Inclusive Research Framework (SIRF)42, a toolkit for both evaluators and scientists to evaluate research proposals. Through a decision tree of twelve questions, proposals are assessed, and a traffic light outcome indicates whether a proposal is appropriate, risky, or insufficient with regard to sex inclusion. For each question, there is supporting information for reviewers conducting the evaluation and educational information to guide the researcher. The idea behind the toolkit is to encourage researchers to reflect on their thinking and to move the discourse to a situation where researchers truly reflect on whether an inclusive design is possible.
Being clear about the objective of inclusion
For robust ethical research, it is necessary to align the design, analysis and objectives before conducting the experiments. Studies have found that researchers believe that sex matters as a biological variable9,41,43 and would like to include males and females, as this would allow an exploration of whether there was sex-related variation in the treatment effect; however, they do not feel inclusion is doable18,40. It is likely that a lack of clarity over the objective of inclusion also acts as a barrier, as the objective will impact the design that needs to be implemented and the resulting number of test samples needed.
Much in vivo research will be exploring the effect of a treatment (the intervention applied), and the goal of including females and males is to assess the generalisability of the treatment effect. In these situations, the experiment would only be powered to see very large differences in the sex-related variation in the treatment effect16. From the perspective of sex inclusion, this should be described as an ‘exploratory approach’ as sex is not a primary variable of interest. For these studies, the number of animals needed for the effect size of interest can be shared across the males and females provided the right statistical methods are utilised to correctly account for variation in the data16.
Alternatively, sex could be included in addition to the treatment of interest because sex differences in the outcome are the primary focus. These situations will arise when there is an evidence-based rationale for a hypothesized sex difference and therefore should be considered as a ‘confirmatory approach’. In these situations, a bespoke power calculation will be needed, considering the differences in the size of the effect. This is an area where more guidance and resources are needed to support the community in appropriately implementing these designs.
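As a rough illustration of why the animals can be shared across the sexes in an exploratory design, the sketch below compares standard errors for a fixed total number of animals. It rests on simplifying assumptions that are not taken from the cited work: a balanced design, independent observations and a common residual standard deviation `sigma` in every group. The function names are hypothetical.

```python
import math


def se_single_sex(n_total, sigma):
    """SE of (treated - control) when all animals are one sex, split into 2 equal groups."""
    n_per_group = n_total / 2
    return sigma * math.sqrt(1 / n_per_group + 1 / n_per_group)


def se_main_effect_factorial(n_total, sigma):
    """SE of the sex-averaged treatment effect in a balanced 2x2 (sex x treatment) design."""
    n_per_cell = n_total / 4
    # Variance of the treated - control difference within one sex.
    var_within_sex = 2 * sigma**2 / n_per_cell
    # The main effect is the mean of the two within-sex differences.
    return math.sqrt(var_within_sex / 2)


def se_interaction_factorial(n_total, sigma):
    """SE of the difference in treatment effect between the two sexes studied."""
    n_per_cell = n_total / 4
    var_within_sex = 2 * sigma**2 / n_per_cell
    # The interaction is the difference of the two within-sex differences.
    return math.sqrt(2 * var_within_sex)
```

Under these assumptions, the sex-averaged treatment effect is estimated with the same standard error as in a single sex study of the same total size, whereas the interaction estimate has twice that standard error. This is consistent with the statement that an exploratory design is only powered to see very large sex-related differences in the treatment effect. The sketch ignores the two additional degrees of freedom spent by the factorial model on the sex and interaction terms.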
Once the sex differences are confirmed and assessed as biologically significant, opportunities arise to explore the underlying biological origins of the observed sex-related differences. Becker et al.44 have developed a logical series of experimental questions that can be used to guide the research activities to explore the origins of sex-related variation in the treatment effect. For example, step two focuses on assessing if the difference can be attributed to sex hormones. In addition, the manuscript provides advice on potential methods that can be used initially in laboratory studies and then in human studies.
Should the N be balanced between the sexes?
A common question when thinking about sex inclusive designs is whether to use a balanced design or a design that represents the disease distribution characteristics. For most situations, a balanced design is appropriate because it ensures a conclusion space which has equal confidence for the sexes tested. An exception could arise when studying a disease which has a very large difference in prevalence between the sexes, such that it is almost absent in one sex. For example, if the disease occurs in only 0.1% of men, the decision becomes a cost-benefit reflection on whether the benefit of including males is worth the costs (logistic, financial etc.). Drobniak et al. have also identified that an unbalanced design might be necessary if the variability in the outcome measure is very dependent on the sex45.
Statistical analysis
As researchers embrace inclusive designs, whether confirmatory or exploratory, the analysis strategy needs updating to reflect the more complex designs. Unfortunately, errors in the analysis and subsequent interpretation of the results are common (Table 1)11,46. Research has shown that incorrect analysis leads towards a positive bias of finding sex differences11. Maney and Rich-Edwards have highlighted how guidance provided by many funding agencies and publishers is encouraging inappropriate analysis and does not align with sound statistical practice47. Similarly, an evaluation of three publicly available online training courses for sex inclusive research found they were endorsing invalid analytical approaches that produce false calls48. These findings demonstrate the scale of the misunderstandings within the community. The subsequent published research is therefore at high risk of having low reproducibility, which will delay scientific advancement and could undermine the progress being made towards inclusive research that adds value for people of all sexes. There is, therefore, a critical need to engage with applied statisticians to develop further resources and make community-wide training available to enable researchers to appropriately analyse their data49.
The exact analysis needs will depend on the research goals, the design and the data characteristics. What is important is that there is a suitable analysis plan implementing a statistically appropriate strategy that both accounts for variation arising from sex and includes a statistical test of whether there is significant sex-related variation in the treatment effect. Accounting for a baseline sex difference increases sensitivity. Testing the interaction statistically assesses whether the treatment effect depends on sex, which is an important element of the sex inclusive research vision. This is the recommended strategy, even when you are not powered for detecting an interaction effect, because the statistical power will pass from the main effect to the interaction term when there is a large difference in the effect16. For many situations, you will need to embrace a factorial analysis where sex is a factor that could potentially interact with other factors in the dataset (e.g., treatment, age, strain). Figure 1 provides a layman’s description of factorial analysis and explains why embracing these more sophisticated analysis tools will maintain sensitivity and why the sample size can be shared between the males and females. Whilst these methods are more statistically sophisticated, most statistical packages will conduct the analysis for the researcher. All that is required is to set up the analysis appropriately. What is more advanced is interpreting the output with a two-step focus. First, the researcher should inspect the model output for a significant interaction, and then, if significant, focus attention on how sex explained variation in the treatment effect.
Fig. 1: In factorial analysis, a single statistical model is fitted to the data to understand how the different factors influence an outcome metric of interest. The most well-known example, suitable for a continuous normally distributed variable, is a 2-way ANOVA. In such an analysis, the data will be queried with three hypothesis questions testing two main effects (the effect of sex and treatment) and an interaction between sex and treatment. This strategy enhances sensitivity (as the data is shared between the sexes), allows the sex-related variation to be accounted for, and allows a statistical test of whether sex explains variation in the treatment effect. A and B illustrate how the same data are assessed for main (A) and interaction (B) effects. A The treatment effect is assessed by pooling data across males and females to estimate the average effect of treatment. This is demonstrated by the bluish green arrow, which indicates the difference between the control group average (including females and males) and the treatment group average (including males and females). The effect of sex is mathematically assessed by pooling data across the treatment conditions to estimate the average difference between female and male samples. This is demonstrated by the sky blue arrow, which indicates the difference between the male average (including both control and treated) and the female average (including both control and treated). B In the assessment of the interaction, the size of the treatment effect is compared between the females and males. Here, the vermillion arrow indicates the effect of treatment in males, and the blue arrow indicates the effect size of treatment in females. C-E illustrate some of the possible different outcomes from a factorial analysis. C An example of a generalizable treatment effect between the females and males where there is no baseline difference between the sexes. D An example of a generalizable treatment effect between the males and females in the context of a baseline difference. E An example where sex explains variation in the treatment effect. For both (C) and (D), the interaction is not significant, and the treatment effect can be assessed solely by looking at the significance of the treatment term in the model and then assessing the estimated treatment effect. For (E), the interaction is significant; in these situations, you would run additional analysis to estimate the treatment effect within each sex to understand where the significant differences lie.
Rich-Edwards and Maney7 elaborated upon a 4Cs framework34 based on four key steps: Consider, Collect, Characterise and Communicate to guide best practice in sex inclusive research. In the expanded framework, the advice is stratified on whether the interest is in the exploratory or confirmatory inclusion of females and males. Where this framework is particularly strong is in giving advice and recommendations on the analysis and subsequent presentation of the results.
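The factorial analysis described above can be sketched in a few lines for the simplest case. This is a minimal illustration, assuming a balanced 2x2 design (two sexes, control and treated) with the same number of animals in every cell. The function name, the dictionary layout and the data are hypothetical, and in practice a statistical package would fit the model and supply p values and diagnostics.

```python
from statistics import mean


def two_way_anova_2x2(cells, control="control", treated="treated"):
    """Balanced 2x2 ANOVA; `cells` maps (sex, treatment) -> list of outcomes."""
    sexes = sorted({sex for sex, _ in cells})
    treatments = [control, treated]
    expected_keys = {(s, t) for s in sexes for t in treatments}
    sizes = {len(values) for values in cells.values()}
    if len(sexes) != 2 or set(cells) != expected_keys or len(sizes) != 1 or min(sizes) < 2:
        raise ValueError("sketch handles only a balanced 2x2 design with n >= 2 per cell")
    n = sizes.pop()

    cell_mean = {key: mean(values) for key, values in cells.items()}
    grand = mean(cell_mean.values())
    sex_mean = {s: mean(cell_mean[(s, t)] for t in treatments) for s in sexes}
    trt_mean = {t: mean(cell_mean[(s, t)] for s in sexes) for t in treatments}

    # Sums of squares; each effect has 1 degree of freedom in a 2x2 design.
    ss_sex = 2 * n * sum((sex_mean[s] - grand) ** 2 for s in sexes)
    ss_trt = 2 * n * sum((trt_mean[t] - grand) ** 2 for t in treatments)
    ss_int = n * sum(
        (cell_mean[(s, t)] - sex_mean[s] - trt_mean[t] + grand) ** 2
        for s in sexes
        for t in treatments
    )
    ss_res = sum((y - cell_mean[key]) ** 2 for key, values in cells.items() for y in values)
    df_res = 4 * (n - 1)
    ms_res = ss_res / df_res  # assumes some within-cell variation (non-zero residual)

    effect_by_sex = {s: cell_mean[(s, treated)] - cell_mean[(s, control)] for s in sexes}
    return {
        "F_sex": ss_sex / ms_res,
        "F_treatment": ss_trt / ms_res,
        "F_interaction": ss_int / ms_res,
        "df_residual": df_res,
        "treatment_effect_by_sex": effect_by_sex,
        "average_treatment_effect": mean(effect_by_sex.values()),
    }


# Hypothetical data: a baseline sex difference plus a larger treatment effect in females.
example = {
    ("female", "control"): [10.0, 12.0, 11.0, 11.0],
    ("female", "treated"): [15.0, 17.0, 16.0, 16.0],
    ("male", "control"): [13.0, 15.0, 14.0, 14.0],
    ("male", "treated"): [16.0, 18.0, 17.0, 17.0],
}
result = two_way_anova_2x2(example)
```

Each F statistic would be compared against an F distribution with 1 and `df_residual` degrees of freedom to obtain a p value. Following the two-step focus, the interaction term is inspected first; the per-sex estimates in `treatment_effect_by_sex` are only the focus when that interaction is judged significant, otherwise `average_treatment_effect` summarises the treatment effect.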
Interpretation of the results
In addition to determining the statistical significance of the main effects and interaction, it is important to consider their biological significance. The assessment will be context-dependent and will need to consider the biology and research goals. The movement towards personalised medicine includes the call for sex-specific treatment for medical conditions50 and provides a motivating scenario for considering sex differences. Researchers will need to be mindful that findings from animal models may not always translate directly to human patients and biological relevance does not guarantee clinical relevance. A decision to explore the personalisation of clinical treatment by sex will require the effect to be quite different for the female and male samples.
In discussions on the reproducibility crisis, much attention has focused on the poor statistical practice of just using p values as the decider of whether something is interesting or not51.
Including sex makes the interpretation more complex, as both statistical and biological significance must be considered, along with reflecting on what it means for the biology of interest. Consider a research study where there is a statistically significant sex-related variation in the treatment effect such that the size of the effect is more pronounced in the females compared to the males (Fig. 2). Whether this matters is very dependent on the biology. Based on the researcher’s expertise and knowledge of the biological system, they could choose to move forward in the research pipeline by working with an average-sized effect across the males and females or could estimate the effect for each sex individually.
Fig. 2: Just because the p value is significant for the interaction doesn’t mean it is biologically relevant. From the statistical modelling, an average treatment effect across males and females can be estimated (orange arrow) or the treatment effect for each sex can be estimated individually (blue arrows). These effects along with their confidence intervals should be considered. A The treatment effect was approximately double the size in the females compared to the males. For this variable this might be considered equivalent and the average effect used. B In contrast, the treatment effect here is approximately four-fold smaller in the females and in this case the researchers, depending on the biology, might make the call that the individual estimates should be reported, and attention should focus on why the treatment effect was different.
Reflecting on the observation that 60% of continuous variables have a baseline sex difference in a trait2, it is unsurprising that the effect size can differ depending on the sex studied. This scenario is frequently seen when working with mouse models of disease where, following induction of the disease state, the outcome variable which marks the disease is often at a different point on the scale for the males and females. Depending on the biology, this might represent a more extreme disease or just a baseline difference for the expression of the disease. When reflecting on the treatment effect, it is important to be more nuanced and reflect both on the biological meaning of the estimated effect sizes and the resulting status of the animals. As highlighted in Fig. 3, focusing on p values alone could lead us awry when considering the impact of a treatment. Intuitively, a significant interaction effect between treatment and sex might suggest that there is a biologically important difference, but as illustrated in Fig. 3, panel B, it could equally be the case that differences in baseline mean the treatment outcome is the same in the females and the males. Conversely, an intuitive interpretation of a non-significant interaction might lead one to conclude that the treatment is equally effective for males and females, while Fig. 3, panel C illustrates that a similar effect size can result in a very different biological outcome.
Fig. 3: In the following scenario, the need for nuanced interpretation that focuses on the biology of the effect is demonstrated. Graphs plot the marginal means (the average value of a particular group when all other variables are held constant) and the 95% confidence intervals. A Consider a model induction scenario, where the inducing agent leads to a change in a critical parameter such that both males and females are in the diseased state (above the blue dotted line); however, the size of that effect differs between the two sexes studied. B The resulting animals are then treated with treatment 1, which had a statistically significant interaction, such that sex explained variation in the treatment effect. However, in this scenario, whilst the effect size is different, the biological impact is the same: both sexes studied are no longer in the disease state. C When the model induced animals are instead treated with treatment 2, the statistical analysis finds no significant interaction (the lines are parallel) and estimates a main effect of the treatment. However, in this scenario, whilst the effect size is equivalent, the resulting biological state is that the females are still in the disease state whilst the males are not. These hypothetical scenarios demonstrate how important it is to consider the biology in addition to the statistical significance.
Visualisation
As researchers embrace this new way of working, the community will need to learn how to interpret more complex visualisations. Frequently, scientists utilise bar plots to summarise a dataset52, which present the mean signal and typically omit any representation of variability. The popularity of such a graph is driven by the desire for a visualisation that rapidly explains the conclusion narrative of the investigation. Such a desire is the probable driver behind data being pooled. Interactions are statistically assessed by comparing the slopes of change and therefore 2-way ANOVA graphs typically have a mean line between treatment groups stratified by the second factor. If the lines are not parallel, this suggests (subject to the variance) that the treatment effect depends on the second factor. Fundamentally, it is necessary to be able to visualise the change but also the variation in the data to allow a benchmarking of this change. Garofalo et al., in an article exploring best practice for analysis of interactions, recommend reporting the model estimated marginal means with 95% confidence intervals53.
To make science feasible, a complex biological space is simplified into a testing space and then decisions about the underlying biology are incrementally made through an experimental process of applying an intervention to standardised groups and observing what happens. This approach has integrated with statistical hypothesis testing resulting in the p value being treated as a gate keeper of truth, returning significant or not significant with infrequent consideration of the size of the effect. The data is then often presented as bar plots54, implying an absolute truth of effect. In the context of the reproducibility crisis, numerous publications have been highlighting that there isn’t a pure treatment effect13,55. Sex inclusive designs are one element of improving the generalisability of research, but this will require researchers to become more comfortable with biological variation and that there is variation in the treatment response.
It is important to note that, as designs become more complex, researchers will become more reliant on more complex statistical models, and interpretation will then rely on the estimated effects and their confidence intervals. In these situations, it is important to run diagnostics to ensure that the model represents the data well and is appropriate for the design. The statistical model output and graphs showing the estimated effects with confidence intervals should then be shared53. As we continue on this journey of inclusive designs, further discussion and resources are needed to provide advice on best practice for visualising inclusive designs, reporting effect sizes and, finally, sharing data.
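One very simple diagnostic of this kind can be sketched in Python as follows. The snippet computes residuals from a cell-means model (each observation minus the mean of its sex-by-treatment group) and compares the spread of the groups, since a common assumption of ANOVA-style models is roughly equal variance across groups. The data and the names `design_data`, `residuals_by_cell` and `spread_ratio` are hypothetical, and this crude check is an assumption-laden illustration only; it does not replace residual plots or the diagnostics provided by statistical software.

```python
from statistics import mean, stdev

# Hypothetical measurements, invented purely for illustration.
design_data = {
    ("female", "control"): [10.1, 9.8, 10.4, 10.0, 9.7],
    ("female", "treated"): [12.9, 11.2, 13.1, 12.0, 11.3],
    ("male", "control"): [10.0, 10.3, 9.9, 10.2, 9.6],
    ("male", "treated"): [10.9, 11.2, 10.8, 11.1, 10.5],
}


def residuals_by_cell(data):
    """Residuals from the cell-means model: observation minus its group mean."""
    return {
        cell: [value - mean(values) for value in values]
        for cell, values in data.items()
    }


def spread_ratio(data):
    """Largest group standard deviation divided by the smallest.
    A ratio well above 1 hints that the equal-variance assumption deserves a closer look."""
    sds = [stdev(values) for values in data.values()]
    return max(sds) / min(sds)


for cell, residuals in residuals_by_cell(design_data).items():
    largest = max(abs(r) for r in residuals)
    print(f"{cell[0]:>6} {cell[1]:<8} largest absolute residual={largest:.2f}")

print(f"ratio of largest to smallest group SD={spread_ratio(design_data):.2f}")
```

If the ratio is large, or a few residuals dominate, that is a prompt to inspect the raw data and reconsider the model before interpreting the estimated effects.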
Computational modelling
Downstream of experiments, data and computer power are used to simulate and study complex systems. The changes around inclusion will therefore have far-reaching consequences. Although it is early days, the following examples highlight some opportunities that arise as researchers embrace this new paradigm. Burrowes et al., in a review article considering whole-body physiological in silico models for human health applications, concluded that the consideration of sex differences had been minimal and had only started to appear in recent years56. They highlight that incorporating sex requires an understanding of the sex-related differences in biology, physiology and pathophysiology. They argue that inclusion is a critical next step and provide a checklist for appropriate consideration and inclusion in model development. Furthermore, they highlight that such models will not only incorporate these differences but also have the potential to assess their impact on emergent function, furthering the understanding of the differences.
Ontologies allow researchers to categorise observations and identify the key relationships among those concepts. They are an important tool for allowing research data to be ingested in a computer-readable format, which enables more complex modelling and exploration of the data across studies57. For example, ontologies have been used in cross-species comparisons with data from the International Mouse Phenotyping Consortium: specific disease and/or patient phenotypes described with the Human Phenotype Ontology were aligned with mouse phenotypes described with the Mammalian Phenotype Ontology, an approach that has been used to discover novel disease models for orphan diseases58. Scientific ontologies are not static: the terms and relationships must evolve to align with the current understanding of the discipline and with the ways in which researchers are thinking about the research57. Currently, the Mammalian Phenotype Ontology is used to classify an effect as significant in males and females, in males alone or in females alone. These ontologies will need to evolve to establish methods that can capture the relationship between sex and the effect of interest.
Artificial Intelligence (AI) is a key, rapidly advancing tool that is being used to accelerate research across many disciplines. For example, it is becoming apparent that large language models, a type of generative AI, are introducing game-changing possibilities. As they mine the literature and databases, the quality of the analysis and communication of the results is going to be critical. A common proverb applied to such systems is “you get out what you put in”. There is an urgent need to improve the quality of inclusive research output to ensure that the resources using it as input produce output that is valuable.
Moving beyond the binary construct
In the future, researchers will need to go beyond the binary construct when thinking about sex inclusive research. McLaughlin et al. highlight, with a number of case studies from ecology and evolution research, that an expanded framework to represent sex, which uses multivariate and non-binary variables, will better allow an exploration of the biology26. Likewise, Sanchis-Segura and Wilcox59 challenged the use of sex as a binary classification, highlighting the risk of a strategy that rests on a comparison of means. They argue that this may impede advances toward precision medicine and advocate for the selection of continuous variables that represent sex-related variation. Pape et al. argue that, to truly understand sex differences, it is necessary to reflect on and select concrete, measurable, sex-related variables which, for the system being studied, would provide plausible mechanisms for what is driving the sex-related differences21.
Embracing these strategies will start to unpack the biology, improve understanding and, hopefully, provide more clinically relevant insights. However, it is important to place this in the context of the research pipeline. As sex differences start to be understood, this paves the way for more nuanced approaches, but at this point in time, for most researchers, the goal of inclusion is to improve generalisability. Arnold et al. raised a concern that critiques of research on sex differences from different viewpoints, together with worries over the potential misinterpretation or misuse of findings, could undermine the progress on inclusion60. This can be avoided by improving the community’s understanding of the goal of inclusion, the language used, the quality of the analysis, and the presentation of the results.
Conclusions
Over the last decade, the landscape around sex inclusive research has completely changed, from a few scientists reporting on an embedded bias to a mainstream topic. There is now tangible change occurring on the ground: the proportion of published work that includes more than one sex has significantly increased8,9. This has been driven by active research, discussion, and exploration of the issues by many individuals across the research landscape. Including females and males in preclinical research impacts the whole research pipeline, from planning through analysis, visualisation and communication to modelling, and there is much to be done for the benefit to be fully realised. As we navigate these paradigm shifts, lessons have been learnt and new ones will continue to arise. Consequently, there will be a need to revisit earlier frameworks and guidelines and integrate these learnings to keep them relevant and up to date.
I would argue, then, that sex inclusive research is a success story. We, the research community, are making significant strides forward as we embrace the paradigm shift. As we embrace this new way of working, the rewards will be more representative, replicable conclusions and the identification of opportunities where exploring the variation associated with sex will illuminate new paths to investigate its underlying biological causes.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Abbreviations
- AI: Artificial Intelligence
- CRISPR: clustered regularly interspaced short palindromic repeat
- DISS: Differences in sex-specific significance error
- SAGER: Sex and Gender Equity in Research
- SIRF: Sex Inclusive Research Framework
References
Nunamaker, E. A. & Turner, P. V. Unmasking the Adverse Impacts of Sex Bias on Science and Research Animal Welfare. Animals 13, 2792 (2023).
Karp, N. A. et al. Prevalence of sexual dimorphism in mammalian phenotypic traits. Nat. Commun. 8, 15475 (2017).
Mogil, J. S. Qualitative sex differences in pain processing: emerging evidence of a biased literature. Nat. Rev. Neurosci. 21, 353–365 (2020).
Clayton, J. A. Studying both sexes: a guiding principle for biomedicine. FASEB J. 30, 519 (2016).
Lee, S. K. Sex as an important biological variable in biomedical research. BMB Rep. 51, 167 (2018).
MRC. Sex in experimental design - Guidance on new requirements https://www.ukri.org/councils/mrc/guidance-for-applicants/policies-and-guidance-for-researchers/sex-in-experimental-design/: UK Research and Innovation; 2022.
Rich-Edwards, J. W. & Maney, D. L. Best practices to promote rigor and reproducibility in the era of sex-inclusive research. Elife 12, e90623 (2023).
Rechlin, R. K., Splinter, T. F., Hodges, T. E., Albert, A. Y. & Galea, L. A. An analysis of neuroscience and psychiatry papers published from 2009 and 2019 outlines opportunities for increasing discovery of sex differences. Nat. Commun. 13, 2137 (2022).
Woitowich, N. C., Beery, A. & Woodruff, T. A 10-year follow-up study of sex inclusion in the biological sciences. Elife 9, e56344 (2020).
Kim, H., Park, J., Ahn, S. & Lee, H. The impact of sex/gender-specific funding and editorial policies on biomedical research outcomes: a cross-national analysis (2000–2021). Sci. Rep. 14, 26599 (2024).
Garcia-Sifuentes, Y. & Maney, D. L. Reporting and misreporting of sex differences in the biological sciences. Elife 10, e70817 (2021).
McCullough, L. D., McCarthy, M. M. & de Vries, G. J. NIH policy: Status quo is also costly. Nature 510, 340 (2014).
Voelkl, B. et al. Reproducibility of animal research in light of biological variation. Nat. Rev. Neurosci. 21, 384–393 (2020).
Karp, N. A. Reproducible preclinical research—Is embracing variability the answer?. PLoS Biol. 16, e2005413 (2018).
Sandberg, K., Verbalis, J. G., Yosten, G. L. & Samson, W. K. Sex and basic science. A Title IX position. Am. J. Physiol.-Regul. Integr. Comp. Physiol. 307, R361–R365 (2014).
Phillips, B., Haschler, T. N. & Karp, N. A. Statistical simulations show that scientists need not increase overall sample size by default when including both sexes in in vivo studies. Plos Biol. 21, e3002129 (2023).
Karp, N. A. & Reavey, N. Sex bias in preclinical research and an exploration of how to change the status quo. Br. J. Pharm. 176, 4107–4118 (2019).
Gompers, A. Three years in:“sex as a biological variable” policy in practice-and an invitation to collaborate. Genderscilab; 2018.
Shansky, R. M. & Murphy, A. Z. Considering sex as a biological variable will require a global shift in science culture. Nat. Neurosci. 24, 457–464 (2021).
Barkan S. E. Sociology: Understanding and changing the social world: Flat World Knowledge, Incorporated Boston, MA; 2011.
Pape, M. et al. Sex contextualism in laboratory research: enhancing rigor and precision in the study of sex-related variables. Cell 187, 1316–1326 (2024).
Heidari, S., Babor, T. F., De Castro, P., Tort, S. & Curno, M. Sex and Gender Equity in Research: rationale for the SAGER guidelines and recommended use. Res Integr. Peer Rev. 1, 2 (2016).
Velocci, B. The history of sex research: Is “sex” a useful category?. Cell 187, 1343–1346 (2024).
Jarne, P. & Auld, J. R. Animals mix it up too: the distribution of self-fertilization among hermaphroditic animals. Evolution 60, 1816–1824 (2006).
Kilkenny, C., Browne, W. J., Cuthill, I. C., Emerson, M. & Altman, D. G. Improving bioscience research reporting: the ARRIVE guidelines for reporting animal research. J. Pharmacol. Pharmacother. 1, 94–99 (2010).
McLaughlin, J. et al. Multivariate models of animal sex: breaking binaries leads to a better understanding of ecology and evolution. Integr. Comp. Biol. 63, 891–906 (2023).
Richardson S. S. Sex contextualism. Philosophy, Theory, and Practice in Biology. 14. (2022)
Wikipedia. George Murray Levick. Accessed: 2024 Available from: https://en.wikipedia.org/wiki/George_Murray_Levick.
Özdemir, B. C., Richters, A., da Silva, C. E. & Berner, A. M. Cancer researchers’ perceptions of the importance of the sex of cell lines, animals, and human samples for cancer biology research. Iscience 26, 106212 (2023).
Holland, A. & Bradbury, N. A. Did you forget your cell sex? An update on the inclusion of sex as a variable in AJP-Cell Physiology. Am. J. Physiol.-Cell Physiol. 324, C910–C926 (2023).
Geraghty, R. et al. Guidelines for the use of cell lines in biomedical research. Br. J. Cancer 111, 1021–1046 (2014).
Kuhn, M., Santinha, A. J. & Platt, R. J. Moving from in vitro to in vivo CRISPR screens. Gene Genome Editing 2, 100008 (2021).
Deasy, B. M. et al. A role for cell sex in stem cell–mediated skeletal muscle regeneration: female cells have higher muscle regeneration efficiency. J. cell Biol. 177, 73–86 (2007).
Clayton, J. A. Applying the new SABV (sex as a biological variable) policy to research and clinical care. Physiol. Behav. 187, 2–5 (2018).
Berkley, K. J. Vive la difference. Trends Neurosci. 15, 331–332 (1992).
Beery, A. K. & Zucker, I. Sex bias in neuroscience and biomedical research. Neurosci. Biobehav Rev. 35, 565–572 (2011).
Becker, J. B., Prendergast, B. J. & Liang, J. W. Female rats are not more variable than male rats: a meta-analysis of neuroscience studies. Biol. Sex. Differ. 7, 1–7 (2016).
Prendergast, B. J., Onishi, K. G. & Zucker, I. Female mice liberated for inclusion in neuroscience and biomedical research. Neurosci. Biobehav. Rev. 40, 1–5 (2014).
Beery, A. K. Inclusion of females does not increase variability in rodent research studies. Curr. Opin. Behav. Sci. 23, 143–149 (2018).
Waltz, M., Fisher, J. A., Lyerly, A. D. & Walker, R. L. Evaluating the National Institutes of Health’s sex as a biological variable policy: conflicting accounts from the front lines of animal research. J. Women’s Health 30, 348–354 (2021).
MRC. Report from the Working Group on Sex in Experimental Design of Animal Research 6th September 2021 [Accessed: 10th March 2023].
Karp, N. A. et al. The Sex Inclusive Research Framework to address sex bias in preclinical research proposals. Nat. Commun. 16, 3763 (2025).
Woitowich, N. C. & Woodruff, T. K. Implementation of the NIH sex-inclusion policy: attitudes and opinions of study section members. J. Women’s Health 28, 9–16 (2019).
Becker, J. B. et al. Strategies and methods for research on sex differences in brain and behavior. Endocrinology 146, 1650–1673 (2005).
Drobniak, S. M., Lagisz, M., Yang, Y. & Nakagawa, S. Realism and robustness require increased sample size when studying both sexes. Plos Biol. 22, e3002456 (2024).
Duffy, K. A. & Epperson, C. N. Evaluating the evidence for sex differences: a scoping review of human neuroimaging in psychopharmacology research. Neuropsychopharmacology 47, 430–443 (2022).
Maney, D. L. & Rich-Edwards, J. W. Sex-inclusive biomedicine: are new policies increasing rigor and reproducibility?. Women’s Health Issues 33, 461–464 (2023).
Gompers, A., Olivier, M. T. & Maney, D. L. Training in the implementation of sex and gender research policies: an evaluation of publicly available online courses. Biol. Sex. Differ. 15, 32 (2024).
Hunt, L., Nielsen, M. W. & Schiebinger, L. A framework for sex, gender, and diversity analysis in research. Science 377, 1492–1495 (2022).
Miller, V. M., Rocca, W. A. & Faubion, S. S. Sex differences research, precision medicine, and the future of women’s health. J. Women’s Health 24, 969–971 (2015).
Nakagawa, S. & Cuthill, I. C. Effect size, confidence interval and statistical significance: a practical guide for biologists. Biol. Rev. 82, 591–605 (2007).
Weissgerber, T. L. et al. Reveal, don’t conceal: transforming data visualization to improve transparency. Circulation 140, 1506–1518 (2019).
Garofalo, S., Giovagnoli, S., Orsoni, M., Starita, F. & Benassi, M. Interaction effect: Are you doing the right thing?. PLoS One 17, e0271668 (2022).
Kick the bar chart habit. Nat. Methods. 11, 113 (2014).
Kafkafi, N. et al. Reproducibility and replicability of rodent phenotyping in preclinical studies. Neurosci. Biobehav. Rev. 87, 218–232 (2018).
Burrowes, K., Ruppage, M., Lowry, A. & Zhao, D. Sex matters: the frequently overlooked importance of considering sex in computational models. Front. Physiol. 14, 1186646 (2023).
National Academies of Sciences, Engineering, and Medicine. Ontologies in the Behavioral Sciences: Accelerating Research and the Spread of Knowledge (eds Beatty, A. S. & Kaplan, R. M.) (National Academies Press, Washington, DC, 2022).
Graham, D. M. Scaling up disease model discovery. Lab Anim. 46, 334 (2017).
Sanchis-Segura, C. & Wilcox, R. R. From means to meaning in the study of sex/gender differences and similarities. Front. Neuroendocrinol. 73, 101133 (2024).
Arnold, A. P., Klein, S. L., McCarthy, M. M. & Mogil, J. S. Male–female comparisons are powerful in biomedical research—don’t abandon them. Nature 629, 37–40 (2024).
Nieuwenhuis, S., Forstmann, B. U. & Wagenmakers, E.-J. Erroneous analyses of interactions in neuroscience: a problem of significance. Nat. Neurosci. 14, 1105–1107 (2011).
Acknowledgements
NAK would like to acknowledge the use of a Large Language Model to provide grammar suggestions during the writing process. Funding source: This project received no external funding.
Ethics declarations
Competing interests
The author declares the following competing interests: NAK is an employee of AstraZeneca and has shareholdings in AstraZeneca.
Peer review
Peer review information
Communications Biology thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editors: Rosie Bunton-Stasyshyn & Christina Karlsson Rosenthal. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Karp, N.A. Navigating the paradigm shift of sex inclusive preclinical research and lessons learnt. Commun Biol 8, 681 (2025). https://doi.org/10.1038/s42003-025-08118-4
DOI: https://doi.org/10.1038/s42003-025-08118-4