Introduction

Within preclinical research, there is an endemic and persistent sex bias whereby research is predominately conducted using a single sex, typically male animals or male cell lines1,2,3. This can result in our fundamental biological knowledge being biased4. To redress this imbalance and improve the translation of scientific findings between humans and other animals, numerous funding bodies have released inclusion mandates5,6 that require the automatic inclusion of females and males unless a strong and valid justification is provided.

To date, these policies do not require scientists to study differences between males and females as a primary research objective, but instead aim to improve the generalizability of studies by taking sex into account in the experimental design and statistical analysis. This can be achieved by estimating from males and females an average effect of the experimental intervention and by visualizing and analyzing data in such a way that, if there is a large sex difference in the intervention effect, this will be detected. In most situations, this will involve a formal test of whether sex explains variation in the intervention effect. For funders, regulators or ethical review bodies to apply these policies in a systematic and consistent manner, there is a need for resources to help assess whether a research proposal is compliant with sex inclusive mandates. Not only do many scientists struggle to include females and males in their experiments1,2,3 but, when data from males and females are collected, there is often unequal representation7 and inappropriate visualization and analysis of the data8,9. It is therefore important to encourage both balanced inclusion and appropriate analysis.

Research has shown that scientists are generally supportive of sex as an important experimental variable but there are barriers to implementing sex inclusive designs10,11. Frequently, the cited barriers are culturally embedded misconceptions. These include the mistaken perspectives that outcome measures are inherently more variable in females12,13, that sex differences will introduce variability in the data, decreasing statistical sensitivity14, or that studying females and males will increase the number of animals needed10,11, escalating the cost and undermining compliance with the 3Rs (Reduction)15. Some researchers have identified welfare concerns that introduce logistical challenges (for example, the need for single housing to reduce male mouse aggression)15. Fear of change has also been identified as a factor that preserves the status quo10. These barriers contribute to the observation that there is significant scope for improvement in compliance with funders’ inclusive research policies7,16. Many researchers are not aware of the criteria that must be met to justify the use of only one sex or how to accommodate females and males in their experimental design and data analysis.

Here we present a framework to rapidly assess an in vivo or ex vivo research proposal to determine whether the proposal is sex inclusive, whether the sexes are balanced and whether analysis plans have appropriately considered sex-related variation. When a proposal includes only one sex, the framework evaluates whether the justification is scientifically appropriate and not based on common misconceptions. The framework fulfils multiple objectives: (1) to provide transparency in the assessment process for both researchers and those evaluating the funding proposals, thus aligning expectations, (2) to deliver reproducible and unbiased evaluation of the proposal in regard to sex inclusion, and (3) to help address common misconceptions and encourage researchers to provide considered justifications that will enable a better understanding of when sex inclusive research is possible.

The development of the framework

An original decision tree concept was initiated by a working group of community leaders (the authors of this manuscript) that included representatives from industry, academia, animal ethical review committees and funding review communities. The working group was constructed to span a wide range of expertise (Supplementary Data 1). The framework was developed based on our collective experience of conducting animal research and reviewing research proposals, on funders’ policies on sex in experimental design (such as the MRC’s and CRUK’s), and on common questions and misconceptions that we had encountered in our interactions with preclinical researchers. Initial usability testing involved evaluating a collection of 30 published rationales for single sex experiments or lack of sex-based analyses8, to assess whether the decision tree would return a clear classification. This process refined the decision tree, for example by introducing a question to assess whether the justification was a statement that the disease model could only be induced in one sex. The decision tree was then shared with eight UK animal ethical review bodies and informal feedback was collected to assess accessibility and to help develop FAQs. No additional training material was provided as the Sex Inclusive Research Framework (SIRF) was designed to be a self-contained resource with links to further reading if required. Finally, members of the working group, who are all members of a UK animal ethical review body and/or a funding body, also tested the usability of the draft framework by applying it to 36 research proposals and assessing whether it was feasible to navigate the decision tree until a classification was reached. This review further refined the decision tree; for example, we identified questions that could not be answered in a dichotomous manner because they assessed multiple elements simultaneously. These are now split into separate questions. The wording of the questions was also altered to ensure usability for proposals that contain non-comparative experiments.

The Sex Inclusive Research Framework

The framework (Fig. 1) is centred on a decision tree of up to 12 questions and includes detailed supporting information for each question, consisting of assessment advice and a rationale for including the question. The framework can be executed via a PDF document (Supplementary Information 1) or via an interactive web interface. The interface returns a report which can be submitted alongside the research proposal to the assessment body. To support the use of the framework, the website17 contains supporting information including a recorded seminar on the framework, FAQs, and some example classifications from a published dataset of single-sex justifications.

Fig. 1: The Sex Inclusive Research Framework decision tree.

Underpinning the Sex Inclusive Research Framework is a decision tree consisting of twelve questions which, when applied to a research proposal, results in the assignment of one or more traffic light outcome classifications. These indicate whether a proposal is appropriate (green thumbs up symbol), carries some risk (amber caution hand symbol) or is insufficient with regard to sex inclusion (red thumbs down symbol).
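To make the idea of accumulating traffic light outcomes concrete, the following is a minimal sketch in Python. It does not reproduce the twelve questions of Fig. 1: the four yes/no answers, the `Proposal` and `classify` names, and the mapping of answers to colours are all simplifying assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from enum import Enum


class Light(Enum):
    GREEN = "appropriate"
    AMBER = "carries some risk"
    RED = "insufficient"


@dataclass
class Proposal:
    # Hypothetical yes/no answers a reviewer might record for one experiment.
    both_sexes_included: bool
    sexes_balanced: bool = False
    analysis_considers_sex: bool = False
    single_sex_justified: bool = False


def classify(p: Proposal) -> list[tuple[Light, str]]:
    """Walk a simplified, hypothetical tree and collect every outcome reached."""
    if not p.both_sexes_included:
        # Assumed mapping: a single-sex design stands or falls on its justification.
        if p.single_sex_justified:
            return [(Light.GREEN, "single sex with an appropriate justification")]
        return [(Light.RED, "single sex without an appropriate justification")]
    outcomes = []
    # Assumed mapping: design and analysis warnings accumulate as amber outcomes.
    if not p.sexes_balanced:
        outcomes.append((Light.AMBER, "unequal representation of the sexes"))
    if not p.analysis_considers_sex:
        outcomes.append((Light.AMBER, "analysis plan does not consider sex-related variation"))
    return outcomes or [(Light.GREEN, "sex inclusive design and analysis")]


# A balanced, inclusive design whose analysis plan ignores sex.
example = classify(Proposal(both_sexes_included=True, sexes_balanced=True))
```

In this sketch a proposal can collect more than one amber outcome, which mirrors the statement that one or more classifications may be assigned; the actual routing is defined by the published decision tree and its supporting information.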

Comparison of the SIRF with other frameworks

When the National Institutes of Health (NIH) policy was launched, the NIH developed a flow chart18 of questions to assist reviewers in evaluating whether grant proposals comply with the policy. The first question explores whether the proposal is going to study vertebrate animals or humans. If not, then a sex-inclusive design is not considered further. The Sex and Gender Equity in Research (SAGER) guidelines19 advise that the starting point is to reflect on whether the sex of the research subject can be determined. This is a more inclusive position (for example, it would include ex vivo samples such as tissues) and is adaptive to the research question and design (for example, if collecting data at the level of a cluster of animals, such as a herd or a litter group). Consequently, the SIRF explicitly asks whether the sex of the sample can be determined. The NIH flow chart then proceeds to consider whether the proposal is intending to study sex differences. If yes, the flow chart advises the reviewers to assess whether the design and analysis are appropriate for this objective. In comparison, the SIRF provides prompts for the experimental design but does not provide any evaluation prompts for the statistical analysis. For all other experiments, the NIH flow chart then instructs reviewers to assess whether males and females are included in the study, and if only a single sex is included, a strong justification is required. No guidance is offered to the reviewers to assist in evaluating the justification.

Where the research proposals include females and males, the NIH flow chart then asks whether the researchers plan to report data disaggregated by sex. A lack of specificity in the NIH flow chart introduces risks. Firstly, a proposal could intend to use males and females but to study them separately; this approach does not allow for assessment of whether sex explains variation in the response. Secondly, the terminology used encourages incorrect statistical analysis and feeds into the misconception that inclusive designs have lower power because the analysis is disaggregated20. The SIRF, through targeted questions, evaluates the justification and is aligned with the SAGER guidelines, and therefore builds considerably on the NIH flow chart.

Usselman et al.21 wrote a review article providing advice on best practice in the use of sex and gender in cardiovascular research. The authors ended the article with a decision tree for researchers to facilitate incorporation of sex as a biological variable according to the guidelines. It consists of four questions which encourage the researchers to consider the role of sex in the disease and pathology, recommends the inclusion of males and females, and then gives some guidance on data presentation depending on whether a sex-related difference was observed and, if so, the nature of the differences. The decision tree is not the primary focus of the article and lacks supporting information to explain the questions and terminology used.

Rich-Edwards et al.20 have provided a 4Cs framework based on the process of Consider, Collect, Characterize and Communicate. This framework supports researchers planning studies by highlighting best practice in the research pipeline from planning to design and analysis. For example, it includes prompts around operationalizing sex, conducting a literature review, and committing to an exploratory or confirmatory approach to evaluating sex differences, and it includes advice on including a statistical interaction test in the analysis. The framework then provides clear guidance for both exploratory and confirmatory studies on how the results should be interpreted and reported. However, it does not provide any evaluation or guidance on what qualifies as a suitable justification for using only one sex. This framework is helpful for those wishing to understand the presentation of inclusive research and covers not only studies where males and females are included to improve generalizability (exploratory studies) but also those actively exploring sex-related differences (confirmatory studies).

The SIRF working group made an active decision to advise that inclusion of an interaction test was the recommended strategy. However, the framework ultimately leaves the evaluation of the analysis to the reviewer and asks a more generic question about whether the analysis plan adequately considers sex-related variation. This decision was taken because the SIRF needs to be generic and suitable for a wide variety of research questions and analysis plans. As such, we could envisage situations where a factorial analysis with an interaction term would not be applicable. In short, one could consider the 4Cs as complementary to the SIRF, with the SIRF being used in the generation and evaluation of research proposals and the Rich-Edwards tool in the generation and evaluation of research reports.

The flowchart proposed by Beltz et al.22 begins with a query on whether there are sex differences, thus implying that both males and females are already included. The decision tree is focused on whether a sex difference is expected and the anticipated nature of that difference, therefore determining different strategies for analysis. They provide examples of analysis paths that are the most appropriate for different research scenarios. This framework provides useful definitions and assists in guiding analysis strategies after the data have been collected. However, it does not provide explicit information on how to include sex in the initial design or how to justify a single-sex experiment.

Becker et al.23 developed a decision tree to guide researchers in a logical set of experimental steps that can be used to understand the biological origins of observed sex-related differences. For example, the decision tree advises the second step would be to explore the role of sex hormones at the time of testing and the associated paper gives advice on this. This can be useful for researchers who have conducted inclusive designs and identified statistically and biologically meaningful sex-related variability in the treatment response and wish to understand the source of this effect. This again would be complementary to the SIRF.

We have presented a comprehensive assessment framework, which will guide both funders and researchers into better practice in implementation of sex inclusive policy. Whilst other frameworks have been put forward, SIRF provides a substantial evolution in guidance with a focus on addressing the embedded cultural barriers within the community.

Implementation of the SIRF framework

Effective implementation of sex inclusive research policies requires funding bodies to provide training and guidance for applicants and evaluators24. This framework aligns with this need by providing a knowledge-base and practical support for the implementation of sex inclusive research policies with a system openly accessible to staff, applicants and evaluators.

The framework may be used in several ways. Researchers could use the framework before submitting a proposal or application to evaluate their position in a manner consistent with how it will be evaluated by a funding panel or ethical review body. Research funding bodies or ethical review body assessors could independently evaluate research proposals, either with the PDF format or through the web tool. Alternatively, the assessors could request that applicants submit a report as evidence of the applicant’s assessment of their justification and review the classification provided.

The assessing bodies will need to explore the applicability of the framework to their specific area. For example, the framework includes a question on whether the design has equal representation of males and females. If the information is not collected during the application process, even in a situation where males and females are included, warnings might accumulate around the design or analysis. There are several options for the assessing body: proceed with this potential risk; request additional information; adjust the decision tree for their application process; or adjust the application process. An assessment of national funding agencies’ sex, gender and diversity analysis policies concluded that funders should provide applicants and evaluators with similar forms and instructions to promote consistency across the research process24. This framework could help provide a more consistent evaluation for efficient engagement by the research community.

The framework is designed to provide guidance to evaluate research proposals from the sex inclusive perspective. However, as previously discussed, many of the questions require some subjective evaluations and, as a consequence, the consistency of the outcome will depend on the experience and knowledge of the reviewer. This is why Hunt et al.24 recommended that training for reviewers and applicants alike should be the responsibility of funders and education/research institutes alongside the implementation of guidance and tools.

Going beyond the binary construct

Though the terms sex and gender have often been used interchangeably, sex refers to a set of biological attributes in humans and animals whilst gender refers to the socially constructed roles, behaviours, expressions and identities of female, male and gender-diverse people19. Clinically, both sex and gender are now understood to be multi-faceted, variable and non-binary25. Within preclinical research, with our current understanding, researchers can only explore the impact of sex for other organisms, because anything equivalent to the human experience of gender is inaccessible so far. For reproducing species, it is commonly assumed that there are two sexes (female and male), an assumption that aligns neatly with a societal culture structured around the concept that sex is a binary, biological truth26. However, as with gender, evidence frequently contradicts this position. For example, it is estimated that roughly 5–6% of animal species are hermaphroditic27. There are multiple traits, such as genetic, endocrinological and anatomical features, that can be used to categorize an individual’s sex28. However, there are no universally agreed guidelines for defining sex26. McLaughlin et al. argue that instead of assuming binary sex, for some systems, it may be better to categorize sex as multivariate and non-binary, as this approach would support new exploration of biological variation29,30. As sex is complex, when studying it and disseminating results, it is important to give context and specificity29. During research, we therefore need to operationalize sex by defining and reporting the concrete and measurable variables that were used to distinguish females and males31, for example, a visual assessment of primary and secondary sexual organs, or an assessment of chromosomes or hormone concentrations. This will increase reproducibility31 and facilitate research into systems and species that do not align with the current perceived norm of binary sexual phenotype29.

A concern has been raised that over-emphasizing sex differences can lead to stereotyping and continues to feed into a cultural mindset that males and females are profoundly and systematically different32. Furthermore, the analysis approach can feed into this. For example, treating sex as a categorical factor of interest can lead to the perception that sex is an underlying causal mechanism31. However, sex is a category that is represented by multiple mechanisms and therefore it is not sex itself that drives the sex-related variation but one or more of the underlying mechanisms that are associated with the sex category. Future research to understand sex differences will therefore need to carefully consider the study and select concrete, measurable, sex-related variables which provide plausible mechanisms to understand what is driving the sex-related differences. Such a strategy will improve precision of understanding and provide more clinically relevant insights31.

When presenting research, where the research design has used the proxy categories of male and female, the terminology needs to be mindful and appropriately describe the sex-related variation. Research into sex-related variation finds it is more typical for sex to lead to a different size intervention effect for females compared to males33. However, the terminology frequently used implies stark differences31. For example, the term dimorphism refers to distinct phenotypic forms and sex-specific represents a phenotype that occurs in only one sex32.

Statistical analysis of sex-inclusive research

There are two challenges that need to be addressed to ensure research is representative from the perspective of sex: inclusion and appropriate statistical analysis. Among published studies including females and males, a large proportion apply inappropriate analysis strategies9. These mistakes include splitting the data (disaggregating) and statistically testing the sexes separately, or pooling the data and ignoring sex in the analysis when sex explains variation in the data. Errors such as these run the risk of greatly reducing statistical power14 and failing to provide statistical evidence for the conclusions made (e.g. omitting a statistical test that would support a statement that the treatment effect depends on sex)8,9,34. In particular, disaggregation (whereby separate tests are applied to males and females) often results in the inappropriate comparison of p values, where a significant effect in one sex but not the other is interpreted as a sex-related difference35. Further resources and community-wide training will be needed to enable researchers to appropriately analyze their data24.
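The pitfall of comparing disaggregated tests can be illustrated with a small sketch in Python. The data are made up for illustration, the function names are hypothetical, and the analysis assumes a continuous outcome with equal variance in every group; the two critical values are standard two-sided 5% values from t tables.

```python
from math import sqrt
from statistics import mean


def sum_sq(values):
    """Sum of squared deviations from the group mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def treatment_t(control, treated):
    """Pooled two-sample t statistic for one sex analysed on its own."""
    df = len(control) + len(treated) - 2
    pooled_var = (sum_sq(control) + sum_sq(treated)) / df
    se = sqrt(pooled_var * (1 / len(control) + 1 / len(treated)))
    return (mean(treated) - mean(control)) / se


def interaction_t(f_control, f_treated, m_control, m_treated):
    """t statistic for the sex-by-treatment interaction (difference of effects)."""
    cells = [f_control, f_treated, m_control, m_treated]
    df = sum(len(c) for c in cells) - 4
    pooled_var = sum(sum_sq(c) for c in cells) / df
    se = sqrt(pooled_var * sum(1 / len(c) for c in cells))
    diff = (mean(f_treated) - mean(f_control)) - (mean(m_treated) - mean(m_control))
    return diff / se


# Made-up outcome values, four animals per group.
f_control, f_treated = [8, 14, 10, 12], [13, 19, 15, 17]
m_control, m_treated = [8, 14, 10, 12], [11, 17, 13, 15]

T_CRIT_DF6 = 2.447   # two-sided 5% critical value, 6 degrees of freedom
T_CRIT_DF12 = 2.179  # two-sided 5% critical value, 12 degrees of freedom

t_female = treatment_t(f_control, f_treated)
t_male = treatment_t(m_control, m_treated)
t_inter = interaction_t(f_control, f_treated, m_control, m_treated)

female_significant = abs(t_female) > T_CRIT_DF6
male_significant = abs(t_male) > T_CRIT_DF6
interaction_significant = abs(t_inter) > T_CRIT_DF12
```

With these numbers the separate tests cross the threshold in females but not in males, yet the direct test of whether the treatment effect differs between the sexes is far from the threshold. Reading the disaggregated results as a sex-related difference would therefore not be supported by the data.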

The SIRF has been developed to evaluate research proposals and thus covers a broad range of data types, biological questions and experimental designs. Consequently, it is beyond the scope of the SIRF to provide exhaustive guidance on how to analyze data from sex inclusive research studies. Typically, the most appropriate statistical strategy will be to apply a factorial model, with sex as a factor that potentially interacts with the other factors in the dataset (e.g., treatment, genetic status, etc.). This enables evaluation of the main effect of sex in the data, alongside testing whether the sexes respond differently to the other experimental manipulations (e.g., are females more affected than males by a compound or genetic knockout?). Importantly, this strategy ensures that statistical power to test the manipulation of interest is generally not lost when including males and females, whilst retaining the ability to detect large, sex-related effects14,36,37.
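The claim that power for the manipulation of interest is generally retained can be checked with simple arithmetic, sketched below in Python. The sketch assumes a continuous outcome, a balanced two-arm design, the same residual standard deviation in both sexes and the same total number of animals in both designs; the function names and the example values are hypothetical.

```python
from math import sqrt


def se_single_sex(sigma, total_animals):
    """Standard error of the treatment effect when all animals are one sex."""
    per_arm = total_animals / 2
    return sigma * sqrt(1 / per_arm + 1 / per_arm)


def se_sex_inclusive(sigma, total_animals):
    """Standard error of the sex-averaged treatment effect in a 2 x 2 design."""
    per_cell = total_animals / 4
    se_one_sex = sigma * sqrt(1 / per_cell + 1 / per_cell)
    # The average of two independent sex-specific effects has half the
    # standard error of their sum.
    return sqrt(se_one_sex ** 2 + se_one_sex ** 2) / 2


sigma, total = 2.0, 16  # illustrative residual SD and total number of animals
single = se_single_sex(sigma, total)
inclusive = se_sex_inclusive(sigma, total)

# The factorial model estimates four cell means instead of two.
residual_df = {"single_sex": total - 2, "sex_inclusive": total - 4}
```

Under these assumptions the two standard errors are identical, so the only cost of including both sexes is two residual degrees of freedom. If the sexes differ in baseline or in residual variability, the comparison changes, which is one reason the analysis plan needs to be considered case by case.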

Limitations of the SIRF framework

The present framework was developed to support the evaluation of in vivo or ex vivo research proposals from an experimental design perspective. If a proposal contains multiple sets of experiments, then the framework needs to be independently applied to each one. The development of a separate framework is being considered for in vitro research projects, which will need collaboration with a different set of stakeholders.

The framework provides a structured set of questions to evaluate a proposal. However, many of the questions require a subjective evaluation, which could lead to variations in the judgement reached. The provision of supporting information for each question should mitigate that risk. Furthermore, decisions for a question may shift over time as science and culture evolve. For example, some research communities might have problems identifying the sex of a sample if genetic testing is required, due to limited access to appropriate technology and the associated costs. This could move the classification from one where the single sex is not appropriately justified to an appropriate justification when considering a cost/benefit evaluation.

Conclusions

This initiative aims to support the research community in using females and males in the design and analysis of preclinical experiments by launching a framework to differentiate genuine barriers preventing the use of males and females from culturally embedded misconceptions. Cultural change is necessary to make sex inclusive research the standard for scientific rigour, excellence, and combating sex bias in biomedical research.