arising from R. S. Narayan et al. Nature Communications https://doi.org/10.1038/s41467-020-16735-2 (2020)

Synergistic drug interactions can be assessed in a laboratory setting by multiple methods that do not necessarily lead to the same outcome; hence, the optimal way to determine synergy is a matter of ongoing debate. Drug synergy assessment in animal experiments is even more complex, not only because multiple methods exist but especially because of the experimental conditions and practical limitations of such experiments, which affect their interpretation1. Inevitably, no clear and standardized method or guideline exists for these experiments as yet. We appreciate the opportunity to discuss this. We will highlight several fundamental and practical considerations associated with the determination of synergy in mouse experiments and discuss them in relation to our chosen approach. We will provide recommendations for determining synergy in vivo, which might be valuable for future research. We will also point out how our use of a published and accepted method led to an evaluation bias in our publication1, which we put into perspective here. This evaluation shows that, upon re-analysis using four different synergy assessment methods, similar frequencies of synergy are seen between in vivo, in vitro and in silico experiments1,2,3, confirming the validity of our prediction method in different model systems. Given the high number of parameters and decision steps involved in the evaluation of drug combinations in mice, we think that the insights from our re-evaluation can be valuable to the scientific community. Below, we provide a number of considerations that should be taken into account when drug interactions are assessed in mice.

The first consideration concerns the method used to determine the synergistic interaction, which is part of a longer scientific debate in the field concerning the interpretation of in vitro experiments. There are two dominant principles: dose equivalence and multiplicative survival. Dose equivalence, as originally proposed by Loewe4, is based on the idea that reducing the concentration of one drug can be compensated for by increasing the concentration of the other drug to reach the same lethal effect. The second principle, multiplicative survival, as originally proposed by Bliss5, assumes that both drugs act independently and that their effects on viability can be combined to estimate their additive lethal effect. For both methods, synergy is observed when the measured combined effect is stronger than the expected additive effect. Although attempts are being made to unify the field between these principles6, this discussion is not finalized to date7,8,9,10,11,12,13,14 and is therefore unlikely to be resolved here, given that dose-based outcomes do not always match effect-based outcomes. Within the context of the proof-of-concept format of our study1, we decided that a high number of independent mouse models representing different tumor types would be most informative. Therefore, we chose a multiplicative survival metric to determine synergy (i.e., comparable to Bliss), since a fixed dose is taken for each drug, thereby limiting the number of mice needed for the experiments. We used the mutual non-exclusive form of the median-effect equation by Chou and Talalay, which assumes that Michaelis-Menten kinetics apply, defined by them as first-order kinetics and based on earlier work15,16. An alternative, similar multiplicative survival approach would be to make no assumptions about kinetics and to take the fractional product17 to determine excess over Bliss independence5,18. We avoided using the dose-equivalence method of Chou and Talalay19, as it would greatly increase the number of mice needed because it is based on dose-response effects (see explanation in Fig. 1).
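To make the multiplicative survival principle concrete, the following minimal Python sketch computes the expected additive surviving fraction as the fractional product of the two single-drug surviving fractions, and the excess over Bliss independence at a single time point. The function names and the example numbers are hypothetical illustrations and not data from the study; surviving fractions are assumed here to be the treated signal divided by the control signal.

```python
def expected_surviving_fraction(surv_a: float, surv_b: float) -> float:
    """Fractional product: expected survival if both drugs act independently."""
    return surv_a * surv_b


def excess_over_bliss(surv_a: float, surv_b: float, surv_ab: float) -> float:
    """Expected minus measured survival; positive values point towards synergy."""
    return expected_surviving_fraction(surv_a, surv_b) - surv_ab


# Hypothetical surviving fractions (treated signal / control signal)
surv_a, surv_b, surv_ab = 0.6, 0.5, 0.2
expected = expected_surviving_fraction(surv_a, surv_b)
excess = excess_over_bliss(surv_a, surv_b, surv_ab)
print(f"expected survival {expected:.2f}, excess over Bliss {excess:.2f}")
```

Only one fixed dose per drug enters this calculation, which is why such a metric needs fewer mice than a method based on dose-response curves.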

Fig. 1: Comparison of metrics that can be used for in vivo drug-combination efficacy assessment.

A second consideration relates to the assessment of synergy endpoints. At the start of each experiment, there may be exponential tumor growth, but the tumor volume as assessed by the luciferase signal is normalized and therefore no synergy can be seen yet. In the next phase, the drugs take effect on tumor growth and synergy can occur. In the final phase, the therapy is terminated, sometimes already accompanied by therapy resistance and/or relapses, which might weaken the synergy effect. Because the tumor volume in the control group cannot be allowed to go beyond the humane endpoint, the effect is underestimated in the latter phase (see Fig. 2). Additionally, variability in the drug concentration as a result of combined pharmacokinetics may result in temporal and spatial variability in drug accumulation and corresponding effects in tumor tissues, affecting tumor volume and mouse survival, which might also result in unequal (heteroscedastic) variance over time. Although longitudinal statistical methods are available20,21,22,23, substantial experimental data might be needed to resolve the combinatorial aspects indicated above and to model them properly. Given these issues in the field and the proof-of-concept character of our work, synergy assessment at each time point based on the tumor volume is in our view preferred, since it captures the situation at each time point without the need to make assumptions, to model unknown variables over time, or to base synergy on the accumulated effect of all of these variables (i.e., bootstrapping or mouse survival). Synergy therefore does not necessarily lead to long-term effects that eventually accumulate in a survival benefit, although this is of course desired.

Fig. 2: Longitudinal modeling of synergy is complicated by uncertainties between phases where normalization, efficacy and resistance/relapse dominate.

Synergy can be underestimated because of the normalization as well as through the occurrence of therapy resistance or relapse after the end of the treatment (areas shown in pink). The borders between these three phases can differ between models due to differences in growth speed and the temporal-spatial occurrence of resistance, and as a function of the pharmacokinetic (PK) profiles of the combination of drugs in the tumor tissue. In each drug combination experiment, the phases can therefore have a different velocity and, given the underlying uncertainties, we chose to determine synergies at each time point separately, as shown by the numbers on the time axis. In some cases, the humane endpoint is reached earlier for the control group than for the treated mice, which makes it necessary to extrapolate the tumor volume (dotted box).

A third consideration relates to the assessment of synergy which, as advocated by Chou and Talalay15,19, is to be determined in a quantitative (i.e., non-statistical) manner by calculating the combination index (for dose equivalence, i.e., Loewe-related methods) or a fraction (for multiplicative survival, i.e., Bliss-related methods)15,19. A value lower than 1 is considered synergy, a value equal to 1 additivity, and a value higher than 1 antagonism. This quantitative assessment has been the standard in the field and was used in 86% of the n = 223 synergy publications that we have curated1. Note that the remaining publications (14%) evaluate the effect of the drug combination and do not assess the type of drug interaction (i.e., synergy, additivity, antagonism). Similarly, the DREAM challenge, representing the largest synergy determination effort to date, approaches synergies quantitatively and not statistically24. For the determination of drug synergy in vivo, only a few publications exist22,25,26,27,28,29,30. Since a quantitative assessment is the standard in the field, we took this approach to assess synergy in our in vivo experiments, using a value of 0.8 as the threshold for synergy according to the recommendations of Chou31.

We noticed that the combined effects of the applied drug combinations frequently resembled the theoretical fractional product17 of the two individual drugs. When we created a matrix containing additive values based on the fractional product and subsequently calculated the synergy based on this matrix, this unexpectedly showed that the synergy threshold (0.8) can be reached as a result of a purely additive effect. This occurred only when each drug alone reduced viability to below 50% of the non-treated control (see Fig. 3A, blue area). Since we selected drugs based on their high potency, this might have led to a positive bias in the assessment of synergy in our in vivo data. Nevertheless, when visualizing the relation of synergy to the calculated additive effects, several combinations exceeded the anticipated additive effect, indicating that these are cases of synergy despite the apparent bias in the assessment (Fig. 3B, see also Table 1). These results show that for values lower than 50% viability, it becomes increasingly difficult to assess synergy, and a statistical assessment is therefore necessary.

Fig. 3: The used multiplicative survival equation can provide a false positive outcome when a 0.8 threshold is applied.

A Matrix showing the estimated additive survival based on the fractional product (left panel) and the corresponding synergy as calculated by the mutual non-exclusive median-effect formula of Chou and Talalay (right panel) using 0.8 as a synergy threshold. This shows that a false positive synergistic outcome can be obtained when high efficacies are reached by both drugs (>50% viability loss). This warrants caution when using this method in the context of high efficacies. B 3D plot of the calculated mutual non-exclusive median effects using the equation of Chou and Talalay as based on the theoretical fractional product (gray and blue surface). Experimental data that follow an additive pattern are shown in purple. Experimental combinations that show values beyond the expected theoretical additivity, and hence are considered synergistic despite the flaw in the Chou and Talalay equation, are shown in bright colors: MDA-MD231 treated with AZ628 and Gemcitabine (AZD + GEM representing 4 time points shown in dark red), MDA-MD231 treated with Thapsigargin and AZ628 (AZD + THAP representing 2 time points shown in red), CHL1-FM treated with Gemcitabine and CGP-082996 (GEM + GCP representing 3 time points shown in green), U87MG treated with Docetaxel and GNE-317 (DOC + GNE representing 1 time point shown in blue). P-values represent a two-sided Student's t-test of the measured luminescence (tumor volume) of the drug combination versus the predicted effect calculated from the fractional product according to Webb17.

Table 1 Statistical evaluation of synergy using the stringent fractional product as an additivity reference identifies two drug combinations that show synergy over multiple time points

For a statistical assessment, we compared the calculated fractional product17 of the anticipated bioluminescence signal with the measured signal at each time point using a t-test. As a reference for the statistical evaluation, we compared the results to excess over BLISS additivity, which is a benchmark quantitative method. To correct for the non-linear distribution of the variance over the dynamic range and to facilitate the calculations, we used log-transformed data. For details, see “Online methods”. This analysis confirmed that two combinations showed a significant difference from the fractional product over multiple time points: MDA-MD231 treated with AZ628 and Gemcitabine, and MDA-MD231 treated with AZ628 and Thapsigargin. Two other combinations (U87MG treated with Docetaxel and GNE-317; CHL1-FM treated with Gemcitabine and CGP-082996) are on the border between additive and synergistic, and are partly based on extrapolation due to the absence of a control group at later time points. The BLISS (quantitative) analysis showed concordance with the statistical evaluation: most statistically significant cases indeed corresponded to BLISS synergy (6 out of 7 time points), and the remainder showed only BLISS synergy, which is less stringent (see Table 1).

Furthermore, we compared these results to two methods22,32 that take all time points into account in the synergy assessment. As anticipated above, these methodologies turn out to be more stringent, showing synergy for two (invivosyn) or one (bootstrapping) drug combination (i.e., 25% or 12.5% of the cases). These drug combinations indeed match the most significant outcomes of the t-test evaluation, but the methods do not seem to detect other cases of temporal synergy, such as that seen for Gemcitabine and CGP-082996.

Together, this evaluation indicates that caution should be taken when in vivo synergy experiments are designed, conducted and evaluated. First, a decision has to be made whether a dose equivalence or a multiplicative survival metric will be applied. For both, a dose-exposure-response study using single drugs above the minimal effective dose is to be preferred over the current practice of dosing each drug at the maximum tolerated dose (MTD). Also, variability in pharmacokinetics might lead to varying longitudinal concentrations of the drugs, and knowing this upfront may further improve the interpretation of the experiments33. Doses cannot be lowered without restriction, because control mice have a shorter life span than treated mice and therefore become increasingly sparse or even absent over time. The absence of control mice makes it necessary to extrapolate the tumor volume at later time points, and the way this extrapolation is performed can affect the assessed level of synergy. Finally, care should be taken when applying the Chou and Talalay mutual non-exclusive equation to in vivo data, since this can bias the interpretation when drug effects exceed the IC50 value. Based on our re-assessment of synergy as provided here, we would prefer a statistical evaluation together with a quantitative approach (i.e., BLISS) for each measured time point. Methods that take all time points into account, such as invivosyn and the bootstrap method, might be insufficiently able to detect temporal drug interactions that can occur during in vivo experiments, as is also the case for survival analysis.

By using different synergy assessment methods, we have found two to three cases of synergy out of eight in vivo experiments, indicating a success rate of 25% to 38%. To validate our results, we independently performed wet-lab drug combination screens using drug atlas-informed drug combinations. This showed that around 35% of the combinations show a synergistic effect in vitro2. The in silico drug atlas model has also been independently validated, leading to a similar result3. Because our re-assessed in vivo results resemble these in vitro and in silico outcomes, we have confidence in the value of the prediction model. By discussing both fundamental and practical aspects, we hope to have made clear that we have taken a stepwise and rational approach to assess synergy in vivo. We used a quantitative approach that is commonly accepted, but show here that this methodology can be prone to a bias in interpretation of which we were unaware. We hope that we have convincingly shown that a more stringent, statistical assessment provides a more realistic estimation of synergy for our in vivo experiments that is consistent with the in silico and wet-lab results. We provide several recommendations in Box 1 that may support a nuanced interpretation of drug interactions in mice. In general, we think that pharmacokinetic or dose-response assessments before performing drug combination experiments can assist in a better evaluation of the outcomes of the experiments. However, multi-dose experiments would require many more animals and resources, which is not always feasible and not necessarily compatible with the 3R principles of animal studies (i.e., reduction, refinement, replacement). Synergistic drug combinations might provide an optimal concentration window where efficacy is enhanced and toxicity, often additive in nature34, remains tolerable. We consider our investigation a next step in that direction rather than a claim to have set a new benchmark, hence the proof-of-concept character of our study. Our study is not focused on synergy methodology itself, but rather on a strategy to identify synergistic drug combinations based on a new concept.

Online methods

Statistical assessment of synergy

We quantitatively assessed synergy by statistically determining the difference between the expected drug effect and the measured effect. This was performed on log-transformed luminescence data by comparing the measured effect with the expected additive effect based on the fractional product according to Webb17. First, the estimated standard deviation of the fractional product was determined using the following formula (Eq. 1):

$${s}_{{fp}1,2}=\sqrt{{{s}_{{control}}}^{2}+{{s}_{{drug}1}}^{2}+{{s}_{{drug}2}}^{2}}$$
(1)

Where s is the standard deviation of the log-transformed data and fp denotes the fractional product of the logarithmic normalized effect of the control versus drugs 1 and 2.

Subsequently, we calculated the t factor using the following formula (Eq. 2):

$$t=\frac{({\bar{X}}_{m1,2}-{\bar{X}}_{{fp}1,2})}{\sqrt{\frac{{s}_{m1,2}^{2}}{{n}_{m1,2}}+\frac{{s}_{{fp}1,2}^{2}}{{n}_{{fp}1,2}}}}$$
(2)

Where t is the t-factor; \({\bar{X}}_{m{\mathrm{1,2}}}\) is the measured normalized logarithmic average effect of drug combination 1,2; \({\bar{X}}_{{fp}{\mathrm{1,2}}}\) is the fractional product of the normalized logarithmic average of drug combination 1,2; \({s}_{m{\mathrm{1,2}}}\) is the standard deviation of the measured normalized logarithmic average effect; \({s}_{{fp}{\mathrm{1,2}}}\) is the standard deviation of the fractional product as calculated by Eq. 1; and n is the number of mice.

The two-sided p-value for the t-test was calculated by doubling the one-sided p-value derived from the t-factor, using n minus 3 degrees of freedom for all experiments except the triple experiment, where n minus 4 degrees of freedom were used. A statistically significant outcome indicates that the measured value differs from the fractional product18.
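The calculation in Eqs. 1 and 2 can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions and not the script used for the analysis: the function names and the log-luminescence values are hypothetical, the expected additive log signal is assumed to be the sum of the two single-drug log means minus the control log mean, n is assumed to be the number of mice in the combination arm, and the tail probability of Student's t distribution is obtained by simple numerical integration rather than by a statistics package.

```python
import math
from statistics import mean, stdev


def fractional_product_sd(s_control, s_drug1, s_drug2):
    """Eq. 1: standard deviation of the expected (fractional product) log signal."""
    return math.sqrt(s_control**2 + s_drug1**2 + s_drug2**2)


def t_factor(mean_m, mean_fp, s_m, s_fp, n_m, n_fp):
    """Eq. 2: t-factor for measured versus expected log signal."""
    return (mean_m - mean_fp) / math.sqrt(s_m**2 / n_m + s_fp**2 / n_fp)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: twice the one-sided tail of Student's t (steps must be even)."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    # Simpson's rule for the integral of the density from 0 to |t|
    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    one_sided = max(0.0, 0.5 - total * h / 3)
    return 2 * one_sided


# Hypothetical normalized log10 luminescence values, one per mouse
control = [2.00, 2.10, 1.95, 2.05, 2.15, 1.90]
drug1 = [1.70, 1.80, 1.65, 1.75, 1.85, 1.60]
drug2 = [1.75, 1.60, 1.80, 1.70, 1.65, 1.85]
combo = [1.10, 1.25, 1.00, 1.20, 1.05, 1.15]

# Assumption: expected additive log signal = log(drug1) + log(drug2) - log(control)
mean_fp = mean(drug1) + mean(drug2) - mean(control)
s_fp = fractional_product_sd(stdev(control), stdev(drug1), stdev(drug2))
n = len(combo)
t = t_factor(mean(combo), mean_fp, stdev(combo), s_fp, n, n)
p = two_sided_p(t, df=n - 3)  # n minus 3 degrees of freedom, as described above
print(f"t = {t:.2f}, two-sided p = {p:.4f}")
```

A negative t-factor here means that the measured log signal is lower than the expected additive log signal, i.e. the combination reduced the signal more than the fractional product predicts.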

Quantitative determination of excess over BLISS independence

Bliss independence expressed as survival can be described using the following formula (Eq. 3):

$${BLISS\; ratio}=\frac{{M}_{A}+{M}_{B}-({M}_{A}\times {M}_{B})}{{M}_{{AB}}}$$
(3)

Where \({M}_{A}\) is the surviving fraction after treatment with drug A and \({M}_{B}\) is the surviving fraction after treatment with drug B; \({M}_{A}+{M}_{B}-({M}_{A}\times {M}_{B})\) is the fractional product of both effects and \({M}_{{AB}}\) is the measured effect of the two drugs together. When the BLISS ratio is smaller than 1, synergy is assumed.
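As a small illustration, Eq. 3 can be evaluated as written with the Python sketch below. The function name and the numbers are hypothetical. The inputs are interpreted here as fractional effects (the fraction of signal lost relative to the control), which is an assumption of this sketch; under that reading, a ratio below 1 means that the measured combined effect exceeds the expected additive effect.

```python
def bliss_ratio(effect_a: float, effect_b: float, effect_ab: float) -> float:
    """Eq. 3: expected additive effect divided by the measured combined effect."""
    if effect_ab <= 0:
        raise ValueError("measured combined effect must be positive")
    expected = effect_a + effect_b - effect_a * effect_b
    return expected / effect_ab


# Hypothetical fractional effects for drug A, drug B and the combination
ratio = bliss_ratio(0.4, 0.5, 0.85)
label = "synergy" if ratio < 1 else "no synergy"
print(f"BLISS ratio = {ratio:.2f} ({label})")
```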

Testing for normal distribution using Jarque–Bera test

The Jarque–Bera test35 was used as a goodness-of-fit test to assess whether the data show skewness and kurtosis matching a normal distribution. If the outcome is not significant, a normal distribution can be assumed.
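For readers unfamiliar with the test, the statistic can be sketched in plain Python as follows. The function name and the example values are hypothetical, and the p-value relies on the asymptotic chi-square approximation with two degrees of freedom, which should be read with care for the small group sizes typical of mouse experiments.

```python
import math


def jarque_bera(values):
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three observations are needed")
    mu = sum(values) / n
    # Central moments with the biased (1/n) estimators used by the test
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("all observations are identical")
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # Survival function of a chi-square distribution with 2 degrees of freedom
    p_value = math.exp(-jb / 2)
    return jb, p_value


# Hypothetical log-transformed luminescence values
jb, p_value = jarque_bera([1.10, 1.25, 1.00, 1.20, 1.05, 1.15])
print(f"JB = {jb:.3f}, p = {p_value:.3f}")
```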

Invivosyn and bootstrapping synergy calculations

We used the R software package invivosyn22, which can assess the combination index and synergy score using the Bliss independence model as well as the highest single agent (HSA) model. The model does not make any assumption on tumor growth kinetics, study duration, data completeness, or balance for tumor volume measurement. The script used is provided. For the bootstrap analysis, the R software package boot was used32.

Mouse experiments

Studies were performed under the European Community Council Directive (2010/63/EU) for laboratory animal care and the Dutch Law on animal experimentation, and when performed in a facility at Massachusetts General Hospital, accredited by the Association for the Assessment and Accreditation of Laboratory Animal Care (AAALAC). Studies were approved by the Animal Welfare Body (IVD) of the VU and VUMC (in Amsterdam) and the Institutional Animal Care and Use Committee IACUC (in Boston). All experiments meet ARRIVE guidelines. Four- to six-week-old female Athymic Nude-Foxn1nu mice were purchased from Harlan/Envigo and used after 1 week of acclimatization. All animals were housed in one cage and kept under filter top conditions, receiving ad libitum water and food. For the Tagrisso (AZD9291), AZD2014, Docetaxel experiment, we used the date of progression based on body weight as an end-point, because at later time points mice had to be taken out of the study due to toxicity; i.e., “Progressive disease is defined as the last time point before disease progression (i.e., weight loss)” was mentioned in the publication1 but was not clearly linked to Fig. 5A. The first observed weight loss varied between 1.7 and 29%. Regarding the IVIS luminescence data, the most extreme outlier (>1000-fold increase) at day 14 was removed. To increase the power of this experiment, two of the experimental arms were supplemented with mice from an independent experiment with the same experimental setup. Here we show only the main experiment, since this does not affect the interpretation.