Abstract
Characteristics of tumors and patients can be used as predictive biomarkers to guide treatment choice. Although many potential biomarkers are evaluated each year, only few will eventually be used since evidence is usually based on small studies leading to inconclusive results. Such data are often analyzed with Cox proportional hazards regression using a multiplicative interaction term between biomarker and treatment, with insufficient power and possibly biased results. Instead of analyzing patients who do (cases) and do not experience (non-cases) the survival event of interest, case-only analysis with logistic regression has been proposed, however with unknown small sample properties. We evaluated the performance of case-only analysis with bias-eliminating Firth correction and confidence intervals obtained with a profile likelihood method in a simulation study tailored to breast cancer. Our results show that this approach is generally inferior to the full cohort analysis but has acceptable properties when the marker is protective or null among patients treated with the standard treatment, the event rate is low (e.g., a rare event and a protective marker) and treatment assignment is independent of the marker level (e.g., in randomized studies). In such situations, the case-only design offers substantial cost savings. However, the model is sensitive to these assumptions.
Similar content being viewed by others
Introduction
Personalized medicine aims to find effective treatments for selected individuals. In oncology, for example, the selection can be based on the genetics of the tumor or healthy tissue, the tumor (immune) environment, lifestyle or comorbidities of patients1. These characteristics, called predictive biomarkers, may predict a patient’s response to a particular treatment, i.e., they indicate how one should be treated2. However, even though many candidate biomarkers are discovered in laboratories, for instance, by screens on cells, rodents or humans, only few end up being used in clinical practice. One reason for this may be the rigorous process biomarkers have to go through, culminating in a randomized clinical trial as the last step. Such a trial generally requires a large number of patients, access to patient specimen, and standardized assays for biomarker measurement. Unfortunately, these studies are often prohibitively expensive or suitable patients with appropriate tissue samples are scarce, so that these early clinical studies are often too small. This reduces power and causes small sample bias when suboptimal statistical methods are applied, which may lead to abandoning a promising biomarker.
A commonly used statistical method for evaluating a binary predictive biomarker is the Cox proportional hazards regression for failure time data3 with a multiplicative interaction term between biomarker and treatment. The interaction term indicates whether the relative effect of an experimental treatment in comparison to a control treatment differs by biomarker level4,5,6. In our earlier work7, we show that in particular settings specific to studies on predictive biomarkers, this method yields biased results and overestimates the standard error of the interaction term for cohort sizes under 600 patients. We also show that bias is reduced when the score function of the Cox model is modified with a Firth correction8 and confidence intervals (CIs) are obtained with a profile likelihood (PL) approach. However, results of studies with less than 400 patients rarely have sufficient power to detect interaction between biomarker and treatment. Thus, there is a need for the development of new statistical methods or the adaptation of standard methods for small studies of predictive biomarkers.
It has been shown that the interaction coefficient and treatment effects in biomarker subgroups can be estimated in the subset of patients who experience the event of interest, i.e., cases only9,10,11. The estimation is unbiased if the event rate is low, censoring is non-informative, and the biomarker level and treatment assignment are independent. With such a design, a simple logistic model can be used instead of a Cox model. The case-only design has been proposed more than a decade ago but has only rarely been applied in biomarker studies12,13. Epidemiologists, on the other hand, have used it for a long time to evaluate gene-environment or gene-gene interactions on binary outcomes14,15,16. In such studies, a case-only design is being used as an alternative to the case-control design since it obviates the need for genetic assays in non-case subjects and even provides a more efficient estimate of the interaction coefficient under the assumption of independence between the genetic and the environmental factors. Note that the assumption has to be made in observational studies but it is fulfilled by design when treatment is randomized.
Here, we performed a simulation study and designed it using results from real clinical studies on breast cancer (BC)17,18,19,20,21. Three of the studies were randomized controlled trials and two were observational series of patients. All studies had used archived specimens for biomarker measurements and evaluated interactions with either chemotherapy or endocrine therapy on risk of BC relapse or death due to any cause (recurrence-free survival, disease-free survival) or death due to BC (breast cancer-specific survival). The simulated data was analyzed using cases only with a logistic model corrected with the bias-eliminating approach developed by Firth8 and a CI calculated using a PL approach. The two approaches are generally recommended for analyses of small studies. We compared the results with an uncorrected logistic model on cases only and a Cox model modified with the Firth correction on cases and non-cases. More details about the performance of the latter model can be found in our earlier work7. The aim of our study was to find scenarios of studies on predictive biomarkers that indicate when such studies could be analyzed with a modified case-only model.
Methods
Data generation
N datasets were generated and all n patients within each dataset were assigned to one of four combinations of biological marker M (low level: \(M=0\); high level: \(M=1\)) and treatment T (standard treatment: \(T=0;\) experimental treatment: \(T=1\)). The probabilities of assignment to each combination depended on the proportion \(p_M\) of patients with high marker level, the proportion \(p_T\) of patients treated with the experimental treatment, and the odds ratio \(\text{ OR}_{MT}\) of the association between marker and treatment22.
Event times \(t_e\) were generated from a random variable \(U_e\) uniformly distributed on the interval [0, 1], M, T, and the product MT of M and T:
\(\text{ exp }\left( \beta _M\right) =\text{ HR}_M\) was the ratio of hazards for high vs. low marker level among patients receiving standard treatment, \(\text{ exp }\left( \beta _T\right) =\text{ HR}_T\) was the ratio of hazards for experimental vs. standard treatment among patients with low marker level, \(\text{ exp }\left( \beta _{I}\right) =\text{ HR}_{I}\) was the interaction hazard ratio, i.e., the ratio between treatment hazard ratios in high vs. low marker level. An exponential survival distribution with a scale parameter \(\lambda _e\) was used to calculate baseline survival with
so that before the end of follow-up \(t_{end}\) the proportion of patients with low marker level receiving standard treatment who experienced an event was \(q_e\), i.e., the exponential survival function \(S(t)=\text{ exp }\left( -\lambda _e t\right)\) at \(t_{end}\) was \(S(t_{end})=1-q_e\). In additional analyses, the baseline survival was calculated with a Weibull survival distribution with increasing or decreasing hazard of event occurrence over time. We do not show these results but refer to them in the discussion. Censoring times \(t_c\) were generated similarly from a random uniform variable \(U_c,\) scale parameter \(\lambda _c\), the proportion \(q_c\) of patients with low marker level receiving standard treatment censored before \(t_{end}\) (excluding administrative censoring at the end of the study period) and \(\beta _M=\beta _T=\beta _{I}=0\) to achieve non-differential censoring by marker and treatment. The patient was specified as experiencing an event at \(t_e\) if \(t_e< \text{ min }(t_c, t_{end}\)) and censored otherwise at \(\text{ min }(t_c, t_{end}\)).
We generated \(N=10000\) datasets with different values for n (200, 300, 400, 500, 600, 800, 1000), \(p_M\) (0.25, 0.5, 0.75), \(\text{ HR}_M\) (0.6, 0.8, 1, 3, 6), \(\text{ OR}_{MT}\) (0.5, 1, 2) and \(\text{ HR}_{I}\) (0.25, 0.5, 0.75, 1) but only one value of \(p_T=0.5\), \(q_e=0.2\), \(q_c=0.2\), \(t_{end}=5\) years and HR\(_T=1\). The different specifications were chosen based on real datasets presented and summarized in our earlier work7. Briefly, the sample size in these studies varied from 117 to 541. The proportion of patients with high marker levels was 14\(\%\), 18\(\%\) and about 50\(\%\), and the marker effect among patients treated with the standard treatment was either protective (\(HR_M\) = 0.67, 0.86) or harmful (\(HR_M\) = 3.51, 5.39, 6.60). The ratio between odds of high marker level for patients treated with experimental vs. standard treatment, i.e., \(OR_{MT}\), ranged from 0.79 to 2.34, and between 42\(\%\) and 58\(\%\) of the patients received the experimental treatment. Patients with the low marker level benefitted from the experimental treatment (\(HR_T\) between 0.23 and 0.87) and in all studies except one, the benefit of the experimental treatment was greater for patients with high vs. low marker levels (\(HR_I\) = 0.08, 0.24, 0.37, 0.63, 1.95). However, since a qualitative interaction between the marker and the treatment is needed to guide treatment choice23, we simulated scenarios with equally efficacious treatments among patients with low marker level (HR\(_T=1\)).
Aggregated data from various clinical studies were used as inputs for the simulation study. All studies were carried out in accordance with relevant guidelines and regulations. All study protocols were approved by responsible institutional committees. The study by de Boo et al.17 was approved by the Ethics Committee of the participating medical institutions and the National Agency for Medicines, Finland. The Institutional Review Board at the Helsinki University Hospital, Finland, approved the use of archival tissue for the current translational study. All seven studies in Knauer et al.18 had been approved by the respective institutional review boards. The ethical committees of Lund and Linköping universities approved the study by Kok et al.19. The study by Schouten et al.20 was approved by the Ethical Committee of the University of Heidelberg. The trial described in Vollebergh et al.21 was approved by the Institutional Review Board of the Netherlands Cancer Institute. In all those studies, informed consent was obtained from all subjects and/or their legal guardian(s).
Data analysis
The generated datasets were analyzed using three different models and two parametrizations of each model, and the 95% CIs were calculated according to Wald and PL methods.
A logistic regression of treatment assignment was fitted to case-only data, i.e., K patients who experienced an event of interest at times \(t_e\) (\(e=1,...,K\)), using the formula
and
where \(\text{ log }(p_e/(1-p_e))\) was a constant (“offset”) term with \(p_e\) being the fraction of patients in the full cohort at time \(t_e\) assigned to the experimental treatment who were still at risk at time \(t_e\). \(M_{low}\) and \(M_{high}\) were binary variables indicating patients with low and high marker level, respectively. Additionally, models (1) and (2) with modified score functions based on the method developed by Firth8 were fitted to case-only data.
A Firth-corrected Cox proportional hazards model was fitted to all generated patients (cases and non-cases) using the hazard function
with baseline hazard function \(h_0\) to evaluate the interaction term \(\beta _{I}\) and
to evaluate the treatment effect by marker level, i.e., \(\beta _{TM_{low}}\) and \(\beta _{TM_{high}}\). \(\text{ exp }\left( \beta _{TM_{low}}\right) =\text{ HR}_{TM_{low}}\) and \(\text{ exp }\left( \beta _{TM_{high}}\right) =\text{ HR}_{TM_{high}}\) were the hazard ratios for experimental vs. standard treatment in subgroups of low and high marker levels, respectively. \(TM_{low}\) and \(TM_{high}\) were binary variables defined as \(TM_{low}=1\) if \(M=0\) and \(T=1\), and \(TM_{low}=0\) otherwise; \(TM_{high}=1\) if \(M=1\) and \(T=1\), and \(TM_{high}=0\) otherwise, to indicate patients receiving experimental treatment in the two marker levels.
As shown by Dai et al.11, \(\gamma _T \approx \beta _{T}\), \(\gamma _I \approx \beta _{I}\), \(\gamma _{TM_{low}} \approx \beta _{TM_{low}}\), \(\gamma _{TM_{high}} \approx \beta _{TM_{high}}\), when treatment assignment is independent of marker level, censoring is independent of treatment conditionally on marker level and the event is rare for all event times \(t_e\). Even though the \(\gamma\) parameters are estimated with logistic regressions, they are interpreted as hazard ratios.
As defined by Morris et al.24, we calculated several performance measures to summarize estimation of the interaction term across all scenarios, namely (i) bias: \(\frac{1}{N_c}\sum _{j=1}^{N_c}\hat{\beta }_{I,j}-\beta _{I}\) or relative bias: \(\frac{1}{N_c}\sum _{j=1}^{N_c}\frac{\hat{\beta }_{I,j}-\beta _{I}}{|\beta _{I}|}\), (ii) relative % error in model standard error (ModSE): \(100\left( \frac{\widehat{\text{ ModSE }}}{\widehat{\text{ EmpSE }}}-1\right)\) with the model standard error ModSE obtained as \(\sqrt{\frac{1}{N_c}\sum _{j=1}^{N_c} \widehat{\text{ Var }}\left( \hat{\beta }_{I,j}\right) }\) and the empirical standard error EmpSE obtained as \(\sqrt{\frac{1}{N_c-1}\sum _{j=1}^{N_c}\left( \hat{\beta }_{I,j}-\bar{\beta }_{I}\right) ^2},\) (iii) coverage of the CI: \(\frac{1}{N_c}\sum _{j=1}^{N_c} \mathbf{{1}}\left( \hat{\beta }_{l,j}\le \beta _{I}\le \hat{\beta }_{u,j}\right)\) with \(\hat{\beta }_{l,j}\) being the lower bound and \(\hat{\beta }_{u,j}\) being the upper bound of the 95% CI around \(\hat{\beta }_{I,j}\), and (iv) type I error or power: \(\frac{1}{N_c}\sum _{j=1}^{N_c} \mathbf{{1}}(p_j\le \alpha )\), where \(p_j\) was the p-value obtained with the j-th dataset by testing the null hypothesis \(\beta _I=0\) and \(\alpha\) was the significance level fixed at 0.05. In all formulas, \(N_c\) indicated the number of converged models, \(\beta _{I}\) was the true value of the coefficient of the interaction term and \(\hat{\beta }_{I,j}\) was the estimate of the interaction coefficient in the j-th dataset. The mean of all \(\hat{\beta }_{I,j}\) was indicated as \(\bar{\beta }_{I}\) and \(\textbf{1}\) was an indicator function. Since the calculations of coverage, type I error and power depended on the CI method, separate calculations were performed for the Wald and PL approach. Additionally, the estimation of treatment effect in the subgroups of low and high marker levels was summarized with the bias and relative percentage error in standard error. For calculations of all performance measures, only datasets with events in at least three combinations of marker and treatment and results from converged models were used. A model was considered converged if the actual number of iterations for a model fit was less than the prespecified maximum number of iterations. However, for summary statistics of the PL-based power and coverage, models with overall convergence and additionally with convergence of the confidence bound were used since the latter is required to determine whether or not the PL confidence interval included zero or the true parameter value.
All simulation scripts were written in R version 4.3.1. The logistf function of the logistf package version 1.26.025 was used to fit a standard logistic and logistic-Firth model. The coxphf function of the coxphf package version 1.13.426 was used with maximally 1000 iterations (maxiter) and a maximum step size (maxstep) of 0.01 to fit a Cox-Firth model. The scripts are available on request from the corresponding author.
Results of the simulation study for treatment assignment independent of the marker level, i.e., \(\text{ OR}_{MT}=1\), and a protective (\(\text{ HR}_M=0.8\), left panel) and a null (\(\text{ HR}_M=1\), right panel) marker effect among patients treated with the standard treatment. The treatment HRs were \(\text{ HR}_{TM_{low}}=1\) and \(\text{ HR}_{TM_{high}}=0.5,\) i.e., \(\beta _{TM_{low}}=0\) and \(\beta _{TM_{high}}=-0.69\), the interaction HR was \(\text{ HR}_{I}=0.5\), i.e., \(\beta _{I}=-0.69\), and the proportion of patients with high marker level was \(p_{M}=0.25\). Case-only results were obtained with a Firth-corrected logistic regression, while full cohort results were obtained with a Firth-corrected Cox proportional hazards model. Number of patients was the number of patients per dataset in full cohort HR, hazard ratio; M, marker; OR, odds ratio; PL, profile likelihood; Rel., relative; SE, standard error.
Simulation results
Under marker-treatment independence, i.e., \(\text{ OR}_{MT}=1\), the interaction and the treatment effect coefficients and their standard errors estimated with the Firth-corrected case-only method showed usually an acceptable bias when the marker was protective or null among patients treated with the standard treatment, i.e., \(\text{ HR}_M \le 1\) (Fig. 1, Table 1). The event rate in these scenarios was such that 10–20\(\%\) of patients experienced an event over the 5-year follow-up period and the type I error for the interaction coefficient was around or slightly below 5\(\%\) for both the Wald and PL method (see Supplementary Table 3 for an example scenario). For harmful markers, however, the interaction coefficient was heavily positively biased with bias up to 50% (data not shown), irrespective of sample size. This bias came from a negative bias of the treatment effect among low marker level patients and a positive bias among high marker level patients (Fig. 2, Table 2). Even a small bias in the treatment coefficient for the two marker levels led to a large relative bias for the interaction coefficient because the estimated treatment coefficient in the low marker level was negative instead of 0 and in the high marker level was away from the true value and towards 0. That caused that the interaction effect was away from the truth and also towards 0. The event rate for harmful markers was always larger so that 20-55\(\%\) of patients experienced an event over the 5-year follow-up period, and the stronger the marker effect the larger the event rate and the larger the bias. If treatment assignment depended on marker level, i.e., \(\text{ OR}_{MT}\ne 1\), the interaction coefficient and the treatment effect coefficient in the high marker level were severely biased with the direction of bias related to the direction of dependence (Supplementary Fig. 1–2, Supplementary Table 1–2) and the type I error was substantially above the nominal level of 5\(\%\).
Convergence of the Firth-corrected case-only model was very high for all scenarios. Coverage was often above the nominal level of 95% and approached 95% with increasing sample size when \(\text{ HR}_M\le 1\) and \(\text{ OR}_{MT}=1\). It was usually closer to the nominal level when it was calculated with the PL in comparison to the Wald approach. However, coverage for both methods was below the nominal level and moved away from nominal level with larger sample size when \(\text{ HR}_M>1\) or \(\text{ OR}_{MT} \ne 1\), i.e., when the interaction coefficient but not its standard error was biased. This often led to 95% CIs for the interaction coefficient which did not include its true value (Fig. 1–2, Supplementary Fig. 1–2, Tables 1–2, Supplementary Table 1–2). Moreover, statistical power also depended strongly on the marker-treatment association and it decreased with larger values of the \(\text{ OR}_{MT}\). Under marker-treatment independence, power was lower than 80% for sample sizes smaller than 600 with event rates over 5 years below 20% when the marker was protective or null and was consistently slightly higher for PL-based vs. Wald-based CI (Tables 1–2, Supplementary Table 1–2).
As previously shown7, the full cohort analysis with the Firth-corrected Cox model was virtually unbiased for sample sizes down to 200 and overall event rates over 5 years above 20\(\%\) when the marker was harmful among patients treated with the standard treatment, i.e., \(\text{ HR}_M>1\). Otherwise, the interaction coefficient and its standard error were substantially biased. The interaction coefficient was biased towards and away from the null and standard error was overestimated for small sample sizes, but bias decreased and monotonically approached zero as the sample size increased. These results did not depend on the marker-treatment association nor the censoring rate.
Under marker-treatment independence, coverage, power and estimation of the standard error of the interaction coefficient were similarly good for the case-only analysis with the logistic-Firth model and the full cohort Firth-corrected Cox model for protective or null markers, i.e., \(\text{ HR}_M \le 1\). However, relative bias of the interaction coefficient persists at 5-10\(\%\) with the case-only model regardless of sample size and is lower, often around or below 5\(\%\) for the full cohort Firth-corrected Cox model. Power is generally low at 600 patients or less for either method (Fig. 1, Table 1).
It is noteworthy that the Firth-correction improved the performance of the case-only analysis in general. When \(\text{ HR}_M\le 1\) and sample size was small, relative bias of all evaluated coefficients decreased when the correction was used in comparison to a standard case-only model without the correction (Supplementary Fig. 3). However, the standard Firth correction shrinks all parameters, including the intercept, and therefore produces estimates which are slightly biased at 5\(\%\) or less even for large sample sizes.
Results of the simulation study for treatment assignment independent of the marker level, i.e., \(\text{ OR}_{MT}=1\), and a harmful (\(\text{ HR}_M=3\), left panel; \(\text{ HR}_M=6\), right panel) marker effect among patients treated with the standard treatment. The treatment HRs were \(\text{ HR}_{TM_{low}}=1\) and \(\text{ HR}_{TM_{high}}=0.5,\) i.e., \(\beta _{TM_{low}}=0\) and \(\beta _{TM_{high}}=-0.69\), the interaction HR was \(\text{ HR}_{I}=0.5\), i.e., \(\beta _{I}=-0.69\), and the proportion of patients with high marker level was \(p_{M}=0.25\). Case-only results were obtained with a Firth-corrected logistic regression, while full cohort results were obtained with a Firth-corrected Cox proportional hazards model. Number of patients was the number of patients per dataset in full cohort HR, hazard ratio; M, marker; OR, odds ratio; PL, profile likelihood; Rel., relative; SE, standard error.
Discussion
A modified case-only model can be used to analyze relatively small studies of predictive markers when the overall event rate is low, i.e., when the event is rare at baseline and the marker is protective, and when the treatment assignment is independent from the marker level, e.g., patients are randomized to treatment. In such studies, the model estimates the interaction and treatment coefficient and their standard errors with acceptable bias. Moreover, the coverage of the CI for the interaction coefficient is at or just slightly above the nominal level while type I error is at or slightly below the nominal level. The number of biomarker measurements and corresponding costs are reduced by 80\(\%\) or more compared with a full cohort analysis.
However, the modified case-only model has to be used cautiously since our simulation results are based on a finite series of scenarios derived from previous clinical studies of breast cancer. Moreover, model performance appears to be sensitive to the assumptions. The model should not be used when the event rate is not low, e.g., due to a harmful marker, or when treatment assignment depends on the marker level.
Most clinical studies on treatment heterogeneity with failure time endpoints are analyzed using data from a cohort and applying a Cox regression with a multiplicative interaction term between marker and treatment6. In studies with a small number of patients, unbiased results of a Cox regression are guaranteed when the marker is harmful among patients treated with the standard treatment. The model can yield biased results when the marker is protective or null in this subgroup of patients. In our earlier work7, we showed that the bias is reduced when the score function of a Cox model is modified using a Firth correction. In the current study, we show that bias reduction can also be obtained by analyzing only patients who experience the survival event of interest using a Firth corrected case-only model. The Firth-corrected model with the full cohort and with cases only, show acceptable performance only when there is no association between the marker level and the treatment assignment. When there is a dependence between the marker and the treatment or the marker is harmful among patients treated with the standard treatment (leading to a high event rate), a Firth-corrected case-only model is severely biased. Thus, our study confirms the importance of assumptions for valid results of the case-only approach discussed in the literature, namely, a low event rate and marker-treatment independence10,11,16.
The comparison between results obtained with a standard Cox model and a standard case-only model for survival outcomes has been previously conducted using randomized studies, where the independence between the two factors that interact with each other is established by design10,11. However, in epidemiological studies, it has been shown that a dependence causes bias16. Even in retrospective data from a randomized clinical trial, independence between marker and treatment is not guaranteed. For example, the availability of tissue for biomarker measurements may depend on marker or treatment. If the dependence, on the other hand, can be explained by a third factor, the bias can be reduced or eliminated by adjusting the case-only model for this third factor27. We did not evaluate this in the simulation study, since none of the patient and tumor characteristics in the BC example studies explained the dependence between the marker and the treatment.
Our simulation study does not address complex situations that may occur in some cancer studies. For example, risk of relapse or death can increase over time or the baseline hazard changes in other ways, i.e., hazards are not constant. Although some limited sensitivity analyses indicated that our results do hold in more complex situations, e.g., when a non-constant hazard at baseline was used instead of the exponential hazard, caution needs to be used when applying the case-only design to situations not evaluated here.
An important advantage of using a case-only approach in a retrospective study is the cost reduction since marker measurements are only performed for a subset of trial participants. With the resources for a full cohort study, one could pool patients with events from multiple trials, which would lead to increased power. However, the case-only approach estimates the marker-treatment interaction and the treatment effects by marker level but not the marker effect. If the latter is an objective, a full cohort or an augmented case-only design is needed. The augmented case-only design is a hybrid method which combines case-only and case-control designs by randomly sampling controls from both treatment arms or from the experimental treatment only11.
The assumptions under which the case-only design can be useful are not easy to verify prior to study onset. However, the expected event rate is generally known during the design phase of a study and the independence assumption is per definition fulfilled in randomized designs, making a large number of studies suitable for retrospective case-only analyses. The direction of the marker effect has to be known from previous studies. Since it cannot be estimated with a case-only model, it is not even known after the study. Many predictive marker candidates were, however, previously used as prognostic markers. Noteworthy, results with acceptable bias for a case-only model with a harmful marker cannot be obtained by simply recoding and estimating \(1/\text{HR}_M\) for the standard treatment. Changing the reference category for the marker automatically recodes the interaction effect to \(1/\text{HR}_I\). Although, the different combinations of marker and treatment are shuffled and the comparison groups are different, the event rate in the different subgroups which eventually influences the bias is not changed.
In conclusion, we show that small studies on predictive markers can be analyzed with a case-only model when the event rate is low, treatment assignment is independent from marker level and the marker is protective or null among patients who received the standard treatment. The design offers substantial cost savings.
Data availability
Computer scripts in the programming language R are available on request from the corresponding author.
References
Hoeben, A., Joosten, E.A.J., & Van Den Beuken - Van Everdingen, M.H.J. Personalized medicine: recent progress in cancer therapy. Cancers 13: 242 (2021).
Ballman, K. V. Biomarker: Predictive or prognostic?. Clin. Oncol. 33(33), 3968–3971 (2015).
Cox, D. R. Regression models and life-tables. J. R Stat. Soc. Series B Stat. Methodol. 34(2), 187–220 (1972).
Ou, F.-S. et al. Biomarker discovery and validation: Statistical considerations. J. Thorac. Oncol. 16(4), 537–545 (2021).
Altman, D. G. et al. Reporting recommendations for tumor marker prognostic studies (REMARK): Explanation and elaboration. PLoS Med 9(5), e1001216 (2012).
Sollfrank, L. et al. A scoping review of statistical methods in studies of biomarker-related treatment heterogeneity for breast cancer. BMC Med. Res. Methodol. 23, 154 (2023).
Jóźwiak, K., Nguyen, V.H., Sollfrank, L., et al. Cox proportional hazards regression in small studies of predictive biomarkers. Sci. Rep. 14(1), 14232 (2025).
Firth, D. Bias reduction of maximum likelihood estimates. Biometrika 80(1), 27–38 (1993).
Dai, J. Y. et al. Two-stage testing procedures with independent filtering for genome-wide gene-environment interaction. Biometrika 99(4), 929–944 (2012).
Vittinghoff, E. & Bauer, D. C. Case-only analysis of treatment-covariate interactions in clinical trials. Biometrics 62, 769–776 (2006).
Dai, J. Y. et al. Augmented case-only designs for randomized clinical trials with failure time endpoints. Biometrics 72, 30–38 (2016).
Prentice, R. L. et al. Variation in the FGFR2 gene and the effects of postmenopausal hormone therapy on invasive breast cancer. Cancer Epidemiol. Biomarkers Prev. 18, 3079–3085 (2009).
Chlebowski, R. T. et al. Estrogen alone and health outcomes in black women by African ancestry: A secondary analyses of a randomized controlled trial. Menopause 24(2), 133–141 (2017).
Piegorsch, W., Weinberg, C. & Taylor, J. Non-hierarchical logistic models and case-only designs for assessing susceptibility in population-based case-control studies. Stat. Med. 13(2), 153–162 (1994).
Khoury, M. & Flanders, W. Nontraditional epidemiologic approaches in the analysis of gene-environment interaction: Case-control studies with no controls!. Am. J. Epidemiol. 144(3), 207–223 (1996).
Albert, P. S. et al. Limitations of the case-only design for identifying gene-environment interactions. Am. J. Epidemiol. 154(8), 687–693 (2001).
De Boo, L. W. et al. Adjuvant capecitabine-containing chemotherapy benefit and homologous recombination deficiency in early-stage triple-negative breast cancer patients. Br. J. Cancer 126, 1401–1409 (2022).
Knauer, M. et al. The predictive value of the 70-gene signature for adjuvant chemotherapy in early breast cancer. Breast Cancer Res. Treat 120(3), 655–661 (2010).
Kok, M. et al. Estrogen receptor-\(\alpha\) phosphorylation at serine-118 and tamoxifen response in breast cancer. J. Natl. Cancer Inst. 101, 1725–1729 (2009).
Schouten, P. C. et al. Breast cancers with a BRCA1-like DNA copy number profile recur less often than expected after high-dose alkylating chemotherapy. Clin. Cancer Res. 21(4), 763–770 (2015).
Vollebergh, M. A. et al. An aCGH classifier derived from BRCA1-mutated breast cancer and benefit of high-dose platinum-based chemotherapy in HER2-negative breast cancer patients. Ann. Oncol. 22(7), 1561–1570 (2011).
Fleiss, J. L., Levin, B. & Paik, M. C. Statistical methods for rates and proportions 3rd edn. (Wiley, 2003).
Polley, M.-Y.C. et al. Statistical and practical considerations for clinical evaluation of predictive biomarkers. J. Natl. Cancer Inst. 105(22), 1677–1683 (2013).
Morris, T. P., White, I. R. & Crowther, M. J. Using simulation studies to evaluate statistical methods. Stat. Med. 38(11), 2074–2102 (2019).
Heinze, G., Ploner, M., Dunkler, D., et al. Firth’s bias-reduced logistic regression. Package logistf version 1.26.0 https://cran.r-project.org/web/packages/logistf/index.html (2023).
Heinze, G., Ploner, M., Jiricka, L., et al. Cox regression with Firth’s penalized likelihood. Package coxphf version 1.13.4 https://cran.r-project.org/web/packages/coxphf/index.html (2023).
Gatto, N. M. et al. Further development of the case-only design for assessing gene-environment interaction: Evaluation of and adjustment for bias. Int. J. Epidemiol. 33, 1014–1024 (2004).
Funding
Open Access funding enabled and organized by Projekt DEAL.
This work was supported by the Dutch Cancer Society, Grant No. KWF 10603.
Author information
Authors and Affiliations
Contributions
M.H. contributed to the conception of the study, edited the manuscript and provided overall supervision and coordination of the manuscript preparation. V.H.N. wrote R scripts and edited the manuscript. L.S. edited the manuscript. S.C.L. contributed to the conception of the study and edited the manuscript. K.J. contributed to the conception of the study, wrote R scripts, drafted and edited the manuscript. The final version was reviewed and approved by all authors.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Hauptmann , M., Nguyen, V.H., Sollfrank , L. et al. Case-only analysis in small studies of predictive biomarkers. Sci Rep 15, 13068 (2025). https://doi.org/10.1038/s41598-025-96904-9
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s41598-025-96904-9




