In this issue of Blood Cancer Journal, Letailleur et al. explore the use of real-world data (RWD) as a synthetic control arm (SCA) in older patients with diffuse large B-cell lymphoma (DLBCL) and propose this innovative design for future trials [1]. To date, the use of RWD for assessing therapeutic benefit in pre-marketing drug development has largely been confined to benchmarking early-phase studies. Such analyses often fail to account for confounders, as shown in a review of 40 EU conditional marketing authorizations where unadjusted, side-by-side comparisons were often used to support therapeutic advantage [2]. By optimizing the use of RWD, earlier certainty about clinical benefit could reduce the risk of marketing withdrawals due to disappointing confirmatory studies. However, caution is required. The risk associated with using RWD in early-phase development (for example, selecting the most promising drug candidate for clinical development) is lower than in confirmatory settings, where biased results could lead to large-scale exposure of patients to ineffective treatments.

The SCA in the study by Letailleur et al. was drawn from 170 patients from a historical clinical trial (CT, LNH09-7B) and RWD (REALYSA). Its performance was then tested in a hypothetical setting, replacing the internal control arm from SENIOR in a simulated study. SENIOR randomized 249 DLBCL patients aged ≥80 years 1:1 to R-miniCHOP plus placebo or lenalidomide (R2-miniCHOP) with identical 2-year overall survival (OS) of 66% in both arms (95% CI 0.65–1.5) [3]. In the simulation, inverse probability weighting with propensity scores was used to balance confounders between 104 available patients from the experimental arm of SENIOR and the SCA. Using this approach, the authors reached the same conclusion: that R2-miniCHOP was not superior to R-miniCHOP. Results were also similar using RWD alone (REALYSA) to generate the SCA, leading the authors to conclude that careful selection and appropriate analytics may allow RWD to replace internal control arms in selected situations.
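To make the weighting step concrete, the following is a minimal sketch of inverse probability weighting on a toy dataset. It is not the authors' analysis: the patient counts, the single binary covariate (elevated LDH), the stratum-based propensity estimate, and the choice of weights that re-shape the external pool to resemble the trial arm (often called ATT weights) are all assumptions made for illustration. A real analysis would typically fit a multivariable propensity model.

```python
from collections import defaultdict

# Hypothetical toy data: (arm, elevated_ldh).
# arm 1 = experimental trial arm, arm 0 = external (historical/RWD) pool.
patients = (
    [(1, 1)] * 30 + [(1, 0)] * 20      # trial arm: 60% elevated LDH
    + [(0, 1)] * 20 + [(0, 0)] * 60    # external pool: 25% elevated LDH
)

def propensity_by_stratum(rows):
    """Estimate P(arm = 1 | covariate) as the observed fraction in each stratum."""
    counts = defaultdict(lambda: [0, 0])  # covariate value -> [n_external, n_trial]
    for arm, x in rows:
        counts[x][arm] += 1
    return {x: n[1] / (n[0] + n[1]) for x, n in counts.items()}

def att_weights(rows, ps):
    """Trial patients keep weight 1; external patients get e / (1 - e)."""
    return [1.0 if arm == 1 else ps[x] / (1.0 - ps[x]) for arm, x in rows]

def weighted_prevalence(rows, weights, arm_value):
    """Weighted fraction of patients with the covariate in one arm."""
    num = sum(w * x for (arm, x), w in zip(rows, weights) if arm == arm_value)
    den = sum(w for (arm, _), w in zip(rows, weights) if arm == arm_value)
    return num / den

ps = propensity_by_stratum(patients)
weights = att_weights(patients, ps)

n_external = sum(1 for arm, _ in patients if arm == 0)
raw_external = sum(x for arm, x in patients if arm == 0) / n_external
weighted_external = weighted_prevalence(patients, weights, 0)
trial_prevalence = weighted_prevalence(patients, weights, 1)

print(f"elevated LDH, external pool (unweighted): {raw_external:.2f}")
print(f"elevated LDH, external pool (weighted):   {weighted_external:.2f}")
print(f"elevated LDH, trial arm:                  {trial_prevalence:.2f}")
```

In this toy case, weighting brings the covariate distribution of the external pool into line with the trial arm. The sketch balances only the covariate that was measured and modeled, so any unmeasured difference between the two groups would be left untouched.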

While these results are promising, more work is needed before SCAs can reliably replace internal control arms without increasing uncertainty. With a reported HR of 0.743 (95% CI 0.49–1.12) between the experimental arm of SENIOR and the SCA, the lack of observed difference could be related to statistical power. In fact, based on the included patient numbers and with a 2-year OS of 58% for the SCA vs. 66% for the experimental arm of SENIOR, a post-hoc calculation (α = 0.05) yields power of just 30%, assuming the HR of 0.743 is true. In the head-to-head comparison of the SCA and the control arm of SENIOR, the absence of statistical significance at the P < 0.05 level also does not fully exclude a clinically relevant difference. With an HR between the internal arm of SENIOR and the SCA of 0.79 (95% CI 0.52–1.20), OS between arms could differ substantially.
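The order of magnitude of this power estimate can be checked with a rough sketch. The code below uses the Schoenfeld approximation for a two-sided log-rank test, which is a swapped-in textbook method and not necessarily the calculation used for the figure quoted above. It assumes that the number of events can be approximated by the expected deaths at 2 years in each group and ignores the effect of weighting on the effective sample size. The second function converts a hazard ratio into a 2-year OS under proportional hazards, assuming the HR of 0.79 is expressed as internal control relative to SCA.

```python
from math import log, sqrt
from statistics import NormalDist

def schoenfeld_power(hr, n_events, frac_arm1, alpha=0.05):
    """Approximate power of a two-sided log-rank test (Schoenfeld approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    noncentrality = abs(log(hr)) * sqrt(n_events * frac_arm1 * (1 - frac_arm1))
    return NormalDist().cdf(noncentrality - z_alpha)

def implied_survival(reference_survival, hr):
    """Under proportional hazards, S_1(t) = S_0(t) ** HR."""
    return reference_survival ** hr

# Figures taken from the text above.
n_exp, n_sca = 104, 170
os2_exp, os2_sca = 0.66, 0.58

# Assumption: events in the analysis are approximated by expected deaths by 2 years.
events = n_exp * (1 - os2_exp) + n_sca * (1 - os2_sca)
power = schoenfeld_power(0.743, events, n_exp / (n_exp + n_sca))
print(f"approximate events: {events:.0f}, approximate power: {power:.2f}")

# Assumption: HR 0.79 (95% CI 0.52-1.20) compares internal control with SCA.
for hr in (0.52, 0.79, 1.20):
    os2 = implied_survival(os2_sca, hr)
    print(f"HR {hr:.2f} -> implied 2-year OS of internal control: {os2:.2f}")
```

Under these assumptions the approximation lands in the region of 30%, in line with the figure quoted above. The confidence limits of the HR correspond to a wide range of plausible 2-year OS values, which is why a non-significant comparison does not establish equivalence.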

Letailleur et al.’s analyses provide valuable insights needed for the future use of SCAs generated from RWD. Considering that, with few exceptions [4], survival gains of >10% for novel therapies are rarely seen in DLBCL, there is reasonable concern that any potential bias could risk false outcomes (positive or negative) with the use of SCAs in pivotal studies. Investigators and regulatory authorities need to better understand how to quantify this risk and the conditions required to conclude that the performance of an SCA is equivalent to an internal control. The U.S. Food and Drug Administration and European Medicines Agency guidelines in development for external comparator cohort studies clearly demonstrate that regulatory agencies expect increasing use of innovative trial designs that incorporate RWD [5, 6]. From a societal perspective, these types of trials are highly warranted as they can address costs, timelines, and logistical challenges in research.

Sources of bias in RWD are well-described. Clinical trial participants are likely to have different health-related behaviors compared to general populations. Compared to RWD, clinical trials may also be less likely to include patients with rapidly progressive disease, consistently associated with poorer survival in aggressive lymphoma [7]. Cancer trials are also subject to increasingly selective eligibility criteria [8], with trial ineligibility reliably associated with inferior survival outcomes across lymphoma subtypes [9, 10]. These factors all represent challenges in identifying a relevant external control group from the ‘real-world’. While methods like inverse probability weighting do control for measurable, known confounders, unmeasured, uncontrolled confounders will persist. Categorization of continuous variables (e.g., defining lactate dehydrogenase as normal vs. elevated) can also result in loss of informational value and suboptimal matching. Incentivizing the prospective and accurate collection of granular RWD will help to mitigate these sources of bias, as will the promotion of more inclusive and pragmatic clinical trials.

In addition to the above, there are other less quantifiable risks. Typically, an SCA would be considered for a single-arm trial (SAT) that has impressive efficacy results. However, impressive efficacy can sometimes be the result of selection bias and randomness. If only the SATs with the best observed outcomes are considered for matching with an SCA in a post-hoc manner, this would be a source of bias. Optimally, the plans for the SCA should be detailed prospectively in a statistical analysis plan (SAP) before the SAT results are known. The SAP should include matching variables as well as important prognostic covariates included in adjusted efficacy analyses. For full transparency, the SAP should also describe which RWD sources will be used for the SCA. Selection of those RWD sources should preferably be done with blinding to outcome data from the RWD source to avoid bias related to the Sponsor’s knowledge about patient outcomes in the SAT. By requiring preplanned use of an SCA, including how to generate the SCA and adjust for confounders, the risk of systematic bias from post-hoc, data-driven decisions will be lowered.
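The effect of picking only the best-looking SATs can be illustrated with a small simulation. This is a hypothetical sketch: the number of trials, the trial size, the true response rate, and the use of response rate as the efficacy measure are all invented for illustration. Every simulated trial tests a treatment with the same true efficacy, so any spread between trials is due to chance alone.

```python
import random

def simulate_sat(rng, n_patients, true_response_rate):
    """Observed response rate in one simulated single-arm trial."""
    responders = sum(rng.random() < true_response_rate for _ in range(n_patients))
    return responders / n_patients

rng = random.Random(2024)        # fixed seed so the run is reproducible
true_rate = 0.40                 # hypothetical true response rate, identical in all trials
n_trials, n_patients = 20, 50    # hypothetical portfolio of small SATs

observed = [simulate_sat(rng, n_patients, true_rate) for _ in range(n_trials)]
mean_observed = sum(observed) / len(observed)
best_observed = max(observed)

print(f"true response rate:        {true_rate:.2f}")
print(f"mean observed across SATs: {mean_observed:.2f}")
print(f"best observed SAT:         {best_observed:.2f}")
```

The best-performing trial will, by construction, look at least as good as the average trial, even though the underlying treatment effect is the same. Comparing only that trial against an SCA chosen after the fact would carry this chance advantage into the comparison, which is the argument for prespecifying the SCA in the SAP.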

Endpoint selection when using RWD also merits caution. Without protocolized assessment timepoints, more toxicity- and disease-related events will be detected in patients undergoing more frequent physician visits and investigations. In the real-world setting, endpoints based on acute events such as time-to-discontinuation (TTD), time-to-next-therapy (TTNT), or OS may therefore be more bias-resistant than endpoints reliant on protocolized clinical or radiologic disease assessment, such as progression-free survival (PFS). However, safety assessment remains challenging; despite a substantially higher rate of Grade 3–4 adverse events for the experimental arm of SENIOR (81% vs. 53%), no comparison of the toxicity rates between the experimental arm and SCA was attempted, underscoring the unresolved challenges of comprehensive benefit/risk assessments with SCAs.
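A simple sketch shows how assessment frequency alone can shift an endpoint such as PFS. The progression times and the visit intervals below are hypothetical, and the sketch assumes that progression is recorded only at the first scheduled assessment on or after the time it truly occurs. The same true progression times are used for both schedules, so the difference in observed PFS comes from the schedule alone.

```python
from math import ceil
from statistics import median

# Hypothetical true progression times in months, identical for both settings.
true_progression_months = [1.5, 3.2, 4.8, 6.1, 7.4, 9.0, 11.3, 14.6, 18.2]

def observed_progression(true_time, visit_interval):
    """Time of the first scheduled assessment on or after true progression."""
    return ceil(true_time / visit_interval) * visit_interval

# Assumption: assessments every 3 months in one setting, every 6 months in the other.
frequent = [observed_progression(t, 3) for t in true_progression_months]
sparse = [observed_progression(t, 6) for t in true_progression_months]

print(f"median true progression time:        {median(true_progression_months)} months")
print(f"median observed, 3-monthly schedule: {median(frequent)} months")
print(f"median observed, 6-monthly schedule: {median(sparse)} months")
```

With identical underlying disease courses, the sparser schedule reports a longer median PFS. An endpoint such as OS does not depend on the assessment schedule in this way.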

Despite these limitations, there is significant potential to improve the efficiency and quality of evidence generation for regulatory decisions by harnessing RWD, especially in the current digital era of healthcare delivery. Robust RWD should serve as the ultimate control for assessing the impact of new interventions on outcomes in representative populations, particularly in rare diseases where large-scale data are time-consuming to collect. RWD provides information largely absent from trials, including cost-effectiveness, healthcare utilization, and long-term follow-up. Yet, in contrast to established global frameworks for transparent clinical trial data management supported by regulatory and research governing bodies, existing RWD policies regarding collection and sharing are highly varied across and within jurisdictions [11, 12]. This significantly diminishes RWD capability and speed of use by researchers and regulators. These barriers largely stem from privacy and legal data custody concerns [13]. Although legitimate, the focus on these mitigatable risks should be balanced against the ethical implications, for both existing and future patients, of failing to incorporate RWD in drug development pathways. Consumer feedback indicates many patients share researchers’ concerns regarding data sharing barriers [14, 15].

Regulators must prioritize RWD science, such as that by Letailleur et al., and continue to work with registries globally to create clear guidance on acceptable RWD conduct. Only then can we harness the full benefits of RWD in drug development. Recognizing this need, we have worked with registry leads internationally to form the global Lymphoma Registry Alliance, bringing together 50 registries from 30 countries to address some of these major issues.