Main

Circular extrachromosomal DNAs (ecDNAs) are present in most cancer types and are associated with poor patient outcomes1. ecDNAs typically span 50 kb to 1 Mb and have low nucleosome densities and high levels of transcription, resulting in increased expression of oncogenes when they are present5,6. This confers a growth advantage to recipient cells and preferential retention of oncogene-expressing ecDNAs. ecDNAs replicate autonomously, approximately once per cell cycle7; however, the absence of a centromere results in their unequal segregation at mitosis8,9, driving cancer heterogeneity and oncogene amplification in daughter cells10. This contributes to tumour evolution, treatment resistance and increased ecDNA copy numbers as the cancer progresses11. Consistent with this, ecDNAs predict 43% of high-grade oesophageal cancers and persist from early-stage to late-stage cancers12.

V(D)J recombination is vital to generate diversity of immunoglobulin and T cell receptor (TCR) genes. It is catalysed by the recombination-activating gene (RAG) proteins RAG1 and RAG2, which bring complementary V, D or J gene segments into a synaptic complex by binding to their adjacent recombination signal sequences (RSSs)2. The RSSs consist of conserved heptamer and nonamer sequences separated by a non-conserved spacer of 12 ± 1 or 23 ± 1 bp, with recombination almost exclusively occurring between RSSs of different spacer lengths13. Following cleavage at gene segment–RSS boundaries, gene segments are joined to generate the variable exons of immunoglobulins or TCRs, whereas the intervening DNA is normally excised2,3. The signal sequences on the excised DNA are joined together, generating a signal joint (SJ) on an ESC2,3 (Fig. 1a). ESCs from immunoglobulin light chain loci are typically 50 kb to 1 Mb in size.

Fig. 1: RAG-generated extrachromosomal circular DNAs persist throughout mouse B cell development.
figure 1

a, Schematic of the mouse immunoglobulin kappa (Igk) locus, highlighting the gene segments studied: Vκ16-104, Vκ3-1 and Vκ11-125, which undergo deletional recombination to generate an ESC. All Igl recombination reactions are deletional. Grey and blue triangles represent 12- and 23-RSSs, respectively; blue and purple rectangles represent V and J gene segments, respectively; the orange rectangle represents the constant region exon; the RS element is depicted by a green rectangle. b, Ratio of SJs to recombination junctions (Rec) at sequential stages of B cell development as determined by absolute qPCR for Jκ5-Vκ3-1, Jκ5-Vκ16-104 and Jλ1-Vλ1 and compared with the ratio in pre-B cells, which was set at 1:1 for ease of comparison. n = 3 samples. BM, bone marrow. c, Schematic of the isolation of ESCs from mouse mature B cells. Splenocytes were isolated from six-week-old mice and purified by flow cytometry to obtain IgM+ and IgG+ B cells (Supplementary Fig. 1); genomic DNA (gDNA) was extracted and digested with RecBCD to remove linear DNA. d, Jκ5-Vκ11-125 and Jκ5-Vκ16-104 ESCs persist as circles in mouse mature B cells (IgM+, upper and IgG+, lower). The amount of undigested DNA was determined by qPCR. Gapdh (encoded by linear, genomic DNA) was used as a negative control; untreated DNA was a further control (Ctrl). P values were determined by an unpaired, two-tailed Student’s t-test. Comparison versus Gapdh, 95% confidence interval: IgM+ Jκ5-Vκ11-125, −71.36 to −13.20; IgM+ Jκ5-Vκ16-104, −79.49 to −25.81; IgG+ Jκ5-Vκ11-125, −80.44 to −10.43; IgG+ Jκ5-Vκ16-104, −88.83 to −37.15. n = 3 samples. e, Rag1 expression levels at different stages of mouse B cell development as determined by quantitative PCR with reverse transcription (RT–qPCR). Data are normalized to Hprt expression levels. NIH3T3 cDNA is a negative control. n = 3 samples. Mean values are shown; error bars represent s.d.

Source Data

Initially, ESCs were thought to be inert, non-replicative entities that become diluted through cell division4. However, ESCs are now known to trigger genome instability through two related mechanisms. First, RAGs reassociate with the ESC SJ to catalyse ESC reintegration at RSS-like sequences known as cryptic RSSs (cRSSs), in a reaction that requires cleavage of both the ESC SJ and a genomic cRSS14,15,16. Although the frequency of reintegration is not known, it can cause insertional mutagenesis at tumour suppressor gene hotspots in T cell acute lymphoblastic leukaemia (T-ALL)17. Second, the RAG–ESC complex triggers double strand DNA breaks (DSBs) at cRSSs via a ‘cut-and-run’ reaction. In this reaction, the RAG–ESC complex synapses with a genomic cRSS, but unlike in reintegration, only the cRSS is cut18. The cleaved cRSS is released as a DSB, whereas the RAG–ESC complex remains intact to potentially trigger further DSBs18. Crucially, the DSBs caused by the cut-and-run reaction colocalize with many structural variants (SVs) found in ETV6–RUNX1 acute lymphoblastic leukaemia (ALL)19 and map to frequently mutated genes in B cell precursor ALLs (BCP-ALLs)18, implying that cut-and-run contributes to the development of BCP-ALL.

Current dogma states that ESCs are diluted during cell division and gradually decrease in number to negligible levels20,21. This, together with the downregulation of RAG proteins following productive immunoglobulin light chain (IgL)22 or TCRα/γ recombination was believed to limit the potential harmful effects of the RAG–ESC complex. However, we show here that ESCs replicate and persist through multiple cell divisions. Moreover, similar to ecDNAs, high ESC copy numbers correlate with poor cancer prognosis, indicating that ESCs also have a pivotal role in cancer progression.

ESCs persist to mature B cells in mice

ESCs, in the form of TCR excision circles, are present in naive thymic emigrants and persist in circulating lymphocytes for approximately two weeks in chickens and potentially much longer in primates23,24,25. Similarly, kappa-deleting recombination excision circles, generated by recombination between the Jκ–Cκ intron RSS and kappa-deleting element (KDE in humans, RS in mice), are found in about 30% of Igκ+ B cells and nearly all newly generated Igλ+ B cells4. RAG gene expression is downregulated following productive TCRα/γ and IgL rearrangement and editing22, and therefore ESC production outside the thymus and bone marrow is negligible. Furthermore, ESCs are believed to be non-replicative4 and lost via cell division. However, to test whether, like ecDNAs, ESCs are replicated and inherited, we purified genomic DNA from pre-B cells and IgM+ B cells from mouse bone marrow, as well as IgM+ and IgG+ B cells from mouse spleen.

Several IgL recombination products were examined from both immunoglobulin kappa (Igk) and lambda (Igl) loci, together with their corresponding ESCs, via quantitative PCR (qPCR) using standards with known numbers of copies of the PCR product under investigation. The Igk locus undergoes both deletional recombination, where the SJ is found on the extrachromosomal ESC, and inversional recombination, where SJs are retained in the genome; our analyses focussed solely on deletional events (Fig. 1a). Remarkably, the SJ:recombination junction ratio shows only modest differences across all stages of B cell maturation and even appears to increase in IgG+ cells for Jκ5-Vκ3-1 and Jκ5-Vκ16-104 ESCs (Fig. 1b and Extended Data Fig. 1a). This increase cannot be explained by removal of the recombination junction via secondary recombination, as the ratio of the recombination junction to other genomic regions is maintained throughout B cell development (Extended Data Fig. 1b). Given that at least six cell divisions are required for maturation from IgM+ to IgG+ cells26, these data imply that SJs on ESCs are replicated and inherited. Indeed, in the absence of replication, ESCs would be diluted to 1.6% or less of their corresponding recombination junction in IgG+ cells.

It remains possible, however, that ESCs have reintegrated into the genome and been replicated as part of genomic DNA. Reintegration cannot have occurred via opening of the SJ, as the assay to detect ESCs involves amplification of intact SJs. However, in principle, reintegration could occur via recombination between an RSS on the ESC (Fig. 1a) and one in the genome, resulting in the insertion of an intact SJ. The frequency of such events is expected to be low15,16,17, but to address this possibility, we tested whether ESCs remain as extrachromosomal circles in IgM+ and IgG+ B cells.

High-molecular-mass genomic DNA was prepared from IgM+ and IgG+ B cells using conditions that minimize DNA shearing. Following exonuclease V (RecBCD) treatment to digest linear DNA but leave closed circular or nicked DNA intact27 (Fig. 1c), we measured the amounts of residual Jκ5-Vκ11-125 and Jκ5-Vκ16-104 SJs by qPCR, using standards with known copy numbers. Remarkably, both SJs were present at more than 50% of the undigested level, whereas the Gapdh control from linear genomic DNA was nearly completely lost (Fig. 1d). The presence of significantly higher fractions of ESCs compared with Gapdh following treatment strongly indicates that ESCs are circular in mouse splenic IgM+ and IgG+ B cells. Moreover, very low Rag1 expression in IgG+ cells implies that the circular ESCs have replicated and persisted from earlier stages, rather than being newly generated (Fig. 1e).

ESCs are present in BCP-ALL samples

In principle, the presence of ESCs in mature lymphocytes is unlikely to have functional consequences, owing to their very low RAG gene expression. However, RAGs are aberrantly expressed in most BCP-ALL subtypes28,29,30 (Extended Data Fig. 1c), and analysis of cDNA from more than 85 patient samples shows that expression of RAG1, but not RAG2, is significantly increased at diagnosis in patients who subsequently relapse (Extended Data Fig. 1d; RAG1: P = 0.0022). Therefore, the presence of RAGs and ESCs in these cancers could increase RAG–ESC complex formation and the risk of genome instability via cut-and-run or reintegration reactions. To investigate whether ESCs are detectable in primary BCP-ALL samples, we first determined the major recombination event(s) in nine ETV6–RUNX1 BCP-ALL samples using degenerate primer sets31 to amplify rearrangements at the human IGK and IGL light chain loci. The resulting PCR products were cloned and sequenced, followed by sequence alignment to the IGK and IGL loci (Extended Data Fig. 2a).

The presence of the corresponding ESC SJs was then investigated using primers specific to each sample. Using this strategy, SJs with the predicted sequences were detected in six out of nine patients analysed (Extended Data Fig. 2b), with between one and three distinct SJs per patient. Since each recombination event generates just a single ESC and there are millions of cells in these malignancies, the ability to detect SJs in these samples suggests that these ESCs have replicated and persisted through many cell divisions.

Similar analyses (and high-throughput analyses, discussed below) using samples from patients with BCR–ABL1, CRLF2-r and low-hypodiploid BCP-ALL showed that SJs from deletional recombination events are also present in other BCP-ALL subtypes (Extended Data Fig. 2c).

To verify that SJs in BCP-ALL are present as extrachromosomal circles, high-molecular-mass DNA was prepared from patient samples using conditions that minimize DNA shearing. Following RecBCD treatment to remove linear DNA, SJs were amplified by droplet digital PCR (ddPCR). Consistent with their presence on extrachromosomal circles, SJs were amplified to a similar level with or without RecBCD treatment; by contrast, RecBCD reduced control linear DNA (GAPDH) to around 7% of untreated DNA levels (Extended Data Fig. 2d).

To better determine the fraction of patients with BCP-ALL carrying ESCs, we next capitalized on available whole-genome sequencing (WGS) data from patients with ETV6–RUNX1 BCP-ALL and re-analysed them for the presence of ESCs. DNA is prepared for WGS by shearing into fragments that average 500 bp, and approximately 35–50 bp at each end is sequenced. If the sequenced fragment includes an ESC SJ, the corresponding paired-end reads are flagged as ‘discordant’ when mapped to the genome, because the reads map much further away from each other than expected and appear to point towards each other, rather than away from each other. To detect ESCs, we therefore aligned WGS data from 61 patients with ETV6–RUNX1 BCP-ALL (European Genome-phenome Archive: EGAD00001000116) to the human genome and filtered for reads that fit the above criteria. We detected ESCs in 51 out of 61 patients, with between 1 and 27 different ESCs per patient (Extended Data Fig. 2e and Supplementary Table 1). Owing to low sequencing depth and the fact that an ESC will be identified only if a split read is positioned exactly across the SJ, these data cannot quantify ESCs. Nevertheless, they fully support the presence of ESCs in the majority of patients with ETV6–RUNX1 BCP-ALL and demonstrate that multiple ESCs can be present in each individual.

The coexistence of ESCs and RAGs in BCP-ALL could lead to increased RAG–ESC-mediated mutations and disease progression. However, the potential effect of the increased RAG–ESC activity depends on the quantity of ESCs and the timeframe over which ESCs are present. To investigate this, we established assays to detect all IGK and IGL recombination events and all IGK and IGL SJs. These assays, linear amplification-mediated (LAM)-recombination and LAM-ESC, are derived from LAM-high-throughput genome-wide translocation sequencing (LAM-HTGTS)32 and involve linear amplification with biotinylated primers against J regions across VJ coding junctions or ESC SJs (Extended Data Fig. 3a,b). The biotinylated products are then selected, adapters are ligated and the coding junctions or SJs are amplified by PCR. Following amplicon sequencing of the products, the resulting sequences are mapped to bespoke reference databases that include all possible IGK/IGL recombination events or IGK/IGL SJs33 (https://github.com/Boyes-Lab/LAM-ESC-Recombination) but exclude intra-KV region recombination events34 and their SJs.

Using the LAM-recombination and LAM-ESC assays, we analysed 71 samples, taken at diagnosis, from patients with BCP-ALL, 34 of whom subsequently relapsed. In addition to the primary recombination event, many secondary recombination events were present, indicating continued RAG activity29 (Supplementary Table 2). Complementary LAM-ESC data show multiple ESCs per patient and that ESC copy numbers varied widely (Fig. 2a and Supplementary Table 3). Remarkably, comparison of normalized LAM-ESC sequencing reads between patients who did and who did not subsequently relapse shows that significantly more ESCs were present at increased levels in patients who later relapsed (Fig. 2a,b; P = 0.00017). By contrast, in most patients who did not relapse, ESC copy numbers were close to those in healthy blood or bone marrow (Fig. 2a). Given that absolute ESC copy numbers, determined by ddPCR, correlated well with the number of normalized sequencing reads (Extended Data Fig. 4a) and that ddPCR experiments independently confirmed a similar fold increase of SJ levels in patients who later relapsed (Extended Data Fig. 4b), these data imply that there was increased ESC replication and/or persistence in patients who subsequently relapsed. No correlation was observed between RAG1 or RAG2 expression and ESC copy numbers (Extended Data Fig. 4c) and even in BCP-ALL samples in which RAG1 expression was very similar, increased ESC copy numbers were observed in patients who later relapsed but not in those who did not (Fig. 2c; P = 0.0156). Differences in tumour infiltration also do not explain altered ESC levels, as leukaemic blast levels were similar (greater than 90%) in both patient groups (Supplementary Table 4).

Fig. 2: High SJ copy numbers at diagnosis correlate with subsequent relapse.
figure 2

a, Total IGK/IGL SJ levels were determined in samples from patients with BCP-ALL (from VIVO Biobank) at diagnosis using LAM-ESC. Normalized copy number of each SJ at diagnosis for patients who later relapsed (n = 34) and those who remained in remission (non-relapse; n = 37) are plotted. Normalized reads were calculated by dividing the reads obtained per SJ by the total LAM-ESC reads in that experiment. A threshold (horizontal line) was set at the highest normalized ESC level detected in healthy blood (HB, n = 2). SJ levels in a bone marrow sample taken at remission (REM) are shown for comparison. Only SJs resulting from deletional recombination events are shown. b, A significantly higher number of distinct SJs is present at levels above the threshold in patients who subsequently relapse compared with those who remain in remission. P value was determined by a two-tailed Fisher’s exact test. n = 34 (relapse) and 37 (non-relapse). c, Left, expression of RAG1 and RAG2 mRNA at diagnosis in patients who later relapsed (n = 5) and those who did not (n = 6); comparable RAG1 expression levels were observed in both groups. Right, normalized SJ copy numbers. The P value was determined by a two-tailed Fisher’s exact test.

Source Data

High ESC replication coupled to relapse

The increased ESC copy numbers in patients who later relapse implies that ESCs have replicated. Consistent with this, we identified seven SJs that were more abundant than their corresponding recombination junction using ddPCR (Extended Data Fig. 4d). Similarly, comparison of numbers of SJs and recombination junctions from KDE-KV2-30 and KDE-KV3-20 rearrangements across 48 samples shows that SJ numbers were significantly higher in patients who later relapsed (KDE-KV2-30: P = 0.0461, KDE-KV3-20: P = 9.25 × 10−5). By contrast, a much smaller increase (KDE-KV2-30: P = 0.0756, KDE-KV3-20: P = 0.0173) was observed in the corresponding recombination junctions (Extended Data Fig. 4e,f).

To investigate the extent of ESC replication, we capitalized on the fact that when each ESC is generated, a corresponding recombination ‘footprint’ is formed in the genome. If cells undergo multiple divisions following generation of an ESC, there will be multiple copies of the corresponding recombination junction. By contrast, if an ESC was generated recently, fewer copies of its recombination junction will be present (Fig. 3a). To measure ESC replication when the influence of ESC persistence is minimal, we focussed on recently generated ESCs. To this end, we examined ESCs corresponding to the lower limits of detectable LAM-recombination reads (≤0.2 normalized reads); as shown in Fig. 3b, the ratio of these SJs to the corresponding recombination junction was significantly higher in patients who later relapsed compared with those who did not (P = 0.004), implying increased ESC replication in patients prone to relapse. Notably, of the recently generated ESCs, 11 were identical in both patient groups (Extended Data Fig. 5a). After verifying that these ESCs were indeed recently generated by measuring the corresponding recombination junction by ddPCR, we quantified the SJ levels. Recently generated SJs were present at significantly higher levels in patients who later relapsed (Fig. 3c; P = 0.001). Given that the exact same SJ sequence was examined in the two patient groups, this increased replication cannot be due to the ESC sequence. Instead, it is likely that cell-intrinsic factors that are present at diagnosis trigger increased ESC replication in patients who later relapse.

Fig. 3: Increased ESC replication in patients with BCP-ALL who later relapse.
figure 3

a, Schematic highlighting that multiple copies of a recombination junction (white and pink) are present if recombination took place many cell divisions ago. b, SJs corresponding to very recent recombination events (≤0.2 normalized LAM-recombination reads) were identified. Normalized SJ reads were divided by corresponding normalized recombination reads. P value for the increase in patients who later relapsed determined by two-tailed Mann–Whitney U test; n = 74 SJs from 12 patients (relapse) and 51 SJs from 8 patients (non-relapse). c, Eleven SJs corresponding to ≤0.2 normalized recombination reads are in both patient groups. SJ levels were measured by ddPCR for patients who later relapsed (red) or remained in remission (blue). Data are mean observed/predicted SJ values, assuming a twofold SJ dilution at each cell division (Methods). Black dots indicate values for individual patients. P values determined using a two-tailed Wilcoxon signed-rank test. n = 11 pairs. SJ levels only shown for HeLa controls (orange dots; n = 2). d, Enrichment plot for DNA repair genes from GSEA of RNA-seq data from 123 patients at diagnosis, 74 of whom relapsed. Significance determined as described53,54 with corrections for multiple comparisons. e, RT–qPCR analysis of expression of genes associated with DNA replication and repair in patients who later relapse (for patients with high (H-) or low (L-) SJ levels) or remain in remission (non-relapse). n = 11 for each group except L-relapse: n = 8 (PCNA) and 10 (POLE3); and non-relapse: n = 12 (RBX1). Data are normalized to HPRT expression. P values determined using an unpaired, two-tailed Student’s t-test. The 95% confidence interval is presented in Methods. Data are mean ± s.d. f, GSEA of RNA-seq data (EGAS00001006863) from patients with known SJ levels. Enrichment plot for DNA repair genes for patients with high versus low SJ levels (n = 4 and n = 3, respectively). P values determined as in d.

Source Data

To investigate what these cell-intrinsic factor(s) might be, we capitalized on available RNA sequencing (RNA-seq) data (https://www.cancer.gov/ccg/research/genome-sequencing/target, dbGaP sub-study ID: phs000464) obtained at diagnosis from 123 patients with BCP-ALL, 74 of whom later relapsed. Using DESeq2, followed by gene set enrichment analysis (GSEA), we found significantly increased expression of DNA repair genes (P = 0.0095; Fig. 3d) in patients who were prone to relapse. Published single-cell RNA-seq data from neuroblastoma cell lines similarly showed increased expression of the replication-associated genes PCNA, POLE3 and RPA2 corresponding to high ecDNA levels35. It is therefore plausible that replication and repair-associated gene products enhance ESC replication; consistent with this, significantly increased expression of PCNA, RBX1, POLE3 and POLE4 was observed in patients with high levels of SJs who later relapsed (Fig. 3e), but not in those with low levels of SJs, regardless of whether they relapsed (Fig. 3e). Furthermore, analysis of RNA-seq data (EGAS00001006863) from seven patients with known SJ levels (Fig. 2a) using DESeq2 and GSEA showed that expression of PCNA, RBX1, POLE3 and POLE4 was significantly increased in patients with high SJ levels compared with those with low SJ levels (Extended Data Fig. 5b) and that there was a significant increase in expression of DNA repair genes in patients with high SJ levels (Fig. 3f). These data therefore link higher expression of replication and repair-associated genes with increased ESC copy numbers, although a causal relationship with ESC replication has yet to be established. Nonetheless, previously identified replication initiation sites36 are present within IGK and IGL ESC sequences (Extended Data Fig. 6), suggesting that ESCs may replicate via eukaryotic replication origins.

ESCs persist through many cell divisions

The effects of an ESC are dependent not only on how many copies there are, but also on the timeframe during which it co-exists in cells with RAG proteins. Therefore, we investigated the extent of ESC persistence by measuring how many ESCs correspond to recombination events that took place many cell divisions ago—that is, where the corresponding recombination junction was present at a high copy number (Fig. 3a). These were categorized as ‘major’ recombination events and were distinguished from other events by the maximal change in the gradient on plots of the distribution of sequencing reads37 (Extended Data Fig. 7a). Although we detected many ESCs corresponding to such major recombination events, implying that ESCs persist, there was no significant difference in the percentage of major recombination events with a corresponding SJ (or SJs) between patients who later relapsed and those who did not (relapse: 25.85%, non-relapse: 28.3%). However, SJs that corresponded to major recombination events were present with higher copy numbers in patients who later relapsed (Fig. 4a; P = 0.011). Moreover, when only SJs with increased copy numbers (above the threshold in Fig. 2a) were considered, almost half corresponded to major recombination events (Fig. 4b, left). This is consistent with persistence of high-copy ESCs through multiple cell divisions, increasing the risk that they will trigger mutations.

Fig. 4: ESCs persist at higher copies in patients who subsequently relapse.
figure 4

a, Major recombination signifies multiple cell divisions since the recombination event (Fig. 3a and Extended Data Fig. 7a). Normalized sequencing reads for SJs corresponding to major recombination events are plotted for each patient group. P value for the increase in patients who later relapse versus those who do not determined by a two-tailed Mann–Whitney U test; n = 40 from 23 patients (relapse) and n = 25 from 20 patients (non-relapse). b, Pie chart showing percentage of SJs with normalized sequencing reads above or below the threshold in healthy blood (Fig. 2a) that correspond to major recombination events versus other recombination events. P value for the difference determined by a two-tailed Fisher’s exact test. n = 23 high-copy SJs from 7 patients and 124 low-copy SJs from 12 patients. Recomb, recombination events. c, Representative DAPI-stained metaphase chromosome spreads from BCP-ALL samples with ESC levels similar to patients who remain in remission (left; n = 29 images) or who later relapse (right; n = 26 images). Red arrows indicate non-chromosomal DNAs. Images deconvoluted using CellSens Dimension software. d, ESCs detected by interphase FISH in patients with ESC levels similar to those who relapse (patient P, n = 108 images) or remain in remission (patient M, n = 101 images; patient N, n = 127 images). e, Schematic showing ESC and control FISH probes to IGK and IGL loci. Blue and grey triangles represent 12- or 23-RSSs. f, Representative interphase FISH image. Control (ctrl) probes against IGK and IGL are in green; the regions excised to generate ESCs are in red (IGL) or magenta (IGK). n = 99. g, IGK (magenta) and IGL (red) ESCs in the same cell. n = 10. h, Multiple ESCs detected per cell using IGK control probe (green) and KDE probe (magenta). n = 8. All images obtained at 60× magnification.

Source Data

In control experiments, we considered the possibility that rather than persisting, identical ESCs could have been generated by more recent, secondary recombination events38,39. However, by capitalizing on the presence of ESCs that correspond to the primary recombination event (that is, those with the most sequencing reads) in three patients and analysing the clonotypes of the respective recombination junctions, our data indicate that it is very likely that the ESCs persisted from the primary recombination event in at least two cases (Extended Data Fig. 7b–g).

ESC distribution in BCP-ALL

BCP-ALL progression is also likely to be influenced by the number of ESCs per cell and their intratumoral heterogeneity. To explore this possibility, we performed DNA fluorescence in situ hybridization (FISH) using probes that detect all ESCs from IGL and IGK loci. Actively cycling cells were blocked in mitosis and the presence of non-chromosomal DNAs was confirmed by DAPI staining (Fig. 4c). Since ecDNAs were not detectable in primary BCP-ALL samples by AmpliconArchitect40 in this and a separate study41 (20 and 44 patients41, respectively), these non-chromosomal DNAs are likely to be ESCs. To further test this and more accurately determine ESC distribution, we screened interphase nuclei for ESC FISH signals that were distinct from control (chromosomal) signals. For a patient with ESC levels similar to those in patients who later relapsed (patient P; Extended Data Fig. 8a), ESCs were observed in around 50% of cells, with a noticeable clustering of 3–7 ESCs in some cells (Fig. 4d–f and Extended Data Fig. 8b). By contrast, for patients with ESC levels similar to those in patients who remained in remission (patients M and N; Extended Data Fig. 8a), ESCs were detected in fewer cells, with only one or occasionally two ESCs per cell (Fig. 4d). Notably, these interphase FISH signals showed high correspondence with the relative levels of DAPI-stained non-chromosomal DNAs (Extended Data Fig. 8c), and, when present at increased levels, ESCs exhibited remarkable intratumoral heterogeneity (Fig. 4d). We also detected the presence of IGK and IGL ESCs in the same cell (Fig. 4g). Given that simultaneous recombination of both loci is highly unlikely, the coexistence of three IGK ESCs and one IGL ESC supports the idea that ESCs replicate and persist.

Similarly, seven ESCs resulting from recombination to KDE were found within a single cell (Fig. 4h). There are only two KDE RSSs; therefore, the maximum number of ESCs that could be generated by recombination to KDE is two. Thus, the presence of seven ESCs in one cell was a strong indication of ESC replication. To verify this, we prepared metaphase chromosome spreads from BCP-ALL cells that were cultured in the presence of bromodeoxyuridine (BrdU) and then blocked in mitosis. Following hybridization with BrdU antibodies, we observed signals coinciding with DAPI-stained non-chromosomal DNAs, confirming ESC replication (Extended Data Fig. 8d).

ESC copy number and BCP-ALL progression

The presence of high ESC copy numbers over multiple generations increases the cumulative risk of DNA damage and may lead to cells with high ESC copy numbers undergoing clonal expansion. Consistent with this, SJs with high copy numbers were associated with a larger fraction of cells that had undergone many divisions (major recombination events; Fig. 4b, left) than those with low copy numbers (below threshold) (Fig. 4b, right). Next, we aimed to determine how ESC copy number at diagnosis influenced BCP-ALL progression. In other cancers, ecDNAs lead to a worse prognosis by increasing oncogene copy numbers1,42, acting as mobile enhancers43, or by ecDNA integration into a tumour suppressor gene44. There are no known oncogenes on ESCs from the IGKIGL or TCRA–TCRG loci and no known enhancers in the excised V-J regions of human loci. Similarly, although recombination to KDE generates ESCs that incorporate two strong enhancers—iEκ and 3′Eκ45,46—and such ESCs are significantly enriched and found in patients who later relapse (Extended Data Fig. 8e–g), patients who lack KV-KDE ESCs nonetheless relapse (Supplementary Table 3). Therefore, although ESCs may promote malignancy by acting as mobile enhancers, ESCs are likely to influence disease progression by other mechanisms.

ESCs cause increased mutations when complexed with RAG proteins, triggering either ESC reintegration at cRSSs or the cut-and-run reaction that generates DSBs at cRSSs. Both reactions produce SVs that have a cRSS on only one side of the breakpoint14,18. RAGs also trigger genome alterations that are independent of ESCs via off-target recombination between two cRSSs, leading to insertion–deletion mutations or chromosome translocations with cRSSs on both sides of the breakpoint19,47. To test whether the RAG–ESC complex contributes significantly to the mutations found in BCP-ALL, we examined the SVs in WGS data of patients with BCP-ALL using Therapeutically Applicable Research to Generate Effective Treatments (TARGET) data (https://www.cancer.gov/ccg/research/genome-sequencing/target; dbGaP sub-study ID: phs000464). Using 150 patient sequences obtained at diagnosis across all BCP-ALL subtypes, we first calculated the fraction of SVs with one or two cRSSs at the breakpoint. Consistent with previous data19, cRSSs were present at nearly 40% of breakpoints (38.4%); of these, nearly 62% had a single cRSS and 36.4% had two cRSSs (Extended Data Fig. 9a). To determine the probable cause of the breaks at single cRSSs, we used an in-house script48 (https://github.com/Boyes-Lab/NGS-Analysis), which showed that cut-and-run18 occurs more than 60-fold more frequently than either reintegration14,15,16 or RAG-mediated insertions49 (Extended Data Fig. 9a).

Next, we compared the numbers of SVs in patients, and found a significant increase in SVs with cRSSs on only one side of the breakpoint in patients who subsequently relapsed compared with those who did not (Fig. 5a, left). Moreover, consistent with the above analysis, the frequency of SVs with a single cRSS at the breakpoint (Fig. 5a, left) was much higher than those with cRSSs on both sides of the breakpoint (Fig. 5a, right; mean relapse: 27.19 versus 8.27, mean non-relapse: 23.48 versus 4.635), suggesting a greater role for RAG–ESC-mediated mutations.

Fig. 5: Association between increased ESC-mediated mutations and relapse.
figure 5

a, SVs per patient at diagnosis with a single cRSS (left) or two cRSSs (right) at the breakpoints plotted for patients who later relapsed (121 patients) versus those who remained in remission (29 patients). P value for the difference determined by a two-tailed Mann–Whitney U test. Single cRSS: P = 0.016; two cRSSs: P = 0.0006. SVs are from the TARGET study. Breakpoint junctions within antigen receptor loci are omitted. b, Right, pie chart showing all SVs at relapse with one cRSS at the breakpoint, two cRSSs or no cRSS for 83 matched patient samples; SVs at diagnosis were subtracted from those at relapse. Left, bar chart showing SVs at genes commonly mutated at relapse. P value for increased SVs involving single cRSSs at relapse-associated genes compared with all SVs involving a single cRSS calculated using a one-sided hypergeometric distribution; P = 0.0001. c, Frequency of breaks in 10 genes that most often acquire somatic mutations in relapsed BCP-ALL51 in the presence of the ESC (SJ) versus 12-RSS and 23-RSS controls (RSS) in a LAM-HTGTS experiment18. P value for co-localization of breaks detected by LAM-HTGTS and frequently mutated genes in relapsed ALL versus all genes mutated in ALL calculated using a one-sided hypergeometric distribution; P = 0.00077. n = 4. d, ddPCR of selected recombination junctions at diagnosis and relapse. Cell numbers normalized using GAPDH. Bars show mean values. n = 10 recombination junctions from four patients. e, ddPCR analysis of selected SJs at diagnosis and relapse for the indicated patients. Black dots show technical repeats. Bars show mean values. n = 10 SJs from 4 patients. f, Copy numbers of individual SJs at diagnosis for patients with ETV6–RUNX1 BCP-ALL (n = 16) who later relapsed or remained in remission. The horizontal line shows the highest normalized SJ copy number detected in healthy blood.

Source Data

Increased ESCs in patients at diagnosis imply that the RAG–ESC complex continues to cause damage between diagnosis and relapse. We therefore compared WGS data from 83 matched samples taken at diagnosis and relapse. By focusing on SVs near genes that are frequently mutated at relapse50,51, we observed significantly more SVs with cRSSs on one side of the breakpoint that are specifically present at relapse compared with other SVs (Fig. 5b). Likewise, re-analysis of available LAM-HTGTS data18 (NCBI SRA: PRJNA483469) showed a significant increase in targeting of relapse-associated genes by the RAG–ESC complex compared with the RAG–RSS complex in cells derived from a patient with relapsed ETV6–RUNX1 BCP-ALL (Fig. 5c). Together, these data imply that ongoing activity of the RAG–ESC complex triggers mutations at genes that are associated with relapse.

To test whether there is a direct link between increased ESC levels and SVs at single cRSSs, we analysed WGS (EGAS00001006863) and whole-exome sequencing (WES) data of patients with BCP-ALL for whom we had measured SJ and RAG gene expression levels (Fig. 2a and Extended Data Fig. 1d). Consistent with the idea that the RAG–ESC complex triggers relapse-associated mutations, we observed a significant increase in SVs with a single cRSS at the breakpoint at relapse-associated genes in patients with high SJs plus high RAG1 expression compared with those with high RAG1 expression plus low SJs or low RAG1 expression plus low SJs (Extended Data Fig. 9b–d). Moreover, in a patient with high SJs, we observed clonal expansion of an SV at a single cRSS in the spleen tyrosine kinase (SYK) gene between diagnosis and relapse (Extended Data Fig. 9e).

If reactions involving the ESC truly underpin the mutations that lead to relapse, the presence of sufficient copies of ESCs at or before diagnosis would be expected to lead to expansion of the ESC-harbouring cells by relapse. To test this, we capitalized on the ability to trace cells in which ESCs have been generated by virtue of their corresponding recombination junction. We therefore performed LAM-recombination on samples taken at relapse, in cases where patients have high ESC copies at diagnosis. The frequencies of some recombination junctions stayed the same or even decreased between diagnosis and relapse, possibly owing to treatment-mediated loss of the corresponding cells (Supplementary Table 2). However, others showed marked increases in normalized sequencing reads (Extended Data Fig. 10a), an observation that was confirmed for ten cases by ddPCR (Fig. 5d).

We next tested whether clonal expansion, measured by the increase in recombination junctions, was linked to the presence of a matching ESC at diagnosis. Remarkably, a corresponding ESC was detected in every case of clonal expansion, but in only 58% of cases where no expansion of recombination junctions was observed, and in these latter cases, ESC copy numbers were often lower (Extended Data Fig. 10b). This correlation is notable and consistent with the idea that ESC-mediated mutagenesis promotes disease progression to relapse.

A further prediction is that if ESCs indeed replicate and persist, the same ESCs should be detectable at both diagnosis and relapse. Remarkably, we found that all ESCs persisted (Fig. 5e), in one case for more than seven years and in two cases for more than four years. Quantification of SJs at diagnosis and at relapse showed that some increased, whereas others decreased slightly, supporting the idea that the SJ is on an extrachromosomal circle rather than integrated into the genome.

Finally, if ESC activity causes disease progression, high SJ copy numbers at diagnosis should predict disease outcome. We therefore re-analysed the data showing SJ levels at diagnosis (Fig. 2) according to BCP-ALL subtype (Supplementary Table 4). Although genomic integration of SJs in some DUX4-r samples (Extended Data Fig. 10c,d) precluded such analyses for this subtype, there was a good correlation between the SJ copy number above the threshold and subsequent relapse for subtypes that usually have a good prognosis (ETV6–RUNX1 and high hyperploid (HeH) BCP-ALL) or an intermediate prognosis (TCF3::PBX BCP-ALL) (Fig. 5f and Extended Data Fig. 10e). Moreover, in BCP-ALLs where PAX5 was altered or mutated, SJ levels were higher overall, but were noticeably different between patients who relapsed and those who did not, suggesting that a higher threshold may better identify patients who are at risk of relapse (Extended Data Fig. 10e). Our data therefore imply that the presence of ESCs at diagnosis above subtype-specific threshold levels is frequently associated with subsequent relapse. This strongly suggests a central role for ESCs in disease progression.

Discussion

ESCs were long believed to be inert and diluted during cell division. However, it is now clear that ESCs have biological activity and replicate and persist in healthy lymphocytes and in BCP-ALL. In BCP-ALL, persistence of increased copy numbers of ESCs is strongly linked with worse disease outcomes, thereby resembling ecDNAs. ESCs and ecDNAs are both extrachromosomal circles; IGK/IGL ESCs are a similar size to ecDNAs; and ESCs and ecDNAs replicate and persist through multiple cell divisions and both confer a growth advantage to cancer cells when present at elevated levels5,6. However, there are key differences. Whereas ecDNAs often confer a growth advantage by increasing oncogene copy numbers10, ESCs trigger mutations, including at cancer driver genes and relapse-associated genes, via the cut-and-run reaction18. These mutations accumulate through time and are inherited by all daughter cells, regardless of whether those cells inherit the ESC. Similarly, once the RAG–ESC complex has triggered sufficient mutations in key genes, the continued presence of the ESC may not be required. By contrast, continued oncogene amplification via ecDNAs is required for a growth advantage. This may explain why ecDNA levels gradually increase as cancers progress12, whereas ESC levels are lower overall and may increase or decrease between diagnosis and relapse. Moreover, the stochastic nature by which the recombinase, and thus the RAG–ESC complex, finds its targets means that different mutations will be triggered in each cell. This may explain why some cells with high ESC copy numbers expand between diagnosis and relapse whereas others, possibly those in which fewer cancer driver genes have been mutated, either respond better to treatment or become diluted. Although high ESC copy numbers correlate with subsequent relapse in around 50% of cases, some patients with low ESC levels at diagnosis nonetheless go on to relapse. In these cases, other mutations, such as deletion of IKZF152 or initiating mutations that alter signalling to dysregulate multiple pathways50, may predispose the patient to relapse. Nonetheless, the good correlation between high ESC levels and poor prognosis, particularly for ETV6–RUNX1 and HeH BCP-ALL, may be of clinical utility. Collectively, our data demonstrate that ESCs are replicated and inherited and form a complex with RAG proteins that can lead to relapse-associated mutations and clonal expansion (Extended Data Fig. 11).

Methods

Purification of mouse B cells

Non-transgenic CBA/C57BL/6 J mice were obtained from the University of Leeds animal facility, which is a full barrier facility, with a light/dark cycle of 12 h on/12 h off, an ambient temperature of 21 °C (range 20–22 °C) and 45–65% humidity. No more than six animals are housed per cage and all mice are free of common pathogens, including murine norovirus, Pasteurella and Helicobacter. Animal procedures were performed under Home Office licence P3ED6C7F8, following review by the University of Leeds ethics committee. Mouse femurs and spleens were collected from 5- to 7-week-old mice, using 12 mice in total to minimize animal numbers used, according to 3Rs principles. Roughly equal numbers of male and female mice were used; randomization and blinding were not necessary, as all mice were wild type. Bone marrow cells were flushed from femurs with PBS whereas splenocytes were prepared by flushing cells from finely diced pieces of spleen with PBS through a 50-µm cell strainer. Following preparation of single-cell suspensions in PBS, bone marrow cells and splenocytes were centrifuged at 600g for 3 min and resuspended in 10 ml of 0.168 M NH4Cl to lyse erythrocytes. After 10 min, cells were washed with 40 ml PBS and resuspended in 1 ml staining buffer (2% FCS, 1 mM EDTA, 25 mM HEPES-KOH pH 7.9 in PBS).

Cells were stained with the appropriate antibodies prior to purification by flow cytometry. For bone marrow pre-B cells, cell suspensions were stained with 6 μl each (6:1,000 dilution) of FITC anti-CD19 (BD Pharmingen, 553785) and PE anti-CD43 (BD Pharmingen, 553271). Bone marrow or spleen IgM+ cells were stained with 10 μl (1:100 dilution) FITC anti-IgM (BD Pharmingen, 553408) whereas spleen IgG+ cells were stained with 15 μl PE (3:200 dilution) anti-IgG (eBioscience, 12-4010-82). Following incubation at room temperature for 10 min, cells were washed with PBS and resuspended in 0.5 ml staining buffer prior to purification using a FACSMelody (BD) cell sorter running BDFACSChorus 3.0 software. CD19+/CD43 cells were isolated as the pre-B population whereas cells stained with anti-IgM+ or anti-IgG+ were isolated as their respective populations.

Patient samples

Patient samples, taken as part of routine diagnostics, were supplied by VIVO Biobank, HMDS or a hospital in the Czech Republic. Collection and use of patient samples were approved by the appropriate institutional review board (IRB). Each organization obtained informed patient consent for anonymized samples to be used by third parties for research. The use of surplus diagnostic material for research by HMDS and collaborators was approved by the Health Research Authority (HRA): 04_Q1205_125. Local ethics approval was obtained from the Biological Sciences Research Ethics Committee, University of Leeds: BIOSCI 18-031 & 2308 and CCR 2285, Royal Marsden Hospital NHS Foundation Trust.

Cell culture

hTERT-RPE-1 cells were from ATCC where they were authenticated by morphology, STR profiling and karyotyping and verified mycoplasma-free. NIH3T3 cells were from the laboratory of C. Bonifer, whereas HeLa cells were from the laboratory of T. Enver. Both cell types were authenticated using species-specific PCR primers. They were verified free from mycoplasma using MycoAlert Mycoplasma Detection Kit (LT07-318). HeLa cell contamination has caused misidentification of other cell lines. However, HeLa cells were used here only to prepare human DNA and we verified that the DNA was human using human-specific PCR primers. hTERT-RPE-1, NIH3T3 and HeLa cells were maintained in Dulbecco’s modified Eagle’s medium (DMEM) supplemented with 10% fetal calf serum, 4 mM l-glutamine, 50 U ml−1 penicillin and 50 μg ml−1 streptomycin in a humidified incubator at 37 °C with 5% CO2.

Preparation of genomic DNA

Genomic DNA from mouse B cells, NIH3T3 cells, HeLa cells and patient samples was prepared by gently resuspending 1 × 106 to 5 × 106 cells in a 200 µl digestion buffer (200 mM NaCl, 10 mM Tris-HCl pH 7.5, 2 mM EDTA, 0.2% SDS). Proteinase K was added to a final concentration of 0.4 mg ml−1, followed by incubation at 56 °C overnight, with rotation. The next day, an equal volume of isopropanol was added and the sample mixed thoroughly but gently by inversion to precipitate the DNA. The DNA pellet was recovered by centrifugation at 20,000g for 5 min at room temperature and washed twice with 70% ethanol. DNA was then resuspended in 100 µl TE and incubated at 56 °C for at least 3 h to ensure complete resuspension; the concentration was measured using a DeNovix DS-11 spectrophotometer.

High-molecular-mass genomic DNA was prepared from fresh BCP-ALL bone marrow aspirates (that were surplus to diagnostic needs and that had been maintained at 4 °C), using a Promega Wizard HMW DNA extraction kit according to the manufacturer’s instructions for whole blood. Samples were resuspended in TE at room temperature overnight prior to measuring the concentration as above.

Isolation of total RNA and reverse transcription

Two million mouse B cells, NIH3T3 cells, HeLa cells or BCP-ALL cells were resuspended in 500 µl of TRIzol (Invitrogen, 3289) and total RNA was isolated according to the manufacturer’s instructions. DNA contaminants were removed by treatment with 2 U DNase I (Thermo Scientific, EN0521) for 45 min at 37 °C in 100 µl of 1× DNase I buffer (10 mM Tris-HCl pH 7.5, 2.5 mM MgCl2, 0.1 mM CaCl2). Following phenol-chloroform extraction and ethanol precipitation, total RNA was resuspended in 20 µl of ddH2O and the concentration was determined using a DeNovix DS-11 spectrophotometer.

One microgram of total RNA was reverse transcribed with M-MuLV reverse transcriptase (Invitrogen, 28025-013). In brief, 1 µg of RNA was added to 2.5 µM oligo dT primer, 500 µM dNTPs and ddH2O to give a total volume of 12 µl. This was incubated at 65 °C for 5 min and immediately placed on ice before addition of 4 µl first strand buffer (Invitrogen), 10 mM DTT and 1 µl RNase inhibitor (PCRBIO, PB30.23-02). The reaction was incubated at 37 °C for 2 min, followed by addition of 1 µl M-MuLV and incubation at 37 °C for 50 min prior to heat inactivation at 70 °C for 15 min.

Exonuclease V treatment of genomic DNA

Linear DNA was removed from genomic DNA using Exonuclease V (RecBCD, NEB M0345S). Reactions comprised 1 µg genomic DNA, 1× NEBuffer 4, 1 mM ATP and 10 U RecBCD in 100 µl total volume. Negative control reactions were identical except RecBCD was omitted. Following incubation at 37 °C for 1 h (mouse DNA) or 3 h (BCP-ALL DNA), EDTA was added to a final concentration of 11 mM and reactions were heat inactivated at 70 °C for 30 min. DNA was then ethanol precipitated and resuspended in 50 µl ddH2O; 5 µl (100 ng) was used directly for PCR.

Quantitative PCR

qPCR was performed using a Rotor-Gene 6000 cycler (Corbett) and analysed using the Corbett Rotor-Gene 6000 Series Software (v.1.7, build 87). All reactions were carried out in a final volume of 10 µl, containing 1× qPCRBIO SyGreen Mix (PCRBIO, PB20.14), 400 nM each primer and 100 ng genomic DNA, 1–5 ng cDNA or 1 µl first round PCR product (for nested PCR). All reactions were performed in duplicate. In each case, a standard curve of the amplicon was analysed concurrently to evaluate the amplification efficiency and to calculate the relative amount of amplicon in unknown samples. R2 values were 1 ± 0.1. A typical cycle consisted of: 95 °C for 3 min, followed by 40 cycles of 95 °C for 5 s, Tm for 10 s and extension at 72 °C for 10 s, where Tm = melting temperature of the primers (Supplementary Table 5). A melt curve, to determine amplicon purity, was produced by analysis of fluorescence as the temperature was increased from 72 °C to 95 °C. Amplicons were 100–200 bp.

Standard curves for absolute quantification were generated by 35 cycles of conventional PCR and purification of the desired product via a 1.2% agarose gel and a QIAquick gel extraction kit, (QIAGEN, 28704). Following measurement of the concentration via absorbance at 260 nm (using a DeNovix DS-11 spectrophotometer), DNA was diluted to 1–10 ng/μl before more accurate concentration determination using a QuantiFluor dsDNA kit (Promega, E2670) and a FLUOstar OPTIMA plate reader (BMG Labtech). An appropriate range of each standard was used in qPCR, ensuring that all unknown samples were within the standard curve.

Primers and melting temperatures are shown in Supplementary Table 5 for quantitative analysis of recombination, SJs and Rag1 expression in mouse bone marrow and spleen, and for quantitative (qPCR) analysis of recombination, SJ levels, RAG1 and RAG2 expression as well as PCNA, POLE3, POLE4 and RBX1 expression in BCP-ALL patient samples. BLAST was used to check primer specificity. HPRT was used as a reference gene for expression studies, using primers that span an intron. Genomic GAPDH sequences were used to normalize for cell numbers. These housekeeping genes were chosen for their widespread expression (HPRT) and low likelihood of mutation.

Detection of recombination in BCP-ALL patient samples

PCR was performed using Taq DNA polymerase (NEB, M0267) in reactions comprising 1× ThermoPol buffer, 200 µM dNTPs, 0.5 µM each primer, 1.25 U Taq DNA polymerase and 100 ng genomic DNA template in a final volume of 50 µl. Primers for recombination at the immunoglobulin kappa and lambda loci were as described by the BIOMED-2 consortium31. Cycling conditions involved initial denaturation at 95 °C for 5 min, followed by 95 °C for 30 s, 60 °C for 30 s and 68 °C for 30 s for 35 cycles, followed by a final extension of 5 min at 68 °C. PCR products were separated by gel electrophoresis; products of the expected sizes31 were excised and cloned prior to Sanger sequencing.

Nested PCR of recombination junctions and SJs in mouse and patient samples

To achieve sufficient specificity and sensitivity, nested PCR was performed using Taq DNA polymerase (NEB, M0267). First round reactions consisted of 1× ThermoPol buffer, 200 µM dNTPs, 0.5 µM each primer, 1.25 U Taq DNA polymerase and 20–100 ng genomic DNA template in a final volume of 50 µl. To detect mouse recombination and SJs, thermocycling conditions involved denaturation at 94 °C for 3 min, followed by 18 cycles of 94 °C for 20 s, Tm for 20 s and 72 °C for 20 s. A 7 min of final extension was performed. One microlitre of 1:10 diluted first round PCR product was used as the template for a second round of qPCR using the primers shown in Supplementary Table 5.

As the BIOMED-2 primers to detect human SJs do not robustly identify specific J gene segments, nested PCRs were carried out with a mixture of first round primers that bind upstream of all five KJ RSSs or LJ1, LJ2 and LJ3 RSSs, which account for ~99% of IGL recombination events31, followed by second round PCRs specific for each individual J RSS. First round reactions were set up as described above with 50–100 ng template DNA, followed by denaturation at 95 °C for 5 min, and 25 cycles of 95 °C for 30 s, Tm for 30 s and 68 °C for 30 s, and a final extension of 5 min at 68 °C. Second round reactions were identical except 1 µl first round PCR product was used as template and the number of cycles was optimized for each amplicon, which was typically 36 cycles. Primers and melting temperatures are shown in Supplementary Table 5.

Sequencing of PCR products

PCR products were separated on a 1.2% agarose gel; DNA was purified from the gel using a QIAquick gel extraction kit (QIAGEN, 28704) and cloned into a T-tailed pBluescript II SK (+) vector (Stratagene, 212205) that had been digested with EcoRV-HF (NEB, R3195). Positive clones were sent for Sanger sequencing (Eurofins Genomics, LightRun Tube) using a M13 forward sequencing primer.

Sequencing traces (in .ab1 format) were aligned using SnapGene (v4; GSL Biotech). For recombination, sequences were aligned against the human immunoglobulin kappa (NCBI Gene ID: 50802) or lambda loci (NCBI Gene ID: 3535), as appropriate. For ESCs, the possible head-to-head SJ sequence was assembled from the appropriate genomic sequence and sequences were aligned to this. All alignments were verified by BLASTN (NCBI; accessed at https://blast.ncbi.nlm.nih.gov), where the search set was limited to the Homo sapiens (taxid: 9606) RefSeq Genome Database (refseq_genomes).

ddPCR

ddPCR reactions were conducted in a total volume of 20 µl with 1× ddPCR Supermix for probes (Bio-Rad 1863026), 900 nM of each primer, 250 nM probe and 63 ng template DNA. Droplets were generated in an 8-well droplet generation plate using a Bio-Rad QX100 droplet generator. Nanodroplets were carefully transferred to a 96-well plate, which was sealed with foil prior to thermocycling. The latter involved an initial denaturation at 95 °C for 10 min, followed by 40 cycles of 94 °C for 30 s and 60 °C for 1 min, followed by 98 °C for 10 min and 4 °C for 30 min to allow droplets to equilibrate. The presence of amplified products was determined using the Bio-Rad QX100 Droplet Reader and QuantaSoft v1.7.4 software. For positive droplet identification, a manual threshold (2000) was applied to 1D amplitude data to minimize the occurrence of false positives. Primers are shown in Supplementary Table 5, for absolute quantitative analysis of GAPDH, recombination junctions and SJs in BCP-ALL samples.

Targeted sequencing of recombination and SJs (LAM-ESC and LAM-recombination)

Targeted sequencing of light chain recombination and SJs was performed using modified versions of the LAM-HTGTS technique32 with bespoke analysis pipelines. Recombination junctions were detected using LAM-recombination, where bait primers (Supplementary Table 5) were designed against regions adjacent to J gene segments, allowing recombination to V gene segments to be determined. SJs were detected via LAM-ESC where bait primers (Supplementary Table 5) were designed against regions adjacent to J segment RSSs, allowing any sequence (for example, V RSSs) joined to J RSSs to be determined.

Libraries were generated as described32 with minor modifications. Specifically, 500 ng genomic DNA was used as template in 90 cycles of the initial Bio-PCR. For the final PCR step (Tagged PCR), primers were used to add sequencing adaptors (Amplicon-EZ-I7-blue and Amplicon-EZ-I5-nested; Supplementary Table 5). Following library generation, samples were sent for 2× 250 bp paired-end sequencing using the Amplicon-EZ service (Azenta).

To analyse the data, FASTQ files were initially demultiplexed using each J gene segment or J RSS nested primer, for recombination and SJ libraries, respectively. The paired-end reads were then combined into a single read, using the overlap at the 3′ end of read 1 and 5′ end of read 2. If reads could not be combined, read 1 only was analysed. A custom Python script was used to automate BLAST searches against a custom BLAST database, consisting of all V-J recombination events or all head-to-head RSS combinations from the immunoglobulin kappa and lambda loci, for recombination and SJ libraries, respectively https://github.com/Boyes-Lab/LAM-ESC-Recombination33.

Clonotype determination

Recombination junctions were amplified using LV3-1_REC_F, LV5-45_REC_F or LV2-11_REC_F with LJ2/3_REC_R (Supplementary Table 5). Following gel purification of the amplified products using the QIAquick gel extraction kit (QIAGEN, 28704), samples were sent for 2 ×250 bp paired-end sequencing using the Amplicon-EZ service (Azenta) that included addition of Amplicon-EZ-I7 and Amplicon-EZ-I5 sequencing adaptors. Paired-end reads in the .fastq format were combined by overlapping the 3′ end of read 1 with the 5′ end of read 2 (EGAD50000001518). The reads were compared to the reference motifs near the breakpoint junction of interest using the script Clonotype_analysis.py (https://github.com/Boyes-Lab/NGS-Analysis)48, which identified each unique clonotype and the frequency at which it occurred. The reference motifs consisted of 5 bp from each of the respective V and J motifs, derived from sequences that lie 20 bp from the V-J junction that would be formed in the absence of processing. Specifically, the reference motifs were: 5 bp of V gene reference sequence; omit 20 bp of V gene sequence to the boundary; omit 20 bp of J gene sequence; use 5 bp of J gene sequence as reference. If the amplified sequence contained both of the 5bp motifs, then the code identifies each unique sequence that intervenes. The identified sequences were then inputted into IgBlast55 which determined if the insert at the V-J junction was derived from elsewhere in the genome. The number of unique clonotypes was determined from these inserts: two sequences with the same V-J insert were classed as being the same clonotype.

The number of recombination copies attributable to each distinguishable minor clonotype was determined by calculating the absolute copy number of the recombination event by ddPCR and multiplying that by the percentage of each minor clonotype. Assuming each recombination is present on a single allele, the number of cells harbouring the recombination event was then estimated (that is, one recombination copy equates to one cell harbouring the recombination). The number of cell divisions (n) required to generate the recombination levels measured for each minor clonotype was subsequently calculated using the formula for cell population doublings: N2 = N1(2n) where N1 = number of cells at beginning (that is, 1 cell) and N2 = number of cells at end (that is, the estimated number of cells). From this, the minor clonotypes were calculated to be present at copies equivalent to 0.81–3.69 relative cell divisions (patients R-8 and R-13).

To investigate if the SJ levels observed via ddPCR (Extended Data Fig. 7c) could have resulted from replication of SJs from the minor clonotypes, the number of SJs predicted to remain if ESCs are diluted at each cell division were calculated using the formula for exponential decay: xt = x0/2t where xt = predicted SJ level, x0 = initial SJ level (that is, 1) and t = number of cell divisions (calculated above). Theoretically, the SJ measured by ddPCR could correspond to one or more of the minor clonotypes; we therefore took a conservative approach and summed the predicted SJ values for all the distinguishable clonotypes. We then divided the observed SJ value (Extended Data Fig. 7c) by the predicted SJ levels for the minor clonotypes. This observed/predicted value was compared to the observed/predicted values for SJs generated a similar number of relative cell divisions ago, using the values shown in Fig. 3c, where the SJs were estimated to result from 0.33–6.32 relative cell divisions. The SJ copies found in patient R-8 are substantially higher than those generated by replication of SJs from similarly recent recombination events (Extended Data Fig. 7d), implying that at least some of the SJs have persisted from the primary recombination event. Similar analyses cannot discount the possibility that replication of recently generated SJs generated the SJs observed in patients R-13 and R-9.

Calculation of observed/predicted ESC levels

The observed SJ/predicted SJ ratio provides a more accurate measure of ESC replication as it takes the extent of cell division into account. It ensures that ESCs from cells that have undergone marked differences in cell division do not artefactually show the same level of replication. Predicted SJ levels were calculated using the formula for exponential decay given above. For Fig. 3c, only recombination junctions that had undergone ≤6 cell relative divisions per ddPCR sample were examined.

Phi29 amplification

Freeze-thawing of cells causes DSBs56,57 and depletion of circular DNAs in patient samples. Phi29 was therefore used to amplify remaining circular DNAs in BCP-ALL samples via rolling circle replication, using the Illustra TempliPhi 500 Amplification kit (Cytiva 25640010) according to the manufacturer’s protocol. Specifically, 10 ng of DNA was diluted with 50 μl of sample buffer, incubated at 95 °C for 3 min and cooled to 4 °C. The reaction was then incubated in 1× Phi29 reaction buffer (50 mM Tris-HCl pH 7.5, 10 mM MgCl2, 10 mM (NH4)2SO4, 4 mM DTT) with 2 μl of enzyme mix at 30 °C for 18 h (human DNA) or 6 h (mouse DNA), followed by 65 °C for 10 min to inactivate the enzyme. DNA was precipitated with isopropanol, washed with 70% ethanol and resuspended in 25 μl ddH2O. Control experiments omitted the enzyme. Sample concentrations were determined via absorbance at 260 nm (using a DeNovix DS-11 spectrophotometer) and diluted to 7 ng/µl. SJs were quantified by ddPCR and the SJ/GAPDH ratio of treated versus untreated sample determined.

Fluorescence in situ hybridization

DNA-FISH was performed as described58. Fosmid clones targeting IGK (ABC10-44246300H4), IGK-JK-KDE (ABC8-2123240B1) and IGL (ABC10-44455600K21) ESCs as well as IGK and IGL control regions (ABC10-43608900D2 and ABC10-44444000A2, respectively) were gifts from E. Eichler. Each fosmid probe was directly labelled by nick translation as described59, except the amount of DNA labelled was reduced to 1 μg and both aminoallyl-dUTP and aminoallyl-dCTP were incorporated, followed by coupling to fluorescent dyes (Alexa Fluor 488/555/647, Invitrogen). Fosmid probes were purified using a QIAquick PCR purification kit and elution in 10 mM Tris-HCl pH 8.5. Patient bone marrow samples were cultured in StemSpan SFEMII medium (Stem Cell Technologies) supplemented with 20% fetal calf serum, 1% l-glutamine, 100 μg ml−1 Primocin (InvivoGen), 20 ng ml−1 IL-3 and 20 ng ml−1 IL-7 (Cell Guidance Systems) in a 37 °C humidified incubator, 5% CO2, prior to Colcemid treatment (0.2 µg ml−1, KaryoMAX, ThermoFisher) for 2 h. hTERT-RPE-1 cells were maintained in supplemented DMEM and treated with Colcemid as described above. Cells were then centrifuged and resuspended in prewarmed 75 mM KCl, followed by incubation at 37 °C for 20 min. After further centrifugation, cells were resuspended in Carnoy’s fixative (methanol: glacial acetic acid 3:1) and dropped onto humidified microscope slides. Slides were incubated in 2× SSC/RNase A 100 µg ml−1 at 37 °C for 1 h, followed by successive dehydration in 70%, 90%, and 100% ethanol. Slides were heated to 70 °C for 5 min on a hot plate, followed by DNA denaturation in preheated denaturant (2× SSC/70% formamide) at 70 °C for 30 min. Subsequently, slides were placed in ice-cold 70% ethanol, then 90% and 100% ethanol at room temperature before air-drying. For each slide, 100 ng of fosmid probe was combined with 6 µg human Cot-1 DNA (Invitrogen) and 5 µg single stranded DNA from salmon testes (Invitrogen), followed by ethanol precipitation. The DNA pellet was washed with 70% ethanol and resuspended in hybridization buffer (2× SSC, 50% deionized formamide, 10% dextran sulfate, 1% Tween-20). FISH probes were denatured at 92 °C for 5 min, pre-annealed at 37 °C for 15 min and then were immediately hybridized with DNA on slides overnight at 37 °C in a light-tight humidified chamber. Slides were washed in 2× SSC at 45 °C, 0.1× SSC at 60 °C, 1× PBS with 10 µg ml−1 DAPI at room temperature and finally mounted with SlowFade Gold antifade reagent (Invitrogen). Slides were imaged using an Olympus IX83 widefield fluorescence microscope with a 60× (60×/1.4 Oil, Plan Apo (oil)) objective and a Photometrics Prime BSI CMOS camera with a motorized xyz stage. Filter sets are DAPI (excitation 365/10 nm) emission 440/40 nm, GFP (excitation 482/24 nm) emission 530/40 nm, RFP (excitation 545/10 nm) emission 600/50 nm, Cy5/A647 (excitation 628/40 nm) emission 692/40 nm. Images were acquired using Olympus CellSens Dimension 3.2 (Build 23706) software and analysed using FIJI 2.16.0 software.

BrdU immunofluorescence

Bone marrow cells were resuspended in StemSpan SFEMII medium, supplemented as described above, and labelled with 10 µM BrdU (Merck B5002) for 28 or 48 h in a 37 °C humidified incubator, 5% CO2. Primary BCP-ALL cells60 have a doubling time of 26 to 240 h. Therefore, for cells incubated with BrdU for 28 h, incorporation should be limited to a single S phase for any cells in metaphase. Chromosome spreads were prepared as described for DNA-FISH. DNA was denatured by incubating the slides in 1 M HCl for 40 min at room temperature, followed by neutralization in 0.1 M Borate buffer pH 8.5 for 15 min at room temperature. Slides were then washed in 0.1% Triton X-100/PBS and blocked in 0.1% Triton X-100/PBS/5% goat serum (Merck G9023) for 1 h at room temperature. Following immunostaining with a 1:500 dilution of anti-BrdU antibody (BD Pharmingen, 555627) for 1 h at room temperature, cells were incubated with goat anti-mouse secondary antibody (Jackson ImmunoResearch, 115-001-003) at a 1:1,000 dilution for 1 h at room temperature, followed by incubation with Alexa Fluor Plus 488 labelled donkey anti-goat antibody (ThermoFisher, A32814), also at 1:1,000 dilution for 1 h at room temperature. Following counterstaining with DAPI (10 µg ml−1, Invitrogen), images were captured using an Olympus IX83 widefield fluorescent microscope and CellSens software and analysed using FIJI 2.16.0 software as above.

PCR amplification of the SYK gene

PCR was performed using Taq DNA polymerase (NEB, M0267) using the conditions described above for BIOMED-2 primers except the final volume was 25 µl. Primers are given in Supplementary Table 5. Cycling conditions involved initial denaturation at 95 °C for 3 min, followed by 95 °C for 20 s, 64 °C for 30 s and 68 °C for 30 s for 39 cycles, followed by a final extension of 5 min at 68 °C. PCR products were separated by gel electrophoresis.

Analysis of ETV6–RUNX1 BCP-ALL WGS data for ESCs

WGS datasets from patients with BCP-ALL (downloaded from the European Genome-phenome Archive (EGA), dataset ID: EGAD00001000116) were analysed for SJs using an in-house Python script (https://github.com/Boyes-Lab/NGS-Analysis)48. In brief, paired-end sequencing data was aligned to the hg19 build of the human genome using Bowtie2 in local alignment mode. Following alignment to the genome, the data were filtered for discordant reads at the immunoglobulin and TCR loci using Samtools. Reads were further filtered (using the above in-house Python script) to extract divergent reads, indicative of SJs. Similar tools that capitalize on discordant paired-end reads have been developed to map ecDNAs. However, AmpliconArchitect40 also requires increased circular DNA copy number whereas CircleSeq involves removal of linear DNA prior to sequencing and analysis44,61. AmpliconArchitect40 did not detect circular DNAs in WGS from 20 patient samples where ESCs were detected by LAM-ESC (Fig. 2; EGA accession code: EGAS00001006863). This is likely because unlike ecDNAs, individual ESC sequences do not undergo copy number amplification that is required for detection by AmpliconArchitect40.

Identification of differentially expressed genes and GSEA analysis

RNA-seq data of patients at diagnosis were downloaded from the TARGET database (dbGaP Sub-study ID: phs000464; Fig. 3d) or EGA (Accession Code EGAS00001006863; Fig. 3f). Differentially expressed genes were identified using DESeq2 with |logFC| > 0.585 and FDR < 0.05. Adjustment for multiple comparisons was performed via the DESeq2 programme. Gene set enrichment analysis (GSEA) was carried out according to the user guide provided by the BROAD Institute (https://docs.gsea-msigdb.org/#GSEA/GSEA_User_Guide/).

Whole-exome sequencing

DNA was prepared from germline and BCP-ALL patient samples taken at diagnosis and relapse, using the Promega Wizard HMW DNA extraction kit. The concentration was measured via Qubit and whole-exome paired-end read sequencing was performed at 100× depth by Azenta.

Analysis of SVs

SV data of patients at diagnosis and relapse were downloaded from Complete Genomics (CGI, from within the TARGET database: dbGaP Sub-study ID: phs000464; Fig. 5a,b). For Extended Data Fig. 9d, WGS of patients with BCP-ALL in the VIVO Biobank cohort was downloaded from EGA under the Accession Code EGAS00001006863. The analysis used similar numbers of WGS and WES where WES from germline and BCP-ALL samples are available from EGA (dataset ID: EGAD50000001519). Breakpoints of SVs were analysed using a bespoke Python program, SVs_near_RSSs.py, which creates an analysis window spanning 50 bp either side of each breakpoint (https://github.com/Boyes-Lab/Structural-Variants)62. The presence of an RSS within the window is then analysed using the DNAGrep algorithm, via RSSsite63.

The relative impact of cut-and-run, ESC reintegration and RAG-mediated insertion was determined as follows: SVs were defined as either deletions, insertions, translocations or complex insertions based on the source of the DNA strand on either side of an SV breakpoint. The DNA damage occurring near RSS sites was analysed, as RAG-mediated DSBs are required for both reintegration and cut-and-run-mediated DNA damage. ESC reintegration events are defined by the reinsertion of ESC DNA into the genome. Thus, insertions were considered to be reintegration-derived where the inserted DNA came from immunoglobulin or TCR loci. The proportion of these events near RSS sites was then compared with other types of DNA damage. Open-source scripts to analyse potential cut-and-run events compared to reintegration are provided at https://github.com/Boyes-Lab/Structural-Variants62.

Analysis of ecDNAs using AmpliconArchitect40

In silico experiments to detect ecDNAs were carried out using AmpliconArchitect40 according to the AmpliconSuite pipeline (https://github.com/AmpliconSuite/AmpliconSuite-pipeline/blob/master/documentation/GUIDE.md) using WGS BAM files from VIVO Biobank patient samples available from EGA (Accession Code EGAS00001006863).

Statistics

All statistical analyses were performed using GraphPad Prism v9. Statistical test results are provided as P values in the figures. Detailed descriptions of error bars and the number replicates and/or cells analysed are reported in the figure legends. Biological replicates are shown unless otherwise indicated. Analyses of fold changes between biological replicates were performed using a two-tailed Student’s t test or Mann–Whitney U test (depending on data distribution) where *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001. The 95% confidence intervals are given in the figure legends where possible. The 95% confidence intervals for the difference of mean gene expression (H-relapse versus non-relapse) in Fig 3e: PCNA: −2.007 to −0.1691; POLE3: −2.206 to −0.1488; POLE4: −4.910 to −0.8914; RBX1: −2.206 to −0.6157. The Kolmogorov–Smirnov test was used to determine whether the data followed a Gaussian distribution. GSEA uses the statistical test described53,54 with corrections for multiple comparisons. DeSeq2 analysis used the Wald statistical test with the Benjamini–Hochberg correction for multiple testing. Statistical analyses with two categorical variables were performed using a two-way ANOVA. Statistical analyses of the proportion of ESCs above the threshold in patients who relapse versus those who do not were determined using a two-tailed Fisher’s exact test. The significance of the difference between matched ESCs in relapse and non-relapse groups was determined using a two-tailed Wilcoxon signed-rank test. Pearson correlation coefficients (r values) were computed for scatter plots and tested (null r = 0) with a standard two-tailed test. The significance of SVs involving single cRSSs occurring more frequently at relapse-associated genes was calculated using the hypergeometric distribution (Fig. 5b and Extended Data Fig. 9d) to analyse the number of SVs involving single cRSSs within relapse-associated genes compared to SVs involving single cRSSs in the whole dataset. The significance of the co-localization of breaks detected by LAM-HTGTS and genes that are frequently mutated in relapsed ALL (Fig. 5c) was calculated by using the hypergeometric distribution (implemented in the R software, https://www.r-project.org) to test whether the number of genes which were more commonly mutated with SJ partner versus 12-RSS or 23-RSS partners (controls) occurred more frequently in the relapse-associated genes versus the whole gene list. A power calculation, taking an alpha value of 0.05 and a desired power of 80% was used to initially estimate the sample size in which to analyse ESC levels.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.