Main

The intricate dance between the replication and transcription processes, both of which operate on the same DNA template, must be tightly regulated to maintain genome integrity7. Collisions between these processes occur across nearly all species8,9 and can be either head-on (HO) or co-directional (CD), depending on which DNA strand is transcribed. HO collisions arise when the transcription machinery moves opposite to the replisome, with the transcribed strand serving as the lagging-strand template. In CD collisions, transcription and replication proceed in the same direction, with the transcribed strand acting as the leading-strand template1,10. Protein-coding genes are transcribed by RNAPII, which initiates at promoter sequences11. After promoter escape, RNAPII frequently undergoes transient promoter-proximal pausing12. Multi-protein complexes regulate its release into productive elongation: Integrator terminates and removes paused RNAPII at promoter-proximal sites6,13 whereas the Mediator complex, comprising a core body and a kinase module, coactivates RNAPII-dependent transcription14. After release, RNAPII at first elongates slowly near promoters, accelerating over the first approximately 10 kb of genes until reaching peak speed15. This early acceleration is stimulated by diverse elongation factors that act through distinct mechanisms16,17. Slow elongation or pausing promotes re-annealing of nascent transcripts to the template DNA, forming R-loops—three-stranded structures composed of an RNA–DNA hybrid and displaced single-stranded DNA2,3. RNAPII engaged with R-loops can obstruct replisomes, leading to genome instability10. Most studies of transcription–replication conflicts have focused on HO collisions18,19, which are generally considered more deleterious. However, most human genes are oriented co-directionally6, and active replication origins, particularly those that fire early in S-phase, frequently lie near promoters5. 
Given that RNAPII often pauses at promoter-proximal sites of highly transcribed genes, CD collisions in this region are likely. How replisomes navigate CD RNAPII during productive elongation remains unknown.

Genome-wide transcription–replication

To investigate spatial connections between transcription, co-transcriptional R-loops and replication genome-wide, we mapped RNAPII occupancy by chromatin immunoprecipitation followed by sequencing (ChIP–seq); nascent transcription by bromouridine (BrU)–seq; and R-loops by DNA–RNA hybrid immunoprecipitation (DRIP)–seq in RPE1 cells (Fig. 1a). Replication origins were mapped using previously published Okazaki fragment (OK)–seq data from unperturbed RPE1 cells5 (Extended Data Fig. 1a), identifying 4,785 origins shared between two replicates. These aligned well with Origin-seq (Ori–seq) origins mapped in hydroxyurea (HU)-treated RPE1 cells20 and were enriched in transcriptionally active, early-to-mid-S-phase regions, as confirmed by single-cell 5-ethynyl-2′-deoxyuridine sequencing (scEdU–seq) in unperturbed cells21 (Extended Data Fig. 1b,c). For these origins, we calculated distances to the nearest transcription start site (TSS) and retained only those without another gene within 5 kb upstream, yielding 2,040 origins. RNAPII, BrU and DRIP profiles were overlaid with these coordinates and sorted by origin–TSS distance. Metaprofiles were generated by aligning all co-directionally (CD) oriented TSSs (n = 1,395) relative to origins and compared with HO TSSs (n = 408) (Fig. 1b and Extended Data Fig. 2a). As previously observed6, RNAPII binding and nascent transcription were higher at CD-oriented TSSs than at HO-oriented TSSs (Fig. 1c and Extended Data Fig. 2b,c). R-loop levels were modestly higher at CD-oriented TSSs, consistent with increased transcription (Fig. 1c). Extending this analysis, we plotted R-loop levels within 25-kb promoter windows adjacent to origins and up to 75 kb away in either orientation (Extended Data Fig. 2d). R-loop levels were markedly increased near origins, particularly in the CD orientation, suggesting that TSSs that are close to origins experience greater transcription stress (Fig. 1d). 
We propose that cells deploy mechanisms to mitigate transcription stress at these TSSs to minimize clashes with CD replisomes.
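
The orientation assignment used throughout can be illustrated with a minimal sketch. It assumes a bidirectional origin from which forks move outward, so that a gene is CD when it is transcribed away from the origin and HO when it is transcribed towards it. The names `Tss`, `classify_orientation` and `origin_tss_distance` are hypothetical and introduced only for illustration; this is not the analysis pipeline used in this study.

```python
from dataclasses import dataclass


@dataclass
class Tss:
    position: int  # genomic coordinate of the TSS (bp)
    strand: str    # '+' (transcribed left to right) or '-' (right to left)


def classify_orientation(origin_pos: int, tss: Tss) -> str:
    """Return 'CD' or 'HO' for a TSS relative to a bidirectional origin.

    Assumption: forks travel away from the origin on both sides, so the fork
    that reaches a TSS to the right of the origin is moving rightward.
    """
    if tss.strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if tss.position == origin_pos:
        raise ValueError("TSS coincides with the origin; orientation undefined")
    fork_moves_right = tss.position > origin_pos
    gene_moves_right = tss.strand == "+"
    # Same direction of travel means co-directional, otherwise head-on.
    return "CD" if fork_moves_right == gene_moves_right else "HO"


def origin_tss_distance(origin_pos: int, tss: Tss) -> int:
    """Absolute origin-to-TSS distance in bp, usable as a sort key."""
    return abs(tss.position - origin_pos)
```

For example, a plus-strand gene whose TSS lies 4 kb to the right of an origin would be classified as CD under these assumptions, whereas a minus-strand gene at the same position would be HO.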

Fig. 1: RNAPII transcription, R-loops and CFAP20 function in replication.

a, Heat maps of RNAPII ChIP–seq, BrU–seq and DRIP–seq in RPE1 cells, aligned around replication origins mapped by OK–seq5. b, Model showing RNAPII transcription on lagging-strand (HO) or leading-strand (CD) templates relative to the replication fork. c, Metaprofiles of RNAPII ChIP–seq (green), BrU–seq (blue) and DRIP–seq (red) in RPE1 cells around TSSs oriented HO (n = 408) or CD (n = 1,395) relative to origins5. Data are averages after trimming the top and bottom 5% of data (a trim-mean of 0.1) to remove extreme values. d, Metaprofiles of DRIP–seq signals within a 25-kb window adjacent to origins extending up to 75 kb in HO and CD orientations (trim-mean of 0.1). e, Schematic of CRISPR–Cas9 screens. NGS, next-generation sequencing. f, Correlation of normalized z-scores from CD437 and illudin S screens7; lowest and highest z-scores normalized to –1 and +1. g, Representative co-localization of GFP–CFAP20 with the primary cilium (arrowheads) marker acetylated α-tubulin. Scale bar, 20 μm. h, Representative image of immunofluorescent labelling of R-loops using S9.6 antibody. Scale bar, 10 μm. i, AlphaFold model of CFAP20, highlighting residue R100; positively charged residues are in blue. j, Quantification of nuclear R-loop signal from h for the indicated stable cell lines. Each coloured circle is one cell; black circles represent medians of independent experiments (more than 100 cells); black lines are means of all experiments; significance was calculated by one-way ANOVA with Šidák’s correction. P values from left to right: <0.0001, 0.9944, 0.0002, 0.9980, 0.0020 and 0.9980. NS, not significant. k, Schematic of sister fork symmetry principle. l, Representative sister fork symmetry observed by sequential CldU (red) and IdU (green) labelling. Scale bar, 5 μm. m, Quantification of sister fork symmetry from l. Data as in j (more than 100 fibres); significance by one-way ANOVA with Šidák’s correction. 
P values from left to right: <0.0001, 0.8851, <0.0001, >0.9999, <0.0001 and >0.9999.
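
The trim-mean of 0.1 referred to in c and d can be sketched as follows. This is a minimal illustration that assumes the spreadsheet-style convention in which the stated fraction is the total proportion removed, split equally between the lowest and highest values; the function name `trimmed_mean` is hypothetical.

```python
import math


def trimmed_mean(values, total_fraction=0.1):
    """Mean after discarding total_fraction of the points, half from each tail.

    With total_fraction=0.1, the lowest 5% and highest 5% of values are dropped.
    The number removed per tail is rounded down (an assumption of this sketch).
    """
    data = sorted(values)
    if not data:
        raise ValueError("no values supplied")
    if not 0 <= total_fraction < 1:
        raise ValueError("total_fraction must be in [0, 1)")
    per_tail = math.floor(len(data) * total_fraction / 2)
    kept = data[per_tail:len(data) - per_tail]
    return sum(kept) / len(kept)
```

With 20 values, one point is removed from each tail, so a single extreme value no longer dominates the average at that position of the metaprofile.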


CFAP20 in transcription–replication screens

To uncover mechanisms and factors that fine-tune the coexistence of transcription and replication, we performed two genome-wide CRISPR screens. Cells were transduced with 71,090 gRNAs targeting 18,053 protein-coding genes and left untreated, or exposed to illudin S to stall transcription17 or the DNA polymerase α inhibitor CD437 to stall replication22 (Fig. 1e). Genes at the intersection of these genetic screens encode proteins that respond to both transcription arrest and replication arrest. The genes with the highest scores in both screens encode three subunits of the 9-1-1 complex (RAD9–HUS1–RAD1), a known checkpoint complex that is strongly activated by transcription–replication encounters23. The fourth top hit at the intersection of these screens is the CFAP20 gene (Fig. 1f), which encodes a small (23 kDa) understudied protein that is currently known only as a ciliary protein24. In addition to its expected localization at the primary cilium of RPE1 cells, we observed that GFP–CFAP20 localized to the cell nucleus (Fig. 1g). This prompted us to investigate its nuclear function in more detail.

CFAP20 prevents R-loop accumulation

Although CFAP20 was previously suggested to be an essential gene25, we were able to generate a CFAP20 full knockout (KO) cell line in RPE1 TP53-KO cells (Extended Data Fig. 3a). In agreement with our CRISPR screens, clonogenic survival assays confirmed that CFAP20-KO cells are sensitive to illudin S and to CD437 (Extended Data Fig. 3b,c). Although our previous work revealed that many illudin S sensitizer genes are involved in transcription-coupled DNA repair (TCR)17, illudin S sensitization alone is not enough to unequivocally identify TCR genes26. In line with this, functional assays show that CFAP20 is fully dispensable for TCR (Extended Data Fig. 3d). Notably, illudin S treatment has been shown to cause R-loop accumulation independently of TCR27, which prompted us to investigate R-loop levels in CFAP20-deficient cells. Immunofluorescence experiments using the S9.6 antibody (Fig. 1h), recognizing the RNA–DNA hybrid of R-loops28,29, showed a twofold increase in R-loop levels in CFAP20-KO cells, similar to R-loop levels in BRCA1-KO cells30 (Extended Data Fig. 3e). While mining the COSMIC (Catalogue Of Somatic Mutations In Cancer) database, we observed a charge-loss substitution (R100C) in CFAP20, situated within a highly positively charged patch on the protein surface (Fig. 1i). This mutation is recurrent in a small number of tumour types, yet it has not been classified as a tumour driver (Extended Data Fig. 3f and Supplementary Table 1). Owing to its potential effect on CFAP20 function, we chose to characterize this mutant. Although the R-loop phenotype in CFAP20-KO cells was fully reversed by stable re-expression of GFP-tagged wild-type (WT) CFAP20, expression of the GFP–CFAP20(R100C) mutant did not rescue the R-loop phenotype (Fig. 1j). To demonstrate specificity, we lentivirally transduced CFAP20-KO cells with GFP–RNaseH1, which abolished the S9.6 signal (Fig. 1j). 
Moreover, imaging R-loops using catalytically inactive recombinant GFP-tagged RNaseH1(D210N) confirmed the accumulation of R-loops in CFAP20-KO cells31,32, which was fully reversed by re-expression of CFAP20 (Extended Data Fig. 3g). A consequence of R-loop accumulation is the asymmetry of sister forks progressing from single origins33 (Fig. 1k). Accordingly, we could detect a marked fork asymmetry in CFAP20-KO cells (Fig. 1l) which could be reversed by expression of WT CFAP20 and by lentiviral transduction of GFP–RNaseH1, but not by CFAP20(R100C) (Fig. 1m), indicating that this is an R-loop-driven phenotype.

CFAP20 limits R-loops beyond cilia

We next investigated whether the accumulation of R-loops is connected to the ciliary function of CFAP20. To test this, we exploited the observation that homozygous cfap20−/− zebrafish larvae develop anterior–posterior ventral axis curvature, which has been attributed to the loss of motile ciliary function24. Micro-injecting human CFAP20 mRNA into cfap20-deficient zebrafish embryos fully rescued the body-axis-curvature defect. A similar rescue was observed when micro-injecting the CFAP20(R100C) variant (Fig. 2a,b). Consistent with these findings, GFP–CFAP20(R100C) localized to the primary cilium of RPE1 cells (Extended Data Fig. 4a). These findings suggest that the R-loop phenotype in CFAP20-KO cells is unrelated to the ciliary function of CFAP20.

Fig. 2: CFAP20 is synthetic viable with Mediator loss.

a, Representative micrographs of cfap20−/− zebrafish embryos two days after fertilization, with severe ventral anterior–posterior curvature rescued by microinjection of 25 pg human CFAP20 mRNA (WT or R100C). Scale bar, 0.2 mm. n = 2 biological replicates. b, Percentage of cfap20−/− homozygotes with curvature defects, either uninjected or rescued by 25 pg human CFAP20 mRNA (WT or R100C). Sample sizes are indicated next to the bars. Significance by two-tailed Fisher’s exact test. ****P < 0.0001. c, Quantification of competition assays between the indicated conditions. NLS, nuclear localization signal. Each coloured circle represents the mean of an independent experiment (more than 30,000 cells). The coloured line represents the mean of n = 3 biological independent experiments. Significance by one-way ANOVA with Šidák’s correction. ****P < 0.0001. d, Additional competition assay quantification as in c. P values from left to right: 0.0064, <0.0001 and <0.0001. e, Colony formation assay for the indicated cell lines. f, Cell-cycle profiles analysed by quantitative image-based cytometry in the indicated RPE1 lines. g, Quantitative image-based cytometry after cyclin A staining; red box highlights G2-phase cells with low levels of cyclin A. Green indicates the mean intensity of cyclin A per nucleus (0–250). h, Quantification of G2 cells with low levels of cyclin A from g. Data are mean (three technical replicates from three independent experiments). Significance by unpaired two-tailed t-test. P = 0.0131. i, Genome-wide CRISPR screen in CFAP20-KO cells. Genes are ranked by z-score, showing synthetic-viable (blue) interactions with CFAP20. j, Colony formation assay for the indicated cell lines. k, Quantification of sister fork symmetry for the indicated stable cell lines. Data are as in Fig. 1j; significance by one-way ANOVA with Dunnett’s correction. P values from left to right: 0.0006, 0.8623 and 0.7491. 
l, Quantification of nuclear R-loop signal from the indicated stable cell lines. Data are as in Fig. 1j; significance as in k. P values from left to right: 0.0185, 0.9987, 0.0486 and >0.9999. m, Averaged spike-in normalized metaplots around TSS of RNAPII ChIP–seq for the same 3,000 BrU–seq-positive genes >3 kb in the indicated RPE1 cells.


CFAP20 and Mediator are synthetic viable

We noticed that CFAP20-KO cells grow more slowly than parental cells. Flow-cytometry-based competitive cell-growth assays confirmed that CFAP20-KO cells are rapidly outcompeted by WT cells (Fig. 2c and Extended Data Fig. 4b) and GFP–CFAP20 rescue cells (Fig. 2d). This led to markedly decreased colony formation in CFAP20-KO cells, which was reversed by re-expression of WT CFAP20 but not by CFAP20(R100C) (Fig. 2e). Quantitative image-based cytometry revealed no obvious differences in cell-cycle profiles between WT and CFAP20-KO cells (Fig. 2f), but showed an increase in the percentage of cyclin A-negative G2 cells in the CFAP20-KO cell population (Fig. 2g,h), suggestive of cell-cycle exit34. To gain genetic insight into the cause of the poor-growth phenotype, we performed a genome-wide CRISPR screen to identify genes whose knockout would improve the fitness of CFAP20-KO cells (Extended Data Fig. 4c). sgRNAs targeting multiple subunits of the Mediator coactivator complex (Fig. 2i) were strongly enriched in our screen, which suggests that Mediator is a driver of the poor fitness in CFAP20-KO cells. To validate these results, we knocked out CCNC (encoding cyclin C, a subunit of the Mediator kinase module) in CFAP20-KO cells (Extended Data Fig. 4d,e). We observed a marked increase in colony formation in CFAP20/CCNC-double-knockout (dKO) cells, compared with single CFAP20-KO cells (Fig. 2j). Knockout of CCNC in a CFAP20-KO background also reversed the increase in cyclin A-negative G2 cells (Extended Data Fig. 4f,g). Thus, inactivation of the Mediator kinase function greatly improves the fitness of human CFAP20-KO cells. Notably, transient knockdown of ccnc in zebrafish larvae did not rescue the anterior–posterior body-axis curvature of the cfap20−/− mutant, and resulted in the development of additional microphthalmia and pericardial oedema (Extended Data Fig. 4h,i). 
These findings indicate that loss of CCNC does not rescue the ciliary dysfunction caused by the loss of CFAP20 function, but rather that CCNC loss rescues a function of CFAP20 that is unrelated to cilia.

Mediator-dependent R-loops in CFAP20-KO cells

Because inactivation of the Mediator subunit CCNC in a CFAP20-KO background could rescue the poor cell growth, we wondered whether this could also rescue the R-loop phenotype. Immunofluorescence experiments using either the S9.6 antibody (Extended Data Fig. 5a) or purified GFP–RNaseH1(D210N) (Extended Data Fig. 5b) showed a full reversal of the R-loop phenotype in CFAP20/CCNC-dKO cells. Moreover, bidirectional fork asymmetry was also fully reversed in dKO cells, whereas single CCNC-KO cells showed no R-loop phenotype (Fig. 2k). Of note, knockout of CCNC did not rescue the R-loop phenotype of cells deficient in the Integrator subunit INTS9 (Extended Data Fig. 5c,d). To determine whether the Mediator-dependent function of cyclin C drives these phenotypes, we used a CCNC point mutant (CCNC(D182A)) that is defective in binding the Mediator complex35 (Extended Data Fig. 5e). Proteomic and co-immunoprecipitation analyses confirmed that WT GFP–CCNC associated with CDK8, CDK19 and fifteen Mediator subunits, whereas GFP–CCNC(D182A) still associated with CDK8 and CDK19 but did not associate with Mediator (Extended Data Fig. 5f–h). Immunofluorescence experiments showed that re-expression of WT GFP–CCNC in CFAP20/CCNC-dKO cells restored R-loop accumulation to the level of CFAP20-deficient cells. However, expression of the GFP–CCNC(D182A) Mediator-binding mutant in CFAP20/CCNC-dKO cells did not increase R-loop levels (Fig. 2l). Previous studies have shown that inactivation of the Mediator kinase module, by knockout of the CDK8 counterpart of cyclin C, leads to global repression of transcription by reducing RNAPII occupancy at promoters36. Consistently, RNAPII occupancy measured by ChIP–seq was similar between WT and CFAP20-KO cells but was reduced in CFAP20/CCNC-dKO cells (Fig. 2m and Extended Data Fig. 5i). Together, these results show that CFAP20 specifically suppresses R-loops induced by Mediator-dependent transcription.

R-loops accumulate at TSSs in CFAP20-KO cells

We next set out to map where R-loops accumulate in the absence of CFAP20 by using genome-wide DRIP–seq. R-loops mapped mainly to promoters (TSSs) and terminators (transcription termination sites; TTSs), and treatment with recombinant RNaseH1 consistently abolished DRIP signals (Fig. 3a). More R-loops accumulated at promoters in CFAP20-KO cells than in WT cells across 508 genes (Fig. 3b and Extended Data Fig. 6a), whereas this was not the case at terminators (Fig. 3c and Extended Data Fig. 6b). Mapped regions with increased R-loops were found in transcriptionally active, early-replicating areas of the genome (Extended Data Fig. 6c). When selecting promoters with the strongest R-loop increase in CFAP20-KO cells (a greater than 1.5-fold increase in signal from −2 kb to +3 kb around TSSs in CFAP20-KO over WT), the terminators of these same genes still did not show an increase (Fig. 3b,c). Metaprofiles of around 1,800 aligned TSSs, sorted on the basis of their directionality relative to origins of replication, revealed that CD-oriented TSSs exhibited a stronger increase in R-loop levels in CFAP20-KO cells than did HO-oriented TSSs (Fig. 3d,e). The magnitude of R-loop accumulation at CD-oriented promoters in CFAP20-KO cells did not correlate with the level of anti-sense transcription (Extended Data Fig. 6d,e), suggesting that transcription in the CD orientation relative to replication is responsible for this phenomenon. To further strengthen these findings, we used a defined episomal system in HEK293T cells, with a doxycycline (DOX)-inducible gene oriented either in the same direction (CD) or the opposite direction (HO) relative to a nearby unidirectional replication origin1 (Fig. 3f). After DOX induction, cells transfected with a control short interfering RNA (siRNA) exhibited R-loops on the HO plasmid but not on the CD plasmid, as previously reported1. 
Knockdown of CFAP20 in these cells triggered a strong accumulation of R-loops after transcription induction, which was selectively observed at the CD-oriented promoter (Fig. 3g and Extended Data Fig. 6f,g), and was accompanied by ATM-dependent phospho-CHK2 activation (Extended Data Fig. 6h), consistent with the characteristic DNA damage response that is associated with CD conflicts. Thus, CFAP20 prevents the accumulation of R-loops specifically at CD-oriented promoters, consistent with its genetic interaction with the promoter-associated Mediator complex.
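
The selection of promoters with the strongest R-loop gain can be sketched as a fold-change filter on the mean DRIP signal in a window around each TSS. The sketch below is illustrative only: the data layout (per-gene dictionaries mapping the offset from the TSS in bp to signal), the function names and the gene labels are assumptions, and the window bounds are left as parameters.

```python
def window_mean(profile, start_bp, end_bp):
    """Mean signal over offsets (bp relative to the TSS) in [start_bp, end_bp]."""
    values = [v for offset, v in profile.items() if start_bp <= offset <= end_bp]
    if not values:
        raise ValueError("no signal values inside the requested window")
    return sum(values) / len(values)


def promoters_with_gain(wt, ko, start_bp, end_bp, min_fold=1.5):
    """Genes whose windowed KO signal is at least min_fold times the WT signal.

    wt and ko map a gene label to its TSS-centred profile. Genes missing from
    either condition, or with zero WT signal, are skipped (a simplifying
    assumption of this sketch).
    """
    selected = []
    for gene, wt_profile in wt.items():
        if gene not in ko:
            continue
        wt_signal = window_mean(wt_profile, start_bp, end_bp)
        ko_signal = window_mean(ko[gene], start_bp, end_bp)
        if wt_signal > 0 and ko_signal / wt_signal >= min_fold:
            selected.append(gene)
    return sorted(selected)
```

Applying the same gene list to TTS-centred profiles would then test whether the terminators of the selected genes show a matching increase.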

Fig. 3: CFAP20 suppresses CD R-loops at promoters.

a, Distribution of DRIP–seq reads across indicated regions of the ITGA11 gene in the specified cell lines and conditions. b, Metaprofiles of DRIP–seq aligned around 508 promoters (TSSs) of genes >50 kb (top) or 135 TSSs with ≥1.5-fold higher DRIP signal from –2 kb to +3 kb around TSSs (bottom) in CFAP20-KO relative to WT. c, Metaprofiles of DRIP–seq in the indicated cell lines and conditions aligned around 508 terminators (TTSs) of genes >50 kb (top) or 135 TTSs with ≥1.5-fold higher DRIP signal from −2 kb to +3 kb around TSSs (bottom) in CFAP20-KO relative to WT. d, Metaprofiles of DRIP–seq in WT and CFAP20-KO cells aligned around HO (n = 408) TSSs relative to origins mapped by OK–seq5. Data are averages after trimming the top and bottom 5% of data (a trim-mean of 0.1) to remove extreme values. e, Metaprofiles of DRIP–seq in WT and CFAP20-KO cells aligned around CD (n = 1,395) TSSs relative to origins mapped by OK–seq5. Data as in d. f, Schematic of episomal system for transcription–replication conflicts. The unidirectional EBV replication origin (oriP; red) is placed either behind (HO) or in front (CD) of the EBNA1 gene containing the R-loop-forming mAIRN segment (blue) under TetON control. g, DRIP–quantitative PCR (qPCR) in HEK293T cells with mAIRN HO or CD plasmids, transfected with control or CFAP20 siRNAs ± DOX for 48 h. DRIP signals around the mAIRN TSS are shown as % input; data represent n = 3 biological independent experiments. Significance by one-way ANOVA with Šidák’s correction. P values from left to right: 0.4616 and <0.0001. h, Heat maps of CFAP20 ChIP–seq in TY-CFAP20 cells aligned around origins mapped by OK–seq5. i, Metaprofiles of DRIP–seq (red), CFAP20 ChIP–seq (green) and BrU–seq (blue) in the indicated cell lines within a 25-kb window adjacent to origins extending up to 75 kb. Data are Trimmean 0.1 to remove extreme values.


Fig. 4: CFAP20 suppresses Mediator-driven replication stress.

a, Heat map of scEdU–seq from a single 15-min EdU pulse. Maximum-normalized log counts for WT and CFAP20-KO cells, ordered by S-phase progression (x axis) and binned per 400 kb (y axis) for a 60-Mb region of chromosome 2. b, Representative images (top) and quantification (bottom) of origins per Mb derived from inter-origin distances, using sequential CldU (red) and IdU (green) labelling of nascent DNA. Scale bar, 5 μm. Significance by one-way ANOVA with Dunnett’s correction. P values from left to right: 0.0134, 0.5742, 0.0079 and 0.6897. c, Number of replication forks per cell (y axis) versus S-phase progression (x axis) in WT and CFAP20-KO cells; LOESS fit (line) with standard error ribbon. d, Representative images (top) and quantification (bottom) of replication-fork speed in the indicated cells using sequential CldU (red) and IdU (green) labelling. Scale bar, 5 μm. Significance by one-way ANOVA with Dunnett’s correction. P values from left to right: <0.0001, 0.5175, <0.0001 and 0.5893. e, DNA replication speed over S-phase (y axis) in WT (n = 402) and CFAP20-KO (n = 331) cells, measured as median replication width. Each dot is one cell, the line indicates the fit using a LOESS fit and the ribbon indicates the 95% standard error for the fit. f, Outline of S1 nuclease experimental set-up. g, Representative DNA fibres in the indicated cell lines without or with S1 nuclease. Scale bar, 10 μm. h, Quantification of fibres ± S1 nuclease in the indicated cells using sequential CldU (red) and IdU (green) labelling. Data representation as in Fig. 1j (more than 100 fibres); black and blue lines represent means of all experiments. Significance between –S1 and +S1 conditions was determined by one-way ANOVA with Šidák’s correction. P values from left to right: 0.9982, <0.0001, >0.9999, <0.0001, 0.9992, 0.0003 and >0.9999.


CFAP20 limits Mediator-dependent stress

We next overlaid the difference in R-loops between CFAP20-KO cells and WT cells with origins of replication mapped using published OK–seq datasets5 (Extended Data Fig. 6i). To correlate this to the binding of CFAP20 in the genome, we also performed genome-wide ChIP–seq on TY1-tagged CFAP20 (Fig. 3h). This analysis revealed a particular increase in R-loops at TSSs close to origins in the CD orientation, which did not correlate with an increase in nascent transcription in these genomic regions in CFAP20-deficient cells (Fig. 3i). Moreover, we detected CFAP20 binding mainly to gene promoters, with a strong preference for CD-oriented promoters (Fig. 3h,i). These findings suggest that CFAP20 acts locally at promoters to prevent transcription stress and R-loop accumulation. The marked accumulation of R-loops at TSSs close to origins in CFAP20-KO cells raises the possibility that this affects DNA replication dynamics. We performed scEdU–seq, which showed that S-phase progression is mostly unaltered between WT and CFAP20-KO cells21 (Fig. 4a and Extended Data Fig. 7a–d), in line with our quantitative image-based cytometry analysis (Fig. 2f). To investigate an effect on replication-fork progression, we used DNA fibre assays to measure the distance between origins37 (Extended Data Fig. 7e), and used this to calculate the number of origins firing per megabase. Origin firing was suppressed in CFAP20-KO cells, and re-expressing WT GFP–CFAP20 rescued this phenotype, whereas GFP–CFAP20(R100C) did not (Fig. 4b). The additional knockout of CCNC also restored origin activity to WT levels. In line with this finding, scEdU–seq analysis showed that origin usage was less efficient in CFAP20-KO cells than in WT cells (Extended Data Fig. 7f). In addition, quantification of the number of replication forks from scEdU–seq showed that, compared with WT cells, CFAP20-KO cells exhibited a decreased number of forks throughout S-phase and across all human chromosomes21 (Fig. 4c and Extended Data Fig. 7g). This is consistent with a decrease in origin firing and a general increase in fork stalling in CFAP20-KO cells.
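
The conversion from inter-origin distance to origins per megabase can be illustrated with simple arithmetic. The sketch below assumes the density is taken as the reciprocal of the inter-origin distance, with 1 Mb equal to 1,000 kb; the exact formula used for the fibre data is not stated in the text, and the function names are hypothetical.

```python
def origins_per_mb(inter_origin_distance_kb: float) -> float:
    """Origin density implied by one inter-origin distance (assumed: 1,000 kb / IOD)."""
    if inter_origin_distance_kb <= 0:
        raise ValueError("inter-origin distance must be positive")
    return 1000.0 / inter_origin_distance_kb


def mean_origins_per_mb(inter_origin_distances_kb) -> float:
    """Average origin density across a set of measured fibres."""
    densities = [origins_per_mb(d) for d in inter_origin_distances_kb]
    if not densities:
        raise ValueError("no inter-origin distances supplied")
    return sum(densities) / len(densities)
```

Under this assumption, a longer average distance between neighbouring origins translates directly into fewer origins firing per megabase.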

Differences in origin activity are often compensated for by changes in replication-fork speed38. For instance, PARP inhibitors have been shown to trigger fork acceleration at first, followed by a secondary reduction in origin activity39,40. To test this possibility in the context of CFAP20 deficiency, we performed DNA fibre assays, which revealed that CFAP20-KO cells had an increased replication-fork speed (Fig. 4d). This phenotype was fully rescued by expressing WT GFP–CFAP20 but not by GFP–CFAP20(R100C). Meanwhile, the CFAP20/CCNC-dKO cells exhibited a normal fork speed, similar to that of WT cells (Fig. 4d). Treatment with a PARP inhibitor indeed caused fork speeding in WT cells41, but did not further accelerate forks in CFAP20-KO cells (Extended Data Fig. 8a). To extend these findings, we quantified replication-fork speed from scEdU–seq data, which confirmed the increased fork speed in CFAP20-KO cells and revealed that fork speeding occurs throughout S-phase (Fig. 4e). To identify the main cause of the replication defect in CFAP20-KO cells, we performed DNA fibre assays in combination with chemical inhibitors of replication-fork progression (aphidicolin) and origin activity38 (CDC7 kinase inhibitor, XL413) (Extended Data Fig. 8b). As expected, treating WT cells with the CDC7 kinase inhibitor reduced origin activity, which was accompanied by an acceleration of fork speed, whereas treatment with aphidicolin reduced fork speed and led to increased origin activity (Extended Data Fig. 8c,d). Whereas untreated CFAP20-KO cells already exhibited an increased fork speed and decreased origin activity (Fig. 4b–e), decreasing the fork speed with aphidicolin fully rescued origin activity in these cells (Extended Data Fig. 8c). If reduced origin activity were the main cause, this phenotype should be resilient to aphidicolin treatment38, which is not what we observed (Extended Data Fig. 8d). 
These experiments therefore reveal that the main cause of replication stress in CFAP20-KO cells is the accelerated fork rate, which consequently triggers a secondary decrease in origin activity.

CFAP20 limits Mediator-dependent gaps

Fork acceleration induced by PARP inhibition was shown to be associated with the formation of single-stranded DNA (ssDNA) gaps39. We therefore tested whether this is also the case in CFAP20-KO cells. To detect ssDNA gaps in the genome of CFAP20-KO cells, we used the DNA fibre assay in the presence of the ssDNA-specific S1 nuclease42 (Fig. 4f,g). Measurements of 5-iodo-2’-deoxyuridine (IdU) tracks showed a marked accumulation of ssDNA gaps in CFAP20-KO cells, which was reversed by expression of WT GFP–CFAP20 but not by the GFP–CFAP20(R100C) mutant (Fig. 4h). BRCA1-deficient cells also accumulate gaps, which were suggested43 to underlie their sensitivity to PARP inhibition. Notably, cell viability assays with increasing concentrations of PARP inhibitor showed that, in contrast to BRCA1-KO cells included in parallel, CFAP20-KO cells are not sensitive to PARP inhibition (Extended Data Fig. 8e). To investigate whether the accumulation of ssDNA gaps in CFAP20-KO cells is a consequence of Mediator-driven transcription, we performed S1 nuclease assays on different CCNC mutants. The ssDNA gap phenotype was fully reversed in CFAP20/CCNC-dKO cells. Re-expression of WT GFP–CCNC in these cells restored the ssDNA gap phenotype, whereas expression of the GFP–CCNC(D182A) Mediator-binding mutant did not (Fig. 4h).

During DNA replication, ssDNA gaps can arise from two main sources: incomplete lagging-strand processing and PRIMPOL-dependent repriming on the leading strand44,45,46. To assess the contribution of each mechanism, we first used the inhibitor CD437 to block DNA polymerase α, which initiates synthesis of each Okazaki fragment on the lagging strand. Fork speeding in CFAP20-KO cells treated with CD437 was completely reversed (Extended Data Fig. 9a,b), but ssDNA gaps persisted (Extended Data Fig. 9c,d). Next, we used either siRNAs or CRISPR RNAs (crRNAs) to knock down or acutely knock out PRIMPOL. Under these conditions, we confirmed that loss of PRIMPOL fully reversed the fork speeding induced by PARP inhibitor treatment, as previously shown39. By contrast, the increased fork speed and fork asymmetry observed in CFAP20-KO cells were unaffected by the loss of PRIMPOL (Extended Data Fig. 9e–h). However, ssDNA gap accumulation was strongly suppressed by knockout of PRIMPOL (Extended Data Fig. 9i–k). The preferential accumulation of R-loops at CD-oriented promoters, where the transcribed strand for RNAPII serves as the leading-strand template during DNA replication (see Fig. 1b), is consistent with a repriming mechanism mediated by PRIMPOL. Supporting this model, fork asymmetry in CFAP20-KO cells was fully reversed by S1 nuclease treatment (Extended Data Fig. 9l). Together, these results suggest that CFAP20 suppresses Mediator-driven transcription stress at promoters to maintain replication fidelity.

CFAP20 salvages promoter-proximal RNAPII

We next asked whether regions with increased R-loops in CFAP20-KO cells show differences in replication timing in our scEdU–seq dataset. By quantifying the replication timing of forks with the earliest, median and latest replication timing per 10-kb bin, we found that regions with increased R-loops in CFAP20-KO cells exhibited a delay in early DNA replication relative to WT cells, consistent with increased fork stalling at CD promoters in cis. By contrast, later-replicating regions completed DNA replication earlier in CFAP20-KO cells than in WT cells, consistent with the acceleration of replication forks observed in both DNA fibre and scEdU–seq experiments in trans (Extended Data Fig. 10a). Together, our results suggest that local Mediator-driven transcriptional stress at promoters, when not mitigated by CFAP20, culminates in global replication defects by increasing fork speed, which ultimately leads to reduced origin activity. To investigate R-loop dynamics in the promoter-proximal region, where R-loops specifically accumulate, we treated cells with the reversible transcription elongation inhibitor 5,6-dichloro-1-β-d-ribofuranosylbenzimidazole (DRB); this strongly suppressed R-loop accumulation in WT and CFAP20-KO cells (Fig. 5a,b). After DRB washout and release, R-loops returned rapidly to the original levels in both backgrounds (Fig. 5b), suggesting that R-loops are continuously formed in the promoter-proximal region. To extend these findings, we directly measured the RNAPII elongation rate by releasing cells after DRB elongation arrest and immediately pulse-labelling nascent transcripts using 4-thiouridine (4-SU) ribonucleoside (Fig. 5c). Isolation and sequencing of nascent transcripts revealed that CFAP20-KO cells showed a transcription elongation defect after DRB release (Fig. 5d and Extended Data Fig. 10b), which was also observed in CFAP20R100C cells (Extended Data Fig. 10c). 
Although the wave-front of RNAPII elongation was not different, suggesting that there was no effect on RNAPII processivity, we observed decreased elongation, consistent with an increased fraction of arrested RNAPII molecules (Fig. 5d). CFAP20 thus seems to salvage slowly elongating or arrested RNAPII molecules, thereby removing them from the path of CD replisomes. In support of such a role, we found that RNAPII co-immunoprecipitated with GFP–CFAP20 (Extended Data Fig. 10d,e). To further corroborate this model, we investigated whether replication phenotypes in CFAP20-deficient cells could be restored either by removing R-loops or by removing arrested RNAPII through α-amanitin-induced degradation47 (Extended Data Fig. 10f). Whereas transient treatment with α-amanitin had no effect on fork speed (Fig. 5e) and only a marginal effect on fork symmetry (Extended Data Fig. 10g), overexpression of RNaseH1 led to a subtle but reproducible rescue of replication-fork speed in CFAP20-KO cells (Fig. 5e), along with full restoration of symmetric fork progression. However, degradation of RNAPII through transient treatment with α-amanitin combined with RNaseH1 overexpression fully restored both replication-fork speed and symmetry in CFAP20-deficient cells (Fig. 5e and Extended Data Fig. 10g). These data suggest that neither R-loops nor arrested RNAPII individually are sufficient to cause replication stress. Instead, CFAP20 acts on arrested RNAPII engaged with an R-loop to suppress Mediator-driven transcription stress, thereby minimizing interference with DNA replication (Fig. 5f).

Fig. 5: RNAPII and associated R-loops cause replication stress.

a, Cartoon showing DRB mechanism of action. b, Quantification of nuclear R-loop signal in the indicated stable cell lines and conditions. Data as in Fig. 1j (more than 100 cells). Significance by one-way ANOVA with Šidák’s correction comparing WT and CFAP20-KO to respective conditions. P values from left to right: 0.8069, >0.9999, 0.9994, >0.9999, 0.0044, 0.0164, 0.7249 and >0.9999. c, Schematic of DRB treatment and release experiment. d, Averaged metaplots of TTchem–seq signal 60 min after DRB release for 508 genes ≥50 kb in WT or CFAP20-KO cells. e, Quantification of replication-fork speed in the indicated cells using sequential CldU (red) and IdU (green) labelling. Data as in Fig. 1j (more than 150 fibres). Significance by one-way ANOVA with Šidák’s correction comparing WT and CFAP20-KO to respective conditions. P values from left to right: 0.9292, 0.5810, 0.3237, 0.0040, >0.9999 and <0.0001. f, Model illustrating how CFAP20 acts on arrested RNAPII engaged with R-loops to suppress Mediator-driven transcription stress, thereby minimizing interference with DNA replication.

Discussion

Our findings support a model6 in which transcription–replication encounters are minimized by diverting RNAPII from the path of CD replisomes. During S-phase, promoter-proximally paused RNAPII is terminated by the Integrator complex, which clears the path for CD replisomes6. Once RNAPII transitions into productive elongation, Integrator no longer acts, and CFAP20 becomes essential to prevent Mediator-driven transcription–replication conflicts. Consistent with this, simultaneous loss of CFAP20 and INTS9 additively increases R-loops, with Integrator-dependent R-loops remaining Mediator-independent (Extended Data Fig. 5c,d). CFAP20 thus functions in a salvaging pathway that removes RNAPII stalled on R-loops, which otherwise obstruct replisomes10. Combined R-loop removal and RNAPII degradation reverses replication defects in CFAP20-deficient cells.

At CD genes, promoting elongation ensures that RNAPII remains ahead of replisomes that initiate near promoters. The Mediator complex and its kinase module enhance RNAPII promoter occupancy and transcriptional output36, increasing elongating RNAPII flux and transcriptional stress, which CFAP20 counterbalances. This explains the synthetic viability between Mediator loss and CFAP20 deficiency. Without CFAP20, impaired elongation after promoter release leads to fork stalling that is compensated for by the acceleration of neighbouring forks.

We propose a ‘block–trigger’ mechanism that involves R-loops and stalled RNAPII. R-loop accumulation induces asymmetric fork progression and modestly increases fork speed in CFAP20-KO cells. R-loops act as ‘blocks’ to fork movement in cis, whereas stalled RNAPII acts as a ‘trigger’ for fork acceleration in trans. RNAPII alone is a poor obstacle, consistent with evidence that DNA polymerases can bypass it through transient interaction with proliferating cell nuclear antigen (PCNA)7. However, RNAPII engaged with an R-loop combines both block and trigger functions, producing fork asymmetry and an increased trans fork speed. Thus, transcription-dependent fork stalling in cis drives trans acceleration and the accumulation of single-stranded DNA (ssDNA) gaps (Fig. 5f). Restoration of fork symmetry by S1 nuclease, which degrades leading-strand ssDNA, supports this model (Extended Data Fig. 9l). ssDNA gap accumulation in CFAP20-KO cells strongly depends on PRIMPOL, implicating leading-strand repriming45. PRIMPOL depletion did not affect fork speed, whereas inhibition of DNA polymerase α did (Extended Data Fig. 9a–k), explaining the hypersensitivity of CFAP20-deficient cells to the polymerase α inhibitor CD437 (Fig. 1f and Extended Data Fig. 3c). Fork acceleration induced by PARP inhibition also depends on polymerase α activity39, paralleling our observations, although PRIMPOL contributes under PARP inhibition. Fork acceleration was restored by PRIMPOL loss and PARP inhibition in WT but not in CFAP20-KO cells (Extended Data Fig. 9g), indicating that there are dual effects in CFAP20-KO cells: PRIMPOL-dependent repriming that generates ssDNA gaps, and polymerase α-driven acceleration of neighbouring forks.

CFAP20 mutations have been linked to retinitis pigmentosa through ciliary dysfunction24. The R100C mutation, which distinguishes RNAPII-related from ciliary functions, is not associated with retinitis pigmentosa but is found across several tumour types in the COSMIC database (Extended Data Fig. 3f and Supplementary Table 1). The Cancer Genome Atlas (TCGA) analyses across 33 tumour types identified CFAP20 hotspot mutations48, and CRISPR screens in cyclin E1-amplified ovarian and uterine cancer cells revealed a specific vulnerability to CFAP20 loss49. Although not investigated further in those studies, our results highlight CFAP20’s role in maintaining transcription–replication homeostasis, suggesting that tumour cells depend on CFAP20 to mitigate transcriptional stress. Further investigation of the functions of CFAP20 will elucidate how human cells coordinate transcription with replication and might uncover therapeutic opportunities in cancers that rely on this safeguard.

Methods

Cell lines

All cell lines are listed in Supplementary Table 2 and were cultured at 37 °C with 5% CO2 in DMEM GlutaMAX (Gibco), supplemented with 10% fetal calf serum (Avantor, VWR; Supplementary Table 3) and penicillin–streptomycin (Sigma; Supplementary Table 3).

Compounds and materials

All compounds and instruments are listed in Supplementary Table 3.

Generation of knockout cells

Cells were transfected with Cas9-2A-GFP (Addgene, 48138) containing a guide RNA targeting the gene of interest (sgRNAs are listed in Supplementary Table 4 and plasmids in Supplementary Table 5) using Lipofectamine 2000 (Life Technologies, 11668027). Cells were sorted by fluorescence-activated cell sorting on BFP and GFP and were plated at a low density, after which individual clones were isolated. Isolated knockout clones were verified by Sanger sequencing and/or western blot analysis (primers and antibodies are listed in Supplementary Tables 6 and 7, respectively).

PGK plasmids with GFP-tagged protein

The CFAP20 gene was amplified from cDNA by PCR and inserted in PGK-EGFP-C1-IRES-PURO, thereby tagging CFAP20 at its N terminus with GFP (primers and plasmids are listed in Supplementary Tables 5 and 6). The CFAP20R100C mutant was generated using site-directed mutagenesis PCR. The CCNC gene was amplified from the CMV-EGFP-CCNC plasmid and inserted in pLenti-PGK-GFP-puro, thereby tagging CCNC at its N terminus with GFP. The CCNCD182A mutant was generated using site-directed mutagenesis PCR. A region spanning the CMV promoter was amplified by PCR and used to replace the PGK promoter in pLenti-PGK-GFP-puro. A fragment encoding RNaseH1 from plasmid pFRT-TO-EGFP-RNaseH1 was amplified by PCR and inserted in pLenti-CMV-GFP-puro. All sequences and plasmids were verified by Sanger sequencing.

Generation of stable cell lines

Cells were transfected with Lipofectamine 2000 (Life Technologies, 11668027) or polyethylenimine reagent (Brunschwig Chemie, 23966-2) according to the manufacturer’s instructions. All plasmids are listed in Supplementary Table 5. Lentiviral particles were produced by co-transfecting pLenti plasmids with pMDLg-pRRE, pRSV-REV and pCMV-VSVG in a 2:1:1:4 ratio in HEK293T cells by using polyethylenimine reagent. After production, lentivirus was filtered through a 0.45-µm filter and added to RPE1 cells in a complete DMEM medium supplemented with 4 µg ml−1 polybrene and 10 mM HEPES. After overnight incubation, the medium was removed, and fresh medium was added. The expression of GFP was verified three days after lentiviral transduction.

CRISPR–Cas9 gene editing

For CRISPR–Cas9 gene editing, we used a previously described approach50. In brief, Cas9 expression was induced by 200 ng ml−1 DOX followed by transfection with 20 nM equimolar crRNA:tracrRNA duplexes with 1:1,000 RNAiMAX (Life Technologies).

RNA interference

For RNA interference (Supplementary Table 4), cells were transfected with 50 nM siRNA duplexes using Lipofectamine RNAiMAX (Invitrogen). Cells were transfected twice with siRNAs at 0 h and 24 h and were typically analysed 60 h after the first transfection.

Zebrafish lines and husbandry

All adult zebrafish strains are listed in Supplementary Table 8 and were raised at 28.5 °C under a 14-h–10-h light–dark cycle. Larvae were raised to 48 hours post-fertilization (hpf) at 28.5 °C in an incubator in E3 medium. The mutant cfap20 line (ua5025)24 was a gift from W. Ted Allison. All fish were on an AB background and staged as previously described51. Anaesthesia for live imaging was achieved with 60 mg l−1 of eugenol. All rescue experiments were performed on at least two clutches. All animal experiments were performed with the approval of the University of Toronto Animal Care Committee in accordance with the guidelines from the Canadian Council for Animal Care (CCAC).

mRNA and morpholino microinjection in zebrafish

All oligonucleotides used in zebrafish strains are listed in Supplementary Table 8. Microinjection into the cell (mRNA) or the yolk syncytial layer (morpholino) before the four-cell stage was done using pulled (P-97; Sutter Instrument) glass capillary tubes (TW100F-4; World Precision Instruments). Unfertilized eggs or embryos stalled during gastrulation were removed at 12 hpf. WT and CFAP20R100C variant mRNAs were transcribed from linearized pCS2+ vectors. The WT-containing plasmid was a gift from W. T. Allison. The R100C variant sequence was ordered as a gBlock (Integrated DNA Technologies) and directionally cloned into pCS2+ using BamHI and XbaI restriction enzymes. In vitro transcription was performed using the SP6 mMessage mMachine kit (Thermo Fisher Scientific, AM1340), followed by phenol:chloroform purification. A dose response using the WT mRNA diluted with ddH2O (into cfap20+/− incross clutches) was performed (25, 100 pg) to ensure that a rescue efficiency higher than 90% was achieved (data not shown). Embryos from cfap20+/− incrosses were microinjected with WT or R100C CFAP20 mRNA and larvae were then raised to 48 hpf and groups were blinded. Larvae were scored on the basis of a straight extension of the anterior–posterior axis (normal) or ventral curling of the body (curvature). Embryos were then processed for DNA lysis and genotyped as below, and groups were then unblinded. Only scores from cfap20−/− homozygotes were analysed. Standard control and ccnc splice morpholino oligonucleotides (Gene Tools), as in a previous study51, were used to knock down ccnc. A dose response of 1, 2 and 4 ng MO was performed in AB incross (2 clutches; more than 20 animals) as in Extended Data Fig. 5f and larvae were scored at 48 hpf on the basis of the severity of the phenotype. An optimal dose of 1.5 ng was chosen for subsequent experiments. Cfap20+/− heterozygote incross embryos were injected, groups were blinded and larvae were raised to 48 hpf. 
Larvae were scored (as above) on the basis of anterior–posterior curvature (2 clutches; more than 50 animals), then processed for genotyping before unblinding. Only scores from injected or uninjected cfap20−/− homozygotes were analysed.

Zebrafish cfap20 genotyping

A genomic DNA template for PCR was generated by adding tissue to 50 µl 50 mM NaOH, heating at 95 °C for 20 min and then neutralizing with 5.5 µl 1 M Tris-HCl. The template was diluted 50-fold and PCR genotyping was performed using GoTaq 2 (Promega). Primer sequences are listed in Supplementary Table 8.

Microscopic analysis of zebrafish larvae

Larval zebrafish were anaesthetized as above and transferred to 1% agar-lined Petri dishes for imaging. Representative bright-field images were taken using ZEN 3.7 (Zeiss) at 32× magnification on a Lumar V12 (Zeiss) stereomicroscope with an Axiocam 712 mono (Zeiss) camera. All graphing of and statistical tests on zebrafish data were done in Prism 10 (GraphPad), as described in Supplementary Table 9. The absolute number of normal versus axis-curvature defects was compared statistically using Fisher’s exact test. Raw images were cropped, and brightness and contrast were adjusted in Photoshop 2024 (Adobe). Identical transformations were performed on control and experimental images.
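As an illustration of the statistical comparison described above, the following minimal sketch computes a two-sided Fisher's exact test on a 2 × 2 table of normal versus axis-curvature counts. It is not the Prism implementation used in the study; the function name and the example counts are hypothetical, and the two-sided P value is defined here as the sum of the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be, for example, uninjected and injected larvae; columns
    could be 'normal' and 'curvature' scores (hypothetical layout).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is at most as probable as the observed table;
    # the small tolerance guards against floating-point ties.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical counts: 2 normal / 18 curved versus 17 normal / 3 curved.
p_rescue = fisher_exact_two_sided(2, 18, 17, 3)
```

Because the test conditions on the table margins, it remains valid for the small group sizes typical of genotyped clutches.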

Western blotting

Proteins were separated on 4–12% Criterion XT Bis-Tris gels (Bio-Rad, 3450124) in NuPAGE MOPS running buffer (Thermo Fisher Scientific, NP0001–02), or on 3–8% Criterion XT Tris-Acetate protein gel (Bio-Rad, 3450131) in Tris/Tricine/SDS Running Buffer (Bio-Rad, 1610744), followed by blotting onto PVDF membranes (EMD Millipore, IPFL00010). Membranes were blocked with 5% milk powder in phosphate-buffered saline (PBS) with 0.1% Tween for one hour at room temperature. Protein expression was analysed by immunoblotting with the designated primary antibodies (listed in Supplementary Table 7) and corresponding secondary antibodies at 1:10,000. For detection, the Odyssey infrared imaging scanning system (LICORbio) was used.

Immunoprecipitation

Cell pellets were solubilized in EBC-1 (50 mM Tris, pH 7.5, 150 mM NaCl, 0.5% NP-40 and 2 mM MgCl2 with protease inhibitor cocktails (Roche)) supplemented with 500 U benzonase for one hour at 4 °C under rotation. The lysates were cleared from insoluble chromatin by centrifugation and were subjected to immunoprecipitation with GFP Trap beads (Chromotek, GTA-200) for 1.5 h at 4 °C under rotation. The beads were then washed four to six times with EBC-2 buffer (50 mM Tris, pH 7.5, 150 mM NaCl, 0.5% NP-40 and 1 mM EDTA) and boiled in Laemmli buffer. Bound proteins were resolved by SDS–PAGE and immunoblotted with the indicated antibodies (Supplementary Table 7). For endogenous immunoprecipitation, 2 µg of antibody was incubated with the samples in EBC-1 buffer and benzonase, and they were subjected to immunoprecipitation with protein A agarose beads (Millipore, 16-157).

Mass-spectrometry sample preparation

After pull-down, the GFP beads were washed three times with 50 mM ammonium bicarbonate, followed by overnight digestion using 2.5 μg trypsin at 37 °C under constant shaking. Digested peptides were separated from the beads by a 0.45-µm filter column (Merck, UFC30HV00) that was prewashed with 50 mM ammonium bicarbonate. Trypsin activity was quenched by acidifying the sample with trifluoroacetic acid to a final concentration of 1%. Peptides were desalted and concentrated using in-house assembled triple-disc C18 stage-tip columns (serial number 66883-U; Sigma-Aldrich) as previously described52.

Mass-spectrometry data acquisition

The GFP–CCNC and GFP–CCNC(D182A) samples with their corresponding GFP-NLS controls were analysed by on-line C18 nano-high performance liquid chromatography (HPLC) MS/MS with a system consisting of an UltiMate3000 nano gradient HPLC system (Thermo Fisher Scientific) and an Exploris480 mass spectrometer (Thermo Fisher Scientific). Digested peptides were injected onto a cartridge precolumn (300 μm × 5 mm, C18 PepMap, 5 μm) in 100% solvent A (0.1% formic acid in milli-Q), with a flow of 10 μl per min for 3 min (Thermo Fisher Scientific), and eluted using a homemade analytical nano-HPLC column (30 cm × 75 μm; Reprosil-Pur C18-AQ 1.9 μm, 120 Å; Dr. Maisch). The chromatography gradient length was 60 min from 2% to 40% solvent B, followed by a 5-min increase to 95% solvent B, another 5 min of 95% solvent B and back to 2% solvent B for chromatography column reconditioning. The mass spectrometer was operated in positive polarity data-dependent MS/MS mode with a cycle time between master scans of 3 s. Full-scan MS spectra were obtained with a resolution of 60,000, a normalized automatic gain control (AGC) target of 300% and a scan range of 350–1,600 m/z. Precursors were fragmented by higher-energy collisional dissociation (HCD) with a normalized collision energy of 28%. Tandem mass spectra (MS/MS) were recorded with a resolution of 30,000 and a normalized AGC target value of 75%. Precursor ions selected for MS/MS analysis were subsequently dynamically excluded from MS/MS analysis for 30 s and only precursors with a charge state of 2–6 triggered MS/MS events.

Mass-spectrometry data analysis

RAW data were analysed using MaxQuant (v.1.6.14.0) as previously described53,54.

Mass-spectrometry data availability

The mass-spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE55 partner repository with the dataset identifier PXD051449 (GFP–CFAP20(R100C) and GFP–CCNC sample sets).

CRISPR screens

For every screen, three populations of RPE1-iCas9 were transduced at a multiplicity of infection (MOI) of around 0.2 with a 1:1,000 dilution of TKOv3-in-pLCKO lentiviral library in medium containing 8 µg ml−1 hexadimethrine bromide (Sigma-Aldrich). The library was a gift from K. Chan, A. Tong and J. Moffat. Twenty-four hours after transduction, puromycin (Sigma-Aldrich) was added to 5 μg ml−1 to select for transduced cells. After all cells in non-transduced control populations had died and dishes with transduced populations had reached 90% confluence, a t = 0 sample was taken for each of the three populations. From the remaining cells of each population, 30 × 10^6 (corresponding to a library representation of more than 400) were grown as a control population. To screen for replication stress genes, RPE1-iCas9 parental cells were grown in the presence of the DNA pol α inhibitor CD437 at a concentration of 200 nM. The illudin S screen has been described previously17. To screen for synthetic-viable genes, RPE1-iCas9 CFAP20-KO cells were grown without drugs or inhibitors. DOX was added to the medium of all replicates from t = 0 onwards to induce expression of Cas9, at a concentration of 200 ng ml−1. After 3 doublings, 30 × 10^6 cells of each population were passaged. After 12 doublings, all populations were collected.
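The stated library representation can be checked with simple arithmetic, sketched below. The guide count is an assumption that is not given in the text (the TKOv3 library is taken here to contain roughly 71,000 sgRNAs); the function and constant names are hypothetical, and the actual library size should be substituted where known.

```python
import math

# Assumed number of sgRNAs in the TKOv3 library (not stated in the text).
ASSUMED_TKOV3_GUIDES = 71_090


def library_representation(cells, n_guides):
    """Average number of cells carrying each guide (fold coverage)."""
    return cells / n_guides


def cells_for_representation(target_fold, n_guides):
    """Smallest cell number that reaches the requested fold coverage."""
    return math.ceil(target_fold * n_guides)


coverage = library_representation(30e6, ASSUMED_TKOV3_GUIDES)
minimum_cells = cells_for_representation(400, ASSUMED_TKOV3_GUIDES)
```

Under this assumption, 30 × 10^6 cells give a coverage slightly above 400-fold, in line with the representation quoted above.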

Sequencing and analysis of CRISPR screens

Genomic DNA was isolated from each population using the Blood and Cell Culture DNA Maxi Kit (QIAGEN). Then, 3 µg of gDNA from each population was amplified using the KAPA HiFi ReadyMix PCR Kit (Roche) with the TKO outer Fw and Rv primers (primers are listed in Supplementary Table 5), followed by a second PCR reaction using reverse primers with different Illumina i7 index sequences for each sample to identify the sample after pooled sequencing as described56. The second PCR products of each pool were purified using the QIAquick PCR Purification Kit (QIAGEN). Samples were sequenced on a NovaSeq 6000 and reads were mapped to the TKOv3 library sequences, not allowing any mismatches. To compare the illudin S to the CD437 screen (Fig. 1f), the lowest z-score for each screen was normalized to −1 (sensitizer genes: UVSSA for illudin S; HUS1 for CD437), and the highest score was normalized to +1 (resistance genes: PTGR1 for illudin S; CDAN1 for CD437). The synthetic-lethal and synthetic-viable interactions were analysed by comparing the CFAP20-KO line with the parental RPE1-iCas9 WT by first normalizing end-point reads based on t = 0 reads, as described previously50. We used an adapted version of DrugZ, termed IsogenicZ, which can be found at https://github.com/kdelint/IsogenicZ.
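One way to place two screens on a common scale, as described above for the illudin S and CD437 comparison, is sketched below. This is an assumption about the procedure rather than the authors' code: negative z-scores are divided by the magnitude of the lowest score and positive z-scores by the highest score, so that zero stays at zero. The function name and all z-score values are hypothetical.

```python
def normalize_screen(z_scores):
    """Rescale gene-level z-scores so the lowest maps to -1 and the highest to +1.

    Negative and positive scores are scaled separately (an assumed choice),
    which keeps a z-score of 0 at 0 after normalization.
    """
    lowest = min(z_scores.values())
    highest = max(z_scores.values())
    if lowest >= 0 or highest <= 0:
        raise ValueError("expected both negative and positive z-scores")
    return {
        gene: (z / -lowest if z < 0 else z / highest)
        for gene, z in z_scores.items()
    }


# Hypothetical z-scores for an illudin S-like screen.
illudin_s = {"UVSSA": -12.0, "GENE_A": -3.0, "GENE_B": 0.0, "PTGR1": 8.0}
normalized = normalize_screen(illudin_s)
```

After this rescaling, the strongest sensitizer and the strongest resistance gene of each screen sit at −1 and +1, so the two screens can be plotted against each other on the same axes.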

Immunostaining

Cells were grown on coverslips and fixed with 4% formaldehyde. Cells were permeabilized with 0.5% Triton X-100 in PBS for 5 min and then blocked with 100 mM glycine for 10 min. After washing with WB buffer (0.5% bovine serum albumin (BSA) and 0.05% Tween 20 in PBS), coverslips were incubated with the primary antibody (Supplementary Table 7) in WB buffer for two hours at room temperature. Cells were then washed extensively and labelled with their corresponding secondary antibody (Supplementary Table 7) in WB buffer containing 0.1 μg ml−1 DAPI for one hour at room temperature. Finally, the coverslips were washed extensively with PBS and mounted in Polymount (Brunschwig, 18606).

Immunostaining for detection of RNA–DNA hybrids

Indirect immunofluorescence with S9.6 antibody against RNA–DNA hybrids was performed as previously described57. Imaging of RNA–DNA hybrids using GFP–RNaseH1(D210N) was performed as described previously31.

Recovery of RNA synthesis

Cells were irradiated with UV-C light (12 J m−2), allowed to recover for the indicated periods and pulse-labelled with 400 μM 5-ethynyl-uridine (EU; Jena Bioscience) for one hour, followed by a 15 min medium-chase with DMEM without supplements. Cells were fixed with 3.7% formaldehyde in PBS for 15 min, permeabilized with 0.5% Triton X-100 in PBS for 10 min at room temperature and blocked in 1.5% BSA (Thermo Fisher Scientific) in PBS. Nascent RNA was visualized by click-iT chemistry, labelling the cells for one hour with a mix of 60 μM Atto azide–Alexa 594 (ATTO-TEC), 4 mM copper sulfate (Sigma), 10 mM ascorbic acid (Sigma) and 0.1 μg ml−1 DAPI in a 50 mM Tris-buffer (pH 8). Cells were washed extensively with PBS and mounted in Polymount (Brunschwig).

Microscopic analysis of fixed human cells

Images of fixed samples were acquired on a Zeiss AxioImager M2 wide-field fluorescence microscope equipped with 63× Plan-Apo (1.4 NA) oil-immersion objectives (Zeiss); an HXP 120 metal-halide lamp was used for excitation. Fluorescent probes were detected using the following filters: DAPI (excitation filter, 350/50 nm; dichroic mirror, 400 nm; emission filter, 460/50 nm), Alexa 488 (excitation filter, 470/40 nm; dichroic mirror, 495 nm; emission filter, 525/50 nm) or Alexa 647 (excitation filter, 640/30 nm; dichroic mirror, 660 nm; emission filter, 690/50 nm). Images were recorded using ZEN 2012 (blue edition, v.1.1.0.0) and analysed in Image J (v.1.47–1.48). Graphs were plotted and analysed using GraphPad Prism 10 (10.2.3), Microsoft Excel 365 and Adobe Illustrator 2022, as described in Supplementary Table 9.

Quantitative image-based cytometry

Quantitative image-based cytometry was performed as described previously58. Colour-coded scatter plots and bar charts of asynchronous cell populations were generated with Spotfire data visualization software (v.10.10.1; TIBCO). Representative scatter plots and bar charts are shown.

Pairwise fluorescent competitive growth assay

Cell lines stably expressing either GFP or mCherry were seeded in a 1:1 ratio (30,000 cells per 6 wells). Cells were grown as usual and split every three days. During trypsinization, samples were taken at each time point. Cell pellets were washed with PBS followed by incubation in 2% formaldehyde in PBS for 15 min. Samples were quenched with glycine, washed with PBS, fixed in ice-cold methanol and stored at −20 °C. On the day of analysis, pellets were washed once with PBS and resuspended in 350 µl PBS. An ACEA NovoCyte flow cytometer and NovoExpress software (Agilent) were used for analysis. For immunostaining, cells were grown simultaneously on coverslips and fixed in 4% formaldehyde at the corresponding time points. After permeabilization with 0.5% Triton X-100 in PBS, cells were mounted with ProLong Gold Antifade Mountant with DNA Stain DAPI (Invitrogen, P36935).

Clonogenic growth assays

Cells were plated at low density in 6-cm culture dishes and allowed to attach, and were grown for ten days in growth medium supplemented with the indicated concentrations of the drugs. To visualize clones, cells were fixed with NaCl and stained with methylene blue. Formed clones were manually counted.

CellTiter-Glo assays

In a Costar black, clear-bottom 96-well plate, cells were seeded (WT and CFAP20-KO, 200 per well; BRCA1-KO, 400 per well) in medium containing increasing doses of olaparib or dimethyl sulfoxide (DMSO; 0.1% final DMSO concentration). Wells with no cells were included as a background luminescence control. After six days, the viability measurement was performed according to the manufacturer’s protocol. In brief, CellTiter-Glo substrate was dissolved in CellTiter-Glo buffer (Promega), and 100 µl of this was added to 100 µl fresh medium per well. The plate was briefly shaken and after equilibration, luminescence was recorded on a SpectraMax iD3 microplate reader (Molecular Devices). Luminescence values were corrected for background and for each cell line, normalized to wells treated with DMSO. Data were exported to GraphPad Prism 9.3.1 for further analysis.
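The background correction and DMSO normalization described above can be sketched as follows. This is an illustrative calculation, not the analysis performed in Prism; the function name and the luminescence readings are hypothetical, and averaging the replicate wells before normalization is an assumed choice.

```python
from statistics import mean


def relative_viability(treated_wells, blank_wells, dmso_wells):
    """Background-correct luminescence and express it relative to DMSO wells.

    blank_wells: readings from wells without cells (background control).
    dmso_wells: readings from vehicle-treated wells of the same cell line.
    """
    background = mean(blank_wells)
    dmso_signal = mean(reading - background for reading in dmso_wells)
    return [(reading - background) / dmso_signal for reading in treated_wells]


# Hypothetical raw luminescence values for one cell line.
blanks = [100.0, 100.0]
dmso = [1100.0, 1100.0]
olaparib = [600.0, 1100.0]
viability = relative_viability(olaparib, blanks, dmso)
```

Normalizing each cell line to its own DMSO wells removes differences in seeding density and growth rate between lines, which matters here because BRCA1-KO cells were seeded at twice the density of WT and CFAP20-KO cells.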

DNA fibre spreading assay

Treatments with different compounds are shown in each experiment. Cells were labelled with 25 µM 5-chloro-2’-deoxyuridine (CldU) (Merck; Supplementary Table 3) for 20 min and washed three times with PBS, followed by labelling with 250 µM IdU (Merck; Supplementary Table 3) for 20 min. Labelled cells were collected and resuspended in 1× cold PBS. Two microlitres of the cell suspension was spotted on a positively charged slide (VWR) and then mixed with 7 µl of lysis buffer (200 mM Tris-HCl pH 7.4, 50 mM EDTA and 0.5% (w/v) SDS). The cells were incubated in lysis buffer horizontally for 5 min and then tilted at about 45°, allowing the drop to run by gravity. The DNA spreads were air-dried at room temperature and were then fixed in methanol/acetic acid (3:1) at room temperature for 10 min and stored at 4 °C overnight. Slides were processed as previously described59. Fibres were visualized and imaged using a Zeiss Axio Imager-M2 wide-field fluorescence microscope equipped with 40× Plan-Apo (1.4 NA) oil-immersion objectives (Zeiss); an HXP 120 metal-halide lamp was used for excitation. Images were recorded with ZEN 2012 (blue edition, v.1.1.0.0) and analysed in Image J (v.1.53). Replication-fork speed (kb min−1) was calculated on the basis of the assumption that 1 µm of DNA fibre corresponds to 2.59 kb, as previously shown60.
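The fork-speed conversion stated above amounts to multiplying the measured track length by 2.59 kb per µm and dividing by the labelling time. A minimal sketch follows; the function name and the example track length are hypothetical, and the 20-min default corresponds to the pulse length given above (which of the two labelled tracks is measured is not specified here).

```python
KB_PER_MICROMETRE = 2.59  # conversion factor stated in the text


def fork_speed_kb_per_min(track_length_um, pulse_min=20.0):
    """Convert a labelled track length (µm) into fork speed (kb per min)."""
    if pulse_min <= 0:
        raise ValueError("pulse duration must be positive")
    return track_length_um * KB_PER_MICROMETRE / pulse_min


# Hypothetical 10-µm track labelled during a 20-min pulse.
speed = fork_speed_kb_per_min(10.0)
```

For this hypothetical 10-µm track, the calculation gives 10 × 2.59 / 20 = 1.295 kb min−1.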

DNA fibre assay with S1 nuclease

For the DNA fibre assay with the ssDNA nuclease (S1 nuclease), cells were labelled with 25 µM CldU for 15 min, washed three times with PBS and labelled again with 250 µM IdU for one hour. Cells were treated and processed as previously shown42,59.

scEdU–seq

The scEdU–seq procedure was performed according to a method described previously21. RPE1 WT and CFAP20-KO were labelled with 15-min pulses of EdU (10 μM). The cells were trypsinized, fixed in 70% ethanol and kept at −20 °C for 24 h. Then, the samples were resuspended and washed in 1 ml wash buffer (47.5 ml RNAse-free H2O, 1 ml 1 M HEPES pH 7.5, 1.5 ml 5 M NaCl, 3.6 µl pure spermidine solution, with an additional 0.05% Tween, and 4 µl ml−1 0.5 M EDTA). Next, biotin-PEG3-azide was conjugated to the EdU molecules through a CuAAC click reaction, followed by staining with DAPI. Single S-phase RPE1 cells were then sorted into 384-well plates for scEdU–seq processing. After sorting, libraries were prepared as follows: proteinase K digestion, NlaIII genome digestion, DNA blunt ending, A-tailing and adapter ligation incorporating cell barcodes and unique molecular identifiers (UMIs). Single-cell libraries were pooled and bound to MyOneC1 streptavidin beads to capture DNA replication fragments. These fragments were released by heat denaturation and filled in using the Klenow enzyme. The libraries underwent amplification through in vitro transcription, reverse transcription and PCR, followed by Illumina sequencing (NextSeq1000 P3 2×100 bp). The code for analysis and plotting can be accessed on GitHub21.

DRIP–qPCR

Approximately 1 × 10^7 cells per condition were lysed in 1.6 ml TE buffer supplemented with 82 μl of 10% SDS and 10 μl of 10 mg ml−1 proteinase K and incubated at 37 °C overnight. DNA was isolated by phenol:chloroform:isoamyl alcohol (25:24:1, v/v) extraction and isopropanol precipitation. DNA was reconstituted in 130 μl TE buffer, transferred to AFA microTUBEs with snap caps and sonicated for 4 min using a Covaris E220 sonicator (140 peak incident power, 10% duty factor and 200 bursts per cycle). Sonicated DNA was quantified on a NanoDrop 2000c spectrophotometer. For immunoprecipitation, 4 μg of DNA was resuspended in 150 μl 1× binding buffer (10 mM Na3PO4 pH 7, 140 mM NaCl and 0.05% Triton X-100), 10% removed as input DNA and the remaining sample bound to 6 μg of S9.6 antibody in 1× binding buffer overnight at 4 °C. Protein A/G agarose beads were added for two hours. Bound beads were washed three times in 1× binding buffer for 10 min at 4 °C. Elution was performed in elution buffer (50 mM Tris-HCl pH 8, 10 mM EDTA, 0.5% SDS and proteinase K) for 45 min at 55 °C with agitation. Eluted DNA was purified by phenol:chloroform:isoamyl alcohol (25:24:1, v/v) extraction and ethanol precipitation. Enrichment analysis of RNA–DNA hybrids in input and immunoprecipitation samples was performed by qPCR using the primers listed in Supplementary Table 6.
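The text does not specify how enrichment was computed from the qPCR data. A common choice is the percent-input method, sketched below under two assumptions: that this method applies here, and that amplification efficiency is 100% (a doubling per cycle). The function name and the Ct values are hypothetical; the 10% input fraction is taken from the protocol above.

```python
from math import log2


def percent_input(ct_ip, ct_input, input_fraction=0.10):
    """Percent-input enrichment from qPCR Ct values (assumed method).

    The input Ct is first adjusted to represent 100% of the starting
    material, then compared with the immunoprecipitation Ct.
    """
    adjusted_input_ct = ct_input - log2(1 / input_fraction)
    return 100 * 2 ** (adjusted_input_ct - ct_ip)


# Hypothetical Ct values at one locus.
enrichment = percent_input(ct_ip=28.0, ct_input=25.0)
```

With these hypothetical values, the adjusted input Ct is about 21.68, and the immunoprecipitated material corresponds to 1.25% of the input.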

DRIP–seq

DRIP–seq was performed as previously described61 with minor modifications. Samples were sequenced using an Illumina NextSeq500 or HiSeq X, using paired-end sequencing with 42 bp or 151 bp from each end.

BrU–seq

Cells were grown to 80–90% confluency in three 15-cm plates per condition and incubated for 30 min with 2 mM BrU (Sigma, 850187). After incubation, cells were lysed in Trizol (Thermo Fisher Scientific, 15596018) and BrU-containing RNA was isolated as previously described62. cDNA libraries were made from the BrU-labelled RNA using the Illumina TruSeq library kit and sequenced (paired-end, 151 bp) on the Illumina NovaSeq platform at the University of Michigan Advanced Genomics Core. Single-end or paired-end sequencing data were used for downstream analyses.

ChIP–seq

Cells were grown to 80–90% confluency and cross-linked with 0.5 mg ml−1 disuccinimidyl glutarate (Thermo Fisher Scientific) in PBS for 45 min at room temperature. Cells were washed once with PBS, followed by incubation with 1% formaldehyde for 20 min at room temperature. Fixation was stopped by adding glycine in PBS to a final concentration of 0.1 M for 3 min at room temperature. This was followed by washing with cold PBS and collection of the cells in 0.25% Triton X-100, 10 mM EDTA (pH 8.0), 0.5 mM EGTA (pH 8.0) and 20 mM HEPES (pH 7.6) in milli-Q. Chromatin was pelleted by centrifugation for 5 min at 400g and incubated in 150 mM NaCl, 1 mM EDTA (pH 8.0), 0.5 mM EGTA (pH 8.0) and 50 mM HEPES (pH 7.6) in milli-Q for 10 min at 4 °C. Chromatin was again pelleted by centrifugation and resuspended in ChIP buffer (0.15% SDS, 1% Triton X-100, 150 mM NaCl, 1 mM EDTA (pH 8.0), 0.5 mM EGTA (pH 8.0) and 20 mM HEPES (pH 7.6) in milli-Q) to a final concentration of 15 × 10^6 cells per ml. Chromatin was sonicated to approximately one nucleosome using the Bioruptor Pico (Diagenode), with 8–15 cycles of 30 s on and 30 s off in a 4 °C water bath. RNAPII ChIP was performed using 28 µg of chromatin (22 µg for CCNC-KO and CFAP20/CCNC-dKO) + 40 ng of Drosophila spike-in chromatin (32.6 ng for CCNC-KO and CFAP20/CCNC-dKO; Active Motif, 53083) with 3 µl of RNAPII antibody and 1 µg spike-in antibody (Supplementary Table 6) by overnight incubation at 4 °C. TY ChIP was performed using 84 µg of WT chromatin and 60 µg of CFAP20-KO + TY-CFAP20 chromatin + 74 ng and 53 ng of Drosophila spike-in chromatin, respectively (Active Motif, 53083) with 5.7 µg of TY antibody (Diagenode, C15200054) and 1 μg spike-in antibody (Active Motif, 61686) (Supplementary Table 6) by overnight incubation at 4 °C. Protein–chromatin pull-down followed, with a 1:1 mix of protein A and protein G Dynabeads for RNAPII ChIPs, and protein A Dynabeads for TY ChIPs (Thermo Fisher Scientific, 10001D and 10003D). 
ChIP samples were washed extensively and purified using the QIAGEN MinElute kit. Sample libraries were prepared using the HiFi KAPA sample preparation kit and A–T-mediated ligation of NEXTflex adapters or xGen UDI-UMI adapters. Samples were sequenced using an Illumina NextSeq500 or HiSeq X, using paired-end sequencing with 42 bp or 151 bp from each end.

TTchem–seq

TTchem–seq was performed as described previously63. For TTchem–seq experiments in WT or CFAP20-KO cells, this included depletion of rRNAs using the QIAseq FastSelect rRNA depletion kit (QIAGEN), followed by library preparation using the TruSeq Stranded Total RNA kit (Illumina, 20020596). For CFAP20-KO cells expressing WT GFP–CFAP20 or GFP–CFAP20(R100C), no ribosomal RNA depletion was performed. The libraries were amplified according to the manufacturer’s instructions, pooled and paired-end sequenced on a DNBSEQ-G400 (BGI) system.

Definition of replication origins

OK–seq data in untreated RPE1 cells were downloaded from a previous report5 (datasets GSM3130725 and GSM3130726). Sequences were trimmed using TrimGalore (v.0.6.5) and aligned to hg38 using STAR (v.2.7.7a) with the genome file GCA_000001405.15_GRCh38. Duplicate reads were removed using SAMtools (v.1.11) with fixmate -m and markdup -r settings. Replication initiation zones were subsequently defined using the replication fork directionality analysis R toolkit (OKseqHMM v.2.0.0; available at https://github.com/CL-CHEN-Lab/OK-Seq; ref. 64), with read coverage threshold 6 for GSM3130725 and 1 for GSM3130726, and smoothing window size 15 kb. Initiation zones present in both datasets were identified using mergePeaks of HOMER tools (v.4.8.2)65, with -d given. Origins were defined as the centre of initiation zones. For all origins, their nearest TSS was defined using annotatePeaks of HOMER tools, together with the distance between the TSS and the origin. Here, a negative distance represents an origin upstream of the TSS (CD transcription relative to replication), whereas a positive distance represents an origin downstream of a TSS (HO transcription relative to replication). To allow for clean transcription versus replication analyses, we further selected only origins for which the nearest TSS was not preceded by another gene within 5 kb upstream of the TSS (Extended Data Fig. 2a). This resulted in a list of 2,040 origins.
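The sign convention above can be illustrated with a minimal Python sketch. It is not the HOMER implementation; the `Tss` class and both function names are hypothetical, and the handling of a distance of exactly zero is an assumption, because the text does not define that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tss:
    """Hypothetical container for a transcription start site."""
    chrom: str
    pos: int
    strand: str  # "+" or "-"


def signed_distance(origin_pos: int, tss: Tss) -> int:
    """Origin position relative to the TSS, measured along the direction of transcription.

    Negative: the origin lies upstream of the TSS.
    Positive: the origin lies downstream of the TSS.
    """
    offset = origin_pos - tss.pos
    return offset if tss.strand == "+" else -offset


def orientation(distance: int) -> str:
    """Map a signed distance to CD or HO following the convention in the text."""
    if distance < 0:
        return "CD"  # origin upstream of the TSS: fork and RNAPII move the same way
    if distance > 0:
        return "HO"  # origin downstream of the TSS: fork and RNAPII converge
    return "undefined"  # assumption: the zero-distance case is not specified in the text


# A plus-strand gene with an origin 4 kb upstream of its TSS is co-directional.
d = signed_distance(1_000, Tss("chr1", 5_000, "+"))
print(d, orientation(d))
```

For a minus-strand gene the offset is flipped, so the same origin coordinate relative to the same TSS coordinate yields the opposite orientation.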

ChIP–seq, DRIP–seq, BrU–seq and TTchem–seq data analysis

For all sequencing data, a sequencing quality profile was generated using FastQC (v.0.11.9). Sequences were trimmed using TrimGalore (v.0.6.5). For ChIP–seq, reads were aligned to the human genome (hg38; GCA_000001405.15_GRCh38) and the Drosophila genome (BDGP6) using BWA-MEM (BWA, v.0.7.17)66. For DRIP–seq, reads were aligned to the human genome (hg38; GCA_000001405.15_GRCh38) using BWA-MEM (BWA, v.0.7.17)66. Only uniquely mapping or primary-mapping reads of high quality (>q30) were included in the analyses. For BrU–seq and TTchem–seq, reads were aligned to hg38 using STAR (v.2.7.7a)67 with the genome file GCA_000001405.15_GRCh38. For ChIP–seq, BrU–seq and TTchem–seq data, duplicate reads were removed using SAMtools (v.1.11) with fixmate -m and markdup -r settings. BAM files were converted into stranded TagDirectories (with fixed fragment length 150–200 when automated fragment length definitions varied extensively) and UCSC genome tracks using HOMER tools (v.4.8.2)65. Example genome tracks were generated in IGV (v.2.4.3). A list of 2,040 origin coordinates was defined using data derived from a previous study5, as described in ‘Definition of replication origins’. A list of 49,948 gene coordinates was obtained from the UCSC genome database by selecting the ‘knownCanonical’ table containing the canonical TSSs per gene68. To prevent contamination of binding profiles, genes were selected to be non-overlapping, with at least 2 kb between genes and a minimal size of 3 kb (n = 9,944). From this, a set of 3,000 actively transcribed genes was selected by calculating gene-size-corrected read densities of BrU–seq data in WT cells, using the annotatePeaks.pl tool of HOMER with default settings. These 3,000 actively transcribed genes were used in downstream analyses, unless stated otherwise.
For all DRIP–seq, ChIP–seq, BrU–seq and TTchem–seq experiments, read-density profiles around origin or TSS/TTS coordinates were defined using the annotatePeaks.pl tool of HOMER, using the default normalization to 10 million reads. For ChIP–seq experiments around transcribed genes, reads were normalized to the number of identified spike-in reads. Individual datasets were subsequently processed into heat maps or binding profiles using R (v.4.0.5) and RStudio (v.1.1.423)69. Where indicated, average read-density profiles were generated after trimming 10% of the data (trim-mean 0.1; removing the top 5% and bottom 5% of datapoints) to remove extreme values.
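The gene-selection step can be sketched in Python as follows. This is an illustration only and not the pipeline that was used: the function names and the tuple layout are hypothetical, the spacing check is assumed to be applied against every gene in the input list, and the density ranking is a simplified stand-in for the HOMER-based calculation.

```python
from collections import defaultdict


def select_isolated_genes(genes, min_gap=2_000, min_size=3_000):
    """Keep genes that are >= min_size long and >= min_gap away from every other gene.

    genes: iterable of (chrom, start, end, name) tuples with start < end.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, name in genes:
        by_chrom[chrom].append((start, end, name))

    kept = []
    for chrom_genes in by_chrom.values():
        chrom_genes.sort()  # sort by start coordinate
        max_prev_end = None  # furthest end among all genes that start earlier
        for i, (start, end, name) in enumerate(chrom_genes):
            next_start = chrom_genes[i + 1][0] if i + 1 < len(chrom_genes) else None
            far_from_prev = max_prev_end is None or start - max_prev_end >= min_gap
            far_from_next = next_start is None or next_start - end >= min_gap
            if far_from_prev and far_from_next and end - start >= min_size:
                kept.append(name)
            max_prev_end = end if max_prev_end is None else max(max_prev_end, end)
    return kept


def top_by_density(read_counts, gene_lengths, n=3_000):
    """Rank genes by reads per base pair (gene-size-corrected) and return the top n names."""
    density = {g: read_counts[g] / gene_lengths[g] for g in read_counts}
    return sorted(density, key=density.get, reverse=True)[:n]


example = [
    ("chr1", 0, 5_000, "A"),        # too close to B
    ("chr1", 6_000, 10_000, "B"),   # too close to A
    ("chr1", 20_000, 22_000, "C"),  # shorter than 3 kb
    ("chr1", 30_000, 40_000, "D"),  # kept
    ("chr2", 100, 4_000, "E"),      # kept
]
print(select_isolated_genes(example))
```

Tracking the furthest end of all earlier genes, instead of only the immediately preceding one, ensures that a gene nested inside a longer upstream gene is also rejected.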
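The trimmed averaging described above can be sketched as follows, assuming that "10% of the data" means discarding the lowest 5% and the highest 5% of values at each profile position before averaging. The function names are hypothetical, and the analysis itself was carried out in R, so this Python version is illustrative only.

```python
import math


def trimmed_mean(values, tail_fraction=0.05):
    """Mean after discarding tail_fraction of the values from each end of the sorted list."""
    ordered = sorted(values)
    k = math.floor(len(ordered) * tail_fraction)  # number of values dropped per tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def average_profile(matrix, tail_fraction=0.05):
    """Column-wise trimmed mean of a genes-by-positions read-density matrix."""
    return [trimmed_mean(column, tail_fraction) for column in zip(*matrix)]


# One extreme value among 20 data points no longer dominates the average.
values = list(range(1, 20)) + [1000]
print(sum(values) / len(values), trimmed_mean(values))
```

Note that library functions differ in how they define the trimming fraction: some take the fraction removed per tail and others the total fraction removed, so the parameter should be checked before reusing a value such as 0.1.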

Metaprofiles of TSSs in CD and HO orientations

We aligned TSSs with either a negative distance (CD) or a positive distance (HO) relative to the nearest origin. We subsequently generated average read-density profiles of RNAPII ChIP–seq, BrU–seq and DRIP–seq for all 1,395 CD and 408 HO genes at a maximum distance of 75 kb from the origin. We also sub-selected HO TSSs into those at 75–50 kb (n = 37), 50–25 kb (n = 80) or 25–0 kb (n = 291) upstream of the origin, and CD TSSs into those at 0–25 kb (n = 1,199), 25–50 kb (n = 143) or 50–75 kb (n = 53) downstream of the origin. These analyses provide a transcription-centred view of replication.
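The sub-selection into distance classes can be sketched as a simple binning step. This is a minimal illustration that assumes the signed TSS–origin distances described in ‘Definition of replication origins’ as input; the function name, the label format and the treatment of values that fall exactly on a bin boundary are assumptions.

```python
from collections import Counter


def distance_bin(distance_bp, width=25_000, max_distance=75_000):
    """Return (orientation, bin label) for a signed TSS-origin distance, or None if out of range."""
    magnitude = abs(distance_bp)
    if distance_bp == 0 or magnitude > max_distance:
        return None  # assumption: zero and >75 kb distances are excluded
    orientation = "CD" if distance_bp < 0 else "HO"
    # Assumption: a distance of exactly 75 kb is placed in the outermost bin.
    index = min(magnitude // width, max_distance // width - 1)
    low_kb = index * width // 1_000
    high_kb = (index + 1) * width // 1_000
    return orientation, f"{low_kb}-{high_kb} kb"


distances = [-10_000, -30_000, 60_000, 5_000, 80_000]
counts = Counter(b for b in map(distance_bin, distances) if b is not None)
print(counts)
```

Counting the returned labels over all TSSs gives the per-bin group sizes; as a check on the reported numbers, the three CD bins (1,199 + 143 + 53) sum to 1,395 and the three HO bins (37 + 80 + 291) sum to 408.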

Statistics and reproducibility

Experimental data were plotted and analysed statistically in GraphPad Prism 10.2.3 (GraphPad). In figures showing all data points, each coloured circle represents a single cell, and the black circles represent the median of each independent biological repeat (of which there were at least two), as indicated for each experiment. More information on the n of each experiment is provided in the source data. Statistical analyses were performed on the median of each independent biological repeat per experiment using one-way ANOVA with Dunnett’s or Šidák’s correction where appropriate, unpaired two-tailed t-test or Fisher’s exact test, as indicated in the figure legends. All experiments were independently repeated at least twice, with similar results obtained. All micrographs are representative images of experiments that were performed at least twice, with similar results. In the figures, the notation NS = P > 0.05, *P < 0.05, **P < 0.01, ***P < 0.001 and ****P < 0.0001 is used, and precise P values are provided in the figure legends and the source data.
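As an illustration of the two conventions above (tests run on per-repeat medians, and the star notation for P values), a minimal Python sketch follows. The function names are hypothetical, the statistical tests themselves were run in GraphPad Prism and are not reproduced here, and treating P = 0.05 exactly as NS is an assumption because the notation leaves that case undefined.

```python
from statistics import median


def repeat_medians(repeats):
    """Collapse single-cell measurements to one median per independent biological repeat."""
    return [median(cells) for cells in repeats]


def significance_label(p_value):
    """Translate a P value into the notation used in the figures."""
    if p_value < 0.0001:
        return "****"
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "NS"  # assumption: P = 0.05 exactly is reported as NS


# Two repeats with different numbers of cells contribute one value each to the test.
print(repeat_medians([[1, 2, 3], [4, 6]]))
print(significance_label(0.003))
```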

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.