Introduction

Commensurate with their vast functional diversity, natural proteins fold into a broad range of structures. Small proteins, which have principally been used to study folding mechanisms1, fold reversibly and quickly in vitro. In contrast, multi-domain proteins exhibit a high propensity to misfold in vitro, highlighting the importance of early folding steps that begin before synthesis is complete2. These co-translational folding steps are often modulated by interactions with cellular machinery, including the ribosome and molecular chaperones, to achieve robust and efficient folding in vivo3,4,5,6.

Before emerging outside the ribosome, nascent proteins can already form structures inside the polypeptide exit tunnel that spans the large ribosomal subunit. Contacts with ribosomal proteins and ribosomal RNA lining the tunnel walls can stabilize such structures7,8. These interactions can, in turn, affect ribosome activity, causing elongation arrest. Specialized sequences termed “arrest peptides” (APs) exploit this type of interaction to regulate ribosome activity in cis9,10. More extensive folding takes place once the nascent chain is extruded from the tunnel. Interactions with the ribosome surface and molecular chaperones then modulate folding rates11,12 and structural stability13,14,15. Thus, during their slow and vectorial synthesis, nascent proteins transition through continuously changing folding environments. Modulation of folding kinetics may delay folding until sufficient polypeptide has been extruded for productive folding11. Thermodynamic effects could destabilize misfolded species12 or promote folding16. How these interactions benefit co-translational folding is just beginning to be understood. Experimental studies are needed to define principles of co-translational folding.

Biochemical and biophysical in vitro approaches have been invaluable for defining co-translational folding pathways. However, they cannot fully capture the networks of interactions that modulate nascent chain folding in the cytosol. Ribosome profiling experiments, combined with immunoprecipitation (IP), have been used to assess nascent chain folding in vivo17,18, but may not capture dynamic chaperone-nascent chain interactions. NMR spectroscopy has been used to detect ribosome-nascent chain interactions in live cells19. However, the approach is technically very demanding, and it has not yet been possible to resolve nascent polypeptide folding on the ribosome in cells. New experimental approaches are needed to define co-translational folding pathways in the cellular environment. We have previously shown that mechanical force generated by nascent protein folding can be coupled to reporter expression through the use of a force-sensing AP20,21.

Here, we combine AP-based detection of folding with cell sorting and deep sequencing into an assay that we term “Arrest Peptide Profiling” (AP Profiling). We show that related GTPase domains fold co-translationally along distinct pathways that are likely determined by differences in their topologies. Genetic ablation of nascent chain-binding chaperones reveals co-localized but distinct effects on co-translational folding, explaining how unrelated chaperone systems achieve functional redundancy. The approach resolves folding and chaperone action in individual domains and in large, multi-domain proteins. AP Profiling thus establishes a scalable platform for defining co-translational folding events in live cells at high throughput and codon resolution.

Results

Ratiometric fluorescence measurements report on co-translational folding in live cells

The 17 amino acid AP from the SecM protein arrests translation elongation by forming specific interactions with the polypeptide exit tunnel inside the ribosome9,10. Mechanical force accelerates arrest release, and elongation resumes20 (Fig. 1a). Coupling arrest release to translation of a downstream reporter thus affords an opportunity for sensitive and quantitative detection of nascent chain folding in cells20,21,22. So far, studies employing AP-based nascent chain folding have been carried out with limited throughput, requiring the generation, isolation and individual expression of candidate constructs, one for each nascent chain length at which folding is assessed. To overcome this limitation, we developed “AP Profiling” to effectively scale up the AP reporter assay.

Fig. 1: Arrest peptide-mediated detection of co-translational folding in cells.
figure 1

a Co-translational folding near the ribosome surface generates force and resumes translation arrested by the SecM arrest peptide (AP). b The candidate protein is fused to the SecM AP and GFP. Candidate folding results in increased GFP synthesis. The coding sequence of mCherry under control of a separate promoter on the same plasmid serves as an internal control. c Cytometry results of the individual reporter constructs. A non-folding truncation of EF-G (ncodons = 212) shows low GFP signal, whereas a variant for which stable folding is expected (ncodons = 331) shows medium to high GFP signals. An arrest-defective AP variant (APmut) shows even higher GFP signal, delineating the upper boundary of the reporter readout. d Histogram of the GFP and mCherry signal ratios from the different variants. Imposing boundaries along constant log10(GFP/mCherry) ratios illustrates how the distribution of reporter signal can be reconstructed from cell sorting counts. Source data are provided as a Source Data file.

We designed AP Profiling constructs (Fig. 1b) that encode the candidate protein fused in frame to the SecM AP9 and monomeric superfolder GFP (msGFP)23, which folds and matures quickly and robustly. Expression of this open reading frame (ORF) is controlled by an arabinose promoter (ParaBAD)24. On the same plasmid, expression of a separate ORF encoding mCherry25 is driven by a separate ParaBAD promoter. Each individual AP Profiling plasmid thus encodes two fluorescent proteins: mCherry, which serves as an internal control for variation in overall expression levels, and msGFP, which reports on arrest release and, thus, on folding. This dual reporter system enables ratiometric fluorescence measurements in live cells for robust and sensitive measurements of nascent chain folding.

To validate the new reporter construct, we expressed individual constructs with the G-domain of EF-G (referred to here as GEF-G) as the candidate protein. GEF-G folding has previously been investigated in vitro12,26 and in vivo21. In AP-based measurements, maximal arrest release was observed at nascent chain lengths of around 330 amino acids (aa)21, in line with detection of the natively folded domain at a similar nascent chain length in single-molecule experiments12. Shorter chain lengths are not expected to fold21. We individually cultured E. coli cells transformed with plasmids encoding two distinct nascent chain lengths, 212 and 331 amino acids, corresponding to arrest positions on the mRNA of ncodons = 212 and ncodons = 331 (Fig. 1c).

After expression, GFP and mCherry fluorescence of individual cells were measured by flow cytometry. The fully extruded G-domain (ncodons =  331; Fig. 1c, red) exhibited GFP intensities ~100-fold higher than those observed for the partially synthesized GEF-G (ncodons = 212, Fig. 1c, orange), while both constructs exhibited similar mCherry intensities. This result is in line with expectations for these nascent chain lengths21, indicating that the reporter system developed here accurately reports on co-translational folding. To determine the upper limit of the dynamic range of the assay, we ablated arrest by replacing the wild-type SecM sequence with an arrest-defective mutant10 (APmut). Ablating arrest resulted in a approximately tenfold increase in GFP fluorescence compared to the ncodons = 331 construct (Fig. 1c, green). Therefore, arrest release by GEF-G is well within the dynamic range of the assay, which spans about three orders of magnitude in GFP fluorescence intensity from the low (ncodons  = 212) to the high (APmut) end.

Variations in experimental conditions, such as plasmid copy number or overall transcriptional activity, are likely to affect the overall accumulation of msGFP and mCherry. However, their ratio is expected to be relatively constant for a given construct. The logarithm of the measured fluorescence intensity ratio, log(IGFP/ImCherry), is thus an appropriate measure of arrest release27. For the individual constructs shown in Fig. 1c, raw counts of cells with a given log(IGFP/ImCherry) ratio approximated a normal distribution (Fig. 1d, gray lines). We then grouped the experimental data into bins of equal log(IGFP/ImCherry) intervals (diagonal gates 1–12 in Fig. 1c; bars in Fig. 1d). The cell counts across bins also approximated a normal distribution, with a mean very similar to that of the raw counts (colored lines in Fig. 1d). Thus, peak position can be retrieved from gated cell counts, and it should be possible to develop a multiplexed version of the AP assay by combining dual reporter expression with cell sorting and deep sequencing, as described in the following section.

AP Profiling quantifies co-translational folding with codon resolution

Based on the results described above, we designed an AP assay with increased throughput (Fig. 2a). We generated a library of GEF-G constructs by time-dependent exonuclease digestion28. We optimized this approach to yield a uniform distribution of truncations along candidate ORFs. Out-of-frame constructs, which are expected to constitute approximately two-thirds of clones within the library, are terminated at stop codons before or within the SecM coding sequence and thus do not yield GFP-positive clones. Our sorting scheme eliminates these constructs (see Methods for details). Expressing the library in E. coli cells yielded GFP intensities spread over three orders of magnitude, whereas most mCherry intensities were more narrowly distributed (Fig. 2a). We subjected the cultures to fluorescence-activated cell sorting (FACS) using 12 sorting gates defined along boundaries of constant log(IGFP/ImCherry) values (Fig. 2a and Methods). To determine the distribution of each construct from a mixture of clones in the sorting gates, we adopted a “sort-seq” scheme27 to identify library members by their 3’-terminal sequence. Employing a custom deep-sequencing workflow, we determined the relative abundance of individual library members in each sorting gate. From the distribution of each truncation variant across sorting gates, we calculated an “AP score” for each clone in the library (see Methods). While short truncations are not well resolved (Methods), the approach yielded good coverage in the range 85 ≤ ncodons ≤ 400 (Fig. S1).

Fig. 2: AP Profiling resolves folding at codon resolution.
figure 2

a A truncation library of the G-domain from EF-G (GEF-G) is inserted into the AP Profiling construct. After library expression, cells are sorted into gates by their log10(GFP/mCherry) ratio. Deep sequencing of individual gates yields the distribution of each clone in the library. b AP scores (top panel) for wild-type GEF-G (red) and a destabilized variant (F/A mutant, gray). AP scores are mostly similar, except for the region around ncodons = 330, where signals for the wild-type sequence are higher than for the mutant, indicating folding into a stable tertiary structure. The folding score (bottom panel), calculated as the difference between wild-type and F/A mutant scores, reveals the position of tertiary folding around ncodons = 330. c Single-molecule optical tweezers experiments with ribosome-nascent chain complexes show the unfolding signature of wild-type GEF-G, whereas the F/A mutant lacks a clear unfolding transition, indicating that it is not stably structured. Source data are provided as a Source Data file.

AP Profiling analysis of GEF-G revealed a strong folding event with a maximum in AP scores centered around ncodons = 330, a position where the complete domain has been extruded from the ribosome (Fig. 2b, red, region “ii”). In line with expectations21, short nascent chains (region “i”) and nascent chains beyond extrusion of the G-domain (region “iii”) do not exhibit high AP scores. However, we did observe substantially elevated AP scores in the region between ncodons = 230 and ncodons = 320 (Fig. 2b, red), which had not been detected in previous experiments21. This apparent discrepancy likely results from higher sensitivity in AP Profiling (note that AP scores are calculated on a logarithmic scale), compared to a similar assay using a luciferase reporter21. Indeed, the linearly scaled IGFP/ImCherry intensities resemble the folding profile reported previously using a luminescent reporter (Fig. S2).

Single-molecule force spectroscopy experiments with optical tweezers indicated that GEF-G does not form a stable tertiary structure before extrusion of the complete domain21. To determine the origin of elevated AP scores preceding full domain extrusion, we generated a version of GEF-G with perturbed stability. We changed all 8 phenylalanine residues in GEF-G to alanine residues (F/A mutant). Replacing the bulky hydrophobic side chain of phenylalanine with the much smaller one of alanine disrupts packing of the hydrophobic core and greatly destabilizes tertiary structure29. To confirm reduced stability of the GEF-G F/A variant, we carried out single-molecule optical tweezers experiments30 using ribosome-nascent chain complexes (RNCs) (Fig. 2c), in which either the wild-type or the mutant domain were extruded from the ribosome12. When we subjected RNCs with wild-type nascent chains to continuously increasing force (“force ramp”), we obtained force extension curves (FECs) that exhibit an unfolding transition (“rip”) characteristic for GEF-G12,26 (Fig. 2c, red traces). In contrast, the F/A mutant yielded smooth FECs (Fig. 2c, gray traces). The absence of the characteristic rip suggests that the mutant is not stably folded. These experiments confirm that the GEF-G F/A mutant indeed lacks a stable tertiary structure.

We then analyzed the GEF-G F/A mutant with AP Profiling (Fig. 2b, gray). The AP scores around ncodons = 328 (full GEF-G extrusion) were greatly reduced, relative to the WT domain (Fig. 2b, region “ii”). In contrast, the minor peaks in the region of 230 ≤ ncodons ≤ 320 remained largely unchanged (Fig. 2b, compare red and gray). Given that the F/A mutant is structurally unstable (Fig. 2c), these signals must originate from processes other than tertiary structure formation, perhaps including the formation of (transient) secondary structures. By subtracting the AP scores of the F/A mutant from those of the wild-type protein, we calculated a “folding score” (Fig. 2b, bottom panel, black line) that captures the formation of stable tertiary structures. Employing the destabilized mutant thus allows us to specifically detect co-translational formation of stable tertiary structures that represent major waypoints en route to fully folded proteins.

AP Profiling resolves a co-translational folding intermediate

EF-G belongs to a superfamily of translational GTPases (trGTPases) that share a conserved domain topology31. Elongation factor Tu (EF-Tu), a highly abundant and essential trGTPase32,33, shares structural homology with EF-G. The G-domain of EF-Tu (GEF-Tu) is much smaller than that of EF-G (200 vs 294 amino acids) because it lacks the G’ insertion present in GEF-G (Fig. S3).

To investigate how folding of GEF-Tu differs from that of GEF-G, we constructed AP Profiling libraries of wild-type (WT) GEF-Tu and a folding-deficient F/A mutant. GEF-Tu folding scores (Fig. 3a, black) exhibit several peaks along the coding sequence, in contrast to the single peak in GEF-G scores (Fig. 3a, gray; see also Fig. S4). In addition to a strong peak near the position of full domain extrusion from the ribosome (ncodons = 242), GEF-Tu exhibits a prominent signal at shorter chain lengths (centered at ncodons = 198). The AP Profiling analysis thus clearly indicates the formation of a stable tertiary structure before the domain is fully extruded from the ribosome exit tunnel, which sequesters ~40 amino acids. At position ncodons = 198, alpha-helix 5 (ending with Tyr160) has just exited the tunnel, whereas the high folding scores around ncodons = 242 map onto the end of the last alpha-helix in GEF-Tu (Fig. 3b). AP Profiling therefore indicates that GEF-Tu populates a co-translational folding intermediate, in contrast to GEF-G.

Fig. 3: GEF-Tu folds through a co-translational intermediate.
figure 3

a Folding scores obtained from AP Profiling experiments exhibit folding peaks for the partially (around ncodons = 200) and fully (around ncodons = 230) extruded G-domain of EF-Tu (GEF-Tu, black line), indicating the population of a co-translational folding intermediate. The folding score of GEF-G (scaled) is plotted for comparison (gray line). b Folding scores mapped onto the structure of GEF-Tu indicate that the intermediate forms occur when helix 5 is being extruded from the ribosome. c Single-molecule optical tweezers recordings of GEF-Tu nascent chain unfolding show reversible population of an on-pathway intermediate (white arrowhead) before complete unfolding (red arrowhead). d The structural region corresponding to the folding intermediate (red), based on length changes measured in optical tweezers experiments, comprises the N-terminus of GEF-Tu up to helix 5. The unstructured part of the domain in the intermediate is shown as a transparent outline. The corresponding fragment from GEF-G (pink), with identical topology and a very similar structure, is shown for comparison. e Position of the G’ insertion in GEF-G relative to the GEF-Tu folding intermediate. The G’ insertion is shown in pink, with flanking helices (corresponding to h5 and h6 in panel d) shown semi-transparent. f Comparison of GEF-G and GEF-Tu structures. GEF-Tu lacks the G’ subdomain present in GEF-G that engulfs the last alpha-helix in the domain (cyan). pdb codes: 1efc (EF-Tu) and 4v9p (EF-G). Source data are provided as a Source Data file.

From the AP Profiling results, it is not clear whether the observed folding intermediate represents a productive, on-pathway or a non-productive, potentially deleterious species. To distinguish between these two scenarios, we conducted single-molecule optical tweezers experiments with GEF-Tu RNCs. Force extension curves of 240 amino acid long nascent chains reveal that GEF-Tu partially unfolds in a near-equilibrium fashion before complete unfolding (Fig. 3c, white and red arrowheads, respectively). When combined, the two transitions add up to a contour length change of 71.8 ± 4.0 nm (SEM) (see Supplementary Methods and Fig. S5 for details), consistent with the expected value of 68.2 nm for unfolding natively structured GEF-Tu. These results indicate that unfolding of fully folded GEF-Tu proceeds through an obligatory intermediate.

The contour length of the intermediate state is 23.1 ± 1.2 nm (SEM) longer than the fully folded state (Fig. 3c and Fig. S5). This length change is consistent with the unfolding of ~60 amino acids from the C-terminus (Fig. S6). Our single-molecule analysis, therefore, suggests that the co-translational folding intermediate comprises the N-terminal ~140 amino acids (Fig. 3d). Importantly, GEF-Tu never transitions directly between the folded and the unfolded state, indicating that the intermediate is an obligatory species on the pathway between the unfolded state and native state.

The AP Profiling results suggest that the intermediate forms at an extruded length of around 160 amino acids, similar but not identical to the result from single-molecule experiments, which indicate folding at a length of around 140 amino acids (Fig. S6). This discrepancy may result from measurement uncertainties or differences between the environments for the in vitro single-molecule and the cellular AP Profiling experiments. Nevertheless, these experiments collectively suggest that GEF-Tu populates an intermediate that is on-pathway to the natively folded domain.

The position at which the intermediate forms in GEF-Tu corresponds to the site where the homologous GEF-G domain contains a large insertion termed G’ subdomain (Fig. 3e). A previous study found that GEF-G folds only upon full domain extrusion from the ribosome, a finding corroborated by the data presented here (Fig. 2). This observation was rationalized by the presence of the G’ insertion in GEF-G that engulfs the helix at the extreme C-terminus of the domain (Fig. 3f). GEF-G might thus have a evolved a folding pathway in which the C-terminal helix folds prior to the G’ insertion21. This interpretation is supported by the findings presented here. GEF-Tu lacks the G’-insertion and the associated constraints on folding order. Our experiments show that the two homologous domains fold co-translationally along distinct pathways.

Functional redundancy of chaperone function in co-translational folding

Molecular chaperones guide co-translational folding34. Even though their importance for efficient folding is firmly established, it is not well understood how they promote folding and which steps along the folding pathways they affect. Individual nascent chain-binding chaperones have been the subject of mechanistic studies3. However, the presence of multiple chaperones with generally low binding specificity and moderate affinities results in competition for client binding in the cytosol35. AP Profiling experiments, carried out in the presence of all cellular nascent chain-interacting proteins, may provide an opportunity to localize effects of individual nascent chain-binding chaperones, a first step toward understanding their contributions to nascent protein folding in vivo.

In E. coli and most other bacteria, trigger factor (TF) and DnaK, a member of the Hsp70 family, are the main chaperones that interact with growing nascent polypeptides, competing for nascent chain binding3,34 (Fig. 4a). The two chaperones share no homology and are regulated by distinct mechanisms. Individual deletion of TF or DnaK causes only relatively mild growth phenotypes36, indicating functional redundancy37, even though it is not clear whether they employ similar or distinct mechanisms to promote efficient de novo folding.

Fig. 4: AP Profiling resolves chaperone effects on co-translational folding.
figure 4

a Trigger factor and DnaK compete for interactions with nascent chains on the ribosome. b Ablation of trigger factor (\(\Delta\) tig, gold) resulted in localized changes of AP scores compared to a wild-type background (WT, black). c Same as panel (b) for a DnaK deletion strain (\(\Delta\) dnaKJ, blue). d Δchaperone scores, obtained as the difference in AP scores between wild-type and chaperone-knockout strains, exhibit the most prominent change in the region of intermediate folding for both \(\Delta\) tig and \(\Delta\) dnaKJ. e Δchaperone scores mapped onto the GEF-Tu structure illustrate localization of chaperone deletion effects at distinct sites within the same region (around amino acid 200). f Chaperone deletion has only minor effects on GEF-G scores, compared to GEF-Tu. Source data are provided as a Source Data file.

To detect specific chaperone effects during synthesis of GEF-G and GEF-Tu, we carried out AP Profiling experiments in TF and DnaK deletion strains36tig and ΔdnaKJ, respectively, both in an MC4100 background). For unknown reasons, libraries of E. coli GEF-Tu exhibited growth defects upon expression in this strain background. We therefore chose to analyze the ortholog from Bacillus subtilis. The G-domain of B. subtilis EF-Tu shares 72% sequence identity, a virtually identical structure (RMSD of 0.55 Å) and very similar AP scores with the E. coli ortholog (Fig. S7), including the peaks for the folding intermediate (around ncodons = 200) and the complete domain (around ncodons = 250), respectively.

The overall profiles of AP scores measured for GEF-Tu are similar in Δtig and wild-type cells (Fig. 4b, yellow and black lines, respectively). However, specific regions showed markedly decreased (around ncodons = 110) or increased (ncodons = 190 and 210) scores upon TF ablation. Likewise, AP scores in the ΔdnaKJ strain were similar to those in wild-type cells, except for a prominent increase in the region around ncodons = 210 (Fig. 4c, blue and black lines, respectively). A minor effect is observed for both chaperones at position ncodons ≈ 240, which corresponds to G-domain folding when the final alpha helix (h9) is extruded from the ribosome. Notably, neither TF nor DnaK ablation caused global changes in the measured AP scores, which is apparent when comparing the difference between wild-type and knockout backgrounds (Fig. 4d, Δchaperone).

The largest changes associated with chaperone ablation localize to the region where the co-translational folding intermediate is observed (see also Fig. 3), for both TF and DnaK. Interestingly, while both chaperones exert major effects in this region, the exact positions differ. Mapped onto the native GEF-Tu structure, the most prominent effect of TF is apparent when helix h7 and strand s6 emerge from the ribosome, whereas the strongest DnaK effect is localized to the short segment in between these two elements (Fig. 4e). The chaperones therefore appear to engage nascent GEF-Tu in distinct ways, yet in the same region. Very similar results were obtained for GEF-Tu from Thermus thermophilus (Fig. S7), indicating conservation of these features. Our observations suggest that the functional redundancy between TF and DnaK might be achieved by directing these two chaperones to distinct sites within the same region.

Ablating TF or DnaKJ did not strongly affect folding signals along the ORF of GEF-G (Fig. 4f), in contrast to the results with GEF-Tu. The largest (although still minor) deviation from wild-type cells is seen for the Δtig strain in the region 240 ≤ ncodons 275 of GEF-Tu. In this region, corresponding roughly to the extrusion of the G’ subdomain, we also observed high AP scores that were not due to tertiary structure formation (Fig. 2d). The Δchaperone values are positive in this region, indicating that TF destabilizes these structures or interferes with their formation. Nevertheless, the AP Profiling results show that chaperone ablation does not profoundly affect GEF-G folding.

AP Profiling for co-translational multi-domain protein folding

Most folding studies that used arrest peptides have employed reconstituted in vitro translation systems38,39,40,41 which not only lack the arsenal of nascent chain interactors present in the cytosol, but are also inefficient at synthesizing long polypeptides40,42, restricting the approach to small proteins or domains. AP Profiling does not suffer from these limitations. We therefore utilized the approach to analyze how chaperones affect the folding of long multi-domain proteins, which constitute large fractions of all proteomes. For full-length EF-G, a 704-amino acid protein with five domains (Fig. S8a), we obtained good coverage throughout the entire open reading frame (Fig. S8b, red), which allowed us to calculate AP scores (Fig. S8c, red). The highest AP score maps to the position of complete G-domain at around ncodons = 330, a number of smaller local peaks around positions 437 (extrusion of domain II), 650 (extrusion of domain IV) and 750 (extrusion of the full-length protein, enabled by a C-terminal extension in the AP Profiling constructs).

To determine folding scores, we analyzed the destabilized F/A variant of full-length EF-G (Fig. S8d, Methods). The highest folding score around ncodons = 330 corresponds to co-translational G-domain folding. The second-highest peak is located at ncodons = 437, close to full extrusion of domain II, consistent with previously observed co-translational folding of domain II12. Coverage for the F/A mutant dropped precipitously around ncodons = 630 (Fig. S8b) for unknown reasons. Long, destabilized EF-G fragments appear to interfere with cell growth, resulting in the elimination of these constructs from the library. The lack of F/A variant data does not allow us to draw firm conclusions about the co-translational folding of the C-terminal part of EF-G. Nevertheless, the absence of high AP scores in this region (Fig. S8c) is consistent with in vitro experiments suggesting that this part of the protein completes folding post-translationally43. Notably, one of the highest scores is observed when the full-length protein, tethered to the ribosome through a C-terminal linker, is extruded from the ribosome (Fig. S8c, around ncodons = 750).

To map chaperone effects on nascent EF-G, we next determined AP scores for wild-type, Δtig and ΔdnaKJ E. coli cells (Fig. S9). The chaperone deletion strains were created in an MC4100 background36. To ensure that AP Profiling results are consistent among different strains, we compared results from wild-type MC4100 and Top10 cells. AP scores from MC4100 wild-type cells closely resembled those obtained from Top10 cells (Fig. S9a), with good coverage across the first 400 codons of the ORF. However, coverage in the C-terminal half of the protein was relatively poor in the wild-type and the chaperone deletion MC4100 strains (Fig. S9b), suggesting strain-specific growth defects. Expression of relatively long fragments of EF-G, a central component in protein synthesis, might pose challenges to cellular growth, similar to what we observed for a destabilized EF-G variant (Fig. S8).

The trGTPase superfamily contains several multi-domain proteins with strong homology to EF-G, including the six-domain translation elongation factor 4 (EF4) (Fig. S10). In contrast to EF-G, EF4 does not appear to be generally required for translation, instead functioning to facilitate translation elongation under specific stress conditions44. To evaluate chaperone effects on co-translational folding of EF4, we expressed AP Profiling libraries of EF4 in wild-type and chaperone deletion strains (Fig. 5a and Fig. S11).

Fig. 5: Localized chaperone effects on folding of a multi-domain protein.
figure 5

a The six-domain EF4 protein exhibits multiple AP score peaks (gray line), indicating several locations of co-translational folding. The vertical gray bar indicates the position of full-length protein extrusion from the ribosome. Chaperone deletions (Δchaperone scores for \(\Delta\) tig, gold, and \(\Delta\) dnaKJ, blue) result in localized changes with both positive and negative signs. b Δchaperone scores mapped onto the native structure of full-length EF4. Trigger factor deletion elicits large effects on co-translational folding, most prominently for the G-domain and domain III. c Comparison of Δchaperone scores for trigger factor and DnaK. Trigger factor causes larger changes to co-translational folding than DnaK. d Trigger factor causes opposite effects on neighboring alpha-helix and beta-sheet. Source data are provided as a Source Data file.

Ablation of Trigger factor resulted in AP score changes mainly at positions corresponding to extrusion of the first and third domains (G-domain and domain III) (Fig. 5a, b). The C-terminal domains remained largely unaffected by the deletion of either chaperone. As observed for the G-domain of EF-Tu (Fig. 4d), changes resulting from chaperone deletion are localized to defined positions. In the Δtig strain, both increased and decreased AP scores are observed (Fig. 5a). Deletion of the DnaK system caused changes that were small in magnitude compared to TF deletion (Fig. 5a, c).

The largest change in EF4 folding upon trigger factor deletion is localized to domain III (Fig. 5a, d). Around positions 386 and 398, AP scores in the Δtig strain are markedly decreased, before increasing around position 409, which corresponds to full extrusion of domain III (Fig. 5d). One possible explanation is that TF stabilizes smaller secondary structures, breaking up folding cooperativity. A similar trend is observed for domain III of another trGTPase, SelB (Fig. S12). Delineating chaperone effects on the folding of related multi-domain proteins may thus reveal recurring patterns of chaperone function.

Discussion

Co-translational folding is a crucial yet still poorly defined step in the biogenesis of functionally active proteins3,6,45, particularly for large multi-domain proteins2. We have combined AP-based folding detection in living cells with ratiometric flow cytometry, cell sorting and deep sequencing (Fig. 1), building a sensitive approach that maps co-translational folding with codon resolution in vivo. The approach not only yields high throughput and resolution, but also captures folding in the context of all the cytosolic factors that compete for nascent chain interaction. Thus, we were able to map major co-translational folding events and reveal chaperone effects on nascent chain folding using this approach, which we termed “Arrest Peptide Profiling” (AP Profiling).

Compared to previously reported AP-based folding measurements, which mostly rely on gel electrophoresis to detect arrest release, AP Profiling provides increased sensitivity. The AP score at each position reflects the average GFP/mCherry intensity ratio measured for dozens to hundreds of individual cells, yielding high signal-to-noise ratios. AP Profiling measurements show that the GTPase domain of EF-G (GEF-G) does not fold into stable structures before the entire domain is extruded from the ribosome (Fig. 2), supporting earlier observations21. Even minor variations in AP scores along the GEF-G ORF are resolved reproducibly (Fig. S13). At the same time, major co-translational events, such as full-length GEF-G folding (around ncodons = 330), are still fully captured within the dynamic range of the assay (Fig. 1). These results illustrate the exquisite sensitivity and large dynamic range of the AP Profiling approach.

The high sensitivity of the AP Profiling assay revealed previously undetected co-translational folding signals during GEF-G synthesis (compare Fig. 2b and Fig. S2). Experiments with destabilized candidate protein variants show that these well-defined increases in AP scores do not reflect tertiary structure formation. Rather, the formation of secondary structure, which can take place in ref. 46 and even be stabilized by ref. 8, the ribosome exit tunnel, may cause elevated AP scores. Alternatively, amino acid motifs adjacent to the SecM AP may affect arrest strength. In fact, even nascent chain segments outside the ribosome exit tunnel have been found to modulate arrest strength47. It is not clear how either of these mechanisms would result in mechanical force on the SecM AP. As such, an arrest release profile may not represent a force profile.

Cells contain several quality control mechanisms that safeguard against potentially toxic effects of translation stalling and accumulation of structurally unstable proteins48. It is possible that interactions of nascent chains with quality control machinery modulate the kinetics of SecM arrest release. Regardless of arrest release mechanism, distinguishing tertiary structure formation from other sources of signal AP-based folding measurements is important when defining co-translational folding pathways, e.g., with destabilized candidate variants (Fig. 2). Tertiary structures can clearly serve as waypoints for co-translational folding, whereas secondary structures may not be (kinetically) stable enough to serve as scaffolds for larger scale folding during slow polypeptide elongation.

In contrast to the GTPase domain of EF-G, GEF-Tu exhibits a folding intermediate when ~160 of its 200 amino acids have been extruded (Fig. 3). Single-molecule optical tweezers experiments complementing AP Profiling measurements suggest that the intermediate corresponds to the core P-loop segment that is highly homologous in GEF-Tu and GEF-G. A common ancestor of the two domains may have folded through the co-translational intermediate suggested by our results. Insertion of the G’ subdomain (Fig. 3f) may then have forced a complete remodeling of the folding energy landscape, resulting in distinct folding pathways of the homologous domains that start at the C-terminus for GEF-G21 and at the N-terminus for GEF-Tu (this work). This interpretation would suggest that insertion of the G’ domain resulted in changes to the GEF-G sequence upstream of the insertion site that do not allow intermediate formation. Direct interactions between this part of the domain and the G’ insertion cannot be responsible for the changed folding pathway, because the G’ insertion is not yet synthesized at chain lengths where the intermediate would form.

The lack of co-translational folding intermediates suggests that GEF-G accumulates in an unstructured or loosely structured conformation during synthesis, which is likely stabilized by nascent chain-binding chaperones against non-native folding or interactions. Consistent with this conjecture, the EF-G amino acid sequence exhibits normal hydrophobicity (see Fig. S14), a major determinant of chaperone binding49. However, individually ablating trigger factor and DnaK, the two major nascent chain-binding chaperones in bacteria, caused only minor effects on the AP scores obtained with GEF-G (Fig. 4). Chaperone binding itself therefore does not appear to have pronounced effects on AP scores in our experimental setup.

In contrast to GEF-G, GEF-Tu exhibits pronounced localized changes in AP scores when trigger factor or DnaK are removed (Fig. 4). The largest changes, expressed as Δchaperone scores, are observed at a nascent chain length of ~200 amino acids, close to the position where the co-translational folding intermediate is detected. Notably, the effects of both chaperones localize to the same region, but distinct sites, indicating engagement with nascent GEF-Tu in distinct ways. Notably, individual deletions of trigger factor and DnaK do not result in global changes to the folding scores we obtained with AP Profiling. These observations provide a molecular underpinning of the notion that trigger factor and DnaK have overlapping binding sites and functions36, even though they are unrelated.

Most of the pronounced Δchaperone scores (Fig. 4) are positive, including those around ncodons = 200 (intermediate folding) and ncodons = 240 (full domain folding). A positive Δchaperone score suggests that chaperone binding destabilizes folded structures, which is consistent with an earlier study of trigger factor effects on DHFR folding50. However, trigger factor ablation also results in negative scores at some positions, including nascent chain lengths of 110 amino acids in GEF-Tu (Fig. 4) and 390 to 400 amino acids in EF4 (Fig. 5). Negative Δchaperone scores might suggest that trigger factor promotes structure formation, perhaps within its binding cavity51,52,53, or could be the result of particularly strong interactions of the chaperone with specific nascent chain sequences. The Δchaperone score pattern observed for domain III in EF4 (Fig. 5) and SelB (Fig. S12), showing a dip followed by an increase, is consistent with both of these scenarios.

The combined action of trigger factor and DnaK likely has profound effects on the timing of co-translational folding. For instance, while domain II in EF-G folds immediately upon ribosome extrusion in vitro12, its folding does not elicit high folding scores (Fig. S8), suggesting that the two co-translationally acting chaperones modulate its co-translational folding. Overall, DnaK ablation appears to mostly result in positive Δchaperone scores (Figs. 4, 5), consistent with the canonical role of DnaK in stabilizing unstructured conformations of its clients. Δchaperone scores also are, on average, smaller for DnaK ablation than for trigger factor ablation (Fig. 5 and Fig. S12). DnaK is believed to act downstream of trigger factor in chaperoning nascent chain clients16, which may result in fewer and weaker effects on co-translational folding near the ribosome. However, Δchaperone scores of trigger factor and DnaK are of similar magnitude at specific positions (Fig. 4), illustrating that both chaperones can elicit strong effects on nascent chain folding.

AP Profiling enables co-translational folding measurements of large multi-domain proteins that cannot be efficiently synthesized in reconstituted in vitro translation systems. Here, we investigated EF-G (Fig. S8), EF4 (Fig. 5), and SelB (Fig. S12). The folding signals from EF-G AP Profiling pointed to co-translational folding of the N-terminal G and II domains and post-translational folding of domain III, consistent with the folding pathway determined by single-molecule optical tweezers12,43. EF4 exhibited co-translational folding from all five domains at individual domain boundaries, with TF interaction detected at domain III (Fig. 5b). In contrast, SelB did not exhibit domain-wise folding at domains II and VI (Fig. S12). The observation that folding pathways diverge among homologous proteins highlights the intricacies of multi-domain protein folding and the need for more thorough and systematic examination. Likewise, interactions between domains in large globular proteins are ubiquitous and likely affect co-translational folding, even though few examples have been investigated in detail so far. AP Profiling experiments, combined with manipulations such as circular permutation or deletion of individual domains, will be a useful tool for guiding future investigations.

A limitation of the cellular AP Profiling approach is that expression of candidate truncations might result in growth defects and, consequently, lack of coverage (Figs. S8, 9). Reliable folding detection requires comparable cell viability and growth rates for all truncation fragments within the library. Expressing arrest peptides generates stalled ribosomes, which can interfere with growth54. In addition, fragments of some proteins can cause toxicity55. The N-terminal regions of candidate proteins are particularly impacted. They are expected to show low AP scores, resulting in their elimination from libraries alongside non-folding out-of-frame constructs (see Methods), and may themselves negatively affect cell growth56. Careful optimization of expression conditions might alleviate some of these challenges. For instance, expression of an EF-G truncation library of E. coli EF-Tu yields marked differences in coverage upon expression in MC4100 and Top10 cells, even though the overall AP score profile is preserved (Fig. S9). Likewise, orthologs from other species (here, we used B. subtilis and T. thermophilus EF-Tu) can circumvent expression problems (Fig. 3 and Fig. S7). Thus, the choice of expression strains and candidate orthologs for AP Profiling experiments can be tailored to achieve optimal coverage.

In summary, we describe here an effective methodology, termed AP Profiling, for mapping co-translational folding in live bacterial cells with high throughput. Our experiments revealed divergent co-translational folding pathways and can readily be extended to study other aspects of nascent chain biology, such as membrane insertion57,58. The unparalleled throughput and sensitivity make AP Profiling ideally suited for in vivo studies of protein folding and membrane insertion and to dissect contributions from chaperones and other cellular machinery. Collectively, our advances provide much-needed tools for mechanistic investigations of protein biogenesis.

Methods

All commercially available enzymes were purchased from NEB unless stated otherwise. PCR reactions were carried out with Phusion high-fidelity DNA polymerase (Thermo Scientific, F530S). Chemicals were purchased from Sigma-Aldrich unless stated otherwise. DNA Tini spin column (Enzymax LLC, EZC106N) was used for DNA purification protocols. GeneJET plasmid miniprep kit (Thermo Scientific, K0503) was used for all plasmid miniprep steps. GeneJET PCR purification kit (Thermo Scientific, K0701) was used for all nucleotide purification. The binding buffer from K0701 was tested and used for quenching enzymatic reactions in AP Profiling at twice the reaction volume. E. coli TOP10 cells was from a commercial source (Invitrogen, C404010). MC4100 strains were from a previous study36. Double-stranded DNA fragments (gBlock) and sequencing adapters are synthesized by IDT (Integrated DNA Technologies). See Supplementary Table S1 for candidate proteins used in this study.

AP Profiling vector design

The dual reporter design was inspired by a previous study59. However, we encountered an unregulated sigma factor binding site in the mCherry coding sequence and decided to separate the two fluorescent proteins in different open reading frames. The AP Profiling vector (Fig. S15) is built by assembling multiple components from different sources (see Supplementary Methods for detailed descriptions). The various pieces of DNA fragments were obtained by PCR or commercial synthesis. The resulting vector allows for facile insertion of candidate proteins and unidirectional truncation library construction.

Truncation library construction

The truncation library is built by time-dependent exonuclease digestion, adapted from protocols described previously28. The coding sequence of a candidate protein without the stop codon is cleared of conflicting endonuclease sites before being amplified with flanking digestion sites to be inserted between the BbsI sites in the AP Profiling vector. Fresh plasmid miniprep (more than 20 ml overnight culture for one protein) was performed to collect enough materials for enzymatic digestion. Purified plasmids were digested stepwise with SacI (NEB, R3156S) to prevent exoIII digestion on the arrest peptide side and then with AvrII (NEB, R0174S) to allow for exoIII digestion toward the candidate protein. Linearized dsDNA was purified and quantified by Qubit (1X dsDNA high sensitivity assay kit, Invitrogen, Q33231).

Incremental digestion by exoIII (Promega, M1811) was performed by diluting the linearized plasmids to a final concentration of 33 ng/µl before adding the exoIII amount to a final 0.1 U per nanogram of DNA (Qubit reading). We supplemented the reaction with NaCl at a final concentration of 30 mM. Based on our sequencing results, we estimated the digestion rate to be around 35 nt/min at 22 °C in the digestion workflow, similar to the previously estimated rate60. The reaction mixture was left in a heat block at 22 °C for 5 min before sampling every 30 s into a chilled quenching buffer (binding buffer from Thermo Scientific, K0701). Total digestion time depends on the length of the protein-coding sequence. The quenched DNA mixture was purified and quantified by Nanodrop (Thermo Scientific, ND-ONEC-W).

To blunt the digested DNA, mung bean nuclease (NEB, M0250S) was mixed with the purified DNA at a final concentration of 1.5 U per microgram of DNA (Nanodrop reading). In our workflow, 30 min reaction at 30 °C at this concentration is enough to blunt the DNA and digest the remaining RNA from miniprep without introducing non-specific cleavages in the dsDNA region. The DNA was purified and quantified by Nanodrop. Klenow fragment (NEB, M0210S) was added to the purified DNA for end-polishing at a final concentration of 1 U per microgram of DNA. Klenow reaction was performed at 25 °C for 15 min before purification and quantification by nanodrop.

The end-polished DNA was then diluted to around 2 ng/µl final concentration for blunt-end intramolecular ligation by T4 DNA ligase (NEB, M0202S). Ligation reaction was carried out at 16 °C for 16–20 h. The circularized DNA was then purified carefully by washing multiple times during a standard purification protocol and eluted to a final concentration around 40 ng/µl.

Transformation efficiency was tested by transforming 50–60 ng ligation product into TOP10 cells by electroporation. Importantly, all transformations were done with freshly cultured cells. One transformation reaction consists of 5 ml fresh culture harvested at OD600 around 0.4. Cells were cultured in LB supplemented with 1% glucose at appropriate temperatures (37 °C for TOP10, 30 °C for MC4100 strains) to recover for 2 h at 220 rpm. Cultures were spun down and spread on agar plates supplemented with 1% glucose (to repress PBAD expression) and carbenicillin plus strain-specific antibiotics36. Plates were imaged for colony counting facilitated by openCFU61. Library coverage was calculated based on the potential truncation fragments (in-frame and out-of-frame constructs) and colony counts. Large-scale transformation was done for the truncation library to reach at least 1000X coverage. Colonies were then scraped and resuspended in fresh LB before being deposited as glycerol stocks for long-term storage.

Bacteria fluorescence-activated cell sorting (FACS)

Glycerol stocks of the truncation library in TOP10 cells were revived for overnight cultures in LB supplemented with 1% glucose. Cells carrying the truncation library were freshly diluted into glucose-free medium and grown for at least 2 hours to OD600 = 0.2. The cultures were induced with arabinose at a final concentration of 0.05% (w/v) to express for 2 h.

The bacteria population was diluted in PBS and sorted with a MoFlo XDP equipment (Beckman Coulter) based on GFP and mCherry signals (488 nm laser, 514/30 band-pass filter for GFP detection; 561 nm laser 610/30 band-pass filter for mCherry detection). Based on our AP Profiling workflow and a similar bacteria sorting application62, the first sort of the population (enrichment sort) serves to enrich the high GFP population, which also serves to eliminate out-of-frame constructs which—due to the library construction workflow—are expected to constitute approximately two-thirds of clones. We arbitrarily set up four gates for high-mCherry events, which contain the expressing population, from low to high GFP signals. Sorting was done at around 14,000 events per second and typically proceeded for 10 min when the scarcest gate contains at least 20,000 sorted events (the most abundant gate normally has 400,000 events by this time). Sorted samples were grown in LB supplemented with 1% glucose (w/v) and carbenicillin plus strain-specific antibiotics overnight at 37 °C, 220 rpm. Plasmids and glycerol stocks were prepared from the overnight cultures.

For AP Profiling in TOP10 cells, glycerol stocks of TOP10 cells were revived for fresh culture before mixing in proportions to enrich high GFP species. For AP Profiling to study chaperone effects, purified plasmids were mixed in proportions to enrich high GFP species before being electroporated into MC4100 strains (PG1 for WT, PG2 for \(\Delta\)TF, and PG3 for \(\Delta\)DnaKJ) with a minimum 1000X coverage. The new mixed cultures were then sorted again (quantification sort) with a more defined gating strategy (Fig. 2), similar to previous studies63,64. Specifically, the log(GFP/mCherry) values linearly carve up the GFP-mCherry plot on the log scale during cytometry. The Summit (v5.5.0) software allows for manually adjusting gate boundaries according to a coordinate system. We then drew up 12 slanted (slope = 1) gates on the high-mCherry population, shoulder by shoulder, with a fixed gate width. Each sorting gate can now be represented by a single log(GFP/mCherry) value.The MoFlo XDP allows for sorting four gates simultaneously. We ran the sorter at around 14,000 events per second and recorded the exact numbers of detected events, sorted events and the total number of events in the high-mCherry cluster (for normalization across groups during analysis). Running the sorted samples back through the cytometry resulted in reproducible signal levels, verifying singlet bacteria sorting (Fig. S16). Each four-gate sorting lasts around 10 min to collect enough cells for sequencing detection. Sorted samples were grown in LB supplemented with 1% glucose (w/v) and carbenicillin plus strain-specific antibiotics overnight at appropriate temperatures (37 °C for TOP10, 30 °C for MC4100 strains), 220 rpm. Plasmids and glycerol stocks were prepared from the overnight cultures.

High-throughput sequencing

We developed a custom end targeting high-throughput sequencing scheme, inspired by Tail-seq65. Plasmids extracted from truncation library constructions, transformations and sorting were used for sequencing library preparations. Briefly, the BsaI recognition site near the arrest peptide coding sequence was digested, and the overhang on the truncation fragment site was used to ligate with a biotin-modified P5 adapter containing the UMI and desired index. The single-stranded portion of the P5 adapter was then filled up by Klenow fragment before purification and quantification by Qubit. We subcloned the Tn5 transposase (from Addgene plasmid #60240) and purified the enzyme for DNA tagmentation66. The ME-B fragment was transposed into the DNA samples before biotin enrichment by streptavidin-coated magnetic beads (M280 beads, Invitrogen, 11205D). The beads were vigorously washed with wash buffer according to the manufacturer’s recommendations. The magnetic beads, bound with the biotin-labeled DNA fragments, were directly used as a template for a limited-cycle PCR (8-10 cycles) with Phusion DNA polymerase to attach the desired P7 adapter. PCR products were purified and subjected to size selection by AMPure beads (Beckman Coulter, A63880) to remove short DNA fragments (0.8X volume).

The resulting DNA architecture (Fig. S17) is compatible with Illumina sequencing platforms. The samples from this study were sequenced with HiSeq2500 (50 cycles single end) as well as NovaSeq6000 SP (100 cycles, single end) with additional index cycles to acquire UMI information.

AP Profiling data analysis

The sequencing data was processed by custom scripts chaining standard analysis tools. Briefly, short-read sequencing data were demultiplexed and de-duplicated based on index and UMI information. Reads were mapped to candidate protein sequences to locate the C-termini of the truncations, identifying the nascent chain lengths before translation arrest. The distribution of sequencing reads was correlated with cell sorting information to infer the distribution of the truncated protein across the sorting gates. Fitting a Gaussian curve over the distribution of cell counts produces an “AP score” (Fig. 1) for the truncated protein. The “folding score” refers to the value after subtracting the AP score from the F/A mutant of the same protein. A spline curve is fitted for the scores along the open reading frame for visualization. Coverage was calculated as the fraction of codons with an AP score over a 10-codon sliding window. See Supplementary Methods for detailed processing steps and descriptions on folding score calculations. Because the nascent chain length is inferred from sequencing the template DNA, we report positions from AP Profiling experiments in terms of number of codons translated (ncodons), rather than nascent chain length in amino acids.

Flow cytometry of individual truncation constructs

Individual truncation constructs were generated by Gibson assembly. TOP10 E. coli cells were transformed with the truncation plasmids separately by heatshock. Individual colonies were used to inoculate fresh LB media supplemented with 1% glucose and carbenicillin for overnight cultures. Dilutions were made in fresh media without glucose and cultured for at least 2 h before induction by arabinose in the same way for FACS experiments. Cells were diluted in PBS before running through an Attune NxT Flow Cytometer (Invitrogen). FCS files were analyzed by custom Python scripts.

Single-molecule force spectroscopy of RNC

Ribosome nascent chain complexes (RNC) were generated and prepared as described in previous work30. Briefly, protein-coding sequences with an N-terminal SpyTag67 were amplified without a stop codon. Ribosomes with a SpyTag-labeled L17 subunit was used during the in vitro translation system (NEB, E3313S) to generate the stalled RNC. The reaction products were coupled to SpyCatcher conjugated double-stranded DNA that served as handles, which also carried affinity labels for binding to streptavidin or anti-digoxigenin coated polystyrene beads. The RNC/handle adducts were then tethered between two polystyrene beads with diameters of 2.1 µm (see ref. 30 for details). Optical tweezers force-ramp measurements were carried out with a dual-trap optical tweezers instrument (CTrap, Lumicks) at ambient temperature (22 °C). The trap stiffness was in the range of 0.4 to 0.45 pN/nm and was determined before each measurement. One of the traps was moved at a constant velocity of 100 nm/s for force-ramp measurements.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.