Introduction

The global transcriptional regulator ScoC (formerly known as Hpr, Hyper Protease Producing) belongs to a group of transition phase regulators that mediate the transition of bacterial growth from exponential to stationary phase1,2,3,4,5. In Bacillus subtilis, ScoC controls, directly or indirectly, the transcription of more than 500 genes involved in the regulation of cellular functions such as sporulation, catabolite repression, motility, chemotaxis, competence, oxidative stress response, and degradative enzyme production6,7,8. The reported 9-bp ScoC-binding consensus motif, RATANTATY, was demonstrated by footprinting analysis to be located in the regulatory regions of two main proteases, subtilisin and neutral protease (aprE and nprE), and two oligopeptide permeases (opp and dtpT)1,9. Previous studies analyzing the promoter regions of ScoC-mediated genes indicated that ScoC binds to two binding sites, and that both sites are required for the full repressive effect10, suggesting that ScoC can form a “repression loop” similar to other regulatory proteins11.

ScoC homologs are present in most Gram-positive bacteria, and resemble proteins of the Multiple Antibiotic Resistance Regulator (MarR) family12. Most of the MarR regulators function as repressors, while some have an activation role13. Induction of MarR activity depends on various cellular signals, including small phenolic compounds, metal ions, small peptides, and in some cases, oxidation of cysteine residues is involved13. MarR homologs are winged helix-turn-helix (wHTH) DNA-binding proteins that generally have a triangular shape with pseudo two-fold symmetry, and all the currently solved proteins are dimers in solution12. The recognition helix of the wHTH domain binds the DNA major groove while the wing contacts the minor groove12.

Geobacillus proteiniphilus strain T-6 (formerly known as Geobacillus stearothermophilus strain T-6) is a Gram-positive soil bacterium that possesses extensive systems for the utilization of hemicellulose polysaccharides in the plant cell wall, including xylan, arabinan, and galactan14,15,16,17. The xylanolytic system in G. proteiniphilus was shown to be controlled by diverse and unique transcriptional mechanisms, allowing the cells to sense, efficiently utilize, and compete for the scarce carbon in the soil18. The global regulator ScoC is among several regulators that are directly involved in the regulation of the xylan-utilization genes. Specifically, the ABC oligopeptide transporter oppABCDF (formerly known as spo0K) from strain T-6 is regulated by ScoC and additional regulators. This Opp system mediates quorum-sensing activation of the extracellular xylanase gene; thus, ScoC indirectly regulates the xylanase gene18.

The goal of this study was to uncover the mechanism by which the Gram-positive global transcriptional regulator ScoC of the MarR family exerts its action. Here, we demonstrate that ScoC functions as a tetramer and binds to two distant operator sites, supporting the proposed regulatory mechanism involving DNA loop formation. The tetrameric crystal structure of ScoC complexed with a 23-bp palindromic dsDNA was solved, revealing that ScoC-DNA binding recognition is based on the minor-groove shape, determined by the A-tract signature of the ScoC-binding sequence. Size-exclusion chromatography coupled with small-angle X-ray scattering (SEC-SAXS) confirmed that ScoC retains its tetrameric structure in solution. Using transcriptional fusions, fluorescence resonance energy transfer, and atomic force microscopy analyses, we show that ScoC can bind to two operator sites and exert transcriptional repression by inducing a DNA loop. To the best of our knowledge, ScoC is the only reported protein in the MarR family that functions as a tetramer and represses expression via DNA looping.

Results

The oppA regulatory region contains two ScoC-binding sites, which are both required for repression

In the framework of studying the regulation of the hemicellulolytic system in Geobacillus, we discovered that the extracellular xylanase gene xynA is regulated by cell density (quorum sensing) involving a quorum-sensing peptide18. Quorum-sensing peptides usually enter the cell via oligopeptide transport systems (Opp) and inactivation of these systems affects the regulation of many cell-density controlled processes19, including biofilm formation20, virulence factor regulation21, competence development22, sporulation23, and induction of conjugation24. G. proteiniphilus strain T-6 possesses a single oligopeptide transport system (oppABCDF), which appears to be negatively regulated by ScoC. Thus, ScoC indirectly regulates the expression of the extracellular xylanase gene, xynA.

The transcriptional start point of the oppA operon was determined by capillary primer extension analysis and was assigned to a G nucleotide, 72 bases upstream from the initiation GTG codon. The apparent −35 and −10 sequences (TTTTCA and TATAAT, respectively) resemble the B. subtilis σA consensus. The oppA regulatory region contains two ScoC-binding sites, 208 bp apart, with no more than two mismatches to the proposed 9-bp consensus sequence (AATANTATT)6 (Fig. 1A). The two binding sites (23 and 22 bp) are each composed of two pseudo-palindromic sequences of 9 bases, which are separated by five bases (ScoC-binding site I) or four bases (ScoC-binding site II). In addition, each 9-base sequence is itself made of an internal pseudo-palindromic sequence (Fig. 1A). To test whether these two ScoC-binding sites are required for repression, lacZ fusions with truncated versions of the oppA regulatory region were prepared and integrated into the B. subtilis chromosome, and LacZ activity was determined. The expression of the oppA373-lacZ fusion, which has the two intact ScoC-binding sites, was 16-fold lower than in the other constructs, which contained either a single ScoC site (oppA157-lacZ), no sites at all (oppA101-lacZ), or a modified ScoC site in which specific replacements of conserved nucleotides in ScoC-binding site I were made (oppA373mut-lacZ) (Fig. 1B and Supplementary Fig. 1). These results indicate that both upstream and downstream ScoC-binding sites are required for effective oppA repression.
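The mismatch criterion used above can be illustrated with a short script. This is a minimal sketch, not the procedure used in this study: it assumes a simple position-by-position comparison against an IUPAC-coded consensus, and the function names `mismatches` and `scan` are hypothetical. The consensus strings and the 9-bp site AATATTATT are taken from the text; the flanking bases in the usage example are arbitrary.

```python
# IUPAC nucleotide codes mapped to the bases each code accepts
# (only the codes needed for the consensus motifs quoted in the text).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "N": "ACGT",  # any base
}


def mismatches(site: str, consensus: str) -> int:
    """Count positions at which `site` violates the IUPAC `consensus`."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have equal length")
    return sum(
        base not in IUPAC[code]
        for base, code in zip(site.upper(), consensus.upper())
    )


def scan(sequence: str, consensus: str, max_mismatches: int = 2):
    """Yield (0-based offset, window, mismatch count) for windows within the cutoff."""
    k = len(consensus)
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        n = mismatches(window, consensus)
        if n <= max_mismatches:
            yield i, window, n


if __name__ == "__main__":
    # Arbitrary GG flanks around the 9-bp site, for illustration only.
    for hit in scan("GGAATATTATTGG", "AATANTATT", max_mismatches=0):
        print(hit)
```

Such a scan only flags candidate half-sites; whether a candidate is a functional operator still has to be established experimentally, as was done here with the lacZ fusions.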

Fig. 1: The oppA regulatory region contains two ScoC-binding sites, which are both required for repression.

A The ScoC-binding sequences are organized in two regions 208-bp apart. Each site is composed of two pseudo-palindromic sequences of 9-bases, and each of the 9-bases is also made of an internal pseudo-palindromic sequence. B Expression of oppA-lacZ-fusions demonstrating that two ScoC binding sites are required for repression. The left panel shows the truncated forms of the oppA regulatory regions. The mutated upstream ScoC-binding site I (ScoC I) contains specific substitutions of conserved nucleotides, indicated by underlined letters in the sequence: ATGATTAGTAACAATCATATAGT. The conserved nucleotides are presented in Supplementary Fig. 1. The right panel shows the LacZ activity of the different oppA-lacZ-fusions constructs. lacZ-fusions were constructed in B. subtilis SMY following chromosomal integration at the amyE locus. Cells were grown until mid-log in minimal medium (SMM) containing glucose, ammonium, and a mixture of 17 amino acids excluding Glu, His, and Tyr. β-Galactosidase (LacZ) activity was assayed and expressed in Miller units per cell turbidity at OD 600 nm. Numerical source data are in Supplementary Data 1.

ScoC induces a DNA loop in the oppA promoter region

The fact that ScoC repression depends on two binding sites raises the possibility that ScoC functions as a tetramer and forms a DNA loop similar to that formed by, for example, the lac repressor25. Indeed, size exclusion chromatography confirmed that ScoC is a tetramer in solution, consistent with the tetrameric protein assembly observed in the crystal structure (see below). If ScoC can indeed induce DNA looping, it should be able to bring together the two ends of a DNA molecule, provided that each end contains a ScoC-binding site. This looping process can be verified by the Förster resonance energy transfer (FRET) technique, which follows the energy transfer from a donor molecule to an acceptor molecule. The efficiency of this energy transfer is extremely sensitive to the distance between donor and acceptor molecules26,27. We designed a FRET experiment in which the substrate for ScoC was a 248-bp dsDNA containing two ScoC-binding sites, one at each end, and labeled with a 5’-donor Cy3 and a 3’-acceptor Cy5. Indeed, in the presence of ScoC the fluorescence intensity increased significantly compared to the background fluorescence without ScoC. These results were the first indication that ScoC can bring together the two ends of a DNA molecule, thereby inducing DNA looping. When the fluorescence intensity is plotted as a function of ScoC concentration, the fluorescence first increases and then decreases once the ScoC concentration exceeds about 10 nM (Fig. 2A). This phenomenon can be explained if two independent ScoC tetramers occupy the two binding sites on the same DNA molecule, thus preventing DNA looping.
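The distance sensitivity that makes FRET a useful looping reporter follows from the standard Förster relation E = 1/(1 + (r/R0)^6). The sketch below evaluates this relation. It rests on two assumptions that are not taken from this study: a Förster radius of 5.4 nm for the Cy3/Cy5 pair (a typical literature value) and a rise of 0.34 nm per base pair, with the unlooped 248-bp substrate treated as a straight rod to give an upper bound on the end-to-end distance.

```python
RISE_NM_PER_BP = 0.34  # assumed canonical B-DNA rise per base pair
R0_CY3_CY5_NM = 5.4    # assumed Förster radius for Cy3/Cy5 (typical range ~5-6 nm)


def fret_efficiency(r_nm: float, r0_nm: float = R0_CY3_CY5_NM) -> float:
    """Förster efficiency E = 1 / (1 + (r/R0)^6) for donor-acceptor distance r."""
    if r_nm < 0 or r0_nm <= 0:
        raise ValueError("distance must be >= 0 and R0 must be > 0")
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


# Upper-bound end-to-end distance of the 248-bp substrate if it were a straight rod.
linear_distance_nm = 248 * RISE_NM_PER_BP

if __name__ == "__main__":
    for r in (linear_distance_nm, 10.0, R0_CY3_CY5_NM):
        print(f"r = {r:6.2f} nm -> E = {fret_efficiency(r):.3g}")
```

Under these assumptions, energy transfer across the extended fragment is negligible, so an appreciable signal requires the two labeled ends to be brought within a few nanometers of each other.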

Fig. 2: ScoC induces DNA looping.

A FRET analysis demonstrating DNA looping mediated by ScoC. The fluorescence intensity was measured at 30 °C in the presence of different concentrations of ScoC with 10 nM of a 5’-Cy3 3’-Cy5 labeled 248-bp DNA element. The element was amplified from the oppA operator region and contained the two ScoC-binding sites at the ends. On the right is a cartoon demonstrating the equilibrium between the different intermediates, where D stands for DNA, and P stands for the tetrameric ScoC structure. At ScoC concentrations above 10 nM the equilibrium is shifted towards the DPP form, resulting in reduced fluorescence intensity. B Electrophoretic mobility shift assays demonstrating different DNA forms obtained upon binding of ScoC. Left panel: ScoC was incubated with a fluorescently labeled 316-bp DNA fragment containing two ScoC-binding sites at its ends. Two distinct retarded bands can be visualized: a slow-migrating band representing DNA in the loop form (DP*), and a faster-migrating band representing the DNA in its linear form, in which two independent ScoC tetramers are bound, one at each end (DPP). At higher concentrations of ScoC only the fast-running band of the linear form appears. The DP form is not visible, presumably because the equilibrium (K2) lies strongly towards the loop form (DP*). Right panel: a 247-bp DNA containing a single ScoC-binding site gives only a single retarded band, representing a linear DNA form with one ScoC tetramer bound. Numerical source data are in Supplementary Data 2. C Visualizing ScoC-induced DNA looping by atomic force microscopy. AFM topography images of (A) a 316-bp DNA fragment, (B) ScoC, (C) a ScoC-DNA mixture showing different types of interactions, and (D) a zoom into a DNA loop induced by ScoC (yellow dashed square).

DNA looping induced by proteins can also be deduced from gel retardation assays28. When ScoC was incubated with a 316-bp DNA fragment containing two ScoC-binding sites and resolved on a native, low-percentage acrylamide gel, we obtained two different band shifts: a slow-migrating band representing DNA in the loop form, and a faster-migrating band representing the DNA in its linear form, in which two independent ScoC tetramers are bound, one at each end. At higher concentrations of ScoC only the fast-running band of the linear form appears (Fig. 2B and Supplementary Fig. 2A). When a 247-bp DNA fragment containing a single ScoC-binding site was used, only one retarded band could be detected, indicating that loop formation requires two intact binding sites (Fig. 2B and Supplementary Fig. 2B).
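The rise and fall of the looped species with increasing ScoC concentration can be reproduced qualitatively with a toy equilibrium model. The sketch below is illustrative only and is not fitted to the data: it assumes two identical, independent sites with a single dissociation constant `kd_nM`, a dimensionless looping factor `loop_factor`, and free protein approximately equal to total protein. Both parameter values used in the example are hypothetical. With x = [P]/Kd, the statistical weights are 1 (free DNA, D), 2x (singly bound, DP), loop_factor·x (looped, DP*), and x² (doubly bound, DPP).

```python
def looped_fraction(p_nM: float, kd_nM: float, loop_factor: float) -> float:
    """Fraction of DNA in the looped (DP*) state in a toy two-site model.

    States and weights, with x = [P]/Kd:
      D (free) = 1, DP (one tetramer at either site) = 2x,
      DP* (one tetramer bridging both sites) = loop_factor * x,
      DPP (one tetramer per site, no loop) = x**2.
    """
    if kd_nM <= 0 or loop_factor < 0 or p_nM < 0:
        raise ValueError("kd_nM must be > 0; p_nM and loop_factor must be >= 0")
    x = p_nM / kd_nM
    partition = 1.0 + 2.0 * x + loop_factor * x + x * x
    return loop_factor * x / partition


if __name__ == "__main__":
    # Hypothetical parameters chosen only to show the shape of the curve.
    for p in (0.1, 1, 10, 100, 1000):
        print(f"[P] = {p:7.1f} nM -> looped fraction = {looped_fraction(p, 10.0, 50.0):.3f}")
```

In this simplified model the looped fraction is maximal at [P] = Kd regardless of the looping factor and declines at higher protein concentrations as the doubly occupied DPP form takes over, which matches the qualitative behavior described for the FRET and gel retardation experiments.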

To directly visualize the formation of ScoC-induced DNA loops, we used AFM. Images of control samples of a 316-bp DNA fragment, as well as ScoC, were first obtained (Fig. 2C: A–B). The length of the DNA fragments measured from the AFM images (N (fragments) = 47) was 110 ± 10 nm and the diameter of the ScoC particles was 14 ± 4 nm (N (ScoC) = 1099), about the expected sizes. AFM measurements of ScoC with DNA showed the possible different conformations of ScoC-DNA binding, including binding to one site, two sites, DNA chains, and DNA loops (Fig. 2C: C). The DNA loops induced by ScoC (Fig. 2C: D) had a diameter of about 40 nm, roughly what is expected from a DNA fragment of about 108 nm in length (316 DNA bases).
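The expected dimensions quoted above can be checked with a back-of-the-envelope calculation. The sketch below assumes a canonical B-DNA rise of 0.34 nm per base pair and idealizes the loop as a perfect circle whose circumference equals the full contour length. Both are simplifying assumptions, and the function names are hypothetical.

```python
import math

RISE_NM_PER_BP = 0.34  # assumed canonical B-DNA rise per base pair


def contour_length_nm(n_bp: int, rise_nm: float = RISE_NM_PER_BP) -> float:
    """Contour length of a linear dsDNA fragment of n_bp base pairs."""
    return n_bp * rise_nm


def circle_diameter_nm(circumference_nm: float) -> float:
    """Diameter of an ideal circle with the given circumference."""
    return circumference_nm / math.pi


if __name__ == "__main__":
    length = contour_length_nm(316)
    print(f"contour length      : {length:.1f} nm")
    print(f"ideal loop diameter : {circle_diameter_nm(length):.1f} nm")
```

Under these assumptions the 316-bp fragment is about 107 nm long, in line with the measured 110 ± 10 nm, and an ideal circular loop would be about 34 nm across. The observed value of about 40 nm is of the same order; AFM tip broadening and non-circular loop shapes are possible contributors to the difference.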

The overall structure of ScoC

The structure of ScoC at 3.15 Å resolution was solved by molecular replacement using as a model the structure of ScoC from B. subtilis (https://doi.org/10.2210/pdb2fxa/pdb, Table 1). The asymmetric unit contains the monomer, consisting of six α-helices and five β-strands arranged as α1-α2-β1-α3-α4-β2-β3-α5-α6-β4-β5 (Fig. 3A), and the unit cell is made of an X-shaped tetrameric structure. The tetramer consists of two dimers that are perpendicular to each other. Similar to the MarR family, the dimerization region involves helices α1, α5, and α6 and is made of 20 hydrogen bonds and two salt bridges (Fig. 3A and Supplementary Table 1). The ScoC DNA-binding motif is a wHTH domain that has a β1-α3-α4-β2-β3 topology (Fig. 3B). The interactions forming the dimer-dimer interface include ten hydrogen bonds and four salt bridges involving residues: Tyr139, Lys141, Glu144, Cys150, and the backbone of Leu138 (Fig. 3C and Supplementary Table 2). Together with the interactions between the dimers, these form a network of hydrogen bonds that stabilize the tetramer. SEC-SAXS analysis provided the envelope of the protein in solution, and superposition with the solved crystallographic structure (calculated with CRYSOL) showed a good fit (Fig. 4). Except for the deviations, mainly at larger values of q (above values of 0.11), the simulated curve is within the error range of the experimental data, providing evidence for the reliability of the SAXS data and the resulting molecular envelope. The Guinier plot and the pair distance distribution are shown in Supplementary Fig. 3.

Fig. 3: The three-dimensional structure of the ScoC.

A Ribbon representation of the monomer, dimer, and tetramer structures in which the topology of the MarR family is labeled. B The winged helix-turn-helix (wHTH) binding motif of ScoC. Each monomer of ScoC contains one motif unit shown in red. C The ScoC tetramerization contact region. The interactions forming the dimer-dimer interface include ten hydrogen bonds and four salt bridges involving residues: Tyr139, Lys141, Glu144, Cys150, and the backbone of Leu138.

Fig. 4: ScoC is a tetramer in solution.

A Size-exclusion chromatogram of ScoC indicating that ScoC is a tetramer in solution, with an apparent Mw of about 100,000. Protein standards used were Thyroglobulin (Mw, 669,000), Ferritin (Mw, 440,000), Aldolase (Mw, 158,000), Conalbumin (Mw, 75,000), and Ovalbumin (Mw, 44,000), all from GE Healthcare. Numerical source data are in Supplementary Data 3. B Scattering curve for ScoC. Log[I(q)] is plotted as a function of the momentum-transfer vector q. The q range is q = 0.007–0.037 Å−1. Experimental data are shown in black. The red curve corresponds to the simulated scattering curve for the envelope constructed using DAMMIN with χ2 = 0.95. The blue curve corresponds to the simulated scattering curve from the crystallographic tetrameric structure of ScoC calculated by CRYSOL with χ2 = 1.07. The graphs were generated using Origin 2019b (OriginLab Corporation, Northampton, MA, USA). Numerical source data are in Supplementary Data 4-6.

Table 1 Data collection and refinement statistics (molecular replacement)

Structure of the ScoC–DNA complex

Several DNA constructs with ScoC-binding sites were used for co-crystallization attempts with ScoC; however, only a 23-bp dsDNA made of two identical, perfectly symmetric palindromes (ScoC-binding sites) provided crystals with reasonable diffraction. The structure of ScoC suggests that each ScoC dimer should bind a 23-bp DNA segment with two ScoC-binding sites. Indeed, each dimer binds a 23-bp palindromic dsDNA, and the dimers are perpendicular to each other, and consequently so are the DNAs (Fig. 5A). Unexpectedly, the crystal packing resulted in crystallographic artifacts in which (a) the unit contained an additional 23-bp dsDNA that seems to bridge between adjacent tetramers; and (b) the binding of one of the DNA segments was not symmetrical (it does not exhibit the same binding orientation in both binding sites) and, in fact, three of its bases protrude outside the dimer. Thus, only one of the two 23-bp segments is bound correctly to the two ScoC-binding sites in each tetramer. Accordingly, we refer only to the correct symmetric binding when analyzing the ScoC-DNA binding interactions.

Fig. 5: The crystal structure of the ScoC complexed with DNA.

A Each dimer binds a 23-bp palindrome dsDNA. The dimers are perpendicular to each other and consequently also the DNAs. B DNA binding to the ScoC winged HTH binding motif induces a 9.36 Å movement of the wing, allowing it to accommodate the minor groove. In blue is the free-winged HTH binding motif and in purple is the DNA-bound winged HTH binding motif. C Representation of the ScoC winged HTH binding motif occupying the minor and major grooves of the ScoC binding site. In the wing, occupying the minor groove, the conserved Arg100 is stabilized by Asp98 and interacts with the O2 atom of base T3. In the recognition helix, Asn79 interacts with the O4 atom of base T8 and Ser75 interacts with N6 of the complementary base, A16. The sequence of the ScoC binding site is shown at the bottom, and in red are the three bases that form hydrogen bonds with ScoC.

The wHTH motif of ScoC binds the conserved 9-bp binding site, 5’-AATATTATT-3’. The wing of the wHTH motif is stretched and enters the minor groove, and the recognition helix occupies the major groove. Upon binding, significant conformational changes occur in both the ScoC binding motif and the DNA. The ScoC wing moves out of its place and stretches about 9 Å towards the minor groove, entering it and interacting with the DNA (Fig. 5B). In the dimer, this movement changes the distance between the wings from 57 Å before binding to 70 Å following DNA binding, whereas the distance between the recognition helices remains the same at about 23 Å. The conformational changes of the DNA binding site upon binding were examined by comparing 12 structural and angular parameters using the 3DNA web server (http://web.x3dna.org/)29. For the ScoC-bound DNA, the average base pair twist is 36.38°, the rise 2.94 Å, the tilt 13.32°, and the roll −5.90° (compared to 35.9°, 3.40 Å, 0.00°, and −2.40°, respectively, for ideal B-DNA). The binding of the wing to the minor groove marginally reduces the width of the minor groove to 10.5 Å, compared to 10.7 Å for ideal B-DNA. The width of the major groove after binding is 19.7 Å, compared to 17.4 Å in ideal B-DNA29,30.

ScoC–DNA interactions

Most of the interactions between ScoC and the DNA are non-sequence-specific and occur mainly between the wHTH motif and the ribose rings and the phosphates of the DNA backbone (Supplementary Table 3 and Supplementary Fig. 4). The specific interactions, formed by hydrogen bonds and Van der Waals interactions, involve residues Arg100, Ser75, Asn79, and Thr76. For example, Asn79 of monomer A forms a hydrogen bond through its side-chain amide group ND2 to the O4 atom of T8 (d ~3.60 Å). Ser75 of monomer A forms a hydrogen bond through its hydroxyl group OG to the N6 atom of the A16 base (d ~3.20 Å). Arg100 of monomer A forms a hydrogen bond and a Van der Waals interaction with base T3, and one Van der Waals interaction with base T22. The hydrogen bond is formed between the NH1 group of Arg100 and the O2 atom of the T3 base (d ~3.50 Å) (Fig. 5C). Indeed, FRET analyses demonstrated that the replacement of Arg100 with Ala reduced the fluorescence by 90% compared to the wild-type ScoC, and the double replacement of Asn79 and Ser75 with Ala reduced fluorescence by about 70% compared to the wild-type ScoC (Supplementary Fig. 5). Thus, these mutations reduce the ability of ScoC to induce DNA looping.

Discussion

DNA looping has been implicated in many biological processes, including transcriptional regulation, site-specific recombination, and regulation of DNA replication2,3,4,5. The pioneering discoveries of the arabinose operon (ara) in E. coli unraveled the looping-based mechanism, in which, in the absence of arabinose, a single monomer of AraC binds to the operator (araO2), and another AraC monomer interacts with a distal site (araI1), to generate DNA looping, thus preventing gene expression31. These looping-based regulatory studies allow us to decipher how a repressive mechanism could operate utilizing a remote upstream site in many other systems. In this work, we demonstrate that ScoC binds two distal operator sites and induces loop formation, and both operators are essential for complete repression, similar to the Lac repressor32. In the AraC system, it was elegantly demonstrated that the spacing between the two operators alters the level of repression with periodicities that are consistent with the 10.5-bp helical periodicity33. In addition, stable loop structures were detected only when the two lac operators were separated by an integral number of B-DNA helical turns from 6 to 2134. The two ScoC-binding sites in the oppA operator are 208-bp apart, providing exactly 20 helix-turns assuming ~10.4 bases per turn. Thus, the two operators are separated by an integral number of B-DNA helical turns. The FRET analysis indicated that ScoC can bring together two DNA ends. At increasing concentrations of ScoC (above 10 nM), DNA looping is prevented, consistent with the results of the gel retardation assays in which the slow migrating bands (the loop form) disappeared at high ScoC concentrations. This phenomenon can occur, for example, when two independent ScoC tetramers are bound to the ends of the same DNA element. 
Prevention of DNA looping can also occur if two different DNA molecules interact in trans via a single ScoC tetramer to form a sandwich (cross-jointed) structure. In sandwich structures, in which the two binding sites are on different molecules, the entropic cost is high; therefore, such structures are rarely formed11. In the case of the Lac repressor system, the formation of such structures was enhanced by increasing the concentration of DNA molecules and the lac repressor or by shortening the space between the two operators to 63 bp34. In our case, we did not observe such structures in either the gel shift assays or the AFM experiments.
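The helical-phasing argument made above for the 208-bp operator spacing is simple arithmetic and can be written out explicitly. The sketch below is only a worked calculation; the function names are hypothetical, and the two helical repeats compared (10.4 and 10.5 bp per turn) are the values mentioned in the text.

```python
def helical_turns(spacing_bp: float, bp_per_turn: float) -> float:
    """Number of B-DNA helical turns spanned by a given spacing."""
    if bp_per_turn <= 0:
        raise ValueError("bp_per_turn must be > 0")
    return spacing_bp / bp_per_turn


def phase_offset_turns(spacing_bp: float, bp_per_turn: float) -> float:
    """Signed distance (in turns) from the nearest integral number of turns."""
    turns = helical_turns(spacing_bp, bp_per_turn)
    return turns - round(turns)


if __name__ == "__main__":
    for repeat in (10.4, 10.5):
        turns = helical_turns(208, repeat)
        offset = phase_offset_turns(208, repeat)
        print(f"{repeat} bp/turn: {turns:.2f} turns (offset {offset:+.2f} turn)")
```

With 10.4 bp per turn the spacing corresponds to 20 turns, whereas with 10.5 bp per turn it is about 19.8 turns, so the conclusion that the two sites lie on the same face of the helix depends modestly on the assumed helical repeat.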

Our results indicate that ScoC is a tetramer in solution in its native state and is capable of binding two remote DNA sites, inducing DNA looping. To the best of our knowledge, this is the first tetrameric structure within the MarR family members12,13. Hitherto, MarR structures have been reported to exist as dimers, with each subunit comprising a winged helix-turn-helix DNA-binding motif35. However, two proteins from the MarR family have been reported to exhibit the potential for a tetrameric oligomeric state, triggered either by the binding of a Cu²⁺ ion and subsequent oxidation of cysteine residues or by cooperative interactions between dimers36. Notably, in these cases, the proteins exist as dimers in their native and initial states, with tetramer formation occurring only under specific conditions. ScoC complexed with DNA has the overall shape of a triangular dimer and shows a binding pattern similar to that of other MarR proteins. However, the relative position of the subunits to each other is somewhat different. If we overlay subunit II of different MarR proteins, we can see that subunit I is in a different spatial location; thus, the range of movement of the recognition helix and the wing is about 8 Å and 21 Å, respectively (Supplementary Fig. 6A). Interestingly, ScoC is longer by about 60 amino acids than most members of the MarR family and has additional secondary structures at the C-terminus of β4-β5. In the dimerization interface, helix α1 is extended by 19 amino acids, and helices α5 and α6 are extended by ~45 amino acids (Supplementary Fig. 6B). These extensions provide additional interactions in the dimerization region, and presumably contribute to the stabilization of the dimer.

ScoC–DNA interaction in the minor groove is based on a highly conserved Arg100 residue, and replacing it with alanine reduced the ability of ScoC to induce DNA looping. Indeed, arginine-minor-groove interactions for DNA recognition are extensively used by proteins possessing a winged helix DNA-binding domain37. These proteins bind specifically via arginine to the DNA bases in the narrow minor groove38. The narrowing of the minor groove is known to be a result of consecutive ApA, TpT, or ApT bp steps. Such AT-rich sequences enlarge propeller twisting of the base pair, resulting in a local bend with enhanced negative electrostatic potential in the minor groove39,40, thus facilitating the binding of the positively charged arginine38. The correlation between the electrostatic potential and the minor groove width can be calculated41, and is shown for the ScoC-binding sequences in Supplementary Fig. 7A. The presence of A-tract sequences in the ScoC-binding sites tends to form a negative potential, resulting in narrowing of the minor groove and exposing a binding site for Arg100 (Supplementary Fig. 7B). Taken together, the mechanism of ScoC-DNA recognition specificity is likely to be determined by sequence-dependent shape, in which the wing (Arg100) interacts with the minor groove and increases affinity via an indirect readout mechanism.

In the MarR proteins, the interactions made by the residues of the α4 helix with the major groove can vary significantly and can exhibit various types of binding42,43,44. DNA-binding activity in many MarR proteins was shown to be stimulated by interaction with a variety of signals, including small phenolic molecules, metals, or copper-mediated oxidation of cysteine residues. Upon ligand binding, the wHTH motif rotates toward its dimerization interface, preventing DNA binding35. The MarR ligand-binding pocket is located between α1, α2, and α5 helices and has several aromatic residues which can stabilize ligand binding via π–π interactions. Although ScoC exhibits a potential ligand-binding pocket similar to other MarR family members, it is possible that this pocket represents an evolutionary relic, as no specific ligand relevant to ScoC has been identified thus far (Supplementary Fig. 8). Interestingly, another structure of this protein (PDB 9FF4), obtained with a 17-bp dsDNA at a resolution of 2.70 Å and is not reported in this paper, revealed a hexaethylene glycol molecule (from the crystallization solution) bound within a pocket in the protein monomer. This pocket is located between helices α1, α2, and α5, and its location and structure are similar to binding pockets identified in other MarR proteins12,35,45,46.

In B. subtilis, the expression of the scoC gene is coordinated by multiple regulatory mechanisms (Supplementary Fig. 9A). ScoC represses its own expression, is negatively regulated by the phosphorylation-dependent regulator SalA47 and the global regulator CodY48, and was reported to be activated by AbrB8. At present, no effector molecule that influences the binding of ScoC to DNA has been reported, and it appears that ScoC-mediated activity is governed by its expression and the interplay with other global transcriptional regulators10. As noted above, ScoC is directly repressed by CodY, and codY null mutants overexpress ScoC10,49,50. Another global regulator that prevents repression mediated by ScoC is TnrA, which regulates nitrogen metabolism10. In B. subtilis, under nitrogen limitation, when TnrA is active, ScoC-mediated repression of oppA is reduced four- to five-fold by an as yet unknown mechanism10. Analysis of the regulatory region of the B. subtilis oppA gene revealed that the TnrA- and ScoC-binding sites overlap51, suggesting that TnrA can prevent ScoC binding. This type of regulatory arrangement may extend to other ScoC-regulated operons as well, and may clarify how TnrA positively regulates expression from its far upstream TnrA-binding site. In G. proteiniphilus strain T-6, the upstream operator site of the oppA gene contains a TnrA-binding site which overlaps the upstream ScoC-binding site, suggesting that TnrA competes with ScoC for the ScoC operator site (Supplementary Fig. 9B). Thereby, under nitrogen limitation (active TnrA), DNA looping induced by ScoC is prevented, and the oligopeptide transporter operon (opp) is expressed, allowing transport and utilization of oligopeptides as alternative nitrogen sources. Unlike the Lac repressor, which binds to a perfect symmetric operator and two pseudo-operators (O2 and O3), ScoC mediates loop formation via two pseudo-operators.
Using perfectly symmetric lac operators, binding increased by 100-fold in vitro52. In the case of ScoC-mediated regulation, it may be advantageous for the cell to avoid such tight repression and allow more flexible and responsive regulation from a single regulatory region.

Materials and methods

Bacterial strains

Geobacillus strains T-6 and T-1 were obtained following enrichment procedures for strains capable of producing alkaline-tolerant, extracellular, thermostable xylanases53. Following the genome sequencing of these two strains, they were reclassified as G. proteiniphilus strain T-6 GenBank: CP133076.1; and Geobacillus kaustophilus strain T-1, GenBank: CP133077.1. Escherichia coli strains used were XL-1 Blue for general cloning and BL21(DE3) for expression via the T7 RNA polymerase expression system. The lacZ-fusions were all constructed in B. subtilis SMY strain50,54.

Capillary primer extension analysis

The oligonucleotide 5’-GTTGTTGTTGTCCTTGCCTC-3’ (5′ 6-FAM labeled) was designed in the 3′ to 5′ orientation so as to hybridize to the oppA gene 90 bp from the start codon. Total RNA was isolated from a mid-log culture grown on mBSM18 supplemented with 1% glucose, using the RNeasy kit (Qiagen GmbH, Hilden, Germany). Total RNA (50 μg) was incubated with the 5′ 6-FAM labeled oligonucleotide (10 μM) in 5× hybridization buffer (0.5 M KCl, 0.5 M Tris–HCl pH 8.3) for 1 min at 90 °C, 2 min at 6 °C, and 15 min on ice. cDNA synthesis was carried out using avian myeloblastosis virus reverse transcriptase (Promega) in 5× RT buffer (Promega) and 5 mM dNTPs for 2 h at 42 °C. The size of the extension product was determined using GeneScan 3130 (Applied Biosystems) and the raw trace was analyzed using PeakScanner v1.0 (Applied Biosystems).

Construction of lacZ transcriptional fusions and β-galactosidase activity assay

B. subtilis SMY was used as a host for measuring expression of oppA-lacZ fusions. The transcriptional fusions were constructed by cloning XbaI- and HindIII-restricted PCR fragments into the integrative plasmid pHK2355. B. subtilis strains containing the various lacZ fusions at the amyE locus were isolated after transformation and selection for erythromycin-resistant colonies. pHK23a (scoC373bp-lacZ) contains a 373-bp fragment with the entire oppA regulatory region of strain T-6, including two ScoC-binding sites. Plasmid pHK23b (scoC157bp-lacZ) contains a 157-bp fragment with only a single ScoC-binding site, and plasmid pHK23c (scoC101bp-lacZ) contains a 101-bp fragment lacking any ScoC-binding sites. Specific replacements of conserved nucleotides in the ScoC-binding site were introduced using a homemade Gibson master mix as described at https://www.protocols.io/view/homemade-gibson-mastermix-kqdg35xdqv25/v2. Primers used for amplifying the PCR products for cloning and for Gibson assembly are specified in Supplementary Table 4. β-Galactosidase (LacZ) activity was determined by harvesting 1 ml of mid-log B. subtilis culture and resuspending the pellet in 1 ml of Z buffer (0.1 M Na-PO4 containing 10 mM KCl, 10 mM MgSO4, and 1% (vol/vol) toluene). LacZ specific activity was assayed at 25 °C using ortho-nitrophenyl-β-galactoside as a substrate and expressed as absorbance units (OD420) per min per culture turbidity at 600 nm.
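The activity calculation described above (OD420 per min per OD600) can be written as a short function. This is a minimal sketch under stated assumptions: it implements only the normalization given in the text, without the additional scaling factors used in classic Miller units, and the function names and all numerical readings in the example are hypothetical.

```python
def lacz_specific_activity(od420: float, minutes: float, od600: float) -> float:
    """LacZ activity as OD420 per minute per unit of culture turbidity (OD600)."""
    if minutes <= 0 or od600 <= 0:
        raise ValueError("reaction time and OD600 must be > 0")
    return od420 / (minutes * od600)


def fold_repression(unrepressed: float, repressed: float) -> float:
    """Ratio of activities between an unrepressed and a repressed fusion."""
    if repressed <= 0:
        raise ValueError("repressed activity must be > 0")
    return unrepressed / repressed


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    no_site = lacz_specific_activity(od420=0.80, minutes=10, od600=0.5)
    two_sites = lacz_specific_activity(od420=0.05, minutes=10, od600=0.5)
    print(f"fold repression: {fold_repression(no_site, two_sites):.1f}")
```

Expressing repression as a ratio of normalized activities makes constructs comparable even when cultures are harvested at slightly different turbidities.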

Mutagenesis of His-tagged ScoC

The scoC open reading frame was cloned via PCR from Geobacillus strain T-6. The primers (Supplementary Table 4) were designed to allow in-frame cloning of the gene into the T7 polymerase expression vector pET28a (Novagen) and contained six histidine codons to provide a C-terminally His-tagged product. Mutagenesis was performed using the QuikChange site-directed mutagenesis kit (Stratagene, La Jolla, CA, USA). The mutated genes were sequenced to confirm that only the desired mutations were introduced. For protein production, the various constructs were expressed in 0.5 l E. coli BL21(DE3) cultures, grown overnight in Terrific Broth with kanamycin (25 μg/ml) in 2 l baffled shake flasks shaken at 230 rpm at 37 °C. Induction was carried out at about 3.5 OD600 by adding IPTG to 0.4 mM and letting the culture grow overnight to a final turbidity of 7–8 OD600. The cultures were harvested (8000 rpm for 15 min), and the cells were resuspended in 45 ml of buffer (20 mM imidazole, 20 mM phosphate buffer, 500 mM NaCl, pH 7.0), disrupted by five passages through a homogenizer (Avestin, Emulsiflex), and centrifuged (14,000 × g for 15 min) to obtain soluble extracts. The soluble extract was heat-treated at 52 °C for 30 min, followed by centrifugation (14,000 × g for 15 min). The soluble fraction was loaded on an ÄKTA-Avant FPLC system equipped with a 5-ml His-trap column (GE Healthcare), according to the manufacturer’s instructions. For gel shift analyses, purified ScoC was dialyzed overnight against 2 l of buffer containing 50 mM Tris-HCl (pH 7.0) and 100 mM KCl, followed by the addition of EDTA (1 mM) and glycerol (10%).

Gel filtration analysis

To determine the oligomeric state of ScoC in solution, the protein was applied on a Superdex 200 Increase 10/300 GL gel-filtration column (GE Healthcare Life Sciences) of 24 ml total volume using an ÄKTA-Avant system (GE Healthcare). Protein samples (0.2 ml) were applied onto the column and eluted at room temperature with a solution consisting of 50 mM Tris-HCl buffer pH 7.0, 100 mM NaCl, and 0.02% sodium azide, at a flow rate of 0.5 ml min−1.

Mobility-shift DNA-binding assays

Fluorescently labeled DNA fragments (316-bp and 247-bp) from the genomic DNA of G. proteiniphilus strain T-6 were generated via PCR using a Cy5-labeled primer (Supplementary Table 4). Binding was performed in a reaction mixture (30-µl total volume) containing 20 µl of a solution composed of 50 mM Tris-HCl (pH 7.5), 100 mM KCl, 10% glycerol, 1 mM EDTA, 2 µg of salmon sperm DNA, 0.66 mM dithiothreitol, 33 µg of bovine serum albumin, 0.27 pmol of labeled probe, and 0–80 nM of ScoC. The binding mixture was incubated for 30 min at 45 °C. One microlitre of 0.05% bromophenol blue was added to the reaction mixture, which was then separated on a 4% nondenaturing polyacrylamide gel (acrylamide/bis-acrylamide, 37.5:1) prepared in Tris-borate-EDTA buffer. The fluorescent migrating bands were detected directly on the gel using the Fusion FX system (Vilber).
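For orientation, the probe amount given above can be converted into a molar concentration and compared with the highest ScoC concentration tested. The sketch below is illustrative only: the helper name is hypothetical, it assumes the 0.27 pmol of probe is distributed over the full 30-µl reaction, and it takes the stated ScoC concentrations at face value without deciding whether they refer to monomer or dimer.

```python
def molar_concentration_nM(amount_pmol, volume_ul):
    """Convert an amount in pmol and a volume in µl to a concentration in nM.

    pmol per µl equals µmol per litre (µM); multiplying by 1000 gives nM.
    """
    if volume_ul <= 0:
        raise ValueError("volume must be positive")
    return amount_pmol / volume_ul * 1000.0


# Values taken from the binding reaction described in the text.
probe_nM = molar_concentration_nM(amount_pmol=0.27, volume_ul=30.0)
max_scoc_nM = 80.0

# Molar ratio of protein to probe at the top of the titration range.
max_ratio = max_scoc_nM / probe_nM
print(f"probe: {probe_nM:.1f} nM, ScoC:probe up to {max_ratio:.1f}:1")
```

Under these assumptions the probe is present at about 9 nM, so the titration spans from no protein to a roughly nine-fold molar excess of ScoC over DNA.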

FRET assay

Cy3- or Cy5-labeled primers were used to amplify a 248-bp genomic element containing ScoC-binding sites at both ends (Supplementary Table 4). The PCR product was purified using the Wizard® SV Gel and PCR Clean-Up System (Promega), diluted to 10 nM in 50 mM Tris-HCl pH 7.0 and 100 mM NaCl, and mixed with ScoC at 25 °C. Following 2 h of incubation, the fluorescence intensity (excitation at 540 nm, emission at 680 nm) was determined using a SYNERGY H1 (BioTek) microplate reader.

Atomic force microscopy

A 316-bp genomic DNA fragment containing two ScoC-binding sites was amplified via PCR (Supplementary Table 4), and the DNA product was cleaned. Purified ScoC (40 nM) was incubated with 9 nM of the 316-bp DNA fragment at room temperature for 16 h. Prior to the AFM measurements, 18 µl of the ScoC-DNA sample was mixed with 2 µl of 20 mM MgCl2 solution, and the 20-µl sample was then placed on a freshly cleaved mica surface and incubated in a humid atmosphere for 30 min. The sample was then washed twice with water for 5 min and dried under nitrogen flow. Control samples of ScoC alone (50 nM) and DNA alone (0.5 µM) were prepared similarly. A BioScope Resolve™ BioAFM (Bruker, Santa Barbara, CA, USA) was used for topography scanning of the samples. Height sensor topography images were measured in PeakForce QNM mode using ScanAsyst Air probes (Bruker, 70 kHz, 0.4 N/m). The cantilever spring constant was calibrated using the Thermal Tune method. All data acquired were processed using NanoScope Analysis software.

Crystallization of ScoC and ScoC-DNA complex

Purified ScoC (6.9 mg/ml), in 50 mM Tris-HCl pH 7.0, 100 mM NaCl, and 0.02% sodium azide, was screened using the hanging-drop vapor-diffusion method at 24 °C. Diamond-shaped crystals were obtained after one day in a solution containing 22% PEG 3350 and 8% Tacsimate pH 4 (Hampton Research). The crystals were soaked in a cryo-protecting solution containing 15% glycerol, and a full X-ray diffraction data set was collected at ESRF to 3.15 Å resolution; the crystals belong to the tetragonal space group I4122. Indexing and integration of the diffraction data were conducted using XDS, and scaling and merging were performed using Aimless through CCP4i2. Data collection details and statistics are summarized in Table 1.

To crystallize ScoC in a complex with DNA, purified ScoC (20 mg/ml) in 50 mM Tris-HCl pH 7.0, 100 mM NaCl, and 0.02% sodium azide was mixed with a 1.5-fold molar excess of 23-bp dsDNA (Supplementary Table 4). The dsDNA was produced by mixing 8 mg of each of the complementary synthetic oligos in 0.4 ml buffer (50 mM Tris-HCl pH 7.0, 100 mM NaCl). The solution was heated to 95 °C for 5 min, cooled to 30 °C for 10 min, then to 10 °C for 30 min, and finally cooled to 4 °C on ice. The ScoC-DNA complex was screened for crystallization conditions using the mosquito LCP crystallization robot (TTP Labtech). Needle crystals of the complex were obtained after 7 days at 24 °C in 40% 2-methyl-2,4-pentanediol and 0.08 M magnesium acetate, and were further improved by seeding. A full diffraction data set was collected to 3.5 Å resolution; the crystals belong to the monoclinic space group P1211. Processing of the diffraction data was conducted using XIA2 DIALS at ESRF, and data reduction using Aimless through CCP4i256. Data collection details and statistics are summarized in Table 1.

Structure determination of ScoC and ScoC-DNA

Data for the ScoC structure were collected at the ESRF ID30 beamline using a beam wavelength of 0.96 Å at a temperature of 100 K. The structure of ScoC from strain T-6 was determined by molecular replacement (MR) using B. subtilis ScoC (PDB code 2FXA) as the search model and Phaser in the PHENIX software suite46. One monomer of ScoC was located in the asymmetric unit. The coordinates and maps obtained were improved using COOT57, and the structure was refined in PHENIX46. The final structure was refined to Rfactor and Rfree values of 0.233 and 0.249, respectively (Table 1). Of the 201 aa of the protein, several could not be modeled in the structure, including the first five N-terminal residues and the last 19 residues at the C-terminus. MOLPROBITY within the PHENIX suite was used for validation. The Ramachandran plot indicated that 97.1% of residues were in the “most favored” regions and none were in the “disallowed” regions, and the overall average B-factor was 97.8 Å2.

The structure of the ScoC-DNA complex was determined by MR using Phaser in PHENIX, based on the solved structure of ScoC. Data were collected at the ESRF ID23-2 beamline using a beam wavelength of 0.87 Å at a temperature of 100 K. The initial electron density map showed unmodeled regions; using COOT, the 23-bp dsDNA was modeled into this density and further refined with PHENIX. The final structure gave Rfactor and Rfree values of 0.233 and 0.282, respectively (Table 1). Of the 201 aa of the protein, several residues could not be modeled in the structure, including the first five N-terminal residues and the last 16 residues at the C-terminus. The Ramachandran plot indicated that 96.7% of residues were in the “most favored” regions and 0.14% were in the “disallowed” regions. The overall average B-factor was 94.21 Å2 for the protein and 141.18 Å2 for the DNA. The structural and refinement parameters are summarized in Table 1. The figures of the structures were prepared in Chimera and ChimeraX58.

ScoC SEC-SAXS data

SEC-SAXS was performed for ScoC on the BM29 BioSAXS beamline at ESRF, France. ScoC was prepared at ~12.5 mg/ml in 50 mM Tris-HCl pH 7.0, 100 mM NaCl, and 0.02% sodium azide, which was also used for buffer scattering subtraction. The protein was passed through a Superose 6 column, and measurements were executed with a Pilatus 1M detector at 293 K, λ = 0.990 Å, and included 200 frames of 1 s exposure to a 700 × 700 μm2 X-ray beam on a 50 μl sample. The measured scattered intensity ranged from q = 0.08 to 2.27 Å−1. Initial data processing was performed with the beamline software BsxCuBE, and further analyses were done with the SAXS software ATSAS59. The final scattering curve used for structure analysis included data from q = 0.007–0.037 Å−1, and the Guinier plot for ScoC (qRg = 0.25–1.30) showed medium linearity (adjusted R2 = 0.9644). Using the Guinier approximation, in which a plot of ln I(q) versus q2 is linear for q < 1.3/Rg, the radius of gyration (Rg) of the protein was calculated to be 34.77 ± 0.37 Å, and I0 to be 3577.00 ± 28.54. The PRIMUS software60 was used for data truncation, and GNOM was used for calculation of the pair distribution function P(r). The SAXS data allowed determination of the molecular envelope of ScoC using 50 independent ab initio models obtained from DAMMIN61. The models were generated with no imposition of structure or symmetry and show a good fit to the experimental data, with χ2 values ranging between 0.9551 and 0.9555 for the 50 models. The improved envelope was obtained using DAMAVER62 for the 50 ab initio models. Using SUPCOMB63, we obtained the superposition of the SAXS-based models with the crystallographic atomic structure of ScoC, and the fit of the experimental SAXS curves to theoretical curves was calculated with the program CRYSOL64.
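The Guinier analysis referred to above rests on the approximation ln I(q) ≈ ln I0 − (Rg2/3)·q2, valid at low q. The following sketch shows how Rg and I0 follow from a straight-line fit of ln I(q) against q2. It is not the ATSAS procedure used in this work: the function name is hypothetical, the fit is an unweighted least-squares line, and the input curve is synthetic and noise-free, generated from the reported Rg and I0 purely to check that the fit recovers them.

```python
import math


def guinier_fit(q, intensity):
    """Fit ln I(q) = ln I0 - (Rg^2 / 3) * q^2 by unweighted least squares.

    Returns (Rg, I0). q is in 1/Å, so Rg is returned in Å.
    """
    x = [qi * qi for qi in q]
    y = [math.log(i) for i in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("non-negative slope: data are not in a Guinier regime")
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic curve built from the reported values (illustration, not measured data).
RG_REPORTED, I0_REPORTED = 34.77, 3577.0
q = [0.007 + 0.001 * k for k in range(31)]  # 0.007 to 0.037 1/Å
intensity = [I0_REPORTED * math.exp(-((RG_REPORTED * qi) ** 2) / 3.0) for qi in q]

rg, i0 = guinier_fit(q, intensity)
print(f"Rg = {rg:.2f} Å, I0 = {i0:.1f}")
print(f"qRg range: {q[0] * rg:.2f} to {q[-1] * rg:.2f}")
```

With Rg near 34.8 Å, the q window of 0.007–0.037 Å−1 corresponds to qRg of roughly 0.24–1.29, which agrees with the stated Guinier range and stays within the q < 1.3/Rg limit.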

Nucleotide sequence accession numbers and PDB codes

The nucleotide sequence of the scoC gene and the opp oligopeptide operon have been assigned GenBank accession numbers WMJ20557.1 and WMJ17723.1, respectively. The atomic coordinates of the ScoC and ScoC–DNA complex have been deposited in the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank under accession codes 9FA0 and 9FF5, respectively.

Statistics and reproducibility

The data in Fig. 1B, Fig. 2A, and Supplementary Fig. 5 represent three independent experiments.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.