Introduction

Transcription is the first step in extracting the information stored in DNA for RNA production1,2. RNA polymerase (RNAP), the enzyme responsible for RNA production, is a multi-subunit protein that must be directed to the correct gene for RNA production to start1,2,3. The mediator complex4,5 acts as a bridge between RNAP and transcription factors (TFs), while the TFs localize RNAP to the correct gene on the DNA to initiate transcription and promote elongation2,3,6,7,8,9,10,11. The bundling of DNA by histone proteins for the compact storage of DNA in chromatin bundles affects the availability of a gene for TF binding, restricting transcription to active chromatin regions2,12,13,14. The transcription rates of different genes are controlled through transcriptional (or gene) regulatory networks (TRNs)15 by the recognition of specific promoter regions on DNA by the TFs, which are located near the targeted gene in space6,16,17. Independent gene regulation requires multiple differential transcription activation pathways within a TRN to enable selective gene expression to ensure that the correct genes are transcribed by the associated TF18,19,20,21. The TRN state of a cell is cell-type specific, so upon cell differentiation the selection of the specific TRNs required for cell function leads to the selection of TFs that are essential for those specific TRNs22. This means some TFs might never be produced by the same cell and therefore may never have an opportunity to interact with TFs in the mutually exclusive TF groups. In humans around 1600 different TFs have been identified, that can be divided into around 35 different family groupings23,24, indicating the immense diversity in human TFs and the ability to establish multiple independent TRNs. Identification of the underlying TF interactions requires the collection of a vast amount of proteomics data, generated with ChIP-seq, into databases such as ChIP-Atlas25.

In recent years it has been discovered that gene transcription is not only a biochemical process; spatial localisation also plays a key role. Super-enhancer regions spatially contribute to increased expression of specific genes, with down regulation of other genes26,27,28,29,30. The formation of such super enhancer regions and the associated transcriptional regulation has been increasingly linked to principles of polymer-based phase separation (PS)18,26,28,30,31,32,33,34,35,36,37,38. PS has been identified in the recent decade to drive the formation of membraneless organelles (MLOs) in several areas of cell biology39,40,41,42. The spatial localisation of TFs through MLO formation could assist mediator-RNAP binding to the correct gene during transcription initiation5,29,43,44. Indeed condensation of the RNAP CTD45, the mediator complex44 and chromatin13,14 have been identified, which could form a critical component in the further subcompartmentalisation and organisation within super enhancer regions2,29. Post-translational modification of RNAP, through the phosphorylation of the C terminal domain, is an essential part of the transcription cycle2,45. Such modification can have an effect on RNAP-TF interactions, therefore enabling the transfer of RNAP to different TF condensates45.

Understanding polymer phase separation within a theoretical framework started with Flory46 and Huggins47, where the relative strength of homotypic and heterotypic interactions of the polymer and solvent determines whether one homogeneous phase or two heterogeneous phases exist. Higher order mixtures of polymers and solvents have been shown to display rapidly increasing complexity in the resultant systems48,49,50,51, highlighting the complex balance between competing homotypic and heterotypic interactions. PS is driven by energetically favourable multivalent interactions that counter the entropy loss of polymer demixing, which in proteins arise from well structured or low complexity domains (LCDs)28,39,44,52,53,54,55,56. Changing the residue composition in LCD systems has been shown to enable control of the PS behaviour, including changing the topology, by tuning critical interactions55,57,58,59,60,61,62. The observed transition from a fully demixed state (single homogeneous condensate) to a partially demixed state (with a multiphasic core-shell topology) in multicomponent systems57,58,59 upon residue modifications shows that fine topological control of condensates is possible. The protein-protein interactions present in the resultant phase also enable the formation of phase-spanning percolation networks, resulting in the concept of phase separation coupled to percolation (PSCP) in MLOs42,56,63.
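For orientation, the balance between mixing entropy and interaction energy in the binary Flory-Huggins picture can be written in its standard textbook form. The symbols below (\(\phi\), \(N\), \(\chi\), \(z\), \(\epsilon\)) are introduced here for illustration only and are not used elsewhere in this work; the simulations reported here do not rely on this mean-field expression.

\[
\frac{\Delta F_{\mathsf{mix}}}{k_{\mathsf{B}}T} = \frac{\phi}{N}\ln\phi + (1-\phi)\ln(1-\phi) + \chi\,\phi(1-\phi),
\qquad
\chi = \frac{z}{k_{\mathsf{B}}T}\left[\epsilon_{\mathsf{ps}} - \tfrac{1}{2}\left(\epsilon_{\mathsf{pp}} + \epsilon_{\mathsf{ss}}\right)\right]
\]

Here \(\phi\) is the polymer volume fraction, \(N\) the number of lattice sites per chain, \(z\) the lattice coordination number, and \(\epsilon_{\mathsf{pp}}\), \(\epsilon_{\mathsf{ss}}\) and \(\epsilon_{\mathsf{ps}}\) the polymer-polymer, solvent-solvent and polymer-solvent contact energies. In this model demixing into two phases becomes possible once \(\chi\) exceeds the critical value \(\chi_{\mathsf{c}} = \tfrac{1}{2}\left(1 + 1/\sqrt{N}\right)^{2}\), that is, when homotypic contacts are sufficiently favoured over heterotypic ones.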

TFs are typically composed of two main types of domains: DNA binding domains (DBDs) and effector domains (EDs)24. TF DBDs are responsible for binding specific DNA sequences (promoter regions) near the target gene to localise the TF, whereas TF EDs control target gene expression through other mechanisms such as interactions with cofactors, enzymes and mediator, RNAP recruitment, histone modification and DNA methylation24. LCDs are common features of transcription factors5,23,64 particularly the EDs24, while the RNAP C terminal domain region also contains a LCD65,66. TF condensate formation, predominantly mediated by LCD interactions of the TF EDs, is a mechanism for transcription control that enables colocalisation of the required proteins at the correct gene5,24,29,43,67. The formation of TF condensates at SE regions, such as SP167, BRD426,43, and OCT417,35, indicates their importance in gene regulation28,52. The interaction of these LCDs can be important for RNAP binding, with the size of the RNAP C terminal domain modulating the uptake into condensates mediating the transcription rates66,68,69. Interactions of TF condensates with mediator condensates is also likely to play an important role in transcription initiation at SE regions26,44. Disruption and misregulation of TF condensates are oncogenic drivers, leading to promotion of aberrant gene transcription behaviour64,67,70,71.

Understanding the driving forces for TF condensate formation is therefore essential to understand how gene expression can be regulated, and how it can be disrupted in the case of disease. In this work we aim to identify the molecular interactions that drive selective condensate formation, by focusing on a range of transcription factors from three different transcription families (FET, SP/KLF and HNF) that have LCDs previously identified to contribute to PS32,34,72,73. It has been hypothesized that the selectivity of TFs is caused by differential phase separation propensities32,34,74,75. Here we test this hypothesis by analyzing ternary phase diagrams of six TF LCDs that undergo PS to form condensates on their own under biological conditions32,34,72.

We use coarse-grained molecular dynamics simulations at amino-acid resolution72,76,77,78,79,80 to determine the molecular interactions that drive TF selectivity. Results are presented in terms of ternary phase diagrams, residue-scale contact maps and radial density profiles. It is important to note that our ternary diagrams represent a slice of the quaternary phase space (the four species are water plus three proteins) undertaken at a constant water fraction48,49,50,51. These revealed two distinct driving forces for condensate formation, consisting of hydrophobic interactions (from aromatic and aliphatic residues) and electrostatic/cation-π interactions (from cationic residues interacting with anionic or aromatic residues). We identified four dominant amino acid types, i.e. aromatic, aliphatic, cationic and anionic residues, responsible for driving collective phase separation forming four different sticker motifs in the conceptual ’sticker-spacer’ model of polymers56. The sequence composition of the TFs enables control over the homotypic and heterotypic intermolecular interactions, and therefore condensate selectivity. The relative homotypic and heterotypic strengths result in four distinct droplet morphologies: marbled (homogeneous), coated, bimodal, or separated droplets (heterotypic interactions are significantly weaker than homotypic interactions). The ability of the LCD of RNAP (POL II) to interact with condensates of all TFs highlights its central position and points to a universal principle in which selective transcription is regulated by sequence-based homotypic and heterotypic phase separation.

Results

Selection of transcription factors

Droplet simulations were undertaken of six human TF IDRs from three different TF families: the FET family (FUS residues 2–214, EWS residues 47–266, TAF15 residues 2–205), the SP/KLF family (SP1 residues 2–507, and SP2 residues 1–524), and the HNF family (HNF1A residues 280–631) to explore the orthogonal driving forces for phase separation (see the amino acid sequences for the IDRs used in Fig. 1 and Supplementary Table 1). The FET TFs were selected due to a large body of experimental evidence of the IDR being crucial in the phase separation behaviour for FUS, EWS and TAF1532,34,73,81,82. Additional evidence also exists for SP132,34,67 and HNF1A72. SP2 was chosen as another member of the SP protein family. The IDR sequences selected correspond to those used in the experimental work of Chong et al.32,34 for FUS, EWS, SP1 and TAF15, the work of Kind et al.72 for HNF1A, and the sequence of SP2 by the use of PONDR-VSL283. We examine a large range of relative compositions of the ternary compositional space to fully describe the key motifs responsible for selective partitioning of the IDRs. To understand how the balance of homotypic and heterotypic interaction strengths varies for different members within the same family or with a different family, simulations of the seven key points on phase diagrams for different ternary combinations were conducted as shown in Table 1. Finally we conducted a study of TF-POLII (residues 1546–1790) condensate interactions. The name of the TF or POLII will be used to name the IDR fragment for brevity in the rest of the manuscript, unless explicit mention of the full length protein is made. Simulations were done at a monovalent ion concentration of 150 mM, with the maximum number of molecules set to 120 for FUS, EWS, and TAF15, 60 for SP1, SP2, and POLII, and 90 for HNF1A, so that the maximum number of amino acids per molecule type is approximately the same. PS is observed across all compositions of the TFs and POLII studied.
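The relation between IDR length and maximum molecule number can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the residue ranges and molecule counts quoted above, assumes that the ranges are inclusive, and the function names `idr_length` and `max_residues` are hypothetical helpers rather than part of the simulation workflow.

```python
# Residue ranges (assumed inclusive) and maximum molecule counts quoted in the text.
IDR_RANGES = {
    "FUS": (2, 214),
    "EWS": (47, 266),
    "TAF15": (2, 205),
    "SP1": (2, 507),
    "SP2": (1, 524),
    "HNF1A": (280, 631),
    "POLII": (1546, 1790),
}
MAX_MOLECULES = {
    "FUS": 120, "EWS": 120, "TAF15": 120,
    "SP1": 60, "SP2": 60, "POLII": 60,
    "HNF1A": 90,
}


def idr_length(name):
    """Number of residues in the IDR fragment (inclusive range)."""
    first, last = IDR_RANGES[name]
    return last - first + 1


def max_residues(name):
    """Total residues of one species at its maximum molecule count."""
    return idr_length(name) * MAX_MOLECULES[name]


if __name__ == "__main__":
    for name in IDR_RANGES:
        print(f"{name:6s} length={idr_length(name):4d} "
              f"max residues={max_residues(name)}")
```

Under this reading the six TF fragments give between roughly 24,000 and 32,000 residues at their maximum molecule counts, while the shorter POLII fragment gives a lower total.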

Fig. 1: Coarse-grained representation and residue composition of the human TF IDRs studied and POL II C-terminal IDR.

a FUS (residues 2–214), b TAF15 (residues 2–205), c SP1 (residues 2–507), d SP2 (residues 1–524), e EWS (residues 47–266), f HNF1A (residues 280–631), g POL II (residues 1546–1790). Residues are categorised into 5 groups: cations (R, K) in red, anions (D, E) in blue, aromatic (F, Y, W) in green, aliphatic (A, C, I, L, M, P, V) in black, and hydrophilic (G, N, S, H, Q, T) in white. The hydrophilic residues in the one-bead-per-residue molecular representation are coloured differently to distinguish the proteins studied in this work: FUS (yellow), TAF15 (green), SP1 (purple), SP2 (light purple), EWS (orange), HNF1A (red), POL II (blue).

Table 1 Summary of transcription condensate systems studied by molecular composition and the corresponding figures with simulation images

In the next section we start out by analyzing the FUS-SP1-TAF15 system in detail as a reference system that has been studied experimentally34.

Aromatic, aliphatic and cation-π interactions drive simple coacervation of TFs in the FUS-SP1-TAF15 system

Pure FUS, SP1 and TAF15 all exhibited strong homotypic phase separation forming stable droplets, as shown at the corners of the ternary phase diagram of Fig. 2 (with convergence data in Fig. 3). The molecular interactions driving PS can be determined by examining the molecular contacts between the amino-acids for the single component systems, shown in Fig. 4. The hydrophobic π − π interactions between tyrosine residues are the main driving force for PS in FUS (Fig. 4a, d and g). For FUS the 1D summations of interactions (Fig. 4a) show increased contacts around the locations of aromatic residues, highlighted with black dashed lines. This is clearly indicative of the ’sticker-spacer’ model56 with the tyrosine residues acting as stickers that drive phase separation and in agreement with the in vivo work of Kang et al.73. Interactions with and between the glycine, serine, and glutamine are also present in FUS, with a significant decrease in interactions around anionic residues, due to the repulsive electrostatic nature, leading to white streaks on the contact map. The contact maps for the SP1 simulation (Fig. 4b and e), show a completely different interaction profile: here aliphatic (alanine (A), isoleucine (I), leucine (L), proline (P) and valine (V) residue) hydrophobic interactions are key contributors to PS due to the high abundance of these residues in SP1, that is also reflected in the 1D summations of the interactions. The SP1 N-terminus has much lower contact frequencies due to a greater density of ionic residues, which have repulsive interactions with the aliphatic residues that constitute the majority of the SP1 molecules. This arises from the hydrophilic nature of the ionic residues, making the interactions with neutral hydrophobic aliphatic residues repulsive.
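To make the construction of such contact maps concrete, the following minimal sketch builds a time-averaged intermolecular contact map by residue index and its 1D summation for a single-component system. It is not the analysis code used in this work: the distance cutoff, the neglect of periodic boundaries, the bead ordering (chain by chain) and the function names are all assumptions made for illustration, and the actual contact definition is given in the methods and section 3.2 of the SI.

```python
import math


def contact_map(frames, n_res, cutoff):
    """Time-averaged intermolecular contact map by residue index.

    frames : list of frames, each a list of (x, y, z) bead positions,
             ordered chain by chain (assumed layout).
    n_res  : residues (beads) per chain; all chains are identical copies.
    cutoff : contact distance (assumed value, same units as positions).
    """
    n_beads = len(frames[0])
    n_chains = n_beads // n_res
    cmap = [[0.0] * n_res for _ in range(n_res)]
    for pos in frames:
        for a in range(n_beads):
            for b in range(a + 1, n_beads):
                if a // n_res == b // n_res:
                    continue  # same chain: skip intramolecular pairs
                if math.dist(pos[a], pos[b]) <= cutoff:
                    i, j = a % n_res, b % n_res
                    # count the contact once from each partner's side,
                    # which keeps the map symmetric
                    cmap[i][j] += 1.0
                    cmap[j][i] += 1.0
    norm = len(frames) * n_chains  # average over time and molecules
    return [[v / norm for v in row] for row in cmap]


def profile_1d(cmap):
    """1D contact profile: row sums of the 2D map."""
    return [sum(row) for row in cmap]


if __name__ == "__main__":
    # Toy system: two chains of two beads, one frame, no periodic images.
    frames = [[(0, 0, 0), (10, 0, 0), (0, 0, 1), (20, 0, 0)]]
    cmap = contact_map(frames, n_res=2, cutoff=1.5)
    print(cmap, profile_1d(cmap))
```

Peaks in the 1D profile then mark the residue indices that form the most intermolecular contacts, which is how sticker positions are read off the maps discussed above.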

Fig. 2: Ternary phase diagram of the three TFs with snapshots of the two-component droplets along the edge of the diagram.

Ternary phase diagram showing the simulations undertaken in this work (orange dots) run with a total amino acid concentration of 80,000 μM. The total composition is defined relative to 120 FUS molecules, 120 TAF15 molecules, and 60 SP1 molecules, to give the percentage compositions (% FUS, % SP1, % TAF15). The end frame of 3 μs of simulation is displayed as a representative state. FUS molecules are coloured in yellow, SP1 molecules are coloured in purple, and TAF15 molecules are coloured in green. Droplets are formed in all simulations of these molecules under these concentration conditions, irrespective of composition.
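As an illustration of how a percentage composition could map onto molecule numbers, the sketch below scales each percentage by the reference count of that species. This mapping is an assumption based on the caption above (100% of a species corresponding to its reference count), not a statement of the exact procedure used to build the simulation boxes; `molecule_counts` is a hypothetical helper.

```python
# Reference molecule counts from the caption of Fig. 2.
REFERENCE_COUNTS = {"FUS": 120, "SP1": 60, "TAF15": 120}


def molecule_counts(percent_fus, percent_sp1, percent_taf15):
    """Molecule numbers for a composition (% FUS, % SP1, % TAF15).

    Assumes each percentage is taken relative to that species' reference
    count and rounded to the nearest whole molecule.
    """
    percents = {"FUS": percent_fus, "SP1": percent_sp1, "TAF15": percent_taf15}
    return {
        name: round(REFERENCE_COUNTS[name] * pct / 100.0)
        for name, pct in percents.items()
    }


if __name__ == "__main__":
    for composition in [(100, 0, 0), (75, 25, 0), (12.5, 25, 62.5)]:
        print(composition, molecule_counts(*composition))
```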

Fig. 3: Number of molecules in clusters of specific sizes as a function of simulation frame for FUS-SP1-TAF15 mixtures.

a (100, 0, 0), b (0, 0, 100), c (75, 25, 0), d (75, 0, 25), e (12.5, 25, 62.5), f (33, 33, 33). Composition as a percentage is expressed by (% FUS, % SP1, % TAF15). Molecular connectivity data is sampled at 5 ns intervals.

Fig. 4: Intermolecular contact maps for homotypic interactions in single-component droplets.

Intermolecular contacts by residue index for (a) 100% FUS, (b) 100% SP1, and (c) 100% TAF15, at 150 mM ion concentration and 300 K. The contacts are averaged in time and normalised by the number of molecules in the simulation (see section 3.2 in the SI for more details). A 1D contact profile (summation of the 2D map) is included below the contact map to show the total interactions per residue index (\(N_{\mathsf{contact}}\)). The black dashed lines in (a) and (c) highlight the residues with the most contacts in FUS and TAF15 (which correspond to peaks in the 1D profiles). These are the aromatic residues in FUS driving π-π aromatic contacts, and the aromatic and cationic residues in TAF15 driving aromatic and cation-π interactions. Broad peaks are seen in (b) corresponding to the aliphatic residue stretches in SP1. Sequence information is included below the 1D summaries (copies of data displayed in a–c). Intermolecular contact map by residue type for (d) 100% FUS, (e) 100% SP1, and (f) 100% TAF15. The contact maps in (d), (e) and (f) are similar to the contact maps by residue index in (a), (b), and (c), respectively, but aggregated by residue type. A 1D contact profile (summation of the 2D map) is also included below (\(N_{\mathsf{contact}}\)), together with the abundance of the residues (\(N_{\mathsf{residue}}\)) shown by blue dashed lines. Intermolecular interaction summary for (g) FUS in 100% FUS, (h) SP1 in 100% SP1, and (i) TAF15 in 100% TAF15 at 150 mM and 300 K. The fraction of interactions, \(\mathsf{F}_{\mathsf{int}}\), is aggregated by type and normalised by the total number of intermolecular interactions in (a–c), respectively. Aromatic and aliphatic interactions denote aromatic-aromatic and aliphatic-aliphatic interactions, respectively. This convention is used throughout this work. Details of the contact definitions can be found in the methods.

In contrast to the purely hydrophobic driving forces for FUS and SP1, TAF15 PS has significant contributions from cation-π interactions in addition to the dominant aromatic contacts as can be seen in the contact maps in Fig. 4c and f. For TAF15 the 1D summations of interactions show increased contacts around the locations of cationic and aromatic residues indicative of cation-π interactions (black dashed lines in Fig. 4c). The N-terminus has a large number of tyrosine residues, with a relatively low number of anionic residues. The C-terminus, on the other hand, contains the majority of the cationic residues, leading to the high number of cation-π (arginine-tyrosine) contacts between the opposite ends of two TAF15 proteins. The central region contains a greater concentration of anionic residues, leading to a decrease in contacts, because it is self-repulsive and also repels the N-terminus, leading to the broad white bands. Interactions between glycine, serine and glutamine residues within TAF15 also contribute to PS due to the high abundance of these residues, but they are significantly weaker than the arginine-tyrosine interactions (see Supplementary Fig. 36C for the contact map normalised by residue abundance).

Figure 4g–i shows that the most important interactions are aromatic interactions for FUS and TAF15, and aliphatic interactions for SP1. Cation-π interactions also play a substantial role for TAF15. It is striking to see that for FUS and TAF15, despite the large difference in spatial contact maps (Fig. 4a and c based on residue index), the nature of the molecular interactions (Fig. 4g and i), and the residue-based contact maps (Fig. 4d and f) are remarkably similar, with the appearance of additional arginine-tyrosine (cation-π) contacts for TAF15.

The locations of the sticker residues within the molecules have a distinct effect on the topology and density of the single component droplets. FUS has a relatively even distribution of stickers throughout the protein allowing for the formation of relatively spherical droplets. TAF15 has a bimodal distribution of stickers located at the two termini, and a repulsive spacer region in the centre that repels other chains. This leads TAF15 to form a more open and porous structure where the droplet is formed of a collection of associating smaller TAF15 clusters (Fig. 2, density profiles in Supplementary Fig. 28). Increasing the system size (increasing the number of copies of molecules) does not change this behaviour (Supplementary Fig. 9), indicating that finite size effects do not play a role here. In all cases highly dynamic contacts are seen with contact lifetimes in the range 4-6 ns (Supplementary Fig. 10), showing that highly dynamical cross links are present42,79. Such dynamical molecular networks (Fig. 3) indicate PSCP since percolations exist within the droplets42,56,63,79.
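Cluster-size data of the kind shown in Fig. 3 can in principle be obtained by treating molecules as nodes of a graph and intermolecular contacts as edges. The sketch below is one way to do this with a union-find structure; it assumes that two molecules are connected whenever they share at least one contact in a frame, which may differ from the connectivity criterion actually used, and the function names are hypothetical.

```python
def cluster_sizes(n_molecules, contacts):
    """Sizes of connected molecular clusters in one frame, largest first.

    n_molecules : number of molecules in the frame.
    contacts    : iterable of (i, j) molecule index pairs that are in
                  contact (assumed connectivity criterion).
    """
    parent = list(range(n_molecules))

    def find(i):
        # find the cluster root, shortening the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in contacts:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    sizes = {}
    for molecule in range(n_molecules):
        root = find(molecule)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)


def largest_cluster_fraction(n_molecules, contacts):
    """Fraction of molecules in the largest connected cluster."""
    return cluster_sizes(n_molecules, contacts)[0] / n_molecules


if __name__ == "__main__":
    print(cluster_sizes(5, [(0, 1), (1, 2), (3, 4)]))
    print(largest_cluster_fraction(4, [(0, 1), (1, 2), (2, 3)]))
```

A largest-cluster fraction that stays close to one over time, while individual contacts turn over on nanosecond timescales, is the signature of a percolated but dynamic network described above.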

Heterotypic interactions compete with homotypic interactions

As previously mentioned, PS is observed for all compositions, but the droplet morphologies and the molecular interactions change in nature when a second component is introduced, as shown in Fig. 2. FUS-TAF15 mixtures form droplets with both proteins mixed and uniformly distributed throughout the condensate, with the droplet becoming increasingly less spherical with increasing TAF15 fraction. For FUS-SP1 mixtures, by contrast, the condensate is more spherical, with FUS and SP1 well mixed at high SP1 fractions, whereas at high FUS fractions FUS starts to form a coating on the surface of an SP1 droplet. Interestingly, for TAF15-SP1 mixtures the condensate morphology is completely different. A single mixed droplet is not favoured; instead the molecules are partitioned into two condensates, a TAF15 condensate and an SP1 condensate, which do not interact.

Before examining the contact maps to clarify this, it is important to note that the intermolecular contact maps for the homotypic interactions of FUS, TAF15 and SP1 shown in Fig. 4 for the single component droplets are identical to the homotypic intermolecular contact maps observed in mixed systems (see Supplementary Figs. 34-60 in the SI). In this section we therefore exclusively focus on the heterotypic interactions between the two components in a system.

FUS-SP1 intermolecular interactions, shown in Fig. 5a and d, are primarily hydrophobic interactions driven by the large number of aliphatic residues in SP1 and tyrosine residues in FUS. The cation-π interactions appearing sharply in Fig. 5a are only a small part of the overall FUS-SP1 interactions, with aliphatic-aromatic interactions being the main FUS-SP1 intermolecular interactions (Fig. 5d and g). FUS-TAF15 intermolecular interactions, shown in Fig. 5b, are driven by aromatic interactions between tyrosine residues in FUS and TAF15. Additional cation-π contacts, from cations in TAF15 interacting with the tyrosines in FUS, are also observed (Fig. 5e and h). Such cation-π interactions have previously been found to be crucial in the PS of full length FUS68,73,82, explaining why these smaller FUS and TAF15 IDR constructs exhibit a high favourability of co-condensation. It is expected that the inclusion of additional FUS domains would reduce the degree of FUS-TAF15 co-condensation due to the increased electrostatic repulsion between FUS and TAF15, and the creation of more stable FUS condensates27,73,84. SP1-TAF15 intermolecular interactions are minimal, as shown in Fig. 5c by the 1D contact summaries being two orders of magnitude smaller than for FUS-SP1 or FUS-TAF15 contacts in Fig. 5a and b. The large number of anionic residues in the centre and C-terminus of TAF15 (TAF15 has a relatively high net negative charge per residue of −12/204) only offer repulsive interactions with the large number of aliphatic and anionic (net charge per residue of −8/506) residues in SP1. This restricts favourable SP1-TAF15 interactions to much smaller chain segments, around cations in SP1, and to the formation of very few contacts. The SP1-TAF15 interactions are much more localised than those between SP1 and FUS, between TAF15 and FUS, or the homotypic interactions, making any mixing of the two molecules in one condensate unfavourable. Looking at the residue-based contact maps in Fig. 5d–f, the stickers and spacers of FUS and TAF15 are strikingly similar to each other and rather different from those of SP1, clearly highlighting that they originate from two different families of TFs.
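The net charge per residue quoted above follows directly from counting charged residues in a sequence. The sketch below shows that bookkeeping using the residue groups of Fig. 1 (histidine treated as neutral and hydrophilic). The example sequence is a hypothetical toy string, not one of the IDRs studied here, and the function names are illustrative.

```python
# Residue groups as defined in the caption of Fig. 1.
GROUPS = {
    "cation": set("RK"),
    "anion": set("DE"),
    "aromatic": set("FYW"),
    "aliphatic": set("ACILMPV"),
    "hydrophilic": set("GNSHQT"),
}


def residue_group(aa):
    """Return the Fig. 1 group name of a one-letter amino acid code."""
    for name, members in GROUPS.items():
        if aa in members:
            return name
    raise ValueError(f"unknown residue: {aa!r}")


def group_counts(sequence):
    """Count residues of each group in a sequence."""
    counts = {name: 0 for name in GROUPS}
    for aa in sequence:
        counts[residue_group(aa)] += 1
    return counts


def net_charge_per_residue(sequence):
    """(number of cations - number of anions) / sequence length."""
    counts = group_counts(sequence)
    return (counts["cation"] - counts["anion"]) / len(sequence)


if __name__ == "__main__":
    toy = "GYSDRKEEAY"  # hypothetical sequence for illustration only
    print(group_counts(toy), net_charge_per_residue(toy))
```

With the values quoted in the text, −12/204 for TAF15 corresponds to about −0.059 per residue and −8/506 for SP1 to about −0.016 per residue.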

Fig. 5: Intermolecular contact maps for heterotypic interactions in two-component droplets.

Intermolecular contact map by residue index for (a) FUS with SP1 (50%, 50%), (b) FUS with TAF15 (50%, 50%), and (c) SP1 with TAF15 (50%, 50%) at 150 mM and 300 K. For the definitions of the different contact types see the caption of Fig. 4 and section 3.2 of the SI. The 1D contact profiles denote a summation of the 2D map of the corresponding molecules. The black dashed lines highlight the key residues: (a) the aromatic residues in FUS with cationic and aromatic residues in SP1, (b) aromatic residues in FUS with the aromatic and cationic residues in TAF15, and (c) the cationic residues in SP1 with the aromatic residues in TAF15. Sequence information is included below the 1D summaries (copies of data displayed in a–c). Intermolecular contact map by residue type for (d) FUS with SP1 (50%, 50%), (e) FUS with TAF15 (50%, 50%), and (f) SP1 with TAF15 (50%, 50%) at 150 mM and 300 K. Intermolecular interaction summary for (g) FUS-SP1 interactions in (50%, 50%, 0%), (h) FUS-TAF15 interactions in (50%, 0%, 50%), and (i) SP1-TAF15 interactions in (0%, 50%, 50%) at 150 mM and 300 K. The fraction of interactions, \(\mathsf{F}_{\mathsf{int}}\), is aggregated by type and normalised by the total number of intermolecular interactions in (a–c), respectively.

The normalised number of interactions between different species provides a method for comparison (full data for all simulations can be found in Supplementary Tables 9-13 and Supplementary Figs. 6-8 in the SI). The single component mixtures all have comparable numbers of homotypic interactions for FUS (0.015), SP1 (0.016), and TAF15 (0.014), with TAF15 forming slightly more open condensates. In the two component mixtures we see the emergence of a difference in the relative strengths of the homotypic and heterotypic interactions. Three possible cases exist for binary mixtures. In the first case the heterotypic interactions are stronger than the homotypic interactions (FUS-TAF15), and mixing of the two species in a marbled condensate is preferred (see (75, 0, 25) in Fig. 2 and Supplementary Fig. 6B). In the second case the homotypic interactions are both stronger than the heterotypic interactions (SP1-TAF15), and a bimodal/separated condensate system is preferred (see (0, 50, 50) in Fig. 2 and Supplementary Fig. 6C). In the third case the homotypic and heterotypic interactions are of comparable strength (FUS-SP1), leading to a coated (or core-shell) condensate structure (see (75, 25, 0) in Fig. 2 and Supplementary Fig. 6A).
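The three cases can be summarised as a simple decision rule on the normalised interaction numbers. The sketch below is a schematic illustration only: the tolerance that defines "comparable strength" (`rel_tol`) and the numerical inputs in the usage example are hypothetical choices, not values taken from the simulations.

```python
def binary_morphology(homo_a, homo_b, hetero, rel_tol=0.25):
    """Classify a binary droplet from normalised interaction numbers.

    homo_a, homo_b : homotypic interactions of the two species.
    hetero         : heterotypic interactions between them.
    rel_tol        : relative margin within which strengths count as
                     comparable (assumed value for illustration).
    """
    if hetero > max(homo_a, homo_b) * (1.0 + rel_tol):
        return "marbled"  # heterotypic stronger than both homotypic
    if hetero < min(homo_a, homo_b) * (1.0 - rel_tol):
        return "bimodal/separated"  # both homotypic stronger
    return "coated"  # comparable strengths


if __name__ == "__main__":
    # Hypothetical inputs chosen to fall into each of the three cases.
    print(binary_morphology(0.015, 0.014, 0.030))
    print(binary_morphology(0.016, 0.014, 0.0005))
    print(binary_morphology(0.015, 0.016, 0.014))
```

Distinguishing bimodal from fully separated droplets would need an additional criterion on how weak the heterotypic interactions are, which is deliberately left out of this sketch.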

Ternary systems display complex condensate morphology driven by interaction orthogonality

Moving into the centre of the FUS-SP1-TAF15 phase diagram we now see the competition between the attractive groups in the different molecules for interactions in the resultant ternary condensates. A combination of the previously observed morphologies is now seen, as shown in Fig. 6. Attractive interactions of FUS with both SP1 and TAF15 are opposed by the repulsive interactions between SP1 and TAF15 in the formation of a single heterogeneous condensate (see radial density profiles in Supplementary Figs. 28-29). Competition between SP1 and TAF15 for the interactions with FUS leads to a complex, composition-dependent behaviour of the droplets. In high FUS fraction compositions a single droplet is observed with SP1 and TAF15 localised at different ends of the condensate such that SP1-TAF15 contacts are minimised. At low FUS fractions a singular large condensate is lost, and instead separate droplets are formed containing only TAF15 with FUS inclusions or SP1 with FUS inclusions, see e.g., the (25, 50, 25) and (12.5, 12.5, 75) compositions. This transition to separate droplets occurs at 50% SP1 for SP1 rich systems, whereas TAF15 rich systems require 75% TAF15 for analogous behaviour.

Fig. 6: Snapshots of a selection of the ternary droplets.

The percentage compositions are described in the image. The end frame of 3 μs of simulation is displayed as a representative state. Composition as a percentage is shown next to the droplets in brackets: (% FUS, % SP1, % TAF15). Droplets are formed in all simulations of these molecules under these concentration conditions, irrespective of composition. The relative interaction strengths are displayed on this diagram for the homotypic and heterotypic contacts. The definition of this quantity can be found in section 4.7.3.

The intermolecular contact maps for the three-component mixtures do not yield any extra information compared to the contact maps from the two-component mixtures. The intermolecular contact maps for all residue pairings exhibit the same interaction patterns, with only the absolute magnitude of each contact map changing (see Supplementary Figs. 34-60 in the SI). From the convergence data (Fig. 3) it can be deduced that at all times percolations exist within both homotypic as well as heterotypic droplets. The intermolecular interactions in the TF condensates, however, are not at all permanent; they are highly dynamic with contact lifetimes on the order of 5 ns (see Supplementary Fig. 10) such that the cross-links between TFs are transient in nature, similar to FG-nucleoporin condensates79. The nature of this phase state can best be categorized as viscoelastic: on one hand the continuous percolations suggest elastic properties, while their transient nature is indicative of a fluid, with mostly spherical morphologies and the ability of two droplets to merge into one42,56,63,79,85,86.

In the ternary mixtures (Fig. 6) the competition for interactions is dominated by the competition for the strongest heterotypic interactions. The strongest heterotypic interaction is a function of composition, with FUS-TAF15 the strongest for the majority of compositions, FUS-SP1 interactions the strongest at 75% SP1 content, and SP1-TAF15 the weakest for all compositions (Supplementary Fig. 6). The dominance of the FUS-TAF15 interactions drives the formation of FUS-TAF15 marbled condensates, since it out-competes the other heterotypic interactions. Since the FUS-SP1 interactions are also strong, but weaker than SP1-SP1 interactions, FUS is driven to coat the SP1 droplet. This results in a complex bimodal structure where a FUS-TAF15 marbled droplet partially coats an SP1 droplet. The intriguing behaviour shown in the simulations is that if sufficient FUS is present, both SP1 and TAF15 are incorporated into the same bimodal droplet even though their interactions are very unfavourable (<0.001). At lower FUS compositions the ability of FUS to provide sufficient interactions to shield TAF15 from SP1 in the same droplet is reduced, which eventually leads to the fragmentation of the droplets as in the binary SP1-TAF15 systems. A characteristic feature of the ternary interactions in Supplementary Fig. 6 is that the homotypic interactions of SP1 and TAF15 can increase with droplet composition. This is particularly strong for SP1 in high TAF15 fraction mixtures and TAF15 in high SP1 fraction mixtures (Supplementary Figs. 6(D)-S7(F)), indicating that the presence of the other proteins has a crowding effect through repulsive protein-protein interactions.

We identify four different residue types that can behave as stickers: aromatic, aliphatic, anionic and cationic. The table in Supplementary Fig. 5A shows the mean interaction propensity calculated for all 34 FUS-SP1-TAF15 systems studied (shown by the orange dots in Fig. 2, data in Supplementary Figs. 3 and 4 in the SI). This shows that cation-π interactions are the strongest interaction type, with 20% of all theoretically possible interactions being formed. The second most important interaction type is the aliphatic interaction, which has a greater uncertainty than the cation-π interactions. Aromatic interactions are the third most important interaction, with a similar uncertainty to cation-π. Electrostatic and aliphatic-aromatic interactions provide the smallest relative contribution to the total. From these mean interaction propensities in the table in Supplementary Fig. 5A, we compute the fraction of interactions we expect to see in any FUS-SP1-TAF15 stoichiometry, as shown in Supplementary Fig. 5B–F. In Supplementary Fig. 5B, aromatic-aliphatic interactions are seen across all compositions, with fewer present in higher FUS/TAF15 compositions. Aromatic interactions occur most frequently in FUS and TAF15 rich mixtures. Cation-π and electrostatic interactions occur most frequently at large TAF15 fractions, due to the lower fraction of cationic residues in the SP1 and FUS sequences. Aliphatic interactions decrease with decreasing SP1 fractions, since SP1 contains the majority of the aliphatic residues in the sequences investigated. This allows the interactions to be grouped into three main classes: SP1 preferred interactions (aliphatic), FUS/TAF15 preferred interactions (aromatic) and TAF15 preferred interactions (cation-π and electrostatic).
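One way to picture how mean propensities translate into expected interaction fractions is to weight the number of theoretically possible residue pairs of each type by its propensity and then normalise. The sketch below follows that idea under strong simplifying assumptions: the pair-counting scheme (which does not separate intra- from intermolecular pairs), the residue counts and all propensities other than the 20% quoted for cation-π are hypothetical, and the procedure actually used for Supplementary Fig. 5 may differ.

```python
def possible_pairs(counts):
    """Number of distinct residue pairs available per interaction type.

    counts : residue totals in the mixture, keyed by 'aromatic',
             'aliphatic', 'cation' and 'anion'.
    """
    aro, ali = counts["aromatic"], counts["aliphatic"]
    cat, ani = counts["cation"], counts["anion"]
    return {
        "aromatic": aro * (aro - 1) // 2,
        "aliphatic": ali * (ali - 1) // 2,
        "aliphatic-aromatic": ali * aro,
        "cation-pi": cat * aro,
        "electrostatic": cat * ani,
    }


def expected_fractions(counts, propensity):
    """Expected share of each interaction type in a given mixture."""
    pairs = possible_pairs(counts)
    weighted = {kind: pairs[kind] * propensity[kind] for kind in pairs}
    total = sum(weighted.values())
    return {kind: value / total for kind, value in weighted.items()}


if __name__ == "__main__":
    # Hypothetical residue totals and propensities; only the cation-pi
    # value of 0.20 is taken from the text.
    counts = {"aromatic": 4, "aliphatic": 3, "cation": 2, "anion": 1}
    propensity = {
        "aromatic": 0.10, "aliphatic": 0.10, "aliphatic-aromatic": 0.10,
        "cation-pi": 0.20, "electrostatic": 0.10,
    }
    print(expected_fractions(counts, propensity))
```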

Universality of TF selectivity

In this section we extend the analysis to also include the TFs EWS, SP2 and HNF1A. These TFs also undergo PS on their own (Supplementary Figs. 11-16)32,72. The IDR of EWS contains a high aromatic content similar to FUS and TAF15, but with more aliphatic residues (Fig. 1 and Supplementary Fig. 18A). As a result, EWS interactions are driven by aliphatic contacts from the higher alanine and proline content, and by aromatic (tyrosine-tyrosine) contacts (Supplementary Fig. 18G). A similarly high aliphatic residue content in SP2 as in SP1 leads to the same dominance of aliphatic interactions in driving SP2 condensate formation (Supplementary Fig. 18H). These aliphatic interactions in SP2 are distributed along the entire length of the SP2 IDR in the same manner as in SP1 (Fig. 1 and Supplementary Fig. 18B). HNF1A contains a high aliphatic residue content (Fig. 1 and Supplementary Fig. 18C), which leads to the dominance of leucine, valine, alanine and proline interactions (Supplementary Fig. 18F). It is interesting to see that the sticker-spacer paradigm is valid for FUS, TAF15 and EWS, in contrast to SP1, SP2 and HNF1A, where it is the more uniform attraction between the aliphatic residues that drives PS.

Next we extend the analysis to binary and ternary systems, following the combinations shown in Table 1. From analysis of the different interaction types of the FUS-TAF15-SP1 system (Supplementary Fig. 6), the contact maps (Fig. 5, Supplementary Figs. 22-28) and the ternary phase diagrams (Figs. 2, 6 and Supplementary Figs. 12-17) of all systems studied, it can be concluded that the selectivity of phase separation is related to two major groupings of interactions (shown in Fig. 7a). The first grouping, of hydrophobicity-controlled interactions, encompasses aliphatic, aromatic and aliphatic-aromatic interactions that utilise stickers that can interact with other residues of the same type. The second grouping consists of cation-controlled interactions, which encompass cation-π and electrostatic interactions. Here sticker residues belong to different residue types, so the stickers in the interactions must be heterogeneous, i.e. cationic, anionic and aromatic residues. Interestingly, aromatic and anionic residues both have attractive interactions with cationic residues, but anionic residues have repulsive interactions with aromatic or other anionic residues. This categorisation of interactions highlights the molecular mechanism to achieve selectivity of phase separation. By exploiting the orthogonal molecular driving forces, the relative homotypic and heterotypic interaction strengths can be tuned through sequence composition. These orthogonal interactions lead to four characteristic binary droplet morphologies: marbled (homogeneous) droplets (such as the SP1-SP2 system) with heterotypic interactions larger than homotypic interactions (Fig. 7b), coated (core-shell) droplets (such as the EWS-TAF15 system) with heterotypic interactions of comparable strength to the homotypic interactions (Fig. 7c), bimodal droplets (such as the HNF1A-TAF15 system) with heterotypic interactions smaller than homotypic interactions (Fig. 7d), and separated droplets (such as the SP1-TAF15 system) with heterotypic interactions significantly smaller (almost negligible) than homotypic interactions (Fig. 7e).

Fig. 7: Summary of condensate interactions.

a Amino acid sticker types and interactions (as published in84 Fig. 6a, used under CC BY 4.0 license [http://creativecommons.org/licenses/by/4.0]). The favourable interactions (black) promote condensate formation, whereas unfavourable interactions (red) oppose condensate formation. Sticker interactions fall into two orthogonal classes of molecular interaction: 1) hydrophobic interactions (aliphatic, aromatic and aliphatic-aromatic contacts), and 2) cationic interactions (cation-π and electrostatic). b–e Different condensate morphologies for the four cases of relative homotypic and heterotypic interaction strengths, illustrated with example (50%-50%) mixtures from simulations undertaken in this work. Corresponding radial density profiles can be found in Supplementary Figs. 28 and 31.

The universality of condensate selectivity can be illustrated by discussing the phase diagrams (of Table 1) relative to the reference system of FUS-SP1-TAF15. The first contains the substitution of FUS by EWS. The additional aliphatic residues in EWS lead to some important changes (Supplementary Fig. 11) when compared to the FUS-SP1-TAF15 triangle (Figs. 2 and 6). Of EWS, SP1 and TAF15, the homotypic interactions of EWS are the strongest (0.025), and are also stronger than those of FUS (0.015). Moreover, the additional heterotypic aliphatic interactions of EWS with SP1 (Supplementary Fig. 22) result in heterotypic EWS-SP1 interactions that are stronger than SP1 homotypic interactions but weaker than EWS homotypic interactions, and cause the formation of a binary SP1-coated EWS droplet (Fig. 7c). The additional aliphatic residue content of EWS compared to FUS leads to more repulsion with the ionic residues in TAF15, resulting in decreased heterotypic interactions of 0.01-0.02 (Supplementary Fig. 22B). This leads to a more patchy binary interaction with TAF15 compared to FUS and a distinct separation from TAF15 in the 3-component system (Fig. 8a and Supplementary Fig. 11).

Fig. 8: Simulation images for equal-component ternary TF mixtures.

a EWS-SP1-TAF15, b FUS-SP2-TAF15, c SP1-SP2-TAF15, d FUS-HNF1A-TAF15, e HNF1A-SP1-TAF15, f HNF1A-SP1-SP2. All simulations were run with a total amino acid concentration of 80,000 μM. The end frame of 3 μs of simulation is displayed as a representative state. A segment is not displayed to reveal the internal droplet structure. FUS molecules are coloured in yellow, EWS molecules are coloured in orange, SP1 molecules are coloured in purple, SP2 molecules are coloured in light purple, HNF1A molecules are coloured in red, and TAF15 molecules are coloured in green. Corresponding radial density profiles can be found in Supplementary Fig. 33.

The replacement of SP1 by SP2 leads to the FUS-SP2-TAF15 triangle in Fig. 8b and Supplementary Fig. 12. SP2 has an increased cationic residue content compared to SP1. These cationic residues in SP2 are clustered in three main locations: around residues 250 and 400, and at the C-terminal end (above residue 500), see Fig. 1. These clusters are sufficient to enable the formation of strong electrostatic and cation-π interactions with TAF15 (Supplementary Fig. 21C), and the formation of marbled SP2-TAF15 condensates (Supplementary Fig. 12). All the heterotypic interactions in this triangle are stronger than homotypic interactions (Supplementary Fig. 8), resulting in the formation of a three-component marbled droplet (Fig. 8b and Supplementary Fig. 12). If we instead replace FUS by SP2 in the reference system, the phase diagram of SP1-SP2-TAF15 in Supplementary Fig. 13 is obtained. The large number of aliphatic residues in both SP1 and SP2 dominates the heterotypic interactions, leading to the formation of marbled condensates (see contact maps in Supplementary Fig. 21B). The heterotypic SP1-SP2 interactions are much stronger than the homotypic interactions (Supplementary Fig. 8B). Intriguingly, the strong heterotypic interactions of SP2 with both SP1 and TAF15 are able to overcome the repulsive SP1-TAF15 heterotypic interactions (like FUS in the FUS-SP1-TAF15 droplets). In Fig. 8c the equal-component ternary SP1-SP2-TAF15 mixture results in marbled SP1-SP2 condensates forming around a disk of TAF15, with SP2 acting as a glue at the interface to reduce contact between SP1 and TAF15.

Three distinct cases of ternary phase diagrams are considered with HNF1A. The first case (Supplementary Fig. 14) is HNF1A with two FET proteins (FUS and TAF15). The heterotypic HNF1A-TAF15 interactions are unfavourable, producing a bimodal condensate with a minimal interfacial region between the HNF1A and TAF15 condensates. This is in contrast to the marbled FUS-HNF1A and FUS-TAF15 condensates where favourable heterotypic interactions are present in the binary mixtures. In the ternary mixture (Fig. 8d), FUS-HNF1A heterotypic contacts dominate over those of FUS-TAF15, leading to formation of a HNF1A condensate coated in FUS with a TAF15 condensate interacting with FUS on a shared interface. The second case (Supplementary Fig. 15) is HNF1A with one FET (TAF15) and one SP/KLF (SP1) protein. Here SP1-HNF1A heterotypic interactions are the strongest giving rise to the marbled HNF1A-SP1 binary condensates. In the ternary mixture the weak HNF1A-TAF15 heterotypic interactions lead to the association of a TAF15 homotypic domain to the marbled HNF1A-SP1 droplet (Fig. 8e). The third case (Supplementary Fig. 16) of HNF1A with two SP/KLF family TFs (SP1 and SP2) contains three heterotypic interactions that are more favourable than homotypic interactions (Supplementary Fig. 8). Therefore a marbled three component droplet is observed for this system driven by the large number of aliphatic residues in all three components.

The HNF1A IDR from the HNF family contains a high number of aliphatic residues, with a small number of aromatic and ionic residues, meaning it more closely resembles the IDRs of the SP/KLF family proteins than the FET proteins. Aliphatic-aromatic interactions dominate the heterotypic interactions of HNF1A with FUS and TAF15 (Supplementary Figs. 22C and 23C). The heterotypic interactions of HNF1A with SP1 and SP2 are dominated by aliphatic residue contacts (see contact maps in Supplementary Figs. 23A and B).

Pol II interaction promiscuity overcomes interaction orthogonality

The function of transcription factors is to direct RNAP to the required gene for transcription initiation. The C-terminal domain of RNAP II (referred to as POL II) is known to be able to undergo PS66,68,69,87. A key question exists: how can POL II circumvent the orthogonality of the interactions displayed by the transcription factors, such that it can reach all required genes to initiate and undertake transcription? The answer lies in the POL II amino acid sequence, which is based on a heptad repeat unit containing the consensus sequence YSPTSPS65. POL II homotypic interactions are dominated by aromatic tyrosine contacts (see contact maps in Supplementary Fig. 19). The use of aromatic interactions in a low charge IDR provides the opportunity to maximise the propensity for co-condensation with any TF, as shown in Fig. 9, where POL II was found to undergo condensation with all TFs in this study (FUS, EWS, HNF1A, SP1, SP2, and TAF15), overcoming the potential barrier posed by the orthogonal interactions used by different TF families (radial density profiles in Supplementary Figs. 30 and 32). This is achieved by heterotypic hydrophobic interactions of POL II with all TFs (Supplementary Figs. 24 and 25). The ability for POL II to enter any TF condensate means that the local TF condensate environment formed in the nucleus plays a crucial role in orchestrating POL II localisation to the correct genes for transcription.

Fig. 9: RNA polymerase II (POL II) C terminal domain PS with TFs.

Simulations of 60 POL II molecules (centre) and 30 POL II with: 60 FUS (top left), 60 TAF15 (centre left), 60 EWS (bottom left), 30 SP1 (top right), 30 SP2 (centre right), and 45 HNF1A (bottom right). All simulations were run with a total amino acid concentration of 80,000 μM. The end frame of 3 μs of simulation is displayed as a representative state. POL II molecules are coloured in blue, FUS molecules are coloured in yellow, EWS molecules are coloured in orange, SP1 molecules are coloured in purple, SP2 molecules are coloured in light purple, HNF1A molecules are coloured in red, and TAF15 molecules are coloured in green. Droplets are formed in all simulations of these molecules under these concentration conditions, irrespective of composition. See radial density profiles in Supplementary Figs. 30 and 32.

POL II undergoes phosphorylation during the transition from the initiation to the elongation stage of transcription2,45. This modification changes the TFs and cofactors that interact with POL II. To model the change in charge on POL II, all (141) serine residues are mutated to aspartic acid residues (sequence POLIIcharged, in Supplementary Table 1). The increase in negative charge leads to the loss of simple coacervation for POL II (radial density profile in Supplementary Fig. 31), and also to a lack of solubility in FUS, EWS, TAF15, SP1 and HNF1A (see Supplementary Figs. 17 and 32 in the SI). Partial solubility of POLIIcharged is still observed for SP2 (Supplementary Figs. 17 and 33). The change in charge of POL II also leads to a significant reduction in the number of contacts (see contact maps in Supplementary Figs. 20, 26 and 27). This shows that post-translational modification can be used to tune the condensation of RNAP, promoting RNAP to leave the TF condensates that can aid transcription initiation and transition towards RNA elongation45.
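The serine-to-aspartate substitution described above can be sketched in a few lines. This is only an illustration: the sequence used here is an idealised chain built from the consensus heptad YSPTSPS, which is an assumption made for the example, whereas the actual POL II and POLIIcharged sequences are those listed in Supplementary Table 1. The helper names `phosphomimetic` and `net_charge` are hypothetical.

```python
# Idealised CTD made only of consensus heptads (an assumption for this
# example; the real sequence is given in Supplementary Table 1).
HEPTAD = "YSPTSPS"


def phosphomimetic(sequence: str) -> str:
    """Replace every serine (S) with aspartic acid (D)."""
    return sequence.replace("S", "D")


def net_charge(sequence: str) -> int:
    """Net charge counting K/R as +1 and D/E as -1 (histidine ignored)."""
    positive = sum(sequence.count(res) for res in "KR")
    negative = sum(sequence.count(res) for res in "DE")
    return positive - negative


ctd = HEPTAD * 4                 # four repeats, purely illustrative
ctd_charged = phosphomimetic(ctd)

print(ctd_charged)
print("net charge before:", net_charge(ctd))
print("net charge after: ", net_charge(ctd_charged))
```

Each consensus heptad carries three serines, so every repeat gains three negative charges under this substitution, which is the origin of the large change in charge discussed in the text.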

Discussion

We explored transcriptional condensates consisting of six TFs from three different families. We focused on different spatial levels of molecular interactions by analysing (i) residue-scale contact maps, (ii) the relative strengths of aliphatic, aromatic, aliphatic-aromatic, electrostatic and cation-π interactions, (iii) the aggregated overall homotypic and heterotypic interaction strength between the molecules and (iv) radial density profiles. We have found that four sticker motifs56 exist that feature two orthogonal driving forces causing PS in ternary TF systems, shown in Fig. 7a. The competitive interactions driving PS within the ternary TF systems can be ranked by their relative strength (the table in Supplementary Fig. 6A): cation-π are the strongest, followed by aliphatic, then aromatic, and finally aliphatic-aromatic and electrostatic are the weakest. The sticker motifs trigger hydrophobic interactions (aromatic, aliphatic-aromatic, and aliphatic interactions) between Y, A, L, V, P residues, or cation-π and electrostatic interactions of R and K with Y, D, or E, to drive PS (Fig. 7a).

PSCP of the single component droplets is driven by homotypic intermolecular interactions (Fig. 4 and Supplementary Fig. 18): PS of FUS and EWS is driven by aromatic interactions, TAF15 by electrostatic, cation-π and (mostly) aromatic interactions, and SP1, SP2, and HNF1A by aliphatic interactions. The contact maps we present highlight the key residue interactions responsible for the observed phase separation; mutations of these sites are expected to alter the PS behaviour. For instance, in our previous work on HNF1A72 we carried out deletion mutations which showed there were still sufficient interactions in other regions to promote PS, exemplifying redundancy. Such a perturbation analysis would be an interesting topic for future work. The two component mixtures feature competition of homotypic with heterotypic intermolecular interactions (Fig. 5 and Supplementary Figs. 21-27), leading to the identification of four distinct droplet morphologies in binary systems where both species can undergo phase separation: 1) heterotypic interactions being stronger than homotypic interactions leads to a single marbled condensate (Fig. 7b), 2) heterotypic interactions being comparable to homotypic interactions leads to a single coated condensate (Fig. 7c), 3) heterotypic interactions being weaker than homotypic interactions leads to bimodal condensates (Fig. 7d) that have a shared interface, and 4) heterotypic interactions being substantially weaker than homotypic interactions leads to two separate condensates (Fig. 7e). These morphologies are also represented in the ternary systems studied, featuring selective phase separation (as summarised in Fig. 8). The ability of TF IDRs to undergo selective PS, when combined with TF-DNA interactions through specific DNA binding domain interactions, contributes to the selective gene localisation of such condensates.

Since one of the orthogonal driving forces for phase separation involves ionic residues, the localized partitioning of transcriptional condensates at the DNA is sensitive to the local electrostatic environment. To explore this we have studied the effect of ion concentration in one specific ternary system that has also been investigated experimentally: FUS-TAF15-SP134. Our results showed that with increasing ion concentration the strength of FUS-TAF15 interactions grows, with a corresponding weakening of FUS-SP1 interactions in ternary mixtures, making SP1 inclusion in the condensates less favourable (see SI section 5 and Supplementary Fig. 7). In living cells, this interaction orthogonality is expected to be even larger, due to crowding or modified local electrostatic environments (for example the presence of large amounts of negatively charged DNA), enhancing the heterotypic FUS-TAF15 interactions at the expense of heterotypic interactions with SP1. This was indeed shown in the in vivo work of Chong and coworkers32,34, where FUS-TAF15 condensates were observed, with SP1 forming separate condensates. FUS-TAF15 and HNF1A-SP1 interactions have also been recorded in ChIP-seq data deposited in ChIP-Atlas25; however, no data are present for the other TF pairs examined in this work. The lack of ChIP-seq data could be due to no interactions being present to detect, as in the case of TAF15 with SP1, or to insufficient sampling. The six TF IDRs studied in this work have provided significant insight into the mechanisms of TF condensation, yet these IDRs represent a subset of the entire proteins. Inclusion of the DNA/RNA binding domains of the TFs is expected to accentuate the selectivity of the interactions seen in the chosen TF IDRs due to the additional binding of DNA/RNA57,61,62,82,88. It is important to note that in the current work we focused on one specific LCD as used in previous experimental studies. However, some TFs have more than one IDR bridged by folded domains, which add additional multivalent interactions as driving forces for phase separation67,73,82. The mutual exclusion of SP1 and TAF15 in condensates through repulsion from the anionic residues also indicates the role anionic species could play in determining condensate structure and percolation through crowding effects. This is of particular interest for exploration in the DNA rich nuclear environment where TFs function, which would be an interesting topic for future studies. It should be noted that in this work we focus on the interactions of FET proteins with two SP/KLF family proteins and one HNF family protein, so almost every mixture studied includes a FET protein. Therefore, exploration of the specificity in the molecular grammar and ensemble types using other physiological combinations of the 1600 human TFs would provide opportunities for future work to validate the condensate morphologies described in the current work.

The fact that both orthogonal driving forces for phase separation involve aromatic residues enables POL II to interact with any TF to enable co-condensation. Co-condensation of POL II with TFs (Fig. 9) enables co-localisation in the vicinity of the gene promoter, ready for transcription initiation. This provides a simple mechanism that aids in enzyme partitioning to active transcription sites where increased TF concentrations are present, and aids in the pre-assembly of RNAP-mediator complexes5,44. The condensates enable the RNAP to dock to the correct chromatin location through interaction of its IDR with TFs and at the same time facilitate the active enzymatic site to attach to the DNA and initiate transcription. The principle of sequence-dependent selective TF co-condensation, together with the promiscuous binding of RNAP, provides a universal mechanistic basis for selective gene transcription. Subsequent POL II phosphorylation then enables the relocation of the RNAP complex to start the process of RNA elongation, leaving the TF condensate which promoted the transcription initiation45. Additionally, the condensation of the mediator complex44 indicates that a complex network of percolated condensates may be present in cells, with the variable miscibility of components (as seen in Fig. 6) contributing to the internal spatial organisation of super enhancer regions.

The wide range of protein density within condensates (SI section 10, Supplementary Figs. 28-33) further increases the complexity when the solubility of enzymes and other cofactors needed for transcription is considered. The uptake of various cofactors into the TF condensates is governed by enthalpic and entropic driving forces. The entropic penalty of confinement within a condensate is typically counteracted by intermolecular interactions which provide an enthalpic driving force for inclusion (as in condensate formation). The porosity of TF condensates will contribute to the selectivity of cofactor uptake based on the available molecular interactions (enthalpic contribution) and the displacement of water/rearrangement of TF (entropic contribution) within the condensate to enable cofactor uptake. Comparison of binary and ternary condensates in Figs. 6–8 and Supplementary Figs. 12-17 highlights how the addition of a further species can lead to a wide array of different droplet morphologies when all components are able to undergo simple coacervation. When we also consider cofactors which must be recruited by TFs to enter condensates, then the surface topologies of TF condensates provide an additional layer of selectivity, with only TFs present on the droplet surface able to directly interact with cofactors. The percolated networks formed within the condensates allow for the transmission of information through the droplets, potentially creating a route for the transfer of information between adjacent condensates if a topological change is detected during the rearrangement of the transcriptional machinery (such as the unwinding of chromatin fibres)29.

All-in-all, our results might have several implications. The orthogonal molecular grammar provides generic molecular insights into how TF phase separation selectively controls gene transcription rates. It introduces the viewpoint that molecular interactions might have evolutionarily converged to a state in which subtle differences in molecular grammar might steer different genetic programs. We anticipate that this will lead to increased research activity in the broader scope of proteomic exploration of interaction orthogonality in all existing TF families in human cells. This might also lead to exciting insights on disease mutations dysregulating epigenetic modifications as e.g., in many cancers71. Beyond gene expression, orthogonal molecular grammar might also provide a universal mechanistic basis for the cell to organise targeted (enzymatic) reactions in its highly crowded intracellular environment through transient compartmentalization. Moreover, the understanding of orthogonal interactions used in cells could also be of interest to scientists seeking inspiration from nature in the design of nanoscale soft-matter systems, such as the development of nanoscale reactors89 or containers for targeted drug delivery90. In closing, we believe that these insights will provide an important step forward in our understanding of transcriptional selectivity and might also have profound implications in the broad field of condensate biology in both health and disease.

Methods

Protein sequences

The exact sequences for the LCDs used for the work, as shown in Fig. 1, are contained in Supplementary Table 1 of the supporting information.

The 1 bead per amino acid (1BPA) molecular dynamics model

In this work we use the 1BPA model that was previously developed for the study of intrinsically disordered proteins in the nuclear pore complex (NPC)76,77. It has been used extensively to model the behaviour of the disordered nucleoporin regions which fill the center of the yeast NPC and provide a selective barrier to the transport of cargo between the cytoplasm and nucleoplasm78,79,91. In this work an updated 1BPA model is developed (1BPA-2.1) that extends the applicability domain beyond the intrinsically disordered domains of yeast nucleoporins, by incorporating a greater training set for parameterisation (collating experimental data from55,92,93,94,95), and further improves the performance of the previously used 1BPA-2.0 model72,80. The full details of the force field are given in the ESI.

Original 1BPA potential

As a starting point for the parameterisation in this work the 1BPA-cp version is used96,97 (referred to as 1BPA-1.0 in this work). The bonded potential, ϕb (equation (1)), consists of three components: bonding (ϕbond, equation (2)), bending (ϕbend) and torsion (ϕtorsion). The bonding potential, ϕbond, is a simple harmonic potential where k = 8038 kJ mol−1 nm−2 and b = 0.38 nm. An iterative Boltzmann inversion was used to fit Ramachandran data to generate the bending and torsion potentials, as described in76. During this analysis, the presence of glycine or proline residues was found to create distinctive Ramachandran plots, which resulted in the definition of six different angular potentials and nine torsional potentials, described in Supplementary Table 2.

$${\phi }_{{{\rm{b}}}}={\phi }_{{{\rm{bond}}}}+{\phi }_{{{\rm{bend}}}}+{\phi }_{{{\rm{torsion}}}},$$
(1)
$${\phi }_{{{\rm{bond}}}}=k{(r-b)}^{2},$$
(2)

There are three components that constitute the non-bonded potential, ϕnb (equation (3)), of the 1BPA model: hydrophobic interactions (ϕhp, equation (4)), cation-π interactions (ϕcp, equation (5)) and electrostatic interactions (ϕel, equation (6)).

$${\phi }_{{{\rm{nb}}}}={\phi }_{{{\rm{hp}}}}+{\phi }_{{{\rm{cp}}}}+{\phi }_{{{\rm{el}}}},$$
(3)
$${\phi }_{{{\rm{hp}}}}=\left\{\begin{array}{ll}{\epsilon }_{{{\rm{rep}}}}{\left(\frac{\sigma }{r}\right)}^{8}-{\epsilon }_{ij}\left[\frac{4}{3}{\left(\frac{\sigma }{r}\right)}^{6}-\frac{1}{3}\right],\quad &r\le \sigma,\\ \left({\epsilon }_{{{\rm{rep}}}}-{\epsilon }_{ij}\right){\left(\frac{\sigma }{r}\right)}^{8},\quad \hfill&\sigma \le r,\end{array}\right.$$
(4)

where \({\epsilon }_{ij}={\epsilon }_{{{\rm{hp}}}}\sqrt{{({\epsilon }_{i}{\epsilon }_{j})}^{\alpha }}\), σ = 0.6 nm, ϵrep = 10 kJ mol−1, α = 0.27, ϵhp = 13.0 kJ mol−1 and ϵi, ϵj are residue specific hydrophobicities. The cation-π interactions are represented by:

$${\phi }_{{{\rm{cp}}}}(r)={\epsilon }_{cp,ij}\left[3{\left(\frac{{\sigma }_{cp}}{r}\right)}^{8}-4{\left(\frac{{\sigma }_{cp}}{r}\right)}^{6}\right],$$
(5)

where ϵcp,ij is the energy of the cation-π interaction, and σcp = 0.45 nm is the radius used for cation-π interactions. The parameterisation of ϵcp,ij was undertaken by Jafarinia et al.96. The electrostatic interactions are represented by:

$${\phi }_{el}=\frac{{q}_{i}{q}_{j}}{4\pi {\epsilon }_{0}{\epsilon }_{r}(r)r}{e}^{\left(-\kappa r\right)},$$
(6)
$${\epsilon }_{r}(r)={S}_{s}\left[1-\frac{{r}^{2}}{{z}^{2}}\frac{{e}^{r/z}}{{\left({e}^{r/z}-1\right)}^{2}}\right]$$
(7)

where Ss = 80 and z = 0.25 nm. The Debye screening coefficient is \(\kappa={\left(\frac{{\epsilon }_{0}{\epsilon }_{r}{k}_{b}T}{2{N}_{A}{e}^{2}I}\right)}^{-0.5}\), where ϵ0 is the permittivity of free space, ϵr is the relative permittivity of water, kb is Boltzmann’s constant, NA is Avogadro’s number, e is the elementary charge, T = 300 K and I is the experimental ion concentration (150 mM) unless specified otherwise.
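As a reading aid, equations (4)–(7) and the Debye screening coefficient can be transcribed into a short script. This is a minimal sketch and not the simulation code used in this work. The following are assumptions: the relative permittivity of water in κ is taken as 80 (the text does not state the value used), the Coulomb prefactor 138.935 kJ mol−1 nm per squared elementary charge is the standard unit conversion of e²NA/(4πϵ0), and all function names are hypothetical. Residue-specific hydrophobicities and cation-π energies must be taken from the Supplementary Tables and are not reproduced here.

```python
import math

SIGMA = 0.6       # nm, hydrophobic interaction radius (from the text)
EPS_REP = 10.0    # kJ/mol
EPS_HP = 13.0     # kJ/mol
ALPHA = 0.27
SIGMA_CP = 0.45   # nm, cation-pi radius
S_S = 80.0
Z = 0.25          # nm
COULOMB = 138.935458  # kJ mol^-1 nm e^-2 (standard conversion, assumed here)


def eps_pair(eps_i, eps_j):
    """Pair strength eps_ij = eps_hp * sqrt((eps_i * eps_j) ** alpha)."""
    return EPS_HP * math.sqrt((eps_i * eps_j) ** ALPHA)


def phi_hp(r, eps_ij):
    """Hydrophobic potential, equation (4); r in nm, result in kJ/mol."""
    x = SIGMA / r
    if r <= SIGMA:
        return EPS_REP * x**8 - eps_ij * (4.0 / 3.0 * x**6 - 1.0 / 3.0)
    return (EPS_REP - eps_ij) * x**8


def phi_cp(r, eps_cp):
    """Cation-pi potential, equation (5); minimum of -eps_cp at r = sigma_cp."""
    x = SIGMA_CP / r
    return eps_cp * (3.0 * x**8 - 4.0 * x**6)


def eps_r(r):
    """Distance-dependent dielectric, equation (7)."""
    e = math.exp(r / Z)
    return S_S * (1.0 - (r * r) / (Z * Z) * e / (e - 1.0) ** 2)


def debye_length_nm(ionic_strength_molar, temperature=300.0, eps_water=80.0):
    """Debye length 1/kappa in nm; eps_water = 80 is an assumption."""
    eps0 = 8.8541878128e-12   # F/m
    kb = 1.380649e-23         # J/K
    n_a = 6.02214076e23       # 1/mol
    e = 1.602176634e-19       # C
    ionic = ionic_strength_molar * 1000.0  # mol/L -> mol/m^3
    length_m = math.sqrt(eps0 * eps_water * kb * temperature
                         / (2.0 * n_a * e * e * ionic))
    return length_m * 1.0e9


def phi_el(r, q_i, q_j, ionic_strength_molar=0.150):
    """Screened Coulomb potential, equation (6); charges in units of e."""
    kappa = 1.0 / debye_length_nm(ionic_strength_molar)
    return COULOMB * q_i * q_j / (eps_r(r) * r) * math.exp(-kappa * r)


print("Debye length at 150 mM (nm):", debye_length_nm(0.150))
print("phi_hp at r = sigma, eps_ij = 5:", phi_hp(SIGMA, 5.0))
print("phi_cp at r = sigma_cp, eps = 2:", phi_cp(SIGMA_CP, 2.0))
```

Two checks follow directly from the equations: both branches of equation (4) give ϵrep − ϵij at r = σ, so the hydrophobic potential is continuous there, and equation (5) equals −ϵcp,ij at r = σcp.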

Parameterisation data

Data originally used in the 1BPA parameterisation for FG-Nups are from Yamada et al.92, which contains the hydrodynamic radii (Rh) data for the FG-Nups of yeast. Additional data from the work of Dignon and coworkers for the HPS model94 and the Mpipi model from Joseph et al.93 contain experimental radius of gyration (Rg) information on a broader set of IDPs. Additional data on hnRNPA variants from Bremer et al.55 and human Nups from Kapinos et al.95 were incorporated in the reparameterisation. This information is included in Supplementary Table 8. The number of IDPs with single molecule data now totals 66, with a wider variance in charged residue content.

Updated model parameters in the 1BPA-2.0 and 1BPA-2.1 models

The updates to the basic 1BPA-1.0 model96,97 are included in the following tables detailing the changes made: Supplementary Table 3 for changes to global parameters, Supplementary Table 4 for hydrophobicity changes, Supplementary Table 5 for aromatic residue hydrophobicity changes, Supplementary Table 6 for cation-π interaction changes, and Supplementary Table 7 for the neutral hydrophobic potential pairs. These parameters were found to reduce the mean error in δ(Rg/h), while maintaining performance on the Yamada FG-Nups data92 used in the original parameterisation.

Droplet simulation protocol

To explore the behaviour of the transcription condensates and examine their internal structure, we used droplet formation simulations. We start by forming a condensed phase droplet, which is then inserted into an empty dilute phase. If PS is favoured, this droplet structure should remain stable throughout the subsequent simulation; if it is unstable, the droplet would break up into a dilute phase of monomers. This is done because self-assembly and clustering of individual monomers into phase separated condensates can be a slow process to observe.

Molecules are placed in the initial cubic simulation box (using a random initial conformation), with their centre of mass placed upon a regular grid, with a small buffer region to avoid overlap between molecules. A temperature of 300 K and a timestep of 20 fs are used for all simulations in this work. For equilibration of the droplet, energy minimisation on the initial configuration is used (energy tolerance of 1 kJ mol−1 nm−1), then 50 ns NVT Langevin dynamics simulations (Nosé-Hoover thermostat with τt = 100 ps), followed by 500 ns NPT Langevin dynamics (Nosé-Hoover thermostat with τt = 100 ps and a Berendsen barostat with τp = 10 ps, 1 bar reference pressure and a compressibility of 4.5 × 10−5 bar−1). The end state of the NPT equilibration step is inserted into a new periodic box with a volume chosen to give a total residue density of 80,000 μM, after recentering on the centre of mass and after the molecules have been unwrapped through the previous periodic boundary conditions. A second energy minimisation step is applied in the new simulation box to relax the molecules after expansion of the box (energy tolerance of 1 kJ mol−1 nm−1). A final 3 μs NVT production run (Nosé-Hoover thermostat with τt = 100 ps) is used for data collection. The trajectory is sampled every 5 ns to determine whether convergence was reached (see the following sections for details on molecular connectivity graph creation and convergence testing). Convergence information for a selection of systems is shown in Fig. 3.
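The box volume that corresponds to a total residue density of 80,000 μM follows from a unit conversion, sketched below. The production box is assumed to be cubic (the text only states this for the initial box), the function name `box_edge_nm` is hypothetical, and the example system size is invented for illustration.

```python
AVOGADRO = 6.02214076e23   # 1/mol
NM3_PER_LITRE = 1.0e24     # 1 L = 1 dm^3 = 1e24 nm^3


def box_edge_nm(n_residues, residue_conc_uM=80_000.0):
    """Edge length (nm) of a cubic box giving the target residue density."""
    conc_molar = residue_conc_uM * 1.0e-6          # micromolar -> mol/L
    volume_litre = n_residues / (AVOGADRO * conc_molar)
    volume_nm3 = volume_litre * NM3_PER_LITRE
    return volume_nm3 ** (1.0 / 3.0)


# Hypothetical system: 60 chains of 200 residues each.
n_total = 60 * 200
edge = box_edge_nm(n_total)
print("volume per residue (nm^3):", box_edge_nm(1) ** 3)
print("box edge for", n_total, "residues (nm):", edge)
```

At 80,000 μM each residue is allotted roughly 20.8 nm³, so the box volume scales linearly with the total number of residues in the system.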

Simulation analysis using a molecular connectivity graph

To assess condensate stability in the droplet simulations, a molecular connectivity graph, consisting of nodes and edges, is used to describe the clustering and assess the time evolution of the molecular network during the simulation. Each simulation frame can be described by a connectivity graph. In the graph representation each node corresponds to a unique molecule, with an edge between the nodes denoting the number of non-zero interactions (if zero interactions are present no edge is added). Within an edge information can be stored about the number and type of contacts between molecules, based on residue categorisation. An interaction was determined to be present using a cutoff of 0.7 nm between 1BPA residues. The cutoff of 0.7 nm represents the bead diameter (0.6 nm) plus an additional 0.1 nm buffer to account for thermal fluctuations, as previously explored in Ghavami et al.91. Computation and processing of the contact matrix for a simulation frame allows the creation of the molecule graph. This process is shown in Supplementary Fig. 2.
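The graph construction for a single frame can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: periodic boundary conditions are ignored, a contact is counted when two beads are closer than the 0.7 nm cutoff, and the function names and the toy coordinates are hypothetical. A production analysis would use a neighbour search rather than the all-pairs loop shown here.

```python
import math
from itertools import combinations

CUTOFF_NM = 0.7  # contact cutoff between 1BPA beads (from the text)


def count_contacts(beads_a, beads_b, cutoff=CUTOFF_NM):
    """Number of bead pairs (one bead from each molecule) within the cutoff."""
    return sum(1 for p in beads_a for q in beads_b if math.dist(p, q) < cutoff)


def build_graph(molecules, cutoff=CUTOFF_NM):
    """Build the connectivity graph of one frame.

    molecules: dict mapping a molecule name to a list of (x, y, z) bead
    positions in nm. Returns (nodes, edges), where edges maps a pair of
    molecule names to the number of contacts; pairs with zero contacts
    get no edge.
    """
    nodes = sorted(molecules)
    edges = {}
    for a, b in combinations(nodes, 2):
        n = count_contacts(molecules[a], molecules[b], cutoff)
        if n > 0:
            edges[(a, b)] = n
    return nodes, edges


# Toy frame with three short "molecules" (coordinates invented).
frame = {
    "FUS_0": [(0.0, 0.0, 0.0), (0.38, 0.0, 0.0)],
    "SP1_0": [(0.5, 0.3, 0.0), (3.0, 3.0, 3.0)],
    "TAF15_0": [(10.0, 10.0, 10.0)],
}
nodes, edges = build_graph(frame)
print(nodes)
print(edges)
```

Storing the contact count on each edge keeps the information needed for later readouts; the count could equally be split by residue category, as described in the text.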

Readouts from a molecular connectivity graph

A molecular connectivity graph computed for a simulation provides several readouts about the simulated system. Cluster sizes are found by identifying the disconnected sub-graphs, i.e. groups of nodes with only internal connections (no contacts outside of the cluster). By following the time evolution of the number of clusters, grouped by number of molecules, and of the number of molecules in clusters within specific size categories, it is possible to determine whether the system has reached equilibrium. After convergence is established it is possible to examine the distribution of components within the different clusters. The percentage of species i in a cluster is computed using NiC/(NframesNi), where NiC is the number of molecules of i in a cluster of size C, Nframes is the number of frames analysed and Ni is the number of copies of species i in the simulation.
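The cluster readouts can be illustrated with a short sketch that finds the disconnected sub-graphs with a union-find structure and then evaluates NiC/(NframesNi) per cluster size. It assumes that NiC is accumulated over all analysed frames and that isolated molecules count as clusters of size 1; the function names and input layout are hypothetical.

```python
from collections import Counter


def clusters(n_molecules, edges):
    """Return the connected components (lists of molecule indices)."""
    parent = list(range(n_molecules))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for m in range(n_molecules):
        groups.setdefault(find(m), []).append(m)
    return list(groups.values())


def species_fraction_by_cluster_size(frames, species, target):
    """Evaluate N_iC / (N_frames * N_i) for each cluster size C.

    frames: one iterable of (mol_a, mol_b) edges per frame.
    species: species label for every molecule index.
    """
    n_i = species.count(target)
    counts = Counter()
    for edges in frames:
        for cluster in clusters(len(species), edges):
            counts[len(cluster)] += sum(1 for m in cluster if species[m] == target)
    return {size: c / (len(frames) * n_i) for size, c in counts.items() if c > 0}
```

The returned values are fractions that sum to 1 over all cluster sizes; multiplying by 100 gives percentages.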

Contact map definition

During the system graph creation process, contact matrices are computed for all molecule combinations over all frames. This information is also useful for understanding the specific residues which drive condensation. As such, this information is aggregated during graph computation. Contact maps can be divided into two categories: intramolecular contact maps (interactions between particles in the same molecule copy), corresponding to the diagonal sub-matrices (Iii) in Supplementary Fig. 2, and intermolecular contact maps (interactions between particles of two different molecules), corresponding to the off-diagonal sub-matrices (Iij) in Supplementary Fig. 2. Since individual molecules of the same type are indistinguishable at the macroscale, only unique combinations of molecule types need to be stored in different matrices, such that a single contact map is computed for each intermolecular pairing and for each intramolecular case. These maps contain the data for all copies of the same molecule pairs and for all timeframes studied. The contact maps by particle index (Figs. 3a–c and 4a–c in manuscript and Supplementary Figs. 34-60, parts A, D and G, where D or G are present) display the contact information as a contact probability: the contact information is normalised by the number of frames, Nframes, and by \({({N}_{i}{N}_{j})}^{0.5}\), where Ni and Nj are the number of copies of molecule types i and j in the simulation.
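The normalisation of a contact map by particle index can be written as a short sketch. It assumes that the raw map holds contact counts summed over all frames and over all copies of the molecule pair; the function name is hypothetical.

```python
import math


def contact_probability(raw_counts, n_frames, n_i, n_j):
    """Normalise summed contact counts by N_frames * (N_i * N_j) ** 0.5.

    raw_counts: nested list, rows indexed by particle of molecule type i,
    columns indexed by particle of molecule type j.
    For an intramolecular map, pass n_j = n_i.
    """
    norm = n_frames * math.sqrt(n_i * n_j)
    return [[count / norm for count in row] for row in raw_counts]


# Hypothetical 2 x 2 map from 2 frames with 4 copies of each molecule type
probabilities = contact_probability([[8, 0], [4, 2]], n_frames=2, n_i=4, n_j=4)
```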

When the contact information is presented based on residue type in a molecule (Figs. 4d–f and 5d–f in manuscript and Supplementary Figs. 34-60, parts B, E and H, where E or H are present), a summation over the information in the contact maps by particle index is performed to group interactions by residue type (equivalent to a matrix reduction operation). The presentation of normalised residue contact maps (Supplementary Figs. 34-60, parts C, F and I, where F or I are present) applies a second normalisation step to the contact map by residue type of \({({N}_{i,r1}{N}_{j,r2})}^{0.5}\), with Ni,r1 the number of residues of type r1 in molecule i and Nj,r2 the number of residues of type r2 in molecule j.
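The reduction from particle index to residue type, and the second normalisation, can be sketched as below. The sequences are given as lists of residue-type labels, one per particle; the function names and the example labels are hypothetical.

```python
import math
from collections import defaultdict


def reduce_by_residue_type(contact_map, types_i, types_j):
    """Sum a particle-index contact map into (type_r1, type_r2) bins.

    types_i / types_j: residue type of each row / column particle.
    """
    reduced = defaultdict(float)
    for a, row in enumerate(contact_map):
        for b, value in enumerate(row):
            reduced[(types_i[a], types_j[b])] += value
    return dict(reduced)


def normalise_by_residue_count(reduced, types_i, types_j):
    """Divide each bin by (N_i,r1 * N_j,r2) ** 0.5."""
    return {
        (r1, r2): value / math.sqrt(types_i.count(r1) * types_j.count(r2))
        for (r1, r2), value in reduced.items()
    }


# Hypothetical two-residue molecules: i = Y-Y, j = R-G
by_type = reduce_by_residue_type([[0.2, 0.1], [0.4, 0.3]], ["Y", "Y"], ["R", "G"])
normalised = normalise_by_residue_count(by_type, ["Y", "Y"], ["R", "G"])
```

The second step removes the bias towards residue types that simply occur more often in the two sequences.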

Interaction summary information

Data on the distribution of interactions between different interaction types is also extracted from the contact maps for a system. The contact data is aggregated by summation over the contact maps and grouped into five categories of attractive interactions: aromatic, aliphatic, aliphatic-aromatic, cation-π and electrostatic. To compute the fraction of interactions, Fint, between two species (Figs. 3d–f and 4d–f in manuscript), the number of interactions in each category is divided by the total number of interactions between the species. The fraction of interactions, Fint, for a system (Supplementary Figs. 3 and 4) is the sum of the interactions from all contact maps (intramolecular and intermolecular) in the system divided by the total number of all interactions in the system. The theoretical upper bound of the number of interactions that are possible between two classes of residue types is NiNj, where Ni and Nj are the number of residues of type i and j, respectively, in the simulation. By normalising the number of actual interactions by the upper bound NiNj it is possible to compute the mean interaction propensity for a ternary mixture. This was done for FUS-SP1-TAF15 mixtures to generate the data in Supplementary Fig. 5. From the mean interaction propensities in the table in Supplementary Fig. 5A, we compute the fraction of interactions expected for any FUS-SP1-TAF15 stoichiometry by multiplying the interaction propensities in the table of Supplementary Fig. 5A by the number of residues of each type in the composition, as shown in Supplementary Fig. 5B-F.
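As a small illustration of the Fint calculation, the sketch below turns per-category interaction counts into fractions. The category names follow the five classes listed above; the counts in the example are invented for illustration only and the function name is hypothetical.

```python
CATEGORIES = ("aromatic", "aliphatic", "aliphatic-aromatic", "cation-pi", "electrostatic")


def interaction_fractions(counts):
    """Return F_int per category: count / total over all categories.

    counts: {category: number of interactions}; missing categories count as 0.
    """
    total = sum(counts.get(category, 0) for category in CATEGORIES)
    if total == 0:
        return {category: 0.0 for category in CATEGORIES}
    return {category: counts.get(category, 0) / total for category in CATEGORIES}


# Invented counts, for illustration only
f_int = interaction_fractions({"aromatic": 30, "cation-pi": 50, "electrostatic": 20})
```

The same function applies to a single pair of species (counts from one contact map) or to a whole system (counts summed over all intramolecular and intermolecular maps).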

The propensity for a specific interaction type is the sum of the interactions from all contact maps (intramolecular and intermolecular) in the system divided by the theoretical upper limit for the number of interactions, NiNj, where Ni and Nj are the number of residues of type i and j that form the interaction. The mean propensity (Fig. 7a) is the average propensity from all systems at 150 mM ion concentration (34 compositions). To generate the interaction triangles displaying Fint for all compositions (Fig. 7b–f), the mean propensity in Fig. 7a is multiplied by NiNj in the system (to get the expected number of interactions for the given composition) and divided by the expected total number of interactions (0.026NtotNtot).
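The two steps described here, computing a propensity and converting a mean propensity back into an expected Fint, can be sketched as follows. Ntot is taken, as an assumption, to be the total number of residues in the system, since it is not defined explicitly in the text; the factor 0.026 is the value quoted above. Function names and example numbers are hypothetical.

```python
def propensity(n_interactions, n_i, n_j):
    """Observed interactions divided by the upper bound N_i * N_j."""
    return n_interactions / (n_i * n_j)


def expected_fraction(mean_propensity, n_i, n_j, n_tot, total_factor=0.026):
    """Expected F_int for one interaction type in a given composition.

    Assumption: n_tot is the total number of residues in the system, so the
    expected total number of interactions is total_factor * n_tot ** 2.
    """
    expected_count = mean_propensity * n_i * n_j
    expected_total = total_factor * n_tot * n_tot
    return expected_count / expected_total


# Invented numbers, for illustration only
p = propensity(50, 10, 20)
f = expected_fraction(0.013, n_i=100, n_j=200, n_tot=1000)
```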

Intermolecular contact propensity between two molecules of different types (in Fig. 6, Supplementary Figs. 6-8, 11-16 and Supplementary Tables 9-19) is computed by summation of the interactions in the corresponding intermolecular contact map for the species involved. To enable comparison between different compositions, normalisation by the theoretical upper bound NiLiNjLj is used, where Ni and Nj are the number of copies of molecule types i and j, and Li and Lj are the number of simulation residues per molecule of i and j, respectively.
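A minimal sketch of this normalisation is given below. The text does not state whether the contact map is additionally averaged over frames, so the function simply sums whatever map it is given; the function name and the example values are hypothetical.

```python
def intermolecular_contact_propensity(contact_map, n_i, len_i, n_j, len_j):
    """Sum an intermolecular contact map and divide by N_i * L_i * N_j * L_j.

    contact_map: nested list with len_i rows and len_j columns.
    n_i, n_j: number of copies of molecule types i and j.
    len_i, len_j: number of simulation residues per molecule of i and j.
    """
    total_contacts = sum(sum(row) for row in contact_map)
    return total_contacts / (n_i * len_i * n_j * len_j)


# Invented 2 x 2 map for 2 copies of molecule i and 4 copies of molecule j
value = intermolecular_contact_propensity([[3, 1], [0, 4]], n_i=2, len_i=2, n_j=4, len_j=2)
```

Dividing by NiLiNjLj, the number of residue pairs that could in principle be in contact, makes values from systems with different copy numbers and chain lengths directly comparable.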

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.