Solution-state methyl NMR spectroscopy of large non-deuterated proteins enabled by deep neural networks

Karunanithy, Gogulan; Shukla, Vaibhav Kumar; Hansen, D. Flemming

doi:10.1038/s41467-024-49378-8

Download PDF

Article
Open access
Published: 13 June 2024

Solution-state methyl NMR spectroscopy of large non-deuterated proteins enabled by deep neural networks

Nature Communications volume 15, Article number: 5073 (2024) Cite this article

6414 Accesses
10 Citations
22 Altmetric
Metrics details

Subjects

Abstract

Methyl-TROSY nuclear magnetic resonance (NMR) spectroscopy is a powerful technique for characterising large biomolecules in solution. However, preparing samples for these experiments is demanding and entails deuteration, limiting its use. Here we demonstrate that NMR spectra recorded on protonated, uniformly ¹³C labelled samples can be processed using deep neural networks to yield spectra that are of similar quality to typical deuterated methyl-TROSY spectra, potentially providing information for proteins that cannot be produced in bacterial systems. We validate the methodology experimentally on three proteins with molecular weights in the range 42–360 kDa. We further demonstrate the applicability of our methodology to 3D NOESY spectra of Escherichia coli Malate Synthase G (81 kDa), where observed NOE cross-peaks are in good agreement with the available structure. The method represents an advance in the field of using deep learning to analyse complex magnetic resonance data and could have an impact on the study of large biomolecules in years to come.

Methyl probes in proteins for determining ligand binding mode in weak protein–ligand complexes

Article Open access 04 July 2022

The 100-protein NMR spectra dataset: A resource for biomolecular NMR data analysis

Article Open access 04 January 2024

Characterizing proteins in a native bacterial environment using solid-state NMR spectroscopy

Article 13 January 2021

Introduction

Nuclear Magnetic Resonance (NMR) spectroscopy is a ubiquitous technique in material science, chemistry, structural biology, and clinical diagnosis. In the biosciences, NMR provides unprecedented insight into functional motions and non-covalent interactions with atomic-level resolution. However, NMR is notoriously insensitive, so maximising resolution and sensitivity is a perpetual challenge within all areas of NMR spectroscopy. Nuclear spin-relaxation, the process by which equilibrium magnetisation is restored and detectable NMR signal is lost, scales rapidly with molecular size, making it challenging to study large biomolecular systems by solution-state NMR. This has meant that individual NMR experiments are traditionally associated with size limits, above which most signals are broadened beyond detection.

Over many decades, a series of developments have raised the size-limits of detection for biomolecular NMR applications, combining advances in hardware, sample preparation and pulse sequence development. The introduction of methyl-TROSY methods¹, wherein methyl-bearing side chains are used to probe biomolecular structure and dynamics, provided a step-change in molecular weight limitations for solution-state biomolecular NMR and now routinely allow applications to systems of several hundreds of kDa. Using these techniques makes it possible to study systems up to the megadalton molecular weight range. A key requirement, however, for attaining high quality methyl-TROSY spectra is that the protein should be prepared with a very high level of deuteration. Consequently, in practice, for methyl-TROSY NMR studies, the proteins produced are completely deuterated with the exception of [¹H, ¹³C] labelled methyl moieties in specific side chains, typically those in isoleucine, leucine, methionine, and valine. There are now well-established protocols to introduce such labels, which rely on the addition of specifically labelled precursor compounds to minimal and deuterated media^2,3. However, this uniform deuteration has several disadvantages, including extra costs and typically lower yields of expressed protein. Furthermore, deuteration is not possible for many systems of considerable biological interest, including proteins that can only be expressed in mammalian systems. As such, the ability to obtain high-quality ¹³C-¹H correlation maps from uniformly ¹³C labelled protonated proteins is highly desirable. The uniform labelling is easier, cheaper, and gives access to peaks associated with all methyl-bearing side chains rather than just those where the appropriate precursor has been added during protein expression. Importantly, such a method would also avoid the need for deuteration and pave the way for characterisations of large proteins that can only be expressed in mammalian systems.

In this era of burgeoning applications and developments in AI, from computational structural biology^4,5 to sophisticated large language models⁶, it is natural to look for solutions within this field for the challenges encountered in characterising large proteins. In this context, we and others have recently demonstrated that deep neural networks (DNNs) can be trained to accurately transform^7,8,9 and analyse^10,11,12 complex NMR data. The most recent applications use supervised deep learning, where a DNN is supplied with an input and a target training dataset and through a training process the DNN attempts to determine the mapping between the two. Typically, this training requires very large amounts of training data, but importantly, as has been noted in several prior studies^8,13,14, it is possible to simulate an arbitrary amount of realistic training data for magnetic resonance based supervised deep learning, avoiding a significant potential data bottleneck. Deep learning methods have now been successfully applied to several tasks in magnetic resonance spectroscopy including the analysis of DEER data¹³, reconstruction of non-uniformly sampled spectra^7,9,14, peak-picking^11,15, and virtual homonuclear decoupling⁸.

In this study, we demonstrate that deep neural networks (DNNs) can be used, in conjunction with the traditional Fourier transform¹⁶, to deliver very high-quality ¹³C-¹H correlation spectra from uniformly ¹³C protonated samples, including large proteins whose size limits have traditionally rendered them inaccessible to NMR. The DNNs presented below are trained to map the broad ¹³C-¹H spectra of uniformly labelled protonated samples to spectra that are akin to classical methyl-TROSY spectra. This is achieved by removing the effect of one bond ¹³C-¹³C scalar couplings and increasing the resolution of both the ¹H and ¹³C dimensions by effectively sharpening the observed cross-peaks (Fig. 1). We robustly assess the trained DNNs on synthetic data and show the applicability of the trained DNNs on experimental data for proteins with increasing size: HDAC8 (42 kDa), MSG, (81 kDa), and α7α7 (360 kDa). Finally, we extend the method to obtain 3D Methyl NOESY NMR spectra of MSG, which can aid in chemical shift assignments and/or structural characterisations.

**Fig. 1: Overview of processing NMR spectra with FID-Net.**

Results

Attempting to obtain high-quality ¹³C-¹H correlation maps on large proteins using classical approaches, such as ¹³C-¹H HSQC spectra, is hindered by several factors. Firstly, since the proteins are uniformly ¹³C labelled they will be subject to one-bond ¹³C-¹³C scalar couplings that will evolve during indirect chemical shift evolutions, and split signals into multiplets and thus complicate interpretation of the spectrum. Of perhaps even greater significance, is the lack of deuteration in the system, which will lead to substantial line-broadenings in both the ¹³C and ¹H dimensions as a result of significantly increased dipolar relaxation. Consequently, peaks in the spectra will be very difficult to identify and difficult to assign to specific sites in the protein, making the spectra challenging to interpret and limiting the utility of such a labelling scheme. Other tools such as constant-time ¹³C-¹H HSQC spectra^17,18 also do not provide high-quality spectra of large uniformly labelled proteins, since the constant-time substantially skews the intensities and even renders many signals invisible. However, due to the inherent sensitivity of methyl groups, ¹³C-¹H correlation maps of protonated large proteins nonetheless contain a significant amount of information from many of the methyl groups present. The challenge is that these spectra are difficult to interpret, even by specialists, due to the poor resolution, see e.g. Fig. 1b.

Training and assessment using synthetic data

In order to transform ¹³C-¹H correlation maps from universally ¹³C labelled proteins into spectra that can easily be interpreted, we train two DNNs, both of which are based on the FID-Net architecture⁷. Briefly, the first network is trained to transform time-domain FIDs in the ¹³C dimension by removing a single cosine modulation corresponding to a ¹³C-¹³C coupling constant and reducing the decay rate of the peak such that it gives a sharper signal in frequency domain. The second DNN is trained to act on FIDs in the ¹H dimension. In this case, the network is trained only to reduce the decay rate of FIDs so that peaks are sharper in the frequency domain of this dimension. Both networks are trained independently solely on synthetic data (full parameters provided in the supplementary information).

For transforming a full ¹³C-¹H 2D plane of a uniformly labelled protein the workflow is as follows: the input 2D plane is first processed and Fourier transformed in the ¹H dimension before being transposed. This half-processed spectrum is then passed to the first DNN where the signal modulation due to one-bond ¹³C-¹³C couplings is removed and the signals are sharpened. Subsequently, the ¹³C dimension is processed and Fourier transformed as normal. The spectrum is then transposed back to the ¹H dimension, inverse Fourier transformed, and Hilbert transformed. The resulting time-domain data is passed to the second DNN to sharpen signals in the ¹H dimension. This ¹H dimension is then reprocessed, and Fourier transformed to yield the final frequency-domain spectrum.

In order to test and benchmark this approach, it was first applied to synthetic data. Rather than using randomly generated data, as is done in the training of the networks, we aimed to benchmark performance on synthetic data that were nonetheless reminiscent of actual ¹³C-¹H correlation maps of proteins. In order to do this, synthetic spectra were made using chemical shifts expected for real systems as sampled from the BMRB¹⁹. Using this approach, we generate one hundred synthetic spectra with a comparable number of peaks to the 42 kDa protein HDAC8²⁰, and one hundred synthetic spectra with a comparable number of peaks to MSG²¹ (see ref. ²² for data availability). These spectra contain all expected ¹³C-¹³C couplings as well as broad peaks as expected for large, protonated proteins. Given that these spectra are synthetically generated, we can also generate an idealised target spectrum in which all ¹³C-¹³C scalar couplings are removed, the linewidths are narrowed, and the positions of all peaks within this target spectrum are known. In order to test the performance of the DNNs, we transform the original synthetic spectra using our two trained DNNs. We then pick peaks in the resulting transformed spectra and compare the results against the known peak positions by quantifying the rate of true positives, false positives, and false negatives. In order to avoid any influence of the accuracy of the peak-picking algorithm used, which can vary, we only considered peaks that are isolated in the processed spectra (with a distance larger than processed linewidths), and those that are doublets in the original spectrum.

As shown in Fig. 2, for synthetic spectra, the FID-Net processing appears successful at significantly enhancing the resolution of spectra expected for uniformly ¹³C-labelled proteins, even when there are a number of heavily overlapping signals. Based on the high levels of true positive peaks and low levels of false positive peaks observed in the synthetic data, we proceeded to test the FID-Net based processing on experimental data, where the proteins are uniformly ¹³C labelled in the absence of deuteration.

**Fig. 2: Assessment using synthetic data.**

Application to experimental 2D ¹³C-¹H correlation spectra

To test the feasibility of our approach to transform experimental ¹³C-¹H correlation spectra of uniformly ¹³C labelled proteins, the method was applied to increasingly larger proteins, demonstrating the ability of the FID-Net approach to provide high quality correlation maps, similar in quality to methyl-TROSY spectra. In addition to higher expression yields, a key advantage of the methodology here is that information is provided on all methyl-bearing side chains, including, alanine ¹³C^β, isoleucine ¹³C^γ2 and ¹³C^δ1, leucine ¹³C^δ1,δ2, methionine ¹³C^ε, threonine ¹³C^γ2, and valine ¹³C^γ1,γ2. While methods do exist for specific labelling of nearly all methyl groups²³, these approaches typically lead to reduced yields and come with higher costs.

Following benchmarking on synthetic data, the trained FID-Net networks were first applied to the 42 kDa protein HDAC8²⁰. While a protein of this size is relatively small for methyl-TROSY studies, as shown in the ¹³C-¹H correlation spectrum in Fig. 3a (blue-green) of a non-deuterated, uniformly ¹³C labelled sample, signals in the methyl region of the spectrum are nonetheless broad and overlapped, making many of them difficult to discern. This also holds for a constant-time ¹³C-¹H HSQC spectrum, where the constant-time (27 ms or 54 ms) substantially skews intensities and renders many of the signals invisible, see Supplementary Fig. 5. Conversely, following application of the FID-Net Networks (orange spectrum), Fig. 3b, the signals are virtually decoupled in the ¹³C dimension and sharpened in both the ¹³C and ¹H dimensions. This dramatically simplifies peak identification. By overlaying the FID-Net transformed spectrum with a classical methyl-TROSY spectrum of an ILVM specifically labelled sample of HDAC8 (blue), Fig. 3c, where only the side chains of these amino acids are labelled, an excellent correspondence is seen between isoleucine, leucine, and valine methyl peaks. The linewidths of the peaks in both of these spectra are highly comparable and all expected peaks from the methyl-TROSY spectrum are recovered in the FID-Net processed spectrum of the uniformly labelled sample. Additional peaks are also visible in the FID-Net processed spectrum, due to the presence of additional labelled methyl groups, including, threonine and isoleucine ¹³C^γ2. Small peak-shifts are mainly due to the isotope shifts originating from deuteration²⁴. A full overlay of the FID-Net processed spectrum of HDAC8 and the methyl-TROSY spectrum is shown in Supplementary Fig. 1. All previously assigned methyl cross-peaks for HDAC8²⁵ were present in the FID-Net processed spectra of HDAC8.

**Fig. 3: FID-Net processed methyl HSQC spectra of uniformly ¹³C labelled non-deuterated proteins.**

To test the robustness of the FID-Net processing approach on larger systems, with substantially more cross-peaks and signal overlap, we next applied the FID-Net DNNs to study the methyl region of the protein MSG. This 723-residue protein has been studied extensively by NMR^21,26, but all of these studies have required deuteration to minimise the broadening of signals due to extensive relaxation. However, as shown in Fig. 3e, by coupling the intrinsic sensitivity of methyl groups with FID-Net processing, it is possible to obtain high-quality methyl-TROSY like spectra for this system at a lower cost and with the added bonus of signals associated with all methyl bearing side chains. As with the HDAC8 example above, clear agreement between the expected peaks in the ILV spectrum and FID-Net processed spectrum is attained and the linewidths in these two spectra are also similar. A full overlay of the FID-Net processed spectrum of MSG and the methyl-TROSY spectrum is shown in Supplementary Fig. 2. The FID-Net processed spectrum of MSG also agrees with the previously published chemical shift assignments for ¹³C^γ2 methyl groups of Ile and Thr, ¹³C^ε of Met, and ¹³C^β of Ala^27,28,29, with the exception of ¹³C^β of A633, which was not observed in the FID-Net processed spectra.

To push the limits of the proposed method we tested its performance on the 360 kDa α7α7 (half-proteasome) from T. acidophilum. This protein complex has an effective rotational correlation time of approximately 120 ns at 50 °C³⁰. The high degree of symmetry in the complex (composed of 14 monomeric units that form two heptameric rings) means that there are relatively few peaks in its spectra compared to its size, see Fig. 4. A full overlay of the FID-Net processed spectrum of α7α7 and the methyl-TROSY spectrum is shown in Supplementary Fig. 3. In the case of the 360 kDa α7α7 complex, it is clear that a number of peaks present in the deuterated, ILV labelled sample are fairly weak in the FID-Net decoupled spectrum (marked with an asterisk in Fig. 4), this is particularly evident in the isoleucine δ₁ region of the spectrum, and we therefore judge that currently systems such as the α7α7 complex are at the limit of our approach. Additional peaks in the spectrum are also clearly visible due to the presence of alanine, threonine, methionine and isoleucine ¹³C^γ2 methyl resonances.

**Fig. 4: Application to a 360 kDa complex.**

While for the proteins shown above the isoleucine ¹³C^δ1, leucine ¹³C^δ1,δ2 and valine ¹³C^γ1,γ2 methyl group resonances can be readily compared to those obtained in ILV-labelled samples using traditional methyl-TROSY spectra, peaks associated with alanine, threonine, methionine, and isoleucine ¹³C^γ2 methyl groups are less readily available. To verify the reliability of these additional peaks observed in uniformly ¹³C-labelled samples and demonstrate the extension of the methodology to 3D spectra, we recorded a ¹³C-¹³C-¹H NOESY spectra of MSG and, as demonstrated below, use this methodology to provide assignments for threonine, methionine, and isoleucine side chains.

Applications to three-dimensional NOESY spectra

Following the successful implementation of deep neural networks for the production of methyl spectra of similar quality to typical deuterated methyl-TROSY spectra just from uniformly ¹³C labelled samples in a protonated background, we applied the methodology to a 3D ¹³C-HSQC-NOESY-HSQC experiment acquired on a uniformly ¹³C labelled sample of MSG made in (non-deuterated) water. Despite the high molecular weight of MSG, we observed NOE cross-peaks among the inter methyl protons that were within a distance of 3.0 Å to 5.0 Å of each other (Figs. 5a–d and Supplementary Fig. 4).

**Fig. 5: NOESY spectra of non-deuterated 80 kDa MSG.**

In total, 292 NOE cross-peaks were observed among 170 methyl-bearing residues from different regions of the protein. Furthermore, like conventional NOESY spectra, we observed a correlation between the NOE cross-peaks volumes (V ) and the distance between the proton pairs (r), i.e. V ∝ 1/r ⁶, Fig. 5f. Using this experiment, we could easily characterise the contact between two methyls, for example, ¹³C^δ1 and ¹³C^γ2 of isoleucines as shown in Supplementary Fig. 4a. Similarly, the two geminal methyl resonances of leucine and valine can be linked using this spectrum (Fig. 5d). However, a combination of 3D ¹³C-HSQC-NOESY-HSQC and 3D HMBC-HMQC experiments would be the best method to link the geminal methyl resonances of leucine and valine without any ambiguity³¹. Additionally, using this approach, NOEs can be observed between methyl protons of all methyl-bearing residues (isoleucine, leucine, valine, methionine, alanine, and threonine) using one sample, which is not possible in the conventional method due to metabolic scrambling of amino acids in selective ¹³C labelling³². Therefore, the DNNs can be utilised to produce a 3D ¹³C-HSQC-NOESY-HSQC experiment acquired on a uniformly ¹³C labelled sample without deuteration, resulting in a spectrum of similar quality to a 3D ¹³C-HMQC-NOESY-HMQC experiment acquired on a specifically methyl labelled sample. This approach can be generally applied to large proteins and complexes without deuteration.

Discussion

The ability to characterise the regulation, interactions, and dynamics of large proteins in solution is paramount to understanding their molecular functions. The methyl-TROSY methodology is one of the most important developments in biomolecular NMR over the last decades and these methods have truly paved the way for NMR to offer key insights on larger biomolecular complexes, complementing other structural approaches including cryo-electron microscopy and AI-based predictions³³. However, the labelling requirements for such NMR experiments are demanding, ideally requiring perdeuteration and the use of specific precursors to introduce [¹H,¹³C]-labelled methyl moieties of specific locations. While the resulting spectra are of high-quality, the cost of such labelling is higher, typically leads to lower protein yields, and is inconsistent with protein production methods for many systems of interest such as eukaryotic and membrane proteins. Above, we presented an alternative method to classical methyl-TROSY NMR for characterising large proteins in solution, which is based on uniformly protonated, ¹³C-labelled samples and processing with FID-Net neural networks. With the approach one can characterise proteins up to about 350 kDa. The size limitation for FID-Net based processing arises because it is reliant on some signal still being present in the spectra and above this size most signals, even those associated with methyl moieties, are broadened beyond detection in non-deuterated systems.

The main disadvantage of the FID-Net method is that the process of peak-sharpening inevitably leads to a loss of the intrinsic shape of the cross-peaks, including the peak height. Accurately measuring peak intensities is critical in a number of NMR experiments, including diffusion and relaxation, so it is not advised to record these experiments in conjunction with the presented FID-Net processing nor is it advised to perform line-shape analyses. However, for a large body of NMR experiments, the main parameter of importance is the chemical shift as well as a reasonable estimate of the peak intensity, and in these cases we believe that FID-Net processing will prove useful. Such studies include classical chemical shift perturbation studies, such as, changes in chemical shifts upon addition of an interacting partner or changes related to varying conditions, e.g. pH. As we have shown above, the obtained intensities are of a sufficient quality such that the transformed spectra facilitate facile chemical shift assignment of methyl peaks by either NOESY spectra (as demonstrated here for MSG) or by point mutations, which often requires several samples. We are currently in the process of developing and training a DNN that is able to enhance the resolution of ¹³C-¹H spectra and provide quantitative mappings, which can be used for downstream analyses, e.g. relaxation experiments. This work remains ongoing.

A number of approaches have previously been suggested to overcome the limitations of methyl-TROSY highlighted above, particularly when making perdeuterated samples is not possible. Recent examples include the use of delayed decoupling, as has also been used for very large complexes with molecular weights in the MDa range³⁴, optimised NMR pulse sequences to probe methionine residues in proteins with molecular weights up to 240 kDa³², and the use of local deuteration of leucine residues to probe their methyl groups in membrane and insect cell derived proteins³⁵. While very powerful, these methods are limited in that they only consider a single residue type, thus restricting the number of available probes in the system. Conversely, the methodology developed here offers simultaneous access to all methyl bearing side chains in a protein, offering many more probes of biomolecular behaviour. By decoupling signals in the ¹³C dimension and sharpening them in both the ¹H and ¹³C dimensions, the resulting spectra resemble those given by perdeuterated samples with specific methyl labelling.

We believe that the FID-Net methodology presented here will significantly lower the barrier to entry for NMR of large systems. While methods exist for obtaining methionine and threonine assignments for well-studied systems such as MSG, processing with the FID-Net DNNs provides a straightforward approach, which does not require perdeuteration and offers insight into all methyl-bearing residues. We envisage that the idea of using DNNs for peak sharpening and simultaneous homonuclear virtual decoupling within NMR could be applied in other cases to improve spectra and that processing NMR data with DNNs will facilitate additional ventures within NMR. As such we see the presented method as paving the way for a plethora of ways for generally analysing and transforming NMR spectra with deep neural networks to push the capabilities and limits of NMR.

Methods

Initial considerations about the neural networks

In the present study our aim was to develop DNNs to map ¹³C-¹H methyl correlation spectra of large uniformly ¹³C-labelled proteins into spectra that are similar to methyl-TROSY spectra of highly deuterated proteins. Two objectives must be fulfilled to achieve this aim: (i) the one-bond ¹³C-¹³C homonuclear scalar couplings associated with methyl groups must be decoupled and (ii) the peaks must be sharpened, making them more easily resolvable, equivalent to slowing down the exponential decay of magnetisation in the time domain. It should be noted that these changes do not increase the information content in the spectrum, but they do make the information contained within the original spectra more easily interpretable by spectroscopists.

We employ the FID-Net architecture that we have previously shown to successfully perform a number transformations on time domain NMR data, including reconstructing non-uniformly sampled spectra^7,9,14 and homonuclear virtual decoupling⁸. The FID-Net architecture currently only transforms a set of 1D spectra, and two separate FID-Net DNNs were therefore trained: one was optimised for spectral parameters typically encountered in the ¹³C dimension and trained to both decouple and sharpen signals, whereas the second FID-Net DNN was optimised for spectral parameters in the ¹H dimension alone and was trained to only sharpen signals. A schematic illustration and summary of the effects of the neural networks is provided in Fig. 1b. In both cases the networks are trained and validated exclusively on synthetic data and then tested on experimentally acquired data.

Care must be taken when determining how and to what extent signals should be sharpened using DNNs. For example, signals from flexible regions of proteins that already give rise to sharp peaks could result in the presence of truncation artefacts in the spectrum. On the other hand, broad signals require significant attenuation of their relaxation to be clearly resolvable in the frequency domain. The broader the signal, the more attenuation of relaxation that is therefore desirable. To satisfy these requirements the following function is used for input ${R}_{2}^{{{{{{\rm{in}}}}}}}$ and target ${R}_{2}^{{{{{{\rm{tar}}}}}}}$ transverse relaxation rates in the training data:

$${R}_{2}^{{{{{{\rm{tar}}}}}}}=\max \left({R}_{2}^{\max }\tanh \left(\frac{{R}_{2}^{{{{{{\rm{in}}}}}}}}{{R}_{2}^{\max }}\right),\, {R}_{2}^{\max }\left(1-\tanh \left(\frac{{R}_{2}^{{{{{{\rm{in}}}}}}}}{{R}_{2}^{\max }}\right)\right)\right)$$

(1)

The effect of this is to make the linewidths in the target spectrum relatively uniform similar to methyl-TROSY spectra, where for the DNN relaxation rates above ${R}_{2}^{\max }$ are scaled down towards it, while those below ${R}_{2}^{\max }$ are scaled up towards it. A value of ${R}_{2}^{\max }=25\,{{{{{{\rm{s}}}}}}}^{-1}$ was chosen so that target spectra have linewidths similar to those observed in relatively structured residues in an ILV sample of a medium-to-large deuterated protein. Further details of the neural network architecture, training data parameters, and training procedures are provided in the supplementary information.

Once trained, the neural networks can easily be applied as part of processing scripts, examples of which are provided in the supplementary information. The DNNs are trained on a diverse range of NMR parameters (Supplementary Tables 1 and 2) and thus can be used without further retraining. The approach can be used with standard ¹H-¹³C HSQC or HMQC pulse sequences (vide infra).

The network architecture and training

Two networks were trained for the study: one for removing ¹³C-¹³C couplings and sharpening spectra in the ¹³C dimension and the second solely for sharpening spectra in the ¹H dimension (as described above). Both networks used the previously described FID-Net architecture. The input size for the ¹³C network is 1024 × 4 and for the ¹H network is 512 × 4. As was the case with previous FID-Net architectures, the networks consist of a series of stacked residual units, wherein each residual unit consists of dilated convolutional layers with kernel size 8 × 4. The filters are activated by either sigmoidal (50%) or tangent (50%) functions. The results of the activations are then multiplied and passed through another convolutional layer with kernel size 8 × 4. The output from each layer is combined to give the final output and also added to the input for the residual unit to form the input for the next layer. For the ¹³C network, the dilations employed are cycled through the values: 1, 2, 4, 6, 8, 10, 12, 14, 16, 20, 24, 28, 32, 40, 48, 56, 64, and there are 128 filters for each convolutional layer. For the ¹H network the dilations employed are 1, 2, 4, 6, 8, 10, 12, 14, 16, 20, 24, 28, 32, and there are 64 filters per convolutional layer.

For each network 500,000 test planes were created for training and 50,000 for testing using the parameters given in Supplementary Tables 1 and 2. The models were developed and trained using the Tensorflow library³⁶ with the Keras-front end³⁷. The cost function used to train the networks is the mean squared error in the frequency domain between the spectrum produced by the DNN and the target spectrum wherein the linewidth of peaks are set according to the R₂ scaling described above and for the ¹³C network the scalar coupling is removed. The RMSprop optimizer³⁸ was used in training. For both networks the learning rate was initially set to 10⁻⁴ until the validation loss value plateaued and was then reduced to 10⁻⁵ until it plateaued again where training was then ended.

Benchmarking using synthetic data

Once trained, to validate the performance of the two networks we use synthetically generated spectra. Rather than using arbitrary spectra as was done for training the networks, we attempt to generate realistic ¹³C-¹H correlation maps for uniformly labelled ¹³C proteins using chemical shift statistics from the BMRB¹⁹. In addition to containing terminal ¹³C moieties that give rise to doublets in the spectra as a result of a single ¹³C-¹³C scalar coupling these spectra also contain multiplets due to moieties that have multiple ¹³C-¹³C scalar couplings (though these are usually at higher ¹H frequencies as is observed in real spectra).

We generate 200 synthetic spectra in total: the first 100 spectra were chosen to have features similar to a ¹³C-¹H spectra of a protein with a similar size to HDAC8 whilst the second 100 were chosen to have features similar to a protein with a similar size to MSG. Example spectra are shown in Figs. 2a and 2b. The parameters used to generate the synthetic spectra for benchmarking are detailed in Table 1.

Table 1 Parameters used for benchmarking^a

Full size table

Once the synthetic spectra are made, they are processed using the deep neural network pipeline. Visual inspection of the transformed spectra suggests that the method is effective at decoupling and sharpening spectra, such that they can be interpreted more easily. We also note that where multiple ¹³C-¹³C couplings are present, the network will remove just one of the couplings such that triplets become doublets. To evaluate the performance of the pipeline quantitatively, the transformed spectra are peak picked using the built-in peak picker in nmrPipe. The results are compared to the known ground-truth values for peak positions (assuming no couplings were present in the spectra).

We focus on two key parameters in this evaluation: the number of true positive picked peaks and the number of false positive picked peaks. Given the difficulty of picking peaks from crowded regions automatically, and that our aim is to evaluate the performance of our alternative pipeline for analysing spectra we focus on isolated peaks where the performance of the peak picker is robust. Here, we define an isolated peak as any ground truth peak where the minimum distance from any other ground truth peak is greater than or equal to 0.06 ¹H ppm (0.24 ¹³C ppm). Furthermore, we ignore peaks that are part of doublets and that are more than 1.50 ¹H ppm from a target peak originating from a methyl moiety. Once peaks in the FID-Net processed spectrum have been picked, they are matched with the closest peak in the target peak list. Each peak can only be matched with a single target peak and we match in descending order of distance between picked peaks and target peaks until there are no picked peaks within the minimum distance to a target peak (set at 0.03 ¹H ppm) or there are no picked nor target peaks left. This process is then repeated for all synthetic spectra.

The true positive rate is defined as the percentage of peaks where we see a clear correspondence between a picked and target peak. The false positive rate is defined as the rate at which a peak identified in the FID-Net processed spectrum does not correspond with any peaks in the actual spectrum. Given the limitations of peak pickers, the true and false positive rates here likely represent a lower bound on the performance of the FID-Net processing pipeline.

General isotopic labelling

In this study, two different types of labelled protein samples were used for acquisition of the NMR data: (1) uniformly ¹³C,¹⁵N labelled sample made in ¹H₂O. (2) methyl Ile-¹³C^δ1,¹H^δ1, Leu-¹³C^δ1, ¹H^δ1/¹³C^δ2,¹H^δ2, and Val-¹³C^γ1,¹H^γ1/¹³C^γ2,¹H^γ2 labelled sample made in deuterated background. To express uniformly ¹³C,¹⁵N labelled proteins, we used M9 media that was made with ¹H₂O and supplemented with 1 g/L [¹H,¹⁵N]-ammonium chloride and 3 g/L of [¹H,¹³C]-glucose as the sole nitrogen and carbon sources. For expression of methyl Ile-¹³C^δ1,¹H^δ1, Leu-¹³C^δ1, ¹H^δ1/¹³C^δ2,¹H^δ2, and Val-¹³C^γ1,¹H^γ1/¹³C^γ2,¹H^γ2 labelled proteins, we used ²H₂O M9 media supplemented with 1 g/L [¹H,¹⁵N]-ammonium chloride and 3 g/L of [²H,¹²C]-glucose as the sole nitrogen and carbon sources. Methyl labelling was achieved by the addition of 60 mg/L alpha-ketobutyric acid [U-¹²C/²H, methyl-¹³CH₃] for labelling of isoleucines, and 90 mg/L α-ketoisovaleric acid [U-¹²C/²H, methyl-(¹³CH₃,¹²CD₃)] for labelling of valine and of leucine methyl groups. These precursors were added one hour prior to induction.

Expression and purification of Histone deacetylase 8 (HDAC8)

The Human HDAC8 construct described by Vannini et al. with a C-terminal 6X-histidine tag in ampicillin-resistant pET21b expression vector was transformed in BL21(λDE3) E. coli cells for protein expression^20,39. A single colony from the transformed plate was inoculated in 10 ml of LB media supplemented with ampicillin (100 μg/ml) at 37 °C. Once the LB culture reached an OD600 between 0.8 and 1.0, it was used to inoculate a 50 ml M9 minimal media pre-culture. This M9 pre-culture was used to inoculate 1 L of M9 media and grown at 37 °C to OD600 ≈ 0.8. HDAC8 expression was induced for > 16 h with 0.5 mM IPTG and 200 µM of ZnCl₂ at 21 °C. The cell pellet, collected by centrifugation, was re-suspended in lysis buffer containing 50 mM Tris‒HCl pH 8.0, 3 mM MgCl₂, 500 mM KCl, 10 mM imidazole, 5% glycerol, and 10 mM β-mercaptoethanol. Later, sonication was performed to lyse the cells after addition of small amounts of DNAse, lysozyme, protease inhibitors tablets (1 tablet per 50 ml, Roche), and 0.25 % IGEPAL. The supernatant fraction of the lysate after centrifugation at 40,000 x g for one hour was purified by Ni-NTA affinity chromatography using a linear imidazole gradient (10‒250 mM) in lysis buffer. Further, size-exclusion chromatography using a Superdex-75 column (GE Healthcare) was carried out in buffer containing 50 mM Tris‒HCl pH 8.0, 150 mM KCl, 1 mM TCEP, and 5% glycerol. Fractions containing purified HDAC8 was pooled together and concentrated by 10 kDa cut off Amicon (Millipore) ultra-filtration membranes. The concentrated sample was buffer exchanged into NMR-buffer (50 mM K₂HPO₄ pH 8.0, 30 mM KCl, 4 mM DTT, and 1 mM NaN₃) for NMR data acquisition.

Expression and purification of Malate Synthase G (MSG)

A small adjustment was made to the methods previously described for producing isotopically labelled MSG^26,31,40. Briefly, to produce the MSG protein, BL21 (DE3) E. coli cells were transformed with a kanamycin-resistant pET28a vector containing MSG gene with a C-terminal 6X-histidine tag. The protein expression protocol for MSG is same as HDAC8 up to induction. MSG expression was induced for >16 h with 1 mM IPTG at 21 °C. The cell pellet was collected by centrifugation and re-suspended in lysis buffer containing 20 mM Tris‒HCl pH 7.8, 300 mM NaCl, 10 mM imidazole, and 10 mM β-mercaptoethanol. The protein purification protocol for MSG is the same as for HDAC8, until the Ni-NTA affinity chromatography. The fractions containing MSG from Ni-NTA affinity chromatography were further purified by size exclusion chromatography using a Superdex-200 column (GE Healthcare) in buffer containing 20 mM Sodium phosphate pH 7.1, and 5 mM dithiothreitol. After gel filtration, the fractions containing pure protein were pooled, concentrated, and buffer exchanged into NMR-buffer (20 mM Sodium phosphate buffer pH 7.1, 5 mM DTT, 20 mM MgCl₂, 1 mM NaN₃) for NMR data acquisition using 30 kDa cut off Amicon (Millipore) ultra-filtration membranes.

Expression and purification of α-subunit complex (α7α7) of proteasome from T. acidophilum

In order to express the α-subunit complex (α7α7) proteasome from T. acidophilum, the αWT clone with N-terminal Histidine tag and a TEV protease site was transformed into BL21 (λDE3) E. coli cells^31,41. The protein expression protocol for MSG was followed for the α-subunit complex up to the induction step. The α-subunit complex culture was induced at OD600 ≈ 0.9 with 1 mM IPTG at 37 °C for 5 h. Afterwards, we lysed the cells with sonication in a lysis buffer (50 mM NaH2PO4 pH 8.0, 0.2 M NaCl, 10 mM imidazole) and purified them using Ni-NTA chromatography as described above for the purification of HDAC8 and MSG. After Ni-NTA affinity chromatography, TEV protease was introduced to cleave the 6X-histidine tag before dialyzing the protein against 2 L of dialysis buffer (50 mM Tris-HCl pH 8.0, 1 mM EDTA, 5 mM β-mercaptoethanol) overnight at 4 °C. The TEV cleavage of the protein was followed by another Ni-NTA affinity chromatography to eliminate the histidine tag and un-cleaved protein. Afterwards, size exclusion chromatography was performed using a Superdex 200 column (GE Healthcare) in a buffer containing 50 mM NaH₂PO₄ pH 7.5, and 100 mM NaCl. The fractions containing pure protein were concentrated and buffer exchanged into the NMR buffer (20 mM potassium phosphate pH 6.8, 50 mM NaCl, 1 mM EDTA, 2 mM DTT, 0.03% NaN₃) for NMR data acquisition using 30 kDa cut off Amicon (Millipore) ultra-filtration membranes.

NMR Acquisition of Two-dimensional ¹³C-¹H correlation spectra

The 2D HSQC spectrum of HDAC8 used as input for FID-Net was recorded on a uniformly [¹³C,¹⁵N]-labelled sample using a standard pulse programme with presaturation on a Bruker 700 MHz Avance III spectrometer equipped with Z-gradient triple-resonance TCI cryoprobe at 298 K and running Topspin 3.6.3. The data was acquired with 1024 and 256 complex points in the ¹H and ¹³C dimensions, respectively, with spectral widths of 10000 Hz and 5000 Hz. 16 scans were obtained per individual FID. 2D Methyl-TROSY spectrum of HDAC8 used for experimental verification was recorded on an ILVM specifically labelled sample on a Bruker 800 MHz Avance III spectrometer equipped with Z-gradient triple-resonance TCI cryoprobe. The data was acquired with 1024 and 256 complex points in the ¹H and ¹³C dimensions, respectively, with spectral widths of 12500 Hz and 4500 Hz. 4 scans were obtained per individual FID.

The 2D HSQC spectrum of MSG used as input for FID-Net was recorded on a uniformly [¹³C,¹⁵N]-labelled sample using a standard pulse programme with presaturation on a Bruker 800 MHz Avance III spectrometer equipped with Z-gradient triple-resonance TCI cryoprobe at 310 K and running Topspin 3.6.3. The data was acquired with 1024 and 256 complex points in the ¹H and ¹³C dimensions, respectively, with spectral widths of 12500 Hz and 5000 Hz. 16 scans were obtained per individual FID. The 2D methyl-TROSY spectrum of MSG used for experimental verification was recorded on an ILV specifically labelled sample on a Bruker 800 MHz Avance III spectrometer equipped with Z-gradient triple-resonance TCI cryoprobe and running Topspin 3.6.3. The data was acquired with 1024 and 192 complex points in the ¹H and ¹³C dimensions, respectively, with spectral widths of 10500 Hz and 5000 Hz. 16 scans were obtained per individual FID.

The 2D HSQC spectrum of α7α7 used as input for FID-Net was recorded on a uniformly [¹³C,¹⁵N]-labelled sample using a standard pulse programme with presaturation on a Bruker 950 MHz Avance HD spectrometer equipped with Z-gradient triple-resonance TCI cryoprobe at 323 K. The data was acquired with 1024 and 256 complex points in the ¹H and ¹³C dimensions, respectively, with spectral widths of 15200 Hz and 5263 Hz. 80 scans were obtained per individual FID. 2D Methyl-TROSY spectrum of α7α7 used for experimental verification was recorded on an ILV specifically labelled sample using a standard pulse programme on a Bruker 800 MHz Avance III spectrometer equipped with Z-gradient triple-resonance TCI cryoprobe at 323 K and running Topspin 3.6.3. The data was acquired with 768 and 132 complex points in the ¹H and ¹³C dimensions, respectively, with spectral widths of 12000 Hz and 4100 Hz. 16 scans were obtained per individual FID.

NMR acquisition of three-dimensional NOESY spectra

The 3D HSQC-NOESY-HSQC NMR experiment on MSG was performed on a ~ 400 µM MSG sample on a Bruker 950 MHz Avance HD spectrometer equipped with Z-gradient triple-resonance TCI cryoprobe at 310 K. The data was acquired with 1024, 142, and 124 complex points in ¹H, ¹³C_HSQC, and ¹³C_NOESY dimensions, respectively, with spectral widths of 15244 Hz (¹H), 6667 Hz (¹³C), and 6667 Hz (¹³C). Eight scans were collected per increment with a recycle delay of 1 s. The mixing time was 60 ms.

NMR data processing

All experimental NMR spectra were processed with nmrPipe⁴² version 2018.184.13.26 or using the python libraries nmrglue⁴³ 0.9 and numpy 1.23.5 and visualised/analysed with Cara 1.8.4.2 (http://cara.nmr.ch/) and nmrDraw 2018.184.13.26.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Training data for the two DNNs generated in this study have been deposited at Zenodo (www.zenodo.org), Hansen, D. F. (2024) “Solution-State Methyl NMR Spectroscopy of Large Non-Deuterated Proteins Enabled by Deep Neural Networks”. Zenodo. https://doi.org/10.5281/zenodo.10022405. Synthetic spectra used for assessment of FID-Net are available as PDF files. The full spectra used for experimental assessments, including, (i) input ¹³C-¹H HSQC, (ii) FID-Net processed ¹³C-¹H HSQC, and (iii) methyl-TROSY for HDAC8, MSG, and α7α7-proteasome are also available from Zenodo, along with the full ¹³C-¹³C-¹H NOESY spectrum of MSG. There reference PDB code used in this work is 1D8C. Source data are provided with this paper.

Code availability

Python code for using the networks described here (including pre-trained networks, examples and scripts for training the two DNNs) is available on GitHub: https://github.com/gogulan-k/FID-Net. Training scripts are available from Zenodo: https://doi.org/10.5281/zenodo.11080581⁴⁴.

References

Tugarinov, V., Hwang, P. M., Ollerenshaw, J. E. & Kay, L. E. Cross-Correlated Relaxation Enhanced ¹H− ¹³C NMR Spectroscopy of Methyl Groups in Very High Molecular Weight Proteins and Protein Complexes. J. Am. Chem. Soc. 125, 10420–10428 (2003).
Article CAS PubMed Google Scholar
Tugarinov, V. & Kay, L. E. An isotope labeling strategy for methyl TROSY spectroscopy. J. Biomol. NMR 28, 165–172 (2004).
Article CAS PubMed Google Scholar
Boisbouvier, J. & Kay, L. E. Advanced isotopic labeling for the NMR investigation of challenging proteins and nucleic acids. J. Biomol. NMR 71, 115–117 (2018).
Article CAS PubMed Google Scholar
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Article ADS CAS PubMed PubMed Central Google Scholar
Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science (80-) 373, 871–876 (2021).
Article ADS CAS Google Scholar
Birhane, A., Kasirzadeh, A., Leslie, D. & Wachter, S. Science in the age of large language models. Nat. Rev. Phys. 5, 277–280 (2023).
Article Google Scholar
Karunanithy, G. & Hansen, D. F. FID-Net: A versatile deep neural network architecture for NMR spectral reconstruction and virtual decoupling. J. Biomol. NMR 75, 179–191 (2021).
Article CAS PubMed PubMed Central Google Scholar
Karunanithy, G., Mackenzie, H. W. & Hansen, D. F. Virtual Homonuclear Decoupling in Direct Detection Nuclear Magnetic Resonance Experiments Using Deep Neural Networks. J. Am. Chem. Soc. 143, 16935–16942 (2021).
Article CAS PubMed Google Scholar
Qu, X. et al. Accelerated Nuclear Magnetic Resonance Spectroscopy with Deep Learning. Angew. Chem. 132, 10383–10386 (2020).
Article ADS Google Scholar
Karunanithy, G., Yuwen, T., Kay, L. E. & Hansen, D. F. Towards autonomous analysis of chemical exchange saturation transfer experiments using deep neural networks. J. Biomol. NMR 76, 75–86 (2022).
Article CAS PubMed PubMed Central Google Scholar
Li, D.-W., Hansen, A. L., Yuan, C., Bruschweiler-Li, L. & Brüschweiler, R. DEEP picker is a deep neural network for accurate deconvolution of complex two-dimensional NMR spectra. Nat. Commun. 12, 5229 (2021).
Article ADS CAS PubMed PubMed Central Google Scholar
Klukowski, P., Riek, R. & Güntert, P. Rapid protein assignments and structures from raw NMR spectra with the deep learning technique ARTINA. Nat. Commun. 13, 6151 (2022).
Article ADS CAS PubMed PubMed Central Google Scholar
Worswick, S. G., Spencer, J. A., Jeschke, G. & Kuprov, I. Deep neural network processing of DEER data. Sci. Adv. 4, eaat5218 (2018).
Article ADS CAS PubMed PubMed Central Google Scholar
Hansen, D. F. Using Deep Neural Networks to Reconstruct Non-uniformly Sampled NMR Spectra. J. Biomol. NMR 73, 577–585 (2019).
Article CAS PubMed PubMed Central Google Scholar
Klukowski, P. et al. NMRNet: a deep learning approach to automated peak picking of protein NMR spectra. Bioinformatics 34, 2590–2597 (2018).
Article CAS PubMed Google Scholar
Grattan-Guinness, I. Joseph Fourier, Théorie analytique de la chaleur (1822). in Landmark Writings in Western Mathematics 1640–1940 354–365 (Elsevier, 2005). https://doi.org/10.1016/B978-044450871-3/50107-8.
Santoro, J. & King, G. C. A constant-time 2D overbodenhausen experiment for inverse correlation of isotopically enriched species. J. Magn. Reson. 97, 202–207 (1992).
ADS CAS Google Scholar
Vuister, G. W. & Bax, A. Resolution enhancement and spectral editing of uniformly ¹³C-enriched proteins by homonuclear broadband ¹³C decoupling. J. Magn. Reson. 98, 428–435 (1992).
ADS CAS Google Scholar
Ulrich, E. L. et al. BioMagResBank. Nucleic Acids Res. 36, D402–D408 (2008).
Article CAS PubMed Google Scholar
Werbeck, N. D. et al. A distal regulatory region of a class I human histone deacetylase. Nat. Commun. 11, 3841 (2020).
Article ADS PubMed PubMed Central Google Scholar
Tugarinov, V. & Kay, L. E. Ile, Leu, and Val methyl assignments of the 723-residue malate synthase G using a new labeling strategy and novel NMR methods. J. Am. Chem. Soc. 125, 13868–13878 (2003).
Article CAS PubMed Google Scholar
Karunanithy, G., Shukla, V. K. & Hansen, D. F. Solution-State Methyl NMR Spectroscopy of Large Non-Deuterated Proteins Enabled by Deep Neural Networks. https://doi.org/10.5281/zenodo.10022404 (2024).
Kerfah, R., Plevin, M. J., Sounier, R., Gans, P. & Boisbouvier, J. Methyl-specific isotopic labeling: a molecular tool box for solution NMR studies of large proteins. Curr. Opin. Struct. Biol. 32, 113–122 (2015).
Article CAS PubMed Google Scholar
Hansen, P. Isotope effects on chemical shifts of proteins and peptides. Mag. Res. Chem. 38, 1–10 (2000).
Article CAS Google Scholar
Shukla, V. K., Siemons, L. & Hansen, D. F. Intrinsic structural dynamics dictate enzymatic activity and inhibition. Proc. Natl Acad. Sci. 120, e2310910120 (2023).
Article CAS PubMed PubMed Central Google Scholar
Tugarinov, V., Muhandiram, R., Ayed, A. & Kay, L. E. Four-Dimensional NMR Spectroscopy of a 723-Residue Protein: Chemical Shift Assignments and Secondary Structure of Malate Synthase G. J. Am. Chem. Soc. 124, 10025–10035 (2002).
Article CAS PubMed Google Scholar
Sheppard, D., Guo, C. & Tugarinov, V. Methyl-detected ‘out-and-back’ NMR experiments for simultaneous assignments of Alaβ and Ileγ2 methyl groups in large proteins. J. Biomol. NMR 43, 229–238 (2009).
Article CAS PubMed Google Scholar
Ayala, I. et al. Asymmetric Synthesis of Methyl Specifically Labelled L ‐Threonine and Application to the NMR Studies of High Molecular Weight Proteins. ChemistrySelect 5, 5092–5098 (2020).
Article CAS Google Scholar
Godoy-Ruiz, R., Guo, C. & Tugarinov, V. Alanine Methyl Groups as NMR Probes of Molecular Structure and Dynamics in High-Molecular-Weight Proteins. J. Am. Chem. Soc. 132, 18340–18350 (2010).
Article CAS PubMed Google Scholar
Sprangers, R. & Kay, L. E. Quantitative dynamics and binding studies of the 20S proteasome by NMR. Nature 445, 618–622 (2007).
Article CAS PubMed Google Scholar
Siemons, L., Mackenzie, H. W., Shukla, V. K. & Hansen, D. F. Intra-residue methyl–methyl correlations for valine and leucine residues in large proteins from a 3D-HMBC-HMQC experiment. J. Biomol. NMR 73, 749–757 (2019).
Article CAS PubMed PubMed Central Google Scholar
Rößler, P., Mathieu, D. & Gossert, A. D. Enabling NMR Studies of High Molecular Weight Systems Without the Need for Deuteration: The XL‐ALSOFAST Experiment with Delayed Decoupling. Angew. Chem. Int. Ed. 59, 19329–19337 (2020).
Article Google Scholar
Vahidi, S. et al. An allosteric switch regulates Mycobacterium tuberculosis ClpP1P2 protease function as established by cryo-EM and methyl-TROSY NMR. Proc. Natl Acad. Sci. 117, 5895–5906 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Bolik-Coulon, N. et al. Less is more: A simple methyl-TROSY based pulse scheme offers improved sensitivity in applications to high molecular weight complexes. J. Magn. Reson. 346, 107326 (2023).
Article CAS PubMed Google Scholar
Dubey, A. et al. Local Deuteration Enables NMR Observation of Methyl Groups in Proteins from Eukaryotic and Cell‐Free Expression Systems. Angew. Chem. Int. Ed. 60, 13783–13787 (2021).
Article CAS Google Scholar
Abadi, M. et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint, 1603.04467, 2016. arxiv.org/abs/1603.04467. Software available from tensorflow.org.
Chollet, F. (2015). Keras. GitHub. https://github.com/fchollet/keras.
Tieleman, T. & Hinton, G. Lecture 6.5 - rmsprop: Divide the gradient by a running average of its recent magnitude. Coursera: Neural Networks for Machine Learning. Retrieved from https://www.coursera.org/learn/neural-networks (2012).
Vannini, A. et al. Substrate binding to histone deacetylases as shown by the crystal structure of the HDAC8-substrate complex. EMBO Rep. 8, 879–884 (2007).
Article CAS PubMed PubMed Central Google Scholar
Pritchard, R. B. & Hansen, D. F. Characterising side chains in large proteins by protonless 13C-detected NMR spectroscopy. Nat. Commun. 10, 1747 (2019).
Article ADS PubMed PubMed Central Google Scholar
Tugarinov, V., Sprangers, R. & Kay, L. E. Probing Side-Chain Dynamics in the Proteasome by Relaxation Violated Coherence Transfer NMR Spectroscopy. J. Am. Chem. Soc. 129, 1743–1750 (2007).
Article CAS PubMed Google Scholar
Delaglio, F. et al. Nmrpipe - a Multidimensional Spectral Processing System Based on Unix Pipes. J. Biomol. Nmr 6, 277–293 (1995).
Article CAS PubMed Google Scholar
Helmus, J. J. & Jaroniec, C. P. Nmrglue: an open source Python package for the analysis of multidimensional NMR data. J. Biomol. NMR 55, 355–367 (2013).
Article CAS PubMed PubMed Central Google Scholar
Karunanithy, G. & Retel, J. gogulan-k/FID-Net: v0.4-alpha. https://doi.org/10.5281/zenodo.11080581 (2024).

Download references

Acknowledgements

The BBSRC (BB/R000255/1; D.F.H.), Wellcome Trust (ref. 101569/z/13/z; D.F.H.), and the EPSRC are acknowledged for supporting the NMR facility at University College London. Access to ultra-high field NMR spectrometers was supported by the Francis Crick Institute through provision of access to the MRC Biomedical NMR Centre and by the University of Oxford Wellcome Institutional Strategic Support Fund, the John Fell Fund, as well as the Edward Penley Abraham Cephalosporin Fund, and the Engineering and Physical Sciences Research Council (EP/R029849/1). The Francis Crick Institute receives its core funding from Cancer Research UK (FC001029), the UK Medical Research Council (FC001029), and the Wellcome Trust (FC001029). This study made use of NMRbox: National Center for Biomolecular NMR Data Processing and Analysis, a Biomedical Technology Research Resource (BTRR), which is supported by NIH grant P41GM111135 (NIGMS). Some computational aspects of this work were supported by the Francis Crick Institute (D.F.H.) through provision of access to the Scientific Computing STP and the Crick data Analysis and Management Platform (CAMP). The Francis Crick Institute (CAMP) receives its core funding from Cancer Research UK (FC010233), the UK Medical Research Council (FC010233), and the Wellcome Trust (FC010233). For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising. This research is supported by the UKRI and EPSRC (EP/X036782/1; D.F.H.).

Author information

Authors and Affiliations

Department of Structural and Molecular Biology, Division of Biosciences, University College London, London, WC1E 6BT, UK
Gogulan Karunanithy, Vaibhav Kumar Shukla & D. Flemming Hansen
The Francis Crick Institute, London, NW1 1BF, UK
Vaibhav Kumar Shukla & D. Flemming Hansen

Authors

Gogulan Karunanithy
View author publications
Search author on:PubMed Google Scholar
Vaibhav Kumar Shukla
View author publications
Search author on:PubMed Google Scholar
D. Flemming Hansen
View author publications
Search author on:PubMed Google Scholar

Contributions

G.K. designed and trained all the DNNs, G.K. and V.K.S. produced isotope labelled samples; G.K., V.K.S., and D.F.H. performed NMR experiments; V.K.S assigned the chemical shifts of methyl spectrum. G.K. and D.F.H designed the research. All authors analysed the data, discussed the results, and wrote the paper.

Corresponding author

Correspondence to D. Flemming Hansen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Reporting Summary

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Karunanithy, G., Shukla, V.K. & Hansen, D.F. Solution-state methyl NMR spectroscopy of large non-deuterated proteins enabled by deep neural networks. Nat Commun 15, 5073 (2024). https://doi.org/10.1038/s41467-024-49378-8

Download citation

Received: 26 October 2023
Accepted: 04 June 2024
Published: 13 June 2024
DOI: https://doi.org/10.1038/s41467-024-49378-8

This article is cited by

NMR-driven structure-based drug discovery by unveiling molecular interactions
- Gerald Platzer
- Moriz Mayer
- Robert Konrat
Communications Chemistry (2025)
Deep learning network for NMR spectra reconstruction in time-frequency domain and quality assessment
- Yao Luo
- Wenhan Chen
- Yanqin Lin
Nature Communications (2025)