Merging conformational landscapes in a single consensus space with FlexConsensus algorithm

Herreros, David; Perez Mata, Carlos; Sanchez Sorzano, Carlos Oscar; Carazo, Jose Maria

doi:10.1038/s41592-025-02841-w

Download PDF

Article
Open access
Published: 25 September 2025

Merging conformational landscapes in a single consensus space with FlexConsensus algorithm

Nature Methods (2025)Cite this article

1685 Accesses
3 Altmetric
Metrics details

Subjects

Abstract

Structural heterogeneity analysis in cryogenic electron microscopy is experiencing a breakthrough in estimating more accurate, richer and interpretable conformational landscapes derived from experimental data. The emergence of new methods designed to tackle the heterogeneity challenge reflects this new paradigm, enabling users to gain a better understanding of protein dynamics. However, the question of how intrinsically different heterogeneity algorithms compare remains unsolved, which is crucial for determining the reliability, stability and correctness of the estimated conformational landscapes. Here, to overcome the previous challenge, we introduce FlexConsenus: a multi-autoencoder neural network able to learn the commonalities and differences among several conformational landscapes, enabling them to be placed in a shared consensus space with enhanced reliability. The consensus space enables the measurement of reproducibility in heterogeneity estimations, allowing users to either focus their analysis on particles with a stable estimation of their structural variability or concentrate on specific particle subsets detected by only certain methods.

Real-space heterogeneous reconstruction, refinement, and disentanglement of CryoEM conformational states with HetSIREN

Article Open access 22 April 2025

3DFlex: determining structure and motion of flexible proteins from cryo-EM

Article Open access 11 May 2023

Deep learning-based mixed-dimensional Gaussian mixture model for characterizing variability in cryo-EM

Article 29 July 2021

Main

Cryogenic electron microscopy (cryo-EM) is experiencing a paradigm shift in exploring conformational variability using experimental data. Unlike classical three-dimensional (3D) classification algorithms¹ that are confined to a set of reduced and stable states, new heterogeneity algorithms focus on estimating richer and more complete conformational landscapes with the possibility of retrieving any conformation from the approximated continuum.

Heterogeneity algorithms can be classified based on the information estimated from cryo-EM data. Heterogeneous reconstruction methods^2,3,4,5 rely on the direct estimate of electron density maps from a continuous function that maps every point in the conformational landscape to a 3D volume. On the other hand, deformation field-based methods directly estimate the motions responsible for driving a reference state to a new conformational state represented in a particle dataset^6,7,8.

The variety of methods, approaches and implementations allows one to explore the structural variability of any given dataset through a systematic approach. However, the large pool of existing algorithms also presents the challenge of comparing different results. Currently, there is a lack of validation tools in the cryo-EM heterogeneity field. Indeed, the most widely used validation approach is to reconstruct states from a small subset of images surrounding a given conformation in the landscape and to compare them with the conformations estimated by a given heterogeneity analysis method. However, this approximation is limited by the minimum number of particles needed to reconstruct a volume with enough resolution to detect a given conformational state. If the number of images required is too large, contamination from other structural changes in the reconstruction will prevent a complete validation of the conformational space. Furthermore, the accuracy of current heterogeneity analysis methods remains a subject of ongoing investigation. Therefore, it is crucial to extract consensus solutions from different approaches.

To overcome the challenge of comparing conformational landscapes obtained from different methods and/or runs, we propose in this work a new deep learning algorithm: FlexConsensus. FlexConsensus introduces a multi-autoencoder architecture designed to merge different conformational landscapes into a common latent space, which is subsequently decoded back into the original representations input to the network. In this way, it is possible not only to analyze the common conformational space but also to determine a consensus metric that measures the stability of every estimation in the original conformational landscapes to filter out only those regions with higher confidence. Lastly, the method also allows for converting among different conformational landscape representations, simplifying the comparison of techniques.

FlexConsensus proposes a framework that is not aimed at estimating states from cryo-EM data but, rather, focuses on mapping already estimated landscapes into a common consensus space while preserving the organization of the original landscapes and enhancing their interpretability. Therefore, the application of FlexConsensus does not involve computing or modifying a given conformational space with a priori information, as suggested by other methods, such as the incorporation of structural priors⁹, the disentanglement of the space into meaningful components⁴ or methods employing statistical approaches to analyze the landscape¹⁰, among others. Instead, analyzing the consensus spaces facilitates the identification of similarities and differences among various methods, assisting in the validation of estimations and providing tools to streamline the heterogeneity workflow based on the derived reliability scores.

Results

The following sections present some use cases to evaluate and discuss the precision and performance of FlexConsensus under different conditions. The analysis starts by evaluating the capabilities of FlexConsensus with two datasets from CryoBench¹¹, followed by two experimental datasets to assess its behavior in more realistic and challenging scenarios.

In the Supplementary Information, we include three additional analyses using simulated data, which provide a deeper understanding of the network’s training workflow and the characteristics of the consensus space.

We note that FlexConsensus learns by default a consensus space with the number of dimensions set to the minimum dimension among the input spaces. However, the latent space dimensionality is exposed as a customizable parameter in the corresponding Scipion¹² protocol form. It is important to note that setting a consensus space dimension smaller than the intrinsic dimensionality of the input latent dimensions means that some information from the inputs will potentially be ignored when mapping to the consensus space.

All the consensus landscapes presented in the following sections were defined to have three dimensions, thereby avoiding the need for any further dimensionality reduction step.

The workflow for training the network is described in detail in the Methods section and Fig. 6.

All the data were analyzed with the Scipion 3.8.0 software package. Inside Scipion, CryoDRGN version 3.4.0 was also used to estimate the conformational landscapes analyzed in this paper as well as Flexutils plugin version 3.3.0.

CryoSPARC 4.5.1, Relion 4.0 and Xmipp 3.24.12.0 packages were also used to preprocess the data.

Consensus results on CryoBench datasets

Before moving to testing FlexConsensus on experimental data, it is interesting to check its performance on controlled and challenging ‘standardized datasets’ capturing different ranges of motion and variability. Therefore, we tested FlexConsensus on two different datasets included in CryoBench¹¹.

The first dataset to be explored is the IgG-RL one, representing a set of random conformations for a disordered peptide linker connecting the Fab to the rest of the IgG complex. This dataset corresponds to a challenging disordered motion, which is very difficult to follow by all proposed heterogeneity analysis methods thus far, as indicated in the original CryoBench paper. Indeed, the conformational landscapes obtained by the different methods have notable differences. It is an excellent example for integrative tools such as FlexConsensus, where we can explore the similarities and differences among different methods.

The proposed experiment starts by estimating the conformational spaces using two different methods: HetSIREN and CryoDRGN. To simplify the FlexConsensus results analysis, HetSIREN was forced to estimate only a conformational latent space without refining the original angles, so its way of working is similar to CryoDRGN. The landscapes predicted by each method were then fed into FlexConsensus to estimate the consensus landscape. The results from this analysis are summarized in Fig. 1. The consensus space obtained from the two original estimations shows a clear difference in the distribution of states found by the two methods. This can be further quantified by analyzing the consensus errors measured in the consensus space.

**Fig. 1: FlexConsensus analysis resulting from comparing the estimations of HetSIREN and CryoDRGN for the IgG-RL dataset included in CryoBench¹¹.**

Thus, consensus metrics from the previous analysis were used to filter the landscape toward a stabilized representation with a more reliable state distribution. Based on the consensus, it is possible to define a statistical framework to determine a significant threshold based on the similarity of the distribution of states represented by the different consensus landscapes. To that end, it is possible to work under the assumption that the distributions of states of different methods in the consensus space should be the same, allowing us to derive the threshold that fulfills the previous assumption.

Following the previous reasoning, we proposed an approach based on FlexConsensus to derive the previous threshold based on a random permutation test of the distances between identical samples in the consensus space. It allows us to obtain a P value representing the probability of measuring the distances observed in the consensus space against random labeling. The Methods section describes this filtering process and the threshold computation from the previous P value in more detail.

The application of the previous test to the IgG-RL consensus space is summarized in Fig. 1. At the beginning of the figure, we show the unfiltered consensus spaces, followed by the plot of the P values that the previously described test yielded. Additionally, we show below the previous plot the permutation distribution of distances for the selected threshold, showing that most of the measurements computed from the random labeling fall below it. After applying the threshold highlighted in these plots, it is possible to get a representation of the consensus space, including only those images estimated to have a similar conformational state according to the two input methods.

For this experiment, the filtering process suggests that the distribution of states in the original spaces was, in general, not similar, which aligns with the findings found in the original CryoBench paper¹¹. This is shown when checking the threshold found with this test, as presented in the ‘P value analysis’ plot, which translates into a large fraction of discarded images (only approximately 10,000 images are kept after applying the threshold). Additionally, the filtered space isolates more independent antibody states than the original consensus, helping identify the different motions simulated in the dataset. In Supplementary Video 1, we inspect the filtered consensus space and the associated conformational states.

To better quantify the relationship between the antibody’s position and the location in the filtered consensus space where potentially different conformational states are more easily detected, we performed an additional analysis to determine if the previous two variables are correlated. To that end, the filtered consensus space was split into 10 uniformly distributed clusters with k-means. The representatives of each cluster were then converted to HetSIREN latent space vectors using the decoding capabilities of the FlexConsensus network, allowing us to recover the conformational states associated with the cluster representatives. Each volume was segmented using Segger¹³ to isolate the heavy and light chains of the protein. Because of this segmentation, it is possible to measure the center of mass of the fixed heavy chain and the center of mass of the light chain at the different conformations. To characterize the motion of the antibody, we measured the distance from the center of mass of the heavy chain to the center of mass of the light chain as well as the angle defined by the previous two centers of mass and the anchor point between the heavy and the light chains.

The next step was to characterize the presence of a correlation between the two measured variables (distance between centers of mass and angle) and the location in the filtered FlexConsensus space. To that end, we carried out a Mantel test, working under the hypothesis that the distance matrices defined by the FlexConsensus latent vectors and the two variables measured in the volumes are correlated. A summary of this experiment is provided in Extended Data Fig. 1a. As seen from the figure, the Mantel test returns a P value of 0.002, confirming the hypothesis that there is a correlation between the motions of the antibody at the level of maps and the location in the consensus space.

The second dataset explored was the MDSPike dataset included in CryoBench. This dataset was simulated from a long-timescale molecular dynamics simulation, yielding a free-energy landscape that was sampled to generate approximately 50,000 structures. These structures were then converted to electron density maps and projected to create the images in the dataset. The ground truth locations of these structures in the energy landscape are also provided.

Similar to the case before, HetSIREN (with no pose refinement) and CryoDRGN were executed to estimate the conformational space from this dataset, which was then analyzed with FlexConsensus. The resulting consensus spaces are shown in Fig. 2. As can be seen from the figure, the distribution of states estimated by the two methods was more similar in this case, which also translates into a filtered landscape losing a lower fraction of particles. In Supplementary Video 2, we provide an inspection of the filtered consensus space and the associated conformational states, which shows the transition from the one-up receptor-binding domain (RBD) state to the three-down state simulated in this dataset.

**Fig. 2: FlexConsensus analysis resulting from comparing the estimations of HetSIREN and CryoDRGN for the MDSPike dataset included in CryoBench¹¹.**

Apart from comparing estimations from different methods, the MDSpike dataset opens an interesting possibility for evaluating how methods compare to molecular-dynamics-determined energy landscapes. Therefore, we performed additional analysis to map experimental and simulated landscapes onto a common space, focusing on the possibility of correlating collective variables characteristic of a given motion with the consensus space. To that end, we trained a new FlexConsensus network with the ground truth latent coordinates obtained from the simulation and the experimental landscapes obtained with HetSIREN and CryoDRGN. Because the ground truth space was defined to have two dimensions, the FlexConsensus latent space was set to have this number of dimensions.

Once the network was trained, we took different latent vectors in the ground truth space along the two principal axes of the energy landscape, allowing us to define our collective variables. The extracted data were then input to FlexConsensus to map them in the consensus space, allowing us to evaluate their correlation with the consensus axes. The results of this experiment are summarized in Extended Data Fig. 1b. As seen from the figure, the collective variables extracted from the principal axes of the ground truth space have a strong correspondence with the main axis of the consensus space, implying that the consensus space has been able to capture the relevant structural information from the simulated motions.

Consensus results on the EMPIAR 10028 dataset

The next step in evaluating the capabilities of FlexConsensus is applying the method to a more realistic scenario. Therefore, we proposed the evaluation of FlexConsensus with the EMPIAR 10028 dataset¹⁴, a well-known and well-studied dataset showing different conformational states of the Plasmodium falciparum 80S ribosome bound to emetine. This dataset has been widely applied as a test case for the most recently developed heterogeneity methods, yielding a conformational landscape with well-defined features. In addition, the experimental images in the dataset mainly capture continuous conformational changes, although there is also a compositional variability component. Because of all these characteristics, the EMPIAR 10028 dataset supposes a realistic yet controlled scenario to evaluate the consensus when considering conformational landscapes estimated by different methods.

The dataset was first preprocessed with CryoSPARC¹⁵ inside Scipion¹² to yield a set of appropriately characterized experimental images to be analyzed by heterogeneity algorithms. This involves estimating contrast transfer function (CTF) information and particle alignments, which were subjected to a consensus analysis to improve their stability¹⁶.

A set of approximately 50,000 particles was obtained from the preprocessing step and further analyzed for conformational variability. The study was carried out with two different methods: HetSIREN⁵ and Zernike3D⁶. These two methods follow very different approaches to estimating conformational variability, HetSIREN being a heterogeneous reconstruction/refinement algorithm and Zernike3D being a deformation field-based method. Therefore, HetSIREN can extract continuous and compositional variability, whereas Zernike3D focuses on extracting continuous motions. Moreover, each method defines a conformational landscape with different dimensions, similar to the synthetic datasets analyzed in the previous sections.

In total, three independent conformational landscapes were estimated, corresponding to the execution of HetSIREN in reconstruction mode, the execution of HetSIREN in refinement mode and the execution of Zernike3D. The main difference between the reconstruction modes of HetSIREN is the algorithm’s initialization: in reconstruction mode, the initial volume required by HetSIREN is initialized to have only zero values. By contrast, HetSIREN in refinement mode receives as the initial volume a map calculated from all initial images. These landscapes were then fed to FlexConsensus to generate the consensus landscape and the error metrics needed to determine the reliability of the three estimations.

It is important to mention that FlexConsensus also works with estimations obtained from different executions of the same method under the same conditions, allowing the study of the consistency and reliability of a single algorithm’s estimations. Although we could have executed HetSIREN in this way, we considered comparing this method’s two modes of operation to determine if they impact the estimated conformational landscapes.

Before analyzing the results obtained for this dataset, we include some terminology that will be used from now on when comparing different results. We will refer to the space computed from the density distribution of the points in the consensus space associated with method X as ‘X subspace’. These subspaces will help identify the location of the points coming from input space X in the common consensus space, allowing us to quickly compare the differences among the different methods in the consensus. The consensus space, composed of the addition of all the involved subspaces, will be referred to as ‘common consensus space’.

The results obtained from the analysis of the common consensus space learned by FlexConsenus are summarized in Fig. 3. Figure 3a presents the common space with the previously described subspace associated with each input space (note that this two-dimensional image of consensus space corresponds to a selected view of the computed 3D space; full access to 3D space can be done through Flexutils, as described later on in the main text). At the top of Fig. 3a, the ‘Zernike3D subspace’ is presented, showing only those points coming from the Zernike 3D landscape (that is, the mapping onto the common space of the point cloud corresponding to the Zernike3D analysis) colored on top of the common consensus space (shown in grays). At the bottom, a similar representation is followed by the points coming from the two HetSIREN landscapes. The general disposition of the three subspaces is visually similar, indicating that the strategy of mapping all input results to the same common space is adequate. In detail, a difference between Zernike3D and HetSIREN is noticeable in the common landscape at the top rightmost corner, which is further highlighted by a circle drawn with a broken green line. The Zernike3D subspace in this area is unstructured (points are sparse), unlike both HetSIREN landscapes, which show highly ordered subspaces. Interestingly, the maps’ analysis from these points indicates that they correspond to specimens presenting a substantial compositional variation (they lack the 40S subunit, showing only the 70S one), which Zernike3D, by design, could not capture. The point clouds within the blue and yellow broken line circles correspond to particles presenting internal motions, which all three methods have captured well. To simplify the visualization of each subspace in Fig. 3a, we include in Fig. 3b the subspaces of each method by themselves. In addition to the results displayed in Fig. 3a, we include in Fig. 3c a highlight of the regions estimated to be less reliable for every method according to FlexConsensus. In Zernike3D, the higher errors are associated with the compositional components, which agrees with the results previously discussed. In the case of HetSIREN, these regions are found in the periphery of the landscape, which are also regions with a lower density than the central cloud. These differences are further highlighted in Extended Data Fig. 2. It should be noted that the consensus landscape has three dimensions as specified when defining the network, although Fig. 3 shows only a two-dimensional projection. Thus, some of the highest error points in HetSIREN seem to be located within the landscape, although the visualization in three dimensions reveals that they are also located in the periphery.

**Fig. 3: Consensus landscapes obtained by FlexConsensus for the EMPIAR 10028 dataset¹⁴.**

To further support the results observed from the inspection of the consensus space, in Extended Data Fig. 3 we show the error histograms computed from the representation error obtained from the comparison of the input spaces against their analogous decoded spaces. Similar to the previous sections, each consensus space (that is, the points in the consensus space coming from the Zernike3D and HetSIREN landscapes) can be decoded toward each input space. Therefore, we have three decoded spaces analogous to Zernike3D, three to HetSIREN in reconstruction mode and three to HetSIREN in refinement mode. With the previous histograms, Extended Data Fig. 3 also includes the consensus error derived directly from the common consensus space.

As expected, the two HetSIREN runs are more similar, meaning that the two executions were relatively stable. This also means that the HetSIREN operation mode has an inconsiderable effect in determining the conformational landscape in this specific case. It is also possible to detect how Zernike3D errors tend to deviate more from HetSIREN and the effect arising from the differences in the type of heterogeneity that both methods can detect. From the histograms, one can quickly identify and interactively determine the particles to exclude from future analysis, as they accurately summarize how reliable the heterogeneity estimation was for every particle.

To better understand the source of the differences detected by FlexConsensus between the two HetSIREN executions and Zernike3D, an additional experiment was conducted to analyze at volume level the conformations arising from the landscape regions detected to have the most substantial errors. To that end, a subset of the HetSIREN and Zernike3D landscapes was extracted based on the indices obtained from the images, which were estimated to have the most substantial error based on the consensus. The subset spaces were then clustered to get 10 different representatives of the conformations, which were used to decode their associated conformations with HetSIREN and Zernike3D. The different conformations obtained are presented in Supplementary Video 3. As seen from comparing the states recovered by both methods, HetSIREN tends to detect motions with lower amplitude and a strong compositional component. In comparison, Zernike3D estimates continuous motions with a larger amplitude. This result suggests that the consensus differences are not related to ‘wrong’ images but to estimation differences between the two methods.

Consensus results on SARS-CoV-2 D614G dataset

To further assess the capabilities of FlexConsensus under different experimental conditions, we evaluated the method with the SARS-CoV-2 D614G spike using a dataset obtained from Sriram Subramanian’s laboratory (University of British Columbia). This protein is well characterized and exhibits a wide range of motions, primarily affecting the prefusion state, the RBD and the N-terminal domains. The RBD transitions are particularly interesting among the motions the spike undergoes due to their localized and highly dynamic nature.

Similar to the pipeline followed in the previous section, the experimental dataset was preprocessed within Scipion¹², leading to 440,000 particle images with CTF and angular information. The structural variability captured by the particles was then approximated with two different software: HetSIREN⁵ and CryoDRGN². In contrast to Zernike3D discussed in the previous section, both HetSIREN and CryoDRGN follow the heterogeneous reconstruction approximation to extract the conformational landscape from a set of images. Their conformational landscapes should be more comparable because they follow a similar approximation to solve the structural heterogeneity problem.

It is worth mentioning that HetSIREN can further refine the poses of the images while estimating the conformational states, in contrast to CryoDRGN. However, it was decided to turn off this feature in HetSIREN so that the two methods behave similarly when estimating their conformational latent spaces, simplifying the data analysis.

Again, the analysis will follow the same terminology introduced in the previous experiment: ‘X subspace’, the space determining the location of the points coming from input space X in the common consensus space, and ‘common consensus space’, composed of all the subspaces.

The reliability of the two independently estimated landscapes was analyzed by FlexConsensus, leading to the consensus landscapes presented in Fig. 4. Fig. 4a includes the common consensus landscape in grays and the HetSIREN and CryoDRGN subspaces colored on top (note that this two-dimensional image of consensus space corresponds to a selected view of the computed 3D space; full access to 3D space can be done through Flexutils, as described later on in the main text). An initial inspection of these representations reveals three central structural regions corresponding to the RBDs in three-down, one-up and two-up states, correctly identified by both methods. However, HetSIREN concentrates more on the three-down and one-up states, unlike CryoDRGN, which focuses more on the two-up and less on the three-down. In addition, Fig. 4b shows the location of the images estimated to have a more substantial consensus error based on the structural variability estimation from HetSIREN and CryoDRGN. In both cases, these particles are located in the periphery of the consensus landscape, this arrangement being a more prominent feature in the case of HetSIREN. Particles located in the periphery are usually associated with states estimated wrongly due to bad particle images or random estimation errors. Again, it is important to note that the consensus landscape lives in a 3D space. Thus, some of the highest error points shown in Fig. 4b appear to be in the interior of the landscape. However, the visualization in three dimensions reveals that they are also located in the periphery. Overall, the presence of these periphery particles is not important compared to the inner region of the consensus landscapes, where a more thorough analysis can be performed to properly describe whether the differences in the distribution of state estimated by different methods are considerable.

**Fig. 4: Consensus landscapes obtained by FlexConsensus for the SARS-CoV-2 D614G spike variant.**

To more quantitatively assess the reliability of the structural states estimated by HetSIREN and CryoDRGN, we evaluated the representation error between the input spaces and those decoded from the consensus space. Four spaces were decoded, obtained when forwarding the HetSIREN and CryoDRGN consensus spaces through the decoders responsible for generating the original two spaces from the consensus. These four spaces were then used to compute the representation errors represented as histograms in Extended Data Fig. 4, and the consensus error was measured directly on the consensus space. The histogram analysis reveals a slight discrepancy between the two methods, as the error distributions between the spaces decoded from each consensus space do not fully overlap. This discrepancy suggests that the relative population of states estimated by CryoDRGN and HetSIREN is unequal, even though both methods correctly detect the main conformational states in the dataset. This is further supported when visually inspecting the consensus spaces included in Fig. 4, as it can be seen that the overall distribution of the particles over the three main spike states is not similar in the two methods.

It should be noted that the relative error scales will depend on the relative scale of the input space, making the comparison and interpretation of the error histograms difficult. In general, the scale of the conformational space for two different methods might not be similar, meaning that the latent vectors might be larger or smaller depending on how the method is designed. The differences in the scale of the latent vectors are transmitted through the decoders in FlexConsensus to find a mapping that is as close as possible to a given input. Therefore, the relative differences in scale are reflected in the relative representation error computed for a given input space. Thus, we applied the normalization described in the Supplementary Information to simplify the comparison of the error histograms presented in Extended Data Fig. 4 and improve their interpretability. This normalization makes it easier to understand the relative size of the errors while constraining them to the same range.

Similar to the previous experiments performed with CryoBench datasets, we filtered the consensus space obtained from the SARS-CoV-2 dataset to keep only those images estimated to have a significantly similar conformation. The results obtained for this analysis are summarized in Fig. 5.

**Fig. 5: Scheme of the stabilization process to improve the state distribution reliability of the shared FlexConsensus landscape.**

When comparing the results obtained from the two analyses presented in Fig. 5 and Extended Data Fig. 4, it is possible to see that both approaches detect that the distribution of states identified by HetSIREN and CryoDRGN was incompatible in an important fraction of the images. These results suggest the importance of validating the landscapes estimated by a given algorithm to detect these inconsistencies.

However, it is interesting to understand whether the filtered FlexConsensus space yields more meaningful landscape populations. To that end, we conducted a 3D classification of the original images into several classes, which were merged according to the RBD conformational state that they represent. Because both HetSIREN and CryoDRGN suggest a clear tendency toward the presence of three well-differentiated conformational states, the populations of the 3D classification will be considered as the ‘ground truth’ populations as they are obtained from a standard approach in the field.

In the case of the landscapes, Gaussian mixture-based clustering was used to compute the populations. To that end, three Gaussian components were estimated to cluster the spaces to get the relative populations to compare against the 3D classification ones. It should be noted that the populations were directly measured in the consensus subspaces of HetSIREN and CryoDRGN, which were obtained using FlexConsensus.

The relative populations obtained were as follows:

3D classification (Relion¹⁷): three down (20%), one up (60%) and two up (20%)
HetSIREN (FlexConsensus): three down (22%), one up (50%) and two up (28%)
CryoDRGN (FlexConsensus): three down (27%), one up (35%) and two up (38%)
FlexConsensus filtered landscape: three down (26%), one up (41%) and two up (31%)

By comparing the different populations against the 3D classification, it is possible to see how HetSIREN tends to be more conservative, estimating a set of populations that follow a similar trend to the 3D classification ones (that is, a larger number of one-up states compared to the other two). By contrast, the trend of CryoDRGN is to give a larger population to the two-up states. FlexConsensus filtered landscape tends to normalize the populations to a value between the two original methods, leading to a new set resembling the behavior seen in the classification results.

A question remains on analyzing the characteristics of those images estimated to have a large consensus error. It is important to understand whether they correspond to ‘bad’ particles or, instead, their position in conformational space arises from estimation errors from the heterogeneity methods. Therefore, an additional experiment was conducted to compare the reconstructions obtained from those images estimated to have a consensus error larger than a given threshold. The results obtained in this analysis are summarized in Extended Data Fig. 5. As shown in the figure, and as expected, these particles are distributed differently in the HetSIREN and CryoDRGN consensus spaces. In CryoDRGN, the images are placed close to the two-up region, whereas HetSIREN tends to distribute them along the one-up and three-down states. By contrast, the volume reconstructed from these images yields a clear one-up state, indicating that the quality of these images was not intrinsically bad but that, for yet unknown reasons, both HetSIREN and CryoDRGN did not properly position these images in conformational space close to other images in similar states of RBD opening. Consequently, although the conceptual distinction between images with large consensus errors being either intrinsically bad or badly estimated by a method is very important, it seems that we do not have a clear way to distinguish between these two options.

Discussion

The cryo-EM community’s great interest in the emerging field of conformational variability analysis is reflected in the increasing number of advanced methods published in recent years. However, the interest in these tools raises a new challenge in comparing the estimations of different techniques to assess their stability, reliability and accuracy. Moreover, the diverse nature of all these new methods makes comparing their results more complex, making it even more difficult to find robust approaches to accurately defining a valid consensus analysis.

To allow a better understanding of conformational landscapes, in this work we discuss a new method to overcome the challenges arising from the comparison of different heterogeneity algorithms, leading to a whole range of possibilities to extract more reliable and accurate conformational states from independent conformational landscapes. Our new FlexConsensus approach relies on a multi-autoencoder architecture specifically designed to robustly identify commonalities and differences among multiple conformational landscapes, ultimately defining a common consensus landscape with enhanced interpretability. From the consensus landscape, the network can automatically derive a consensus error metric for every particle in the conformational landscape, which can be used either to explicitly determine for any given image in a dataset its structural state’s reliability across various heterogeneity methods, determining subsets that can be confidently analyzed, or to analyze specific subsets of particles highlighted only by particular algorithms.

**Fig. 6: Scheme of the FlexConsensus training workflow.**

Methods

This section describes the FlexConsensus multi-autoencoder architecture and the training strategy followed to allow the network to learn meaningful consensus spaces and consensus metrics from input conformational spaces. After the architecture’s discussion, we include the description of the filtering method applied in this paper and a brief discussion on how to access and integrate FlexConsensus in a workflow.

FlexConsensus multi-autoencoder architecture

FlexConsensus architecture consists of a multi-autoencoder architecture with a variable number of encoders and decoders determined by the total number of inputs. Each input corresponds to a conformational landscape that the consensus network should consider, which may come from a different algorithm with its own characteristics. An architecture scheme is provided in Fig. 6.

Each input conformational landscape is forwarded through its encoder, which is composed of three fully connected layers with 1,024 neurons each and rectified linear unit (ReLU) activation. The output layer of every encoder is then fed to a standard linearly activated latent space layer of a variable number of neurons that the user can choose. This bottleneck will become the common conformational space after the network is trained.

Lastly, each encoded data are forwarded through the corresponding decoder, trying to restore the original conformational space based on the information of the common latent space. Similar to the encoders, each decoder comprises a set of three fully connected layers with 1,024 neurons each and ReLU activation, followed by a final layer with the same dimension as the corresponding initial conformational space and linear activation.

The network is trained using an Adam optimizer. By default, the learning rate is set to 10⁻⁵, and the batch size is 1,024. However, users can modify these hyperparameters to fit their needs.

As a reference, training the network takes approximately 16 seconds per epoch on an RTX Ada 6000 generation GPU, using default hyperparameters and two input spaces with 450,000 points each; VRAM consumption is approximately 2.6 GB.

Ensuring proper merging of conformational spaces

One of the main challenges in training a consensus network is correctly driving the network to merge spaces of varying characteristics in the same region of the latent space. Due to the properties of autoencoders, a simple mean square error between the original and predicted spaces results in a latent space where different distributions are well separated. Although this will help decrease the network’s representation error, it is a highly undesirable effect, as it completely obscures the understanding of the intrinsic relations among the distributions of the input spaces.

A possible approach to solving the previous problem is to bring the latent space distributions close by, ensuring that their distances are as small as possible. This can be easily achieved by minimizing the mean square error between all possible distribution pairs. In this way, the network is forced to learn a more meaningful representation compared to the original uncontiguous case:

$${L}_{1}=\min \mathop{\sum }\limits_{n=1}^{N}\mathop{\sum }\limits_{m=1}^{N}\sum _{i}\left\Vert {E}_{n}\left({{\bf{x}}}_{i}^{n}\right)-{E}_{m}\left({{\bf{x}}}_{i}^{m}\right)\right\Vert^{2}$$

(1)

being N, the total number of input spaces; E_n and E_m are the operators responsible for encoding a given space into the common consensus space; and i is an index that covers all input points. ${{\bf{x}}}_{i}^{n}$ is the representation of the i-th image latent vector in the n latent space.

However, the previous cost is not sufficient in our experiments, primarily due to the effect of the representation error of the decoder, which drove the result away from a fully merged consensus space. Therefore, an additional restriction is required to properly ensure that the local structure of the original conformational spaces is preserved as much as possible in the predicted consensus space. To that end, we propose comparing the distances between the original and predicted latent spaces. The self-distance matrices of every input and latent space batch are computed and compared on a pairwise basis following a similar approach to the previous cost. The minimization of the all-to-all pairwise mean square error among the self-distance matrices allows the network to learn to keep the original structure of the conformational spaces as consistent as possible. This is similar to the objective of Shannon mapping. Therefore, the Shannon mapping cost function can be written as:

$${L}_{2}=\min \mathop{\sum }\limits_{n=1}^{N}\left(\frac{{\sum }_{i,\;j}\frac{{\left({d}_{ij}({X}^{\;n})-{d}_{ij}({E}_{n}({X}^{\;n}))\right)}^{2}}{{d}_{ij}({X}^{\;n})}}{{\sum }_{i,\;j}{d}_{ij}({X}^{\;n})}\right)$$

(2)

being d_ij(X) the ij-th entry of the self-distance matrix computed from a set of coordinates X. The self-distance matrix is computed as the all-versus-all distance among the coordinates in the set.

In addition, a third cost is included in the total cost function to ensure that the distance distributions among the different consensus spaces are similar. To that end, we compute the self-pairwise distance matrix of the other encoded consensus spaces. The distance matrix is then calculated, and its entries are considered random samples from a distance distribution. We then add another regularization term based on the Wasserstein distance between two different distributions:

$${L}_{3}=\min \mathop{\sum }\limits_{n=1}^{N}\mathop{\sum }\limits_{m=1}^{N}W\left(D({E}_{n}({X}^{\;n})),D({E}_{m}({X}^{\;m}))\right)$$

(3)

being D, the operator that computes the distance matrix, and W, the function that computes the Wasserstein distance between two distance distributions.

Combining the previous three costs at the latent space level in our experiments led to a good representation of a continuous consensus space. In addition to the last three cost functions, a standard autoencoder representation error between the original and predicted conformational spaces based on the consensus space is also included. Because every conformational space might be affected by different estimation errors, only those regions consistently placed in the consensus space will have a low representation error. Therefore, the representation error provides a good measurement of the stability of the estimations, which can be used in conjunction with the consensus space to detect or filter out those regions in the conformational spaces that might have been estimated less reliably based on the agreement of all the spaces. The contribution of the previous regularizers in the overall cost function is summarized as follows:

$$L=\mathop{\sum }\limits_{n=1}^{N}\mathop{\sum }\limits_{m=1}^{N}\sum _{i}{\left\Vert {{\bf{x}}}_{i}^{n}-{D}_{n}\left({E}_{m}\left({{\bf{x}}}_{i}^{m}\right)\right)\right\Vert }^{2}+{L}_{1}+{L}_{2}+{L}_{3}$$

(4)

The first term represents the representation loss obtained by comparing the inputs against the decoded outputs, and D_n is the corresponding decoder responsible for transforming a given consensus space into the input space n.

FlexConsensus training strategy

The multi-encoder architecture proposed in FlexConensus introduces extra flexibility in the training step, depending on the path to backpropagate the gradients computed during every forward pass through each autoencoder.

The workflow followed to train FlexConsensus is shown in Fig. 6. In the simplified case proposed in the figure, two independent input spaces are fed to their corresponding encoders, generating a different representation in a shared latent space. The first step includes encoding both input spaces, leading to two independent representations in the consensus space. Once the consensus spaces are generated, they are forwarded sequentially through the different decoders, triggering independent backpropagation steps for every decoder. We found the sequential training of the decoders to be more stable than training the whole network on a single backpropagation step, leading to a faster and more accurate convergence.

Once the network is trained, the corresponding encoder can recreate the complete consensus space and the location of a specific input in that space. In addition, the different encoder/decoder combinations can be used to convert to different input spaces and measure the consensus error estimated for a given input.

Consensus set

Given the mapping of all points from the input spaces to a shared latent space, a natural question arises: which images indeed agree with one another? As illustrated in Fig. 6, we can calculate the distance between the projected points of space 1 and their corresponding projections from space 2 in the latent space. Although this approach can be extended to more than two spaces, we simplify the explanation by focusing on two spaces. We rank all points based on their average distance to points in the other space(s). Intuitively, the closest points represent agreement, whereas the most distant points indicate disagreement.

Using the top K particles from this list, we compare the distributions of their projections from space 1 and space 2. This comparison is based on the Wasserstein distance between the two sets of points. We approximate the Wasserstein distance in k dimensions to accelerate the calculations using the one-dimensional Wasserstein distance of the points projected onto κ random unit vectors¹⁸. We denote this observed distance as d_obs. To assess its significance, we compare d_obs to the distribution of distances obtained by randomizing the labels ‘space 1’ and ‘space 2’ (performing 100 randomizations). The proportion of randomized distances smaller than or equal to d_obs gives the P value, representing the probability of observing such a distance under random labeling. We expect that points with a small representation error in the latent space correspond to indistinguishable distributions of latent points from space 1 and space 2. However, as the number of particles K increases, a threshold is reached where the two distributions become distinguishable, indicating that the two sets of points can no longer be considered equivalent. This threshold is defined as the point where the P value falls below 0.05. The value of K at which this occurs establishes the set of consensus images.

FlexConsensus as part of heterogeneity analysis workflows—the case of integration inside Scipion

Flexibility analysis is becoming progressively a standard practice in the cryo-EM field, promoted by the new emerging tools that can extract more accurately the structural information captured in the experimental images. Therefore, it is common for experimentalists to analyze the flexibility of their biomolecules with these tools. However, they are restricted in the depth of interpretation of the results reported by any given method because of the lack of validation tools and the difficulty in performing a joint analysis of the results of different methods. Our method, FlexConsensus, aims to be inserted at this point in the analysis, allowing the integration and comparison of the results yielded by different algorithms to validate whether the conformational spaces they report agree.

The FlexConsensus algorithm has been integrated into Scipion as a protocol included in the Scipion Flexibility Hub¹⁹ and accessible through the Flexutils Scipion plugin. The protocol provides a graphical interface, allowing one to easily customize the network’s hyperparameters or select the neural network’s inputs.

Inside Scipion, it is possible to easily execute popular heterogeneity analysis methods such as CryoDRGN², 3DFlex⁷ or HetSIREN⁵, among others. The inputs needed to execute these algorithms are also organized in a form similar to the FlexConsensus case. The particles required to execute them can be imported into Scipion or processed inside by combining cryo-EM software such as Relion¹⁷ or CryoSPARC¹⁵. The conversions needed to adapt the outputs and inputs to different software are internally handled by Scipion, making the transition among packages smoother. These concepts also apply to the heterogeneity software integrated into Scipion and FlexConsenus.

The predictions obtained after training FlexConsensus are also automatically registered inside Scipion, allowing it to communicate with the integrated visualization, analysis and subset generation tools and giving the user greater flexibility to manipulate their data according to their specific needs.

For example, users can access the 3D space annotator to view the consensus space in three dimensions, allowing them to select any combination of axes/dimensions to inspect and interact with it. The annotation tool also enables the recovery of real-time confirmations from any region of the consensus space to visualize them inside the annotator or using the direct connection with ChimeraX²⁰. To extract the previous conformations, the annotator relies on the specific method that generated the latent space being analyzed to recover the same state seen by that method for any location in the conformational space. This avoids the need to reconstruct with a neighborhood of particles surrounding a given point in the conformational space.

Additionally, it is possible to extract the particles associated with a given region interactively and register them inside Scipion, allowing their subsequent processing in the workflow to analyze them further.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

CryoBench datasets are publicly available at https://cryobench.cs.princeton.edu/. The 80S ribosome experimental dataset analyzed in the paper is available in EMPIAR under entry 10028 (https://doi.org/10.6019/EMPIAR-10028). The SARS-CoV-2 D614 spike dataset will be published in a different work.

Code availability

The FlexConsensus algorithm is available through Scipion 3.0 (ref. ¹²) under the plugins and scipion-em-flexutils²¹ (https://github.com/scipion-em/scipion-em-flexutils) and the package Flexutils-Toolkit²² (https://github.com/I2PC/Flexutils-Toolkit). The protocol corresponding to the algorithm described in this paper is flexutils - train - FlexConsensus and flexutils - interactive consensus - FlexConsensus. Tutorials on how to set up and use FlexConsensus are provided on the following webpage: https://scipion-em.github.io/docs/release-3.0.0/docs/user/tutorials/flexibilityHub/main_page.html#tutorials. In addition, we provide an interactive Scipion workflow visualization showing an example of how to execute FlexConsensus here: https://scipion.i2pc.es//cryoemworkflowviewer/workflow/aac223841371a67510e9eab16ac8870246b30d68.

References

Scheres, S. H. et al. Disentangling conformational states of macromolecules in 3D-EM through likelihood optimization. Nat. Methods 4, 27–29 (2007).
Article CAS PubMed Google Scholar
Zhong, E. D., Bepler, T., Berger, B. & Davis, J. H. CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks. Nat. Methods 18, 176–185 (2021).
Article CAS PubMed PubMed Central Google Scholar
Ludtke, S. J. & Muyuan, C. Deep learning-based mixed-dimensional gaussian mixture model for characterizing variability in cryo-EM. Nat. Methods 18, 930–936 (2021).
Article PubMed PubMed Central Google Scholar
Luo, Z., Ni, F., Wang, Q. & Ma, J. OPUS-DSD: deep structural disentanglement for cryo-EM single-particle analysis. Nat. Methods 20, 1729–1738 (2023).
Article PubMed PubMed Central Google Scholar
Herreros, D. et al. Real-space heterogeneous reconstruction, refinement, and disentanglement of CryoEM conformational states with HetSIREN. Nat. Commun. 16, 3751 (2025).
Article CAS PubMed PubMed Central Google Scholar
Herreros, D. et al. Approximating deformation fields for the analysis of continuous heterogeneity of biological macromolecules by 3D Zernike polynomials. IUCrJ 8, 992–1005 (2021).
Article CAS PubMed PubMed Central Google Scholar
Punjani, A. & Fleet, D. J. 3DFlex: determining structure and motion of flexible proteins from cryo-EM. Nat. Methods 20, 860–870 (2023).
Article CAS PubMed PubMed Central Google Scholar
Schwab, J., Kimanius, D., Burt, A., Dendooven, T. & Scheres, S. H. W. DynaMight: estimating molecular motions with improved reconstruction from cryo-EM images. Nat. Methods 21, 1855–1862 (2024).
Article CAS PubMed PubMed Central Google Scholar
Chen, M., Toader, B. & Lederman, R. Integrating molecular models into CryoEM heterogeneity analysis using scalable high-resolution deep Gaussian mixture models. J. Mol. Biol. 435, 168014 (2023).
Article CAS PubMed PubMed Central Google Scholar
Guilles, M. A. & Singer, A. Cryo-EM heterogeneity analysis using regularized covariance estimation and kernel regression. Proc. Natl Acad. Sci. 122, e2419140122 (2025).
Article Google Scholar
Jeon, M. et al. CryoBench: diverse and challenging datasets for the heterogeneity problem in cryo-EM. In 38th Conference on Neural Information Processing Systems (NeurIPS 2024) https://proceedings.neurips.cc/paper_files/paper/2024/file/a2ef5ba272df8f168dc38037cc946be0-Paper-Datasets_and_Benchmarks_Track.pdf (2024).
de la Rosa-Trevín, J. M. et al. Scipion: a software framework toward integration, reproducibility and validation in 3D electron microscopy. J. Struct. Biol. 195, 93–99 (2016).
Article PubMed Google Scholar
Pintilie, G. & Chiu, W. Comparison of Segger and other methods for segmentation and rigid-body docking of molecular components in cryo-EM density maps. Biopolymers 97, 742–760 (2012).
Article CAS PubMed PubMed Central Google Scholar
Wong, W. et al. CryoEM structure of the Plasmodium falciparum 80s ribosome bound to the anti-protozoan drug emetine. Elife 3, e03080 (2014).
Article PubMed PubMed Central Google Scholar
Punjani, A., Rubinstein, J. L., Fleet, D. J. & Brubaker, M. A. CryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination. Nat. Methods 14, 290–296 (2017).
Article CAS PubMed Google Scholar
Sorzano, C. O. S. et al. Image processing in cryo-electron microscopy of single particles: the power of combining methods Methods Mol. Biol. 2305, 257–289 (2021).
Kimanius, D., Dong, L., Sharov, G., Nakane, T. & Scheres, S. H. W. New tools for automated cryo-EM single-particle analysis in RELION-4.0. Biochem. J. 478, 4169–4185 (2021).
Article CAS PubMed Google Scholar
Bonneel, N., Rabin, J., Peyré, G. & Pfister, H. Sliced and Radon Wasserstein barycenters of measures. J. Math. Imaging Vision 51, 22–45 (2015).
Article Google Scholar
Herreros, D. et al. Scipion Flexibility Hub: an integrative framework for advanced analysis of conformational heterogeneity in cryoEM. Acta Crystallogr. D Struct. Biol. 79, 569–584 (2023).
Article CAS PubMed PubMed Central Google Scholar
Pettersen, E. F. et al. UCSF ChimeraX: structure visualization for researchers, educators, and developers. Protein Sci. 30, 70–82 (2021).
Article CAS PubMed Google Scholar
Herreros, D., Mata, C. P., Sorzano, C. O. S. & Carazo, J. M. scipion-em/scipion-em-flexutils: Release 3.4.0 (v3.4.0). Zenodo https://doi.org/10.5281/zenodo.16917557 (2025).
Herreros, D., Mata, C. P., Sorzano, C. O. S. & Carazo, J. M. I2PC/Flexutils-Toolkit: Release 1.3.2 (v1.3.2). Zenodo https://doi.org/10.5281/zenodo.16917560 (2025).

Download references

Acknowledgements

The authors acknowledge the financial support from the Ministry of Science, Innovation, and Universities (BDNS no. 716450) to Instruct-ES as part of the Spanish participation in Instruct-ERIC; the European Strategic Infrastructure Project (ESFRI) in the area of Structural Biology, grant PID2022-136594NB-I00 (J.M.C. and C.O.S.S.), funded by MICIU /AEI/10.13039/501100011033/ and ‘ERDF A way of making Europe’ by the European Union; the Spanish State Research Agency AEI/10.13039/501100011033, through the Severo Ochoa Programme for Centres of Excellence in R&D (CEX2023-001386-S) (J.M.C. and C.O.S.S.); Comunidad Autónoma de Madrid through grant S2022/BMD-7232 (J.M.C. and C.O.S.S.); the European Union and Horizon 2020 through grant HighResCells (ERC - 2018 - SyG, Proposal: 810057) (J.M.C. and C.O.S.S.); and the European Union and Horizon Europe through grant Fragment Screen Proposal: 101094131 (J.M.C. and C.O.S.S.). We thank S. Subramaniam for providing us with the data processed and presented in this work and for his support and feedback.

Author information

These authors jointly supervised this work: Carlos Oscar Sanchez Sorzano, Jose Maria Carazo.

Authors and Affiliations

Centro Nacional de Biotecnologia-CSIC, Madrid, Spain
David Herreros, Carlos Oscar Sanchez Sorzano & Jose Maria Carazo
PKF Attest innCome, Madrid, Spain
Carlos Perez Mata

Authors

David Herreros
View author publications
Search author on:PubMed Google Scholar
Carlos Perez Mata
View author publications
Search author on:PubMed Google Scholar
Carlos Oscar Sanchez Sorzano
View author publications
Search author on:PubMed Google Scholar
Jose Maria Carazo
View author publications
Search author on:PubMed Google Scholar

Contributions

D.H. developed and tested the FlexConsensus method presented throughout the paper. D.H. wrote the manuscript. C.P.M. helped with the data preprocessing. C.O.S.S. and J.M.C. jointly supervised this work.

Corresponding author

Correspondence to David Herreros.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: Arunima Singh, in collaboration with the Nature Methods team. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Evaluation of CryoBench dataset results for the IgG-RL and MDSpike datasets.

Panel a) shows the evaluation of the correlation between the FlexConsensus latent space vectors obtained for the IgG-RL dataset from CryoBench¹¹ and two internal variables (d and θ) designed to characterize simulated disordered motions (the Mantel test used to evaluate the significance of this correlation was two-sided, yielding a p-value of 0.002). Panel b) shows the comparison of the location of the collective variables measured from the ground truth energy landscape of the MDSpike dataset from CryoBench¹¹ in its original energy landscape and FlexConsensus space. Three landscapes were input to the consensus: the ground truth energy landscape, the conformational landscape of HetSIREN, and the conformational landscape of CryoDRGN. The results highlight the correlation of the original collective variables (CV1 and CV2) with the consensus space, which are almost aligned with the consensus space’s principal components (PC).

Extended Data Fig. 2 Highlight of the main differences found in the consensus landscape estimated for the EMPIAR 10028 dataset.

The top panel shows the regions where Zernike3D and HetSIREN disagree due to the presence of a strong compositional change that Zernike3D cannot capture. The bottom panel shows a strong difference in the periphery of the landscape, mainly associated with wrongly estimated conformations with increasing noise in the case of HetSIREN. Since Zernike3D cannot add noise to the conformation it estimates, its periphery is closer to the central region of the consensus space, as shown in the Figure.

Extended Data Fig. 3 FlexConsensus error histograms computed for the EMPIAR 10028 dataset.

The error metric corresponds to the standard Root Mean Squared Error between every original space and the spaces decoded by FlexConsensus from the consensus space, as well as the consensus error derived from the consensus space. The three images presented show a similar error tendency for all the decoders: the two HetSIREN executions remain more similar, meaning that the two independent estimations have similar characteristics and are reliable for most particles. In contrast, Zernike3D deviates more from HetSIREN, an effect mainly occurring due to the differences in the approximation followed by each method and the differences in the type of variability they can estimate. Apart from these differences, one can see that the distribution of the errors is similar in the three cases, and it allows us to easily identify and exclude those particles on the right tail of the distributions, which are the ones estimated with lower reliability based on these three executions.

Extended Data Fig. 4 FlexConsensus error histograms computed for the SARS-CoV-2 D614G spike.

The error metric corresponds to the standard Root Mean Squared Error between every original space and the spaces decoded by FlexConsensus from the consensus space, as well as the consensus error derived from the consensus space. The histograms reveal that the heterogeneity estimations yielded by HetSIREN and CryoDRGN suffer from a slight deviation, suggesting that the landscapes estimated by both methods might not follow the same distribution of states.

Extended Data Fig. 5 Analysis of those images estimated to have a larger consensus error in the SARS-CoV-2 dataset.

Reconstruction obtained from the SARS-CoV-2 dataset, considering only those images estimated to be within the 5% largest consensus error (around 21k particles from the original 440k). The landscapes show the distribution of these images in the consensus space: For CryoDRGN, most of these images are located in the two-up state region, while in HetSIREN, they are distributed along the regions corresponding to the one-up and three-down states. In contrast, the reconstruction calculated directly from these particles corresponds to a clear one-up state, indicating that the two heterogeneity methods wrongly assigned these images to regions of the conformational landscape.

Supplementary information

Supplementary Information

Supplementary information file, Supplementary figures and Supplementary videos.

Reporting Summary

Peer Review File

Supplementary Video 1

Example of conformational states sampled from the FlexConsensus space obtained for the IgG-RL dataset from CryoBench¹¹.

Supplementary Video 2

Example of conformational states sampled from the FlexConsensus space obtained for the MDSpike dataset from CryoBench¹¹.

Supplementary Video 3

Comparison of the conformational states detected by HetSIREN and Zernike3D for the EMPIAR 10028 dataset, sampled from those regions in the FlexConsensus space estimated to have the largest consensus error.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Herreros, D., Perez Mata, C., Sanchez Sorzano, C.O. et al. Merging conformational landscapes in a single consensus space with FlexConsensus algorithm. Nat Methods (2025). https://doi.org/10.1038/s41592-025-02841-w

Download citation

Received: 07 January 2025
Accepted: 21 August 2025
Published: 25 September 2025
DOI: https://doi.org/10.1038/s41592-025-02841-w