Abstract
Predicting the outcome of a crystallization process remains a long-standing challenge in solid state chemistry. This stems from the subtle interplay between thermodynamics and kinetics that results in a complex crystal energy landscape, spanned by many polymorphs and other metastable intermediates. Molecular simulations are uniquely positioned to unravel this interplay, as they constitute a framework that can compute free energies (thermodynamics), barriers (kinetics), and visualize the crystallization mechanisms at high resolution. We show here how recent progress in computational methods, and their augmentation with Machine Learning, has advanced our ability to predict crystal structure and simulate crystal nucleation.

Introduction
Crystalline materials are central to a wide range of applications1,2. Pharmaceutical drugs are often formulated as crystals of pharmaceutically active ingredients, i.e., organic molecules. In recent years, nanoscopic crystals of metals have been increasingly used as catalysts for chemical reactions involved in energy applications. Furthermore, new classes of crystalline materials, such as Metal Organic Frameworks (MOFs), are being developed to address emergent challenges in gas storage and separation for environmental and chemical sensing applications. Although crystallization from the melt, that is, in a single-component system, can already be a complex process involving multiple steps, many practical applications of crystallization take place in multicomponent systems. For instance, pharmaceutical formulation and material design have considerably expanded the combinatorial space of potential structures by exploring the use of co-crystals, which are crystals composed of two different molecules, an excipient and an active compound3. Another example is the case of nanoalloys that leverage synergistic effects between elements in catalysis, and mixing-and-matching different metal units with various organic linkers in MOFs. The next design frontier thus hinges on our ability to reliably obtain a specific crystal, co-crystal, or solvate at the end of the crystallization process. However, predicting the outcome of a crystallization event remains a challenging and elusive goal. Numerous examples from experiments and computations highlight the complexity of the crystallization process, which can give rise to several possible outcomes, that is, different crystal structures or polymorphs, even in the case of single-component systems, and intricate pathways involving metastable intermediate states whose lifetime may range between seconds and many years in some cases. 
Computational chemistry and materials science are uniquely poised to unravel the interplay between kinetics and thermodynamics. These methods provide a framework that allows for: (i) the determination of the key thermodynamic (free energies of the crystalline phases) and kinetic (free energy barriers of formation) properties, and (ii) the direct observation of the microscopic mechanisms underlying crystallization with very high spatial (of the order of 1 Å), and temporal (of the order of 1 fs) resolutions. These methods directly connect the microscopic (molecular) properties, such as molecular structure and interactions, to the macroscopic properties of the crystal, i.e., its thermodynamic properties, and symmetry. In this Review, we examine how recent advances in crystal structure prediction and in simulation of crystal nucleation events pave the way for a deeper understanding of crystallization. In Section “Inherent Complexities of Crystal Formation: Challenges for Molecular Simulation”, we discuss inherent complexities associated with crystallization and analyze recent successes enabled by computational studies. Such challenges include the intrinsically nonequilibrium nature of the process, as well as the closeness, in terms of free energy, between crystal phases and, more generally, between competing metastable states, which ultimately results in a complex interplay between thermodynamics and kinetics. In Section “Resilient force fields for evolving environments”, we examine how advances in our ability to model intermolecular interactions with the advent of partitioned, quantum-based, force fields and machine-learned potentials open the door to high-quality free energy calculations and to the reliable ranking of metastable and stable structures. In Section “Unraveling the mystery of order formation”, we focus on the analysis of the crystallization pathway generally characterized by geometric variables known as “reaction coordinates” or “collective variables”. 
We discuss how recent work leverages machine learning to thoroughly explore the configuration space and further our ability to identify new crystallization pathways through, for instance, the determination of collective variables. In the last section, we present two future directions, which include the study of far-from-equilibrium crystallization processes that give rise to unusual crystallization patterns and structures, and the formation of self-adaptive crystals that can crystallize reversibly in response to environmental cues.
Inherent complexities of crystal formation: challenges for molecular simulation
Rivaling crystal structures and competing crystallization pathways
It has long been recognized that most, if not all, compounds can crystallize into more than one crystal structure, or polymorph. This phenomenon was reported almost two centuries ago for benzamide4 and was later identified to play a significant role in the formation of the crystal phase5. This observation became known as Ostwald’s rule of stages, or Ostwald’s step rule, and summarizes the often sinuous crystallization pathway. Rather than proceeding directly from the liquid phase, be it a supercooled melt or a supersaturated solution, into the stable crystal phase, crystallization generally proceeds through a series of transitions. Each transition corresponds to the system leaving a metastable state to reach the closest (meta)stable state6. This statement opens the door to several questions (see Fig. 1). How many metastable states are accessible and explored during crystallization? Is there a metric that unambiguously defines which (meta)stable state is the closest? If so, should the closest state be the most similar from a geometry standpoint, or the state that is easiest to reach from an energy standpoint, i.e., that requires overcoming the lowest free energy barrier?
Fig. 1. For each scenario, the evolution of the system is indicated by arrows on a Gibbs free energy (G) vs. a reaction coordinate (RC) plot. (Left) Ostwald’s step rule indicating that the solute (S) crystallizes into the least stable polymorph (1), before converting into the second least stable polymorph (2), and finally into the most stable polymorph (3). (Middle) A metastable intermediate (MI) may form prior to the onset of the least stable crystalline polymorph (1), thereby resulting in an augmented step rule with the sequence (S) → (MI) → (1) → (2) → (3). (Right) Competing kinetics: If the fastest forming polymorph, with the lowest free energy barrier of nucleation, is form (2) as shown in the plot, then (S) will crystallize into form (2) rather than into the least stable polymorph (1). Form (2) then converts into the stable polymorph (3).
The existence of polymorphs often provides many metastable states for the incipient crystal to choose from, even for the simplest systems. In 2011, Salzmann et al. reported that there were fifteen known polymorphs for water, with three new phases discovered in previous years, and indicated that future research efforts could uncover additional polymorphs at high pressures and temperatures, but also at negative pressures7. Organic compounds can also exhibit many polymorphs, and even small changes in crystallization conditions such as, e.g., in the solvent used, concentration, temperature, or pressure, can change the crystallization product by favoring the formation of a specific polymorph8,9. This in turn has a strong impact on the thermodynamic and physical properties of the crystal so obtained, since polymorphs have different properties, and calls for the computation of a polymorphic landscape over the space spanned by the crystallization conditions. Polymorphic forms are also observed during crystallization processes involving more than one compound, as shown in recent work on pharmaceutical cocrystallization10. For instance, changes in the conditions during mechanochemical cocrystallization can allow for the selection of the stable room temperature polymorph or of the metastable high-temperature form of the cocrystal10. Polymorphic crystallization can even become more challenging in some cases when multiple polymorphs form concomitantly during crystallization11. Similarly, a seed of the thermodynamically stable polymorph can serve as a substrate for the growth of a metastable polymorph with common geometric features, giving rise to a phenomenon known as cross-nucleation12,13,14. In addition to metastable polymorphs serving as potential intermediate states along the crystallization pathway, recent reports have shown that crystallization may also proceed through amorphous precursors. 
This is the case for calcium carbonate, which, in addition to exhibiting crystalline polymorphs, e.g., calcite, aragonite, and vaterite, can present multiple amorphous phases15, and this polyamorphism has a strong impact on biomineralization processes. Finally, the advent of experimental methods16 that now allow for the in situ observation of the synthesis of complex hybrid materials has revealed even more complex pathways. For instance, the synthesis of the MIL-53 MOF17 was recently observed to proceed through the aggregation and transformation of pre-formed nuclei. These examples emphasize the need for theories and molecular simulation methods to encompass all possible crystalline forms to perform accurate crystal structure predictions and provide insights into the crystallization process.
Liquid metastability and preordering
However, during crystallization, the metastable states available to the system extend well beyond crystalline, polymorphic, forms. This can result, for instance, in a double nucleation process during vapor-crystal nucleation. As shown by density functional theory calculations on simple fluids and model globular proteins18, metastable liquid droplets may form as transient intermediate states between the vapor and solid phases. This nucleation pathway differs from what the classical nucleation theory (CNT) tells us to expect. Indeed, CNT assumes a nucleation pathway that proceeds through the formation of an embryo of the new, thermodynamically stable, phase within the metastable parent phase. CNT provides an explanation for the onset of a free energy barrier that arises from a balance between an unfavorable free energy term, associated with the free energy cost when creating the interface between the two phases, and a favorable free energy term, resulting from the conversion of the metastable phase of higher free energy into the stable phase with the lowest free energy. The presence of metastable intermediate states during the liquid → solid transition pathway implies that nucleation becomes a two-stage process or, in other words, becomes nonclassical19 (see Fig. 2). The complete characterization of the features of the parent phase, that is, the supercooled liquid, is thus of paramount importance. Tanaka proposed a two-order parameter analysis for the liquid state that relies on a bond-orientational (BO) structural order parameter and a density order parameter20. The BO order parameter accounts for the low local free-energy configurations and the onset of transient local structural ordering at the center of the crystallization process in many systems. 
For instance, experiments showed that silicon exhibited two distinct liquid phases or, equivalently, two liquid polymorphs, below the melting point, i.e., the thermodynamically stable High-Density Liquid (HDL) phase and the metastable Low-Density Liquid (LDL) phase21. This has a direct impact on the crystallization of silicon. Molecular dynamics simulations showed that crystallization from the HDL phase proceeded through a two-step process, with the initial formation of a droplet of the metastable LDL phase followed by the nucleation of the solid phase at the LDL-HDL liquid interface22. The concept of liquid polymorphism has since appeared to be quite general to molecular liquids23. For example, water also exhibits two phases24, i.e., an LDL and an HDL phase, as confirmed by first-principles simulations25 and classical simulations26, and structural changes in supercooled water have been shown to play an important role in the crystallization process27,28.
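The CNT balance described above, between the interfacial cost and the bulk driving force, can be made concrete with a minimal sketch. It assumes a spherical nucleus of radius r, an interfacial free energy gamma, a number density rho_s for the new phase, and a chemical potential difference delta_mu > 0 between the parent and new phases. These symbols and the function names are illustrative assumptions and do not come from the text.

```python
import math


def cnt_free_energy(r, gamma, rho_s, delta_mu):
    """CNT free energy of formation of a spherical nucleus of radius r.

    Assumed form: an unfavorable surface term (4*pi*r^2*gamma) plus a
    favorable bulk term (-(4/3)*pi*r^3*rho_s*delta_mu), with delta_mu > 0.
    """
    surface_term = 4.0 * math.pi * r**2 * gamma
    bulk_term = -(4.0 / 3.0) * math.pi * r**3 * rho_s * delta_mu
    return surface_term + bulk_term


def cnt_critical_nucleus(gamma, rho_s, delta_mu):
    """Critical radius and barrier height obtained by setting dG/dr = 0."""
    r_star = 2.0 * gamma / (rho_s * delta_mu)
    barrier = 16.0 * math.pi * gamma**3 / (3.0 * (rho_s * delta_mu) ** 2)
    return r_star, barrier


# Illustrative reduced units (hypothetical values, not taken from any study).
r_star, barrier = cnt_critical_nucleus(gamma=1.0, rho_s=1.0, delta_mu=0.5)
print(f"critical radius = {r_star:.3f}, barrier = {barrier:.3f}")
```

In this picture a single maximum separates the parent phase from the new phase. A two-step (nonclassical) pathway would instead show an additional local minimum for the metastable intermediate, which this one-variable expression cannot represent.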
Fig. 2. (a) Example of nonclassical protein crystallization using optical microscopy. A transient gel-like structure forms before collapsing during crystal growth, as can be seen in a quartz cuvette used for SANS experiments (right panel). Snapshots of this sample are shown 2.5 and 23 h after preparation (left panel). Reprinted with permission from Cryst. Growth Des. 2021, 21, 6971. Copyright 2021 American Chemical Society. (b) Crystallization of poly(9,9-di-n-octylfluorenyl-2,7-diyl) (PFO) from (Top Left) an isotropic disordered melt state (ISO state) and (Top Right) a liquid-crystalline ordered state (NEM state). (Bottom) Height histograms obtained from the AFM data shown above. Reprinted with permission from Chem. Mater. 2022, 34, 10744. Copyright 2022 American Chemical Society.
The formation of a metastable intermediate via preordering of the liquid thus appears to be a common first step in the crystallization process29, as shown, e.g., in protein solutions30 and in polymers31. Very interestingly, the preordering of the supercooled liquid phase, prior to the onset of crystallization, has been shown to occur during both homogeneous and heterogeneous nucleation processes32. This can lead to demixing prior to crystallization, with the formation of liquid precursors with a high concentration in one of the components of the mixture, as shown in simulations of crystallization from metal alloys33,34 and in experiments on polymer mixtures35,36.
Towards a crystallization from first-principles
Molecular simulation is well-poised to shed light on several, yet elusive, aspects of the crystallization process. These have remained long-standing challenges in solid-state chemistry as famously noted by Maddox in 1988 in the “Crystals from First Principles” article37, and twenty years later by Sanderson in the paper “Model predicts structure of crystals”38. If the crystallization process is under thermodynamic control, predicting which crystal structure forms at the end of crystallization amounts to determining (i) which crystal, co-crystal, and solvate structures exist, and (ii) which of these is the thermodynamically stable structure. To this end, the Cambridge Crystallographic Data Center (CCDC) has been organizing crystal structure prediction (CSP) blind tests, every three years since 199939, thereby providing a regular assessment of progress in the field. This has led to the development of algorithms that are increasingly successful and efficient in handling complex systems involving, for instance, flexible compounds and multicomponent systems such as in co-crystals. This has remained a very challenging endeavor, given the large number of possible structures (see, e.g., the many polymorphs of ROY40) and how close to each other most structures are in terms of free energy. Furthermore, an additional challenge for molecular simulation methods, specific to the crystalline states, is the high accuracy required of force fields to achieve the correct ranking for the stability of the structures. We will discuss recently developed strategies for obtaining high-accuracy force fields, as well as advances in Machine Learning-augmented search algorithms for crystal structures, in the next Section.
Crystallization is not always under thermodynamic control, however, and kinetics often plays a significant role in the process41, leading to the formation of metastable polymorphs that may subsist for extended periods of time. This means that being able to simulate the entire crystallization process, and therefore shedding light on the mechanisms that take place at the microscopic scale during crystal nucleation and growth, is of prime importance to understand which sequence of structures may form during crystallization, and to measure for how long each structure should be expected to remain. Given the large timescales spanned by the process, enhanced sampling simulations rather than Boltzmann sampling simulations are required42. These enhanced sampling simulations have enabled the computation of the free energy barriers for the nucleation of a crystalline cluster of a critical size, i.e., large enough to have a likelihood of 50% of growing into a crystalline phase. Such computations can thus enable the prediction of the rates at which crystal nucleation takes place43,44, although discrepancies between experimental rates and simulated rates are often observed43. Furthermore, these methods give us access to the interplay between polymorph selection and crystal nucleation, as shown, e.g., through the competition between body-centered and face-centered cubic structures during simulations of crystal nucleation in simple fluids45 and later confirmed by density functional theory calculations46. They can also show how the conditions of crystallization may favor the selection of a specific polymorph early in the process, i.e., during crystal nucleation or later, i.e., during crystal growth47, and shed light on the underlying molecular mechanisms48, or, in other systems, the existence of metastable amorphous precursors49. 
Molecular simulations can also help us unravel the contributions from thermodynamics and kinetics, since simulations can predict not only the complete phase diagram of compounds and multicomponent mixtures50, but also the domain of occurrence of metastable polymorphs51, also termed kinetic phase diagrams in recent work52. Determining a domain of occurrence relies on obtaining the free energy of the metastable polymorphs using molecular simulations51,52 and determining the conditions for which a metastable polymorph has a lower free energy than the supercooled liquid and, as such, will likely form from the liquid during crystallization1. Obtaining the entire crystallization pathway from enhanced sampling simulations, however, is often complicated by the difficulty of defining one or more reaction coordinates, or collective variables, that capture all structural changes that may occur during crystallization. While experiments have supported the use of the Steinhardt order parameters in the case of simple fluids53, the case of molecular systems is much more complex and requires advanced strategies54. We will discuss recently developed approaches in Section “Unraveling the mystery of order formation” and examine how Machine Learning has enabled significant advances towards solving this challenge in recent years.
Resilient force fields for evolving environments
Limitations of off-the-shelf force fields
The properties of crystalline materials hinge on a subtle structure-energy balance. Our ability to accurately model the interconnection between atomic/molecular spatial arrangements and the energy of the crystalline assembly is central to the long-standing challenge underlying the reliable prediction of crystal structures55, as demonstrated by the results of successive CCDC blind tests39. This has become an even more pressing need as CSP studies have tackled increasingly complex cases involving flexible molecules56, co-crystals, and solvates. In such cases, in addition to accounting for packing patterns and polymorphic forms correctly, the energy of the crystals must be accurate across the molecular conformation and composition spaces, thereby placing additional requirements on the transferability of the energy functions to a wide range of molecular environments. This leads to the following question: are available models accurate enough to describe polymorphism, for which several crystal forms can exist, solvates, in which the solvent is incorporated into the lattice in different amounts and locations, and cocrystals, which contain multiple components in varying amounts? The conventional approach has long consisted in employing an atomistic potential, or force field, that is a sum of parametric functions based on physical approximations. Each of these functions models a different type of contribution to the potential energy and its parameters are fitted to reproduce a set of experimental data (either thermodynamic or structural) and, in some cases, results from ab initio calculations for a large library of molecules. The resulting force fields include COMPASS57, GAFF58, Dreiding59, and CVFF60, and have been successful for many applications. This is not the case, however, for the prediction of crystal structures as demonstrated by results obtained during CSP blind tests39. 
Blind tests have indeed revealed that generic force fields were unable to recover the experimentally observed structure for a simple rigid molecule like azetidine, or the crystal structure of flexible molecules61. The reason is that the parameters used in generic force fields have been fitted to match the properties of a large library of molecules and, as a result, often lack the specificity, both in terms of interaction patterns and elements involved in the interactions, and the level of accuracy required for a successful CSP. To address this issue, a tailor-made force field (TMFF) can be developed on a molecule-by-molecule basis. TMFFs have been shown to provide excellent results for rigid molecules like acetylene and urea62, and for cocrystals63. However, this comes with an increase in computational cost and a decreased transferability64.
Having a highly accurate force field is thus key for the reliable prediction of crystal structures in polymorphic systems. Such a force field should cover the entire energy landscape, from the local energy minima to the energy barriers between crystal structures. Recent work has focused on exploring computationally the crystal energy landscape of polymorphic molecules65,66,67. These studies have revealed the importance of the interplay between close packing, hydrogen bonding, and π − π stacking, since the lattice energy for many different packing types can fall within a few kilojoules per mole of the most stable structure. We examine in the next two Sections how recent progress in the development of partitioned quantum force fields and of machine learned potentials has enabled advances towards the efficient and thorough exploration of the crystal energy landscape.
Creating partitioned and seamless quantum force fields
Developing a force field that can accurately model the packing patterns and the stability of crystal structures is a formidable task. Such a force field should predict lattice energies and free energies of crystal structures that are accurate enough to provide a reliable ranking of polymorphs with respect to their stability. This is a challenging task given that differences between lattice energies for polymorphs of the same compound can be as small as 1 kJ/mol. Tailoring an empirical force field with parameters fitted to reproduce the experimental data of polymorphs is also fraught with difficulty. Experimental sublimation enthalpies measure the energy difference between the isolated compound in the gas phase and the crystalline state, thus providing access to the lattice energy. However, the data obtained with different methods can exhibit discrepancies of the order of a few kJ/mol as revealed by the recent critical assessment of a large experimental dataset68. Furthermore, the available data may be scarce, as many of the polymorphs, cocrystals, and solvates may not have been obtained experimentally.
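To illustrate why kJ/mol-level accuracy matters for ranking, the following minimal sketch sorts candidate structures by lattice energy and flags those that cannot be distinguished from the lowest-energy structure within a chosen tolerance. The polymorph labels, energies, and tolerance are hypothetical placeholders chosen for illustration only; they are not data from any study discussed here.

```python
def rank_polymorphs(lattice_energies, tolerance):
    """Rank structures by lattice energy (kJ/mol, more negative = more stable).

    Returns (name, energy relative to the minimum, ambiguous) tuples, where
    'ambiguous' is True when the structure lies within `tolerance` of the
    lowest-energy structure and the ranking is therefore not reliable.
    """
    ranked = sorted(lattice_energies.items(), key=lambda item: item[1])
    e_min = ranked[0][1]
    return [(name, e - e_min, (e - e_min) <= tolerance) for name, e in ranked]


# Hypothetical lattice energies in kJ/mol (illustrative values only).
candidates = {"form_I": -95.2, "form_II": -94.5, "form_III": -90.1}

for name, rel_energy, ambiguous in rank_polymorphs(candidates, tolerance=1.0):
    flag = "within tolerance of the minimum" if ambiguous else "resolved"
    print(f"{name}: +{rel_energy:.1f} kJ/mol ({flag})")
```

With a tolerance comparable to the spread between experimental datasets, two of the three hypothetical forms cannot be ordered with confidence, which is the situation the paragraph above describes.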
First-principles calculations, which rely on electronic structure (quantum) calculations, and quantum-based force fields can both circumvent the need for experimental data and provide high-accuracy results69. While ab initio calculations rapidly become intractable for complex systems and large timescales, Density Functional Theory (DFT) methods have emerged as an efficient alternative in recent years70,71. However, having accurate enough exchange-correlation functionals to properly account for van der Waals (vdW) interactions remains challenging72. This has led to the development of dispersion-corrected DFT methods such as DFT-D3 or DFT-D4, in which a correction scheme is introduced to describe long-ranged electron correlation effects and gives high-accuracy results for the molecular total energy, energy gradient, and frequencies using the molecular geometry as the only input73. The reliability of the DFT-D3 approach was tested across the X23 test set, which includes crystal structures with van der Waals interactions and hydrogen bonding74. The Tkatchenko-Scheffler (TS-vdW) method was recently introduced to include screening effects, as well as a treatment of the many-body vdW energy to infinite order75 (see Fig. 3). This approach was applied to paracetamol polymorphs and gave the correct stable structure76, paving the way for molecular dynamics simulations with a complete many-body treatment of van der Waals interactions77. Another approach consists in performing an energy decomposition analysis (EDA) by partitioning the interaction energy into a sum of physically meaningful terms. The symmetry-adapted perturbation theory (SAPT) framework78 provides a partitioning into four contributions: electrostatics, exchange-repulsion, induction, and London dispersion. 
A SAPT(DFT) approach correctly ranked the crystal polymorphs of cyclotrimethylene trinitramine and predicted the lowest-energy structure obtained experimentally79; similar success was reported for trinitrobenzene80.
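The four-term SAPT partitioning named above can be written compactly as follows. The subscript labels are notational assumptions introduced here for illustration, and higher-order or mixed terms that appear in specific SAPT variants are omitted.

```latex
E_{\mathrm{int}} \;=\; E_{\mathrm{elst}} \;+\; E_{\mathrm{exch}} \;+\; E_{\mathrm{ind}} \;+\; E_{\mathrm{disp}}
```

Here E_elst denotes electrostatics, E_exch exchange-repulsion, E_ind induction, and E_disp London dispersion.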
Fig. 3. The top plot shows a partition of the diphenylalanine-based molecular solid into repeating units of an alanine-based “tube” surrounded by six “zipper” units consisting of two diphenyls each. The bottom plots show the dispersion-corrected energy (in blue) and the energy without the TS-vdW correction (in red) for four different distortions corresponding to (clockwise starting from the top left corner) equal expansion along x and y, equal expansion along x and compression along y, expansion along z, and isotropic expansion. The dispersion-corrected curves are more parabolic and much closer to the elastic limit, thereby providing an explanation for the stability and rigidity of the structure. Reprinted with permission from Acc. Chem. Res. 2014, 47, 3208. Copyright 2014 American Chemical Society.
Adaptive machine learning potentials
In recent years, computational methods have been enhanced with machine learning to design force fields, or machine learned potentials (MLPs), develop exchange-correlation functionals for DFT, and predict material properties81,82. MLPs have become instrumental in crystal structure prediction and in simulations of the crystallization process. MLPs are atomistic potentials built on universal approximators that can model interactions without the need to assume any functional form for the interaction energy83,84,85. Since the MLP expression for the potential energy and forces is highly nonlinear, it succeeds in capturing the complex interdependence between the local atomic environment and the energy, and greatly outperforms the functional forms found in empirical force fields. Building an MLP only requires: (i) choosing a large set of descriptors, i.e., typically of the order of 10² two- or many-body functions of the atomic coordinates, to characterize the environment around each atom86, and (ii) having a large dataset of reference DFT calculations, using the methods outlined in the previous Section, to train, validate, and assess the accuracy of the MLP. Using a high quality training dataset ensures that the MLP will have a level of accuracy similar to DFT once the training step is complete87. The MLP thus consists of an analytic nonlinear expression of the interaction energy and atomic forces as a function of the atomic coordinates. This means that the atomic potential energy and forces can be evaluated at a small fraction of the cost of DFT while approaching its level of accuracy. Different types of learning approaches can be used to build an MLP88, leading to neural network potentials (NNPs), Gaussian approximation potentials (GAP), moment tensor potentials (MTP), and spectral neighbor analysis potentials (SNAP).
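As an illustration of step (i), the sketch below computes a small set of two-body descriptors for one atom. It assumes radial symmetry functions of the Behler-Parrinello type with a cosine cutoff, which is one common choice and not necessarily the descriptor set used in the works cited here. The function names and parameters (etas, r_s, r_c) are illustrative assumptions, and periodic boundary conditions are deliberately left out.

```python
import numpy as np


def cosine_cutoff(r, r_c):
    """Smooth cutoff: 0.5*(cos(pi*r/r_c) + 1) for r < r_c, zero beyond r_c."""
    return np.where(r < r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)


def radial_symmetry_functions(positions, i, etas, r_s, r_c):
    """Radial descriptors of atom i: sum_j exp(-eta*(r_ij - r_s)^2) * f_c(r_ij).

    positions: (N, 3) array of Cartesian coordinates (no periodic images).
    etas: iterable of Gaussian widths, one descriptor per value.
    """
    positions = np.asarray(positions, dtype=float)
    # Distances from atom i to every other atom.
    neighbors = np.delete(positions, i, axis=0)
    r_ij = np.linalg.norm(neighbors - positions[i], axis=1)
    f_c = cosine_cutoff(r_ij, r_c)
    return np.array([np.sum(np.exp(-eta * (r_ij - r_s) ** 2) * f_c) for eta in etas])


# Hypothetical four-atom configuration (arbitrary length units).
coords = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [5.0, 0.0, 0.0]])
print(radial_symmetry_functions(coords, i=0, etas=[0.0, 1.0], r_s=0.0, r_c=3.0))
```

Because the descriptors depend only on interatomic distances, they are unchanged by rotations and translations of the configuration, which is the property that makes such functions suitable inputs for an atomic neural network.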
NNPs have been refined to model a wide range of local atomic environments and, as such, are ideally suited to model crystallization processes89. Simulations performed with MLPs can shed light on nonclassical nucleation pathways in complex materials, as shown in recent work on ZnO90. Piaggi et al.91 developed an NNP for water and performed enhanced sampling simulations to study ice nucleation with a DFT-level accuracy (see Fig. 4). This work opens the door to modeling nucleation processes in realistic environments, where chemical reactions can play a crucial role. They then used this NNP to model a heterogeneous nucleation process and performed simulations on systems of ~3 × 10⁴ water molecules to simulate ice nucleation on microcline feldspar surfaces92. Other recent developments include global MLPs such as symmetric gradient-domain machine learning (sGDML) force fields93. sGDML potentials are designed to account for nonlocal interactions in large molecules and have been shown to model accurately molecules from the MD17 dataset, such as aspirin and paracetamol, but also molecules containing up to 370 atoms as shown, e.g., for small peptides and nanotubes from the MD22 dataset. We also mention recent successes in the development of machine learned coarse-grained (MLCG) models. An MLCG potential for water94 was recently shown to predict accurately the melting point of ice and the temperature of maximum density for liquid water with an efficiency greater by two orders of magnitude when compared to conventional atomistic models.
Fig. 4. Atomic configurations are taken from simulations used to calculate melting temperatures for the cubic (Ic) and hexagonal (Ih) phases for water, with (inset) the corresponding error distributions. Reprinted with permission from J. Chem. Theory Comput. 2021, 17, 3065. Copyright 2021 American Chemical Society.
Unraveling the mystery of order formation
Beyond conventional order parameters
Monitoring and analyzing the onset of order in atomic and molecular systems during simulations requires having a reliable measure that is characteristic of the symmetry of the incipient phase, i.e., a quantity often referred to as order parameter. Well-known examples include the Steinhardt order parameters95 for atomic systems. In their pioneering work, ten Wolde et al. showed that the Q6 Steinhardt order parameter could be used as a reaction coordinate, or collective variable, to explore the crystal nucleation pathway in atomic systems45. Q6 had the advantage of taking distinct values for the liquid and the solid, but also had the advantage of taking similar values for the body-centered cubic (BCC) and face-centered cubic (FCC) polymorphs, thereby allowing for the system to pick its preferred structure throughout the nucleation process. The differentiation between different types of polymorph and symmetries can then be carried out by considering sets of Steinhardt order parameters, either as 2D maps96,97,98 or as weighted linear combinations99. However, pre-ordering in the supercooled liquid prior to the onset of crystalline order100 and, more generally, the formation of metastable precursors101, cannot be captured by an order parameter characteristic of crystalline order only. Similarly, generalizing to the different types of orders and symmetries found in molecular crystals requires a systematic approach. This has resulted in the design of ML-based approaches to determining the order parameters. Such methods leverage machine learning to analyze system configurations, detect changes in the local environment, and assign a label (order, symmetry, density) to these environments.
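A minimal sketch of a Steinhardt-type bond-orientational order parameter is given below. It takes the bond vectors joining a particle to its neighbors and evaluates Q_l through the spherical-harmonic addition theorem, so that Q_l² is the average of the Legendre polynomial P_l over all pairs of bond directions. This route avoids explicit spherical harmonics and is an implementation choice made here, not the procedure of the cited works. The function name is hypothetical, and how neighbors are selected (cutoff distance, neighbor list) is left to the user.

```python
import numpy as np
from numpy.polynomial import legendre


def steinhardt_q(bond_vectors, l=6):
    """Rotationally invariant order parameter Q_l for a set of bond vectors.

    Uses Q_l^2 = (1/N^2) * sum_{i,j} P_l(cos gamma_ij), where gamma_ij is the
    angle between bonds i and j (addition theorem for spherical harmonics).
    """
    bonds = np.asarray(bond_vectors, dtype=float)
    unit = bonds / np.linalg.norm(bonds, axis=1, keepdims=True)
    # Cosines of the angles between all pairs of bonds (including i == j).
    cos_gamma = np.clip(unit @ unit.T, -1.0, 1.0)
    # Coefficient vector selecting the single Legendre polynomial P_l.
    coeffs = np.zeros(l + 1)
    coeffs[l] = 1.0
    q_squared = legendre.legval(cos_gamma, coeffs).mean()
    return float(np.sqrt(max(q_squared, 0.0)))


# Twelve nearest-neighbor bond directions of an ideal FCC lattice.
fcc_bonds = [
    (1, 1, 0), (1, -1, 0), (-1, 1, 0), (-1, -1, 0),
    (1, 0, 1), (1, 0, -1), (-1, 0, 1), (-1, 0, -1),
    (0, 1, 1), (0, 1, -1), (0, -1, 1), (0, -1, -1),
]
print(f"Q6 for ideal FCC neighbors: {steinhardt_q(fcc_bonds, l=6):.4f}")
```

For the ideal FCC neighbor shell this evaluates to Q6 ≈ 0.575, whereas a disordered liquid-like environment gives much smaller values, which is the contrast exploited when Q6 is used to follow the onset of crystalline order.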
The identification of local atomic arrangements can be achieved using tools similar to those employed in the definition of machine-learned potentials102. This involves encoding information about the local environment into symmetry functions and training a neural network to recognize a given phase or polymorph, as shown for the crystallization of atomic liquids and water102. Another approach consists of using a global shape descriptor or, in other words, a histogram of the relative frequencies of different local structure types103. This two-step method, which relies on performing a pattern analysis on the radial distribution followed by a classification of the local structures, was successfully applied to crystals, quasi-crystals, and amorphous phases found in colloidal systems. Alternatively, the local environment can be encoded into a graph structure, with nodes representing neighboring atoms and edges keeping track of the adjacency between atoms in an adjacency matrix, as proposed in Common Neighbor Analysis (CNA)104,105. The original implementation of CNA provides accurate results on low-temperature crystalline phases but does not perform as well for crystal-liquid interfaces, especially for low-density liquids. This was addressed by implementing adaptive cutoffs to define nodes, as well as directional and bidirectional edges to properly account for particles located at interfaces, and taking advantage of nonlinear manifold learning to classify environments106. Neural network models for the classification of the fluid and amorphous phases of water, as well as of ice polymorphs, were recently developed using either bond-orientational order parameters107 or various functions of the Cartesian atomic coordinates108 and applied to simulations of heterogeneous nucleation processes109.
By computing the diffraction image corresponding to a given crystal, Ziletti et al.110 leveraged convolutional neural networks, which are commonly used in image classification111, to correctly classify a dataset of \(10^{5}\) simulated crystal structures with varying levels of defect concentrations. Interestingly, attentive response maps show that the way the network learns from the data is very similar to how human experts analyze diffraction images110.
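The graph encoding described above can be sketched in a few lines. The example below builds an adjacency matrix from a fixed distance cutoff and counts the neighbors shared by a bonded pair, which is the first index of the CNA signature. It is a simplified illustration: the helper names (`adjacency_matrix`, `common_neighbors`, `fcc_cluster`) and the cutoff value are assumptions made here, periodic boundaries are ignored, and the adaptive cutoffs and directional edges mentioned in the text are not implemented.

```python
from itertools import product

import numpy as np


def adjacency_matrix(positions, cutoff):
    """Boolean adjacency matrix: True where two distinct atoms lie within cutoff."""
    pos = np.asarray(positions, dtype=float)
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    return (dist < cutoff) & ~np.eye(len(pos), dtype=bool)


def common_neighbors(adjacency, i, j):
    """Number of atoms bonded to both i and j (first index of the CNA signature)."""
    return int(np.sum(adjacency[i] & adjacency[j]))


def fcc_cluster(extent=2):
    """Small FCC cluster: integer points with an even coordinate sum.

    The nearest-neighbor distance is sqrt(2) in these units.
    """
    points = [p for p in product(range(-extent, extent + 1), repeat=3)
              if sum(p) % 2 == 0]
    return np.array(points, dtype=float)


positions = fcc_cluster()
adjacency = adjacency_matrix(positions, cutoff=1.5)  # between 1st and 2nd shells
center = int(np.where((positions == [0, 0, 0]).all(axis=1))[0][0])
neighbor = int(np.where((positions == [1, 1, 0]).all(axis=1))[0][0])

print(int(adjacency[center].sum()))                    # 12 nearest neighbors
print(common_neighbors(adjacency, center, neighbor))   # 4 shared neighbors
```

A fixed cutoff works for this ideal cluster because the first and second neighbor shells are well separated; at interfaces or in low-density liquids that separation is lost, which is the limitation the adaptive schemes address.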
Enhanced sampling simulations of crystallization
In recent years, ML methods have considerably improved our ability to detect and characterize the onset of different types of order and symmetry112. The resulting machine-learned order parameters are crucial in advancing our understanding of the phase transition process, as they can be used as reaction coordinates (RC) or collective variables (CV) to span the transition pathway. When taking place in the vicinity of a phase boundary, phase transitions, regardless of whether they involve a fluid → solid or a polymorph I → polymorph II transition, proceed through the nucleation of a new phase within the parent phase. Nucleation is a phenomenon associated with a large free energy barrier and slow kinetics. As a result, nucleation is often referred to as an “activated process” or a “rare event”, occurring on a timescale that far exceeds that accessible by molecular simulations. Enhanced sampling simulations have emerged as a powerful approach to overcome this challenge. These methods bypass the real-time dynamics of the system by implementing a non-Boltzmann sampling scheme, e.g., by adding to the Hamiltonian a bias potential energy that favors the formation of the new phase. This additional potential energy can be, for example, a harmonic function of a CV45 that acts as a mechanical spring tethering the system to a target value of the CV, giving rise to atomic forces that promote the onset of crystalline order for increasing values of the reaction coordinate. The actual free energy profile can then be evaluated by subtracting the contribution from the bias potential energy, thereby providing access to the free energy barrier and the atomic-scale mechanism for the process13,43,45. Enhanced sampling techniques for solid-state processes include, among others, umbrella sampling45, forward-flux sampling52, metadynamics113, and driven adiabatic free energy sampling114.
In a recent metadynamics study of the B4-B1 polymorphic transition in GaN113, simulations using an MLP were performed for different system sizes, revealing a sequential change of the transition mechanism from collective modes to nucleation and growth. ML can also be leveraged to provide access to novel CVs. In a study of the polymorphic A15-BCC (body-centered cubic) transition in tungsten, neural network classifiers of the local atomic environment were computed for the two phases and then combined to obtain a one-dimensional CV. The resulting 1D CV was used in driven adiabatic free energy simulations to explore the transition pathway114. Neural networks can also learn an entropic CV from thermodynamic datasets. Performing umbrella sampling simulations with this machine-learned entropic CV can then shed light on the crystal nucleation pathway in atomic systems115. Dietrich et al. recently proposed a graph neural network approach to determine CVs for nucleation processes116. This was achieved by building a molecular graph from the coordinates of the atoms in the simulated system, embedding the coordinates as well as the information on their local environment into a high-dimensional representation, and feeding this representation to a neural network that provides a CV as output (see Fig. 5). This approach enabled an order-of-magnitude gain in efficiency when compared to approaches that compute complex CVs and their spatial derivatives for each step of an enhanced sampling simulation.
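The bias-and-unbias logic of a single umbrella window can be illustrated on a toy problem. The sketch below assumes a one-dimensional double-well free energy in units of kT (a hypothetical stand-in for the profile along a CV), adds a harmonic bias centered on the barrier top, draws samples directly from the biased distribution on a grid in place of a molecular simulation, and recovers the unbiased profile through F(s) = -kT ln P_biased(s) - w(s) + const. All function names and parameter values are assumptions chosen for illustration.

```python
import numpy as np

KT = 1.0  # energies are expressed in units of kT


def free_energy(s):
    """Toy double-well profile with a 5 kT barrier at s = 0 (assumed model)."""
    return 5.0 * KT * (s**2 - 1.0) ** 2


def harmonic_bias(s, k, s0):
    """Harmonic 'spring' tethering the CV to the target value s0."""
    return 0.5 * k * (s - s0) ** 2


def sample_biased(grid, k, s0, n_samples, rng):
    """Draw CV values from the biased Boltzmann distribution on a grid.

    This exact sampling stands in for a biased molecular simulation.
    """
    u = free_energy(grid) + harmonic_bias(grid, k, s0)
    p = np.exp(-(u - u.min()) / KT)
    p /= p.sum()
    idx = rng.choice(grid.size, size=n_samples, p=p)
    return np.bincount(idx, minlength=grid.size)


def unbias(counts, grid, k, s0, min_counts=500):
    """Subtract the bias from the biased free energy, keeping well-sampled points."""
    mask = counts >= min_counts
    f_biased = -KT * np.log(counts[mask] / counts.sum())
    f = f_biased - harmonic_bias(grid[mask], k, s0)
    return grid[mask], f - f.min()


rng = np.random.default_rng(0)
grid = np.linspace(-1.5, 1.5, 201)
counts = sample_biased(grid, k=40.0, s0=0.0, n_samples=200_000, rng=rng)
s_window, f_window = unbias(counts, grid, k=40.0, s0=0.0)
```

A single window only resolves the region it samples well, here the neighborhood of the barrier top. In practice several overlapping windows are run at different target values and combined into one profile.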
The graph is obtained by computing a neighbor list for each atom (top left). The atomic coordinates (bottom left) and the neighborhood information (top middle) are embedded into a high-dimensional representation fed to a neural network for CV generation (top right). Reprinted from J. Chem. Theory Comput. 2024, 20, 1600. Copyright 2024 American Chemical Society.
The ML-guided identification of a reduced set of CVs for enhanced sampling simulations has garnered considerable attention in recent years. Such approaches generally reframe the learning process as dimensionality reduction, classification of metastable states, or identification of slow modes117. Recent work has focused on the use of spectral maps118 and autoencoders119 to embed the slow kinetics of the process into CVs. Using a reweighted autoencoded variational Bayesian approach in enhanced sampling simulations, Zou et al. were able to optimize CVs for the crystal nucleation of urea (see Fig. 6) and glycine from aqueous solutions, assess the relative stability of their polymorphs, and reveal nonclassical nucleation mechanisms in both systems120,121.
a shows two polymorphs, denoted as form I (\(P\bar{4}2_{1}m\), left) and form IV (\(P2_{1}2_{1}2\), right)120. The two forms can be differentiated by the angle θ2 between the characteristic vectors v1 and v2 of two neighboring urea molecules, with v1 indicating the dipole of urea and v2 the vector connecting the two nitrogen bonds. Configurations of the two forms are shown with the θ2 angle along [001]. b Enhanced sampling simulation along the machine-learned CV χ6 (top) and corresponding free energy surface (bottom). Forms I (in yellow) and IV (in green) were frequently visited during the enhanced sampling simulations, with forms I and IV, as well as the liquid (L), indicated on the free energy profile. Error bars for the free energy are shown as the blue shaded region. Reprinted with permission from J. Phys. Chem. B 2021, 125, 47, 13049–13056. Copyright 2021 American Chemical Society.
Recent advances in crystal structure predictions
As discussed by Woodley and Catlow122, first-principles CSP starts with the generation of a large set of candidate crystal structures. This initial screening has remained an especially challenging task, as it involves an extensive search of the configurational space to generate unit-cell structures. Methods commonly used to explore the crystalline energy landscape include simulated annealing, basin hopping, genetic algorithms, topological modeling, and molecular packing methods such as molecular dynamics simulations. For the latter, enhanced sampling simulations recently emerged as a promising approach not only to explore the energy landscape, but also to provide direct access to the free energy differences between competing crystal structures and shed light on the mechanisms underlying polymorphic transitions. Recent applications of an enhanced sampling approach known as well-tempered metadynamics enabled the thermodynamic and mechanistic characterization of the I-III polymorphic transition in carbon dioxide123. These approaches can be readily extended to the case of solvates and co-crystals. For instance, another enhanced sampling simulation method known as adiabatic free energy dynamics was recently applied to examine polymorphic transitions in 1:1 cocrystals of resorcinol and urea124. Metadynamics simulations can also be instrumental in reducing the common overprediction of the number of plausible structures by clustering subsets of structures that can easily interconvert at finite temperature and pressure125. This approach was recently applied to post-process a CSP dataset of 555 crystal structures for ibuprofen, identify that 65% of these structures were labile, and obtain a final reduced dataset that contained all experimentally known structures126.
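The clustering step that reduces overprediction can be pictured as grouping structures into connected sets. The sketch below is a generic union-find grouping and not the procedure of the cited studies: it assumes that a list of structure pairs observed to interconvert (for instance during finite-temperature simulations) is already available, and the function name and input format are hypothetical.

```python
def cluster_interconverting(n_structures, transitions):
    """Group structure indices into sets connected by observed interconversions.

    transitions: iterable of (a, b) index pairs, each meaning that structures
    a and b were seen to interconvert (assumed input).
    """
    parent = list(range(n_structures))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in transitions:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lowest index as the representative of the merged set.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    clusters = {}
    for i in range(n_structures):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Six candidate structures; 0-1-2 interconvert, as do 4-5 (made-up example).
print(cluster_interconverting(6, [(0, 1), (1, 2), (4, 5)]))
# [[0, 1, 2], [3], [4, 5]]
```

Each resulting set would then be represented by a single structure, so six candidates reduce to three in this made-up example.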
Recent work has also highlighted how a threshold algorithm could cluster potential energy minima into basins for a series of organic molecules, including benzene, acrylic acid, and resorcinol, thereby providing an efficient method to address overprediction in computational CSPs127.
Deep learning models have become increasingly instrumental to CSPs, with earlier studies focusing on atomic systems and multicomponent alloys128. Using as input the multiperspective atomic fingerprints (MAFPs) for a specific atomic site, i.e., Kaiser–Bessel smeared delta functions of the coordinates of the neighboring atoms with varying shifts (or perspectives) in the origin for the function, a variational autoencoder was used to transform the 3072-dimensional input into a 64-dimensional latent representation that was then fed to a classifier capable of predicting which combinations of elements led to specific structural topologies129. This approach was then applied to perform crystal structure evaluation, or, equivalently, the likelihood of a given crystal structure, and CSP, thereby demonstrating the ability of deep learning models to advance CSP.
ML models can also accelerate the computation of the energies of crystal structures and, therefore, enable a more thorough and reliable CSP82. As discussed above, a successful computational CSP for an organic compound involves two crucial steps. In a first step, an initial screening is performed, with an extensive search of the configurational space generating a set of candidate structures. In a second step, high-accuracy calculations, typically DFT calculations supplemented with a many-body dispersion correction (DFT+MBD), assess the stability of the candidate structures and identify the structure with the lowest free energy. The initial screening is subject to two seemingly incompatible constraints. During this step, energy calculations need to be highly efficient to enable screening a large set of structures, but also highly accurate to avoid excluding the lowest-energy structure from the candidate set. Wengert et al. proposed a Δ-ML approach that trains a machine learning model to learn the difference between a baseline method and a high-accuracy method, like DFT+MBD130. They subsequently applied their Δ-ML approach to a test case used in a recent CSP blind test, i.e., the tricyano-1,4-dithiino[C]-isothiazole (C8N4S3) molecule, and showed that the experimentally observed polymorph was correctly predicted. Furthermore, the Δ-ML CSP was performed with an accuracy comparable to that of DFT+MBD and a speed-up of at least 3 orders of magnitude. Recent applications of Δ-ML CSP now include co-crystals, as shown on a series of co-crystals of the pharmaceutically active molecule paracetamol131 (see Fig. 7). Similarly, Egorova et al.132 used a multifidelity statistical machine learning approach to predict high-accuracy hybrid DFT energies of crystal structures from results obtained with an inexpensive force field, and obtained accurate predictions of energies and crystal structure ranking for oxalic acid, maleic hydrazide, and urazole.
Furthermore, ML potentials have recently emerged as extremely efficient and accurate tools in CSP studies. The MACE-OFF ML force fields leverage an equivariant message-passing architecture and exhibit transferability, generalizing well in the chemical and configuration space133. MACE-OFF force fields were recently tested on a dataset of experimental and hypothetical polymorphs for 20 organic molecules134 and provided reasonable lattice energies and crystal structures across the dataset, despite being trained only on a large database of organic molecules and dimers. This bodes well for the success of condensed-phase and molecule-specific versions of these ML force fields in CSP studies, as evidenced by recent results obtained with fine-tuned MACE force fields for the sublimation enthalpies of crystals of paracetamol and aspirin135.
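The Δ-ML idea can be written compactly: a model is fitted to the difference between target and baseline energies, and predictions add the learned correction to a new baseline energy. The sketch below uses a linear ridge regression on synthetic data purely for illustration. The descriptors, energies, and function names are assumptions, and the regression model is a deliberately simple substitute for the models used in the cited work.

```python
import numpy as np


def fit_delta_model(features, e_baseline, e_target, ridge=1e-6):
    """Fit a linear ridge model to the correction e_target - e_baseline."""
    x = np.column_stack([np.ones(len(features)), features])  # intercept column
    delta = e_target - e_baseline
    lhs = x.T @ x + ridge * np.eye(x.shape[1])
    return np.linalg.solve(lhs, x.T @ delta)


def predict_delta_ml(weights, features, e_baseline):
    """Delta-ML prediction: baseline energy plus the learned correction."""
    x = np.column_stack([np.ones(len(features)), features])
    return e_baseline + x @ weights


# Synthetic data standing in for structure descriptors and lattice energies.
rng = np.random.default_rng(1)
features = rng.normal(size=(200, 3))
e_target = -100.0 + 10.0 * rng.normal(size=200)      # "high-accuracy" energies
true_correction = 2.0 + features @ np.array([0.5, -1.0, 0.3])
e_baseline = e_target - true_correction               # "cheap" energies

train, test = slice(0, 150), slice(150, 200)
weights = fit_delta_model(features[train], e_baseline[train], e_target[train])
e_pred = predict_delta_ml(weights, features[test], e_baseline[test])
max_error = float(np.max(np.abs(e_pred - e_target[test])))
```

The design choice is that the correction is usually a smoother and smaller quantity than the total energy, so it can be learned from fewer high-accuracy calculations. In this synthetic case the correction is exactly linear in the descriptors, so the held-out error is close to zero, which would not hold for real data.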
a Parity plot for the baseline method (DFTB+D4) and for the Δ-ML lattice energies per molecule for both single-component and co-crystals against the target level of theory (PBE(0)+MBD). b Overlay of the optimized experimental naphthalene:paracetamol co-crystal at the target level (green), baseline level (gray) and Δ-ML (blue). Adapted from J. Chem. Theory Comput. 2022, 18, 4586.
Outlook
In this Review, we examined how recent progress in the molecular simulation of the solid state and its formation has considerably advanced our ability to predict the outcome of crystallization from a liquid phase or through the interconversion between polymorphic forms. This progress was enabled by marked improvements in the accuracy with which solid-state properties can now be computed from first-principles calculations, but also by the emergence of novel enhanced sampling simulations, which allow for a thorough exploration of all possible polymorphic scenarios. In recent years, Machine Learning has also served as a great accelerator for the field. Once they have been trained on high-quality datasets, ML models can provide high-accuracy results that are on par with dispersion-corrected DFT, but at a fraction of the cost, using Δ-ML approaches and machine-learned potentials. ML approaches can also be used to propose new collective variables and, when combined with enhanced sampling simulations, shed light on the free energy pathway underlying crystal nucleation or polymorphic transitions. Tailoring these approaches will make it possible to tackle challenging systems including conformational polymorphs for flexible compounds, co-crystals, and solvates.
We conclude the Review by highlighting two research directions for molecular simulation in the next few years, namely advancing our understanding of (i) far-from-equilibrium crystallization, and (ii) (self)-adaptive crystals. Crystallization under far-from-equilibrium conditions is dominated by kinetics and often leads to the formation of non-equilibrium patterns. Such patterns arise from a competition between the order associated with equilibrium crystal symmetries and the disorder associated with diffusive, convective, and elastic deformation processes at interfaces. This, in turn, requires the development of control strategies to obtain materials with optimal properties. In optoelectronics and semiconductors, this can be achieved, for instance, by using rapid thermal annealing, homoepitaxy, or building blocks that promote hierarchical self-assembly and self-organization on large length scales. Thermal annealing can facilitate the formation of self-organizing patterns that lead to microstructured organic optoelectronic devices136. Flash infrared-annealed perovskite films, grown on lithographically patterned Au nucleation seeds, yield larger crystallites with longer charge carrier lifetimes, thus resulting in optoelectronic devices with an improved performance137. Using small molecule organic semiconductors as templates for homoepitaxy enables the control of surface diffusion and aggregation effects and leads to smooth surfaces as required by practical applications138. Hierarchical self-assembly (HSA) has also been recently demonstrated in the crystallization of perylene diimide crystals139. Such HSA is usually a hallmark of biological systems. It appears that biomimetic approaches to HSA can be achieved by using elemental building blocks with specific properties, e.g., with magic-size clusters140. This is key to obtaining a self-organization behavior spanning several orders of magnitude in length scales. 
Molecular modifiers can also enable the control of the crystallization kinetics. Modifiers, which are molecules with chemical structures and properties similar to the solute, interact with the crystal surface, impact the anisotropic rates of crystallization, crystal size and morphology, and can promote and/or inhibit nucleation and growth. Examples related to the formation of kidney stones include, for instance, inhibitors such as methyluric acid and poly(ethyleneimine) that suppress the crystal growth of ammonium urate141, or the dimethylester of L-cystine that inhibits the growth of L-cystine single crystals142. When a pair of inhibitors is used, synergistic or antagonistic cooperativity may occur between the two inhibitors, depending on the inhibitor combination and concentration. This was recently observed during the crystallization of haematin, which is a model compound for the physiology of malaria parasites143. Furthermore, the suppression of haematin crystal growth appears to follow nonclassical mechanisms, with some inhibitors promoting the nucleation of large haematin nanocrystals which, when incorporated into the growing crystal, induce lattice strain and hinder crystal growth144. Molecular simulation could thus be key to unraveling the interplay between kinetics and thermodynamics during these far-from-equilibrium processes and to shedding light on the underlying molecular mechanisms.
The second research direction focuses on the molecular simulation of self-adaptive crystals. As discussed by Zhang et al.145, high levels of structural order and flexibility are not necessarily mutually exclusive. This can be observed in many biological systems, such as microtubules, flagella, and viruses, and in synthetic assemblies. Ferritin crystals mixed with hydrogel polymers can expand to 180% of their original dimensions and 500% of their original volume. Such changes occur without any loss of shape and structure and appear to be facilitated by the hydrogel network, which helps crystals avoid fragmentation and self-heal efficiently145. Similarly, in polymer systems, reversible covalent chemistry can be leveraged to disconnect chemical bonds and form new linkages, leading to topological transitions, or “macromolecular metamorphosis”146. In recent years, the design and synthesis of “smart” materials, designed as a combination of elemental building blocks through encodable interactions, has drawn considerable attention. Such systems often exhibit emergent properties and can lead to self-adaptive crystals. A “smart” material can be obtained, for example, by leveraging complementary DNA molecules to form 3D crystalline assemblies of gold nanoparticles147. The resulting nanocrystals can then reversibly dissolve and form during heating and cooling cycles. This opened the door to the design of crystalline materials that reconfigure when subjected to an external stimulus and thus adapt in response to environmental cues148. Self-adapting colloidal crystals can also exhibit shape memory and become crystal actuators149. Such DNA-engineered crystals can be compressed into irregular shapes with wrinkles and creases, and then regain their initial crystalline morphology and internal structure after rehydration.
Self-assembled DNA crystals can respond to a wide range of stimuli, including temperature, ionic strength, pH, and redox potential, and power an actuation process via a cooperative dissociation or cohesion of many DNA sticky ends acting as crystal contacts. When the actuator expands, its increased crystal porosity and cavity allow for the encapsulation of nanoparticles or proteins, thereby enabling their release upon contraction150. Molecular simulation and enhanced sampling methods, supplemented with efficient coarse-grained and machine-learned potentials151,152, are ideally suited to explore and analyze this subtle interplay between structural order and flexibility, thereby opening the door to biomimetic adaptive materials for a wide range of applications in sensing and mechanoactuation for nano- and microrobotics.
References
Bernstein, J. Polymorphism in Molecular Crystals (Oxford University Press, 2020). Comprehensive discussion of the underpinnings of polymorphism and comprehensive summary of the state-of-the-art in the field.
Moulton, B. & Zaworotko, M. J. From molecules to crystal engineering: supramolecular isomerism and polymorphism in network solids. Chem. Rev. 101, 1629–1658 (2001).
Bolla, G., Sarma, B. & Nangia, A. K. Crystal engineering of pharmaceutical cocrystals in the discovery and development of improved drugs. Chem. Rev. 122, 11514–11603 (2022).
Wohler, F. & von Liebig, J. Benzamide polymorphism. J. Ann. Pharm. 3, 249 (1832).
Ostwald, W. Studien uber die Bildung und Umwandlung fester Korper. 1. Abhandlung: Ubersattigung und Uberkaltung. Z. Phys. Chem. 22, 289 (1897).
Cardew, P. T. Ostwald rule of stages- myth or reality? Cryst. Growth Des. 23, 3958–3969 (2023).
Salzmann, C. G., Radaelli, P. G., Slater, B. & Finney, J. L. The polymorphism of ice: five unresolved questions. Phys. Chem. Chem. Phys. 13, 18468–18480 (2011).
Cruz-Cabeza, A. J., Feeder, N. & Davey, R. J. Open questions in organic crystal polymorphism. Commun. Chem. 3, 142 (2020).
Desgranges, C. & Delhommelle, J. Insights into the molecular mechanism underlying polymorph selection. J. Am. Chem. Soc. 128, 15104–15105 (2006).
Germann, L. S., Arhangelskis, M., Etter, M., Dinnebier, R. E. & Friščić, T. Challenging the Ostwald rule of stages in mechanochemical cocrystallisation. Chem. Sci. 11, 10092–10100 (2020).
Bernstein, J., Davey, R. J. & Henck, J.-O. Concomitant polymorphs. Angew. Chem. Int. Ed. 38, 3440–3461 (1999).
Yu, L. Nucleation of one polymorph by another. J. Am. Chem. Soc. 125, 6380–6381 (2003). First experimental evidence of cross-nucleation on the example of D-mannitol or when a seed of one polymorph nucleates another polymorph without any polymorphic conversion.
Desgranges, C. & Delhommelle, J. Molecular mechanism for the cross-nucleation between polymorphs. J. Am. Chem. Soc. 128, 10368–10369 (2006).
Desgranges, C. & Delhommelle, J. Molecular simulation of cross-nucleation between polymorphs. J. Phys. Chem. B 111, 1465–1469 (2007).
Cartwright, J. H., Checa, A. G., Gale, J. D., Gebauer, D. & Sainz-Díaz, C. I. Calcium carbonate polyamorphism and its role in biomineralization: how many amorphous calcium carbonates are there? Angew. Chem. Int. Ed. 51, 11960–11970 (2012).
Van Vleet, M. J., Weng, T., Li, X. & Schmidt, J. In situ, time-resolved, and mechanistic studies of metal–organic framework nucleation and growth. Chem. Rev. 118, 3681–3721 (2018).
Salionov, D. et al. Unraveling the molecular mechanism of MIL-53 (Al) crystallization. Nat. Commun. 13, 3762 (2022).
Lutsko, J. F. On the role of metastable intermediate states in the homogeneous nucleation of solids from solution. Adv. Chem. Phys. 151, 137 (2012).
Karthika, S., Radhakrishnan, T. & Kalaichelvi, P. A review of classical and nonclassical nucleation theories. Cryst. Growth Des. 16, 6663–6681 (2016).
Tanaka, H. Bond orientational order in liquids: towards a unified description of water-like anomalies, liquid-liquid transition, glass transition, and crystallization. Eur. Phys. J. E 35, 1–84 (2012). Theoretical analysis shedding light on pre-ordering in supercooled liquids, most notably of the interplay between density and crystal-like order, and its consequence on crystallization.
Beye, M., Sorgenfrei, F., Schlotter, W. F., Wurth, W. & Föhlisch, A. The liquid-liquid phase transition in silicon revealed by snapshots of valence electrons. Proc. Natl. Acad. Sci. USA 107, 16772–16776 (2010).
Desgranges, C. & Delhommelle, J. Role of liquid polymorphism during the crystallization of silicon. J. Am. Chem. Soc. 133, 2872–2874 (2011).
Kurita, R. & Tanaka, H. On the abundance and general nature of the liquid–liquid phase transition in molecular systems. J. Phys. Condens. Matter 17, L293 (2005).
Gallo, P. et al. Water: a tale of two liquids. Chem. Rev. 116, 7463–7500 (2016).
Gartner III, T. E., Piaggi, P. M., Car, R., Panagiotopoulos, A. Z. & Debenedetti, P. G. Liquid-liquid transition in water from first principles. Phys. Rev. Lett. 129, 255702 (2022).
Li, Y., Li, J. & Wang, F. Liquid–liquid transition in supercooled water suggested by microsecond simulations. Proc. Natl. Acad. Sci. USA 110, 12209–12212 (2013).
Moore, E. B. & Molinero, V. Structural transformation in supercooled water controls the crystallization rate of ice. Nature 479, 506–508 (2011).
Li, T., Donadio, D. & Galli, G. Ice nucleation at the nanoscale probes no man’s land of water. Nat. Commun. 4, 1887 (2013).
Desgranges, C. & Delhommelle, J. Can ordered precursors promote the nucleation of solid solutions? Phys. Rev. Lett. 123, 195701 (2019).
Maier, R. et al. Protein crystallization from a preordered metastable intermediate phase followed by real-time small-angle neutron scattering. Cryst. Growth Des. 21, 6971–6980 (2021).
Pirela, V., Campoy-Quiles, M., Muller, A. J. & Martín, J. Unraveling the influence of the preexisting molecular order on the crystallization of semiconducting semicrystalline Poly(9,9-di-n-octylfluorenyl-2,7-diyl) (PFO). Chem. Mater. 34, 10744–10751 (2022).
Arai, S. & Tanaka, H. Surface-assisted single-crystal formation of charged colloids. Nat. Phys. 13, 503–509 (2017).
Desgranges, C. & Delhommelle, J. Unraveling the coupling between demixing and crystallization in mixtures. J. Am. Chem. Soc. 136, 8145–8148 (2014).
Choudhuri, D., Matteson, S. & Knox, R. Nucleation of coupled body-centered-cubic and closed-packed structures in liquid Ni-Cr alloys. Scr. Mater. 199, 113857 (2021).
Tanaka, H. & Nishi, T. New types of phase separation behavior during the crystallization process in polymer blends with phase diagram. Phys. Rev. Lett. 55, 1102 (1985).
Jin, J., Chen, H., Muthukumar, M. & Han, C. C. Kinetics pathway in the phase separation and crystallization of iPP/OBC blends. Polymer 54, 4010–4016 (2013).
Maddox, J. Crystals from first-principles. Nature 335, 201 (1988).
Sanderson, K. Model predicts structure of crystals. Nature 450, 771–771 (2007).
Hunnisett, L., Cole, J. & Sadiq, G. What have we learned from the 7th blind test of crystal structure prediction?–triumphs, challenges, and insights. Acta Crystallogr. 78, 136–136 (2022).
Levesque, A., Maris, T. & Wuest, J. D. ROY reclaims its crown: new ways to increase polymorphic diversity. J. Am. Chem. Soc. 142, 11873–11883 (2020).
Desgranges, C. & Delhommelle, J. Unusual crystallization behavior close to the glass transition. Phys. Rev. Lett. 120, 115701 (2018).
Sosso, G. C. et al. Crystal nucleation in liquids: open questions and future challenges in molecular dynamics simulations. Chem. Rev. 116, 7078–7116 (2016).
Auer, S. & Frenkel, D. Prediction of absolute crystal-nucleation rate in hard-sphere colloids. Nature 409, 1020–1023 (2001).
Peters, B. Reaction Rate Theory and Rare Events (Elsevier, 2017).
Ten Wolde, P. R., Ruiz-Montero, M. J. & Frenkel, D. Numerical evidence for bcc ordering at the surface of a critical fcc nucleus. Phys. Rev. Lett. 75, 2714 (1995). Pioneering work showing that enhanced sampling simulations provide access to the nucleation mechanism and the underlying free energy profile.
Oxtoby, D. W. Nucleation of first-order phase transitions. Acc. Chem. Res. 31, 91–97 (1998).
Desgranges, C. & Delhommelle, J. Polymorph selection during the crystallization of softly repulsive spheres: the inverse power law potential. J. Phys. Chem. B 111, 12257–12262 (2007).
Desgranges, C. & Delhommelle, J. Polymorph selection during the crystallization of Yukawa systems. J. Chem. Phys. 126, 054501 (2007).
Schilling, T., Schöpe, H. J., Oettel, M., Opletal, G. & Snook, I. Precursor-mediated crystallization process in suspensions of hard spheres. Phys. Rev. Lett. 105, 025701 (2010).
Chew, P. Y. & Reinhardt, A. Phase diagrams-why they matter and how to predict them. J. Chem. Phys. 158, 030902 (2023).
Desgranges, C. & Delhommelle, J. Controlling polymorphism during the crystallization of an atomic fluid. Phys. Rev. Lett. 98, 235502 (2007).
Gispen, W. & Dijkstra, M. Kinetic phase diagram for nucleation and growth of competing crystal polymorphs in charged colloids. Phys. Rev. Lett. 129, 098002 (2022).
Gasser, U., Weeks, E. R., Schofield, A., Pusey, P. & Weitz, D. Real-space imaging of nucleation and growth in colloidal crystallization. Science 292, 258–262 (2001). Confocal microscopy experiments visualizing the crystallization process in colloidal suspensions and demonstrating how the Steinhardt order parameters measure the onset of order in experimental snapshots.
Peters, B. & Trout, B. L. Obtaining reaction coordinates by likelihood maximization. J. Chem. Phys. 125 (2006). Systematic approach screening a set of candidate collective variables to identify an optimal reaction coordinate for the simulation of nucleation processes.
Catlow, C. R. A. Crystal structure prediction: achievements and opportunities. IUCrJ 10, 143–144 (2023).
Habgood, M., Sugden, I. J., Kazantsev, A. V., Adjiman, C. S. & Pantelides, C. C. Efficient handling of molecular flexibility in ab initio generation of crystal structures. J. Chem. Theory Comput. 11, 1957–1969 (2015).
Sun, H. COMPASS: an ab initio force-field optimized for condensed-phase applications overview with details on alkane and benzene compounds. J. Phys. Chem. B 102, 7338–7364 (1998).
Wang, J., Wolf, R. M., Caldwell, J. W., Kollman, P. A. & Case, D. A. Development and testing of a general amber force field. J. Comput. Chem. 25, 1157–1174 (2004).
Mayo, S. L., Olafson, B. D. & Goddard, W. A. DREIDING: a generic force field for molecular simulations. J. Phys. Chem. 94, 8897–8909 (1990).
Dauber-Osguthorpe, P. et al. Structure and energetics of ligand binding to proteins: Escherichia Coli dihydrofolate reductase-trimethoprim, a drug-receptor system. Proteins 4, 31–47 (1988).
Day, G. et al. A third blind test of crystal structure prediction. Acta Crystallogr. B Struct. Sci 61, 511–527 (2005).
Neumann, M. A. & Perrin, M.-A. Energy ranking of molecular crystals using density functional theory calculations and an empirical van der Waals correction. J. Phys. Chem. B 109, 15531–15541 (2005).
Neumann, M. A., Leusen, F. J. & Kendrick, J. A major advance in crystal structure prediction. Angew. Chem. Intl. Ed. 47, 2427–2430 (2008).
Mattei, A. et al. Efficient crystal structure prediction for structurally related molecules with accurate and transferable tailor-made force fields. J. Chem. Theory Comput. 18, 5725–5738 (2022).
Yang, S. & Day, G. M. Global analysis of the energy landscapes of molecular crystal structures by applying the threshold algorithm. Commun. Chem. 5, 86 (2022).
Price, S. S. L. Computed crystal energy landscapes for understanding and predicting organic crystal structures and polymorphism. Acc. Chem. Res. 42, 117–126 (2009).
Hoja, J. et al. Reliable and practical computational description of molecular crystal polymorphs. Sci. Adv. 5, eaau3338 (2019). Combination of a state-of-the-art crystal structure sampling strategy with the most successful first-principles energy ranking strategy of the latest blind test of organic crystal structure prediction methods.
Chickos, J. S. & Gavezzotti, A. Sublimation enthalpies of organic compounds: a very large database with a match to crystal structure determinations and a comparison with lattice energies. Cryst. Growth Des. 19, 6566–6576 (2019).
Nikhar, R. & Szalewicz, K. Reliable crystal structure predictions from first principles. Nat. Commun. 13, 3095 (2022).
Mohr, S. et al. Accurate and efficient linear scaling DFT calculations with universal applicability. Phys. Chem. Chem. Phys. 17, 31360–31370 (2015).
Prentice, J. C. et al. The ONETEP linear-scaling density functional theory program. J. Chem. Phys. 152, 174111 (2020).
Grimme, S., Hansen, A., Brandenburg, J. G. & Bannwarth, C. Dispersion-corrected mean-field electronic structure methods. Chem. Rev. 116, 5105–5154 (2016).
Caldeweyher, E. et al. A generally applicable atomic-charge dependent London dispersion correction. J. Chem. Phys. 150, 154122 (2019).
Moellmann, J. & Grimme, S. DFT-D3 study of some molecular crystals. J. Phys. Chem. C 118, 7615–7621 (2014).
Tkatchenko, A., DiStasio Jr, R. A., Car, R. & Scheffler, M. Accurate and efficient method for many-body van der Waals interactions. Phys. Rev. Lett. 108, 236402 (2012).
DiStasio Jr, R. A., von Lilienfeld, O. A. & Tkatchenko, A. Collective many-body van der Waals interactions in molecular systems. Proc. Natl. Acad. Sci. USA 109, 14791–14795 (2012).
Kronik, L. & Tkatchenko, A. Understanding molecular crystals with dispersion-inclusive density functional theory: pairwise corrections and beyond. Acc. Chem. Res. 47, 3208–3216 (2014).
Parker, T. M., Burns, L. A., Parrish, R. M., Ryno, A. G. & Sherrill, C. D. Levels of symmetry adapted perturbation theory (SAPT). I. Efficiency and performance for interaction energies. J. Chem. Phys. 140, 094106 (2014).
Podeszwa, R., Rice, B. M. & Szalewicz, K. Predicting structure of molecular crystals from first principles. Phys. Rev. Lett. 101, 115503 (2008).
Aina, A. A., Misquitta, A. J. & Price, S. L. A non-empirical intermolecular force-field for trinitrobenzene and its application in crystal structure prediction. J. Chem. Phys. 154, 094123 (2021).
Kulik, H. J. et al. Roadmap on machine learning in electronic structure. Electron. Struct. 4, 023004 (2022).
Han, Y. et al. Machine learning accelerates quantum mechanics predictions of molecular crystals. Phys. Rep. 934, 1–71 (2021).
Behler, J. & Parrinello, M. Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett. 98, 146401 (2007). Pioneering work on machine-learned potentials, showing how machine learning enables the calculation of interaction energies with DFT-like accuracy at a much lower computational cost than DFT.
Manzhos, S., Wang, X., Dawes, R. & Carrington, T. A nested molecule-independent neural network approach for high-quality potential fits. J. Phys. Chem. A 110, 5295–5304 (2006).
Pun, G. P., Batra, R., Ramprasad, R. & Mishin, Y. Physically informed artificial neural networks for atomistic modeling of materials. Nat. Commun. 10, 2339 (2019).
Behler, J. Four generations of high-dimensional neural network potentials. Chem. Rev. 121, 10037–10072 (2021).
Deringer, V. L., Caro, M. A. & Csányi, G. Machine learning interatomic potentials as emerging tools for materials science. Adv. Mater. 31, 1902765 (2019).
Kocer, E., Ko, T. W. & Behler, J. Neural network potentials: a concise overview of methods. Annu. Rev. Phys. Chem. 73, 163–186 (2022).
Chapman, J. & Ramprasad, R. Nanoscale modeling of surface phenomena in aluminum using machine learning force fields. J. Phys. Chem. C 124, 22127–22136 (2020).
Goniakowski, J., Menon, S., Laurens, G. & Lam, J. Nonclassical nucleation of Zinc Oxide from a physically motivated machine-learning approach. J. Phys. Chem. C 126, 17456–17469 (2022).
Piaggi, P. M., Weis, J., Panagiotopoulos, A. Z., Debenedetti, P. G. & Car, R. Homogeneous ice nucleation in an ab initio machine-learning model of water. Proc. Natl. Acad. Sci. USA 119, e2207294119 (2022). Ice nucleation simulations using a highly efficient machine-learned potential for water.
Piaggi, P. M., Selloni, A., Panagiotopoulos, A. Z., Car, R. & Debenedetti, P. G. A first-principles machine-learning force field for heterogeneous ice nucleation on microcline feldspar. Faraday Discuss. 249, 98–113 (2024).
Chmiela, S. et al. Accurate global machine learning force fields for molecules with hundreds of atoms. Sci. Adv. 9, eadf0873 (2023).
Chan, H. et al. Machine learning coarse grained models for water. Nat. Commun. 10, 379 (2019).
Steinhardt, P. J., Nelson, D. R. & Ronchetti, M. Bond-orientational order in liquids and glasses. Phys. Rev. B 28, 784 (1983).
Desgranges, C. & Delhommelle, J. Crystallization mechanisms for supercooled liquid Xe at high pressure and temperature: Hybrid Monte Carlo molecular simulations. Phys. Rev. B 77, 054201 (2008).
Lechner, W. & Dellago, C. Accurate determination of crystal structures based on averaged local bond order parameters. J. Chem. Phys. 129, 114707 (2008).
Tsurusawa, H., Russo, J., Leocmach, M. & Tanaka, H. Formation of porous crystals via viscoelastic phase separation. Nat. Mater. 16, 1022–1028 (2017).
Desgranges, C. & Delhommelle, J. Molecular insight into the pathway to crystallization of aluminum. J. Am. Chem. Soc. 129, 7012–7013 (2007).
Hu, Y.-C. & Tanaka, H. Revealing the role of liquid preordering in crystallisation of supercooled liquids. Nat. Commun. 13, 4519 (2022).
Vekilov, P. G. The two-step mechanism of nucleation of crystals in solution. Nanoscale 2, 2346–2357 (2010). Proposal of a two-step mechanism of nucleation, where the crystalline nucleus forms in metastable, dense, liquid regions that are suspended in the solution.
Geiger, P. & Dellago, C. Neural networks for local structure detection in polymorphic systems. J. Chem. Phys. 139, 164105 (2013).
Phillips, C. L. & Voth, G. A. Discovering crystals using shape matching and machine learning. Soft Matter 9, 8552–8568 (2013).
Honeycutt, J. D. & Andersen, H. C. Molecular dynamics study of melting and freezing of small Lennard-Jones clusters. J. Phys. Chem. 91, 4950–4963 (1987).
Faken, D. & Jónsson, H. Systematic analysis of local atomic structure combined with 3D computer graphics. Comput. Mater. Sci. 2, 279–286 (1994).
Reinhart, W. F., Long, A. W., Howard, M. P., Ferguson, A. L. & Panagiotopoulos, A. Z. Machine learning for autonomous crystal structure identification. Soft Matter 13, 4733–4745 (2017).
Hernandes, V. F., Marques, M. S. & Bordin, J. R. Phase classification using neural networks: application to supercooled, polymorphic core-softened mixtures. J. Phys. Condens. Matter 34, 024002 (2021).
Fulford, M., Salvalaglio, M. & Molteni, C. DeepIce: a deep neural network approach to identify ice and water molecules. J. Chem. Inf. Model. 59, 2141–2149 (2019).
DeFever, R. S., Targonski, C., Hall, S. W., Smith, M. C. & Sarupria, S. A generalized deep learning approach for local structure identification in molecular simulations. Chem. Sci. 10, 7503–7515 (2019).
Ziletti, A., Kumar, D., Scheffler, M. & Ghiringhelli, L. M. Insightful classification of crystal structures using deep learning. Nat. Commun. 9, 2775 (2018).
Desgranges, C. & Delhommelle, J. Molecular Networking: Statistical Mechanics in the Age of AI and Machine Learning (CRC Press, 2024).
Desgranges, C. & Delhommelle, J. Towards a machine learned thermodynamics: exploration of free energy landscapes in molecular fluids, biological systems and for gas storage and separation in metal–organic frameworks. Mol. Syst. Des. Eng. 6, 52–65 (2021).
Santos-Florez, P. A., Yanxon, H., Kang, B., Yao, Y. & Zhu, Q. Size-dependent nucleation in crystal phase transition from machine learning metadynamics. Phys. Rev. Lett. 129, 185701 (2022).
Rogal, J., Schneider, E. & Tuckerman, M. E. Neural-network-based path collective variables for enhanced sampling of phase transformations. Phys. Rev. Lett. 123, 245701 (2019).
Desgranges, C. & Delhommelle, J. Crystal nucleation along an entropic pathway: teaching liquids how to transition. Phys. Rev. E 98, 063307 (2018).
Dietrich, F. M., Advincula, X. R., Gobbo, G., Bellucci, M. A. & Salvalaglio, M. Machine learning nucleation collective variables with graph neural networks. J. Chem. Theory Comput. 20, 1600 (2024). Development of a graph-based machine learning model for the determination of nucleation collective variables and its application in enhanced sampling simulations of nucleation processes.
Bonati, L., Trizio, E., Rizzi, A. & Parrinello, M. A unified framework for machine learning collective variables for enhanced sampling simulations: mlcolvar. J. Chem. Phys. 159, 014801 (2023).
Rydzewski, J. Spectral map: Embedding slow kinetics in collective variables. J. Phys. Chem. Lett. 14, 5216–5220 (2023).
Belkacemi, Z., Gkeka, P., Lelièvre, T. & Stoltz, G. Chasing collective variables using autoencoders and biased trajectories. J. Chem. Theory Comput. 18, 59–78 (2021).
Zou, Z., Tsai, S.-T. & Tiwary, P. Toward automated sampling of polymorph nucleation and free energies with the SGOOP and metadynamics. J. Phys. Chem. B 125, 13049–13056 (2021).
Zou, Z., Beyerle, E. R., Tsai, S.-T. & Tiwary, P. Driving and characterizing nucleation of urea and glycine polymorphs in water. Proc. Natl. Acad. Sci. USA 120, e2216099120 (2023). A machine learning-augmented molecular dynamics approach that automatically learns nucleation reaction coordinates and applies them in enhanced sampling simulations of crystal nucleation in solution.
Woodley, S. M. & Catlow, R. Crystal structure prediction from first principles. Nat. Mater. 7, 937–946 (2008).
Gimondi, I. & Salvalaglio, M. CO2 packing polymorphism under pressure: Mechanism and thermodynamics of the I-III polymorphic transition. J. Chem. Phys. 147, 114502 (2017).
Song, H., Vogt-Maranto, L., Wiscons, R., Matzger, A. J. & Tuckerman, M. E. Generating cocrystal polymorphs with information entropy driven by molecular dynamics-based enhanced sampling. J. Phys. Chem. Lett. 11, 9751–9758 (2020).
Francia, N. F., Price, L. S., Nyman, J., Price, S. L. & Salvalaglio, M. Systematic finite-temperature reduction of crystal energy landscapes. Cryst. Growth Des. 20, 6847 (2020).
Francia, N. F., Price, L. S. & Salvalaglio, M. Reducing crystal structure overprediction of ibuprofen with large scale molecular dynamics simulations. CrystEngComm 23, 5575 (2021). Shows, using ibuprofen as an example, how molecular dynamics simulations can significantly reduce the number of plausible crystal structures commonly obtained in crystal structure prediction studies.
Butler, P. W. & Day, G. M. Reducing overprediction of molecular crystal structures via threshold clustering. Proc. Natl. Acad. Sci. USA 120, e2300516120 (2023). Efficiency of the threshold algorithm to cluster potential energy minima into basins and alleviate the common issue of overprediction in crystal structure prediction studies.
Graser, J., Kauwe, S. K. & Sparks, T. D. Machine learning and energy minimization approaches for crystal structure predictions: a review and new horizons. Chem. Mater. 30, 3601–3612 (2018).
Ryan, K., Lengyel, J. & Shatruk, M. Crystal structure prediction via deep learning. J. Am. Chem. Soc. 140, 10158–10168 (2018).
Wengert, S., Csányi, G., Reuter, K. & Margraf, J. T. Data-efficient machine learning for molecular crystal structure prediction. Chem. Sci. 12, 4536–4546 (2021).
Wengert, S., Csányi, G., Reuter, K. & Margraf, J. T. A hybrid machine learning approach for structure stability prediction in molecular co-crystal screenings. J. Chem. Theory Comput. 18, 4586–4593 (2022). Development of a Δ-ML potential that accurately accounts for long-range interactions, applied to co-crystals of an active pharmaceutical ingredient and various co-formers.
Egorova, O., Hafizi, R., Woods, D. C. & Day, G. M. Multifidelity statistical machine learning for molecular crystal structure prediction. J. Phys. Chem. A 124, 8065–8078 (2020).
Kovács, D. P. et al. MACE-OFF: Short-range transferable machine learning force fields for organic molecules. J. Am. Chem. Soc. 147, 17598 (2025).
Price, L. S., Paloni, M., Salvalaglio, M. & Price, S. L. One size fits all? development of the CPOSS209 data set of experimental and hypothetical polymorphs for testing computational modeling methods. Cryst. Growth Des. 25, 3186 (2025).
Della Pia, F. et al. Accurate and efficient machine learning interatomic potentials for finite temperature modelling of molecular crystals. Chem. Sci. 16, 11419 (2025).
Bangsund, J. S. et al. Formation of aligned periodic patterns during the crystallization of organic semiconductor thin films. Nat. Mater. 18, 725–731 (2019).
Günzler, A. et al. Shaping perovskites: in situ crystallization mechanism of rapid thermally annealed, prepatterned perovskite films. ACS Appl. Mater. Interfaces 13, 6854–6863 (2021).
Dull, J. T. et al. A comprehensive picture of roughness evolution in organic crystalline growth: the role of molecular aspect ratio. Mater. Horiz. 9, 2752–2761 (2022).
Kim, Y.-J. et al. Hierarchical self-assembly of perylene diimide (PDI) crystals. J. Phys. Chem. Lett. 11, 3934–3940 (2020).
Han, H. et al. Multiscale hierarchical structures from a nanocluster mesophase. Nat. Mater. 21, 518–525 (2022).
Tang, W., Smith, C., Parry, C. B., Meegan, J. & Rimer, J. D. Molecular imposters functioning as versatile growth modifiers of urate crystallization. Cryst. Growth Des. 23, 6107–6118 (2023).
Shtukenberg, A. G., Ward, M. D. & Kahr, B. Crystal growth inhibition by impurity stoppers, now. J. Cryst. Growth 597, 126839 (2022).
Ma, W., Lutsko, J. F., Rimer, J. D. & Vekilov, P. G. Antagonistic cooperativity between crystal growth modifiers. Nature 577, 497–501 (2020). Scanning probe microscopy and molecular modelling study showing how pairs of inhibitors that suppress haematin crystal growth by two different mechanisms exhibit synergistic or antagonistic cooperativity depending on the conditions.
Ma, W. et al. Nonclassical mechanisms to irreversibly suppress β-hematin crystal growth. Commun. Biol. 6, 783 (2023).
Zhang, L., Bailey, J. B., Subramanian, R. H., Groisman, A. & Tezcan, F. A. Hyperexpandable, self-healing macromolecular crystals with integrated polymer networks. Nature 557, 86–91 (2018).
Sun, H. et al. Macromolecular metamorphosis via stimulus-induced transformations of polymer architecture. Nat. Chem. 9, 817–823 (2017).
Nykypanchuk, D., Maye, M. M., Van Der Lelie, D. & Gang, O. DNA-guided crystallization of colloidal nanoparticles. Nature 451, 549–552 (2008).
Liu, M. et al. From nanoscopic to macroscopic materials by stimuli-responsive nanoparticle aggregation. Adv. Mater. 35, 2208995 (2023).
Lee, S. et al. Shape memory in self-adapting colloidal crystals. Nature 610, 674–679 (2022).
Zheng, M., Li, Z., Zhang, C., Seeman, N. C. & Mao, C. Powering ≈50 μm motion by a molecular event in DNA crystals. Adv. Mater. 34, 2200441 (2022).
Sajini, K., Desgranges, C. & Delhommelle, J. Advancing the design of gold nanomaterials with machine-learned potentials. Nano Ex. 6, 022001 (2025).
McCandler, C. A., Pihlajamäki, A., Malola, S., Häkkinen, H. & Persson, K. A. Gold–thiolate nanocluster dynamics and intercluster reactions enabled by a machine learned interatomic potential. ACS Nano 18, 19014–19023 (2024).
Azuri, I., Adler-Abramovich, L., Gazit, E., Hod, O. & Kronik, L. Why are diphenylalanine-based peptide nanostructures so rigid? insights from first principles calculations. J. Am. Chem. Soc. 136, 963–969 (2014).
Piaggi, P. M., Panagiotopoulos, A. Z., Debenedetti, P. G. & Car, R. Phase equilibrium of water with hexagonal and cubic ice using the SCAN functional. J. Chem. Theory Comput. 17, 3065–3077 (2021).
Acknowledgements
This material is based upon work supported by the National Science Foundation under Grant No. 2240526.
Contributions
C.D. and J.D. contributed equally to the conceptualization, design and implementation of the research, and to the writing of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Chemistry thanks Pablo M. Piaggi and the other, anonymous, reviewer for their contribution to the peer review of this work. Peer review reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Desgranges, C., Delhommelle, J. Deciphering the complexities of crystalline state(s) with molecular simulations. Commun Chem 8, 281 (2025). https://doi.org/10.1038/s42004-025-01667-z