Automated parametrization of small molecules within the Martini 3 coarse-grained model guided by experimental log P values

Kelidou, Maria; Stroh, Kai Steffen; Risselada, Herre Jelger

doi:10.1038/s41598-025-24757-3


Article
Open access
Published: 23 October 2025


Maria Kelidou¹,
Kai Steffen Stroh² &
Herre Jelger Risselada¹

Scientific Reports volume 15, Article number: 37169 (2025)


Abstract

Molecular dynamics simulations play an important role in investigating biological systems. However, simulating large-scale systems is computationally expensive, a cost that can be reduced by employing a coarse-grained force field. This study focuses on the automated parametrization of small molecules within the CGCompiler framework. This optimization approach utilizes a mixed-variable particle swarm algorithm to avoid the manual tweaking of parameters. In particular, the optimization focuses on matching experimentally known log P values of partitioning in water-octanol phases, reproducing atomistic density profiles in lipid bilayers, and optimizing overall shape and volume aspects of the modeled atomistic molecules. After the atomistic to coarse-grained mapping, the model’s accuracy is evaluated through a fitness function, which combines structural and dynamic targets, to accurately capture the shape and behavior of the small molecule in question. Through the investigation of the interactions between small molecules and cellular membranes, this optimization process supports the development of accurate coarse-grained models for small molecules relevant to drug discovery. Our work demonstrates promising results in automating the high-fidelity parametrization of small molecules using the Martini 3 force field guided by experimental log P values.

Introduction

Molecular dynamics (MD) simulations are a vital tool in the field of molecular biology and drug discovery, offering highly detailed insight into (bio)molecules at an atomic level^1,2. To explore and analyze more complex systems over larger length and longer time scales, the use of coarse-grained strategies becomes essential³. The widespread adoption of coarse-grained force fields like the Martini model for biomolecular simulations stems from their ability to merge common chemical groups consisting of multiple heavy atoms into distinct single interaction sites^4,5,6,7. This approach has become particularly popular because of its transferability across various applications in biomolecular science, soft matter, and nanoscience. The downside of the building block approach is that the parametrization of molecules within coarse-grained models is a highly laborious and tedious task, as chemical groups must be encoded into one of hundreds of predefined bead types.

Beyond ongoing efforts to create databases of already-parametrized molecules based on human parametrization efforts, such as the Martini database⁶, work is underway to fully automate this pipeline. This automation aims to enable the parametrization of existing small molecule databases widely used in pharmaceutical research for drug development purposes. To this end, multiple automated approaches have been proposed, including machine learning-based methods (e.g., graph neural networks) and artificial intelligence-driven techniques (e.g., evolutionary algorithms and swarm optimization)^{8,9,10,11,12,13,14}. These automated approaches optimize molecular parametrization workflows, thereby accelerating drug discovery timelines through the efficient exploration of molecular configurations enabled by coarse-grained modeling methodologies^15,16. Additionally, automated parametrization could help address the challenge of keeping up with the rapidly growing number of known compounds and targets in drug discovery. Yet, while automated approaches can generate initial parametrizations quickly, they often lack the nuanced understanding of molecular behavior that comes from careful reproduction of properties derived from atomistic simulations and experiments.

To this end, we are developing the CGCompiler¹² approach that automates high-fidelity (re)parametrization within the Martini 3 model using mixed-variable particle swarm optimization. This method circumvents the problem of assigning predefined nonbonded interaction types (discrete variables) while simultaneously optimizing bond lengths (continuous variables). By overcoming the inherent dependency between nonbonded and bonded interactions, CGCompiler performs a multiobjective optimization that matches provided targets derived from atomistic simulations as well as experimentally derived targets.

The standard parametrization procedure entails the manual setting of all initial (trial) force-field parameters and their subsequent changes to fit the desired properties. The CGCompiler requires only the initial mapping of the atomistic structure and its coarse-grained parametrization. This step can greatly benefit from the development of automated mapping schemes^13,14, whose crude parametrization also provides a valuable starting point for refinement by CGCompiler. Afterwards a mixed-variable particle swarm optimization algorithm is employed to accomplish the molecule’s optimization, thereby overcoming the hurdles of tweaking the parameters by hand and facilitating a more accurate and efficient parametrization. The model is evaluated based on a list of properties and their target values provided by the user (fitness function).

Partition coefficients, particularly octanol-water partition coefficients, play a crucial role in small molecule and drug design^17,18,19. They serve as primary indicators of hydrophobicity and membrane permeability, making them essential tools in assessing a compound’s potential as a drug candidate. Given that the octanol-water partition coefficients of common small molecules have been well experimentally determined, reproducing these coefficients represents the primary goal in guiding the parametrization of small molecules.

In addition to partition coefficients, atomistic density profiles within lipid bilayers provide a complementary and membrane-specific target for parametrization. Unlike bulk partitioning, density profiles investigate the spatial distribution and orientation of molecules across the heterogeneous lateral membrane interface directly, capturing interactions with different chemical groups within the lipid and the insertion depth within the bilayer^20,21,22,23. Incorporating such information allows coarse-grained models to more precisely account for additional structural and electrostatic effects that are often absent when optimizing solely against octanol–water partitioning free energies. Furthermore, the density profiles of individual beads correspond to the orientation of molecules in the membrane, enabling more precise parametrization of the local molecular chemistry within molecules that are not uniquely determined by log P values alone.

For this purpose, we extended CGCompiler to optimize molecules based on the free energy of transfer between octanol and water phases, as well as based on the atomistic density profiles within lipid bilayers. We also incorporated a scheme for the bonded parameters to simultaneously match the Solvent Accessible Surface Area (SASA). Our focus is on the parametrization of dopamine and serotonin, two biologically highly relevant neurotransmitters. Their roles in mediating both physiological and psychological processes make them important targets for parametrization^24,25,26,27. Furthermore, the investigation of the interactions between dopamine and serotonin and cellular membranes, as well as their receptors, is fundamental to understanding and treating a variety of neurological disorders^28,29.

We report a significant advance in the automated parametrization of small molecules within the Martini 3 force field by extending CGCompiler to simultaneously optimize against experimental log P values and atomistic density profiles in lipid bilayers. The inclusion of the density profiles of mapped interaction sites provides a direct membrane-specific target alongside bulk partitioning data, ensuring more accurate reproduction of molecular orientation and insertion behavior at biologically relevant interfaces. Incorporating diverse targets improves the accuracy of membrane interaction modeling and enhances the capability of coarse-grained parametrization to account for subtle but biologically relevant effects, such as electrostatic interactions (the presence of net charge) and local molecular chemistry.

Methods

The initial step in the coarse-graining process involved determining the grouping of atoms into beads. For dopamine, the process was carried out by hand and the result can be seen in Fig. 1. For the initial mapping of serotonin, we used Auto-Martini¹³, but a finer adjustment of the parameters was necessary for Martini 3.

CGCompiler

Small molecule parametrization in Martini 3 requires careful adjustment of many parameters to match several goals. Identifying the right parameters to improve specific behaviors, especially in complex interactions, is a difficult and time-consuming task. Automation becomes crucial for handling large molecule databases, organizing the parametrization process into a clear, hierarchical system. One automated method is Particle Swarm Optimization (PSO), known for efficiently finding the best solutions in complex, multidimensional spaces. PSO is ideal for optimizing continuous variables in coarse-grained models, though it faces challenges with predefined, discrete parameters in building block models like Martini. The CGCompiler Python package¹² provides efficient coarse-grained molecule parametrization through mixed-variable particle swarm optimization. This method optimizes both categorical (predefined bead types) and continuous (bonds, angles, dihedrals, etc.) variables simultaneously. Built on the GROMACS simulation engine^30,31,32,33, CGCompiler substantially simplifies force field parametrization, particularly for building-block approaches.

The parametrization workflow, which can be seen in Fig. 2, involves selecting mapping and bead sizes, assigning chemical bead types, and choosing bonded terms and parameters. The presented algorithm optimizes bead size, chemical bead type, and bonded parameters simultaneously. The workflow includes the user providing the target data and creating a set of CG training systems. The optimization algorithm iteratively generates candidate solutions, runs MD simulations, scores solutions based on how well targets are reproduced, updates solutions using the swarm’s knowledge, and repeats until termination criteria are met.
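The iterative loop described above can be illustrated with a deliberately small toy example. This is a generic sketch of a mixed-variable swarm search, not CGCompiler's actual update rule: the bead-type subset, the `toy_cost` function (a stand-in for scoring via MD simulations), the coefficients, and the categorical update probabilities are all assumptions chosen for illustration.

```python
import random

BEAD_TYPES = ["P1", "N3", "C2"]  # hypothetical subset of bead-type labels


def toy_cost(bead, bond):
    # Stand-in for MD-based scoring: lowest for bead "N3" and a 0.30 nm bond.
    return (0.0 if bead == "N3" else 1.0) + (bond - 0.30) ** 2


def optimize(n_particles=12, n_iter=30, seed=0):
    rng = random.Random(seed)
    swarm = []
    for _ in range(n_particles):
        bead = rng.choice(BEAD_TYPES)
        bond = rng.uniform(0.2, 0.5)
        swarm.append({"bead": bead, "bond": bond, "vel": 0.0,
                      "best": (toy_cost(bead, bond), bead, bond)})
    g_best = min(p["best"] for p in swarm)
    history = [g_best[0]]
    for _ in range(n_iter):
        for p in swarm:
            # Continuous variable: standard PSO velocity and position update.
            p["vel"] = (0.7 * p["vel"]
                        + 1.5 * rng.random() * (p["best"][2] - p["bond"])
                        + 1.5 * rng.random() * (g_best[2] - p["bond"]))
            p["bond"] = min(0.5, max(0.2, p["bond"] + p["vel"]))
            # Categorical variable (assumed rule): copy the swarm's best label,
            # copy the particle's own best label, or explore at random.
            r = rng.random()
            if r < 0.4:
                p["bead"] = g_best[1]
            elif r < 0.7:
                p["bead"] = p["best"][1]
            else:
                p["bead"] = rng.choice(BEAD_TYPES)
            cost = toy_cost(p["bead"], p["bond"])
            if cost < p["best"][0]:
                p["best"] = (cost, p["bead"], p["bond"])
        # Update the swarm's knowledge once per iteration.
        g_best = min(g_best, min(p["best"] for p in swarm))
        history.append(g_best[0])
    return g_best, history


best, history = optimize()
print("best (cost, bead, bond):", best)
```

In the real workflow each cost evaluation corresponds to a set of coarse-grained simulations, which is why the number of particles and iterations dominates the computational expense.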

The parametrization of small molecules made it imperative to introduce new targets and devise alternative methodologies for their calculation within the domain of the CGCompiler. As small molecules are much smaller and more flexible than proteins or lipids, additional metrics are needed to capture their physical properties. One of the relevant targets for small molecules is the Solvent Accessible Surface Area (SASA). It is a widely used metric in molecular biology and computational chemistry that quantifies the extent of a molecule’s surface that is accessible to a solvent³⁴. This measurement is crucial in understanding the interactions, dynamics, as well as the structures of biomolecules in various environments. The SASA target value was obtained through the GROMACS tool gmx sasa, computed as an average through high-sampling atomistic simulations of each small molecule. Due to the reduced resolution of coarse-grained models, perfect agreement with atomistic SASA is not expected. Nonetheless, including SASA as an objective provides a useful guide for capturing the overall molecular shape and solvent-exposed surface during parametrization.

Investigating the behavior and thermodynamic properties of small molecules across different solvents is an important task, which is often expressed through the partition coefficient or equivalently log P value. Therefore, we implemented the calculation of the partition coefficient into the CGCompiler through the application of the Multistate Bennett Acceptance Ratio (MBAR) method^35,36. This approach allowed us to accurately compute the necessary free energy of transfer for determining the partition coefficient, as defined by the equation adapted from³⁷ to account for the free energy transfer from octanol to water instead of water to octanol:

$$\begin{aligned} \text {log}P=\frac{\Delta G_{\text {transfer}}}{RT\text {ln}(10)} \end{aligned}$$

(1)

where

$$\begin{aligned} \Delta G_{\text {transfer}}=\Delta G_{o\rightarrow g} - \Delta G_{w \rightarrow g} \end{aligned}$$

(2)

The calculation of solvation free energies in octanol and water typically employs a thermodynamic cycle involving transfer to the gas phase to establish a system-independent reference state. However, this approach presents significant challenges in accuracy. The transfer free energy ($\Delta G_{\text {transfer}}$) is computed as the difference between $\Delta G_{o\rightarrow g}$ and $\Delta G_{w \rightarrow g}$, where the subscripts $o \rightarrow g$ and $w\rightarrow g$ denote transfer to the gas phase. These individual terms involve large values of several hundred kJ/mol due to the switching off of non-bonded interactions during the alchemical transformation. As a result, typical sampling errors of several kJ/mol become comparable to the magnitude of $\Delta G_{\text {transfer}}$ itself, rendering the calculations inherently inaccurate and computationally expensive for high-throughput applications.
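As a small worked illustration of Eqs. (1) and (2), the sketch below converts two gas-phase transfer free energies into a log P value and shows how a sampling error in kJ/mol translates into log P units. The numerical inputs and the temperature of 298.15 K are assumptions for illustration only, not values from this study.

```python
import math

R_KJ_PER_MOL_K = 0.0083144626  # gas constant in kJ/(mol K)


def log_p(dg_oct_to_gas, dg_wat_to_gas, temperature=298.15):
    """Eqs. (1)-(2): log P from the two transfer free energies (kJ/mol)."""
    dg_transfer = dg_oct_to_gas - dg_wat_to_gas
    return dg_transfer / (R_KJ_PER_MOL_K * temperature * math.log(10))


def log_p_error(sigma_dg, temperature=298.15):
    """Propagate an uncertainty in the transfer free energy (kJ/mol) to log P."""
    return sigma_dg / (R_KJ_PER_MOL_K * temperature * math.log(10))


# Hypothetical inputs: a transfer free energy of 5.7 kJ/mol is about one log unit.
print(round(log_p(30.0, 24.3), 2))
# A 2 kJ/mol sampling error already corresponds to roughly a third of a log unit.
print(round(log_p_error(2.0), 2))
```

At room temperature one log P unit corresponds to about 5.7 kJ/mol, so errors of several kJ/mol in each leg of the cycle are indeed comparable to the quantity of interest.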

To circumvent these limitations, we implement a chemical perturbation scheme utilizing a fixed reference topology with a predetermined $\Delta G_{o\rightarrow w}^{\text {ref}}$ value³⁸. This approach enables the calculation of transfer free energies for newly parametrized molecules according to the equation:

$$\begin{aligned} \Delta G_{o \rightarrow w}^{\text {new}} = \left( \Delta G_{o\rightarrow g}^{\text {ref}} - \Delta G_{w \rightarrow g}^{\text {ref}}\right) + \left( \Delta \Delta G_{w}^{\text {ref} \rightarrow \text {new}} - \Delta \Delta G_{o}^{\text {ref} \rightarrow \text {new}} \right) \end{aligned}$$

(3)

where $\Delta \Delta G^{\text {ref} \rightarrow \text {new}}$ represents the relative free energy difference between new molecule parameters and reference parameters, obtained through on-the-fly chemical perturbation, and $\Delta G^{\text {ref}}$ denotes the known reference free energies of the predefined reference molecule. It is however important to emphasize that precise determination of $\Delta G^{\text {ref}}$ is crucial as it establishes the fundamental reference point for all subsequent log P estimations obtained via chemical perturbation and therefore largely determines the systematic error.
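The bookkeeping of Eq. (3) can be sketched in a few lines. The function and argument names are hypothetical, and the numbers in the usage line are made up; in practice the two relative free energies would come from the on-the-fly perturbation simulations.

```python
def dg_transfer_new(dg_ref_oct_to_gas, dg_ref_wat_to_gas,
                    ddg_wat_ref_to_new, ddg_oct_ref_to_new):
    """Eq. (3): transfer free energy of a new parameter set (all in kJ/mol)."""
    # Known reference value, computed once for the fixed reference topology.
    reference_term = dg_ref_oct_to_gas - dg_ref_wat_to_gas
    # Relative free energies of switching reference -> new in each solvent.
    perturbation_term = ddg_wat_ref_to_new - ddg_oct_ref_to_new
    return reference_term + perturbation_term


# Hypothetical values: reference gives 6.0 kJ/mol, perturbation adds 2.0 kJ/mol.
print(dg_transfer_new(30.0, 24.0, 1.5, -0.5))
```

Because the reference term enters every estimate unchanged, any error in it shifts all candidate log P values by the same amount, which is the systematic error mentioned above.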

Optimizing parametrization solely on the octanol-water partitioning free energies may lack several key membrane-specific interactions, including electrostatic effects with lipid headgroups and the ordered structural characteristics of the membrane interface. Therefore, we investigated the interfacial behavior of small molecules at lipid membranes by calculating local density profiles within a coarse-grained POPC membrane, as an additional objective in the multi-objective optimization scheme. These calculations were compared against atomistic simulations of our small molecules using the CHARMM36 force field, computed through the GROMACS tool gmx density. It is important to symmetrize the density profiles, as the slow binding and unbinding kinetics of small molecules can result in highly asymmetric profiles, skewing the density matching process. As the leaflet affinity is by definition identical for a symmetric bilayer, symmetrizing the density profile is fully justified.
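The symmetrization step can be sketched as follows, assuming the density profile is sampled on an evenly spaced grid along the membrane normal that is centered on the bilayer midplane. The function name and example values are illustrative and not taken from CGCompiler.

```python
def symmetrize_profile(density):
    """Average a profile with its mirror image about the bilayer center.

    Assumes `density` is sampled on an evenly spaced grid that is centered
    on the bilayer midplane, so index i mirrors index n - 1 - i.
    """
    n = len(density)
    return [0.5 * (density[i] + density[n - 1 - i]) for i in range(n)]


# A molecule that only ever bound to one leaflet gives a one-sided profile.
one_sided = [0.0, 4.0, 1.0, 0.0, 0.0]
print(symmetrize_profile(one_sided))
```

The averaged profile keeps the same integral as the input, so the total amount of bound molecule is unchanged while the leaflet bias caused by slow binding kinetics is removed.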

The convergence behavior in swarm optimization algorithms exhibits a direct correlation with problem dimensionality. As the number of dimensions increases, maintaining sufficient population diversity requires proportionally larger swarm sizes. In our implementation, we employed swarm sizes of 72 particles for dopamine-related parameters (six interaction sites) and 48 particles for serotonin-related parameters (five interaction sites). These choices were guided by a balance between computational efficiency and convergence quality: larger swarms tend to improve global search capabilities, but beyond a certain size the additional computational cost is no longer reflected in the convergence quality. We implemented a consistent optimization protocol across both systems, utilizing 50 iterations per convergence cycle. Equal weights (1.0) were assigned to objective functions within each system to maintain balanced optimization dynamics.

Fitness function

The CGCompiler evaluates parametrization performance using a cost function.

$$\begin{aligned} \text {cost} = \sum _{o} w_o f_o \end{aligned}$$

(4)

This function is to be minimized and consists of multiple normalized objective functions ($f_o$), each assigned a user-defined weight ($w_o$). These weights enable users to prioritize and balance the significance of various parametrization goals. In our parametrization procedure we used four objective functions of equal weight, namely SASA, bond distributions^39,40, the octanol-water partition coefficient, and mapped-bead density distributions.
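Equation (4) amounts to a weighted sum over the normalized objectives. A minimal sketch, with hypothetical objective names and made-up objective values, could look like this:

```python
def total_cost(objective_values, weights):
    """Eq. (4): weighted sum of normalized objective functions."""
    return sum(weights[name] * value for name, value in objective_values.items())


# Hypothetical normalized objective values for one candidate solution.
objectives = {"sasa": 0.2, "bonds": 0.1, "log_p": 0.4, "density": 0.3}
# Equal weights of 1.0, as used for all objectives in this study.
weights = {name: 1.0 for name in objectives}
print(round(total_cost(objectives, weights), 3))
```

Raising a single weight, for example that of `log_p`, would make deviations in that objective count more strongly when candidate solutions are ranked.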

Atomistic simulations

For the generation of target bond and density distribution data, we conducted atomistic simulations of the small molecule in a 3 nm cubic box of water ($\approx$ 900 molecules) and in a (6, 6, 8) nm box of water-POPC ($\approx$ 6400 water and 85 POPC molecules) using the CHARMM36 force field^41,42,43. After energy minimization and an NPT equilibration of 100 ns, we performed a production run of 1 $\mu$s. During production, we used a time step of 2 fs, the velocity-rescale algorithm as the thermostat with a coupling time of $\tau _t=0.1$ ps, and the Parrinello-Rahman algorithm as the barostat with a coupling time of $\tau _p=2.0$ ps.

Coarse-grained simulations

In the CGCompiler, we conducted three parallel sets of simulations of the small molecules in GROMACS 2023.2; once in water, once in two separate boxes of water and octanol for the free energy calculations, and once in water-POPC membrane. The same models of small molecules were used and evaluated in all training systems. In the first system, we chose the fitness as a function of SASA and the bond distribution data. In the second system, we used the value of log P as the fitness and we compared it to the value from experimental data listed in Table 1. In the third training system, we chose the fitness as a function of the mapped-bead density distribution. In the first and third training systems, we obtained the reference data from our high-sampling atomistic simulations.

For the first and third training systems, we conducted the required energy minimization and two stages of NPT equilibration with time steps $dt=2$ fs and 20 fs, followed by a production run of 400 ns with a time step of 20 fs. During production, we used the velocity-rescale algorithm as the thermostat with a coupling time of $\tau _t=1$ ps, and the Parrinello-Rahman algorithm as the barostat with a coupling time of $\tau _p=12.0$ ps. For the subsequent free energy calculation simulations, we chose the following lambda states: 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.55, 0.6, 0.63, 0.65, 0.68, 0.7, 0.72, 0.73, 0.75, 0.77, 0.8, 0.85, 0.9, 0.95, 1.0. These simulations entailed a production run of 10 ns for each lambda state, using the same settings as in the initial training system.
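To give a feeling for the sampling effort implied by these settings, the short sketch below converts the stated run lengths into integration step counts and totals the alchemical sampling per solvent. It assumes that the lambda windows also use the 20 fs time step, which is our reading of "the same settings"; the helper name is hypothetical.

```python
LAMBDA_STATES = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.55, 0.6, 0.63, 0.65, 0.68,
                 0.7, 0.72, 0.73, 0.75, 0.77, 0.8, 0.85, 0.9, 0.95, 1.0]


def n_steps(duration_ns, dt_fs):
    """Number of integration steps for a run of duration_ns at time step dt_fs."""
    return round(duration_ns * 1_000_000 / dt_fs)


print("CG production steps (400 ns, 20 fs):", n_steps(400, 20))
print("steps per lambda window (10 ns, 20 fs):", n_steps(10, 20))
print("lambda windows:", len(LAMBDA_STATES))
print("alchemical sampling per solvent (ns):", 10 * len(LAMBDA_STATES))
```

The denser spacing of lambda states between 0.6 and 0.8 places more windows in the region where neighboring states would otherwise overlap poorly.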

Results

The CGCompiler generates optimized dopamine and serotonin parameters in .itp format. Figure 3 shows the convergence behavior of CGCompiler’s total cost function across 50 iterations. This composite metric combines four normalized objective functions: bond distributions, solvent accessible surface area (SASA), octanol-water partition coefficient ($\log P_{OW}$), and density distributions.

In Fig. 4, we plot the bond distributions of the best candidate solutions from the CGCompiler against the atomistic reference data. Distribution overlap was optimized using the Earth Mover’s Distance criterion¹². The Earth Mover’s Distance (EMD) is a measure of dissimilarity between two probability distributions that captures the minimum amount of work needed to transform one distribution into another. Using EMD rather than peak fitting has the advantage of effectively capturing both the peak position and width of a distribution within a single parameter. In our dopamine model, the bonds are between beads C1-C5 and C5-N, which are located in the dopamine tail. In our serotonin model, the bond is between C4-Q1, where Q1 is the serotonin tail. The relevant beads are labeled in Fig. 1. There is good agreement with the means of the atomistic target distributions, though the distributions generally tend to be somewhat wider within the coarse-grained model, particularly in the case of C4-Q1. Although a stronger force constant would result in a narrower bond distribution, it could compromise the simultaneous optimization of other objectives, including the SASA, which determines molecular shape and volume, and even log P values and local density profiles. We note, however, that the optimization outcome represents the best balance among four objectives rather than optimal performance for any single objective.
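For two histograms on the same evenly spaced grid, the one-dimensional EMD reduces to the integrated absolute difference of the cumulative distributions. The sketch below implements that textbook form; it is an illustration of the criterion and may differ from the normalization used inside CGCompiler.

```python
def emd_1d(p, q, bin_width):
    """Earth Mover's Distance between two histograms on the same uniform grid.

    Both histograms are normalized to unit mass; the distance is then the sum
    of |CDF_p - CDF_q| over bins, multiplied by the bin width.
    """
    total_p, total_q = sum(p), sum(q)
    cdf_gap = 0.0
    work = 0.0
    for p_i, q_i in zip(p, q):
        cdf_gap += p_i / total_p - q_i / total_q
        work += abs(cdf_gap) * bin_width
    return work


# Shifting a sharp peak by one bin costs exactly one bin width.
print(emd_1d([0, 1, 0, 0], [0, 0, 1, 0], bin_width=0.01))
# Broadening a peak without moving its mean also gives a non-zero distance.
print(emd_1d([0, 1, 0], [0.25, 0.5, 0.25], bin_width=1.0))
```

The second case shows why a single EMD value penalizes both a shifted mean and a mismatched width, in contrast to fitting the peak position alone.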

Table 1 Comparison of experimental and computational partition coefficient values. Experimental values were taken from^44,45. The errors represent the standard deviation of the free energy difference, propagated to log P.


The log P values presented in Table 1 demonstrate strong agreement between experimental and calculated values, indicating that the models are expected to effectively capture the overall oil-water partitioning tendencies of these small molecules, including their insertion behavior within biological lipid membranes. The composition of the optimized molecules and their relative hydrophobicity can be seen in Fig. 5.

In biology, dopamine and serotonin tend to bind strongly to lipid membranes^47,48. In our simulations, as can be seen in Fig. 6, the density profile of dopamine shows peak positions at slightly shallower insertion depths compared to the atomistic reference. Serotonin exhibits a closer matching of insertion depths overall, although there is still a noteworthy shift toward shallower insertion depths. Both molecules show slightly elevated binding energies compared to the atomistic reference.

To validate our automated parametrization approach against human efforts, we parametrized small molecules that are already available in the Martini 3 database⁶. For dopamine and serotonin there still exists no corresponding human-made model. Pyrrolidine and phenol were selected as small molecules because they are small, favorably interact with lipid membranes, and somewhat resemble dopamine and serotonin. As a starting point of the optimization we used the corresponding models from the Martini 3 small molecule database. We followed the same parametrization protocol as with dopamine and serotonin, maintaining the same components of the cost function, with the exception of phenol, whose ring structure is based on bond constraints of a fixed length (no bond distribution). Target partition coefficients for both molecules were obtained from the PubChem XLogP3 3.0 tool⁴⁹. Figure 7 shows the convergence behavior of CGCompiler’s total cost function across 50 iterations.

Table 2 Comparison of predicted, CGCompiler, and Martini 3 database partition coefficient values. Predicted data were taken from⁴⁹. The errors represent the standard deviation of the free energy difference, propagated to log P.


The CGCompiler log P values presented in Table 2 are sufficiently close to the predicted values, indicating that the models are expected to effectively capture the overall oil-water partitioning tendencies of pyrrolidine and phenol. The composition of the optimized molecules and their relative hydrophobicity can be seen in Fig. 8.

In Fig. 9, pyrrolidine membrane insertions match well with the atomistic reference, although the human-made coarse-grained reference (CG reference) exhibits elevated values. However, the insertion depth of bead SC5 shows slightly poorer agreement with the human-made coarse-grained reference based on peak position. Based on peak height and concomitant distribution width, however, performance is better. This is because the EMD criterion considers all overall distribution features, not just the peak position. For phenol, both insertion depths and binding energies display closer agreement overall. The human-made phenol model is clearly too hydrophilic, as evidenced by a log P value that is too small and membrane insertion that is too shallow.

For both molecules, a consistent tendency toward shallower insertion depths compared to atomistic simulation remains, similar to what was previously observed for dopamine and serotonin. This suggests that matching log P values inherently results in a somewhat shallower membrane insertion, so that molecules effectively behave as too hydrophilic when interacting with lipid membranes. Interestingly, this tendency aligns with recent reports of overly hydrophilic protein-membrane interactions in Martini 3^50,51,52, indicating that this issue may extend beyond amino acids. Some care must be taken, as our log P value simulations were based on dry octanol, in accordance with Ref.³⁷, whereas the human-based model used hydrated octanol containing a 0.3 mole fraction of water⁶. For small molecules with log P values close to 0 (e.g. pyrrolidine with a log P value of 0.5), the difference in solvation free energy between wet and dry octanol is expected to be negligible.

Finally, in Fig. 10, we plot the bond distribution comparison between the best candidate solutions from CGCompiler and the atomistic reference data. The overlap of the distributions was optimized using the Earth Mover’s Distance criterion¹². As can be seen, there is good agreement with the mean of the atomistic target distribution.

Discussion

The parametrization of molecules within building block coarse-grained models is a highly laborious and tedious task, as chemical groups must be encoded into one of hundreds of predefined bead types. Recent advances in computational chemistry have led to the development of several automated approaches for molecular parametrization, including machine learning-based methods and artificial intelligence-driven techniques^{8,9,10,11,12,13,14}. Building upon our CGCompiler framework¹², we have enhanced the parametrization capabilities within the Martini 3 force field through the integration of mixed-variable particle swarm optimization. This advancement specifically targets high-fidelity parametrization of small molecules by incorporating experimental partitioning data, atomistic density profiles, and molecular volume/shape considerations into the optimization process. The current implementation demonstrates strong potential for automated parametrization of small molecules in the Martini 3 force field, offering significant advantages over ongoing manual parametrization efforts⁶. It enables precise molecular characterization through systematic integration of experimental partitioning data with structural and dynamical information from atomistic simulations regarding molecular flexibility, volume, and shape. This simultaneous optimization of multiple competing objectives can exceed human capabilities.

The octanol-water partition coefficient represents a fundamental metric in molecular characterization, providing essential insights into solubility properties across different solvents and interfacial behaviors. As a cornerstone of building-block coarse-grained force field methodology, the log P value delivers a comprehensive measure of molecular partitioning. While this metric offers valuable predictions regarding membrane permeation and insertion properties, solely parametrizing molecules based on reproducing log P values faces two critical limitations: (i) Chemical locality: The log P value contains limited information about chemical locality effects across the molecule, particularly concerning hydrophobicity distribution around interaction sites. (ii) Effect of charge: Charged molecules such as serotonin and dopamine exhibit amphiphilic nature at the octanol-water interface, yet the explicit effect of charge itself is not captured in the parametrization due to the absence of partial charges within the coarse-grained model.

To address these limitations while maintaining accurate log P values, we have additionally implemented local density profile comparison of mapped beads within lipid membranes as an additional objective function in CGCompiler. The lipid membrane interface provides a more physiologically relevant environment and features additional interactions with zwitterionic head groups as well as the presence of a distinct liquid crystalline ordering. Although experimental measurements of membrane-molecule interactions remain more challenging to obtain than octanol-water partitioning data, this limitation can be effectively bridged through strategic application of atomistic simulations. In our optimization framework, experimental octanol–water partitioning free energies are reproduced alongside atomistic density profiles of membrane interactions, ensuring accurate parametrization of both bulk partitioning and membrane-specific behaviors. This dual-target strategy enhances predictive accuracy while maintaining computational tractability by leveraging the complementary strengths of experimental and atomistic references.

Our computational analysis shows that combining log P values with accurately reproducing local density profiles for individually mapped beads in coarse-grained simulations provides valuable insight into the overall molecular orientation and behavior at membrane interfaces. This serves as a benchmark for model quality. However, the question remains as to which matched features are most important for the quality of the model, as well as how to define model quality. For now, this is still human-determined. In our current simulations, we assigned the same weight to matching bond distributions, log P values and membrane density profiles. We observed that matching (dry) octanol log P values results in a tendency for shallower membrane insertion than in atomistic simulations. This is consistent with an inherently more hydrophilic nature. Similarly, precise matching of density profiles is anticipated to result in molecules that are inherently too hydrophobic, according to their log P value in (dry) octanol. Our log P value simulations were based on dry octanol, following the strict physical-chemical standard, whereas other studies often include a 0.3 mole fraction of water, in line with more common pharmaceutical practice. However, when both are available, we would argue that dry octanol log P values should always be preferred to wet octanol log P values. This is because coarse-grained models are unable to model either the local interfacial structure or the substantial concomitant entropic surfactant effects caused by water-octanol micellization, which significantly affects the solvation of small molecules⁵³.

Ultimately, due to the inherent uncertainty surrounding the accuracy of the modeled reference systems, it is surprisingly difficult to make a fair comparison of model quality. Within our limited framework of reference, the resulting optimized models performed better overall than human-made models, which is not surprising given that optimization aims to improve performance within such a framework. In this study, equal weights (1.0) were initially assigned to all objective functions. Thoroughly optimizing these weights would require a computationally demanding process that is beyond the scope of this study. As part of the force field’s philosophy, matching some targets, such as log P values, may be deemed more essential than matching others, such as bond distributions, whose width tends to deviate inherently from atomistic simulations. This choice of weighting could be improved in future studies to better align with the philosophy of force fields⁵⁴. However, the quality of the (automated) parametrization remains inherently restricted by a limited, human-defined target set. The models that provide the best fit within that benchmarking subset are not necessarily the models that perform best in other domains. This is the prevailing problem in force-field parametrization. It is debatable whether a model optimized for most of the domains can be considered optimal when it performs more weakly in an individual domain of interest. Similarly, we anticipate that our models will inherently perform best in the area of lipid membrane interactions with small molecules, as well as the subsequent change in membrane properties⁵⁵.

While the automated high-fidelity parametrization of small molecules using mixed-variable swarm optimization represents a significant technological advance, it remains computationally intensive. Even with access to dedicated computing infrastructure, parametrizing a single molecule requires several days of computational time. Consequently, systematic application of this methodology to extensive molecular databases containing millions of compounds is computationally prohibitive. Instead, we envision its primary utility in research contexts that require highly accurate coarse-grained models for focused studies of smaller sets of specifically targeted small molecules.
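
A back-of-the-envelope estimate shows why database-scale application is out of reach. The sketch below assumes a fixed wall-clock cost per molecule and a fixed number of optimizations running in parallel. All inputs (three days per molecule, one million molecules, one hundred parallel runs) are hypothetical placeholders for "several days" and "millions of compounds", not measurements from this study.

```python
def campaign_years(n_molecules: int, days_per_molecule: float, parallel_runs: int) -> float:
    """Wall-clock years needed to parametrize n_molecules, assuming ideal parallel scaling."""
    if parallel_runs < 1:
        raise ValueError("At least one parallel run is required.")
    total_days = n_molecules * days_per_molecule / parallel_runs
    return total_days / 365.25


# Hypothetical database-scale campaign (illustration only)
database_years = campaign_years(n_molecules=1_000_000, days_per_molecule=3.0, parallel_runs=100)

# Hypothetical focused study of 20 molecules, all optimized concurrently
focused_days = campaign_years(n_molecules=20, days_per_molecule=3.0, parallel_runs=20) * 365.25
```

Under these assumptions the database campaign would take on the order of decades, whereas the focused study finishes in a few days.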

Data availability

The itp files for dopamine, serotonin, pyrrolidine and phenol are provided in the appendix. Remaining datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

References

De Vivo, M., Masetti, M., Bottegoni, G. & Cavalli, A. Role of molecular dynamics and related methods in drug discovery. J. Med. Chem. 59, 4035–4061. https://doi.org/10.1021/acs.jmedchem.5b01684 (2016) (PMID: 26807648).
Durrant, J. D. & McCammon, J. A. Molecular dynamics simulations and drug discovery. BMC Biol. 9, 71. https://doi.org/10.1186/1741-7007-9-71 (2011).
Pak, A. J. & Voth, G. A. Advances in coarse-grained modeling of macromolecular complexes. Curr. Opin. Struct. Biol. 52, 119–126 (2018).
Marrink, S. J., Risselada, H. J., Yefimov, S., Tieleman, D. P. & de Vries, A. H. The martini force field: Coarse grained model for biomolecular simulations. J. Phys. Chem. B 111, 7812–7824. https://doi.org/10.1021/jp071097f (2007).
Souza, P. C. T. et al. Martini 3: A general purpose force field for coarse-grained molecular dynamics. Nat. Methods 18, 382–388. https://doi.org/10.1038/s41592-021-01098-3 (2021).
Alessandri, R. et al. Martini 3 coarse-grained force field: Small molecules. Adv. Theory Simul. 5, 2100391. https://doi.org/10.1002/adts.202100391 (2022).
Marrink, S. J. et al. Two decades of martini: Better beads, broader scope. WIREs Comput. Mol. Sci. 13, e1620. https://doi.org/10.1002/wcms.1620 (2023).
Empereur-Mot, C. et al. Swarm-cg: Automatic parametrization of bonded terms in martini-based coarse-grained models of simple to complex molecules via fuzzy self-tuning particle swarm optimization. ACS Omega 5, 32823–32843 (2020).
Empereur-Mot, C. et al. Automatic multi-objective optimization of coarse-grained lipid force fields using swarmcg. J. Chem. Phys. 156, 024801 (2022).
Empereur-Mot, C. et al. Automatic optimization of lipid models in the martini force field using swarmcg. J. Chem. Inf. Model. 63, 3827–3838 (2023).
Perrone, M., Capelli, R., Empereur-Mot, C., Hassanali, A. & Pavan, G. M. Lessons learned from multiobjective automatic optimizations of classical three-site rigid water models using microscopic and macroscopic target experimental observables. J. Chem. Eng. Data 68, 3228–3241 (2023).
Stroh, K. S., Souza, P. C. T., Monticelli, L. & Risselada, H. J. Cgcompiler: Automated coarse-grained molecule parametrization via noise-resistant mixed-variable optimization. J. Chem. Theory Comput. 19, 8384–8400. https://doi.org/10.1021/acs.jctc.3c00637 (2023) (PMID: 37971301).
Bereau, T. & Kremer, K. Automated parametrization of the coarse-grained martini force field for small organic molecules. J. Chem. Theory Comput. 11, 2783–2791. https://doi.org/10.1021/acs.jctc.5b00056 (2015).
Potter, T. D., Barrett, E. L. & Miller, M. A. Automated coarse-grained mapping algorithm for the martini force field and benchmarks for membrane-water partitioning. J. Chem. Theory Comput. 17, 5777–5791 (2021).
Souza, P. C. et al. Protein-ligand binding with the coarse-grained martini model. Nat. Commun. 11, 3714 (2020).
Kjølbye, L. R. et al. Towards design of drugs and delivery systems with the martini coarse-grained model. QRB Discov. 3, e19 (2022).
Kujawski, J., Popielarska, H., Myka, A., Drabińska, B. & Bernard, M. The log p parameter as a molecular descriptor in the computer-aided drug design: An overview. Comput. Methods Sci. Technol. 18, 81–88. https://doi.org/10.12921/cmst.2012.18.02.81-88 (2012).
Arslan, E., Findik, B. K. & Aviyente, V. A blind SAMPL6 challenge: insight into the octanol-water partition coefficients of drug-like molecules via a DFT approach. J. Comput. Aided Mol. Des. 34, 463–470 (2020).
Wu, Y. M., Salas, Y. L., Leung, Y. C., Hunter, L. & Ho, J. Predicting octanol-water partition coefficients of fluorinated drug-like molecules: A combined experimental and theoretical study. Aust. J. Chem. 73, 677–685. https://doi.org/10.1071/CH19648 (2020).
Gul, G., Faller, R. & Ileri-Ercan, N. Coarse-grained modeling of polystyrene-modified cnts and their interactions with lipid bilayers. Biophys. J. 122, 1748–1761. https://doi.org/10.1016/j.bpj.2023.04.005 (2023).
Soleimani, A. & Risselada, H. J. Smartini3 parametrization of multi-scale membrane models via unsupervised learning methods. Sci. Rep. 14, 25714. https://doi.org/10.1038/s41598-024-75490-2 (2024).
Centi, A., Dutta, A., Parekh, S. H. & Bereau, T. Inserting small molecules across membrane mixtures: Insight from the potential of mean force. Biophys. J. 118, 1321–1332. https://doi.org/10.1016/j.bpj.2020.01.039 (2020).
Wang, K. W., Wang, Y. & Hall, C. K. Development of a coarse-grained lipid model, LIME 2.0, for DSPE using multistate iterative Boltzmann inversion and discontinuous molecular dynamics simulations. Fluid Phase Equilib. 521, 112704 (2020).
Volkow, N. D., Fowler, J. S., Wang, G.-J., Swanson, J. M. & Telang, F. Dopamine in drug abuse and addiction: Results of imaging studies and treatment implications. Arch. Neurol. 64, 1575–1579. https://doi.org/10.1001/archneur.64.11.1575 (2007).
Channer, B. et al. Dopamine, immunity, and disease. Pharmacol. Rev. 75, 62–158 (2023).
Moncrieff, J. et al. The serotonin theory of depression: A systematic umbrella review of the evidence. Mol. Psychiatry 28, 3243–3256 (2023).
de Vries, L. P., van de Weijer, M. P. & Bartels, M. The human physiology of well-being: A systematic review on the association between neurotransmitters, hormones, inflammatory markers, the microbiome and well-being. Neurosci. Biobehav. Rev. 139, 104733. https://doi.org/10.1016/j.neubiorev.2022.104733 (2022).
Lolicato, F. et al. Membrane-dependent binding and entry mechanism of dopamine into its receptor. ACS Chem. Neurosci. 11, 1914–1924. https://doi.org/10.1021/acschemneuro.9b00656 (2020) (PMID: 32538079).
Kalinichenko, L. S., Kornhuber, J., Sinning, S., Haase, J. & Müller, C. P. Serotonin signaling through lipid membranes. ACS Chem. Neurosci. 15, 1298–1320 (2024).
Abraham, M. J. et al. Gromacs: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1, 19–25 (2015).
Van Der Spoel, D. et al. Gromacs: Fast, flexible, and free. J. Comput. Chem. 26, 1701–1718 (2005).
Pronk, S. et al. Gromacs 4.5: A high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics 29, 845–854 (2013).
Lindahl, E., Hess, B. & Van Der Spoel, D. Gromacs 3.0: A package for molecular simulation and trajectory analysis. Mol. Model. Annual 7, 306–317 (2001).
Borges-Araújo, L., Souza, P. C. T., Fernandes, F. & Melo, M. N. Improved parameterization of phosphatidylinositide lipid headgroups for the martini 3 coarse-grain force field. J. Chem. Theory Comput. 18, 357–373. https://doi.org/10.1021/acs.jctc.1c00615 (2022) (PMID: 34962393).
Shirts, M. R. & Chodera, J. D. Statistically optimal analysis of samples from multiple equilibrium states. J. Chem. Phys. 129, 124105. https://doi.org/10.1063/1.2978177 (2008).
Wu, Z. et al. alchemlyb: The simple alchemistry library. J. Open Source Softw. 9, 6934. https://doi.org/10.21105/joss.06934 (2024).
Bannan, C. C., Calabró, G., Kyu, D. Y. & Mobley, D. L. Calculating partition coefficients of small molecules in octanol/water and cyclohexane/water. J. Chem. Theory Comput. 12, 4015–4024. https://doi.org/10.1021/acs.jctc.6b00449 (2016) (PMID: 27434695).
Mey, A. S. J. S. et al. Best practices for alchemical free energy calculations [article v1.0]. Living J. Comput. Mol. Sci. 2 (2020).
Michaud-Agrawal, N., Denning, E. J., Woolf, T. B. & Beckstein, O. Mdanalysis: A toolkit for the analysis of molecular dynamics simulations. J. Comput. Chem. 32, 2319–2327. https://doi.org/10.1002/jcc.21787 (2011).
Gowers, R. J. et al. MDAnalysis: A Python Package for the Rapid Analysis of Molecular Dynamics Simulations. In Proceedings of the 15th Python in Science Conference (eds Sebastian, B. & Scott, R.) 98–105. https://doi.org/10.25080/Majora-629e541a-00e (2016).
Lee, J. et al. Charmm-gui input generator for namd, gromacs, amber, openmm, and charmm/openmm simulations using the charmm36 additive force field. J. Chem. Theory Comput. 12, 405–413. https://doi.org/10.1021/acs.jctc.5b00935 (2016).
Brooks, B. R. et al. Charmm: The biomolecular simulation program. J. Comput. Chem. 30, 1545–1614. https://doi.org/10.1002/jcc.21287 (2009).
Jo, S., Kim, T., Iyer, V. G. & Im, W. Charmm-gui: A web-based graphical user interface for charmm. J. Comput. Chem. 29, 1859–1865. https://doi.org/10.1002/jcc.20945 (2008).
Mack, F. & Bönisch, H. Dissociation constants and lipophilicity of catecholamines and related compounds. Naunyn Schmiedebergs Arch. Pharmacol. 310, 1–9. https://doi.org/10.1007/BF00499868 (1979).
Duffy, E. M. & Jorgensen, W. L. Prediction of properties from simulations: Free energies of solvation in hexadecane, octanol, and water. J. Am. Chem. Soc. 122, 2878–2888. https://doi.org/10.1021/ja993663t (2000).
Lütge, S., Krebs, M. & Risselada, H. J. Toward the evolutionary optimisation of small molecules within coarse-grained simulations: Training molecules to hide behind lipid head groups. J. Phys. Chem. B https://doi.org/10.1021/acs.jpcb.4c08200 (2025).
Postila, P. A., Vattulainen, I. & Róg, T. Selective effect of cell membrane on synaptic neurotransmission. Sci. Rep. 6, 19345. https://doi.org/10.1038/srep19345 (2016).
Sengupta, D. & Huster, D. The dynamic structure of the lipid bilayer and its modulation by small molecules. J. Phys. Chem. B 129, 8639–8640. https://doi.org/10.1021/acs.jpcb.5c04373 (2025).
Cheng, T. et al. Computation of octanol-water partition coefficients by guiding an additive model with knowledge. J. Chem. Inf. Model. 47, 2140–2148 (2007).
van Hilten, N., Stroh, K. S. & Risselada, H. J. Efficient quantification of lipid packing defect sensing by amphipathic peptides: Comparing martini 2 and 3 with charmm36. J. Chem. Theory Comput. 18, 4503–4514 (2022).
Thomasen, F. E., Pesce, F., Roesgaard, M. A., Tesei, G. & Lindorff-Larsen, K. Improving martini 3 for disordered and multidomain proteins. J. Chem. Theory Comput. 18, 2033–2041 (2022).
Claveras Cabezudo, A., Athanasiou, C., Tsengenes, A. & Wade, R. C. Scaling protein-water interactions in the martini 3 coarse-grained force field to simulate transmembrane helix dimers in different lipid environments. J. Chem. Theory Comput. 19, 2109–2119 (2023).
Soleimani, A. & Risselada, H. J. Pure graphene acts as an “entropic surfactant’’ at the octanol-water interface. ACS Nano 17, 13554–13562 (2023).
Risselada, H. J. Martini 3: A coarse-grained force field with an eye for atomic detail. Nat. Methods 18, 342–343 (2021).
Saha Roy, D. et al. Serotonin promotes vesicular association and fusion by modifying lipid bilayers. J. Phys. Chem. B 128, 4975–4985 (2024).
Jülich Supercomputing Centre. JUWELS Cluster and Booster: Exascale Pathfinder with Modular Supercomputing Architecture at Juelich Supercomputing Centre. J. Large-Scale Res. Facil. https://doi.org/10.17815/jlsrf-7-183 (2021).


Acknowledgements

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy – EXC-2033 – 390677874 – RESOLV. The authors gratefully acknowledge the Gauss Centre for Supercomputing e.V. (www.gauss-centre.eu) for funding this project by providing computing time through the John von Neumann Institute for Computing (NIC) on the GCS Supercomputer JUWELS⁵⁶ at Jülich Supercomputing Centre (JSC).

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Physics Department, Technische Universität Dortmund, Dortmund, 44227, Germany
Maria Kelidou & Herre Jelger Risselada
Laboratory of Biology and Modeling of the Cell, École Normale Supérieure de Lyon, Lyon, 69007, France
Kai Steffen Stroh

Authors

Maria Kelidou
Kai Steffen Stroh
Herre Jelger Risselada

Contributions

H.J.R. and K.S.S. designed the work. M.K. conducted the simulations. M.K. and K.S.S. performed all analysis. All authors wrote and reviewed the manuscript.

Corresponding author

Correspondence to Herre Jelger Risselada.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Kelidou, M., Stroh, K.S. & Risselada, H.J. Automated parametrization of small molecules within the Martini 3 coarse-grained model guided by experimental log P values. Sci Rep 15, 37169 (2025). https://doi.org/10.1038/s41598-025-24757-3


Received: 03 March 2025
Accepted: 15 October 2025
Published: 23 October 2025
Version of record: 23 October 2025
DOI: https://doi.org/10.1038/s41598-025-24757-3